CHWP B.5 Merrilees, Edwards, Megginson, "The Dictionarius of Firmin Le Ver (1440)"

2. WordCruncher: strengths and weaknesses (W. Edwards)

WordCruncher promotes itself accurately and concisely as text indexing and retrieval software: data are retrieved from pre-indexed DOS text files in a two-stage process, and the program in fact operates as two interrelated but distinct components, indexing and viewing. In short, the program processes any DOS text file, creating a word-frequency list which the user can then manipulate to gain random access to the text file and to retrieve data. The program's strength is the ease with which designated character-strings, suffixes and prefixes, or various combinations of words or letters can be searched; in our preliminary work of transcribing, entering and checking the dictionary text, that searching capacity was invaluable. However, as we moved to an analysis of the structure of our text, as prepared, WordCruncher showed its limitations, though we should point out that such limitations relate as much to our application as to the program itself. For example, WordCruncher can list all occurrences of a particular word, but cannot identify the most frequent word in the post-lemmatic position; it can list all French occurrences of the suffix -iet, but cannot identify the frequency of French in the definitional position; it can provide all examples where words 1 and 2 are followed within a designated number of spaces by word 3 and/or 4, but it cannot identify the schematic structure of an entry. In the initial stages this was less a concern to us than the capacity to have rapid access to the textbase.
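The two kinds of search mentioned above, by suffix and by proximity, can be illustrated with a minimal Python sketch. This is not WordCruncher's own method: the function names, the measurement of the window in words, and the sample sentence are all assumptions made for illustration, and the sample is invented rather than quoted from Le Ver.

```python
def with_suffix(words, suffix):
    """Return (position, word) for every word ending in the given suffix."""
    return [(i, w) for i, w in enumerate(words) if w.endswith(suffix)]


def proximity_hits(words, first, second, window):
    """Return (i, j) pairs where `second` occurs within `window` words after `first`.

    The window is counted in words here; that unit is an assumption of this sketch.
    """
    hits = []
    for i, w in enumerate(words):
        if w != first:
            continue
        # Look only at the next `window` positions after the first word.
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] == second:
                hits.append((i, j))
    return hits


# Invented sample text, not a quotation from the dictionary.
sample = "abbas est pere de moines et pere de pitiet".split()
suffix_matches = with_suffix(sample, "iet")
near_matches = proximity_hits(sample, "pere", "de", 2)
```

Both functions work on a flat sequence of words, which is why searches of this kind say nothing about where in an entry a word falls: the position of a hit is known, but not its role in the article.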

WordCruncher will triangulate a given reference, provided the user has prepared (pre-indexed) the text according to a three-tiered system. The generation of publishable indices in a variety of formats -- book index, key-word-in-context, key-word-in-line -- is apparently the controlling principle of pre-indexing. We found that, for our purposes, a basic, 'untreated' text file in DOS format, with a unique filename, will provide access and data retrieval as satisfactory as one that has been given a more sophisticated pre-indexing treatment; additional preparatory marking, or adapting the text file to the hierarchical structure suggested by the program, yields limited further returns. Even if we had marked our text more than was done, the structure of the dictionary article could only have been partially captured by the three-level hierarchy.

With whatever level of sophistication the text is prepared, the program produces a word-frequency list, which is then used to mark the text, and through which both the program and the user access the file. The levels of indexing, or the lack of them, do not in any way affect either this random accessibility or the power of recall: lists of specified citations, modified or not, can be displayed on the screen, saved as a DOS file, or sent to the printer.

As Brian Merrilees has shown, our medieval compiler greatly anticipated our task, reducing the need for detailed marking of our text by using principles of lay-out as pre-indexing tools, which we have chosen to reproduce. Alphabetically arranged dictionaries, after all, are already largely pre-indexed.

The principal benefit of concording Le Ver's text is to provide random access to the French imbedded within the Latin text. It was hoped initially that WordCruncher's three levels of referencing could be used to provide a ready reference for each French word as follows: the designated French citation, in its extended, French context; its 'book' (dictionary and letter); together with its Latin referents: Latin headword and Latin sub-headword. However, extensive marking and preparation of the text to achieve this end yielded minimal advantages over the use of a largely unmarked text. For the Le Ver text the practical limitations of WordCruncher prevented a useful application of the three available levels of reference. As we progressed in our analysis, it became clear that a dictionary entry structure as we have described it above would have required a different kind of software. However, it was through WordCruncher's search results that we came to a fuller understanding of Le Ver's methods of compilation. For example, it was our searches of metalinguistic terms that confirmed the importance of the link between information and its location.

Curiously, the preparatory efforts that proved the most useful concerned not the preparation of the text file but the manipulation of the Character Sequence file. This file provides five built-in options for organising the generated word list: four default language sequences -- English, French, German, Spanish -- and a fifth, user-specific sequence that can be tailored and modified to reflect editorial practice. Here the dictionary of the text file's vocabulary, its list of unique words, can be adapted to any hierarchical sequence or equivalence, as the user decides. Every ASCII character can be assigned, by the user, to one of seven types: Upper case, Lower case, Delimiter, DelimLower (marks the end of a word and is a separate word in itself, and can be searched as such), Hyphen, Apostrophe, or Ignore; the text will be indexed and the word-frequency file sorted accordingly.
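A character-type table of this kind can be sketched in Python as follows. The seven type names come from the description above; everything else is assumed for illustration, including the function names, the sample table, the folding of Upper case to lower case, and the treatment of Hyphen and Apostrophe as ordinary word-internal characters. It does not reproduce WordCruncher's actual behaviour or file format.

```python
from enum import Enum


class CharType(Enum):
    UPPER = "Upper case"
    LOWER = "Lower case"
    DELIMITER = "Delimiter"
    DELIM_LOWER = "DelimLower"
    HYPHEN = "Hyphen"
    APOSTROPHE = "Apostrophe"
    IGNORE = "Ignore"


def build_table():
    """Build a small, hypothetical character-type table."""
    table = {}
    for c in "abcdefghijklmnopqrstuvwxyz":
        table[c] = CharType.LOWER
        table[c.upper()] = CharType.UPPER
    table[" "] = CharType.DELIMITER
    table["."] = CharType.DELIMITER
    table[":"] = CharType.DELIM_LOWER  # ends a word and is itself a word
    table["-"] = CharType.HYPHEN
    table["'"] = CharType.APOSTROPHE
    table["*"] = CharType.IGNORE       # e.g. an editorial mark to skip
    return table


def tokenize(text, table, default=CharType.DELIMITER):
    """Split text into words according to the character-type table."""
    words, current = [], []
    for ch in text:
        kind = table.get(ch, default)
        if kind is CharType.IGNORE:
            continue
        if kind in (CharType.DELIMITER, CharType.DELIM_LOWER):
            if current:
                words.append("".join(current))
                current = []
            if kind is CharType.DELIM_LOWER:
                words.append(ch)
            continue
        # Assumption: upper case is folded; hyphen and apostrophe stay in the word.
        current.append(ch.lower() if kind is CharType.UPPER else ch)
    if current:
        words.append("".join(current))
    return words


tokens = tokenize("Abbas: l'ab*be.", build_table())
```

Changing a single entry in the table changes what counts as a word, which is why adjustments at this level could reflect editorial practice without any marking of the text file itself.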

The most immediate and useful product of a crunched text is this word-frequency file -- a list of all words found in the text, with the frequency of usage of each, sorted according to the designated character sequence. In addition to being an integral point of access by the program to the text, under the View option, this file can be manipulated as a generic word-processor file in its own right: it can be sorted by frequency, suffix, etc., and it is particularly useful as a proof-reading device, when special attention can be paid to words that occur only once.
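The uses of a word-frequency list described here (checking single occurrences, sorting by frequency or by suffix) can be sketched in Python as below. The function names, the simple letters-only definition of a word, and the sample text are assumptions of this sketch; the sample is invented and is not drawn from Le Ver.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count each word, treating any run of letters as a word (an assumption)."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def hapaxes(freq):
    """Words occurring exactly once: the first candidates for proof-reading."""
    return sorted(w for w, n in freq.items() if n == 1)


def by_frequency(freq):
    """Most frequent first; ties are broken alphabetically."""
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


def by_suffix(freq):
    """Sort on the reversed spelling so that shared endings fall together."""
    return sorted(freq, key=lambda w: w[::-1])


# Invented sample text, not a quotation from the dictionary.
freq = word_frequencies("abbas abbe abbas pere pitiet amitiet pere abbas")
singles = hapaxes(freq)
ranked = by_frequency(freq)
endings = by_suffix(freq)
```

A mistyped form will usually surface among the single occurrences, and the reversed-spelling sort places forms such as those in -iet next to one another.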

In this project our main purpose, which was to provide a traditional (i.e. printed) edition of a medieval manuscript text, was obviously somewhat at odds with the preparation of a text for electronic manipulation. Nonetheless, WordCruncher proved to be a powerful and useful tool, albeit within prescribed parameters.
