CHWP B.5 Merrilees, Edwards, Megginson, "The Dictionarius of Firmin Le Ver (1440)"

3. Old-Fashioned Concording (D. Megginson)

Electronic concording programs like WordCruncher create interactive concordances: the user decides what information to retrieve while using them. Printed concordances like the Microfiche Concordance to Old English Literature, on the other hand, are static concordances: the editors decide how to organise the information, and the user can access it only in that way. Interactive concordances are very useful tools within research projects like the Dictionarius Le Ver, but when we want to share our work with other scholars, they are still unsuitable for several reasons.

The first problem is distribution. An interactive concordance requires access to a computer, and if there is software bundled with it, it requires access to a specific type of computer. Scholars cannot simply pull the concordance off a library or bookstore shelf and browse through it, or bring it with them on research trips.

The second problem is the lack of standards among computers. Nearly all computers can exchange simple digits and Latin text using the ASCII or EBCDIC standards, but there is no universally accepted method for exchanging even something as simple as é or a 4-byte machine word (long integer), much less a complex binary file structure like the one used by WordCruncher. Today, an interactive concordance must be bound not only to a single computer, but to a single software package.

When it comes to publishing, static concordances avoid nearly all of the problems of interactive concordances. When they are printed on paper, they require no special technology to use, and they can follow standards of typesetting and book-binding which are already well established. Printed concordances also take advantage of the existing distribution system of book sellers and libraries to reach the largest possible audience, and are easy to bring into research facilities for field work.

Furthermore, looking up a single, complete word in a printed (paper) concordance can be as fast as looking it up in an interactive concordance on a computer. However, there are several serious disadvantages to printed static concordances.

First of all, static concordances always limit the user's choices in ways that interactive concordances do not. If a concordance is in alphabetical order, the user can find all words beginning with b grouped together, but not all words containing or, for example. Static concordances also allow only one way to access each citation: you can find all of the citations containing et and all of the citations containing on, but not the citations which contain both.
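The contrast can be made concrete with a small sketch. The Python below is not part of the project's software; the three citations and both function names are invented for illustration. It shows the two kinds of query mentioned above, which an interactive concordance can answer on demand but an alphabetical printed one cannot: all words containing a given string, and all citations containing two given words.

```python
# Hypothetical citations, invented for illustration only.
citations = [
    "le roi et la reine",
    "on dit que le roi dort",
    "et on chante encore",
]

def words_containing(fragment, citations):
    """Return the sorted set of words that contain `fragment` anywhere."""
    words = {w for c in citations for w in c.split()}
    return sorted(w for w in words if fragment in w)

def citations_with_all(wanted, citations):
    """Return the citations that contain every word in `wanted`."""
    return [c for c in citations if set(wanted) <= set(c.split())]

print(words_containing("or", citations))            # words with "or" inside them
print(citations_with_all(["et", "on"], citations))  # citations with both words
```

A static concordance would need a separate printed listing for each such question, which leads to the second problem.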

The second problem stems partly from the first. Concordances are very long, and become even longer when one tries to provide more options for the user. Even a simple, alphabetical concordance can be considerably longer than the original text, because every word of the text appears as a keyword with its own citation. For example, if you are concording a 200-page text where the average citation is 40 words long, the concordance will be about 40 times the length of the text: over 8,000 pages in the same type size. If you add another type of listing, such as reverse spelling, the concordance will be over 16,000 pages long, and so on. Electronic interactive concordances can generate this information as required -- the average user will never need most of it -- but a static concordance must contain it all explicitly.
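The arithmetic behind these figures can be written out as a rough estimate. The function below is only a sketch: its name and parameters are invented here, and it assumes that every word of the text appears once as a keyword in each listing, with a citation of the stated average length, in the same type size as the original. Keyword headings would add further lines, so the result is best read as a lower bound.

```python
def concordance_pages(text_pages, citation_words, listings=1):
    """Estimate the page count of a static keyword-in-context concordance.

    Assumes every word of the text appears once as a keyword in each
    listing, each time with a citation of `citation_words` words, and
    that the type size and page layout match the original text.
    """
    return text_pages * citation_words * listings

print(concordance_pages(200, 40))     # alphabetical listing only
print(concordance_pages(200, 40, 2))  # plus a reverse-spelling listing
```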

It will usually not be possible to publish an 8,000- or 16,000-page concordance printed on paper. The best alternative is microfiche, as the Dictionary of Old English project has done with its concordance. However, now the users are tied to a microfiche reader, and have already lost one of the greatest advantages of the printed static concordance -- its portability and freedom from technological constraints -- without gaining any of the advantages of interactive concordances. The only remaining advantage is that microfiche readers are more commonly available in libraries than computers. The rest of this paper will explore the options which we have considered at the Dictionarius Le Ver project to generate concise, useful static concordances for publication.

Usually, concordances show keywords in context, either with a fixed number of words on either side or within an entire quotation. The simplest way to generate a smaller concordance is to omit the context altogether. Here is a sample French concordance of an early draft of the Dictionarius Le Ver M section without context:

    punir:
      1 multo
    punis:
      1 multo (multatus)
    punition:
      1 multo (multatio)
    pur:
      1 merum (merum)
    puree:
      1 merula (merula)
    purement:
      1 merax (meraciter)
      2 merosus (merose)
      3 merus (mere)
    purgier:
      1 mucus (muco)
    purgiés:
      1 mucus (mucatus)
    purifiés:
      1 merax
    purs:
      1 merax
      2 merosus (merosus)
      3 merus
    putain:
      1 manzer
      2 multicuba
    puterie:
      1 meretrix (meretricatio)

A lexicon like the Dictionarius Le Ver is ideal for this sort of concordance. A non-contextual concordance of a novel, for example, would have to list only page numbers, and would be difficult to use because a page contains so many different words. The Dictionarius Le Ver is organised hierarchically by headword and sub-headword, and each sub-headword passage contains only a small amount of French. Furthermore, unlike such references as "page 38" or "Act 3, scene 5", the headword and sub-headword still give a fair bit of useful information about a word's context. Still, in this concordance, we are considering including the surrounding French for more context. In the case of putain, for example, the entries would look like this:

    putain:
      1 manzer bastard fil de putain publique de bordel
      2 multicuba putain qui couche aveuc chescun ribaude
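As a rough illustration of how such listings might be produced, here is a minimal Python sketch. It is not the project's actual method. The record layout (headword, optional sub-headword, French text) and the function names are assumptions; only the two glosses come from the sample above. The same function produces the listing with or without context.

```python
from collections import defaultdict

# Each record: (headword, sub-headword or None, French gloss).
# The record layout is an assumption; the glosses are the two quoted above.
records = [
    ("manzer", None, "bastard fil de putain publique de bordel"),
    ("multicuba", None, "putain qui couche aveuc chescun ribaude"),
]

def build_concordance(records, with_context=False):
    """Map each French word to a sorted list of reference lines."""
    index = defaultdict(set)
    for headword, sub, french in records:
        ref = headword if sub is None else f"{headword} ({sub})"
        if with_context:
            ref = f"{ref} {french}"
        for word in french.split():
            index[word].add(ref)  # a set keeps one reference per passage
    return {word: sorted(refs) for word, refs in sorted(index.items())}

def format_entry(word, refs):
    """Lay out one entry as a keyword followed by numbered references."""
    lines = [f"{word}:"]
    lines += [f"  {n} {ref}" for n, ref in enumerate(refs, start=1)]
    return "\n".join(lines)

concordance = build_concordance(records, with_context=True)
print(format_entry("putain", concordance["putain"]))
```

Note that Python's default sort orders strings by code point, so accented forms such as purgiés would need a locale-aware collation before a full listing matched dictionary order.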

Since our first concordance is considerably smaller than the original text, we are able to list other types of information. For example, Le Ver often includes etymologies in his entries, usually Latin or Greek. Since these are all unambiguously marked in the text, we can concord them separately, and study the use of etymology throughout the lexicon. Here is an extract from the concordance of etymons from the same M section:

    cedo:
      1 matricida
      2 morticinus
      3 morticinus (morticína)
      4 muricida
    centaurus:
      1 monocentaurus
    ceros:
      1 monoceros
    colera:
      1 melan (melancolia)

In this case, the headword and sub-headword alone will often be all the context required, as with colera, in the melancolia sub-entry under the headword MELAN. The etymon concordance is very short, but it still presents a single type of information well.

Fortunately for us, Le Ver produced his lexicon in fairly good alphabetical order. However, while the headwords are fully ordered and there are many cross-references, it is sometimes difficult to find where a sub-headword is defined within a headword article. Again, we have marked the sub-headwords in the electronic text, so it is a simple matter to concord them. The final sample concordance is a list of sub-headwords with their corresponding headwords:

    emembris:
      1 membrum
    emembro:
      1 membrum
    emendo:
      1 mando
      2 menda
    emensus:
      1 metior
    emergo:
      1 mergo
    emeritio:
      1 meritus
    emeritus:
      1 meritus

This concordance will be short, but it can be very useful, both for finding sub-headword articles within the lexicon and for studying the structure of the headword articles themselves.
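Because the sub-headwords are already marked, this listing amounts to inverting the hierarchy of the lexicon. The sketch below shows the idea in Python. The headwords and sub-headwords are those of the sample above, but the dictionary layout and the function name are assumptions made for illustration, not a description of the project's files.

```python
from collections import defaultdict

# Headword articles with their sub-headwords, taken from the sample above.
# The dictionary layout itself is an assumption made for this sketch.
articles = {
    "mando": ["emendo"],
    "membrum": ["emembris", "emembro"],
    "menda": ["emendo"],
    "mergo": ["emergo"],
    "meritus": ["emeritio", "emeritus"],
    "metior": ["emensus"],
}

def invert(articles):
    """Return a mapping from each sub-headword to the headwords containing it."""
    index = defaultdict(list)
    for headword in sorted(articles):
        for sub in articles[headword]:
            index[sub].append(headword)
    return dict(sorted(index.items()))

sub_index = invert(articles)
for sub, headwords in sub_index.items():
    print(f"{sub}:")
    for n, headword in enumerate(headwords, start=1):
        print(f"  {n} {headword}")
```

A sub-headword such as emendo, which occurs under two headwords, receives two numbered references, as in the printed sample.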

The Dictionarius Le Ver project can also produce short concordances of Latin words, cross-references within the lexicon, cited forms and even marginalia, since we have marked all of these in our text. Rather than producing one large, awkward printed concordance with extensive context, we are concentrating on small, easy-to-use lists. Without the context, the user will have to make frequent reference to the text itself, but the printed (or microfiche) concordances will permit use of the text in many different ways, and in many different places.

We have generated these concordances using standard Unix shell tools, with all of the files in plain ASCII format. This is one of our best defenses against obsolete technology, since an ASCII text file is usually easy to convert to any format. The concordance files themselves are also plain ASCII, although for the sake of this paper we have converted them to proper foreign characters and added boldface and italics.

One day computers will be more standardised and more easily available. The Text Encoding Initiative (TEI), headed by Lou Burnard and Michael Sperberg-McQueen, is working to establish standards which will allow different computers to exchange all types of textual information. Once the new standards are in place and there are programs on the market using them, publishing an electronic interactive concordance will be simple and cheap. Until then, however, static concordances will remain the best option.

Throughout this part of the paper I have been careful to specify printed static concordances. It is also possible to release the text of static concordances in an electronic format, using plain text escape sequences for foreign characters like é. Users will be able to take advantage of their own software (for example, a word processor with macros) to generate new types of concordances from it. There are already good distribution systems in place for electronic texts (as opposed to binary files), such as the Oxford Text Archive and the Usenet computer network. Perhaps this is the best compromise we can make for now -- releasing a static concordance, both as a printed text (on paper or microfiche) and as electronic text for further work by other computer-literate scholars.
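One possible form for such escape sequences is the SGML-style entity reference, in which é is written &eacute;. The paper does not say which scheme the project would use, so the choice of entity references here is an assumption, as are the function names. The sketch relies only on Python's standard html module and shows that the encoded text is plain ASCII and can be restored without loss.

```python
import html
from html.entities import codepoint2name

def to_ascii(text):
    """Replace non-ASCII characters (and '&') with entity references."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128 and ch != "&":
            out.append(ch)                         # plain ASCII passes through
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{code};")               # numeric fallback
    return "".join(out)

def from_ascii(text):
    """Restore the original characters from the entity references."""
    return html.unescape(text)

encoded = to_ascii("purgiés")
print(encoded)              # the ASCII-only form
print(from_ascii(encoded))  # the original word again
```

Escaping the ampersand itself keeps the conversion reversible even when the text already contains that character.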
