CHWP B.13 Tompa, "Experiences with the OED"

5. Using Pat and Lector together for the OED

We have worked with many researchers to extract information from the OED. One outcome has been the creation of "A Guide for Scholars", which includes outline search strategies and commentary for the following queries (Berg, 1989):

  1. Find the number of citations in the dictionary's quotations to a work entitled Travels in Arabia Deserta by the Victorian traveller and poet Charles M. Doughty.
  2. Find all the words derived from Italian between the years 1650 and 1725 which were first cited in a drama.
  3. Find all the words of Dutch origin contained in the OED.
  4. Find the percentage of citations from "journals" in the dictionary within a given time period.
  5. Find all the words in the dictionary that are associated with anthropology as a discipline.
Rather than reexamine these, however, we present here an alternative example worked out in detail to illustrate the potential of the OED text database. Professor Delbert Russell, a colleague in the Department of French, was interested in exploring the Anglo-Norman roots of English (Russell, forthcoming). His starting point was to find entries in the OED that satisfy any of the following conditions:
  1. the first language named in the etymology is Anglo-French ("AFr." or "AF.");
  2. the first language named in the etymology is Old French ("OFr." or "OF.");
  3. the first language named in the etymology is French ("Fr." or "F.") and the earliest quotation is dated prior to 1500.
Once extracted, these entries were to be further evaluated and filtered based on a closer look at the complete etymology and the complete list of citation dates.
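The three conditions above can be restated as a small predicate. The following is only a minimal sketch in Python, not the Pat query sequence actually used: it assumes each entry has already been reduced to a record holding the first language label of its etymology and the year of its earliest quotation. The Entry class, its field names, and the sample records are hypothetical and invented for illustration.

```python
from dataclasses import dataclass

# Language labels taken from the three conditions in the text.
ANGLO_FRENCH = {"AFr.", "AF."}
OLD_FRENCH = {"OFr.", "OF."}
FRENCH = {"Fr.", "F."}


@dataclass
class Entry:
    """Hypothetical, simplified view of one dictionary entry."""
    headword: str
    first_language: str  # first language label named in the etymology
    earliest_year: int   # year of the earliest quotation


def is_candidate(entry: Entry) -> bool:
    """Return True if the entry satisfies any of the three conditions."""
    if entry.first_language in ANGLO_FRENCH:
        return True   # condition 1: Anglo-French
    if entry.first_language in OLD_FRENCH:
        return True   # condition 2: Old French
    # condition 3: French, with the earliest quotation before 1500
    return entry.first_language in FRENCH and entry.earliest_year < 1500


# Invented placeholder records; these are not real OED data.
entries = [
    Entry("word_a", "AF.", 1620),
    Entry("word_b", "OFr.", 1340),
    Entry("word_c", "F.", 1480),
    Entry("word_d", "Fr.", 1650),
    Entry("word_e", "L.", 1400),
]

candidates = [e.headword for e in entries if is_candidate(e)]
```

In this sketch only the third condition consults the date, which mirrors the way the query strategy described below restricts dates for the French strings alone.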

Figure 7 shows a display window that provides an interface to the OED. The box at the top contains a sequence of pull-down menus that give the user access to the operators within Pat (for combining query results, limiting queries to particular fields, for proximity searches, and so forth). The next box is an input window for entering a search string. The box in the centre of the figure shows the history of the queries in the session (transliterated from 'point-and-click' actions into textual form), and the bottom window displays a sample of matches from the OED resulting from the last query. Finally, the box on the right provides a menu for the pre-defined fields in the OED which can be selected for restricting searches or for limiting output. The queries listed in Figure 7 define all strings in the OED that begin with a language label related to Anglo-Norman (as defined above). It is interesting to note the distribution of occurrences of each of the language names in the OED by examining the count fields.

Query 10 (Figure 8) then finds the entries in which the language French is used. The user specifies this query by selecting Structure Including Last from the Structure pull-down menu, and then pointing at Entry under the list of pre-defined fields (at the top of the box labelled Document Structure). Next we restrict our attention to the earliest quotations within these entries and then to the dates within just those quotations (again by selecting one of the menu options under Structure and pointing at the appropriate elements on the screen). To restrict ourselves to entries in which the date precedes 1500, we find all dates that are between 15.. and 19.. (Queries 13-16), and remove these dates from the set of interest (the operators for Queries 15 and 17 being found under the menu labelled Combine). However, we then add back those dates containing the form "ante 1500". Finally, in the last two queries, we redirect our attention back to the strings starting with "<L>Fr." or "<L>F." within this restricted set of entries.
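The date restriction is a subtract-then-add-back combination of result sets. The sketch below imitates that logic in Python under simplifying assumptions: dates are plain strings, a "late" date is any string containing a four-digit year from 1500 to 1999, and the sample date strings are invented. It does not reproduce Pat's operators or the actual encoding of dates in the OED.

```python
import re

# Matches a four-digit year in the range 1500-1999 ("15.." to "19..").
LATE_YEAR = re.compile(r"\b1[5-9]\d\d\b")


def dates_before_1500(dates):
    """Drop dates from 1500 onwards, then restore the form 'ante 1500'."""
    all_dates = set(dates)
    # Find every date that falls between 15.. and 19..
    late = {d for d in all_dates if LATE_YEAR.search(d)}
    # 'ante 1500' contains 1500 and so is caught above, yet it denotes
    # a date before 1500; it must be added back explicitly.
    add_back = {d for d in all_dates if "ante 1500" in d}
    return (all_dates - late) | add_back


# Invented sample date strings, for illustration only.
sample = ["1386", "c 1450", "ante 1500", "1523", "1611", "1499"]
kept = dates_before_1500(sample)
```

Removing the unwanted dates and then restoring the one exceptional form keeps each step a simple pattern match, at the cost of an extra combining step.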

Queries 23 and 24 (Figure 9) then collect together the occurrences of Anglo-French, Old French, and the restricted French strings. Each string is to be checked to verify that it occurs as the first language in an etymology and that it is not preceded by a cross-reference (which would represent a formation within English rather than a borrowing). We first restrict the strings to those found within etymologies. Next, a user-defined field is declared to start with these selected etymologies and end at the first closing tag for language (</L>) or opening tag for a cross-reference (<XR>). Finally Query 30 restricts the language strings to those that start within these user-defined fields.
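The effect of the user-defined field can be illustrated outside Pat. The sketch below is an approximation in Python, assuming the text of a single etymology is available as a string: the field runs from the start of the etymology to the first closing language tag or opening cross-reference tag, and a language string is accepted only if it starts inside that span. The function name and the sample etymology strings are hypothetical; only the tags <L>, </L> and <XR> and the language labels come from the description above.

```python
import re

# The field ends at the first closing language tag or opening cross-reference.
FIELD_END = re.compile(r"</L>|<XR>")
# Language strings of interest, each introduced by the <L> tag.
LABEL = re.compile(r"<L>(?:AFr\.|AF\.|OFr\.|OF\.|Fr\.|F\.)")


def first_language_label(etymology):
    """Return the label if it starts inside the field, otherwise None."""
    end = FIELD_END.search(etymology)
    limit = end.start() if end else len(etymology)
    match = LABEL.search(etymology)
    if match and match.start() < limit:
        return match.group()[len("<L>"):]
    return None


# Invented etymology fragments, for illustration only.
accepted = first_language_label("ad. <L>OF.</L> form_a")
after_xref = first_language_label("f. <XR>form_b</XR> + <L>F.</L> form_c")
second_lang = first_language_label("ad. <L>L.</L> form_d, via <L>F.</L> form_e")
```

A label that follows a cross-reference, or that is not the first language named, falls outside the field and is rejected, which corresponds to the two checks described in the text.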

A sample of the results is shown in the bottom window of Figure 9. By pointing at any one of these results, the corresponding entry is displayed in a separate Lector window. Figure 10 shows the entry for chape according to the style sheet "Etymology and Dates" and Figure 11 shows the same entry in standard form. Figure 12 and Figure 13 show entries for words from Anglo-French and from Old French, respectively.

Once the strategy had been determined, this application required less than 15 minutes in total to execute on a SUN4; under six minutes of this time was spent waiting for computations to complete. In our experience, this is an acceptable response time for pursuing research productively. Interested readers may wish to read a description of a user's experiences with an earlier version of the software in comparison to accessing the OED through the first release on CD-ROM (Logan & Logan, 1988). As the software improves, we expect more and more users to find innovative ways to benefit from electronic texts.
