CHWP B.12 | Lancashire, "English Renaissance Knowledge Base" |
Tagging and text-retrieval requirements for Tudor bilingual dictionaries vary according to two things: the nature of the texts themselves, and the purposes to which they are put. First, any scholarly system must be able to identify textual phenomena, whatever they are. Because these text features cannot all be distinguished unambiguously, but must sometimes be recovered by interpretation, it is difficult to anticipate everything that might be tagged. For instance, I can tag Cotgrave's head-lemmas and phrasal lemmas, but satisfactorily dissecting the 'meaning' tag calls for lengthy analysis. Second, different scholars put texts to different uses. To analyse the texts automatically for collocational patterns, as I am doing in a separate research project, texts must be lemmatized; and at a minimum each word in such a file has to have three tags: its word-form, its part-of-speech and its inflectional form. Multiply the size of the file, then, threefold. The English lemmatizing system I am designing with Michael Stairs may well overload existing software. Scholars searching for names, on the other hand, can be satisfied with far fewer tags. Versatile text-retrieval software, then, has to work well with densely or sparsely tagged texts.
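The contrast between densely and sparsely tagged files can be made concrete with a small sketch in Python. It is an illustration only, not the lemmatizing system mentioned above: the class `Token`, the functions `find` and `tag_count`, the tag names (`word_form`, `pos`, `inflection`, `name`) and the sample words are all hypothetical, and "word-form" is assumed here to mean the dictionary form of the word. The point is simply that one retrieval routine can serve both a file carrying three tags per word and a file in which only names are marked.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Token:
    """One running word plus whatever tags an editor has supplied (possibly none)."""
    text: str
    tags: Dict[str, str] = field(default_factory=dict)


def find(tokens: Iterable[Token], **wanted: str) -> List[Token]:
    """Return the tokens whose tags match every requested tag/value pair.

    A token that lacks a requested tag simply fails to match, so densely
    and sparsely tagged files go through the same code path.
    """
    return [t for t in tokens
            if all(t.tags.get(key) == value for key, value in wanted.items())]


def tag_count(tokens: Iterable[Token]) -> int:
    """Count the tags carried by a sequence of tokens."""
    return sum(len(t.tags) for t in tokens)


# Densely tagged: each word carries the three tags needed for collocation work.
# Tag names and values are invented for this example.
dense = [
    Token("dogs", {"word_form": "dog", "pos": "noun", "inflection": "plural"}),
    Token("barked", {"word_form": "bark", "pos": "verb", "inflection": "past"}),
]

# Sparsely tagged: only names are marked, as a scholar searching for names might want.
sparse = [
    Token("Cotgrave", {"name": "person"}),
    Token("compiled"),
    Token("dictionaries"),
]

print([t.text for t in find(dense, pos="noun")])      # ['dogs']
print([t.text for t in find(sparse, name="person")])  # ['Cotgrave']
print(tag_count(dense), tag_count(sparse))            # 6 1
```

In the dense file the tags outnumber the words three to one, which is the threefold growth noted above; in the sparse file a single tag suffices for the search on names.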
Since the early 1950s and until recently, anyone using computers in the humanities has had to cut the cloth (the text) to suit the tailor (the computer platform and the software available for it). In the past five years, this situation has changed. Software has become increasingly general-purpose. Encouraged by this turn of events, we should insist on a tagging system suitable for our texts rather than training those texts and restricting their encoding systems to suit the available software. "If you build it, he will come", says the voice in Field of Dreams. If we define our tags according to the needs of our texts, the software will be written to analyse them.