© Martin Wynne
The Brown corpus was one of the very first computer corpora. It contains approximately one million words of written American English, sampled from various text types. All of the texts date from the early 1960s.
The LOB corpus is a corpus of written British English, modelled on the Brown corpus and following the same design and selection criteria. The texts in the LOB corpus also date from the early 1960s.
For many years, corpus linguists have made comparisons between British and American English using these two resources.
More recently, however, researchers have become aware that the texts in these corpora are very old, and not appropriate for studying many features of modern English usage. A team at Freiburg University in Germany therefore decided to make new corpora with modern texts following the same design and sampling procedures as Brown and LOB. The resulting corpora are called FLOB and FROWN. Since they are the same size and design as Brown and LOB, it is easy to compare the statistical properties of linguistic features across all four corpora.
Part-of-speech tagged versions of each corpus exist. In the versions available here, the original corpora (Brown and LOB) use different tagsets from the Freiburg corpora (FLOB and FROWN).
Geoffrey Leech recently gave a paper at the ICAME conference in Gothenburg in Sweden reporting on research he had carried out with Nick Smith, in which he compared frequencies of various lexical and syntactic features in all four of these corpora.
This exercise involves looking at modal and semi-modal verbs, which were among the categories studied by Leech, in all four corpora, in tagged and untagged versions.
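As a rough illustration of the kind of count involved, here is a minimal Python sketch that tallies modal verbs in untagged text and normalises the counts per million words. The list of modals, the simple tokeniser and the function name `modal_frequencies` are assumptions made for this example; they are not the categories or tools used by Leech and Smith, and the sample sentence is invented.

```python
import re
from collections import Counter

# Illustrative list of core modals (an assumption, not Leech's exact category set).
MODALS = ("will", "would", "can", "could", "may", "might", "shall", "should", "must")

def modal_frequencies(text):
    """Return (raw counts, counts per million words) for each modal in MODALS."""
    # Very simple tokeniser: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in MODALS)
    total = len(tokens)
    # Normalise so that texts or corpora of different sizes can be compared.
    per_million = {m: counts[m] * 1_000_000 / total for m in MODALS} if total else {}
    return counts, per_million

# Invented sample text, standing in for a corpus file.
sample = "You must go now. She may come later, and he may not. We shall see."
counts, per_million = modal_frequencies(sample)
for modal in MODALS:
    if counts[modal]:
        print(modal, counts[modal], round(per_million[modal]))
```

A search on word forms alone cannot tell the modal "may" from the month "May", or the modal "can" from the noun "can". That is one reason for repeating the comparison on the tagged versions.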
(If you want to get hold of these corpora to use after the course, you can get them all on the ICAME CD-ROM. The tagged Freiburg corpora have not yet been made officially available, but will be released on a future ICAME CD-ROM.)
You may be able to work out how the relevant words are tagged by looking at concordances of them. If you want help on identifying the relevant tags, here are some hints.