Oxford Text Archive
Corpus Linguistics Resources

Eurocall 2003 Workshop

http://www.ota.ox.ac.uk/

Wednesday 1st September 2003

Course Info

Workshop organisers:

Ylva Berglund, Oxford Text Archive

ylva.berglund@oucs.ox.ac.uk

Pernilla Danielsson, Centre for Corpus Linguistics, Birmingham University

pernilla@ccl.bham.ac.uk

Martin Wynne, Oxford Text Archive

martin.wynne@oucs.ox.ac.uk

Programme

9:00

Arrival, registration, introduction

9:15

Warm-up exercises

10:00

Talk: Corpus Resources for Language Learning

10:30

Coffee break

11:00

Hands-on session

12:00

Feedback from exercises and discussion

12:30

Lunch

14:00

Parallel Corpora

15:30

Tea break

16:00

Parallel Corpora

17:00

End of seminar

Warm-up exercises

Missing keyword

What's the missing word in the concordance files on the handout sheet? Try not to shout out the answer while other people are still thinking about it!

How did you work out what the word was? Were there some lines which told you that there could only be one answer? Why? What different parts of speech does the missing word have in each line?

You can play this game on the COBUILD website, with a chance to win a CD dictionary every day (http://titania.cobuild.collins.co.uk/comp_entry.html). Some concordancers, such as MonoConc and WordSmith, can produce this kind of gapped concordance, although saving and printing the results is not easy. If you want to use this sort of exercise in teaching, you may have to save the concordances, choose the lines you want, and then format them so that they look right on screen or on a printed sheet.
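If no concordancer is to hand, a gapped concordance can also be produced with a few lines of code. The following is only a minimal sketch in Python: the function name `gapped_concordance`, the context width and the sample sentences are all invented for illustration and are not taken from the workshop materials.

```python
import re

def gapped_concordance(text, keyword, width=30, gap="_____"):
    """Return keyword-in-context lines with the keyword replaced by a gap."""
    # Collapse line breaks and runs of spaces so each context reads as one line.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context and pad the right so the gaps line up.
        lines.append(f"{left:>{width}}{gap}{right:<{width}}")
    return lines

# Invented sample text (an assumption, not real corpus data).
sample = (
    "You can borrow books from the library. "
    "Did she borrow the car again? "
    "Many languages borrow words from English."
)

for line in gapped_concordance(sample, "borrow"):
    print(line)
```

Because every line has the same length, the output keeps its alignment when pasted into a document in a fixed-width font. Note that this sketch matches only the exact word form given, so inflected forms such as "borrowed" would need a broader pattern.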

A linguist from Mars

If you can speak Polish, ask for the special sheet!

Now look at the concordance from a Polish corpus on the handout.

What do you think the keyword means? What clues do you get from the concordance lines? What other knowledge do you need to bring to bear in order to form a hypothesis?

The Polish word was chosen so that you might have some clues even if you don't know any Polish. The next one is Irish, and was chosen more or less at random. Can you find out anything about the word? It's probably impossible to guess the meaning.

Hands-on session

  1. borrow
  2. enormous
  3. minitasks
  4. visa
  5. diachronic comparisons

Exercises 1 and 4 above do not require the online corpora or WordSmith, as small example concordance files are provided in the exercises. If you wish, however, you can explore these exercises further by looking at the corpora listed below.

Further exercises

Take a look at the different formats of the various corpora on the CDs using Wordpad, or some other text viewer. What are the advantages and disadvantages of the different formats? What sorts of information will it be possible or difficult to extract from them?
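As one way of thinking about the question of formats, the sketch below compares two invented renderings of the same sentence: plain text, and a "word_TAG" format with tags loosely modelled on Brown-style part-of-speech tags. Both sample lines and the helper `parse_tagged` are assumptions made for illustration; the corpora on the CDs may well be encoded differently.

```python
from collections import Counter

# Two invented renderings of the same sentence (not taken from the workshop CDs).
plain = "The visa was refused ."
tagged = "The_AT visa_NN was_BEDZ refused_VBN ._."

def parse_tagged(line):
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the last underscore, so words containing '_' survive.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

pairs = parse_tagged(tagged)
tag_counts = Counter(tag for _, tag in pairs)
nouns = [word for word, tag in pairs if tag.startswith("NN")]

print(pairs)       # word and tag for every token
print(tag_counts)  # how often each tag occurs
print(nouns)       # words tagged as nouns

# The plain version supports word counts but carries no part-of-speech information.
print(Counter(plain.lower().split()))
```

The tagged version makes a query such as "all nouns" trivial, at the cost of a file that is harder to read by eye; the plain version is the reverse.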

Take a look at some of your own data if you have any with you. What decisions do you have to make about the format and design of your corpus?

Using the different software tools, try to extract some information from the corpora listed above. This way you can really find out the advantages and disadvantages of the different types of corpus.

Bibliography

  • Guy Aston and Lou Burnard. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh University Press, Edinburgh.
  • Sue Atkins, Jeremy Clear and Nicholas Ostler. 1992. ‘Corpus Design Criteria’ in Literary and Linguistic Computing 7(1): 1-16.
  • Douglas Biber, Susan Conrad, and Randi Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press, Cambridge, UK.
  • Geoff Barnbrook. 1996. Language and Computers. Edinburgh University Press, Edinburgh.
  • Francis Condron, Michael Fraser and Stuart Sutherland. 2000. Guide to Digital Resources in the Humanities. CTI Textual Studies, Oxford.
  • Nelson Francis and Henry Kučera. 1964. Manual of Information to accompany A Standard Corpus of Present-Day Edited American English, for use with digital computers. Department of Linguistics, Brown University, Providence, Rhode Island.
  • Roger Garside, Geoffrey Leech and Anthony McEnery (eds.). 1997. Corpus Annotation. London: Longman.
  • Mohsen Ghadessy, Alex Henry and Robert L. Roseberry (eds.). 2001. Small Corpus Studies and ELT: Theory and practice. John Benjamins, Amsterdam.
  • Stig Johansson and Geoffrey Leech. 1986. Manual of Information to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for use with digital computers. Department of English, University of Oslo.
  • Graeme Kennedy. 1998. An Introduction to Corpus Linguistics. Longman, London.
  • Anthony McEnery and Andrew Wilson. 2001. Corpus Linguistics (2nd edition). Edinburgh University Press, Edinburgh.
  • Alan Morrison, Michael Popham and Karen Wikander. Creating and Documenting Electronic Texts. Oxbow Books, Oxford. Freely available online (http://www.ota.ox.ac.uk/documents/creating/cdet/index.html).
  • Michael P. Oakes. 1998. Statistics for Corpus Linguistics. Edinburgh University Press, Edinburgh.
  • John Sinclair. 1991. Corpus, Concordance, Collocation. Oxford University Press, Oxford. Currently out of print, but PDFs of the Introduction and Chapters 1, 2 and 6 are available online.
  • John Sinclair (ed.). 1987. Looking Up. HarperCollins, London.
  • C. M. Sperberg-McQueen and Lou Burnard, (eds.). 1999. Guidelines for Electronic Text Encoding and Interchange. TEI P3 Text Encoding Initiative. Revised reprint: Oxford May 1999 (http://www.hcu.ox.ac.uk/TEI/Guidelines/index.htm).
  • Michael Stubbs. 1995. ‘Collocations and semantic profiles: On the cause of the trouble with quantitative studies’ in Functions of Language 2(1): 23-55. Available free online.
  • Jan Svartvik (ed.). 1992. Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4-8 August 1991. Mouton de Gruyter, Berlin.
  • Martin Wynne, Mick Short and Elena Semino. 1998. 'A corpus-based investigation of speech, thought and writing presentation in English narrative texts' in Antoinette Renouf (ed), Explorations in Corpus Linguistics. Rodopi, Amsterdam.
  • Antonio Zampolli and Nicholas Ostler (eds.). 1993. 'Special Section on Corpora', Literary and Linguistic Computing 8(4).

References and Links

Make Humbul (http://www.humbul.ac.uk) your starting point for looking for online resources. All of the links below come from the Humbul catalogue. For more on corpus linguistics, including a bibliography, you can go to http://www.ota.ox.ac.uk/documents/creating/dlc/.

Software

Concordance

This program is available from http://www.rjcw.freeserve.co.uk/.

Archives

Online Corpora

More information and links: