Course Info
Workshop organisers:
Ylva Berglund, Oxford Text Archive
ylva.berglund@oucs.ox.ac.uk
Pernilla Danielsson, Centre for Corpus Linguistics, Birmingham University
pernilla@ccl.bham.ac.uk
Martin Wynne, Oxford Text Archive
martin.wynne@oucs.ox.ac.uk
Programme
| Time | Session |
|-------|---------|
| 9:00 | Arrival, registration, introduction |
| 9:15 | Warm-up exercises |
| 10:00 | Talk: Corpus Resources for Language Learning |
| 10:30 | Coffee break |
| 11:00 | Hands-on session |
| 12:00 | Feedback from exercises and discussion |
| 12:30 | Lunch |
| 14:00 | Parallel Corpora |
| 15:30 | Tea break |
| 16:00 | Parallel Corpora |
| 17:00 | End of seminar |
Missing keyword
What's the missing word in the concordance
files on the handout sheet? Try not to shout out the answer while other
people are still thinking about it!
How did you work out what the word was? Were
there some lines which told you that there could only be one answer? Why?
What different parts of speech does the missing word have in each line?
You can play this game on the COBUILD
website, with a chance to win a CD dictionary every day (http://titania.cobuild.collins.co.uk/comp_entry.html).
Some concordancers, such as Monoconc and Wordsmith, can hide the keyword for
you, although saving and printing the concordances is not easy! If you want to
use this sort of exercise in teaching, you may have to save the concordances,
choose the lines you want and then do some formatting to make them look right
on the screen or on a printed sheet.
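If you would rather prepare such a sheet yourself, the save-select-format steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how Monoconc or Wordsmith work: the function name `blanked_concordance`, the `width` and `blank` parameters and the sample sentences are all made up for the example.

```python
import re


def blanked_concordance(text, keyword, width=30, blank="_____"):
    """Return keyword-in-context lines with the keyword replaced by `blank`.

    Hypothetical helper for preparing a "missing keyword" handout.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # Collapse line breaks and repeated spaces so each hit fits on one row.
    flat = " ".join(text.split())
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Hide any other full occurrence of the keyword inside the context.
        left = pattern.sub(blank, left)[-width:]
        right = pattern.sub(blank, right)[:width]
        # Right-align the left context so the blanks line up in one column.
        lines.append(f"{left:>{width}}{blank}{right}")
    return lines


sample = (
    "Can I borrow your pen? She had to borrow money from the bank. "
    "Never borrow what you cannot return."
)
for line in blanked_concordance(sample, "borrow", width=20):
    print(line)
```

You can then delete the lines you do not want before printing. Note that a context window may still cut through a neighbouring occurrence of the keyword and show a fragment of it, so check the output by eye.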
A linguist from Mars
If you can speak Polish, ask for the
special sheet!
Now look at the concordance from a Polish
corpus on the handout.
What do you think the keyword means? What
clues do you get from the concordance lines? What other knowledge do you need
to bring to bear in order to form a hypothesis?
The Polish word was chosen so that you
might have some clues even if you don't know any Polish. The next one is
Irish, and was chosen more or less at random. Can you find out anything about
the word? It's probably impossible to guess the meaning.
Hands-on session
- borrow
- enormous
- minitasks
- visa
- diachronic comparisons
Exercises 1 and 4 above do not require the
use of the online corpora or Wordsmith, as small example concordance files
are provided in the exercises. If you wish, however, you can explore these
exercises further by looking at the corpora listed below.
Further exercises
Take a look at the
different formats of the various corpora on the CDs using Wordpad, or some
other text viewer. What are the advantages and disadvantages of the different
formats? What sorts of information will it be possible or difficult to
extract from them?
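To make the question about formats concrete, here is a small sketch comparing the same sentence as plain text and in a `word_TAG` annotated form. The sentence, the tag names and both helper functions are invented for illustration; the corpora on the CDs may use quite different formats and tagsets.

```python
from collections import Counter

# The same sentence in two hypothetical formats.
plain = "They work hard but the work is dull"
tagged = "They_PRON work_VERB hard_ADV but_CONJ the_DET work_NOUN is_VERB dull_ADJ"


def word_counts(text):
    """Count word forms in whitespace-separated plain text."""
    return Counter(token.lower() for token in text.split())


def word_tag_counts(text):
    """Count (word, tag) pairs in text where each token looks like word_TAG."""
    # rsplit on the last underscore so words containing "_" stay intact.
    pairs = (token.rsplit("_", 1) for token in text.split())
    return Counter((word.lower(), tag) for word, tag in pairs)


# Plain text can only tell us that "work" occurs twice.
print(word_counts(plain)["work"])
# The tagged version separates the noun from the verb.
print(word_tag_counts(tagged)[("work", "NOUN")])
print(word_tag_counts(tagged)[("work", "VERB")])
```

The plain version is easier to read and to search, but only the annotated version lets you ask about parts of speech without doing the analysis yourself.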
Take a look at some of your
own data if you have any with you. What decisions do you have to make about
the format and design of your corpus?
Using the different
software tools, try to extract some information from the corpora listed
above. This way you can really find out the advantages and disadvantages of
the different types of corpus.
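When you compare what you extract from corpora of different sizes, raw counts can mislead, so it is common to normalise them, for example to occurrences per million words. The sketch below assumes you already have a raw count and a corpus size from your software tool; the function name `per_million` and the figures are made up.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


# Hypothetical figures: the same word in a smaller and a larger corpus.
smaller = per_million(120, 1_000_000)
larger = per_million(300, 5_000_000)
print(smaller, larger)
```

With these invented numbers, the larger corpus has more hits in total, but the word is relatively less frequent there once corpus size is taken into account.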
Bibliography
- Guy Aston and Lou Burnard. 1998. The BNC Handbook: Exploring
the British National Corpus with SARA. Edinburgh University Press,
Edinburgh.
- Sue Atkins, Jeremy Clear and Nicholas Ostler. 1992. ‘Corpus
Design Criteria’ in Literary and Linguistic Computing 7(1): 1-16.
- Douglas Biber, Susan Conrad, and Randi Reppen. 1998. Corpus
Linguistics: Investigating Language Structure and Use. Cambridge
University Press, Cambridge, UK.
- Geoff Barnbrook. 1996. Language and Computers. Edinburgh
University Press, Edinburgh.
- Frances Condron, Michael Fraser and Stuart Sutherland. 2000. Guide
to Digital Resources in the Humanities. CTI Textual Studies, Oxford.
- Nelson Francis and Henry Kučera. 1964. Manual of
Information to Accompany a Standard Corpus of Present-Day Edited
American English, for Use with Digital Computers. Department of
Linguistics, Brown University, Providence, Rhode Island.
- Roger Garside, Geoffrey Leech and Anthony McEnery (eds.). 1997. Corpus
Annotation. London: Longman.
- Mohsen Ghadessy, Alex Henry and Robert L. Roseberry (eds.). 2001.
Small Corpus Studies and ELT: Theory and Practice. John Benjamins,
Amsterdam.
- Stig Johansson and Geoffrey Leech. 1986. Manual of Information
to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for
Use with Digital Computers. Department of English, University of
Oslo.
- Graeme Kennedy. 1998. An Introduction to Corpus Linguistics.
Longman, London.
- Anthony McEnery and Andrew Wilson. 2001. Corpus Linguistics
(2nd edition). Edinburgh University Press, Edinburgh.
- Alan Morrison, Michael Popham and Karen Wikander, Creating and
Documenting Electronic Texts, Oxbow Books, Oxford and freely
available online at http://www.ota.ox.ac.uk/documents/creating/cdet/index.html.
- Oakes, M. P. 1998. Statistics for corpus linguistics. Edinburgh,
Edinburgh University Press.
- John Sinclair. 1991. Corpus, Concordance, Collocation.
Oxford University Press, Oxford. Currently out of print, but PDFs of the
Introduction and Chapters 1, 2 and 6 are available.
- John Sinclair (ed.). 1987. Looking Up. HarperCollins,
London.
- C. M. Sperberg-McQueen and Lou Burnard (eds.). 1999. Guidelines
for Electronic Text Encoding and Interchange. TEI P3 Text Encoding
Initiative. Revised reprint: Oxford, May 1999
(http://www.hcu.ox.ac.uk/TEI/Guidelines/index.htm).
- Stubbs, M. 1995. "Collocations and semantic profiles: On the cause
of the trouble with quantitative studies." Functions of Language
2(1): 23-55. Available free
online.
- Svartvik, J. 1992. Directions in corpus linguistics:
proceedings of Nobel Symposium 82, Stockholm, 4-8 August 1991. Mouton
de Gruyter, Berlin.
- Martin Wynne, Mick Short and Elena Semino. 1998. 'A corpus-based
investigation of speech, thought and writing presentation in English
narrative texts' in Antoinette Renouf (ed), Explorations in Corpus
Linguistics. Rodopi, Amsterdam.
- Antonio Zampolli and Nicholas Ostler (eds.).
1993. 'Special Section on Corpora', Literary and Linguistic Computing
8(4).
References and Links
Make Humbul (http://www.humbul.ac.uk) your starting point for looking for
online resources. All of the links below come from the Humbul catalogue. For
more on corpus linguistics, including a bibliography, you can go to http://www.ota.ox.ac.uk/documents/creating/dlc/.
Software
Concordance
This program is available from http://www.rjcw.freeserve.co.uk/.
Archives
Online Corpora
More information and links: