Practical session 3

Exercises by Michaela Mahlberg, University of Liverpool

Task 1: Cluster lists

1. Create an index for the Dickens corpus

 

Create an index for all the texts in the Dickens Corpus with the following steps: 

 

a.    Open the Wordlist tool

b.     Click on the Choose texts now button (it may be labelled Change selection if you already have texts selected). If WordSmith already shows texts in the list, simply delete them.

c.     Select the 23 files in the Dickens corpus folder. Click on one of the little arrows in the bar in the middle to transfer the files from the left window to the right window. Close by clicking on the box with the green tick.

d.    Click on the Make/add an index button

e.    (If you have created an index before and not renamed it, WordSmith will ask whether you want to add to or back up an existing index. If this question comes up today, choose delete and create a new one.)

f.      Wait while the index is built – this may take a couple of minutes.

g.    When it is finished, the index file will have been saved, and you will get a pop-up box telling you where it has been saved – remember this location.

h.    In Windows Explorer, go to the folder where the WordSmith files are saved (possibly C:\Wsmith4\Wordlist\Index) and rename the two files main_index.types and main_index.tokens to Dickens.types and Dickens.tokens respectively.

 

You have now created and saved an index file from the Dickens corpus, which you will need for part 2 of this task.

   

 

 

2. Create a cluster list for the Dickens corpus  

Create a 5-word cluster list from your index file with the following steps:

 

a.    On the Wordlist menu bar, go to File > Open, and in the Index folder choose your file Dickens.tokens

b.    You may see a pop-up warning about the way the data is displayed. Just click on OK.

c.     On the Wordlist menu bar, go to Compute > Clusters. In the dialogue box select the following settings: 

                       

 

d.    Click on OK

e.    On the Wordlist menu bar, go to File > Save, and save your list in the Wordlist folder (probably C:\Wsmith4\Wordlist, but see above), under the name D5clusters (the .lst suffix should be added automatically)

f.      Analysing the results

Now look at the first 50 of your 5-word clusters in the index in your Wordlist window. What questions does the information in the columns ‘Frequency’ and ‘Texts’ raise? If you look at the formal differences of the clusters, what would you want to find out?

 

Leave the cluster list open, as you need it for the next task.
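
For readers who want to see what sits behind the ‘Frequency’ and ‘Texts’ columns, the counting can be sketched in Python. This is only an illustration under stated assumptions, not WordSmith's own procedure: the tokenizer, the function names and the toy file names are invented for the example, and clusters here are allowed to run across sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def cluster_list(texts, n=5, min_freq=1):
    """Return (cluster, frequency, number of texts) rows, most frequent first.

    `texts` maps a (hypothetical) file name to that file's contents.
    """
    freq = Counter()    # total occurrences of each cluster
    spread = Counter()  # number of texts in which each cluster occurs
    for text in texts.values():
        tokens = tokenize(text)
        clusters = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(clusters)
        spread.update(set(clusters))  # count each text once per cluster
    rows = [(c, f, spread[c]) for c, f in freq.items() if f >= min_freq]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Toy data standing in for the corpus files (assumed, not from the Dickens corpus).
sample = {
    "a.txt": "He stood with his back to the fire. He sat with his back to the wall.",
    "b.txt": "She waited with his back to the door.",
}
rows = cluster_list(sample, n=5, min_freq=2)
for row in rows:
    print(row)
```

Keeping two counters separates how often a cluster occurs from how widely it is spread, which is the distinction the questions in step f ask you to think about.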

 

 

 

 

 

3. Concordancing clusters   

 

 

Cluster lists can give us information on the frequencies of clusters and the number of texts in which they occur. However, to investigate the meanings and functions that these clusters fulfil in a text, we need to look at the clusters in their textual context. Concordances are a good starting point.

 

a.    Highlight the cluster with his back to the (no. 26). On the menu bar go to Compute > Concordance, which will produce a concordance of the cluster.

b.    Go to the menu bar again and choose Edit > Resort. For the tab Main sort select R5. Describe the patterns that now become visible.

c.     Try out different sort options for all 3 tabs.

d.    Run a concordance for leaning back in his chair (no. 34) and identify patterns again.

e.    Are there any similarities between the two clusters?
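
The idea of a sorted concordance can also be sketched in Python. The sketch below is an illustration only: the function names and the toy text are assumptions, and the whole cluster is treated as the node, so `position=1` means the first word after the cluster. How this maps onto WordSmith's own R-numbering is not modelled here.

```python
import re


def concordance(text, cluster, width=5):
    """Return (left context, node, right context) token lists for each hit of `cluster`."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]
    node = cluster.lower().split()
    lines = []
    for i in range(len(tokens) - len(node) + 1):
        if lowered[i:i + len(node)] == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + len(node):i + len(node) + width]
            lines.append((left, tokens[i:i + len(node)], right))
    return lines


def sort_right(lines, position=1):
    """Sort concordance lines by the word `position` places to the right of the node."""
    def key(line):
        right = line[2]
        return right[position - 1].lower() if len(right) >= position else ""
    return sorted(lines, key=key)


# Toy text (assumed for illustration).
text = ("He stood with his back to the fire. She sat with his back to the door. "
        "They waited with his back to the fire again.")
sorted_lines = sort_right(concordance(text, "with his back to the"), position=1)
for left, node, right in sorted_lines:
    print(" ".join(left).rjust(30), "|", " ".join(node), "|", " ".join(right))
```

Sorting on the word that follows the cluster groups lines with the same continuation together, which is what makes repeated patterns easy to spot.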

 

 

Task 2:  Splitting a book into chapters

 

It is often useful to look at distributions of linguistic features across the chapters of a novel. To split a novel into chapters, follow the steps below. The task is to split the file BH.txt.

 

a.    The file BH.txt does not contain annotation indicating chapter boundaries, but it is possible to use the chapter headings to separate the file. First, however, you need to open the file BH.txt in a text editor (e.g. Wordpad) and delete the title, table of contents and preface, so that your file starts with CHAPTER 1. Now save the file as BHsplit.txt so you don’t alter the original version of the file.

b.    In the main Controller Window (the one that says “Oxford Wordsmith Tools” across the top), choose from the menu bar Utilities > File Utilities, and choose the Splitter tab.

c.     Enter CHAPTER into the field for text separator. Choose a destination folder and the file to split (BHsplit.txt). Untick the option ‘bracket first line’. Setting the required sizes min. to 100 prevents empty files from being created (the right value will of course depend on the book you are looking at). Click split now.

 

 

 

d.    Open a few of your resulting files. Do you notice anything about the format? Does the beginning of the chapter look as you want it to? At the end of the handout you will find some information on how to modify the process so that you can get a different result; this, however, is too complex to begin with.

 

When you use corpora for the stylistic analysis of texts, it is important that you are aware of the format of your texts and of the different ways of looking at them. Task 2 prepared the novel Bleak House for the analysis in Task 3.
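
The splitting step in Task 2 can be pictured with a short Python sketch. It is an illustration under assumptions and does not reproduce the WordSmith Splitter: the function name is invented, the minimum size is measured in characters, each piece keeps its heading line, and anything before the first separator is discarded.

```python
import re


def split_chapters(text, separator="CHAPTER", min_size=100):
    """Cut `text` at every line that starts with `separator`.

    Pieces shorter than `min_size` characters are dropped, which filters out
    empty or near-empty pieces.
    """
    starts = [m.start() for m in
              re.finditer(rf"^{re.escape(separator)}", text, flags=re.MULTILINE)]
    bounds = starts + [len(text)]
    pieces = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    return [piece for piece in pieces if len(piece) >= min_size]


# Toy input (assumed): two full chapters and one stub that is too short to keep.
sample = ("CHAPTER I\n" + "a" * 120 + "\n"
          "CHAPTER II\nshort\n"
          "CHAPTER III\n" + "b" * 150 + "\n")
chapters = split_chapters(sample, min_size=100)
print(len(chapters), "chapter pieces kept")
```

Matching the separator only at the start of a line is a design choice made for this sketch, so that the word CHAPTER inside running text would not trigger a split.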

 

 

 

 

 

Task 3: Focusing on a single novel

 

a.    Use the 67 chapter files that you produced in Task 2 (if you didn’t do Task 2, use the files we prepared for you in the folder BHsplit) and create a 5-word cluster list for them, with the same settings as in Task 1.

b.    In the lecture, the following functional groups were introduced: labels (L), speech (S), body parts (BP), as if (as if), and time and place (TP). Go through the cluster list and assign each cluster to a group by putting the respective abbreviation into the column Set. To activate the Set function, right-click on the field in the bottom-left corner of your screen (saying “Type-in” in the picture) and choose Set.

 

 

 

c.     When you look at a single novel, more clusters may go into the group label than you would expect by just looking at the surface of a cluster. Look at some clusters in detail. For instance, run a concordance for the cluster AS WELL AS ANYTHING ELSE (see Task 1 for how to do that). To see more context go to View > Grow repeatedly. If you want to focus on one example, double-click on a concordance line and you will get into the full text.

 

 

Further information on task 2

 

To keep the chapter headings when splitting a novel, use the Text Converter (again under Utilities in the main menu). Choose the tab ‘Conversion’ and then, in the middle of the same window, ‘Within files’, and convert 

 

CHAPTER  

to

CHAPTER{CHR(13)}{CHR(10)}{CHR(13)}{CHR(10)}CHAPTER

 

Use the resulting file to perform the same steps as in Task 2.
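
The effect of this conversion can be sketched in Python, where CHR(13) and CHR(10) correspond to the carriage return `\r` and the line feed `\n`. The function name is an assumption made for the example. The explanation of why the heading then survives (the splitter uses up the first copy of the separator and leaves the second in the chapter file) is an interpretation rather than something the handout states.

```python
def duplicate_separator(text, separator="CHAPTER"):
    """Replace each separator with separator + two CR/LF line breaks + separator."""
    line_break = chr(13) + chr(10)  # CHR(13) and CHR(10), i.e. "\r\n"
    return text.replace(separator, separator + line_break * 2 + separator)


# Toy input (assumed for illustration).
converted = duplicate_separator("CHAPTER I\r\nIn Chancery\r\n")
print(repr(converted))
```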