Practical session 3
Exercises
by Michaela Mahlberg
Create an index for all the texts in the Dickens Corpus with the following steps:
a. Open the Wordlist tool.
b. Click on the Choose texts now button (it may instead be labelled Change selection if you already have texts selected). If WordSmith already shows texts in the list, simply delete them.
c. Select the 23 files in the Dickens corpus folder. Click on one of the little arrows in the bar in the middle to transfer the files from the left window to the right window. Close by clicking on the box with the green tick.
d. Click on the Make/add an index button.
e. (If you have made an index before and not renamed it, WordSmith will ask whether you want to add to or back up the existing index. If this question comes up today, choose delete and create a new one.)
f. Wait while the index is built; this may take a couple of minutes.
g. When it is finished, the index file will have been saved, and a pop-up box will tell you where it has been saved. Remember this location.
h. In Windows Explorer, go to the folder where WordSmith saved the index files (possibly C:\Wsmith4\Wordslist\Index) and rename the two files main_index.types and main_index.tokens to Dickens.types and Dickens.tokens respectively.
You have now created and saved an index file from the Dickens corpus, which you will need for task 2.
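WordSmith's index files (.types and .tokens) use the program's own internal format. Conceptually, though, an index records each word type together with the places where its tokens occur, the kind of information from which cluster lists and concordances can be computed. A minimal Python sketch of that idea, assuming plain-text input and a simple hypothetical tokenizer (this is not WordSmith's file format or its tokenisation rules):

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lower-cases the text and keeps runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_index(texts):
    """Map each word type to a list of (text_name, token_position) pairs."""
    index = defaultdict(list)
    for name, text in texts.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].append((name, pos))
    return index


idx = build_index({"A.txt": "It was the best of times.",
                   "B.txt": "It was the worst of times."})
print(idx["was"])  # positions of "was" in each text
```

Because every token position is stored, a program can later look up any word and read off its neighbours without rescanning the raw files.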
Create a 5-word cluster list from your index file with the following steps:
a. On the Wordlist menu bar, go to File > Open and, in the index folder, choose your file Dickens.tokens.
b. You may see a pop-up warning about the way the data is displayed. Just click on OK.
c. On the Wordlist menu bar, go to Compute > Clusters. In the dialogue box, select the following settings:

d. Click on OK.
e. On the Wordlist menu bar, go to File > Save and save your list in the Wordlist folder (probably C:\Wsmith4\Wordlist, but see above) under the name D5clusters (the .lst suffix should be added automatically).
f. Analysing the results
Now look at the first 50 of your 5-word clusters in your Wordlist window. What questions does the information in the columns ‘Frequency’ and ‘Texts’ raise? If you look at the formal differences between the clusters, what would you want to find out?
Leave the cluster list open, as you need it for the next task.
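A 5-word cluster is any run of five consecutive words. The ‘Frequency’ column counts how often each cluster occurs in the whole corpus, and the ‘Texts’ column counts how many different texts it occurs in. A minimal Python sketch of that counting, assuming plain-text input and a hypothetical tokenizer. The minimum-frequency threshold `min_freq` is an assumed parameter, since the dialogue-box settings are not reproduced here. This sketch also lets clusters run across sentence punctuation, which WordSmith's settings may handle differently:

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lower-case word forms only, punctuation dropped.
    return re.findall(r"[a-z']+", text.lower())


def cluster_list(texts, n=5, min_freq=2):
    """Return (cluster, frequency, number_of_texts) rows, most frequent first."""
    freq = Counter()
    in_texts = defaultdict(set)
    for name, text in texts.items():
        tokens = tokenize(text)
        # Slide a window of n words over the text, one word at a time.
        for i in range(len(tokens) - n + 1):
            cluster = " ".join(tokens[i:i + n])
            freq[cluster] += 1
            in_texts[cluster].add(name)
    rows = [(c, f, len(in_texts[c])) for c, f in freq.items() if f >= min_freq]
    # Highest frequency first; ties broken alphabetically.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


rows = cluster_list({
    "A": "He sat with his back to the fire. He sat with his back to the wall.",
    "B": "She stood with his back to the door.",
})
for cluster, frequency, n_texts in rows:
    print(cluster, frequency, n_texts)
```

In this toy example the cluster "with his back to the" occurs three times across two texts, while "he sat with his back" occurs twice but only in one text. This is the kind of contrast between the two columns that the questions above ask you to think about.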
3. Concordancing clusters
Cluster lists can give us information on the frequencies of clusters and the number of texts in which they occur. However, to investigate the meanings and functions that these clusters fulfil in a text, we need to look at the clusters in their textual context, and concordances are a good starting point.
a. Highlight the cluster ‘with his back to the’ (no. 26). On the menu bar, go to Compute > Concordance, which will produce a concordance of the cluster.
b. Go to the menu bar again and choose Edit > Resort. On the Main sort tab, select R5. Describe the patterns that now become visible.
c. Try out different sort options on all three tabs.
d. Run a concordance for ‘leaning back in his chair’ (no. 34) and identify patterns again.
e. Are there any similarities between the two clusters?
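Sorting a concordance by a position such as R5 orders the lines alphabetically by the word found at that position, so that lines sharing the same context word end up next to each other. A minimal Python sketch of a key-word-in-context (KWIC) concordance with a positional sort key. The function names are hypothetical, and the sketch assumes R1 is the first word after the whole search phrase and L1 the last word before it. WordSmith may number positions differently for a multi-word search term, so R5 in WordSmith need not equal R5 here:

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: lower-case word forms only, punctuation dropped.
    return re.findall(r"[a-z']+", text.lower())


def concordance(text, node, span=5):
    """Return (left, node, right) token lists for every occurrence of node."""
    tokens = tokenize(text)
    node_tokens = node.lower().split()
    n = len(node_tokens)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node_tokens:
            left = tokens[max(0, i - span):i]
            right = tokens[i + n:i + n + span]
            lines.append((left, tokens[i:i + n], right))
    return lines


def sort_key(position):
    """Build a key such as "R5" (5th word right) or "L1" (1st word left)."""
    side, k = position[0], int(position[1:])

    def key(line):
        left, _, right = line
        if side == "R":
            return right[k - 1] if len(right) >= k else ""
        return left[-k] if len(left) >= k else ""

    return key


text = ("He sat with his back to the fire. "
        "She stood with his back to the door and wept.")
lines = concordance(text, "with his back to the", span=3)
lines.sort(key=sort_key("R1"))
for left, node, right in lines:
    print(" ".join(left), "|", " ".join(node).upper(), "|", " ".join(right))
```

Here sorting by R1 puts the ‘door’ line before the ‘fire’ line. Lines with too little context at the chosen position get an empty string as their key and therefore sort first.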
It is often useful to look at the distribution of linguistic features across the chapters of a novel. To split a novel into chapters, follow the steps below. The task is to split the file BH.txt.
a. The file BH.txt does not contain annotation indicating chapter boundaries, but it is possible to use the chapter headings to separate the file. First, however, you need to open the file BH.txt in a text editor (e.g. WordPad) and delete the title, table of contents and preface, so that your file starts with CHAPTER 1. Now save the file as BHsplit.txt so you don’t alter the original version of the file.
b. In the main Controller window (the one that says “Oxford Wordsmith Tools” across the top), choose Utilities > File Utilities from the menu bar, and choose the Splitter tab.
c. Enter CHAPTER in the text separator field. Choose a destination folder and the file to split (BHsplit.txt). Untick the option ‘bracket first line’. If you set the minimum required size to 100, you will avoid getting empty files (the right value will of course depend on the book you are looking at). Click split now.

d. Open a few of your resulting files. Do you notice anything about the format? Does the beginning of each chapter look as you want it to? At the end of the handout you will find some information on how to modify the process to get a different result; this, however, is too complex for a first attempt.
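The Splitter works on a simple principle: cut the file wherever the separator text occurs and discard pieces that are too small. A minimal Python sketch of that principle follows. It assumes the separator itself is removed from the pieces, which is consistent with the note at the end of the handout that keeping the headings needs an extra step. It also measures the minimum size in characters, which may not be the unit WordSmith uses. The function name is hypothetical:

```python
def split_on_separator(text, separator="CHAPTER", min_size=100):
    """Split text at each occurrence of separator (case-sensitive).

    The separator itself is dropped, surrounding whitespace is trimmed,
    and pieces shorter than min_size characters are discarded.
    """
    pieces = (p.strip() for p in text.split(separator))
    return [p for p in pieces if len(p) >= min_size]


book = "CHAPTER 1\nLondon. " + "x" * 120 + "\nCHAPTER 2\n" + "y" * 150
chapters = split_on_separator(book)
print(len(chapters))                 # number of chapter files
print(repr(chapters[0][:12]))        # how each piece begins
```

In this sketch each piece begins with whatever followed the word CHAPTER, for example the chapter number. The empty piece before the first heading is dropped by the size threshold, which is what the minimum size in step c is for.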
When you use corpora for the stylistic analysis of texts, it is important to be aware of the format of your texts and of the different ways of looking at them. Task 2 prepared the novel Bleak House for the analysis in Task 3.
a. Use the 67 chapter files that you produced in Task 2 (if you didn’t do Task 2, use the files we prepared for you in the folder BHsplit) and create a 5-word cluster list for them with the same settings as in task 1.
b. In the lecture, the functional groups labels (L), speech (S), body parts (BP), as if (as if), and time and place (TP) were introduced. Go through the cluster list and assign each cluster to a group by typing the respective abbreviation into the column Set. To activate the set function, right-click on the field in the bottom-left corner of your screen (saying “Type-in” in the picture) and choose Set.

c. When you look at a single novel, more clusters may belong to the group labels than you would expect from the surface form of a cluster alone. Look at some clusters in detail. For instance, run a concordance for the cluster AS WELL AS ANYTHING ELSE (see Task 1 for how to do that). To see more context, choose View > Grow repeatedly. If you want to focus on one example, double-click on a concordance line to open the full text.
Further information on task 2
To keep the chapter headings when splitting a novel, use the Text Converter (again under Utilities in the main menu). Choose the ‘Conversion’ tab and then ‘Within files’ in the middle of the same window, and convert
CHAPTER
to
CHAPTER{CHR(13)}{CHR(10)}{CHR(13)}{CHR(10)}CHAPTER
Use the resulting file to perform the same steps as in Task 2.
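In the replacement string, {CHR(13)} and {CHR(10)} stand for character codes 13 and 10. These are a carriage return and a line feed, which together make a Windows line break. The conversion therefore turns each CHAPTER into a line with CHAPTER, a blank line, and a second CHAPTER. A Python sketch of the same find-and-replace follows. It only shows what the converted text looks like and does not model the Text Converter or the Splitter themselves. The function name is hypothetical:

```python
def mark_chapter_headings(text, heading="CHAPTER"):
    # "\r\n" is CHR(13) followed by CHR(10); two of them give a line break
    # plus an empty line between the two copies of the heading word.
    return text.replace(heading, heading + "\r\n\r\n" + heading)


converted = mark_chapter_headings("CHAPTER I\nIn Chancery")
print(repr(converted))
```

Like any plain find-and-replace, this also matches CHAPTER inside longer words such as CHAPTERS, so it is worth checking the novel for such forms before converting.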