Three Linguistic Uses of Statistical NLP
Chris Brew
Linguistics
The Ohio State University
November 2000
Plan of the talk
Introduction
Three applications
Verb-classes
Fluency for NLG
Aphasia and swearing
Conclusions
Why language?
Machine learning is flavour of the decade
in computational linguistics.
But machine learning also does games,
robotics, medicine, scene interpretation,
motion tracking, credit ratings …
So why do language?
Which technique?
Very diverse tasks, many challenges.
Area         | Topic                                | Techniques
Intonation   | Classify contours                    | Continuous HMM
Tokenization | Guess tokenisation                   | Error-driven learning
POS Tagging  | Stochastic CUF                       | Discrete HMM, logic programming
Syntax       | Stochastic HPSG                      | EM algorithm
Verb classes | Classify verb occurrences            | Graphical models
Style        | Evaluate translation quality         | Multidimensional scaling, clustering
Translation  | Find and classify translation pairs  | Contingency table measures (G score, mutual information)
What’s in it for human sciences?
If you have a clear hypothesis, you can
run machine learning experiments to test
it. Cheaper than psycholinguistics.
It is possible to systematically explore
large classes of theories, even ones too
costly to code up by hand.
Not necessarily explanatory
Learning Levin’s verb classes
Verbs are central to most linguistic theories, and needed for all applications.
Beth Levin “English Verb Classes and
Alternations”.
Systematic and theory neutral account of
verbs and their behaviour.
Coverage necessarily incomplete.
Levin’s hypothesis
Verbs with similar semantics show similar alternations. Load, rub and plaster are like spray (SPRAY/LOAD verbs). Make, build and knit pattern with carve (BUILD verbs). Levin uses this as a basis for 200-odd classes.

Jessica sprayed paint on the wall.
Jessica sprayed the wall with paint.
Jessica sprayed water at the baby.
*Jessica sprayed me water.

Martha carved the baby a toy.
Martha carved a toy for the baby.
*Martha carved a toy at the baby.
Ambiguity
784 of Levin’s 3,024 verbs are class ambiguous. Ambiguity correlates with high frequency. Verbs can be class ambiguous even after the syntactic frame is known.

Remember to write your aunt a thank-you letter. (MESSAGE_TRANSFER)
Our lawyer will write you a Green Card application. (PERFORMANCE)
The attendant will call you a cab. (GET)
The prosecution will call you a liar. (DUB)
Lapata and Brew (1999)
Statistical model of verb class ambiguity.
Task: infer class for ambiguous cases.
Goal: investigate and test Levin’s
hypothesis.
Goal: infer class for verbs omitted from
Levin’s list.
The General Approach
Assume a stochastic process generating class, frame and verb, and express this process as a causal model (Bayes net).
Find reasonable estimates of the conditional probabilities which parameterize the network.
Find the class which maximizes P(class | frame, verb).
Problem
We have millions of words of POS-tagged English in the British National Corpus, but we don’t know frames or classes.
We certainly can’t afford to generate
complete training data.
If we had a really good broad-coverage
parser, we would have frames.
We would still need classes.
PP attachment
A similar problem, whose solution we adapt:
[S [NP I] [VP [V washed] [NP [DET the] [NN shirt]] [PP [P with] [NP soap]]]]
[S [NP I] [VP [V bought] [NP [NP [DET the] [NN shirt]] [PP [P with] [NP pockets]]]]]
Hindle and Rooth
Wanted to obtain (automatically) lexical
information for use in deciding attachment.
Key idea: we may not have a perfect parser, but if we have a reasonable parser, we can use its (statistically filtered) output to make reasonable decisions.
You need a lot of text, but it doesn’t have to be
marked up.
Discovering Lexical Association in text
Church’s part of speech analyser.
Hindle’s Fidditch partial parser.
13 million words of AP news wire.
Classified probable tokens of
attachments, compared likelihoods of
different attachments. Iterative process.
Ratnaparkhi (Coling 98)
MaxEnt tagger, simple chunker. Heuristics based on POS sequence. Slight drop in accuracy; portable to Spanish.
Extract (v, p, n2) if (sketched below):
p is a real preposition (not “of”)
v is the first verb that occurs < K words left of p
v is not a form of the verb “to be”
no noun occurs between v and p
n2 is the first noun < K words right of p
no verb occurs between p and n2
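For illustration, a minimal sketch of this extraction heuristic, assuming Penn-style POS tags on the input (our reconstruction, not Ratnaparkhi’s code):

```python
# A sketch of the unsupervised (v, p, n2) extraction heuristic, assuming
# Penn-style POS tags; a reconstruction, not Ratnaparkhi's code.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def extract_triples(tagged, K=10):
    """Yield (v, p, n2) triples from a list of (word, tag) pairs."""
    for i, (word, tag) in enumerate(tagged):
        if tag != "IN" or word.lower() == "of":        # p: a real preposition, not "of"
            continue
        v = None
        for j in range(i - 1, max(i - 1 - K, -1), -1): # scan < K words left of p
            w, t = tagged[j]
            if t.startswith("NN"):                     # a noun between v and p: reject
                break
            if t.startswith("VB"):                     # first verb left of p ...
                if w.lower() not in BE_FORMS:          # ... which must not be "to be"
                    v = w.lower()
                break
        if v is None:
            continue
        n2 = None
        for j in range(i + 1, min(i + 1 + K, len(tagged))):  # scan < K words right of p
            w, t = tagged[j]
            if t.startswith("VB"):                     # a verb between p and n2: reject
                break
            if t.startswith("NN"):                     # n2: first noun right of p
                n2 = w.lower()
                break
        if n2 is not None:
            yield (v, word.lower(), n2)

# e.g. [("He","PRP"), ("went","VBD"), ("with","IN"), ("her","PRP"),
#       ("friend","NN")]  yields  ("went", "with", "friend")
```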
Verb frames from the BNC
Wrote simple grammars for V NP NP, V NP PP_for and V NP PP_to.
Filtered to remove noise (compound nouns in particular), obtaining a joint frequency distribution of frame and verb.
A causal model
P(class | verb, frame) ≈ P(class | frame)
P(verb, frame, class) = P(verb) · P(frame | verb) · P(class | verb, frame)

(Bayes net: verb → frame → class.)
The causal model
We have P(verb), P(frame), P(frame | verb), but need P(class), P(frame | class).
The first approach is to approximate (this works):

P(verb, frame, class) ≈ P(verb) · P(class) · P(frame | verb) · P(frame | class) / P(frame)

The second approach is to use EM to iteratively estimate P(class), P(frame | class). (?)
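With verb and frame fixed, the approximation reduces the decision to maximizing P(class) · P(frame | class) over the classes Levin lists for the verb. A minimal sketch with hypothetical inputs (not the original implementation):

```python
# Decision rule under the approximation above: argmax over class of
# P(class) * P(frame | class); the inputs here are hypothetical toys.
def best_class(verb, frame, classes_of, p_class, p_frame_class):
    candidates = classes_of[verb]        # the classes Levin lists for this verb
    return max(candidates,
               key=lambda c: p_class[c] * p_frame_class[c].get(frame, 0.0))

# P(frame | class) is taken to be uniform over the frames Levin lists for
# the class (next slide), e.g. for GIVE, which lists exactly two frames:
p_frame_class = {"GIVE": {"NP-V-NP-PP_to": 0.5, "NP-V-NP-NP": 0.5}}
```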
P(frame|class)
For each class, we counted the syntactic frames listed in Levin, using a fairly coarse and easy classification of frames. For GIVE there were NP-V-NP-PP_to and NP-V-NP-NP only; there are 6 frames for PERFORMANCE. We assumed a uniform distribution:

P(NP-V-NP-NP | GIVE) estimated as 1/2.
P(NP-V | PERFORMANCE) estimated as 1/6.
P(class)
P(class) = Σ_verb P(verb, class)
P(class) = Σ_verb P(verb) · P(class | verb)
P(class) ≈ Σ_verb P(verb) · P(class | amb_class(verb))

Ambiguity class: a set of verbs which show the same pattern of ambiguity; for example, all the verbs which can be of either of the classes MESSAGE_TRANSFER or PERFORMANCE, but no other. Ambiguity classes reduce sparse data problems. We still need a principled way of estimating P(class | ambiguity class).
P(class|amb_class)
Key idea: use class size measured on
verb types to stand in for true class
population, which we don’t know.
P(class | amb_class) ≈ size(class) / Σ_{c ∈ amb_class} size(c)

Verb | Class | Size | P(class|amb_class) | f(verb, class)
pass | THROW | 27   | 27/72              | 7783
pass | SEND  | 20   | 20/72              | 5253
pass | GIVE  | 15   | 15/72              | 3891
pass | MARRY | 10   | 10/72              | 2530
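A minimal sketch of this estimate; the sizes are the ones in the table, the function itself is our illustration:

```python
# P(class | amb_class): class size in verb types over the total size of
# the ambiguity class, as in the estimate above.
def p_class_given_amb(amb_class, size):
    total = sum(size[c] for c in amb_class)
    return {c: size[c] / total for c in amb_class}

# Reproducing the "pass" rows of the table (total size 72):
size = {"THROW": 27, "SEND": 20, "GIVE": 15, "MARRY": 10}
print(p_class_given_amb(["THROW", "SEND", "GIVE", "MARRY"], size))
# THROW: 0.375 (27/72), SEND: ~0.278, GIVE: ~0.208, MARRY: ~0.139
```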
Evaluation
For some verbs, knowing the frame is
sufficient. We checked whether our model
predicts the class that Levin specifies.
Baseline was to use our estimated P(class).

Frame          | Verbs | Baseline | Model
NP-V-NP-NP     | 123   | 61.8%    | 87.8%
NP-V-NP-PP_to  | 113   | 67.2%    | 92%
NP-V-NP-PP_for | 70    | 70%      | 98.5%
Combined       | 306   | 65.7%    | 91.8%
Evaluation
For other verbs, ambiguity persists. We
marked up some instances with our
judgements. Same baseline.
Frame          | Verbs | Baseline | Model
NP-V-NP-NP     | 14    | 42.8%    | 85.7%
NP-V-NP-PP_to  | 15    | 73.4%    | 86.6%
NP-V-NP-PP_for | 2     | 0%       | 50%
Combined       | 31    | 61.3%    | 83.9%
Evaluation
We think that assigning classes is an
easy and uncontroversial task, but we are
going to test this using external judges.
In many cases the order of preference for
classes seems right. In others the
independence assumptions (class
independent of verb given frame) are
clearly in error.
Second approach
Use EM.
Search issues. Danger of over-fitting.
(Bayes net over class, verb and frame, with class hidden.)
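A toy EM sketch for this second approach, treating the class as hidden and restricting each verb to the classes Levin lists for it (our illustration with hypothetical counts, not the authors’ code):

```python
# Toy EM for P(class) and P(frame | class), with class hidden; verbs are
# restricted to their Levin classes. All inputs are hypothetical.
from collections import defaultdict

counts = {("pass", "NP-V-NP-NP"): 10, ("pass", "NP-V-NP-PP_to"): 5}
classes_of = {"pass": ["THROW", "SEND", "GIVE", "MARRY"]}
all_classes = sorted({c for cs in classes_of.values() for c in cs})
frames = sorted({f for (_, f) in counts})

p_class = {c: 1.0 / len(all_classes) for c in all_classes}   # uniform init
p_frame_class = {c: {f: 1.0 / len(frames) for f in frames} for c in all_classes}

for _ in range(20):
    exp_class = defaultdict(float)                 # E-step: expected counts
    exp_frame = defaultdict(lambda: defaultdict(float))
    for (verb, frame), n in counts.items():
        post = {c: p_class[c] * p_frame_class[c][frame] for c in classes_of[verb]}
        z = sum(post.values())
        for c, w in post.items():                  # posterior over hidden classes
            exp_class[c] += n * w / z
            exp_frame[c][frame] += n * w / z
    total = sum(exp_class.values())                # M-step: re-estimate parameters
    for c in all_classes:
        p_class[c] = exp_class[c] / total
        tc = sum(exp_frame[c].values())
        if tc:
            p_frame_class[c] = {f: exp_frame[c][f] / tc for f in frames}
```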
Stochastic text generation
Joint work with Jon Oberlander
Goals
To argue that
Making text fluent demands the achievement of
norms involving “macroscopic” textual properties
To exemplify
Two instances of macroscopic properties
Which are displayed in even simple generation models
To provide
A simple mechanism for achieving macroscopic
norms
Using a two-component architecture
Both parts of which can be stochastic in nature
Authors & reviewers
The process of writing documents (and
preparing talks) often involves more than one
party:
An author
A reviewer
The reviewer can note the need for changes to both fidelity (content) and fluency (form)
These can be implemented by the author
As in academic practice
Or by the reviewer
As in journalistic practice
In Mark Twain’s words
Sometimes the reviewer:
“Saves you—and offends you—with this
cold sign in the margin: (?) and you
search the passage and find that the
insulter is right—it doesn't say what you
thought it did: the gas-fixtures are there,
but you didn't light the jets.”
Macroscopic textual properties
Some of the reviewer’s requirements relate to
individual textual items
“sentence Z is too long”
Others relate to properties of the text as a whole
“the style is generally too informal”
“as to the adjectives: when in doubt, strike them out”
“sentences are often too short, or don’t vary enough
in length”
Such distributional properties emerge from a large
number of decisions about individual sentences.
Ex 1: sentence lengths in ILEX
No aggregation (mean = 6.0, SD = 2.3):
This jewel is a finger ring. It is a remarkably fluid piece. It is rather reminiscent of molten metal. It was made by Frances Beck. It is also in the Organic style. It was made in 1969. It is also made from diamonds. It is made from tourmaline. It is made from 18-carat gold. It was made in Buckingham. It draws on natural themes for inspiration. It is inscribed with Hallmarks: (…). Beck was English. She lived in Buckingham.

Aggregation (mean = 11.1, SD = 5.2):
This jewel is a finger ring and is rather reminiscent of molten metal. It is a remarkably fluid piece and draws on natural themes for inspiration. It was made by Frances Beck, who was English and lived in Buckingham. It is also in the Organic style. It was made in 1969. It is also made from diamonds. It is made from tourmaline and 18-carat gold. The jewel, which is inscribed with Hallmarks: (…), was made in Buckingham.
Ex 2: projecting personality in ILEX
Incompetent (type/token = 0.75):
This jewel is sort of Arts and Crafts in style. It's set with jewels, isn't it? I think Arts and Crafts style jewels were usually made with rounded stones but our jewel here wasn't. It was made with faceted stones. It was made by a single craftsman, but that wasn't unusual. Arts and Crafts style jewels were never made by groups of craftsmen, or so I'm told. Arts and Crafts style jewels probably demonstrated the artistic sensibilities of the wearer, but this jewel didn't. Interestingly, it identified the wearer as a Christian, didn't it?

Competent (type/token = 0.90):
This jewel is in the Arts and Crafts style. It is set with gems. This kind of jewellery usually features rounded stones but this item uses faceted ones. This was made by a single craftsman; indeed that's typical of the style. Like most Arts and Crafts pieces, this pendant has an elaborate design. Although this style of work usually demonstrates the artistic sensibilities of the wearer, this particular jewel identifies its wearer as a Christian.
Achieving macroscopic targets
Usability requires text to be naturalistic, and
appropriate to user
Reviewers can identify relevant fluency targets
Specifications of macroscopic textual properties
But to help authors and reviewers achieve those
targets, we need a mechanism:
How do we reduce a target for a macroscopic
property of the text to a prescription for change at the
level of individual authorial decisions?
Approximating Mr Shakespeare
, armed in their stings , Make boot upon the
summer ' s velvet buds , Which pillage they with
merry march bring home To the tent - royal of
their emperor ; Who , busied in his majesty ,
surveys The singing masons building roofs of
gold , The civil citizens kneading up the honey ,
The poor mechanic porters crowding in Their
heavy burdens at his narrow gate , The sad - ey
' d justice , with his surly hum , Delivering o ' er
to executors pale The lazy yawning drone .
Unigram Shakespeare
great , , of . and nothing Who than ; , ; they
gentleman ecus . That Till Britaine of Where
Salisbury even about unprovided that sum
Gainst. behind serve a it offend perdurable ;
friends sort spirit whereof them English me
mouth not Would thy put of peers civil ' pasture
our READABLE the d , ? madame if that Isabel
DAUPHIN need widow KING a shall ' like .
wonderful he The Southampton ? the
Consideration terre Hugh an snatchers is ' keep
repose IS Exeunt ry , mothers inward was words
are BOY another I , Europe
Bigram Shakespeare
retreat . By my hand into the clouds , as cold as
any military man in divers functions , Setting
endeavour in continual motion ; To - morrow is
Saint Crispian . He trots the air , a subject for a
sovereign ' s ambition ! He was thinking of civil
wars when he speaks , The King hath heard that
men of mould . Abate thy rage . Use lenity ,
sweet chuck . NYM . They know your
worthiness . My liege , as you shall read that my
Nell is dead i ' faith , my cousin Suffolk
Trigram Shakespeare
HENRY . We are in God ' s peace ! I have an
excellent armour ; but in loving me you should
love the lovely bully . What men have defeated
the law ; Who when they were as cold as any ' s
ambition ! He was thinking of civil wars when he
was a merry message . KING HENRY . Thou
dost thy office fairly . Turn head and stop
pursuit ; for we hear Your greeting is from him ,
you men of mould . Abate thy rage , abate thy
manly rage ; Abate thy rage ,
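The samples above come from n-gram models of this kind; a minimal trigram generator sketch (tokenization and training text assumed given):

```python
# A minimal trigram ("Markov chain") text generator of the kind used to
# produce the samples above; training tokens are assumed already tokenized.
import random
from collections import defaultdict

def train_trigrams(tokens):
    model = defaultdict(list)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        model[(a, b)].append(c)        # continuations stored per bigram context
    return model

def generate(model, length=100):
    context = random.choice(list(model))       # random starting bigram
    out = list(context)
    while len(out) < length:
        successors = model.get(tuple(out[-2:]))
        if not successors:
            break
        out.append(random.choice(successors))  # sample in proportion to frequency
    return " ".join(out)
```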
Simulating a macroscopic property
Model   | Author      | µ     | σ²      | σ² (binomial)
Real    | Shakespeare | 13.17 | 326.83  | 186.645
Real    | Twain       | 16.5  | 214.83  | 288.62
Real    | Lambs       | 32.09 | 753.93  | 1061.53
Trigram | Shakespeare | 13.23 | 286.56  | 188.32
Trigram | Twain       | 16.09 | 195.9   | 274.82
Trigram | Lambs       | 32.07 | 901.337 | 1060.41
Bigram  | Shakespeare | 13.16 | 265.37  | 186.227
Bigram  | Twain       | 16.32 | 209.47  | 295.49
Bigram  | Lambs       | 31.8  | 906.82  | 1043.14
Unigram | Shakespeare | 12.85 | 174.68  | 178.09
Unigram | Twain       | 16.48 | 272.43  | 288.36
Unigram | Lambs       | 30.5  | 955.98  | 960.5

(Sentence-length mean and variance, alongside the variance a binomial model would predict.)
Bad ways to Twain Shakespeare
Procrustean model:
Erase existing full stops
Insert full stops at fixed intervals
every 16.5 words or so
Result: gibberish
Stochastic punctuation:
Erase all existing punctuation
Stochastically insert punctuation using
model of Twain’s punctuation use
Result: gibberish
A better way
Require the author (e.g. a Shakespearean trigram model) to produce a set or lattice of alternative texts.
That way, we can exploit an architecture based on Langkilde and Knight’s Nitrogen.
Two-level generation
Nitrogen is a two-component architecture
for generation developed by Knight’s
group at USC.
(Architecture: Symbolic Generator → Search Space → Stochastic Evaluator.)
Two-level generation for MT
In the original Nitrogen, the generator is a non-
deterministic, symbolic generator, the evaluator
is a bigram or trigram language model.
Application is to Japanese/English MT, where
the input to generation may lack crucial number
information.
Number agreement is treated as a fluency goal,
since the propositional input does not specify it.
The n-gram model selects for number agreement.
The Nitrogen architecture
There are two components, only one of which is
stochastic.
The stochastic evaluator may make fine-grained distinctions, but the generator cannot.
Architecturally, there is no reason why the
generator should not be stochastic too.
If it is, both components can have fine-grained
preferences, and applications can choose how
to strike a balance between fluency and fidelity.
Two-level generation for length
(Architecture: Stochastic Generator → Search Space → Stochastic Evaluator.)
A Markov chain generator produces a weighted word lattice; a binomial (or negative binomial, Katz k-mixture, …) model evaluates.
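A sketch of the length evaluator: score each candidate’s sentence lengths under a binomial model fitted to the target author. The parameters below are placeholders chosen so that n · p = 16.5, roughly Twain’s mean sentence length:

```python
# Score candidate texts by the binomial probability of their sentence
# lengths; n and p are placeholder target-author parameters (n*p = 16.5).
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def length_score(sentence_lengths, n=60, p=0.275):
    score = 1.0
    for k in sentence_lengths:
        score *= binomial_pmf(min(k, n), n, p)  # clamp: crude handling of long sentences
    return score

# The stochastic generator proposes alternatives; the evaluator prefers the
# candidate whose length profile best matches the target distribution.
candidates = {"short": [6, 5, 7], "twainish": [17, 15, 18]}
print(max(candidates, key=lambda c: length_score(candidates[c])))  # "twainish"
```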
Adding type-to-token ratios
This suffices for sentence length, allowing us to
approach a specified sentence length norm,
while preserving (some of) the text’s integrity.
But for TTR the combinatorics of achieving a
specified norm are more complex.
Still possible to achieve the goal, but need to
replace stochastic evaluator with an appropriate
Maximum Entropy distribution.
Reduces the problem to definition of an appropriate
set of feature functions.
Features for vocabulary diversity
Predicate templates:
The label l appears at position p.
Positions p and q carry the same label.
Positions p and q are both labelled with l.
The label l is repeated with inter-token distance d.
These are sufficient to fix TTR and sentence length. Designed so that one author’s model can evaluate another author’s text.

A window onto a text:
Position: i      ii   iii iv   v  vi   vii viii ix     x     xi  …
Token:    window onto a   text .  This is  the  window which is  …
Label:    1      2    3   4    5  6    7   8    1      9     7   …

(Repeated tokens share a label: “window” is 1 at positions i and ix; “is” is 7 at positions vii and xi.)
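A sketch of these predicate templates as boolean feature functions over the label sequence in the window above (our illustration):

```python
# The four predicate templates as feature functions over a label sequence;
# the labels are those of the window above (repeated tokens share a label).
def label_at(l, p):            # "the label l appears at position p"
    return lambda labels: labels[p] == l

def same_label(p, q):          # "positions p and q carry the same label"
    return lambda labels: labels[p] == labels[q]

def both_labelled(l, p, q):    # "positions p and q are both labelled with l"
    return lambda labels: labels[p] == l and labels[q] == l

def repeat_at_distance(l, d):  # "label l repeats with inter-token distance d"
    return lambda labels: any(labels[i] == l and labels[i + d] == l
                              for i in range(len(labels) - d))

labels = [1, 2, 3, 4, 5, 6, 7, 8, 1, 9, 7]  # window onto a text . This is the window which is
assert same_label(0, 8)(labels)             # "window" recurs at positions i and ix
assert repeat_at_distance(7, 4)(labels)     # "is" repeats four tokens apart
```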
Application
So one can, if required, instantiate the Nitrogen
architecture so as to generate text from
Shakespeare’s trigrams but Twain’s vocabulary
diversity preferences.
The power of MaxEnt is warranted by the
application, which has a degree of credibility in
the personality literature.
The real significance lies in the fact that TTR is
representative of a larger class of macroscopic
properties.
Open questions
Features
No claims for the particular features.
But Biber, DiMarco & Hirst, Danlos, Hovy, …
describe observables relevant to style goals.
Future work to encode these as feature functions
for MaxEnt.
Architecture
Is a weighted trigram-based lattice adequate?
If not, unpublished work by Langkilde has
proposals to replace with parse forests or similar.
Conclusions
Architecture
Keep two-level architecture,
because it gives the reviewer a clean mechanism
for achieving macroscopic targets.
Application
The reviewer also gets a clean mechanism for
rewriting the text without asking the author. The
author may then need to write text designed to
survive cuts.
New separation of concerns:
Author is the specialist in content
Reviewer is the specialist in the effective choice of words.
Last word
“The right word may be effective,
but no word was ever as effective
as a rightly timed pause.”
Aphasia and Swearing
Richard Shillcock, ANC, Edinburgh
Scott McDonald, ICCS, Edinburgh
Simon Kirby, Linguistics, Edinburgh
Chris Brew, Linguistics, Ohio State
Goals
Understand how words are stored in the
brain
Understand what happens in aphasia
A good reason for using swear words in
public
Aphasia
(Diagram: Broca’s area and Wernicke’s area.)
Wernicke
Damage to Wernicke’s area impairs meaning:
Never, now mista oyge I
wanna tell this happened
when he rent. His…his kell
come down here and is.. He
got ren something.
Broca
Damage to Broca’s area leaves meaning
intact but makes speech halting.
Yes.. Monday … Dad and
Dick… Wednesday, 9
o’clock, 10 o’clock
Why (one idea)
Memory for how words sound is stored in Wernicke’s area. Lesions there impair knowledge of which words go with which ideas.
Lesions in Broca’s area impair ability to
associate speech movements with words.
Too simple, but will do for now.
Wernicke and sounds
People with mild Wernicke lesions often have difficulty with sounds, mistaking e.g. “b” and “p”. Writing is also affected.
Chinese speakers with Wernicke lesions have the same trouble with sounds, but their writing is (relatively) unimpaired.
This may be because Chinese writing is not so closely linked to sounds.
What aphasics say
Van Lancker and Cummings, Brain Research Reviews (1999).
“bl***y h***”,”f**k”,”f**k”,”f**k”
But also
“Two”,”three”,”Billy”,”paper and pencil”
“BBC”
Tourette’s syndrome
Vocal tics. Involuntary speech.
25-50% of patients have coprolalia, involuntary foul speaking.
It is not specific to speech: there is an example of coprolalic signing.
Different pattern from aphasic swearing.
What we did
Richard, Scott and I developed a simple
statistical measure of meaning similarity
for explaining priming experiments.
We were independently interested in the sound-meaning relation, because of patterns like all the words that begin with “sn-” or all those that begin with “gl-”.
Semantic distances
To create a “meaning” for a word, find all its occurrences in the British National Corpus (100 million words).
Strip off the endings (“walks”, “walked” → “walk”).
Look at how often 500 common words occurred within 5 words of the target.
Contextual vectors
Each word is represented by the counts for the 500 context words. This can be thought of as a direction in a 500-dimensional space.
Distance is 1 − (cosine of the angle between these directions).
Rare words have too few counts, so we used only the top 8,000 words.
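A sketch of the distance computation (plain Python, no dependencies):

```python
# Semantic distance between two context-count vectors: 1 minus the cosine
# of the angle between them, as described above.
from math import sqrt

def semantic_distance(u, v):
    """u, v: equal-length co-occurrence count vectors (here, 500 dims)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0
```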
Phonetic distance
Used festival text-to-speech synthesizer
to obtain phonetics for words.
Used a distinctive feature matrix to get a set of confusion penalties: less bad to confuse a “p” with a “b” than a “z” with an “a”.
Calculated the cheapest way of “mishearing” word1 as word2.
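A sketch of the phonetic distance: ordinary edit distance, but with substitution penalties drawn from the feature-based confusion matrix (the sub_cost function is assumed given):

```python
# Cheapest "mishearing" of one phone sequence as another: edit distance
# with feature-based substitution penalties. sub_cost(a, b) is assumed to
# come from the distinctive-feature matrix, with sub_cost(a, a) == 0.
def phonetic_distance(ph1, ph2, sub_cost, indel=1.0):
    m, n = len(ph1), len(ph2)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel
    for j in range(1, n + 1):
        d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + indel,      # delete a phone
                          d[i][j - 1] + indel,      # insert a phone
                          d[i - 1][j - 1] + sub_cost(ph1[i - 1], ph2[j - 1]))
    return d[m][n]
```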
Testing the hypothesis
Single-syllable, monomorphemic words (a conservative choice).
This left us with 1,733 words (but they account for 63% of the words in the spoken BNC), giving 1,500,778 pairs of distances.
We computed the correlation (r = 0.061): words similar in our estimated meaning are more similar in sound.
Randomization
Each word was randomly assigned a partner.
The correlation was recalculated using each word’s own semantic distances but its partner’s phonetic form.
This was repeated with different partners. The veridical correlation was an outlier in the resulting distribution.
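A sketch of the randomization test (the correlation function over a sound/meaning pairing is assumed given):

```python
# Randomization test: shuffle the sound/meaning pairing, recompute r each
# time, and locate the veridical r in the resulting null distribution.
import random

def randomization_test(words, correlation, trials=1000):
    """correlation(pairing): r over all word pairs, where pairing[w] is the
    word whose phonetic form w borrows (assumed given)."""
    veridical = correlation({w: w for w in words})   # true sound/meaning pairing
    null_rs = []
    for _ in range(trials):
        partners = words[:]
        random.shuffle(partners)                     # random re-pairing
        null_rs.append(correlation(dict(zip(words, partners))))
    n_over = sum(r >= veridical for r in null_rs)
    return veridical, (n_over + 1) / (trials + 1)    # one-sided empirical p-value
```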
(Figure: meansstderr.eps, gnuplot; means with standard errors.)
Explanations?
Not just clusters. For every phonological
distance, there is a similar distribution of
semantic distance.
Not just common words: we split the lexicon in two and re-did the calculations.
Structure preserving mappings
The brain is infested with structure-preserving mappings (things close on input tend to be close on output).
A fully structure preserving lexicon would
be a disaster for communication (words
would have to be too long).
But limited structure preservation can aid
learning.
Recycling
We think the brain is recycling a solution
to an evolutionarily simpler problem, so
we are not surprised to see a tendency to
topographic mapping in the meaning
sound relation.
Swearwords
We noticed that swearwords make an
especially large contribution to the
correlation between sound and meaning.
We speculated that this was related to
their strength as swearwords.
We ran a magnitude estimation
experiment to get subject judgements on
how strong the words are.
(Plot: value of r against expletive strength, for deistic and visceral words; expletive strength runs from −0.5 to 0.25, r from −0.05 to 0.15.)
Deistic and visceral words
Deistic words are “god”, “damn”, “hell”.
Visceral words name parts of the body.
We found no other relevant monosyllables.
Conclusions
The English lexicon has “evolved” in part
to fit the cognitive niche it occupies.
This niche is conditioned by the brain’s
predisposition for structure-preserving
representations.
Conclusions
This tendency is strongest for the
communicatively most important words,
of which the expletives are a prime
example.
Thus, fuck is one of the words that is
most richly embedded in the English
mental lexicon.
Conclusions (swearing)
When the lexicon is compromised by
damage, the expletives will be among the
words that receive the most help from the
rest of the lexical representations.
When the lexicon suffers a “tic”, as in
Tourette’s syndrome, the expletives will
be among the words made most salient
by indiscriminate activity over the lexical
representations of many words.
Conclusions (general)
Statistical NLP is appropriate for the
human sciences.
One key is availability of data.
Biology matters too.
Simple ideas can be enough.
Approximation is good