Three Linguistic Uses of
Statistical NLP

Chris Brew

Linguistics

The Ohio State University

November 2000

Linguistic SNLP

Plan of the talk

Introduction

Three applications

Verb-classes

Fluency for NLG

Aphasia and swearing

Conclusions

Why language?

Machine learning is the flavour of the decade in computational linguistics.

But machine learning also does games, robotics, medicine, scene interpretation, motion tracking, credit ratings …

So why do language?

Which technique?

Very diverse tasks, many challenges.

Area          Topic                                Techniques
Intonation    Classify contours                    Continuous HMM
Tokenization  Guess tokenisation                   Error-driven learning
POS Tagging   Stochastic CUF                       Discrete HMM, Logic Programming
Syntax        Stochastic HPSG                      EM algorithm
Verb classes  Classify verb occurrences            Graphical models
Style         Evaluate translation quality         Multidimensional scaling, clustering
Translation   Find and classify translation pairs  Contingency table measures (G score, mutual information)

What’s in it for human sciences?

If you have a clear hypothesis, you can run machine learning experiments to test it. Cheaper than psycholinguistics.

It is possible to systematically explore large classes of theories, even ones too costly to code up by hand.

Not necessarily explanatory.

Learning Levin’s verb classes

Verbs are central to most linguistic theories. And needed for all applications.

Beth Levin, “English Verb Classes and Alternations”.

Systematic and theory-neutral account of verbs and their behaviour.

Coverage necessarily incomplete.

Levin’s hypothesis

Verbs with similar semantics show similar alternations. Load, rub and plaster are like spray (SPRAY/LOAD verbs). Make, build and knit pattern with carve (BUILD verbs). Levin uses this as a basis for 200-odd classes.

Jessica sprayed paint on the wall.
Jessica sprayed the wall with paint.
Jessica sprayed water at the baby.
*Jessica sprayed me water.

Martha carved the baby a toy.
Martha carved a toy for the baby.
*Martha carved a toy at the baby.

Ambiguity

784 of Levin’s 3,024 verbs are class-ambiguous. Ambiguity correlates with high frequency. Verbs can be class-ambiguous even after the syntactic frame is known.

Remember to write your aunt a thank-you letter. (MESSAGE_TRANSFER)

Our lawyer will write you a Green Card application. (PERFORMANCE)

The attendant will call you a cab. (GET)

The prosecution will call you a liar. (DUB)

Lapata and Brew (1999)

Statistical model of verb class ambiguity.

Task: infer class for ambiguous cases.

Goal: investigate and test Levin’s hypothesis.

Goal: infer class for verbs omitted from Levin’s list.

The General Approach

Stochastic process generating class, frame and verb. Express this process as a causal model (Bayes net).

Find reasonable estimates of the conditional probabilities which parameterize the network.

Find class which maximizes p(class|frame,verb).

Problem

We have millions of words of POS-tagged English in the British National Corpus, but we don’t know frames or classes.

We certainly can’t afford to generate complete training data.

If we had a really good broad-coverage parser, we would have frames.

We would still need classes.

PP attachment

A similar problem, whose solution we adapt.

[S [NP I] [VP [V washed] [NP [DET the] [NN shirt]] [PP [P with] [NP soap]]]]

[S [NP I] [VP [V bought] [NP [NP [DET the] [NN shirt]] [PP [P with] [NP pockets]]]]]

Hindle and Rooth

Wanted to obtain (automatically) lexical
information for use in deciding attachment.

Key idea: may not have a perfect parser, but if
we have a reasonable parser, we can use its
(statistically filtered) output to make reasonable
decisions.

You need a lot of text, but it doesn’t have to be
marked up.

Discovering Lexical Association
in text

Church’s part-of-speech analyser.

Hindle’s Fidditch partial parser.

13 million words of AP newswire.

Classified probable tokens of attachments, compared likelihoods of different attachments. Iterative process.

Ratnaparkhi (Coling 98)

MaxEnt tagger, simple chunker.

Heuristics based on POS sequence.

Slight drop in accuracy. Portable to Spanish.

(v,p,n2) if

p is a real preposition (not “of”)
v is the first verb that occurs < K words left of p
v is not a form of the verb “to be”
No noun occurs between v and p
n2 is first word < K words right of p
No verb occurs between p and n2
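
The conditions above can be sketched as a filter over POS-tagged text. This is a minimal illustration, not Ratnaparkhi's implementation: the tag names (`VERB`, `NOUN`, `PREP`), the default window `k`, and the choice to take n2 as the first noun to the right of p are all assumptions made for the example.

```python
# Hypothetical sketch of the (v, p, n2) extraction heuristic.
# Tag names and the window size k are illustrative assumptions.

BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def extract_vpn2(tagged, k=6):
    """Return (v, p, n2) triples from a list of (word, tag) pairs."""
    triples = []
    for i, (word, tag) in enumerate(tagged):
        # p must be a real preposition, and not "of"
        if tag != "PREP" or word.lower() == "of":
            continue
        # v: first verb fewer than k words to the left of p,
        # with no noun between v and p
        v = None
        for j in range(i - 1, max(i - k, -1), -1):
            w, t = tagged[j]
            if t == "NOUN":
                break
            if t == "VERB":
                v = w
                break
        if v is None or v.lower() in BE_FORMS:
            continue
        # n2: first noun fewer than k words to the right of p,
        # with no verb between p and n2 (assumption: n2 is a noun)
        n2 = None
        for j in range(i + 1, min(i + k, len(tagged))):
            w, t = tagged[j]
            if t == "VERB":
                break
            if t == "NOUN":
                n2 = w
                break
        if n2 is not None:
            triples.append((v, word, n2))
    return triples
```

Note that "I washed the shirt with soap" yields no triple under these rules, because a noun intervenes between the verb and the preposition; the heuristic only harvests cases it can treat as unambiguous.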

Verb frames from the BNC

Wrote simple grammars for:

V NP NP
V NP PP_for
V NP PP_to

Filtered to remove noise (compound nouns in particular), obtaining joint frequency distribution of frame and verb.

A causal model

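$$P(\mathrm{class} \mid \mathrm{verb}, \mathrm{frame}) \approx P(\mathrm{class} \mid \mathrm{frame})$$

$$P(\mathrm{verb}, \mathrm{frame}, \mathrm{class}) = P(\mathrm{verb})\, P(\mathrm{frame} \mid \mathrm{verb})\, P(\mathrm{class} \mid \mathrm{verb}, \mathrm{frame})$$

[Diagram: Bayes net over the nodes frame, verb and class]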

The causal model

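We have P(verb), P(frame), P(frame|verb) but need P(class), P(frame|class).

First approach is to approximate. (works)

Second approach is to use EM to iteratively estimate P(class), P(frame|class). (?)

$$P(\mathrm{verb}, \mathrm{frame}, \mathrm{class}) \approx \frac{P(\mathrm{verb})\, P(\mathrm{class})\, P(\mathrm{frame} \mid \mathrm{verb})\, P(\mathrm{frame} \mid \mathrm{class})}{P(\mathrm{frame})}$$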
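
Under the approximation P(verb, frame, class) ≈ P(verb) P(class) P(frame|verb) P(frame|class) / P(frame), the factors P(verb), P(frame|verb) and P(frame) are the same for every class once the verb and frame are fixed, so choosing the best class reduces to comparing P(class) P(frame|class). The sketch below illustrates that step only; the function names and the prior values are hypothetical, chosen for the example rather than taken from the talk.

```python
# Hypothetical sketch: pick the class maximizing P(class) * P(frame | class).
# P(verb), P(frame | verb) and P(frame) are constant across classes for a
# fixed (verb, frame) pair, so they do not affect the argmax.

def class_scores(frame, p_class, p_frame_given_class):
    """Score every class by P(class) * P(frame | class)."""
    return {
        c: p_class[c] * p_frame_given_class[c].get(frame, 0.0)
        for c in p_class
    }

def best_class(frame, candidates, p_class, p_frame_given_class):
    """Return the candidate class with the highest score for this frame."""
    scores = class_scores(frame, p_class, p_frame_given_class)
    return max(candidates, key=lambda c: scores[c])

# Toy numbers: the priors are invented for illustration.
p_class = {"GIVE": 0.4, "PERFORMANCE": 0.6}
p_frame_given_class = {
    "GIVE": {"NP-V-NP-NP": 1 / 2},
    "PERFORMANCE": {"NP-V-NP-NP": 1 / 6},
}
winner = best_class("NP-V-NP-NP", ["GIVE", "PERFORMANCE"],
                    p_class, p_frame_given_class)
```

With these toy values GIVE wins despite its lower prior, because the frame is much more probable under GIVE.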

P(frame|class)

For each class, counted the syntactic frames listed in Levin. Fairly coarse and easy classification of frames. For GIVE there were NP-V-NP-PP_to and NP-V-NP-NP only. 6 frames for PERFORMANCE.

Assumed uniform distribution.

P(NP-V-NP-NP|GIVE) estimated as 1/2.

P(NP-V|PERFORMANCE) estimated as 1/6.

P(class)

Pclass =

verb

Pverb ,class

Pclass =

verb

Pverb Pclassverb

Pclass =

verb

Pverb Pclassamb

class

Ambiguity class: a set of
verbs which show the same
patterns of ambiguity. For
example, all the verbs which
can be either of the classes
MESSAGE_TRANSFER or
PERFORMANCE, but no
other. Ambiguity classes
reduce sparse data
problems. We still need a
principled way of estimating

P(class|ambiguity class)

P(class|amb_class)

Key idea: use class size measured on verb types to stand in for true class population, which we don’t know.

$$P(\mathrm{class} \mid \mathrm{amb\_class}) \approx \frac{\mathrm{size}(\mathrm{class})}{\sum_{c \in \mathrm{amb\_class}} \mathrm{size}(c)}$$

Verb  Class  Size  P(class|amb_class)  f(verb,class)
Pass  THROW  27    27/72               7783
Pass  SEND   20    20/72               5253
Pass  GIVE   15    15/72               3891
Pass  MARRY  10    10/72               2530
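
The size-based estimate is a simple normalization, and the figures in the P(class|amb_class) column can be reproduced directly from the class sizes. The sketch below uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

def p_class_given_amb_class(sizes):
    """Estimate P(class | amb_class) as class size over total size.

    sizes maps each class in the ambiguity class to its number of verb types.
    """
    total = sum(sizes.values())
    return {c: Fraction(n, total) for c, n in sizes.items()}

# Class sizes for the ambiguity class of "pass", as given in the table.
sizes = {"THROW": 27, "SEND": 20, "GIVE": 15, "MARRY": 10}
probs = p_class_given_amb_class(sizes)
```

`Fraction` reduces automatically, so 27/72 is stored as 3/8; the values still compare equal to the unreduced fractions shown on the slide.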

Evaluation

For some verbs, knowing the frame is sufficient. We checked whether our model predicts the class that Levin specifies. Baseline was to use our estimated P(class).

Frame           Verbs  Baseline  Model
NP-V-NP-NP      123    61.8%     87.8%
NP-V-NP-PP_to   113    67.2%     92%
NP-V-NP-PP_for  70     70%       98.5%
Combined        306    65.7%     91.8%

Evaluation

For other verbs, ambiguity persists. We marked up some instances with our judgements. Same baseline.

Frame           Verbs  Baseline  Model
NP-V-NP-NP      14     42.8%     85.7%
NP-V-NP-PP_to   15     73.4%     86.6%
NP-V-NP-PP_for  2      0%        50%
Combined        31     61.3%     83.9%

Evaluation

We think that assigning classes is an easy and uncontroversial task, but we are going to test this using external judges.

In many cases the order of preference for classes seems right. In others the independence assumptions (class independent of verb given frame) are clearly in error.

Second approach

Use EM.

Search issues. Danger of over-fitting.

[Diagram: graphical model over the nodes class, verb and frame]

Stochastic text generation

Joint work with Jon Oberlander

Goals

To argue that

Making text fluent demands the achievement of
norms involving “macroscopic” textual properties

To exemplify

Two instances of macroscopic properties

Which are displayed in even simple generation models

To provide

A simple mechanism for achieving macroscopic
norms

Using a two-component architecture

Both parts of which can be stochastic in nature

Authors & reviewers

The process of writing documents (and
preparing talks) often involves more than one
party:

An author

A reviewer

The reviewer can note the need for changes to both fidelity (content) and fluency (form)

These can be implemented by the author

As in academic practice

Or by the reviewer

As in journalistic practice

In Mark Twain’s words

Sometimes the reviewer:

“Saves you—and offends you—with this
cold sign in the margin: (?) and you
search the passage and find that the
insulter is right—it doesn't say what you
thought it did: the gas-fixtures are there,
but you didn't light the jets.”

Macroscopic textual properties

Some of the reviewer’s requirements relate to
individual textual items

“sentence Z is too long”

Others relate to properties of the text as a whole

“the style is generally too informal”

“as to the adjectives: when in doubt, strike them out”

“sentences are often too short, or don’t vary enough
in length”

Such distributional properties emerge from a large number of decisions about individual sentences.

Ex 1: sentence lengths in ILEX

No aggregation (Mean = 6.0, SD = 2.3):

This jewel is a finger ring. It is a remarkably fluid piece. It is rather reminiscent of molten metal. It was made by Frances Beck. It is also in the Organic style. It was made in 1969. It is also made from diamonds. It is made from tourmaline. It is made from 18-carat gold. It was made in Buckingham. It draws on natural themes for inspiration. It is inscribed with Hallmarks: (…). Beck was English. She lived in Buckingham.

Aggregation (Mean = 11.1, SD = 5.2):

This jewel is a finger ring and is rather reminiscent of molten metal. It is a remarkably fluid piece and draws on natural themes for inspiration. It was made by Frances Beck, who was English and lived in Buckingham. It is also in the Organic style. It was made in 1969. It is also made from diamonds. It is made from tourmaline and 18-carat gold. The jewel, which is inscribed with Hallmarks: (…), was made in Buckingham.

Ex 2: projecting personality in ILEX

This jewel is sort of Arts and Crafts
in style. It's set with jewels, isn't it? I
think Arts and Crafts style jewels
were usually made with rounded
stones but our jewel here wasn't. It
was made with faceted stones. It
was made by a single craftsman,
but that wasn't unusual. Arts and
Crafts style jewels were never
made by groups of craftsmen, or so
I'm told. Arts and Crafts style jewels
probably demonstrated the artistic
sensibilities of the wearer, but this
jewel didn't. Interestingly, it
identified the wearer as a Christian,
didn't it?

This jewel is in the Arts and Crafts
style. It is set with gems. This kind
of jewellery usually features
rounded stones but this item uses
faceted ones. This was made by a
single craftsman; indeed that's
typical of the style. Like most Arts
and Crafts pieces, this pendant has
an elaborate design. Although this
style of work usually demonstrates
the artistic sensibilities of the
wearer, this particular jewel
identifies its wearer as a Christian.

Panel labels: Competent, Incompetent

Type/token = 0.75 (first text); Type/token = 0.90 (second text)
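
For reference, a type/token ratio is the number of distinct word forms divided by the number of running words. The sketch below is one simple way to compute it; the tokenization (lower-cased runs of letters and apostrophes) is an assumption, and a different tokenizer would give somewhat different values from those quoted on the slide.

```python
import re

def type_token_ratio(text):
    """Distinct word forms divided by running words, after lower-casing."""
    # Assumed tokenization: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)
```

Repetitive text such as repeated mentions of "Arts and Crafts style jewels" lowers the ratio, which is why it works as a rough measure of vocabulary diversity.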

Achieving macroscopic targets

Usability requires text to be naturalistic, and
appropriate to user

Reviewers can identify relevant fluency targets

Specifications of macroscopic textual properties

But to help authors and reviewers achieve those
targets, we need a mechanism:

How do we reduce a target for a macroscopic
property of the text to a prescription for change at the
level of individual authorial decisions?

Approximating Mr Shakespeare

, armed in their stings , Make boot upon the
summer ' s velvet buds , Which pillage they with
merry march bring home To the tent - royal of
their emperor ; Who , busied in his majesty ,
surveys The singing masons building roofs of
gold , The civil citizens kneading up the honey ,
The poor mechanic porters crowding in Their
heavy burdens at his narrow gate , The sad - ey
' d justice , with his surly hum , Delivering o ' er
to executors pale The lazy yawning drone .

Unigram Shakespeare

great , , of . and nothing Who than ; , ; they
gentleman ecus . That Till Britaine of Where
Salisbury even about unprovided that sum
Gainst. behind serve a it offend perdurable ;
friends sort spirit whereof them English me
mouth not Would thy put of peers civil ' pasture
our READABLE the d , ? madame if that Isabel
DAUPHIN need widow KING a shall ' like .
wonderful he The Southampton ? the
Consideration terre Hugh an snatchers is ' keep
repose IS Exeunt ry , mothers inward was words
are BOY another I , Europe

Bigram Shakespeare

retreat . By my hand into the clouds , as cold as
any military man in divers functions , Setting
endeavour in continual motion ; To - morrow is
Saint Crispian . He trots the air , a subject for a
sovereign ' s ambition ! He was thinking of civil
wars when he speaks , The King hath heard that
men of mould . Abate thy rage . Use lenity ,
sweet chuck . NYM . They know your
worthiness . My liege , as you shall read that my
Nell is dead i ' faith , my cousin Suffolk

Trigram Shakespeare

HENRY . We are in God ' s peace ! I have an
excellent armour ; but in loving me you should
love the lovely bully . What men have defeated
the law ; Who when they were as cold as any ' s
ambition ! He was thinking of civil wars when he
was a merry message . KING HENRY . Thou
dost thy office fairly . Turn head and stop
pursuit ; for we hear Your greeting is from him ,
you men of mould . Abate thy rage , abate thy
manly rage ; Abate thy rage ,
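
Samples like the unigram, bigram and trigram texts can be produced by a Markov chain over tokens. The sketch below is a generic n-gram sampler written for illustration, not the code used for the talk; the function names are hypothetical, and it samples continuations in proportion to their observed counts with no smoothing.

```python
import random
from collections import defaultdict

def train_ngram(tokens, n):
    """Map each (n-1)-token history to the tokens observed after it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        history = tuple(tokens[i:i + n - 1])
        model[history].append(tokens[i + n - 1])
    return model

def generate(model, n, length, rng):
    """Sample up to `length` tokens, starting from a random seen history."""
    out = list(rng.choice(sorted(model)))
    while len(out) < length:
        # The last n-1 tokens form the history (empty for a unigram model).
        history = tuple(out[len(out) - (n - 1):])
        followers = model.get(history)
        if not followers:
            break  # unseen history: stop rather than invent a continuation
        out.append(rng.choice(followers))
    return out

tokens = "abate thy rage abate thy manly rage".split()
bigram = train_ngram(tokens, 2)
sample = generate(bigram, 2, 5, random.Random(0))
```

Storing followers as a list with repeats means frequent continuations are drawn more often, which is the maximum-likelihood estimate of the transition probabilities.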

Simulating a macroscopic property

Model    Author       µ      σ²       σ² (binomial)
Real     Shakespeare  13.17  326.83   186.645
Real     Twain        16.5   214.83   288.62
Real     Lambs        32.09  753.93   1061.53
Trigram  Shakespeare  13.23  286.56   188.32
Trigram  Twain        16.09  195.9    274.82
Trigram  Lambs        32.07  901.337  1060.41
Bigram   Shakespeare  13.16  265.37   186.227
Bigram   Twain        16.32  209.47   295.49
Bigram   Lambs        31.8   906.82   1043.14
Unigram  Shakespeare  12.85  174.68   178.09
Unigram  Twain        16.48  272.43   288.36
Unigram  Lambs        30.5   955.98   960.5
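
The statistics in this table can be computed from any tokenized text. In the sketch below, the sentence-ending tokens and the decision to count the end token as part of the sentence are assumptions. The reference variance µ(µ+1) is also an inference: it is the variance of a geometric distribution with mean µ (a sentence end drawn independently at each token), and it reproduces the σ² (binomial) column to within rounding, but the slide does not state the formula.

```python
from statistics import mean, pvariance

def sentence_lengths(tokens, enders=(".", "!", "?")):
    """Lengths in tokens of each completed sentence (end token included)."""
    lengths, current = [], 0
    for tok in tokens:
        current += 1
        if tok in enders:
            lengths.append(current)
            current = 0
    return lengths  # a trailing unfinished sentence is ignored

def geometric_variance(mu):
    """Variance of a geometric distribution whose mean is mu."""
    return mu * (mu + 1)

lengths = sentence_lengths("a b . c d e . f".split())
mu, var = mean(lengths), pvariance(lengths)
```

Comparing the observed variance with the reference value shows whether sentence ends look independent: for real Shakespeare the observed variance is well above the reference, whereas the unigram model, which does place sentence ends independently, comes out close to it.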

Bad ways to Twain Shakespeare

Procrustean model:

Erase existing full stops

Insert full stops at fixed intervals

every 16.5 words or so

Result: gibberish

Stochastic punctuation:

Erase all existing punctuation

Stochastically insert punctuation using
model of Twain’s punctuation use

Result: gibberish

A better way

Require the author (e.g. a Shakespearean trigram model) to produce a set or lattice of alternative texts.

That way, we can exploit an architecture based on Langkilde and Knight’s Nitrogen.

Two-level generation

Nitrogen is a two-component architecture for generation developed by Knight’s group at USC.

[Diagram: Symbolic Generator, Search Space, Stochastic Evaluator]

Two-level generation for MT

In the original Nitrogen, the generator is a non-deterministic, symbolic generator and the evaluator is a bigram or trigram language model.

Application is to Japanese/English MT, where
the input to generation may lack crucial number
information.

Number agreement is treated as a fluency goal,
since the propositional input does not specify it.

The n-gram model selects for number
agreement

The Nitrogen architecture

There are two components, only one of which is
stochastic.

The stochastic evaluator may make fine-grained distinctions, but the generator cannot.

Architecturally, there is no reason why the
generator should not be stochastic too.

If it is, both components can have fine-grained
preferences, and applications can choose how
to strike a balance between fluency and fidelity.

Two-level generation for length

Markov chain generator produces weighted word lattice.

Binomial (or negative binomial, Katz k-mixture, …) evaluates.

[Diagram: Stochastic Generator, Search Space, Stochastic Evaluator]

Adding type-to-token ratios

This suffices for sentence length, allowing us to
approach a specified sentence length norm,
while preserving (some of) the text’s integrity.

But for TTR the combinatorics of achieving a
specified norm are more complex.

Still possible to achieve the goal, but need to
replace stochastic evaluator with an appropriate
Maximum Entropy distribution.

Reduces the problem to definition of an appropriate
set of feature functions.

Features for vocabulary diversity

Predicate templates

The label I appears at position p.

Positions p and q carry the same label.

Positions p and q are both labelled with I.

The label I is repeated with inter-token distance d.

These are sufficient to fix TTR and sentence
length.

Designed so that one author’s model can
evaluate another author’s text.

Position  i       ii    iii  iv    v  vi    vii  viii  ix      x      xi
Word      window  onto  a    text  .  This  is   the   window  which  is
Label     1       2     3    4     5  6     7    8     1       9      7
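
The labelling in the window example numbers word types by first appearance, which is what lets one author's model evaluate another author's text: the features refer to labels and positions, not to particular words. The sketch below computes the labels and two of the predicate templates; the function names are hypothetical and this is an illustration of the templates, not the feature set used in the talk.

```python
def type_labels(tokens):
    """Label each token with its type number, numbered by first appearance."""
    ids, labels = {}, []
    for tok in tokens:
        labels.append(ids.setdefault(tok, len(ids) + 1))
    return labels

def same_label_pairs(labels):
    """Position pairs (p, q), p < q, carrying the same label (1-based)."""
    n = len(labels)
    return [(p + 1, q + 1)
            for p in range(n) for q in range(p + 1, n)
            if labels[p] == labels[q]]

def repeat_distances(labels):
    """(label, d) for each repetition of a label at inter-token distance d."""
    last_seen, out = {}, []
    for pos, lab in enumerate(labels, start=1):
        if lab in last_seen:
            out.append((lab, pos - last_seen[lab]))
        last_seen[lab] = pos
    return out

window = "window onto a text . This is the window which is".split()
labels = type_labels(window)
```

The type/token ratio of the window follows from the labels alone, as the number of distinct labels divided by the window length.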

Application

So one can, if required, instantiate the Nitrogen
architecture so as to generate text from
Shakespeare’s trigrams but Twain’s vocabulary
diversity preferences.

The power of MaxEnt is warranted by the
application, which has a degree of credibility in
the personality literature.

The real significance lies in the fact that TTR is
representative of a larger class of macroscopic
properties.

Open questions

Features

No claims for the particular features.

But Biber, DiMarco & Hirst, Danlos, Hovy, …
describe observables relevant to style goals.
Future work to encode these as feature functions
for MaxEnt.

Architecture

Is weighted trigram-based lattice adequate?

If not, unpublished work by Langkilde has
proposals to replace with parse forests or similar.

Conclusions

Architecture

Keep two-level architecture,
because it gives the reviewer a clean mechanism
for achieving macroscopic targets.

Application

The reviewer also gets a clean mechanism for
rewriting the text without asking the author. The
author may then need to write text designed to
survive cuts.

New separation of concerns:

Author is the specialist in content

Reviewer is the specialist in the effective choice of words.

Last word

“The right word may be effective,

but no word was ever as effective

as a rightly timed pause.”

Aphasia and Swearing

Richard Shillcock, ANC, Edinburgh

Scott McDonald, ICCS, Edinburgh

Simon Kirby, Linguistics, Edinburgh

Chris Brew, Linguistics, Ohio State

Goals

Understand how words are stored in the brain

Understand what happens in aphasia

A good reason for using swear words in public

Aphasia

Broca’s area

Wernicke’s area

Wernicke

Damage to Wernicke’s area impairs meaning

Never, now mista oyge I
wanna tell this happened
when he rent. His…his kell
come down here and is.. He
got ren something.

Broca

Damage to Broca’s area leaves meaning intact but makes speech halting.

Yes.. Monday … Dad and
Dick… Wednesday, 9
o’clock, 10 o’clock

Why (one idea)

Memory for how words sound is stored in Wernicke’s area. Lesions there impair knowledge of which words go with which ideas.

Lesions in Broca’s area impair ability to associate speech movements with words.

Too simple, but will do for now.

Wernicke and sounds

People with mild Wernicke lesions often have difficulty with sounds, mistaking e.g. “b” and “p”. Also writing.

Chinese people with Wernicke lesions have the same trouble with sounds. But writing is (relatively) unimpaired.

May be because Chinese writing is not so closely linked to sounds.

What aphasics say

Van Lancker and Cummings, Brain Research Reviews (1999)

“bl***y h***”, “f**k”, “f**k”, “f**k”

But also

“Two”, “three”, “Billy”, “paper and pencil”

“BBC”

Tourette’s syndrome

Vocal tics. Involuntary speech.

25-50% of patients have coprolalia.

Involuntary foul speaking.

Not specific to speech: there is an example of coprolalic signing.

Different pattern from aphasic swearing.

What we did

Richard, Scott and I developed a simple statistical measure of meaning similarity for explaining priming experiments.

Independently interested in sound-meaning relation, because of patterns like all the words that begin with “sn-” or all those that begin with “gl-”.

Semantic distances

To create a “meaning” for a word, find all its occurrences in the British National Corpus (100 million words).

Strip off the endings (“walks”, “walked” -> “walk”).

Look at how often 500 common words occurred within 5 words of the target.

Contextual vectors

Each word is represented by the counts for the 500 words.

This can be thought of as a direction in a 500-dimensional space.

Distance is 1 - (cosine of angle between these directions).

Rare words have too few counts, so we did the top 8,000 words only.
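
The contextual-vector construction and the cosine-based distance can be sketched in a few lines. This is a minimal illustration under assumptions: the input is an already lemmatized token list, the window is symmetric, and the function names are hypothetical.

```python
import math
from collections import Counter

def context_vector(tokens, target, context_words, window=5):
    """Count how often each context word occurs within `window` of target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in context_words:
                counts[tokens[j]] += 1
    # One dimension per context word, in a fixed order.
    return [counts[w] for w in context_words]

def cosine_distance(u, v):
    """1 minus the cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / norm
```

Because only direction matters, a frequent word and a rare word with the same relative context profile come out at distance zero; the zero-vector check is the code-level counterpart of the sparse-count problem for rare words.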

Phonetic distance

Used the Festival text-to-speech synthesizer to obtain phonetics for words.

Used distinctive feature matrix to get a set of confusion penalties. Less bad to confuse a “p” with a “b” than a “z” with an “a”.

Calculated cheapest way of “mishearing” word1 as word2.
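
The cheapest "mishearing" can be computed as a weighted edit distance over phone sequences. The sketch below is an assumption-laden stand-in: the feature sets are invented for four phones, the substitution penalty is a Jaccard distance over those sets, and insertions and deletions cost a flat 1.0. The actual feature matrix and penalties used in the study are not given on the slide.

```python
# Hypothetical distinctive-feature sets for four phones (illustrative only).
FEATURES = {
    "p": {"consonant", "labial", "stop"},
    "b": {"consonant", "labial", "stop", "voiced"},
    "z": {"consonant", "alveolar", "fricative", "voiced"},
    "a": {"vowel", "low", "voiced"},
}

def substitution_cost(a, b, features=FEATURES):
    """Share of features on which two phones differ (0.0 if identical)."""
    if a == b:
        return 0.0
    fa, fb = features[a], features[b]
    return len(fa ^ fb) / len(fa | fb)

def phonetic_distance(w1, w2, features=FEATURES, indel=1.0):
    """Cheapest cost of turning phone sequence w1 into w2."""
    n, m = len(w1), len(w2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # delete a phone from w1
                d[i][j - 1] + indel,  # insert a phone from w2
                d[i - 1][j - 1]
                + substitution_cost(w1[i - 1], w2[j - 1], features),
            )
    return d[n][m]
```

With these toy features, confusing "p" with "b" costs 0.25 while confusing "z" with "a" costs about 0.83, matching the ordering described on the slide.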

Testing the hypothesis

Single-syllable, monomorphemic words (conservative).

Left us with 1733 words (but they account for 63% of the words in the spoken BNC).

1,500,778 pairs of distances. Did a correlation (r = 0.061). Words similar in our estimated meaning are more similar in sound.

Randomization

Each word was randomly assigned a partner.

Correlation was calculated using the word’s own semantic distance but the partner’s phonetic form.

Repeated with different partners.

The veridical correlation was an outlier in the distribution.
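
One way to read this procedure is as a permutation test over distance matrices: shuffle which phonetic form goes with which word, recompute the correlation over all word pairs, and see where the true correlation falls among the shuffled ones. The sketch below follows that reading; the function names, the one-sided p-value and the use of Pearson correlation are assumptions, not details confirmed by the slides.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def pairwise_correlation(sem, phon, partner):
    """Correlate each pair's semantic distance with its partners' phonetic distance."""
    n = len(sem)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xs = [sem[i][j] for i, j in pairs]
    ys = [phon[partner[i]][partner[j]] for i, j in pairs]
    return pearson(xs, ys)

def randomization_test(sem, phon, trials, rng):
    """Return the true correlation and a one-sided permutation p-value."""
    n = len(sem)
    observed = pairwise_correlation(sem, phon, list(range(n)))
    exceed = 0
    for _ in range(trials):
        partner = list(range(n))
        rng.shuffle(partner)  # each word borrows a random partner's form
        if pairwise_correlation(sem, phon, partner) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (trials + 1)
```

Shuffling whole words, not individual distances, keeps the dependence between pairs that share a word, which is why a small correlation over 1,500,778 pairs cannot simply be tested with the usual significance formula.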

[Figure: meansstderr.eps (gnuplot plot; image not preserved)]

Explanations?

Not just clusters. For every phonological distance, there is a similar distribution of semantic distance.

Not just common words. Split lexicon in two and re-did calculations.

Structure preserving mappings

The brain is infested with structure-preserving mappings (things close on input tend to be close on output).

A fully structure-preserving lexicon would be a disaster for communication (words would have to be too long).

But limited structure preservation can aid learning.

Recycling

We think the brain is recycling a solution to an evolutionarily simpler problem, so we are not surprised to see a tendency to topographic mapping in the meaning-sound relation.

Swearwords

We noticed that swearwords make an especially large contribution to the correlation between sound and meaning.

We speculated that this was related to their strength as swearwords.

We ran a magnitude estimation experiment to get subject judgements on how strong the words are.

[Figure: plot with axes “expletive strength” and “r value”; series: visceral words, deistic words]

Deistic and visceral words

Deistic is “god”, “damn”, “hell”

Visceral is parts of the body

No other monosyllables that we found.

Conclusions

The English lexicon has “evolved” in part to fit the cognitive niche it occupies.

This niche is conditioned by the brain’s predisposition for structure-preserving representations.

Conclusions

This tendency is strongest for the communicatively most important words, of which the expletives are a prime example.

Thus, fuck is one of the words that is most richly embedded in the English mental lexicon.

Conclusions (swearing)

When the lexicon is compromised by damage, the expletives will be among the words that receive the most help from the rest of the lexical representations.

When the lexicon suffers a “tic”, as in Tourette’s syndrome, the expletives will be among the words made most salient by indiscriminate activity over the lexical representations of many words.

Conclusions (general)

Statistical NLP is appropriate for the human sciences.

One key is availability of data.

Biology matters too.

Simple ideas can be enough.

Approximation is good

