Fast Minimum Training Error Discretization

Tapio Elomaa (elomaa@cs.helsinki.fi)
Juho Rousu (rousu@cs.helsinki.fi)
Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland
Abstract

The need to discretize a numerical range into class-coherent intervals is a frequently encountered problem. Training Set Error (TSE) is one of the commonly used impurity functions in this task. We show that in order to find a TSE-optimal discretization one only needs to examine a small subset of all cut points, called alternation points. On the other hand, we prove that failing to check an alternation point may lead to a suboptimal discretization. Alternation points can be identified efficiently once the data has been ordered. Our empirical evaluation demonstrates that the number of alternations in numerical ranges is typically much lower than the total number of cut points. In our experiments the discretization algorithm running on top of alternation points was significantly faster than the same algorithm on all cut points. Thus, the sublinear number of evaluated threshold candidates further reduces the practical time requirement of TSE optimization.
1. Introduction

Consider having a set of labeled data points (with duplicate values allowed) from some range on the real axis. In order to distill some general information on the relation between the data values and their labels, the numerical range is discretized into labeled intervals. An upper bound on the number of allowed intervals is usually given as an external parameter.
Morgan Kaufmann 2002; C. Sammut and A. Hoffmann (eds.): Proc. 19th International Conference on Machine Learning, ICML 2002 (Sydney, Australia), pp. 131-138.
As the goodness criterion, the evaluation function used to discriminate between the candidate discretizations, we use the Training Set Error (subsequently TSE for short): choose the discretization in which the interval labeling differs least often from that of the data points.
In this paper we examine efficient ways to obtain discretizations that are optimal with respect to TSE. TSE is arguably the most commonly used attribute evaluation function in machine learning algorithms (Auer, Holte & Maass, 1995; Brodley, 1995; Lubinsky, 1995; Kearns et al., 1997). Using TSE in growing the hypothesis is prone to overfit it to the training examples. On the other hand, the principle of empirical risk minimization is based on choosing the hypothesis that minimizes training error (Vapnik & Chervonenkis, 1971). Under certain conditions one can guarantee that the minimum-error hypothesis will also have a small generalization error.
A union of intervals of a numerical range has sometimes been used as a hypothesis class in theoretical frameworks (Maass, 1994; Kearns et al., 1997; Lozano, 2000), but more often the bounded-arity discretization problem is encountered as a subproblem in learning more complex hypotheses.
For example, in learning decision trees one needs to partition the domains of numerical attributes into a modest number of intervals (Breiman et al., 1984; Quinlan, 1986). In classifier induction, discretization of numerical domains is a potential time-consumption bottleneck, since in the general case the number of possible discretizations is exponential in the number of interval threshold candidates within the domain.
With respect to many commonly used evaluation functions, numerical ranges can be optimally discretized in time quadratic in the number of interval threshold candidates using a dynamic programming algorithm (Fulton, Kasif & Salzberg, 1995; Zighed, Rakotomalala & Feschet, 1997; Elomaa & Rousu, 1999), but only TSE is known to be optimizable in linear time (Fulton et al., 1995; Auer, 1997; Birkendorf, 1997). For quadratic-time algorithms, pruning cut point candidates in preprocessing has turned out to be a useful way to speed up the search (Fayyad & Irani, 1992; Elomaa & Rousu, 1999; 2000).
In this paper we explore the possibilities and limitations of further speed-up of minimum TSE discretization via preprocessing the data.
The remainder of this paper is organized as follows. In the next section we introduce the minimum training error discretization problem more exactly and review subquadratic-time search algorithms that have been put forward for finding TSE-optimal discretizations. Pruning threshold candidates without losing the possibility to recover an optimal discretization is considered in Section 3. In Section 4 we prove that when using TSE it is possible to prune more thresholds than when using other common evaluation functions. In Section 5 we report empirical experiments on the pruned set of threshold candidates. The final section presents the concluding remarks of this paper.
2. Minimum TSE Discretization

Let us first define the discretization problem more formally. A sample S = {(x_1, y_1), ..., (x_n, y_n)} consists of n labeled real values (given in the increasing order of x). For each (x, y), x ∈ ℝ and y is the label of x from the set of classes C = {1, ..., m}. A k-interval discretization of the sample is generated by picking k − 1 interval thresholds or cut points T_1 < T_2 < ... < T_{k−1}, T_j ∈ (x_min, x_max), where x_min = min{x | (x, y) ∈ S} and x_max = max{x | (x, y) ∈ S}. Moreover, empty intervals are not allowed. The set of k − 1 thresholds defines a partition ⊎_{i=1}^{k} S_i of the set S as follows:

    S_i = { (x, y) ∈ S | x ≤ T_1 }             if i = 1,
          { (x, y) ∈ S | T_{i−1} < x ≤ T_i }   if 1 < i < k,
          { (x, y) ∈ S | x > T_{k−1} }         if i = k.
In this paper we use Training Set Error to determine the goodness of a partition. Let δ_j(S) = |{(x, y) ∈ S | y ≠ j}| denote the error, or the number of disagreements, with respect to class j in the set S. That is, if all instances in S were predicted to belong to class j, we would make δ_j(S) errors on S.
Furthermore, let δ(S) = min_{j ∈ C} δ_j(S) denote the minimum error on S. A class j ∈ C is called a majority class of S if predicting class j leads to the minimum number of errors on S, that is, δ_j(S) = δ(S). Note that more than one class can qualify as a majority class. The majority classes of a set S are denoted by maj_C(S) = {j ∈ C | δ_j(S) = δ(S)}.
Given a k-interval partition ⊎_{i=1}^{k} S_i of S, where each interval is labeled by (one of) its majority classes, its Training Set Error is given by

    TSE(⊎_{i=1}^{k} S_i) = Σ_{i=1}^{k} δ(S_i).

Intuitively, TSE is the number of training instances falsely classified in the partition when each interval is labeled by one of its majority classes.
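To make the definitions concrete, the quantities δ_j(S), δ(S), and TSE can be computed directly from labeled (value, label) pairs. The following is a minimal illustrative sketch (not the authors' code), evaluated on the 27-point sample of Figure 1 partitioned at its two majority alternation points (after values 2 and 5):

```python
from collections import Counter

def delta_j(interval, j):
    """delta_j(S): disagreements if every point in S is predicted as class j."""
    return sum(1 for _, y in interval if y != j)

def delta(interval):
    """delta(S) = min_j delta_j(S): errors made by predicting a majority class."""
    counts = Counter(y for _, y in interval)
    return len(interval) - max(counts.values())

def tse(partition):
    """TSE: total error when each interval is labeled by a majority class."""
    return sum(delta(interval) for interval in partition)

# The 27 data points of Figure 1 (sorted by value).
sample = list(zip(
    [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9, 9],
    "XXXXYYYYYYXXXYYYYXXXXYXXXYY"))

# Three intervals cut after values 2 and 5.
partition = ([p for p in sample if p[0] <= 2],
             [p for p in sample if 2 < p[0] <= 5],
             [p for p in sample if p[0] > 5])

print(tse(partition))  # prints 7 (errors 0 + 4 + 3 per interval)
```

Labeling the whole sample with its majority class X would instead make δ(S) = 13 errors, so the three-interval partition clearly helps.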
The global minimum error discretization problem is to find a partition ⊎_{i=1}^{k} S_i of S that has the minimum TSE value over all partitions of S. The maximum number of intervals k may be given as a parameter. Then the problem is to find the TSE-optimal partition among those that have at most k intervals. This is called bounded-arity discretization.
In the following we review efficient (subquadratic) algorithms that have been proposed for minimum training error discretization of numerical ranges. All algorithms require an O(n log n) time sorting step prior to discretization.
Note that it is not possible to discretize an unordered sample of (x, y) ∈ ℝ × C pairs faster than sorting; given such a hypothetical discretization algorithm, one could sort arbitrary real numbers by assigning each data point a unique class label and asking the discretization algorithm to create a TSE-optimal discretization. The output would have to define the sequence of bins, and the sorted data could be recovered from it in linear time.
Maass (1994) was the first to devise a subquadratic algorithm for minimizing TSE in bounded-arity discretization. In his algorithm a balanced binary tree, to the leaves of which the data points are assigned, is constructed in O(n log n + nk²) time. Optimal interval assignment and labeling are then found in O(k²) time using this data structure.
Kearns et al. (1997) presented an algorithm requiring O(n log n) time. Their algorithm is based on the observation that, given an optimal partition of arity k in the two-class case, the optimal partition of arity k − 2 either has the labels of both its first and last interval flipped, or one of the other (internal) intervals j has its label flipped, in which case one interval is composed out of the three intervals j − 1, j, j + 1 of the original k-interval partition. Using dynamic programming to compose the partition from the best lower-arity partitions of subsets yields a linear-time algorithm in the two-class case. Such an algorithm was first presented by Fulton, Kasif, and Salzberg (1995).
In processing the sorted data from left to right, we only have to decide, at suitable threshold points, whether the uncommitted data points are combined into the last interval of the already constructed discretization or whether they start a new interval in the discretization. The decision, of course, is based on which of the alternatives yields a smaller training error. Using this approach and keeping track of the optimal discretization of the processed data for all arities 1, ..., k lets us, in the end, choose the optimal discretization of the numerical range with at most k intervals. The time complexity of this approach is clearly O(kn).

Figure 1. A sequence of data points sorted according to their numerical values, with the class labels (X and Y) of the data points, and the resulting sequence of data bins with the respective class distributions (X/Y). Interval thresholds can be set at the bin borders.

    value:  1 2 2 3 3 3 4 4 4 4 4 4 5 5 5 5 5 6 7 7 7 8 8 9 9 9 9
    label:  X X X X Y Y Y Y Y Y X X X Y Y Y Y X X X X Y X X X Y Y

    bin:     1    2    3    4    5    6    7    8    9
    X/Y:    1/0  2/0  1/2  2/4  1/4  1/0  3/0  1/1  2/2
How to generalize the dynamic programming computation to the case where there are m > 2 classes was shown by Auer (1997). The sufficient modification is to maintain separately for each class j (1 ≤ j ≤ m), at each threshold candidate, the optimal discretizations of arity 1, ..., k such that the last interval is labeled with j. Taking multiple classes into account raises the time requirement of the algorithm to O(kmn). Auer (1997) also showed how the space complexity of the algorithm can be kept low. Birkendorf (1997) has given a similar algorithm as a special case of a more general optimization problem. Below, we experiment with this Auer-Birkendorf algorithm.
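The multi-class dynamic program is easy to state over the sequence of bins. The sketch below is an illustrative reconstruction in the spirit of the algorithm described above, not the authors' implementation: for each arity a ≤ k and each class j labeling the last interval, it maintains the minimum error of discretizing the bins seen so far.

```python
from itertools import groupby

def optimal_tse(points, k):
    """Minimum TSE over all partitions with at most k intervals.

    points: (value, label) pairs sorted by value.
    An O(kmn)-style dynamic program: E[a][j] is the best error for the
    bins processed so far with arity a and last interval labeled j.
    """
    INF = float("inf")
    labels = sorted({y for _, y in points})
    # One class-count vector per distinct value (a bin).
    bins = []
    for _, grp in groupby(points, key=lambda p: p[0]):
        members = list(grp)
        bins.append({j: sum(1 for _, y in members if y == j) for j in labels})
    E = None
    for b in bins:
        size = sum(b.values())
        err = {j: size - b[j] for j in labels}   # cost of labeling this bin j
        if E is None:
            E = {1: dict(err)}                   # first bin starts interval 1
            continue
        newE = {}
        for a in range(1, k + 1):
            # Starting a new interval: best previous state of arity a - 1.
            best_new = min(E[a - 1].values()) if (a - 1) in E else INF
            for j in labels:
                # Extending the last interval: same arity, same label.
                extend = E[a][j] if a in E else INF
                cost = min(extend, best_new)
                if cost < INF:
                    newE.setdefault(a, {})[j] = cost + err[j]
        E = newE
    return min(v for d in E.values() for v in d.values())

sample = list(zip(
    [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9, 9],
    "XXXXYYYYYYXXXYYYYXXXXYXXXYY"))
print(optimal_tse(sample, 3))  # 7: the global minimum of the Figure 1 sample
```

On the Figure 1 sample the optimum drops from 13 (one interval) to 10 (two intervals) to 7 (three or more intervals), matching the partition at the two majority alternation points.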
The linear-time search algorithms reviewed above are asymptotically optimal in the number of examples. One cannot guarantee finding the optimal discretization without examining all of the cut point candidates, and they cannot be found without going through all of the data. For the dynamic programming scheme, a worst-case lower bound of Ω(kmn) can be derived from the general literature concerning dynamic programming optimization (Elomaa & Rousu, 2001).
However, in practice there are many potential thresholds that can be overlooked without the risk of losing the optimal discretization. Hence, the sublinear number of threshold candidates gives a possibility to reduce the time requirement of the search. In the following we review approaches to optima-preserving pruning of cut point candidates.
3. Pruning Threshold Candidates in Preprocessing

As noted before, the processing of a numerical value range starts with sorting of the data points. If one could make its own partition interval out of each data point in the sorted sequence, this discretization would have zero training error. However, one cannot, nor wants to, discern between all data points.
Only those that differ in their value can be separated from each other. Consider, for example, the data set shown in Figure 1. There are 27 integer-valued data points. They are instances of two classes, X and Y. Interval thresholds can only be set in between those points where the data point value changes. Therefore, we can preprocess the data into bins. There is one bin for each existing data point value. Within each bin we record the class distribution of the instances that belong to it. The class distribution information suffices to evaluate the goodness of the partition; the actual data set does not need to be maintained.
The sequence of bins has the minimal attainable misclassification rate. However, the same rate can usually be obtained with a smaller number of intervals.
The analysis of the entropy function by Fayyad and Irani (1992) has shown that cut points embedded into class-uniform intervals need not be taken into account; only the end points of such intervals, the boundary points, need to be considered to find the optimal discretization.
Elomaa and Rousu (1999) showed that the same is true for a large class of commonly used evaluation functions, including also TSE. The analysis can be used in preprocessing in a straightforward manner: we merge together adjacent class-uniform bins with the same class label to obtain example blocks (see Figure 2). The boundary points of the value range are the borders of its blocks. Block construction still leaves all bins with a mixed class distribution as their own blocks.
Subsequently, a more general property was also proved for TSE and some other evaluation functions (Elomaa & Rousu, 2000): segment borders, points that lie in between two adjacent bins with different relative class distributions, are the only points that need to be taken into account. It is easy to see that segment borders are a subset of boundary points.
Example segments are easily obtained from bins by comparing the relative class distributions of adjacent bins (see Figure 2).
Figure 2. The blocks (above) and segments (below) in the sample of Figure 1. Block borders are the boundary points of the numerical range and segment borders are a subset of them.

    blocks (X/Y):    3/0  1/2  2/4  1/4  4/0  1/1  2/2
    last value:       2    3    4    5    7    8    9

    segments (X/Y):  3/0  3/6  1/4  4/0  3/3
    last value:       2    4    5    7    9
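The binning and merging steps above can be sketched as a single pair of left-to-right passes. This is an illustrative reconstruction, not the authors' code; on the Figure 1 sample it reproduces the 9 bins, 7 blocks, and 5 segments of Figures 1 and 2:

```python
from fractions import Fraction

def make_bins(points):
    """One class-count dict per distinct value (points sorted by value)."""
    bins, cur_val = [], object()
    for x, y in points:
        if x != cur_val:
            cur_val = x
            bins.append({})
        bins[-1][y] = bins[-1].get(y, 0) + 1
    return bins

def merge(bins, same):
    """Left-to-right merge of adjacent bins satisfying predicate `same`."""
    out = [dict(bins[0])]
    for b in bins[1:]:
        if same(out[-1], b):
            for y, c in b.items():
                out[-1][y] = out[-1].get(y, 0) + c
        else:
            out.append(dict(b))
    return out

def uniform_same_class(a, b):
    """Blocks: both bins are class-uniform with the same label."""
    return len(a) == 1 and len(b) == 1 and set(a) == set(b)

def same_relative_distribution(a, b):
    """Segments: the relative class distributions are equal."""
    na, nb = sum(a.values()), sum(b.values())
    return all(Fraction(a.get(y, 0), na) == Fraction(b.get(y, 0), nb)
               for y in set(a) | set(b))

sample = list(zip(
    [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9, 9],
    "XXXXYYYYYYXXXYYYYXXXXYXXXYY"))
bins = make_bins(sample)
blocks = merge(bins, uniform_same_class)
segments = merge(bins, same_relative_distribution)
print(len(bins), len(blocks), len(segments))  # 9 7 5
```

Exact fractions are used for the distribution comparison so that, e.g., 1/2 X and 2/4 X compare equal without floating-point noise.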
4. Further Prepruning Opportunities for Training Set Error

An earlier proof shows that only segment borders need to be considered when trying to find TSE-optimal partitions (Elomaa & Rousu, 2000). In this section we prove that even some segment borders can safely be disregarded. Let a majority alternation point¹ be the border in between two consecutive bins (or, as well, blocks or segments) that have different majority classes.
More exactly, let S_1 = {(x, y) ∈ S | x = v_1} and S_2 = {(x, y) ∈ S | x = v_2}, v_1, v_2 ∈ ℝ, be two adjacent bins. There is a majority alternation point in between S_1 and S_2 if and only if maj_C(S_1) ∩ maj_C(S_2) = ∅; that is, the sets of majority classes in S_1 and S_2 are disjoint.
For example, in the dataset of Figure 2 there are only two majority alternations. Figure 3 shows the data organized into sequences between majority alternations, which help to find TSE-optimal partitions.
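Detecting majority alternation points is a single left-to-right scan testing maj_C of adjacent bins for disjointness. A minimal sketch (illustrative only, with bins as class-count dicts as in the earlier examples), run on the bins of Figure 1:

```python
def majority_classes(counts):
    """maj_C(S): the set of classes whose prediction yields minimum error."""
    top = max(counts.values())
    return {j for j, c in counts.items() if c == top}

def majority_alternation_points(bins):
    """Indices i with a majority alternation between bins[i] and bins[i+1]."""
    return [i for i in range(len(bins) - 1)
            if majority_classes(bins[i]).isdisjoint(majority_classes(bins[i + 1]))]

# The nine bins of Figure 1 (X/Y counts per distinct value 1..9).
bins = [{'X': 1}, {'X': 2}, {'X': 1, 'Y': 2}, {'X': 2, 'Y': 4}, {'X': 1, 'Y': 4},
        {'X': 1}, {'X': 3}, {'X': 1, 'Y': 1}, {'X': 2, 'Y': 2}]
print(majority_alternation_points(bins))  # [1, 4]: the borders after values 2 and 5
```

Note that the borders of the last two bins are not majority alternations: their majority-class sets {X} and {X, Y} overlap, so the scan correctly skips them.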
Theorem 1. The partition defined by all majority alternation points in the sample S has the minimum TSE value and has the minimal number of intervals.
Proof: Let us first show that all majority alternation points must be cut points in the minimal TSE-optimal partition. Assume that Δ = ⊎_{i=1}^{k} S_i is a partition of the sample S that does not contain all majority alternation points. Then in Δ there must exist an interval S_i = {(x, y) ∈ S | T_{i−1} < x ≤ T_i} that contains the majority alternation points a_1, ..., a_r, r ≥ 1, which satisfy T_{i−1} < a_1 < ... < a_r < T_i. Let ⊎_{h=1}^{r+1} Q_h denote the partition of S_i induced by the r alternation points. From the definition of a majority alternation point it follows that for any class j there is a subset Q_{h_j} within ⊎_{h=1}^{r+1} Q_h for which δ_j(Q_{h_j}) > δ(Q_{h_j}), that is, j is not a majority class in Q_{h_j}. Thus, δ_j(S_i) > Σ_{h=1}^{r+1} δ(Q_h), and since j is arbitrary, also δ(S_i) > Σ_{h=1}^{r+1} δ(Q_h). Therefore, partitioning S_i into r + 1 subintervals at these majority alternations reduces the number of misclassifications in S_i and, thus, gives a better partition than Δ. Hence, Δ cannot be TSE-optimal. This shows that any TSE-optimal partition has a cut point in each majority alternation point.

¹ Note that the term alternation has a different meaning here from, e.g., that in (Fulton et al., 1995; Kearns et al., 1997).

Figure 3. The majority alternations in the sample of Figure 1.

    sequence (X/Y):  3/0  4/10  7/3
    last value:       2     5    9

Let us now assume that a TSE-optimal partition Δ = ⊎_{i=1}^{k} S_i has also other cut points than majority alternations. Let the cut points in the partition be T_1, ..., T_{k−1} and let us, further, define T_0 = −∞ and T_k = ∞. Let T_i be one cut point in Δ that is not a majority alternation point. Now T_i separates two intervals in the partition, S_i = {(x, y) ∈ S | T_{i−1} < x ≤ T_i} and S_{i+1} = {(x, y) ∈ S | T_i < x ≤ T_{i+1}}. Since T_i is not a majority alternation, there must exist a class j for which δ_j(S_i) = δ(S_i) and δ_j(S_{i+1}) = δ(S_{i+1}). Thus, labeling both intervals S_i and S_{i+1} with the same label j results in a minimum error partition. Therefore, removing T_i will not affect the number of misclassifications. The same holds for all cut points that are not majority alternations. Hence, we have shown that for the globally minimal TSE value one needs to cut on each majority alternation point and, to keep the number of intervals at minimum, no other thresholds except majority alternation points can be used. □
The above theorem shows that, when the number of intervals in the partition is not bounded, the best partition is the one that cuts on each majority alternation point. However, when an optimal partition of bounded arity is required, examining majority alternation points alone does not suffice; one has to consider the frequency of the minority classes as well. In the following, we generalize the concept of a majority alternation point so that handling the bounded-arity case becomes possible.
Let P and Q be two adjacent bins with relative class frequency distributions D_P = {p_1, ..., p_m} and D_Q = {q_1, ..., q_m}, respectively. There is an alternation point in between P and Q if there is no index set I = {i_1, ..., i_m} such that p_{i_1} ≥ p_{i_2} ≥ ... ≥ p_{i_m} and q_{i_1} ≥ q_{i_2} ≥ ... ≥ q_{i_m}. In other words, an alternation occurs in between two bins if the ordering of the classes in descending order of frequency is different in them; that is, there exists a pair of classes h and j such that p_h > p_j and q_h < q_j.
Let us call the numerical range intervals induced by the set of alternation points alternating segments. Clearly, a majority alternation is a special case of an alternation. In the two-class setting the two concepts coincide. Thus the number of alternations is always at least as high as that of majority alternations.
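By the equivalence above, a pairwise check over classes suffices to detect an alternation between two adjacent bins. A sketch (illustrative only, comparing relative frequencies as exact fractions):

```python
from itertools import combinations
from fractions import Fraction

def is_alternation(P, Q):
    """True iff an alternation point lies between adjacent bins P and Q:
    some pair of classes h, j has p_h > p_j but q_h < q_j."""
    nP, nQ = sum(P.values()), sum(Q.values())
    classes = sorted(set(P) | set(Q))
    p = {j: Fraction(P.get(j, 0), nP) for j in classes}
    q = {j: Fraction(Q.get(j, 0), nQ) for j in classes}
    # Strictly opposite strict orderings give a negative product of differences.
    return any((p[h] - p[j]) * (q[h] - q[j]) < 0
               for h, j in combinations(classes, 2))

# The majority class stays 'a', but the frequency order of the two minority
# classes flips, so this border is an alternation without being a majority
# alternation:
print(is_alternation({'a': 3, 'b': 2, 'c': 1}, {'a': 3, 'b': 1, 'c': 2}))  # True
```

The example illustrates why, with more than two classes, alternations can outnumber majority alternations: only the minority-class ordering changes across the border.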
Next we show that only alternations need to be considered when searching for the TSE-optimizing partition.
Theorem 2. For any numerical range and each k ≥ 2 there is a TSE-optimal partition with at most k intervals that is defined on alternation points.
Proof: Let L, P, Q, and R form a sequence of adjacent subsets along the numerical range. Let the relative class distributions of sets L and R, D_L and D_R, be arbitrary, and let D_P = {p_1, ..., p_m} and D_Q = {q_1, ..., q_m} be such that there is no alternation point in between the sets P and Q. Let us now consider splitting the set into two intervals and labeling the left-hand side with class h and the right-hand side with class j. In such a situation, let δ_{h,j}(S ⊎ T) denote the error of a binary partition with S as the left-hand side and T as the right-hand side. The errors of the partitions are

    δ_{h,j}(L ⊎ (P ∪ Q ∪ R)) = δ_h(L) + Σ_{S=P,Q,R} δ_j(S),
    δ_{h,j}((L ∪ P) ⊎ (Q ∪ R)) = Σ_{S=L,P} δ_h(S) + Σ_{S=Q,R} δ_j(S),
    δ_{h,j}((L ∪ P ∪ Q) ⊎ R) = Σ_{S=L,P,Q} δ_h(S) + δ_j(R).

By assumption, the point in between P and Q is not an alternation. Therefore, from the definition of an alternation it follows that for any pair of classes h and j, 1 ≤ h, j ≤ m, either 1) p_h ≤ p_j and q_h ≤ q_j, or 2) p_h ≥ p_j and q_h ≥ q_j. In the first case δ_j(P) ≤ δ_h(P) and, consequently,

    δ_{h,j}(L ⊎ (P ∪ Q ∪ R)) ≤ δ_{h,j}((L ∪ P) ⊎ (Q ∪ R)).

In the second case δ_j(Q) ≥ δ_h(Q), and

    δ_{h,j}((L ∪ P) ⊎ (Q ∪ R)) ≥ δ_{h,j}((L ∪ P ∪ Q) ⊎ R).

Hence, the partition (L ∪ P) ⊎ (Q ∪ R) is always at most as good as one of the two other partitions. Since the classes h and j and the sets L and R are arbitrary, we have shown that in any partition a cut point that is not an alternation point can be replaced with another cut point without increasing the error. Thus, one can slide cut points to the left or to the right until an alternation point or the end of the interval is encountered. Hence, a TSE-optimal binary partition of an interval is defined on alternation points. Since in the TSE-optimal k-interval discretization the embedded binary partitions are TSE-optimal, the claim follows. □
The consequence of the above theorem is that only those cut points that are alternations need to be considered when looking for the bounded-arity optimal TSE partition. Hence the sample can be processed into alternating segments.
We show next that no alternations can be proven suboptimal without considering the context in which the cut point appears; that is, which other cut points are present in the k-interval discretization of the range.
Theorem 3. For each alternation point there is a context in which it is the TSE-optimal cut point.
Proof: Let L, P, Q, and R be adjacent subsets along a numerical range. Let there be a pair of classes h and j for which it holds that p_h > p_j and q_h < q_j. That is, there is an alternation point in between P and Q. Now, let us choose the class distribution for L so that h is the majority class of L ∪ P ∪ Q. Such a distribution is easily generated by choosing L to be large enough and to consist of instances of a single class h. Now, δ_j(P) > δ_h(P) and, consequently,

    δ_{h,j}(L ⊎ (P ∪ Q ∪ R)) > δ_{h,j}((L ∪ P) ⊎ (Q ∪ R)).

Similarly, the class distribution of R can be set so that j is the majority class of P ∪ Q ∪ R. Since δ_j(Q) < δ_h(Q), we get

    δ_{h,j}((L ∪ P) ⊎ (Q ∪ R)) < δ_{h,j}((L ∪ P ∪ Q) ⊎ R).

Thus, the optimal binary split with the left side labeled with class h and the right side labeled with class j has its cut point on the alternation point in between P and Q. Since h and j are majority classes of the left and the right sides of all binary splits of the subsets L, P, Q, and R, the split has the minimum TSE. Since h and j are arbitrary, the claim follows. □
The above theorem shows that the usefulness of an alternation point cannot be judged by examining the two adjacent bins alone. This shows that while a fast left-to-right scan over the data examining two bins at a time is sufficient to find alternation points, discovering an equally simple and fast algorithm for pruning out some of the alternation points is unlikely.
5. Empirical Evaluation

In this section we examine alternation points with real-world data. We first test, for 29 well-known data sets from the UCI data repository (Blake & Merz, 1998), what the relations are between the average numbers of bin borders, boundary points, segment borders, alternation points, and majority alternations per numerical attribute. Then the practical speed-up gained by examining alternations rather than bin borders, boundary points, or segment borders is inspected.
5.1 On Finding Alternation Points Efficiently

Let us first make a note about the preprocessing algorithms used in these tests. All preprocessing algorithms first extracted bins from the sorted example sequence. Then boundary points, segment borders, or alternation points were extracted. The preprocessing algorithms for boundary points and segment borders both run in time O(n + mV), where V is the number of bins. The trivial algorithm for alternation points is one which checks the frequencies of up to (m² − m)/2 pairs of classes in two adjacent bins until a disagreement in the class ordering is found. The processing of the entire range takes O(n + m²V) time using this approach.

A faster algorithm for finding alternation points is obtained by sorting (in a left-to-right pass) the class distribution histograms of the bins using bucket sort in amortized O(n) time and checking a bucket and the class frequency distribution of the adjacent interval to the right for a disagreement in class ordering. The bucket is merged into the interval if no alternation point is found. This O(n + mV) time alternation point extraction meets the asymptotic bound for extracting bins from the data. However, the coefficients in this approach are quite large. The UCI data sets mostly have few classes, which prohibits time savings for the linear-time approach. Therefore, we have used the trivial alternation point detection in our experiments.
5.2 Empirical Results

Figure 4 depicts the results of the first experiment. Over all 29 test domains the average reduction in cut point candidates when moving from segment borders to alternation points is approximately 40%, and close to 60% when compared to all cut points.

Figure 4. The average number of bin borders (the figures on the right) and the relative numbers of boundary points (black bars below), segment borders (white bars), alternation points (gray bars), and majority alternations (dark gray bars on top) per numerical attribute of the domain. The white figure on top of the dark gray bar is the number of classes.

    m   domain               avg. bin borders
    29  Abalone                       863.7
     2  Adult                       3,673.7
     5  Annealing                      27.5
     2  Australian                    188.2
     6  Auto insurance                 61.4
     2  Breast Wisconsin                9.9
     2  Census income              12,864.3
     2  Colic                          86.3
     2  Diabetes                      156.8
     2  Euthyroid                     123.8
     2  German                        145.9
     6  Glass                         115.3
     5  Heart C                        30.5
     2  Heart H                        25.1
     2  Hepatitis                      54.3
     2  Hypothyroid                   165.7
     3  Iris                           30.8
    26  Letter recognition             16.0
     2  Liver                          54.7
     5  Page blocks                   909.2
     6  Satellite                      76.3
     7  Segmentation                  137.7
     7  Shuttle                       123.2
     2  Sonar                         187.6
     4  Vehicle                        79.4
    11  Vowel                         623.5
     3  Waveform                      714.1
     3  Wine                           98.2
    10  Yeast                          51.5

The number of segment borders is only slightly smaller than that of boundary points. The number of alternations and majority alternations is the same on two-class domains (e.g., Adult, Australian, Breast Wisconsin, etc.), which is clear from their definition. For some two-class domains (e.g., Breast Wisconsin, Euthyroid, Heart H, and Hypothyroid) this number is significantly (ca. 75%) lower than that of segment borders.
On other multiclass domains (e.g., Heart C, Letter recognition, Satellite, and Yeast) there is a significant difference in the numbers of alternations and majority alternations.

Figure 5. The effects of the number of classes on the efficiency gain obtained by using alternations rather than bin borders. (The plot shows the ratio of the number of alternating segments to the number of bins, A/V, against the number of classes m.)
A striking result is that in many of those domains that have many classes (e.g., Abalone, Auto insurance, Letter recognition, Vowel, and Yeast) the number of alternations is not much smaller than that of segment borders. The number of majority alternations, though, is usually still somewhat smaller even for these domains. An explanation for this is that the more classes there are, the less common it is that two adjacent segments have the same frequency order for all of them. The majority class may still be the same in two adjacent segments even though the number of classes is high.
Figure 5 plots the relation between the number of classes and the amount of reduction obtained in the number of cut point candidates by using alternation points. Each numerical attribute within our test domains is represented by one dot. The largest reductions are obtained in domains with few classes. In the two data sets with over 25 classes the reduction stays below 25%.
The correlation between the ratio A/V, where A is the number of alternating segments, and the number of classes is quite clear. Hence, one may expect that in domains with many classes the time savings are not as great as in domains with fewer classes.
Now that we know that the number of alternation points is often even substantially lower than that of boundary points and segment borders, the question remains how much, if at all, we can benefit from using alternations rather than their alternatives. Figure 6 plots the total running time (preprocessing and search) of the Auer-Birkendorf algorithm on alternation points against the same algorithm operating on bin borders.

Figure 6. The time required in searching for the TSE-optimal discretization when operating on alternations contrasted to that on all cut points. The effects of the allowed maximum arity (k) and the relation A/V are also taken into account.
It can be clearly observed that as the relative number of alternations reduces or the number of intervals allowed in the discretization grows, the advantage gained by using alternation points increases significantly. The overhead associated with finding the alternation points dictates that if either of these parameters is not favorable, then discretization based on alternations is less efficient than that on easily-found bin borders. For k ≥ 8 an A/V ratio below 0.9 promises gains, while in binary discretization no gains are obtained irrespective of the A/V ratio.
In the experiments we also observed that the discretizations defined on majority alternations were always optimal, even for domains with more than two classes. This indicates that the situation in which a cut point must be set on an alternation point that is not a majority alternation is very rare in practice. So, if the optimal score is not required, one can safely choose the cut points from among the majority alternations.
6. Conclusion

Examining segment borders, a subset of boundary points, is necessary and sufficient in searching for the optimal partition of a value range with respect to a strictly convex evaluation function (Elomaa & Rousu, 2002). We showed that with TSE, which is not strictly convex, only a well-defined subset of segment borders needs to be examined: only majority alternations and alternation points need to be considered when searching for the global and the bounded-arity optimum, respectively.

On the other hand, we were able to show that in bounded-arity discretization no alternations can be ignored with TSE without considering the placement of adjacent cut points. The Auer-Birkendorf algorithm, incidentally, does exactly this kind of bookkeeping: it keeps track of the best context to the left for each class and each arity. Hence, improving the dynamic programming scheme on the conceptual level seems difficult.

On some real-world domains the number of alternation points was discovered to be significantly smaller than that of segment borders. Practical gains were obtained by using alternation points rather than segment borders.
References

Auer, P. (1997). Optimal splits of single attributes (Unpublished manuscript). Institute for Theoretical Computer Science, Graz University of Technology.

Auer, P., Holte, R. C., & Maass, W. (1995). Theory and application of agnostic PAC-learning with small decision trees. Proceedings of the Twelfth International Conference on Machine Learning (pp. 21-29). San Francisco: Morgan Kaufmann.

Birkendorf, A. (1997). On fast and simple algorithms for finding maximal subarrays and applications in learning theory. Computational Learning Theory, Proceedings of the Third European Conference (pp. 198-209). Berlin, Heidelberg: Springer-Verlag.

Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. Department of Information and Computer Science, University of California at Irvine.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Pacific Grove: Wadsworth.

Brodley, C. (1995). Automatic selection of split criterion during tree growing based on node location. Proceedings of the Twelfth International Conference on Machine Learning (pp. 73-80). San Francisco: Morgan Kaufmann.

Elomaa, T., & Rousu, J. (1999). General and efficient multisplitting of numerical attributes. Machine Learning, 36, 201-244.

Elomaa, T., & Rousu, J. (2000). Generalizing boundary points. Proceedings of the Seventeenth National Conference on Artificial Intelligence (pp. 570-576). Menlo Park: AAAI Press.

Elomaa, T., & Rousu, J. (2001). On the computational complexity of optimal multisplitting. Fundamenta Informaticae, 47, 35-52.

Elomaa, T., & Rousu, J. (2002). Linear-time preprocessing in optimal numerical range partitioning. Journal of Intelligent Information Systems, 18, 55-70.

Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8, 87-102.

Fulton, T., Kasif, S., & Salzberg, S. (1995). Efficient algorithms for finding multi-way splits for decision trees. Proceedings of the Twelfth International Conference on Machine Learning (pp. 244-251). San Francisco: Morgan Kaufmann.

Kearns, M., Mansour, Y., Ng, A. Y., & Ron, D. (1997). An experimental and theoretical comparison of model selection methods. Machine Learning, 27, 1-50.

Lozano, F. (2000). Model selection using Rademacher penalization. Proceedings of the Second ICSC Symposium on Neural Networks. Berlin: ICSC Academic.

Lubinsky, D. J. (1995). Increasing the performance and consistency of classification trees by using the accuracy criterion at the leaves. Proceedings of the Twelfth International Conference on Machine Learning (pp. 371-377). San Francisco: Morgan Kaufmann.

Maass, W. (1994). Efficient agnostic PAC-learning with simple hypotheses. Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory (pp. 67-75). New York: ACM Press.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.

Vapnik, V., & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16, 264-280.

Zighed, D., Rakotomalala, R., & Feschet, F. (1997). Optimal multiple intervals discretization of continuous attributes for supervised learning. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (pp. 295-298). Menlo Park: AAAI Press.