Jacek Kluska
Department of Electrical and Computer Engineering
Rzeszow University of Technology, Poland
RBF-based ANN and SVM models in
the ovarian cancer data classification
Contents
Introduction
Theory of Support Vector Machines (SVM)
Radial-Basis Function Network (RBFN) approach
The ovarian cancer input data
Comparative study of SVM and RBFN on ovarian
cancer data classification
Conclusions
Introduction
The ovarian cancer database represents a population of women who suffered from ovarian cancer.
Ovarian cancer is the most common cancer of the female genital tract in Poland.
The main risk factors are gene mutations and the occurrence of the disease among the forebears of the patient.
The morbidity of ovarian cancer has been increasing.
The occurrence of ovarian cancer is estimated at:
3 per 100,000 registered women under the age of 37, and
40 per 100,000 over the age of 37.
Ovarian cancer input data
The medical data set was provided as the result of cooperation between Rzeszów University of Technology and Prof. Andrzej Skret and Dr. Tomasz Lozinski from the Obstetrics and Gynecology Department, State Hospital Rzeszow, Poland.
The input data represent a population of 199 women who were treated for ovarian cancer, with 17 different medical parameters registered for each patient during the treatment process.
Details follow later.
Motivation
What are the classification capabilities of SVM and RBFN on medical data?
Which model generalizes better under a Cross-Validation procedure?
Which model is more accurate in terms of:
sensitivity,
specificity?
Theory of SVM
Find the GSH (generalized separating hyperplane) by solving the QP problem:
max_α  Σ_{i=1}^{l} α_i − (1/2) Σ_{i,j=1}^{l} α_i α_j y_i y_j K(x_i, x_j)
subject to
0 ≤ α_i ≤ C,  i = 1, ..., l,
Σ_{i=1}^{l} α_i y_i = 0,
where
H = [ y_i y_j K(x_i, x_j) ]_{i,j=1,...,l} – Hessian matrix,
C – design parameter.
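As an illustrative sketch (not the authors' implementation), the Hessian H of the dual QP, with entries H_ij = y_i y_j K(x_i, x_j), can be built as follows; the Gaussian-type kernel and the toy data set below are assumptions made for the example:

```python
import math

def erbf_kernel(u, v, sigma=1.0):
    """Gaussian/ERBF-type kernel: K(u, v) = exp(-||u - v||^2 / (2*sigma^2))."""
    dist_sq = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

def build_hessian(X, y, sigma=1.0):
    """H[i][j] = y_i * y_j * K(x_i, x_j), the matrix of the SVM dual QP."""
    l = len(X)
    return [[y[i] * y[j] * erbf_kernel(X[i], X[j], sigma) for j in range(l)]
            for i in range(l)]

# Toy example: two points per class (purely illustrative data).
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
y = [1, 1, -1, -1]
H = build_hessian(X, y, sigma=0.5)
```

Diagonal entries equal 1 (since K(x, x) = 1), and entries coupling points with opposite labels are negative.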
SVM – problem solution
Let us assume the ERBF kernel:
K(u, v) = exp( −||u − v||² / (2σ²) ).
Solution: the optimal classifier
f(x) = sign( Σ_{i∈SV} α_i* y_i K(x_i, x) + b* ),
where
b* = −(1/2) Σ_{i∈SV} α_i* y_i [ K(x_i, x_r) + K(x_i, x_s) ],
r, s – indices of support vectors with y_r = 1, y_s = −1,
SV = { i | α_i* > 0 }.
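A minimal sketch of evaluating the optimal classifier, assuming the multipliers α_i*, labels, support vectors and bias b* are already given; the numeric values below are made up for illustration, not results of the paper's training:

```python
import math

def erbf_kernel(u, v, sigma=1.0):
    # Gaussian/ERBF-type kernel, as assumed above.
    dist_sq = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

def svm_classify(x, support_vectors, alphas, labels, b, sigma=1.0):
    """f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * yi * erbf_kernel(xi, x, sigma)
            for a, yi, xi in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1

# Hypothetical solved values (illustrative only).
sv = [[0.0, 0.0], [1.0, 1.0]]
alphas = [0.5, 0.5]
labels = [1, -1]
b = 0.0

print(svm_classify([0.1, 0.1], sv, alphas, labels, b, sigma=0.5))  # -> 1
```

A point near the +1 support vector is labelled 1; a point near the −1 support vector is labelled −1.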
RBF network architecture
solverb function (Matlab's Neural Networks Toolbox)
Chen, S., C.F.N. Cowan, and P.M. Grant, "Orthogonal Least Squares Learning Algorithm for Radial Basis Function Networks," IEEE Transactions on Neural Networks, Vol. 2, No. 2, March 1991, pp. 302-309.
[Figure: network with inputs x_1, ..., x_17, a hidden layer of RBF neurons, and a linear output layer.]
RBF network (Matlab)
net = newrb(P,T,GOAL,SPREAD)
P – matrix of input vectors,
T – matrix of target vectors.
newrb iteratively creates a radial basis network one neuron at a time. Neurons are added to the network until the sum-squared error falls beneath the error goal or a maximum number of neurons has been reached.
The larger the input space, i.e. the number of inputs and the ranges over which those inputs vary, the more radbas neurons are required.
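As an illustrative sketch (in Python rather than Matlab, and simplified relative to Matlab's radbas bias convention), the forward pass of such a network, Gaussian RBF neurons feeding a linear output layer, could look like this; the centers, spread and weights are made-up placeholders, not values produced by newrb:

```python
import math

def rbf_forward(x, centers, spread, weights, bias):
    """Gaussian RBF layer followed by a linear output layer.

    Each hidden neuron fires exp(-(||x - c||/spread)^2); the output
    is a weighted sum of the activations plus a bias.
    """
    activations = []
    for c in centers:
        dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        activations.append(math.exp(-(dist / spread) ** 2))
    return sum(w * a for w, a in zip(weights, activations)) + bias

# Hypothetical 2-neuron network on a 3-dimensional input.
centers = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
out = rbf_forward([0.0, 0.0, 0.0], centers, spread=1.0,
                  weights=[1.0, 0.0], bias=0.0)
# A neuron centered exactly at x contributes activation 1.
```

Activations decay with distance from the center, which is why wider input ranges call for more neurons or a larger spread.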
Disease parameters
The medical data represent a group of 199 women who were treated for ovarian cancer. The entire set of 199 cases is divided into two subgroups. This stems from the fact that there is a survival threshold of 60 months of treatment: if a patient survives beyond this period, she is considered to have recovered from the disease.
This allows us to treat the classification problem as a separation of the data into two classes.
17 parameters were registered for each patient.
68 patients survived past 60 months;
131 died before this period.
Disease parameters in details
1. figo staging of ovarian cancer ∈ {1,2,3,4}
2. observation-examination ∈ {0,1}; 60 months – complete (1), less than 60 – incomplete (0)
3. hospital ∈ {0,1}; state clinical hospital (1), others (0)
4. age of hospitalized women ∈ {22,25,...,81}
5. hysterectomy – removal of uterus ∈ {0,1}; performed (1), not performed (0)
6. adnexectomy – complete removal of ovary and salpinx ∈ {0,1}; performed (1), not performed (0)
7. full exploration of abdomen ∈ {0,1}; possible (0), performed (1)
8. type of surgery ∈ {1,2,3}; hysterectomy (1), adnexectomy (2), exploration only (3)
9. appendectomy – removal of appendix ∈ {0,1}; performed (1), not performed (0)
10. removal of intestine ∈ {0,1}; performed (1), not performed (0)
11. degree of debulking ∈ {1,2,3}; entire (3), up to 2 cm (1), more than 2 cm (2)
12. mode of surgery ∈ {1,2,3}; intraperitoneal (1), extraperitoneal (2), trans-tumor resection (3)
13. histological type of tumor ∈ {1,2,3,4,5}
14. grading of tumor ∈ {1,2,3,4}; GI (1), GII (2), GIII (3), data unavailable (4)
15. type of chemotherapy ∈ {1,2,3}
16. radiotherapy ∈ {0,1}; performed (1), not performed (0)
17. "second look" surgery ∈ {0,1}; performed (1), not performed (0)
Comparative study of SVM and RBFN on ovarian cancer data classification
Input space: n = 17.
Cardinality of the training data set: l = 199.
Labels in SVM:
If a patient survived past 60 months, the label is "1".
If a patient died before 60 months, the label is "-1".
Labels in NN:
If a patient survived past 60 months, the RBFN output signal is "1".
If a patient died before 60 months, the RBFN output signal is "0".
Generalization performance
Software based on J. Platt's SMO was used to classify the data by means of SVM.
Matlab's Neural Networks Toolbox solverb function was used to classify the data by means of RBFN.
A 10-fold Cross-Validation procedure was applied to determine which model generalizes better to unknown data.
Generalization parameters
SVM parameter: σ in the kernel
K(u, v) = exp( −||u − v||² / (2σ²) ).
RBFN parameter: the spread sc in the hidden-neuron output
y_j = exp( −||x − w_j^(1)||² / (2·sc²) ),
where w_j^(1) is the center (first-layer weight vector) of the j-th RBF neuron.
Cross Validation procedure
1. All input patterns (l = 199) are divided into K = 10 subsets. The points for the subsets should be selected randomly.
2. The model is trained on all subsets except one.
3. The model is tested on the subset left out.
The procedure is repeated for each subset (Trial 1, Trial 2, ..., Trial K).
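The splitting step can be sketched with Python's standard library (an illustrative sketch, not the authors' implementation):

```python
import random

def k_fold_indices(l, k, seed=0):
    """Randomly partition indices 0..l-1 into k roughly equal folds."""
    idx = list(range(l))
    random.Random(seed).shuffle(idx)
    # Distribute indices round-robin so fold sizes differ by at most one.
    return [idx[f::k] for f in range(k)]

folds = k_fold_indices(l=199, k=10)
# Trial i: train on all folds except folds[i], test on folds[i].
train_0 = [j for f, fold in enumerate(folds) if f != 0 for j in fold]
test_0 = folds[0]
```

With l = 199 and k = 10, nine folds hold 20 patterns and one holds 19.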
Cross Validation error
Definition of the Cross Validation error:
E_CV = (100 / K) Σ_{i=1}^{K} m_i / L_i  [%]
where:
m_i – the number of misclassified examples within the i-th separation,
L_i – the number of validating data in the i-th separation.
The CV error should be as low as possible.
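For instance, E_CV can be computed from per-fold misclassification counts like this (the counts below are invented for illustration):

```python
def cv_error(misclassified, fold_sizes):
    """E_CV = (100/K) * sum_i m_i / L_i, in percent."""
    assert len(misclassified) == len(fold_sizes)
    k = len(fold_sizes)
    return 100.0 / k * sum(m / l for m, l in zip(misclassified, fold_sizes))

# Hypothetical 10-fold run: 3 errors in each fold of 20, 2 in the fold of 19.
m = [3] * 9 + [2]
sizes = [20] * 9 + [19]
e_cv = cv_error(m, sizes)  # roughly 14.55 %
```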
Data Normalization
Data normalization is a scaling of attribute values to a specific range, e.g. [0,1]. It may improve the quality of classification (decrease the CV error) and enhance the generalization ability of the classifiers. Such a mechanism is applied when the values of the feature parameters differ significantly from one another.
The ovarian cancer data set is a very good example of data to which normalization should be applied. To justify this, it is worth analyzing the values of the feature parameters.
Normalization of the input patterns
9 attributes take values from {0,1}: observation-examination, kind of hospital, hysterectomy, adnexectomy, exploration of abdomen, appendectomy, removal of intestine, radiotherapy, second-look surgery.
4 attributes take values from {1,2,3}: kind of surgery, degree of debulking, mode of surgery, kind of chemotherapy.
2 attributes take values from {1,2,3,4}: figo staging and grading of tumor.
Histological type of tumor takes values from {1,2,3,4,5}.
The greatest range of values belongs to the attribute age of patient: {22,25,...,81}.
Hence there is a wide range of attribute values across the input vectors.
Normalization examples
Let x_i(j) be the j-th coordinate of the i-th data vector.
1. max normalization:
norm_x_i(j) = x_i(j) / MAX(j) ∈ [0,1],
2. min-max normalization:
norm_x_i(j) = ( x_i(j) − MIN(j) ) / ( MAX(j) − MIN(j) ) ∈ [0,1],
where
MAX(j) = max_i x_i(j),  MIN(j) = min_i x_i(j),
taken over all data vectors i = 1, ..., l.
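Both normalizations can be sketched column-wise (an illustrative sketch in Python):

```python
def max_normalize(column):
    """norm_x(j) = x(j) / MAX(j); assumes a positive maximum."""
    mx = max(column)
    return [v / mx for v in column]

def min_max_normalize(column):
    """norm_x(j) = (x(j) - MIN(j)) / (MAX(j) - MIN(j))."""
    mn, mx = min(column), max(column)
    return [(v - mn) / (mx - mn) for v in column]

# Example: a few patient ages from the range {22, 25, ..., 81}.
ages = [22, 37, 60, 81]
print(min_max_normalize(ages))  # maps 22 -> 0.0 and 81 -> 1.0
```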
RBFN Cross Validation error
[Figure: E_CV [%] as a function of the number of RBF neurons S1 and the spread sc.]
SVM Cross Validation error
[Figure: E_CV [%] as a function of σ, for C = 100, 1000, 10 000, 100 000, 1 000 000.]
Generalization results
RBFN – the lowest percentage of misclassified examples:
max normalization: E_CV = 14.58% (S1 = k = 11, sc = 2.5),
min-max normalization: E_CV = 14.11% (S1 = k = 11, sc = 3.5),
no normalization: E_CV = 20.08% (S1 = k = 19, sc = 4).
SVM – the lowest percentage of misclassified examples:
max normalization: E_CV = 14.61% (C = 10³, σ = 55.7),
min-max normalization: E_CV = 15.13% (C = 10³, σ = 55.7),
no normalization: E_CV = 17.61% (C = 10⁵, σ = 290).
Generalization results – summary

                      SVM         RBFN
E_CV                  14.61 %     14.11 %
Optimal parameter     σ* = 4.7    sc* = 2.75
Other criterion: diagnostic accuracy
According to (Sboner et al. 2003), for the optimal parameters σ* and sc* we compute:
TP (True Positives) – sick patients correctly classified as sick,
FN (False Negatives) – sick patients incorrectly classified as healthy,
TN (True Negatives) – healthy patients correctly classified as healthy,
FP (False Positives) – healthy patients incorrectly classified as sick.
sensitivity = TruePos / (TruePos + FalseNeg) = TP / (TP + FN),
specificity = TrueNeg / (TrueNeg + FalsePos) = TN / (TN + FP).
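These two measures follow directly from the confusion-matrix counts; the counts in the example below are invented for illustration:

```python
def sensitivity(tp, fn):
    """sensitivity = TP / (TP + FN): fraction of sick patients detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """specificity = TN / (TN + FP): fraction of healthy patients cleared."""
    return tn / (tn + fp)

# Hypothetical counts for a 199-patient run.
tp, fn, tn, fp = 119, 12, 53, 15
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```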
Diagnostic accuracy – simple abstract example
Given 5 medical cases: 3 sick (marked with S) and 2 healthy (marked with H) patients, represented by the set: [S S S H H].
On the basis of positive and negative observations, and the numbers of correctly and incorrectly classified patients, calculate the diagnostic accuracy parameters:
sensitivity,
specificity,
for all 2^5 = 32 possible classifier outputs.
Diagnostic accuracy – simple abstract example – cont.

     combination   sensitivity  specificity  sum
 1.  H H H H H     inf          0.4          inf
 2.  S H H H H     1            0.5          1.5
 3.  H S H H H     1            0.5          1.5
 4.  S S H H H     1            0.67         1.67
 5.  H H S H H     1            0.5          1.5
 6.  S H S H H     1            0.67         1.67
 7.  H S S H H     1            0.67         1.67
 8.  S S S H H     1            1            2
 9.  H H H S H     0            0.25         0.25
10.  S H H S H     0.5          0.33         0.83
11.  H S H S H     0.5          0.33         0.83
12.  S S H S H     0.67         0.5          1.17
13.  H H S S H     0.5          0.33         0.83
14.  S H S S H     0.67         0.5          1.17
15.  H S S S H     0.67         0.5          1.17
16.  S S S S H     0.75         1            1.75
17.  H H H H S     0            0.25         0.25
18.  S H H H S     0.5          0.33         0.83
19.  H S H H S     0.5          0.33         0.83
20.  S S H H S     0.67         0.5          1.17
21.  H H S H S     0.5          0.33         0.83
22.  S H S H S     0.67         0.5          1.17
23.  H S S H S     0.67         0.5          1.17
24.  S S S H S     0.75         1            1.75
25.  H H H S S     0            0            0
26.  S H H S S     0.33         0            0.33
27.  H S H S S     0.33         0            0.33
28.  S S H S S     0.5          0            0.5
29.  H H S S S     0.33         0            0.33
30.  S H S S S     0.5          0            0.5
31.  H S S S S     0.5          0            0.5
32.  S S S S S     0.6          inf          inf
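A short script can reproduce the table's entries. Note that, as the entries themselves show, each ratio in the table is taken over the corresponding predicted class (the fraction of predicted-S cases that are truly sick, and of predicted-H cases that are truly healthy), with an empty predicted class reported as inf. A sketch:

```python
from itertools import product

TRUTH = "SSSHH"  # 3 sick, 2 healthy, as in the example

def table_row(predicted, truth=TRUTH):
    """Ratios as used in the table: correct fraction within each predicted class."""
    pos = [t for p, t in zip(predicted, truth) if p == "S"]
    neg = [t for p, t in zip(predicted, truth) if p == "H"]
    sens = pos.count("S") / len(pos) if pos else float("inf")
    spec = neg.count("H") / len(neg) if neg else float("inf")
    return sens, spec

# Enumerate all 2^5 = 32 possible classifier outputs.
rows = [("".join(c), *table_row(c)) for c in product("HS", repeat=5)]
print(table_row("SHHHH"))  # (1.0, 0.5), matching row 2
```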
Diagnostic accuracy results – ovarian cancer data classification

              SVM (σ = 4.7)    RBFN (sc = 2.75)
sensitivity   0.88             0.91
specificity   0.78             0.78
Conclusions
Both RBFN and SVM show acceptable and comparable generalization ability and diagnostic accuracy.
Sometimes RBFN seems to outperform SVM.
The generalization performance of SVM and RBFN and the medical measures (sensitivity and specificity) are low because of:
the low cardinality of the data set,
the complexity of the input space,
overlap between the classes: similar input data with different labelling.
Conclusions – cont.
Unfortunately, there is a disproportion in the number of instances: in the entire set of 199 cases there are
68 negative samples and
131 positive samples.
It is recommended to use both models, RBFN and SVM, to predict the condition of a hospitalized patient, but the outcomes of the systems have to be appraised by a doctor before being taken into account.
Acknowledgements
The author is grateful to PhD student Maciej Kusy for his valuable help with data
preparation and calculations.