Jacek Kluska

Department of Electrical and Computer Engineering

Rzeszow University of Technology, Poland

Application of data classifiers to breast cancer diagnosis

Contents



Breast cancer input data



Overview of data classifiers



Exponential Radial Basis Function SVM (ERBFSVM)



Radial Basis Function Neural Network (RBFNN)



Learning Vector Quantization Neural Network (LVQNN)



Two Layer Perceptron (2LP)



Back Propagation Network (BPNN)



Decision supporting parameters



Cross Validation error



Diagnostic accuracy measures



Comparative analysis



Conclusions

Introduction

The goal:

Comparative analysis of ANNs and SVM in the
classification of medical data: breast cancer.

The input space is a population of 683 women who
suffered from breast cancer, with 10 attributes
associated with each patient.

This is a two-class classification problem, since each
instance belongs to one of two possible classes: benign
or malignant.

To find which of the models predicts unseen data most
accurately, a 10-fold Cross Validation (CV) algorithm
is used.

Additionally, the diagnostic accuracy of the classifiers
is measured.

Introduction – cont.



Breast cancer input data (Wolberg, Mangasarian):

www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/



Cardinality of learning data set: l = 683

Total number of attributes: n

clump thickness (1,2,…,10)
uniformity of cell size (1,2,…,10)
uniformity of cell shape (1,2,…,10)
marginal adhesion (1,2,…,10)
single epithelial cell size (1,2,…,10)
bare nuclei (1,2,…,10)
bland chromatin (1,2,…,10)
normal nucleoli (1,2,…,10)
mitoses (1,2,…,10)
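A minimal sketch of how such records could be read into feature vectors and labels. The row layout assumed here (sample id, nine integer attributes in 1..10, class coded 2 for benign and 4 for malignant, "?" for a missing value) and the label coding -1/+1 are assumptions, and the function name parse_wisconsin_rows and the sample rows are made up for illustration.

```python
from typing import Iterable, List, Tuple


def parse_wisconsin_rows(lines: Iterable[str]) -> Tuple[List[List[int]], List[int]]:
    """Parse rows of the assumed form: id, nine attributes, class (2 or 4)."""
    features: List[List[int]] = []
    labels: List[int] = []
    for line in lines:
        fields = line.strip().split(",")
        # Skip blank or malformed rows and rows with a missing value ("?").
        if len(fields) != 11 or "?" in fields:
            continue
        values = [int(f) for f in fields]
        features.append(values[1:10])  # drop the sample id, keep the attributes
        # Assumed coding: malignant (4) -> +1, benign (2) -> -1.
        labels.append(1 if values[10] == 4 else -1)
    return features, labels


# Made-up rows in the assumed format; the third one has a missing value.
sample = [
    "1,5,1,1,1,2,1,3,1,1,2",
    "2,8,10,10,8,7,10,9,7,1,4",
    "3,8,4,5,1,2,?,7,3,1,4",
]
X, y = parse_wisconsin_rows(sample)
print(len(X), y)
```

Rows with a missing value are dropped rather than imputed; this is one simple choice, not necessarily the preparation used for the results reported here.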

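ERBFSVM

ERBF kernel:

$$K(\mathbf{x}, \mathbf{x}_i) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}_i\|}{2\sigma^2}\right)$$

Solution: optimal classifier

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i \in SV} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$$

where

$$b = -\frac{1}{2} \sum_{i \in SV} \alpha_i y_i \left[K(\mathbf{x}_i, \mathbf{x}_r) + K(\mathbf{x}_i, \mathbf{x}_s)\right], \quad r, s \in SV, \quad y_r = 1, \quad y_s = -1$$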
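A minimal sketch of evaluating an ERBF-kernel SVM decision for one input, assuming the kernel has the form exp(-||x - z|| / (2σ²)) and that the support vectors, multipliers and bias are already known. Training is not shown. The names erbf_kernel and svm_decide and all numeric values are hypothetical; only σ = 3.9 is taken from the results reported later.

```python
import numpy as np


def erbf_kernel(x, z, sigma):
    """Exponential RBF kernel (assumed form): exp(-||x - z|| / (2 * sigma**2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.linalg.norm(diff) / (2.0 * sigma ** 2)))


def svm_decide(x, support_vectors, alphas, labels, bias, sigma):
    """Return sign(sum_i alpha_i * y_i * K(x_i, x) + bias) as +1 or -1."""
    total = sum(a * y * erbf_kernel(sv, x, sigma)
                for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if total + bias >= 0 else -1


# Hypothetical toy values, not fitted to the breast cancer data.
svs = [[1.0, 1.0], [8.0, 9.0]]
alphas = [1.0, 1.0]
ys = [-1, 1]
print(svm_decide([7.0, 8.0], svs, alphas, ys, bias=0.0, sigma=3.9))
print(svm_decide([1.0, 2.0], svs, alphas, ys, bias=0.0, sigma=3.9))
```

Unlike the Gaussian RBF kernel, this kernel uses the distance itself rather than its square in the exponent.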

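LVQNN

The output (1 neuron):

$$y = \langle \mathbf{w}^{(2)}, \mathbf{a}^{(1)} \rangle$$

The outputs of the competitive hidden layer neurons:

$$a_j^{(1)} = \operatorname{compet}\left(-\|\mathbf{w}_j - \mathbf{x}\|\right), \quad \mathbf{x} \in \mathbb{R}^n$$

where $\mathbf{w}_j$ is the weight vector of the j-th hidden neuron.

compet returns 1 only for the neuron whose weight vector
forms the closest match with the input x.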
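A minimal sketch of the competitive layer described above: the hidden neuron whose weight vector is closest to the input outputs 1, all others output 0, and a single linear output neuron maps the winner to a class. The Euclidean distance, the function names and the prototype values are assumptions made for illustration; training of the prototypes is not shown.

```python
import numpy as np


def compet_layer(x, hidden_weights):
    """One-hot vector: 1 for the hidden neuron whose weight vector is closest to x."""
    W = np.asarray(hidden_weights, dtype=float)
    distances = np.linalg.norm(W - np.asarray(x, dtype=float), axis=1)
    a1 = np.zeros(len(W))
    a1[np.argmin(distances)] = 1.0
    return a1


def lvq_output(x, hidden_weights, output_weights):
    """Linear output neuron: inner product of output weights and hidden outputs."""
    a1 = compet_layer(x, hidden_weights)
    return float(np.dot(np.asarray(output_weights, dtype=float), a1))


# Hypothetical prototypes: the first two stand for class -1, the last for class +1.
prototypes = [[1.0, 1.0], [2.0, 1.0], [9.0, 8.0]]
w2 = [-1.0, -1.0, 1.0]
print(lvq_output([8.0, 9.0], prototypes, w2))
print(lvq_output([1.0, 2.0], prototypes, w2))
```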









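RBF network

The output (1 neuron):

$$y = \langle \mathbf{w}^{(2)}, \mathbf{a}^{(1)} \rangle$$

The outputs of hidden layer neurons:

$$a_j^{(1)} = \exp\left(-\frac{\|\mathbf{x} - \mathbf{w}_j\|^2}{2\sigma^2}\right), \quad \mathbf{x} \in \mathbb{R}^n$$

where $\mathbf{w}_j$ is the weight vector of the j-th hidden neuron.

$\sigma$ - spread constant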
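A minimal sketch of an RBF network forward pass, assuming Gaussian hidden units of the form exp(-||x - w_j||² / (2σ²)) with a common spread constant σ and a linear output neuron without bias. The function names, centers and weights are hypothetical.

```python
import numpy as np


def rbf_hidden(x, centers, spread):
    """Gaussian hidden outputs: exp(-||x - w_j||^2 / (2 * spread**2)) for each center w_j."""
    C = np.asarray(centers, dtype=float)
    sq_dist = np.sum((C - np.asarray(x, dtype=float)) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * spread ** 2))


def rbf_output(x, centers, output_weights, spread):
    """Linear output neuron: inner product of output weights and hidden outputs."""
    a1 = rbf_hidden(x, centers, spread)
    return float(np.dot(np.asarray(output_weights, dtype=float), a1))


# Hypothetical centers and weights, not fitted to the breast cancer data.
centers = [[0.0, 0.0], [3.0, 4.0]]
weights = [-1.0, 1.0]
print(rbf_hidden([0.0, 0.0], centers, spread=5.0))
print(rbf_output([0.0, 0.0], centers, weights, spread=5.0))
```

A larger spread makes each hidden unit respond to a wider neighbourhood of its center, so the spread constant trades off smoothness against locality.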





































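2LP

The output (1 neuron):

$$y = \operatorname{sign}\left(\langle \mathbf{w}^{(2)}, \mathbf{a}^{(1)} \rangle + b^{(2)}\right)$$

where $\mathbf{a}^{(1)}$ is the vector of hidden layer outputs.

The outputs of hidden layer neurons:

$$a_j^{(1)} = \operatorname{sign}\left(\langle \mathbf{w}_j, \mathbf{x} \rangle + b_j\right)$$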
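A minimal sketch of a two-layer perceptron forward pass with sign activations in both layers. The bias terms, the convention sign(0) = +1 and all weights are assumptions. The weights below are hand-picked so that the network computes XOR on inputs from {-1, +1}, purely as a toy check of the forward pass.

```python
import numpy as np


def sign(z):
    """Sign activation with the assumed convention sign(0) = +1."""
    return np.where(np.asarray(z, dtype=float) >= 0, 1.0, -1.0)


def two_layer_perceptron(x, W1, b1, w2, b2):
    """Hidden layer a1 = sign(W1 x + b1), output y = sign(<w2, a1> + b2)."""
    x = np.asarray(x, dtype=float)
    a1 = sign(np.asarray(W1, dtype=float) @ x + np.asarray(b1, dtype=float))
    return float(sign(np.dot(np.asarray(w2, dtype=float), a1) + b2))


# Hypothetical hand-picked weights: hidden unit 0 acts as AND, hidden unit 1 as OR.
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [-1.0, 1.0]
w2 = [-1.0, 1.0]
b2 = -1.0
for point in ([1, 1], [1, -1], [-1, 1], [-1, -1]):
    print(point, two_layer_perceptron(point, W1, b1, w2, b2))
```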





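BPN

The output (1 neuron):

$$y = \operatorname{logsig}\left(\langle \mathbf{w}^{(2)}, \mathbf{a}^{(1)} \rangle + b^{(2)}\right)$$

where $\mathbf{a}^{(1)}$ is the vector of hidden layer outputs.

The outputs of hidden layer neurons:

$$a_j^{(1)} = \operatorname{logsig}\left(\langle \mathbf{w}_j, \mathbf{x} \rangle + b_j\right), \qquad \operatorname{logsig}(z) = \frac{1}{1 + \exp(-z)}$$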
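A minimal sketch of the forward pass of a network with logsig activations in the hidden and output layers, where logsig(z) = 1 / (1 + exp(-z)). Only the forward pass is shown, not the back propagation training. The bias terms, the 0.5 decision threshold and all numeric values are assumptions.

```python
import numpy as np


def logsig(z):
    """Logistic sigmoid: 1 / (1 + exp(-z)), with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def bpn_forward(x, W1, b1, w2, b2):
    """Hidden layer a1 = logsig(W1 x + b1), output y = logsig(<w2, a1> + b2)."""
    x = np.asarray(x, dtype=float)
    a1 = logsig(np.asarray(W1, dtype=float) @ x + np.asarray(b1, dtype=float))
    return float(logsig(np.dot(np.asarray(w2, dtype=float), a1) + b2))


# Hypothetical weights: with all-zero hidden weights, both hidden outputs equal 0.5.
W1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
y_out = bpn_forward([5.0, 1.0, 3.0], W1, b1, w2, b2=0.0)
# Assumed decision rule: outputs of at least 0.5 are read as the positive class.
print(y_out, "positive" if y_out >= 0.5 else "negative")
```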















Cross Validation procedure

Division of all input patterns (e.g. l = 199) into K
subsets (e.g. K = 10).

Model training on all subsets except one.

Model testing on the subset left out.

Cross Validation error (should be as low as
possible):

$$E_{CV} = \frac{1}{K} \sum_{m=1}^{K} \frac{E_m}{L} \cdot 100 \ [\%]$$

$E_m$ - the number of misclassified examples
within a single m-th separation.

L - the number of validating data.
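The procedure can be sketched as follows: shuffle the patterns, split them into K subsets, train on K - 1 of them, count the misclassified examples on the subset left out, and average the per-subset percentages. Averaging per-subset percentages is an assumed convention, and the nearest-centroid classifier is only a stand-in for the models compared here. All function names are hypothetical.

```python
import numpy as np


def cross_validation_error(X, y, fit, predict, k=10, seed=0):
    """Average percentage of misclassified examples over k held-out subsets."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(order, k)
    percentages = []
    for m in range(k):
        test_idx = folds[m]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != m])
        model = fit(X[train_idx], y[train_idx])
        errors = np.sum(predict(model, X[test_idx]) != y[test_idx])
        percentages.append(100.0 * errors / len(test_idx))
    return float(np.mean(percentages))


def fit_centroids(X, y):
    """Stand-in classifier: store the mean vector of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}


def predict_centroids(model, X):
    """Assign each row of X to the class with the nearest stored mean."""
    classes = list(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[np.argmin(dists, axis=1)]


# Synthetic, well-separated toy data (not the breast cancer data).
rng = np.random.default_rng(1)
X_toy = np.vstack([rng.normal(1.0, 0.5, (30, 2)), rng.normal(9.0, 0.5, (30, 2))])
y_toy = np.array([-1] * 30 + [1] * 30)
print(cross_validation_error(X_toy, y_toy, fit_centroids, predict_centroids, k=10))
```

Any of the classifiers discussed here could be plugged in through the fit and predict arguments, so that all models are compared on identical splits.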









ERBFSVM results

The lowest percentage of misclassified examples:
E_CV = 2.78 [%] for C = 1 and σ ∈ {3.9, 4.7}

[Figure: percentage of misclassified examples [%] for C = 1, 10, 100, 1 000, 10 000, 100 000 and 1 000 000]

RBFNN results

The lowest percentage of misclassified examples:
E_CV = 2.78 [%] for S1 ∈ {9, 11, 13, ..., 29}

[Figure: percentage of misclassified examples [%] for RBFNN]

LVQNN vs. remaining networks

LVQNN attains the same lowest prediction error as
ERBFSVM and RBFNN: E_CV = 2.78 [%] for S1 = 10

[Figure: percentage of misclassified examples [%] for 2TBPNN, 2LBPNN, 2LP and LVQNN]

3LBPNN



3 layer BPNN is completely unpredictable as a
classifier ...

[Figure: percentage of misclassified examples [%] for 3LBPNN]

Diagnostic accuracy

$$\text{sensitivity} = \frac{\text{TruePos}}{\text{TruePos} + \text{FalseNeg}}$$

$$\text{specificity} = \frac{\text{TrueNeg}}{\text{TrueNeg} + \text{FalsePos}}$$
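A minimal sketch of computing both measures from true and predicted labels: sensitivity = TruePos / (TruePos + FalseNeg) and specificity = TrueNeg / (TrueNeg + FalsePos). The label coding (+1 for malignant, -1 for benign), the function name and the toy labels are assumptions.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for two-class labels."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    true_neg = sum(1 for t, p in pairs if t != positive and p != positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Made-up labels: 4 sick patients (one missed), 5 healthy patients (one false alarm).
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

The function divides by zero if one of the classes is absent from y_true; a real evaluation would need to guard against that case.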









Generalization results – summary

Classifier   E_CV [%]   sensitivity   specificity
ERBFSVM      2.78       0.95          0.98
RBFNN        2.78       0.95          0.98
LVQNN        2.78       0.95          0.98
2LP          4.11       0.93          0.94
3LBPNN       3.53       0.93          0.95

Diagnostic accuracy and generalization
ability of breast cancer data classifiers

Conclusions



The sensitivity 0.95 for ERBFSVM, LVQNN and
RBFNN means that 95% of sick patients are identified
as sick.

The specificity 0.98 means that 98% of healthy
patients are diagnosed as healthy.

The high diagnostic accuracy is supported by the very
good generalization ability of the models: E_CV = 2.78 [%].
This suggests that these classifiers are reliable and
precise in medical diagnosis of new breast cancer
cases.

The above models can serve as feedback for
physicians during the process of treatment.

Conclusions – cont.



Medical Diagnostic System

Module I: Data Preparation

Module II: Specification and Initialization of Models

Module III: Main Routine - Training and Testing of Models

Module IV: Selection of the Optimal Model. Rule Generation

Acknowledgements
The author is grateful to PhD student Maciej Kusy for his valuable help with data
preparation and calculations.
