Motif representation using position weight matrix

Xiaohui Xie
University of California, Irvine

Position weight matrix

Position weight matrix representation of a motif with width $w$:

\[
\theta =
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\
\theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\
\vdots & \vdots & \vdots & \vdots \\
\theta_{w1} & \theta_{w2} & \theta_{w3} & \theta_{w4}
\end{pmatrix}
\tag{1}
\]

where each row represents one position of the motif, and each row is normalized:

\[
\sum_{j=1}^{4} \theta_{ij} = 1 \qquad \text{for all } i = 1, 2, \cdots, w. \tag{2}
\]
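For concreteness, a PWM is just a $w \times 4$ array whose rows sum to one. A minimal sketch in Python/numpy (the numbers are made up for illustration):

```python
import numpy as np

# A hypothetical PWM of width w = 3 over (A, C, G, T) = (1, 2, 3, 4).
theta = np.array([
    [0.7, 0.1, 0.1, 0.1],   # position i = 1
    [0.2, 0.2, 0.3, 0.3],   # position i = 2
    [0.1, 0.1, 0.1, 0.7],   # position i = 3
])

# Each row must satisfy the normalization constraint (2).
assert np.allclose(theta.sum(axis=1), 1.0)
```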

Likelihood

Given the position weight matrix $\theta$, the probability of generating a sequence $S = (S_1, S_2, \cdots, S_w)$ from $\theta$ is

\[
P(S|\theta) = \prod_{i=1}^{w} P(S_i|\theta_i) \tag{3}
\]
\[
= \prod_{i=1}^{w} \theta_{i,S_i} \tag{4}
\]

For convenience, we have converted $S$ from a string of $\{A, C, G, T\}$ to a string of $\{1, 2, 3, 4\}$.
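Equation (4) is a single table lookup per position. A sketch, assuming sequences are encoded 0..3 (0-based, rather than the 1..4 used on the slides):

```python
import numpy as np

def seq_prob(theta: np.ndarray, S: np.ndarray) -> float:
    """P(S|theta) = prod_i theta[i, S_i]; S is a length-w array encoded 0..3."""
    w = theta.shape[0]
    return float(np.prod(theta[np.arange(w), S]))

# Example with the width-3 PWM above: the sequence (A, G, T) -> (0, 2, 3).
# seq_prob(theta, np.array([0, 2, 3])) == theta[0, 0] * theta[1, 2] * theta[2, 3]
```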

Likelihood

Suppose we observe not just one, but a set of sequences $S^1, S^2, \cdots, S^n$, each of which contains exactly $w$ letters. Assume each of them is generated independently from the model $\theta$. Then, the likelihood of observing these $n$ sequences is

\[
P(S^1, S^2, \cdots, S^n|\theta) = \prod_{k=1}^{n} P(S^k|\theta) \tag{5}
\]
\[
= \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} = \prod_{i=1}^{w} \prod_{j=1}^{4} \theta_{ij}^{c_{ij}} \tag{6}
\]

where $c_{ij}$ is the number of letter $j$ at position $i$ (note that $\sum_{j=1}^{4} c_{ij} = n$ for all $i$).

Parameter estimation

Now suppose we do not know $\theta$. How do we estimate it from the observed sequence data $S^1, S^2, \cdots, S^n$?

One solution: calculate the likelihood of observing the provided $n$ sequences for different values of $\theta$,

\[
L(\theta) = P(S^1, S^2, \cdots, S^n|\theta) = \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} \tag{7}
\]

and pick the one with the largest likelihood, that is, find the $\theta$ that solves

\[
\max_{\theta} P(S^1, S^2, \cdots, S^n|\theta) \tag{8}
\]

Maximum likelihood estimation

The maximum likelihood estimate of $\theta$ is

\[
\hat\theta_{ML} = \arg\max_{\theta} \log L(\theta), \qquad
\log L(\theta) = \sum_{i=1}^{w} \sum_{j=1}^{4} c_{ij} \log \theta_{ij}, \qquad
\text{s.t. } \sum_{j=1}^{4} \theta_{ij} = 1, \;\; \forall i = 1, \cdots, w \tag{9}
\]

Optimization with equality constraints

Construct a Lagrangian function taking the equality constraints into account:

\[
g(\theta) = \log L(\theta) + \sum_{i=1}^{w} \lambda_i \Big(1 - \sum_{j=1}^{4} \theta_{ij}\Big) \tag{10}
\]

Solve the unconstrained optimization problem

\[
\hat\theta = \arg\max_{\theta} g(\theta), \qquad
g(\theta) = \sum_{i=1}^{w} \sum_{j=1}^{4} c_{ij} \log \theta_{ij} + \sum_{i=1}^{w} \lambda_i \Big(1 - \sum_{j=1}^{4} \theta_{ij}\Big) \tag{11}
\]

Optimization with equality constraints

Take the derivatives of $g(\theta)$ w.r.t. $\theta_{ij}$ and the Lagrange multiplier $\lambda_i$, and set them to 0:

\[
\frac{\partial g(\theta)}{\partial \theta_{ij}} = 0 \tag{12}
\]
\[
\frac{\partial g(\theta)}{\partial \lambda_i} = 0 \tag{13}
\]

Setting $\partial g/\partial \theta_{ij} = c_{ij}/\theta_{ij} - \lambda_i = 0$ gives $\theta_{ij} = c_{ij}/\lambda_i$, and the constraint $\sum_j \theta_{ij} = 1$ then forces $\lambda_i = \sum_j c_{ij} = n$, which leads to:

\[
\hat\theta_{ij} = \frac{c_{ij}}{n} \tag{14}
\]

which is simply the frequency of each letter at each position ($c_{ij}$ is the number of letter $j$ at position $i$).
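In code, Eq. (14) is a column count per position followed by a division. A sketch under the same 0..3 encoding:

```python
import numpy as np

def ml_estimate(seqs: np.ndarray) -> np.ndarray:
    """ML estimate (14): theta_hat[i, j] = c_ij / n.
    seqs is an (n, w) integer array of aligned sequences encoded 0..3."""
    n, w = seqs.shape
    counts = np.zeros((w, 4))
    for i in range(w):
        counts[i] = np.bincount(seqs[:, i], minlength=4)   # c_ij
    return counts / n
```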

Bayes’ Theorem

\[
P(\theta|S) = \frac{P(S|\theta)\,P(\theta)}{P(S)} \tag{15}
\]

Each term in Bayes’ theorem has a conventional name:

1. $P(S|\theta)$ – the conditional probability of $S$ given $\theta$, also called the likelihood.
2. $P(\theta)$ – the prior probability or marginal probability of $\theta$.
3. $P(\theta|S)$ – the conditional probability of $\theta$ given $S$, also called the posterior probability of $\theta$.
4. $P(S)$ – the marginal probability of $S$, which acts as a normalizing constant.

Maximum a posteriori (MAP) estimation

The MAP (or posterior mode) estimate of $\theta$ is

\[
\hat\theta_{MAP}(S) = \arg\max_{\theta} P(\theta|S^1, S^2, \cdots, S^n) \tag{16}
\]
\[
= \arg\max_{\theta} \left[ \log L(\theta) + \log P(\theta) \right] \tag{17}
\]

Assume $P(\theta) = \prod_{i=1}^{w} P(\theta_i)$ (independence of $\theta_i$ at different positions $i$).

Model $P(\theta_i)$ with a Dirichlet distribution:

\[
(\theta_{i1}, \theta_{i2}, \theta_{i3}, \theta_{i4}) \sim \mathrm{Dir}(\alpha_1, \alpha_2, \alpha_3, \alpha_4). \tag{18}
\]

Dirichlet Distribution

Probability density function of the Dirichlet distribution $\mathrm{Dir}(\alpha)$ of order $K \ge 2$:

\[
p(x_1, \cdots, x_K; \alpha_1, \cdots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \tag{19}
\]

for all $x_1, \cdots, x_K > 0$ with $\sum_{i=1}^{K} x_i = 1$. The density is zero outside this open $(K-1)$-dimensional simplex.

$\alpha = (\alpha_1, \cdots, \alpha_K)$ are parameters with $\alpha_i > 0$ for all $i$.

$B(\alpha)$, the normalizing constant, is the multinomial beta function:

\[
B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)} \tag{20}
\]
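For numerical work, the density (19) is available in scipy (a sketch, assuming scipy is installed; the parameter values here are arbitrary):

```python
from scipy.stats import dirichlet

alpha = [2.0, 3.0, 4.0, 5.0]
x = [0.1, 0.2, 0.3, 0.4]            # a point inside the 3-dimensional simplex
density = dirichlet.pdf(x, alpha)    # (1 / B(alpha)) * prod_i x_i**(alpha_i - 1)
```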

Gamma function

The Gamma function for positive real $z$:

\[
\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt \tag{21}
\]
\[
\Gamma(z+1) = z\,\Gamma(z) \tag{22}
\]

If $n$ is a positive integer, then

\[
\Gamma(n+1) = n! \tag{23}
\]

Properties of Dirichlet distribution

Dirichlet distribution:

\[
p(x_1, \cdots, x_K; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \tag{24}
\]

Expectation (define $\alpha_0 = \sum_{i=1}^{K} \alpha_i$):

\[
E[X_i] = \frac{\alpha_i}{\alpha_0} \tag{25}
\]

Variance:

\[
\mathrm{Var}[X_i] = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^2 (\alpha_0 + 1)} \tag{26}
\]

Covariance (for $i \ne j$):

\[
\mathrm{Cov}[X_i, X_j] = \frac{-\alpha_i \alpha_j}{\alpha_0^2 (\alpha_0 + 1)} \tag{27}
\]
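The moment formulas (25) and (26) are easy to sanity-check by Monte Carlo (a sketch; the alpha values are arbitrary):

```python
import numpy as np

alpha = np.array([2.0, 3.0, 4.0, 5.0])
a0 = alpha.sum()
samples = np.random.dirichlet(alpha, size=200_000)

print(samples.mean(axis=0))   # approximately alpha / a0
print(samples.var(axis=0))    # approximately alpha*(a0 - alpha)/(a0**2 * (a0 + 1))
```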

Posterior Distribution

Conditional probability:

\[
P(S^1, S^2, \cdots, S^n|\theta) = \prod_{i=1}^{w} \prod_{j=1}^{4} \theta_{ij}^{c_{ij}}
\]

Prior probability:

\[
p(\theta_{i1}, \cdots, \theta_{i4}; \alpha) = \frac{1}{B(\alpha)} \prod_{j=1}^{4} \theta_{ij}^{\alpha_j - 1}
\]

Posterior probability:

\[
P(\theta_i|S^1, \cdots, S^n) = \mathrm{Dir}(c_{i1} + \alpha_1,\; c_{i2} + \alpha_2,\; c_{i3} + \alpha_3,\; c_{i4} + \alpha_4)
\]

Maximum a posteriori estimate:

\[
\theta_{ij}^{MAP} = \frac{c_{ij} + \alpha_j - 1}{n + \alpha_0 - 4} \tag{28}
\]

where $\alpha_0 = \sum_{j=1}^{4} \alpha_j$.
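Eq. (28) amounts to adding pseudocounts of $\alpha_j - 1$ before normalizing; with $\alpha_j = 2$ for all $j$ it reduces to the familiar add-one smoothing $(c_{ij}+1)/(n+4)$. A sketch (requiring all $\alpha_j > 1$ so the numerator stays nonnegative):

```python
import numpy as np

def map_estimate(seqs: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """MAP estimate (28): (c_ij + alpha_j - 1) / (n + alpha_0 - 4).
    seqs: (n, w) int array encoded 0..3; alpha: shape (4,), entries > 1."""
    n, w = seqs.shape
    counts = np.zeros((w, 4))
    for i in range(w):
        counts[i] = np.bincount(seqs[:, i], minlength=4)
    return (counts + alpha - 1.0) / (n + alpha.sum() - 4.0)
```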

Mixture of sequences

Suppose we have a more difficult situation: among the set of $n$ given sequences $S^1, S^2, \cdots, S^n$, only a subset of them are generated by a weight matrix model $\theta$. How do we identify $\theta$ in this case?

Let us first define the "non-motif" (also called background) sequences. Suppose they are generated from a single distribution

\[
p^0 = (p^0_A, p^0_C, p^0_G, p^0_T) = (p^0_1, p^0_2, p^0_3, p^0_4) \tag{29}
\]

Likelihood for mixture of sequences

Now the problem is that we do not know which sequences are generated from the motif model ($\theta$) and which are generated from the background model ($\theta^0$).

Suppose we are provided with such label information:

\[
z_i =
\begin{cases}
1 & \text{if } S^i \text{ is generated by } \theta \\
0 & \text{if } S^i \text{ is generated by } \theta^0
\end{cases}
\tag{30}
\]

for all $i = 1, 2, \cdots, n$.

Then the likelihood of observing the $n$ sequences is

\[
P(S^1, S^2, \cdots, S^n|z, \theta, \theta^0) = \prod_{i=1}^{n} \left[ z_i P(S^i|\theta) + (1 - z_i) P(S^i|\theta^0) \right]
\]

Maximum Likelihood

Find the joint probability of the sequences and the labels:

\[
P(S, z|\theta, \theta^0) = P(S|z, \theta, \theta^0)\,P(z) = \prod_{i=1}^{n} P(z_i) \left[ z_i P(S^i|\theta) + (1 - z_i) P(S^i|\theta^0) \right]
\]

where $z \equiv (z_1, \cdots, z_n)$ and $P(z) = \prod_i P(z_i)$.

Marginalize over the labels to derive the likelihood:

\[
L(\theta) = P(S|\theta, \theta^0) = \prod_{i=1}^{n} \left[ P(z_i = 1) P(S^i|\theta) + P(z_i = 0) P(S^i|\theta^0) \right]
\]

Maximum likelihood estimate:

\[
\hat\theta_{ML} = \arg\max_{\theta} \log L(\theta)
\]
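As a sketch, the marginal likelihood can be evaluated directly, assuming a shared prior $q \equiv P(z_i = 1)$ (the slides introduce this $q$ explicitly later) and a fixed background distribution $p^0$:

```python
import numpy as np

def mixture_loglik(seqs: np.ndarray, theta: np.ndarray,
                   p0: np.ndarray, q: float) -> float:
    """log L(theta) = sum_i log[q*P(S^i|theta) + (1-q)*P(S^i|theta^0)]."""
    n, w = seqs.shape
    idx = np.arange(w)
    total = 0.0
    for k in range(n):
        p_motif = np.prod(theta[idx, seqs[k]])   # P(S^k | theta)
        p_bg = np.prod(p0[seqs[k]])              # P(S^k | theta^0)
        total += np.log(q * p_motif + (1.0 - q) * p_bg)
    return total
```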


Lower bound on L(θ)

Log likelihood function:

\[
\log L(\theta) = \sum_{i=1}^{n} \log \left[ P(z_i = 1) P(S^i|z_i = 1) + P(z_i = 0) P(S^i|z_i = 0) \right]
\]

where $P(S^i|z_i = 1) = P(S^i|\theta)$ and $P(S^i|z_i = 0) = P(S^i|\theta^0)$.

Jensen’s inequality:

\[
\log(q_1 x + q_2 y) \ge q_1 \log(x) + q_2 \log(y)
\]

for all $q_1, q_2 \ge 0$ with $q_1 + q_2 = 1$. Applying this with $x$ and $y$ set to the two bracketed terms divided by $q_i$ and $1 - q_i$ respectively (for any $q_i \in (0,1)$) yields the lower bound on the next slide.

EM-algorithm

Lower bound on $\log L(\theta)$:

\[
\log L(\theta) \ge \sum_{i=1}^{N} \left\{ q_i \log \frac{P(z_i = 1) P(S^i|z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S^i|z_i = 0)}{1 - q_i} \right\} \equiv \sum_{i=1}^{N} \phi(q_i, \theta)
\]

Expectation-Maximization: alternate between two steps:

E-step:
\[
\hat q_i = \arg\max_{q_i} \phi(q_i, \hat\theta)
\]

M-step:
\[
\hat\theta = \arg\max_{\theta} \sum_{i=1}^{N} \phi(\hat q_i, \theta)
\]

E-Step

Auxiliary function:

\[
\phi(q_i, \theta) = q_i \log \frac{P(z_i = 1) P(S^i|z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S^i|z_i = 0)}{1 - q_i}
\]

E-step:

\[
\hat q_i = \arg\max_{q_i} \phi(q_i, \hat\theta)
\]

which leads to

\[
\hat q_i = \frac{P(z_i = 1) P(S^i|z_i = 1)}{P(z_i = 1) P(S^i|z_i = 1) + P(z_i = 0) P(S^i|z_i = 0)} = P(z_i = 1|S^i)
\]

M-Step

Auxiliary function:

\[
\phi(q_i, \theta) = q_i \log \frac{P(z_i = 1) P(S^i|z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S^i|z_i = 0)}{1 - q_i}
\]

M-step:

\[
\hat\theta = \arg\max_{\theta} \sum_{i=1}^{N} \phi(\hat q_i, \theta) = \arg\max_{\theta} \sum_{i=1}^{N} \left[ \hat q_i \log P(S^i|\theta) + (1 - \hat q_i) \log P(S^i|\theta^0) \right]
\]

which leads to

\[
\hat\theta_{ij} = \frac{\sum_{k=1}^{N} \hat q_k I(S_{ki} = j)}{\sum_{k=1}^{N} \hat q_k}
\]

where $I(a)$ is an indicator function: $I(a) = 1$ if $a$ is true and 0 otherwise.

Summary of EM-algorithm

Initialize parameters $\theta$.

Repeat until convergence:

E-step: estimate the expected values of the labels, given the current parameter estimate:
\[
\hat q_i = P(z_i = 1|S^i)
\]

M-step: re-estimate the parameters, given the expected estimates of the labels:
\[
\hat\theta_{ij} = \frac{\sum_{k=1}^{N} \hat q_k I(S_{ki} = j)}{\sum_{k=1}^{N} \hat q_k}
\]

The procedure is guaranteed to converge to a local maximum or saddle point solution.
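A compact sketch of the whole loop, holding $q = P(z_i = 1)$ and the background $p^0$ fixed (both could also be re-estimated; the random initialization is a hypothetical choice):

```python
import numpy as np

def em_motif(seqs: np.ndarray, p0: np.ndarray, q: float,
             n_iter: int = 100, seed: int = 0):
    """EM for the two-component mixture. seqs: (n, w) ints encoded 0..3.
    Returns (theta_hat, q_hat) where q_hat[i] = P(z_i = 1 | S^i)."""
    rng = np.random.default_rng(seed)
    n, w = seqs.shape
    theta = rng.dirichlet(np.ones(4), size=w)    # random row-stochastic init
    idx = np.arange(w)
    for _ in range(n_iter):
        # E-step: posterior responsibility of the motif component
        p_motif = np.array([np.prod(theta[idx, s]) for s in seqs])
        p_bg = np.array([np.prod(p0[s]) for s in seqs])
        q_hat = q * p_motif / (q * p_motif + (1 - q) * p_bg)
        # M-step: soft counts, then normalize each position
        for i in range(w):
            for j in range(4):
                theta[i, j] = q_hat[seqs[:, i] == j].sum()
        theta /= q_hat.sum()
    return theta, q_hat
```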

What about MAP estimate?

Consider a Dirichlet prior distribution on $\theta_i$:

\[
\theta_i \sim \mathrm{Dir}(\alpha) \quad \forall i = 1, \cdots, w
\]

Initialize parameters $\theta$.

Repeat until convergence:

E-step: estimate the expected values of the labels, given the current parameter estimate:
\[
\hat q_i = P(z_i = 1|S^i)
\]

M-step: re-estimate the parameters, given the expected estimates of the labels:
\[
\hat\theta_{ij} = \frac{\sum_{k=1}^{N} \hat q_k I(S_{ki} = j) + \alpha_j - 1}{\sum_{k=1}^{N} \hat q_k + \alpha_0 - 4}
\]
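Relative to the em_motif sketch above, only the M-step changes: Dirichlet pseudocounts are added to the soft counts before normalizing (again assuming all $\alpha_j > 1$):

```python
import numpy as np

def map_m_step(seqs: np.ndarray, q_hat: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Drop-in MAP replacement for the M-step in em_motif above."""
    n, w = seqs.shape
    theta = np.zeros((w, 4))
    for i in range(w):
        for j in range(4):
            theta[i, j] = q_hat[seqs[:, i] == j].sum() + alpha[j] - 1.0
    return theta / (q_hat.sum() + alpha.sum() - 4.0)
```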

Different methods for parameter estimation

So far, we have introduced two methods: ML and MAP.

Maximum likelihood (ML):
\[
\hat\theta_{ML} = \arg\max_{\theta} P(S|\theta)
\]

Maximum a posteriori (MAP):
\[
\hat\theta_{MAP} = \arg\max_{\theta} P(\theta|S)
\]

Bayes estimator:
\[
\hat\theta = \arg\min_{\hat\theta} E\left[ (\hat\theta(S) - \theta)^2 \right]
\]

that is, the one minimizing the MSE (mean squared error). Its solution is the posterior mean:
\[
\hat\theta = E[\theta|S] = \int \theta\, P(\theta|S)\, d\theta
\]
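For the single-PWM model of p.14, the Bayes estimator also has a closed form, the mean of the Dirichlet posterior. Note that it uses $\alpha_j$ rather than the MAP estimate's $\alpha_j - 1$:

\[
\hat\theta_{ij}^{\,\text{Bayes}} = E[\theta_{ij}|S^1, \cdots, S^n] = \frac{c_{ij} + \alpha_j}{n + \alpha_0}
\]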

Joint distribution

Joint distribution of labels and sequences:

\[
P(S, z|\theta, \theta^0) = P(S|z, \theta, \theta^0)\,P(z) = \prod_{i=1}^{n} P(z_i) \left[ z_i P(S^i|\theta) + (1 - z_i) P(S^i|\theta^0) \right]
\]

Joint distribution of $S$, $z$, $\theta$ and $\theta^0$:

\[
P(S, z, \theta, \theta^0) = \prod_{i=1}^{n} P(z_i) \left[ z_i P(S^i|\theta) + (1 - z_i) P(S^i|\theta^0) \right] P(\theta)\, P(\theta^0)
\]

Find the joint distribution of $S$ and $z$:

\[
P(S, z) = \int_{\theta} \int_{\theta^0} \prod_{i=1}^{n} P(z_i) \left[ z_i P(S^i|\theta) + (1 - z_i) P(S^i|\theta^0) \right] P(\theta)\, P(\theta^0)\, d\theta\, d\theta^0
\]

Posterior distribution of labels

Posterior distribution of $z$:

\[
P(z|S) = \frac{1}{P(S)} \int_{\theta} \int_{\theta^0} \prod_{i=1}^{n} P(z_i) \left[ z_i P(S^i|\theta) + (1 - z_i) P(S^i|\theta^0) \right] P(\theta)\, P(\theta^0)\, d\theta\, d\theta^0
\]
\[
\propto q^m (1 - q)^{n-m} \prod_{i=1}^{w} \left[ \int_{\theta_i} \prod_{j=1}^{4} \theta_{ij}^{n_{ij}}\, P(\theta_i)\, d\theta_i \right] \frac{B(n_0 + \alpha^0)}{B(\alpha^0)}
\]
\[
\propto q^m (1 - q)^{n-m} \prod_{i=1}^{w} \left[ \frac{B(n_i + \alpha)}{B(\alpha)} \right] \frac{B(n_0 + \alpha^0)}{B(\alpha^0)}
\]

where $m = \sum_{i=1}^{n} z_i$, $P(z_i = 1) = q$, and $P(\theta_i) = \mathrm{Dir}(\alpha)$, $P(\theta^0) = \mathrm{Dir}(\alpha^0)$ are Dirichlet priors. Here $n_{ij} \equiv \sum_{k=1}^{n} z_k I(S_{ki} = j)$ is the number of letter $j$ at position $i$ among the sequences with label 1, with $n_i \equiv (n_{i1}, \cdots, n_{i4})$. Similarly, $n_{0,j} \equiv \sum_{k=1}^{n} (1 - z_k) \sum_{i=1}^{w} I(S_{ki} = j)$ and $n_0 \equiv (n_{0,1}, \cdots, n_{0,4})$.
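The Beta-function ratios are best evaluated in log space. A sketch using scipy's log-Gamma:

```python
import numpy as np
from scipy.special import gammaln

def log_multinomial_beta(a: np.ndarray) -> float:
    """log B(a) = sum_j log Gamma(a_j) - log Gamma(sum_j a_j), cf. Eq. (20)."""
    return float(gammaln(a).sum() - gammaln(a.sum()))

def log_post_z(m: int, n: int, n_i_rows: np.ndarray, n0: np.ndarray,
               alpha: np.ndarray, alpha0: np.ndarray, q: float) -> float:
    """Unnormalized log P(z|S); n_i_rows is the (w, 4) array of n_i vectors."""
    val = m * np.log(q) + (n - m) * np.log(1.0 - q)
    for n_i in n_i_rows:
        val += log_multinomial_beta(n_i + alpha) - log_multinomial_beta(alpha)
    val += log_multinomial_beta(n0 + alpha0) - log_multinomial_beta(alpha0)
    return val
```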

Sampling

Posterior distribution of $z$:

\[
P(z|S) \propto q^m (1 - q)^{n-m} \prod_{i=1}^{w} \frac{B(n_i + \alpha)}{B(\alpha)} \cdot \frac{B(n_0 + \alpha^0)}{B(\alpha^0)}
\]

Posterior distribution of $z_k$ conditioned on all other labels $z_{-k} \equiv \{z_i \,|\, i = 1, \cdots, n,\; i \ne k\}$:

\[
P(z_k = 1|z_{-k}, S) \propto q \prod_{i=1}^{w} \frac{B(n_{-k,i} + \Delta(S_{ki}) + \alpha)}{B(n_{-k,i} + \alpha)}
\]

where $n_{-k,ij} \equiv \sum_{l=1, l \ne k}^{n} z_l I(S_{li} = j)$ is the number of letter $j$ at position $i$ among all sequences with label 1 excluding the $k$th sequence, $n_{-k,i} \equiv (n_{-k,i1}, \cdots, n_{-k,i4})$, and $\Delta(l) = (b_1, \cdots, b_4)$ with $b_j = 1$ if $j = l$ and 0 otherwise.

Posterior distribution

Posterior distribution of $z_k$ conditioned on $z_{-k}$:

\[
P(z_k = 1|z_{-k}, S) \propto q \prod_{i=1}^{w} \frac{n_{-k,iS_{ki}} + \alpha_{S_{ki}} - 1}{\sum_j \left[ n_{-k,ij} + \alpha_j - 1 \right]} = q \prod_{i=1}^{w} \theta_{iS_{ki}}
\]

Note that $\theta_{iS_{ki}}$ is the same as the MAP estimate of the frequency weight matrix using all sequences with label 1, excluding the $k$th sequence.

Similarly,

\[
P(z_k = 0|z_{-k}, S) \propto (1 - q) \prod_{i=1}^{w} \frac{n_{-k,0S_{ki}} + \alpha^0_{S_{ki}} - 1}{\sum_j \left[ n_{-k,0j} + \alpha^0_j - 1 \right]} = (1 - q) \prod_{i=1}^{w} \theta_{0,S_{ki}}
\]

where $\theta_{0,S_{ki}}$ is the same as the MAP estimate of the background distribution.

Gibbs sampling

Posterior probabilities:

\[
P(z_k = 1|z_{-k}, S) \propto q \prod_{i=1}^{w} \theta_{iS_{ki}}
\]
\[
P(z_k = 0|z_{-k}, S) \propto (1 - q) \prod_{i=1}^{w} \theta_{0,S_{ki}}
\]

Gibbs sampling:

\[
P(z_k = 1|z_{-k}, S) = \frac{q \prod_{i=1}^{w} \theta_{iS_{ki}}}{q \prod_{i=1}^{w} \theta_{iS_{ki}} + (1 - q) \prod_{i=1}^{w} \theta_{0,S_{ki}}}
\]

Gibbs sampling

Initialize the labels $z$: assign the value of $z_i$ randomly according to $P(z_i = 1) = q$ for all $i = 1, \cdots, n$.

Repeat until convergence:

Repeat for $i = 1$ to $n$:

Update the $\theta$ matrix using the MAP estimate (excluding the $i$th sequence).

Sample the value of $z_i$.
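A sketch of the full sampler, holding $q$ and the background distribution $p^0$ fixed (the slides place a Dirichlet prior on the background as well; fixing it simplifies the illustration, and all $\alpha_j > 1$ is assumed so the MAP estimate is well defined):

```python
import numpy as np

def gibbs_motif(seqs: np.ndarray, alpha: np.ndarray, p0: np.ndarray,
                q: float, n_sweeps: int = 500, seed: int = 0) -> np.ndarray:
    """Gibbs sampler for the labels z. seqs: (n, w) ints encoded 0..3."""
    rng = np.random.default_rng(seed)
    n, w = seqs.shape
    idx = np.arange(w)
    z = (rng.random(n) < q).astype(int)       # initialize labels from P(z_i = 1) = q
    for _ in range(n_sweeps):
        for k in range(n):
            z[k] = 0                          # exclude the k-th sequence
            members = seqs[z == 1]
            counts = np.zeros((w, 4))
            for i in range(w):
                counts[i] = np.bincount(members[:, i], minlength=4)
            # MAP estimate of theta from label-1 sequences excluding k
            theta = (counts + alpha - 1.0) / (len(members) + alpha.sum() - 4.0)
            p1 = q * np.prod(theta[idx, seqs[k]])
            p0k = (1.0 - q) * np.prod(p0[seqs[k]])
            z[k] = int(rng.random() < p1 / (p1 + p0k))
    return z
```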

