Motif representation using position weight matrix

Xiaohui Xie

University of California, Irvine

Motif representation using position weight matrix – p.1/31

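Position weight matrix

Position weight matrix representation of a motif with width $w$:

$$\theta = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\ \theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\ \cdots & \cdots & \cdots & \cdots \\ \theta_{w1} & \theta_{w2} & \theta_{w3} & \theta_{w4} \end{pmatrix} \tag{1}$$

where each row represents one position of the motif, and is normalized:

$$\sum_{j=1}^{4} \theta_{ij} = 1 \tag{2}$$

for all $i = 1, 2, \cdots, w$.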

Likelihood

Given the position weight matrix $\theta$, the probability of generating a sequence $S = (S_1, S_2, \cdots, S_w)$ from $\theta$ is

$$P(S|\theta) = \prod_{i=1}^{w} P(S_i|\theta_i) \tag{3}$$

$$= \prod_{i=1}^{w} \theta_{i,S_i} \tag{4}$$

For convenience, we have converted $S$ from a string of $\{A, C, G, T\}$ to a string of $\{1, 2, 3, 4\}$.
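
As a rough illustration of equations (3) and (4), the product over positions can be computed directly. This is a minimal sketch: the names `LETTER_INDEX` and `sequence_probability` and the toy matrix are hypothetical, and indices are 0-based rather than the slide's 1 to 4.

```python
# Hypothetical letter-to-index map (0-based version of A, C, G, T -> 1, 2, 3, 4).
LETTER_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def sequence_probability(seq, theta):
    """Return P(S | theta) = prod_i theta[i][S_i] for a sequence of length w."""
    if len(seq) != len(theta):
        raise ValueError("sequence length must equal the motif width w")
    prob = 1.0
    for i, letter in enumerate(seq):
        prob *= theta[i][LETTER_INDEX[letter]]
    return prob


# Toy matrix with w = 3; every row sums to 1 (illustrative values only).
theta = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
p = sequence_probability("AGT", theta)  # 0.7 * 0.7 * 0.25
```

For long motifs a log-probability sum is usually preferable to the raw product, to avoid floating-point underflow.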

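Likelihood

Suppose we observe not just one, but a set of sequences $S_1, S_2, \cdots, S_n$, each of which contains exactly $w$ letters.

Assume each of them is generated independently from the model $\theta$. Then, the likelihood of observing these sequences is

$$P(S_1, S_2, \cdots, S_n|\theta) = \prod_{k=1}^{n} P(S_k|\theta) \tag{5}$$

$$= \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} = \prod_{i=1}^{w} \prod_{j=1}^{4} \theta_{ij}^{c_{ij}} \tag{6}$$

where $S_{ki}$ is the letter at position $i$ of sequence $S_k$, and $c_{ij}$ is the number of letter $j$ at position $i$. (Note that $\sum_{j=1}^{4} c_{ij} = n$ for all $i$.)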
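
The count form of the likelihood can be sketched as follows. The helper names `count_matrix` and `log_likelihood` are hypothetical, the example sequences are made up, and the log of the product is used so that the result stays numerically stable.

```python
import math

LETTER_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_matrix(seqs):
    """Return c[i][j], the number of times letter j occurs at position i."""
    w = len(seqs[0])
    counts = [[0] * 4 for _ in range(w)]
    for seq in seqs:
        if len(seq) != w:
            raise ValueError("all sequences must contain exactly w letters")
        for i, letter in enumerate(seq):
            counts[i][LETTER_INDEX[letter]] += 1
    return counts


def log_likelihood(counts, theta):
    """Return sum_i sum_j c_ij * log(theta_ij); zero counts contribute nothing."""
    total = 0.0
    for c_row, t_row in zip(counts, theta):
        for c, t in zip(c_row, t_row):
            if c > 0:
                total += c * math.log(t)
    return total


seqs = ["ACG", "ACT", "AGG", "TCG"]
counts = count_matrix(seqs)
uniform = [[0.25] * 4 for _ in range(3)]
ll_uniform = log_likelihood(counts, uniform)
```

Each row of `counts` sums to the number of sequences, which matches the note that the counts at every position add up to $n$.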

Parameter estimation

Now suppose we do not know $\theta$. How to estimate it from the observed sequence data $S_1, S_2, \cdots, S_n$?

One solution: calculate the likelihood of observing the provided $n$ sequences for different values of $\theta$:

$$L(\theta) = P(S_1, S_2, \cdots, S_n|\theta) = \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} \tag{7}$$

Pick the one with the largest likelihood, that is, find the $\theta^{*}$ that attains

$$\max_{\theta} P(S_1, S_2, \cdots, S_n|\theta) \tag{8}$$

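Maximum likelihood estimation

Maximum likelihood estimation of $\theta$:

$$\hat{\theta}_{ML} = \arg\max_{\theta} \log L(\theta) = \arg\max_{\theta} \sum_{i=1}^{w} \sum_{j=1}^{4} c_{ij} \log \theta_{ij}$$

$$\text{s.t.} \quad \sum_{j=1}^{4} \theta_{ij} = 1, \quad \forall i = 1, \cdots, w \tag{9}$$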

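Optimization with equality constraints

Construct a Lagrangian function taking the equality constraint into account:

$$g(\theta) = \log L(\theta) + \sum_{i=1}^{w} \lambda_i \Big(1 - \sum_{j=1}^{4} \theta_{ij}\Big) \tag{10}$$

Solve the unconstrained optimization problem

$$\hat{\theta} = \arg\max_{\theta} g(\theta) = \arg\max_{\theta} \Big[ \sum_{i=1}^{w} \sum_{j=1}^{4} c_{ij} \log \theta_{ij} + \sum_{i=1}^{w} \lambda_i \Big(1 - \sum_{j=1}^{4} \theta_{ij}\Big) \Big] \tag{11}$$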

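Optimization with equality constraints

Take the derivative of $g(\theta)$ w.r.t. $\theta_{ij}$ and the Lagrange multiplier $\lambda_i$ and set them to 0:

$$\frac{\partial g(\theta)}{\partial \theta_{ij}} = \frac{c_{ij}}{\theta_{ij}} - \lambda_i = 0 \tag{12}$$

$$\frac{\partial g(\theta)}{\partial \lambda_i} = 1 - \sum_{j=1}^{4} \theta_{ij} = 0 \tag{13}$$

The first condition gives $\theta_{ij} = c_{ij}/\lambda_i$, and the second then gives $\lambda_i = \sum_{j} c_{ij} = n$, which leads to:

$$\hat{\theta}_{ij} = \frac{c_{ij}}{n} \tag{14}$$

which is simply the frequency of different letters at each position. ($c_{ij}$ is the number of letter $j$ at position $i$.)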
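
The closed-form estimate in equation (14) amounts to normalizing each row of the count matrix. A minimal sketch, assuming the counts have already been tabulated; the function name `mle_pwm` and the example counts are hypothetical.

```python
def mle_pwm(counts):
    """Return theta[i][j] = c_ij / n, where n = sum_j c_ij for each position i."""
    theta = []
    for row in counts:
        n = sum(row)
        theta.append([c / n for c in row])
    return theta


# Counts for four sequences of width 3 (illustrative values only).
counts = [[3, 0, 0, 1], [0, 3, 1, 0], [0, 0, 3, 1]]
theta_ml = mle_pwm(counts)
```

Letters never observed at a position receive probability exactly 0 under this estimate, which is one motivation for placing a prior on $\theta$.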

Bayes' Theorem

$$P(\theta|S) = \frac{P(S|\theta) P(\theta)}{P(S)} \tag{15}$$

Each term in Bayes' theorem has a conventional name:

$P(S|\theta)$: the conditional probability of $S$ given $\theta$, also called the likelihood.

$P(\theta)$: the prior probability or marginal probability of $\theta$.

$P(\theta|S)$: the conditional probability of $\theta$ given $S$, also called the posterior probability of $\theta$.

$P(S)$: the marginal probability of $S$, which acts as a normalizing constant.

Maximum a posteriori (MAP) estimation

MAP (or posterior mode) estimation of $\theta$:

$$\hat{\theta}_{MAP}(S) = \arg\max_{\theta} P(\theta|S_1, S_2, \cdots, S_n) \tag{16}$$

$$= \arg\max_{\theta} \big[ \log L(\theta) + \log P(\theta) \big] \tag{17}$$

Assume $P(\theta) = \prod_{i=1}^{w} P(\theta_i)$ (independence of $\theta_i$ at different positions $i$).

Model $P(\theta_i)$ with a Dirichlet distribution:

$$(\theta_{i1}, \cdots, \theta_{i4}) \sim \mathrm{Dir}(\alpha_1, \cdots, \alpha_4) \tag{18}$$

Dirichlet Distribution

Probability density function of the Dirichlet distribution $\mathrm{Dir}(\alpha)$ of order $K \geq 2$:

$$p(x_1, \cdots, x_K; \alpha_1, \cdots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \tag{19}$$

for all $x_1, \cdots, x_K > 0$ and $\sum_{i=1}^{K} x_i = 1$. The density is zero outside this open $(K - 1)$-dimensional simplex.

$\alpha = (\alpha_1, \cdots, \alpha_K)$ are parameters with $\alpha_i > 0$ for all $i$.

$B(\alpha)$, the normalizing constant, is the multinomial beta function:

$$B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\big(\sum_{i=1}^{K} \alpha_i\big)} \tag{20}$$
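
Equations (19) and (20) can be evaluated in log space with the standard library's `math.lgamma`. This is only a sketch under that choice; the function names `log_beta` and `dirichlet_log_pdf` are hypothetical.

```python
import math


def log_beta(alpha):
    """Log of the multinomial beta function: sum_i lgamma(a_i) - lgamma(sum_i a_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_log_pdf(x, alpha):
    """Log density of Dir(alpha) at x; -inf outside the open simplex."""
    if any(v <= 0.0 for v in x) or abs(sum(x) - 1.0) > 1e-9:
        return float("-inf")
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_kernel - log_beta(alpha)


# With alpha = (1, 1, 1, 1) the density is constant: 1 / B(alpha) = Gamma(4) = 6.
flat_value = dirichlet_log_pdf([0.25, 0.25, 0.25, 0.25], [1.0, 1.0, 1.0, 1.0])
```

Working with logarithms avoids overflow in the Gamma function when the parameters or counts are large.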

Gamma function

Gamma function for positive real $z$:

$$\Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t} \, dt \tag{21}$$

$$\Gamma(z + 1) = z \Gamma(z) \tag{22}$$

If $n$ is a positive integer, then

$$\Gamma(n + 1) = n! \tag{23}$$
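
Both properties are easy to check numerically with `math.gamma` from the Python standard library. The helper name `gamma_recurrence_error` is hypothetical and the test points are arbitrary.

```python
import math


def gamma_recurrence_error(z):
    """Absolute gap between Gamma(z + 1) and z * Gamma(z) for a positive real z."""
    return abs(math.gamma(z + 1.0) - z * math.gamma(z))


# Gamma(n + 1) = n! for a positive integer n, here n = 5.
gamma_of_six = math.gamma(6)
five_factorial = math.factorial(5)
```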

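Properties of Dirichlet distribution

Dirichlet distribution:

$$p(x_1, \cdots, x_K; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \tag{24}$$

Expectation, define $\alpha_0 = \sum_{i=1}^{K} \alpha_i$:

$$E[x_i] = \frac{\alpha_i}{\alpha_0} \tag{25}$$

Variance:

$$\mathrm{Var}[x_i] = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^{2} (\alpha_0 + 1)} \tag{26}$$

Covariance, for $i \neq j$:

$$\mathrm{Cov}[x_i, x_j] = \frac{-\alpha_i \alpha_j}{\alpha_0^{2} (\alpha_0 + 1)} \tag{27}$$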
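
The moment formulas translate directly into code. A small sketch with hypothetical function names, using the standard Dirichlet mean, variance, and covariance expressions:

```python
def dirichlet_mean_var(alpha):
    """Return (means, variances) of Dir(alpha) from the closed-form moments."""
    a0 = sum(alpha)
    means = [a / a0 for a in alpha]
    variances = [a * (a0 - a) / (a0 ** 2 * (a0 + 1.0)) for a in alpha]
    return means, variances


def dirichlet_cov(alpha, i, j):
    """Return Cov[x_i, x_j] for i != j."""
    if i == j:
        raise ValueError("use dirichlet_mean_var for the variance (i == j)")
    a0 = sum(alpha)
    return -alpha[i] * alpha[j] / (a0 ** 2 * (a0 + 1.0))


alpha = [1.0, 1.0, 1.0, 1.0]
means, variances = dirichlet_mean_var(alpha)
cov_01 = dirichlet_cov(alpha, 0, 1)
```

Because the components sum to 1, the variance of one component and its covariances with all the others add up to zero, which gives a quick sanity check.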

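Posterior Distribution

Conditional probability:

$$P(S_1, S_2, \cdots, S_n|\theta) = \prod_{i=1}^{w} \prod_{j=1}^{4} \theta_{ij}^{c_{ij}}$$

Prior probability:

$$P(\theta_{i1}, \cdots, \theta_{i4}; \alpha) = \frac{1}{B(\alpha)} \prod_{j=1}^{4} \theta_{ij}^{\alpha_j - 1}$$

Posterior probability:

$$P(\theta_i|S_1, \cdots, S_n) = \mathrm{Dir}(c_{i1} + \alpha_1, c_{i2} + \alpha_2, c_{i3} + \alpha_3, c_{i4} + \alpha_4)$$

Maximum a posteriori estimate:

$$\hat{\theta}^{MAP}_{ij} = \frac{c_{ij} + \alpha_j - 1}{n + \alpha_0 - 4} \tag{28}$$

where $\alpha_0 \equiv \sum_{j=1}^{4} \alpha_j$.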
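
Equation (28) can be sketched as a pseudocount-style adjustment of the count matrix. The function name `map_pwm` and the example values are hypothetical, and the sketch assumes every $\alpha_j \geq 1$ so that the posterior mode lies inside the simplex.

```python
def map_pwm(counts, alpha):
    """Return theta[i][j] = (c_ij + alpha_j - 1) / (n + alpha_0 - 4)."""
    if any(a < 1.0 for a in alpha):
        raise ValueError("this sketch assumes alpha_j >= 1 for every letter")
    alpha_0 = sum(alpha)
    theta = []
    for row in counts:
        n = sum(row)
        denom = n + alpha_0 - len(alpha)
        theta.append([(c + a - 1.0) / denom for c, a in zip(row, alpha)])
    return theta


counts = [[3, 0, 0, 1], [0, 3, 1, 0], [0, 0, 3, 1]]
theta_map = map_pwm(counts, [2.0, 2.0, 2.0, 2.0])
theta_flat = map_pwm(counts, [1.0, 1.0, 1.0, 1.0])  # uniform prior
```

With $\alpha_j = 1$ for all $j$ the prior is uniform and the MAP estimate coincides with the maximum likelihood frequencies.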

Mixture of sequences

Suppose we have a more difficult situation: among the set of $n$ given sequences, $S_1, S_2, \cdots, S_n$, only a subset of them are generated by a weight matrix model $\theta$. How to identify $\theta$ in this case?

Let us first define the "non-motif" (also called background) sequences. Suppose they are generated from a single distribution

$$\theta_0 = (p^0_A, p^0_C, p^0_G, p^0_T) = (p^0_1, p^0_2, p^0_3, p^0_4) \tag{29}$$

Likelihood for mixture of sequences

Now the problem is we do not know which sequence is generated from the motif ($\theta$) and which one is generated from the background model ($\theta_0$).

Suppose we are provided with such label information:

$$z_i = \begin{cases} 1 & \text{if } S_i \text{ is generated by } \theta \\ 0 & \text{if } S_i \text{ is generated by } \theta_0 \end{cases} \tag{30}$$

for all $i = 1, 2, \cdots, n$.

Then, the likelihood of observing the $n$ sequences is

$$P(S_1, S_2, \cdots, S_n|z, \theta, \theta_0) = \prod_{i=1}^{n} \big[ z_i P(S_i|\theta) + (1 - z_i) P(S_i|\theta_0) \big]$$

Maximum Likelihood

Find the joint probability of sequences and the labels:

$$P(S, z|\theta, \theta_0) = P(S|z, \theta, \theta_0) P(z) = \prod_{i=1}^{n} P(z_i) \big[ z_i P(S_i|\theta) + (1 - z_i) P(S_i|\theta_0) \big]$$

where $z \equiv (z_1, \cdots, z_n)$ and $P(z) = \prod_{i=1}^{n} P(z_i)$.

Marginalize over labels to derive the likelihood:

$$L(\theta) = P(S|\theta, \theta_0) = \prod_{i=1}^{n} \big[ P(z_i = 1) P(S_i|\theta) + P(z_i = 0) P(S_i|\theta_0) \big]$$

Maximum likelihood estimate:

$$\hat{\theta}_{ML} = \arg\max_{\theta} \log L(\theta)$$


Lower bound on $L(\theta)$

Log likelihood function:

$$\log L(\theta) = \sum_{i=1}^{n} \log \big[ P(z_i = 1) P(S_i|z_i = 1) + P(z_i = 0) P(S_i|z_i = 0) \big]$$

where $P(S_i|z_i = 1) = P(S_i|\theta)$ and $P(S_i|z_i = 0) = P(S_i|\theta_0)$.

Jensen's inequality:

$$\log(q_1 x + q_2 y) \geq q_1 \log(x) + q_2 \log(y)$$

for all $q_1, q_2 \geq 0$ and $q_1 + q_2 = 1$.

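EM-algorithm

Lower bound on $\log L(\theta)$:

$$\log L(\theta) \geq \sum_{i=1}^{n} \Big\{ q_i \log \frac{P(z_i = 1) P(S_i|z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S_i|z_i = 0)}{1 - q_i} \Big\} \equiv \sum_{i=1}^{n} \phi(q_i, \theta)$$

Expectation-Maximization: alternate between two steps:

E-step: $\hat{q}_i = \arg\max_{q_i} \phi(q_i, \hat{\theta})$

M-step: $\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \phi(\hat{q}_i, \theta)$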

E-Step

Auxiliary function:

$$\phi(q_i, \theta) = q_i \log \frac{P(z_i = 1) P(S_i|z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S_i|z_i = 0)}{1 - q_i}$$

E-step:

$$\hat{q}_i = \arg\max_{q_i} \phi(q_i, \hat{\theta})$$

which leads to

$$\hat{q}_i = \frac{P(z_i = 1) P(S_i|z_i = 1)}{P(z_i = 1) P(S_i|z_i = 1) + P(z_i = 0) P(S_i|z_i = 0)} = P(z_i = 1|S_i)$$
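
The E-step update and the tightness of the lower bound at $\hat{q}_i$ can be checked numerically. This is a sketch with hypothetical names; `p_motif` and `p_background` stand for $P(S_i|\theta)$ and $P(S_i|\theta_0)$, `q` stands for $P(z_i = 1)$, and the numbers are illustrative.

```python
import math


def posterior_motif_prob(p_motif, p_background, q):
    """Return q_hat = q * P(S|theta) / (q * P(S|theta) + (1 - q) * P(S|theta_0))."""
    num = q * p_motif
    return num / (num + (1.0 - q) * p_background)


def phi(q_i, p_motif, p_background, q):
    """Auxiliary function phi(q_i, theta); assumes 0 < q_i < 1."""
    return (q_i * math.log(q * p_motif / q_i)
            + (1.0 - q_i) * math.log((1.0 - q) * p_background / (1.0 - q_i)))


p_motif, p_background, q = 0.1225, 0.25 ** 3, 0.5
q_hat = posterior_motif_prob(p_motif, p_background, q)
log_mix = math.log(q * p_motif + (1.0 - q) * p_background)
bound_at_q_hat = phi(q_hat, p_motif, p_background, q)   # equals log_mix
bound_elsewhere = phi(0.5, p_motif, p_background, q)    # strictly smaller
```

At $q_i = \hat{q}_i$ both log arguments equal the mixture probability, so the bound touches the log likelihood; any other $q_i$ gives a smaller value.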

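M-Step

Auxiliary function:

$$\phi(q_i, \theta) = q_i \log \frac{P(z_i = 1) P(S_i|z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S_i|z_i = 0)}{1 - q_i}$$

M-step:

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \phi(\hat{q}_i, \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \big[ \hat{q}_i \log P(S_i|\theta) + (1 - \hat{q}_i) \log P(S_i|\theta_0) \big]$$

which leads to

$$\hat{\theta}_{ij} = \frac{\sum_{k=1}^{n} \hat{q}_k I(S_{ki} = j)}{\sum_{k=1}^{n} \hat{q}_k}$$

where $I(a)$ is an indicator function: $I(a) = 1$ if $a$ is true and 0 otherwise.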

Summary of EM-algorithm

Initialize parameters $\theta$.

Repeat until convergence:

E-step: estimate the expected values of labels, given the current parameter estimate:

$$\hat{q}_i = P(z_i = 1|S_i, \hat{\theta})$$

M-step: re-estimate the parameters, given the expected estimates of the labels:

$$\hat{\theta}_{ij} = \frac{\sum_{k=1}^{n} \hat{q}_k I(S_{ki} = j)}{\sum_{k=1}^{n} \hat{q}_k}$$

The procedure is guaranteed to converge to a local maximum or saddle point solution.
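
The two steps can be put together in a short loop. This is a minimal sketch, not a full motif finder: it assumes the mixing probability `q` and the background distribution `theta0` are known and held fixed, runs a fixed number of iterations instead of testing convergence, and uses hypothetical names and toy data throughout. It needs Python 3.8+ for `math.prod`.

```python
import math

LETTER_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def em_motif(seqs, theta, theta0, q=0.5, n_iter=50):
    """Alternate E- and M-steps; return (theta_hat, q_hat)."""
    data = [[LETTER_INDEX[ch] for ch in s] for s in seqs]
    w = len(data[0])
    q_hat = [q] * len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each sequence comes from the motif.
        q_hat = []
        for idx in data:
            a = q * math.prod(theta[i][j] for i, j in enumerate(idx))
            b = (1.0 - q) * math.prod(theta0[j] for j in idx)
            q_hat.append(a / (a + b))
        # M-step: weighted letter frequencies at each position.
        total = sum(q_hat)
        theta = [
            [sum(qk for qk, idx in zip(q_hat, data) if idx[i] == j) / total
             for j in range(4)]
            for i in range(w)
        ]
    return theta, q_hat


seqs = ["ACG", "ACG", "ACG", "ACT", "TTA", "GGC"]
init = [[0.4, 0.2, 0.2, 0.2], [0.2, 0.4, 0.2, 0.2], [0.2, 0.2, 0.4, 0.2]]
theta_hat, q_hat = em_motif(seqs, init, theta0=[0.25] * 4)
```

On this toy input the sequences resembling `ACG` end up with responsibilities near 1 and the two outliers near 0. A different initialization may reach a different local optimum.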

What about MAP estimate?

Consider a Dirichlet prior distribution on $\theta$:

$$P(\theta_i) = \mathrm{Dir}(\alpha) \quad \forall i = 1, \cdots, w$$

Initialize parameters $\theta$.

Repeat until convergence:

E-step: estimate the expected values of labels, given the current parameter estimate:

$$\hat{q}_i = P(z_i = 1|S_i, \hat{\theta})$$

M-step: re-estimate the parameters, given the expected estimates of the labels:

$$\hat{\theta}_{ij} = \frac{\sum_{k=1}^{n} \hat{q}_k I(S_{ki} = j) + \alpha_j - 1}{\sum_{k=1}^{n} \hat{q}_k + \alpha_0 - 4}$$

where $\alpha_0 \equiv \sum_{j=1}^{4} \alpha_j$.
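
Only the M-step changes when the Dirichlet prior is added. A sketch of that step alone, with the hypothetical name `map_m_step`; sequences are assumed to be already encoded as 0-based letter indices, and $\alpha_j \geq 1$ is assumed so that the estimates stay non-negative.

```python
def map_m_step(data, q_hat, alpha):
    """Return theta[i][j] = (sum_k q_k I(S_ki = j) + alpha_j - 1) / (sum_k q_k + alpha_0 - 4)."""
    w = len(data[0])
    denom = sum(q_hat) + sum(alpha) - 4
    theta = []
    for i in range(w):
        row = []
        for j in range(4):
            weighted = sum(qk for qk, idx in zip(q_hat, data) if idx[i] == j)
            row.append((weighted + alpha[j] - 1.0) / denom)
        theta.append(row)
    return theta


# Two encoded sequences of width 2 with responsibilities 1.0 and 0.5.
data = [[0, 1], [0, 2]]
theta_map = map_m_step(data, q_hat=[1.0, 0.5], alpha=[2.0, 2.0, 2.0, 2.0])
```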

Different methods for parameter estimation

So far, we have introduced two methods: ML and MAP.

Maximum likelihood (ML):

$$\hat{\theta}_{ML} = \arg\max_{\theta} P(S|\theta)$$

Maximum a posteriori (MAP):

$$\hat{\theta}_{MAP} = \arg\max_{\theta} P(\theta|S)$$

Bayes estimator:

$$\hat{\theta}_{Bayes} = \arg\min_{\hat{\theta}} E\big[ (\hat{\theta}(S) - \theta)^2 \big]$$

that is, the one minimizing the MSE (mean square error):

$$\hat{\theta}_{Bayes} = E[\theta|S] = \int \theta \, P(\theta|S) \, d\theta$$

Joint distribution

Joint distribution of labels and sequences:

$$P(S, z|\theta, \theta_0) = P(S|z, \theta, \theta_0) P(z) = \prod_{i=1}^{n} P(z_i) \big[ z_i P(S_i|\theta) + (1 - z_i) P(S_i|\theta_0) \big]$$

Joint distribution of $S$, $z$, $\theta$ and $\theta_0$:

$$P(S, z, \theta, \theta_0) = \prod_{i=1}^{n} P(z_i) \big[ z_i P(S_i|\theta) + (1 - z_i) P(S_i|\theta_0) \big] P(\theta) P(\theta_0)$$

Find the joint distribution of $S$ and $z$:

$$P(S, z) = \int \prod_{i=1}^{n} P(z_i) \big[ z_i P(S_i|\theta) + (1 - z_i) P(S_i|\theta_0) \big] P(\theta) P(\theta_0) \, d\theta \, d\theta_0$$

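Posterior distribution of labels

Posterior distribution of $z$:

$$P(z|S) = \int \prod_{i=1}^{n} P(z_i) \big[ z_i P(S_i|\theta) + (1 - z_i) P(S_i|\theta_0) \big] P(\theta) P(\theta_0) \, d\theta \, d\theta_0 \, / \, P(S)$$

$$\propto q^{m} (1 - q)^{n-m} \prod_{i=1}^{w} \Big[ \int \prod_{j=1}^{4} \theta_{ij}^{n_{ij}} P(\theta_i) \, d\theta_i \Big] \frac{B(n_0 + \alpha_0)}{B(\alpha_0)}$$

$$\propto q^{m} (1 - q)^{n-m} \Big[ \prod_{i=1}^{w} \frac{B(n_i + \alpha)}{B(\alpha)} \Big] \frac{B(n_0 + \alpha_0)}{B(\alpha_0)}$$

where $m = \sum_{i=1}^{n} z_i$, $P(z_i = 1) = q$, and $P(\theta_i) = \mathrm{Dir}(\alpha)$, $P(\theta_0) = \mathrm{Dir}(\alpha_0)$ are Dirichlet priors. Here $\alpha_0 \equiv (\alpha_{0,1}, \cdots, \alpha_{0,4})$ is the parameter vector of the background prior.

$n_{ij} \equiv \sum_{k=1}^{n} z_k I(S_{ki} = j)$ is the number of letter $j$ at position $i$ among the sequences with label 1, and $n_i \equiv (n_{i1}, \cdots, n_{i4})$.

$n_{0,j} \equiv \sum_{k=1}^{n} (1 - z_k) \sum_{i=1}^{w} I(S_{ki} = j)$ and $n_0 \equiv (n_{0,1}, \cdots, n_{0,4})$.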

Sampling

Posterior distribution of $z$:

$$P(z|S) \propto q^{m} (1 - q)^{n-m} \Big[ \prod_{i=1}^{w} \frac{B(n_i + \alpha)}{B(\alpha)} \Big] \frac{B(n_0 + \alpha_0)}{B(\alpha_0)}$$

Posterior distribution of $z_k$ conditioned on all other labels $z_{-k} \equiv \{ z_i \,|\, i = 1, \cdots, n, \ i \neq k \}$:

$$P(z_k = 1|z_{-k}, S) \propto q \prod_{i=1}^{w} \frac{B(n_{-k,i} + \Delta(S_{ki}) + \alpha)}{B(n_{-k,i} + \alpha)}$$

where $n_{-k,ij} \equiv \sum_{l=1, l \neq k}^{n} z_l I(S_{li} = j)$ is the number of letter $j$ at position $i$ among all sequences with label 1 excluding the $k$-th sequence.

$n_{-k,i} \equiv (n_{-k,i1}, \cdots, n_{-k,i4})$

$\Delta(S_{ki}) = (b_1, \cdots, b_4)$ with $b_j = 1$ for $j = S_{ki}$ and otherwise 0.
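
A ratio of multinomial beta functions of this kind can be evaluated exactly with $\Gamma(z + 1) = z \Gamma(z)$: adding one count for letter $s$ multiplies $B$ by $(n_s + \alpha_s) / \sum_j (n_j + \alpha_j)$. The sketch below checks this numerically; the function names and counts are hypothetical. Note that the exact ratio carries no $-1$ terms, so it is the posterior mean of the letter probability, which differs slightly from a MAP-style expression with $\alpha_j - 1$.

```python
import math


def log_beta(alpha):
    """Log of the multinomial beta function."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def beta_ratio(counts, alpha, letter):
    """Return B(counts + Delta(letter) + alpha) / B(counts + alpha)."""
    base = [n + a for n, a in zip(counts, alpha)]
    bumped = list(base)
    bumped[letter] += 1.0
    return math.exp(log_beta(bumped) - log_beta(base))


counts = [3, 0, 0, 1]          # letter counts at one position, sequence k excluded
alpha = [2.0, 2.0, 2.0, 2.0]   # illustrative prior parameters
ratio = beta_ratio(counts, alpha, letter=0)
closed_form = (counts[0] + alpha[0]) / (sum(counts) + sum(alpha))
```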

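Posterior distribution

Posterior distribution of $z_k$ conditioned on $z_{-k}$:

$$P(z_k = 1|z_{-k}, S) \propto q \prod_{i=1}^{w} \frac{n_{-k,iS_{ki}} + \alpha_{S_{ki}} - 1}{\sum_{j} [n_{-k,ij} + \alpha_j - 1]} = q \prod_{i=1}^{w} \hat{\theta}_{i,S_{ki}}$$

Note that $\hat{\theta}_{i,S_{ki}}$ is the same as the MAP estimate of the frequency weight matrix using all sequences with label 1 excluding the $k$-th sequence.

Similarly,

$$P(z_k = 0|z_{-k}, S) \propto (1 - q) \prod_{i=1}^{w} \frac{n_{-k,0S_{ki}} + \alpha_{0,S_{ki}} - 1}{\sum_{j} [n_{-k,0j} + \alpha_{0,j} - 1]} = (1 - q) \prod_{i=1}^{w} \hat{\theta}_{0,S_{ki}}$$

$\hat{\theta}_{0,S_{ki}}$ is the same as the MAP estimate of the background distribution.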

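Gibbs sampling

Posterior probability:

$$P(z_k = 1|z_{-k}, S) \propto q \prod_{i=1}^{w} \hat{\theta}_{i,S_{ki}}$$

$$P(z_k = 0|z_{-k}, S) \propto (1 - q) \prod_{i=1}^{w} \hat{\theta}_{0,S_{ki}}$$

Gibbs sampling:

$$P(z_k = 1|z_{-k}, S) = \frac{q \prod_{i=1}^{w} \hat{\theta}_{i,S_{ki}}}{q \prod_{i=1}^{w} \hat{\theta}_{i,S_{ki}} + (1 - q) \prod_{i=1}^{w} \hat{\theta}_{0,S_{ki}}}$$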

Gibbs sampling

Initialize labels $z$: assign the value of $z_i$ randomly according to $P(z_i = 1) = q$ for all $i = 1, \cdots, n$.

Repeat until convergence:

Repeat for $i = 1, \cdots, n$:

Update the $\hat{\theta}$ matrix using the MAP estimate (excluding the $i$-th sequence).

Sample the value of $z_i$.
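
The sampling loop can be sketched as below. This is an illustration under several assumptions rather than a reference implementation: symmetric priors with a single value `alpha` (motif) and `alpha0` (background), both greater than 1 so that the MAP-style estimates stay strictly positive; a fixed number of sweeps instead of a convergence test; and hypothetical names and toy data.

```python
import random

LETTER_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def gibbs_labels(seqs, q=0.5, alpha=2.0, alpha0=2.0, n_sweeps=100, seed=0):
    """Sample labels z_k (1 = motif, 0 = background) by Gibbs sampling."""
    rng = random.Random(seed)
    data = [[LETTER_INDEX[ch] for ch in s] for s in seqs]
    n, w = len(data), len(data[0])
    # Initialize each label with P(z_i = 1) = q.
    z = [1 if rng.random() < q else 0 for _ in range(n)]
    for _ in range(n_sweeps):
        for k in range(n):
            # Letter counts from all sequences except the k-th one.
            motif = [[0] * 4 for _ in range(w)]
            background = [0] * 4
            for l in range(n):
                if l == k:
                    continue
                for i, j in enumerate(data[l]):
                    if z[l] == 1:
                        motif[i][j] += 1
                    else:
                        background[j] += 1
            # MAP-style plug-in estimates, multiplied along sequence k.
            p1, p0 = q, 1.0 - q
            bg_denom = sum(background) + 4 * alpha0 - 4
            for i, j in enumerate(data[k]):
                p1 *= (motif[i][j] + alpha - 1) / (sum(motif[i]) + 4 * alpha - 4)
                p0 *= (background[j] + alpha0 - 1) / bg_denom
            # Sample z_k from its normalized conditional probability.
            z[k] = 1 if rng.random() < p1 / (p1 + p0) else 0
    return z


seqs = ["ACG"] * 5 + ["TTA", "GGC", "CAT"]
labels = gibbs_labels(seqs, seed=1)
```

The result is one random draw of the labels; in practice several sweeps would be averaged, or the labels tracked over time, rather than relying on the final state alone.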
