Seven Bridges Genomics, LLC

Short introduction to Bioinformatics

What are the Probabilistic Models?

Sequence Alignment

Pairwise Alignment

Multiple Sequence Alignment Models

What is Phylogenetics?

Building Phylogenetic Trees

Other Models

Contact Us

Introduction to Probabilistic Models for Bioinformatics

Igor Bogicevic (igor.bogicevic@sbgenomics.com)

July 3, 2011

Short introduction to Bioinformatics

- Bioinformatics is the application of statistics and computer science to
  molecular biology.
- Major research efforts in the field include sequence alignment, gene finding,
  genome assembly, drug design, drug discovery, protein structure alignment,
  protein structure prediction, prediction of gene expression and protein-protein
  interactions, genome-wide association studies, and the modeling of evolution.
- Given the enormous volume of sequence data now available, one of the biggest
  challenges is no longer producing the data but understanding it.

What are the Probabilistic Models?

- There are two basic definitions:
  - A statistical analysis tool that estimates, on the basis of past (historical)
    data, the probability of an event occurring again.
  - A system that simulates the object under consideration and produces
    different outcomes with different probabilities.
- A simple example: rolling a die.
- A more relevant example: the random sequence model for DNA.
- Biological sequences are strings over a finite alphabet of residues, most
  commonly either the four nucleotides or the twenty amino acids.

- Imagine that a residue $a$ occurs with probability $q_a$. If a protein or DNA
  sequence is denoted $x_1 \ldots x_n$, then the probability of the whole
  sequence is:

  $$q_{x_1} q_{x_2} \cdots q_{x_n} = \prod_{i=1}^{n} q_{x_i}$$
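
The random sequence model can be sketched in a few lines of Python. This is a minimal illustration, assuming residues are drawn independently; the function name `sequence_log_prob` and the uniform frequencies are hypothetical choices, not part of the slides. Log-probabilities are summed because a direct product underflows for long sequences.

```python
import math

def sequence_log_prob(seq, q):
    """Log-probability of `seq` when each residue a is drawn independently with probability q[a]."""
    return sum(math.log(q[residue]) for residue in seq)

# Hypothetical uniform nucleotide frequencies (an assumption; real genomes are rarely uniform).
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

log_p = sequence_log_prob("ACGT", uniform_dna)
prob = math.exp(log_p)  # the product q_A * q_C * q_G * q_T = 0.25 ** 4
```

The same function covers the die example: pass the six faces as the alphabet with probability 1/6 each.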

Sequence Alignment

- Sequence alignment is a way of arranging sequences of DNA, RNA, or protein
  to identify regions of similarity that may be a consequence of functional,
  structural, or evolutionary relationships between the sequences.
- A variety of computational approaches have been applied to the sequence
  alignment problem, e.g. dynamic programming, heuristic algorithms, and
  probabilistic methods.
- Common file formats accepted and produced by alignment tools include FASTA
  and GenBank.
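
As a small illustration of the FASTA format mentioned above, here is a minimal parser sketch. It assumes the usual layout (a header line starting with `>` followed by one or more sequence lines); the function name `parse_fasta` and the example records are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each record header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequences may wrap over several lines
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical two-record input; the second sequence is wrapped over two lines.
example = """>seq1 hypothetical record
ACACACTA
>seq2 hypothetical record
AGCAC
ACA
"""
records = parse_fasta(example)
```

For real data a maintained library parser is usually the safer choice; this sketch ignores format variants and does no validation.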

Pairwise Alignment

- Pairwise sequence alignment methods are used to find the best-matching
  piecewise (local) or global alignments of two query sequences.
- The three primary methods of producing pairwise alignments are dot-matrix
  methods, dynamic programming, and word methods.
- Needleman-Wunsch algorithm (global alignment)
- Smith-Waterman algorithm (local alignment)
- FASTA/BLAST algorithms (k-tuple heuristic methods, often combined with
  dynamic programming)
- Gap penalties: modeling the cost of a gap in matched sequences (linear, affine,
  etc.)
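
The global-alignment case and the two gap models can be sketched as follows. This is an illustrative sketch, not a reference implementation: the scoring values and the names `needleman_wunsch_score`, `linear_gap`, and `affine_gap` are assumptions, and the dynamic program uses the linear gap model only.

```python
def linear_gap(length, d=1):
    """Linear gap penalty: every gap position costs d."""
    return -d * length

def affine_gap(length, open_=2, extend=1):
    """Affine gap penalty: opening a gap costs open_, each further position costs extend."""
    return -(open_ + extend * (length - 1)) if length > 0 else 0

def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b under a linear gap penalty."""
    m, n = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[m][n]

score = needleman_wunsch_score("ACGT", "AGT")  # three matches and one gap
```

Unlike the local variant, the first row and column are filled with accumulated gap penalties and no cell is clamped at zero, so the whole of both sequences must be aligned.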

- Example: Smith-Waterman. A matrix $H$ is built as follows:

  $$H(i, 0) = 0, \quad 0 \le i \le m$$

  $$H(0, j) = 0, \quad 0 \le j \le n$$

  $$w(a_i, b_j) = \begin{cases} w(\text{match}) & \text{if } a_i = b_j \\ w(\text{mismatch}) & \text{if } a_i \ne b_j \end{cases}$$

  $$H(i, j) = \max \begin{cases} 0 \\ H(i-1, j-1) + w(a_i, b_j) & \text{Match/Mismatch} \\ H(i-1, j) + w(a_i, -) & \text{Deletion} \\ H(i, j-1) + w(-, b_j) & \text{Insertion} \end{cases}, \quad 1 \le i \le m,\ 1 \le j \le n$$
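
The recurrence translates almost directly into code. The sketch below fills the matrix only (no traceback); the function name `smith_waterman_matrix`, the default scores, and the input pair are illustrative assumptions, with a single `gap` value standing in for both `w(a_i, -)` and `w(-, b_j)`.

```python
def smith_waterman_matrix(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the Smith-Waterman matrix H for sequence a (rows) and sequence b (columns)."""
    m, n = len(a), len(b)
    # H(i, 0) = 0 and H(0, j) = 0: the first row and column stay at zero.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # restart: a local alignment may begin here
                H[i - 1][j - 1] + w,  # match/mismatch
                H[i - 1][j] + gap,    # deletion: a_i aligned against a gap
                H[i][j - 1] + gap,    # insertion: b_j aligned against a gap
            )
    return H

# Hypothetical input pair, chosen only to exercise the function.
H = smith_waterman_matrix("GATTACA", "TTAC")
best_score = max(max(row) for row in H)
```

The zero in the maximum is what makes the alignment local: a poorly scoring prefix is dropped instead of dragging the score negative.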

- Sequence 1 = ACACACTA, Sequence 2 = AGCACACA
- w(match) = +2
- w(a,-) = w(-,b) = w(mismatch) = -1

  Columns are labeled by Sequence 1 and rows by Sequence 2:

  $$
  H = \begin{pmatrix}
    & - & A & C & A & C & A & C & T & A \\
  - & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
  A & 0 & 2 & 1 & 2 & 1 & 2 & 1 & 0 & 2 \\
  G & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
  C & 0 & 0 & 3 & 2 & 3 & 2 & 3 & 2 & 1 \\
  A & 0 & 2 & 2 & 5 & 4 & 5 & 4 & 3 & 4 \\
  C & 0 & 1 & 4 & 4 & 7 & 6 & 7 & 6 & 5 \\
  A & 0 & 2 & 3 & 6 & 6 & 9 & 8 & 7 & 8 \\
  C & 0 & 1 & 4 & 5 & 8 & 8 & 11 & 10 & 9 \\
  A & 0 & 2 & 3 & 6 & 7 & 10 & 10 & 10 & 12
  \end{pmatrix}
  $$

- In the example, the highest value corresponds to the cell in position (8,8). The
  walk back corresponds to (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2), (2,1),
  (1,1), and (0,0).
- Sequence 1 = A-CACACTA, Sequence 2 = AGCACAC-A
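
The walk back can be reproduced in code. The sketch below is one way to do it, assuming the example's scores (+2 for a match, -1 for a mismatch or gap) and an arbitrary tie-breaking order (diagonal, then up, then left); the function name `smith_waterman` is hypothetical. Sequence 2 is passed first because it labels the rows of the matrix.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned a, aligned b); a indexes rows, b indexes columns."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + w, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Walk back from the highest cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        w = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + w:   # diagonal: a_i paired with b_j
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:   # up: a_i against a gap
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:                                # left: b_j against a gap
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

score, seq2_aligned, seq1_aligned = smith_waterman("AGCACACA", "ACACACTA")
# score == 12, seq1_aligned == "A-CACACTA", seq2_aligned == "AGCACAC-A"
```

In this example every step of the walk back has a single valid predecessor, so the result does not depend on the tie-breaking order.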

Multiple Sequence Alignment Models

- A multiple sequence alignment (MSA) is a sequence alignment of three or more
  biological sequences, commonly protein, DNA, or RNA.
- We usually build multiple alignments to find homologous sequences, which
  point to a shared evolutionary origin and can be used for further phylogenetic
  analysis.
- Progressive alignment methods: the MSA is built up through a succession of
  pairwise alignments.
- Hidden Markov models: the MSA is represented as a directed acyclic graph (DAG);
  the observed states are the individual alignment columns and the hidden states
  represent the presumed ancestral sequence.
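
To make the hidden/observed distinction concrete, here is a generic Viterbi decoder on a toy two-state HMM. This is not the HMM used for multiple alignment itself; the states (`AT-rich`, `GC-rich`) and all probabilities are hypothetical values chosen for illustration, and the function assumes a non-empty observation sequence.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a non-empty `obs`, computed in log space."""
    # V[t][s] = best log-probability of any state path that ends in s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model: two hidden states emitting nucleotides.
states = ["AT-rich", "GC-rich"]
start_p = {"AT-rich": 0.5, "GC-rich": 0.5}
trans_p = {
    "AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
    "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9},
}
emit_p = {
    "AT-rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC-rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}
path = viterbi("AAAAGGGG", states, start_p, trans_p, emit_p)
```

Only the nucleotides are observed; the decoder recovers the most probable sequence of hidden states that could have emitted them.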

What is Phylogenetics?

- Phylogenetics is the study of evolutionary relatedness among groups of organisms
  (e.g. species, populations), which is discovered through molecular sequencing
  data and morphological data matrices.
- Evolution is regarded as a branching process, whereby populations are altered
  over time and may speciate into separate branches, hybridize together, or
  terminate by extinction. This may be visualized in a phylogenetic tree.
- Ernst Haeckel's recapitulation theory ("ontogeny recapitulates phylogeny") is a
  hypothesis that in developing from embryo to adult, animals go through stages
  resembling or representing successive stages in the evolution of their remote
  ancestors.

Building Phylogenetic Trees

- Phylogenetic trees for a nontrivial number of input sequences are constructed
  using computational phylogenetics methods.
- A common method is to search for the maximum-likelihood tree, often within a
  Bayesian framework, applying an explicit model of evolution to phylogenetic
  tree estimation.
- Identifying the optimal tree using many of these techniques is NP-hard, so
  heuristic search and optimization methods are used in combination with
  tree-scoring functions to identify a reasonably good tree that fits the data.
- Trees do not necessarily represent the species' evolutionary history accurately,
  as the data on which they are based is noisy; the analysis can be confounded by
  horizontal gene transfer, hybridisation between species that were not nearest
  neighbors on the tree before hybridisation took place, convergent evolution, and
  conserved sequences.
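
A quick calculation shows why exhaustive tree search is impractical and heuristics are needed. The number of distinct unrooted, fully bifurcating trees on n labelled taxa is the double factorial (2n - 5)!! for n >= 3; the function name `count_unrooted_trees` below is a hypothetical helper for illustration.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees on n labelled taxa."""
    if n_taxa < 3:
        return 1  # one or two taxa admit a single trivial tree
    count = 1
    # (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
# 10 taxa already give 2,027,025 candidate trees
```

The count grows faster than exponentially, so scoring every candidate tree is feasible only for a handful of taxa.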

Other Models

- Transformational grammars (Chomsky hierarchy)
- RNA structure analysis models (RNA tends to conserve its base-pairing
  interactions rather than the sequence itself)
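
One classic RNA structure model, named here as an illustration and not taken from the slides, is Nussinov-style base-pair maximization. The sketch below counts the maximum number of nested Watson-Crick pairs; the function name `nussinov_max_pairs`, the `min_loop` parameter, and the restriction to A-U and G-C pairs are assumptions.

```python
def nussinov_max_pairs(rna, min_loop=0):
    """Maximum number of nested Watson-Crick base pairs in `rna`."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(rna)
    if n == 0:
        return 0
    # N[i][j] = maximum number of pairs within rna[i..j] (inclusive)
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])  # leave i or j unpaired
            if (rna[i], rna[j]) in pairs and j - i > min_loop:
                inner = N[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)       # pair i with j
            for k in range(i + 1, j):             # split into two independent parts
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]

max_pairs = nussinov_max_pairs("GGGAAAUCC")
```

Each pair links two positions that may be far apart in the sequence, which is the kind of long-range interaction a context-free grammar can express and a regular model cannot.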

Contact Us

- We are hiring!

Igor Bogicevic (igor.bogicevic@sbgenomics.com)

