Bioinformatics Bioinf8

Molecular evolution

Find the protein sequences of alpha, beta and zeta hemoglobins from human and mouse (select organism name: “homo | musculus”, description: “hemoglobin”, description: “alpha | beta | zeta”). Note: mouse has two beta hemoglobin subunits (major beta1 and minor beta2); choose both.
Save all sequences in a FASTA-format file (you can select multiple entries by ticking their check-boxes and clicking the save button; then select “output to: file” and “save with view: Fasta2Seqs”).
Edit the saved file so that the sequence description lines (those starting with “>”) contain only a short name of up to 5 letters (e.g. HHA for human hemoglobin alpha, MHB1 for mouse hemoglobin beta1, etc.).
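If you prefer not to edit the headers by hand, the renaming can be scripted. The following is only a minimal sketch: the function name, the rule list and the example headers are hypothetical, and the real description lines in your saved file will look different, so adapt the keywords to what you actually see.

```python
def rename_fasta_headers(fasta_text, rules):
    """Replace every '>' description line with a short name (at most 5 characters).

    rules is a list of (keywords, short_name) pairs. The first rule whose
    keywords all occur in the header (case-insensitive) is used.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].lower()
            for keywords, short_name in rules:
                if all(k.lower() in header for k in keywords):
                    if len(short_name) > 5:
                        raise ValueError("name longer than 5 characters: " + short_name)
                    line = ">" + short_name
                    break
            else:
                # No rule matched: stop rather than keep a long header silently.
                raise ValueError("no rule matches header: " + line)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical headers and shortened placeholder sequences, for illustration only.
example = (
    ">entry1 Hemoglobin subunit alpha Homo sapiens\nMVLS\n"
    ">entry2 Hemoglobin subunit beta-1 Mus musculus\nMVHL\n"
)
rules = [
    (("sapiens", "alpha"), "HHA"),
    (("musculus", "beta-1"), "MHB1"),
]
print(rename_fasta_headers(example, rules))
```

Put more specific rules first, because the first matching rule wins.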
Make a multiple alignment using a T-Coffee www server. Click on the link “clustalw_aln” to see the results. Keep the Firefox T-Coffee window open for now.
Import your multiple alignment into the Jalview editor (find the Jalview applet using e.g. Google): run the fourth applet, select “input from textbox” and copy-paste your alignment from the Firefox T-Coffee window (use Ctrl-C, Ctrl-V).
Edit the multiple alignment by removing columns containing gaps. Select “output to textbox” and choose FASTA format. Keep the window open.
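To see what removing gap columns does to the alignment, here is a small sketch of the same operation in Python. It is not what Jalview runs internally; the function names and the toy alignment are made up for illustration, and both “-” and “.” are assumed to mark gaps.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def remove_gap_columns(records, gap_chars="-."):
    """Drop every alignment column in which at least one sequence has a gap."""
    seqs = [seq for _, seq in records]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("sequences in an alignment must all have the same length")
    keep = [
        i for i, column in enumerate(zip(*seqs))
        if not any(c in gap_chars for c in column)
    ]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


# Toy alignment (not real hemoglobin data).
aligned = parse_fasta(">HHA\nMV-LS\n>MHA\nMVHLS\n>HHB\nM-HLT\n")
for name, seq in remove_gap_columns(aligned):
    print(">" + name)
    print(seq)
```

Columns 2 and 3 of the toy alignment contain a gap in at least one sequence, so only three columns remain.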
Use a PHYLIP www service to calculate a parsimony tree, e.g.:

Copy-paste your FASTA alignment and calculate a parsimony tree with bootstrapping (select “Bootstrap options”, check “Perform a bootstrap”, use 10 replicates, check “compute a consensus”).
See the results. Your 10 bootstrap trees are in “outfile”, the consensus tree is in “outfile.consense”. Does the consensus tree place proteins on correct branches?
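Each of the 10 bootstrap trees is built from a resampled copy of your alignment: columns are drawn at random, with replacement, until the copy has as many columns as the original. The sketch below shows only this resampling step, under the assumption that the alignment is held as a dictionary of equal-length strings; the function names, the seed and the toy data are illustrative, and the server's own implementation is not reproduced here.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Return one bootstrap replicate: columns sampled with replacement."""
    n_columns = len(next(iter(seqs.values())))
    # The same column indices are used for every sequence, so each
    # resampled column is an intact column of the original alignment.
    chosen = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[i] for i in chosen) for name, seq in seqs.items()}


def bootstrap_replicates(seqs, replicates=10, seed=1):
    """Return a list of bootstrap replicates of the alignment."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    return [bootstrap_alignment(seqs, rng) for _ in range(replicates)]


# Toy gap-free alignment (not real hemoglobin data).
alignment = {"HHA": "MLSAD", "MHA": "MLSGE", "HHB": "MLTPE"}
for replicate in bootstrap_replicates(alignment, replicates=10):
    print(replicate)
```

A tree is then calculated for every replicate, and the consensus tree summarises the groupings that the replicate trees share.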
Use a different method, Neighbour-Joining (N-J), to calculate a consensus tree. Use the server:

First create the protdist matrices for N-J. Copy-paste your FASTA alignment and calculate the protein distance matrices (select “Bootstrap options”, perform a bootstrap before analysis, use 10 replicates).
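A distance matrix holds one number for every pair of sequences. To illustrate the idea, the sketch below uses the simplest possible measure, the fraction of differing positions (p-distance). This is a deliberate simplification and not the calculation done by protdist, which applies an amino-acid substitution model; the function names and the toy alignment are illustrative.

```python
def p_distance(a, b):
    """Fraction of alignment positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def distance_matrix(seqs):
    """Return (names, matrix) with matrix[i][j] = p-distance between i and j."""
    names = list(seqs)
    matrix = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
    return names, matrix


# Toy gap-free alignment (not real hemoglobin data).
alignment = {"HHA": "MLSA", "MHA": "MLTA", "HHB": "KLTG"}
names, matrix = distance_matrix(alignment)
for name, row in zip(names, matrix):
    print(name.ljust(6), " ".join(f"{d:.2f}" for d in row))
```

With bootstrapping switched on, one such matrix is produced for each of the 10 resampled alignments.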
When the matrices are completed, choose the “neighbor” program. Select “bootstrap options” and check “analyse multiple data sets”. Choose 10 data sets and compute a consensus tree.
Look at your results in the “outfile.consense” file. Compare the two methods. Are the results identical? Why?
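Two trees drawn differently can still have the same branching pattern, so comparing them by eye is error-prone. If you can obtain both consensus trees as Newick strings (an assumption; the exercise itself only looks at the text output), the sketch below compares their unrooted topologies by listing the groupings (splits) that each tree contains. The function names and the example trees with their numbers are made up for illustration.

```python
import re


def newick_splits(newick):
    """Return the non-trivial splits (bipartitions of the leaves) of a tree."""
    tokens = re.findall(r"[(),;]|:[^(),;]*|[^(),;:\s]+", newick)
    stack = [[]]    # leaf sets collected for each currently open node
    clades = []
    previous = None
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            clade = frozenset().union(*stack.pop())
            clades.append(clade)
            stack[-1].append(clade)
        elif token in (",", ";") or token.startswith(":"):
            pass    # separators and branch lengths carry no topology
        elif previous != ")":
            # A label directly after ')' names an internal node; skip those.
            stack[-1].append(frozenset([token]))
        previous = token
    leaves = frozenset().union(*stack[0])
    reference = min(leaves)
    splits = set()
    for clade in clades:
        # Describe each split by the side that lacks the reference leaf,
        # so the result does not depend on where the tree is rooted.
        side = leaves - clade if reference in clade else clade
        if 2 <= len(side) <= len(leaves) - 2:
            splits.add(side)
    return splits


def same_topology(tree_a, tree_b):
    """True if both trees contain exactly the same splits."""
    return newick_splits(tree_a) == newick_splits(tree_b)


# Hypothetical trees; the numbers after ':' are invented.
parsimony_tree = "((HHA:1,MHA:1):10,(HHB:1,MHB1:1):8,HHZ:1);"
nj_tree = "(HHZ,(MHB1,HHB),(MHA,HHA));"
print(same_topology(parsimony_tree, nj_tree))
```

Splits that appear in only one of the two trees show exactly where the parsimony and N-J results disagree.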

Bioinformatics 2006/07 8/2007