Edwards lab notes

From MolEvol
Revision as of 10:08, 5 August 2014 by Ejmctavish (talk | contribs) (Phybase, an R module for estimating, analyzing and simulating species trees)

Tutorial here

To start the lab:

log into the cluster

   module load bioware

copy the lab exercise files to your home directory

  cp -r /class/molevol-shared/edwards_lab .
  cd edwards_lab

for each section of the tutorial, cd in the appropriate directory.


-Liu (2008) Bioinformatics, 24:2542-2543
Liu & Pearl (2007) Syst Bio 56

a) Input file format – modified mrBayes file BEST block:

partition Genes = 30:
set partition=Genes;
taxset species1=P_acuticauda;
taxset species2=P_hecki;
taxset species3=P_cincta;
taxset species4=T_guttata;
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
unlink to
pology=(all) brlens=(all) statefreq=(all) genemu=(all);
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;

Log on to the cluster

Open BET

   Best> execute finchbest-star-steac.nex

When analysis is done you can type:

   Best> execute  finch-best-star-steac.nex.sumt

Examine output files (similar to mrBayes): Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p, Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files

MP-EST maximum (pseudo)likelihood estimation of species trees

Liu et al.(2010) BMC Evolutionary Biology,10:302

You need: 1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all gene trees must have outgroup; branch lengths not necessary)

2. control file: (“Maluridae_control.txt”) contains information on where the gene trees are, how the gene tree OTUs map onto species, etc. Maluridae_gene.trees

18 26
Kalkadoon_Grasswren 1 Kalkadoon_Grasswren
Grey_Grasswren 1 Grey_Grasswren
Carpentarian_Grasswren 1 Carpentarian_Grasswren
White_shouldered_fairy_wren 1 W
White_winged_Fairy_Wren 1 White_winged_Fairy_Wren
White_throated_Gerygone 1 White_throated_Gerygone

Run mp-est

  mpest Maluridae_control.txt 

Output tree search (Maluridae_gene.trees.tre) will be in Nexus format; examine last (mp-est) tree in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is 9.

Phybase, an R module for estimating, analyzing and simulating species trees

a. Making a STAR tree: Input file: rooted or unrooted gene trees in phylip or nexus format. But if trees are unrooted, you must first root them to make a STAR tree (can be done in R).

  cd Phybase

call up R (type ‘R’)

To install phybase in R:

  >install.packages("../molevol-software/phybase_a.tar.gz",repos = NULL, type="source")

Respond Y to the questions it will ask about installing locally.


   > setwd("/Scott/MBL/MBL_2014/Lab")
  1. sets working directory -makes accessing files easier, but should not be necessary on the cluster, if you opened R from the PHYBASE directory.
   > wrentrees<-read.tree.string(file="Maluridae.trees",format="phylip")
  1. variable genetrees has 3 values: vector of trees; species names; and TRUE or FALSE for rooted or note.
    1. extracts trees from the file and assigns them to variable “genetreevector”
   > wrentaxanames<-species.name(wrengenetrees[1])
    1. gets gene tree names from the first gene tree;make sure this gene tree has all taxa in it.
  > wrenspnames<-species.name(wrengenetrees[1])
    1. assigns same names to species tree as in first gene tree

Now, link names in gene tree with names in species tree via a matrix called “species.structure”

  > wrentreematrix<-matrix(0,26,26)
    1. a matrix for 26 species, filled with 0s

> d iag(wrentreematrix)< - 1

  1. 1s on the diagonal indicate a 1

- to - 1 correspondence of gene and species names

    1. now, make a star tree:

> wrenstartree< - star.sptree(wrengenetrees,speciesname=wrenspnames,taxaname=wrentaxanames,species.structure =wrentreematrix,outgro up="White_throated_Gerygone",method="nj") Now write the STAR tree to a nexus of newick file: > write.tree.string(wrenstartree, file="wrenstartree.nex") > write.tree.string(wrenstartree, format="phylip",file="wrenstartree.phy") Representing species trees a s matrices and simulating gene trees will wait for another time. The Phybase manual has useful instructions for these two topics. b. The multilocus bootstrap i.

4 ii. type: library(phybase) Input file for DNA sequence data

same as for BEST (Nexu

s/mrbayes file with BEST block)

  1. read in a sequence file

> wrenfile< - "Maluridae_seqs.nex"

  1. assign DNA sequences in

that file to a variable " wrenfile " > wrendata< - read.dna.seq(wrenfile) > wrensequence< - wrendata$seq

  1. assigns sequences in wrensequence to the

file “wrendata” > wrengenes< - wrendata$gene

  1. assign gen

e partitions to variable " wrengenes " > wrennames< - wrendata$name

  1. get taxa names

– these are the OTUs in the gene trees > write.dna(sequence=wrensequence, file= "wrenseqs.phy", format="phylip", name=w rennames)

  1. can export DNA sequence in nexus of phylip format
  2. bootstrap the data set

> bootstrap.mulgene(sequence=wrensequence, gene=wrensgene, name=wrennam es, boot=100, outfile="wrenboot_ seqs_100.txt") Can look at bootstrapped data set using nano or oth er text editor (nano wrenboot_ seqs_100.txt ) Multilocus bootstrap replicates can be used for many species tree methods, such as STAR, MDC, MP - EST, STEM, and many other species tree methods. References: Castillo - Ramírez, S., L. Liu, D. Pearl, and Edwards m S. V. 2010. Bayesian estimation of species trees: a practical guide to optimal sampling and analysis, Pages 15 - 33 in L. L. Knowles, and L. S. Kubatko, eds. Estimating Species Trees: Practical and Theoretical Aspects. New Jersey, Wiley - Blackwell. Edwards, S. V., L. Liu, and D. K. Pearl. 2007. High - resolution species trees without concatenation. Proceedings of the National Academy of Sciences (USA) 104:5936 - 5941. Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 6 3:1 - 19. Kubatko, LS. 2009. Identifying Hybridization Events in the Presence of Coalescence via Model Selection, Systematic Biology 58(5): 478 - 488 . Liu, L. (2008). BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics (Oxford, England) , 24 (21), 2542 – 3. doi:10.1093/bioinformatics/btn484 Liu, L., L . Yu, and S. Edwards. 2010. A maximum pseudo - likelihood approach for estimating species trees under the coalescent model. BMC Evolutionary Biology 10:302. Liu, L., L. Yu, D. K. Pearl, and S. V. Edwards. 2009. Estimating species phylogenies using coalescence times among sequences. Syst Biol 58:468 - 477. Liu, L., Yu L., & Pearl D. K. 2009. Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology 60(1): 95 - 106 Liu , L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol 53:320 - 328. Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple - all ele DNA sequence data. Evolution 62:2080 - 2091. Liu, L., and D. K. Pearl. 2007. Species trees f rom gene trees: reconstructing B ayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol 56:504 - 514.