Difference between revisions of "Edwards lab notes"

From MolEvol
Line 40: Line 40:
taxset species4=T_guttata;
taxset species4=T_guttata;
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
unlink to
unlink topology=(all) brlens=(all) statefreq=(all) genemu=(all);
pology=(all) brlens=(all) statefreq=(all) genemu=(all);
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;

Revision as of 19:50, 27 July 2015

Alternate pdf of Tutorial here

To start the lab:

log into the cluster

   module load bioware

copy the lab exercise files to your home directory

  wget https://molevol.mbl.edu/images/5/53/Edwards_lab.zip
  unzip Edwards_lab.zip

for each section of the tutorial, cd in the appropriate directory.


Liu (2008) Bioinformatics, 24:2542-2543 Liu & Pearl (2007) Syst Bio 56:504-514

Files are in the BEST directory

   cd BEST

Examine the input file format–It's a modified MrBayes file

   nano finch-best-star-steac.nex

BEST block:

partition Genes = 30:
set partition=Genes;
taxset species1=P_acuticauda;
taxset species2=P_hecki;
taxset species3=P_cincta;
taxset species4=T_guttata;
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
unlink topology=(all) brlens=(all) statefreq=(all) genemu=(all);
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;

Log on to the cluster


   Best> execute finch-best-star-steac.nex

When analysis is done you can type:

   Best> execute  finch-best-star-steac.nex.sumt
   Best> quit

Examine output files (similar to mrBayes): Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p, Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files

MP-EST maximum (pseudo)likelihood estimation of species trees

Liu et al.(2010) BMC Evolutionary Biology,10:302

You need: 1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all gene trees must have outgroup; branch lengths not necessary) 2. control file: (“Maluridae_control.txt”)contains information on where the gene treesare, how the gene tree OTUs map onto species, etc.

These files are in the MPEST directory

   cd ../MPEST

Take a look at the rooted gene trees:

   nano Maluridae_gene.trees


18 26
Kalkadoon_Grasswren 1 Kalkadoon_Grasswren
Grey_Grasswren 1 Grey_Grasswren
Carpentarian_Grasswren 1 Carpentarian_Grasswren
White_shouldered_fairy_wren 1 White_shouldered_fairy_wren
White_winged_Fairy_Wren 1 White_winged_Fairy_Wren
White_throated_Gerygone 1 White_throated_Gerygone

Run mp-est

  mpest Maluridae_control.txt 

Output tree search (Maluridae_gene.trees.tre) will be in Nexus format; examine last (mp-est) tree in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is 9.

Phybase, an R module for estimating, analyzing and simulating species trees

a. Making a STAR tree: Input file: rooted or unrooted gene trees in phylip or nexus format. But if trees are unrooted, you must first root them to make a STAR tree (can be done in R).

  cd ../Phybase
  cd STAR

call up R (type ‘R’)

To install phybase in R:


All commands after the '>' should be entered into R.

  > install.packages("/class/molevol-software/phybase_1.4.tar.gz",repos = NULL, type="source")

Respond Y to the questions it will ask about installing locally.

  > library(phybase)

> setwd("/Scott/MBL/MBL_2014/Lab") sets working directory -makes accessing files easier, but should not be necessary on the cluster, if you opened R from the PHYBASE directory.

   > wrentrees <- read.tree.string(file="Maluridae.trees",format="phylip")

variable genetrees has 3 values: vector of trees; species names; and TRUE or FALSE for rooted or note.

   > wrengenetrees <- wrentrees$tree

extracts trees from the file and assigns them to variable “genetreevector”

   > wrentaxanames <- species.name(wrengenetrees[1])

gets gene tree names from the first gene tree;make sure this gene tree has all taxa in it.

   > wrenspnames <- species.name(wrengenetrees[1])

assigns same names to species tree as in first gene tree Now, link names in gene tree with names in species tree via a matrix called “species.structure”

   > wrentreematrix <- matrix(0,26,26)

a matrix for 26 species, filled with 0s

   > diag(wrentreematrix) <- 1

1s on the diagonal indicate a 1-to-1 correspondence of gene and speciesnames now, make a star tree:

   > wrenstartree <- star.sptree(wrengenetrees,speciesname=wrenspnames,taxaname=wrentaxanames,species.structure=wrentreematrix,outgroup="White_throated_Gerygone",method="nj")

Now write the STAR tree to a nexus of newick file:

   > write.tree.string(wrenstartree, file="wrenstartree.nex")
   > write.tree.string(wrenstartree, format="phylip",file="wrenstartree.phy")

Quit R

   > q()

Representing species trees as matrices and simulating gene trees will wait for another time. The Phybase manual has useful instructions for these two topics.

The multilocus bootstrap

Change directories into the Phybase exercise,

  cd ../multilocus_bootstrap
or set it with the setwd() command once you've started R.
  > library(phybase)

Input file for DNA sequence data: same as for BEST (Nexus/mrbayes file with BEST block) read in a sequence file

   > wrenfile <- "Maluridae_seqs.nex"

assign DNA sequences in that file to a variable "wrenfile"

   > wrendata <- read.dna.seq(wrenfile)
   > wrensequence <- wrendata$seq

assigns sequences in wrensequence to the file “wrendata”

   > wrengenes <- wrendata$gene

assign gene partitions to variable "wrengenes"

   > wrennames <- wrendata$name

get taxa names–these are the OTUs in the gene trees

   > write.dna(sequence=wrensequence, file= "wrenseqs.phy", format="phylip", name=wrennames)

can export DNA sequence in nexus of phylip format bootstrap the data set

   > bootstrap.mulgene(sequence=wrensequence, gene=wrengenes, name=wrennames, boot=100,outfile="wrenboot_seqs_100.txt")

To quit R

   > q()

Can look at bootstrapped data set using nano or other text editor

   nano wrenboot_seqs_100.txt

Multilocus bootstrap replicates can be used for many species tree methods, such as STAR, MDC, MP-EST,STEM, and many other species tree methods.


Castillo-Ramírez, S., L. Liu, D. Pearl, and Edwards m S. V. 2010. Bayesian estimation of species trees: a practical guide to optimal sampling and analysis, Pages 15-33 in L. L. Knowles, and L. S. Kubatko, eds. Estimating Species Trees: Practical and Theoretical Aspects. New Jersey, Wiley-Blackwell.

Edwards,S. V., L. Liu, and D. K. Pearl. 2007. High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences (USA) 104:5936-5941.

Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 6 3:1-19.

Kubatko, LS. 2009. Identifying Hybridization Events in the Presence of Coalescence via Model Selection, Systematic Biology 58(5): 478-488.

Liu, L. (2008). BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics (Oxford, England),24(21), 2542–3. doi:10.1093/bioinformatics/btn484

Liu, L., L. Yu, and S. Edwards. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evolutionary Biology 10:302.

Liu, L., L. Yu, D. K. Pearl, and S. V. Edwards. 2009. Estimating species phylogenies using coalescence times among sequences. Syst Biol 58:468-477.

Liu, L., Yu L., & Pearl D. K. 2009. Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology 60(1): 95-106

Liu, L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol 53:320-328.

Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple-allele DNA sequence data. Evolution 62:2080-2091.

Liu, L., and D. K. Pearl. 2007. Species trees from gene trees: reconstructing B ayesian posterior distributions of aspecies phylogeny using estimated gene tree distributions. Syst Biol 56:504-514.