Edwards lab notes
To start the lab:
log into the cluster
module load bioware
copy the lab exercise files to your home directory
cp -r /class/molevol-shared/edwards_lab . cd edwards_lab
for each section of the tutorial, cd in the appropriate directory.
Liu (2008) Bioinformatics, 24:2542-2543 Liu & Pearl (2007) Syst Bio 56:504-514
Examine the input file format – It's a modified MrBayes file BEST block:
partition Genes = 30: locus097,locus098,locus118,locus119,locus120,locus122,locus130,locus104,locus129,locus143,locus146,locus111,locus13 5,locus148,locus182,locus200,locus209,locusB098,locus184,locus185,locus186,locus187,locus188,locus192,locu s193,locu s195,locus198,locus199,locusB200,locus103; set partition=Genes; taxset species1=P_acuticauda; taxset species2=P_hecki; taxset species3=P_cincta; taxset species4=T_guttata; prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1; unlink to pology=(all) brlens=(all) statefreq=(all) genemu=(all); mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1; quit; end;
Log on to the cluster
Best> execute finch-best-star-steac.nex
When analysis is done you can type:
Best> execute finch-best-star-steac.nex.sumt
Examine output files (similar to mrBayes): Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p, Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files
MP-EST maximum (pseudo)likelihood estimation of species trees
Liu et al.(2010) BMC Evolutionary Biology,10:302
You need: 1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all gene trees must have outgroup; branch lengths not necessary) 2. control file: (“Maluridae_control.txt”)contains information on where the gene treesare, how the gene tree OTUs map onto species, etc.
0 3 18 26 Kalkadoon_Grasswren 1 Kalkadoon_Grasswren Grey_Grasswren 1 Grey_Grasswren Carpentarian_Grasswren 1 Carpentarian_Grasswren ... White_shouldered_fairy_wren 1 W hite_shouldered_fairy_wren White_winged_Fairy_Wren 1 White_winged_Fairy_Wren White_throated_Gerygone 1 White_throated_Gerygone 0 (((((((Kalkadoon_Grasswren,Dusky_Grasswren),Black_Grasswren),Eyrean_Grasswren),Thick_billed_Grasswren),(Grey_Grasswren,( Carpent arian_Grasswren,Striated_Grasswren)),Short_tailed_Grasswren),((((Lovely_Fairy_wren,Red_winged_Fairy_wren,Blue_breast ed_Fairy_wren,Variegated_Fairy_Wren),((((Superb_Fairy_wren,Splendid_Fairy_wren),(Red_backed_Fairy_wren,White_shouldered_ fairy_wren)),Purple_ crowned_Fairy_wren,White_winged_Fairy_Wren),Emperor_Fairy_Wren)),(Southern_Emu_wren,(Mallee_emu_ wren,Red_crowned_Emu_Wren))),(Broad_billed_Fairy_Wren,Orange_crowned_Fairy_wren))),White_throated_Gerygone);
Output tree search (Maluridae_gene.trees.tre) will be in Nexus format; examine last (mp-est) tree in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is 9.
Phybase, an R module for estimating, analyzing and simulating species trees
a. Making a STAR tree: Input file: rooted or unrooted gene trees in phylip or nexus format. But if trees are unrooted, you must first root them to make a STAR tree (can be done in R).
call up R (type ‘R’)
To install phybase in R:
All commands after the '>' should be entered into R.
>install.packages("../molevol-software/phybase_a.tar.gz",repos = NULL, type="source")
Respond Y to the questions it will ask about installing locally.
sets working directory -makes accessing files easier, but should not be necessary on the cluster, if you opened R from the PHYBASE directory.
variable genetrees has 3 values: vector of trees; species names; and TRUE or FALSE for rooted or note.
extracts trees from the file and assigns them to variable “genetreevector”
gets gene tree names from the first gene tree;make sure this gene tree has all taxa in it.
assigns same names to species tree as in first gene tree Now, link names in gene tree with names in species tree via a matrix called “species.structure”
a matrix for 26 species, filled with 0s
1s on the diagonal indicate a 1-to-1 correspondence of gene and speciesnames now, make a star tree:
Now write the STAR tree to a nexus of newick file:
> write.tree.string(wrenstartree, file="wrenstartree.nex") > write.tree.string(wrenstartree, format="phylip",file="wrenstartree.phy")
Representing species trees as matrices and simulating gene trees will wait for another time. The Phybase manual has useful instructions for these two topics.
The multilocus bootstrap
R > library(phybase)
Input file for DNA sequence data: same as for BEST (Nexus/mrbayes file with BEST block) read in a sequence file
assign DNA sequences in that file to a variable "wrenfile"
> wrendata<-read.dna.seq(wrenfile) > wrensequence<-wrendata$seq
assigns sequences in wrensequence to the file “wrendata”
assign gene partitions to variable "wrengenes"
get taxa names–these are the OTUs in the gene trees
> write.dna(sequence=wrensequence, file= "wrenseqs.phy", format="phylip", name=wrennames)
can export DNA sequence in nexus of phylip format bootstrap the data set
> bootstrap.mulgene(sequence=wrensequence, gene=wrensgene, name=wrennames, boot=100,outfile="wrenboot_seqs_100.txt")
To quit R
Can look at bootstrapped data set using nano or other text editor
Multilocus bootstrap replicates can be used for many species tree methods, such as STAR, MDC, MP-EST,STEM, and many other species tree methods.
Castillo-Ramírez, S., L. Liu, D. Pearl, and Edwards m S. V. 2010. Bayesian estimation of species trees: a practical guide to optimal sampling and analysis, Pages 15-33 in L. L. Knowles, and L. S. Kubatko, eds. Estimating Species Trees: Practical and Theoretical Aspects. New Jersey, Wiley-Blackwell.
Edwards,S. V., L. Liu, and D. K. Pearl. 2007. High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences (USA) 104:5936-5941.
Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 6 3:1-19.
Kubatko, LS. 2009. Identifying Hybridization Events in the Presence of Coalescence via Model Selection, Systematic Biology 58(5): 478-488.
Liu, L. (2008). BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics (Oxford, England),24(21), 2542–3. doi:10.1093/bioinformatics/btn484
Liu, L., L. Yu, and S. Edwards. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evolutionary Biology 10:302.
Liu, L., L. Yu, D. K. Pearl, and S. V. Edwards. 2009. Estimating species phylogenies using coalescence times among sequences. Syst Biol 58:468-477.
Liu, L., Yu L., & Pearl D. K. 2009. Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology 60(1): 95-106
Liu, L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol 53:320-328.
Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple-allele DNA sequence data. Evolution 62:2080-2091.
Liu, L., and D. K. Pearl. 2007. Species trees from gene trees: reconstructing B ayesian posterior distributions of aspecies phylogeny using estimated gene tree distributions. Syst Biol 56:504-514.