Difference between revisions of "Edwards lab notes"

From MolEvol
(MP-EST maximum (pseudo)likelihood estimation of species trees)
Line 11: Line 11:
   wget https://molevol.mbl.edu/images/1/1f/Edwards_lab_2015.zip
   wget https://molevol.mbl.edu/images/1/1f/Edwards_lab_2015.zip
   unzip Edwards_lab.zip
   unzip Edwards_lab_2015.zip
for each section of the tutorial, cd in the appropriate directory.
for each section of the tutorial, cd in the appropriate directory.
'''Liu''' (2008) Bioinformatics, 24:2542-2543
'''Liu & Pearl''' (2007) Syst Bio 56:504-514
Files are in the BEST directory
    cd BEST
Examine the input file format–It's a modified MrBayes file
    nano finch-best-star-steac.nex
BEST block:
partition Genes = 30:
set partition=Genes;
taxset species1=P_acuticauda;
taxset species2=P_hecki;
taxset species3=P_cincta;
taxset species4=T_guttata;
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
unlink topology=(all) brlens=(all) statefreq=(all) genemu=(all);
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;
Log on to the cluster
    Best> execute finch-best-star-steac.nex
When analysis is done you can type:
    Best> execute  finch-best-star-steac.nex.sumt
    Best> quit
Examine output files (similar to mrBayes):
Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p,
Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files
==MP-EST maximum (pseudo)likelihood estimation of species trees ==
Liu et al.(2010) BMC Evolutionary Biology,10:302
You need:
1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all gene trees must have outgroup; branch lengths not necessary)
2. control file: (“Maluridae_control.txt”)contains information on where the gene treesare, how the gene tree OTUs map onto species, etc.
These files are in the MPEST directory
    cd ../MP-EST
Take a look at the rooted gene trees:
    nano Maluridae_gene.trees
18 26
Kalkadoon_Grasswren 1 Kalkadoon_Grasswren
Grey_Grasswren 1 Grey_Grasswren
Carpentarian_Grasswren 1 Carpentarian_Grasswren
White_shouldered_fairy_wren 1 White_shouldered_fairy_wren
White_winged_Fairy_Wren 1 White_winged_Fairy_Wren
White_throated_Gerygone 1 White_throated_Gerygone
Run mp-est
  mpest Maluridae_control.txt
Output tree
will be in Nexus format; examine last (mp-est) tree
in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is
==Phybase, an R module for estimating, analyzing and simulating species trees==
Making a STAR tree:
Input file: rooted or unrooted gene trees in phylip or nexus format. But if trees are unrooted, you must first
root them to make a STAR tree (can be done in R).
  cd ../Phybase
  cd STAR
call up R (type ‘R’)
To install phybase in R:
To install phybase in R:
Line 125: Line 20:
All commands after the '>' should be entered into R.
All commands after the '>' should be entered into R.
   > install.packages("/class/molevol-software/phybase_1.4.tar.gz",repos = NULL, type="source")
   > install.packages("/class/molevol-software/phybase_1.4.tar.gz",repos = NULL, type="source")
Respond Y to the questions it will ask about installing locally.
  > library(Phybase)
> setwd("/class/scott/Phybase/STAR")
sets working directory -makes accessing files easier, but should not be necessary on the cluster, if you opened R from the PHYBASE directory.
    > wrentrees <- read.tree.string(file="Maluridae.trees",format="phylip")
variable genetrees has 3 values: vector of trees; species names; and TRUE or FALSE for rooted or note.
    > wrengenetrees <- wrentrees$tree
extracts trees from the file and assigns them to variable “genetreevector”
    > wrentaxanames <- species.name(wrengenetrees[1])
gets gene tree names from the first gene tree;make sure this gene tree has all taxa in it.
    > wrenspnames <- species.name(wrengenetrees[1])
assigns same names to species tree as in first gene tree
Now, link names in gene tree with names in species tree via a matrix called “species.structure”
    > wrentreematrix <- matrix(0,26,26)
a matrix for 26 species, filled with 0s
    > diag(wrentreematrix) <- 1
1s on the diagonal indicate a 1-to-1 correspondence of gene and speciesnames
now, make a star tree:
    > wrenstartree <- star.sptree(wrengenetrees,speciesname=wrenspnames,taxaname=wrentaxanames,species.structure=wrentreematrix,outgroup="White_throated_Gerygone",method="nj")
Now write the STAR tree to a nexus of newick file:
    > write.tree.string(wrenstartree, file="wrenstartree.nex")
    > write.tree.string(wrenstartree, format="phylip",file="wrenstartree.phy")
Quit R
    > q()
Representing species trees as matrices and simulating gene trees will wait for another time. The Phybase manual has useful instructions for these two topics.
=The multilocus bootstrap=
Change directories into the multilocus bootstrap exercise,
  cd ../multilocus_bootstrap
or set it with the setwd() command once you've started R.
  > library(phybase)
Input file
for DNA sequence data: same as for BEST (Nexus/mrbayes file with BEST block)
read in a sequence file
    > wrenfile <- "Maluridae_seqs.nex"
assign DNA sequences in that file to a variable "wrenfile"
    > wrendata <- read.dna.seq(wrenfile)
    > wrensequence <- wrendata$seq
assigns sequences in wrensequence to the file “wrendata”
    > wrengenes <- wrendata$gene
assign gene partitions to variable "wrengenes"
    > wrennames <- wrendata$name
get taxa names–these are the OTUs in the gene trees
    > write.dna(sequence=wrensequence, file= "wrenseqs.phy", format="phylip", name=wrennames)
can export DNA sequence in nexus or phylip format
bootstrap the data set
    > bootstrap.mulgene(sequence=wrensequence, gene=wrengenes, name=wrennames, boot=100,outfile="wrenboot_seqs_100.txt")
To quit R
    > q()
Can look at bootstrapped data set using nano or other text editor
    nano wrenboot_seqs_100.txt
Multilocus bootstrap replicates can be used for many species tree methods, such as STAR, MDC, MP-EST,STEM, and many other species tree methods.
Castillo-Ramírez, S., L. Liu, D. Pearl, and Edwards
m S. V. 2010. Bayesian estimation of species trees: a practical
guide to optimal sampling and analysis, Pages 15-33 in L. L. Knowles, and L. S. Kubatko, eds. Estimating
Species Trees: Practical and Theoretical Aspects. New Jersey, Wiley-Blackwell.
Edwards,S. V., L. Liu, and D. K. Pearl. 2007. High-resolution species trees without concatenation. Proceedings of
the National Academy of Sciences (USA) 104:5936-5941.
Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 6
Kubatko, LS. 2009. Identifying Hybridization Events in the Presence of Coalescence
via Model Selection, Systematic Biology 58(5): 478-488.
Liu, L. (2008). BEST: Bayesian estimation of species trees under the
coalescent model. Bioinformatics (Oxford, England),24(21), 2542–3. doi:10.1093/bioinformatics/btn484
Liu, L., L. Yu, and S. Edwards. 2010. A maximum pseudo-likelihood approach for estimating species trees under
the coalescent model. BMC Evolutionary Biology 10:302.
Liu, L., L. Yu, D. K. Pearl, and S.
V. Edwards. 2009. Estimating species phylogenies using coalescence times
among sequences. Syst Biol 58:468-477.
Liu, L., Yu L., & Pearl D. K. 2009. Maximum tree: a consistent estimator of the species tree. Journal of
Mathematical Biology 60(1): 95-106
Liu, L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic
trees. Mol Phylogenet Evol 53:320-328.
Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple-allele DNA sequence data. Evolution 62:2080-2091.
Liu, L., and D. K. Pearl. 2007. Species trees from gene trees: reconstructing B
ayesian posterior distributions of aspecies phylogeny using estimated gene tree distributions. Syst Biol 56:504-514.

Revision as of 13:57, 28 July 2015

Alternate pdf of Tutorial here

To start the lab:

log into the cluster

   module load bioware

copy the lab exercise files to your home directory

  wget https://molevol.mbl.edu/images/1/1f/Edwards_lab_2015.zip
  unzip Edwards_lab_2015.zip

for each section of the tutorial, cd in the appropriate directory.

To install phybase in R:


All commands after the '>' should be entered into R.

  > install.packages("/class/molevol-software/phybase_1.4.tar.gz",repos = NULL, type="source")