Difference between revisions of "Edwards lab notes"

From MolEvol
Line 45: Line 45:
 
end;
 
end;
 
</pre>
 
</pre>
Type:
+
 
username
+
 
@class
+
 
-
+
Log on to the cluster
01 ~]$
+
 
best
+
Open BET
Best>
+
    best
execute
+
 
finch
+
    Best> execute finchbest-star-steac.nex
-
+
 
best
 
-
 
star
 
-
 
steac
 
.nex
 
 
When analysis is done you can type:
 
When analysis is done you can type:
Best>
+
 
execute
+
    Best> execute finch-best-star-steac.nex.sumt
Finch_BEST.ne
+
 
x.sumt
+
Examine output files (similar to mrBayes):  
Examine output files (similar to mrBayes): Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p,
+
Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p,
 
Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files
 
Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files
2
+
 
. MP
+
 
-
+
 
EST
+
==MP-EST maximum (pseudo)likelihood estimation of species trees ==
-
+
 
maximum (pseudo)likelihood estimation of species trees
+
Liu et al.(2010) BMC Evolutionary Biology,10:302
+
 
Liu et al.
 
(2010)
 
BMC Evolut
 
ionary Biology
 
,
 
10
 
:302
 
 
You need:
 
You need:
 
1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all
 
1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all
 
gene trees must have outgroup; branch lengths not necessary)
 
gene trees must have outgroup; branch lengths not necessary)
 +
 
2. control file:
 
2. control file:
(“Maluridae_control
+
(“Maluridae_control.txt”)
.txt”)
+
contains information on where the gene trees
contains information on wh
 
ere the gene trees
 
 
are, how the gene tree OTUs map onto species, etc.
 
are, how the gene tree OTUs map onto species, etc.
 
Maluridae_gene.trees
 
Maluridae_gene.trees
 +
<pre>
 
0
 
0
 
3
 
3
Line 111: Line 98:
 
crowned_Fairy_wren,White_winged_Fairy_Wren),Emperor_Fairy_Wren)),(Southern_Emu_wren,(Mallee_emu_
 
crowned_Fairy_wren,White_winged_Fairy_Wren),Emperor_Fairy_Wren)),(Southern_Emu_wren,(Mallee_emu_
 
wren,Red_crowned_Emu_Wren))),(Broad_billed_Fairy_Wren,Orange_crowned_Fairy_wren))),White_throated_Gerygone);
 
wren,Red_crowned_Emu_Wren))),(Broad_billed_Fairy_Wren,Orange_crowned_Fairy_wren))),White_throated_Gerygone);
3
+
</pre>
Execute:
+
 
username
+
Run mp-est
@class
+
  mpest Maluridae_control.txt  
-
 
01 ~]$
 
mpest Maluridae_c
 
ontrol.txt
 
 
Output tree
 
Output tree
 
search
 
search
 
(Maluridae_gene.trees.tre)
 
(Maluridae_gene.trees.tre)
will be in Nexus format; examine last (mp
+
will be in Nexus format; examine last (mp-est) tree
-
 
est) tree
 
 
in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is
 
in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is
 
9.
 
9.
3)
+
 
Phybase, an R module for estimating, an
+
==Phybase, an R module for estimating, analyzing and simulating species trees==
alyzing and simulating species trees
 
 
a.
 
a.
 
Making a STAR tree:
 
Making a STAR tree:
Input file: rooted or unrooted gene trees in phylip or nexus format
+
Input file: rooted or unrooted gene trees in phylip or nexus format. But if trees are unrooted, you must first
. But if trees are unrooted, you must first
 
 
root them to make a STAR tree (can be done in R).
 
root them to make a STAR tree (can be done in R).
 +
 
>
 
>
 
setwd("/Scott/MBL/MBL_2014/Lab")
 
setwd("/Scott/MBL/MBL_2014/Lab")

Revision as of 10:01, 5 August 2014

Tutorial here


To start the lab:

log into the cluster


   module load bioware


copy the lab exercise files to your home directory

  cp -r /class/molevol-shared/edwards_lab .
  cd edwards_lab

for each section of the tutorial, cd in the appropriate directory.


BEST

-Liu (2008) Bioinformatics, 24:2542-2543
Liu & Pearl (2007) Syst Bio 56
504-514

a) Input file format – modified mrBayes file BEST block:

partition Genes = 30:
locus097,locus098,locus118,locus119,locus120,locus122,locus130,locus104,locus129,locus143,locus146,locus111,locus13
5,locus148,locus182,locus200,locus209,locusB098,locus184,locus185,locus186,locus187,locus188,locus192,locu
s193,locu
s195,locus198,locus199,locusB200,locus103;
set partition=Genes;
taxset species1=P_acuticauda;
taxset species2=P_hecki;
taxset species3=P_cincta;
taxset species4=T_guttata;
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
unlink to
pology=(all) brlens=(all) statefreq=(all) genemu=(all);
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;
quit;
end;


Log on to the cluster

Open BET

   best
   Best> execute finchbest-star-steac.nex

When analysis is done you can type:

   Best> execute  finch-best-star-steac.nex.sumt

Examine output files (similar to mrBayes): Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p, Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files


MP-EST maximum (pseudo)likelihood estimation of species trees

Liu et al.(2010) BMC Evolutionary Biology,10:302

You need: 1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all gene trees must have outgroup; branch lengths not necessary)

2. control file: (“Maluridae_control.txt”) contains information on where the gene trees are, how the gene tree OTUs map onto species, etc. Maluridae_gene.trees

0
3
18 26
Kalkadoon_Grasswren 1 Kalkadoon_Grasswren
Grey_Grasswren 1 Grey_Grasswren
Carpentarian_Grasswren 1 Carpentarian_Grasswren
...
White_shouldered_fairy_wren 1 W
hite_shouldered_fairy_wren
White_winged_Fairy_Wren 1 White_winged_Fairy_Wren
White_throated_Gerygone 1 White_throated_Gerygone
0
(((((((Kalkadoon_Grasswren,Dusky_Grasswren),Black_Grasswren),Eyrean_Grasswren),Thick_billed_Grasswren),(Grey_Grasswren,(
Carpent
arian_Grasswren,Striated_Grasswren)),Short_tailed_Grasswren),((((Lovely_Fairy_wren,Red_winged_Fairy_wren,Blue_breast
ed_Fairy_wren,Variegated_Fairy_Wren),((((Superb_Fairy_wren,Splendid_Fairy_wren),(Red_backed_Fairy_wren,White_shouldered_
fairy_wren)),Purple_
crowned_Fairy_wren,White_winged_Fairy_Wren),Emperor_Fairy_Wren)),(Southern_Emu_wren,(Mallee_emu_
wren,Red_crowned_Emu_Wren))),(Broad_billed_Fairy_Wren,Orange_crowned_Fairy_wren))),White_throated_Gerygone);

Run mp-est

  mpest Maluridae_control.txt 

Output tree search (Maluridae_gene.trees.tre) will be in Nexus format; examine last (mp-est) tree in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is 9.

Phybase, an R module for estimating, analyzing and simulating species trees

a. Making a STAR tree: Input file: rooted or unrooted gene trees in phylip or nexus format. But if trees are unrooted, you must first root them to make a STAR tree (can be done in R).

> setwd("/Scott/MBL/MBL_2014/Lab")

sets working directory – makes accessing files easier > wrentrees< - read.tree.string(file="Maluridae.trees",format="phylip")

variable genetrees has 3 values: vector of trees; species names; and TRUE or FALSE for rooted or note. > wrengenetrees< - wrentree s$tree

    1. extracts trees from the file and assigns them to variable

“genetreevector” > wrentaxanames< - species.name(wrengenetrees[1])

    1. gets gene tree names from the first gene tree;

make sure this gene tree has all taxa in it. > wrenspnames< - species.name( wrengenetrees[1])

    1. assigns same names to species tree as in first

gene tree Now, link names in gene tree with names in species tree via a matrix called “species.structure” > wrentreematrix< - matrix(0,26,26)

    1. a matrix for 26 species, filled with 0s

> d iag(wrentreematrix)< - 1

  1. 1s on the diagonal indicate a 1

- to - 1 correspondence of gene and species names

    1. now, make a star tree:

> wrenstartree< - star.sptree(wrengenetrees,speciesname=wrenspnames,taxaname=wrentaxanames,species.structure =wrentreematrix,outgro up="White_throated_Gerygone",method="nj") Now write the STAR tree to a nexus of newick file: > write.tree.string(wrenstartree, file="wrenstartree.nex") > write.tree.string(wrenstartree, format="phylip",file="wrenstartree.phy") Representing species trees a s matrices and simulating gene trees will wait for another time. The Phybase manual has useful instructions for these two topics. b. The multilocus bootstrap i. call up R (type ‘R’)

To install phybase in R:

  R
  >install.packages("../molevol-software/phybase_a.tar.gz",repos = NULL, type="source")

Respond Y to the questions it will ask about installing locally.

  >library(phybase)

4 ii. type: library(phybase) Input file for DNA sequence data

same as for BEST (Nexu

s/mrbayes file with BEST block)

  1. read in a sequence file

> wrenfile< - "Maluridae_seqs.nex"

  1. assign DNA sequences in

that file to a variable " wrenfile " > wrendata< - read.dna.seq(wrenfile) > wrensequence< - wrendata$seq

  1. assigns sequences in wrensequence to the

file “wrendata” > wrengenes< - wrendata$gene

  1. assign gen

e partitions to variable " wrengenes " > wrennames< - wrendata$name

  1. get taxa names

– these are the OTUs in the gene trees > write.dna(sequence=wrensequence, file= "wrenseqs.phy", format="phylip", name=w rennames)

  1. can export DNA sequence in nexus of phylip format
  2. bootstrap the data set

> bootstrap.mulgene(sequence=wrensequence, gene=wrensgene, name=wrennam es, boot=100, outfile="wrenboot_ seqs_100.txt") Can look at bootstrapped data set using nano or oth er text editor (nano wrenboot_ seqs_100.txt ) Multilocus bootstrap replicates can be used for many species tree methods, such as STAR, MDC, MP - EST, STEM, and many other species tree methods. References: Castillo - Ramírez, S., L. Liu, D. Pearl, and Edwards m S. V. 2010. Bayesian estimation of species trees: a practical guide to optimal sampling and analysis, Pages 15 - 33 in L. L. Knowles, and L. S. Kubatko, eds. Estimating Species Trees: Practical and Theoretical Aspects. New Jersey, Wiley - Blackwell. Edwards, S. V., L. Liu, and D. K. Pearl. 2007. High - resolution species trees without concatenation. Proceedings of the National Academy of Sciences (USA) 104:5936 - 5941. Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 6 3:1 - 19. Kubatko, LS. 2009. Identifying Hybridization Events in the Presence of Coalescence via Model Selection, Systematic Biology 58(5): 478 - 488 . Liu, L. (2008). BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics (Oxford, England) , 24 (21), 2542 – 3. doi:10.1093/bioinformatics/btn484 Liu, L., L . Yu, and S. Edwards. 2010. A maximum pseudo - likelihood approach for estimating species trees under the coalescent model. BMC Evolutionary Biology 10:302. Liu, L., L. Yu, D. K. Pearl, and S. V. Edwards. 2009. Estimating species phylogenies using coalescence times among sequences. Syst Biol 58:468 - 477. Liu, L., Yu L., & Pearl D. K. 2009. Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology 60(1): 95 - 106 Liu , L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol 53:320 - 328. Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple - all ele DNA sequence data. Evolution 62:2080 - 2091. Liu, L., and D. K. Pearl. 2007. Species trees f rom gene trees: reconstructing B ayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol 56:504 - 514.