Difference between revisions of "Edwards lab notes"

From MolEvol
Line 18: Line 18:
 
for each section of the tutorial, cd in the appropriate directory.
 
for each section of the tutorial, cd in the appropriate directory.
  
To run the software:
+
 
  best
+
==BEST==
  mpest
+
-Liu (2008) Bioinformatics, 24:2542-2543
 +
; Liu & Pearl (2007) Syst Bio 56:504-514
 +
 
 +
a) Input file format
 +
 +
modified mrBayes file
 +
BEST block:
 +
<pre>
 +
partition Genes = 30:
 +
locus097,locus098,locus118,locus119,locus120,locus122,locus130,locus104,locus129,locus143,locus146,locus111,locus13
 +
5,locus148,locus182,locus200,locus209,locusB098,locus184,locus185,locus186,locus187,locus188,locus192,locu
 +
s193,locu
 +
s195,locus198,locus199,locusB200,locus103;
 +
set partition=Genes;
 +
taxset species1=P_acuticauda;
 +
taxset species2=P_hecki;
 +
taxset species3=P_cincta;
 +
taxset species4=T_guttata;
 +
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
 +
unlink to
 +
pology=(all) brlens=(all) statefreq=(all) genemu=(all);
 +
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;
 +
quit;
 +
end;
 +
</pre>
 +
Type:
 +
username
 +
@class
 +
-
 +
01 ~]$
 +
best
 +
Best>
 +
execute
 +
finch
 +
-
 +
best
 +
-
 +
star
 +
-
 +
steac
 +
.nex
 +
When analysis is done you can type:
 +
Best>
 +
execute
 +
Finch_BEST.ne
 +
x.sumt
 +
Examine output files (similar to mrBayes): Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p,
 +
Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files
 +
2
 +
. MP
 +
-
 +
EST
 +
-
 +
maximum (pseudo)likelihood estimation of species trees
 +
 +
Liu et al.
 +
(2010)
 +
BMC Evolut
 +
ionary Biology
 +
,
 +
10
 +
:302
 +
You need:
 +
1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all
 +
gene trees must have outgroup; branch lengths not necessary)
 +
2. control file:
 +
(“Maluridae_control
 +
.txt”)
 +
contains information on wh
 +
ere the gene trees
 +
are, how the gene tree OTUs map onto species, etc.
 +
Maluridae_gene.trees
 +
0
 +
3
 +
18 26
 +
Kalkadoon_Grasswren 1 Kalkadoon_Grasswren
 +
Grey_Grasswren 1 Grey_Grasswren
 +
Carpentarian_Grasswren 1 Carpentarian_Grasswren
 +
...
 +
White_shouldered_fairy_wren 1 W
 +
hite_shouldered_fairy_wren
 +
White_winged_Fairy_Wren 1 White_winged_Fairy_Wren
 +
White_throated_Gerygone 1 White_throated_Gerygone
 +
0
 +
(((((((Kalkadoon_Grasswren,Dusky_Grasswren),Black_Grasswren),Eyrean_Grasswren),Thick_billed_Grasswren),(Grey_Grasswren,(
 +
Carpent
 +
arian_Grasswren,Striated_Grasswren)),Short_tailed_Grasswren),((((Lovely_Fairy_wren,Red_winged_Fairy_wren,Blue_breast
 +
ed_Fairy_wren,Variegated_Fairy_Wren),((((Superb_Fairy_wren,Splendid_Fairy_wren),(Red_backed_Fairy_wren,White_shouldered_
 +
fairy_wren)),Purple_
 +
crowned_Fairy_wren,White_winged_Fairy_Wren),Emperor_Fairy_Wren)),(Southern_Emu_wren,(Mallee_emu_
 +
wren,Red_crowned_Emu_Wren))),(Broad_billed_Fairy_Wren,Orange_crowned_Fairy_wren))),White_throated_Gerygone);
 +
3
 +
Execute:
 +
username
 +
@class
 +
-
 +
01 ~]$
 +
mpest Maluridae_c
 +
ontrol.txt
 +
Output tree
 +
search
 +
(Maluridae_gene.trees.tre)
 +
will be in Nexus format; examine last (mp
 +
-
 +
est) tree
 +
in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is
 +
9.
 +
3)
 +
Phybase, an R module for estimating, an
 +
alyzing and simulating species trees
 +
a.
 +
Making a STAR tree:
 +
Input file: rooted or unrooted gene trees in phylip or nexus format
 +
. But if trees are unrooted, you must first
 +
root them to make a STAR tree (can be done in R).
 +
>
 +
setwd("/Scott/MBL/MBL_2014/Lab")
 +
#
 +
sets working directory
 +
 +
makes accessing files easier
 +
>
 +
wrentrees<
 +
-
 +
read.tree.string(file="Maluridae.trees",format="phylip")
 +
#
 +
variable genetrees has 3 values: vector of trees; species names; and TRUE or FALSE for rooted or note.
 +
>
 +
wrengenetrees<
 +
-
 +
wrentree
 +
s$tree
 +
##extracts trees from the file and assigns them to variable
 +
“genetreevector”
 +
>
 +
wrentaxanames<
 +
-
 +
species.name(wrengenetrees[1])
 +
##gets gene tree names from the first gene tree;
 +
make sure this gene tree has all taxa in it.
 +
>
 +
wrenspnames<
 +
-
 +
species.name(
 +
wrengenetrees[1])
 +
##assigns same names to species tree as in first
 +
gene tree
 +
Now, link names in gene tree with names in species tree via a matrix called “species.structure”
 +
>
 +
wrentreematrix<
 +
-
 +
matrix(0,26,26)
 +
##a matrix for 26 species, filled with 0s
 +
>
 +
d
 +
iag(wrentreematrix)<
 +
-
 +
1
 +
#1s on the diagonal indicate a 1
 +
-
 +
to
 +
-
 +
1 correspondence of gene and species
 +
names
 +
##now, make a star tree:
 +
>
 +
wrenstartree<
 +
-
 +
star.sptree(wrengenetrees,speciesname=wrenspnames,taxaname=wrentaxanames,species.structure
 +
=wrentreematrix,outgro
 +
up="White_throated_Gerygone",method="nj")
 +
Now write the STAR tree to a nexus of newick file:
 +
>
 +
write.tree.string(wrenstartree, file="wrenstartree.nex")
 +
>
 +
write.tree.string(wrenstartree, format="phylip",file="wrenstartree.phy")
 +
Representing species trees a
 +
s matrices and simulating gene trees will wait for another time. The Phybase
 +
manual has useful instructions for these two topics.
 +
b.
 +
The multilocus bootstrap
 +
i.
 +
call up R (type ‘R’)
  
 
To install phybase in R:
 
To install phybase in R:
Line 30: Line 212:
  
 
   >library(phybase)
 
   >library(phybase)
 +
 +
4
 +
ii.
 +
type:
 +
library(phybase)
 +
Input file
 +
for DNA sequence data
 +
: same as for BEST (Nexu
 +
s/mrbayes file with BEST block)
 +
#read in a sequence file
 +
>
 +
wrenfile<
 +
-
 +
"Maluridae_seqs.nex"
 +
#assign DNA sequences in
 +
that file to a variable "
 +
wrenfile
 +
"
 +
>
 +
wrendata<
 +
-
 +
read.dna.seq(wrenfile)
 +
>
 +
wrensequence<
 +
-
 +
wrendata$seq
 +
#assigns sequences in wrensequence to the
 +
file “wrendata”
 +
> wrengenes<
 +
-
 +
wrendata$gene
 +
#assign gen
 +
e partitions to variable "
 +
wrengenes
 +
"
 +
> wrennames<
 +
-
 +
wrendata$name
 +
#get taxa names
 +
 +
these are the OTUs in the gene trees
 +
> write.dna(sequence=wrensequence, file= "wrenseqs.phy", format="phylip", name=w
 +
rennames)
 +
#can export DNA sequence in nexus of phylip format
 +
#bootstrap the data set
 +
> bootstrap.mulgene(sequence=wrensequence, gene=wrensgene, name=wrennam
 +
es, boot=100,
 +
outfile="wrenboot_
 +
seqs_100.txt")
 +
Can look at bootstrapped data set using nano or oth
 +
er text editor
 +
(nano
 +
wrenboot_
 +
seqs_100.txt
 +
)
 +
Multilocus bootstrap replicates can be used for many species tree methods, such as STAR, MDC, MP
 +
-
 +
EST,
 +
STEM, and many other species tree methods.
 +
References:
 +
Castillo
 +
-
 +
Ramírez, S., L. Liu, D. Pearl, and Edwards
 +
m S. V. 2010. Bayesian estimation of species trees: a practical
 +
guide to optimal sampling and analysis, Pages 15
 +
-
 +
33 in L. L. Knowles, and L. S. Kubatko, eds. Estimating
 +
Species Trees: Practical and Theoretical Aspects. New Jersey, Wiley
 +
-
 +
Blackwell.
 +
Edwards,
 +
S. V., L. Liu, and D. K. Pearl. 2007. High
 +
-
 +
resolution species trees without concatenation. Proceedings of
 +
the National Academy of Sciences (USA) 104:5936
 +
-
 +
5941.
 +
Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 6
 +
3:1
 +
-
 +
19.
 +
Kubatko, LS. 2009. Identifying Hybridization Events in the Presence of Coalescence
 +
via Model Selection,
 +
Systematic Biology 58(5): 478
 +
-
 +
488
 +
.
 +
Liu, L. (2008). BEST: Bayesian estimation of species trees under the
 +
coalescent model.
 +
Bioinformatics (Oxford,
 +
England)
 +
,
 +
24
 +
(21), 2542
 +
 +
3. doi:10.1093/bioinformatics/btn484
 +
Liu, L., L
 +
. Yu, and S. Edwards. 2010. A maximum pseudo
 +
-
 +
likelihood approach for estimating species trees under
 +
the coalescent model. BMC Evolutionary Biology 10:302.
 +
Liu, L., L. Yu, D. K. Pearl, and S.
 +
V. Edwards. 2009. Estimating species phylogenies using coalescence times
 +
among sequences. Syst Biol 58:468
 +
-
 +
477.
 +
Liu, L., Yu L., & Pearl D. K. 2009.
 +
Maximum tree: a consistent estimator of the species tree. Journal of
 +
Mathematical Biology 60(1): 95
 +
-
 +
106
 +
Liu
 +
, L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic
 +
trees. Mol Phylogenet Evol 53:320
 +
-
 +
328.
 +
Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple
 +
-
 +
all
 +
ele DNA
 +
sequence data. Evolution 62:2080
 +
-
 +
2091.
 +
Liu, L., and D. K. Pearl. 2007. Species trees f
 +
rom gene trees: reconstructing B
 +
ayesian posterior distributions of a
 +
species phylogeny using estimated gene tree distributions. Syst Biol 56:504
 +
-
 +
514.

Revision as of 09:57, 5 August 2014

Tutorial here


To start the lab:

log into the cluster


   module load bioware


copy the lab exercise files to your home directory

  cp -r /class/molevol-shared/edwards_lab .
  cd edwards_lab

for each section of the tutorial, cd in the appropriate directory.


BEST

-Liu (2008) Bioinformatics, 24:2542-2543
Liu & Pearl (2007) Syst Bio 56
504-514

a) Input file format – modified mrBayes file BEST block:

partition Genes = 30:
locus097,locus098,locus118,locus119,locus120,locus122,locus130,locus104,locus129,locus143,locus146,locus111,locus13
5,locus148,locus182,locus200,locus209,locusB098,locus184,locus185,locus186,locus187,locus188,locus192,locu
s193,locu
s195,locus198,locus199,locusB200,locus103;
set partition=Genes;
taxset species1=P_acuticauda;
taxset species2=P_hecki;
taxset species3=P_cincta;
taxset species4=T_guttata;
prset thetapr=invgamma(3,0.003) GeneMuPr=uniform(0.5,1.5) best=1;
unlink to
pology=(all) brlens=(all) statefreq=(all) genemu=(all);
mcmc ngen=10000000 samplefreq=100 nrun=2 nchain=1;
quit;
end;

Type: username @class - 01 ~]$ best Best> execute finch - best - star - steac .nex When analysis is done you can type: Best> execute Finch_BEST.ne x.sumt Examine output files (similar to mrBayes): Finch_BEST.nex.run1.p, Finch_BEST.nex.run2.p, Finch_BEST.nex.sptree.con, and *.t, *.parts, *.tprobs files 2 . MP - EST - maximum (pseudo)likelihood estimation of species trees – Liu et al. (2010) BMC Evolut ionary Biology , 10

302

You need: 1. rooted gene trees (some missing taxa ok; right now, just one sequence per species; all gene trees must have outgroup; branch lengths not necessary) 2. control file: (“Maluridae_control .txt”) contains information on wh ere the gene trees are, how the gene tree OTUs map onto species, etc. Maluridae_gene.trees 0 3 18 26 Kalkadoon_Grasswren 1 Kalkadoon_Grasswren Grey_Grasswren 1 Grey_Grasswren Carpentarian_Grasswren 1 Carpentarian_Grasswren ... White_shouldered_fairy_wren 1 W hite_shouldered_fairy_wren White_winged_Fairy_Wren 1 White_winged_Fairy_Wren White_throated_Gerygone 1 White_throated_Gerygone 0 (((((((Kalkadoon_Grasswren,Dusky_Grasswren),Black_Grasswren),Eyrean_Grasswren),Thick_billed_Grasswren),(Grey_Grasswren,( Carpent arian_Grasswren,Striated_Grasswren)),Short_tailed_Grasswren),((((Lovely_Fairy_wren,Red_winged_Fairy_wren,Blue_breast ed_Fairy_wren,Variegated_Fairy_Wren),((((Superb_Fairy_wren,Splendid_Fairy_wren),(Red_backed_Fairy_wren,White_shouldered_ fairy_wren)),Purple_ crowned_Fairy_wren,White_winged_Fairy_Wren),Emperor_Fairy_Wren)),(Southern_Emu_wren,(Mallee_emu_ wren,Red_crowned_Emu_Wren))),(Broad_billed_Fairy_Wren,Orange_crowned_Fairy_wren))),White_throated_Gerygone); 3 Execute: username @class - 01 ~]$ mpest Maluridae_c ontrol.txt Output tree search (Maluridae_gene.trees.tre) will be in Nexus format; examine last (mp - est) tree in file for results. Branch lengths are in coalescent units, unless length > 9 in which case length is 9. 3) Phybase, an R module for estimating, an alyzing and simulating species trees a. Making a STAR tree: Input file: rooted or unrooted gene trees in phylip or nexus format . But if trees are unrooted, you must first root them to make a STAR tree (can be done in R). > setwd("/Scott/MBL/MBL_2014/Lab")

sets working directory – makes accessing files easier > wrentrees< - read.tree.string(file="Maluridae.trees",format="phylip")

variable genetrees has 3 values: vector of trees; species names; and TRUE or FALSE for rooted or note. > wrengenetrees< - wrentree s$tree

    1. extracts trees from the file and assigns them to variable

“genetreevector” > wrentaxanames< - species.name(wrengenetrees[1])

    1. gets gene tree names from the first gene tree;

make sure this gene tree has all taxa in it. > wrenspnames< - species.name( wrengenetrees[1])

    1. assigns same names to species tree as in first

gene tree Now, link names in gene tree with names in species tree via a matrix called “species.structure” > wrentreematrix< - matrix(0,26,26)

    1. a matrix for 26 species, filled with 0s

> d iag(wrentreematrix)< - 1

  1. 1s on the diagonal indicate a 1

- to - 1 correspondence of gene and species names

    1. now, make a star tree:

> wrenstartree< - star.sptree(wrengenetrees,speciesname=wrenspnames,taxaname=wrentaxanames,species.structure =wrentreematrix,outgro up="White_throated_Gerygone",method="nj") Now write the STAR tree to a nexus of newick file: > write.tree.string(wrenstartree, file="wrenstartree.nex") > write.tree.string(wrenstartree, format="phylip",file="wrenstartree.phy") Representing species trees a s matrices and simulating gene trees will wait for another time. The Phybase manual has useful instructions for these two topics. b. The multilocus bootstrap i. call up R (type ‘R’)

To install phybase in R:

  R
  >install.packages("../molevol-software/phybase_a.tar.gz",repos = NULL, type="source")

Respond Y to the questions it will ask about installing locally.

  >library(phybase)

4 ii. type: library(phybase) Input file for DNA sequence data

same as for BEST (Nexu

s/mrbayes file with BEST block)

  1. read in a sequence file

> wrenfile< - "Maluridae_seqs.nex"

  1. assign DNA sequences in

that file to a variable " wrenfile " > wrendata< - read.dna.seq(wrenfile) > wrensequence< - wrendata$seq

  1. assigns sequences in wrensequence to the

file “wrendata” > wrengenes< - wrendata$gene

  1. assign gen

e partitions to variable " wrengenes " > wrennames< - wrendata$name

  1. get taxa names

– these are the OTUs in the gene trees > write.dna(sequence=wrensequence, file= "wrenseqs.phy", format="phylip", name=w rennames)

  1. can export DNA sequence in nexus of phylip format
  2. bootstrap the data set

> bootstrap.mulgene(sequence=wrensequence, gene=wrensgene, name=wrennam es, boot=100, outfile="wrenboot_ seqs_100.txt") Can look at bootstrapped data set using nano or oth er text editor (nano wrenboot_ seqs_100.txt ) Multilocus bootstrap replicates can be used for many species tree methods, such as STAR, MDC, MP - EST, STEM, and many other species tree methods. References: Castillo - Ramírez, S., L. Liu, D. Pearl, and Edwards m S. V. 2010. Bayesian estimation of species trees: a practical guide to optimal sampling and analysis, Pages 15 - 33 in L. L. Knowles, and L. S. Kubatko, eds. Estimating Species Trees: Practical and Theoretical Aspects. New Jersey, Wiley - Blackwell. Edwards, S. V., L. Liu, and D. K. Pearl. 2007. High - resolution species trees without concatenation. Proceedings of the National Academy of Sciences (USA) 104:5936 - 5941. Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 6 3:1 - 19. Kubatko, LS. 2009. Identifying Hybridization Events in the Presence of Coalescence via Model Selection, Systematic Biology 58(5): 478 - 488 . Liu, L. (2008). BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics (Oxford, England) , 24 (21), 2542 – 3. doi:10.1093/bioinformatics/btn484 Liu, L., L . Yu, and S. Edwards. 2010. A maximum pseudo - likelihood approach for estimating species trees under the coalescent model. BMC Evolutionary Biology 10:302. Liu, L., L. Yu, D. K. Pearl, and S. V. Edwards. 2009. Estimating species phylogenies using coalescence times among sequences. Syst Biol 58:468 - 477. Liu, L., Yu L., & Pearl D. K. 2009. Maximum tree: a consistent estimator of the species tree. Journal of Mathematical Biology 60(1): 95 - 106 Liu , L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol 53:320 - 328. Liu, L., D. K. Pearl, R. T. Brumfield, and S. V. Edwards. 2008. Estimating species trees using multiple - all ele DNA sequence data. Evolution 62:2080 - 2091. Liu, L., and D. K. Pearl. 2007. Species trees f rom gene trees: reconstructing B ayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol 56:504 - 514.