Casey Dunn

From MolEvol
Revision as of 10:48, 21 July 2017 by Caseywdunn

Duration of Stay

For the 2017 course, I'll arrive on Thursday July 20 and leave on Sunday July 23.

Web pages

Lab - http://dunnlab.org

CreatureCast - http://creaturecast.org/

Practical Computing for Biologists - http://practicalcomputing.org/

Code - https://bitbucket.org/caseywdunn, https://github.com/caseywdunn/

Twitter - @caseywdunn

Lecture Materials

2017: lecture slides


Here are links to some of the sites I talk about in the lecture:

Agalma transcriptome analysis tool

Agalma paper

Agalma sample analysis

CreatureCast


Spurious Correlations


Below are some quick references that I hope will be useful for the course:

Statistics

Computing

Exercise cheat sheet

I may do some exercises in class from Practical Computing for Biologists, the book I wrote with Steve Haddock. To follow along, you will need a text editor that supports regular expressions (sometimes called grep). I suggest Sublime Text.


Text for examples in class:

Replace the genus name with its first letter followed by a period (for example, Agalma elegans becomes A. elegans).

Agalma elegans

Frillagalma vitiazi

Cordagalma tottoni

Shortia galacifolia

Mus musculus
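One possible solution, sketched in Python's re module so it can be checked outside the editor. The search pattern should also work in an editor's regex find-and-replace, although the syntax for referring to a captured group in the replacement field (\1 or $1) can differ between editors.

```python
import re

species = """Agalma elegans
Frillagalma vitiazi
Cordagalma tottoni
Shortia galacifolia
Mus musculus"""

# ^      start of a line (re.MULTILINE makes ^ match at the start of every line)
# (\w)   capture the first letter of the genus
# \w+    the rest of the genus name
# " "    the space separating genus from species
abbreviated = re.sub(r"^(\w)\w+ ", r"\1. ", species, flags=re.MULTILINE)
print(abbreviated)
```

Each line comes out in the form A. elegans, F. vitiazi, and so on.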



Remove each tick mark (') and the direction letter (N, S, E, or W) that follows it:

+40 46'N +014 15'E

+21 17'N -157 52'W
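A minimal sketch of one solution in Python's re module, assuming the direction letters are always one of N, S, E, or W immediately after the tick:

```python
import re

coords = """+40 46'N +014 15'E
+21 17'N -157 52'W"""

# '       a literal tick mark
# [NSEW]  exactly one compass-direction letter right after it
cleaned = re.sub(r"'[NSEW]", "", coords)
print(cleaned)
```

The first line becomes +40 46 +014 15 and the second becomes +21 17 -157 52.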


Keep just the numbers and get rid of the letters:

5th

3rd

2nd

4th
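One way to do this, sketched in Python's re module: match any run of lowercase letters and replace it with nothing, which leaves only the digits.

```python
import re

ordinals = """5th
3rd
2nd
4th"""

# [a-z]+  one or more lowercase letters (the "th", "rd", "nd" suffixes)
numbers = re.sub(r"[a-z]+", "", ordinals)
print(numbers)
```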



Exercise 1:

Copy and paste the following FASTA file into your text editor:


>CAA58790.1= green fluorescent protein [Aequorea victoria]

MSKGEELFTGVVPILVELDGDVNGQKFSVRGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFLKSAMPEGYVQERTIFYKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKMEYNYNSHNVYIMGDKPKNGIKVNFKIRHNIKDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSQDPHGKRDHMVLLEFVTSAGITHGMDELYK

>AAZ67342.1= GFP-like red fluorescent protein [Corynactis californica]

MSLSKQVLPRDVKMRYHMDGCVNGHQFIIEGEGTGKPYEGKKILELRVTKGGPLPFAFDILSSVFTYGNRCFCEYPEDMPDYFKQSLPEGHSWERTLMFEDGGCGTASAHISLDKNCFVHKSTFHGVNFPANGPVMQKKTLNWEPSSELITAGDGILKGDVTMFLMLEGGHRLKCQFTTSYKAKKAVKMPPNHIIEHRLVRKEVADAVQIQEHAVAKHFIV

>ACX47247.1= green fluorescent protein [Haeckelia beehleri]

MEFEPEFFNKPVPLEMTLRGCVNGKEFMIFGKGEGDASKGNIKGKWILSHSEDGKCPMSWAVLAPTFAYGFKVFAKYPKDFAHFWQDCMPVGYSERRITRFGRLSGNDDIEQEGIMNTYHEVQMRERMVGDEITWIVESRVKLDATINENSPILMNDGLSEYRPNLERTVSFEDGLKNYSQFFYPIKDCETKDYIIANQMTHERPLSKCNKPGRLPPSHFKRTDLEQWKDSKEDKDHIVQEEITAFLLQAQDKDLQSLGIGM

>ABC68474.1= red fluorescent protein [Discosoma sp. RC-2004]

MRSSKNVIKEFMRFKVRMEGTVNGHEFEIEGEGEGRPYEGHNTVKLKVTKGGPLPFAWDILSPQFQYGSKVYVKHPADIPDYKKLSFPEGFKWERVMNFEDGGVVTVTQDPSLQDGCFIYKVKFIGVNFPSDGPVMQKKTMGWEASTERLYPRDGVLKGEIHKALKLKDGGHYLVEFKTIYMAKKPVQLPGYYYVDSKLDITSHNKDYTIVEQYERTEGRHHLFLKAELGSNVGER

>AAQ01183.1= green fluorescent protein 1 [Pontellina plumata]

MPAMKIECRISGTLNGVVFELVGGGEGIPEQGRMTNKMKSTKGALTFSPYLLSHVMGYGFYHFGTYPSGYENPFLHAANNGGYTNTRIEKYEDGGVLHVSFSYRYEAGRVIGDFKVVGTGFPEDSVIFTDKIIRSNATVEHLHPMGDNVLVGSFARTFSLRDGGYYSFVVDSHMHFKSAIHPSILQNGGSMFAFRRVEELHSNTELGIVEYQHAFKTPTAFA


Use your new regular expression skills to convert the headers from the format

>CAA58790.1= GFP [Aequorea victoria]

to:

>CAA58790_Aequorea
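A possible solution, sketched in Python's re module. It assumes every header has the shape shown in the exercise: an accession, a version suffix followed by "=", a description, and the species name in square brackets. Only lines starting with ">" can match, so sequence lines are left alone. To keep the example short, only one sequence line is included, truncated to its first 10 residues.

```python
import re

# Headers copied from the exercise; one GFP sequence line, truncated for brevity.
fasta = """>CAA58790.1= green fluorescent protein [Aequorea victoria]
MSKGEELFTG
>AAZ67342.1= GFP-like red fluorescent protein [Corynactis californica]
>ACX47247.1= green fluorescent protein [Haeckelia beehleri]
>ABC68474.1= red fluorescent protein [Discosoma sp. RC-2004]
>AAQ01183.1= green fluorescent protein 1 [Pontellina plumata]"""

# ^>(\w+)  header line; capture the accession (\w stops at the ".")
# \.\d+=   the version suffix and "=" (e.g. ".1=")
# .*\[     the description, up to the "[" that opens the species name
# (\w+)    capture the genus, the first word inside the brackets
# .*$      discard the rest of the line
pattern = r"^>(\w+)\.\d+=.*\[(\w+).*$"
renamed = re.sub(pattern, r">\1_\2", fasta, flags=re.MULTILINE)
print(renamed)
```

Note that Discosoma sp. RC-2004 becomes just >ABC68474_Discosoma, because the second group captures only the first word inside the brackets.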

Exercise 2:

Copy and paste this tree into your text editor:

((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700, seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201, weasel:18.87953):2.09460):3.87382,dog:25.46154);


Use regular expressions to remove the branch lengths.
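A minimal sketch of one solution in Python's re module. In this Newick tree every branch length is a colon followed by digits and a decimal point, so deleting each such run removes the lengths while leaving names, commas, and parentheses intact.

```python
import re

tree = "((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700, seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201, weasel:18.87953):2.09460):3.87382,dog:25.46154);"

# :       the colon that introduces a branch length
# [\d.]+  the digits and decimal point of the length itself
no_lengths = re.sub(r":[\d.]+", "", tree)
print(no_lengths)
```

The result is ((raccoon,bear),((sea_lion, seal),((monkey,cat), weasel)),dog);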

Once you've done that, paste the original tree into your editor again and truncate the branch lengths so that only two digits remain after each decimal point.
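One possible approach, sketched in Python's re module: capture the decimal point and the next two digits, match any further digits, and keep only the captured part. This truncates rather than rounds, so 19.19959 becomes 19.19, not 19.20.

```python
import re

tree = "((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700, seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201, weasel:18.87953):2.09460):3.87382,dog:25.46154);"

# (\.\d\d)  capture the decimal point and the first two digits after it
# \d+       any remaining digits, which are dropped
truncated = re.sub(r"(\.\d\d)\d+", r"\1", tree)
print(truncated)
```

Lengths that already have two or fewer decimal digits would not match and are left unchanged.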