Computer lab introduction

From MolEvol
Revision as of 09:17, 23 July 2012 by Ejmctavish

"On the other side of the screen, it all looks so easy." (TRON, 1982)

Goals

By the end of this introduction, you should be able to:

  • Log in to the class cluster (either directly using ssh or via Cyberduck).
  • Navigate and understand the directory structure.
  • Create and edit files and directories on the cluster.
  • Create and run a basic shell script.

Ask for help if any of these things aren't working by the end!


WiFi Login

To log on to the MBL wireless network, choose MBL-REGISTER from the wireless list. Your username is your initials followed by the 5-digit number on the side of your identification card. Your password is the same. E.g., if your name is Joe Bloggs and your card has the number 12345 on the side, then your login details are:

username: jb12345
password: jb12345

Some people have been having trouble. If it isn't working, find or email Emily Jane (ejmctavish@utexas.edu).

Remote computer access software

We will use Secure Shell (SSH) and sFTP to connect to the servers. The servers are powerful computers where we can run programs much faster than would be possible on your laptop. To access them, you need to log in to your assigned server.

Please download the following programs as needed, unless they are installed by default:

  • Mac OS X
    • Cyberduck (file transfer via sFTP)
    • SCP and SSH installed by default
  • Linux
    • SCP and SSH installed by default

Plain-text editors

Programs that you might be used to using for manipulating text (e.g., Microsoft Word) do all sorts of complicated things to your files, including inserting lots of weird codes for specific formatting, special software-specific characters, page margin instructions, etc. The programs that we are going to use frequently are controlled by text files (that's where their instructions are stored), and they are easily confused: they need a simple file with just their instructions, and none of that other stuff. So we need a plain-text editor to make and manipulate these files.

Note: different operating systems encode line breaks (end-of-line characters) in different ways, and this can cause problems (Unix, Linux, and Mac OS X: LF '\n'; classic Mac OS, i.e. version 9 and earlier: CR '\r'; Windows: CR+LF '\r\n').
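
A minimal sketch of how Windows-style line endings can be spotted and stripped from the command line. The file names dos.txt and unix.txt are made up for illustration, and everything happens in a throwaway directory created with mktemp, so nothing in your home directory is touched.

```shell
# Work in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch"

# Create a two-line file with Windows (CR+LF) line endings.
printf 'line one\r\nline two\r\n' > dos.txt

# Count the lines that contain a carriage return (CR).
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" dos.txt)
echo "lines with CR: $crlf_lines"

# Delete every CR to get Unix (LF) line endings.
tr -d '\r' < dos.txt > unix.txt

# grep exits with a non-zero status when it counts 0 matches, hence "|| true".
unix_cr_lines=$(grep -c "$cr" unix.txt || true)
echo "lines with CR after conversion: $unix_cr_lines"
```

If a program on the server complains about a file you edited on Windows, a check like this is a quick way to rule line endings in or out.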

With GUI

Without GUI

nano, Emacs, Vim

SSH

SSH stands for "Secure Shell." SSH clients are programs that provide a Unix shell so that one can log on to other computers and enter commands there (i.e., on the servers where we will be doing our analyses).

SSH on Windows

Use PuTTY.

SSH on Linux and Mac OS X

First, open a terminal window:

  • Linux: Konsole (KDE), gnome-terminal (GNOME)
  • Mac: Terminal (in /Applications/Utilities)

In the following command replace username and servername with the user and server name found on the back of your name tag:

ssh username@servername

For example, if your username is cmeehan and your server name is class-04, type:
ssh cmeehan@class-04

It will then ask for your password, which is on the back of your card under your username.

Changing your password

The first thing to do once you have successfully logged on to a server is to change your password. This is done by typing:

passwd

This will prompt you to enter a new password; type it and press the enter key. Then re-enter the new password and press enter again. It may then ask you for your LDAP password, in which case type in the original password given on the back of your card.
Once you have done this, you will use the new password every time you ssh in to the server.

Directory structure

The file systems used by Linux, Mac OS X, and Windows are organized in a hierarchical, multifurcating tree structure. That might sound confusing, but you're used to working with this organization scheme through the Mac Finder or the Windows Explorer: folders (directories) are stored inside other folders, and those inside yet other folders. The path through this directory tree can be used to specify the absolute (starting at the root) or relative (to some other directory) location of any given file.

  • NOTE: Regardless of the operating system on your laptop, when you log on to the cluster, you will be on the class machines, and they're all running Linux.

Linux/Mac

The root folder of the directory tree is symbolized by a forward slash "/". Path names through the directory tree are formed by separating directory names with the "/" character. For example, in the path /usr/local, local is a subdirectory of usr, which in turn is a subdirectory of the root folder (/). Further, usr is called the parent directory of local (and the parent of usr is /). Users with an account on the computer have a so-called home directory for their own files and folders, which is also where the operating system keeps user-specific settings. The operating system uses other locations in the directory tree to store system-wide files, like some executables, system-wide settings, etc.

Windows

In Windows, each disk partition has its own, separate root (without a common root for all partitions). However, each drive still has a most elemental directory, e.g. denoted C:\ for the partition named C. Absolute paths are therefore based on the partition root. Folder names in paths are separated by the backslash character "\".

Current working directory

The current working directory (or working directory) may be defined as the directory we are working in at the moment. Path names that do not start with / are interpreted as relative to the current working directory.

Command line interface

When you open up your SSH client (a Terminal window on a Mac, or PuTTY for Windows), you'll see a prompt that will look vaguely like: Macintosh-6:user$ There are lots of variations on the theme, but the prompt usually has a little bit of information on where you currently are on the computer (in this case, in a folder called "user"), and then some sort of symbol, and then a space where you can enter commands. Commands are only executed after you press enter. If you are logged onto a class server, the prompt will be something like [username@classServerName ~]$

Basic Syntax

Unix commands follow the general format of:

command -options target

Not all commands need options (sometimes called flags, and generally preceded by a single or double hyphen ("-" or "--")) or targets, but others require them.

  • For example:
    • cd homedirectory uses the command "cd" (change directory) and the target "homedirectory" to move from the current directory into the subdirectory called "homedirectory"
    • ls -l homedirectory uses the command "ls" (list), the option "-l" for long-list, and the target "homedirectory" to list the contents of homedirectory in the "long list" format, which provides more thorough descriptions than does the regular "ls".
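
The command, option, target pattern can be tried out safely with a short sketch like the one below. The directory name demo and the file names inside it are made up for illustration, and touch (which creates empty files) is used here only to have something to list.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"
mkdir demo
touch demo/notes.txt demo/.hidden

# Command only: list the current directory.
ls

# Command + target: list the contents of demo.
plain=$(ls demo)
echo "$plain"

# Command + option + target: -a also shows hidden files
# (names that start with a dot).
all=$(ls -a demo)
echo "$all"

# Single-letter options can usually be combined: -la means -l -a.
ls -la demo
```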

Notes on syntax for directory structure

  • Two dots (..) indicate the parent directory of the present working directory. So, for example, "cd .." will move you up one directory.
  • One dot (.) indicates the present working directory. So, for example, "cd ." will keep you where you are. There are times where the single dot can be more useful than this...
  • The tilde (~) refers to your home directory. On the class machines your home directory is /class/yourusername. You'll also have a unique home directory on your laptop, etc. The tilde is very helpful if you get lost while using the terminal -- just type "cd ~" and you'll be back in your home directory.
  • A forward slash (/) by itself or at the start of a path refers to the root of the filing system -- the folder that contains all other folders.
  • NOTE: do not make changes on the class server in the root folder or any shared folders. All your work is to be done in your home directory or a subdirectory of this.
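
The shortcuts above can be seen in action with a short sketch. The directory names project and data are hypothetical, and basename is used only to print the last component of the path so the output stays short.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p project/data     # -p also creates the parent directory "project"

cd project/data           # relative path: resolved from the current directory
cd .                      # "." is the current directory, so nothing changes
here=$(basename "$(pwd)")
echo "now in: $here"

cd ..                     # ".." is the parent directory
parent=$(basename "$(pwd)")
echo "now in: $parent"

echo ~                    # "~" expands to the path of your home directory
```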

Some suggestions concerning file and folder names

  • Avoid spaces in script names and filenames (use underscores, dots, hyphens, or "CamelBack" notation instead). The command line uses spaces to separate commands, options, and targets, so a space in a filename can prevent programs from running correctly.
  • Do not use "weird" characters (#@!*&^, etc., especially ?, *, \, or /)
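
To illustrate why spaces cause trouble, here is a small sketch using a made-up file name. It relies on quoting, which is one way to work with a name that contains a space; renaming the file is usually the simpler fix.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Quoted: the shell passes one argument, so one file is created.
touch "my file.txt"
ls

# Unquoted, the same name would be split into two arguments ("my" and
# "file.txt"), so a command like the following would look for two files
# that do not exist:
#   cat my file.txt

# An underscore avoids the problem entirely.
mv "my file.txt" my_file.txt
renamed=$(ls)
echo "$renamed"
```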

Don't Panic

When it all goes south, "control-C" is your friend. It breaks whatever processes are running, and gives you your prompt back. Or, failing that, just close the Terminal and start again.

Intro-to-Unix tutorial

Navigating

Start by entering

pwd

This will print your working directory (the directory you are currently in). You should be in /class/your_login.
Next type

ls

This lists the contents of your working directory (which is likely empty).
You can also look at the contents of any other directory by supplying the path. For instance,

ls ..

will list the contents of the parent of your current directory, in this case the class directory.

mkdir is the command to make a directory. Type

mkdir myfolder

to make a new folder called myfolder. Type ls and then enter. It should be listed. We can also use the ls command with flags at this point. Typing

ls -l

will list the contents of the current directory in "long" format which includes information about permissions and file size.

cd is the command to change directories. We can move into the new folder you made by typing

cd myfolder

You can use pwd to confirm you've moved and are now in a new working directory. You can move back to your home directory by typing

cd ..

You can move to the root directory by typing

cd /

Confirm you are in the root directory with pwd (it should print /). From here you can move to the myfolder directory in your home directory by typing the following, replacing cmeehan with your own username:

cd class/cmeehan/myfolder

Calling a program

First, let's make a file using the command-line editor "nano." If you use a pre-existing file as nano's target, it will open it for editing, and if you use a non-existent file name, nano will create it for you (a new blank file). Let's do the latter:

nano firstprogram.sh

This command both opens nano, and creates a file called "firstprogram.sh". In the nano window, enter the following two lines:

#!/bin/bash
echo "hello mbl"

This file is called a shell script: a file that contains a list of Unix commands such as echo, ls, etc.
To move around your nano window, use the arrow keys, not your mouse. Control+x will exit, and nano will ask you if you want to save changes. Say yes.

Shell scripts almost always have the suffix ".sh". They are run by typing "sh" before the filename.
We can run this program by typing

sh firstprogram.sh

This should print 'hello mbl' to the screen.
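
For reference, the same script can be created and run without opening an editor. This is only a sketch: the here-document (the << 'EOF' ... EOF part) stands in for nano, and the chmod step is an alternative way of running a script that is not part of the steps above.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Write the two-line script without opening an editor.
cat > firstprogram.sh << 'EOF'
#!/bin/bash
echo "hello mbl"
EOF

# Run it by handing the file to the sh interpreter.
out_sh=$(sh firstprogram.sh)
echo "$out_sh"

# Alternative: mark the file as executable and run it by its path.
# The first line (#!/bin/bash) then tells the system which interpreter to use.
chmod +x firstprogram.sh
out_direct=$(./firstprogram.sh)
echo "$out_direct"
```

The leading ./ is needed in the second form because the current directory is normally not searched for programs.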

Copying, renaming, and moving files

The copy command (cp) is used to copy files to new places. The basic syntax is cp source_file destination_file. First, create a file called 'tmp1.txt' in nano and put whatever you want inside it.
We will now make a copy of tmp1.txt called tmp2.txt by typing:

cp tmp1.txt tmp2.txt

We can also copy a file from the shared directory, using an absolute path for the source and a relative path (.) for the destination:

cp /class/shared/testfile.txt .

will copy a file named "testfile.txt" to your current directory but will not change its name. Use ls to verify.
The move command (mv) can be used to move or rename files. The command syntax is mv source destination. For example,

mv testfile.txt example.txt

has the effect of renaming testfile.txt to example.txt. Use ls to check. We can move this file up one directory by using mv as well.

mv example.txt ..

will move the example.txt file to the parent directory of your current directory. Use ls .. to check.

  • NOTE: Do not move files that are not in your home directory or a subdirectory of this. All files in shared or root folders are to be copied, never moved.
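
The difference between copying, renaming, and moving can be checked with a sketch like this one, run in a throwaway directory. The file is created with echo and a redirect (>) rather than nano, purely so the example runs without an editor.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"
mkdir myfolder
cd myfolder

echo "some text" > tmp1.txt     # create a small file without an editor

cp tmp1.txt tmp2.txt            # copy: tmp1.txt and tmp2.txt both exist
mv tmp2.txt example.txt         # rename: tmp2.txt is gone, example.txt appears
mv example.txt ..               # move: example.txt now lives one level up

here=$(ls)
echo "in myfolder: $here"
echo "one level up:"
ls ..
```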


Loading pre-installed programs

Many of the programs that you will need for the course are already installed on the class servers. They need to be loaded explicitly before they will work.
For example, on the command line type

blastn

You should get an error message saying: -bash: blastn: command not found
Now type

module load bioware

This should load all the preinstalled programs so that you can access them. Now type blastn again and you should see:
BLAST query/options error: Either a BLAST database or subject sequence(s) must be specified
This shows that blastn is now available for use.
If needed, the programs can all be unloaded by typing module unload bioware
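
One way to check whether a program is currently available, without actually running it, is the shell built-in command -v. The sketch below wraps it in a helper called is_available; that name is made up for this example and is not something provided on the class servers.

```shell
# is_available is a hypothetical helper, not part of the class servers.
# "command -v" prints the location of a program if the shell can find it
# and returns a non-zero status if it cannot.
is_available() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "$1: available"
    else
        echo "$1: not found"
    fi
}

is_available ls        # a standard command, found on any Unix system
is_available blastn    # on the class servers, found only after "module load bioware"
```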

Downloading files onto the class servers

There are two places you may need to get files on to the class server from: your own computer or an online source.
To get files from your computer to the server, open a terminal window on your computer and use commands like cd to navigate to the folder containing the file you want to upload. Once there, type

scp filename username@classServername:./

This will upload the file to your home directory on the cluster.

In order to get a file from an online source you can use wget. Type:

wget URL

where URL is the web address of the file you wish to download. For example,

wget https://molevol.mbl.edu/wiki/index.php/Main_Page

will download the HTML file that makes up the main page of the molecular evolution website to the directory you are in.

Useful links

The following table contains a list of commands that will allow us to navigate through the directory structure. The entries are linked to their Wikipedia pages, which contain very useful examples.

Some basic commands

Linux/Mac   MS-DOS            Description                             Syntax (Linux/Mac)
pwd         chdir             print working directory                 pwd
ls          dir               list directory contents                 ls
history     doskey /history   display command history                 history
cd          cd                change directory                        cd directory_name
mkdir       mkdir             make directory                          mkdir directory_name
cp          copy              copy files                              cp original_filename copied_filename
mv          move              move files (the same as rename files)   mv original_filename moved_filename
rm          del               remove file(s)                          rm filename
clear       cls               clear the screen                        clear
exit        exit              quit command line                       exit
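
Of the commands in the table, rm deserves the most care: there is no trash can on the command line, so removed files are gone. A minimal sketch in a throwaway directory, with made-up file and directory names (the -r option, which removes a directory together with its contents, is not covered in the table):

```shell
# Work in a scratch directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"
mkdir olddata
echo "draft" > olddata/draft.txt
echo "keep me" > keep.txt

rm olddata/draft.txt    # remove a single file
rm -r olddata           # -r removes a directory and everything inside it

remaining=$(ls)
echo "$remaining"
```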

Command line editing

The following features of most command line interpreters often come in handy:

  • Up and down arrow keys: cycle through previously issued commands
  • Tab completion
  • 'CTRL+a' moves cursor to beginning of the line
  • 'CTRL+e' moves cursor to end of the line

Advanced topics (unfinished)

  • Cygwin (a Linux look-and-feel environment for Windows)
  • PSCP (SCP client for Windows)

Adding a directory to the path

sed, grep, and awk

Regular expressions

More about text editors

Looking inside text files

There are different commands that can be used to look inside files from within the command line interface. Two of these are cat (type in MS-DOS) and less (more in MS-DOS). The command more is also available in Linux/Mac; in fact more is less, but less has more features than more.

  • cat: cat stands for concatenate. This command is useful for peeking at short files; using cat to view a long file results in the top lines scrolling off before one can even read them. The simplest use of cat is with
    cat filename
  • less: This command is useful for reading long text files because the text is displayed one page (i.e., one command line window filled from top to bottom) at a time. With less it is easy to move forward and backward by lines or pages, and even between two or more files. less is a program in itself, so when it is invoked, a prompt appears at the bottom of the page waiting for a new less command. The prompt in less is a colon (:). The simplest use of less is with
    less filename
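
Since cat stands for concatenate, it can also join several files into one. The sketch below uses made-up file names and a redirect (>) to save the joined output; less appears only in a comment because it is interactive.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"
printf 'first line\n' > part1.txt
printf 'second line\n' > part2.txt

cat part1.txt                            # print one file to the screen
cat part1.txt part2.txt > combined.txt   # join two files into a new one
combined=$(cat combined.txt)
echo "$combined"

# For a long file, page through it instead (interactive, press q to quit):
#   less combined.txt
```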

The following table contains a list of useful less commands

Some basic less commands

Command    Description
spacebar   display next page
return     display next line
n f        move forward n lines
b          move backward one page
n b        move backward n lines
/ word     search forward for word
? word     search backward for word
h          help
q          quit

Online resources to learn UNIX

Unix for beginners

A basic Unix tutorial

Introduction to Unix