Introduction to Linux

The Physics Department provides access to large-scale scientific computing resources. Most of these computers run the Linux operating system, so to use them, it is helpful to have at least basic familiarity with Linux. This introduction covers how to access the Department Linux computers, manipulate files, run programs, and create your own programs. Although this document is written as an introduction to Linux, most of the commands are applicable to other brands of unix, including Darwin, which is the unix underlying OS X.

Accessing Department Linux Computers

Just like a website is accessed via a network communication protocol called http or https, the Linux user interface may be accessed via a network communication protocol called ssh. You will need a computer with a program that can act as an ssh client in order to access the ssh server on the Department Linux computers. On a Windows computer, you can use an ssh client from the Windows command prompt or you can use an ssh application such as PuTTY. On a Mac, you can use the ssh command in the Terminal app found in Applications/Utilities.

You will need to have an account on the Department Linux cluster. To request one, contact help@hep.wisc.edu. Once you have one, use your ssh program to connect to login.physics.wisc.edu, using the login name and password that were provided to you.

An example ssh connection to login.physics.wisc.edu from a Mac’s Terminal app is shown below. Replace the login name “dan” with your login name.

$ ssh dan@login.physics.wisc.edu
dan@login.physics.wisc.edu's password:
Last login: Tue Apr 15 11:24:06 2014 from 128.104.164.180


#######################################################
    Welcome to login01.physics.wisc.edu
    Scientific Linux release 6.4 (Carbon)

    996.66 MB RAM
    1 cores of type QEMU Virtual CPU version 1.1.2
#######################################################

[dan@login01 ~]$

The Command Prompt

Once you have connected via ssh, you will see a command prompt that looks something like this:

[dan@login01 ~]$

The command prompt indicates that the computer is waiting for you to enter a command. The default command prompt shown above contains your login name, the name of the computer you are logged into, and the name of the directory you are in. You can customize the command prompt to show other information.

Enter the command “pwd” and press enter.

[dan@login01 ~]$ pwd
/home/dan
[dan@login01 ~]$

The output of the command is on the next line after the command you entered, and a new command prompt is on the next line after that. The pwd command displays the name of the current working directory. In this example, it is “/home/dan”, which is my home directory.

Enter the command “date” and press enter. The date command displays the current date and time.

[dan@login01 ~]$ date
Thu Feb 26 11:52:12 CST 2015
[dan@login01 ~]$

Now press the up arrow key. You will see “date” appear on your command prompt. Press the up arrow key again. You will see “pwd” appear in place of “date”. The up arrow key allows you to access commands that you entered before. If you go past the command you wanted, you can press the down arrow key to go forward in history. Once you have returned to a command that you wish to repeat, you can press enter to execute it.

[dan@login01 ~]$ date
Thu Feb 26 11:58:46 CST 2015
[dan@login01 ~]$

From here on, I will condense the prompts to just “$”. Although “$” seems at first like an odd character for the computer to use to prompt you for input, it is consistent with the arcane look and feel that unix wizards expect.

Files and Directories

To see a list of the files in the current directory, use the “ls” command.

$ ls
notes  private  public

The file “notes” and the directories “private” and “public” are listed. Depending on your computer, you may see the directories in a different color to indicate that they are directories rather than files. Another way to see that information is to add the “-l” option to the “ls” command:

$ ls -l
total 5
-rw-rw-r--. 1 dan dan   16 Feb 26 12:05 notes
drwxr-x--x. 2 dan dan 2048 Feb 26 12:03 private
drwxrwxr-x. 3 dan dan 2048 Feb 26 12:03 public

When entering the command, be sure to put a space before the “-l” option to separate it from the command. Many Linux commands have options that may be specified to change their behavior. The “-l” option used here causes “ls” to display the file listing in “long format”, which includes information in columns. The last column is the name of the file or directory. Prior to that is the time and date it was last modified. Prior to that is the size in bytes. Prior to that is the name of the group who owns the file. Prior to that is the username of the person who owns the file. Prior to that is the number of links to the file or directory; ignore that for now. The first column is a string of one-character attributes. The meaning of these is shown below in order of left-most to right-most position:

d  = d or - to indicate directory or file
r  = r or - to indicate readable by owner or not
w  = w or - to indicate writable by owner or not
x  = x or - to indicate executable (for file) or listable (for directory) by owner or not
r  = r or - to indicate readable by group owner or not
w  = w or - to indicate writable by group owner or not
x  = x or - to indicate executable (for file) or listable (for directory) by group owner or not
r  = r or - to indicate readable by others or not
w  = w or - to indicate writable by others or not
x  = x or - to indicate executable (for file) or listable (for directory) by others or not

The first character of the attributes can be used to tell which entries are files and which are directories. This is followed by three triplets of attributes that indicate who is allowed to do what to the file. The story of who is allowed to do what is actually a little more complicated, because there are different technologies for file storage. Some have more elaborate ways of controlling access. For example, the files in your home directory are in a filesystem called AFS, which ignores all but the first triplet of permissions and adds additional access controls that are not shown in the output of the “ls” command. We will revisit that later.
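
To watch the attribute string change, the following sketch creates a scratch file and sets its permissions with the “chmod” command, which appears again in the section on shell scripts. The scratch directory from “mktemp -d” and the file name “demo” are only illustrative, and the sketch assumes an ordinary filesystem rather than AFS.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
touch "$scratch/demo"

# u=rw: owner may read and write; g=r: group may read; o=: others get nothing.
chmod u=rw,g=r,o= "$scratch/demo"

# Keep only the first 10 characters of the long listing: the attribute string.
perms=$(ls -l "$scratch/demo" | cut -c1-10)
echo "$perms"

rm -r "$scratch"
```

Reading the result from left to right against the table above: a plain file, readable and writable by its owner, readable by its group, and closed to everyone else.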

To illustrate basic file manipulation commands, I will first create a file using a simple text file editor named “nano”. The command to edit a file named “myfile” is as follows:

$ nano myfile

When you enter that command, the nano text editor will appear on your screen. Enter a few lines of text. Then press Ctrl-X to exit. It will ask if you want to save the changes you made. Press Y. It will ask what filename to write to. Press enter to accept the default, which is the name you specified in the command: myfile.

Now the “ls” command shows the additional file:

$ ls
myfile  notes  private  public

To change the name of the file, use the “mv” command to move it.

$ mv myfile yourfile
$ ls
notes  private  public  yourfile

To make a copy of the file, use the “cp” command.

$ cp yourfile myfile
$ ls
myfile  notes  private  public  yourfile

To create a directory and move the file into it, use the “mkdir” and “mv” commands.

$ mkdir mydir
$ mv myfile mydir
$ ls
mydir  notes  private  public  yourfile

The “ls” command shows the directory “mydir”, but it does not show the contents of that directory. To see the contents, add “mydir” as an argument to the “ls” command.

$ ls mydir
myfile

To see the file listing in long format, use the “-l” option as before.

$ ls -l mydir
total 1
-rw-rw-r--. 1 dan dan 18 Feb 26 13:24 myfile

Now look what happens when we try to rename “myfile” using the “mv” command.

$ mv myfile hisfile
mv: cannot stat `myfile': No such file or directory

Like most error messages in Linux, this one contains some arcane terminology (cannot stat) and some more easily interpreted parts (No such file or directory) that seem untrue until you know what you are doing. The reason the command failed is that the name “myfile” refers to a file in the current working directory. Since “myfile” is actually in a directory named “mydir”, we need to refer to it by specifying the “path” to it. Instead of just “myfile” we must write “mydir/myfile”.

$ mv mydir/myfile mydir/hisfile
$ ls mydir
hisfile

Note that the example specified the path “mydir” for both the original name and the new name. If we had not specified this path for the new name, it would have moved the file into the current working directory instead of keeping it in “mydir”.

If you are working a lot with files in a directory, it is more convenient to make that directory your current working directory, so you don’t have to specify a path to the files. To do that, use the “cd” command:

[dan@login01 ~]$ cd mydir
[dan@login01 mydir]$ pwd
/home/dan/mydir
[dan@login01 mydir]$ ls
hisfile
[dan@login01 mydir]$ mv hisfile herfile
[dan@login01 mydir]$ ls
herfile

Notice how the current working directory is displayed in my prompt. Initially, I was in my home directory. The “~” character is an abbreviation for the home directory. To see the contents of my home directory, I can use “~” as an argument to the “ls” command:

$ ls ~
mydir  notes  private  public  yourfile

That is the same as using the full path to my home directory. A full path begins with “/” and names each directory along the way, from the top-level directory down to the desired location.

$ ls /home/dan
mydir  notes  private  public  yourfile

Another useful abbreviation is “..”. This refers to the parent of the current working directory. In this example, the parent directory happens to also be my home directory.

$ ls ..
mydir  notes  private  public  yourfile

Now suppose we want to move “herfile” back into my home directory and remove “mydir”. This can be done with the following commands:

$ mv herfile ..
$ cd ..
$ rmdir mydir
$ ls
herfile  notes  private  public  yourfile
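
The steps in this section can also be replayed without typing them one at a time. The following is a sketch only: it uses a scratch directory from “mktemp -d” in place of the home directory and “touch” in place of nano, both of which are substitutions made to keep it self-contained.

```shell
orig=$PWD                 # remember where we started
scratch=$(mktemp -d)      # stands in for the home directory
cd "$scratch"

touch yourfile            # an empty file instead of one edited in nano
mkdir mydir
cp yourfile mydir/myfile  # copy straight into the subdirectory

cd mydir
mv myfile herfile         # rename using a name relative to mydir
mv herfile ..             # ".." is the parent directory
cd ..
rmdir mydir               # rmdir only removes empty directories

listing=$(ls)
echo "$listing"

cd "$orig"
rm -r "$scratch"
```

Because every name without a leading “/” is interpreted relative to the current working directory, the same “mv” command means different things before and after “cd mydir”.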

Input and Output

Many unix commands read some data, perform some computation, and output a result. A large set of these commands use what is called “standard input and output”. A program that uses standard input and output can have its input and/or output be in a file or be entered interactively on your screen. (In unix terminology, your screen is called a “terminal”.) Controlling standard input and output allows you to combine unix commands in powerful ways.

An example of a program that uses standard input and output is “sort”. To use it interactively, enter the command “sort”. Then enter a few lines of text and press ctrl-D to indicate “end-of-file”.

$ sort
one
two
three
four
<ctrl-D>
four
one
three
two

Instead of displaying the sorted result on your terminal, you could redirect the standard output to a file. The following example uses the “>” character to redirect the sorted output to a file named “mysort”. It then uses the “cat” command to display the contents of the file.

$ sort > mysort
one
two
three
four
<ctrl-D>
$ cat mysort
four
one
three
two

To redirect the standard input from a file instead of from the terminal, the “<” character is used. However, many commands do not require the “<” character to be explicitly used, because any filename given as an argument to the command is read as input. For example, “cat mysort” could have been written “cat < mysort” to achieve the same thing.
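
Both kinds of redirection can be used in one command. The sketch below is an illustration under assumed file names (“unsorted” and “mysort” in a scratch directory). It also uses “>>”, which is not covered above: it appends to a file instead of overwriting it.

```shell
scratch=$(mktemp -d)

# Write four lines to a file; printf repeats the format for each argument.
printf '%s\n' one two three four > "$scratch/unsorted"

# "<" feeds the file to sort's standard input; ">" captures its standard output.
sort < "$scratch/unsorted" > "$scratch/mysort"

# ">>" appends to the end of the file rather than replacing its contents.
echo zero >> "$scratch/mysort"

result=$(cat "$scratch/mysort")
echo "$result"
rm -r "$scratch"
```

The line “zero” ends up last because it was appended after the sort had already finished.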

The “grep” command is used to search data for a pattern. The pattern is specified as a “regular expression”. The following example searches for all lines in the file mysort that contain the letter “o”.

$ grep 'o' mysort
four
one
two

The output of one command can be piped into another command using the “|” character. This avoids the need to store the output into a file and then feed the file into the second command. The following example finds all lines containing an “o” and sorts them in reverse.

$ grep 'o' mysort | sort -r
two
one
four
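
Pipes are not limited to two commands. As a small sketch, the following pipelines chain several of the commands introduced so far, plus “wc -l” (count lines) and “head” (keep the first few lines), which are standard commands not otherwise covered in this document.

```shell
# Count how many of the numbers 1 through 1000 end in the digit 0.
# tr removes the padding spaces that some versions of wc add.
count=$(seq 1 1000 | grep '0$' | wc -l | tr -d ' ')
echo "$count"

# Show the three largest of those numbers.
top=$(seq 1 1000 | grep '0$' | sort -n -r | head -n 3)
echo "$top"
```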

When the number of lines in the output is very large, you may not want it all to stream by on your terminal. To have it pause between screenfuls you can pipe the output to the “less” command. The following example lets you page through the numbers 1 through 1000.

$ seq 1 1000 | less

Press the spacebar to go to the next page, “b” to go back a page, “/600” to search forward for 600, “?50” to search backward for 50, “G” to go to the end, “1G” to go to line 1, and “q” to quit.

AFS

We use a filesystem called AFS for home directories. It has some special features that make it a little different from a “normal” filesystem such as the filesystem used for /scratch.

AFS is a global networked filesystem. It therefore needs to control who in the world can access it. Each directory has an “access control list” (ACL). To view it, use the “fs listacl” command.

$ fs listacl ~
Access list for /home/dan is
Normal rights:
  system:administrators rlidwka
  system:anyuser l
  dan rlidwka

Each entry in the ACL contains a description of who it applies to and what rights they have. The rights are:

r  = read
l  = list
i  = insert
d  = delete
w  = write
k  = lock
a  = administer
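
If you find the letter codes hard to read at a glance, a small shell function can spell them out. The function name “describe_rights” is made up for this sketch and is not part of AFS; it simply applies the table above one letter at a time.

```shell
# describe_rights is a hypothetical helper, not an AFS command.
# It prints the name of each right in a string such as "rl" or "rlidwka".
describe_rights() {
    rights=$1
    words=""
    while [ -n "$rights" ]; do
        rest=${rights#?}              # everything after the first character
        letter=${rights%"$rest"}      # the first character itself
        case $letter in
            r) name=read ;;
            l) name=list ;;
            i) name=insert ;;
            d) name=delete ;;
            w) name=write ;;
            k) name=lock ;;
            a) name=administer ;;
            *) name="unknown($letter)" ;;
        esac
        words="${words}${words:+ }${name}"   # add a space only between words
        rights=$rest
    done
    echo "$words"
}

describe_rights rl
describe_rights rlidwka
```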

Everyone granted access to a file via the AFS ACLs must also be allowed to have that access according to the file owner’s unix access rights (i.e. the first rwx triplet in the access rights shown by “ls -l”).

Examine the ACLs on the “public” and “private” directories in your home directory.

$ fs la ~/private
Access list for /home/dan/private is
Normal rights:
  system:administrators rlidwka
  dan rlidwka
$ fs la ~/public
Access list for /home/dan/public is
Normal rights:
  system:administrators rlidwka
  system:anyuser rl
  dan rlidwka

The difference is that “system:anyuser” has read and list rights to “~/public” but not to “~/private”. You can therefore use the “public” directory for files that you wish to share with other users and the “private” directory for files that you do not wish to share.

To give someone else read access to a directory, use the “fs setacl” command.

$ mkdir experiment1
$ fs setacl -dir experiment1 -acl cwseys rl
$ fs listacl experiment1
Access list for experiment1 is
Normal rights:
  system:administrators rlidwka
  system:anyuser l
  cwseys rl
  dan rlidwka

When you create a new directory, it inherits all the ACLs of its parent directory.

AFS Token

AFS knows who you are via an authentication system called Kerberos. It’s not good enough to just be logged into a computer as a particular username. You also have to have an “AFS token” obtained through Kerberos. This happens automatically when you log in, so normally you don’t need to think about it. However, the AFS token has a limited lifespan. If you stay logged in for a long time, the AFS token will expire, and you may find that you can no longer access files.

To check your AFS token, use the “tokens” command:

$ tokens

Tokens held by the Cache Manager:

User's (AFS ID 6062) tokens for afs@physics.wisc.edu [Expires Mar  8 17:00]
   --End of list--

To get a new token, use the “kinit” and “aklog” commands:

$ kinit
Password for dan@PHYSICS.WISC.EDU:
$ aklog
$ tokens

Tokens held by the Cache Manager:

User's (AFS ID 6062) tokens for afs@physics.wisc.edu [Expires Mar  8 17:28]
   --End of list--
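
In a long-running session it can be handy to check for a token from a script. The following is a rough sketch: “afs_token_status” is a made-up helper name, and it only looks for a line of the form shown above. How an expired token is displayed is not covered here, so treat this as a starting point rather than a complete check.

```shell
# afs_token_status is a hypothetical helper. It reads the output of "tokens"
# on standard input and reports whether a line mentions an AFS token.
afs_token_status() {
    if grep -q 'tokens for afs@'; then
        echo "token present"
    else
        echo "no token: run kinit and aklog"
    fi
}

# On the login computers you would run:  tokens | afs_token_status
# Here a copy of the sample output is fed in so that the sketch runs anywhere.
sample="Tokens held by the Cache Manager:

User's (AFS ID 6062) tokens for afs@physics.wisc.edu [Expires Mar  8 17:00]
   --End of list--"
printf '%s\n' "$sample" | afs_token_status
```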

Writing Shell Scripts

Commands that are frequently used can be put in a file and executed as a script. This is called a “shell script”, because the unix program that interprets commands is called a shell. There are different shells that one can use. So far, you have been using the default login shell “bash”. Most shells have the same basic syntax but may differ in the syntax used for doing more advanced things such as for loops and if statements. The most commonly used shell for writing scripts is “sh”, whose syntax is essentially a subset of what “bash” accepts. We will use “sh” in this example.

To make a script, edit a file and put in the commands you wish to run. The following example makes a script that finds multiples of 10 in its input (numbers ending in 0) and displays them in reverse numerical order.

$ cat > myscript
#!/bin/sh

grep '0$' | sort -n -r

<ctrl-D>
$ chmod a+x myscript
$ seq 1 100 | ./myscript
100
90
80
70
60
50
40
30
20
10

The first line in the script is “#!/bin/sh”. This “shebang” line tells Linux to execute the script using “/bin/sh”, which is the full path to the “sh” command shell. The “chmod” command was then used to grant all users permission to execute the script (the “a” in “a+x” stands for “all”).
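
The same script can be created without typing it interactively by using a “here-document”, which feeds the lines up to a marker word into the command’s standard input. This is a sketch: the scratch directory and the marker word “EOF” are arbitrary choices.

```shell
scratch=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding "$" inside the script text.
cat > "$scratch/myscript" <<'EOF'
#!/bin/sh
# Keep lines ending in 0, then sort them in reverse numerical order.
grep '0$' | sort -n -r
EOF

# Grant everyone permission to execute the script.
chmod a+x "$scratch/myscript"

output=$(seq 1 30 | "$scratch/myscript")
echo "$output"
rm -r "$scratch"
```

Here the script is run by giving the path to it, which is the first of the two ways of running a program described next.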

Notice that when running the script, the path “./myscript” was used instead of just “myscript”. Unlike other files, programs must either be specified via a path or must exist in a directory designated for programs. The “.” character is an abbreviation for the current working directory, so the path that was specified in this case was simply the current working directory. We could instead put the script in a directory designated for programs. Such directories are traditionally called “bin”, so the following example puts the script in a new directory named “bin” and adds this to your list of places where programs are expected to be.

$ mkdir bin
$ mv myscript bin
$ export PATH=~/bin:$PATH
$ seq 1 20 | myscript
20
10

This makes use of an environment variable named “PATH”. Adding “~/bin” to PATH means that programs in “~/bin” can be executed without specifying the path to them. However, the change only lasts for the duration of your login session. To make it permanent, read on.

Environment Variables

In the preceding section, an environment variable named “PATH” was used. This variable has a special meaning. It is a colon-separated list of paths in which to look for programs when a command is executed.

To see the current value of a variable, use the “echo” command.

$ echo $PATH
/home/dan/bin:/usr/lib64/qt-3.3/bin:/usr/local/etc:/usr/local/bin/afs:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin

This is unix. Want to see each item in that list on a separate line? Pipe the output of the “echo” command through the “sed” command and tell it to replace colons with newlines:

$ echo $PATH | sed 's|:|\n|g'
/home/dan/bin
/usr/lib64/qt-3.3/bin
/usr/local/etc
/usr/local/bin/afs
/usr/local/bin
/bin
/usr/bin
/usr/local/sbin
/usr/sbin
/sbin
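
The “\n” in the sed replacement is understood by GNU sed, which is what Linux provides. On systems with a different sed, such as a Mac, it may not produce a newline. A more portable sketch uses “tr”, which translates one character into another. The value of “sample_path” below is made up for illustration.

```shell
sample_path="/home/dan/bin:/usr/local/bin:/bin"   # illustrative value, not your real PATH

# tr replaces every ":" with a newline character.
lines=$(echo "$sample_path" | tr ':' '\n')
echo "$lines"

# With your own PATH the command would be:  echo "$PATH" | tr ':' '\n'
```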

Look again at the command that was previously used to add something to PATH:

$ export PATH=~/bin:$PATH

The “export” command is used here to assign a new value to PATH. By specifying the value as “~/bin:$PATH” we caused “~/bin” to be put first in the list. To put it last in the list, we would use “$PATH:~/bin”. The use of “$PATH” in the value is just shorthand for the current value of “PATH”.
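
The order of the list matters when two directories contain a program with the same name: the directory listed first wins. The sketch below demonstrates this with two throwaway directories and a made-up program name, “hello_demo”.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/first" "$scratch/second"

# Two hypothetical programs with the same name, one in each directory.
printf '#!/bin/sh\necho from first\n'  > "$scratch/first/hello_demo"
printf '#!/bin/sh\necho from second\n' > "$scratch/second/hello_demo"
chmod a+x "$scratch/first/hello_demo" "$scratch/second/hello_demo"

saved_path=$PATH

# "first" is searched before "second", so its copy is the one that runs.
PATH="$scratch/first:$scratch/second:$saved_path"
winner=$(hello_demo)
echo "$winner"

PATH=$saved_path          # restore the original search list
rm -r "$scratch"
```

This is why putting “~/bin” first lets your own programs take precedence over system programs of the same name, while putting it last does the opposite.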

Changes to the environment only last for the duration of the current login session, and they only apply to the current login session. (You could have multiple simultaneous sessions.) To make a change that applies to all future login sessions, you need to modify your shell initialization script. The default login shell is “bash”. Every time you log in, bash executes the commands found in “.bash_profile” in your home directory. If you type “ls” in your home directory, you will not see this file, even though it is there. That is because, by default, “ls” hides all files that begin with “.”. To see them, use the “-a” option:

$ ls -a
.  ..  .bash_history  .bash_profile  .bashrc  bin  herfile  mysort  notes  private  public  yourfile

You can edit your .bash_profile with nano and put the “export PATH=~/bin:$PATH” command at the bottom. This will cause all future login sessions to have the modified PATH setting.
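
One refinement you may want, offered here as a suggestion rather than a requirement: if “.bash_profile” is read more than once in a session (for example, when a login shell is started from within another), the plain “export” line adds “~/bin” again each time. The sketch below adds it only when it is missing. It uses “$HOME”, a standard environment variable that holds the path to your home directory.

```shell
# Add ~/bin to the front of PATH only if it is not already listed.
# Wrapping both sides in ":" lets one pattern match the entry at any position.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                  # already present: do nothing
    *) PATH="$HOME/bin:$PATH" ;;
esac
export PATH
```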

Transferring Files, Passwordless Login, and Other ssh Tricks

See SSH Access.

Disconnecting and Reconnecting

If you are running commands that take time to complete, and you wish to disconnect and reconnect later, one option is to use tmux. Rather than sshing to login.physics.wisc.edu, you will need to ssh to a specific computer (e.g. login02.physics.wisc.edu), since you need to reconnect to the same computer you were previously using in order to access the same tmux session. Using tmux works for text-based programs that run in a terminal. For graphical programs, you can use xrdp or VNC, which are described in the next section.
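
A typical tmux workflow is sketched here. The session name “analysis” is a made-up example. Interactively, you would start a session with “tmux new -s analysis”, detach from it by pressing Ctrl-b and then d, and later return to it with “tmux attach -t analysis” after logging back into the same computer. The script form below creates a detached session instead, so that it can run without a terminal, and it is guarded in case tmux is not installed.

```shell
session=analysis          # hypothetical session name

if ! command -v tmux >/dev/null 2>&1; then
    status="tmux not installed"
elif tmux new-session -d -s "$session" 'sleep 600' 2>/dev/null; then
    # The session runs in the background; list it, then clean up.
    tmux list-sessions
    status="session started"
    tmux kill-session -t "$session"
else
    status="could not start session"
fi
echo "$status"
```

In real use you would leave out the “kill-session” step, since the point is for the session to outlive your ssh connection.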

Be aware that long-running sessions, whether in tmux, xrdp, or VNC, may encounter the problem of the home directory AFS access token expiring. To see how to check for that and remedy it, see AFS Token.

Graphical Applications

Linux provides graphical interfaces in addition to the command-line interface. For example, the Physics Department provides programs such as Mathematica and Matlab that have graphical interfaces in Linux. For heavy-duty computations, you should use HTCondor rather than running the task directly on login.physics.wisc.edu. However, login.physics.wisc.edu can be used to test or compile your program.

xrdp and Windows Remote Desktop

One way to use graphical applications is to use xrdp. To use it, you will need Windows Remote Desktop or compatible software installed on your computer. On a Windows computer, Windows Remote Desktop is already installed. On a Mac, it can be installed from the App Store. In Linux, there are several programs compatible with the Windows Remote Desktop protocol.

To connect to the remote desktop server, your computer will need to be connected to the wisc.edu network either directly or via WiscVPN or you will need to use ssh port forwarding. To set up ssh port forwarding, the following command may be used:

ssh -L 10000:localhost:3389 USER@login02.physics.wisc.edu

Once the above ssh connection is established, you can connect a remote desktop client to the hostname localhost:10000, and ssh will forward this connection to the xrdp service on the login computer.

If you are not using ssh port forwarding, configure Windows Remote Desktop to connect to one of the login computers. You should choose a specific computer such as login01.physics.wisc.edu rather than the name of the login cluster login.physics.wisc.edu in case you want to disconnect and reconnect to the same session.

X Forwarding

Another way to use graphical applications is to install X Windows on your computer. On a Mac, you can install XQuartz for this purpose. Once you have X Windows installed, you can enable X forwarding by adding the “-X” option to the ssh command when logging in to login.physics.wisc.edu. Graphical programs that you run from the ssh session will then be able to open a window on your computer.

VNC

Another way to use graphical programs is to use VNC. Like xrdp, this is convenient if you need to be able to disconnect and reconnect to the same session.

To start a VNC server, run the following command on the login computer:

vncserver -geometry 1024x768

To connect to the VNC server using a VNC client running on the login computer, run the following command, replacing 1 with the display number assigned when vncserver was started:

vncviewer login02:1

To stop the VNC server, use the following command, replacing 1 with the display number assigned when vncserver was started:

vncserver -kill :1

Rather than running vncviewer on the login computer and viewing it over X forwarding, performance may be improved by running vncviewer on your computer and using ssh port forwarding to allow it to connect to the VNC server. To do so, you will need to know the port number being used by the server. One way to find it is to look in the log file for a line of the form Listening for VNC connections on TCP port N. The path to the log file is displayed when the server is started. It is in the .vnc directory.

Another way to find the port is to examine the VNC server process using the following command:

ps uwwwx | grep vnc

Look for the port number in a line like the following:

dan      1881388  0.1  0.3  50652 14284 pts/1    S    17:29   0:00 Xtightvnc :1 -desktop X -auth /afs/physics.wisc.edu/home/dan/.Xauthority -geometry 1024x768 -depth 24 -rfbwait 120000 -rfbauth /afs/physics.wisc.edu/home/dan/.vnc/passwd -rfbport 5901 -fp /usr/share/fonts/X11/misc/,/usr/share/fonts/X11/Type1/,/usr/share/fonts/X11/75dpi/,/usr/share/fonts/X11/100dpi/ -co /etc/X11/rgb
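
Rather than picking the number out by eye, you can let “sed” extract it. The sketch below runs on a shortened copy of the sample line so that it works anywhere; the “-rfbport” option name is taken from that sample, and other VNC servers may report the port differently. By convention, VNC servers use port 5900 plus the display number, which is consistent with display :1 and port 5901 here.

```shell
# A shortened copy of the sample line above, used so the sketch runs anywhere.
sample='dan 1881388 0.1 0.3 50652 14284 pts/1 S 17:29 0:00 Xtightvnc :1 -desktop X -geometry 1024x768 -rfbport 5901 -fp /usr/share/fonts/X11/misc/'

# Print only the number that follows "-rfbport"; lines without it print nothing.
port=$(echo "$sample" | sed -n 's/.*-rfbport \([0-9][0-9]*\).*/\1/p')
echo "$port"

# On the login computer the input would come from ps instead:
#   ps uwwwx | sed -n 's/.*-rfbport \([0-9][0-9]*\).*/\1/p'
```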

Once you know the port number, connect to the login computer with ssh using the following arguments, replacing 5901 with the port number and USER with your physics.wisc.edu username:

ssh -L10000:localhost:5901 USER@login02.physics.wisc.edu

The above command forwards port 10000 (arbitrarily chosen) on your computer to port 5901 on login02. You can then run vncviewer on your computer and connect to port 10000, which will be forwarded by ssh to your VNC server port on login02.

vncviewer localhost:10000