Shell scripting#
Introduction#
Instead of typing all the UNIX commands we need to perform one after the other, we can save them all in a file (a “script”) and execute them all at once. Recall from the UNIX and Linux Chapter that the bash shell (or terminal) is a text command processor that interfaces with the Operating System. The bash shell provides a computer language that can be used to build scripts (AKA shell scripts) that can be run through the terminal.
What shell scripts are good for#
It is possible to write reasonably sophisticated programs with shell scripting, but the bash language is not featured to the extent that it can replace a “proper” language like C, Python, or R. However, you will find that shell scripting is necessary. This is because, as you saw in the previous chapter, UNIX has an incredibly powerful set of tools that can be used through the bash terminal. Shell scripts allow you to automate the usage of these commands and create your own simple utility tools/scripts/programs for tasks such as backups, converting file formats, and handling and manipulating files and directories. This enables you to perform many everyday tasks on your computer without having to invoke another language that might require installation or updating.
Your first shell script#
Let’s write our first shell script.
Some conventions and syntax rules#
By convention, Unix shell variables should be named in UPPERCASE.
Also, to create more complex variable names, use snake case (for example, “VAR_NAME”).
There should be no spaces around the = when assigning these variables; MY_VAR=value would work, but MY_VAR = value wouldn’t, because then the shell assumes that MY_VAR must be the name of a command and tries to execute it (with = and value as arguments).
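As a minimal sketch of the assignment rule above (the variable names and values are made up for illustration), the following lines can be pasted into a terminal:

```shell
# Correct: no spaces around "="
MY_VAR=value
echo "MY_VAR is: $MY_VAR"

# A value that contains spaces must be quoted
MY_LONG_VAR="two words"
echo "MY_LONG_VAR is: $MY_LONG_VAR"

# Incorrect (left commented out): the shell would look for a command called MY_VAR
# MY_VAR = value
```

Uncommenting the last line should produce a "command not found" style error, which is the behaviour described above.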
★ Write and save a file called boilerplate.sh in CMEECourseWork/week1/code, and add the following script to it (type it in your code editor):
#!/bin/sh
# Author: Your Name your.login@imperial.ac.uk
# Script: boilerplate.sh
# Desc: simple boilerplate for shell scripts
# Arguments: none
# Date: Oct 2019
echo -e "\nThis is a shell script! \n"
#exit
The .sh extension is not necessary, but it helps you and your programming IDE (e.g., Visual Studio Code, Emacs, etc.) to identify the file type.
The first line is a “shebang” (or sha-bang or hashbang or pound-bang or hash-exclam or hash-pling! – Wikipedia). It can also be written as #!/bin/bash (assuming you are using the bash shell). It tells the operating system which interpreter should run the script when the script is executed directly; here, the script will be executed by /bin/sh.
The hash marks in the following lines tell the interpreter that it should ignore the rest of those lines. That’s how you put in script documentation (who wrote the script and when, what the script does, etc.) and comments on particular lines of the script.
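The -e flag to echo enables interpretation of backslash escapes such as \n (newline). This works in bash; some other sh implementations (e.g., dash) do not support the flag and print -e literally.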
Note that there is a commented-out exit command at the end of the script. Uncommenting it will not change the behavior of the script, but will allow you to generate an error code, and if the command is inserted in the middle of the script, to stop the code at that point. To find out more, see this and this in particular.
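To see what an exit code looks like in practice, here is a small sketch. It runs a one-line script in a child shell with sh -c, and reads the special variable $?, which holds the exit status of the last command. The status value 3 and the name STATUS are arbitrary choices for this illustration.

```shell
# Run a tiny script in a child shell; it stops at "exit 3"
STATUS=0
sh -c 'echo "before exit"; exit 3; echo "never printed"' || STATUS=$?

# $? was captured above; 0 would have meant success
echo "The child script exited with status $STATUS"
```

Only "before exit" is printed by the child, because exit stops the script at that point.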
Tip
/bin/sh is the standard location of the Bourne shell (sh) on most Unix systems. If you’re using GNU/Linux, /bin/sh is normally a symbolic link to another shell: [dash](https://blog.cloudware.bg/en/dash-vs-bash-shell/) on Debian and Ubuntu, and bash on some other distributions.
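A quick way to check which shell /bin/sh points to on your own machine, together with a more portable way of printing newlines, might look like the sketch below. The use of printf here is a suggested alternative, not something the boilerplate script requires, and MESSAGE is an illustrative name.

```shell
# Show what /bin/sh is (often a symbolic link to another shell)
ls -l /bin/sh

# printf interprets \n in its format string in every POSIX shell,
# so it does not depend on how a particular shell implements echo
MESSAGE=$(printf 'line one\nline two')
echo "$MESSAGE"
```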
Special characters#
In shell scripts, there are certain, “special” characters that must be properly “escaped” to avoid interpretation by the shell. Some of these you already saw in the UNIX Chapter; for example, in the bash challenge command find . -type f -exec ls -s {} \; | sort -n | head -10
, the character ;
had to be escaped with a \
to avoid being interpreted as a special character. There is a list of these in the UNIX chapter, and additional ones will be introduced here.
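As a small illustration of escaping (the variable names and the text are made up), the $ character normally starts a variable name, so it has to be escaped or placed in single quotes to be printed literally:

```shell
# Inside double quotes, a backslash stops $5 from being expanded
PRICE_ESCAPED="Cost: \$5"

# Inside single quotes, every character is kept literally
PRICE_SINGLE='Cost: $5'

echo "$PRICE_ESCAPED"
echo "$PRICE_SINGLE"
```

Both echo commands should print the same text, including the dollar sign.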
Next, let’s run your first shell script.
Running shell scripts#
There are two ways to run a shell script:
Call the bash interpreter to run the file:
bash myscript.sh
(You can also use sh myscript.sh
, but it may give a slightly different output.)
This is the right way if the script does something specific in a given project.
Note
Bash (bash) is one of many available (yet the most commonly used) Unix shells. Bash stands for “Bourne Again SHell”, and is an improvement of the original Bourne shell (sh). Basically, bash is sh with more features and nicer (more intuitive, compact) syntax. Most inbuilt UNIX commands or your own scripts will work the same, but at times with subtle differences in output.
Tip
Mac Users: your default shell might not be bash, but zsh. Usually, running a shell script or command with bash and zsh will give you identical processing and output. The commands you learned for bash will also work in zsh, although they may give somewhat different output.
Make the script executable and execute it:
chmod +x myscript.sh
./myscript.sh # the ./ is needed
Use this second approach for a script that does something generic, and is likely to be reused again and again (Can you think of examples?)
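The two ways of running a script can be tried side by side with the sketch below. It assumes a throwaway directory created with mktemp, so nothing in your coursework directories is touched; DEMO_DIR, OUT1 and OUT2 are illustrative names.

```shell
# Create a tiny script in a temporary directory
DEMO_DIR=$(mktemp -d)
printf '#!/bin/sh\necho "hello from myscript"\n' > "$DEMO_DIR/myscript.sh"

# Way 1: hand the file to an interpreter (no execute permission needed)
OUT1=$(sh "$DEMO_DIR/myscript.sh")

# Way 2: make the file executable, then run it by its path
chmod +x "$DEMO_DIR/myscript.sh"
OUT2=$("$DEMO_DIR/myscript.sh")

echo "$OUT1"
echo "$OUT2"

# Clean up the temporary directory
rm -r "$DEMO_DIR"
```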
The generic scripts of type (2) can be saved in a bin directory under your home directory (e.g., /home/username/bin/), and made easily accessible by telling UNIX to look in that directory for scripts. To this end, you need to add this bin directory to the directory paths that Linux searches for executables. For this you need to set the $PATH environmental variable: a list of directories (separated by colons) that tells the shell which ones to search for executable files (more on environmental variables below).
First, check which directories are already in $PATH
:
echo $PATH
Then check if you already have a bin directory
:
find /home/ -maxdepth 3 -name 'bin' -type d
Tip
Mac Users: on Macs you may not need to search /home/
, but just /
Note the -maxdepth 3 directive. You don’t want to search in every possible directory in your UNIX tree (under home)! If you see no bin directory (e.g., you might find .local/bin), then create one:
mkdir -p ~/.local/bin # in ".local" to keep it to only current user; -p also creates .local if it is missing
Then, add it to the $PATH
:
export "PATH=$PATH:$HOME/.local/bin"
This change will not persist beyond the current terminal session (or after you have rebooted your computer). To make it persistent:
For Bash, you need to add export PATH=$PATH:$HOME/.local/bin to the appropriate file that will be read when your shell launches. There are a few different files where you can set the variable: ~/.bashrc, ~/.profile, and ~/.bash_profile. Check if these files exist, and then add the path specification command (export PATH=$PATH:$HOME/.local/bin in this case) to any of them; usually ~/.bashrc is a good choice. Then log out and in again, or run source ~/.bashrc (if it was indeed .bashrc that you edited).
For other shells, you need to find the appropriate file by reading that shell’s documentation. In particular, on current Mac OS versions, which now use the zsh shell, it will be ~/.zshrc.
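Because a startup file may be read many times, a slightly more defensive version of the export line only appends the directory when it is not already listed. This is an optional sketch, not a required step; it relies on the standard case statement and pattern matching on the colon-separated list.

```shell
# Append ~/.local/bin to PATH only if it is not already there
case ":$PATH:" in
    *":$HOME/.local/bin:"*)
        ;;  # already present, nothing to do
    *)
        PATH="$PATH:$HOME/.local/bin"
        export PATH
        ;;
esac
echo "$PATH"
```

Wrapping $PATH in extra colons makes the first and last entries match the same pattern as the ones in the middle.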
Note
If you have two executable files sharing the same name located in two different directories, the shell will run the file that is in the directory that comes first in the paths listed in $PATH
.
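The search order can be checked with a small experiment. The sketch below assumes two temporary directories that each contain a script with the same (made-up) name hello_demo; the PATH change is confined to the subshell created by $( ... ), so your real $PATH is left alone.

```shell
# Two directories, each holding a script with the same name
DEMO_DIR=$(mktemp -d)
mkdir "$DEMO_DIR/first" "$DEMO_DIR/second"
printf '#!/bin/sh\necho "from first"\n'  > "$DEMO_DIR/first/hello_demo"
printf '#!/bin/sh\necho "from second"\n' > "$DEMO_DIR/second/hello_demo"
chmod +x "$DEMO_DIR/first/hello_demo" "$DEMO_DIR/second/hello_demo"

# Put "first" before "second" in PATH, then call the script by name only
WINNER=$(PATH="$DEMO_DIR/first:$DEMO_DIR/second:$PATH"; hello_demo)
echo "$WINNER"

# Clean up the temporary directory
rm -r "$DEMO_DIR"
```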
Now run your first shell script.
★ cd to your code directory, and run it:
cd ../code
bash boilerplate.sh
This is a shell script!
I have specified the relative path ../code assuming that you are in some other directory in your current week (sandbox, results or data).
Variables in shell scripts#
You will need to handle and manipulate variables (AKA parameters) inside shell scripts to truly exploit the powerful features of the bash (shell) language.
Note
At the most fundamental level, a “variable” in any programming language or environment is a named section (portion, chunk) of the computer’s memory which can be assigned values, read and manipulated.
Shell scripts have two types of variables.
Special Variables#
These are set by the shell, and typically cannot have values assigned to them (cannot be modified). They contain useful or necessary information needed for the script to run. These include:
Environmental variables: These contain information about the system (e.g., $PATH, which you saw above), are available system-wide (so you can invoke them directly in the commandline, outside a shell script), and are available to (or “inherited by”) all new processes and shells generated (“spawned”) by a bash script (AKA “child” processes or shells).
Special internal variables: These exist only in the environment of a particular execution of the shell script. These will not be available any more once the script has finished running, unless you explicitly export them.
Tip
To see a list of all current environmental variables, you can use env
in the commandline.
Here are some key special internal variables in shell scripts:
Variable | Description |
---|---|
$0 | The filename of the current script as it was called, including any extension |
$n | Here n is a positive integer giving the position of an argument: $1 is the first argument, $2 the second, and so on |
$# | The number of arguments (parameters) supplied to a script (the script was “called” with) |
$@ | All the arguments, each as a separate word. For example, if a script receives two arguments, $@ is equivalent to $1 $2 |
This is not an exhaustive list, but the important ones to remember in basic shell scripting.
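These variables can be explored without writing a separate script. The sketch below uses set --, a standard shell builtin form that replaces the current positional parameters, to pretend that a script was called with two arguments; the argument values and variable names are made up.

```shell
# Pretend the script was called with two arguments
set -- one "two words"

ARG_COUNT=$#
FIRST_ARG=$1
SECOND_ARG=$2
echo "Called with $ARG_COUNT arguments"

# Quoting "$@" keeps each argument intact, even if it contains spaces
for ARG in "$@"; do
    echo "Argument: $ARG"
done
```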
Assigned Variables#
These are assigned manually by the user. These are present within the current instance of the shell only and are not available to any child processes started by the script unless they are explicitly exported.
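The effect of exporting can be seen with the following sketch, which starts a child shell with sh -c and asks it to print two variables. The names LOCAL_VAR and EXPORTED_VAR are illustrative, and ${VAR:-unset} is a parameter expansion that substitutes the word "unset" when the variable has no value.

```shell
# One ordinary assigned variable and one exported variable
LOCAL_VAR="only here"
EXPORTED_VAR="shared"
export EXPORTED_VAR

# Single quotes make sure the child shell, not this one, expands the variables
CHILD_SEES_LOCAL=$(sh -c 'echo "${LOCAL_VAR:-unset}"')
CHILD_SEES_EXPORTED=$(sh -c 'echo "${EXPORTED_VAR:-unset}"')

echo "Child sees LOCAL_VAR as: $CHILD_SEES_LOCAL"
echo "Child sees EXPORTED_VAR as: $CHILD_SEES_EXPORTED"
```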
In general, assigned variables in the bash language are analogous to those in any other programming language (e.g., Python): they can be a number, a character, a string of characters, or boolean (true/false). There are three ways to assign values to such variables (note lack of spaces!):
Explicit declaration: MY_VAR=myvalue
Reading from the user input (the script will wait for the value to be provided): read MY_VAR
Command substitution: MY_VAR=$(command) (the variable is set to the output of command); e.g., MY_VAR=$(ls | wc -l)
Tip
The command substitution of the type MY_VAR=$(command) is one type of “shell expansion”. There are several types of shell expansions, which you can learn about here. Along with command substitution, shell parameter expansion is particularly important to learn about in shell scripting.
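A brief sketch of both kinds of expansion mentioned in this tip is given below. The path and the variable names (including UNSET_DEMO_VAR, which is assumed not to exist) are only examples.

```shell
FILE_PATH="../sandbox/test.txt"

# Command substitution: capture the output of basename
FILE_NAME=$(basename "$FILE_PATH")

# Parameter expansion: remove the shortest suffix that matches .txt
STEM=${FILE_NAME%.txt}

# Parameter expansion: fall back to a default when a variable is unset or empty
NAME=${UNSET_DEMO_VAR:-anonymous}

echo "$FILE_NAME $STEM $NAME"
```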
Some examples of variables#
Here is an example illustrating the different types of shell variables (and assignments):
#!/bin/sh
## Illustrates the use of variables
# Special variables
echo "This script was called with $# parameters"
echo "The script's name is $0"
echo "The arguments are $@"
echo "The first argument is $1"
echo "The second argument is $2"
# Assigned Variables; Explicit declaration:
MY_VAR='some string'
echo 'the current value of the variable is:' $MY_VAR
echo
echo 'Please enter a new string'
read MY_VAR
echo
echo 'the current value of the variable is:' $MY_VAR
echo
## Assigned Variables; Reading (multiple values) from user input:
echo 'Enter two numbers separated by space(s)'
read a b
echo
echo 'you entered' $a 'and' $b '; Their sum is:'
## Assigned Variables; Command substitution
MY_SUM=$(expr $a + $b)
echo $MY_SUM
★ Save this as a single variables.sh script.
★ Now run this script without any arguments:
bash variables.sh
★ And compare the output when run with two arguments:
bash variables.sh 1 two
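The arithmetic step of the script can also be tried non-interactively. In this sketch the two numbers are fed to read from a here-document instead of the keyboard, and the sum is computed both with expr (as in the script) and with arithmetic expansion $(( )), which is offered here as an alternative rather than as part of the original script; ALT_SUM is an illustrative name.

```shell
# Feed "3 4" to read in place of typed input
read a b <<EOF
3 4
EOF

# As in variables.sh: run expr and capture its output
MY_SUM=$(expr "$a" + "$b")

# Alternative: let the shell do the arithmetic itself
ALT_SUM=$((a + b))

echo "expr gives $MY_SUM; arithmetic expansion gives $ALT_SUM"
```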
Also type the following into another script file (save it as MyExampleScript.sh) and run it:
#!/bin/sh
MSG1="Hello"
MSG2=$USER
echo "$MSG1 $MSG2"
echo "Hello $USER"
echo
This introduces you to the $USER environmental variable (on some systems, also available as $USERNAME).
A useful shell-scripting example#
Let’s write a shell script to transform comma-separated files (csv) to tab-separated files and vice-versa. This can be handy: for example, in certain computer languages (e.g., C), it is much easier to read tab- or space-separated files than csv.
To do this, in bash we can use tr (abbreviation of translate or transliterate), which deletes or substitutes characters. Here are some examples.
echo "Remove excess spaces." | tr -s " "
Remove excess spaces.
echo "remove all the a's" | tr -d "a"
remove ll the 's
echo "set to uppercase" | tr [:lower:] [:upper:]
SET TO UPPERCASE
echo "10.00 only numbers 1.33" | tr -d [:alpha:] | tr -s " " ","
10.00,1.33
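One more example, closer to the task at hand, shows what the -s flag does when tabs are replaced by commas. The input line is made up and contains two tabs in a row (an empty field); with -s the repeated commas are squeezed into one, so the empty field disappears.

```shell
# A line with three values and an empty field (two tabs in a row)
TAB_LINE=$(printf 'a\t\tb\tc')

# Plain translation keeps the empty field
KEEP_EMPTY=$(printf '%s\n' "$TAB_LINE" | tr '\t' ',')

# With -s, repeated commas are squeezed into a single comma
SQUEEZED=$(printf '%s\n' "$TAB_LINE" | tr -s '\t' ',')

echo "$KEEP_EMPTY"
echo "$SQUEEZED"
```

Which behaviour is wanted depends on whether empty fields in the data are meaningful.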
Now write a shell script called tabtocsv.sh that substitutes all tabs with commas:
#!/bin/sh
# Author: Your name you.login@imperial.ac.uk
# Script: tabtocsv.sh
# Description: substitute the tabs in the files with commas
#
# Saves the output into a .csv file
# Arguments: 1 -> tab delimited file
# Date: Oct 2019
echo "Creating a comma delimited version of $1 ..."
cat $1 | tr -s "\t" "," >> $1.csv
echo "Done!"
exit
Now test it (note where the output file gets saved and why). First create a text file with tab-separated text:
echo -e "test \t\t test" >> ../sandbox/test.txt # again, note the relative path!
Now run your script on it
bash tabtocsv.sh ../sandbox/test.txt
Creating a comma delimited version of ../sandbox/test.txt ...
Done!
Note that:
$1 is the way a shell script defines a placeholder for a variable (in this case the filename). See the Variables in shell scripts section above for more on variable names in shell scripts.
The new file gets saved in the same location as the original (Why is that?)
The file got saved with a .txt.csv extension. That’s not very nice. Later you will get an opportunity to fix this!
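It is also worth noticing that the script and the test command both use >>, which appends to a file, rather than >, which overwrites it. The difference can be seen with this sketch, which works on a temporary file (DEMO_FILE and the other names are illustrative):

```shell
DEMO_FILE=$(mktemp)

echo "first" > "$DEMO_FILE"     # > creates or overwrites the file
echo "second" >> "$DEMO_FILE"   # >> adds to the end of the file
AFTER_APPEND=$(cat "$DEMO_FILE")

echo "third" > "$DEMO_FILE"     # overwriting discards the earlier lines
AFTER_OVERWRITE=$(cat "$DEMO_FILE")

echo "$AFTER_APPEND"
echo "$AFTER_OVERWRITE"
rm "$DEMO_FILE"
```

So running a script that uses >> twice on the same input will leave two copies of the converted text in the output file.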
Some more examples#
Here are a few more illustrative examples (test each one out, save in week1/code/
with the given name):
Count lines in a file#
Save this as CountLines.sh
:
#!/bin/bash
NumLines=`wc -l < $1`
echo "The file $1 has $NumLines lines"
echo
The < redirects the contents of the file to the stdin (standard input) of the command wc -l. It is needed here because without it, you would not be able to catch just the numerical output (number of lines). To see this, try deleting < from the script and see what the output looks like (it will also print the file name, which you do not want).
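The difference can also be checked directly in the terminal. This sketch creates a three-line temporary file and captures the output of wc -l with and without the redirection; the tr -d ' ' step is an addition that strips the padding spaces some versions of wc print, and the variable names are illustrative.

```shell
DEMO_FILE=$(mktemp)
printf 'one\ntwo\nthree\n' > "$DEMO_FILE"

# Without <, wc is given a file name and prints it after the count
WITH_NAME=$(wc -l "$DEMO_FILE")

# With <, wc only sees the contents and prints just the count
ONLY_NUMBER=$(wc -l < "$DEMO_FILE" | tr -d ' ')

echo "$WITH_NAME"
echo "$ONLY_NUMBER"
rm "$DEMO_FILE"
```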
Concatenate the contents of two files#
Save this as ConcatenateTwoFiles.sh
:
#!/bin/bash
cat $1 > $3
cat $2 >> $3
echo "Merged File is"
cat $3
Convert tiff to png#
This assumes you have done apt install imagemagick
(remember sudo
!)
Save this as tiff2png.sh
:
#!/bin/bash
for f in *.tif;
do
echo "Converting $f";
convert "$f" "$(basename "$f" .tif).png";
done
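The renaming step in this loop can be tested without ImageMagick or any image files. The sketch below loops over two made-up file names and only builds the new names; basename with a second argument removes that suffix from the name.

```shell
# No files are needed: the loop only manipulates the names
for f in photo.tif "my scan.tif"; do
    NEW_NAME="$(basename "$f" .tif).png"
    echo "$f -> $NEW_NAME"
done
```

The quotes around "$f" matter here, because the second name contains a space.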

Fig. 5 This is not a good use of shell scripting!
(Source: XKCD)#
Practicals#
Instructions#
Along with the completeness of the practicals/exercises themselves, you will be marked on the basis of how complete and well-organized your directory structure and content is.
Review (especially if you got lost along the way) and make sure all the shell scripts you created in this chapter are functional.
Make sure you have your weekly directory organized with data, sandbox, code with the necessary files, under CMEECourseWork/week1.
All scripts should run on any other Unix/Linux machine; for example, always call data from the data directory using relative paths.
Make sure there is a readme file in every week’s directory. This file should give an overview of the weekly directory contents, listing all the scripts and what they do. This is different from the readme for your overall git repository, of which Week 1 is a part. You will write a similar readme for each subsequent weekly submission.
Don’t put any scripts that are part of the submission in your home/bin directory! You can put a copy there, but a working version should be in your repository.
Improving scripts#
Note that some of the shell scripts that you have created in this chapter require input files. For example, tabtocsv.sh needs one input file, and ConcatenateTwoFiles.sh needs two. When you run any of these scripts without inputs (e.g., just bash tabtocsv.sh), you either get no result, or an error.
The goal of this exercise is to make each such script robust so that it gives feedback to the user and exits if the right inputs are not provided.
A new shell script#
Write a csvtospace.sh shell script that takes a comma separated values file and converts it to a space separated values file. However, it must not change the input file; it should save the result as a differently named file.
This script should be able to handle wrong or missing inputs (similar to the previous exercise).
Save the script in CMEECourseWork/week1/code, and run it on the csv data files that are in Temperatures in the master repository’s Data directory.
Readings & Resources#
The bash reference manual: https://www.gnu.org/software/bash/manual/bash.html
Plenty of shell scripting resources and tutorials out there; in particular, look up http://www.tutorialspoint.com/unix/unix-using-variables.htm
This is a relatively intuitive set of notes on shell scripting: https://www.shellscript.sh/
Some shell scripting examples