The newline Guide to Bash Scripting - Part 3

SSH#

By far the most popular way to connect securely to other hosts from Bash is using the OpenSSH suite of tools. If you administer any *nix host there’s a good chance it’s already running an SSH service. This chapter looks into common use cases and pitfalls.

If you’re unsure whether you are running an SSH service you can check the output of ps ax | grep '[/]sshd' – it should list one or more processes if it’s running. You could also try to connect to the local service by running ssh 127.0.0.1 , but interpreting the results might actually be a bit harder. A “Permission denied” error message, password prompt or simply getting a Bash prompt with no output are sure signs that an SSH service is running.

Connecting to a host interactively#

ssh HOST , for example ssh example.org , is the most common SSH connection command. If everything is set up properly you should now be running a shell on the server. If this does not work out of the box you may want to check with the system administrator. Depending on the setup changes might be necessary on the client or server to enable this simplicity.

The shell on the other host might not be Bash, but if it’s any common shell you should be able to tell by running echo "$SHELL" . If that does not end in /bash you could try running exec bash to replace the current shell with a Bash shell. If that doesn’t work you might just have to familiarize yourself with another shell or ask the system administrator to install a familiar shell.

Creating keys#

As an SSH user you need a key pair consisting of your public and private keys. As the names indicate the public key is not secret, and can be shared freely. The private key is secret and usually personal, and should be stored and transferred so that it’s only ever accessible by you. If you already have a key pair it will probably be stored as ~/.ssh/id_rsa (private key) and ~/.ssh/id_rsa.pub (public key).

The “rsa” part of the name is the name of the main algorithm used to create the key. Several other algorithms are available, but the filename should start with “id_”.

If you do not have a key, creating one with default settings is as simple as running ssh-keygen and following the instructions. An example session:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/jdoe/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/jdoe/.ssh/id_rsa.
Your public key has been saved in /home/jdoe/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:R1uCCpXuP+ccVTaJIyXWbZKyq/JWekCr5LSHpbGGSpM jdoe@example.org
The key's randomart image is:
+---[RSA 2048]----+
|      ..  o..o   |
|     ..  o.o+.o. |
|    ..  . +o+o=  |
|     ......= + . |
|     ...S.o..    |
|   .  = +.o.     |
|  E  = X =.      |
| . .. X B.o.     |
|  .. . =.=o      |
+----[SHA256]-----+

The default path to the new private key is in parentheses on the second line of the output. You can simply press Enter to use that path. The passphrase is used to encrypt the private key. You will be prompted for it when necessary.

Running a script on another host non–interactively#

Once you can connect to a host using SSH you can of course run commands interactively while in the remote shell. You can also run a single command on the server and exit immediately with ssh HOST COMMAND [ARGUMENT…] . For example, ssh example.org export will start a remote shell, run the export command in the remote shell, and then exit the remote shell immediately.

There is a major issue with running anything but the most trivial commands this way: any characters which have a special meaning to the shell need to be escaped or quoted, otherwise they apply to the local shell. For example, ssh example.org cd /tmp '&&' echo "\$PWD" will open a shell on the server, change the working directory to /tmp, and finally print the value of $PWD as set by the server. This is cumbersome and error–prone.

A more subtle issue is that ssh HOST COMMAND [ARGUMENT…] will consume the client’s standard input to pass it to the remote command. Here’s a common way to run into this issue:

awk '{print $1}' /etc/hosts | while read -r host
do
    if ssh -o ConnectTimeout=1 "$host" true
    then
        echo "$host is reachable"
    fi
done

shellcheck detects this issue, but shellcheck can’t detect all such issues. Testing with more than one input line is the best way to make sure the loop works as intended.

This script is meant to try to connect to all the IPs listed in /etc/hosts to check which are reachable using SSH, but it will actually only run at most one ssh command! What happens is that read consumes the first line of the input, then the ssh command consumes the rest of the input. Once the read command comes around again there’s no more input to be read, so the loop stops prematurely.

If you want to run anything non–trivial you can instead pass a script into standard input of the ssh command, for example ssh example.org < dwim.bash . That way you don’t have to transfer the script to the server first, and avoid the issues described above. Just beware that everything in the script runs remotely: if the script depends on commands, files or variables which are not available on the server it will fail.

ssh -n will also stop the loop above from consuming its input, but that option is completely application–specific. Passing data on standard input or plugging it with < /dev/null is applicable to all commands.
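
For example, a minimal fix for the loop above is to plug the ssh command’s standard input:

awk '{print $1}' /etc/hosts | while read -r host
do
    if ssh -o ConnectTimeout=1 "$host" true < /dev/null
    then
        echo "$host is reachable"
    fi
done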

The exit code from an ssh command is the exit code of the command passed to it, so you can check for that locally after running the command just like you would if the command ran locally. The only exception is exit code 255, which is reserved for SSH client errors.
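
A quick way to see this, assuming example.org is reachable and accepts your key:

$ ssh example.org true; echo "$?"
0
$ ssh example.org false; echo "$?"
1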

Jump server#

When working with a service consisting of many separate hosts it is common to have a dedicated host which is used as an entry point for SSH connections. Usually this jump server has no other services running on it. This can make connecting to a host behind the jump server cumbersome, having to first ssh JUMP_HOST to get a shell on the jump server and then ssh SERVICE to get to the relevant service host. If most of the work is on the service host it is convenient to set up the SSH client to automatically connect via the jump server. This is easily achieved within ~/.ssh/config. For example, with “jump.example.org” as the externally available jump server and “service” as the internal host:

Host service
    ProxyJump jump.example.org

This way connecting to the service host via the jump server is as simple as ssh service .
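
With a reasonably recent OpenSSH client you can also do this for a one–off connection, without touching the configuration:

$ ssh -J jump.example.org service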

Quitting a hung session#

If the SSH connection has died for some reason, such as the network disconnecting or the server restarting, it is possible for the remote shell to become unresponsive. At this point nothing you type shows up in the terminal, and it could take a very long time before the client hits a relevant timeout and disconnects automatically. Luckily there’s a simple key sequence which you can send to tell the client to close immediately: Enter, ~ (tilde), . (full stop).

  • Enter is necessary because the sequence is only detected at the start of a line.
  • Tilde is the default SSH client escape character ( -e escape_char in man ssh ).
  • Full stop finally tells the client to close the connection.

From the Terminal to Production#

The terminal is great for running experiments: massage the input into a usable form, pass it to a processing command, and pull out the interesting bits from the output. However, once you’ve ended up with a useful, working set of commands, putting them together into a maintainable and user–friendly script involves many other pieces. In this chapter we’re going to explore how to go about this, taking an in–depth look at some best practices along the way. We’re going to discuss the portable shebang line, well–structured documentation, flexible argument handling, strict error handling, cleanup code, and more.

The concepts discussed here are central to our goal of writing production–ready, maintainable scripts. However, if there are any concepts that you’re not yet familiar with, feel free to come back to them after reading other chapters.

Start with a shebang#

Aim of this section: Make the script runnable as long as Bash is in the $PATH .

The first line of a script is special. If it starts with the characters #! the rest of that line is used as the path to the interpreter for the script when run directly. This line is usually called the shebang, from the “hash” and “bang” characters.

On some systems the path to the Bash interpreter command is /bin/bash , on others it’s /usr/bin/bash . Part of the freedom of a Linux system is that you can literally install anything anywhere, but these are by far the two most common options. In short, this is not standardized, so there’s no single Bash path we can use in a shebang directly. Fortunately there’s a tool called env which can be used to find the path of an executable on your $PATH and run it. env itself has a much more reliable path, so this is a portable shebang line:

#!/usr/bin/env bash

Because this is “free” portability, and because the development community seems to like it, I recommend just using the line above, and you won’t have to worry about it again. Of course, if for whatever reason your Bash executable is not on the $PATH then by all means use the path directly, with the caveat that it will need editing to run on most other platforms.

Documentation#

Aim of this section: Create easily–readable structured script documentation.

You wouldn’t write a script which did something trivial, so you should help out those people running it in the future (such as yourself) by including enough documentation to at the very least know what it is meant to do and how to run it.

There is no commonly–accepted format for this documentation. Some use existing languages like troff (the source language of man pages) or Markdown, but the problem with such languages is that the documentation in the actual script file is then no longer plain text.

If you want to produce stand–alone or typeset documentation for your script I would recommend putting it in a separate file and adding the plaintext version to the script in a post–processing step when releasing a new version. This way the documentation in the script is always readable, and you don’t need to keep two versions of it in sync.

If instead you’re fine with documenting the script using plain text within it, there’s a simple trick to make it easily readable:

  1. Resize the terminal window until it is as wide as your script’s maximum line length (anywhere from 78 to 120 characters is common) minus two. So if you want each line of documentation text to be at most 100 characters, use a terminal width of 98. You can use echo $COLUMNS to check the width after resizing.
  2. Run man man .
  3. Use that manual page as your template by prefixing each line with # and a space (hence the “minus two” above).

This should make the documentation easily readable for anyone familiar with the most ubiquitous command line documentation format in existence.

An example of 80 characters wide documentation:

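A sketch, reusing the hypothetical dwim.bash script from the argument handling section later in this chapter:

# NAME
#
#     dwim.bash - Do What I Mean with the given files
#
# SYNOPSIS
#
#     dwim.bash [--verbose] [--configuration=FILE] [--] FILE…
#     dwim.bash --help
#
# DESCRIPTION
#
#     Processes each FILE in turn, printing progress if --verbose is
#     given.
#
# EXAMPLES
#
#     dwim.bash ./*.txt
#
# EXIT STATUS
#
#     0 on success, 1 if any FILE could not be processed.
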
These sections should be fairly self–explanatory, and are relevant for almost any program. Which sections you choose to include is of course up to you, and I would encourage browsing the man pages of similar programs to get ideas for other useful sections.

Failing fast#

Aim of this section: Avoid cascading failures and hard–to–debug situations by exiting the script at the first sign of trouble.

See Fail–Fast Settings for an in–depth discussion of why, how and when these settings protect you.

A fail–fast script exits as soon as it encounters an error. You can think of it as pre–flight checks: everything has to line up just so for the script to do its job properly. Continuing in case of an error does not guarantee failure, but

  • there are generally many more ways a command can fail than succeed,
  • the failure modes are usually not tested, and
  • failure modes compound when running more commands with broken state.

So you will be very lucky to get the result you want despite an error, and pretty lucky if all you have to do is tweak something before re–running the command. Permanent data loss (even unrelated to the input data) or ending up in an unknown state are common results.

One common example is using a command which is on the $PATH to do some pre–processing. $PATH will be defined in an interactive shell, but not in a crontab. If you don’t stop the script when the pre–processing fails, the rest of the script often runs like a train wreck: it gets into an infinite loop, fills up the disk with useless data until it crashes, or modifies files it should never have touched.

This is not an academic problem. For example, the Linux client for the massively popular Steam gaming platform had a devastating bug which in some circumstances deleted every single file on the machine which was owned by the current user!

By default, Bash does not fail fast. In interactive shells that’s useful – you don’t want your terminal to quit just because of a typo – but in scripts you’ll want to err on the safe side. To enable several safeguards you can simply add the following as the first code, after the shebang line and documentation but before doing anything else:

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

For simplicity’s sake I’ve left some of these out of scripts where they don’t apply, but it’s generally safer to just start with all of them enabled.

Check shell version#

Aim of this section: Ensure that your script is run with a compatible interpreter.

Portability is a bit of a sore subject in shell scripting. On the one hand, using portable language features means that the code is more likely to do the expected actions when run with different interpreters (or different versions of the same interpreter), which can save users time. On the other hand:

  • It is not possible to write a fully portable script, because there are a huge number of interpreters and versions, and the target interpreters are (pretty much by definition) unknown
  • The script could do something subtly wrong when run by another interpreter, like doing nothing when it should be doing something
  • The script could do something catastrophically wrong when run by another interpreter, like deleting all the user’s files
  • A lot of useful tools and Bash–specific features are not portable, so portable code will be more complex than non–portable code with the same feature set
  • When something goes wrong the user is going to have a nasty time trying to debug complex code written by someone else for a most likely unknown set of interpreters and versions

We could flip the situation completely and only support a single version of Bash. That might be reasonable in a project where reliability is extremely important, but has its own costs and risks: developers would have to be able to run the same version of Bash, independent of the operating system’s default interpreter. This could involve a container, a virtual machine, compiling Bash locally, or a package manager which allows multiple Bash versions to be installed at the same time.

A compromise would be to support a range of Bash versions, making sure to test the script with at least the oldest and most recent version in the range. For example, a script supporting versions 4 through 5.0 might look like this:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

if [[ -z "${BASH_VERSINFO-}" ]]
then
    echo 'Cannot determine Bash version' >&2
    exit 1
fi

if (( "${BASH_VERSINFO[0]}" < 4 ))
then
    echo 'Bash versions <4 are unsupported' >&2
    exit 1
fi

if (( "${BASH_VERSINFO[0]}" > 5 )) \
    || (( "${BASH_VERSINFO[0]}" == 5 && "${BASH_VERSINFO[1]}" > 0 ))
then
    echo 'Bash versions >5.0 are unsupported' >&2
    exit 1
fi

echo "Let's roll!"

Similarly you can check the value of $OSTYPE if your script only supports specific operating systems or needs to do things differently depending on which one it is running on.
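
For example, a minimal sketch which only allows Linux and macOS:

case "$OSTYPE" in
    linux-gnu*|darwin*)
        echo "Running on ${OSTYPE}"
        ;;
    *)
        echo "Unsupported operating system: ${OSTYPE}" >&2
        exit 1
        ;;
esac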

Argument handling#

Aim of this section: Handle arguments in a simple, flexible and recognizable manner.

An “argument” is a word passed to a command. Most commands require arguments to do something useful, such as deploying to a specific environment or processing some files.

A “word” in Bash is a strange concept, only vaguely related to natural language words. A Bash word is any sequence of bytes (excluding NUL) bracketed on either side by the start of a simple command, an unescaped, unquoted space, tab or newline character, or the end of a simple command. Examples of single words include:

  • ./dwim.bash , since it is bracketed by the start and end of the command
  • ./my\ documents/ , since the space character is escaped with a backslash
  • './my documents' , since the string is quoted
  • ./dwim.bash | tac is two sets of single words, ./dwim.bash and tac (which reverses lines), because | separates two commands

Examples of multiple words include:

  • ./dwim.bash ./my documents is three words, ./dwim.bash , ./my and documents , because none of the spaces are escaped or quoted
  • ./dwim.bash "./$user"' 'documents is two words, because Bash supports mixing quoted and unquoted strings in the same word
  • ./dwim.bash ./$user documents is at least three words, because word splitting happens after expanding variables, and $user might contain any number of words, including zero
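
If you’re ever unsure how Bash will split a command line into words, a throwaway function which prints its argument count makes it easy to check by passing it the examples above:

$ count_words() { echo "$#"; }
$ count_words ./dwim.bash ./my documents
3
$ count_words './my documents'
1
$ user=jdoe
$ count_words ./dwim.bash "./$user"' 'documents
2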

Arguments are stored — in order — in the parameter array $@ , and can be accessed individually as $1 (the first parameter) onwards. Basically we can think of the parameter as the name, key or 1–based index, and the argument as the value.

You need to use curly brackets to refer to the tenth and subsequent arguments, as in ${10} .

$0 is the command name itself, and is not considered a parameter or part of the parameter array.

So we can break a command like grep 'foo.*bar' './some file.txt' > ./output.txt into $0 , which is grep , $1 , which is foo.*bar , and $2 , which is ./some file.txt . The redirection is not part of the arguments.
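
A tiny throwaway script, here hypothetically saved as show-arguments.bash, makes the split visible:

#!/usr/bin/env bash
printf 'Command name: %s\n' "$0"
printf 'Parameter 1:  %s\n' "${1-}"
printf 'Parameter 2:  %s\n' "${2-}"

Running it with the arguments from the grep example above, plus a redirect, shows that the redirect never becomes an argument:

$ ./show-arguments.bash 'foo.*bar' './some file.txt' > ./output.txt
$ cat ./output.txt
Command name: ./show-arguments.bash
Parameter 1:  foo.*bar
Parameter 2:  ./some file.txt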

An “option” is an argument which modifies the behavior of the program. Options can be either “flags” such as --verbose , which toggle a boolean, or a key/value pair such as --configuration=./my.conf . Option arguments may be followed by “non–option arguments”, which is the input the command will work with, such as a file name. An optional separator of -- is sometimes used to keep options and non–options apart. This allows specifying non–option arguments starting with hyphens as in grep foo -- -test.txt , although as we’ll see this is an antipattern with a simple workaround.

Putting options before arguments when writing commands is a good habit to get into. some-command /some/path --some-option will work with some commands, but argument parsing is implemented in countless different ways, and some programs will do unexpected things (or just fail) if we mix these together.

Hold on though, what about find ? Doesn’t it put the paths to search for in the middle and the options (like -type d ) at the end? It’s confusing, but the synopsis in the GNU find manual page explains this: it starts with a handful of rarely–used options, followed by the “starting points,” and the rest of the arguments all make up an expression which filters and manipulates the files within the starting points. The keywords in the expression just happen to look like options.

The following script shows how you might set up argument handling:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

version='0.1'
readonly version

usage() {
    echo 'dwim [--verbose] [--configuration=FILE] [--] FILE…' >&2
    echo 'dwim --help' >&2
}

verbose_printf() {
    if (( "$verbosity" > 0 ))
    then
        # shellcheck disable=SC2059
        printf "$@"
    fi
}

# Defaults
configuration_file=/etc/example.conf
verbosity=0

arguments="$(
    getopt --options='' --longoptions=configuration:,help,verbose,version \
    --name="$0" -- "$@"
)"
eval set -- "$arguments"
unset arguments

while true
do
    case "$1" in
        --configuration)
            configuration_file="$2"
            shift 2
            ;;
        --help)
            usage
            exit
            ;;
        --version)
            echo "$version"
            exit
            ;;
        --verbose)
            ((++verbosity))
            shift
            ;;
        --)
            shift
            break
            ;;
        *)
            printf 'Not implemented: %q\n' "$1" >&2
            exit 1
            ;;
    esac
done

readonly configuration_file verbosity

if (( $# == 0 ))
then
    usage
    exit 1
fi

verbose_printf 'Using configuration file %q\n' "$configuration_file"

for argument
do
    verbose_printf 'Processing %q\n' "$argument"
done

This does pretty much what you would expect:

$ ./argument-handling.bash --verbose './first file' '/path/to/second file'
Using configuration file /etc/example.conf
Processing ./first\ file
Processing /path/to/second\ file

Step by step:

  1. Start the script with the shebang line and error handling directives.
  2. Set up the version string near the top of the script so that it’s easy to find and change when updating the script.
  3. Create utility functions for non–trivial reusable code.
  4. Set up option defaults for easy modification and to avoid undefined variable errors.
  5. Run getopt to produce a string which can be used to re–set the parameter array in a standard way. getopt splits up key/value pairs, so you can use the unambiguous form --key=value but your parameter array will contain --key and value separately.
  6. Override the parameter array using the string produced by getopt . This replaces $@ with a more easily parseable set of arguments. Some things to note:
  • getopt inserts a standard option delimiter argument -- if it’s not already present after the last option it recognizes in the input:
$ getopt --options='' --longoptions=help --name="dwim" -- --help
  --help --

It splits up --key=value arguments into two arguments --key and value for easier looping:

$ getopt --options='' --longoptions=x: --name="dwim" -- --x=1
  --x '1' --

It handles whitespace such as newlines in arguments:

$ getopt --options='' --longoptions=x: --name="dwim" -- --x=$'1\n2'
  --x '1
 2' --
  • It’s important to note that the output contains literal quotes, which is why we have to use eval to treat them as syntactic quotes. Using just set -- "$arguments" would result in adding literal quotes to some of the values.
  • We’re not using short option names, but unfortunately we still have to use the --options / -o key/value flag for it to work:
$ getopt --longoptions=help --name="dwim" -- --help
  --

getopt prints useful error messages and returns exit code 1 when passing wrong arguments (but still outputs what it was able to parse):

$ getopt --options='' --longoptions=x: --name="dwim" -- --x
 dwim: option '--x' requires an argument
  --
 $ getopt --options='' --longoptions=help --name="dwim" -- --help --foo
 dwim: unrecognized option '--foo'
  --help --
  7. shift [N] left shifts the parameter array by N (default 1) positions, so when we have handled a flag like --verbose we have to run shift and when we have handled a key/value pair we need to shift 2 to start working on the next parameter. For example, using the getopt output above as a template:
$ set -- --flag --key value -- './some  file.txt'
  $ echo "$1"
  --flag
  $ shift
  $ echo "$1"
  --key
  $ shift
  $ echo "$1"
  value
  $ shift
  $ echo "$1"
  --
  $ shift
  $ echo "$1"
  ./some  file.txt
  8. Once we encounter -- processing is finished.
  9. If we ever encounter anything else ( * matches any string) that means we must’ve passed an option to getopt which we have not yet handled in the case statement, so we need to report that as an implementation error.
  10. Once we’re done processing the arguments we can set the resulting configuration read–only to avoid accidentally changing them later in the script.
  11. A final check verifies that there’s still at least one argument left after option parsing, to fit with the synopsis.

See /usr/share/doc/util-linux/getopt/getopt-parse.bash for a canonical example of getopt use.

Including files#

Aims of this section:

  • Understand the difference between absolute and relative paths
  • Understand what the working directory is
  • Know how to deal with references to other files in simple and not–so–simple cases

A path starting with “/” is called “absolute.” It refers to one particular file in any given context, independent of current working directory. So there can be only one /etc/hostname within the current context. Of course, the particular path may not exist, or the file contents and metadata may change at any time. There are also many ways to change which file an absolute path refers to:

  • mounting over the path or any ancestor path or
  • entering a chroot jail, container, VM or other host.

If your script needs to change such a context it is important that you are clear about which code runs within each context. The simplest way to clarify this is usually to put any code you need to run within a different context into a separate script, even if it’s just a single line. That way a context change in your code is represented by a context change in your editor. Another advantage of this approach is that you will never have to escape code within other code, which gets really complicated really fast.

A path starting with any other character is relative to the current working directory. It may be confusing that this is not the directory your script is in! If your shell’s working directory is /home/user and you call ./scripts/pdf2png.bash ./report.pdf the working directory of the script is /home/user. This is handy in most cases, because it means that pdf2png.bash will look for /home/user/report.pdf rather than /home/user/scripts/report.pdf.

Paths starting with “.” are “explicitly relative.” Most of the time adding ./ to the start of a path makes no difference, but they are easier to deal with than the alternative. Consider for example the perfectly valid filename “#tags.txt”. If you pass that as is, unquoted, to a command it will be treated as a comment! Try for example cat #tags.txt – it will do nothing, because cat has been called without an argument and its behavior in that case is to print anything you enter on its standard input. You’ll have to press Ctrl–c to cancel it or Ctrl–d to indicate end of file. cat '#tags.txt' or cat ./#tags.txt will do what you expect. This gets even more complex if the filename starts with a hyphen – what happens now depends on the argument handling of the application. For example:

$ cat -losses.csv
cat: invalid option -- 'l'
Try 'cat --help' for more information.

Quoting doesn’t help here, because the issue is with the application rather than the shell:

$ cat '-losses.csv'
cat: invalid option -- 'l'
Try 'cat --help' for more information.
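
As with the hash example, making the path explicitly relative sidesteps the problem, because the argument no longer starts with a hyphen:

$ cat ./-losses.csv
[file contents]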

This finally brings us to actually including files. As you can see from the above, referring to absolute paths in both arguments and within the code makes things simpler – the working directory doesn’t matter. But two desirable features mean that we have to deal with relative paths:

  • We want to be able to relocate the script and its dependencies. Whether the files are in /home/user/scripts or /usr/bin should not affect the functionality.
  • We want to be able to pass relative paths as arguments for convenience.

readlink is often used to convert relative paths to absolute ones, but it doesn’t help with this problem: if the relative path isn’t correct, readlink isn’t going to find the correct file — classic garbage in, garbage out. It’s just going to make debugging harder by obfuscating the input.

To deal with files located relative to the script we need to know the directory of the script itself. The simplest case should work if you completely control the environment the script runs in, and nobody is doing anything weird or actively trying to break things:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

script_directory="$(dirname "$0")"

echo "I'm in ${script_directory}"

Unfortunately this doesn’t handle many corner cases, and the code can get unbearably complex. We’ll look at some issues in the following sections.

Different directories#

A sourced file is in a different directory than the sourcing file, and needs to know its own directory. Anything derived from $0 won’t help here, because that is not changed when sourcing. The fix for this is simple — replace $0 with the first element of the $BASH_SOURCE array:

script_directory="$(dirname "${BASH_SOURCE[0]}")"

echo "I'm in ${script_directory}"

Running via a symbolic link#

Somebody creates a symbolic link to your script in another directory, and your script relies on other files relative to itself which are not linked. For example, given a script:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

script_directory="$(dirname "${BASH_SOURCE[0]}")"

# shellcheck source=utilities.bash
. "${script_directory}/utilities.bash"

and its utilities.bash:

warn() {
    printf "%s\n" "$@" >&2
}

both in for example ~/bin. Somebody then creates a symbolic link to the main script, and tries to run it:

$ sudo ln --symbolic ~/bin/example.bash /usr/local/bin/example
$ /usr/local/bin/example
/usr/local/bin/example: line 7: /usr/local/bin/utilities.bash: No such file or directory

There is no “obviously best” solution to this problem. Depending on your situation, one of the following may be appropriate:

  • Treat it as an application error, and run script_directory="$(readlink --canonicalize-existing -- "$script_directory")" repeatedly until it stops changing to make sure you get back to the original script directory. This adds a fair bit of complexity for something which should be the most trivial part of the code.
  • Treat it as a packaging error, and inline the script at deployment time so there is only ever one file.
  • Treat it as a user error, and expect the user to fix it themselves by creating another symlink or inlining the file.

Newline at end of filename#

The script path ends with a newline. Yes, this happens. All it takes is accidentally closing a quote after pressing Enter:

$ sudo ln --symbolic ~/bin/example.bash '/usr/local/bin/example
> '

Bash will even autocomplete that path — it is valid, after all — so it might be some time before anyone even notices the strange name. The problem is a detail of how command substitutions work. Lines end in a newline character, so most programs will use newline terminators. But when processing lines of text individually you typically don’t care about this “extra” character, because it’s not really part of the line. So command substitutions remove any trailing newlines. This means that whenever you do want the exact output of a command substitution, for example a filename, you have to “neutralize” this trimming:

#!/usr/bin/env bash

set -o errexit -o nounset

script_directory="$(dirname "${BASH_SOURCE[0]}X")"
script_directory="${script_directory%X}"

# shellcheck source=utilities.bash
. "${script_directory}/utilities.bash"

Nothing is trimmed in variable expansions, so this can handle almost any filename. Wait, almost? Read on…

Hyphen at start of filename#

Remember how hyphens at the start of filenames usually mean the filenames are treated as options instead? And $BASH_SOURCE just contains whatever was typed on the command line. At this point you have to make the path explicitly relative before using it in commands:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

script_path="${BASH_SOURCE[0]}"

if [[ "${script_path:0:1}" != '/' ]]
then
    script_path="./${script_path}"
fi
script_directory="$(dirname "${script_path}X")"
script_directory="${script_directory%X}"

# shellcheck source=utilities.bash
. "${script_directory}/utilities.bash"

All this is to say that it can get complicated to deal with every possible situation, and to be careful to strike a balance between robust and maintainable code.

Cleaning up#

Variable scope#

Aim of this section: Understand how variables are scoped to avoid namespace pollution and associated bugs.

It can get really difficult to tell what is happening when there is a lot of global state. Here we’ll look at some ways to reduce the scope of variables to avoid namespace pollution.

Command scope#

A variable assigned in the prefix of a command is scoped to that command. For example, it is common practice to ensure consistent sorting by setting the $LC_COLLATE variable. Compare

$ printf '%s\n' a A | LC_COLLATE=en_US.utf8 sort
a
A

to

$ printf '%s\n' a A | LC_COLLATE=POSIX sort
A
a

LC_COLLATE is a special locale setting. “POSIX” is a special locale which is available on all modern *nix systems, which makes it ideal for portability.

Please note that the assignment becomes part of the environment of the command, not the shell, and that the shell expands "$name" before the assignment takes effect, so the following does not do what you might hope:

$ name=Smith echo "$name"
[prints an empty line]

Function scope#

A variable can be scoped to a function:

quote() {
    local line
    while IFS= read -r line
    do
        printf '> %s\n' "$line"
    done
}

This scope is separate from the shell scope, so we can even use the same variable outside the function:

$ line='original value'
$ quote <<< 'example text'
> example text
$ echo "$line"
original value

Using the same variable for different things within the same script is not recommended, even when using local variables, but this at least makes the function reusable in other scripts and free of one kind of side effect.

Shell scope#

Exported variables are scoped to the declaring and descendant shells. This basically means that once a variable is exported it is in context for the remainder of that script, including any subshells. This can be handy for consistency. For example, to ensure consistent locale handling in the entire script we can put export LC_ALL=POSIX near the top. The flip side of this convenience is that these are effectively global variables — they should be used sparingly, when the convenience of having the variable available everywhere is more important than the added complexity in the context with the associated risk of unintended side effects.

In interactive shells variables such as $HOME and $USER are automatically exported — you can list the currently exported variables with declare -x or its shorthand export .

Variables are otherwise scoped to the shell they are declared in.
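
You can see the difference by starting a separate child shell, which only inherits exported variables:

$ export exported='from the parent shell'
$ unexported='also from the parent shell'
$ bash -c 'echo "${exported-unset}"; echo "${unexported-unset}"'
from the parent shell
unset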

Scope can be confusing, because “the shell” refers to any code written in the shell, not the commands which run inside the shell. For example, non–exported variables are readable in subshells, but any subshell value changes are local:

$ unset value
$ value=outer
$ (echo "$value"; value=inner; echo "$value";)
outer
inner
$ echo "$value"
outer

Basically, non–exported variables used within a subshell work similarly to exported variables used within another script.

Taking out the files#

Aim of this section: Remove temporary files when the script ends to avoid clutter and crashes.

Storing the intermediary results of your script in temporary files can be handy to debug a complex or evolving process. It’s important to clean up these files after running or you risk running into any of these issues:

  • No more disk space. This usually is fairly easy to fix, but can often cause arbitrary processes to crash.
  • No more inodes. If your script produces a lot of files, even tiny ones, you can run into this. df --inodes will give you the relevant numbers. The symptoms are usually going to be the same as running out of disk space.
  • No more memory. Temporary directories are often mounted as virtual memory filesystems for performance reasons:
$ stat --file-system --format=%T /tmp
 tmpfs
  • Running out of memory is usually going to result in processes having a bad time.

In addition it’s just nice to clean up after running — less clutter and easier debugging. We also want to make sure that the cleanup process is as simple as possible, to avoid bugs like deleting the wrong files or leaving some files behind. This is a perfect use case for traps:

cleanup() {
    rm --force --recursive "$temporary_directory"
}

trap cleanup EXIT
temporary_directory="$(mktemp --directory)"

touch "${temporary_directory}/result.txt"

Now, unless your script is killed with SIGKILL (which bypasses all traps) every file in the temporary directory will be removed when the script exits. This means your script no longer needs to keep track of which temporary files it creates, which makes the cleanup process simple and reliable. Another advantage of this is that if mktemp succeeds you are guaranteed that the files you are working with will not collide with files created by other users — creating a directory is an atomic operation on most local filesystems, and only the creator and root have access to the contents.

You may want to make sure the files are not deleted in case of an error. A simple tweak will ensure this:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

cleanup() {
    # shellcheck disable=SC2181
    if (( $? == 0 ))
    then
        rm --force --recursive "$temporary_directory"
    else
        echo "Script failed. Temporary directory: ${temporary_directory}" >&2
    fi
}

trap cleanup EXIT
temporary_directory="$(mktemp --directory)"

result_file="${temporary_directory}/result.txt"
touch "$result_file"
grep nonexistent "$result_file"

This will alert the user in case of error, so they know where to find the temporary files. For example:

$ ./cleanup-on-success.bash
Script failed. Temporary directory: /tmp/tmp.jy9eCZjFVK
$ ls /tmp/tmp.jy9eCZjFVK
result.txt

Fail–Fast Settings#

Our scripts don’t always go as planned, so it’s important to have a strategy for handling failures. One such strategy is to “fail fast”: simply exit as soon as there is a problem. Here’s how to fail fast in Bash scripts.

errexit#

The idea behind this is that the script should exit whenever it encounters an error, also known as a non–zero exit code. This script illustrates the idea:

#!/usr/bin/env bash

set -o errexit

false

destroy-all-humans

Running this script is completely safe! The false command returns a non–zero exit code; Bash notices this and terminates the script immediately. We’re all safe, at least for now.

Some exceptions to this rule are necessary to be able to write useful programs. We still want to be able to use conditionals, and we can:

#!/usr/bin/env bash

set -o errexit

if [[ -f "$1" ]]
then
    echo 'Yes'
else
    echo 'No'
fi

echo 'Done'

[[ … ]] is a conditional expression: it returns either 0 (success) or 1 (failure) depending on a condition. In the above case the condition is checking whether the first argument ( $1 ) is an ordinary file ( -f ). The spaces after [[ and before ]] are necessary, because [[ is just another command with arguments.

If we run the script with / as the argument, the [[ command returns a non–zero exit code since the root directory is not a plain file, but because the command runs inside a conditional this does not stop the script: it prints “No” and then “Done”.

Unfortunately determining what errexit will do is sometimes an expert–level topic (see Why doesn’t set -e (or set -o errexit, or trap ERR) do what I expected? and Bash: Error handling). Because of this some developers discourage the use of errexit altogether. I personally recommend using it for these reasons:

  • Critical scripts should be tested. For example, if your script takes one or more files there should be an automated test which verifies that not passing a file results in the expected exit code. errexit makes this easy.
  • Many of the unexpected ways errexit is handled are caused by not writing the simplest code possible. So using errexit indirectly encourages writing simpler code.
  • There are really only three alternatives:
    • Using a different language with better error handling might be an option if we can convince the stakeholders that it will be worth it. If Bash was chosen for good reasons, like fast development and simple stream processing, this might be a difficult sell.
    • Not handling errors at all, which takes on a massive risk even for a simple program.
    • Handling errors yourself. This might seem pragmatic, but appending || exit $? to almost every line is not going to help maintainability and trap 'exit $?' ERR is just duplicating what errexit does in the first place, with the additional risk of a typo silently breaking the error handling.
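
To give a flavor of those unexpected behaviors, here is a minimal sketch: errexit is ignored in any command which is part of a conditional, including inside functions called from one, so the function below runs to completion and reports success despite the failing false command:

#!/usr/bin/env bash

set -o errexit

check() {
    false
    echo 'Still running'
}

if check
then
    echo 'check reported success'
fi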

noclobber#

“Clobbering” a file has a very specific meaning in Bash — it’s when you use the > redirect operator to overwrite an existing regular file. Setting noclobber results in a non–zero exit code when a script tries to clobber a file, leaving the file untouched. See for example this script, intended to get 32 bytes of random binary data and convert it to hexadecimal so that it can be stored in a configuration file:

noclobber also applies to symbolic links to regular files. Some special files like /dev/null are not actually changed when you attempt to clobber them. noclobber is smart enough to allow such redirects.
#!/usr/bin/env bash

set -o errexit -o noclobber

working_directory="$(mktemp --directory)"
bin_path="${working_directory}/secret.bin"
hex_path="${working_directory}/secret.hex"

dd bs=32 count=1 if=/dev/urandom of="$bin_path" status=none
# shellcheck disable=SC2094
xxd -cols 32 -plain "$bin_path" > "$bin_path"
logger "New secret available in ${hex_path}."

There’s a simple typo: the xxd output should be redirected to $hex_path , but instead it’s redirected back to the already existing file.

This is a particularly nasty kind of redirect error. A common mistake when trying to create as few files as possible is to run COMMAND < FILE > FILE . You might reasonably expect it to start the command, read the file and write the result back to the same file. But because of the way redirects work what actually happens is that Bash opens the file for reading, empties the file, opens the file for writing, and then the command starts. So the command reads an empty file and therefore usually produces an empty output file.
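
You can see this for yourself in an interactive shell (where noclobber is not enabled by default):

$ echo 'important data' > ./file.txt
$ rev < ./file.txt > ./file.txt
$ wc --bytes ./file.txt
0 ./file.txt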

The xxd line results in the following error message:

[…]/secret.bin: cannot overwrite existing file

The failed redirect means the xxd command never runs. Because errexit is enabled the script terminates at that line, so it never runs logger either.

noclobber encourages several best practices:

  1. Using unique filenames at each step of processing makes it much easier to debug a failing process — you can see the output of every step, and can tweak and retry steps easily until things work the way you expect.
  2. mktemp --directory is a great way of guaranteeing that you won’t run into clobbering errors because of files from previous runs or unrelated sources in your current working directory.
  3. Processing using pipelines avoids clobbering, and is generally much faster than the stop–and–go process of creating a new file at each step.

nounset#

nounset treats attempts to reference an undefined variable as an error. The following script assumes that an argument is always passed to it:

#!/usr/bin/env bash

set -o errexit -o nounset

dir="$1"

for file in "$dir"/*
do
    echo "Processing ${file}"
done

Strictly speaking errexit is redundant in this case, because nounset implies errexit when running in a non–interactive shell. Since this is surprising behavior I still recommend setting errexit explicitly.

This script stops with an exit code of 1 at the dir="$1" line (again without executing it). Without nounset the loop would have iterated over all the files in the root directory, which could be catastrophic depending on the actual processing.
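
If an argument or variable really is optional you can still keep nounset happy by supplying an explicit default with the ${parameter-default} expansion (the defaults here are just illustrations):

dir="${1-/tmp}"        # Use /tmp when no argument is given
verbose="${VERBOSE-0}" # Use 0 when $VERBOSE is unset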

Several projects have done Bad Things™ by not setting nounset – the Steam bug mentioned earlier is a prime example of what can happen when an unset variable goes unnoticed.

pipefail#

A pipeline only has a single exit code. By default this is the exit code of the last command in the pipeline. This is bad — in production–quality code we want to make sure that every non–zero exit code is either explicitly handled or results in the script failing immediately. Take this script for example:

#!/usr/bin/env bash

# Count main heading words in a markdown document
grep '^# ' ./README.md | wc --words

If there is no README.md file in the current directory this script will succeed: grep will helpfully report

grep: ./README.md: No such file or directory

but the exit code of the script will be zero. Often you don’t care about this specific error, but what if the error is something more insidious? Consider this script:

#!/usr/bin/env bash

# Dangerous code, do not use!
netstat --listening --numeric --tcp | tail --lines=+3

This should show every listening TCP network service running on the machine. Let’s say it’s also part of a bigger system to report targets for security auditing, so we want to make the output easily processable. There’s no option to omit the two–line header from netstat output so we use tail to print only line three onwards.

There’s a bug in this script, but it’s subtle. Like the “curious incident of the dog in the night–time” from Sherlock Holmes the absence of something can indicate a problem. Consider what happens when there is no netstat command on the system, such as on a minimal installation:

  1. The first command fails because netstat does not exist on the $PATH . Bash reports the missing command, but that information may be silenced or lost in the noise of the overall system.
  2. The second command always succeeds, and ends up printing nothing.
  3. The exit code of the second command is returned by the script.

The result is that it looks like there is nothing to report, when in reality the script failed to check whether there is anything to report!

The easy way to fix this is with pipefail : the exit code of the pipeline becomes the exit code of the last (rightmost) command which failed. If none of the commands fail the exit code is zero. Combining this with errexit we get this script:

#!/usr/bin/env bash

set -o errexit -o pipefail

netstat --listening --numeric --tcp | tail --lines=+3

Now if the first command fails for whatever reason the script will fail immediately. We can then either handle or report the error in the caller, all the way to the end user.

Some sub–optimal solutions to this problem are common. One possibility would be to use temporary files instead of a pipeline, but that would be inefficient and more complicated. Another would be to check for the existence of the command before using it, but that’s an example of asking when we should be telling — it’s a poor substitute for catching the actual error. It would not work if the error was an unknown netstat option, for example.

If you want to detect and react to specific exit codes from specific commands in a pipeline there is the $PIPESTATUS array variable. For example, let’s expand on the previous script:

#!/usr/bin/env bash

set -o errexit -o pipefail

# Get PID and program name of world-answering services
netstat --listening --numeric --program --tcp \
    | grep --fixed-strings '0.0.0.0:*' \
    | tr --squeeze-repeats ' ' \
    | cut --delimiter=' ' --fields=7 \
    || [[ "${PIPESTATUS[*]}" == '0 1 0 0' ]]

The grep manual states that it will return exit code 1 if no lines are output, but in this script that is actually the ideal state – no services are exposed to the world. So in order to be able to detect any other issues with this pipeline we have to check each of the return codes and verify that only grep returned 1, and that all the other commands succeeded.

Since $PIPESTATUS is overwritten by every subsequent command we have to either check the entire array in a single command or copy it to a different array before checking each exit code separately. Which one you choose depends on how detailed you want the error handling to be.
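
A minimal sketch of the copying approach, using a hypothetical pipeline where grep finding nothing (exit code 1) is acceptable:

#!/usr/bin/env bash

set -o errexit -o pipefail

# Copy $PIPESTATUS immediately, before the next command overwrites it
printf 'foo\nbar\n' | grep baz | wc --lines || pipe_status=("${PIPESTATUS[@]}")

# An empty copy means the whole pipeline succeeded
if [[ "${pipe_status[*]-}" != '' && "${pipe_status[*]}" != '0 1 0' ]]
then
    echo "Unexpected pipeline exit codes: ${pipe_status[*]}" >&2
    exit 1
fi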

failglob#

One confusing default behavior of globbing, especially because it’s completely different from how regular expressions work, is that globs expand to themselves if there are no matches:

$ cd "$(mktemp --directory)"
$ for file in ./*
> do
>     grep foo "$file"
> done
grep: ./*: No such file or directory

In the spirit of this section you can fail when a glob doesn’t match any existing files:

$ shopt -s failglob
$ for file in ./*
> do
>     grep foo "$file"
> done
bash: no match: ./*
$ echo "$?"
1

Much easier to debug! Another option is to use shopt -s nullglob to expand non–matching globs to the empty string. But this comes with its own set of caveats, because no arguments often means something special such as “read from standard input” ( grep ) or “operate on the current directory” ( find ).

Version Control#

Prerequisites: Before reading this chapter, you should have installed Git and should be familiar with editing commands.

After this chapter you will have learned your first shell superpower! If that sounds like a tall order, consider this: after learning a handful of very simple commands you will be able to back up your most important files in private and for free and sync these files to all your machines with a single tool available on every modern operating system.

Currently the field of version control systems is completely dominated by Git, so that’s what we’ll use. One advantage of learning Git in a shell is that the vast majority of what you learn is transferable to other version control systems, and will likely be transferable to future version control systems.

Gitting started#

First there is some once–only configuration you need to do. Initially you’ll be working alone, but Git is a highly collaborative tool, so it expects to be able to uniquely identify contributors. It does this with a combination of a name and an email address. To set the email address we run git config --global user.email "me@example.com" — use this example, or your actual address. The command to set your name is git config --global user.name "Your Name" . Since these settings are global they will apply to any Git changes you do on this machine.

Global Git settings are in ~/.gitconfig . Run git config --list to show all the non–default settings currently in effect. Settings for individual repositories are in .git/config in the top level directory of the repository.

At this point I recommend you sign up to one of the many free Git hosting services out there. GitLab is good because they have a free and open source version you can host yourself if you should ever need to, but it’s not too important which one you choose for your first repository. You interact with all of them in the same way, and changing providers can be done with a single command. So you don’t have to worry about lock–in, and you can take your time to find a provider you like. You can even use several for the same repository. Once signed up you can create a repository on the website.

The local equivalent to creating a repository on a provider website is running git init . If you want to learn about it and much more I recommend the reference section at the end of this chapter.

Once you’ve created a repository you can clone it to your machine. The exact command should be on the page you got after creating the repository and starts with git clone . Run that command in your shell, and Git will securely download the new repository into a directory with the same name as the repository. If instead you get some sort of “Permission denied” error at this stage you may want to look into whether the SSH public key you created earlier has been added to your account on the hosting service.

Gitting stuff done#

At this point you should have a directory with the name of your project. Maybe you’ve created a repository to try out some of the stuff in this book:

$ ls
fullstack-bash

Go into your directory and you’ll see a hidden “.git” directory:

$ cd fullstack-bash
$ ls --almost-all
.git

The “.git” directory contains all the information that the git commands use to keep track of changes to your files. You’ll learn about some of these below, but in general you should not need to change any of them manually.

Start your Git career by checking the status of the repository:

$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

That’s a lot of nonsense if you’re not yet used to version control systems. A quick glossary:

  • Commit is a super important verb and noun. Each commit in a repository is a collection of file changes (a “diff”, short for “difference”) along with some metadata, including a long hexadecimal ID.
  • The working tree refers to the tree of files inside the repository except for the “.git” directory. Being “clean” means there are no relevant file changes; basically, nothing to do.
  • A branch is a marker assigned to a commit ID. When a branch is active and you create a commit the branch is moved to the new commit. People often talk about several commits being “on a branch” – this refers to the commit the branch is assigned to and all its ancestor commits. Commits will often be on more than one branch. In this case, when people talk about commits on a branch they usually mean the commits which are only on that branch.
  • “master” is the name of the default branch.
  • “origin” is the name of the default remote, basically the place you cloned from, and “origin/master” refers to the “master” branch on the “origin” remote. Being “up to date” means that as far as the local repository is concerned the current branch on the remote refers to the same commit as the current branch on your clone of the repository.

You might have noticed how I mention “files” but not “directories” above. This is deliberate, because for technical and historical reasons Git only tracks changes to the contents of files, not directories. Which means that you can’t track a completely empty directory in Git. This has led to a convention of creating an empty “.gitkeep” file within a directory you really do want to track, and tracking that file as a substitute for the directory.
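
For example, to track an otherwise–empty directory (here hypothetically called data):

$ mkdir data
$ touch data/.gitkeep
$ git add data/.gitkeep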

This is your repository. Now create some stuff! A common starting point is what’s called a “readme” file, commonly in a lightweight markup language called Markdown, which explains what the repository is all about:

$ > README.md cat <<'EOF'
> # Fullstack Bash Scripting notes & code
> EOF

What does the status look like now?

$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	README.md

nothing added to commit but untracked files present (use "git add" to track)

A completely new repository starts without commits. If you created a repository using a hosting service it may have already created a first commit containing some files.

The interesting new word here is “track” and the hint about git add <file>... : new files are untracked until added by Git. Until a file is tracked it is not yet part of the repository, and just happens to be within the same directory as the repository. For this reason, most Git commands only deal with tracked files. Let’s add the file and check the status again:

Checking the status after any git command is a good way to make sure you understand what each step did and to get tips about next steps.

$ git add README.md
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   README.md

The cache is an intermediate workspace which is useful to allow you to commit part of your work, even down to individual lines in a file. To add parts of files you can try adding interactively using git add --patch or third–party GUIs like git gui or TortoiseGit.

So Git now tracks the file, but it’s not committed yet:

$ git commit --message="Briefly explain the project"
[master (root-commit) 5f00a4b] Briefly explain the project
 1 file changed, 1 insertion(+)
 create mode 100644 README.md

None of this is relevant to us yet, so let’s just check the status:

$ git status
On branch master
nothing to commit, working tree clean

Notice how “Your branch is up to date …” is missing. To update “origin” we need to push, which uploads the commits to wherever “origin” points to and tells “origin” to update the branch to point to the new commit:

$ git push
Counting objects: 3, done.
Writing objects: 100% (3/3), 262 bytes | 262.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To example.org:victor-engmark/fullstack-bash.git
 * [new branch]      master -> master

And what’s the status?

$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

We’re up to date! That means you could literally remove the entire repository directory and clone it again, and you would end up with exactly the same files and history.
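
A sketch of that round trip, assuming the clone lives in a directory called “fullstack-bash” and reusing the remote address from the push output above:

$ cd ..
$ rm --recursive --force fullstack-bash
$ git clone example.org:victor-engmark/fullstack-bash.git
$ cd fullstack-bash

Everything which has been pushed is restored; anything committed but not yet pushed would of course be lost.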

Git gud#

At this point there’s still a lot to learn to use Git in a team, where it really shines. There are many resources for learning about Git in depth; I would recommend the free and popular Pro Git book. If you’re only getting started with version control there are a few things to keep in mind:

  • Almost any destructive operation can be reversed, if you’re careful. Slow down, maybe experiment in a copy of the repository, and you won’t lose changes by accident.
  • Track the source files and the instructions (or code) to transform them into other formats, but not the files generated from the sources. Tracking generated files usually leads to a messy and bloated repository history; the usual way to keep them out is a .gitignore file, sketched after this list. That’s not to say you shouldn’t keep the generated files somewhere!
  • In the same vein it’s better to track the plaintext version of something than any binary representations. Some formats can be losslessly converted from a binary to a text format, for example by decompressing an SVGZ file, saving as FODT rather than ODT, or exporting a spreadsheet to CSV.
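
A hypothetical .gitignore file for a repository which builds PDFs and a build/ directory from its sources – ignored paths no longer show up as untracked in git status :

# Generated files - rebuild them from the sources rather than tracking them
*.pdf
build/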

Quality Assurance#

Prerequisites: Before reading this chapter, you should be familiar with script structure and comfortable running scripts.

Linting#

The definitive linting tool for shell scripts is shellcheck . It will catch many common issues; let’s look at an example script:

#!bash

# This code is terrible - please don't use it for anything!

if [ -z $1 ]
then
    exit 1
fi

targets = "$PWD/$@/*"

cat $targets/*.txt | wc --lines | read lines

echo $lines

Depending on how familiar you are with shell scripting, that script may look fine or horrendous. Let’s check:

$ shellcheck ./shellcheck-example.bash

In ./shellcheck-example.bash line 1:
#!bash
^-- SC2239: Ensure the shebang uses an absolute path to the interpreter.


In ./shellcheck-example.bash line 5:
if [ -z $1 ]
        ^-- SC2086: Double quote to prevent globbing and word splitting.


In ./shellcheck-example.bash line 10:
targets = "$PWD/$@/*"
        ^-- SC1068: Don't put spaces around the = in assignments.
          ^---------^ SC2124: Assigning an array to a string! Assign as array, or
use * instead of @ to concatenate.


In ./shellcheck-example.bash line 12:
cat $targets/*.txt | wc --lines | read lines
    ^------^ SC2086: Double quote to prevent globbing and word splitting.
                                  ^--^ SC2162: read without -r will mangle
backslashes.
                                       ^---^ SC2030: Modification of lines is local
(to subshell caused by pipeline).


In ./shellcheck-example.bash line 14:
echo $lines
     ^----^ SC2031: lines was modified in a subshell. That change might be lost.
     ^----^ SC2086: Double quote to prevent globbing and word splitting.

For more information:
  https://www.shellcheck.net/wiki/SC1068 -- Don't put spaces around the = in ...
  https://www.shellcheck.net/wiki/SC2239 -- Ensure the shebang uses an absolu...
  https://www.shellcheck.net/wiki/SC2124 -- Assigning an array to a string! A...

Ouch! As you can see, there’s a problem on almost every line:

  • the shebang line should use an absolute path
  • variables in conditionals should be quoted (this would not be a problem with [[ )
  • assignments do not work the same as they do in most programming languages
  • you have to be explicit about creating arrays
  • read ’s defaults are not safe
  • variable assignments inside a pipeline happen in a subshell, so they are lost when the pipeline ends
Here is a rewrite of the script which fixes all of these issues:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

# Require exactly one directory argument
if [[ "$#" -ne 1 ]]
then
    exit 1
fi

# The targets are all the entries directly inside the given directory
targets=("${@}/"*)

# Count the lines of the .bash files in each target directory and add them up
sum=0
for directory in "${targets[@]}"
do
    read -r lines < <(cat "${directory}/"*".bash" | wc --lines)
    ((sum += lines))
done

echo "$sum"

Testing#

Before getting into the details there’s a popular myth which needs dispelling: that testing shell scripts is basically pointless. I suspect this is an unfortunate side–effect of the most common style of shell script: a bunch of commands with many responsibilities and side–effects. In most kinds of programming, developers moved away from that style many years ago – shell scripting is the exception. One reason for this is probably that shell scripts can be difficult to handle as a unit when split up – a set of shell scripts is much less of a unit than a set of Java classes or even a Python package. But by splitting up scripts we can make them much easier to test. We just need to make sure they stay together all the way to production, so that they still work as a unit.

Another common myth is that testing is an all–or–nothing proposition. Instead, every test should improve the project a little bit, allowing the long–term quality and speed of development to become good enough and stay good enough.

Probably the highest–value place to start testing is a simple side–effect–free transformation. Code without side–effects is easy to test, because it doesn’t matter in which order the tests run. They can even run in parallel, which is really helpful once your test run times crawl past the one–second mark. A lot of shell scripts are written so that almost every line has side–effects, but usually those side–effects can be isolated so that the remaining code can be tested easily. And transformations are really common in shell scripts.

Let’s start with a typical example: your application stores database connection information in a JSON configuration file. You want to write a backup script, and have decided that doing this as Bash is easiest for now. But the database client takes this configuration as command line arguments, so you need to transform it. You might already be thinking about something like database_client $(grep --regexp=… config.json | sed --expression=…) STATEMENT – let’s see how you might test that transformation.

config.json:

{
  "username": "user",
  "password": "pass"
}

We want our new script to print “user” followed by “pass”, each NUL–terminated:

#!/usr/bin/env bash

script_directory="$(dirname "${BASH_SOURCE[0]}")"

test_should_print_username_and_password() {
    configuration='{
  "username": "user",
  "password": "pass"
}'
    mapfile -d '' result < <(
        "${script_directory}/config-transformer.bash" <<< "$configuration"
    )
    assertEquals 'user' "${result[0]}"
    assertEquals 'pass' "${result[1]}"
    true
}

# shellcheck source=shunit2
. "${script_directory}/shunit2"

shunit2 is probably the leading framework for testing shell scripts. It is focused on portability, so if you do need to write portable shell scripts it is a good choice for testing all of them with the same script. Unlike ordinary scripts we should not set -o errexit etc. in test scripts, because these options would interfere with the way shunit2 works.

Our first test run:

$ ./test-config-transformer.bash
test_should_print_username_and_password
./test-config-transformer.bash: line 10: ./config-transformer.bash: No such file or directory
ASSERT:expected:<user> but was:<>
ASSERT:expected:<pass> but was:<>
shunit2:ERROR test_should_print_username_and_password() returned non-zero return
code.

Ran 1 test.

FAILED (failures=3)

Sometimes writing a bit of boilerplate is necessary to be able to get a useful error message. In this example creating config-transformer.bash first would not substantially change the test result, so I’ve skipped that step.

Let’s try a naive implementation:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

grep --regexp=username --regexp=password \
    | tr '\n' '\0' \
    | cut --delimiter='"' --fields=4 --zero-terminated

Our second test run:

$ ./test-config-transformer.bash
test_should_print_username_and_password

Ran 1 test.

OK

This gets the job done for our oversimplified example, but you’ve probably already spotted some terrible bugs:

  • The grep command matches other lines with names or values containing “username” or “password”.
  • JSON escapes are not reversed, so the output may contain redundant backslashes.
  • The script relies on a particular pretty–print format of JSON, where each property is on a separate line.

All of these point to the same cause: grep , tr and cut are line–based tools, not JSON tools. The easiest fix is to use an actual JSON parser such as jq for this. And because of how we’re writing tests we won’t have to modify the existing ones when swapping out the application code.
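
As a sketch of that fix – assuming jq 1.7 or later, whose --raw-output0 option prints each result followed by a NUL byte – the whole transformer collapses into a single jq call, and the existing test should keep passing unchanged:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

# jq parses the JSON properly, so nesting, escapes and formatting are no
# longer our problem; --raw-output0 NUL-terminates each value
jq --raw-output0 '.username, .password'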

I would recommend looking into the red, green, refactor test–driven development (TDD) approach to end up with robust shell scripts. It is a difficult technique, but well worth it to end up with simple production code and short, orthogonal tests.

As a more complete example, this is the test suite for multigrep.bash , shown in the Exit Codes chapter:

#!/usr/bin/env bash

script_directory="$(dirname "${BASH_SOURCE[0]}")"

script="${script_directory}/multigrep.bash"

run_script() {
    "$script" "$@"
    exit_code="$?"
    echo x # Avoid newline trimming in command substitution
    return "$exit_code"
}

test_should_return_exit_code_1_when_no_patterns() {
    result="$(run_script <<< 'input')"
    assertEquals 1 $?
    assertEquals 'x' "$result"
    true
}

test_should_return_grep_exit_code_when_not_1() {
    result="$(run_script "\\" <<< 'input')"
    assertEquals 2 $?
    assertEquals 'x' "$result"
    true
}

test_should_print_matching_pattern() {
    result="$(run_script 'in' <<< 'input')"
    assertEquals 0 $?
    assertEquals $'input\nx' "$result"
    true
}

test_should_return_success_when_no_matches() {
    result="$(run_script 'in' <<< 'NOPE')"
    assertEquals 0 $?
    true
}

test_should_match_all_patterns_in_any_order() {
    result="$(run_script '01' '10' \
        <<< $'000\n001\n010\n011\n100\n101\n110\n111')"
    assertEquals 0 $?
    assertEquals $'010\n101\nx' "$result"
    true
}

# Verify escaping of user input
test_should_handle_whitespace_in_pattern() {
    result="$(run_script 'a  b' 'b a' <<< $'aa bb\na b a\na  bb aa')"
    assertEquals 0 $?
    assertEquals $'a  bb aa\nx' "$result"
    true
}

test_should_handle_quotes_in_pattern() {
    result="$(run_script '"' "'" '"$' <<< $'"1"\n\'2\'\n\'"3"\'\n"\'4\'"')"
    assertEquals 0 $?
    assertEquals $'"\'4\'"\nx' "$result"
    true
}

test_should_handle_dollar_in_pattern() {
    result="$(run_script '^..$' '0$' <<< $'00\n01\n10\n11\n000')"
    assertEquals 0 $?
    assertEquals $'00\n10\nx' "$result"
    true
}

test_should_handle_backslash_in_pattern() {
    result="$(run_script '\[' "\\\\" <<< $'\\ (\n\\ )\n\\ [\n\\ ]')"
    assertEquals 0 $?
    assertEquals $'\\ [\nx' "$result"
    true
}

# shellcheck source=../quality-assurance/shunit2
. "$(dirname "$script_directory")/quality-assurance/shunit2"

Signals#

Prerequisites: Before reading this chapter, you should be familiar with running scripts.

Signals are a limited form of inter–process communication, used to send asynchronous notifications to processes. Within Bash, signals can be sent using the kill command and are handled by traps. When a process is running in the foreground (that is, we’re waiting for it to finish before being able to run another command) we can also send some signals to the foreground process using keyboard shortcuts.

Let’s start by looking into the above in terms of a signal you might already be familiar with. SIGINT, the keyboard interrupt signal, is sent to the foreground process when pressing Ctrl–c. Try running for example sleep infinity , which would normally run forever in the foreground. Press Ctrl-c to interrupt it. At this point, Bash prints ^C at the cursor position to indicate where in the output stream of the foreground process the shortcut was pressed, terminates the sleep process, and returns to the prompt:

$ sleep infinity
^C
$ █

^C is an example of one format for representing a keyboard shortcut: pressing Ctrl ( ^ ) followed by a letter (in this case C ). The letter is always printed in uppercase, whether or not Shift is involved in the keyboard shortcut.

Pressing Ctrl-c on the Bash prompt itself also prints ^C followed by a new prompt, but does not exit the Bash process. This is a handy way to cancel editing the current command without having to erase it before starting the next command.

This chapter just covers the most common signals; see man 7 signal and the “Signals” section of man bash for details.

0 (zero) is not a “real” signal – it is used to check whether a process is still running. kill -0 PID… checks whether the given PIDs are still running and returns successfully if at least one of them is. It can’t be trapped, since the kernel is responsible for keeping track of which processes are running.

There is no “SIG…” name for signal 0.
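
A sketch of the typical liveness check, where pid is assumed to hold the ID of a process started elsewhere:

if kill -0 "$pid" 2> /dev/null
then
    echo 'Process is still running'
else
    echo 'Process has exited (or we are not allowed to signal it)'
fi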

SIGINT, the keyboard interrupt signal, is sent to the foreground process when pressing Ctrl–c. By default it terminates the process. Some interactive applications, on receiving this signal, will instead prompt the user about what to do, especially if they are in the middle of an operation which shouldn’t be cut short.

SIGKILL is the overkill option. It should never be used in production code, mainly because it doesn’t allow the process to trap it or do any kind of cleaning up after itself. This can result in all sorts of nasty breakage, including unrecoverable errors like corrupted files. Basically treat kill -KILL (or kill -9 , which is synonymous) as a bug.

SIGQUIT, the quit signal, by default terminates the process and dumps core. Ctrl-\ (Ctrl plus backslash) sends SIGQUIT to the foreground process.

SIGTERM, the termination signal, is sent when not specifying a signal in the kill command, as in kill PID… . In a non–interactive program this is probably the signal we’ll want to use to terminate a child process.

SIGSTOP and SIGTSTP, the stop and terminal stop signals, pause the execution of the process until it receives SIGCONT, the continue if stopped signal. Ctrl–z sends SIGTSTP to the current foreground process. SIGSTOP can’t be trapped.

All the named signals above have numeric equivalents (listed by kill -l ), but the names are easier to read and to look up, so that’s what this book uses. The numbering of signals is also system–dependent, so this is another case where the portable code is also more readable.

Sending signals#

kill [-SIGNAL] PID… sends the given signal to all the given process IDs. For example, kill -TERM 123 sends SIGTERM to the process with ID 123. Each process can either “trap” the signal or let the kernel perform the default action on the process, as documented in man 7 signal . Signal 0, SIGKILL and SIGSTOP are exceptions: they are handled by the kernel itself, so the receiving process never gets a chance to react to them.

The signal can also be specified with the “SIG” prefix, as in kill -SIGTERM PID… , or with the signal number, as in kill -15 PID… .
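
A minimal sketch of terminating a child process started by the same script:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

# Start a long-running child process in the background
sleep infinity &
child_pid="$!"

# -TERM, -SIGTERM and -15 all send the same signal
kill -TERM "$child_pid"

# wait reports how the child ended; 143 is 128 + 15 (SIGTERM)
wait "$child_pid" || echo "Child exited with status $?" >&2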

kill is a shell builtin, but help kill is not very detailed. Some of the documentation in man kill applies to the builtin as well, except for things like the different options.

Handling signals#

A trap is a signal handler specified in Bash. To set up a trap for any number of signals, use trap COMMAND SIGNAL… . Whenever the script receives any of these signals it will execute the command. trap supports some important pseudo–signals (that is, they can be trapped but are not “real” signals, so we can’t send them with kill ):

  • The DEBUG trap runs before every command. We can use it to print useful context such as the value of a variable or even to step through each line: trap 'read -p "$BASH_COMMAND"' DEBUG .
  • The ERR trap runs when any of the conditions triggering errexit occur (it doesn’t override errexit though, so the script will still exit at that point). This can be useful to print specific debugging information, for example: trap 'echo "$counter"' ERR
  • The EXIT trap runs when a script is exiting. This is really helpful to do the kind of cleanup which should happen after the script has nothing else left to do, such as removing any temporary directories (see the sketch after this list). This trap does not interrupt the termination of the script, so there is no need to run exit at the end of the EXIT trap code.
  • The RETURN trap is similar to EXIT . It runs when a function or sourced script (that is, . FILE or source FILE ) finishes.
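
A minimal sketch of that temporary–directory cleanup pattern:

#!/usr/bin/env bash

set -o errexit -o noclobber -o nounset -o pipefail
shopt -s failglob

cleanup() {
    rm --recursive --force "$temporary_directory"
}

temporary_directory="$(mktemp --directory)"
trap cleanup EXIT

# Work with the temporary directory; it is removed even if a command fails
echo 'scratch data' > "${temporary_directory}/example.txt"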

Real signals can be trapped in the same way. For example:

trap 'echo "Continuing…" >&2' CONT

while true
do
    sleep 0.001
done

When this script starts it sets up the traps and sleeps repeatedly unless interrupted. Let’s explore how this works. Run the script, then press Ctrl–z to pause it, and then resume the script in the foreground by running fg . At this point the script should print “Continuing…”. Press Ctrl–c to terminate the script.

Triggering a trap can be thought of as queuing a command to run once the current foreground command finishes. This has some important side–effects:

  • The code inside a trap can itself be interrupted. For example, try adding sleep infinity; to the start of the trap above. The trap ends up running forever, but we can still cancel it using Ctrl–c. In this case, the echo command never runs.
  • The trap code might never even start. Starting from the original code above, try changing the sleep delay in the loop to infinity . When pressing Ctrl–z the echo command is scheduled, but the sleep command never finishes so echo never runs.

Bash builtin commands run as part of the shell process itself. This means that, unlike separate programs such as sleep , they will be suspended while the trap code runs. For example:

trap 'echo "Continuing…" >&2' CONT

read -r line
echo "$line"

When starting this program, it will wait until it receives a line of input. It then prints the line and exits. If we instead press Ctrl–z and run fg , the script prints “Continuing…” and then resumes waiting for input.

Since the trap command string is executed verbatim, it has the same issues as running code with eval . Therefore it should be as simple as possible – simpler than normal code. One option is to put the command in a function, as in trap cleanup EXIT . The trap code runs in the same namespace as the rest of the script, so it has access to any context (such as variables) which has changed by the time it runs.

A common bug in traps is using double quotes around a command with a variable: trap "echo \"$index\"" USR1 . The problem is that any variable within double quotes is going to be expanded immediately, not when the script receives a USR1 signal. Better to use single quotes ( trap 'echo "$index"' USR1 ). Even better, call a function which just runs echo "$index" .

Whenever we set the code for a signal it replaces any existing code. This makes it difficult to modify signal handlers, since we have to not only deal with the complexity of eval but also have to avoid issues such as double command separators (ending up with nonsense like first; && second , for example), conflicting side effects, and whether the commands may end up running in a different order. Better to push that complexity into a function or script which can figure out which actions to take.
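
A quick demonstration of the replacement behaviour:

trap 'echo "first" >&2' EXIT
trap 'echo "second" >&2' EXIT # Replaces the previous trap; only "second" is printed at exit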

Autocompletion#

Prerequisites: Before reading this chapter, you should be familiar with autocompleting commands, options and arguments by using the Tab key, writing functions and sourcing files.

Installing autocompletions#

The default autocompletion includes the files in the working directory. For example, if you start a command line with a non–existing command like foobar , then press Space followed by Tab, Bash will suggest the local files:

$ cd "$(mktemp --directory)"
$ touch aye
$ mkdir bee
$ foobar <Tab>
aye  bee/

To complete anything else we need an autocompletion program. These are usually included in the package containing the program. If you find that autocompletion is not enabled after installing, that is, it only suggests working directory files, it might be worth looking for a separate package called “PROGRAM–complete” or similar.

I say autocompletion “program” because even though it’s usually a script, it doesn’t have to be.
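
One rough way to check whether a completion beyond the default has been registered for a command (git is just an example here) is the complete builtin’s -p option, which prints the completion specification or fails if none is set. Note that some systems load completions lazily, on first use, so this is only an approximate check:

$ complete -p git || echo 'No completion installed for git'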

Writing your own autocompletion#

When creating a command line tool for use by a small group it’s probably sufficient to print a helpful message when running with the --help flag. But once you start thinking about redistributing your tool you’ll want to make adoption as easy as possible. An autocompletion script is a good step in that direction. If your tool has a reasonable number of well–named parameters, autocompletion acts as a quicker substitute for referring to a man page or --help , and even users completely unfamiliar with the tool can get a good idea of what it’s capable of and how easy it will be to use.

The simplest autocompletion is no autocompletion. As mentioned above, file path completion is the default, and that covers a number of simple commands. But all commands benefit from a --help flag, and many commands can benefit from common flags such as --verbose or --quiet to show more or less output, --color=[auto|always|never] to set whether the output should be colored, or --config=FILE to point to a configuration file. Let’s try implementing completion for a fictional “do what I mean” command, dwim . We start by setting up a completion for a dwim command which only takes one argument, the static string --help :

complete -W '--help' dwim

This script is meant to be sourced, which is why it doesn’t have all the From the Terminal to Production bells and whistles.

The complete command specifies how arguments are to be autocompleted. Running the command above means that the command called dwim will be autocompleted with the single argument (“word”; -W ) --help unconditionally. To test it, simply run the above complete command, type dwim -- and press Tab: the command line will now read dwim --help . It doesn’t matter that no dwim command exists – the completion is completely independent of the command.

The completion does not include completion of the dwim command itself – if you type dwi and press Tab it will not complete to dwim (unless, of course, you have installed a program called “dwim”). Completion of the first word is handled by a different mechanism.

Let’s make this more interesting by adding support for --color=[auto|always|never ever] :

_dwim() {
    local before_previous_word color_values completions current_word options \
        previous_word

    # Create an array containing all the options: "--color=auto",
    # "--color=always", "--color=never ever" and "--help"
    color_values=('auto' 'always' 'never ever')
    mapfile -t options < <(printf -- "--color='%q'\n" "${color_values[@]}")
    options+=('--help')

    # Save the last three words of the command, up to and including the
    # position of the cursor
    before_previous_word="${COMP_WORDS[$((COMP_CWORD-2))]}"
    previous_word="${COMP_WORDS[$((COMP_CWORD-1))]}"
    current_word="${COMP_WORDS[COMP_CWORD]}"

    # Generate completion when the cursor is at the end of a partial color
    # value, for example `--color=al`
    if [[ "$before_previous_word" == '--color' ]] \
        && [[ "$previous_word" == '=' ]]
    then
        mapfile -t completions < <(
            compgen -W "$(printf '%q ' "${color_values[@]}")" -- "$current_word"
        )
        mapfile -t COMPREPLY < <(printf "%q \n" "${completions[@]}")
        return
    fi

    # Generate completion when the cursor is at the end of `--color=`
    if [[ "$previous_word" == '--color' ]] && [[ "$current_word" == '=' ]]
    then
        mapfile -t COMPREPLY < <(printf "%q \n" "${color_values[@]}")
        return
    fi

    # Generate default completion, suggesting all the options
    mapfile -t COMPREPLY < <(compgen -W "${options[*]}" -- "$current_word")
}

complete -o nosort -F _dwim dwim

Ouch, the complexity just exploded! Let’s go through this section by section:

  1. The autocomplete function ( _dwim ) could be called anything, but simply giving it the name of the command prefixed with an underscore makes it easy to find and unlikely to clash with any existing command or function names. This is a common convention for completion scripts.
  2. We don’t want to pollute the global namespace, so we need to declare all the temporary variables local.
  3. The color key/value options are generated from the color values.
  4. $COMP_WORDS is an array of the words typed after the program name. And $COMP_CWORD is the index of the word where the cursor is in $COMP_WORDS . We need to know at most the current word and the two words before it.
  5. We check the completions from the most to the least specific. Autocompletion is handled by Readline, so = is considered a separate word for autocompletion purposes. This is why complete -W '--color=always --color=auto --color=never --help' dwim won’t work – it stops completing anything after --color= . This is the source of most of the complexity:
  6. The first if checks whether we are completing the word after --color= , that is, the color value. If so, we generate completion values ( compgen ) from the list of words ( -W 'string' ) which match the current word and set the completion reply ( $COMPREPLY ) array to the newline–separated results. Bash–quoting the values within that string using printf ’s %q allows us to have values with spaces (such as never ever , which is included to demonstrate this) and other special characters in the completions.
  7. The second if checks for the exact string --color= . If so, just offer the color values as suggestions.
  8. If neither of the above apply we just complete using the original options array.
  9. At the end we tell complete to use the _dwim function to work out the completion of the dwim command, and to not sort the completions. In general, we do want to sort options to make them easier to scan through, but in case of key/value pairs like --color=COLOR it’s common to show the default value first.

After sourcing a script containing the above code we can try out the resulting completion:

$ dwim <Tab>
--color=auto  --color=always  --color=never\ ever  --help
$ dwim --c<Tab>
--color=auto  --color=always  --color=never\ ever
$ dwim --color=<Tab>
auto  always  never\ ever
$ dwim --color=n<Tab>
$ dwim --color=never\ ever

A little experimentation shows that Bash will complete as much of the command as can be unambiguously completed based on the text before the cursor. So after the dwim --c completion above the command line will say dwim --color= , because all of the available completions start with that string.

These are all static completions. To introduce dynamic completions like --config=FILE we need to use a different method:

_dwim_files() {
    local completions path
    COMPREPLY=()
    mapfile -t completions < <(compgen -A file -- "$1")
    for path in "${completions[@]}"
    do
        if [[ -d "$path" ]]
        then
            COMPREPLY+=("$(printf "%q/" "$path")")
        else
            COMPREPLY+=("$(printf "%q " "$path")")
        fi
    done
}

_dwim() {
    local before_previous_word color_values completions current_word options \
        previous_word

    # Create an array containing all the options: "--color=auto",
    # "--color=always", "--color=never ever" and "--help"
    color_values=('auto' 'always' 'never ever')
    mapfile -t options < <(printf -- "--color='%q'\n" "${color_values[@]}")
    options+=('--config=' '--help\ ')

    # Save the last three words of the command, up to and including the
    # position of the cursor
    before_previous_word="${COMP_WORDS[$((COMP_CWORD-2))]}"
    previous_word="${COMP_WORDS[$((COMP_CWORD-1))]}"
    current_word="${COMP_WORDS[COMP_CWORD]}"

    # Generate completion when the cursor is at the end of a partial color
    # value, for example `--color=al`
    if [[ "$before_previous_word" == '--color' ]] \
        && [[ "$previous_word" == '=' ]]
    then
        mapfile -t completions < <(
            compgen -W "$(printf '%q ' "${color_values[@]}")" -- "$current_word"
        )
        mapfile -t COMPREPLY < <(printf "%q \n" "${completions[@]}")
        return
    fi

    # Generate completion when the cursor is at the end of `--color=`
    if [[ "$previous_word" == '--color' ]] && [[ "$current_word" == '=' ]]
    then
        mapfile -t COMPREPLY < <(printf "%q \n" "${color_values[@]}")
        return
    fi

    # Generate completion when the cursor is at the end of a partial file path,
    # for example `--config=./`
    if [[ "$before_previous_word" == '--config' ]] \
        && [[ "$previous_word" == '=' ]]
    then
        _dwim_files "$current_word"
        return
    fi

    # Generate completion when the cursor is at the end of `--config=`
    if [[ "$previous_word" == '--config' ]] && [[ "$current_word" == '=' ]]
    then
        _dwim_files ''
        return
    fi

    # Generate default completion, suggesting all the options
    mapfile -t COMPREPLY < <(compgen -W "${options[*]}" -- "$current_word")
}

complete -o nosort -o nospace -F _dwim dwim

A quick diff between these implementations reveals:

  1. A function which completes directories followed by a slash and files followed by a space.
  2. The default options include --config= . Unlike --color=[auto|always|never ever] we don’t want to suggest --config=FILE for every path available, because that result could be huge. So we simply postpone suggesting actual filenames until the cursor is after --config= .
  3. To avoid complete inserting a space character when completing --config= we need to turn off that default with -o nospace and manually insert spaces after completing all other parameters.
  4. --config=./some/path ignores the --config= prefix and tries to complete the rest as a path.
  5. --config= tries to complete the empty path, resulting in suggestions from the working directory.

Summary#

In this book I’ve tried to present Bash itself, popular tools and techniques, pitfalls and workarounds. I hope it has given you enough knowledge to build commands and scripts while confident that the result will be not just useful but robust and maintainable.

Bash is everywhere. Since 1989 it has grown to be the most popular shell on *nixes, and it will be around for longer still. It is simple to get started with, hard to master, and you can do incredible things with a line or two. Happy scripting, thank you for reading, and good luck!