The newline Guide to Bash Scripting - Part 1

About this book

Fullstack Bash Scripting is an exploration of the Bash shell, scripting language and related tools. It’s aimed at developers who want to get the job done right the first time, and make sure maintenance is a breeze. Topics include:

  • how to find help on arbitrary commands
  • how to structure a robust and maintainable script
  • editing commands and interacting with the terminal efficiently
  • tools for dealing with version control, JSON, XML, text encodings, images and compressed files
  • quality assurance
  • … and much more

The goal of this book is to give you the tools and techniques to write Bash scripts which will stand the test of time, and to give you the means to explore the vast shell scripting ecosystem to find the tools you need right now.

Audience

Bash is the Linux glue language, so this book is aimed at software developers mainly working in some other language. We assume that you have working knowledge of programming concepts like variable assignments, loops and files, and at least a passing familiarity with interactive shells.

Organization of this book

The themes of this book are closely related, and it is probably better suited for exploration than for reading from cover to cover. That said, it is roughly divided into the following sections:

Basics, which includes getting help for arbitrary commands, running scripts, editing commands, and how to copy and paste effectively.

Common tasks, including listing files, working with text, math, time, script output, JSON, XML, images, compression and SSH.

Managing scripts, which first goes through common ingredients of a production–grade script, then follows up with version control, exit codes and quality assurance.

Two advanced subjects, signals and autocompletion.

I have intentionally excluded other scripting languages, such as AWK, Perl, Ruby and sed.

Conventions used in this book

I use “Linux” almost exclusively when talking about the operating system surrounding the shell.

PATH may refer to a variable name, and $PATH definitely refers to a variable value. I will use the latter to avoid any confusion between variables and other things.

[…] means that some of the command output was omitted for brevity. This should be more readable than a bunch of cut, head and tail commands.

Throughout this book, you will see the following typographical conventions that indicate different types of information:

In–line code references will look like this: echo "$SHELL".

Blocks of code look like this:

#!/usr/bin/env bash

Shell sessions include:

  • The prompt, which is the string printed before the cursor when starting a new command. It’s set as simply PS1='\$ ' to make it stand out while being short.
  • The command itself.
  • Continuation indicators, which are the strings printed before every line after the first one in a multi–line command. They are set to the default, PS2='> '.
  • Command standard output and standard error.

Example:

$ cat > credentials.json << 'EOF'
> {"username":"jdoe","password":"sec\\ret"}
> EOF
$ jq . credentials.json
{
  "username": "jdoe",
  "password": "sec\\ret"
}

  • cat is the first command, spread over three lines and with no terminal output.
  • jq is the second command, followed by its output.

$PS0 is empty. $PS3 and $PS4 are not relevant to this book.

When relevant, shell sessions may include a character to indicate the position of the cursor after running the last command.

Tips and tricks look like this:

mkdir is short for “make directory.”

Warnings look like this:

Use More Quotes™!

Development environment

This book was written using Bash 4.2 and 5.0 on Linux. The majority of the contents should be applicable to versions long before and after those, and to other Unix–like operating systems. However, since everything on Linux is configurable, absolute statements such as “ $PATH will not be defined in a crontab” should be treated as a pragmatic shorthand for providing a virtual machine with the configuration I used when writing. It is not even theoretically possible to write a piece of software which will behave the same no matter how and where it is run. So the only way to know what some command will actually do is to run it, and no statement in this book should be treated as absolute truth. In the same vein, the code in this book is written to make a best effort at doing the right thing in a reasonable set of circumstances.

The included scripts have been linted using ShellCheck. Some of them have been tested using shunit2.

Acknowledgements

Thank you to Nate Murray, my publisher, for his advice and endless patience, Gillian Merchant for her excellent review of later revisions, my partner for discussions about form and content, all my encouraging friends, family, colleagues at Catalyst and Toitū Te Whenua, and Andrew Maguire and John Billings for reviewing an early draft of the book. Special thanks to all the Bash community contributors on Stack Overflow, Unix & Linux Stack Exchange and Greg’s Wiki, which are treasure troves of information about all things Bash.

About the author

My name is Victor Engmark. I have been a software developer since 2004, free and open source software contributor since 2007, Stack Overflow & Stack Exchange contributor since 2009, and a Bash user since 2010.

My first big task using Bash was to change a write–once read–many script to be properly robust. I ended up pulling it apart into maybe a dozen scripts and unit testing each of them separately. It was a great learning experience: for the rest of my time in that job there were no bug reports for it, which cemented the value of testing and the fact that Bash scripts can be made robust.

Command documentation

This chapter will help you find documentation for the commands on your machine.

Each type of command is documented in different ways. There are five types of commands: alias, keyword, function, builtin and file, in order of decreasing precedence.

Precedence is the same here as you might be familiar with from arithmetic. For example, multiplication has higher precedence than addition, so 2 + 3 × 4 is equal to 2 + (3 × 4), or 14. A common use case for this is to define a function with the same name as a file command to set some default options. As a quick example, here’s how we would tell grep to use colored output by default:

grep() {
    command grep --color=auto "$@"
}

command suppresses shell function lookup, forcing Bash to refer to the file command within the function. Otherwise, the function would just recurse forever.

  • Aliases don’t provide help, because they are just shorthand for a command with some arguments. They are also considered deprecated in favor of functions because of some technical shortcomings, so we’ll ignore them for now.
  • Standalone functions are very rarely self–documenting, so we’ll ignore those as well.
  • Keywords are things like if, time, and [[, which are parsed differently from other commands. For example, if must be followed by an expression before then, and then must be followed by an expression before fi:
 $ if
 > then
 bash: syntax error near unexpected token `then'
 $ if true
 > then
 > fi
 bash: syntax error near unexpected token `fi'
  • Builtins are commands built into Bash that provide functionality which operates on the internals of the shell, such as command, export, and trap.
  • File commands are all the other familiar commands: find, grep, ls, etc.

Before we go looking elsewhere for help on a command, it is important to know which type it is. We can run type -a COMMAND to list all the commands with that name in order of decreasing precedence. For example, echo is typically available both as a builtin and as a file command:

$ type -a echo
echo is a shell builtin
echo is /bin/echo

Since shell builtins take precedence over file commands, if we run echo normally we’re running the shell builtin. We should therefore look up the shell builtin help to learn about it.

In arithmetic we would use parentheses to override precedence. We can do the same with Bash commands, to force running a lower precedence command. Unfortunately the rules are a bit complicated:

  • Aliases are expanded by replacing the first word of a command before other expansions. To ignore aliases we can therefore change the first word in a command in some way, such as quoting it. For example, running 'COMMAND' (including quotes) will ignore any aliases called COMMAND, instead looking for any other command type.
  • command COMMAND ignores both aliases and functions.
  • env COMMAND ignores every command type except file commands. This means that to run /bin/echo rather than the builtin with the same name we can run env echo. This is rarely necessary, and complicates the script a bit, so we should only do this if we actually have to, for example for compatibility between different shells.
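
For a quick illustration of the last two rules, we can shadow the echo builtin with a throwaway function (purely for demonstration) and then bypass it:

$ echo() { builtin echo 'function:' "$@"; }
$ echo hello
function: hello
$ command echo hello
hello
$ env echo hello
hello
$ unset -f echo

command echo skips the function and runs the builtin, while env echo runs the file command /bin/echo; unset -f removes the demonstration function again.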

Help with shell builtins and keywords

Builtin commands and keywords are part of the Bash shell. They are documented in context in man bash and summarized by help . Running help on its own will give you some basic info, and list the builtin and keyword synopses in two columns:

$ help
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
These shell commands are defined internally.  Type `help' to see this list.
Type `help name' to find out more about the function `name'.
Use `info bash' to find out more about the shell in general.
Use `man -k' or `info' to find out more about commands not in this list.

A star (*) next to a name means that the command is disabled.

 job_spec [&]                            history [-c] [-d offset] [n] or hist>
 (( expression ))                        if COMMANDS; then COMMANDS; [ elif C>
 . filename [arguments]                  jobs [-lnprs] [jobspec ...] or jobs >
 :                                       kill [-s sigspec | -n signum | -sigs>
 [ arg... ]                              let arg [arg ...]
 [[ expression ]]                        local [option] name[=value] ...
 alias [-p] [name[=value] ... ]          logout [n]
[…]

The > at the end of a synopsis means it’s been truncated. Unfortunately it is not possible to fit the full synopses no matter the terminal width. Keywords like do and done are only used as part of bigger command structures, so they don’t have their own entries.

  • To list only the builtins you can use compgen -b .
  • To list only the keywords you can use compgen -k .
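
For example, listing the first few keywords (the exact order may vary between Bash versions):

$ compgen -k | head -n 3
if
then
else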

We can use the help command to get help about any of these, such as exit:

$ help exit
exit: exit [n]
    Exit the shell.

    Exits the shell with a status of N.  If N is omitted, the exit status
    is that of the last command executed.

help is also self–documenting:

$ help help
help: help [-dms] [pattern ...]
    Display information about builtin commands.

    Displays brief summaries of builtin commands.  If PATTERN is
    specified, gives detailed help on all commands matching PATTERN,
    otherwise the list of help topics is printed.

    Options:
      -d	output short description for each topic
      -m	display usage in pseudo-manpage format
      -s	output only a short usage synopsis for each topic matching
    		PATTERN

    Arguments:
      PATTERN	Pattern specifiying a help topic

    Exit Status:
    Returns success unless PATTERN is not found or an invalid option is given.

Help with file commands

Let’s say you’re looking for help with the ls command, so you check the type:

$ type -a ls
ls is aliased to `ls --color=auto'
ls is /bin/ls

We can ignore the alias since there’s no help for those. There is no function, keyword or shell builtin called ls, but there is a file path. In other words, we want information about the file command called ls.

File commands have been developed by thousands of mostly independent people over several decades, and have been documented in many different ways. We’ll go over some of the most common ones, but beware that none of them are guaranteed to give results – developers are free to provide documentation in any way they see fit, including not at all. In general, though, popular file commands have excellent documentation, and we can expect any file command which is available on a popular Linux distribution to have at least some documentation.

“Executable” (the technical word for “runnable”) is the preferred noun used to refer to the file rather than the command. That is, ls is a command and /bin/ls is an executable.

Some people treat “binary” (as in the /bin directory) as a synonym for “executable,” but this is an unfortunate misnomer: not all binary files are executable (such as JPEG image files, which are meant to be read but not executed) and not all executables are binary (such as Bash scripts, which are plain text).

Self–documenting commands

Self–documenting commands can be run in such a way that they print their own documentation. The most common way to trigger this is to pass the --help flag:

$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                  do not ignore entries starting with .
  -A, --almost-all           do not list implied . and ..
[…]

None of the Bash builtins are self–documenting. For example, echo --help will just print “--help” followed by a newline. Shell builtins are instead documented with the help command: help echo. This is one example of a difference between a builtin and the file command with the same name: env echo --help will print the file command’s help text. A more extreme example is the dir commands: the builtin and file commands do completely different things; compare help dir (the builtin) and dir --help (the file command).
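
For example (the usage text and path depend on your version of the GNU coreutils):

$ echo --help
--help
$ env echo --help
Usage: /bin/echo [SHORT-OPTION]... [STRING]...
  or:  /bin/echo LONG-OPTION
[…]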

Some commands only support short options, such as those from the BSD family of operating systems. For these it’s common to support the -h flag to print help:

$ nc -h
OpenBSD netcat (Debian patchlevel 1.187-1ubuntu0.1)
usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl]
	  [-m minttl] [-O length] [-P proxy_username] [-p source_port]
	  [-q seconds] [-s source] [-T keyword] [-V rtable] [-W recvlimit] [-w timeout]
	  [-X proxy_protocol] [-x proxy_address[:port]] 	  [destination] [port]
	Command Summary:
		-4		Use IPv4
		-6		Use IPv6
[…]

Some commands will also print their help text (or a special “command error” variant of it) if you run them without any arguments:

$ nc
usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl]
	  [-m minttl] [-O length] [-P proxy_username] [-p source_port]
	  [-q seconds] [-s source] [-T keyword] [-V rtable] [-W recvlimit] [-w timeout]
	  [-X proxy_protocol] [-x proxy_address[:port]] 	  [destination] [port]

Some commands such as tail will “hang” instead at this point. When that happens it usually means the command has gone into a fallback interactive mode where it’s waiting for something to process on standard input. If this happens, pressing Ctrl–d will tell the command that there is no more input, which means it will exit immediately. If it still does not exit you may have to force it by pressing Ctrl–c.

Manuals

The help text produced in the previous section is often a short summary of the full manual. The manual is often available in the form of a “man page” which you can access by running man COMMAND, for example man git. This shows a lightly formatted document in a pager for easy navigation. man is self–documenting, so to figure out how to use it you can run man man. (In brief, use Page Up, Page Down and the arrow keys to navigate, and q to quit.)

Some commands have manuals which are more like a website, with links between sections. You can access these by running info COMMAND, for example info grep. info can be self–documenting, so info info may work on your system. Otherwise man info will show you the basic command line use. To navigate an info document you can use the same buttons as for man pages, Enter to follow a highlighted link, and l to go back to the previous section.

Some manuals do not correspond to commands on the system. There are also manuals for configuration files (such as man ssh_config), filesystems (such as man ext4), encodings (such as man utf-8), and more.

Elsewhere

Reference documentation, by its nature, can’t answer many of the important questions you might have during development:

Which tool is the most convenient to achieve what you want? It can be tempting to treat any Turing–complete language as a golden hammer when much more convenient alternatives exist.

What is the idiomatic way to achieve what you want? Developer communities will usually gravitate towards one way, or a small handful of ways of doing anything common, whether it’s how to indent code blocks or check whether a file is empty. Using these patterns makes it easier for others to understand your code, which is one of the most important characteristics of any software.

What are the important characteristics of this solution? Does it process one line at a time (slow) or a buffer at a time (fast)? Does it handle unusual input like newlines, backslashes, quotes, NUL bytes, empty lines or non–existing paths? Is it robust in the face of third party failures?

This book tries to give you the tools to answer some of these yourself, but for specific questions there are lots of excellent resources online. The following are some of the best in their respective fields:

Stack Overflow for general programming questions.

Unix & Linux Stack Exchange for Bash and other *nix tools questions.

Software Engineering Stack Exchange for more subjective questions around how programming interacts with the real world.

Greg’s Wiki has heaps of good advice.

Code Review Stack Exchange to invite others to review your code.

Your favorite search engine can often help, but this is risky. For one thing, it can be difficult to know which search terms to use. Searching for “while read loop” when the idiomatic solution actually pipes the result to a tool you don’t yet know exists could lead to no answers or (even worse) the wrong answers. Search engines also do a poor job of finding the best resources. Top results pretty much by definition come from sites which are good at optimizing for search engines, but their content is often missing crucial details, outdated or even plain wrong.

None of the above worked!

In the worst case the documentation might be so big (or the developers so conservative about using disk space) that the documentation is in a separate package. In that case you’ll have to go through a more difficult process:

  1. Figure out which package manager installed the package you want the documentation for.

  2. Ask the package manager what the package name is for your tool.

  3. Ask the package manager for other packages with names containing the package name found above. Sometimes there is a separate package with a name like COMMAND-doc containing extensive documentation.
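
As a rough sketch, on a Debian–based system the process could look like this (the commands and output will differ with other package managers):

$ dpkg --search /bin/bash
bash: /bin/bash
$ apt-cache search --names-only '^bash'
bash - GNU Bourne Again SHell
bash-doc - Documentation and examples for the GNU Bourne Again SHell
[…]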

This process is unfortunately completely package–specific, and there is no guarantee of success. Fortunately this situation is uncommon, and with massive storage being the norm it will probably become even less common over time.

How to read a synopsis

Also called “usage string” or simply “usage.”

The synopsis is a formatted string, usually a single line towards the beginning of the documentation, which shows how to write commands for a specific program. It starts with the program name. The words after that usually align with the following patterns:

  • A word in all uppercase is a placeholder for a word you have to provide. For example, the synopsis foo PATH means a command like foo './some file.txt' is valid, but foo (no file) or foo ./1.txt ./2.txt (two files) is not valid. This word can also reference several other parameters, such as OPTIONS referring to any of the options documented elsewhere.

  • A word in all lower case is literal – it should be entered verbatim in the command. For example, the synopsis foo [--help] means that foo and foo --help are the only valid commands.

  • A word enclosed in square brackets means it’s optional. For example, the synopsis foo [PATH] means that foo and foo './some file.txt' are both valid.

  • A word followed by an ellipsis means that the word can be repeated. For example, the synopsis foo PATH… means that foo takes one or more PATH arguments, meaning foo ./first and foo ./first ./second are both valid, but foo is not.

  • A pipe character separates mutually exclusive choices. For example, the synopsis foo --force|--dry-run means that foo --force and foo --dry-run are valid, but foo --force --dry-run is not.

  • An equals sign separates an option name from its value. For example, the synopsis foo [--config=FILE] means that foo and foo --config=/etc/foo.conf are both valid, but foo --config and foo --config= are not.

  • Complex commands sometimes have more than one synopsis, based on fundamentally different use cases.

PATH usually means any path (not to be confused with the PATH variable, referred to in this book as $PATH), including one pointing to a directory, while FILE means a path to a non–directory, the sort of file you can read from.

Unfortunately there are many variations of the above:

  • Using lower case for both literals and placeholders. Should be avoided since it makes the synopsis ambiguous.

  • Using angle brackets for placeholders. For example, <FILE> being equivalent to FILE. Not recommended since < and > are used for redirects.

  • Using a space character rather than = between options and values. This is ambiguous: --foo bar could mean a flag --foo and a completely separate argument bar, or it could mean that the value of “foo” is “bar”. Sometimes we can guess from the names that the two words belong together, but --foo=bar is less ambiguous.

  • Making the space between options and values optional. For example, -cFILE being equivalent to the longer but unambiguous --configuration=FILE.

  • Using ellipses even though the values are not shell words. For example, the synopsis may say FILE… but the documentation specifies that each file must be separated by a comma.

  • Printing all the short options as one word prefixed with a hyphen. Quick, what do you make of [-AcdrtuxGnSkUWOmpsMBiajJzZhPlRvwo?]?

Make sure to read the documentation if it’s unclear.

In short: PLACEHOLDER, literal, [optional], repetition… and first choice|second choice.

Based on this, let’s see what we can learn from the following synopsis:

dwim [--verbose] pack|unpack PATTERN FILE…
dwim [--version|--help]

  • dwim, --verbose, pack, unpack, --version and --help are literals.

  • PATTERN and FILE are placeholders.

  • The most common use cases are dwim pack PATTERN FILE… and dwim unpack PATTERN FILE….

  • When running dwim pack or dwim unpack we need to specify exactly one pattern and at least one file.

  • The --verbose flag is optional.

  • We can run dwim --version to print dwim’s version information.

  • We can run dwim --help to print dwim’s documentation.

Is there a command for that?

If you only know some keywords relating to what you are trying to do you can use the apropos command to search for keywords within man page summaries. apropos KEYWORD… matches any of the keywords, and apropos --and KEYWORD… matches all of the keywords. For example:

$ apropos --and find file
find (1)             - search for files in a directory hierarchy
findfs (8)           - find a filesystem by label or UUID
findmnt (8)          - find a filesystem
git-pack-redundant (1) - Find redundant pack files
gst-typefind-1.0 (1) - print Media type of file
locate (1)           - find files by name
mlocate (1)          - find files by name
systemd-delta (1)    - Find overridden configuration files

Running Scripts

There are many ways to run Bash scripts, and there are some subtle differences between them.

Explicitly interpret a script

You can explicitly interpret a script using a specific bash executable by passing the script as an argument:

bash ./script.bash

The bash executable is an “interpreter,” a program which can read and execute commands written in the Bash scripting language. Basically this means that if script.bash is a readable file in the current directory you can run bash ./script.bash to execute the commands within that file.

Note that there’s no requirement that the file is a valid Bash script! For one thing, the file extension does not have any influence on the contents of a file. Someone could mv cat.jpg script.bash, but of course that doesn’t make their cat picture a valid script! Something you are more likely to come across is people using the generic .sh extension. I’ve avoided this, because despite superficial similarities it is extremely difficult to write truly portable shell scripts. In general you can’t expect a script to work the same in different versions of Bash without some work to test the functionality in every version after every change. Supporting even a single version of another shell is much more work, and actively supporting more than one shell can cripple the development process.

You can also pass a script to standard input of a bash command to run it. This is rarely useful, for example when running a script generated by another command: cat header.bash body.bash footer.bash | bash.

Running a script in the wrong shell

It is not advisable to run Bash scripts with another shell interpreter like sh:

sh ./script.bash # Don't do this!

This command uses the sh interpreter to run script.bash, regardless of the shebang line or the contents of the file! It is not quite as wrong as python ./script.bash, but the script is still likely to behave differently when run with the wrong interpreter. The side–effect can be anything from a simple syntax error to a subtle bug or catastrophic behavior.

On some systems /bin/sh is a symlink to the Bash interpreter. When running such an sh command Bash enables POSIX mode to behave more closely to POSIX shell, a semi–formal definition of a shell which as far as I understand no implementation has ever matched exactly. For example, Bash’s POSIX mode still allows Bash–specific syntax (“bashisms”) like [[ … ]].
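
As a small illustration of a bashism failing under a stricter shell (assuming the dash shell is installed):

$ bash -c '[[ bash == b* ]] && echo matched'
matched
$ dash -c '[[ bash == b* ]] && echo matched'
dash: 1: [[: not found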

Interpret using the shebang line

Another way to run scripts is to enter just the path to the script:

./script.bash

If you run a path to a file directly the kernel will check how the file starts in order to determine how it should be run. If it starts with the characters #! (hash followed by exclamation mark) the remainder of the line is considered a shebang line. The shebang line is syntactic sugar to explicitly set the interpreter as part of the script itself, allowing the user to run the script without knowing which interpreter to use. In the vast majority of cases this is what you want, because as we mentioned shell portability is difficult.

One thing to bear in mind is that in order to run a script directly like this (rather than explicitly interpreting it as we did above) you need to have the execute permission in addition to the normal read permission.
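
Putting this together, here is a toy script being created, made executable and run directly:

$ cat > script.bash << 'EOF'
> #!/usr/bin/env bash
> echo 'Hello from a script'
> EOF
$ chmod +x script.bash
$ ./script.bash
Hello from a script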

Run via $PATH

Even simpler is just to enter the filename of the script. This will work if the script is saved in one of the directories in your $PATH variable.

script.bash

A typical $PATH looks something like this:

$ echo "$PATH"
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

If you place an executable file called script.bash inside one of those directories (the ones in your $PATH, not mine) you can run it by just typing script.bash and pressing Enter. This is a great convenience – it means you can run the script without remembering its path no matter what the current working directory is. It also means you have to write the script to not depend on being in a specific working directory.

/usr/local/sbin is a good place for your system tools, such as background services. /usr/local/bin is a good place for your own command–line tools. The rest of the paths are for the kind of executables provided by system packages. The Filesystem Hierarchy Standard explains this in detail.

When installing a script to one of these directories it is common to remove the file extension, which is simply noise to the person using it. For example, to install abc.bash as the command abc in /usr/local/bin you could run sudo install abc.bash /usr/local/bin/abc. The install command is similar to cp, but in addition to copying the file across it will:

  • Set the file owner and group to root

  • Allow root to read, write and execute the file

  • Allow everyone else to read and execute the file

This behavior can be modified; see man install.
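
For example (the size and timestamp here are made up, but the owner, group and permissions are as described above):

$ sudo install abc.bash /usr/local/bin/abc
$ ls -l /usr/local/bin/abc
-rwxr-xr-x 1 root root 52 Jan  1 12:00 /usr/local/bin/abc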

If an alias or function already exists with the same name as a file on your $PATH you can force Bash to run the file command by using any of these:

  • The path, for example /bin/ls or ./ls

  • command FILE, for example command ls

  • \FILE, for example \ls

  • env FILE, for example env ls

I would generally recommend the last option, as it is well documented and often used.

Editing Commands and Scripts

Interactive editing covers basically everything we do in a shell. Here we’ll look into the various ways to edit commands and text files and how they tie together.

Readline

Bash command line editing is provided by the GNU Readline library, which is also used by many other programs. This is why text editing works remarkably similarly in interactive shells for other programming languages – many of them, like Python, also use Readline.

The things that keyboard shortcuts bind to in Readline are variously called “commands”, “functions” and “macros” in the documentation. In Readline, “command” and “function” are synonymous. Commands/functions do something to the state of the text (like copying a word or changing it to uppercase) while macros insert some literal text. I will use “macro” to refer to both Readline commands/functions and macros to distinguish them from Bash commands and functions.

Readline comes with a lot of macros – there are macros even for moving one character to the left (backward-char) and “accepting” (usually running) a line (accept-line). We’ll look at some common ones here (assuming a default configuration), how to create keyboard shortcuts, and some quality–of–life examples.

Moving the cursor

The left and right arrows move one character at a time. Combined with the Ctrl key they move one word at a time. You can also use Home and End to go to the start and end of the line, respectively.

Readline and Bash define “word” differently. Bash uses $IFS, the internal field separator variable (by default consisting of a space character, tab character and newline) to split words, while Readline treats only alphanumeric characters as part of words. This means a string like foo-bar is one “Bash word” and two “Readline words.”

Editing a command on the command line can get clunky, in particular when working on a command on multiple lines. It’s not possible to go back up to the previous lines to edit them within Bash itself, but we can use the Ctrl–x + Ctrl–e shortcut to open the command as a temporary file in an editor. That way we can modify multiple lines freely, and when exiting the editor the saved file is executed as if we typed it directly.

Deleting text

Backspace and Delete delete backwards and forwards, respectively. Alt–Backspace deletes a word backwards. The following setting configures Alt–Delete to delete a word forwards; see Configuring Readline for how to add it.

"\e[3;3~": kill-word

Command history

By default, Bash will:

  • record most new commands in memory,

  • save them to a file when the shell exits, and

  • read that file when starting.

Let’s unpack that with an example session as a brand–new user:

$ echo "$HISTFILE"
/home/user/.bash_history

In other words, $HISTFILE is the absolute path to the current history file. Continuing the session above, we’ll try to print the contents of that file using the cat (concatenate) command:

$ cat "$HISTFILE"
cat: /home/user/.bash_history: No such file or directory

This matches the description of the behavior above: this file only exists once we exit Bash for the first time as this user. In other words, the Bash history has not yet been written to disk. We can access the previous commands from memory by using the Up and Down buttons. Basically, after running a command, pressing Up will replace the current command line with the previous command we ran, cat "$HISTFILE", and put the cursor at the end of that line. If we press Up again it will instead show the echo "$HISTFILE" command. We can then edit that line and run it, or continue pressing Up and Down to find the command we want to run or modify.

If we now replace the shell with a new one (or close the terminal and reopen it), we find that the commands from the previous shell have been written to the history file in the order they ran, oldest first:

$ exec "$SHELL"
$ cat "$HISTFILE"
echo "$HISTFILE"
cat "$HISTFILE"
exec "$SHELL"

Since these have been read into memory when Bash started, it’s as if we never quit the previous shell. A gotcha is that each shell overwrites the history file with what was in the original history file plus what was typed in that shell. So if we have several shells open they end up clobbering each other when exiting, and we end up with only the history from the last shell which exited.

Bash can be configured to keep some commands out of the history. The $HISTIGNORE variable, which is usually not set, is a set of colon–separated extended globbing (shopt -s extglob) patterns of commands which will not be written to the history. This may be helpful if we need to type lots of similar commands which include passwords or the like. It is less useful than it could be, because of course we don’t want to store the password itself in a plaintext configuration file, and we instead have to try to match the surrounding command line. The $HISTCONTROL variable normally contains “ignorespace”, which gives us a simpler way to exclude a command from the history file: simply start the command with a space character.
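
For example (the history entry numbers here are made up):

$ HISTCONTROL=ignorespace
$  echo 'this line will not be saved'
this line will not be saved
$ history 2
  101  HISTCONTROL=ignorespace
  102  history 2

Note how the command starting with a space does not show up in the history.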

Another useful Readline feature which is enabled by default is history search. Pressing Ctrl–r starts a reverse text search through history. We can then enter a substring of the command we are interested in. Pressing Ctrl–r again leafs through older matches for the same string. Once we find the relevant command we can either press Enter to run it immediately or Escape to put it on the command line, ready to edit.

Configuring Readline

The main reason for configuring Readline is to add custom key bindings. The user configuration file is ~/.inputrc, and the configuration format is pretty straightforward. Bindings are configured as “SHORTCUT”: ACTION, where \C- means holding down Ctrl while pressing the next character, \M- means holding down Left Alt while pressing the next character, and \e means pressing and releasing the Escape key. Lines starting with # are comments, which in this case mention unbound macros for completeness.

Left Alt is referred to as “Meta” in GNU software for historical reasons, hence the \M above.

If ~/.inputrc is misconfigured in some way it can make terminal use difficult. The easiest way to work around a broken ~/.inputrc is to rename it using a GUI file manager and then restart the terminal.

We can use bind -p to list the currently configured bindings (limited to the first ten lines by head for brevity):

$ bind -p | head

"\C-g": abort
"\C-x\C-g": abort
"\M-\C-g": abort
"\C-j": accept-line
"\C-m": accept-line
# alias-expand-line (not bound)
# arrow-key-prefix (not bound)
# backward-byte (not bound)
"\C-b": backward-char

Some bindings are not on that list: Enter does the same as accept-line but is built into Bash, Tab is bound to complete, and Alt–Backspace is bound to backward-kill-word. There might be others.

Most of the Escape key shortcuts can also be triggered by holding down the Alt button while pressing the next character.

Other Readline settings are in the form set NAME VALUE:

$ bind -v
set bind-tty-special-chars on
set blink-matching-paren off
[…]
set keymap emacs
set keyseq-timeout 500
set vi-cmd-mode-string (cmd)
set vi-ins-mode-string (ins)

The settings in Readline are called “variables” and “variable settings” in the documentation. I chose “settings” to avoid confusion with Bash variables.

Let’s try out some useful configuration. First, insert these lines in ~/.inputrc:

# Search through the history for the characters before the cursor using the up
# and down arrows.
"\e[A": history-search-backward
"\e[B": history-search-forward

Save and close the file, then run bind -f ~/.inputrc to apply the change in the current shell. This example changes the behavior of the up and down arrows on the keyboard. Normally, they replace the current command with the ones from the history file, leafing back and forth with each press of Up and Down. After the change above they will still do that when the cursor is at the start of the command. Otherwise they will leaf through commands starting with the characters before the cursor. This is easier to understand by just trying it out. For example, if you have several ls commands in the command history, typing ls on a new command line and then pressing up will leaf through history entries starting with “ls”. This can be quite handy when searching for a command with a specific prefix through a big history file.

By default ~/.inputrc is only applied when starting a shell.

How does the up arrow map to [A? The reference is in man console_codes, but if you simply want to know the escape sequence for a combination of keys just press and release the Escape key and then press the key combination. For example, Escape, Up prints [A at the cursor, and Escape, Alt–Delete prints [3;3~, as seen in the configuration for kill-word above.

Editors

On modern *nix systems there are heaps of editors available. I’ll cover the minimal basics of two of the most common, nano and vim, which are available on most systems.

Some commands, such as crontab -e to edit the current cron table, need to open an interactive editor. The choice of editor is controlled by $VISUAL, which is usually just a command to run such an editor:

$ echo "$VISUAL"
vim

To set a specific default editor we can set the $VISUAL variable in ~/.bashrc.
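
For example, to make nano the default you could add this line to ~/.bashrc:

export VISUAL=nano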

Nano

nano is probably the most beginner–friendly terminal editor around. To start editing a new or existing file simply run nano FILE. If FILE does not yet exist it will not be created until you save within the editor, and if FILE does exist it will be opened and syntax highlighted if applicable. At this point you’ll notice a sort of menu at the bottom of the screen, which shows a number of options depending on the width of the terminal window. The options start with a caret (^) character, which is a common symbol for the Ctrl key. The characters after the caret are all uppercase, but you don’t need Shift or Caps Lock to use them, so the ^X shortcut corresponds to Ctrl–x. When exiting, nano will ask whether you want to save your changes. Ctrl–g shows a help page with more options and descriptions.

Vim

vim, on the other hand, has a famously steep learning curve. One example of this is the famous question How do I exit the Vim editor?. Basically Vim has several modes:

  1. Vim starts in normal mode, which is mostly used to access other modes. Pressing Escape takes you back to normal mode from other modes.

  2. Pressing i or Insert in normal mode takes you to insert mode, where we can edit the file normally.

  3. Pressing / in normal mode goes to forward search mode, where we can enter a basic regular expression and search for it interactively. Pressing Enter saves the search and goes back to normal mode. If there are multiple matches we can go to the next one by pressing n and the previous one by pressing Shift–n in normal mode.

  4. Pressing : in normal mode goes to command mode, where we can enter a command and run it by pressing Enter. The most common commands are:

  • :exit or :x to save (if necessary) and exit

  • :quit or :q to quit

  • :quit! or :q! to force quit (for example to abandon changes in a file)

There are many other Vim modes and commands, but this is enough to get started.

Copying and Pasting

Something as simple and fundamental as copying and pasting has hidden depths. There are many ways to do it, and in certain contexts some are more convenient than others. Knowing about them is the first step to choosing which ones to use and when. I would encourage experimenting with the below to get an intuitive feel for how it all works.

The mouse and keyboard functionality below is known to work in GNOME Terminal on Linux. Since it’s implemented in the terminal rather than Bash itself it may be different on your platform.

Mouse

Most Linux applications including terminals support quick copying and pasting using a mouse. Simply click and drag over some text with the left button, and as soon as you lift your finger the selection will be copied into what is called the “primary selection,” where it stays until you select something else. You can then middle–click to paste the selection in basically any application.

Left–clicking and dragging as above selects one character at a time. You can also select a word or line by double- or triple–clicking, respectively, and select multiples by holding down the mouse button after the last click and dragging.

A “line” in this case really means up to and including a newline character or “hard” line break in the terminal backlog, and not what is called a “soft” line break caused by a long line overflowing to the next in your terminal. Soft line breaks are not part of the terminal backlog, and so will not be part of any text you copy. The difference becomes obvious if you resize the terminal: soft line breaks are introduced when the width is lowered, but disappear when you make the terminal wide enough to fit up to the hard line break.

In most terminals, holding down Alt or Ctrl when starting a selection will select a rectangle of text, which is mostly useful if you have some column–aligned text you want to extract from some output with irrelevant content such as leading timestamps in a log.

Starting a selection and then Shift-clicking elsewhere will extend the selection to include the text up to that point. This can be helpful if, for example, what you want to select can’t fit into a window: simply start the selection at either end, release the mouse button (and keyboard, if you’re doing a rectangular selection), scroll to the other end of the text, and Shift-click where you want the selection to end. You can easily correct if you missed the exact character by Shift-clicking or Shift-clicking and dragging. This will only adjust the end of the selection you expanded last. If you want to correct the opposite end of the selection you can Shift-click outside that end of the selection to be able to adjust it. This is a bit clunky to explain, so I’d encourage simply playing around with it.

Keyboard shortcuts

You’re probably already familiar with Ctrl–c to copy and Ctrl–v to paste. But since these shortcuts have special meanings in terminals, the shortcuts are Ctrl-Shift–c and Ctrl-Shift–v. These commands interact with what is confusingly called the “clipboard” selection. It is identical to the “primary” selection except for how you interact with it.

Ctrl–c sends SIGINT to the foreground process. Ctrl–v runs the quoted-insert readline command, which inserts the next character verbatim. This can be used to insert for example a Tab character without triggering autocompletion.

The contents of the primary and clipboard selections can be different. To see how, select some text and copy it into the clipboard selection using the appropriate keyboard shortcut or context menu. At this point both selections are the same, since we selected some text (putting it into the primary selection) and then did a separate action to also put the same text into the clipboard selection. If we now select some other text the primary selection will be overwritten, but the clipboard selection stays the same. This can be used to copy two things at the same time: put something in the clipboard selection, put something else in the primary selection, then paste them separately.

Within the terminal you can use Shift-Insert to paste the primary selection, but unfortunately this shortcut pastes the clipboard selection instead in most other applications. Hopefully this inconsistency will be fixed in future versions of Linux desktop applications.

Bash also has a separate built–in selection. Ctrl–k moves the text from the keyboard cursor position up to the end of the line into the Bash selection, and Ctrl–y pastes it at the cursor position. This can be useful for example when editing a command, and realizing that we need to run a different command before resuming editing this one. If we press Ctrl–c to cancel editing we lose the command. Instead, going to the start of the command and pressing Ctrl–k we get a clean command line where we can run something else before pressing Ctrl–y to resume editing the previous command. This is mostly useful when not using a mouse.

Ctrl–k and Ctrl–y are mapped to the Readline functions kill-line and yank. To get a list of functions and their shortcuts, including more clipboard functions, run bind -P. To translate these into more human readable form you can pipe the output to perl -pe 's/((?<!\\)(?:\\\\)*)\\C/\1Ctrl/g;s/((?<!\\)(?:\\\\)*)\\e/\1Alt-/g'. If you’re reading this on paper, I’m sorry.

Commands

We can also copy and paste using only the command line. xclip interacts with the same primary and clipboard selections mentioned above, reading from standard input and printing selection contents to standard output. This can be useful in many situations. For one thing, it guarantees that what you are copying is exact – it will include any non–printable characters as they are:

$ printf 'a\tb\0c' | xclip
$ xclip -out | od --address-radix=n --format=c --width=1
   a
  \t
   b
  \0
   c

In the first command, printf outputs the characters a, Tab (\t), b, NUL (\0) and c to standard output. xclip reads that from standard input and saves it to the default (primary) selection of the X Window System clipboard. In the second command we print the contents of the primary selection to standard output and format it using od into their individual characters.

The od options in detail: --address-radix=n disables printing the offset of each line, --format=c formats printable characters as themselves and non–printable characters as their printf-style backslash escapes, and --width=1 outputs one byte per line.

We can also do things like passing an image directly from a browser to a command line tool. Right–click on a raster image in a web browser and copy it. At this point the image and some metadata are in the clipboard selection, and we can ask xclip about output formats:

$ xclip -out -selection clipboard -target TARGETS
TIMESTAMP
TARGETS
MULTIPLE
SAVE_TARGETS
text/html
text/_moz_htmlinfo
text/_moz_htmlcontext
image/png
image/bmp
image/x-bmp
image/x-MS-bmp
image/x-icon
image/x-ico
image/x-win-bitmap
image/vnd.microsoft.icon
application/ico
image/ico
image/icon
text/ico
image/jpeg
image/tiff

The uppercase names are metadata entries such as the X server timestamp at which the selection was made. The rest of the entries are media types.

We can now paste the image in any of the image/… formats. For example, to get the image in PNG format, resize it to 16 pixels wide while conserving aspect ratio, and then Base64 encode it:

$ xclip -out -selection clipboard -target image/png | convert -resize 16 - - | base64
iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAMAAAAoLQ9TAAAABGdBTUEAALGPC/xhBQAAACBjSFJN
AAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAACJVBMVEUXFxcuLi5SUlK7u7v/
//////8uLi7t7e3///9WVlb/////+O3/s3H/gRb/gRT/gRX/gBX/iRf/phL/tAb+swWDfX38sAXz
[…]

Terminal

Say we want to show someone what we’re seeing in the terminal. A screenshot gives us full visual fidelity, but the text can’t be selected or inspected. This makes it harder to reproduce the situation, especially if any ambiguous characters (like 0 and O) are involved. Copying the text will lose the formatting, such as color. Formatted text to the rescue! You can copy and send an HTML snippet or file to get the best of both worlds.

For example, take this terminal window:

If you select the first line and use the context (or right–click) menu “Copy as HTML” option you get the following contents:

vagrant@ubuntu-bionic:~$

You can also use a program like aha to convert anything containing ANSI format characters to a full XHTML document:

$ grep --color=always foo <<< foobar | aha
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- This file was created with the aha Ansi HTML Adapter.
 https://github.com/theZiz/aha -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xml+xhtml; charset=UTF-8" />
<title>stdin</title>
</head>
<body>
<pre>
<span style="color:red;font-weight:bold;">foo</span>bar
</pre>
</body>
</html>

grep, like many programs which can produce colored output, disables this feature when its output goes anywhere except the terminal. This is because the formatting characters are not part of the text representation of the output, and can easily cause problems in pipelined commands. So we have to use command line options to force color output.

Copying code from elsewhere

There are three errors in the following code snippet:

grep –color=always “some pattern” ./𝘮𝘪𝘥𝘥𝘭𝘦

These and other errors are common on the web, but can come from anywhere. Sometimes they are caused by user ignorance and other times by the platform itself. Can you tell what the issues are? In order of their appearance:

  1. The funny–looking horizontal line before color is not a double dash, and not even a single dash, but rather an en dash. WYSIWYG tools often convert double dashes to en dashes automatically. But en dashes are not used to prefix options, so the whole word is going to be treated as a pattern and the rest of the arguments will be treated as paths, leading to a confusing result:
$ grep –color=always “some pattern” ./middle
grep: “some: No such file or directory
grep: pattern”: No such file or directory
grep: ./middle: No such file or directory
  2. As you can see from the error message above, the quotes around some pattern are not syntactic. That is, instead of marking the start and end of the filename they are treated as part of the filename. That is because they are typographic quotes, another piece of common automatic formatting by WYSIWYG tools. And since there are no syntactic quotes that part of the command is split into words.

  3. The string middle is slanted or italicized, which hints at the problem for anyone familiar with Markdown and similar formats. These formats are sent through a preprocessor program to produce HTML. And a common way to produce italicized text in the resulting HTML is to surround the text with asterisks, which we know as globbing characters!
    So the original command was probably something like this:

grep --color=always "pattern" ./*middle*

“Probably” is the best we can do without more information. Although the typographic characters would normally set off alarms they could of course be intentional. And the text which resulted in 𝘮𝘪𝘥𝘥𝘭𝘦 could also be from any other common italicization such as _middle_ or /middle/.

Copy/paste exploit

There is an even worse problem with copying from a WYSIWYG context like the web directly into a terminal: it is trivial to hide exploits in WYSIWYG code! If you can insert arbitrary JavaScript, CSS or even HTML into a web page it is really easy to do this. For example, does this code look safe to copy?

echo foo

What if I told you that the underlying markup is an example of WYSINWYC (what you see is not what you copy):

<code>echo <span style="position: absolute; left: -999em; top: -999em;">
    &gt; /dev/null; PS1='exploited\$ ';<br>
    echo
</span> foo</code>

This exploit is of course harmless – it simply sets your terminal prompt to one which starts with the string “exploited”.

When you select echo and foo you are also selecting the off–screen exploit code, which includes a literal newline because of the HTML line break element (<br>). So when pasting, the exploit code is run even if it looked like you only selected a single line.

A more advanced exploit could hide the fact that anything was run at all, for example by inserting carriage returns, running a background process, scheduling a download of a bigger script and other tricks.

Many websites have implemented protections against such exploits, typically by forbidding arbitrary JavaScript and CSS and only allowing a safe subset of HTML in user–contributed content. Even so, it’s easy to trip up and accidentally allow something unsafe. Fortunately the fix is pretty simple:

Paste and check untrusted code in a GUI plaintext editor before running it!

It’s important that it’s a GUI editor because the exploit could contain control characters which could start a shell within command line editors such as Vim.

It’s important that it’s a plaintext editor such as an IDE to avoid auto–formatting as we’ve seen above. Also, the paste buffer could contain some safe WYSIWYG contents but an unsafe plaintext string, so that completely different things are pasted into a WYSIWYG application and your terminal. For example:

<code>echo<img
    alt="&gt; /dev/null; PS1='exploited\$ '; echo"
    src="1x1.png"
    style="position: absolute;"
> foo</code>

Again, this shows up as echo foo in a browser. And copying the text into a WYSIWYG editor also rendered the result as “echo foo”. Only when pasted as plain text does the exploit code show up, since the alternative text is meant to be a stand–in when the image is unavailable.

Some combinations of browsers, terminals and editors may give different results, but the general exploit is unlikely to be fixed because it would require solutions which would break many web sites.

Listing Files

“Listing” files can refer to two separate types of tasks: searching and enumerating.

Searching for files means you don’t yet know which files you’re looking for. For example, you might be looking for something you suspect exists on the filesystem but can’t pinpoint yet. It is also useful to work out and validate assumptions about which paths you care about, for example during prototyping.

Basic enumeration is when you know the exact paths to the files you’re interested in, and you need to refer to them in your project, by absolute or relative paths. More advanced enumeration involves patterns of metadata, such as “every file within a specific directory”, or “every empty file”.

Simple path patterns are easy to handle reliably in scripts using shell builtin syntax called globs.

Complex patterns or other metadata searches often require the use of the find command.

Searching

ls

ls is most useful for its default task – listing alphabetically all the non–hidden files in a directory, in a human–readable form, with color coding for special files like directories, executables and symlinks. When exploring a directory, ls and globs can be enough to get a broad idea of where things are.

Unfortunately ls is useless for reliable scripting — see Why you shouldn’t parse the output of ls(1) and Why not parse ls (and what to do instead)? Anything we could possibly want to do in a script can be done more reliably with find or globs. With that in mind, here are some handy ls options and their scriptable counterparts:

List all files except . and ..

ls command: ls --almost-all

Scripting: find — it lists hidden files by default

Print file metadata

ls command: ls -l

Scripting: find -printf PATTERN prints arbitrary metadata

Reverse order

ls command: ls --reverse

Scripting: find EXPRESSION -print0 | sort --reverse --zero-terminated

Order by most recent first

ls command: ls -t

Scripting: find PATH… -printf '%T@\t%p\0' | sort --key=1 --reverse --zero-terminated | cut --fields=2 --zero-terminated

locate

locate website

The mlocate package contains two handy tools: updatedb to create and update an index of all the files on your system, and locate to search through that index. locate STRING lists all paths which contain the given string. Because mlocate keeps track of paths in a database it is much faster than find / -path '*STRING*', but since the database is separate from the filesystem it’s not necessarily up–to–date. updatedb usually runs as a cron job installed with the package, for example in /etc/cron.daily/mlocate, but you can also run sudo updatedb to update the index at any time.

updatedb may be configured to exclude some filesystems and paths by default. For example, on my system it excludes network file systems (which could take a long time to index) and the /tmp directory, among many others.

When using locate we have to be careful not to match too many things. For example, locate /foo will print any directory with a name starting with “foo” and any files inside such directories.

To limit this to directories with an exact name and all files within them we can use locate /foo/.

To limit it to only files called “foo” we can use locate --regex '/foo$'.
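
For example, on my system (your results will depend on what has been indexed):

$ locate --regex '/fstab$'
/etc/fstab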

Printing only directories with a specific name is not supported, because locate doesn’t distinguish between directories and other files.

grep

grep website

grep searches through files or standard input for lines matching specified patterns. By default it prints these lines to standard output, and has an exit code of 0 if there was any match or 1 if there was no match. I would suggest a read–through of the excellent man grep to pick out the most relevant options, but suffice it to say that this is a tool everyone who does anything in Bash should know.

Some particularly useful options:

Search for literal strings rather than patterns: --fixed-strings

Use complex Perl-compatible regular expressions: --perl-regexp

Search for more than one pattern: --regexp=PATTERN1 --regexp=PATTERN2 … (any line matching any of the patterns is printed, and this option also works with --fixed-strings despite the name)

Ignore case distinctions: --ignore-case

Match whole words rather than anywhere on a line: --word-regexp (grep --word-regexp PATTERN is equivalent to grep '\bPATTERN\b')

Match whole lines: --line-regexp (grep --line-regexp PATTERN is equivalent to grep '^PATTERN$')

Print a count of matching lines rather than the lines themselves: --count

Print only the filenames of matching files: --files-with-matches (by default, when searching through a single file only the matching lines are printed, and when searching through more than one file the filename and lines are both printed with a : separator)

Print the filenames of files with no matches: --files-without-match

Print a NUL byte after each filename rather than a newline: --null (useful with the two options above to handle arbitrary filenames, for example ones which could contain a newline character)

Print each match on a separate line without the rest of the line: --only-matching

Don’t print anything on standard output: --quiet (this is useful to check whether a file contains something without actually printing that line, as in if grep --quiet PATTERN)

Print N lines around each matching line for context: --context=N

Search through files recursively: --recursive

Treat input and output lines as ending with a NUL character: --null-data (useful in a pipeline with NUL-separated records)
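
To make a few of these concrete, here is a short session combining --ignore-case, --word-regexp and --quiet on a throwaway file:

$ cd "$(mktemp --directory)"
$ printf 'foo\nfoobar\nFOO\n' > input.txt
$ grep --ignore-case --word-regexp foo input.txt
foo
FOO
$ if grep --quiet bar input.txt; then echo 'bar found'; fi
bar found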

grep is fundamentally a line-based tool, and is unsuited for extracting specific parts of nested file formats, especially those with flexible formatting such as JSON and XML.

ripgrep has similar options to grep, but is more developer–focused: it searches recursively by default, and skips a bunch of files and directories which you typically don’t want to search through: hidden files (including the .git directory), files listed in .gitignore, and binary files.

By convention

On Linux there are two important standards for the location of files. These can be useful to help you find files with less guesswork.

The Filesystem Hierarchy Standard consists of a “set of requirements and guidelines for file and directory placement under UNIX–like operating systems.” It explains the purpose of each of the main directories on a *nix system, such as /etc being used for system–wide configuration files.

The XDG Base Directory Specification specifies environment variables which indicate where application data and configuration should be stored. This is especially useful for user–specific configuration, which in some cases still clutters the home directory. By putting configuration in application–specific directories inside $XDG_CONFIG_HOME users can trivially distinguish them, even if each application has multiple configuration files.
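
For example, a script might locate its own configuration directory like this (“myapp” is just a hypothetical application name, and the output assumes $XDG_CONFIG_HOME is unset, in which case the specification prescribes ~/.config as the default):

$ echo "${XDG_CONFIG_HOME:-$HOME/.config}/myapp"
/home/jdoe/.config/myapp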

Enumerating

Globs

A common task in shell scripts is to loop over a changing but well–defined set of files, for example to delete all the log files after a successful run of a program. The names of such files often include the date and/or time they were created, so you can’t simply hard–code a path in the cleanup script. You need some way of referring to all the log files, regardless of their actual name, in other words a pattern, where the unknown part of a path might be matched by a wildcard.

Globs are such patterns. By far the most common wildcard in globs is the asterisk, *. It matches any number of characters (including zero) at that point in the path:

$ cd "$(mktemp --directory)"
$ touch ./2000-12-31.log ./2000-12-31.log.tar.gz ./output.txt
$ echo ./*.log
./2000-12-31.log

If you’re wondering why I don’t just echo *.log see the start of Including Files and the excellent Back To The Future: Unix Wildcards Gone Wild, which has several examples of why starting an argument with a wildcard character is a bad idea.
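
Continuing the session above, here is a sketch of the log cleanup task from the start of this section. Setting the nullglob shell option makes an unmatched glob expand to nothing instead of being passed along literally, so the loop body never runs on a nonexistent path:

$ shopt -s nullglob
$ for log in ./*.log; do
>     rm --verbose "$log"
> done
removed './2000-12-31.log'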

If we care about the number of characters but not their contents we can use the ? wildcard. It matches any single character:

$ cd "$(mktemp --directory)"
$ touch ./001.png ./2.png
$ echo ./???.png
./001.png

Those familiar with regular expressions will recognize that * is equivalent to the regex .*, and ? is equivalent to the regex . (a single dot). But there are important differences:

Globs are always anchored. That is, ./*.log is equivalent to ^\./.*\.log$ (with the modifier that . includes newline characters), which is why it does not match “./2000-12-31.log.tar.gz”.

. in globs is a literal dot, not a wildcard.

In general, it’s best to think of them as two different languages which just happen to have some superficial similarities.

If we care about the specific characters we can specify the characters we want to match at that location, in square brackets:

$ cd "$(mktemp --directory)"
$ touch ./1.png ./2.png ./a.png ./b.png
$ echo ./[abcde].png
./a.png ./b.png

A handy but dangerous shortcut is matching character classes. For example, we can match any lowercase ASCII character using the [[:lower:]] pattern:

$ export LC_ALL=en_US.UTF-8
$ cd "$(mktemp --directory)"
$ touch ./A.png ./a.png ./ç.png
$ export LC_ALL=POSIX
$ echo ./[[:lower:]].png
./a.png

Character classes and more are explained in detail in man 7 regex. This is one example where the functionality of globs and regexes blend together.

Hold on, what’s with LC_ALL? And why isn’t the cedilla treated as a lowercase character? The answer to both comes back to locales, which for the purposes of this chapter can be treated as the mapping from bytes to what is loosely called “characters.” Basically, to ensure that the code has a chance of treating strings identically across configurations you must declare the locale first. LC_ALL is a locale override variable, and the value “POSIX” refers to the POSIX locale which is the only one guaranteed to be available on all modern *nix installations. In the POSIX locale [[:lower:]] maps to [abcdefghijklmnopqrstuvwxyz].

Globbing tips

TIP: Use a slash at the end of a glob to match only directories:

$ cd "$(mktemp --directory)"
$ mkdir ./a ./b
$ touch ./1
$ echo ./*/
./a/ ./b/

TIP: Globs won’t match dotfiles by default:

$ cd "$(mktemp --directory)"
$ touch ./.hidden ./shown
$ echo ./*
./shown

TIP: Set the dotglob shell option to match dotfiles:

$ shopt -s dotglob
$ echo ./*
./.hidden ./shown
$ shopt -u dotglob
$ echo ./*
./shown

TIP: Globs are not recursive by default:

$ cd "$(mktemp --directory)"
$ mkdir --parents ./a/b
$ touch ./1.jpg ./a/2.jpg ./a/b/3.jpg
$ echo ./*.jpg
./1.jpg

TIP: Set the globstar shell option to make the special pattern ** match recursively:

$ shopt -s globstar
$ echo ./**/*.jpg
./1.jpg ./a/2.jpg ./a/b/3.jpg

Note that ./**.jpg does not do the same thing, because ** only matches full file and directory names, not a directory path plus a filename.
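
Continuing the session above, ./**.jpg therefore collapses to the same matches as ./*.jpg:

$ echo ./**.jpg
./1.jpg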

find

find website

The find command takes two sets of parameters – a list of directory paths followed by an expression which works like a small language of its own. For example, to look for PNG files in the “assets” and “images” directories, you would run find ./assets ./images -type f -name '*.png'. A few things to note about this command:

  1. Unlike commands like sed and awk, the expression is several words, not a single word. This makes it harder to distinguish between shell features and find features in the command.

  2. find supports globs in a similar way to Bash, but Bash will expand globs unless quoted or escaped before passing the argument to find. For example, if we forgot to quote *.png above we would get this broken behavior:

$ cd "$(mktemp --directory)"
$ mkdir ./assets
$ touch ./logo.png ./assets/arrow.png
$ find ./assets -name *.png

The find command does not output anything, because after Bash expands the glob the command which actually runs is find ./assets -name logo.png. And if the glob does not happen to match anything in the current directory the behavior would depend on whether failglob or nullglob is set, and would only work by accident if both were unset.

  3. The exit code is zero even if no files are found.

  4. The expression has a truth value per file. This becomes relevant for more complex expressions.

  5. There is an implicit -print action at the end of the expression, which prints each matching path followed by a newline. For any kind of looping over files you should instead use -print0 (or -exec printf '%s\0' {} + if it’s not available) to use a NUL terminator (see the sketch after this list).

  6. The command is recursive by default. This can be controlled with the -mindepth N and -maxdepth M expressions. For example, find . -mindepth 1 -maxdepth 1 lists all the files within the current directory, excluding the directory itself, “.”.

  7. The ever–present link to the parent directory — “..” — is not part of the results.
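
To expand on the -print0 advice above, a common robust pattern is to read the NUL-terminated paths in a while loop. A minimal sketch:

while IFS= read -r -d '' file; do
    # IFS= and -r preserve leading/trailing whitespace and backslashes;
    # -d '' makes read consume input up to each NUL terminator.
    printf 'Processing %s\n' "$file"
done < <(find . -type f -print0)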

Run commands on matching files with -execdir

One of find’s superpowers is running commands on files. The easiest way to do this is with -execdir. A common use case is to combine it with grep to search through an arbitrarily complex set of files:

find "$@" -type f -mtime +1 -execdir grep --fixed-strings \
    --regexp XML --regexp HTML {} +

  • The first argument, "$@", provides the starting points – it expands to the script’s quoted arguments.

  • -type f specifies that this applies only to “regular” files, as opposed to directories, symlinks etc.

  • -mtime +1 specifies that the modification time of the files must be at least two days in the past, because find always rounds the modification time down.

  • The rest of the arguments follow the synopsis -execdir COMMAND ARGUMENT… {} +. The arguments are passed unchanged to the command.

  • The {} indicates where in the command line the paths will be inserted.

  • The + tells find to run as few commands as possible (can be more than one command for technical reasons) to get through all the matching files. If you need to run a single command per file, for example if {} is not the last argument, you can instead pass a quoted or escaped semicolon character:

find "$1" -type f -execdir mv {} "$2" \;

Using find with or

find will act on files which match all the tests in the expression. If we want to do something with files which match either one or another expression we need to use -or and group the sub–expressions with parentheses – escaped so the shell passes them to find literally – to override operator precedence. For example, let’s expand the example at the beginning of this section to list both JPEG and PNG files with NUL terminators:

find ./assets ./images -type f \( -name '*.jpg' -or -name '*.png' \) -print0
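
A quick check of the grouping, using plain -print so the result is readable in the terminal:

$ cd "$(mktemp --directory)"
$ mkdir ./assets ./images
$ touch ./assets/logo.png ./assets/notes.txt ./images/photo.jpg
$ find ./assets ./images -type f \( -name '*.jpg' -or -name '*.png' \) -print
./assets/logo.png
./images/photo.jpg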