Just Starting Out in Bash Scripting? Here’s How to Do It Right


Bash scripts are powerful, but with power comes great responsibility. It’s very easy for sloppy or poorly-planned code to do real damage, so it’s a good idea to be careful and practice defensive programming.

Thankfully, Bash has several built-in mechanisms to help protect you. Many of these involve updates to syntax that have replaced older, problematic methods. You can use these suggestions to reduce the chance of bugs, debug your programs, and handle edge cases.

Use a Good Shebang Line

The first line of your shell script should always be a special comment, known as the shebang or hashbang, that declares which interpreter should run the script. It can be the name of a shell, a programming language, or—in theory—any other command. You can get by without a shebang, but to make your script standalone, and have it advertise the language it was written in, a shebang is essential.

There are two main schools of thought on how you should structure your shebang. The first is more traditional and looks like this:

        #!/bin/bash
echo "Hello, world"

This shebang line tells whatever shell happens to run the script that it should hand off to a program that lives at /bin/bash. While this approach is fine and should work almost all the time, some people prefer the following:

        #!/usr/bin/env bash
echo "Hello, world"

With just one argument, the env command looks that command up in your PATH and runs it; in other words, this shebang causes env to find bash and pass the script to it. The big difference here is that env uses a command name rather than a full path to an executable. It’s the same kind of difference that you get on the command line when you run a program:

        ls -l *.md
/bin/ls -l *.md

A bare command name, without a path, will run whichever version of that command makes the most sense in context. It could be a shell function, an alias, or a program file located somewhere in your PATH. Importantly, if you have versions in, say, /bin, /usr/local/bin, and ~/.local/bin, env will run the one in whichever directory appears first in your PATH, which is usually the one you want.

The env approach has the advantage that it doesn’t matter if your bash program is in /bin, /usr/local/bin, ~/bin, or anywhere else, as long as it’s in your PATH. This is the more portable option: it will work on more diverse systems, which may not be set up exactly the same as yours.
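To see which copy of bash the env approach would pick on your system, you can list every match in PATH search order with the type builtin:

```shell
# List every `bash` visible to this shell, in PATH search order;
# a `#!/usr/bin/env bash` shebang runs the first file shown.
type -a bash
```

On a typical system this prints one or more paths such as /bin/bash or /usr/bin/bash; if you have several, the first is the one env will run.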

Meanwhile, the /bin/bash version will guarantee that the program in a specific location will run, even if a different one is installed elsewhere. This can be a more secure option, since another version of bash cannot hijack the script.

Neither approach is more correct than the other; they are simply different. The important thing is to understand the differences and choose the approach that is correct for your situation. If you’re just writing scripts for your own use, it shouldn’t matter too much either way.

Always Quote Your Variables

Few things have caused more problems under Linux than its approach to whitespace, which separates commands from their arguments, and each argument from the others. If you’re not careful, it’s easy for spaces to cause problems, especially when you start working with variables.

Take this example:

        #!/bin/bash

FILENAME="docs/Letter to bank.doc"
ls $FILENAME

When Bash expands a variable, it does so very literally; the final line will end up as the equivalent of:

        ls docs/Letter to bank.doc

Because spaces separate arguments, Bash will interpret this as a call to ls with three arguments: “docs/Letter,” “to,” and “bank.doc”.

To avoid this problem, make sure you always quote variables when you use them, like this:

        ls "$FILENAME"

You may have spotted scripts that also enclose variable names in braces, like this:

        ls "${FILENAME}"

That’s another good idea, although it’s not necessary in this specific example. Enclosing a variable name in braces makes it easier to follow with other literal text, like:

        echo "_${FILENAME}_ is one of my favourite files"

Without the braces, Bash would try to find a variable named FILENAME_ and would fail.
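A quick way to see the effect of quoting is to print each resulting argument on its own line; printf repeats its format string once per argument, so it makes the splitting visible:

```shell
#!/bin/bash

FILENAME="Letter to bank.doc"

# Unquoted: the value splits into three separate arguments
printf '<%s>\n' $FILENAME      # <Letter>, <to>, <bank.doc>, one per line

# Quoted: the value stays together as a single argument
printf '<%s>\n' "$FILENAME"    # <Letter to bank.doc>
```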

Stop Your Script on Error

Few things are as risky as unchecked failure. In a shell script, you may be calling many different commands, hoping that they will succeed. You should be checking that carefully, but here’s a useful safety net that will help protect you anyway:

        set -e

The Bash manual describes the function of this setting as:

Exit immediately if a pipeline, which may consist of a single simple command, a list, or a compound command returns a non-zero status.

In simple terms, your script will halt if something goes wrong, and you haven’t already handled it. Take this example:

        #!/bin/bash

touch /file
echo "Now do something with that file..."

The script here assumes that touch will succeed, but that assumption is dangerous: run as an ordinary user, touch /file fails with a permission error, yet the script carries on as if nothing had happened. Adding a call to set -e near the top will cause the script to halt as soon as the touch command fails.

The set command can change various options that control how the shell works. See also, for example, the pipefail setting:

        set -o pipefail

This ensures that a pipeline will exit with a non-zero status, to indicate failure, if any of its components fail. By default, a failure that occurs early on in a pipeline can easily go unnoticed.
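A short demonstration, using false as a stand-in for a failing first stage of a pipeline:

```shell
#!/bin/bash

# By default, a pipeline's status is that of its last command
false | sort > /dev/null
echo "default: $?"        # prints "default: 0"

# With pipefail, any failing stage makes the whole pipeline fail
set -o pipefail
false | sort > /dev/null
echo "pipefail: $?"       # prints "pipefail: 1"
```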

Pay It Forward: Halt on Failure

Failing on error is an important catch-all, but you should also look to handle specific failures and take appropriate action. A simple way to check for failure is to check a command’s exit status.

You can check a command’s exit status by inspecting the $? variable after you’ve run it:

        cd "$DIR"

if [ $? -ne 0 ]; then
    exit 1
fi

As a shorthand, you can also use Bash logical operators. Use braces here, not parentheses: parentheses would run the commands in a subshell, so the exit would leave only that subshell rather than stopping your script:

        cd "$DIR" || { echo "bad" >&2; exit 1; }

Debug Each Command

Another highly valuable shell option is xtrace:

        set -o xtrace

This option causes the shell to print commands before it executes them, which is very useful when debugging:

A script using the xtrace setting to show the details of commands—date and ls—that it runs.

The shell now prints each command as it runs, including its arguments.
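For example, running a two-command script with xtrace enabled prints each command, prefixed by “+” (the default PS4 prompt), to standard error before executing it:

```shell
#!/bin/bash
set -o xtrace   # same effect as set -x

date                # the shell prints "+ date" to stderr first
ls /tmp > /dev/null # then "+ ls /tmp"
```

Because trace output goes to standard error, it doesn’t mix with the script’s normal output, and you can redirect it separately if you need to.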

There are many other options that can help you control the shell’s behavior using set. I strongly recommend reading up on the set builtin in the Bash manual.

Use Long Parameters When Calling Other Commands

Linux commands can be confusing because they tend to use single-letter options:

        rm -rf filename

With commonly used commands, this is less of an issue, but there are so many commands and options out there that you’re bound to come across something unfamiliar eventually. Good programming practices should ensure your script is readable, whether it’s to another member of your team, someone you’ve never communicated with, or yourself at some point in the future.

Here’s a much more readable equivalent of the previous command:

        rm --recursive --force filename

Many modern commands, or modern versions of long-established ones, support long options like this, which start with “--” and are full words rather than single letters. You can’t combine them like single-letter options, but they are much more readable.

You shouldn’t have to type out these options in full every time you use a command if you can remember the shorter alternative. But in your own shell scripts, especially those that you may share with others, using long options is a form of self-documenting code that you should always aim for.

Use Modern Notation for Command Substitution

In Bash scripts, there are two ways to run a command and capture its output inside a variable:

        VAR=$(ls)
VAR2=`ls`

You’ll see both of these in use, so which one is better?

The backtick approach is actually deprecated; it’s a bit more awkward for various reasons, like the fact that it doesn’t support nesting very well. So, always prefer the modern form, using parentheses.
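Nesting is where the difference really shows. With $( ), inner substitutions need no special treatment, while backticks force you to escape the inner pair:

```shell
# Modern form: nests with no extra syntax
PARENT=$(basename "$(dirname "$PWD")")
echo "$PARENT"

# Backtick form: the inner backticks must be escaped
PARENT=`basename \`dirname "$PWD"\``
echo "$PARENT"
```

Both lines print the name of the current directory’s parent, but the second is much harder to read, and it only gets worse with deeper nesting.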

Declare Default Values

Another handy bit of advanced syntax, this one lets you specify default variable values without writing extra code to check for an empty string:

        CMD=${PAGER:-more}

In this example, the value of $CMD will be the value of the PAGER environment variable, if set, and “more” otherwise.

You can even nest default values. This lets you support a command-line argument, with fallbacks for an environment variable, then a default, e.g.:

        DIR=${1:-${HOME:-/Users/bobby/home}}
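As another sketch of the same pattern (the vi fallback here is just an assumption for illustration), this picks the first command-line argument if present, then $EDITOR, then a hard-coded default:

```shell
#!/bin/bash

# First argument, else $EDITOR, else "vi"
CHOSEN=${1:-${EDITOR:-vi}}
echo "Using: $CHOSEN"
```

Run with an argument like nano, it prints “Using: nano”; with no arguments and EDITOR unset, it prints “Using: vi”.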
    

Be Explicit About Options With a Double Dash

Just as spaces in filenames can be problematic, so too can many other characters. A classic example is the case of a file with a leading “-”:

        echo "nothing much" > -a-silly-filename

You can confirm this file’s existence by listing its directory:

A terminal session creating a filename with a leading hyphen, then confirming its existence with ls.

But interacting with the file directly, using its name, will cause problems:

An error reported from ls when processing a filename beginning with a hyphen. The error reads "unrecognized option".

Like most commands, ls expects an argument beginning with a “-” to be an option, hence the “unrecognized option” error. This might seem like a trivial issue, but it gets much worse if you consider a command like this:

        rm *

If you have a file in your directory named “-rf,” this could lead to disaster!

You can avoid a lot of problems by keeping filenames simple: lowercase a-z names without any other characters should never cause a problem. However, you should always program defensively in your own scripts and programs to avoid issues.

The best protection against this kind of problem is the “double dash” syntax, which looks like this:

        rm -- *.md

The double dash explicitly declares that “everything after this is an argument,” meaning that rm won’t interpret any weird filenames as if they were options instead.
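Putting it together, here is how you can create and then safely handle a hyphen-named file. An alternative to “--”, also shown, is to prefix the name with “./” so it no longer starts with a dash:

```shell
# Create a file whose name begins with a dash
echo "nothing much" > ./-a-silly-filename

# Both of these treat the name as a file, not an option
ls -- -a-silly-filename
ls ./-a-silly-filename

# Clean up safely
rm -- -a-silly-filename
```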

Use Local Variables in Functions

You may have heard that global variables are unsafe or otherwise recommended against. While the truth is more nuanced, it’s often a good idea to avoid global variables unless you really know what you’re doing.

In a shell script, variables are global by default, even inside functions:

        #!/bin/bash

function run {
    DIR=$(pwd)
    echo "doing something..."
}

DIR="/usr/local/bin"
run
echo "$DIR"

It’s easy to accidentally reuse a variable name and forget that you’re changing its value throughout your entire script, not just inside the function that’s running. The fix is simple, though: simply declare it as a local variable:

        function run {
    local DIR=$(pwd)
}
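With local in place, the function’s assignment no longer leaks out; the global DIR set before the call survives it:

```shell
#!/bin/bash

function run {
    local DIR=$(pwd)
    echo "inside the function: $DIR"
}

DIR="/usr/local/bin"
run
echo "after the function: $DIR"   # still /usr/local/bin
```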
