Bash Scripting Best Practices: A Quick Guide to Reliable and Performant Bash Scripting

Debugging bash scripts can feel like looking for a needle in a haystack, especially when new additions land in an existing codebase without timely attention to structure, logging, and reliability. You can end up in such situations both through your own mistakes and when managing complex piles of scripts.

The Mail.ru Cloud Solutions team has translated an article with recommendations that will help you write, debug, and maintain your scripts better. Believe it or not, nothing beats the satisfaction of writing clean, ready-to-use bash code that works every time.

In the article, the author shares what he has learned over the past few years, as well as some common mistakes that took him by surprise. This is important because every software developer at some point in their career works with scripts to automate routine work tasks.

Trap Handlers

Most of the bash scripts I've come across have no effective cleanup mechanism for when something unexpected happens during execution.

Surprises can arrive from outside, such as a signal from the kernel. Handling such cases is extremely important if scripts are to be reliable enough to run on production systems. I often use exit handlers to respond to scenarios like this:

function handle_exit() {
  # Add cleanup code here,
  # e.g. rm -f "/tmp/${lock_file}.lock",
  # and exit with an appropriate status code
}

# trap <HANDLER_FXN> <LIST OF SIGNALS TO TRAP>
trap handle_exit 0 SIGHUP SIGINT SIGQUIT SIGABRT SIGTERM

trap is a shell builtin that lets you register a cleanup function to be called when the listed signals arrive. However, special care must be taken with signals such as SIGINT, which aborts the script.

In addition, in most cases it is enough to catch only EXIT, but the idea is that you can actually customize the script's behavior on a per-signal basis.
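A minimal sketch of that per-signal customization, reusing the lock file idea from above; the handler bodies and exit code are illustrative:

#!/usr/bin/env bash

lock_file="my_script"   # illustrative lock file name

function cleanup() {
  # remove temporary artifacts regardless of how the script ends
  rm -f "/tmp/${lock_file}.lock"
}

function handle_sigint() {
  echo "Interrupted, cleaning up..." >&2
  cleanup
  exit 130   # 128 + signal number, the conventional exit code for SIGINT
}

trap cleanup EXIT          # always runs when the script exits
trap handle_sigint SIGINT  # custom behavior for Ctrl+C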

The set Builtin - Quick Exit on Error

It is very important to react to errors as soon as they occur, and quickly stop execution. Nothing could be worse than continuing to run a command like this:

rm -rf ${directory_name}/*

Note that the variable directory_name is not defined here, so the command expands to rm -rf /* and wipes out everything under the root of the file system.

To handle such scenarios, it is important to use the set builtin at the beginning of the script with options such as set -o errexit, set -o pipefail, or set -o nounset. These options ensure that your script exits as soon as it encounters a non-zero exit code, the use of an undefined variable, a failed command inside a pipeline, and so on:

#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

function print_var() {
  echo "${var_value}"
}

print_var

$ ./sample.sh
./sample.sh: line 8: var_value: unbound variable

Note: options like set -o errexit will abort the script as soon as any "raw" non-zero return code appears, without telling you where or why. So it is often better to introduce custom error handling, like this:

#!/bin/bash
error_exit() {
  line=$1
  shift 1
  echo "ERROR: non zero return code from line: $line -- $@"
  exit 1
}
a=0
let a++ || error_exit "$LINENO" "let operation returned non 0 code"
echo "you will never see me"
# run it, now we have useful debugging output
$ bash foo.sh
ERROR: non zero return code from line: 9 -- let operation returned non 0 code

Scripting like this forces you to pay more attention to the behavior of all the commands in the script and anticipate the possibility of an error occurring before it catches you off guard.

ShellCheck to catch errors during development

It's worth integrating something like ShellCheck into your development and test pipelines to check your bash code against best practices.

I use it in my local development environments to get reports on syntax, semantics, and some code errors that I may have missed while developing. This is a static analysis tool for your bash scripts and I highly recommend using it.
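Running it locally is a single command; the script name below is just a placeholder, and the exact warnings vary between ShellCheck versions:

# check a single script
$ shellcheck sample.sh

# check every script in a repository, e.g. as a CI step
$ find . -name "*.sh" -exec shellcheck {} +

Typical findings include things like SC2086 (unquoted variable expansions subject to word splitting) and SC2034 (variables that are assigned but never used), and every SC code has a wiki page describing the recommended fix.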

Using your own exit codes

Return codes in POSIX are not just zero or one: a script can return any value from 0 to 255, where any non-zero value means failure. Take advantage of this and return custom error codes (between 201-254) for different error cases.

This information can then be used by other scripts that wrap yours to understand exactly what type of error has occurred and react accordingly:

#!/usr/bin/env bash

SUCCESS=0
FILE_NOT_FOUND=240
DOWNLOAD_FAILED=241

function read_file() {
  local file_name="$1"
  # return a dedicated error code instead of a generic failure
  if [[ ! -f "${file_name}" ]]; then
    return "${FILE_NOT_FOUND}"
  fi
  return "${SUCCESS}"
}

Note: please be especially careful with the variable names you define to prevent accidental redefinition of environment variables.
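As mentioned above, a wrapping script can then branch on these codes. A rough sketch, where read_remote_file.sh is a hypothetical script that returns the codes defined above:

#!/usr/bin/env bash

FILE_NOT_FOUND=240
DOWNLOAD_FAILED=241

# capture the exit code without tripping errexit
rc=0
./read_remote_file.sh || rc=$?

case "${rc}" in
  0)                    echo "file processed successfully" ;;
  "${FILE_NOT_FOUND}")  echo "input file missing, skipping this run" ;;
  "${DOWNLOAD_FAILED}") echo "download failed, will retry later" ;;
  *)                    echo "unexpected error (code ${rc})"; exit 1 ;;
esac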

Logging functions

Nice, structured logging is important for easily understanding the results of your script execution. As in other high-level programming languages, I always use my own logging functions in my bash scripts, such as __msg_info, __msg_error and so on.

This helps ensure a standardized logging structure by only making changes in one place:

#!/usr/bin/env bash

function __msg_error() {
    [[ "${ERROR}" == "1" ]] && echo -e "[ERROR]: $*"
}

function __msg_debug() {
    [[ "${DEBUG}" == "1" ]] && echo -e "[DEBUG]: $*"
}

function __msg_info() {
    [[ "${INFO}" == "1" ]] && echo -e "[INFO]: $*"
}

__msg_error "File could not be found. Cannot proceed"

__msg_debug "Starting script execution with 276MB of available RAM"

I usually try to have an __init mechanism in my scripts where such logger variables and other system variables are initialized or set to their default values. These variables can also be set from command-line options during script invocation.

For example, something like:

$ ./run-script.sh --debug

When such a script is executed, it ensures that the system-wide settings are set to their defaults, or at least initialized to something appropriate.

I usually base the choice of what to initialize and what not on a trade-off between the user interface and configuration details that the user can/should delve into.
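A rough sketch of what such an __init function might look like, assuming the DEBUG/INFO/ERROR flags from the logging example above; the option names and defaults are illustrative choices, not a fixed convention:

#!/usr/bin/env bash

function __init() {
  # sane defaults; the command-line flags below can override them
  DEBUG=0
  INFO=1
  ERROR=1

  while [[ "$#" -gt 0 ]]; do
    case "$1" in
      --debug) DEBUG=1 ;;
      --quiet) INFO=0 ;;
      *) echo "unknown option: $1" >&2; exit 1 ;;
    esac
    shift
  done
}

__init "$@"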

Architecture for reuse and clean system state

Modular / reusable code

β”œβ”€β”€ framework
β”‚   β”œβ”€β”€ common
β”‚   β”‚   β”œβ”€β”€ loggers.sh
β”‚   β”‚   β”œβ”€β”€ mail_reports.sh
β”‚   β”‚   └── slack_reports.sh
β”‚   └── daily_database_operation.sh

I keep a separate repository that can be used to initialize a new bash project/script that I want to develop. Anything that can be reused can be stored in a repository and retrieved by other projects that want to use such functionality. This organization of projects greatly reduces the size of other scripts and also ensures that the codebase is small and easy to test.

As in the example above, all the logging functions such as __msg_info and __msg_error, as well as things like Slack reporting, live separately under common/* and are sourced by other scripts, such as daily_database_operation.sh.
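A sketch of how daily_database_operation.sh might pull those helpers in; resolving the path through BASH_SOURCE is one common approach, not the only one:

#!/usr/bin/env bash

# locate the framework directory relative to this script
FRAMEWORK_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# pull in the shared helpers
source "${FRAMEWORK_DIR}/common/loggers.sh"
source "${FRAMEWORK_DIR}/common/slack_reports.sh"

# assumes the logger flags are initialized elsewhere, e.g. via __init
__msg_info "Starting the daily database operation"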

Leave a clean system behind

If you download resources during script execution, it is recommended to store all such data in a temporary directory with a random name, for example /tmp/AlRhYbD97/*. You can generate the random part of the name like this:

rand_dir_name="$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 16 | head -n 1)"

Upon completion, cleanup of such directories can be done in the exit handlers discussed above. If temporary directories are not taken care of, they accumulate and at some point cause unexpected problems on the host, such as a full disk.
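Putting the two ideas together, a sketch like this keeps all downloaded data in one disposable directory (the layout is illustrative; mktemp -d achieves much the same in a single step):

#!/usr/bin/env bash

rand_dir_name="$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 16 | head -n 1)"
work_dir="/tmp/${rand_dir_name}"
mkdir -p "${work_dir}"

function handle_exit() {
  # remove everything the script downloaded or generated
  rm -rf "${work_dir}"
}

trap handle_exit EXIT

# ... download resources into "${work_dir}" here ...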

Using lock files

It is often necessary to ensure that only one instance of a script is running on a host at any given time. This can be done using lock files.

I usually create lock files in /tmp/project_name/*.lock and check their presence at the beginning of the script. This helps the script exit gracefully and avoid unexpected system state changes by another script running in parallel. Lock files are not needed if you need the same script to run in parallel on a given host.
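A minimal sketch of that check; the lock file name below is a placeholder, and set -o noclobber is used so that creating the lock file is atomic:

#!/usr/bin/env bash

lock_dir="/tmp/project_name"
lock_file="${lock_dir}/daily_job.lock"
mkdir -p "${lock_dir}"

# with noclobber, the redirection fails if the lock file already exists
if ( set -o noclobber; echo "$$" > "${lock_file}" ) 2>/dev/null; then
  trap 'rm -f "${lock_file}"' EXIT
else
  echo "another instance is already running (lock: ${lock_file}), exiting" >&2
  exit 1
fi

# ... the rest of the script runs with the lock held ...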

Measure and improve

We often have to work with scripts that run for a long time, such as daily database operations. Such operations usually involve a sequence of steps: loading data, checking for anomalies, importing data, sending status reports, and so on.

In such cases, I always try to break the script into separate small scripts and report their status and execution time with something like this (the command group ensures that the timing output from time lands in the same log):

{ time source "${filepath}" "${args}"; } >> "${LOG_DIR}/RUN_LOG" 2>&1

Later, I can view the execution time with:

tac "${LOG_DIR}/RUN_LOG" | grep -m1 "real"

This helps me identify problem/slow areas in scripts that need optimization.
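For completeness, a sketch of a driver that runs each step this way; the step file names and log location are illustrative:

#!/usr/bin/env bash

LOG_DIR="/var/log/daily_database_operation"   # illustrative log location
mkdir -p "${LOG_DIR}"

for step in load_data.sh check_anomalies.sh import_data.sh send_report.sh; do
  echo "=== ${step} ===" >> "${LOG_DIR}/RUN_LOG"
  { time source "./${step}"; } >> "${LOG_DIR}/RUN_LOG" 2>&1
done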

Good luck!


Source: habr.com
