Cron in Linux: history, usage, and internals

A classic once wrote that happy people do not watch the clock. In those wild times there were no programmers and no Unix, but today programmers know for sure: cron will keep track of the time for them.

Command-line utilities are both a weakness of mine and a daily routine. sed, awk, wc, cut, and other old programs are run by scripts on our servers every day. Many of them are set up as tasks for cron, a scheduler that dates back to the 70s.

I used cron superficially for a long time without delving into the details, but one day, faced with an error when running a script, I decided to look into it thoroughly. That is how this article came about: while writing it, I got acquainted with POSIX crontab, the main cron variants in popular Linux distributions, and the internals of some of them.

Do you use Linux and run tasks in cron? Are you interested in the architecture of system applications in Unix? Then we are headed the same way!

Origin of species

Running user or system programs periodically is an obvious need in any operating system, so programmers recognized long ago that they needed services for scheduling and running tasks centrally.

Unix-like operating systems trace their lineage back to Version 7 Unix, developed in the 1970s at Bell Labs by a team that included the famous Ken Thompson. Version 7 Unix also shipped with cron, a service for regularly running superuser tasks.

A typical modern cron is a simple program, but the algorithm of the original version was even simpler: the service woke up once a minute, read a table of tasks from a single file (/usr/lib/crontab), and ran, as the superuser, the tasks that were due in the current minute.

Subsequently, improved versions of the simple and useful service were shipped with all Unix-like operating systems.

In 1992, generalized descriptions of the crontab format and of the utility's basic principles were included in POSIX, the main standard for Unix-like operating systems, and cron thus went from a de facto standard to a de jure one.

In 1987, Paul Vixie, after polling Unix users about their wishes for cron, released another version of the daemon that fixed some of the problems of traditional cron and extended the syntax of table files.

By its third version, Vixie cron met the POSIX requirements. The program also had a liberal license, or rather no license at all apart from the wishes stated in the README: the author gives no guarantees, the author's name must not be removed, and the program may be sold only together with its source code. These requirements turned out to be compatible with the principles of free software, which was gaining popularity in those years, so some of the key Linux distributions that appeared in the early 90s adopted Vixie cron as their system cron and still develop it today.

In particular, Red Hat and SUSE develop cronie, a fork of Vixie cron, while Debian and Ubuntu use the original edition of Vixie cron with many patches.

Let's first look at the user-facing crontab utility described in POSIX, and then at the syntax extensions introduced in Vixie cron and at how Vixie cron variants are used in popular Linux distributions. Finally, the icing on the cake: taking apart the internals of the cron daemon.

POSIX crontab

While the original cron always ran for the superuser, modern schedulers are more likely to deal with regular user tasks, which is safer and more convenient.

Cron ships as a set of two programs: a constantly running cron daemon and a user-accessible crontab utility. The latter lets each user edit their own task table, while the daemon runs tasks from the user and system tables.

The POSIX standard does not describe the daemon's behavior at all; only the user program, crontab, is formalized. The existence of mechanisms for launching user tasks is, of course, implied, but not described in detail.

There are four things that can be done by calling the crontab utility: edit the user's task table in the editor, load the table from a file, show the current task table, and clear the task table. Examples of the crontab utility:

crontab -e # edit the task table
crontab -l # show the task table
crontab -r # remove the task table
crontab path/to/file.crontab # load the task table from a file

When crontab -e is called, the editor specified in the standard EDITOR environment variable is used.

The tasks themselves are described in the following format:

# comment lines are ignored
#
# a task that runs every minute
* * * * * /path/to/exec -a -b -c
# a task that runs at the 10th minute of every hour
10 * * * * /path/to/exec -a -b -c
# a task that runs at the 10th minute of the second hour of every day and redirects standard output
10 2 * * * /path/to/exec -a -b -c > /tmp/cron-job-output.log

The first five fields of an entry are: minutes [0..59], hours [0..23], days of the month [1..31], months [1..12], and days of the week [0..6], where 0 is Sunday. The last, sixth field is a line that will be executed by the standard command interpreter.

In the first five fields, values can be listed separated by commas:

# a task that runs at minutes 1 and 10 of every hour
1,10 * * * * /path/to/exec -a -b -c

Or given as a range with a hyphen:

# a task that runs during each of the first ten minutes of every hour
0-9 * * * * /path/to/exec -a -b -c
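
To make the list and range notation concrete, here is a minimal Python sketch of how one time field could be expanded into the set of values it matches. The function name expand_posix_field and its low/high parameters are hypothetical, chosen only for illustration; this is not code from any cron implementation.

```python
def expand_posix_field(field, low, high):
    """Expand one POSIX crontab time field into a sorted list of integers.

    Supports "*", single numbers, comma-separated lists and hyphen ranges.
    low and high are the inclusive bounds of the field (0 and 59 for minutes).
    """
    if field == "*":
        return list(range(low, high + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not (low <= start <= end <= high):
            raise ValueError(f"value out of range in field {field!r}")
        values.update(range(start, end + 1))
    return sorted(values)


print(expand_posix_field("1,10", 0, 59))  # [1, 10]
print(expand_posix_field("0-9", 0, 59))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

An entry is due when the current minute, hour, day, and month each fall into the expanded set of the corresponding field.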

User access to task scheduling is regulated in POSIX by the cron.allow and cron.deny files, which list, respectively, users with crontab access and users without program access. The standard does not regulate the location of these files in any way.

Started programs, according to the standard, must be passed at least four environment variables:

  1. HOME is the user's home directory.
  2. LOGNAME is the user's login.
  3. PATH is the path where the standard system utilities can be found.
  4. SHELL is the path to the command interpreter in use.

Notably, POSIX says nothing about where the values for these variables come from.

Bestseller - Vixie cron 3.0pl1

The common ancestor of popular cron variants is Vixie cron 3.0pl1, posted to the comp.sources.unix newsgroup in 1992. Let's look at the main features of this version in more detail.

Vixie cron comes in two programs (cron and crontab). As usual, the daemon is responsible for reading and running tasks from the system task table and individual user task tables, while the crontab utility is responsible for editing user tables.

Task table and configuration files

The superuser (system) task table is located in /etc/crontab. Its syntax matches the usual Vixie cron syntax, except that the sixth column gives the name of the user on whose behalf the task is launched:

# Runs every minute as the user vlad
* * * * * vlad /path/to/exec

Regular user task tables reside in /var/cron/tabs/username and use a common syntax. When you run the crontab utility as a user, it is these files that are edited.

The lists of users who have access to crontab are managed in the /var/cron/allow and /var/cron/deny files, where it is enough to enter the username on a separate line.

Extended Syntax

Compared to the POSIX crontab, Paul Vixie's solution contains some very useful modifications to the syntax of the utility's task tables.

A new table syntax has become available: for example, you can specify the days of the week or months by name (Mon, Tue, and so on):

# Runs every minute on Mondays and Tuesdays in January
* * * Jan Mon,Tue /path/to/exec

You can specify a step between launches:

# Runs every two minutes on Mondays and Tuesdays
*/2 * * * Mon,Tue /path/to/exec

Steps and ranges can be combined:

# Runs every two minutes during the first ten minutes of every hour
0-10/2 * * * * /path/to/exec
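
The extensions above (names, steps, and steps applied to ranges) can be sketched in Python as follows. The helper names to_number and expand_field are hypothetical, and the sketch accepts lists of names only because the examples above use them; it is an illustration of the notation, not a port of the Vixie cron parser.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def to_number(token, names, low):
    """Turn a numeric string or a three-letter name into an integer."""
    if token in names:
        return names.index(token) + low
    return int(token)


def expand_field(field, low, high, names=()):
    """Expand a field that may contain names, lists, ranges and /steps."""
    values = set()
    for part in field.lower().split(","):
        span, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if span == "*":
            start, end = low, high
        elif "-" in span:
            first, last = span.split("-", 1)
            start = to_number(first, names, low)
            end = to_number(last, names, low)
        else:
            start = end = to_number(span, names, low)
        if step < 1 or not (low <= start <= end <= high):
            raise ValueError(f"bad field {field!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("0-10/2", 0, 59))        # [0, 2, 4, 6, 8, 10]
print(expand_field("Mon,Tue", 0, 6, DAYS))  # [1, 2]
print(expand_field("Jan", 1, 12, MONTHS))   # [1]
```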

Intuitive alternatives to the usual syntax are supported (reboot, yearly, annually, monthly, weekly, daily, midnight, hourly):

# Runs after a system reboot
@reboot /exec/on/reboot
# Runs once a day
@daily /exec/daily
# Runs once an hour
@hourly /exec/hourly
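
Each of these nicknames, except @reboot, is shorthand for an ordinary five-field schedule. The Python sketch below shows the equivalents as documented in the crontab(5) manual page of Vixie cron; the function expand_nickname is a hypothetical helper written for this illustration.

```python
# Five-field equivalents of the nicknames, as listed in crontab(5).
NICKNAMES = {
    "@yearly": "0 0 1 1 *",
    "@annually": "0 0 1 1 *",
    "@monthly": "0 0 1 * *",
    "@weekly": "0 0 * * 0",
    "@daily": "0 0 * * *",
    "@midnight": "0 0 * * *",
    "@hourly": "0 * * * *",
}


def expand_nickname(line):
    """Split a nickname entry into (schedule, command).

    The schedule is None for @reboot, which has no time fields:
    such a task runs once when the daemon starts.
    """
    name, command = line.split(None, 1)
    if name == "@reboot":
        return None, command.strip()
    if name not in NICKNAMES:
        raise ValueError(f"unknown nickname {name!r}")
    return NICKNAMES[name], command.strip()


print(expand_nickname("@daily /exec/daily"))       # ('0 0 * * *', '/exec/daily')
print(expand_nickname("@reboot /exec/on/reboot"))  # (None, '/exec/on/reboot')
```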

Task Execution Environment

Vixie cron allows you to change the environment of launched applications.

The environment variables USER, LOGNAME, and HOME are not simply supplied by the daemon; they are taken from the passwd file. The PATH variable is set to "/usr/bin:/bin" and the SHELL variable to "/bin/sh". The values of all variables except LOGNAME can be changed in user tables.

Some environment variables (primarily SHELL and HOME) are used by cron itself to start a task. Here's what using bash instead of the default sh to run user tasks might look like:

SHELL=/bin/bash
HOME=/tmp/
# exec will be run by bash in /tmp/
* * * * * /path/to/exec

In the end, all environment variables defined in the table (whether used by cron itself or needed by the process) are passed to the running task.
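
The rules above can be summarized in a short Python sketch: defaults first, then overrides from the table, then LOGNAME forced back to the value from passwd. The function build_job_env and the dictionary shape of passwd_entry are assumptions made for this illustration; a real implementation would read the user record from the system.

```python
def build_job_env(passwd_entry, table_vars):
    """Build the environment for a task.

    passwd_entry: dict with "name" and "dir" keys (hypothetical shape,
                  standing in for a record from the passwd file).
    table_vars:   variables assigned in the user's task table.
    """
    env = {
        "SHELL": "/bin/sh",
        "PATH": "/usr/bin:/bin",
        "HOME": passwd_entry["dir"],
        "USER": passwd_entry["name"],
    }
    # Anything set in the table wins over the defaults...
    env.update(table_vars)
    # ...except LOGNAME, which always comes from passwd.
    env["LOGNAME"] = passwd_entry["name"]
    return env


env = build_job_env(
    {"name": "vlad", "dir": "/home/vlad"},
    {"SHELL": "/bin/bash", "HOME": "/tmp/", "LOGNAME": "root"},
)
print(env["SHELL"], env["HOME"], env["LOGNAME"])  # /bin/bash /tmp/ vlad
```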

To edit files, crontab uses the editor specified in the VISUAL or EDITOR environment variable. If the environment where crontab was run does not have these variables defined, then "/usr/ucb/vi" is used (ucb is probably the University of California, Berkeley).

cron on Debian and Ubuntu

The developers of Debian and its derivatives have released a heavily modified version of Vixie cron 3.0pl1. There are no differences in the syntax of table files; for users, it is the same Vixie cron. The biggest new features are support for syslog, SELinux, and PAM.

Less noticeable but still tangible changes concern the location of configuration files and task tables.

User tables in Debian are located in the /var/spool/cron/crontabs directory, while the system table is still in /etc/crontab. Debian package-specific task tables are placed in /etc/cron.d, from which the cron daemon reads them automatically. User access is controlled by the /etc/cron.allow and /etc/cron.deny files.

The default shell is still /bin/sh, which in Debian is dash, a small POSIX-compliant shell, launched without reading any configuration (in non-interactive mode).

Cron itself in recent versions of Debian is run via systemd, and the startup configuration can be found in /lib/systemd/system/cron.service. There is nothing special in the service configuration, any finer task management can be done through environment variables declared directly in the crontab of each user.

cronie on Red Hat, Fedora and CentOS

cronie is a fork of Vixie cron version 4.1. As in Debian, the syntax has not changed, but support has been added for PAM and SELinux, for working in a cluster, for monitoring files with inotify, and for other features.

The default configuration is in the usual places: the system table is in /etc/crontab, packages put their tables in /etc/cron.d, and user tables go to /var/spool/cron.

The daemon runs under systemd; the service configuration is /lib/systemd/system/crond.service.

On Red Hat-like distributions, the default shell is /bin/sh, which is a regular bash. Note that when cron jobs are run via /bin/sh, bash starts in POSIX-compliant mode and, being non-interactive, does not read any additional configuration.

cronie in SLES and openSUSE

The German SLES distribution and its openSUSE derivative use the same cronie. The daemon here also runs under systemd; the service configuration is in /usr/lib/systemd/system/cron.service. Configuration: /etc/crontab, /etc/cron.d, /var/spool/cron/tabs. As on Red Hat, /bin/sh is bash running in POSIX-compliant non-interactive mode.

Vixie cron internals

Modern descendants of cron have not changed radically compared to Vixie cron, but they have acquired new features that are not needed to understand how the program works. Many of these extensions are sloppily written and clutter the code. Paul Vixie's original cron source code, by contrast, is a pleasure to read.

Therefore, I decided to analyze the internals of cron using a program common to both branches of cron development: Vixie cron 3.0pl1. I will simplify the examples by removing the ifdefs that make reading difficult and by omitting secondary details.

The work of the daemon can be divided into several stages:

  1. Program initialization.
  2. Collecting and updating the list of tasks to run.
  3. The main cron loop.
  4. Starting a task.

Let's take them in order.

Initialization

On startup, after checking the process arguments, cron installs the SIGCHLD and SIGHUP signal handlers. The first one logs an entry about the termination of the child process, the second one closes the file descriptor of the log file:

signal(SIGCHLD, sigchld_handler);
signal(SIGHUP, sighup_handler);

The cron daemon in the system always runs alone, only in the role of superuser and from the main cron directory. The following calls create a lock file with the PID of the daemon process, make sure the user is correct, and change the current directory to the main directory:

acquire_daemonlock(0);
set_cron_uid();
set_cron_cwd();
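
The lock-file idea behind acquire_daemonlock can be illustrated with a small Python sketch. It assumes an exclusive, non-blocking flock on a file that also stores the PID; the function name acquire_daemon_lock and the path passed to it are hypothetical, and the real function in Vixie cron differs in details such as logging.

```python
import fcntl
import os


def acquire_daemon_lock(path):
    """Take an exclusive lock on a PID file or fail if it is already held.

    Returns the open file descriptor; the lock lives as long as the
    descriptor stays open.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: do not wait, fail at once if someone else holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise RuntimeError(f"another instance already holds {path}")
    # Record our PID so that other tools can find the running daemon.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd
```

Calling the function a second time on the same path while the first descriptor is still open raises RuntimeError, which is how a second daemon instance would find out that one is already running.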

The default path is set, which will be used when starting processes:

setenv("PATH", _PATH_DEFPATH, 1);

Next, the process is “daemonized”: it creates a child copy of the process by calling fork and a new session in the child process (by calling setsid). The parent process is no longer needed - and it exits:

switch (fork()) {
case -1:
    /* fatal error: exit */
    exit(0);
    break;
case 0:
    /* child process */
    (void) setsid();
    break;
default:
    /* the parent process exits */
    _exit(0);
}

Terminating the parent process releases the lock on the lock file. In addition, the PID in the file has to be updated to the child's PID. After that, the task database is filled:

/* re-acquire the lock */
acquire_daemonlock(0);

/* fill the database */
database.head = NULL;
database.tail = NULL;
database.mtime = (time_t) 0;
load_database(&database);

Then cron moves on to the main work loop. But before that, it's worth taking a look at loading the task list.

Collecting and updating the list of tasks

The load_database function is responsible for loading the list of tasks. It checks the main system crontab and user files directory. If the files and directory have not changed, then the list of tasks is not reread. Otherwise, a new list of tasks starts to form.
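
The change check described above can be sketched in Python as a comparison of modification times. The function needs_reload and its arguments are hypothetical names for this illustration; the sketch assumes that the database remembers the newest mtime it saw and that a missing system table counts as mtime 0.

```python
import os


def needs_reload(db_mtime, system_crontab, spool_dir):
    """Decide whether the task list must be rebuilt.

    Returns (changed, newest_mtime). The caller stores newest_mtime in
    the database after a reload and passes it back on the next check.
    """
    try:
        system_mtime = os.stat(system_crontab).st_mtime
    except FileNotFoundError:
        system_mtime = 0  # no system table: only user tables matter
    spool_mtime = os.stat(spool_dir).st_mtime
    newest = max(system_mtime, spool_mtime)
    return newest != db_mtime, newest
```

A daemon would call this once per wake-up and rebuild its task list only when the first element of the result is True.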

Loading the system table, with special user and table names:

/* if the system table file exists (its mtime is non-zero), process it */
if (syscron_stat.st_mtime) {
    process_crontab("root", "*system*",
    SYSCRONTAB, &syscron_stat,
    &new_db, old_db);
}

Loading user tables in a loop:

while (NULL != (dp = readdir(dir))) {
    char    fname[MAXNAMLEN+1],
            tabname[MAXNAMLEN+1];
    /* skip files whose names start with a dot */
    if (dp->d_name[0] == '.')
            continue;
    (void) strcpy(fname, dp->d_name);
    sprintf(tabname, CRON_TAB(fname));
    process_crontab(fname, fname, tabname,
                    &statbuf, &new_db, old_db);
}

After that, the old database is replaced by a new one.

In the examples above, the call to process_crontab checks for the existence of a user corresponding to the table's filename (unless it's the superuser) and then calls load_user. The latter already reads the file itself line by line:

while ((status = load_env(envstr, file)) >= OK) {
    switch (status) {
    case ERR:
        free_user(u);
        u = NULL;
        goto done;
    case FALSE:
        e = load_entry(file, NULL, pw, envp);
        if (e) {
            e->next = u->crontab;
            u->crontab = e;
        }
        break;
    case TRUE:
        envp = env_set(envp, envstr);
        break;
    }
}

Here either the environment variable is set (strings like VAR=value) by the load_env / env_set functions, or the task description (* * * * * /path/to/exec) is read by the load_entry function.
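
The distinction between the two kinds of lines can be sketched in Python. The function classify_line and the regular expression are assumptions made for this illustration and are simpler than the hand-written parser in load_env; for example, the sketch strips only one pair of matching quotes around a value.

```python
import re

# NAME=value, with optional spaces around the equals sign.
ENV_LINE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def classify_line(line):
    """Return ("skip", None), ("env", (name, value)) or ("entry", text)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return "skip", None
    match = ENV_LINE.match(stripped)
    if match:
        name, value = match.group(1), match.group(2).strip()
        # Drop one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        return "env", (name, value)
    return "entry", stripped


print(classify_line("SHELL=/bin/bash"))           # ('env', ('SHELL', '/bin/bash'))
print(classify_line("* * * * * /path/to/exec"))   # ('entry', '* * * * * /path/to/exec')
```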

The entry entity that load_entry returns is our task, which is placed in the general list of tasks. In the function itself, a verbose parsing of the time format is carried out, but we are more interested in the formation of environment variables and task launch parameters:

/* the user and group for running the task are taken from passwd */
e->uid = pw->pw_uid;
e->gid = pw->pw_gid;

/* default shell (/bin/sh), unless the user specified another one */
e->envp = env_copy(envp);
if (!env_get("SHELL", e->envp)) {
    sprintf(envstr, "SHELL=%s", _PATH_BSHELL);
    e->envp = env_set(e->envp, envstr);
}
/* home directory */
if (!env_get("HOME", e->envp)) {
    sprintf(envstr, "HOME=%s", pw->pw_dir);
    e->envp = env_set(e->envp, envstr);
}
/* search path for programs */
if (!env_get("PATH", e->envp)) {
    sprintf(envstr, "PATH=%s", _PATH_DEFPATH);
    e->envp = env_set(e->envp, envstr);
}
/* the user name always comes from passwd */
sprintf(envstr, "%s=%s", "LOGNAME", pw->pw_name);
e->envp = env_set(e->envp, envstr);

The main loop works with this up-to-date task list.

With the actual list of tasks, the main loop works.

Main Loop

The original cron from Version 7 Unix worked quite simply: it reread the configuration in a loop, started the tasks of the current minute as the superuser, and slept until the beginning of the next minute. This simple approach on older machines required too many resources.

SysV proposed an alternative in which the daemon sleeps either until the nearest minute for which a task is defined or for 30 minutes. Rereading the configuration and checking tasks consumed fewer resources in this mode, but quickly updating the task list became inconvenient.

Vixie cron returned to checking task lists once a minute, since by the end of the 80s there were much more resources on standard Unix machines:

/* initial loading of tasks */
load_database(&database);
/* run the tasks scheduled to execute after a system reboot */
run_reboot_jobs(&database);
/* set TargetTime to the start of the next minute */
cron_sync();
while (TRUE) {
    /* run the tasks, then sleep until TargetTime, adjusted for the time spent on the tasks */
    cron_sleep();

    /* reread the configuration */
    load_database(&database);

    /* collect the tasks for this minute */
    cron_tick(&database);

    /* move TargetTime to the start of the following minute */
    TargetTime += 60;
}
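
The timing arithmetic of this loop can be sketched in Python. The functions next_minute_boundary and sleep_seconds are hypothetical names; the sketch works on plain epoch seconds and assumes that minute boundaries fall on multiples of 60, which ignores leap seconds.

```python
def next_minute_boundary(now):
    """Return the epoch second at which the next minute starts.

    If now is exactly on a boundary, the following minute is returned.
    """
    return (int(now) // 60 + 1) * 60


def sleep_seconds(target_time, now):
    """How long to sleep so as to wake up at target_time.

    Time already spent running tasks shortens the sleep; if the target
    has already passed, the result is zero rather than negative.
    """
    return max(0, target_time - now)


target = next_minute_boundary(125)
print(target)                        # 180
print(sleep_seconds(target, 125.5))  # 54.5
print(sleep_seconds(target, 200))    # 0
```

Advancing the target by exactly 60 on every iteration, instead of recomputing it from the current time, keeps the schedule from drifting when launching tasks takes a noticeable part of a minute.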

The cron_sleep function is directly involved in the execution of tasks, calling the functions job_runqueue (enumerating and starting tasks) and do_command (starting each individual task). The last function is worth analyzing in more detail.

Starting a task

The do_command function is done in good Unix style, that is, it does a fork to execute a task asynchronously. The parent process continues to run tasks, while the child process prepares the task process:

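switch (fork()) {
case -1:
    /* fork failed */
    break;
case 0:
    /* child process: just in case, try to acquire the main lock once more */
    acquire_daemonlock(1);
    /* move on to setting up the task process */
    child_process(e, u);
    /* when done, the child process exits */
    _exit(OK_EXIT);
    break;
default:
    /* the parent process carries on */
    break;
}

There is a lot of logic in child_process: it takes over the task's standard output and error streams so that they can later be sent by mail (if the MAILTO environment variable is set in the task table), and finally it waits for the task's main process to finish.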

There is a lot of logic in child_process: it takes the standard output and error streams on itself, so that it can then be sent to the mail (if the MAILTO environment variable is specified in the task table), and, finally, it waits for the task's main process to complete.

The task process is formed by another fork:

switch (vfork()) {
case -1:
    /* on error, exit immediately */
    exit(ERROR_EXIT);
case 0:
    /* the grandchild process sets up a new session, terminal, and so on
     */
    (void) setsid();

    /*
     * verbose setup of the process output follows; omitted for brevity
     */

    /* change the directory, user, and group,
     * so the process is no longer a superuser process
     */
    setgid(e->gid);
    setuid(e->uid);
    chdir(env_get("HOME", e->envp));

    /* run the command itself
     */
    {
        /* the SHELL environment variable names the interpreter to run */
        char    *shell = env_get("SHELL", e->envp);

        /* the process starts without inheriting the parent's environment,
         * that is, exactly as described in the user's task table */
        execle(shell, shell, "-c", e->cmd, (char *)0, e->envp);

        /* an error, and the process did not start? exit */
        perror("execl");
        _exit(ERROR_EXIT);
    }
    break;
default:
    /* the process itself continues: it waits for the task to finish and for its output */
    break;
}
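
The final step, handing the command to the shell with only the table's environment, can be approximated in Python with subprocess. The function run_job is a hypothetical name for this sketch; it deliberately leaves out the session setup and the setgid/setuid calls shown above, so it runs the command as the current user.

```python
import subprocess


def run_job(command, env):
    """Run a task command the way the excerpt above does: SHELL -c command.

    env is the complete environment for the task; nothing is inherited
    from the calling process. Returns (exit_code, captured_stdout).
    """
    shell = env.get("SHELL", "/bin/sh")
    result = subprocess.run(
        [shell, "-c", command],
        env=env,               # replaces the parent's environment entirely
        cwd=env.get("HOME"),   # start in the task's home directory
        capture_output=True,   # collect output, as cron does for mailing
        text=True,
    )
    return result.returncode, result.stdout
```

For example, run_job('echo "$LOGNAME"', {"SHELL": "/bin/sh", "HOME": "/tmp", "PATH": "/usr/bin:/bin", "LOGNAME": "demo"}) runs the command through /bin/sh in /tmp, and the task sees only those four variables.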

That, in general, is the whole cron. I omitted some interesting details, for example, accounting for remote users, but outlined the main thing.

Afterword

Cron is a surprisingly simple and useful program, made in the best traditions of the Unix world. It does nothing superfluous, yet it has been doing its job remarkably well for several decades. It took me less than an hour to read through the code of the version that ships with Ubuntu, and I had a lot of fun! I hope I was able to share that fun with you.

I don't know about you, but I'm a little sad to realize that modern programming, with its tendency to overcomplicate and over-abstract, has long ceased to be conducive to such simplicity.

There are many modern alternatives to cron: systemd timers let you build complex setups with dependencies, and fcron gives more flexible control over the resource consumption of tasks. But personally, the simplest crontab has always been enough for me.

In short, love Unix, use simple programs, and don't forget to read the man pages for your platform!

Source: habr.com
