Strace in Linux: history, design and use

Strace in Linux: history, design and use

In Unix-like operating systems, a program communicates with the outside world and the operating system through a small set of functions - system calls. So, for debugging purposes, it is useful to spy on running processes with system calls.

The utility helps to monitor the "intimate life" of programs on Linux straceto which this article is devoted. A brief history is attached to examples of the use of "spy" equipment. strace and a description of the design of such programs.

Content

Origin of species

The main interface between programs and the OC kernel in Unix is ​​system calls. system calls, syscalls), the interaction of programs with the outside world occurs exclusively through them.

But in the first public version of Unix (Version 6 Unix, 1975) there were no convenient ways to monitor the behavior of user processes. To resolve this issue, Bell Labs to the next version (Version 7 Unix, 1979) proposed a new system call - ptrace.

ptrace was developed primarily for interactive debuggers, but by the end of the 80s (in the era of commercial already System V Release 4) on this basis, narrowly focused debuggers - system call tracers - appeared and became widely used.

First the same version of strace was published by Paul Cronenburg in the comp.sources.sun mailing list in 1992 as an alternative to the proprietary utility trace from sun. Both the clone and the original were intended for SunOS, but by 1994 strace has been ported to System V, Solaris, and the increasingly popular Linux.

Today, strace only supports Linux and relies on the same ptrace, overgrown with many extensions.

Modern (and very active) maintainer strace β€” Dmitry Levin. Thanks to him, the utility acquired advanced features such as error injection into system calls, support for a wide range of architectures, and, most importantly, mascot. Unofficial sources claim that the choice fell on the ostrich because of the consonance of the Russian word "ostrich" and the English word "strace".

It is also important that the ptrace system call and tracers were never included in POSIX, despite a long history and implementation in Linux, FreeBSD, OpenBSD and traditional Unix.

The strace device in a nutshell: Piglet Trace

"You are not expected to understand this" (Dennis Ritchie, comment in Version 6 Unix source code)

From early childhood, I hate black boxes: I didn’t play with toys, but tried to figure out their device (adults used the word β€œbroke”, but don’t believe evil tongues). Perhaps that is why the informal culture of the first Unixes and the modern open-source movement are so close to me.

In the framework of this article, it is unreasonable to disassemble the source code of strace, which has become fertile over decades. But secrets for readers should not remain. Therefore, to show the principle of operation of such strace programs, I will give the code of a miniature tracer - Piglet Trace (ptr). It does not know how to do anything special, but the main thing - the system calls of the program - displays:

$ gcc examples/piglet-trace.c -o ptr
$ ptr echo test > /dev/null
BRK(12) -> 94744690540544
ACCESS(21) -> 18446744073709551614
ACCESS(21) -> 18446744073709551614
unknown(257) -> 3
FSTAT(5) -> 0
MMAP(9) -> 140694657216512
CLOSE(3) -> 0
ACCESS(21) -> 18446744073709551614
unknown(257) -> 3
READ(0) -> 832
FSTAT(5) -> 0
MMAP(9) -> 140694657208320
MMAP(9) -> 140694650953728
MPROTECT(10) -> 0
MMAP(9) -> 140694655045632
MMAP(9) -> 140694655070208
CLOSE(3) -> 0
unknown(158) -> 0
MPROTECT(10) -> 0
MPROTECT(10) -> 0
MPROTECT(10) -> 0
MUNMAP(11) -> 0
BRK(12) -> 94744690540544
BRK(12) -> 94744690675712
unknown(257) -> 3
FSTAT(5) -> 0
MMAP(9) -> 140694646390784
CLOSE(3) -> 0
FSTAT(5) -> 0
IOCTL(16) -> 18446744073709551591
WRITE(1) -> 5
CLOSE(3) -> 0
CLOSE(3) -> 0
unknown(231)
Tracee terminated

Piglet Trace recognizes about a hundred Linux system calls (see below). chart) and only works on x86-64 architecture. This is sufficient for educational purposes.

Let's take a look at how our clone works. In the case of Linux, debuggers and tracers use, as mentioned above, the ptrace system call. It works by passing in the first argument the command IDs, of which we only need PTRACE_TRACEME, PTRACE_SYSCALL ΠΈ PTRACE_GETREGS.

The tracer starts in normal Unix style: fork(2) launches a child process, which in turn uses exec(3) launches the program under investigation. The only subtlety here is the challenge ptrace(PTRACE_TRACEME) before exec: the child process is waiting for the parent process to watch it:

pid_t child_pid = fork();
switch (child_pid) {
case -1:
    err(EXIT_FAILURE, "fork");
case 0:
    /* Child here */
    /* A traced mode has to be enabled. A parent will have to wait(2) for it
     * to happen. */
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    /* Replace itself with a program to be run. */
    execvp(argv[1], argv + 1);
    err(EXIT_FAILURE, "exec");
}

The parent process should now call wait(2) in the child process, that is, make sure that the switch to trace mode has occurred:

/* Parent */

/* First we wait for the child to set the traced mode (see
 * ptrace(PTRACE_TRACEME) above) */
if (waitpid(child_pid, NULL, 0) == -1)
    err(EXIT_FAILURE, "traceme -> waitpid");

At this point, the preparations are complete and you can proceed directly to tracking system calls in an endless loop.

Π’Ρ‹Π·ΠΎΠ² ptrace(PTRACE_SYSCALL) ensures that the next wait parent will complete either before the system call is executed, or immediately after it completes. Between two calls, you can perform any actions: replace the call with an alternative one, change the arguments or return value.

It is enough for us to call the command twice ptrace(PTRACE_GETREGS)to get the state of the register rax before the call (system call number) and immediately after (return value).

Actually, the cycle:

/* A system call tracing loop, one interation per call. */
for (;;) {
    /* A non-portable structure defined for ptrace/GDB/strace usage mostly.
     * It allows to conveniently dump and access register state using
     * ptrace. */
    struct user_regs_struct registers;

    /* Enter syscall: continue execution until the next system call
     * beginning. Stop right before syscall.
     *
     * It's possible to change the system call number, system call
     * arguments, return value or even avoid executing the system call
     * completely. */
  if (ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL) == -1)
      err(EXIT_FAILURE, "enter_syscall");
  if (waitpid(child_pid, NULL, 0) == -1)
      err(EXIT_FAILURE, "enter_syscall -> waitpid");

  /* According to the x86-64 system call convention on Linux (see man 2
   * syscall) the number identifying a syscall should be put into the rax
   * general purpose register, with the rest of the arguments residing in
   * other general purpose registers (rdi,rsi, rdx, r10, r8, r9). */
  if (ptrace(PTRACE_GETREGS, child_pid, NULL, &registers) == -1)
      err(EXIT_FAILURE, "enter_syscall -> getregs");

  /* Note how orig_rax is used here. That's because on x86-64 rax is used
   * both for executing a syscall, and returning a value from it. To
   * differentiate between the cases both rax and orig_rax are updated on
   * syscall entry/exit, and only rax is updated on exit. */
  print_syscall_enter(registers.orig_rax);

  /* Exit syscall: execute of the syscall, and stop on system
   * call exit.
   *
   * More system call tinkering possible: change the return value, record
   * time it took to finish the system call, etc. */
  if (ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL) == -1)
      err(EXIT_FAILURE, "exit_syscall");
  if (waitpid(child_pid, NULL, 0) == -1)
      err(EXIT_FAILURE, "exit_syscall -> waitpid");

  /* Retrieve register state again as we want to inspect system call
   * return value. */
  if (ptrace(PTRACE_GETREGS, child_pid, NULL, &registers) == -1) {
      /* ESRCH is returned when a child terminates using a syscall and no
       * return value is possible, e.g. as a result of exit(2). */
      if (errno == ESRCH) {
          fprintf(stderr, "nTracee terminatedn");
          break;
      }
      err(EXIT_FAILURE, "exit_syscall -> getregs");
  }

  /* Done with this system call, let the next iteration handle the next
   * one */
  print_syscall_exit(registers.rax);
}

That's the whole tracer. Now you know where to start the next porting DTrace on Linux.

Basics: running a program under strace

As a first use case strace, perhaps, it is worth giving the easiest way - running an application running strace.

In order not to delve into the endless list of calls to a typical program, we write minimum program around write:

int main(int argc, char *argv[])
{
    char str[] = "write me to stdoutn";
    /* write(2) is a simple wrapper around a syscall so it should be easy to
     * find in the syscall trace. */
    if (sizeof(str) != write(STDOUT_FILENO, str, sizeof(str))){
        perror("write");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Let's build the program and make sure it works:

$ gcc examples/write-simple.c -o write-simple
$ ./write-simple
write me to stdout

And finally run it under strace:

$ strace ./write-simple
pexecve("./write", ["./write"], 0x7ffebd6145b0 /* 71 vars */) = 0
brk(NULL)                               = 0x55ff5489e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=197410, ...}) = 0
mmap(NULL, 197410, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7a2a633000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "177ELF21133>1260342"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7a2a631000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f7a2a04c000
mprotect(0x7f7a2a233000, 2097152, PROT_NONE) = 0
mmap(0x7f7a2a433000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f7a2a433000
mmap(0x7f7a2a439000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f7a2a439000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f7a2a6324c0) = 0
mprotect(0x7f7a2a433000, 16384, PROT_READ) = 0
mprotect(0x55ff52b52000, 4096, PROT_READ) = 0
mprotect(0x7f7a2a664000, 4096, PROT_READ) = 0
munmap(0x7f7a2a633000, 197410)          = 0
write(1, "write me to stdoutn", 20write me to stdout
)  = 20
exit_group(0)                           = ?

Very verbose and not very informative. There are two problems here: the output of the program is mixed with the output strace and an abundance of system calls that don't interest us.

You can separate the program's standard output stream and strace error output using the -o switch, which redirects the list of system calls to the argument file.

It remains to deal with the problem of "extra" calls. Suppose we are only interested in calls write... Key -e allows you to specify expressions by which system calls will be filtered. The most popular variant of the condition is, of course, trace=*, with which you can leave only the calls of interest to us.

When used simultaneously -o ΠΈ -e we will get:

$ strace -e trace=write -owrite-simple.log ./write-simple
write me to stdout
$ cat write-simple.log
write(1, "write me to stdoutn", 20
)  = 20
+++ exited with 0 +++

So, you see, it is much easier to read.

And you can also remove system calls - for example, those associated with the allocation and release of memory:

$ strace -e trace=!brk,mmap,mprotect,munmap -owrite-simple.log ./write-simple
write me to stdout
$ cat write-simple.log
execve("./write-simple", ["./write-simple"], 0x7ffe9972a498 /* 69 vars */) = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=124066, ...}) = 0
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "177ELF21133>1260342"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f00f0be74c0) = 0
write(1, "write me to stdoutn", 20)  = 20
exit_group(0)                           = ?
+++ exited with 0 +++

Note the escaped exclamation mark in the list of excluded calls: this is required by the shell. shell).

In my version of glibc, the system call terminates the execution of a process exit_grouprather than traditional _exit. This is the difficulty of working with system calls: the interface with which the programmer works is not directly related to system calls. Moreover, it changes regularly depending on implementation and platform.

The Basics: Attaching to a Process on the Fly

Initially, the ptrace system call on which the strace, could only be used when running the program in special mode. Such a restriction may have sounded reasonable in the days of Version 6 Unix. Nowadays, this is no longer enough: sometimes you need to investigate the problems of a working program. A typical example is a process that is locked on a handle or is sleeping. Therefore, modern strace can join processes on the fly.

Hanging example Action:

int main(int argc, char *argv[])
{
    (void) argc; (void) argv;

    char str[] = "write men";

    write(STDOUT_FILENO, str, sizeof(str));

    /* Sleep indefinitely or until a signal arrives */
    pause();

    write(STDOUT_FILENO, str, sizeof(str));

    return EXIT_SUCCESS;
}

Let’s build the program and make sure it’s frozen:

$ gcc examples/write-sleep.c -o write-sleep
$ ./write-sleep
./write-sleep
write me
^C
$

Now let's try to join it:

$ ./write-sleep &
[1] 15329
write me
$ strace -p 15329
strace: Process 15329 attached
pause(
^Cstrace: Process 15329 detached
 <detached ...>

Program blocked by a call pause. Let's see how she reacts to the signals:

$ strace -o write-sleep.log -p 15329 &
strace: Process 15329 attached
$
$ kill -CONT 15329
$ cat write-sleep.log
pause()                                 = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=14989, si_uid=1001} ---
pause(
$
$ kill -TERM 15329
$ cat write-sleep.log
pause()                                 = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=14989, si_uid=1001} ---
pause()                                 = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=14989, si_uid=1001} ---
+++ killed by SIGTERM +++

We launched the hung program and joined it with strace. Two things came to light: the pause system call ignores signals without handlers, and, more interestingly, strace monitors not only system calls, but also incoming signals.

Example: tracking child processes

Working with processes through a call fork is the foundation of all Unixes. Let's see how strace works with the process tree using the example of a simple "breeding" Action:

int main(int argc, char *argv[])
{
    pid_t parent_pid = getpid();
    pid_t child_pid = fork();
    if (child_pid == 0) {
        /* A child is born! */
        child_pid = getpid();

        /* In the end of the day printf is just a call to write(2). */
        printf("child (self=%d)n", child_pid);
        exit(EXIT_SUCCESS);
    }

    printf("parent (self=%d, child=%d)n", parent_pid, child_pid);

    wait(NULL);

    exit(EXIT_SUCCESS);
}

Here, the parent process creates a child process, both of which write to standard output:

$ gcc examples/fork-write.c -o fork-write
$ ./fork-write
parent (self=11274, child=11275)
child (self=11275)

By default, we will see only the parent process's system calls:

$ strace -e trace=write -ofork-write.log ./fork-write
child (self=22049)
parent (self=22048, child=22049)
$ cat fork-write.log
write(1, "parent (self=22048, child=22049)"..., 33) = 33
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22049, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++

The flag helps to keep track of the entire process tree. -fWith which strace monitors system calls in child processes. This adds to each line of output pid process that makes system output:

$ strace -f -e trace=write -ofork-write.log ./fork-write
parent (self=22710, child=22711)
child (self=22711)
$ cat fork-write.log
22710 write(1, "parent (self=22710, child=22711)"..., 33) = 33
22711 write(1, "child (self=22711)n", 19) = 19
22711 +++ exited with 0 +++
22710 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22711, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
22710 +++ exited with 0 +++

In this context, filtering by system call groups can be useful:

$ strace -f -e trace=%process -ofork-write.log ./fork-write
parent (self=23610, child=23611)
child (self=23611)
$ cat fork-write.log
23610 execve("./fork-write", ["./fork-write"], 0x7fff696ff720 /* 63 vars */) = 0
23610 arch_prctl(ARCH_SET_FS, 0x7f3d03ba44c0) = 0
23610 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f3d03ba4790) = 23611
23610 wait4(-1,  <unfinished ...>
23611 exit_group(0)                     = ?
23611 +++ exited with 0 +++
23610 <... wait4 resumed> NULL, 0, NULL) = 23611
23610 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23611, si_uid=1001, si_status=0, si_utime=0, si_stime=0} ---
23610 exit_group(0)                     = ?
23610 +++ exited with 0 +++

By the way, what system call is used to create a new process?

Example: file paths instead of handles

Knowing the file descriptors is certainly useful, but the names of the specific files that the program accesses can also come in handy.

The next program writes a line to a temporary file:

void do_write(int out_fd)
{
    char str[] = "write me to a filen";

    if (sizeof(str) != write(out_fd, str, sizeof(str))){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    char tmp_filename_template[] = "/tmp/output_fileXXXXXX";

    int out_fd = mkstemp(tmp_filename_template);
    if (out_fd == -1) {
        perror("mkstemp");
        exit(EXIT_FAILURE);
    }

    do_write(out_fd);

    return EXIT_SUCCESS;
}

During a normal call strace will show the value of the descriptor number passed to the system call:

$ strace -e trace=write -o write-tmp-file.log ./write-tmp-file
$ cat write-tmp-file.log
write(3, "write me to a filen", 20)  = 20
+++ exited with 0 +++

With a flag -y the utility shows the path to the file that the descriptor corresponds to:

$ strace -y -e trace=write -o write-tmp-file.log ./write-tmp-file
$ cat write-tmp-file.log
write(3</tmp/output_fileCf5MyW>, "write me to a filen", 20) = 20
+++ exited with 0 +++

Example: Tracking file accesses

Another useful feature is to display only system calls associated with a specific file. Next program appends a string to an arbitrary file passed as an argument:

void do_write(int out_fd)
{
    char str[] = "write me to a filen";

    if (sizeof(str) != write(out_fd, str, sizeof(str))){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    /*
     * Path will be provided by the first program argument.
     *  */
    const char *path = argv[1];

    /*
     * Open an existing file for writing in append mode.
     *  */
    int out_fd = open(path, O_APPEND | O_WRONLY);
    if (out_fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    do_write(out_fd);

    return EXIT_SUCCESS;
}

By default strace outputs a lot of redundant information. Flag -P with an argument causes strace to output only accesses to the specified file:

$ strace -y -P/tmp/test_file.log -o write-file.log ./write-file /tmp/test_file.log
$ cat write-file.log
openat(AT_FDCWD, "/tmp/test_file.log", O_WRONLY|O_APPEND) = 3</tmp/test_file.log>
write(3</tmp/test_file.log>, "write me to a filen", 20) = 20
+++ exited with 0 +++

Example: multithreaded programs

Utility strace can help when working with multi-threaded the program. The following program writes to standard output from two streams:

void *thread(void *arg)
{
    (void) arg;

    printf("Secondary thread: workingn");
    sleep(1);
    printf("Secondary thread: donen");

    return NULL;
}

int main(int argc, char *argv[])
{
    printf("Initial thread: launching a threadn");

    pthread_t thr;
    if (0 != pthread_create(&thr, NULL, thread, NULL)) {
        fprintf(stderr, "Initial thread: failed to create a thread");
        exit(EXIT_FAILURE);
    }

    printf("Initial thread: joining a threadn");
    if (0 != pthread_join(thr, NULL)) {
        fprintf(stderr, "Initial thread: failed to join a thread");
        exit(EXIT_FAILURE);
    };

    printf("Initial thread: done");

    exit(EXIT_SUCCESS);
}

It is necessary to compile it, of course, with a special hello to the linker - the -pthread flag:

$ gcc examples/thread-write.c -pthread -o thread-write
$ ./thread-write
/thread-write
Initial thread: launching a thread
Initial thread: joining a thread
Secondary thread: working
Secondary thread: done
Initial thread: done
$

Flag -f, as in the case of normal processes, will prepend each line with the pid of the process.

Naturally, we are not talking about a thread identifier in the sense of implementing the POSIX Threads standard, but about a number used by the Linux task scheduler. From the point of view of the latter, there are no processes and threads - there are tasks that need to be distributed among the available cores of the machine.

When working in multiple threads, system calls become too much:

$ strace -f -othread-write.log ./thread-write
$ wc -l thread-write.log
60 thread-write.log

It makes sense to restrict ourselves to process management and system call write:

$ strace -f -e trace="%process,write" -othread-write.log ./thread-write
$ cat thread-write.log
18211 execve("./thread-write", ["./thread-write"], 0x7ffc6b8d58f0 /* 64 vars */) = 0
18211 arch_prctl(ARCH_SET_FS, 0x7f38ea3b7740) = 0
18211 write(1, "Initial thread: launching a thre"..., 35) = 35
18211 clone(child_stack=0x7f38e9ba2fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f38e9ba39d0, tls=0x7f38e9ba3700, child_tidptr=0x7f38e9ba39d0) = 18212
18211 write(1, "Initial thread: joining a thread"..., 33) = 33
18212 write(1, "Secondary thread: workingn", 26) = 26
18212 write(1, "Secondary thread: donen", 23) = 23
18212 exit(0)                           = ?
18212 +++ exited with 0 +++
18211 write(1, "Initial thread: done", 20) = 20
18211 exit_group(0)                     = ?
18211 +++ exited with 0 +++

By the way, questions. What system call is used to create a new thread? How is this call for threads different from the call for processes?

Master Class: Process Stack at the Time of a System Call

One of the recent strace capabilities - displaying the stack of function calls at the time of the system call. Simple example:

void do_write(void)
{
    char str[] = "write me to stdoutn";
    if (sizeof(str) != write(STDOUT_FILENO, str, sizeof(str))){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    do_write();
    return EXIT_SUCCESS;
}

Naturally, the program output becomes very voluminous, and, in addition to the flag -k (displaying the call stack), it makes sense to filter system calls by name:

$ gcc examples/write-simple.c -o write-simple
$ strace -k -e trace=write -o write-simple.log ./write-simple
write me to stdout
$ cat write-simple.log
write(1, "write me to stdoutn", 20)  = 20
 > /lib/x86_64-linux-gnu/libc-2.27.so(__write+0x14) [0x110154]
 > /home/vkazanov/projects-my/strace-post/write-simple(do_write+0x50) [0x78a]
 > /home/vkazanov/projects-my/strace-post/write-simple(main+0x14) [0x7d1]
 > /lib/x86_64-linux-gnu/libc-2.27.so(__libc_start_main+0xe7) [0x21b97]
 > /home/vkazanov/projects-my/strace-post/write-simple(_start+0x2a) [0x65a]
+++ exited with 0 +++

Master class: error injection

And one more new and very useful feature: error injection. Here program, which writes two lines to the output stream:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

void do_write(const char *str, ssize_t len)
{
    if (len != write(STDOUT_FILENO, str, (size_t)len)){
        perror("write");
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char *argv[])
{
    (void) argc; (void) argv;

    char str1[] = "write me 1n";
    do_write(str1, sizeof(str1));

    char str2[] = "write me 2n";
    do_write(str2, sizeof(str2));

    return EXIT_SUCCESS;
}

Tracking both write calls:

$ gcc examples/write-twice.c -o write-twice
$ ./write-twice
write me 1
write me 2
$ strace -e trace=write -owrite-twice.log ./write-twice
write me 1
write me 2
$ cat write-twice.log
write(1, "write me 1n", 12)          = 12
write(1, "write me 2n", 12)          = 12
+++ exited with 0 +++

And now we use the expression injectto insert error EBADF to all write calls:

$ strace -e trace=write -e inject=write:error=EBADF -owrite-twice.log ./write-twice
$ cat write-twice.log
write(1, "write me 1n", 12)          = -1 EBADF (Bad file descriptor) (INJECTED)
write(3, "write: Bad file descriptorn", 27) = -1 EBADF (Bad file descriptor) (INJECTED)
+++ exited with 1 +++

I wonder what errors return all challenges write, including the call hidden behind perror. It makes sense to return an error only for the first of the calls:

$ strace -e trace=write -e inject=write:error=EBADF:when=1 -owrite-twice.log ./write-twice
write: Bad file descriptor
$ cat write-twice.log
write(1, "write me 1n", 12)          = -1 EBADF (Bad file descriptor) (INJECTED)
write(3, "write: Bad file descriptorn", 27) = 27
+++ exited with 1 +++

Or the second one:

$ strace -e trace=write -e inject=write:error=EBADF:when=2 -owrite-twice.log ./write-twice
write me 1
write: Bad file descriptor
$ cat write-twice.log
write(1, "write me 1n", 12)          = 12
write(1, "write me 2n", 12)          = -1 EBADF (Bad file descriptor) (INJECTED)
write(3, "write: Bad file descriptorn", 27) = 27
+++ exited with 1 +++

The error type is optional:

$ strace -e trace=write -e fault=write:when=1 -owrite-twice.log ./write-twice
$ cat write-twice.log
write(1, "write me 1n", 12)          = -1 ENOSYS (Function not implemented) (INJECTED)
write(3, "write: Function not implementedn", 32) = 32
+++ exited with 1 +++

In combination with other flags, you can "break" access to a particular file. Example:

$ strace -y -P/tmp/test_file.log -e inject=file:error=ENOENT -o write-file.log ./write-file /tmp/test_file.log
open: No such file or directory
$ cat write-file.log
openat(AT_FDCWD, "/tmp/test_file.log", O_WRONLY|O_APPEND) = -1 ENOENT (No such file or directory) (INJECTED)
+++ exited with 1 +++

Besides error injection, can introduce delays when making calls or receiving signals.

Afterword

Utility strace is a simple and reliable tool. But in addition to system calls, other aspects of the operation of programs and the operating system also happen to be debugged. For example, it can track calls to dynamically linked libraries ltrace, look into the operation of the operating system can SystemTap ΠΈ ftrace, and allows you to deeply investigate program performance perf. Nevertheless, it is strace - the first line of defense in case of problems with my own and other people's programs, and I use it at least a couple of times a week.

In short, love Unix, read man 1 strace and feel free to spy on your programs!

Source: habr.com

Add a comment