2014-01-19

How to run custom code before and after main in GCC?

This blog post explains how to register C or C++ code to be run at process startup (i.e. just before *main*) and process exit (e.g. when *main* returns or when *exit*(...) is called). Code to be run at loading and unloading of shared libraries and Linux kernel modules is not covered in this post.

The behavior described in this post has been tested with gcc-4.1 ... gcc-4.8 and clang-3.0 ... clang-3.4. Older versions of GCC and Clang may behave differently.

The behavior described in this post has been tested with (e)glibc-2.7 ... (e)glibc-2.15 and uClibc-0.9.30.1 ... uClibc-0.9.33. Earlier versions and other libc implementations may behave differently. For example, dietlibc-0.33 doesn't execute any of the registered code (so the example below prints just MAIN; MYATEX2; MYATEX1).

The new way (available since gcc-2.7) is declaring a function with __attribute__((constructor)) (this makes it run at process startup) or with __attribute__((destructor)) (this makes it run at process exit). The double underscores around __attribute__ are there to prevent GCC warnings when compiling standard C (or C++) code. Example:

#include <unistd.h>

__attribute__((constructor)) static void myctor(void) {
  (void)!write(2, "HI\n", 3);
}

__attribute__((destructor)) static void mydtor(void) {
  (void)!write(2, "BYE\n", 4);
}

Upon specifying one of these attributes, GCC (or Clang) appends the function pointer to the section .init_array (or sometimes .ctors) or .fini_array (or sometimes .dtors), respectively. (You can take a look at the output of objdump -x prog to see if these sections are present.) The libc initialization and exit code will run all functions in these sections. There is a well-defined order (see below) in which these registered functions get run, but this order is defined within the same translation unit (C or C++ source file) only. The order in which different translation units are processed is undefined.
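As a minimal sketch of the above (assuming GCC or Clang and one of the libc versions listed earlier), the following code counts how many registered startup functions have run. The names startup_calls, count_startup1, count_startup2 and get_startup_calls are hypothetical, chosen only for this illustration.

```c
/* Hypothetical counter, incremented by each startup function below. */
static int startup_calls = 0;

/* Both functions are registered to run before main(). Their relative order
 * is not relied upon here, because it differs between compiler versions.
 */
__attribute__((constructor)) static void count_startup1(void) {
  ++startup_calls;
}

__attribute__((constructor)) static void count_startup2(void) {
  ++startup_calls;
}

/* Returns the number of startup functions which have run so far. */
static int get_startup_calls(void) {
  return startup_calls;
}
```

When get_startup_calls() is called at the very beginning of main, it is expected to return 2, because both functions have already run by then. With a libc which ignores these sections (such as the dietlibc version mentioned above), it would presumably return 0.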

Please note that the process exit functions are not always called: for example, if the process receives a signal which terminates it (either from another process or from itself, e.g. by calling abort()), or if the process calls _exit(...) (with an underscore), then none of the process exit functions are called.

Please note that it's possible to register more process exit functions at runtime, by calling atexit(3) or on_exit(3).
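The two notes above can be checked with a small experiment. The sketch below is an illustration under stated assumptions, not part of the original measurements: it assumes a POSIX system with fork(2) and pipe(2), it uses only atexit(3) (on_exit(3) is not available everywhere), and the names report_fd, handler1, handler2, how_to_end and run_child are hypothetical. A child process registers two exit functions at runtime and then terminates in the requested way, and the parent collects whatever the exit functions wrote.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  /* Write end of the pipe, used by the handlers. */

static void handler1(void) { (void)!write(report_fd, "1", 1); }
static void handler2(void) { (void)!write(report_fd, "2", 1); }

enum how_to_end { END_EXIT, END_UNDERSCORE_EXIT, END_ABORT };

/* Forks a child which registers handler1 and then handler2 with atexit(3)
 * and terminates as requested. Copies the bytes written by the handlers to
 * buf, and returns their count, or -1 on error.
 * Caveat: exit() in the child also flushes stdio buffers inherited from the
 * parent, so call fflush(NULL) first if there is pending output.
 */
static int run_child(enum how_to_end how, char *buf, size_t size) {
  int fds[2];
  pid_t pid;
  ssize_t got;
  size_t total = 0;
  if (pipe(fds) != 0) return -1;
  pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {  /* Child. */
    close(fds[0]);
    report_fd = fds[1];
    atexit(handler1);
    atexit(handler2);
    if (how == END_EXIT) exit(0);              /* Runs the handlers. */
    if (how == END_UNDERSCORE_EXIT) _exit(0);  /* Skips the handlers. */
    abort();                                   /* SIGABRT: skips them too. */
  }
  close(fds[1]);  /* Parent: read until the child closes its end. */
  while (total < size &&
         (got = read(fds[0], buf + total, size - total)) > 0) {
    total += (size_t)got;
  }
  close(fds[0]);
  waitpid(pid, NULL, 0);
  return (int)total;
}
```

With END_EXIT the parent is expected to read the two bytes 21 (functions registered with atexit run in reverse order of registration), and with END_UNDERSCORE_EXIT or END_ABORT it is expected to read nothing.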

C++ static initialization is equivalent to __attribute__((constructor)):

#include <unistd.h>
#include <string>

static int get_answer() {
  (void)!write(1, "GA\n", 3);
  return 42;
}
 
/* The call to get_answer works in C++, but it doesn't work in C, because
 * the initializer of myanswer is not a compile-time constant.
 */
int myanswer = get_answer();
std::string hello("World");  /* Registers both a constructor and destructor. */

There is an older alternative for registering process startup and exit functions: by adding code to the body of the _init function in the .init section and to the body of _fini function in the .fini section. The headers of these functions are defined in crti.o and the footers are defined in crtn.o (both of which are part of the libc, use e.g. objdump -d .../crti.o to disassemble them). GCC itself uses this registration mechanism in crtbegin.o to register __do_global_dtors_aux and in crtend.o to register __do_global_ctors_aux.

It is possible to use this older registration alternative in your C or C++ code, but it's a bit inconvenient. Here are some helper macros which make it easy:

/* Usage: DEFINE_INIT(my_init1) { ... }
 * Defines function my_init1 which will be called at startup, before main().
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_INIT(name) \
    static void name(void); \
    /* If we declared this static, it wouldn't get called. */ \
    __attribute__((section(".trash"))) void __INIT_HELPER__##name(void) { \
      static void (* volatile f)(void) = name; \
      __asm__ __volatile__ (".section .init"); \
      f(); \
      __asm__ __volatile__ (".section .trash"); \
    } \
    static void name(void)

/* Usage: DEFINE_FINI(my_fini1) { ... }
 * Defines function my_fini1 which will be called at process exit.
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_FINI(name) \
    static void name(void); \
    /* If we declared this static, it wouldn't get called. */ \
    __attribute__((section(".trash"))) void __FINI_HELPER__##name(void) { \
      static void (* volatile f)(void) = name; \
      __asm__ __volatile__ (".section .fini"); \
      f(); \
      __asm__ __volatile__ (".section .trash"); \
    } \
    static void name(void)

For your reference, here are the corresponding much simpler macros for __attribute__((constructor)) and __attribute__((destructor)):

/* Usage: DEFINE_CONSTRUCTOR(my_init1) { ... }
 * Defines function my_init1 which will be called at startup, before main().
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_CONSTRUCTOR(name) \
    __attribute__((constructor)) static void name(void)

/* Usage: DEFINE_DESTRUCTOR(my_fini1) { ... }
 * Defines function my_fini1 which will be called at process exit.
 * As a side effect, defines `static void name() { ... }'.
 */
#define DEFINE_DESTRUCTOR(name) \
    __attribute__((destructor)) static void name(void)

It is possible to use the old and the new registration mechanisms at the same time. Here is sample code which uses both, as well as C++ static initialization, atexit and on_exit.

#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#ifdef __cplusplus
class C {
 public:
  C(const char *msg): msg_(msg) {
    (void)!write(1, "+", 1);  (void)!write(1, msg_, strlen(msg_));
  }
  ~C() {
    (void)!write(1, "-", 1);  (void)!write(1, msg_, strlen(msg_));
  }
 private:
  const char *msg_;
};
#endif

DEFINE_INIT(myinit1) { (void)!write(1, "MYINIT1\n", 8); }
DEFINE_CONSTRUCTOR(myctor1) { (void)!write(1, "MYCTOR1\n", 8); }

#ifdef __cplusplus
static int get_answer(const char *msg) {
  (void)!write(1, msg, strlen(msg));
  return 42;
}
C myobj1("MYOBJ1\n");
int myanswer1 = get_answer("ANSWER1\n");
C myobj2("MYOBJ2\n");
int myanswer2 = get_answer("ANSWER2\n");
#endif

DEFINE_INIT(myinit2) { (void)!write(1, "MYINIT2\n", 8); }
DEFINE_CONSTRUCTOR(myctor2) { (void)!write(1, "MYCTOR2\n", 8); }
DEFINE_FINI(myfini1) { (void)!write(1, "MYFINI1\n", 8); }
DEFINE_DESTRUCTOR(mydtor1) { (void)!write(1, "MYDTOR1\n", 8); }
DEFINE_FINI(myfini2) { (void)!write(1, "MYFINI2\n", 8); }
DEFINE_DESTRUCTOR(mydtor2) { (void)!write(1, "MYDTOR2\n", 8); }
static void myatex1() { (void)!write(1, "MYATEX1\n", 8); }
static void myatex2() { (void)!write(1, "MYATEX2\n", 8); }
static void myonexit(int exitcode, void *arg) {
  const char *msg = (const char*)arg;
  (void)exitcode;
  (void)!write(1, msg, strlen(msg));
}

int main(int argc, char **argv) {
  (void)argc; (void)argv;
  atexit(myatex1);
  on_exit(myonexit, (void*)"MYONEX1\n");
  (void)!write(1, "MAIN\n", 5);
  atexit(myatex2);
  on_exit(myonexit, (void*)"MYONEX2\n");
  return 0;
}

It is not intuitive in which order these are run. Here is the output:

MYINIT1
MYINIT2
+MYOBJ1
ANSWER1
+MYOBJ2
ANSWER2
MYCTOR2
MYCTOR1
MAIN
MYONEX2
MYATEX2
MYONEX1
MYATEX1
-MYOBJ2
-MYOBJ1
MYDTOR1
MYDTOR2
MYFINI1
MYFINI2

Please note that gcc-4.3 and below run MYDTOR1 and MYDTOR2 in the opposite order. All other compilers tested (listed at the beginning of this post) use exactly this order. The order depends on the compiler rather than on the libc: newer compiler versions with the same libc produced a different order, while different libc versions with the same compiler version kept the order intact. Please note again that the order of .ctors, .dtors and others is undefined across translation units (C or C++ source files).

2014-01-15

Announcing mplaylist: Audio playlist player using mplayer, with checkpointing

This blog post is the formal announcement of mplaylist, an audio playlist player using mplayer, with checkpointing.

mplaylist is a Python script which can play audio playlists (.m3u files), remembering the current playback position (file and time) even when killed, so it will resume playback at the proper position upon restart. The playback position is saved as an .m3u.pos file next to the .m3u file. mplaylist uses mplayer for playing the audio files.

mplaylist needs Python and a Unix system with mplayer installed. (It may be easy to port to Windows, but it has not been tried.) Download the script directly from here. There is no GUI. You have to start mplaylist from the command line, in a terminal window.

The reason why I wrote mplaylist is that I needed the following features and I couldn't easily find an audio player for Ubuntu which had all of them:

  • It supports multiple playlists.
  • It remembers the playback position (file and time) for each playlist.
  • Preferably, it remembers playback position even when the process is killed.
  • It lets the user adjust playback speed, without changing the pitch.

mplaylist supports all these features. Checkpointing (i.e. remembering the playback position) works even if both the mplayer and mplaylist processes are killed with kill -9 (SIGKILL). If you have a journaling filesystem with block device barriers properly set up, checkpointing also works if you unplug the power cable.

Please note that mplaylist is not only for music files. It works excellently for playing back (series of) audio books and (series of) talks.

2014-01-11

How to prevent YouTube from using HTTPS

This blog post explains how to configure your web browser (Mozilla Firefox or Google Chrome) to prevent YouTube from redirecting from the http:// protocol to https://. The instructions below work whether or not you are logged in to YouTube.

YouTube started doing this in the last couple of months, and some browser extensions do it now as well. Please note that HTTPS gives you more privacy than HTTP (e.g. against governments and internet service providers spying on you), so please think carefully about whether you want to revert to HTTP on YouTube.

Test the protocol: Type youtube.com into your address bar, make sure https:// doesn't show up while typing, and press Enter. Wait for the page to load. If you can't see https:// added to the beginning of the address, and you don't see a lock icon on the left side of the address, then we're done, stop.

If you have the Disconnect browser extension installed, disable it. (You may want to enable or reconfigure it later, after finishing these steps.) If Firefox asks for a browser restart, then restart it. Test the protocol.

If you have the YouTube Center browser extension or the corresponding Greasemonkey script installed, configure it by unticking the Use secure protocol checkbox. Test the protocol.

Remove (delete) all your YouTube cookies. In Chrome, copy-paste chrome://chrome/settings/content to the address bar, press Enter, click on the All cookies and site data... button, search for youtube, make sure that nothing unrelated shows up, and click on the Remove all button. In Firefox, open Edit / Preferences / Privacy / remove individual cookies, search for youtube.com, and click on the Remove all cookies button. Test the protocol.

If you're using Firefox on Linux, remove YouTube from the secure site table. To do it, exit from Firefox, and run the following command in a terminal window (without the leading $):

$ sqlite3.static ~/.mozilla/firefox/*.default/permissions.sqlite "DELETE FROM moz_hosts WHERE type LIKE 'sts%' AND host LIKE '%youtube.com'"

If you get an error message and you don't know how to fix it, or you are using Firefox on non-Linux, you can run the same DELETE FROM ... SQL query (between but without the double quotes above) using the SQLite Manager Firefox extension. Test the protocol.

If it is still redirecting to https://, then take note of which of your browser extensions are enabled, disable all your browser extensions, and restart the browser. Test the protocol. If it's not redirecting anymore, then enable your browser extensions one by one, and figure out which one is the culprit. (There may be multiple ones.) Keep the culprit disabled or change its settings.

If it is still redirecting with all your extensions disabled, then this howto can't help you; try to find a solution on the web, and/or ask a question on webapps.stackexchange.com. Don't forget to reenable your browser extensions.

Some anecdotes: on Firefox, deleting the cookies solved the problem for me, and on Chrome, disabling Disconnect solved the problem for me.

2014-01-10

How to remove almost all files from a Git repository

This blog post explains how to remove all files (including their history) from a Git repository, except for files in a whitelist. This can be useful to split a Git repository into two smaller repositories.

This can lead to data loss, so make sure you have a backup of the repository. Also read the basics about rewriting history and git filter-branch first.

Here is the command which keeps only the files foo and bar/baz (type it without the leading $):

$ (export KEEP="$(echo 'foo'; echo 'bar/baz')";
  NL="$(echo;echo x)"; export NL="${NL%x}"; git filter-branch -f \
  --index-filter 'X="$IFS"; IFS="$NL";
  set -- $(git ls-files | grep -vFx "$KEEP");
  IFS="$X"; test $# -gt 0 &&
  git rm --cached --ignore-unmatch -- "$@"; :' --prune-empty HEAD)

This needs a Bourne-compatible shell, so it won't work out-of-the-box in the Windows command-line, but it will work on most modern Unix systems.

This looks unnecessarily complex, elaborate and bloated, but all the little tricks are necessary to make it work with files with funny characters in their name and with all modern Bourne-compatible shells. (Only file names containing a newline or an apostrophe (') won't work.)

To keep empty commits, omit the --prune-empty flag.

Please note that if the files you are interested in were renamed, then this command doesn't recognize old names of the files: you have to enumerate the old pathnames explicitly to keep them.

To do it the other way round, i.e. to keep all files except foo and bar/baz, do this (here the KEEP variable, despite its name, lists the files to remove):

$ (export KEEP="$(echo 'foo'; echo 'bar/baz')";
  NL="$(echo;echo x)"; export NL="${NL%x}"; git filter-branch -f \
  --index-filter 'X="$IFS"; IFS="$NL"; set -- $KEEP;
  IFS="$X"; test $# -gt 0 &&
  git rm --cached --ignore-unmatch -- "$@"; :' --prune-empty HEAD)

2014-01-05

A short file size comparison of small libc implementations for Linux

This blog post gives a short executable file size comparison when the same statically linked, i386 ELF executable was compiled with various small (tiny) libc implementations for Linux.

TL;DR diet libc produces the smallest executables.

Compiler used: GCC 4.6.3 in Ubuntu Precise.

libc implementations used:

All file sizes are the size of statically linked, Linux i386 ELF, stripped executable, except for source file (where it is the size of a .c source file) and dynamic (where it is the size of a dynamic executable of the same kind).

The source file size reducing compiler flags and tricks in this blog post were used. The programs used dynamic memory allocation (malloc(3), free(3), realloc(3)), system call I/O (e.g. read(2) and write(2)), but none of the printf*(3) functions or stdio.

Compilation results for clang_trampoline.c:

  • source file: 37889 bytes
  • diet libc: 15176 bytes
  • dynamic: 17644 bytes
  • musl: 22420 bytes
  • uClibc: 22580 bytes
  • static: 709120 bytes

Compilation results for xstatic.c:

  • source file: 30410 bytes
  • diet libc: 12316 bytes
  • dynamic: 13516 bytes
  • musl: 18992 bytes
  • uClibc: 19412 bytes
  • static: 705024 bytes

Interesting observation: the diet libc version is smaller than the dynamic version. That's because linking against dynamic shared libraries has its own overhead (e.g. symbol table, PLT) in the executable.

Announcing pts-xstatic: A tool for creating small, statically linked Linux i386 executables with any compiler

This blog post announces pts-xstatic, a convenient wrapper tool for compiling and creating portable, statically linked Linux i386 executables. It works on Linux i386 and Linux x86_64 host systems. It wraps an existing compiler (GCC or Clang) of your choice, and it links against uClibc and the other base libraries included in the pts-xstatic binary release.

See the most recent README for all details.

C compilers supported: gcc-4.1 ... gcc-4.8, clang-3.0 ... clang-3.3. C++ compilers supported: g++ and clang++ corresponding to the supported C compilers. Compatible uClibc C and C++ headers (.h) and precompiled static libraries (e.g. libc.a, libz.a, libstdc++.a) are also provided by pts-xstatic. To minimize system dependencies, pts-xstatic can compile with pts-clang (for both C and C++), which is portable, and you can install it as non-root.

As an alternative to pts-xstatic, if you want a tiny, self-contained (single-file) C compiler for Linux i386, please take a look at pts-tcc. With pts-xstatic, you can create faster and smaller statically linked executables, with the compiler of your choice.

As an alternative to pts-xstatic and uClibc, see diet libc and its diet tool (which is an alternative to the xstatic tool), with which you can create even smaller binaries.

Motivation

  1. Available uClibc GCC toolchain binary releases are very old, e.g. the i686 release contains gcc-4.1.2 compiled on 2009-04-11.
  2. With uClibc Buildroot, the uClibc version is tied to a specific GCC version. It's not possible to compile with your favorite preinstalled C or C++ compiler version, and link against your favorite uClibc version. pts-xstatic makes this possible.
  3. libstdc++ is not easily available for uClibc, and it's a bit cumbersome to compile. pts-xstatic contains a precompiled version.

Minimum installation

If you want to try pts-xstatic quickly, without root access, without installing any dependencies, and without changing any settings, this is the easiest way:

$ cd /tmp
$ rm -f pts-xstatic-latest.sfx.7z
$ wget http://pts.50.hu/files/pts-xstatic/pts-xstatic-latest.sfx.7z
$ chmod +x pts-xstatic-latest.sfx.7z
$ ./pts-xstatic-latest.sfx.7z -y  # Creates the pts-xstatic directory.
$ rm -f pts-clang-latest.sfx.7z
$ wget http://pts.50.hu/files/pts-clang/pts-clang-latest.sfx.7z
$ chmod +x pts-clang-latest.sfx.7z
$ ./pts-clang-latest.sfx.7z -y  # Creates the pts-clang directory.
$ cat >hw.c <<'END'
#include <stdio.h>
int main(void) {
  return !printf("Hello, %s!\n", "World");
}
END
$ pts-xstatic/bin/xstatic pts-clang/bin/clang -s -O2 -W -Wall hw.c && ./a.out
Hello, World!
$ strace -e open ./a.out
Hello, World!
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$ ls -l a.out
-rwxr-xr-x 1 pts pts 16888 Jan  2 23:17 a.out

Compare the file size with statically linking against regular (e)glibc:

$ gcc -static -m32 -o a.big -s -O2 -W -Wall hw.c && ./a.big
Hello, World!
$ strace -e open ./a.big
Hello, World!
$ file a.big
a.big: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.24, BuildID[sha1]=0x37284f286ffeecdb7ac5d77bfa83ade4310df098, stripped
$ ls -l a.big
-rwxr-xr-x 1 pts eng 684748 Jan  2 23:20 a.big

FYI with diet libc, the generated a.out file is only 8668 bytes long.

See full installation instructions in the most recent README.

Does pts-xstatic create portable executables?

pts-xstatic creates portable, statically linked, Linux ELF i386 executables, linked against uClibc. By default, these executables don't need any external file (not even the file specified by argv[0], not even the /proc filesystem) to run. NSS libraries (the code needed for e.g. getpwent(3) (getting info of Unix system users) and gethostbyname(3) (DNS resolution)) are also included. The executables also work on FreeBSD in Linux mode if the operating system field in the ELF header is changed from SYSV to Linux.

As an alternative to pts-xstatic: gcc -static (or clang -static) doesn't provide real portability, because for calls such as getpwent(3) (getting info of Unix system users) and gethostbyname(3) (DNS resolution), glibc loads files such as libnss_compat.so, libnss_dns.so. On the target system those libraries may be incompatible with your binary, so you may get a segfault or unintended behavior. pts-xstatic solves this, because it uses uClibc.

It can be useful to embed locale files, gconv libraries, arbitrary data and configuration files needed by the program. Neither gcc -static, pts-xstatic nor statifier can do it, but Ermine can. Ermine is not free software, but you can get a free-of-charge time-limited trial, and you can ask for a discount for noncommercial use. See all details here, and give it a try!

More info

See the most recent README for full installation instructions, usage details, full feature list etc.

2014-01-02

How to detect integer overflow in C and C++ addition and subtraction

This blog post explains how to detect integer overflow (and underflow) in C and C++ addition and subtraction, and it also gives example code.

Overflow (or underflow, we use these terms interchangeably) occurs when the result of an arithmetic operation cannot be represented as an integer of the same type (and size) as the operands. For unsigned addition, overflow indicates that the result is too large. For unsigned subtraction, overflow indicates that the result is negative. For signed addition and subtraction, overflow indicates that the result is either too small or too large.

When chaining additions, it's useful to compute the sum x + y + c, where c is the carry bit (either 0 or 1) resulting from the previous, less significant addition. Similarly, when chaining subtractions, it's useful to compute the difference x - y - c, where c is the borrow bit (either 0 or 1) resulting from the previous, less significant subtraction.
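As an illustration of chaining additions, here is a minimal sketch of adding two multi-word unsigned numbers, passing the carry bit from each word to the next, more significant one. It assumes numbers stored as arrays of 32-bit words with the least significant word first. The name add_words is hypothetical, and the carry is detected with simple comparisons rather than with the branch-free formulas discussed in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes r = x + y for n-word numbers (least significant word first).
 * Returns the carry bit (0 or 1) out of the most significant word.
 */
static uint32_t add_words(uint32_t *r, const uint32_t *x, const uint32_t *y,
                          size_t n) {
  uint32_t c = 0;  /* Carry bit from the previous, less significant word. */
  size_t i;
  for (i = 0; i < n; ++i) {
    uint32_t s = x[i] + c;  /* Unsigned addition wraps around. */
    c = s < c;              /* 1 iff x[i] + c wrapped around. */
    s += y[i];
    c += s < y[i];          /* 1 iff adding y[i] wrapped around. */
    r[i] = s;               /* At most one of the two wraps can happen. */
  }
  return c;
}
```

For example, adding the two-word numbers {0xFFFFFFFF, 0} and {1, 0} yields {0, 1} with a final carry of 0. Chained subtraction with a borrow bit can be written analogously.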

The freely available Chapter 2 (Basics) of the book Hacker's Delight has a detailed and informative subsection about overflow processing. The formulas presented below are based on formulas in that section. Please read the entire section of the book for a detailed explanation and more formulas (which are useful in other environments).

One simple observation is that signed addition overflows iff the signs of the two operands (x and y) are the same, but they differ from the sign of the sum. Based on similar observations we can devise the following formulas:

  • signed x + y + c overflows iff this is negative: ((x+y+c)^x)&((x+y+c)^y)
  • signed x + y + c overflows iff this is negative: z&(((x^z)+y+c)^~y) after z=(x^~y)&((1<<sizeof(x)*8-1)) (no temporary overflow)
  • signed x - y - c overflows iff this is negative: ((x-y-c)^x)&((x-y-c)^~y)
  • signed x - y - c overflows iff this is negative: z&(((x^z)-y-c)^y) after z=(x^y)&((1<<sizeof(x)*8-1)) (no temporary overflow)
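  • unsigned x + y + c overflows iff this has its most significant bit set (i.e. it is negative when interpreted as signed): (x&y)|((x|y)&~(x+y+c))
  • unsigned x - y - c overflows iff this has its most significant bit set (i.e. it is negative when interpreted as signed): (~x&y)|((~x|y)&(x-y-c))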

Please note that none of the formulas above contain branches, so the CPU pipeline doesn't have to be flushed in order to compute them. To convert the sign bit (i.e. negativity) to a bool (0 or 1), shift it down like this: (int)(((((x+y+c)^x)&((x+y+c)^y))>>(sizeof(x)*8-1))&1).
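The two unsigned formulas and the shift can be put together as follows. This is a minimal sketch for 32-bit operands only, not the generic code linked at the end of this post. The names carry_out and borrow_out are hypothetical.

```c
#include <stdint.h>

/* Returns 1 iff the unsigned addition x + y + c overflows (c is 0 or 1). */
static int carry_out(uint32_t x, uint32_t y, uint32_t c) {
  const uint32_t sum = x + y + c;  /* Wraps around, well-defined. */
  /* The most significant bit of the formula is the carry bit. */
  return (int)(((x & y) | ((x | y) & ~sum)) >> 31);
}

/* Returns 1 iff the unsigned subtraction x - y - c overflows (c is 0 or 1),
 * i.e. iff the mathematical result is negative.
 */
static int borrow_out(uint32_t x, uint32_t y, uint32_t c) {
  const uint32_t diff = x - y - c;  /* Wraps around, well-defined. */
  /* The most significant bit of the formula is the borrow bit. */
  return (int)(((~x & y) | ((~x | y) & diff)) >> 31);
}
```

For example, carry_out(0xFFFFFFFF, 0, 1) is 1 and borrow_out(5, 5, 1) is 1, while carry_out(0x7FFFFFFF, 0x7FFFFFFF, 1) and borrow_out(5, 3, 1) are both 0.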

Please note that in standard C and C++ the result of signed addition and subtraction is undefined (!) if an overflow occurs (unsigned arithmetic wraps around instead). The GCC flags -fwrapv and -fno-strict-overflow disable this undefined behavior. But since our code can't be sure if it's compiled with these flags enabled, we must use an overflow-detection formula in which no temporary overflow occurs. Such formulas are also given above. Another option is casting the operands to the corresponding unsigned type, adding them as unsigned (which happens normally, only the least significant bits are kept, as many as possible), and then casting the result back to signed. To do so, we must add these explicit casts in x+y+c and x-y-c in the signed formulas above. These casts can get tricky if we don't know the type of the operands, because there is no overloaded generic cast in C (which e.g. casts int to unsigned and long long to unsigned long long).
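Here is a minimal sketch of the unsigned cast approach for a fixed operand type (int32_t), which sidesteps the generic cast problem. It is not the generic code linked at the end of this post, and the names add_overflows and sub_overflows are hypothetical. All arithmetic is done on uint32_t, so no undefined behavior occurs, and the result doesn't even have to be cast back to signed, because only its most significant bit is needed.

```c
#include <stdint.h>

/* Returns 1 iff the signed addition x + y + c overflows (c is 0 or 1). */
static int add_overflows(int32_t x, int32_t y, int c) {
  const uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
  const uint32_t sum = ux + uy + (uint32_t)c;  /* Wraps around. */
  /* Overflow iff the sign of the sum differs from the sign of both x and y. */
  return (int)(((sum ^ ux) & (sum ^ uy)) >> 31);
}

/* Returns 1 iff the signed subtraction x - y - c overflows (c is 0 or 1). */
static int sub_overflows(int32_t x, int32_t y, int c) {
  const uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
  const uint32_t diff = ux - uy - (uint32_t)c;  /* Wraps around. */
  /* Overflow iff the sign of the difference differs from the sign of x,
   * but it is the same as the sign of y.
   */
  return (int)(((diff ^ ux) & (diff ^ ~uy)) >> 31);
}
```

For example, add_overflows(INT32_MAX, 0, 1) and sub_overflows(INT32_MIN, 1, 0) are 1, while add_overflows(INT32_MIN, -1, 1) is 0, because the final result fits, even though the intermediate INT32_MIN - 1 would not.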

See the final code on GitHub. It can be included as a .h file in C and C++ code. It works with GCC 4.1 and above and Clang 3.0 and above. It uses the GCC extension __typeof__ (also works in Clang) and it uses function overloading in C++ for the generic unsigned cast. In C, it uses the GCC extension __builtin_choose_expr for this cast. It also uses statement expressions in macro bodies to declare temporary variables, to avoid evaluating the arguments more than once.

Further reading:

  • About the C11 _Generic selections (for implementing overloaded functions and macros) in this blog post.
  • P99, a huge macro library for C99 (C dialects earlier than C11).
  • An article about proper overflow detection in all C and C++ arithmetic operations. Overflow detection is much harder to do correctly than what you think. The article contains many incorrect naïve implementations, and also the correct (complicated) implementations. Read it, it's worth it!