2010-01-31

How to swap two nodes in a doubly linked list

This blog post gives example C code how to swap two elements (nodes, items) of a doubly linked list. The code works for both circular and non-circular lists, even if the two arguments are the same, or if they are adjacent in the list. (It is surprisingly complicated to give a correct and elegant solution.)

#include <stdlib.h>  /* NULL */

typedef int node_data_t;  /* can be any other type as well */

struct node {
  struct node* prev;
  struct node* next;
  node_data_t data;
};

/** If node1 and node2 are non-NULL members of a doubly-linked list
 * (circular or not), swap their data fields.
 */
void swap_data(struct node* node1, struct node* node2) {
  node_data_t temp_data = node1->data;
  node1->data = node2->data;
  node2->data = temp_data;
}

/** If node1 and node2 are non-NULL members of a doubly-linked list
 * (circular or not), swap their data fields. If node1 or node2 is the head
 * of the non-circular list, return the new head.
 */
struct node* swap(struct node* node1, struct node* node2) {
  struct node* temp;
  temp = node1->next;
  node1->next = node2->next;
  node2->next = temp;
  if (node1->next != NULL)
    node1->next->prev = node1;
  if (node2->next != NULL)
    node2->next->prev = node2;
  temp = node1->prev;
  node1->prev = node2->prev;
  node2->prev = temp;
  if (node1->prev != NULL)
    node1->prev->next = node1;
  if (node2->prev == NULL)
    return node2;
  node2->prev->next = node2;
  return node1;
}

The code above is based on the discussion at http://bytes.com/topic/c/answers/219236-double-linked-list-elements-swap

swap_data is simpler, and it's also faster if the data is small (i.e. at most 2 pointers). However, swap_data cannot be used when there are external pointers inside the data.

2010-01-30

How to make Midnight Commander exit to its current directory

This blog post gives instructions how to make the current directory of the shell calling Midnight Commander be mc's current directory upon exiting from mc.

The quick answer is to add alias mc=". /usr/share/mc/bin/mc-wrapper.sh" (or something similar, with a different path) to your shell startup script. If you don't know how to do that, read on.

On Ubuntu (Hardy, Jaunty, Karmic etc.) or Debian (Lenny etc.), run this as root (copy-paste as a whole):

grep -l bash\.bashrc /etc/profile || (echo
    echo 'test "$PS1" && test "$BASH" && . /etc/bash.bashrc') |
    tee -a /etc/profile
echo 'type -p -a mc >/dev/null &&
    alias mc=". /usr/share/mc/bin/mc-wrapper.sh"' | tee -a /etc/bash.bashrc

On the Mac OS/X with mc installed from MacPorts, open a Terminal window, type sudo bash, press Enter, type your password, press Enter, and then run the following (copy-paste as a whole):

grep -l bashrc /etc/profile || (echo
    echo 'test "$PS1" && test "$BASH" && . /etc/bashrc') |
    tee -a /etc/profile
echo 'type -p -a mc >/dev/null &&
    alias mc=". /opt/local/share/mc/bin/mc-wrapper.sh"' | tee -a /etc/bashrc

After doing so, open a new terminal window (or SSH connection), and it should work as expected (try it by running mc /tmp, and exiting by pressing Esc, 0, Enter).

For your information, the underlying solution (in mc-wrapper.sh) makes Midnight Commander print its current directory when it exits, and then the caller shell code does a cd command to that directory.

2010-01-16

How to regenerate the Apache SSL key and certificate on Debian Lenny

This blog post explains how to regerated the Apache 2 SSL server key and certification on Debian Lenny.

The default, self-signed certificate used by Apache 2 is called snakeoil, it's generated based on the hostname (as reported by hostname -f) to files /etc/ssl/certs/ssl-cert-snakeoil.pem and /etc/ssl/private/ssl-cert-snakeoil.key when the ssl-cert package is installed. Here is how to regenerate the key and the certificate in case the hostname is changed:

# hostname -f
# make-ssl-cert generate-default-snakeoil --force-overwrite
# ls -l /etc/ssl/certs/ssl-cert-snakeoil.pem /etc/ssl/private/ssl-cert-snakeoil.key
... (check the last-modified-time)
# /etc/init.d/apache2 restart

2010-01-08

Emulating Stackless Python using greenlet

This blog post documents why and how to emulate Stackless Python using greenlet.

Stackless Python is an extended version of the Python language (and its CPython reference implementation). New features include lightweight coroutines (called tasklets), communication primitives using message passing (called channels), manual and/or automatic coroutine scheduling, not using the C stack Python function calls, and serialization of coroutines (for reloading in another process). Stackless Python could not be implemented as a Python extension module – the core of the CPython compiler and interpreter had to be patched.

greenlet is an extension module to CPython providing coroutines and low-level (explicit) scheduling. The most important advantage of greenlet over Stackless Python is that greenlet could be implemented as a Python extension module, so the whole Python interpreter doesn't have to be recompiled in order to use greenlet. Disadvantages of greenlet include speed (Stackless Python can be 10%, 35% or 900% faster, depending on the workflow); possible memory leaks if coroutines have references to each other; and that the provided functionality is low-level (i.e. only manual coroutine scheduling, no message passing provided).

greenstackless, the Python module I've recently developed, provides most of the (high-level) Stackless Python API using greenlet, so it eliminates the disadvantage of greenlet that it is low-level. See the source code and some tests (the latter with tricky corner cases). Please note that although greenstackless is optimized a bit, it can be much slower than Stackless Python, and it also doesn't fix the memory leaks. Using greenstackless is thus not recommended in production environments; but it can be used as a temporary, drop-in replacement for Stackless Python if replacing the Python interpreter is not feasible.

Some other software that emulates Stackless using greenlet:

  • part of Concurrence: doesn't support stackless.main, tasklet.next, tasklet.prev, tasklet.insert, tasklet.remove, stackless.schedule_remove, doesn't send exceptions properly. (Because of these features missing, it doesn't pass the unit test above.)
  • part of pypy doesn't support stackless.main, tasklet.next, tasklet.prev, doesn't pass the unit test above.

For the other way round (emulating greenlet using Stackless), see greenlet_fix (source code). Although Stackless is a bit faster than greenlet, the emulation in greenlet_tix makes it about about 20% slower than native greenlet.

My original purpose for this emulation was to use gevent with Stackless and see if it becomes faster (than the default greenlet). It turned out to become slower. Then I benchmarked Concurrence (with libevent and Stackless by default), pyevent with Stackless, pyevent with greenlet, gevent (with libevent and greenlet by default), Syncless (with epoll and Stackless by default), eventlet and Tornado (using callbacks), and found out the following:

  • Syncless was the fastest in my nonscientific benchmark, which was suprising since it had a clumsy event loop implementation in pure Python.
  • Wrapping libevent's struct evbuffer in Pyrex for line buffering is slower than manual buffering.
  • Using input buffering (for readline()) is much slower than manual buffering (reading until '\n\r\n' and splitting the input by \n).
  • Providing a proper WSGI environment is much slower than ad-hoc parsing of the first line of the HTTP request.
  • Wrapping libevent's struct evbuffer and the coroutine switching in Pyrex made it about 5% faster than manual buffering.
  • Syncless does too much unnecessary communication (using Stackless channels) between a worker and a main loop. This can be simplified and made faster using stackless.schedule_remove(). So the current Syncless implementation is a dead end, it should be rewritten from scratch.

My conclusion was that in order to get the fastest coroutine-based, non-blocking, line-buffering-capable I/O library for Python, I should wrap libevent (including event registration and I/O buffering) and Stackless and the WSGI server in hand-optimized Pyrex, manually checking the .c file Pyrex generates for inefficiencies. I'll be doing so soon.

2010-01-03

Does string append run in linear time in Python?

In Python 2.x, the string append operation (string1 += string2) runs in average linear time (in the length of string2) iff there is only one reference to string1.

Example (leave exactly one of the lines in the loop body uncommented):

s = 'v' 
h = {'k': 'v'} 
for i in xrange(int(__import__('sys').argv))): 
  s += 'v'  # linear (one reference to s) 
  t = s; s += 'v'  # quadratic (two references to s) 
  h['k'] += 'v'  # quadratic (two intermediate references to h['k']) 
  s = h['k']; h['k'] = ''; s += 'W'; h['k'] = s  # linear 
  s = h.pop('k'); s += 'W'; h['k'] = s  # linear