Frequently Asked Questions (with answers!) about LinuxThreads

--------------------------------------------------

Q1: Which version of the C library should I use with LinuxThreads?

A: Most if not all Linux distributions come with libc version 5,
maintained by H.J. Lu.  For LinuxThreads to work properly, you must use
either libc 5.2.18 or libc 5.4.12 or later.  Avoid 5.3.12 and 5.4.7:
these have problems with the per-thread errno variable.

Unfortunately, many popular Linux distributions (e.g. RedHat 4.1) come
with libc 5.3.12 preinstalled -- the one that does not work with
LinuxThreads.  Fortunately, you can often find pre-packaged binaries
of more recent versions of libc for these distributions.  In the case
of RedHat 4.1, there is an RPM package for libc-5.4.23 in the "contrib"
area of RedHat FTP sites.

--------------------------------------------------

Q2: What about glibc 2, a.k.a. libc 6?

A: It's the next generation libc for Linux, developed by Ulrich
Drepper and other FSF collaborators.  glibc 2 offers much better
support for threads than libc 5.  Indeed, thread support was
planned from the very early stages of glibc 2, whereas it was a
last-minute addition to libc 5.  glibc 2 actually comes with a
specially adapted version of LinuxThreads, which you can drop into the
glibc 2 sources as an add-on package.  Just make sure to get glibc
2.0.1 or later, since glibc 2.0 has a serious problem with threads
(thread creation crashes).

--------------------------------------------------

Q3: So, should I switch to glibc 2, or stay with a recent libc 5?

A: glibc 2 will eventually provide a better environment for serious
thread programming.  But it is currently at the beta-test stage, and
probably not yet ready for the general public.  So, if you feel ready
to beta-test, switch to glibc 2.  Otherwise, stay with libc 5.4.x.

--------------------------------------------------

Q4: Where can I find glibc 2?

A: On prep.ai.mit.edu and its many, many mirrors around the world.
See http://www.gnu.org/order/ftp.html for a list of mirrors.

--------------------------------------------------

Q5: When I compile LinuxThreads, I run into problems in file libc_r/dirent.c:
        libc_r/dirent.c:94: structure has no member named `dd_lock'

A: I haven't actually seen this problem, but several users reported it.
My understanding is that something is wrong in the include files of
your Linux installation (/usr/include/*). Make sure you're using a
supported version of the C library. (See above, "Which version of the
C library should I use with LinuxThreads?").

--------------------------------------------------

Q6: When I compile LinuxThreads, I run into problems with
/usr/include/sched.h: there are several occurrences of `_p' that the C
compiler does not understand.

A: Yes, /usr/include/sched.h that comes with libc 5.3.12 is broken.
Replace it with the sched.h file contained in the LinuxThreads distribution.
But really you should not be using libc 5.3.12 with LinuxThreads!
(See Q1 above.)

--------------------------------------------------

Q7: When I'm running a program that creates N threads, top or ps
display N+2 processes that are running my program. What do all these
processes correspond to?

A: Due to the general "one process per thread" model, there's one
process for the initial thread and N processes for the threads it
created using pthread_create.  That leaves one process unaccounted for.
That extra process corresponds to the "thread manager" thread, a
thread created internally by LinuxThreads to handle thread creation
and thread termination.  This extra thread is asleep most of the time.

--------------------------------------------------

Q8: My program does fdopen() on a file descriptor opened on a pipe.
When I link it with LinuxThreads, fdopen() always returns NULL!

A: You're using one of the buggy versions of libc (5.3.12, 5.4.7, etc.).
See Q1 above.

--------------------------------------------------

Q9: My program crashes the first time it calls pthread_create()!

A: You wouldn't be using glibc 2.0, by any chance?  That's a known bug
with glibc 2.0.  Please upgrade to 2.0.1 or later.

--------------------------------------------------

Q10: I'd like to be informed of future developments on
LinuxThreads. Is there a mailing list for this purpose?

A: I post LinuxThreads-related announcements on the newsgroup
comp.os.linux.announce, and also on the mailing list
linux-threads@magenet.com.  You can subscribe to the latter by writing
majordomo@magenet.com.

--------------------------------------------------

Q11: What are good places for discussing LinuxThreads?

A: For questions about programming with POSIX threads in general, use
the newsgroup comp.programming.threads.  Be sure you read the FAQ
        http://www.serpentine.com/~bos/threads-faq/
before you post.

For Linux-specific questions, use comp.os.linux.development.{apps,kernel}.
c.o.l.d.kernel is especially appropriate for questions about the
interface between the kernel and LinuxThreads.

Very specific LinuxThreads questions, and in particular everything
that looks like a potential bug in LinuxThreads, should be mailed
directly to me (Xavier.Leroy@inria.fr).  Before mailing me, make sure
that your question is not answered in this FAQ.

--------------------------------------------------

Q12: What are good books and other sources of information on POSIX
threads?

A: The FAQ for comp.programming.threads lists some good books:
        http://www.serpentine.com/~bos/threads-faq/

There are also some online tutorials. Follow the links from the
LinuxThreads web page:
        http://pauillac.inria.fr/~xleroy/linuxthreads

--------------------------------------------------

Q13: I'd like to read the POSIX 1003.1c standard. Is it available
online?

A: Unfortunately, no.  POSIX standards are copyrighted by IEEE, and
IEEE does not distribute them freely.  You can buy paper copies from
IEEE, but the price is fairly high ($120 or so). If you disagree with
this policy and you're an IEEE member, be sure to let them know.

On the other hand, you probably don't want to read the standard.  It's
hard to read and targeted at readers who already know threads inside-out.
A good book on POSIX threads provides the same information in a much
more readable form.

If you have access to a machine running Solaris 2.5 or later, its
manual pages for the pthread_* functions are a fairly good rendition
of the POSIX 1003.1c standard.

--------------------------------------------------

Q14: Where is pthread_yield()?  How come LinuxThreads does not
implement it?

A: Because it's not part of the (final) POSIX 1003.1c standard.
Several drafts of the standard contained pthread_yield(), but then the
POSIX guys discovered it was redundant with sched_yield() and dropped it.
So, just use sched_yield() instead.

--------------------------------------------------

Q16: I've found some type errors in <pthread.h>.  For instance, the
second argument to pthread_create() should be a pthread_attr_t,
not a pthread_attr_t *. Also, didn't you forget to declare
pthread_attr_default?

A: No, I didn't.  What you're describing is draft 4 of the POSIX
standard, which is used in DCE threads.  LinuxThreads conforms to the
final standard.  Even though the functions have the same names as in
draft 4 and DCE, their calling conventions are slightly different.  In
particular, attributes are passed by reference, not by value, and
default attributes are denoted by the NULL pointer.  Since draft 4/DCE
will eventually disappear, you'd better port your program to use the
standard interface.

--------------------------------------------------

Q17: I'm porting an application from Solaris and I have to rename all
thread functions from thr_blah to pthread_blah.  This is very
annoying.  Why did you change all the function names?

A: POSIX did it.  The thr_* functions correspond to Solaris threads, a
proprietary thread interface that you'll find only under Solaris.  The
pthread_* functions correspond to POSIX threads, an international
standard available for many, many platforms.  Even Solaris 2.5 and
later support the POSIX threads interface.  So, do yourself a favor
and rewrite your code to use POSIX threads: this way, it will run
unchanged under Linux, Solaris, and quite a lot of other platforms.

--------------------------------------------------

Q18: How can I suspend and resume a thread from another thread?
Solaris has the thr_suspend() and thr_resume() functions to do that;
why don't you?

A: The POSIX standard provides *no* mechanism by which a thread A can
suspend the execution of another thread B, without cooperation from B.
The only way to implement a suspend/restart mechanism is to have B
periodically check a global variable for a suspend request and then
suspend itself on a condition variable, which another thread can
signal later to restart B.

Notice that thr_suspend() is inherently dangerous and prone to race
conditions.  For one thing, there is no control over where the target
thread stops: it can very well be stopped in the middle of a critical
section, while holding mutexes.  Also, there is no guarantee on when
the target thread will actually stop.  For these reasons, you'd be
much better off using mutexes and conditions instead.  The only
situations that really require the ability to suspend a thread are
debuggers and some kinds of garbage collectors.

If you really must suspend a thread in LinuxThreads, you can send it a
SIGSTOP signal with pthread_kill. Send SIGCONT for restarting it.
Beware, this is specific to LinuxThreads and entirely non-portable.
Indeed, a truly conformant POSIX threads implementation will stop all
threads when one thread receives the SIGSTOP signal!

--------------------------------------------------

Q19: Are there C++ wrappers for LinuxThreads?

A: Douglas Schmidt's ACE library contains, among a lot of other
things, C++ wrappers for LinuxThreads and quite a number of other
thread libraries.  Check out http://www.cs.wustl.edu/~schmidt/ACE.html

--------------------------------------------------

Q20: I'm trying to use LinuxThreads from a C++ program, and the compiler
complains about the third argument to pthread_create() !

A: You're probably trying to pass a class member function or some
other C++ construct as the third argument to pthread_create().  Recall
that pthread_create() is a C function, and it must be passed a C
function as its third argument: one that takes a single void *
argument and returns void *.

--------------------------------------------------

Q21: Can I debug LinuxThreads programs using gdb?

A: Essentially, no.  gdb is basically not aware of the threads.  It
will let you debug the main thread, and also inspect the global state,
but you won't have any control over the other threads.  Worse, you
can't put any breakpoint anywhere in the code: if a thread other than
the main thread hits the breakpoint, it will just crash!

For running gdb on the main thread, you need to instruct gdb to ignore
the signals used by LinuxThreads. Just do:

        handle SIGUSR1 nostop pass noprint
        handle SIGUSR2 nostop pass noprint

--------------------------------------------------

Q22: What about attaching to a running thread using the "attach" command
of gdb?

A: For reasons I don't fully understand, this does not work.

--------------------------------------------------

Q23: But I know gdb supports threads on some platforms! Why not on
Linux?

A: You're correct that gdb has some built-in support for threads, in
particular the IRIX "sprocs" model, which is a "one thread = one process"
model fairly close to LinuxThreads.  But gdb under IRIX uses ioctls on
/proc to control debugged processes, while under Linux it uses the
traditional ptrace().  The support for threads is built into the /proc
interface, but some work remains to be done to have it in the ptrace()
interface.  In summary, it should not be impossible to get gdb to work
with LinuxThreads, but it's definitely not trivial.

--------------------------------------------------

Q24: OK, I'll do post-mortem debugging, then.  But gdb cannot read core
files generated by a multithreaded program!  What happens?

A: When one thread dies on a signal, possibly dumping core, all other
threads are sent the same signal.  This is required behavior according
to the POSIX standard.  Unfortunately, this causes all threads to dump
core at the same time in the same file, resulting in a garbled core
file.

--------------------------------------------------

Q25: How can I debug multithreaded programs, then?

A: Assertions and printf() are your best friends.  Try to debug
sequential parts in a single-threaded program first.  Then, put
printf() statements all over the place to get execution traces.
Also, check invariants often with the assert() macro.  In truth,
there is no other effective way (save for a full formal proof of your
program) to track down concurrency bugs.  Debuggers are not really
effective for concurrency problems, because they disrupt program
execution too much.

--------------------------------------------------

Q26: LinuxThreads implements neither pthread_attr_setstacksize() nor
pthread_attr_setstackaddr().  Why?

A: These two functions are part of optional components of the POSIX
standard, meaning that portable applications should test for the
"feature test" macros _POSIX_THREAD_ATTR_STACKSIZE and
_POSIX_THREAD_ATTR_STACKADDR (respectively) before using these
functions.

pthread_attr_setstacksize() lets the programmer specify the maximum
stack size for a thread.  In LinuxThreads, stacks start small (4k) and
grow on demand to a fairly large limit (2M), which cannot be modified
on a per-thread basis for architectural reasons.  Hence there is
really no need to specify any stack size yourself: the system does the
right thing all by itself.  Besides, there is no portable way to
estimate the stack requirements of a thread, so setting the stack size
is pretty useless anyway.

pthread_attr_setstackaddr() is even more questionable: it lets users
specify the stack location for a thread.  Again, LinuxThreads takes
care of that for you.  Why you would ever need to set the stack
address escapes me.

--------------------------------------------------

Q27: LinuxThreads does not support the PTHREAD_SCOPE_PROCESS value of
the "contentionscope" attribute.  Why?

A: With a "one-to-one" model, as in LinuxThreads (one kernel execution
context per thread), there is only one scheduler for all processes and
all threads on the system.  So, there is no way to obtain the behavior of
PTHREAD_SCOPE_PROCESS.

--------------------------------------------------

Q28: LinuxThreads does not implement process-shared mutexes, conditions,
and semaphores. Why?

A: This is another optional component of the POSIX standard.  Portable
applications should test _POSIX_THREAD_PROCESS_SHARED before using
this facility.

The goal of this extension is to allow different processes (with
different address spaces) to synchronize through mutexes, conditions
or semaphores allocated in shared memory (either SVR4 shared memory
segments or mmap()ed files).

The reason why this does not work in LinuxThreads is that mutexes,
conditions, and semaphores are not self-contained: their waiting
queues contain pointers to linked lists of thread descriptors, and
these pointers are meaningful only in one address space.

Matt Messier and I spent a significant amount of time trying to design a
suitable mechanism for sharing waiting queues between processes.  We
came up with several solutions that combined two of the following
three desirable features, but none that combines all three:
        - allows sharing between processes having different UIDs
        - supports cancellation
        - supports pthread_cond_timedwait
We concluded that kernel support is required to share mutexes,
conditions and semaphores between processes.  That's one place where
Linus Torvalds's intuition that "all we need in the kernel is clone()"
fails.

Until suitable kernel support is available, you'd better use
traditional interprocess communications to synchronize different
processes: System V semaphores and message queues, or pipes, or sockets.

--------------------------------------------------

Q29: You say all multithreaded code must be compiled with _REENTRANT defined.
What difference does it make?

A: It affects include files in three ways:

1- The include files define prototypes for the reentrant variants of
some of the standard library functions, e.g. gethostbyname_r() as a
reentrant equivalent to gethostbyname().

2- If _REENTRANT is defined, some <stdio.h> functions are no longer
defined as macros, e.g. getc() and putc(). In a multithreaded program,
stdio functions require additional locking, which the macros don't
perform, so we must call functions instead.

3- More importantly, <errno.h> redefines errno when _REENTRANT is
defined, so that errno refers to the thread-specific errno location
rather than the global errno variable.  This is achieved by the
following #define in <errno.h>:

        #define errno (*(__errno_location()))

which causes each reference to errno to call the __errno_location()
function for obtaining the location where error codes are stored.
libc provides a default definition of __errno_location() that always
returns &errno (the address of the global errno variable). Thus, for
programs not linked with LinuxThreads, defining _REENTRANT makes no
difference w.r.t. errno processing.  But LinuxThreads redefines
__errno_location() to return a location in the thread descriptor
reserved for holding the current value of errno for the calling
thread.  Thus, each thread operates on a different errno location.

--------------------------------------------------

Q30: Why is it so important that each thread has its own errno variable?

A: If all threads were to store error codes in the same, global errno
variable, then the value of errno after a system call or library
function returns would be unpredictable:  between the time a system
call stores its error code in the global errno and your code inspects
errno to see which error occurred, another thread might have stored
another error code in the same errno location. 

--------------------------------------------------

Q31: What happens if I link LinuxThreads with code *not* compiled with
-D_REENTRANT ?

A: Lots of trouble.  If the code uses getc() or putc(), it will
perform I/O without proper interlocking of the stdio buffers; this can
cause lost output, duplicate output, or just crash other stdio
functions.  If the code consults errno, it will get back the wrong
error code.  The following code fragment is a typical example:

        do {
          r = read(fd, buf, n);
          if (r == -1) {
            if (errno == EINTR)   /* an error we can handle */
              continue;
            else {                /* other errors are fatal */
              perror("read failed");
              exit(100);
            }
          }
        } while (...);

Assume this code is not compiled with -D_REENTRANT, and linked with
LinuxThreads.  At run-time, read() is interrupted.  Since the C
library was compiled with -D_REENTRANT, read() stores its error code
in the location pointed to by __errno_location(), which is the
thread-local errno variable.  Then, the code above sees that read()
returns -1 and looks up errno.  Since _REENTRANT is not defined, the
reference to errno accesses the global errno variable, which is most
likely 0.  Hence the code concludes that it cannot handle the error
and stops.

--------------------------------------------------

Q32: My program uses both Xlib and LinuxThreads.  It stops very early
with an "Xlib: unknown 0 error" message.  What does this mean?

A: That's a prime example of the errno problem described above (Q31).
The binaries for Xlib you're using have not been compiled with
-D_REENTRANT.  It happens that Xlib contains a piece of code very much like the
one in question Q31.  So, your Xlib fetches the error code from the
wrong errno location and concludes that an error it cannot handle
occurred.

--------------------------------------------------

Q33: So, what can I do to build a multithreaded X Windows client?

A: You need to recompile the X libraries with multithreading options set.
They contain optional support for multithreading; it's just that all
binary distributions for Linux were built without this support.  See
the file README.Xfree3.2 in the LinuxThreads distribution for patches
and info on how to compile thread-safe X libraries from the Xfree3.2
distribution.  The Xfree3.2 sources are readily available in most
Linux distributions, e.g. as a source RPM for RedHat.  Be warned,
however, that X Windows is a huge system, and recompiling even just
the libraries takes a lot of time and disk space.

--------------------------------------------------

Q34: This is a lot of work. Don't you have precompiled thread-safe X
libraries that you could distribute?

A: No, I don't.  Sorry.  But you could approach the maintainers of
your Linux distribution to see if they would be willing to provide
thread-safe X libraries.

--------------------------------------------------

Q35: Can I use library FOO in a multithreaded program?

A: Most libraries cannot be used "as is" in a multithreaded program.
For one thing, they are not necessarily thread-safe: calling two
functions of the library simultaneously from two threads might not
work, due to internal use of global variables and the like.  Second,
the libraries must have been compiled with -D_REENTRANT to avoid
the errno problems explained in Q31.

--------------------------------------------------

Q36: What if I make sure that only one thread calls functions in these
libraries?

A: This avoids problems with the library not being thread-safe.  But
you're still vulnerable to errno problems.  At the very least, a
recompile of the library with -D_REENTRANT is needed.

--------------------------------------------------

Q37: LinuxThreads uses the signals SIGUSR1 and SIGUSR2, but SVGAlib and
other libraries also use those two signals. How can I use these
libraries with LinuxThreads?

A: There is no general solution to signal conflicts, but you can try
to modify either LinuxThreads or the other library so that it uses
different signals.  For LinuxThreads, the only file you need to change
is internals.h, more specifically the two lines:

        #define PTHREAD_SIG_RESTART SIGUSR1
        #define PTHREAD_SIG_CANCEL SIGUSR2

For instance, the following seems to work:

        #define PTHREAD_SIG_RESTART SIGSTKFLT
        #define PTHREAD_SIG_CANCEL SIGUNUSED

Both SIGSTKFLT and SIGUNUSED appear to be currently unused in the kernel.

--------------------------------------------------

Q38: I looked at the LinuxThreads sources, and I saw quite a lot of
spinlocks and busy-waiting loops to acquire these spinlocks.  Isn't
this a big waste of CPU time?

A: Look more carefully.  Spinlocks are used internally to protect
LinuxThreads's data structures, but these locks are held for very
short periods of time: 10 instructions or so.  The probability that a
thread has to loop busy-waiting on a taken spinlock for more than,
say, 100 cycles is very, very low.  When a thread needs to wait on a
mutex, condition, or semaphore, it actually puts itself on a waiting
queue, then suspends on a signal, consuming no CPU time at all.  The
thread will later be restarted by sending it a signal when the state
of the mutex, condition, or semaphore changes.

--------------------------------------------------

Q39: I don't understand how signal handling works in LinuxThreads.  Can
you explain?

A: Signal handling in the POSIX threads standard is fairly complex,
and, to make things worse, LinuxThreads is not 100% compliant.

The first thing to remember is that signal handlers are shared between
all threads: when a thread calls sigaction(), it sets how the signal
is handled not only for itself, but for all other threads in the
program as well.  On the other hand, signal masks are per-thread: each
thread chooses which signals it blocks independently of others.

According to the POSIX standard, signals are delivered to the process:
the set of all threads as a whole.  The system picks any thread that
does not currently block the signal to execute the associated signal
handler.  In LinuxThreads, each thread is actually an independent
process, with a distinct PID; so, signals are always delivered to one
thread in particular, based on the PID.  This is to be viewed as a
LinuxThreads bug, but I don't see any way to implement the POSIX
behavior without kernel support.

--------------------------------------------------

Q40: How shall I go about mixing signals and threads in my program?

A: The less you mix them, the better.  Notice that the pthread_*
functions are not async-signal safe, meaning that you should not call
them from signal handlers.  This recommendation is not to be taken
lightly: your program can deadlock if you call a pthread_* function
from a signal handler!

The only sensible things you can do from a signal handler are to set a
global flag (of type volatile sig_atomic_t), or to call sem_post() on a semaphore, to record the delivery
of the signal.  The remainder of the program can then either poll the
global flag, or use sem_wait() and sem_trywait() on the semaphore.

Another option is to do nothing in the signal handler, and dedicate
one thread (preferably the initial thread) to wait synchronously for
signals, using sigwait(), and send messages to the other threads
accordingly.

--------------------------------------------------

Q41: When one thread is blocked in sigwait(), other threads no longer
receive the signals sigwait() is waiting for!  What happens?

A: It's a bug in the LinuxThreads implementation of sigwait().
Basically, it installs signal handlers on all signals waited for, in
order to record which signal was received.  Since signal handlers are
shared with the other threads, this temporarily deactivates any signal
handlers you might have previously installed on these signals.  One
day, sigwait() will be implemented in the kernel, along with other
POSIX 1003.1b extensions, and this problem will go away.

