
Standard C++ Library Design Document
------------------------------------

This is an overview of libstdc++-v3, with particular attention
to projects to be done and how they fit into the whole.

The Library
-----------

This paper covers two major areas:

 - Features and policies not mentioned in the standard that
   the quality of the library implementation depends on, including
   extensions and "implementation-defined" features;

 - Plans for required but unimplemented library features and
   optimizations to them.

Overhead
--------

The standard defines a large library, much larger than the standard
C library.  A naive implementation would suffer substantial overhead
in compile time, executable size, and speed, rendering it unusable
in many (particularly embedded) applications.  The alternative demands
care in construction, and some compiler support, but there is no
need for library subsets.

What are the sources of this overhead?  There are three main causes:

 - The library is specified almost entirely as templates, which
   with current compilers must be included in-line, resulting in
   very slow builds as tens or hundreds of thousands of lines
   of function definitions are read for each user source file.
   Indeed, the entire SGI STL, as well as the dos Reis valarray,
   are provided purely as header files, largely for simplicity in
   porting.  Iostream/locale is as large again.

 - The library is very flexible, specifying a multitude of hooks
   where users can insert their own code in place of the default.
   When these hooks are not used, any time and code expended to
   support that flexibility is wasted.  Some of the flexibility
   comes from virtual functions which current linkers tend to add
   to the executable file even when they cannot be called.

 - The library is specified to use a language feature, exceptions,
   which imposes a run time and code space cost to handle the
   possibility of exceptions even when they are not used.

What can be done to eliminate this overhead?  A variety of coding
techniques, and compiler, linker and library improvements and
extensions may be used, as covered below.  Most are not difficult.

Overhead: Compilation Time
--------------------------

Providing "ready-instantiated" template code in object code archives
allows us to avoid generating and optimizing template instantiations
for each file which uses them.  However, the number of such instantiations
that are useful to provide is limited, and anyway this is not enough
to minimize compile time.  In particular, it does not reduce time spent
parsing conforming headers.

Quick header parsing depends on defining extensions or compiler
improvements.   One approach is some variation on the techniques
marketed as "pre-compiled headers".  Until that is implemented we
can put lengthy template definitions in #if guards or alternative
headers so that users can skip over the full definitions when
they need only the ready-instantiated specializations.  (Use of these
techniques need not make their code less portable.)
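
As a minimal sketch of the arrangement described above, collapsed
into one file for brevity: the declarations are what every user
must parse, the definitions sit behind a guard, and one explicit
instantiation supplies the "ready-instantiated" code.  The class
Stack and the macro SKETCH_DECLARATIONS_ONLY are hypothetical names
invented for this illustration, not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// ---- declarations: what every user of the header must parse ----
template <typename T>
class Stack {                 // hypothetical class, for illustration only
public:
    Stack() : n_(0) {}
    void push(const T& v);
    T pop();
    std::size_t size() const { return n_; }
private:
    T data_[16];
    std::size_t n_;
};

// ---- definitions: skipped when the (hypothetical) guard is defined ----
#ifndef SKETCH_DECLARATIONS_ONLY
template <typename T>
void Stack<T>::push(const T& v) { assert(n_ < 16); data_[n_++] = v; }

template <typename T>
T Stack<T>::pop() { assert(n_ > 0); return data_[--n_]; }

// ---- explicit instantiation: would live in one library source file ----
template class Stack<int>;
#endif
```

A user who needs only Stack<int> could define the guard, parse the
declarations alone, and link against the object file that holds the
explicit instantiation.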

The language specifies the semantics of the "export" keyword, but
the Egcs compiler does not yet support it.  When it does, problems
with large template inclusions can largely disappear, given some
library reorganization, along with the need for much of the
apparatus described above.

In the SGI STL, and in some other headers, many templates that
should not be inline are defined as "inline", either explicitly or
by their placement in class definitions.  This creates code bloat.
Fixing it will require an audit of all inline functions defined in
the library to determine which merit inlining, and moving the rest
out of line.  This is an issue mainly in chapters 23, 25, and 27.

Overhead: Flexibility Cost
--------------------------

The library offers many places where users can specify operations
to be performed by the library in place of defaults.  Sometimes
this seems to require that the library use a more-roundabout, and
possibly slower, way to accomplish the default than would be used
otherwise.  The primary protection against this overhead is
thorough compiler optimization.

The second line of defense against this overhead is explicit
specialization.  By defining helper function templates, and writing
specialized code for the default case, overhead can be eliminated for
that case without sacrificing flexibility.  This does place a greater
load on the optimizer.
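
One way this might look, as a sketch only: a helper class template
does a character search through the user's traits class, and an
explicit specialization for the default traits calls memchr
directly.  The names find_helper, find_char and ci_traits are
hypothetical; only std::char_traits and std::memchr are real.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Generic path: honours whatever eq() the user's traits class defines.
template <typename CharT, typename Traits>
struct find_helper {
    static const CharT* find(const CharT* s, std::size_t n, CharT c) {
        for (std::size_t i = 0; i < n; ++i)
            if (Traits::eq(s[i], c)) return s + i;
        return 0;
    }
};

// Default case: plain char with the standard traits can use memchr.
template <>
struct find_helper<char, std::char_traits<char> > {
    static const char* find(const char* s, std::size_t n, char c) {
        return static_cast<const char*>(std::memchr(s, c, n));
    }
};

// Public entry point: the same interface for both cases.
template <typename CharT, typename Traits>
const CharT* find_char(const CharT* s, std::size_t n, CharT c) {
    return find_helper<CharT, Traits>::find(s, n, c);
}

// A user-defined traits class (case-insensitive), to exercise the generic path.
struct ci_traits : std::char_traits<char> {
    static bool eq(char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    }
};
```

The caller sees one interface; which body runs is settled at compile
time by the traits argument.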

The library specifies many virtual functions which current linkers
would load even though they are not called.  A smarter linker can
choose not to link these functions.  A prototype of this work has
already been done.  Some minor improvements to the compiler and to
ld would suffice to eliminate any such overhead.

The main areas in the standard interface where user flexibility
can result in overhead are:

 - Allocators:  Containers are specified to use user-definable
   allocator types and objects, making tuning for the container
   characteristics tricky.

 - Locales: the standard specifies locale objects used to implement
   iostream operations, involving many virtual functions which use
   streambuf iterators.

 - Algorithms and containers: these may be instantiated on any type,
   frequently duplicating code for identical operations.

 - Iostreams and strings: users are permitted to use these on their
   own types, and specify the operations the stream must use on these
   types.

Note that these sources of overhead are _avoidable_.  The techniques
to avoid them are covered below.

Overhead: Expensive Language Features
-------------------------------------

The main "expensive" language feature used in the standard library
is exception support, which requires compiling in cleanup code with
static table data to index it, and linking in library code to use
the table.  For small embedded programs the amount of such library
code and table data may be thought excessive.

To implement a library which does not use exceptions directly is
not difficult given minor compiler support (to "turn off" exceptions
and ignore exception constructs), and results in no great library
maintenance difficulties.  It mainly involves replacing code that 
"throws" with a call to a "handler" function in a separate compilation 
unit that may be replaced by the user.  The main source of exceptions 
that would be difficult to avoid is memory allocation, but users can
define their own memory allocation primitives that do not throw.
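
A rough sketch of the handler idea follows.  The text above
describes a function in a separate compilation unit that the user
replaces at link time; to keep the example in one file, a function
pointer stands in for that mechanism.  Every name in namespace
sketch is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {

typedef void (*error_handler)(const char* what);

// Default behaviour when exceptions are turned off: give up.
inline void abort_handler(const char*) { std::abort(); }

// Stand-in for a replaceable function in its own translation unit.
inline error_handler& handler_slot() {
    static error_handler h = abort_handler;
    return h;
}

// Called where the library would otherwise throw std::out_of_range.
inline void report_out_of_range(const char* what) { handler_slot()(what); }

template <typename T, std::size_t N>
struct fixed_array {
    T elems[N];
    T& at(std::size_t i) {
        if (i >= N) {
            report_out_of_range("fixed_array::at");
            i = 0;  // reached only if the handler returns; keeps the sketch well-defined
        }
        return elems[i];
    }
};

// Example user handler: count the error and return.
inline int& error_count() { static int n = 0; return n; }
inline void counting_handler(const char*) { ++error_count(); }

}  // namespace sketch
```

A handler would normally not return; the fallback to element 0 is
there only so that the example behaves predictably when it does.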

Opportunities
-------------

The template capabilities of C++ offer enormous opportunities for
optimizing common library operations, well beyond what would be
considered "eliminating overhead".  In particular, many operations
done in Glibc with macros that depend on proprietary language
extensions can be implemented in pristine Standard C++.  For example,
the chapter 25 algorithms, and even C library functions such as strchr,
can be specialized for the case of static arrays of known (small) size.
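
For instance, function templates taking a reference to an array
receive its size as a compile-time constant, which the optimizer
can use to unroll the loop.  This is only a sketch; find_in and
str_chr are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <cstddef>

// Search a static array; N is known at compile time.
// Returns a pointer one past the end when the value is absent.
template <typename T, std::size_t N>
const T* find_in(const T (&a)[N], const T& value) {
    for (std::size_t i = 0; i < N; ++i)
        if (a[i] == value) return a + i;
    return a + N;
}

// strchr-like search in a char array of known size.
// Stops at the first NUL, as strchr does, and returns 0 when absent.
template <std::size_t N>
const char* str_chr(const char (&s)[N], char c) {
    for (std::size_t i = 0; i < N; ++i) {
        if (s[i] == c) return s + i;
        if (s[i] == '\0') break;
    }
    return 0;
}
```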

Detailed optimization opportunities are identified below, in the
discussion of the component where each would appear.  Of course new
opportunities will be identified during implementation.

Unimplemented Required Library Features
---------------------------------------

The standard specifies hundreds of components, grouped broadly by
chapter.

  17 general
  18 support
  19 diagnostics
  20 utilities
  21 string
  22 locale
  23 containers
  24 iterators
  25 algorithms
  26 numerics
  27 iostreams
  Annex D  backward compatibility

Anyone participating in implementation of the library should obtain
a copy of the standard.  The emphasis in the following sections is
on unimplemented features and optimization opportunities.

Chapter 17  General
-------------------

Chapter 17 concerns overall library requirements.

The standard doesn't mention threads.  A multi-thread (MT) extension
primarily affects allocator (20), string (21), locale (22), and
iostreams (27).  The common support extension for this is covered
under chapter 20.

The standard requirements on names from the C headers create a
lot of work.  Names in the C headers must be visible in the std::
and sometimes the global namespace; the names in the two scopes
must refer to the same object.  More stringent is that Koenig
lookup implies that any types specified as defined in std::
really are defined in std::.  Names optionally implemented as
macros in C cannot be macros in C++.  A mostly-correct overview
may be read at <http://www.cantrip.org/cheaders.html>.

The components identified as "mostly complete" have not been
audited for conformance.  In many cases where conformance tests
pass we have non-conforming extensions that must be wrapped in
#if guards for "pedantic" use, and in some cases renamed in a
conforming way for continued use in the implementation regardless
of conformance flags.

The STL portion of the library still depends on a header
stl/bits/stl_config.h full of #ifdef clauses.  This apparatus
should be replaced with config/install machinery.

The SGI STL defines a type_traits<> template specialized for many
types in their code, including the numeric and pointer types and
some library types, which aids in writing specializations of other
operations.  Specializations for other, non-STL, types would make
more optimizations possible.
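
The following sketch shows the general shape of such a traits
template and how a specialization for a non-STL type (here a
hypothetical Point) opens up a faster path.  The names
sketch_traits, has_trivial_copy, true_tag, false_tag and
copy_elements are invented for the example and are not the SGI
names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct true_tag {};
struct false_tag {};

// Conservative default: assume nothing about T.
template <typename T>
struct sketch_traits { typedef false_tag has_trivial_copy; };

// Specializations for types known to be copyable byte-wise.
template <> struct sketch_traits<int> { typedef true_tag has_trivial_copy; };
template <typename T> struct sketch_traits<T*> { typedef true_tag has_trivial_copy; };

// A user (non-STL) type opts in with its own specialization.
struct Point { int x, y; };
template <> struct sketch_traits<Point> { typedef true_tag has_trivial_copy; };

// Fast path: one block move.
template <typename T>
void copy_impl(const T* src, std::size_t n, T* dst, true_tag) {
    if (n) std::memmove(dst, src, n * sizeof(T));
}

// General path: element-by-element assignment.
template <typename T>
void copy_impl(const T* src, std::size_t n, T* dst, false_tag) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Dispatch on the trait at compile time.
template <typename T>
void copy_elements(const T* src, std::size_t n, T* dst) {
    copy_impl(src, n, dst, typename sketch_traits<T>::has_trivial_copy());
}
```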

Chapter 18  Language support
----------------------------

Headers: <limits> <new> <typeinfo> <exception>
C headers: <cstddef> <climits> <cfloat>  <cstdarg> <csetjmp>
           <ctime>   <csignal> <cstdlib> (also 21, 25, 26)

This defines the built-in exceptions, rtti, numeric_limits<>,
operator new and delete.  Much of this is provided by the
compiler in its static runtime library.

Work to do includes defining numeric_limits<> specializations
in separate files for all target architectures.  This is largely
dog work except for those members whose values are not easily
deduced from available documentation.  Also, this involves some
work in target configuration to identify the correct choice of
file to build against and to install.

<cstddef> and various other headers define the macro NULL differently
than C does: in C++ it must be an integral null pointer constant
such as 0, not ((void*)0).

Chapter 19  Diagnostics
-----------------------

Headers: <stdexcept>
C headers: <cassert> <cerrno>

This defines the standard exception objects, which are "mostly
complete".  Cygnus has a version, and now SGI provides a slightly
different one.  It makes little difference which we use.

The C global name "errno", which C allows to be a variable
or a macro, is required in C++ to be a macro.  For MT it must
typically result in a function call.
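
A sketch of how such a macro might be arranged, assuming a compiler
with C++11 thread_local storage.  SKETCH_ERRNO and errno_location
are hypothetical names chosen to avoid clashing with the real errno.

```cpp
#include <cassert>

namespace sketch {

// Each thread sees its own value.
inline int* errno_location() {
    static thread_local int value = 0;
    return &value;
}

}  // namespace sketch

// The macro expands to an lvalue obtained through a function call.
#define SKETCH_ERRNO (*sketch::errno_location())
```

Because the macro expands to an lvalue, existing code that assigns
to it or takes its address continues to work.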

Chapter 20  Utilities
---------------------
Headers: <utility> <functional> <memory>
C header: <ctime> (also in 18)

SGI STL provides "mostly complete" versions of all the components
defined in this chapter.

MT affects the allocator implementation, and there are bound to
be configuration/installation choices for different users' MT
requirements.  Anyway, users will want to tune allocator options
to support different target conditions, MT or no.

The primitives used for MT implementation should be provided,
as an extension, for users' own work.

There is usually plenty of room for improvement to operators new
and delete, which need not depend on malloc.

Chapter 21  String
------------------
Headers: <string>
C headers: <cctype> <cwctype> <cstring> <cwchar> (also in 27)
           <cstdlib> (also in 18, 25, 26)

We have "mostly-complete" char_traits<> implementations.  Many of the
char_traits<char> operations might be optimized further using existing
proprietary language extensions.

We have a "mostly-complete" basic_string<> implementation.  The work
to manually instantiate char and wchar_t specializations in object
files to improve link-time behavior is incomplete, and requires some
makefile-hackery.  (Similar work is needed for chapters 22 and 27.)

The standard C type mbstate_t from <cwchar> and used in char_traits<>
must be different in C++ than in C, because in C++ the default constructor
value mbstate_t() must be the "base" or "ground" sequence state.
(Depending on resolution of a recently raised issue on the reflector,
this may become unnecessary.)

There remain some basic_string template-member functions which do not
overload properly with their non-template brethren.  The infamous hack
akin to what was done in vector<> is needed to conform to 23.1.1 para 10.

Some of the functions in <cstdlib> are different from the C version.
In particular, bsearch and qsort are overloaded.  Similarly, in
<cstring>, strchr is overloaded.  The functions isupper etc.
in <cctype> typically implemented as macros in C must become
functions immediately, because they are overloaded with others
of the same name defined in <locale>.

Replacing the string iterators, which currently are simple
character pointers, with class objects would greatly increase
the safety of the client interface, and also permit a "debug"
mode in which range, ownership, and validity are rigorously
checked.  The current use of raw pointers as string iterators
is evil.
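
A minimal sketch of what a class-type iterator with a "debug" mode
might look like.  The name checked_iterator is hypothetical.  It
checks range and ownership only; detecting iterators invalidated by
reallocation would need cooperation from the container and is not
shown.

```cpp
#include <cassert>

template <typename T>
class checked_iterator {       // hypothetical name
public:
    checked_iterator(T* cur, T* first, T* last)
        : cur_(cur), first_(first), last_(last) {}

    T& operator*() const {
        assert(first_ <= cur_ && cur_ < last_);   // range check
        return *cur_;
    }
    checked_iterator& operator++() {
        assert(cur_ < last_);                     // may not step past the end
        ++cur_;
        return *this;
    }
    friend bool operator==(const checked_iterator& a,
                           const checked_iterator& b) {
        // ownership check: both must refer to the same sequence
        assert(a.first_ == b.first_ && a.last_ == b.last_);
        return a.cur_ == b.cur_;
    }
    friend bool operator!=(const checked_iterator& a,
                           const checked_iterator& b) {
        return !(a == b);
    }
private:
    T* cur_;
    T* first_;
    T* last_;
};
```

Compiling with NDEBUG defined removes the checks, leaving a thin
wrapper around the pointer.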

Chapter 22  Locale
------------------
Headers: <locale>
C headers: <clocale>

We have a "mostly complete" class locale, with the exception of
code for constructing named locales.  The way that locales are
named (particularly when categories (e.g. LC_TIME, LC_COLLATE)
are different) varies among target environments.  This code
must be written in various versions and chosen by configuration
parameters.

Members of the facets defined in <locale> are currently stubs,
with a few exceptions.  Generally, there are two sets of facets:
the base class facets (which are supposed to implement the "C"
locale) and the "byname" facets, which are supposed to read files
to determine their behavior.  The base ctype<> and collate<>
facets are "mostly complete", except that the table of bitmask
values used for "is" operations and corresponding mask values
are still defined in libio and just included/linked.  The 
num_put<>::put members for integer types are "mostly complete".

The list of locale base class facet members and byname facet
members to be implemented is lengthy, and best identified by
looking in bits/locfacets.h and bits/locfacets.tcc.

Some of the facets are more important than others.  Specifically,
the members of ctype<>, numpunct<>, num_put<>, and num_get<> facets
are used by other library facilities defined in <string>, <istream>,
and <ostream>, and the codecvt<> facet is used by basic_filebuf<>
in <fstream>, so a conforming iostream implementation depends on
these.

The "long long" type needs to be supported, but code mentioning
it should be wrapped in #if guards to allow pedantic mode compiling.

Performance of num_put<> and num_get<> depends critically on
caching computed values in ios_base objects, and on extensions
to the interface with streambufs.

Specifically: retrieving a copy of the locale object, extracting
the needed facets, and gathering data from them, for each call to
(e.g.) operator<< would be prohibitively slow.   To cache format
data for use by num_put<> and num_get<> we have a _Format_cache<>
object stored in the ios_base::pword() array.  This is constructed
and initialized lazily, and is organized purely for utility.  It
is discarded when a new locale with different facets is imbued.
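
The caching scheme can be sketched with the standard ios_base
members xalloc(), pword(), iword() and register_callback().  The
struct format_cache is a much-reduced, hypothetical stand-in for
_Format_cache<>, and comma_punct is a made-up facet used only to
show the cache being rebuilt.  For simplicity this version discards
the cache on every imbue, without comparing facets.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>

namespace sketch {

struct format_cache {          // hypothetical stand-in for _Format_cache<>
    char decimal_point;
    char thousands_sep;
};

// One pword()/iword() slot index, shared by all streams.
inline int cache_index() {
    static const int idx = std::ios_base::xalloc();
    return idx;
}

// Discard the cache when a locale is imbued or the stream goes away.
inline void cache_event(std::ios_base::event ev, std::ios_base& ios, int idx) {
    void*& slot = ios.pword(idx);
    if (ev == std::ios_base::imbue_event || ev == std::ios_base::erase_event) {
        delete static_cast<format_cache*>(slot);
        slot = 0;
    } else if (ev == std::ios_base::copyfmt_event) {
        slot = 0;   // the copied pointer belongs to the other stream
    }
}

// Build the cache lazily on first use.
inline const format_cache& get_cache(std::ios_base& ios) {
    const int idx = cache_index();
    void* p = ios.pword(idx);
    if (!p) {
        if (ios.iword(idx) == 0) {          // register the callback only once
            ios.register_callback(cache_event, idx);
            ios.iword(idx) = 1;
        }
        std::locale loc = ios.getloc();
        const std::numpunct<char>& np = std::use_facet<std::numpunct<char> >(loc);
        format_cache* c = new format_cache;
        c->decimal_point = np.decimal_point();
        c->thousands_sep = np.thousands_sep();
        ios.pword(idx) = c;
        p = c;
    }
    return *static_cast<format_cache*>(p);
}

// Made-up facet, used only to demonstrate invalidation on imbue.
struct comma_punct : std::numpunct<char> {
protected:
    char do_decimal_point() const { return ','; }
};

}  // namespace sketch
```

Repeated calls on the same stream return the same cached object
until a new locale is imbued, at which point the next call rebuilds
it.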

Using only the public interfaces of the iterator arguments to the
facet functions would limit performance by forbidding "vector-style"
character operations.  The streambuf iterator optimizations are
described under chapter 24, but the facets can also bypass the
streambuf iterators via explicit specializations and operate directly
on the streambufs, and use extended interfaces to get direct access to
the streambuf internal buffer arrays.  These extensions are described
under chapter 27.

Unused virtual members of locale facets can be omitted, as mentioned
above, by a smart linker.

Chapter 23  Containers
----------------------
Headers: <deque> <list> <queue> <stack> <vector> <map> <set> <bitset>

All the components in chapter 23 are implemented in the SGI STL.
They are "mostly complete"; they include a large number of
nonconforming extensions which must be wrapped.  Some of these
are used internally and must be renamed or duplicated.

The SGI components are optimized for large-memory environments.  For
embedded targets, different criteria might be more appropriate.  Users
will want to be able to tune this behavior.

A lot more work is needed on factoring out common code from different
specializations to reduce code size here and in chapter 25.  The
easiest fix for this is a (perhaps somewhat tricky) compiler/ABI
improvement that allows the compiler to recognize when a specialization
depends only on the size (or other gross quality) of a template
argument, and allow the linker to share the code with similar
specializations.  In its absence, many of the algorithms and
containers can be partial-specialized at least for the case of pointers.
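
The pointer case can be sketched as follows: a full specialization
for void* holds the one real implementation, and a partial
specialization for T* forwards to it with casts.  The container
small_stack is a hypothetical toy; as written it handles pointers
to non-const objects only.

```cpp
#include <cassert>
#include <cstddef>

// Primary template: instantiated separately for every non-pointer T.
template <typename T>
class small_stack {
public:
    small_stack() : n_(0) {}
    void push(const T& v) { assert(n_ < 8); items_[n_++] = v; }
    T pop() { assert(n_ > 0); return items_[--n_]; }
    std::size_t size() const { return n_; }
private:
    T items_[8];
    std::size_t n_;
};

// Full specialization: the single copy of code shared by all pointer types.
template <>
class small_stack<void*> {
public:
    small_stack() : n_(0) {}
    void push(void* p) { assert(n_ < 8); items_[n_++] = p; }
    void* pop() { assert(n_ > 0); return items_[--n_]; }
    std::size_t size() const { return n_; }
private:
    void* items_[8];
    std::size_t n_;
};

// Partial specialization: a thin, inlinable, type-safe wrapper.
template <typename T>
class small_stack<T*> : private small_stack<void*> {
    typedef small_stack<void*> base;
public:
    void push(T* p) { base::push(p); }
    T* pop() { return static_cast<T*>(base::pop()); }
    using base::size;
};
```

With this arrangement small_stack<int*> and small_stack<double*>
share the object code of small_stack<void*>.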

As an optimization, containers can specialize on the default allocator
and bypass it, or take advantage of details of its implementation.

Replacing the vector iterators, which currently are simple
element pointers, with class objects would greatly increase
the safety of the client interface, and also permit a "debug"
mode in which range, ownership, and validity are rigorously
checked.  The current use of pointers for iterators is evil.

As mentioned below, the deque iterator is a good example of a
"staged" iterator type, and would benefit from specializations
of some algorithms.

Chapter 24  Iterators
---------------------
Headers: <iterator>

Standard iterators are "mostly complete", with the exception of
the stream iterators, which are not templatized.

The streambuf iterators (currently located in stl/bits/std_iterator.h,
but should be under bits/) can be rewritten to take advantage of
friendship with the streambuf implementation.

Many of the algorithms might be specialized for the streambuf
iterators, to take advantage of block-mode operations.

Matt Austern has identified opportunities where certain iterator
types, particularly including streambuf iterators and deque
iterators, have a "two-stage" quality, such that an intermediate
limit can be checked much more quickly than the true limit on
range operations.  If identified with a member of iterator_traits,
algorithms may be specialized for this case.
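
To illustrate the "two-stage" idea, here is a sketch using a toy
blocked sequence.  The generic algorithm pays for a block-boundary
test on every increment; the special version walks each block with
raw pointers and checks the expensive limit once per block.  All
names are hypothetical, blocks are assumed non-empty, and for
brevity the special case is chosen by overloading rather than
through a member of iterator_traits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

typedef std::vector<std::vector<int> > block_map;  // each inner vector is one block

// Flat iterator: every ++ must test for the end of the current block.
// The end position is { map, map->size(), 0 }.
struct blocked_iterator {
    const block_map* map;
    std::size_t block;
    std::size_t offset;

    int operator*() const { return (*map)[block][offset]; }
    blocked_iterator& operator++() {
        if (++offset == (*map)[block].size()) { ++block; offset = 0; }
        return *this;
    }
    bool operator!=(const blocked_iterator& o) const {
        return block != o.block || offset != o.offset;
    }
};

// Generic algorithm: one element at a time.
template <typename It>
std::size_t count_value(It first, It last, int v) {
    std::size_t n = 0;
    for (; first != last; ++first)
        if (*first == v) ++n;
    return n;
}

// Two-stage version: the inner loop uses the cheap intermediate limit.
inline std::size_t count_value(blocked_iterator first,
                               blocked_iterator last, int v) {
    std::size_t n = 0;
    while (first.block != last.block) {            // whole blocks
        const std::vector<int>& b = (*first.map)[first.block];
        const int* p = &b[0] + first.offset;
        const int* e = &b[0] + b.size();
        for (; p != e; ++p)
            if (*p == v) ++n;
        ++first.block;
        first.offset = 0;
    }
    if (first.offset != last.offset) {             // final partial block
        const std::vector<int>& b = (*first.map)[first.block];
        const int* p = &b[0] + first.offset;
        const int* e = &b[0] + last.offset;
        for (; p != e; ++p)
            if (*p == v) ++n;
    }
    return n;
}

}  // namespace sketch
```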

Chapter 25  Algorithms
----------------------
Headers: <algorithm>
C headers: <cstdlib> (also in 18, 21, 26)

The algorithms are "mostly complete".  As mentioned above, they
are optimized for speed at the expense of code and data size.

Specializations of many of the algorithms for non-STL types would
give performance improvements, but we must use great care not to
interfere with fragile template overloading semantics for the
standard interfaces.  Particularly appealing opportunities are 
for copy and find applied to streambuf iterators.
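
The difference between the two approaches to copy can be sketched
using only the public streambuf interface.  The function names are
hypothetical; a real specialization would be selected inside
std::copy for streambuf iterator arguments, which is not attempted
here.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <streambuf>

namespace sketch {

// Per-character path: what a generic copy over streambuf iterators does.
inline std::size_t copy_per_char(std::streambuf& in, std::streambuf& out) {
    std::size_t n = 0;
    std::istreambuf_iterator<char> first(&in), last;
    std::ostreambuf_iterator<char> dest(&out);
    for (; first != last; ++first, ++n)
        *dest++ = *first;
    return n;
}

// Block path: move many characters per call with sgetn() and sputn().
inline std::size_t copy_blocks(std::streambuf& in, std::streambuf& out) {
    char buf[256];
    std::size_t total = 0;
    for (;;) {
        const std::streamsize got = in.sgetn(buf, sizeof buf);
        if (got <= 0) break;
        const std::streamsize put = out.sputn(buf, got);
        total += static_cast<std::size_t>(put);
        if (put != got) break;          // output failed part-way
    }
    return total;
}

}  // namespace sketch
```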

Chapter 26  Numerics
--------------------
Headers: <complex> <valarray> <numeric>
C headers: <cmath>, <cstdlib> (also 18, 21, 25)

The numeric components, Gabriel dos Reis's valarray and Drepper's
complex, are "mostly done".  Of course optimization opportunities
abound for the numerically literate.

Chapter 27  Iostreams
---------------------
Headers: <iosfwd> <streambuf> <ios> <ostream> <istream> <iostream>
         <iomanip> <sstream> <fstream>
C headers: <cstdio> <cwchar> (also in 21)

Iostream is currently in a very incomplete state.  <iosfwd>, <iomanip>,
ios_base, and basic_ios<> are "mostly complete".  basic_streambuf<>
is well along, but basic_istream<>, basic_ostream<>, basic_iostream<>,
the standard stream objects, <sstream> and <fstream> have not been
templatized yet.

The istream and ostream operators << and >> have not been changed
to use locale primitives, sentry objects, or char_traits members.

All these templates should be manually instantiated for char and
wchar_t.

The basic_filebuf<> template is a complex beast.  It is specified to
use the locale facet codecvt<> to translate characters between native
files and the locale character encoding.  In general this involves
two buffers, one of "char" representing the file and another of
"char_type", for the stream, with codecvt<> translating.  The process
is complicated by the variable-length nature of the translation, and
the need to seek to corresponding places in the two representations.
For the case of basic_filebuf<char>, when no translation is needed,
a single buffer suffices.  A specialized filebuf can be used to reduce
code space overhead when no locale has been imbued.

Streambuf is fertile ground for optimization extensions.  An extended
interface giving iterator access to its internal buffer would be very
useful for other library components.  A "skipn" member would allow
characters examined by that interface to be skipped over.

Iostream operations (primarily operators << and >>) can take advantage
of the case where user code has not specified an alternate locale, and
bypass locale operations entirely.
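
A sketch of such a bypass for one simple case, unsigned decimal
output with no field width: when the stream's locale is the classic
"C" locale the digits are generated directly and handed to the
streambuf, and otherwise the ordinary operator<< is used.  The
function put_unsigned is a hypothetical name, and the test for
"plain" formatting is deliberately conservative.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <ostream>
#include <sstream>

namespace sketch {

inline std::ostream& put_unsigned(std::ostream& os, unsigned long v) {
    const std::ios_base::fmtflags f = os.flags();
    const bool plain =
        (f & std::ios_base::basefield) == std::ios_base::dec &&
        !(f & std::ios_base::showpos) &&
        os.width() == 0 &&
        os.getloc() == std::locale::classic();
    if (!plain)
        return os << v;                 // general path through num_put<>

    std::ostream::sentry ok(os);
    if (ok) {
        char buf[3 * sizeof v];         // ample room for decimal digits
        char* const end = buf + sizeof buf;
        char* p = end;
        do {
            *--p = static_cast<char>('0' + v % 10);
            v /= 10;
        } while (v != 0);
        if (os.rdbuf()->sputn(p, end - p) != end - p)
            os.setstate(std::ios_base::badbit);
    }
    return os;
}

}  // namespace sketch
```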

The definition of the relationship between the standard streams
cout et al. and stdout et al. requires something like a "stdiobuf".

Annex D
-------
Headers: <strstream>

Annex D defines many non-library features, many minor
modifications to various headers, and one complete header.
It is "mostly done", though the libstdc++-2 <strstream>
header has not been checked to verify that it matches the
draft in those details that were clarified by the committee.

We still need to wrap all the deprecated features in #if guards
so that pedantic compile modes can detect their use.

Nonstandard Extensions
----------------------
Headers: <iostream.h> <hash> <rbtree> <pthread_alloc> <stdiobuf>

User code has come to depend on a variety of nonstandard components
that we must not omit.  Much of this code can be adopted from
libstdc++-v2 or from the SGI STL.

