"The (B)Leading Edge"
STL Gotcha's

by

Jack Reeves
© The C++ Report

Before I go into my topic for this column, I want to backtrack briefly with a few more comments about my previous column on Debugging with Exceptions[1]. After writing it, but before it was printed, I received an e-mail question from Steve Creek. Steve had observed that he often had a lot of common code in his exception handlers and wanted to know what I thought of the idea of separating the common code from the exception specific code with a technique like the following:

try {
    // exception prone code here
}
catch (...) {
    // common clean-up code here
    try {
        throw;    // rethrow the exception here
    }
    catch (ExceptA& ex_a) {
        // deal with exception type A here
    }
    catch (ExceptB& ex_b) {
        // deal with exception type B here
    }
    catch (...) {
        // deal with anything else here
        throw;
    }
}

This code catches the exception in a catch-all clause, performs the common clean-up, then rethrows the exception so that it can be caught by a specific handler. As I told Steve, I am firmly in favor of reducing the writing and maintaining of redundant code, so I did not see anything wrong with the idea. In fact, I used this same technique in my column on exception specifications[2] to allow a user defined unexpected_handler() to obtain some information about the exception that had violated the exception specification. I did suggest that in my own coding I would prefer turning things inside out as in the following:

try {
    try {
        // exception prone code here
    }
    catch (...) {
        // common clean-up code here
        throw;    // rethrow the exception
    }
}
catch (ExceptA& ex_a) {
    // deal with exception type A here
}
catch (ExceptB& ex_b) {
    // deal with exception type B here
}

However, I noted that this seemed to be mostly a matter of style, and that I could not see any great advantage of one style over the other. In both cases, the common clean-up code is in the catch-all handler, and then the exception is rethrown to find a specific handler for the exception's type.

Discussing this with Steve made me realize that one of my examples in the previous column[1] was more restricted than it needed to be. In that column, I showed a central error handling function which can invoke an optional user defined error handler:

void error(exception& ex)
{
    if (error_handler) {
        (*error_handler)(ex);
    }
    throw;
}

The central error handler is called from a catch clause within a helper function that contains a code fragment like this:

try {
    throw logic_error(what);
}
catch (exception& ex) {
    error(ex);
}

As I explained then, you have to throw the exception before you call the central error function so that an exception of the correct type will already exist should the error handling function need to rethrow it. However, since the exception object already exists, there is no real need to pass it as a parameter to the central error handling function or to the user defined error handler -- if either function needs to find out the real type of the exception it can just rethrow the exception within an internal try-block. The error() function can even be eliminated, and the helper function(s) simplified to:

try {
    throw logic_error(what);
}
catch (...) {
    if (error_handler)
        (*error_handler)();
    throw;
}

In essence, this technique uses the exception mechanism as a substitute for run-time type identification. I missed this idea, even though I noted that the user-supplied error handler function could rethrow the exception in order to "handle" it.

After a couple of days of thought, I concluded that I would probably stay with my original style, which passes the exception object (by reference) as a parameter to the central error handling function. This way, the type of the exception can be obtained directly via RTTI if desired, without rethrowing the exception. The latter course of action remains available, if needed.
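To make the parameterless scheme concrete, here is a sketch of what a user-defined error handler might look like under it (ExceptA and ExceptB are stand-ins for whatever exception types the application cares about):

// Sketch of a user-defined handler for the parameterless scheme.
// It rethrows the exception currently in flight inside a local
// try-block in order to discover its type. ExceptA and ExceptB
// are hypothetical application exception types.
void my_error_handler()
{
    try {
        throw;                    // rethrow the current exception
    }
    catch (ExceptA& ex_a) {
        // log or otherwise react to exception type A here
    }
    catch (ExceptB& ex_b) {
        // log or otherwise react to exception type B here
    }
    catch (...) {
        // an unknown type -- nothing more can be learned
    }
    // on return, the calling catch clause rethrows the original
    // exception with its plain throw; statement
}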

Then along comes Steve with another observation: it seems that in testing his ideas using Microsoft's Visual C++ v4.2 compiler, he discovered that it did not work quite like either of us expected. Further, we discovered that we each had different expectations of how it should work based upon our reading of the Draft Working Paper[3]. Consider this code:

 1: try {
 2:     throw runtime_error("some error");
 3: }
 4: catch (...) {
 5:     try {
 6:         throw;
 7:     }
 8:     catch (runtime_error& ex) {
 9:         // empty
10:     }
11:     throw;
12: }

The question is: what behavior would you expect at the final throw statement (line 11)? It seems that the MSVC++ compiler destroys the exception object when the inner catch clause exits at line 10, so the throw; statement at line 11 does not have an exception available to rethrow. If the throw; statement at line 11 is removed, MSVC++ attempts to destroy the exception object a second time at the end of the outer catch clause.

Symantec's SC++ v7.21 exhibits similar behavior. It also destroys the exception object upon completion of the inner catch clause, but it seems perfectly happy to rethrow the no-longer-existing object on line 11. Like MSVC++, if the throw; statement at line 11 is removed, SC++ tries to destroy the exception object again at the end of the outer catch clause.

This is in contrast to the two Unix C++ compilers I tried this on. Both the HP-UX compiler (a Cfront-based compiler) and the SunSoft SPARCworks compiler behave the same way: they do not destroy the exception object at the conclusion of the inner catch clause. The rethrow on line 11 does rethrow the exception. Without the throw; statement on line 11, the exception object is destroyed when the outer catch clause exits at line 12.

Apparently, Steve was expecting the Unix behavior where the exception object would exist (and could be rethrown) throughout the scope of the outer handler. On the other hand, I was expecting something pretty close to what Microsoft implemented where the exception would cease to exist whenever any handler exited without rethrowing it.

In the grand scheme of things, this is not a big problem for programmers. I have already presented the obvious work-around -- turn the try-blocks inside out and rethrow the exception up the stack rather than down it. I am pretty sure I would never have discovered this quirk on my own. I have gone into some detail about it as an example of the kind of details that standards committees (and compiler writers) have to struggle with. Obviously, there are at least two different interpretations of this part of the DWP (section 15.1). Since they lead to fundamentally different behavior from what appears to be a well-formed program, this is a problem.

Whatever your personal peeve with the Draft C++ Standard might be (and everyone I have met has at least one), the creation of an ANSI/ISO Standard for the C++ Programming Language is a monumental achievement that has taken a tremendous amount of work by an awful lot of sharp and dedicated people. I remain impressed.

On the topic of what the DWP[3] says, I note that in the latest versions, the committee has answered several of my questions about function-try-blocks on main(). The DWP now makes it clear that in addition to all the other reasons, terminate() will be called when:

-- the construction or destruction of a non-local object with static storage duration exits using an exception.

-- the execution of a function registered with atexit() exits using an exception.

This makes it clear that a function-try-block on main() cannot catch exceptions thrown during the construction of global objects, nor can it catch exceptions which occur as part of the program termination process. Furthermore, the DWP makes it clear that exit() is not called before entry to a handler of a function-try-block on main(). For all practical purposes, a function-try-block on main() is the same as one on any other function, which is to say that it acts like a single try-block statement within the body of the function. This is pretty much what I expected, but it is nice to see it cleared up.
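For readers who have not seen one, the syntax looks like this (a minimal sketch):

int main(int argc, char* argv[])
try {
    // body of main goes here
    return 0;
}
catch (...) {
    // behaves like a try-block around the body of main: it will not
    // see exceptions from the construction of global objects, nor
    // those thrown during program termination
    return 1;
}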

Now let me turn from exceptions to something much more interesting: the Standard Template Library. It seems that everywhere you turn these days, there is another article or column about the STL (at least in those publications not overwhelmed with articles about Java). The STL has to be considered an outstanding programming achievement, on many different levels. One thing that has amazed me is how readily programmers, both those new to C++ and old hands, have adopted it.

Certainly, there have been other container libraries before the STL, many quite good, and some even priced the same (at least the same as the HP reference implementation). I do not think it is just the fact that the STL has been adopted as the basis for parts of the (draft) Standard C++ Library[3], either. There is something about the lean power of the STL that appeals to the C programmer inside most C++ developers. At the same time, the extensibility of the STL's framework appeals to the software architects in most of us. Somehow, the STL gives the impression that you are not giving up anything, either in terms of efficiency, or in terms of down-the-road flexibility, by choosing to base your implementation on it. This is good stuff, to put it mildly.

When I looked over the way programmers were using the STL on the project that I am now part of, one thing did surprise me -- at least initially. Once upon a time, I developed two generic containers of my own: a linked list, and a dynamic array. For several years, I used these containers repeatedly on numerous different projects, large and small. During that time, I never consciously felt the need for anything more powerful. Therefore, when I first encountered the STL, it was the vector and list classes I expected to see used the most. Instead, I found that while vector was indeed popular, list was quite rare. By far the most popular class I encountered was map<>. At first this surprised me, but when I started thinking back over my own career, I realized that programmers have been using sets and maps since the beginning of software.

The problem has been that without direct implementations, we had to fake them. The easiest way to fake a "set" is to simply store stuff in an array and then linearly search the array to retrieve an element. When the linear search becomes unacceptable, we sort the array and do a binary search. A map was just two parallel arrays: one holding a "set" of "keys", with the position of each key serving as an index into the array of data elements. If these techniques were not sufficient, we would start looking at much more exotic solutions such as hash tables. When I looked at it this way, I found several examples of some pretty sophisticated "map" structures that I have created over my career. Now that programmers have an implementation of map (and set) that they can use directly, it should not be surprising that they are finding numerous places to use them.
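As a small illustration of the appeal (the names here are hypothetical), the sorted-array-plus-binary-search machinery collapses into a declaration and a subscript:

#include <map>
#include <string>

// What once required a sorted array of keys, a parallel array of
// values, and a hand-written binary search is now one declaration:
std::map<std::string, int> error_counts;

void tally(const std::string& msg)
{
    error_counts[msg]++;   // inserts a zero count if absent, then updates
}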

Another thing that I have noticed is that most programmers learn to use the STL container classes fairly rapidly, and soon start to incorporate them into their code. They also learn to use iterators rather quickly. On the other hand, many programmers seem to get stuck on the basics, and take a while (and some prodding) to move on to using STL's algorithms. I have also noticed that most programmers do not have a very good grasp of anything more than the basics of the container classes. When I originally made this note, it was late spring 1995. At that time, there were not yet any really good books in print about how to use the STL, so this is hardly surprising. Hopefully, with the advent of several books on the subject, and numerous colleges offering classes on the STL, this situation is better than it used to be. Nevertheless, there are aspects of the STL which I fear most programmers, even those that have used the STL for some time, may not be aware of.

One of the great things about good libraries such as the STL is that most of the time, most programmers do not have to concern themselves with knowing the details of how the library is implemented. Those programmers that do invest time and effort in learning something about the internals may not find the information all that pertinent to their everyday activities. Nevertheless, spending some time to become familiar with the lesser known aspects of the STL can prove valuable. The programmer who knows the details can often choose between what would otherwise appear to be several equally valid approaches to doing the same thing, and fine-tune an implementation to have just the performance and memory characteristics desired. On the other hand, the programmer without such knowledge may find themselves encountering a gotcha they did not know was there.

In order to make this discussion real, and hopefully cause it to stick with you, I am not going to just list things you should watch out for. Rather, I want to present several of the gotcha's that I and other programmers I know have encountered, in the form of a couple of scenarios. As the saying goes, the situations actually happened; the names have been changed to protect those involved (i.e., me).

Consider: you are part of a team writing a C++ application that provides a new display interface to a data stream that comes over a communications link. One of the data items that can come over the link is a representation of an ordinary ASCII terminal screen. This consists of a number of lines of text. While the displayable portion of the text will always be less than 80 characters, the lines may contain embedded escape sequences, making their actual length indeterminate. Likewise, while the number of lines is typically less than 24, the actual number of lines can be greater than that (the display application is required to be able to scroll such a "screen"). Your job is to code the class 'Screen' which will hold one of these 'screens' after it has been read from the communications link.

Since this project is using the STL, you decide to build your screen class like so (a lot of details have been omitted):

class Screen {
    typedef vector<char> Line;
    typedef vector<Line> Page;
    Page page;
public:
    void load(DataStream& strm, int n);
    ...
};

It is the member function load() that we are concerned with. It receives an object of type DataStream which represents a wrapper around a low level C library. A DataStream object decodes the data elements being read from the communications link. The other parameter is the number of lines in the screen. With this information, the first implementation of load() became:

void Screen::load(DataStream& strm, int n)
{
    page.erase(page.begin(), page.end());
    int len;
    char* ptr;
    while (n--) {
        page.push_back(Line());
        strm.get_string(ptr, len);
        copy(ptr, ptr+len, back_inserter(page.back()));
    }
}

This function first makes sure the vector of Lines (page) is empty. It then creates a new, empty vector of characters (the Line) and pushes it onto the end of the page. The function DataStream::get_string() has the following prototype:

void get_string(char*&, int&);

It reads a string from the data stream, and returns the character pointer set to the beginning of the string in the input buffer. The int& parameter returns the length of the string. The string thus returned is copied into the Line vector.

After compiling this, and discovering that it works, our programmer gives himself (or herself) a pat on the back. After all, this code demonstrates a better-than-average knowledge of the STL. There is the use of the 'copy' algorithm, and the 'back_inserter' iterator adapter. Most importantly, by depending upon the fact that back() returns a reference to the last element in the page vector, this function makes only one copy of the character data read from the communications link.

So, our programmer moves on, and after a while so does the project. Then one day, a couple of gentlemen from the software QA department arrive at our programmer's cubicle. They have that look on their faces that our programmer has come to know. It says they think they have found a problem in his code, but they are not sure yet. Our hero steels himself to be very careful in how he answers questions (so as not to admit anything). "We have been testing the new display application," they begin, "and we were wondering: how much memory should one of those 'screen' objects take up after it gets read from the data link?" Since it has been a while since our programmer worked on the Screen class implementation, only very faint warning bells go off at this question. Doing some rapid calculation, he figures that the typical full screen should be, on average, less than 2K. "Maybe 2 to 3K," he replies, sticking in a fudge factor for unknown overhead in the vector<> implementation.

At this point, the faces of the QA people light up with that "gotcha" look that instantly tells our programmer he has given the wrong answer. "Would you believe that they are actually running between 80K and 100K?" they ask. No, our programmer would not believe that, but the QA guys have the evidence with them. Using 80K-100K for something that should be about 2K is fairly inelegant in its own right, but this application was expected to cache the 'screens' after they were read. In a typical run, there might be several thousand of these objects loaded into memory. In fact, it was during such load testing that the QA people first picked up on the excess memory usage. They soon depart, leaving our programmer to figure out where all the memory is going.

Skipping a half-day's investigation: our programmer discovered that vector<> was the culprit. It seems that whenever an element was added to an empty vector, the vector made an initial allocation using the function call allocator.init_page_size(). It turned out that the page size on this system was 4K bytes. Thus, every Line in a Screen was taking up 4K bytes, as was the vector of Lines itself. Our programmer had heard about how a vector doubles its memory allocation every time it has to reallocate its internal data storage, but he had not heard about this huge initial allocation.

With a better understanding of what was going on, the problem now became: how to solve it? The first fix looked something like this:

while (n--) {
    strm.get_string(ptr, len);
    page.push_back(Line(len));    // added len
    copy(ptr, ptr+len, back_inserter(page.back()));
}

This represents a gotcha that most programmers who work with vectors have run into at one point or another. Vector<> actually has two separate personalities hiding behind a single interface. A vector can be used as a simple replacement for a dynamically allocated C++ array. Thus

X* px = new X[n];

can be replaced by

vector<X> px(n);

and as long as all access to the elements of px is via the subscript operator ([]), then the rest of the code does not have to change (except for removing the delete px; statement). On the other hand, a vector<> can be used as a dynamically expandable object. While the two modes can be mixed in the same piece of code, it does take a bit of care. In the case above, the code was switched from using vector<char> in its dynamic mode, to pre-allocating a vector<char> of a certain size. The "copy" still expects to dynamically expand the vector. The result is that it pushes the characters onto the end of the existing array, after the len null characters that were created by the call to Line(len). This problem was quickly caught and the code became:

while (n--) {
    strm.get_string(ptr, len);
    page.push_back(Line(len));
    copy(ptr, ptr+len, page.back().begin());
}

This seems to work, and it partially solves the memory problem (page is still being allocated a 4K chunk of memory). It introduces a performance problem, however. The original design was done the way it was in order to avoid copying the character data more than once. The code above replaces one pass over the data with four. There is the initial creation of Line(len) which makes len copies of char() into a temporary. Then this temporary is copy constructed into the page vector, which makes another len copies. The temporary is then destroyed, which requires a pass of len calls to char::~char() (which does nothing, but the loop still has to execute). Finally, the data in the buffer is copied into the Line vector in page. This version was not seen as a positive change.

Therefore, our programmer continued to search for other solutions. About this point, the vector reserve() function was discovered. reserve() is a member function only of the vector<> class (and string, but more on that later). As a result, it does not show up in any of the tables that summarize the interfaces of STL containers. Nor do I see it described in the tutorials on the STL that I have read. Typically, you have to discover it by reading the detailed reference for the vector class itself, or by reading the vector header file. In any case, reserve() is a function which lets the user of a vector pre-allocate storage without having the vector automatically fill up that storage. Armed with this, Screen::load() became:

void Screen::load(DataStream& strm, int n)
{
    page.erase(page.begin(), page.end());
    page.reserve(n);                  // pre-allocate space for n Lines
    int len;
    char* ptr;
    while (n--) {
        strm.get_string(ptr, len);
        page.push_back(Line());
        page.back().reserve(len);     // pre-allocate space
        copy(ptr, ptr+len, back_inserter(page.back()));
    }
}

This seemed to solve the problem, both for the Line vectors and for the Page vector itself.

Now, let's step back and make a few observations. Ignoring for the moment the Page vector, most of the memory problems with the Line vector would not have occurred had there not been a paranoia about making duplicate copies of the data. Consider for example:

while (n--) {
    strm.get_string(ptr, len);
    Line line;
    copy(ptr, ptr+len, back_inserter(line));
    page.push_back(line);
}

This creates a 4K vector (line) on each iteration through the loop, but the copy constructor invoked when the vector is added to the end of page allocates only enough memory to hold the used portion of line. This version makes three passes over the data, instead of only one (two copies and the destructor pass), but it leaves the page vector containing minimally allocated lines.

The real solution to this problem comes from knowing that all of vector's oddball memory allocation algorithms, both the page size allocation, and the memory doubling, occur only when a single data element is added to a vector that needs to be (re)allocated. This occurs only in push_back() and the version of insert() that takes a single data element. All other vector members, including the constructors, the assignment operator, and the versions of insert() that take either ranges, or replication counts, allocate only enough memory to hold the new elements being added to the vector. The following works just fine:

while (n--) {
    strm.get_string(ptr, len);
    page.push_back(Line());
    page.back().insert(page.back().end(), ptr, ptr+len);
}

The moral of this little tale (in case you missed it): while a basic knowledge of the STL will get you by, a more thorough knowledge is worth the effort to obtain. Besides an understanding of the runtime performance characteristics of the various containers, your knowledge toolkit should also include information about the memory "footprint" of each container, and under what circumstances these characteristics hold. With such knowledge it becomes possible to write something like the above which has just the right performance both in terms of the number of copies made of the data and the amount of memory used to hold it.

Let's look at another example. In another part of the system, (or maybe it was another system altogether, it doesn't matter) there was a requirement for a simple message queue. This queue was needed to act as a buffer between a message producer and a message consumer. Normally, the message consumer could be expected to keep up with the producer without any problems. There were certain times during the day, however, when the production of messages would go from being a steady trickle, to being a flood. This message rush-hour could produce a backlog of 2000 - 3000 messages. Hence the need for the intermediate queue.

The designer of this element looked to the STL for a solution. As you probably know, there is no queue container in the STL. There is, however, a queue adapter. You can instantiate this adapter using either a deque container or a list container. The question that needed to be answered was: which is better, a queue based upon a deque, or one based upon a list?
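Either instantiation is a one-liner; the interface is identical, and only the underlying memory behavior differs. A sketch, with Message standing in for the application's message class, and using the DWP form of the adapter (the original HP release parameterized the adapter by the container type alone):

#include <queue>
#include <deque>
#include <list>

class Message;   // the application's polymorphic message base class

// Same push()/front()/pop() interface, different containers:
typedef std::queue< Message*, std::deque<Message*> > DequeMsgQueue;
typedef std::queue< Message*, std::list<Message*> >  ListMsgQueue;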

In the HP implementation of the STL, a deque is implemented as a collection of fixed sized buffers which actually hold the data. Another structure, the buffer map, holds pointers to the currently allocated data buffers (see Figure 1). The size of a buffer is based upon our old friend, the allocator page size. In the case of our message queue, the data is a pointer to a polymorphic Message object, so each buffer could contain 1024 such "messages".

This was more than enough space to comfortably handle the steady state condition of a few messages a second. In fact, it almost seemed like too much overhead for the normal condition. This was balanced, however, by the fact that during peak loads, the system would only have to allocate up to 3 additional buffers in order to accommodate the expected 3000+ message backlog. The buffer map is also initially allocated based upon the allocator page size, so it would contain slots for 1024 buffer pointers (in this implementation), more than enough under all conditions. Another key aspect that was attractive about the deque based solution was that the memory would be released as it was no longer needed.

Using a list based queue seemed to be less desirable for a number of reasons. In the first place, every Message pointer would be stored in a list node element. Since the list node contains forward and backward links for the list, each node would have over 200% overhead for the storage of every Message. In addition, the storage for a list is managed at the class level, not directly by the object itself. When the object needs to add a node to the list, it calls the get_node() function. This function checks a free list of nodes maintained at the class level. If there is a node on the free list, it returns it; if not, get_node() calls add_new_buffer(). Function add_new_buffer() allocates a chunk of memory which it then uses to parcel out new nodes. The size of this chunk of memory is determined by -- you guessed it -- the allocator page size. A very key point to keep in mind about list memory management is that once a buffer is allocated, it is not freed as long as there is any list object in existence. As list nodes are freed, they are placed on the free list, and only when all list objects have been destroyed will the node buffers be freed.
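The 200% figure comes directly from the node layout. In the spirit of the HP implementation (this is a sketch, not its actual source), a list node looks like:

// For a list<Message*>, each node stores one pointer of data plus
// two pointers of links: two words of overhead per word of payload.
template <class T>
struct list_node {
    list_node* next;    // forward link
    list_node* prev;    // backward link
    T data;             // the stored element (a Message* here)
};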

This was looked upon as a serious disadvantage for the message queue. Not only would the message rush-hour load cause over twice as many buffers to be allocated for the list-based queue versus the deque (because of the 200% overhead within each list node), but once allocated, the memory would never be released as long as the application was running. As a result of this initial analysis, the deque solution seemed by far the preferred choice, and that is how the queue was originally implemented.

Obviously, there is a 'gotcha' in this (or I wouldn't be writing it). Do you see it? The problem occurs in the original specification of the message flow. Under normal circumstances, the message consumer has no problem keeping up with the message producer, maybe occasionally falling behind only a message or two. Under these circumstances, one of the advantages of the deque, the fact that it releases unused memory, becomes a liability. Whenever a deque data buffer is emptied, it is deallocated. When the deque itself becomes empty, the buffer map will also be deallocated. Thus, using a queue based upon a deque in this scenario means that the allocator is being repeatedly called to allocate a new buffer and map, and then called again to deallocate both. This resulted in a significant performance hit. While the list based solution uses more memory, it actually does memory allocation very, very rarely.

The moral here: not only is it necessary to understand the basic memory management characteristics of the container you are using, but it is also necessary to understand the implications of those characteristics for the specific problem at hand. Caveat emptor.

One final note on this problem: once you do the analysis, several other solutions come to mind. A custom allocator for the deque class might be created. Besides reducing the default page size to something that makes more sense for the steady state message flow condition, it could also cache at least one data buffer and a map so that the underlying memory manager was not being called all the time. This was not considered a viable possibility for this application, but might be reconsidered once better support for allocators is available under the proposed Draft C++ Standard Library version of the STL.

Another approach might be to derive a specialized version of deque from the original. The new version could change the memory management approach. Perhaps the map and the last data buffer would only be freed when the object was destroyed.

Finally, a completely new container class could be created. It might just be a queue, or it might be another type that could be used to instantiate the queue adapter. In any case, it would have a memory management model completely different from either deque or list -- one more suited to this particular task. I will be presenting a possibility along these lines in a later column. For now, simply note that the STL is quite extensible -- by design. Creating your own STL containers is neither difficult nor disapproved of.

While it is not relevant to this scenario, readers should be aware that the associative containers of the STL all have memory management schemes similar to the list container. Like list, the class does the actual memory management. Also like list, once the class has allocated a buffer of nodes, that buffer will not be released until all objects of that particular class type have been destroyed. While I do not have any stories to relate concerning the memory management characteristics of map and set, I am sure there are some out there.

One final scenario is worth repeating. In the Screen class described above, the lines were implemented as vector<char>. You may be wondering whether a string class could have been used instead. In the initial implementation of class Screen, no version of a "standard" string class was available, but by the time the memory problem had been discovered, one was, and using it was considered. Unlike the HP reference implementation of the STL, there is no widely accepted reference implementation of the proposed string class. There are a number of implementations available, however. The implementation used on the project in question was based upon the version in the GNU library. It is a quite good implementation, though it has a few flaws. This discussion is concerned with the approach to memory management taken by this particular implementation.

In the first place, this implementation used a reference counting scheme to avoid making unnecessary copies of the data whenever strings were copied. See Murray[4] or Meyers[5] for a description of this approach. While this reduced a string object to a single pointer to the dynamically allocated data, it also meant that the allocated portion of each string included some 16 bytes of overhead for the reference count and the other information about the string data area itself. In addition to this overhead, the string data area was always allocated in blocks based on powers of 2. The smallest string data area was 16 characters, the next 32 characters, the next 64 characters, and so on. Even though the DWP specification for the string class has a reserve() function like that of the vector class, this particular implementation did not provide reserve(). This meant that for the Screen application, a typical 80-character line would always allocate a 128-character data area. This ~40% overhead was deemed unacceptable for the application, and since there was no way around it, the implementation went back to using vector<char> instead.
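For illustration, the allocation policy just described amounts to the following (this rounding function is mine, not the GNU library's actual code):

#include <cstddef>

// Round a requested string capacity up to the next power of 2,
// with a 16-character minimum. Under this policy a typical
// 80-character Screen line always lands in a 128-character area.
std::size_t string_capacity_for(std::size_t n)
{
    std::size_t cap = 16;
    while (cap < n)
        cap *= 2;
    return cap;
}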

Most of the discussion above has been based upon experiences using what has come to be called the Hewlett-Packard reference implementation of the STL. Since practically all existing implementations of the STL started out based upon the HP implementation, that seemed like a relevant place to start the discussion. The DWP[3] is somewhat different from the HP STL specification, however, so a reasonable question is: how much of all this will apply to a "Standard" version of the C++ library, when one becomes available?

The "standard" answer has to be "I really do not know. The DWP necessarily can only specify the functionality that the library must provide, it can not impose requirements on how that functionality is implemented, other than in the very broadest sense. Therefore, you might reasonably expect different implementations of a Standard C++ Library to take different approaches and thus have quite different characteristics."

With that caveat out of the way, I will now return to speculating about how I think things will work. While the caveat above is true in a technical sense, certain practicalities remain. The first is that the HP reference STL implementation is just that: the reference that the C++ committee used when specifying the requirements for the corresponding portions of the (draft) Standard C++ Library. Many companies are going to stick with the basic HP implementation, just because it is a known quantity. Furthermore, even those companies which decide to undertake a complete rewrite cannot afford to deviate far from the performance of the HP implementation. There is already a considerable body of legacy code that uses the HP implementation, and people will not tolerate major changes in the performance characteristics of that code base.

Nevertheless, there are a number of aspects of the current DWP specification that differ from the current HP implementation and they are going to have an impact on how a truly "standard" version of the STL is implemented. Some of these are relevant to the gotcha's discussed in this column.

The first major change is that all containers in the DWP are parameterized by the allocator type. Where vector used to be defined as

template<class T> class vector;

it is now

template<class T, class Allocator> class vector;

Under normal circumstances this difference will not be apparent because the definition specifies a default for the Allocator type. The actual declaration for vector really looks like:

template<class T, class Allocator = allocator<T> >
class vector;

As long as you do not need anything other than the default allocator, you can declare a vector just like you currently do.

Allocators have always been a part of the STL, but because of current compiler limitations, changing allocators is done via preprocessor macros. As a result, customizing allocators for particular tasks is not a common practice with the current implementation. With a "Standard" implementation, it will be much easier.

As a side note: the latest version of the DWP has a completely different section on the requirements for allocators than what appeared in the April 1995 Committee Draft. The current requirements are very close to those specified in the HP release of the STL. In a sense, the DWP has returned to its roots for allocators. Those readers who have David Musser and Atul Saini's excellent book[6] should note that the description of allocators in the book is based upon the April 1995 CD and is now out of date.

There is one change to the HP allocator requirements made by the committee that is still in the latest version of the DWP -- "standard" allocators do not have the function init_page_size(). This means that "standard" containers will not be able to depend upon help from the allocator in determining their memory management strategy. More on this below.

If having the Allocator type become one of the template parameters for containers were the only change that applied to allocators, the implementation changes would be very few and very simple. Unfortunately, that is not the only change. In the DWP, all container classes also take an Allocator object as a parameter of their constructors. Thus, the default constructor for a vector switches from:

vector();

to:

explicit vector(const Allocator& = Allocator());

As you can see, a default parameter supplies a default constructed allocator object if one is not provided by the user. Again, users who need only the default allocator do not have to make any changes to existing code. Having allocators be objects passed to the constructors has some ramifications for the implementations, however.

First, note that one of the most often given reasons for this capability is to allow an allocator which works with shared memory. Such an allocator would need each object to be initialized with the location of the shared memory segment to be used for the allocations. An example taken straight from the DWP is shown in Listing 1. A shared_allocator has no default constructor -- each object has to be constructed with a pointer to a region of memory. This means you have to supply such an allocator object to the constructor of every container in a program which uses such an allocator type. It also means that it is possible to have two separate container objects, each of exactly the same type, but each constructed with a different allocator. This has some definite implications for how such classes as list and the associative containers will be implemented.
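In use, such an allocator has to be supplied explicitly wherever a container is constructed. A sketch (shared_allocator is the Listing 1 class; get_shared_region() is a hypothetical helper that maps a shared memory segment and returns its address):

#include <vector>

void* get_shared_region();   // hypothetical: maps a shared segment

void shared_example()
{
    // The allocator type is a template parameter; the allocator
    // object is a constructor argument.
    shared_allocator<int> shared(get_shared_region());
    std::vector< int, shared_allocator<int> > v(shared);

    // Two containers of exactly the same type can be constructed
    // with different allocators, i.e. different memory regions.
    shared_allocator<int> other(get_shared_region());
    std::vector< int, shared_allocator<int> > w(other);
}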

Recall that these containers currently do memory allocation at the class level. This will have to change. In a "standard" implementation, every container object will have to retain an internal copy of its allocator. This means that every object will be responsible for its own allocation strategy -- nothing can be pushed up to the class level. It should be apparent that strategies that were once reasonable when implemented by a class on behalf of all objects of the class type are not viable if implemented by every object of the class.

As I noted, however, implementors of a "Standard C++ Library" will have to closely match the performance of the current HP implementation, at least for typical current usage. For the linked-node container classes, this will depend upon the implementation of the standard allocator. The DWP makes it quite clear that while the standard allocator must use operators new and delete, not every call to allocate() has to result in a call to new, and not every call to deallocate() must result in an immediate call to delete. Thus, you can expect that Standard C++ Library implementations will provide specializations of the standard allocator that provide essentially the same functionality as is now handled by the list class and its siblings (a sketch of the idea appears below). As a result, I would expect there to be little apparent difference in performance, either runtime or memory usage, between the current HP STL implementation and a Standard implementation, at least for the node based container classes. The situation with regard to vector and deque is a little more vague.
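Before turning to vector and deque, here is a sketch of the node-caching idea just described. A real implementation would bury something like this inside its allocator, keyed to the node types its own list and map classes use; the class shown is illustrative only:

#include <cstddef>
#include <new>

// Illustrative free-list cache for fixed-size node allocations.
// Freed nodes are recycled rather than returned to operator
// delete -- much as the HP list class does today, but now inside
// the allocator, where every node based container can share it.
template <class Node>
class node_cache {
    union slot {
        slot* next;                    // link when on the free list
        char  storage[sizeof(Node)];   // payload when in use
    };
    slot* free_list;
public:
    node_cache() : free_list(0) {}
    void* allocate() {
        if (free_list) {               // reuse a cached node
            slot* p = free_list;
            free_list = p->next;
            return p;
        }
        return ::operator new(sizeof(slot));
    }
    void deallocate(void* p) {         // cache it; do not delete
        slot* s = static_cast<slot*>(p);
        s->next = free_list;
        free_list = s;
    }
};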

The vector class currently uses the init_page_size() function to determine how large to make an initial allocation when the user adds the first data element to an empty vector (in the absence of a call to reserve, of course). Since init_page_size() is not available in the (draft) Standard, a vector implementation will have to find some other way to determine its initial allocation size. This could be a problem. A large initial allocation clearly enhances the performance of later additions, and can make a significant difference if there are a lot of those additions. On the other hand, as our scenario above has shown, it can waste a considerable amount of space if the actual size of the vector only grows to use a few percent of the allocated space. Furthermore, a large initial allocation scheme might not even be workable with something like a shared memory allocator which has a very limited amount of memory available.

The escape clause for the vector class is that vector has the reserve() function. This allows the knowledgeable user to override the default behavior whenever desired or necessary. If I were implementing a "standard" vector class, I would opt for simplicity and have the initial default allocation size be 1 (as in alloc_.allocate(1)). The user could use reserve() to increase this as desired. This may not be considered acceptably close to the performance of the current reference implementation by some vendors. Their implementations may have larger initial allocations. The most obvious implementation that would come closest to the current implementation is to hard-wire the algorithm that is currently in init_page_size() into the vector implementation itself. Again, the user would be expected to use reserve() to override this if necessary.

I see the deque class as a real problem. There is no reserve() function for deque, so there is no "standard" way for the user to override whatever memory management strategy deque implements. Like vector, a deque implementation could hardwire a scheme based upon a de facto page size, and work pretty much as it does now under typical usage. As the message queue scenario above hopefully shows, though, typical performance might not be acceptable under certain circumstances, and the question becomes how to provide good typical performance while still allowing the knowledgeable user a mechanism to override the default behavior when desired.

The only real 'hook' defined in the DWP for a deque is the ability to customize the allocator. As a result, I expect "standard" implementations of deque to differ somewhat from the current reference implementation. As the saying goes, stay tuned to this station for more information.

The whole point of this exercise has been to convince you that it is probably in your best interest to obtain a deeper understanding of the inner workings of something as fundamental as the STL. I personally hope that vendors will take the hint and provide the detailed information in some form of documentation, and not force programmers to read the source code.

When it comes to the current STL implementation, there are now several good books on the market. When people ask me if there are any books about the STL that I would recommend, I always tell them that there are two, and that if they are new to the STL they need to get both. Musser and Saini's STL Tutorial and Reference Guide[6] contains a very good tutorial introduction to the STL. My first exposure to the STL was via a version of that tutorial in Modena Software's documentation. In addition, the reference portion of Musser and Saini's book is quite complete and very close to the DWP (with the caveat mentioned above about the section on allocators; I presume that a new edition of the book will correct that as soon as the next Committee Draft is released). This book is already a valuable reference, and should remain so even when vendors start to release true versions of the Standard C++ Library.

Mark Nelson's book[7], on the other hand, makes only a cursory attempt at teaching the reader how to use the STL. It is unabashedly an in-depth presentation of how the components of the current STL work. As I noted in the beginning, most programmers seem to have no trouble learning the basics of using the STL, so I do not see the lack of a tutorial as a shortcoming. To go beyond basic usage and reap the full benefit of the capabilities represented by the STL framework requires more in-depth knowledge than the typical tutorial provides. For that reason, my copy of Nelson's book is showing signs of being rather dog-eared in certain places. The potential downside of Nelson's presentation is that it is limited to the current HP public domain implementation, as run on the Borland C++ compiler. As noted above, certain aspects of any standard implementation are bound to be different from the HP implementation. Likewise, there are likely to be vendor-specific differences from the HP STL. Nevertheless, I suspect that Nelson's book will remain valuable for quite some time, if only for the diagrams and the explanations of how the various classes can be implemented. If you are using the HP implementation, or one derived from it (and you probably are), then Nelson's book is invaluable.


References

1. Jack Reeves, "The (B)Leading Edge: Exceptions and Debugging," C++ Report, Vol. 8, No. 10, November/December 1996.
2. Jack Reeves, "The (B)Leading Edge: 10 Guidelines for Exception Specifications," C++ Report, Vol. 8, No. 7, July 1996.
3. "Working Paper for Draft Proposed International Standard for Information Systems -- Programming Language C++," April 1995.
4. Robert Murray, C++ Strategies and Tactics, Addison-Wesley, 1993.
5. Scott Meyers, More Effective C++, Addison-Wesley, 1996.
6. David R. Musser and Atul Saini, STL Tutorial and Reference Guide, Addison-Wesley, 1996.
7. Mark Nelson, C++ Programmer's Guide to the Standard Template Library, IDG Books, 1995.


Listing 1.

----------------------------------------------------------------

template <class T>
class shared_allocator : public std::allocator<T> {
    class impl {
        explicit impl(void* region);
        ~impl();
        void* allocate(size_t);
        void deallocate(void*);
        template <class U> friend class shared_allocator;
    };
    impl* impl_;
public:
    template <class U> struct rebind {
        typedef shared_allocator<U> other;
    };
    explicit shared_allocator(void* region)
        : impl_(new impl(region)) {}
    template <class U>
    shared_allocator(const shared_allocator<U>&) throw();
    template <class U>
    shared_allocator& operator=(const shared_allocator<U>&) throw();
    ~shared_allocator();
    pointer allocate(size_t N, const void* hint)
        { return impl_->allocate(N * sizeof(T)); }
    void deallocate(pointer p)
        { impl_->deallocate(p); }
};

Figure 1.
TBS (something similar to Figure 5-2 from Nelson's book)