Memory management in C programs

Tags: nethack internals c memory programming | Written by Alex Smith, 2014-03-16

One large difference between C and most other programming languages is that in C, you have to handle memory yourself rather than having a garbage collector do it for you. Ensuring that memory is allocated at the correct moment is not very difficult (and something that needs to be done manually in pretty much every language); the hard part is to ensure that enough memory is allocated, and to ensure that the memory is deallocated when it is no longer in use.

There are several techniques available for memory management in C. Many of them are used in NetHack 3.4.3; and even more are used somewhere in NetHack 4. In this blog post, I'd like to look at them and discuss their advantages and disadvantages. I'm mostly concerned about correctness, rather than efficiency, here; that means that unless the performance difference is very large, I care more about clean code than I do about fast code.

Techniques for allocating fixed amounts of memory

There are basically two different issues here: tracking the lifetime of memory allocations, and ensuring that they're the right size. As such, I'll start by looking at allocations whose size is statically known at compile time, and then move on to techniques that can handle the size being unknown.

Fixed-size buffers on the stack

Probably the simplest way to allocate memory in C is to use a stack allocation; if a variable is declared inside a function without specifying static or extern, enough memory for that variable will be allocated when that variable's scope starts, and freed again when that variable's scope ends (sometimes known as an "automatic variable"). Because scopes in C are well-nested, a typical C implementation will implement this via using a stack of memory; newly allocated variables will be given space from the "top" of the stack, and that space will be released from the top of the stack when they leave scope. ("Top" is in quotes because it's far from rare for a stack to be upside-down in C, with the top being the end with the lowest memory address.)

There are some major advantages (and some minor advantages) of doing things this way:

It does not require any extra state, beyond that provided by the implementation, to work. Quite often, memory allocation schemes will themselves need memory to work, leading to something of an infinite regress. Stack allocations are often useful to provide that memory.
Unlike the vast majority of memory allocation schemes, it is exception-safe with no extra effort: longjmp is aware of stack allocations, and will automatically free them from any scope it jumps out of. (Some care is needed in the function containing setjmp itself; any memory that needs to persist past a longjmp must be marked as volatile, a requirement set by the standard in order to allow compilers to optimize around longjmp calls more easily. Technically speaking, the memory does not need to be marked as volatile if its value does not change bewtween setjmp and the matching longjmp; in practice, although compilers can handle this case, they tend to spew warnings, so it makes sense to mark the memory as volatile if it is read after the longjmp at all.)

This mostly does not matter in NetHack 3.4.3, which does not use any exception-like behaviour (if something goes wrong, it calls panic which attempts to produce a save file, and then exits the entire process, freeing all the memory that way). However, NetHack 4 uses exception-like behaviour internally (implemented using longjmp and setjmp, because C does not have true exceptions); NitroHack (and thus NetHack 4) uses exceptions in order to exit a game without exiting the process. From NetHack 4.3 (the current development version, which is unreleased but available using the savebreak branch of the repository) onwards, they are used even more heavily, handling situations like rewinding the game mid-turn.
It works correctly in the presence of multithreading, recursion, and asynchronous signals.
This technique does not require any global state apart from that managed by the C compiler; thus, it is impossible to, for instance, forget to save a variable stored in a stack allocation (unless the save code is running inside a scope containing it), or to have it accidentally persist from one game into the next.
If your code does end up overflowing a fixed-sized buffer, your program is going to be incorrect no matter what, but if that buffer is allocated on the stack, it maximizes the chance that your compiler will be able to generate code to catch the buffer overflow (especially if the overflow happens in a call to a standard string manipulation function like sprintf or strcat). It is incredibly unwise to rely on this to protect your code from exploits, but it's a rather helpful feature for debugging. (Statically allocated memory is almost as good in this respect, but dynamic allocations are hopeless.)

It is thus no surprise that stack allocations are used heavily in both NetHack 3.4.3 and NetHack 4. There are a couple of major disadvantages, though, which mean that this technique is not suitable for all memory:

The value can only stay in scope for, at most, the length of time for which a particular function is running. In particular, this means that it's impossible for a function to stack-allocate a value, and then return a reference to it. (At least, it's impossible to do that and have the program work reliably; C lets you attempt to do so, and many compilers won't catch the attempt and will end up producing meaningless code). It is possible to work around this via returning the value itself, rather than a reference to it, but this is inefficient if the value is large, and impossible if the value internally contains pointers (e.g. in NetHack, a struct obj representing a container with contents cannot be returned from a function by any means if it was stack-allocated).
It requires you to know how large the object could possibly be at compile time. In many cases, this is not very difficult ("it stores the player's X coordinate, thus it can't be any larger than COLNO, thus 8 bits is enough"), although even such clear-cut cases have a tendency to introduce hidden assumptions (NetHack does not run well with COLNO set to 300 rather than its default of 80, and Slash'EM Extended likewise ran into problems when it ended up with more than 256 levels simultaneously existing in one game). In other cases, NetHack just uses a value that seems as though it would be large enough.

And of course, the problem with trying to guess what value would be large enough, is that sometimes you guess wrong:

What do you want to name this scroll called                                     
1234567890123456789012345678901234567890123456789012345678901234567890123456789 
0                                                                               
named 12345678901234567890123456789012345678901234567890123456789012? 123456789 
01234567890123456789012345678901234567890123456789012345678901234567890Thank you
 for recording your gaming session.··│                                          
The stored file for that session is "ais523.2013-11-03-08-56-07", should you car
e to publish it for others' entertainment.                                      
             # #   #       └──────·──┘                                          
Abort trap   ###   ###                                                          
             #       #                                                          
           ###       ###                                                        
           #┌───┐      #  ┌───┐                                                 
          ##│····#########│···│                                                 
          ##│···│        #▒·<·│                                                 
         ###│···+        #│··{│                                                 
        ####│···│         │···│                                                 
  ┌────┐##  └───┘         │···│                                                 
  │·>·^·##                └───┘                                                 
  │··$·│                                                                        
  └────┘                                                                        
                                                                                
Ais523 the Hatamoto    St:14 Dx:15 Co:18 In:8 Wi:10 Ch:10  Lawful S:97          
Dlvl:1  $:85 HP:14(15) Pw:2(2) AC:4  Xp:1/3 T:77

The above is an actual screenshot (produced by using Jettyplay's HTML screenshot function on the ttyrec) of one of my games in last year's /dev/null/nethack tournament. Here's the offending code from NetHack 3.4.3:

char buf[BUFSZ], qbuf[QBUFSZ];
const char *aname;
short objtyp;

Sprintf(qbuf, "What do you want to name %s %s?",
    is_plural(obj) ? "these" : "this", xname(obj));
getlin(qbuf, buf);

QBUFSZ, the amount of memory allocated to hold the prompt, is 128, so qbuf is allocated 128 characters. As can be seen from the screenshot, the prompt is well over 128 characters long (I'd intentionally caused this crash by giving the item a very long name). However, the crash did not happen straight away; the most likely reason is that qbuf overflowed into buf, which happened to be just after it on the stack, and thus the runtime system noticed nothing wrong, with the actual crash caused by an overflow due to the entire message (including my response) being more than 256 characters (BUFSZ, the most common fixed size for a string in NetHack 3.4.3); getlin assumes that its first argument is less than QBUFSZ characters when ensuring that it allocates enough memory for its own use. Of course, nothing guarantees that a buffer overflow will be caught at all, but modern operating systems are getting increasingly good at catching them because they frequently lead to security exploits. It would be unwise to rely on this in practice, though.

Strings are probably the largest offender in this regard in NetHack 3.4.3, although there are some overflows in various other unexpected places. For instance, filling the largest bigroom variant entirely with boulders, then attempting to travel diagonally from one end to the other, causes a crash; this is because a fixed-size buffer on the stack is used to hold the internal state of the travel logic, and although the buffer should be large enough, a bug causes boulders to be stored 64 times in the buffer, and although only a small subset of the boulders is ever stored there at a time, a sufficiently full level will nonetheless overflow the buffer. (Incidentally, I discovered this bug accidentally writing AceHack; in AceHack, changes to the travel logic meant that the bug could be triggered using lava instead of boulders, and I got a crash trying to travel in the Valkyrie quest.)

Although this technique is a useful one, and probably used for the majority of memory storage in both NetHack 3.4.3 and NetHack 4, its drawbacks mean that we need to look for other options in the cases where it fails.

Static allocation

This is arguably even simpler than stack allocation; write a variable outside any function, and/or use the static keyword, and it's allocated once at the start of the program and only deallocated when the program exits. (The static keyword causes the variable to have a limited scope, despite the infinite duration of its allocation; without the keyword, the compiler will make the variable anywhere in the program, including in different files, a so-called "global" variable. Even without the keyword, though, the variable is still considered statically allocated.)

There are many situations where it's tempting to do this. Here are some of the reasons why people do:

It's one of the easiest ways to solve visibility issues; if a variable is global, it only requires a quick extern declaration anywhere in the code to be able to read or write it. If there's truly only one copy of the variable, this is typically simpler and less error-prone than trying to pass pointers to it everywhere as function arguments (and what is sometimes suggested as the "correct" solution to the problem, dependency injection, is basically impossible to do in C due to its poor metaprogramming abilities).

Of course, the flip side of this is that it becomes ingrained deep into your program design that there is only one copy of that variable, and it's easy to introduce bugs by using the variable while its value is meaningless (although it's allocated for the entire program's duration, often it isn't meaningful for the entire duration). Both these problems can be alleviated to some extent by using an accessor function or macro (which is part of the reason that NetHack 4 has been introducing accessors and mutators for many globally visible variables).

Still, this is one of the better reasons to use a global variable.
It's possible to return a reference to a statically allocated variable from a function. So long as nothing overwrites the variable until its next use, this is a method of returning large or complex (i.e. containing internal pointers) structures from functions that can actually be made to work, unlike stack allocations, which can never correctly be referenced.

The downside of this is that it's extraordinarily fragile. It fails completely in the presence of multithreading or if called from signal handlers; it is very hard to get it working correctly for recursive functions; and even for regular old-fashioned function calls with nothing special about them, statements about the lifetime of the value that's returned are very hard to make, meaning that apparently innocuous changes to the code can cause major problems.

This is no longer considered a legitimate reason to statically allocate variables in NetHack 4 (and increasingly, in other contexts, too; many standard POSIX functions have new versions with names ending in _r that do not attempt to use it). Some still exist because we haven't got around to getting rid of them yet, or haven't yet found a suitable replacement.
The value stored in a statically allocated variable will persist until explicitly changed, even across function call boundaries.

This has lead to a huge number of subtle bugs in NetHack (for instance, the way in which you can use a stethoscope for free multiple times in a turn in NetHack 3.4.3 by saving in between). The problem is that these values are typically effectively part of the gamestate, yet nothing's forcing them to be reset between one game and the next (NetHack 3.4.3 exits the process between games to help reduce this problem), or to be saved in the save file when exiting the process mid-game.

So although we have some advantages here, we also have a lot of problems. One of the ongoing projects in NetHack 4 development at the moment is the "globals purge", where we try to get rid of global and static variables (e.g. by making them stack-allocated when possible, or replacing them with constants when possible), or at least to make them better-behaved. Here are some of the reasons we keep them around, and some of the techniques we use in order to alleviate some of their issues:

Variables that are global for visibility reasons typically have to be kept as global for that reason, but in NetHack 4, we namespace them into a few structs (u, flags, program_state). This means that at least we have only a few places to look to ensure that they are being saved and restored correctly. It also helps to future-proof for things like multiplayer play (you can replace the structs with arrays if there ever needs to be more than one of them, and then only have to deal with communicating the array index rather than the entire contents of the struct).

When I wrote the multiplayer patch for AceHack, global variables were one of the larger issues, given that many of them needed to be different for different players. My solution was a little extreme; I used an entire separate process for each player, meaning that each player would have their own set of global variables, and then communicated all the data that needed to be shared via passing save files around between processes every action. This might seem backwards and wasteful, but it was easier to track all the shared data (using the save file), than it was to track the various random information stored in globals! For NetHack 4, I'm hoping that a multiplayer patch would not need to resort to such extreme lengths.
Some functions legitimately need to preserve their state for a short time, communicating it to other functions or to future invocations of the same function, but not to need to place it in the save file. An example that caused us trouble for a while was the state used while praying; you start praying, and three turns later the prayer finishes. The game needs to decide whether the prayer is receptive or not when you start praying, because it effects whether you are invulnerable or not, but cannot repeat the calculation at the end (e.g. because your prayer timeout might have dropped from +1 to −2 during those three turns), and so needs to remember the value across those three turns.

It's impossible to save while praying in NetHack 3.4.3; in NetHack 4.3, it's possible to save at any time (in order to handle hangup correctly), but this is accomplished via remembering the gamestate the last time it could safely be saved (i.e. not during prayer), and the character's actions since, and replaying them upon reload (or reverting if changes to the code mean they can't be replayed). Thus, the prayer state never needs to appear in the save file.

In NetHack 4, we've introduced a struct turnstate for this sort of variable. In order to use the turnstate mechanism, the code must ensure that there is no attempt to calculate the gamestate portion of the save file while turnstate variables have non-default values, putting them back to their defaults once the player regains control of their character. The turnstate values are repeatedly checked to ensure that this is the case; as a result, any values in turnstate cannot end up bypassing the save/reload system (and we get a useful debug message if some broken code attempts to do this).

Rotating buffers

This is a trick that's sometimes used to work around one of the problems with static variables: returning them from functions gives them a lifetime that's hard to determine, and sometimes too short. The idea is to have an array of static variables, that are used in sequence, cycling back to the first once the last is used.

The result is a variable that works almost entirely like a static variable in terms of its properties as a return value. Its lifetime is still basically impossible to determine, but longer than it would be otherwise. The code then simply hopes it's long enough (or to put it more kindly, ensures that the return value from the function is copied somewhere safe before the function is called another n times).

This technique is used by NetHack 3.4.3 to handle strings returned from object naming functions (which are often built up out of other strings returned by object naming functions), mediated by the function nextobuf. To make things even more confusing, some of the object naming functions assume that the return values from other functions were constructed like this (an assumption that is not documented anywhere), and take the opportunity to alter them in-place.

With respect to object naming functions in particular, this technique has another problem; given that the buffers are fixed size, they can overflow if a sufficiently long object name is used. I'm not sure what the longest possible object name is, but it's got to be quite close to 256 characters.

I haven't yet ripped this mess out of NetHack 4, but I want to. Part of the reason I'm writing this blog post is to give me a nice list of strategies I could potentially replace it with, to help me decide what the correct fix is.

Single dynamic allocation with ownership semantics

There is one other place where memory can be allocated in C; instead of placing it on the stack, or statically allocating it, it can be allocated using malloc, which obtains a block of memory of a given size in some system-specific manner (e.g. one common mechanism is to find some free space in a single large resizable block of memory known as the "heap", making it larger if necessary; there are others).

The concept of single dynamic allocation is quite simple: you need a variable of a particular size, and so you just create it with a call to malloc when you need it. This neatly gets around all the problems with the previous methods, because the memory will persist as long as you need it, and won't be overwritten except if you explicitly overwrite it.

Instead, the problem with this method is getting rid of the memory again when you no longer need it. There's a standard function free that releases an allocation made using malloc, but knowing when to call it can be difficult. The most common method of doing this is "ownership semantics"; whenever a pointer to dynamic memory gets passed around, part of the API is whether the ownership of the pointer is being passed (in which case the recipient is responsible for freeing it and the sender cannot keep copies), or whether the sender keeps the ownership (in which case the sender remains responsible for freeing it and the recipient cannot keep copies nor pass the ownership on itself).

This would seem to be a nicely simple and general method of handling fixed-size memory. However, there are a few reasons why it is not used everywhere:

It's a bunch of typing; it's much easier to write
```
{
    struct obj otmp;
    /* do stuff with otmp */
}
```
than it is to write
```
{
    struct obj *otmp;
    otmp = malloc(sizeof *otmp);
    /* do stuff with otmp */
    free(otmp);
}
```
And of course, the latter code needed to do a stack allocation anyway in order to store the address of the memory it allocated.

This is pretty minor, but in cases where stack and dynamic allocations are otherwise equally good, there's no real reason to do the dynamic allocation at all; it just makes your code harder to read.
It is hard, in C, to make sure that your ownership semantics are being tracked correctly; if you make a mistake, often apparently nothing will go wrong, but your program is now incorrect (either in terms of a use-after-free, a double-free, or simply just leaking memory). There are other languages that aim to solve this problem (notably Rust, although various C++ libraries go a good way towards being able to solve it too), but converting NetHack's codebase to a new language would be very difficult (even one as similar as C++, which has many potentially relevant details that differ from C).

There are also program analyzers that are meant to be able to catch memory allocation mistakes. splint is designed to do this but requires many changes to the program and turns out to be unusably buggy in practice. valgrind works excellently, but works on a running program, and thus can only catch issues that you actually trigger in your test run; additionally, in the case of a memory leak, it can determine where the allocation was, but not where the free should have been.
However, the largest reason I've been moving away from the most naive version of dynamic allocation in NetHack 4 is exception safety. In C++, the use of RAII means that exceptions can be made to free any dynamically allocated objects whose owners went out of scope as a result of the exception. In C, we don't easily have that option available to us, and those objects are just going to stay allocated. As a result, when short-lived dynamically allocated objects are used in NetHack 4, we have to ensure that it's impossible to have an exception while the object is alive (which means, for instance, the code cannot make any attempts to get user input; a problem, because constructing prompts to show to the user is one of the situations you'd otherwise want to use dynamic allocation for).

Here's an amusing example of this technique in use, new to the NetHack 4 codebase:

case OPTTYPE_KEYMAP:
    str = malloc(1 + sizeof "submenu");
    strcpy(str, "submenu");
    return str;

Why not just return the string "submenu"? Because the function in question can return strings from a range of different sources. That one's a compile-time constant, but not all of them have the luxury of being able to return compile-time constants. For instance, another possibility is for the return value to be the ASCII representation of a number; the code for that prints it into a fixed-size buffer on the stack, then makes a dynamically allocated copy of it so as to be able to return it from a function.

This case of creating text for the current values of options on the option menu is a perfect scenario for ownership semantics because the text simply has to be calculated, added to the menu, and then can be freed again (a few lines later, before we've lost track of the owner of the text); yet a stack allocation would be awkward due to the way the code flow in the section works. In most cases, things aren't that simple due to the possibility of exceptions and the lifetime of the object being longer (and thus harder to track). Thus, despite being hugely widespread in most C codebases, this is relatively rarely used in the NetHack 4 codebase.

Allocation on chains

This is a technique used in both the NetHack 3.4.3 and NetHack 4 codebases in order to gain the advantages of dynamic allocation, while dodging many of the disadvantages. The basic idea is for all objects of a particular sort to be stored on an "allocation chain" (typically most easily implemented as a linked list whose metadata is part of the objects themselves, although any method of implementing a set would work); they're placed on a chain when allocated (there can be, and often is, more than one), and removed when freed. Any of the other memory management techniques in C can be used to track the chain itself (for instance, it could be a global variable, or possibly even stack allocated).

For this technique to work properly, the code needs to know points in time at which the chains should be empty (otherwise there's no real benefit to tracking them); this is the main restriction on this technique.

It works pretty well, though, when that restriction is not an issue:

It enables the code to monitor its allocations at runtime, making double-frees and memory leaks detectable (although not use-after-frees). While not as good as a compile-time guarantee, it's nonetheless helpful in catching errors in practice. (Incidentally, this check is responsible for what is probably NetHack 4's record number of panics in a day, trying to iron out bugs I'd introduced converting the item code to be fully allocation-chain based.)
It can handle exceptions (i.e. longjmp), just so long as after the jump, it's possible to determine, by analyzing objects on the chains, whether they should be allocated or not (the ones that shouldn't be get deallocated). Typically this is via arranging things such that the chains are supposed to be empty after an exception, meaning that anything that's found on the chains can be freed.

NetHack 3.4.3 uses an approximation of this technique for handing items (although probably for a different reason than memory safety). Each item is on one of a large number of chains; many are gamestate (e.g. "the chain containing the possessions of «monster»"). In NetHack 4, there are also chains for turnstate, and for items that are currently being moved around between locations (in case an exception happens after an object's been removed from one gamestate chain and before it's been placed on another; the length of time during which an item is on no chain is just a few lines, which can easily be checked to ensure they can't cause exceptions); getting that to correctly handle unusual situations (such as stacks being split or items being polymorphed) is what caused all the panics.

This technique's used in other places in NetHack 4, too. A good example is the menu code in the interface. NetHack 4 avoids a hard limit on the number of menus that can be drawn on the screen at once, meaning that a static buffer is unusable, and because menus need to move on the screen when the screen is resized (which could happen with any number of menus open), stack allocation would be awkward for visibility reasons. Instead, a statically allocated menu chain is used, holding currently open menus. It is possible to need to quit the current game with menus open (currently only due to hangup, but I'm planning to add a "save anywhere" feature before 4.3 is released), causing an exception. However, the game can deallocate the menus because exceptions in NetHack 4 always work by jumping back to a known state (just before the game was loaded) and replaying from there, and thus it's known that no menus should be on the screen, and they can all be removed.

Finally, a cautionary tale. NitroHack has an allocation chain for objects being communicated over the API from the server to the client. It's freed every turn. This loses most of the benefits of allocation chains, and makes it very difficult for the client to work out the lifetimes of the objects it receives (especially as there's no guarantee that there's even a game running, so the concept of "turn" may be meaningless). In NetHack 4, I changed it to be freed with every API call, which effectively works the same way as returning a static buffer from the client's point of view. This is better, but still unsatisfactory, really; perhaps I'll find a better option someday. The moral of the story is that allocation chains do not magically solve all the issues with dynamic allocation; sometimes they're appropriate, but sometimes you need to find something else.

Reference counting

No discussion of memory management in C would be complete without a mention of one of the most general methods for dealing with it. Instead of having one owner for an object, as in ownership semantics, you allow any number of owners for each object, and keep a count on the object of how many owners it has. When an owner no longer cares about an object, it decreases the count, and deallocates it if the count became 0. This is basically just a generalization of ownership semantics; it does nothing to help with exceptions, because you cannot know whether the reference count is correct or not after an exception.

There are some sorts of programs where reference counting is very useful; in particular, the sorts of programs most suited to functional programming (such as parsers, compilers, and other programs that mess with abstract syntax trees) often are best implemented using immutable objects that each hvae multiple owners, rather than giving each owner their own mutable copy. NetHack is, however, not really that sort of program, and there are few situations in which the power of reference counting is needed. (Arguably, it would even hurt; the most famous situation in which reference counting breaks down is reference cycles, and there are quite a few of those in the item handling code, albeit with clear rules for breaking cycles using weak references.)

There is one case where reference counting is used in NetHack 4; the (external) JSON parser library supports reference counting for JSON objects, and thus NetHack 4 is forced to interface with that. As far as I know, though, it isn't used for anything other than emulating a single-owner semantics but with more boilerplate.

Caller allocates

As mentioned earlier, one of the big problems with stack allocations is that they're not particularly useful for returning values from functions, and using static buffers for the purpose is a bad idea. Sometimes you can avoid the problems using dynamic allocation ("callee allocates"); but as seen above, that has its own issues.

One widespread solution to this problem is to have the caller do the allocation itself, then tell the function it's calling where it wants the return value to be placed. The callee thus has the memory it needs to write in already, and doesn't need to worry about its lifetime; that's the caller's job.

So long as the caller knows how large the memory it needs to allocate is at compile time, there are very few downsides to doing things this way. Any sort of allocation the callee could do, the caller can do too, and the caller has a better idea of what lifetime the data needs than the callee does. Additionally, there are two things that the caller can do that the callee couldn't; it can allocate objects on the stack using its own lifetime (which is very useful if it's planning to use them itself rather than return them to some other function); and it can use storage that was allocated by its caller, and so on ad infinitum.

The problem, of course, is that the caller absolutely needs to know the size of the object that's going to be returned to it, or it has no way to allocate an object large enough (other than guessing, which is prone to failure). For objects that inherently tend to have a variable length, such as strings, this doesn't work so well. The most infamous example is gets, which was once a standard C function; its purpose is to return a string from standard input, with no length limit. Given that the only way for gets's caller to be completely sure of how much memory to allocate would be if it had placed the string on standard input itself, defeating the whole point of the system call, it quickly became apparent that all useful uses of gets were security vulnerabilities, and the function was subsequently removed from the standard.

It is perhaps unfortunate, then, that NetHack 3.4.3 uses caller-allocates mostly for strings, one of the least suited sorts of data for the technique. There's a convention that the amount of space to allocate is BUFSZ (256), and callees are therefore supposed to check this to make sure that they don't overflow the caller's buffer, but the code usually seems to forget the check in practice.

Examples of better uses include coordinate handling code, such as dtoxy in NetHack 3.4.3 (that asks the caller to allocate a pair of coordinates), and arg_from_delta in NetHack 4 (which fills in an nh_cmd_arg structure provided by the caller; the function's purpose is to convert a direction given as x/y/z coordinates into a structure used to remember the context of a user's command so that it can be repeated correctly with control-A).

Techniques for handling memory of unknown size

So far, we've been looking at techniques that only work when the size of the object we're trying to allocate is known at compile time. Those are, unfortunately, the easy ones, despite there being such a range of possibilities to handle it. If it isn't known how large the object will be, things can get a lot worse.

Variable length arrays

Variable length VLAs are a feature added to C in C99, and thus often considered a "new" feature even though C99 was released approximately 15 years ago. (Something similar to VLAs, yet which works on many older compilers, is alloca, which is not defined in the C standards but is nonetheless a commonly provided extension. It doesn't mix well with VLAs, though.) The idea is pretty simple: if a C implementation uses a stack for handling automatic variables, then it can allocate a variable amount of memory as easily as a fixed amount of memory, so why shouldn't that capability be available to user programs?

The big advantage of VLAs is that you get all the advantages of stack allocations, except now you don't need to know how large your object is at compile time, so long as you know how large it is some time before you need to allocate it. VLAs are used heavily in code newly written for NetHack 4, as a result; in particular, the exception safety of stack allocations is something that is often much more awkward to gain any other way.

In fact, pretty much our only answer to "why aren't you just using a VLA" is "we can't":

Although not as odious a requirement as knowing the size of the allocation at compile time, it's often still impossibe to know the size of the object even at the time you try to allocate it. For instance, not even VLAs will make gets work. Many of the following sections will be discussing various workarounds to this.
VLAs are still stack allocations, and so they still can't have a lifetime larger than the function that allocates them. So they're still useless for returning values from functions.

These problems both come up surprisingly often in practice.

Buffer bumping

Just as VLAs are the variable-length version of stack allocations, "buffer bumping" is my name for the variable-length version of static allocations. (I'm not sure if there's an official or widely-used one.) This is seen quite widely in NitroHack code; I'm not aware of it being used in NetHack 3.4.3. I've mostly got rid of it for NetHack 4, usually via finding a way to avoid static allocations entirely, although it is nonetheless appropriate in some cases (such as the "Things that are here:" display in the interface, which needs to be global for visibility reasons; being part of the interface, there's no need to worry about ensuring that it ends up in the save file).

The basic idea is to have a statically allocated pointer, initially NULL. Whenever you want to write to the static storage in question, you check to ensure it's allocated and large enough. If it isn't, you reallocate it larger (free followed by malloc is the most efficient way to do this if you don't care about its old contents; realloc copies the contents over from the old location to the new larger location, and thus is useful if the old contents are still relevant). Thus, the buffer never actually gets smaller, and ends up still allocated at the end of the program, at which point it's deallocated by the operating system. A variation is to resize the memory smaller at any point where less of it is needed, or perhaps deallocate it altogether (setting the variable back to NULL) once it's no longer relevant; this is probably marginally slower and probably marginally less wasteful of memory, but the size of both changes tends to be so small that it's very unlikely it would make a difference in practice.

One nice advantage of this method is that you don't need to know the size of the object you're storing in advance; you can bump the buffer halfway through filling it if you have to.

There are some problems with doing things this way:

In addition to the standard advantages of static allocation, you have all the standard drawbacks of static allocation, too. You can have a better idea of when the value is meaningful (perhaps making errors in resetting it easier to detect), but that's about it; you still have the uncertainty about lifetimes, and the issues with ensuring that the variable is saved correctly.

These disadvantages can be mitigated the same way as they can be mitigated with regular statically allocated variables.
There's also a new problem, though. With statically allocated variables, if you pass pointers to them around, those pointers will at least remain valid forever. Buffer bumping, because it resizes memory, can sometimes end up giving it a new address; I hope nobody was saving a pointer to it! Worse, the new address is usually but not always the same as the old one, making this difficult to debug, because simple tests will normally not notice a problem, and tests that do observe a problem may fail to reproduce it.

This can be worked around using an extra layer of indirection, passing around a pointer to the statically allocated pointer itself rather than to the memory it points to, but the code gets increasingly hard to read and maintain as a result.

You could theoretically do this for rotating buffers, too. That might arguably be an improvement on the current object naming code, but the result would be both hideous and massively fragile.

Dynamic reallocation

This is the standard method for dealing with data of an unknown size. You start off with just a small dynamic allocation (or sometimes even an effective 0-byte allocation using a null pointer); and keep track of how much memory is allocated, in addition to tracking a pointer to the memory itself. Then you start doing whatever calculations you need to do in order to construct the object itself. If you ever find there isn't enough memory, you just make it larger, using realloc to copy the old contents over.

For performance reasons, it helps to resize the object in large steps rather than small steps; reallocating it one byte larger every time round a loop can lead to quadratic performance. When the expected sizes are large, reallocations typically multiply the allocated size by a factor (often 2 or 1.5), in order to ensure that the performance remains linear no matter what. When the expected sizes are small, you can simply round up to a multiple of some number that's larger than commonly expected sizes, and have efficiency in the common case and code that continues to work even in the rare cases where you're dealing with more data than expected. Here's an example from the NetHack 4 codebase (this is part of a strcat implementation for dynamically reallocated memory, quite similar to StringBuilder in Java):

if (strlen_first + strlen_second >= *first_alloclen) {
    int first_was_null = !*first;

    *first_alloclen = ((strlen_first + strlen_second + 1) / 256) * 256 + 256;
    *first = realloc(*first, *first_alloclen);

    if (first_was_null)
        **first = 0;
}

The code has to handle both first_alloclen and first itself; it also needs to maintain a NUL terminator on first, because strcat works on NUL-terminated strings.

The advantages of this technique:

If dynamic allocation is appropriate for whatever you're doing (in NetHack 4, that normally means code that's entirely about calculation and doesn't cross the client/server boundary or do communication with the user), then all you have to do is add an extra variable to store the allocation length (stored the same way as your pointer), and your reallocations will work fine. This is especially useful if you would be using dynamic allocation anyway, but stack allocations are not hard to convert if their exception-related properties aren't needed.
It works quite neatly with caller-allocates; so long as you can require the caller to do a dynamic allocation, and add an extra level of indirection to the call, you can have the callee do the reallocation. (This is what's going on with the *first and *first_alloclen in the code sample above.) That said, if the callee can do reallocation, there often isn't much need to make the caller do the original allocation; the main benefit is if you need mutiple callees to do reallocations on the same buffer.

The disadvantage, of course, is that you're doing a dynamic allocation. Even when you can use ownership semantics (which is "usually but not always"), it's not always possible to guarantee that your allocation won't end up leaking due to an exception. As such, this technique is not that widely used in NetHack 4, although it does show up occasionally.

Count and copy

Using a caller-allocates system has several advantages, but one big disadvantage: the caller has to know the size of allocation to perform. However, there is one way in which it can frequently find out: ask the function which is eventually going to fill the buffer it allocates.

The basic idea of count-and-copy is to provide functions with an argument that specifies whether to count the size of their output, or to copy their output into a provided buffer (conveniently, the argument can be the buffer itself, setting it to NULL to represent a count). This works particularly well when an entire call hierarchy uses it, because the callee can go and ask the functions it calls how much data they plan to return, and use it to determine how much data it will return, without needing to actually generate the data itself. In the most appropriate situations for count-and-copy, the count can be very fast, and thus not cause much of an efficiency loss (or in extreme circumstances even an efficiency gain) compared to dynamic reallocation; and it can achieve this with all the advantages of caller-allocates at the same time! (Most likely, the caller will use a VLA to do the allocation, although it could allocate dynamically if it want to, or even use buffer-bumping.)

Count-and-copy is not so good if the callee needs to effectively calculate the object it's going to return just to be able to determine how large it is, and it's horrific if the callee needs to allocate memory to be able to calculate that. As such, it isn't a technique that's appropriate in all situations. It also suffers from longer code than some other techniques, and potentially repetitive code too (you have to call a function twice to use it, and the two calls cannot be wrapped in a function because that would mess up the scope of the object being allocated).

This technique turns up in an unusual place in NetHack 3.4.3: the game save code. This is particularly unexpected because unlike in NetHack 4, the save code in 3.4.3 does not have a return value apart from "the save worked, exit the game" or "the save failed, keep playing"; the save file itself is written to disk. However, NetHack is even older than C99 (in fact, even older than C89), and dates from an era when hard disks might not be installed and floppy disks were pretty small; there was a real risk that the save file would fail to fit on the remaining space on the drive, and thus counting the amount of available space before copying onto the disk ensured that the save would not fail halfway through (perhaps corrupting the gamestate in the process, because the save code is also responsible for memory deallocation in that version).

A more typical example is the vsnprintf function from the C standards (which libuncursed, a rendering library I wrote for NetHack 4, uses to interpret its equivalent of a printf statement). There's a vsprintf function that uses a caller-allocates system, but although the caller sometimes knows the maximum length that can be produced from a given format string, libuncursed would have to reproduce much of the logic of printf to be able to work it out for an arbitrary format string provided by its users. vsnprintf uses a count-and-copy interface instead, allowing libuncursed to determine how large a buffer it needs to provide for the next call to vsnprintf.

Sadly, it turns out that the version of vsnprintf available in the C toolchain for Windows that I'm using is buggy; it returns an error code, rather than the buffer size, if the buffer is absent or too small, which is behaviour forbidden by the standard. Thus, in libuncursed, I had to write a loop that detects this situation and tries successively larger buffers until the correct buffer size is determined. I guess that's better than gets, at least…

Callee-cleans

Sometimes, you have no option but to use dynamic reallocation for obtaining the memory for your object. For instance, NetHack creates a lot of menus, often with a range of different functions responsible for adding various items to the menu; and although count-and-copy is theoretically usable for that, the code would become something of a mess, especially as counting menu items is often quite inefficient.

However, menus cannot be dynamically allocated either; the lack of exception-safety is a problem, because "showing a menu to the user" is a particularly likely situation to get an exception. Not only is it both possible, and reasonable in practice, to get a hangup there, but if the menu needs to be closed asynchronously without user interaction, NetHack 4 implements that with exceptions too (for instance, this can happen if you're watching someone else's game and they close a menu).

Although the situation seems unsolvable (or worse, solvable using global variables), there is a solution, albeit a reasonably absurd one. Here's the relevant part of NetHack 4's menu code:

/* Make a stack-allocated copy of the menulist, in case we end up longjmping
   out of here before we get a chance to deallocate it. */
struct nh_menuitem item_copy[ml->icount ? ml->icount : 1];
if (ml->icount)
    memcpy(item_copy, ml->items, sizeof item_copy);

/* ... some other initialization goes here ... */

dealloc_menulist(ml);

/* ... main body of the function goes here ... */

The idea is reasonably simple. From the caller's point of view, the API is simply "you pass a dynamically allocated object, and transfer ownership to the callee". The callee responds by copying the object into a VLA, and then freeing the original (which it can do, because it has ownership); thus, by the time the code reaches the point at which exceptions can happen, the dynamically allocated memory is long gone.

One potential alternative is to make the caller do the copying into a VLA. That is slightly more general, in that it allows the caller to have access to the object after the call is over (in the typical case when there was no exception). However, it's much worse in terms of code duplication, because a VLA is needed at every callsite rather than merely in the function being called.

Despite performing its job well, I avoid this approach in NetHack 4 unless the alternatives are even worse; copying an object purely for the purpose of moving it from the heap to the stack is wasteful in the vast majority of cases.

Avoiding allocations in the first place

As can be seen above, memory allocation in C is something of a mess, and there is no simple and general solution that works for everything. (This is, incidentally, one of the reasons why C is famous for being more efficient than other languages; in other languages, it's the compiler managing the memory, not you, and being a computer and thus uncreative it will inevitably attempt to use the same solution for everything, or at least choose from a relatively small set.)

Sometimes it's simplest to dodge the problem. Obviously, doing that is not always possible. In this section, we consider some methods that can be used the rest of the time.

Using indexes into lookup tables

Sometimes, you need to pass an argument to a function, or return a value from a function, but there are only a finite number of possible values. If the number is both finite and small, then it can make a lot of sense to have an explicit list, and simply pass an index into that list.

This is most commonly done with strings. If you don't have to allocate the memory for a string, its variable-length nature is not a problem. Instead of returning memory which is allocated for the purpose of holding a string, you can just return a pointer to the tring in the program's data section itself; the pointer is effectively an index into your program's memory image. Of course, this only works when all the possible return values are known in advance, or (if you're prepared to handle the lifetime headaches) static buffers.

A good example of this in the NetHack codebase is the hcolor function, whose purpose is to return a nonexistent color to hallucinating players, and a default color to players who are not suffering from hallucination. Thus, a black object would be described as hcolor("black"); normally this would just be "black", but when hallucinating, it might be "bluish-orange" or "sky blue-pink".

The function's general behaviour might require a static buffer or dynamic allocation if it did something more complex, but because the list of nonexistent colors is hard-coded, and the default color is also hard-coded by each caller, there's no need for messing about with memory allocation at all; the caller can just provide a constant string as the argument, and get one in return.

This technique can also be used with things other than strings; monster descriptions are one common example (in fact, the game normally works out which species a monster belongs to by converting the pointer to its monster description back into an index, relying on the fact that it's just a pointer into an array rather than allocated memory and that C is one of the very few languages that allows pointer arithmetic).

Continuation passing style

As can be seen above, the most difficult case for handling memory allocations is in function return values; stack allocations are the best general-purpose allocations, but have lifetime issues that prevent them being returned from functions. However, frequently the only thing the caller does with the return value from a function is to pass it to another function.

This suggests an alternative style of return value passing; instead of the callee trying to pass the return value back to the caller, the caller can instead tell the callee what it wants to do next with the return value. This is known as continuation passing style, or CPS. Here's a hypothetical example:

/* Without CPS */
char foo[7] = "foo";
strcat(foo, "bar");
puts(foo);

/* With CPS */
cps_strcat("foo", "bar", puts);

In the first version of the code, the caller needed to allocate a buffer to hold the concatenation of "foo" and "bar", so that it could pass it to puts; and it would be very easy to get the size of the buffer wrong. The CPS version is much simpler; the "foobar" needs to be allocated somewhere, but it's most likely going to be a stack allocation or dynamic reallocation somewhere inside cps_strcat, which at least has the benefit of being able to look at its own internals to measure the size of the resulting string. Then once it's produced the string, it passes it on to puts.

It should be noted that the allocations could be avoided altogether if some other representation were used for the strings. C typically uses a sequence of characters contiguous in memory. If instead, for instance, a string were represented by a loop that calls an arbitrary (passed-in) function once for each character in the string, no variable-length allocations would be needed in the above example. That representation is very alien to C programmers, though, and has effectively no library support; additionally, although it works well for strcat, it can struggle doing more complex string optimizations. (Incidentally, I first saw it in an embedded system with less than a kilobyte of memory, and no ability to dereference pointers to ROM. Storing strings in RAM would have been wasteful, so it used that string representation in order to fit the string into ROM while simultaneously allowing arbitrary operations to be performed on the string.)

The CPS technique is very widely used in implementing functional languages, but it falls down somewhat in C. The main issue is the lack of closures, meaning that an annoying number of extra levels of indirection need to be used in order to be able to pass extra arguments to the function that is given the return value. (Although gcc supports stack-allocated closures as an extension, this is not portable to C compilers in general.) The most promising solution to this is to come up with a kind of domain-specific language for describing what to do with function return values (much the same way as printf has a domain-specific language for its format strings). As such, the technique seems most useful when it is only used across a small group of callees, which all co-operate. (Can you tell I'm strongly considering trying to use this approach for constructing messages to be shown to the user in NetHack 4? Whether I do or not will likely depend on how readable the resulting code is and how much effort converting the existing code will take.)

CPS is also useless at storing values long-term. You might be able to construct a string to show as a message that way, but it's never going to work as a method of storing strings in a save file.

One other thing to note is that unlike in functional language implementations, in C, you do not want to use continuation passing style everywhere; especially if a function has a simple return value (such as an integer), you want it to return normally. The reason is that a continuation passing style "return" is not a return from C's point of view, and thus does not free stack allocations. This is fine and desirable for what you use CPS for, but if there are no normal returns involved, nothing on the stack will ever be deallocated, and your program is effectively just one giant memory leak.

Call-by-name evaluation

CPS is useful for avoiding the need to allocate memory in a way that can be returned from a function, at the cost of forcing the memory to be made suitable for use as an argument instead. The noticeable feature here is that the return value is replaced with a callback. The same technique can be applied to the arguments; instead of allocating memory to serve as a function argument, you instead supply a callback that can be used to obtain the argument. This is known as the "call by name" convention; some programming languages (e.g. Algol 60) actually use it by default (the compiler constructs the callbacks transparently without needing user interaction), but it can be emulated in "call by value" languages like C.

The main disadvantage of call-by-name is that it is often inefficient in practice, and can be hard to reason about. However, it has a couple of advantages for practical use: it is up to the callee whether the argument is ever evaluated at all (meaning that dead code can be eliminated dynamically at runtime), and more importantly, there is never a need to store the value of an argument anywhere. (My day job is actually involved with call-by-name compiler design; it has additional advantages when reasoning about programs, because control structures like if and while cannot be expressed as call-by-value functions but can be expressed as call-by-name functions, meaning that fewer special cases are needed to be able to reason about arbitrary programs.)

For the purposes of this blog post, what we're really interested in is the way in which the use of call-by-name gets around the need for temporaries to store data. Here's some code from the current NetHack 4 codebase, that prints a message when an object breaks, reminding the player that they can name it even after it's broken (unlike NetHack 3.4.3, which prompts for a name immediately because the player will not have a chance to name it later):

char buf[BUFSZ];

/* ... */

else 
    sprintf(buf, "(You can name %s from the item naming menu.)",
            an(xname(&otemp)));
pline("%s", buf);

The current version of this code uses a bunch of fixed-size temporaries; we have buf declared to hold the output of sprintf, but an and xname also need temporaries for their return value (they use a shared rotating buffer, at current). Thus there's a lot of data storage required. (This code is a little inefficient even given its memory allocation style: it could be simplified by simply replacing the sprintf calls with pline calls, avoiding the need for buf, but still requiring storage from an and xname. That would avoid the need for the doubled-up formatted prints.)

Imagine what this code would look like in call-by-name:

struct cbn_string *buf;

/* ... */

else 
    buf = strcat3("(You can name ", an(xname(&otemp)),
                  "from the item naming menu.)");
pline(buf);

The code is pretty similar from the point of view of the caller, which is a huge advantage, but would work very differently from the point of view of actual control flow. The buf = line isn't doing any calculations immediately; instead, it's building up a data structure. In this case, strcat3, an, and xname would be macros that create a structure that describes the string to evaluate, probably expanding to something like the following C99 code:

buf = &(struct cbn_strcat3){
    CBN_STRCAT, &(struct cbn_literal){CBN_LITERAL, "(You can name "},
    &(struct cbn_an){CBN_AN, &(struct cbn_xname){CBN_XNAME, &otemp}},
    &(struct cbn_literal){CBN_LITERAL, "%s from the item naming menu.)"}};

We now have a data structure that describes which object naming functions need to be called, with hardly any change in the original code (our only change was to replace sprintf, which takes a variable number of arguments and thus is very difficult to handle via macros, with a hypothetical strcat3 that concatenates exactly three strings). Then, "all" pline has to do is to evaluate this structure into an actual string so as to be able to show it to the user. (We use a data structure as the callback, rather than a C function, because C is bad at creating closures.) Thus, this technique does not get rid of the problems with passing data around entirely; the implementations of an and xname will still need to handle the strings somehow. It does, however, localize the problem of memory handling to within the object naming functions, removing the need to manage memory manually from all the places that call them.

There are some drawbacks to this approach, though. One is efficiency; call-by-name is noticeably slower than call-by-value on every benchmark I've tried (especially because copying a name means you end up evaluating it twice, as opposed to copying a value that has already been evaluated). Another is that the sugar given above requires some pretty modern compilers to work properly; although the result of expanding the macros works in C99, and thus should be supported by any compiler less than ten years old or so (although surprisingly many compilers don't care about C99 support), actually writing the macros themselves needs C11 support so that they can distinguish a pointer from a string literal. C11 is less than three years old right now, and support for _Generic, the required feature, just isn't there in practice; the version of gcc in my distribution's repositories does not recognise the keyword at all, and clang compiles it correctly, but produces incorrect warning messages, making development awkward.

The other thing to note is that the sugar given above constructs the names themselves on the stack; they have a fixed size known at compile time, so that isn't an issue, but attempting to return them from a function will fail. As a result, any function that would return a name has to be converted to call-by-name itself, and so the style rather tends to infect its callers, meaning that it can be a major change to a program.

Finally, just like CPS, this method tends not to work for storing data in a save file. Unlike with CPS, it's entirely possible to work out a method to save a name; the problem is that if the data needs saving at all, the name is not enough information to reproduce it. Something like getlin, which prompts for a string from the user, is very unsuitable for call-by-name, because there would be a new prompt for a string every time it were needed, rather than remembering the string that the user gave last time.

Neither CPS nor call-by-name are used in NetHack 4 (although the rendering library does something vaguely similar to call-by-name in order to remove code that's unnecessary on the terminal in use, such as converting characters to UTF-8 on a non-Unicode terminal). Although I'm considering using them in places that particularly benefit, this work is still highly experimental right now, and unlikely to appear in 4.3.