UNIPROCESSOR GARBAGE COLLECTION TECHNIQUES -- SUMMARY Alexander Mosolov This paper surveys existing garbage collection techniques for uniprocessor machines. The following methods are discussed: REFERENCE COUNTING Simply involves counting the number of pointers to a particular object. When the reference count is 0, the object can be safely reclaimed as garbage. Advantages: Naturally incremental -- important for real-time applications. Disadvantages: Fails for cyclic data structures. Cost proportional to amount of work done by program. MARK-SWEEP COLLECTION Distinguish live objects from garbage -- done by traversing pointers from the root set, marking found objects, and then reclaiming everything that wasn't marked. (The root set is the set of all visible pointers, ie, global ones and those in the current stack) Disadvantages: Difficult to avoid fragmentation -- reclaimed space is interspersed with live objects. Cost proportional to size of heap, as all of heap needs to be traversed during reclamation. Locality of reference -- because of fragmentation, new and old objects end up interleaved. This is very bad for virtual memory systems. MARK-COMPACT COLLECTION Similar to mark-sweep collection, except to reclaim the garbage, all live objects are moved to one side of the heap, so they form a continuous chunk of allocated memory. Advantages: Fixes fragmentation and locality-of-reference issues found in mark-sweep collection. Allocation of new objects is easy. Disadvantages: Cost still proportional to heap size, with a potentially higher constant of proportionality then mark-sweep. COPYING GARBAGE COLLECTION Copying garbage collectors periodically move live objects to one area, and the area from which they were moved can be reclaimed as garbage. STOP-AND-COPY (SEMISPACES) The heap is divided into two semi-spaces, one called the fromspace, one the tospace. Allocation is done from the fromspace. Once there's no more room in the fromspace, the program is stopped, all live data is copied to the tospace, and the semispaces switch roles. Advantages: Can be made very efficient just by increasing the heap size -- the larger the heap, the longer the time between collection, the larger the likelyhood that an object will become garbage before it ever needs to be copied. NON-COPYING IMPLICIT COLLECTION Similar to copying garbage collection, except instead of having separate semispaces, the allocated space and the free space are kept together in a linked list, so no actual copying of data is necessary. TRICOLOR MARKING Is an abstraction useful for incremental garbage collections. All nodes are initially colored white (subject to collection), and as they are traversed, they are colored black (nodes to be kept). Nodes that are in the process of being expanded are colored grey. It is important that no black object hold a pointer to a white one, as this can break the algorithm. The collection stops when there are no more reachable white nodes, and all the remaining white nodes are garbage. 2 ways dealing with no black nodes point to white nodes issue: Read barrier: If the program accesses a white node pointer, color that node grey. That way, no black node can be made to point to a white node since the program will never have a pointer to a white node. Write barrier: If a pointer to a white object is written into a black object, color the black object grey. GENERATIONAL GARBAGE COLLECTION Takes advantage of the fact that most objects tend to either live for a very short of a very long time. This is basically a copying collector with multiple levels, if an object survives collection some number of times, it's moved to a higher level, where collection is less frequent. Any number of levels may be used. Advantages: More efficient than simple copy collection. Disadvantages: Need to deal with inter-generational pointers.