Monday, November 3, 2014

Previous space optimizations of the BDW GC

http://www.cs.rice.edu/~javaplt/311/Readings/pldi93.pdf

- Conservative collectors have come in many shapes and sizes. Some track pointers in the heap precisely and are conservative only about the roots, while others treat heap data conservatively as well.

- Severe space leaks in conservative collectors have been noted in the past.

- Negative performance results associated with conservative collectors are likely due to space leakage and associated problems such as increased memory pressure and a greater likelihood of thrashing.

- Removing low-hanging fruit, such as static values which point into the heap area, can lead to great improvements.

- Only paper (at the time of this one) analyzing the pitfalls of conservative GC: http://dl.acm.org/citation.cfm?id=79028

- Larger heaps face the problem of pointer misidentification to a greater degree, because they span a larger address range that arbitrary values can fall into.

- Simple trick: position the heap so that only values which are valid, well-aligned pointers in the language can refer to heap objects. The allocator commonly also avoids giving out memory cells with trailing zeroes, to avoid considering one value as a prefix of another. Both tricks can waste some space (a minimal filter is sketched below).
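
-- A minimal sketch of the resulting candidate-pointer filter, assuming hypothetical heap_start/heap_end bounds and word alignment (names are illustrative, not taken from the paper or the BDW collector):

    #include <stdint.h>

    /* Hypothetical heap bounds, fixed when the heap is positioned. */
    static uintptr_t heap_start, heap_end;

    /* A candidate root value is treated as a possible pointer only if it
       falls inside the heap range and is aligned like a real pointer. */
    static int looks_like_heap_pointer(uintptr_t v) {
        return v >= heap_start
            && v <  heap_end
            && (v & (sizeof(void *) - 1)) == 0;   /* word-aligned only */
    }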

- In languages which allow unaligned references, 'unreasonable' garbage retention has been observed with the collector.

- Systematic technique:
1) Ensure that collections happen at regular intervals
2) If you encounter a value which points into the heap area but not to any allocated cell, record or "blacklist" it so that nothing allocated there later can be mistaken for live data in a future collection.
3) Do not allocate at blacklisted addresses.

-- This prevents static data from being considered a reference and holding onto garbage for the entire duration of execution.

-- This is implemented by blacklisting entire pages rather than individual addresses, usually as a bit array indexed by page number (see the sketch below).
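
-- A rough sketch of page-level blacklisting along those lines, using a bit array indexed by page number; the page size and all names here are assumptions for illustration, not the actual BDW GC data structures:

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096u
    #define MAX_PAGES (1u << 20)              /* pages covered by the heap range */

    static uintptr_t heap_start;              /* base of the collected heap (assumed) */
    static uint8_t blacklist[MAX_PAGES / 8];  /* one bit per heap page */

    static size_t page_of(uintptr_t addr) {
        return (addr - heap_start) / PAGE_SIZE;
    }

    /* Called when the marker sees a value that points into the heap range
       but not at any allocated object: remember the whole page. */
    static void blacklist_address(uintptr_t bogus) {
        size_t p = page_of(bogus);
        blacklist[p / 8] |= (uint8_t)(1u << (p % 8));
    }

    /* The allocator consults the bit array and skips blacklisted pages. */
    static int page_is_usable(uintptr_t page_addr) {
        size_t p = page_of(page_addr);
        return (blacklist[p / 8] & (1u << (p % 8))) == 0;
    }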

-- Note: Static analysis of a number of executables found that if all interior pointers are considered valid, it becomes difficult to allocate objects larger than about 100 KB without pinning them at special memory locations or falling back to explicit allocation.

- A significant problem stems from the fact that variable liveness is not known to the collector. If a variable is not used after a given point, an optimized stack frame may never zero out the old location of the reference, so the stale pointer can persist for a while (a small illustration follows below).

* DWARF information might help here since variable liveness is marked with a 'high/low program counter' attribute.
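
-- A tiny illustration of the problem, together with a common programmer-side workaround of clearing the local once it is dead. GC_MALLOC is the usual BDW GC allocation macro; the helper functions and sizes are made up for the example:

    #include <gc.h>                                     /* BDW GC public header */

    static void use_buffer(char *p)      { p[0] = 1; }  /* stand-in work */
    static void long_running_phase(void) { }            /* stand-in work */

    void process(void) {
        char *big = GC_MALLOC(64 * 1024 * 1024);        /* large temporary */
        use_buffer(big);

        /* 'big' is dead from here on, but the compiler is free to leave its
           old value sitting in a stack slot or register, where a conservative
           scan will keep the whole 64 MB buffer alive until the frame is gone. */
        big = NULL;   /* best-effort mitigation; a copy may still survive in a register */

        long_running_phase();
    }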

- Proposed solutions
-- Garbage collection / allocation work that does not zero out references can contribute to this problem.

-- When the collector notices a lot of nonzero, left-over data below the current stack frame, it should zero this out (see the sketch below).
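
-- A very rough sketch of that idea, loosely in the spirit of what the BDW collector does: zero a chunk of memory just below the current frame so stale values left by earlier, deeper calls stop looking like roots (assumes a downward-growing stack; the size and name are invented):

    #include <stddef.h>

    void clear_dead_stack(void) {
        volatile char scratch[8 * 1024];   /* lives below the caller's frame */
        for (size_t i = 0; i < sizeof scratch; i++)
            scratch[i] = 0;                /* volatile: stores are not optimized away */
    }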

- Such stale or misidentified references can unnecessarily lengthen the lifetime of objects, which interferes with generational conservative collectors.

- The worst case for pointer misidentification is dramatically worse than the average case. If the heap holds a highly interconnected graph, a single false reference can retain a significant amount of memory.

* Since the number of roots is typically small, very large heaps will tend to contain a large number of in-heap pointers. In such a case, the worst case becomes much more likely.

-- Ex: A reference to an early element of a lazy list or queue that grows without bound in one direction, but has only a finite number of accessible elements, can lead to unbounded heap growth and eventual memory exhaustion (sketched below).
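
-- A small C sketch of that failure mode: a queue built as a singly-linked chain, where a single misidentified word that happens to equal the address of an old, already-consumed node keeps every later node reachable (names invented; GC_MALLOC is the BDW GC allocation macro):

    #include <gc.h>

    struct node {
        struct node *next;
        int          value;
    };

    /* Producer appends to the tail; consumers advance past old nodes and
       logically discard them. GC_MALLOC returns zero-filled memory. */
    struct node *push(struct node *tail, int v) {
        struct node *n = GC_MALLOC(sizeof *n);
        n->value = v;
        if (tail) tail->next = n;
        return n;
    }

    /* If any stack slot, register, or integer happens to hold the address of
       an early node, the conservative scan treats it as live, and the 'next'
       chain then retains every later node as well; the queue can grow without
       bound even though the program only ever uses its tail. */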

-- Structures with many pointers embedded in the data itself, rather than in 'wrapper' objects, risk retaining much more garbage in the highly-connected failure case.

-- Practices which reduce the risk of the worst case are the antithesis of common C/C++ data access patterns.

- Overall, data retention is not a problem in most cases, and changes in programming style can alleviate the remaining ones. Memory usage tends to be higher and there are other downsides to conservative collection, but garbage collection overall tends to perform better than manual memory management.
