Adventures In Garbage Collection: A Control-Theory Approach to Heap Resizing

1 Heap Sizing

Programs frequently have separate phases with very different memory behavior, so no static heap-sizing heuristic is acceptable.
Control theory offers more formalism than trying to glue random, effective heuristics together.
Paper describes the idea of having a GC controller and increasing the heap size until the heap thrashing decreases and a throughput goal is hit.

If heap too large, paging will impact performance. If too small, will thrash heap.
Plotting running time against heap size gives a roughly parabolic (U-shaped) curve. Choosing a heap size ahead of time by benchmarking tends not to work, though.
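A toy model, not taken from the paper, reproduces the U-shape: GC cost rises as the heap approaches the live size, and paging cost rises once the heap exceeds physical memory. The function name `run_time` and every constant are invented for illustration.

```python
def run_time(heap_mb, live_mb=100.0, ram_mb=400.0, base_s=10.0):
    """Toy running-time model (hypothetical constants, not measured data)."""
    if heap_mb <= live_mb:
        raise ValueError("heap must exceed live data")
    # GC work: collections become more frequent as free space shrinks.
    gc_s = 5.0 * live_mb / (heap_mb - live_mb)
    # Paging penalty: applies only to the part of the heap beyond RAM.
    paging_s = 0.1 * max(0.0, heap_mb - ram_mb)
    return base_s + gc_s + paging_s


sizes = [110, 150, 200, 400, 600, 800]
times = [run_time(s) for s in sizes]
best = sizes[times.index(min(times))]  # heap size with the lowest modeled time
```

In this model both extremes are slow and the minimum sits where the heap just fits in RAM. Real programs move that minimum as their phases change, which is the argument against a fixed choice.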

Existing VMs tend to increase heap size too optimistically, and decrease it more slowly than could be desired.
Jikes RVM
1. After each GC, manager determines new "resize ratio" based on short-term GC overhead (ratio of GC time to time since last GC) and ratio of heap that was live
  1. The pair is used as an index into a lookup table of hard-coded values found by trial and error
2. Not explicitly goal-oriented in terms of throughput or heap size
3. Doesn't take history into account, can flip-flop between states needlessly when live data size fluctuates back and forth quickly.
4. There is no evidence that these hard-coded heuristic thresholds suit different GC algorithms
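A minimal sketch of the table-lookup scheme described above. The bucket edges, the table values, and the name `resize_ratio` are hypothetical; the actual table in Jikes RVM differs.

```python
import bisect

# Hypothetical bucket edges and ratios, invented for illustration.
GC_OVERHEAD_EDGES = [0.05, 0.15, 0.40]  # GC time / time since last GC
LIVE_RATIO_EDGES = [0.30, 0.60, 0.80]   # live bytes / heap bytes
RESIZE_TABLE = [
    # live: <=0.30 <=0.60 <=0.80 >0.80
    [0.90, 0.95, 1.00, 1.10],  # overhead <= 0.05
    [1.00, 1.00, 1.10, 1.20],  # overhead <= 0.15
    [1.00, 1.10, 1.20, 1.40],  # overhead <= 0.40
    [1.10, 1.20, 1.40, 1.50],  # overhead >  0.40
]


def resize_ratio(gc_time, time_since_last_gc, live_bytes, heap_bytes):
    """Look up a resize ratio from the two post-GC measurements."""
    overhead = gc_time / time_since_last_gc
    live = live_bytes / heap_bytes
    row = bisect.bisect_left(GC_OVERHEAD_EDGES, overhead)
    col = bisect.bisect_left(LIVE_RATIO_EDGES, live)
    return RESIZE_TABLE[row][col]
```

The function is stateless, which illustrates point 3: a live ratio hovering around a bucket edge (0.59 versus 0.61 here) switches the answer back and forth on every collection.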
HotSpot
1. Has a lot of tunable "ergonomics," and will resize heap in a best-effort attempt to meet all priorities.
2. Resizing rate much less flexible, ratio bounded between 0.95 and 1.2. Early in execution, the heap is allowed to grow at ratio 2.
3. Vengerov has questioned whether the resize policies actually ensure progress towards goals.

Authors chose to treat the entire GC system as a black box and tune it using a proportional-integral-derivative (PID) controller.
A target GC overhead is the goal used to resize the heap.
Only consider resizing after GC to avoid confounding variables in controller training.
PID Controller
1. Why PID is appropriate
  1. Doesn't require a model of the system.
  2. Promises eventual zero steady-state error
  3. Takes history into account.
PID Controller Theory
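1. PID takes in the error between the target and the measured value, and responds by outputting a control signal; its responsiveness is configured through three "gains" (proportional, integral, derivative)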
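For reference, the textbook continuous-time PID law and a common discrete form. This is the standard formulation, not copied from the paper. Mapping r to the target GC overhead, y to the measured GC overhead, and u to the heap resize signal is an assumption based on these notes.

```latex
e(t) = r(t) - y(t), \qquad
u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{d e(t)}{d t}

% Discrete form, sampled once per step of length \Delta t:
u_k = K_p\, e_k + K_i \sum_{j=0}^{k} e_j\, \Delta t + K_d\, \frac{e_k - e_{k-1}}{\Delta t}
```

The integral term carries the history and is what removes steady-state error; the derivative term reacts to how fast the error is changing. The sign of the error is flipped when raising u lowers y, as with heap size and GC overhead.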
Implementation
1. Built on Jikes RVM/MMTk
2. Uses bytes allocated as a proxy for time
3. Heap growth manager was modified to use the PID controller
4. Source: http://sourceforge.net/p/jikesrvm/research-archive/40/
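A rough closed-loop sketch of the idea: a generic positional PID driving heap size toward a target GC overhead. This is not the paper's code. The plant `toy_gc_overhead`, the gains, and the choice of one controller step per GC (rather than bytes allocated) are all assumptions made for illustration.

```python
class PID:
    """Textbook positional PID; a generic sketch, not the paper's code."""

    def __init__(self, kp, ki, kd, target):
        self.kp, self.ki, self.kd, self.target = kp, ki, kd, target
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured):
        # Reverse-acting: overhead above target should grow the heap,
        # so the error is measured - target.
        error = measured - self.target
        self.integral += error  # accumulated history (I term)
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def toy_gc_overhead(heap_mb, live_mb=100.0):
    """Hypothetical plant: overhead rises sharply as free space shrinks."""
    return 0.1 * live_mb / (heap_mb - live_mb)


def simulate(collections=200, initial_heap_mb=150.0, target=0.05):
    pid = PID(kp=100.0, ki=200.0, kd=10.0, target=target)  # hand-picked gains
    heap = initial_heap_mb
    history = []
    for _ in range(collections):
        overhead = toy_gc_overhead(heap)  # measured after each GC
        history.append((heap, overhead))
        # Controller output is an offset in MB from the initial heap size;
        # the floor keeps the heap above the toy live size.
        heap = max(initial_heap_mb + pid.update(overhead), 110.0)
    return history


history = simulate()
final_heap, final_overhead = history[-1]
```

In this toy plant a 5% overhead corresponds to a 300 MB heap, and the loop settles there from a 150 MB start without the controller knowing the plant's formula.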
Tuning:
1. Can use the "empirical tuning method" (adjusting the gains by trial and error), which is effective but dangerous in physical engineering.
2. Ideal for software tuning, where a bad setting only costs a slow run.

See paper for charts, results.
Generally positive
One worry is that frequent, drastic changes in live data cause frequent, drastic changes in heap size.

Heuristic Approaches
1. Brecht used Boehm GC. Grows heap by a variable, fine-grained amount. Never shrinks heap.
2. Most recent work has explicit goal to avoid paging
  1. Yang requires changes to OS
  2. Isla Vista system detects risk of paging by detecting allocation stalls
    1. Grows aggressively until allocation stall detected, and then shrinks aggressively
  3. Hertz has different VMs communicate relative stall/paging rates to optimize collection of multiple VMs.
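A minimal sketch of a stall-driven policy in the spirit of the Isla Vista description above. The growth and shrink factors, the floor, and the names `next_heap_size` and `trace` are hypothetical; the stall test is a stand-in for real allocation-stall detection.

```python
def next_heap_size(heap_mb, stall_detected, grow=1.5, shrink=0.5, min_mb=16.0):
    """Hypothetical policy: grow aggressively until a stall, then shrink hard."""
    if stall_detected:
        return max(min_mb, heap_mb * shrink)
    return heap_mb * grow


def trace(start_mb=64.0, ram_mb=400.0, steps=6):
    """Stand-in environment: a 'stall' occurs whenever the heap exceeds RAM."""
    sizes = [start_mb]
    for _ in range(steps):
        stall = sizes[-1] > ram_mb
        sizes.append(next_heap_size(sizes[-1], stall))
    return sizes


sizes = trace()
```

With these made-up numbers the heap climbs 64, 96, 144, 216, 324, 486 MB, crosses the 400 MB limit, and is cut to 243 MB on the next step.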
Mathematical Models
1. Sun varies size of heaps of multiple applications to hit performance goals
2. Tay and Zong use number of page faults to direct resizing strategy to reduce number of faults.
  1. Threshold values determined by benchmarking, requires new derivation on a new system or new system behavior.
3. Vengerov determines time not spent in GC for HotSpot. Uses a study of the HotSpot system to find a custom tuning algorithm.
Control Theory Papers
1. Storm deals with database configuration, uses control theory to handle ratio and size of sub-heaps for specific types of cached assets in database
2. Gandhi uses control theory to optimize Apache's CPU and memory utilization
3. Growing research area

Created: 2014-10-22 Wed 16:56
