notes
Table of Contents
Resources:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/gc\_implementation
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/gc\_interface
http://www.slideshare.net/waterfox1/an-introduction-to-jvm-internals-and-garbage-collection-in-java
http://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/gc\_interface
http://www.slideshare.net/waterfox1/an-introduction-to-jvm-internals-and-garbage-collection-in-java
http://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf
Interface
How does Hotspot interact with the garbage collector?
CollectedHeap
The CollectedHeap interface defines the behavior of a heap which is garbage collected.
- Components
Name Description _gc_cause and _gc_lastcause, and _set_gc_cause Records the causes when collections are triggered. Needs to be set to a "no cause" state between collections _total_collections and _total_full_collections allocate_new_tlab(size), accumulate_statistics_all_tlabs, resize_all_tlabs, allocate_from_tlab Interact with thread-local allocation buffers. common_mem_allocate_noinit, common_mem_allocate_init Allocate a block with the given size. Initialization means zeroing out fill_with_array, fill_with_object_impl Initialize array or object that's been allocated. Name The heap strategy being used. It's an enum mostly named for collector types. Initialize, post_initialize Initialize the "universe", and do other setup tasks before allocation _is_maximal_no_gc If the heap has reached it's maximum size allowed without a GC max_capacity The amount of memory the heap that the vm can allocate for java objects made by the user's code, and not used for bookkeeping is_in_reserved, is_in_reserved_or_null Whether the pointer is in the reserved heap region, and optionally is null is_in_closed_subset, is_in_closed_subset_or_null Is in a closed subset(all objects in subset point only to other objects in subset) is_scavengable Whether object can move during a scavenge(partial collection) n_par_threads, set_par_threads Number of threads currently working on gc tasks Class_obj_allocate Allocate and initialize an instance of a class fill_with_objects, fill_with_object, etc Fill allocated raw memory with object(s) and arrays ensure_parseability Place heap in a parseable state if not(in middle of GC) unsafe_max_alloc Estimate what can be allocated without possibly causing a collection supports_tlab_alloc, tlab_capacity, tlab_max_alloc Related functions for tlab interaction can_elide_store_barriers Whether the compiler can avoid a store barrier. If so, it must later call a deferring function to defer the store action. It must also later trigger a flush of these deferred stores. collect, do_full_collection, collect_as_vm_thread Various ways to request a collection, requiring various levels of locks is_active? Whether the world is stopped to collect, false for concurrent part of concurrent collectors CollectorPolicy, AdaptiveSizePolicy The policies the heap has for collection and resizing, with associated costs for actions. millis_since_last_gc Time since last GC print_gc_threads, print_all_gc_threads, trace_all_threads Print / trace / debug threads use_parallel_gc_threads Whether parallel threads are being used
- SharedHeap
A SharedHeap is defined as a heap for the java language. It's an abstract class, but contains common infrastructure.- G1CollectedHeap
- A sharedheap which uses g1's parition strategy
- A sharedheap which uses g1's parition strategy
- GenCollectedHeap
- A generationally-partitioned sharedheap
- A generationally-partitioned sharedheap
- G1CollectedHeap
- ParallelScavengeHeap
ParallelScavengeHeap allocates pages from a survivor space in round-robin fashion. It differs from all other collection heaps in that it doesn't use the sharedheap structure. It is NUMA-aware.
Collectors
Generational
Interaction
- Why can't parNew and parallelOld work together?
"ParNew" (and "Serial") implements space_iterate() which will apply an operation to every object in the young generation. When collecting the tenured generation with either "CMS" or "Serial Old", the GC can use space_iterate() to do some work on the objects in the young generation. parallelOld does not implement this interfact. (Source: https://blogs.oracle.com/jonthecollector/entry/our\_collectors)
Young Generation
- ParallelScavenge
- "Parallel Scavenge" is a stop-the-world, copying collector which uses multiple GC threads.
- "Parallel Scavenge" is a stop-the-world, copying collector which uses multiple GC threads.
- ParNew
- The newer parallel collector. It will eventually replace the parallelScavenge collector
- Difference
It differs from "Parallel Scavenge" in that it has enhancements that make it usable with CMS. For example, "ParNew" does the synchronization needed so that it can run during the concurrent phases of CMS. (Source: https://blogs.oracle.com/jonthecollector/entry/our\_collectors)
- The newer parallel collector. It will eventually replace the parallelScavenge collector
- Serial
- "Serial" is a stop-the-world, copying collector which uses a single GC thread.
- "Serial" is a stop-the-world, copying collector which uses a single GC thread.
Tenured Generation
- ConcurrentMarkSweep
- "" is a mostly concurrent, low-pause collector.
- "" is a mostly concurrent, low-pause collector.
- SerialOld(Mark-sweep-compact)
- "Serial Old" is a stop-the-world, mark-sweep-compact collector that uses a single GC thread.
- "Serial Old" is a stop-the-world, mark-sweep-compact collector that uses a single GC thread.
- ParallelOld
- "Parallel Old" is a compacting collector that uses multiple GC threads.
- "Parallel Old" is a compacting collector that uses multiple GC threads.
G1(Garbage-first Garbage Collection)
Summary:
- Divides the heap into regions and collects as many as it can while abiding by the desired pause time.
- Specifies region to act as the new "younger" generation, which will be collected first.
- Collects regions with most garbage first("Garbage-first")
- Compacts to avoid fragmentation issues, but can still get internal fragmentation due to many under-filled heaps.
- Because there isn't a static divide between young/tenured generations, can't have a problematic imbalance in sizes of heaps.
Paper:
http://dl.acm.org/citation.cfm?doid=1029873.1029879
- 1 Intro
- Designed to be a soft real-time system, where a maximum percent x of the next y-duration timeslice is specified
- All inter-region pointers are tracked, not simply old-to-young
- A write barrier logs changes, and a concurrent thread updates remembered sets with this log
- Snapshot-at-the-beginning
- At beginning of marking period, mutator threads paused and roots collected. This snapshot pause is relatively short.
- Used to guarantee completeness
- Finds percentage of each heap that's garbage, to allow garbage-first collection to work
- At beginning of marking period, mutator threads paused and roots collected. This snapshot pause is relatively short.
- Hard real-time abandonment
- Rather than making collection interruptable at granularity of object moved, use granularity of each region
- Can estimate collection time for each region(function of the amount of live data), and try to fit within soft real-time bounds
- Allows higher throughput and less synchronization than requiring allowing arbitrary pauses
- Rather than making collection interruptable at granularity of object moved, use granularity of each region
- Designed to be a soft real-time system, where a maximum percent x of the next y-duration timeslice is specified
- 2 Data structures
- 2.1 Heap layout/regions/allocation
- All allocation done from same allocation region
- Allocation is "bump-a-pointer" of the "top" boundary marker in the current allocation region
- All non-large-objects are allocated into thread-local allocation buffers or tlabs
- Tlabs are allocated from a linked-list of free regions. Each region consists of many tlabs-sized subregions.
- Large and humongous objects are allocated seperately into heap regions outside of tlabs.
- All allocation done from same allocation region
- 2.2 Remembered Set Maintenance
- Each region has one hash-based card-table set per parallel GC thread, to avoid contention
- Write barrier which checks if change creates inter-region pointer. If not, or if new reference is NULL, does nothing.
- Else logs write into write log. If write log is empty, gets a new one.
- Concurrent Remembered Set Thread
- Has an "initiating threshold"
- When RS buffer size exceeds threshold, CRST processes them until it reduces back down to the RST.
- Use a seperate card table to determine which cards are "hot" by incrementing cards as processed. Hot cards only processed during evacuation pauses.
- Has an "initiating threshold"
- Only one remembered set thread runs. If the RS buffer is too full, mutator threads are forced to do RS work themselves in order to allocate.
- Each region has one hash-based card-table set per parallel GC thread, to avoid contention
- 2.3 Evacuation Pauses
- Stops mutator threads
- Choose a set of regions, and evacuate the live data in regions into other heap regions.
- Object movement must appear atomic, so world is stopped to allow incremental compaction.
- Parallelized
- Threads compete to claim sub-tasks in collection. No synchronization used after ensuring each object has only one thread working on it.
- Algorithm:
- Copy object into GC thread TLAB, and race to install brooks-style forwarding pointer
- Winner scans contents, and processes.
- Work-stealing load balances
- Copy object into GC thread TLAB, and race to install brooks-style forwarding pointer
- Stops mutator threads
- 2.4 Generational Garbage-First
- Can take advantage of the generational hypothesis with heuristic
- Designate region young when mutator allocation region
- Young regions always part of collection set
- Remembered set processing not required for young regions
- Can be toggled between "pure garbage-first" and "generational"
- Can take advantage of the generational hypothesis with heuristic
- 2.5 Concurrent Marking
- Uses snapshot-at-beginning. Considers objects allocated during concurrent marking as alive. Guaranteed to find all garbage that exists during snapshot.
- 2.5.1 Marking Data Structures
- A bitmap is used to mark. "Previous" bitmap was bitmap for last marking period. "Next" is under construction. Two bitmaps switch when marking completed.
- A bitmap is used to mark. "Previous" bitmap was bitmap for last marking period. "Next" is under construction. Two bitmaps switch when marking completed.
- 2.5.2 Initial Marking Pause
- Clears next marking bitmap.
- Stops all mutator marking threads, marks everything reachable from roots
- Each region records top at mark start(TAMS) for previous and next.
- Allows one to determine when objects are allocated during marking because they are above TAMS.
- Clears next marking bitmap.
- 2.5.3 Concurrent Marking Write Barrier
- If mutator removes pointers that exist at start, SATB guarantee can be violated.
- Needs a write barrier to store origional version of fields before overwriting
- It's only necessary to log writes to non-previously-null fields, and most writes are initialization writes so recording is cheaper than one would think
- If mutator removes pointers that exist at start, SATB guarantee can be violated.
- 2.5.4 Final Marking Pause
- Mutator threads paused and changes since initial pause are processed by marking threads. Marking is thus finished.
- Mutator threads paused and changes since initial pause are processed by marking threads. Marking is thus finished.
- 2.5.5 Live Data Counting and Cleanup
- When marking finished, GC thread counts amount of live data in each region.
- Find it by using marking bitmap, not traversing object graph
- Since concurrent modification can make counts imprecise, a final stop-the-world cleanup pause must finish counting process.
- Cleanup also swaps next and previous bitmap roles
- Next TAMS value copied into previous
- Cleanup phase sorts heap regions by expected GC efficiency.
- Is the ratio of amount of garbage to free over cost of collecting it.
- Garbage-only regions immediately reclaimed in this phase.
- For some programs, may recover significant space in this phase.
- Is the ratio of amount of garbage to free over cost of collecting it.
- When marking finished, GC thread counts amount of live data in each region.
- Uses snapshot-at-beginning. Considers objects allocated during concurrent marking as alive. Guaranteed to find all garbage that exists during snapshot.
- 2.6 Evaucation Pauses and Marking
- Never evacuates objects proven dead in last complete marking pass.
- Needs to ensure that next/previous bitmasks are valid with respect to movements.
- May only evacuate when marking thread's marking stack is non-empty. Otherwise marking could delay evacuation.
- Treats marking stack as source of roots.
- Never evacuates objects proven dead in last complete marking pass.
- 2.7 Popular Object Handling
- Popular objects handled differently in order to ensure smaller remembered sets and more efficient barrier.
- Popular objects placed in regions in small prefix of heap
- Prefix never part of general collection sets
- When popular regions reach a threshold, create a "popularity pause" to empty region.
- Popularity pause creates reference count for all popular objects and moves objects with counts below a threshold to ordinary heap regions.
- Benchmarks show that this behavior removes majority of remembered set enteries
- Popular objects handled differently in order to ensure smaller remembered sets and more efficient barrier.
- 2.1 Heap layout/regions/allocation
- 3 Heuristics
- 3.1 User Inputs
- Two user inputs: Upper bound on space usage, and the maximum portion of a time slice given to stop-the-world GC
- Whether to use generational mode
- Soft realtime goal of time-slice only counts stop-the-world, since concurrent collection can be seen as a constant tax on mutator.
- Two user inputs: Upper bound on space usage, and the maximum portion of a time slice given to stop-the-world GC
- 3.2 Satisfying a Soft Real-time goal
- 3.2.1 Predicting evacuation pause times
- Need to accurately predict added cost of adding a region to determine how many regions to evacuate.
- Keep adding regions until we would exceed pause time
- Use statistics to estimate the fraction of the data allocated since last marking which is live.
- Need to accurately predict added cost of adding a region to determine how many regions to evacuate.
- 3.2.2 Scheduling Pauses to meet a realtime goal.
- If we would otherwise miss deadlines, we can simply delay phases.
- Can cause mutator to allocate into regions which wouldn't be collected, to avoid exceeding pause times in future collections.
- If we would otherwise miss deadlines, we can simply delay phases.
- 3.2.1 Predicting evacuation pause times
- 3.3 Collection Set Choice
- Decide on the "GC efficiency" as the amount of garbage which will be freed over the cost of collecting the region.
- Can have low amount of live data and still have low efficiency because of a large remembered set
- In time between ending marking and starting the evacuation pause, costs may change due to changes in remembered set size. Need to recompute for most attractive subset at start of evacuation pause.
- Decide on the "GC efficiency" as the amount of garbage which will be freed over the cost of collecting the region.
- 3.4 Evacuation Pause Initiation
- Determine the hard margin, or portion of the heap that must be empty for there to be a to-space for collection. When this gets threatened, we always trigger a collection.
- In fully-young generational configuration:
- We know the number of regions that would cause the desired pause time
- When the number of allocation regions which have filled up is equal to this number, we collect.
- We know the number of regions that would cause the desired pause time
- In partially-young mode, we do evacuation paues at maximum allowed frequency to minimize pause time.
- Generational starts in fully-young, and then switches to partially-young after marking pass complete in order to collect attractive regions.
- Always use partially-young if heap very full, until heap occupancy drops.
- Determine the hard margin, or portion of the heap that must be empty for there to be a to-space for collection. When this gets threatened, we always trigger a collection.
- 3.5 Concurrent Marking Initiation
- In generational configuration, we define a soft margin such that when the margin is exceeded, we start a new marking pass.
- Allows us to try to collect more frequently to avoid hitting the hard margin, where a soft real-time goal can be violated because of collection time required to free enough of the heap to continue.
- Allows us to try to collect more frequently to avoid hitting the hard margin, where a soft real-time goal can be violated because of collection time required to free enough of the heap to continue.
- In pure garbage-first, we mark and then keep track of the efficiency of our evacuation pauses.
- Once they begin to decline(freeing less-attractive regions after more-attractive) we begin a new marking pass to mark more garbage and increase attractiveness.
- Once they begin to decline(freeing less-attractive regions after more-attractive) we begin a new marking pass to mark more garbage and increase attractiveness.
- In generational configuration, we define a soft margin such that when the margin is exceeded, we start a new marking pass.
- 3.1 User Inputs
- 5 Related Work
- Idea of dividing collection of entire heap into incremental collection of subheaps steams from train algorithm.
- Differs from train in the use of more remembered sets to allow choice of collecting any region.
- Differs from train in the use of more remembered sets to allow choice of collecting any region.
- Differs from metronome in that it sacrifices hard-realtime performance for higher throughput.
- Idea of dividing collection of entire heap into incremental collection of subheaps steams from train algorithm.
Shenandoah
Presentations http://rkennke.files.wordpress.com/2014/02/shenandoahtake4.pdf Strangeloop: https://www.youtube.com/watch?v=QcwyKLlmXeY
- New alternative to G1
- Eventual goal: An entirely pauseless collector for large-memory systems
- Current goal: <10ms pause times for 100+GB heaps.
- Not currently pauseless
- Not yet using round-robin root updating because makes difficult to ensure that object graph is maintained, and difficult to prove both systems simultaneously before done.
- Not yet using round-robin root updating because makes difficult to ensure that object graph is maintained, and difficult to prove both systems simultaneously before done.
- All other collectors stop-the-world during compaction to avoid problems of concurrent mutation
- Two solutions: Trap on access of objects in motion, or brooks-style forwarding pointers
- They chose forwarding pointers
- Two solutions: Trap on access of objects in motion, or brooks-style forwarding pointers
- Forwarding pointers remove need for remembered sets
- Remembered sets have scalability problems
- Large systems which pre-allocate space for scalability reasons frequently thrash cache lines containing card table
- G1 had massive remembered sets. Fell back on a "generational" configuration eventually which only had to remember old-to-young references.
- Forced generational paradigm, not truly garbage first.
- Forced generational paradigm, not truly garbage first.
- Forwarding pointers allow arbitrary choice of region collection without remembered sets. Truly garbage first.
- Large systems which pre-allocate space for scalability reasons frequently thrash cache lines containing card table
- Forwarding pointer lives in indirection object
- Divorces collection from allocation, doesn't risk randomly interrupting mutator threads.
- Reading an object in a from-region doesn't trigger evacuation. Barrier is unconditional since always chases forwarding pointer.
- Writing an object does.
- Also ensures that when any reference to an object in fromspace is written to an object in tospace, it is updated to point to a newly copied version of that item in tospace.
- System has invariant that writes only occur in to-regions.
- In order to avoid races in copying/installing forwarding pointer, GC and mutator threads copy and then compare-and-set the forwarding pointer.
- Loser undoes allocation.
- Loser undoes allocation.
- If mutator out-races the collector thread, collector must use thread's new location as object address and perform forwarding process again.
- Needs to ensure that anything alive at beginning of marking is still considered live, SATB is used.
- To avoid mutation while scanning the graph causing breakage, write barrier is used to mark references in overwritten field values
- To avoid mutation while scanning the graph causing breakage, write barrier is used to mark references in overwritten field values
- Needs to ensure that atomic compare and swap of reference by mutator threads only swap refereneces to tospace objects because otherwise CAS would fail because GC thread modified the previous reference, and because no tospace object should point to a fromspace object.
- Solution is to copy all referred-to objects into tospace.
- Solution is to copy all referred-to objects into tospace.
- Costs are bounded because of finite region size. Can't prevent progress for unbounded amount of time.
- Also ensures that when any reference to an object in fromspace is written to an object in tospace, it is updated to point to a newly copied version of that item in tospace.
- Reading an object in a from-region doesn't trigger evacuation. Barrier is unconditional since always chases forwarding pointer.
- Use work-stealing
- Compete to claim roots
- Process one child, push rest of children onto work-list.
- Compete to claim roots
- Overview
- Heap divided into regions. Concurrent marking tracks each region's live data.
- Regions with most garbage will join collection set.
- GC threads concurrently evacuate live data.
- At start of next round of concurrent marking, references in fromspace are updated.
- Fromspace is freed.
- Heap divided into regions. Concurrent marking tracks each region's live data.
- Downsides
- Forwarding pointers add a time and space overhead.
- Forwarding pointers add a time and space overhead.
- Generational hypothesis doubts
- Suspicion that generational hypothesis doesn't improve performance because most objects die young, but because remembered sets are smaller.
- Suspicion that generational hypothesis doesn't improve performance because most objects die young, but because remembered sets are smaller.
No comments:
Post a Comment