Thursday, December 4, 2014

M3: Mixed Collection for Simplicity

1 M3: High Performance Memory Management from Off-the-shelf components

1.1 Abstract

1.1.1 Investigation as to whether complexity of existing collectors necessary
1.1.2 May simplify if programmer determines the collection strategy to use(tracing/refcounting)
1.1.3 Observe performance competitive with C when this strategy used

 1.2 Introduction

1.2.1 Current collectors improve productivity and safety but require immense engineering hours

1.2.2 One reason for time spent is to avoid pathologically worst case workloads for each collection type

  • As a result, languages with smaller community bases must resort to refcounting or simplem mark-sweep
  • Also tuning a modern collector for a specific workload is difficult because of complexity

1.2.3 Idea is to allow multiple strategies at the same time, and to allow the developer to specify which one to use.

1.2.4 Allows for drastic reduction of complexity

1.2.5 If no custom strategy used, will fall back to default allocator, standard state of affairs

1.2.6 M3 is an implementation of this for C

  • Was able to achieve competitive(in some cases better) throughput than explicit memory management
  • C chosen to allow comparison with manual memory management

1.2.7 Becomes a way to get higher performance with less effort by requiring more input from the developer by avoiding hitting a pathological case.

1.2.8 Not as onerous as it sounds when one considers the time that performance engineers spend tweaking the interfaces that collectors currently expose.

1.3 Background

  • While modern collectors work well, performance highly dependent on allocation habits
  • Examples cited of programs findind after being written that the collector of their language was a poor fit for their workload
    • Left with no solution
    • Ex: Memcached
      • Large heaps, only oldest objects evicted, roots ambigous
      • Horrible performance for most memory management patterns, requires custom/unsafe memory allocators currently

1.4 Design

  • M3 system must offer a number of tunable parameters
    • Strategy
    • Composability = rules/restrictions for combining strategies
    • Granularity = Data items to which strategy can apply
    • Staging = Selection time
  • Many collectors have heterogenous strategies(generational collectors)
    • Only offer vague, black-box tunables to developer

1.4.1 Paper's design

  • Mixes naive refcounting and tracing GC
  • Strategies freely composable
  • Granularity at type level. Each type has a strategy
  • Selected at allocation time.
  • Granularity: Using types
    • Makes a gc:: and an rc:: variant of each type considered
  • Strategies
    • Currently uses reference counting and tracing as alternatives
    • Makes sense, since strengths of one usually weakness of the other.
  • Implementation Considerations
    • Since pointer modifications may need read/write barriers, insert switch on where it points if we don't know
    • Expensive, but because of annotation we can expect to know where it resides
    • How to specialize?
      • Current solution is to have compiler search uses of type and if the type is always used as one type, consider it that type
      • Requires whole-program compilation
    • Cross-heap pointers
      • Tracing-to-refcounting = Make object finalization decrement refcounts
        • May make new garbage, cause potentially unbounded pause.
      • Refcounting-to-tracing
        • Can use references in refcounted object as roots / use remembered set
        • Problem: Since we can't discard remembered set, must add+remove
        • May be expensive
      • Paper argues that many cross-heap pointers is a sign of poor developer specification of collection strategy

1.5 Case Studies

1.5.1 Process

  • Memcached, MOSS, and toy web server example
  • Compare against explicit memory management

1.5.2 Implementation

  • Competitive performance with simple implementation
  • Uses BDW for tracing and C++11's sharedptr
  • sharedptr gives costs of a second level of indirection, but is acceptable

1.5.3 Memcached

  • key-value pairs managed with slab allocator to limit external fragmentation
  • key-value pairs are refcounted, everything else traced(connections, free-lists, etc)
  • Removes bulk of memory from tracing heap
  • Low number of cross-heap connections
  • M3 matched explicit allocation, both faster than all-tracing

1.5.4 MOSS

  • Uses a region-based memory manager
  • M3 less painful than regions for developer
  • Has db, index, files, matches, and temporary regions
    • db, index, and files will only grow, no deallocation
    • DB larger, triggers worst case of tracing of high survival rate
  • Put db into refcounted heap and rest under tracing
  • Kept some of it in tracing because it's only integer offsets, not pointers
  • M3 had strongest performance vs regions and tracing

1.5.5 Web Middleware

  • Has tunables
    • Allocaton per request, rpc delay, amount of long-lived data
  • Generational would be best in this case to straw dog
  • They use refcounting for longer-lived data
  • Argue that generational too difficult to tune perfectly for this
  • M3 better than traced, but refcounting beats out M3 because BDW doesn't help with the rare, long-lived data

1.6 Related Work

  • Attardi/Flagella/Idilio had "Customizable Memory Management" framework for C++
    • Driven by observation that no strategy best for all access patterns
    • Only did multiple types of tracing collectors
  • Cyclone
    • Uses region-based memory management
    • Uses type system to provide framework for different allocation types
  • Rust
    • Has different pointers, including reference counted and traced pointers (I believe they've changed design since then)
  • Rust + Cyclone have advanced type systems, while this paper seeks to add heterogenous management to an unsafe language without much complexity in implementation

No comments:

Post a Comment