The Boehm GC has a number of faults related to the fact that it is a
conservative collector and cannot know if it is considering a pointer
inside the heap or a value whose bits may be interpreted as such. The
.NET framework gained performance gains which they considered
significant when they made the switch to the ‘precise’ mode of the Boehm
GC. I believe that this is evidence that greater access to type
information could allow the Boehm GC to achieve higher performance for
the many, many projects and runtimes which use it.
In a master’s thesis(http://rus.har.mn/ files/project-report.pdf),
student Russell Harmon showed that access to the standard DWARF debug
symbols for an executable can be leveraged in order to get a level of
introspection into C programs that would otherwise be near impossible to
accomplish. These symbols can be either compiled into the executable or
provided as a separate file. They contain information about the types
of the variables used by the program, and can be used to find the type
of an object on the stack.
I
believe that there exists a way to use these debugging symbols to
increase the effectiveness of the Boehm-Demers-Weiser collector by
allowing for introspection of the types of otherwise ambiguous roots.
There already exists two separate ways to communicate the type of
objects being allocated to the collector, but they require that the
program specify the type at each malloc site. Allowing the dwarf symbols
be provided as another argument to the collector at initialization
would allow for the collector to infer these types without modification
of the program in use. This is more in line with the spirit of the GC,
which spent significant effort to allow a programmer to include it
rather than stdlib, and it would replace the malloc/free previously used
by the programmer.
I would use the type interface in in their types.h, and would use libbfd or libdwarf to parse the dwarf files.
A caveat would be that casting a pointer to an integer may lead to freeing memory in use. As this is a purposeful defeat of the type system, I’m not sure if there is a way to track this and compensate for it.
In more depth:
All of the variables on the stack, or within scope
when the collector is called, are noted by the Boehm collector. Each
variable's type can be found through looking at the dwarf debugging
symbols.
We don't know the type of objects on the heap,
but all allocation in Boehm currently goes through an alternative
implementation of malloc. This implementation looks into a structure to
find a large enough contiguous list of cells of the allocator's heap
area that it has gotten by using the os's standard allocator. It
allocates these cells with a header which notes the start and end of the
data allocated to the program inside of it. Therefore we know the size
of every object on the heap, but we do not know the type.
No comments:
Post a Comment