|
From: |
Brian Harry <bharry@MICROSOFT.COM> |
|
Sent: |
Fri, 6 Oct 2000 08:22:59 -0700 |
|
To: |
dotnet@discuss.develop.com |
I know people have been waiting a long time for someone at Microsoft to say something about the issue of resource management and deterministic finalization. Because this is such a sensitive topic I am going to try to be as precise and complete in my explanation as I can. I apologize for the length of the mail. The first 90% of this mail is trying to convince you that the problem really is hard. In that last part, I'll talk about things we are trying to do but you need the first part to understand why we are looking at these options. Don't get too depressed reading the first part it's all background.
History
First, let me start with some history. A few years ago when this project first started we had a massive debate on this very issue. Some of the early contributors to this project came from the COM and VB teams, along with many other teams across the company. One of the big problems we set out to solve was to eliminate the issues with ref counting, including cycles and errors due to misuse. There were all kinds of anecdotes running around at the time about how team X used the last N months of their product cycle chasing down ref counting bugs. I suspect some of it was overblown, but nonetheless we believe that ref counting errors are reasonably common and providing a uniform solution is a valuable addition.
We initially started with the assumption that the solution would take the form of automatic ref counting (so the programmer couldn't forget) plus some other stuff to detect and handle cycles automatically. We looked at adding conservative collection or tracing to the mix, GC algorithms that could collect a single object without doing an entire graph trace, etc. For a variety of reasons (more on this below), we ultimately concluded that this was not going to work in the general case. In those days the primary focus of our effort was in maintaining VB compatibility. The solution had to be complete and transparent with no semantic changes for VB. We eventually ended up with a context-based model where there was a "deterministic context" and everything in that context used ref counting on top of GC and anything outside just used GC. This avoided some of the type bifurcation issues described below but didn't yield a particularly good story for fine-grained mixing of code between languages. Ultimately, as you all know, we decided to make a series of changes to the VB language to modernize it and make it more capable. As part of this decision we decided to drop the VB lifetime compatibility requirements. This decision also generally ended investigation into deterministic lifetime issues. In retrospect, I think we tied that issue a little too closely to VB semantics and should have instead turned the discussion to the broader issue of resource management. We are looking at it that way now, if belatedly.
In the beginning, I was one of the most vocal advocates of ref counting. We came up with a million arguments for why deterministic finalization was critical. However, during this time we have watched (and helped) hundreds of thousands of lines of code be written without deterministic finalization. Now I am convinced that substantial programs can be reasonably written and debugged without the system providing any automatic support. That said, I fully agree that it would be better if the system/language provided additional support. Without it, you must build the behavior into the contract of the objects (like calling the Dispose method).
On to
a technical analysis of the issues...
Deterministic Finalization
The first thing I want to do is define what it is we are talking about and give some examples. Without going into details of the implementation, what we refer to as "deterministic finalization" means: immediately after an object is determined to be no longer used by the program, its termination code executes and releases the references it holds on other objects cascading the termination in a predictable and repeatable order through the graph of objects. Ideally, this would work for both shared and single use objects. The term "immediately" here is open to some interpretation. It's actually not a promise about time at all (for example a context switch can happen and an arbitrary amount of time can pass). It really means that the thread that discovers a reference is no longer used executes the termination code before it does anything else. There are a variety of cases where we care about either timeliness or order of the execution of the termination code of related objects. The most common case is where the object represents a physical resource (like memory or files or ...). But it generally applies in any case where there is contention. Some examples:
Memory - freeing the memory for an object graph quickly returns the memory to the pool for use.
Window handles - the footprint of the window object in the GC does not reflect the actual cost - there is some footprint inside the OS to represent the window and there may even be a limit (other than available memory) on the total number of window handles that can be allocated. Database connections - concurrent database connections are frequently licensed and therefore there may be a small (like you could actually count that high in a few seconds) number of them available. It is important that these be returned to a pool promptly so they can be reused. Files - since there is a single instance of a given file and exclusive access is required for many operations (deleting, writing, etc) it is important that file handles get closed very aggressively. The canonical example goes something like this (in pseudo code):
File f
= new File("c:\foo.txt");
byte[]
b = f.Read();
File.Delete("c:\foo.txt");
Window subclassing - On tear down, it's reasonable to unsub-class window functions in Windows. It's important that this happen after all of the messages have been sent to the window and therefore after any other termination code that sends messages to the window.
I could go on and on, but you get the point.
Ref counting collection
Ref counting does a reasonable job providing deterministic finalization in many cases. It is worth noting that there are quite a few where it does not. The most common cited example is cycles. In fact, straightforward reference counting never collects objects that participate in cycles at all. There are techniques to manage this and we have all learned them painfully but it is a very undesirable characteristic and a huge source of bugs. In addition, if you start remoting objects across apartments, you can get apartment reentrancy and thus introduce a great deal of "nondeterminism" in your program. Some would argue that as soon as you hand a reference to an object outside the immediate control of a tightly coupled program you have lost your deterministic finalization because you have no idea when or if that "foreign" code will release the reference. There are others who believe building a complex system that is very dependent on the order of termination of a complex graph of objects is an inherently brittle design and is likely to create a significant maintenance problem as the code evolves. I understand most of this is of little comfort because a reasonably constrained program written by a reasonably competent programmer does get the benefits of deterministic finalization.
Tracing collection
A
tracing collector definitely makes some weaker promises than ref counting
does. It is a somewhat more "lazy" system with respect to
executing termination code. Objects have "Finalizer" methods that
are executed when the object is no longer reachable by the program. Tracing
has the advantage that cycles are not an issue. It also has the huge
advantage that assigning a reference is a very simple move operation (more
on this in a minute). The price that you pay for this is that there is no
promise about termination code running "immediately" after a
reference is no longer used. However, I think there is a bunch of confusion
about what IS promised (caused largely by our docs and due to some bugs in
the pre-release causing finalizers not to be called on shutdown :)). The
truth is that for "well behaved" programs, the finalizers will be
called for objects. An ill-behaved program is one that crashes or puts the
finalizer thread in an infinite loop, etc. Our docs are overly cautious
about promises in this respect. If you have an object with a finalizer, the
system will call it. This doesn't address the issue of deterministic
finalization, but is important to understand that resources will get collected
and finalizers are a very valuable way of preventing resource leaks in a
program.
Performance
Before I get into an explanation of the reasoning, I want to cover performance. I have seen quite a lot of doubt about whether performance is relevant. I strongly believe it is. I believe that we must have some kind of tracing collector to handle cycles, which necessitates a big part of the cost of a tracing collector. I also believe that code execution performance can be substantially affected by the cost of reference counting. The good news is that in the context of all objects allocated in a running program, the number of those objects that really need deterministic finalization is small. However, below I'll talk about why it is hard to isolate the cost to just those objects. Let's look at some pseudo code for a simple reference assignment when using a tracing collector vs. ref counting:
tracing:
a = b;
that's it. The compiler turns that into a single move instruction and might even optimize the whole thing away in some circumstances.
ref
counting:
if (a
!= null)
if
(InterlockedDecrement(ref a.m_ref) == 0)
a.FinalRelease();
if (b
!= null)
InterlockedIncrement(ref
b.m_ref);
a = b;
This code is huge. The bloat is very high - bigger working set and the execution performance is obscenely higher, especially given the two interlocked instructions. You can limit the code bloat by putting all of this stuff in a "helper" and further increasing the length of the code path. In addition code generation will ultimately suffer when you put in all of the necessary try blocks because the optimizer has its hands somewhat tied in the presence of exception handling code - this is true even in unmanaged C++. It's also worth noting that every object is 4 bytes bigger due to the extra ref count field, again increasing memory usage and working set.
Here are a few programs I have written that demonstrate the cost of this. This particular benchmark loops, allocating objects, doing two assignments and exiting the scope of one reference. As with any benchmark you can make a million arguments about whether it is valid or not. I'm sure someone will argue that in the context of this routine, most of the ref counting can be optimized away. That's probably true, however this is intended to simply demonstrate the effect. In a real program, those kinds of optimizations are actually very hard if not impossible to do. In fact in C++, what happens is programmers make those optimizations manually and that leads to lots of ref counting bugs. I would argue that in a real program the ratio of assignment to allocation is much higher than this.
<<ref_gc.cs>> - the version which relies on the tracing GC.
<<ref_rm.cs>> - a ref counted version that uses interlocked operations for thread safety. Note there is only one thread and therefore no bus contention making this the "ideal" case. I'm sure with some tuning we could make it perform a bit better, but not a ton.
It is worth noting that VB has not historically had to worry about using interlocked operations for its ref counting (although VC has). This is because VB components ran in a single threaded apartment and where "relatively" guaranteed that only a single thread would be executing in them at one time. One of the goals we had for this version was to open up VB for multi-threaded programming and to get rid of the complexity of COM's existing 7 or 8 threading models (I can enumerate them if you insist). So I believe the multi-threaded comparison is the correct one. However, just in case there are nay-sayers, I have included the version that doesn't use lock prefixes. It is not nearly as slow as the multi-threaded version, but it is still quite a bit slower than the GC version. <<ref_rs.cs>> - a ref counted version that assumes it is running in a single threaded environment.
Here are the numbers I got on my dual proc PIII-600:
ref_gc:
531ms
ref_rm:
3563ms
ref_rs: 844ms
The perfect solution
I think everyone agrees that the perfect solution is that every object in the system is cheap to allocate, use and reclaim and at the same time goes away in a deterministic, orderly fashion the instant the programmer believes he/she is no longer using it regardless of whether there are cycles or anything else. The only way I am aware of to accomplish this is to combine something like a tracing GC with something like ref counting. I believe the data shows that ref counting is too expensive to be used in a general purpose way for all of the objects in programming environment. The code paths are longer, and the code and data working set are larger. If you then combine this already high price with the additional cost of implementing a tracing collector to reclaim cycles, you are spending an exorbitant price for memory management.
We researched various techniques to improve the performance of ref counting. There have been reasonably high performance systems built on ref counting before. We surveyed the literature, but these systems accomplished the improved performance by giving up some of their determinism. I don't remember the details because it was years ago and I didn't do the research myself.
It is worth noting that C++ programs don't do this. Most high performance C++ programs that use COM actually use C++ classes internally where the programmer is required to explicitly manage the memory. C++ programmers generally only use COM on the boundaries where the API is exposed to clients. This is a key characteristic that allows the programs to perform well.
I'm sure my argument here is not going to sway everyone in the world. All I can say is we did a bunch of benchmarking and profiling and ultimately concluded that the performance was unacceptable and therefore the solution unattainable.
The next best thing
OK, so you can't have it all, but what about if you could just have deterministic finalization on those objects that you need it on? This is where we spent most of our time thinking. Again you have to remember that most of this was in the context of exactly duplicating VB6 semantics with our new system. Most of the analysis still applies but some ideas that we discarded a long time ago now look more palatable as a resource management technique rather than transparent VB6 lifetime semantics.
Our first hope was that there would be some way to simply mark a class as requiring "deterministic finalization" either with an attribute or by inheriting from a "special" class. This would cause the object to be reference counted. We investigated tons of different designs that included both subclasses of System.Object and changing the root of the class hierarchy to some other class that served to bind the ref counting world to the non-ref counting world. There are two basic issues that we couldn't work around:
(1) composition - Any time you take an object that requires deterministic finalization and store it in an object that does not, you have lost the determinism transitively. The problem is that this strikes at the heart of the class hierarchy. For example, what about arrays? If you want to have arrays of deterministic objects, then arrays had better be deterministic. What about collections, hash tables, .... The list goes on and on and before you know it, the entire class library is ref counted. The other alternative is to bifurcate the class library - have deterministic arrays and non-deterministic arrays, etc. We thought about this, but soon concluded you'd have two copies of the whole framework and that would be confusing, perform horribly (you'd have 2 copies of every class loaded) and in the end wouldn't be practical. We could think of specific solutions to specific classes (like I think, although I can't remember, that we came up with some way to make arrays work, but the solution just didn't scale to the whole framework). The fundamental problem is that if a non-deterministic object contains a reference to a deterministic one, the system is not deterministic. We also considered outlawing it; simply having that be an error. Again we felt that you couldn't write real programs under those constraints.
(2) casting - A somewhat related issue is what about casting? Can I cast a deterministic object to System.Object? If so is it ref counted then? If the answer is "yes" than everything is ref counted. If the answer is "no" then the object loses determinism. If the answer is "it is an error" then it violates the fundamental premise that System.Object is the root of the object hierarchy. And what about interfaces? If a deterministic object implements interfaces, is the reference typed as an interface ref counted? If the answer is "yes" then you ref count all objects that implement interfaces (note System.Int32 implements interfaces). If the answer is "no" then you lose determinism. If the answer is "it is an error" then deterministic objects can't implement interfaces. If the answer is "it depends on whether the interface is marked deterministic or not" then you have a bifurcation problem of a different sort. Interfaces aren't supposed to dictate object lifetime semantics. What if some guy implemented an API that takes an ICollection interface (pick your favorite) and your object that implements it needs determinism but the interface wasn't defined that way? You are screwed. This leads to further bifurcation. Everybody defines 2 interfaces, one deterministic and one not and implements every method twice. Believe it or not, we actually looked at what it would take to do this automatically and to automatically generate the versions of the methods (deterministic or not) as they were used. That whole line of thought deteriorated into immense complexity. All of this led us to conclude that a really automatic and simple type modifier couldn't be done. That said, there are some ideas below on how we could relax some constraints and get something that might be helpful.