This mechanism, by the way, points to one more kind of GC-related slowdown: the GC does not like it when references are written often and in large numbers. Keep in mind, though, that these slowdowns are very small and show up only in rather specific situations.
The JIT establishes write barriers that are triggered every time there is a write into a reference field in an object. If the value of the reference (which is basically a pointer, after all) is in the ephemeral segment (which is where gen0 and gen1 reside), the write barrier knows that it must update a global data structure, called a card table, with a bit that indicates that we might have a condition where a young object is referenced by an old object.
If every old object had its own bit in the card table, the table would be too big. Consider that the minimum object size is 12 bytes; assuming that we can access about 1.6GB of memory for gen2, we can have up to 143 million objects, which would require almost 18MB of memory just to keep the card table in sync. Therefore, the card table contains a bit for every 128-byte range, which requires a single DWORD for every page (4KB) on x86.
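The arithmetic above can be checked with a few lines of C. This is only a sketch: the helper name `card_table_bytes` and the constant `GEN2_BYTES` are made up for illustration, and "1.6GB" is assumed to mean 1638 MB in binary units.

```c
#include <stdint.h>

/* Assumption: "1.6GB" is read as 1638 MB, in binary units. */
#define GEN2_BYTES (1638ULL * 1024 * 1024)

/* Bytes of card table needed when one bit covers `bytes_per_bit` bytes of heap.
   Both divisions round up so that a partial range still gets its bit. */
static uint64_t card_table_bytes(uint64_t heap_bytes, uint64_t bytes_per_bit)
{
    uint64_t bits = (heap_bytes + bytes_per_bit - 1) / bytes_per_bit;
    return (bits + 7) / 8;
}
```

With one bit per 12-byte object, `card_table_bytes(GEN2_BYTES, 12)` comes to roughly 17.9 million bytes. With one bit per 128-byte range, the same heap needs about 1.7 million bytes, and `card_table_bytes(4096, 128)` is 4, the single DWORD per page mentioned above.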
When a GC occurs at a later point, it consults the card table to see whether there were any "dirty" writes to objects in an older generation. This prevents objects that are still referenced from an older generation from being incorrectly swept.
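To make the "consults the card table" step more concrete, here is a minimal sketch of how a collector might walk such a table. It is not the CLR's code: the function name `collect_dirty_cards` and the layout (one bit per 128-byte range, packed into 32-bit words, so one word per 4KB page) are assumptions based on the description above.

```c
#include <stddef.h>
#include <stdint.h>

#define CARD_BYTES     128  /* heap bytes covered by one card bit (from the text) */
#define CARDS_PER_WORD 32   /* one 32-bit word covers one 4KB page */

/* Collects the indices of dirty cards into `out` (at most `out_cap` entries)
   and returns how many were stored. A real GC would then scan the old-generation
   range [card * CARD_BYTES, (card + 1) * CARD_BYTES) for references to young objects. */
static size_t collect_dirty_cards(const uint32_t *table, size_t n_words,
                                  size_t *out, size_t out_cap)
{
    size_t found = 0;
    for (size_t w = 0; w < n_words; w++) {
        uint32_t bits = table[w];
        if (bits == 0)
            continue;  /* the whole page is clean, skip it in one comparison */
        for (unsigned b = 0; b < CARDS_PER_WORD; b++) {
            if ((bits & (1u << b)) && found < out_cap)
                out[found++] = w * CARDS_PER_WORD + b;
        }
    }
    return found;
}
```

The point of the design is that clean pages cost a single comparison each, so the GC only has to look inside the small parts of gen2 that were actually written to.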
Every time the JIT sees that an update to a reference field is about to be emitted, it emits a call to the write barrier thunk. It can be examined using the SSCLI in the jithelp.asm file (for the x86 implementation). The code ensures that the address being written to (the destination) is in the GC heap, and then checks whether the address being written (the source) is within the ephemeral segment.
cmp rg, g_ephemeral_low
cmp rg, g_ephemeral_high
shr edx, 10
add edx, [g_card_table]
Note that whenever the ephemeral segment is moved, the JIT thunk must be updated to contain the correct segment low and high addresses.
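The checks performed by the thunk can be approximated in C as follows. This is a hedged sketch rather than a translation of jithelp.asm: the `gc_state` struct, its field names, and `write_ref` are hypothetical, and the table here is indexed by the offset from the start of the heap, whereas the assembly fragment adds the shifted address directly to `g_card_table`.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors "shr edx, 10": one table byte per 1KB of heap,
   i.e. eight of the 128-byte card bits at once. */
#define CARD_SHIFT 10

typedef struct {
    uintptr_t heap_low, heap_high;            /* bounds of the GC heap (hypothetical) */
    uintptr_t ephemeral_low, ephemeral_high;  /* gen0/gen1 range */
    uint8_t  *card_table;                     /* one byte per 1KB, starting at heap_low */
} gc_state;

/* Stores `src` into the reference slot `dst`, then marks the card that covers
   `dst` if the slot is in the GC heap and `src` points into the ephemeral range. */
static void write_ref(gc_state *gc, void **dst, void *src)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    *dst = src;

    if (d < gc->heap_low || d >= gc->heap_high)
        return;  /* destination is not in the GC heap (e.g. a stack slot) */
    if (s < gc->ephemeral_low || s >= gc->ephemeral_high)
        return;  /* the stored reference does not point at a young object */

    gc->card_table[(d - gc->heap_low) >> CARD_SHIFT] = 0xFF;
}
```

In this sketch the bounds are read from memory on every write. The thunk described above has the ephemeral bounds embedded in its code instead, which is why it has to be updated whenever the segment moves.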
The question I was talking about, then, is why the .NET GC doesn't use this built-in memory watch mechanism supplied by the Win32 API. The following blog entry notes this possibility (in section 3.2.4), but does not elaborate on the reasons behind the particular choice made in .NET. I have several speculations (and they are just that, speculations) and am still pursuing a more definitive answer:
* The aforementioned API is not supported on Windows 95 (which is, perhaps, not so surprising), but it is not supported on Windows 2000 either. This would limit the .NET Framework's compatibility with these platforms (although in those particular cases the JIT-thunk approach could be adopted).
* The aforementioned API is Windows-specific and does not provide any compatibility with other platforms. The JIT-generated write barrier is generic and theoretically can work on any platform.
* The performance penalty of using the MEM_WRITE_WATCH flag for writing to a region of memory is bigger than that of the thunk generated by the JIT. Note that a very primitive measurement I've performed indicates an 8% performance penalty when writing to memory protected by a write watch as opposed to writing to memory that is not protected by a write watch (don't quote me on this).