Writing code in garbage-collected environments - Philosophy of Programming

Hi all. Has any of the veterans run into this: are there general guidelines for writing code in garbage-collected environments (for example, that freeing many small objects is preferable to freeing one large one)? Or does it all depend heavily on the particular collector implementation? Is it worth asking such questions at all?

Hello, Shizuka-kun, you wrote:

SK>Hi all. Has any of the veterans run into this: are there general guidelines for writing code in garbage-collected environments (for example, that freeing many small objects is preferable to freeing one large one)? Or does it all depend heavily on the particular collector implementation? Is it worth asking such questions at all?

It depends on the collector.
Asking these questions and studying how the collector is implemented is worthwhile: it will come in handy, if only for general understanding, and it certainly won't hurt.
But don't fixate on it while writing code, or you will slide into "premature optimization".

Hello, Shizuka-kun, you wrote:

SK>Hi all. Has any of the veterans run into this: are there general guidelines for writing code in garbage-collected environments (for example, that freeing many small objects is preferable to freeing one large one)? Or does it all depend heavily on the particular collector implementation? Is it worth asking such questions at all?

If this is about .NET, Microsoft has a whole article on the MSDN site about writing efficient programs for .NET. I won't give a link, I'm too lazy to look for it, but poke around the tree on the right over there.
For Java I don't know, but there is probably something similar.

Sorry for the large chunk; the text comes from a mailing list.
It is a message from the list about optimizations in the context of XNA and .NET applications.

Subject: Re: [Sweng-Gamedev] Banishing allocations in "managed" code (was: 3rd party Heap memory manager.)

First I should say that I don't claim to be an expert on CLR performance. Most of what I say here is based on trial and error tests, countless hours in the profiler and public information that anyone can find. As with everything perf related, there is no magic bullet that I am aware of, just a number of best practices.

When we first considered building an engine for XNA (back in 2006) the biggest concern was whether we could get decent performance by the time we were done. Actually it was a pretty big gamble because we knew that at the time the JIT technology generated code that was typically a fair bit slower than C++ on average and we had to deal with the GC issue. We weren't that worried about the code performance since all that meant was that we needed to be smart about the stuff that we did on the CPU, but when making a game engine that's kind of a given anyway. The GC issue was more worrying.

Originally I thought about letting GC leak and then issuing a collect very infrequently. The theory was that you could go quite a long time between collections with sufficient RAM and users wouldn't complain too much about a GC hiccup every few minutes. Unfortunately the GC is pretty much a black box from the application's perspective and although you can force collections there is no direct support to suspend it entirely. All you have is a setting to configure the GC policy at a macro level to suit either "workstation" or "server" applications and even that is something you can't change directly. Early on I did some experiments to try and trick it into not running but the schemes were either unusable in a real scenario or unreliable. In the end we just decided to do our best to live with GC and focus on being GC friendly. I also figured that if Microsoft were serious about XNA then over time they would tune GC for real-time applications like games.

Newer versions of the CLR have a very useful setting "System.Runtime.GCSettings.LatencyMode" that allows you to change the collection policy. I definitely recommend switching to LowLatency mode to avoid costly Gen 2 collections. You can find more information about it here. I should point out that it's currently not available on the 360 for XNA 2.0 since the 360 CLR doesn't use a generational GC yet.

http://blogs.msdn.com/clyon/archive/2007/03/12/new-in-orcas-part-3-gc-latency-modes.aspx
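A minimal sketch of how the latency setting mentioned above might be scoped to a critical section of code. `GCSettings.LatencyMode` and `GCLatencyMode` are real .NET APIs; the `LatencyScope` helper and its `Run` method are hypothetical names made up for this illustration. Whether a requested mode actually takes effect depends on the runtime and GC flavor, so treat this as an assumption-laden example rather than a recommended pattern.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: temporarily switches the GC latency mode and
// always restores the previous mode, even if the work throws.
public static class LatencyScope
{
    public static T Run<T>(GCLatencyMode mode, Func<T> work)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            // Request the new mode; the runtime may not honor every mode
            // in every configuration.
            GCSettings.LatencyMode = mode;
            return work();
        }
        finally
        {
            // Restore whatever mode was active before the call.
            GCSettings.LatencyMode = previous;
        }
    }
}
```

A call such as `LatencyScope.Run(GCLatencyMode.LowLatency, () => RenderFrame())` (with `RenderFrame` standing in for your own code) keeps the change local instead of leaving the whole process in a non-default mode.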

Now, our engine is pretty complex. It has an object model that is separate from the actual engine code and each object in the object model can be made up of multiple components. You can think of it as a very leafy tree structure. This data structure was originally prototyped using a typical structure where each node has a list of children. One of the first things we did to make it more GC friendly was to flatten the hierarchy, stop using lists and use simple integer handles to reference the nodes. There can be upwards of 5000 nodes in the object model and this simple change resulted in a very noticeable improvement in GC collection time, especially on the 360.

You can extend this idea to everything. So my recommendation is that it's okay to have objects but try not to make long chains of references that the GC has to follow. Additionally, try to collect the objects together. If you can make your objects value types then that's great since you can keep them together with a simple array. In our case the majority of our objects will wind up being long lived and migrate to the Gen2 collection bucket. If we have low latency mode turned on then they will hardly ever need to be collected.
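The flattening described above could look roughly like the following sketch. It is not the author's engine code: `Node`, `FlatTree` and all member names are hypothetical. The idea illustrated is that nodes are value types stored in one array and refer to each other by integer index, so the collector sees a single array instead of thousands of small linked objects.

```csharp
using System;

// Hypothetical node: a value type that links to its relatives by array
// index (an integer "handle") instead of by object reference.
// A handle of -1 means "none".
public struct Node
{
    public int Parent;
    public int FirstChild;
    public int NextSibling;
    public float Value;
}

public sealed class FlatTree
{
    private Node[] nodes = new Node[16];
    private int count;

    // Adds a node under `parent` (-1 for a root) and returns its handle.
    public int Add(int parent, float value)
    {
        if (count == nodes.Length)
        {
            Array.Resize(ref nodes, count * 2);
        }
        int handle = count++;
        nodes[handle] = new Node
        {
            Parent = parent,
            FirstChild = -1,
            NextSibling = -1,
            Value = value
        };
        if (parent >= 0)
        {
            // Push the new node onto the front of the parent's child chain.
            nodes[handle].NextSibling = nodes[parent].FirstChild;
            nodes[parent].FirstChild = handle;
        }
        return handle;
    }

    // Walks the child chain using handles only; no per-node heap objects.
    public float SumChildren(int parent)
    {
        float sum = 0f;
        for (int c = nodes[parent].FirstChild; c != -1; c = nodes[c].NextSibling)
        {
            sum += nodes[c].Value;
        }
        return sum;
    }
}
```

The trade-off is that handles are not checked by the type system the way references are, so a stale or wrong index silently points at the wrong node.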

Now let's look at some of the sources of garbage you might want to avoid. Probably you already know this but it's worth restating.

1) "new" obviously

I regularly see code where people create a list to do some processing and then throw it away... like this:

List<int> vertexIndices = new List<int>();
// populate the list
...
// the list is never used again

This is just plain crazy. Although the allocation of the list goes in the GC0 bucket and will quickly get collected, why make the GC work harder than it has to? If you absolutely need to use a list then you could allocate it statically so that it migrates to Gen2 over time.
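As a sketch of the "allocate it statically" suggestion, assuming single-threaded use: one long-lived list is cleared and refilled on every call instead of being created anew. `IndexScratch` and `CountEven` are hypothetical names chosen for the example. Note that later replies in this thread dispute whether this is a win at all, since short-lived generation 0 objects are cheap to collect, so measure before adopting it.

```csharp
using System.Collections.Generic;

public static class IndexScratch
{
    // One long-lived buffer reused across calls. Not thread-safe.
    private static readonly List<int> scratch = new List<int>(256);

    // Collects the even values into the scratch list and returns how many
    // were found. No new list is allocated per call.
    public static int CountEven(int[] source)
    {
        scratch.Clear(); // drops the old items but keeps the capacity
        for (int i = 0; i < source.Length; i++)
        {
            if (source[i] % 2 == 0)
            {
                scratch.Add(source[i]);
            }
        }
        return scratch.Count;
    }
}
```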

2) Boxing/unboxing value types

This is a very well documented cause of garbage and comes into play when you cast a value type to an object like:

int myInt = 3; // cool, we have a nice efficient value type
object objMyInt = myInt; // uhoh, we just caused an allocation for the boxed value

This was a big concern before .NET 2.0 when there was no such thing as generics and the collection classes only supported objects. Now that we have type-safe collections you shouldn't need to do this at all. There are places where we need to do this internally but it's quite rare.
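A small sketch contrasting the two collection styles mentioned above. `BoxingDemo` and its method names are hypothetical; `ArrayList` and `List<int>` are the real framework types. The first method shows that every conversion of an `int` to `object` produces its own heap object.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Boxing the same value twice yields two distinct heap objects.
    public static bool BoxesAreDistinct(int value)
    {
        object a = value; // first boxed copy
        object b = value; // second boxed copy
        return !object.ReferenceEquals(a, b);
    }

    // Pre-generics style: each element is stored as a boxed object and
    // has to be unboxed with a cast on the way out.
    public static int SumNonGeneric(ArrayList items)
    {
        int sum = 0;
        for (int i = 0; i < items.Count; i++)
        {
            sum += (int)items[i];
        }
        return sum;
    }

    // Generic style: the ints are stored inline, with no boxing.
    public static int SumGeneric(List<int> items)
    {
        int sum = 0;
        for (int i = 0; i < items.Count; i++)
        {
            sum += items[i];
        }
        return sum;
    }
}
```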

3) Any time you create iterators

Another very well documented cause of garbage.

List<int> vertexIndices = new List<int>();
foreach (int i in vertexIndices)
{
}

The foreach is syntactic sugar that requests an iterator from the list. The iterator has to be allocated since it is an object that maintains state of the current location. This doesn't happen if you are using a simple array of items rather than going through a collection. So this is fine...

int[] vertexIndices = new int[10];
foreach (int i in vertexIndices)
{
}

But just don't use foreach — it's much easier to code defensively and use a regular for loop.
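For comparison, here is a sketch of the index-based loop next to a `foreach` that goes through an interface. `LoopDemo` and its method names are hypothetical. One caveat, to the best of my understanding: whether `foreach` allocates depends on the collection and on how it is accessed. `List<T>` exposes a struct enumerator, so the allocation typically shows up when the list is enumerated through an interface such as `IEnumerable<int>`, where that struct gets boxed.

```csharp
using System.Collections.Generic;

public static class LoopDemo
{
    // Plain index loop: no enumerator object is involved at all.
    public static int SumByIndex(List<int> items)
    {
        int sum = 0;
        for (int i = 0; i < items.Count; i++)
        {
            sum += items[i];
        }
        return sum;
    }

    // foreach over the interface type: the enumerator is obtained via
    // IEnumerable<int>.GetEnumerator(), which returns a reference type
    // (a boxed struct in the case of List<int>).
    public static int SumViaInterface(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int x in items)
        {
            sum += x;
        }
        return sum;
    }
}
```

Both methods return the same result; only the allocation behavior differs, and that is best confirmed with a profiler on the target runtime.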

As far as code optimizations are concerned there's a bunch of things we do — here are a couple of ideas:

1) Don't index into List<> etc

Rather than use the overloaded [] operator, it's much faster to use .ForEach() or, even better, not use a list in the first place. When you use the [] operator it performs range checking every time you call it and that is quite slow.

2) Avoid conditionals in inner loops like the plague.

We go to extreme measures to avoid conditionals in inner loops. Obviously you can't get rid of them all but if you know that the conditional is invariant over the lifetime of the loop you really need to pull it out. For example, when we bind to a shader we check the semantics to see if it supports lighting and other things. Based on what is supported we configure a delegate to point to a function that is designed to best work with the capabilities of the shader. This way, later, when we call Render() there are no checks required. It sounds obvious (and it is) but it's easy to forget to do this kind of thing.
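The delegate trick described above might be sketched like this. It is an illustration under assumptions, not the author's engine: `Renderer`, `RenderLit`, `RenderUnlit` and the 0.5 lighting factor are all made up. The capability check runs once at bind time, and `Render` then calls whichever loop was selected, so the loop body itself contains no branch on the capability.

```csharp
using System;

public sealed class Renderer
{
    // Chosen once, when the (hypothetical) shader is bound.
    private readonly Func<float[], float> renderPath;

    public Renderer(bool supportsLighting)
    {
        // The loop-invariant condition is evaluated here, not per element.
        if (supportsLighting)
        {
            renderPath = RenderLit;
        }
        else
        {
            renderPath = RenderUnlit;
        }
    }

    // No capability checks at render time; just call the selected path.
    public float Render(float[] albedos)
    {
        return renderPath(albedos);
    }

    private static float RenderLit(float[] albedos)
    {
        float total = 0f;
        for (int i = 0; i < albedos.Length; i++)
        {
            total += albedos[i] * 0.5f; // stand-in for a lighting term
        }
        return total;
    }

    private static float RenderUnlit(float[] albedos)
    {
        float total = 0f;
        for (int i = 0; i < albedos.Length; i++)
        {
            total += albedos[i];
        }
        return total;
    }
}
```

Selecting a whole loop per delegate, rather than calling a delegate per element, keeps the indirect call out of the inner loop as well.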

3) Avoid the overloaded operators on the XNA math structures.

This is a big one actually. When you call things like operator+ on the Vector3 it's really slow. I did some tests comparing 24 ways to add arrays of vectors. The result was I found we can get close to an 800% performance increase by manually doing the operations. This is because we at least avoid a function call and whatever else weird stuff goes on inside those calls (like maybe calling native code to do the work in some cases).
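To make the two styles concrete, here is a sketch using a hypothetical `Vec3` struct as a stand-in for XNA's `Vector3`, so that the example stays self-contained. It only shows what "manually doing the operations" means; it makes no timing claim. The 800% figure is the author's measurement on his setup, and other runtimes may well inline the operator and erase the difference.

```csharp
// Hypothetical stand-in for a math library vector type.
public struct Vec3
{
    public float X;
    public float Y;
    public float Z;

    public Vec3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Overloaded operator: takes two copies and returns a third.
    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }
}

public static class VectorOps
{
    // Uses the overloaded operator for each element.
    public static void AddWithOperator(Vec3[] a, Vec3[] b, Vec3[] result)
    {
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = a[i] + b[i];
        }
    }

    // Writes each component directly into the destination array element,
    // avoiding the operator call and the temporary struct copies.
    public static void AddManually(Vec3[] a, Vec3[] b, Vec3[] result)
    {
        for (int i = 0; i < result.Length; i++)
        {
            result[i].X = a[i].X + b[i].X;
            result[i].Y = a[i].Y + b[i].Y;
            result[i].Z = a[i].Z + b[i].Z;
        }
    }
}
```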

Okay, well that's it from me for now. There's more stuff that I could go into but I'm out of time for now. While I know a lot of this is obvious, hopefully it is useful to someone.

Hello, SiGMan / iO UpG, you wrote:

Every time I read recommendations like these:

But just don't use foreach — it's much easier to code defensively and use a regular for loop.

or

This is just plain crazy. Although the allocation of the list goes in the GC0 bucket and will quickly get collected, why make the GC work harder than it has to? If you absolutely need to use a list then you could allocate it statically so that it migrates to Gen2 over time.

I find myself asking: why use a managed, garbage-collected environment at all under such conditions? If using for instead of foreach for performance reasons REALLY becomes critical for your software, then you have picked the wrong tool and .NET is not a fit for you. All strictly IMHO.

Hello, SiGMan / iO UpG, you wrote:

SIU>1) "new" obviously
SIU>I regularly see code where people create a list to do some processing and then throw it away... like this:
SIU>List<int> vertexIndices = new List<int>();
SIU>// populate the list
SIU>...
SIU>// the list is never used again
SIU>This is just plain crazy. Although the allocation of the list goes in the GC0 bucket and will quickly get collected, why make the GC work harder than it has to? If you absolutely need to use a list then you could allocate it statically so that it migrates to Gen2 over time.

A child who has got lost
Should remember that he will
Be taken home as soon as
He gives his home address.
One must act more cleverly.
Say: "I live
Near the palm tree with the monkey
On some faraway islands."
A child who has got lost,
If he is no fool,
Will not miss the sure chance
To visit various countries.

Of course I understand that anyone, even someone who is not an expert on CLR performance, is entitled to voice an opinion, but personally I would treat such recommendations with caution. As far as I understand, a dead object costs nothing. It is the live one that is expensive. So preferring objects in G2 to short-lived ones in G0 is exactly what "making the GC work harder" means.


Hello, Klapaucius, you wrote:

K>As far as I understand, a dead object costs nothing. It is the live one that is expensive. So preferring objects in G2 to short-lived ones in G0 is exactly what "making the GC work harder" means.

A dead object costs the chunk of memory it occupies. And, I suppose, the more intensively memory is allocated, the more often the garbage collector is invoked.

Hello, Kluev, you wrote:

K>A dead object costs the chunk of memory it occupies. And, I suppose, the more intensively memory is allocated, the more often the garbage collector is invoked.

Yes, but a collection in the young generation can cost substantially less than one in the old generation. In fact, the young/old generation split in .NET/JVM exists precisely to optimize for a large number of short-lived objects and a smaller number of long-lived ones. By holding on to an object longer than necessary, you may push it into the long-lived generation.
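The promotion effect can be observed with a sketch like the following. `GC.GetGeneration`, `GC.Collect` and `GC.KeepAlive` are real .NET APIs; `GenerationDemo` and `ObserveGenerations` are hypothetical names. The exact generation numbers depend on the runtime and its configuration, so the example only reports them instead of assuming particular values.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of an object right after allocation and
    // again after it has survived one forced collection.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int before = GC.GetGeneration(survivor);

        // The object is still referenced, so it survives this collection
        // and is typically promoted to an older generation.
        GC.Collect();
        int after = GC.GetGeneration(survivor);

        // Make sure the reference stays alive up to this point.
        GC.KeepAlive(survivor);
        return new int[] { before, after };
    }
}
```

An object that is held only briefly would have died in the young generation; one that is kept across collections ends up in an older generation, where reclaiming it later is more expensive.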

Hello, Kluev, you wrote:

K>>As far as I understand, a dead object costs nothing. It is the live one that is expensive. So preferring objects in G2 to short-lived ones in G0 is exactly what "making the GC work harder" means.

K>A dead object costs the chunk of memory it occupies. And, I suppose, the more intensively memory is allocated, the more often the garbage collector is invoked.

But in return it runs very quickly. That is what generation zero was invented for.

	From:	Shizuka-kun
	Date:	04.07.08 09:17
	Rating:

	From:	fmiracle
	Date:	04.07.08 10:08
	Rating:

	From:	merk
	Date:	05.07.08 11:07
	Rating:

From:	SiGMan / iO UpG	www.ioupg.com
Date:	07.07.08 08:02
Rating:	2 (1)

	From:	Melo
	Date:	07.07.08 11:13
	Rating:	+2