Re[4]: RAM is not RAM, or Cache-Conscious Data Structures - Philosophy of Programming

Hello, Black Lion, you wrote:

R>>In what context is clflush used in JIT compilation?
R>>As far as I understand, when code is modified, the modification propagates through the cache coherence protocol in the usual way, like any other write.

R>>As far as I remember from the documentation, if the modified block is aligned and no larger than 16 bytes, no special measures are needed at all.
R>>In the AMD documentation I see clflush mentioned only in the context of devices that do not support the cache coherence protocol.
R>>And in the Intel documentation, only in the context of virtualization...

BL> Flushing the code cache. On Intel it definitely does not flush itself on a write to a region that is cached (and Intel's code cache also stores micro-ops, or whatever they are supposed to be called these days). On AMD I don't remember, but I think it is the same...
BL> It is not needed for every kind of dynamic code generation, but sometimes you have to do it.

Here is everything I can find in the "Intel® 64 and IA-32 Architectures Software Developer’s Manual. Volume 3A: System Programming Guide, Part 1" on self- and cross-modifying code. It mentions neither clflush nor any need to flush the cache manually. If anything, it seems to say the opposite: writes to a code region are flushed to memory right away, and the L1I cache does not even support the M and E states (see section 10.4).

7.1.3 Handling Self- and Cross-Modifying Code
The act of a processor writing data into a currently executing code segment with
the intent of executing that data as code is called self-modifying code. IA-32
processors exhibit model-specific behavior when executing self-modified code,
depending upon how far ahead of the current execution pointer the code has been
modified.
As processor microarchitectures become more complex and start to speculatively
execute code ahead of the retirement point (as in P6 and more recent processor
families), the rules regarding which code should execute, pre- or post-modification,
become blurred. To write self-modifying code and ensure that it is compliant with
current and future versions of the IA-32 architectures, use one of the following
coding options:
(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;
(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;
The use of one of these options is not required for programs intended to run on the
Pentium or Intel486 processors, but are recommended to insure compatibility with
the P6 and more recent processor families.
Self-modifying code will execute at a lower level of performance than non-self-modifying
or normal code. The degree of the performance deterioration will depend upon
the frequency of modification and specific characteristics of the code.
The act of one processor writing data into the currently executing code segment of a
second processor with the intent of having the second processor execute that data as
code is called cross-modifying code. As with self-modifying code, IA-32 processors
exhibit model-specific behavior when executing cross-modifying code, depending
upon how far ahead of the executing processors current execution pointer the code
has been modified.
To write cross-modifying code and insure that it is compliant with current and future
versions of the IA-32 architecture, the following processor synchronization algorithm
must be implemented:
(* Action of Modifying Processor *)
Memory_Flag ← 0; (* Set Memory_Flag to value other than 1 *)
Store modified code (as data) into code segment;
Memory_Flag ← 1;
(* Action of Executing Processor *)
WHILE (Memory_Flag ≠ 1)
Wait for code to update;
ELIHW;
Execute serializing instruction; (* For example, CPUID instruction *)
Begin executing modified code;
(The use of this option is not required for programs intended to run on the Intel486
processor, but is recommended to insure compatibility with the Pentium 4, Intel
Xeon, P6 family, and Pentium processors.)
Like self-modifying code, cross-modifying code will execute at a lower level of performance
than non-cross-modifying (normal) code, depending upon the frequency of
modification and specific characteristics of the code.
The restrictions on self-modifying code and cross-modifying code also apply to the
Intel 64 architecture.
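
For what it's worth, this is how I read the cross-modifying algorithm above as a C sketch. It is my own illustration, not something from the manual. Assumptions: x86 or x86-64, GCC or Clang inline asm, POSIX threads, and a system that allows a page mapped readable, writable and executable at once. The names `run_cross_modified`, `struct shared` and `serialize` are made up for the example, and the generated code is just `mov eax, imm32; ret`. Note that there is no clflush anywhere: only the flag handshake and a CPUID on the executing side.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int32_t (*gen_fn)(void);

struct shared {
    unsigned char *code;    /* one RWX page standing in for the code segment */
    atomic_int memory_flag; /* Memory_Flag from the manual's pseudocode */
    int32_t value;          /* immediate that the modifier encodes */
    int32_t result;         /* what the executing thread got back */
};

#if defined(__x86_64__) || defined(__i386__)
/* Serializing instruction: CPUID, the manual's own example. */
static void serialize(void) {
    unsigned eax, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(0), "c"(0)
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}

/* Action of Modifying Processor */
static void *modifying_processor(void *p) {
    struct shared *s = p;
    unsigned char code[6] = {0xB8, 0, 0, 0, 0, 0xC3}; /* mov eax, imm32; ret */
    atomic_store_explicit(&s->memory_flag, 0, memory_order_relaxed);
    memcpy(code + 1, &s->value, 4);      /* little-endian immediate */
    memcpy(s->code, code, sizeof code);  /* store modified code (as data) */
    atomic_store_explicit(&s->memory_flag, 1, memory_order_release);
    return NULL;
}

/* Action of Executing Processor */
static void *executing_processor(void *p) {
    struct shared *s = p;
    gen_fn fn;
    while (atomic_load_explicit(&s->memory_flag, memory_order_acquire) != 1) {
        /* wait for code to update */
    }
    serialize();                         /* execute serializing instruction */
    memcpy(&fn, &s->code, sizeof fn);    /* data pointer -> function pointer */
    s->result = fn();                    /* begin executing modified code */
    return NULL;
}
#endif

/* Returns 0 and stores the generated function's result in *out,
 * or -1 if the platform is unsupported or a system call failed. */
int run_cross_modified(int32_t value, int32_t *out) {
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    struct shared s = {.value = value, .result = 0};
    pthread_t mod_thread, exe_thread;
    int rc = -1;
    atomic_init(&s.memory_flag, 0);
    s.code = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (s.code == MAP_FAILED) return -1;
    if (pthread_create(&mod_thread, NULL, modifying_processor, &s) == 0) {
        if (pthread_create(&exe_thread, NULL, executing_processor, &s) == 0) {
            pthread_join(exe_thread, NULL);
            *out = s.result;
            rc = 0;
        }
        pthread_join(mod_thread, NULL);
    }
    munmap(s.code, 4096);
    return rc;
#else
    (void)value;
    return -1;
#endif
}
```

Calling `run_cross_modified(1234, &out)` should leave 1234 in `out` when it returns 0. On older glibc this needs to be built with `-pthread`.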

10.6 SELF-MODIFYING CODE
A write to a memory location in a code segment that is currently cached in the
processor causes the associated cache line (or lines) to be invalidated. This check is
based on the physical address of the instruction. In addition, the P6 family and
Pentium processors check whether a write to a code segment may modify an instruction
that has been prefetched for execution. If the write affects a prefetched instruction,
the prefetch queue is invalidated. This latter check is based on the linear
address of the instruction. For the Pentium 4 and Intel Xeon processors, a write or a
snoop of an instruction in a code segment, where the target instruction is already
decoded and resident in the trace cache, invalidates the entire trace cache. The latter
behavior means that programs that self-modify code can cause severe degradation
of performance when run on the Pentium 4 and Intel Xeon processors.
In practice, the check on linear addresses should not create compatibility problems
among IA-32 processors. Applications that include self-modifying code use the same
linear address for modifying and fetching the instruction. Systems software, such as
a debugger, that might possibly modify an instruction using a different linear address
than that used to fetch the instruction, will execute a serializing operation, such as a
CPUID instruction, before the modified instruction is executed, which will automatically
resynchronize the instruction cache and prefetch queue. (See Section 7.1.3,
“Handling Self- and Cross-Modifying Code,” for more information about the use of
self-modifying code.)
For Intel486 processors, a write to an instruction in the cache will modify it in both
the cache and memory, but if the instruction was prefetched before the write, the old
version of the instruction could be the one executed. To prevent the old instruction
from being executed, flush the instruction prefetch unit by coding a jump instruction
immediately after any write that modifies an instruction.
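
To make this concrete for the single-processor case, here is a small C sketch of OPTION 2 from section 7.1.3 (store the new code as data, execute a serializing instruction, execute the new code). It is my own illustration under the same assumptions as any such toy: x86 or x86-64, GCC or Clang inline asm, and a POSIX system that permits an RWX mapping. `run_patched` and `serialize` are hypothetical names, and the code being patched is just `mov eax, imm32; ret`. The same linear address is used for writing and for fetching, and no clflush is involved.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int32_t (*gen_fn)(void);

#if defined(__x86_64__) || defined(__i386__)
/* Serializing instruction: CPUID, as suggested in OPTION 2. */
static void serialize(void) {
    unsigned eax, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(0), "c"(0)
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}
#endif

/* Generates "return first", runs it, patches the immediate to `second`
 * in place, serializes, and runs it again.
 * Returns 0 on success, -1 if unsupported or if mmap failed. */
int run_patched(int32_t first, int32_t second,
                int32_t *out_first, int32_t *out_second) {
    assert(out_first != NULL && out_second != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned char code[6] = {0xB8, 0, 0, 0, 0, 0xC3}; /* mov eax, imm32; ret */
    gen_fn fn;
    unsigned char *buf = mmap(NULL, 4096,
                              PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;

    memcpy(code + 1, &first, 4);     /* little-endian immediate */
    memcpy(buf, code, sizeof code);  /* store code (as data) */
    serialize();
    memcpy(&fn, &buf, sizeof fn);    /* data pointer -> function pointer */
    *out_first = fn();

    memcpy(buf + 1, &second, 4);     /* store modified code (as data) */
    serialize();                     /* execute a serializing instruction */
    *out_second = fn();              /* execute new code */

    munmap(buf, 4096);
    return 0;
#else
    (void)first; (void)second;
    return -1;
#endif
}
```

With `run_patched(7, 42, &a, &b)` returning 0, `a` should hold 7 and `b` should hold 42, that is, the second call sees the patched immediate.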

10.4 CACHE CONTROL PROTOCOL
The L1 instruction cache in P6 family processors implements only the “SI” part of the
MESI protocol, because the instruction cache is not writable. The instruction cache
monitors changes in the data cache to maintain consistency between the caches
when instructions are modified. See Section 10.6, “Self-Modifying Code,” for more
information on the implications of caching instructions.

17.28.1 Self-Modifying Code with Cache Enabled
On the Intel486 processor, a write to an instruction in the cache will modify it in both
the cache and memory. If the instruction was prefetched before the write, however,
the old version of the instruction could be the one executed. To prevent this problem,
it is necessary to flush the instruction prefetch unit of the Intel486 processor by
coding a jump instruction immediately after any write that modifies an instruction.
The P6 family and Pentium processors, however, check whether a write may modify
an instruction that has been prefetched for execution. This check is based on the
linear address of the instruction. If the linear address of an instruction is found to be
present in the prefetch queue, the P6 family and Pentium processors flush the
prefetch queue, eliminating the need to code a jump instruction after any writes that
modify an instruction.

I'm confused, and I still don't see any use for clflush.

From:	remark	http://www.1024cores.net/
Date:	15.05.08 19:28