Original link: http://mechanical-sympathy.blogspot.com/2011/07/write-combining.html (reposted here because the original is blocked behind the firewall)
Modern CPUs employ lots of techniques to counteract the latency cost of going to main memory. These days CPUs can process hundreds of instructions in the time it takes to read or write data to the DRAM memory banks.
The major tool used to hide this latency is multiple layers of SRAM cache. In addition, SMP systems employ message passing protocols to achieve coherence between caches. Unfortunately CPUs are now so fast that even these caches cannot keep up at times. So to further hide this latency a number of less well known buffers are used.
This article explores “write combining store buffers” and how we can write code that uses them effectively.
CPU caches are effectively unchained hash maps where each bucket is typically 64 bytes. This is known as a “cache line”, and the cache line is the effective unit of memory transfer. For example, an address A in main memory would hash to a given cache line C.
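To make the mapping concrete, here is a minimal sketch (not from the original post) in Java, assuming the typical 64-byte cache line: an address is reduced to its line address by masking off the low 6 bits, much like hashing a key to a bucket. The class and method names are illustrative only.

```java
public final class CacheLineMapping
{
    private static final int CACHE_LINE_SIZE = 64; // typical line size on x86

    // Start address of the cache line containing 'address':
    // clear the low log2(64) = 6 bits.
    public static long lineAddressOf(final long address)
    {
        return address & ~(CACHE_LINE_SIZE - 1L);
    }

    // Offset of 'address' within its 64-byte cache line.
    public static long offsetInLine(final long address)
    {
        return address & (CACHE_LINE_SIZE - 1L);
    }

    public static void main(final String[] args)
    {
        final long address = 0x7F3A12C5L; // arbitrary example address
        System.out.printf("address      = 0x%X%n", address);
        System.out.printf("line address = 0x%X%n", lineAddressOf(address));
        System.out.printf("line offset  = %d%n", offsetInLine(address));
    }
}
```

Any two addresses that differ only in their low 6 bits land on the same line, which is why the line, not the individual byte, is what moves between memory and the caches.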