In a previous article I reported what appeared to be a performance issue with CAS/LOCK instructions on the Sandy Bridge microarchitecture compared to the previous Nehalem microarchitecture. Since then I’ve worked with the good people of Intel to understand what was going on and I’m now pleased to be able to shine some light on the previous results.
I observed a small drop in throughput with the uncontended single-thread case, and an order-of-magnitude decrease in throughput once multiple threads contend when performing updates. This testing spawned out of observations testing Java Queue implementations and the Disruptor for the multi-producer case. I was initially puzzled by these findings because almost every other performance test I applied to Sandy Bridge indicated a major step forward for this microarchitecture. 阅读全文