I've been profiling with cachegrind and noticed something odd. When I compile with -O3, the program makes far fewer data references but roughly the same number of cache misses, which results in a higher miss rate. That's fine performance-wise, but it seems like a strange result and I'd like to know what's going on behind the scenes. The only other relevant compiler flag I have enabled is -march=native. For comparison:
Without -O3:

==16951== D   refs:      923,170,681  (817,941,424 rd   + 105,229,257 wr)
==16951== D1  misses:      9,477,102  (  8,115,150 rd   +   1,361,952 wr)
==16951== LLd misses:        647,219  (    262,227 rd   +     384,992 wr)
==16951== D1  miss rate:         1.0% (        1.0%     +         1.3%  )
==16951== LLd miss rate:         0.1% (        0.0%     +         0.4%  )
With -O3:

==16978== D   refs:      218,804,125  (205,979,405 rd   +  12,824,720 wr)
==16978== D1  misses:      9,372,533  (  8,016,083 rd   +   1,356,450 wr)
==16978== LLd misses:        647,195  (    262,191 rd   +     385,004 wr)
==16978== D1  miss rate:         4.3% (        3.9%     +        10.6%  )
==16978== LLd miss rate:         0.3% (        0.1%     +         3.0%  )
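
The rates themselves are consistent: a miss rate is just misses divided by refs, and since the misses are essentially unchanged, the rate scales up as the refs drop:

Without -O3:  9,477,102 / 923,170,681 ≈ 1.0%
With -O3:     9,372,533 / 218,804,125 ≈ 4.3%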
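
To make the question concrete, here is a hypothetical loop (not my actual code) of the shape where I'd expect this to happen, assuming the cause is register promotion at -O3. Is this the right mental model?

/* Hypothetical example, not the profiled program. At -O0 the locals
   `sum` and `i` live on the stack, so each iteration does extra loads
   and stores that cachegrind counts as D refs. At -O3 they stay in
   registers (and the loop may be vectorized), leaving mostly the
   unavoidable a[i] loads. The misses come from streaming through a[],
   which happens either way, so the miss count barely changes while
   the ref count collapses. */
double sum_array(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}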