We're moving from one local computational server with two Xeon X5650 CPUs to another one with two Opteron 4280s... Today I tried launching my wonderful C programs on the new (AMD) machine and discovered a significant performance drop of more than 50%, keeping all possible parameters the same (even the seed for the random number generator). I started digging into the problem: googling "amd opteron 4200 compiler options" gave me a couple of suggestions, i.e., "flags" (options) for the GCC 4.6.3 compiler available to me. I played with these flags and summarized my findings in the plots down here...

I'm wondering if anyone (coding folks) could give me any comments on the subject. In particular, I'm curious why "... -march=bdver1 -fprefetch-loop-arrays" and "... -fprefetch-loop-arrays -march=bdver1" yield different runtimes?
I'm also not sure whether, let's say, "-funroll-all-loops" is already included in "-O3" or "-Ofast" - and if it is, why does adding this flag one more time make any difference at all?
And why does any additional flag make the Intel processor's performance even worse (except "-ffast-math" - which is kind of obvious, because it enables less precise and therefore by definition faster floating-point arithmetic, as I understand it, though...)?

A bit more detail about the machines and my program:
The 2*Xeon X5650 machine is an Ubuntu Server with gcc 4.4.3; it is a 2 (CPUs on the motherboard) * 6 (real cores each) * 2 (HyperThreading) = 24-thread machine, and something else was running on it during my "experiments", or benchmarks...

The 2*Opteron 4280 machine is an Ubuntu Server with gcc 4.6.3; it is a 2 (CPUs on the motherboard) * 4 (Bulldozer modules each) * 2 (AMD's module threading, each thread being kind of a core) = 16-thread machine, and I was using it solely for my wonderful "benchmarks"...

My benchmarking program is just a Monte Carlo simulation thing: it does some I/O in the beginning, and then runs ~10^5 Monte Carlo loops to give me the result. So I assume it mixes integer and floating-point calculations, looping every now and then and checking whether a randomly generated "result" is "good" enough for me or not... The program is single-threaded, and I was launching it with the very same parameters for every benchmark (it is obvious, but I should mention it anyway), including the random generator seed, so the results were 100% identical... The program IS NOT MEMORY INTENSIVE. The resulting runtime is just the "user" time reported by the standard "/usr/bin/time" command.
3 comments
 
Just at a quick glance: don't use the -O3 optimization flag; try -O2 instead. And by the way, 70 sec is quite a long time for a benchmark, don't you think so?
 
I ended up just using everything except "-ffast-math" - because that sort of less precise math sounds very scary to me... People on StackOverflow recommended I use a profiler, or rather another feature: first compile with -fprofile-generate and then with -fprofile-use. At the first step (-fprofile-generate) the compiler instruments the program so that running it records some sort of log file of what actually happens - optimizations very specific to your code, which may vary across the program - and then you recompile with -fprofile-use to apply those optimizations. This should be, theoretically, the best way to optimize the code...

Back to O2 and O3 - I think it's a matter of the problem you are dealing with. I don't do any weather forecasting, where every decimal after the dot is super important and affects the stability of your differential equation... I did some experimenting with O2 vs O3 long ago - the results were the same - so I kind of decided to live with O3 on the Intel machine. gcc 4.6.3 offers -Ofast - I have no idea what it is all about, but I decided to try...

My benchmark, I think, is great for me, because it is real-world - for me. And when the calculations are that long, I think all the statistical noise cancels out, or is at least significantly diminished, and it becomes possible to distinguish even the tiniest performance boost.