Gentoo: Safe CFLAGS
If you build packages (ebuilds) on the same machine that uses them, it is much better (and easier) to just use -march=native. This ensures that GCC picks the most suitable configuration for your CPU.
I like to pre-build packages on a different machine and then apply updates as binary packages (buildpkg / getbinpkg). My build machine has a different processor than the server, so -march=native is not an option. I can still build packages on a different machine as long as 1) both machines are the same arch and 2) the flags (instructions) selected for the target are a subset of the flags (instructions) available on the build machine. The examples are for my configuration (Intel(R) Celeron(R) CPU G1610T @ 2.30GHz, used in an HP Microserver). If you have a different CPU, tweak the options until you get the expected result.
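The subset condition can be checked mechanically. Below is a minimal sketch, assuming both flag lists are plain whitespace-separated words (for example the "flags" line of /proc/cpuinfo). The function name flags_subset is my own and not part of any tool.

```shell
# Succeeds (exit status 0) when every word in $1 (target flags)
# also appears in $2 (build host flags).
flags_subset() {
    # Collapse any run of whitespace in $2 to single spaces.
    have=" $(echo $2) "
    for flag in $1; do
        case "$have" in
            *" $flag "*) ;;  # flag is available on the build host
            *) echo "missing on build host: $flag" >&2; return 1 ;;
        esac
    done
    return 0
}
```

One way to use it, with "server" as a placeholder host name for the target machine:

flags_subset "$(ssh server "grep -m1 '^flags' /proc/cpuinfo" | cut -d: -f2)" "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" && echo "build host covers target"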
This article is inspired by Safe_CFLAGS on Gentoo Wiki.
Step 1 - CPU_FLAGS_X86
This is configuration for Portage and should be saved in make.conf:
# emerge -1 app-portage/cpuid2cpuflags
# cpuid2cpuflags
CPU_FLAGS_X86="mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"
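Newer releases of cpuid2cpuflags print the result as "CPU_FLAGS_X86: ..." instead of a ready-made assignment. A small filter can turn that into make.conf syntax. This is only a sketch, and to_make_conf is a name I made up for it. Input that is already an assignment passes through unchanged.

```shell
# Convert 'NAME: a b c' (read on stdin) into 'NAME="a b c"'.
# Lines that do not look like 'NAME: ...' are printed as they are.
to_make_conf() {
    sed -e 's/^\([A-Z0-9_]*\): *\(.*\)$/\1="\2"/'
}
```

Usage: cpuid2cpuflags | to_make_conf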
Step 2 - identify correct arch
The following command shows which arch best matches the installed CPU:
# gcc -c -Q -march=native --help=target | grep march
-march= ivybridge
In my example it is ivybridge.
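On some GCC versions a plain grep for "march" may match more than one line (for example a line listing the valid arguments). The sketch below picks only the line whose first field is exactly "-march=" and prints the value. The helper name extract_march is hypothetical.

```shell
# Read `gcc -Q --help=target` output on stdin and print the -march value.
extract_march() {
    awk '$1 == "-march=" { print $2; exit }'
}
```

Usage: gcc -c -Q -march=native --help=target | extract_march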
Step 3 - identify correct options
Compare the output of gcc with the march option set to -march=native and to -march=ivybridge. This shows all differences between the auto-detected and the configured parameters.
# diff <(gcc -c -Q -march=native --help=target) <(gcc -c -Q -O2 --help=target -O2 -march=ivybridge)
After a bit of tweaking I get the following output:
# diff <(gcc -c -Q -march=native --help=target) <(gcc -c -Q -O2 --help=target -O2 -march=ivybridge -mtune=ivybridge -mcx16 -mfsgsbase -mfxsr -mmmx -mpclmul -mpopcnt -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mfpmath=sse -fomit-frame-pointer )
51c51
< -mfpmath= 387
---
> -mfpmath= sse
Note: -mfpmath=sse -fomit-frame-pointer is my preference on top of the detected configuration.
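While tweaking, it can help to reduce the gcc output to plain "option value" pairs first, so the diff is not affected by column alignment or header lines. A minimal sketch, assuming the usual two-column layout of --help=target; target_pairs is a made-up name.

```shell
# Read `gcc -Q --help=target` output on stdin and print one sorted
# "option value" pair per line, e.g. "-mfpmath= 387" or "-maes [enabled]".
# Lines that are not exactly "-m<option> <value>" are dropped.
target_pairs() {
    awk '$1 ~ /^-m/ && NF == 2 { print $1, $2 }' | LC_ALL=C sort
}
```

For example (native.txt and candidate.txt are arbitrary file names):

gcc -c -Q -march=native --help=target | target_pairs > native.txt
gcc -c -Q -march=ivybridge --help=target | target_pairs > candidate.txt
diff native.txt candidate.txt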
Step 4 - identify missing instructions
The options -march=ivybridge -mtune=ivybridge enable all possible instructions for the CPU group. Some of them may not be present on a particular CPU, and binaries then fail with "invalid opcode" or similar messages. Celeron processors in particular are known to lack some instructions. The following commands identify those instructions and help to disable them in the configuration.
Generate compiler output for the detected and the selected configuration:
# touch native.cc march.cc
# LANG="en"
# gcc -fverbose-asm -march=native native.cc -S
# gcc -fverbose-asm -march=ivybridge march.cc -S
Note: it is important to select the English language. The next step searches for the English text "options enabled", so localized output may result in empty files. Examine the files before continuing.
Format the files into a diff-friendly form:
# sed -i 1,/options\ enabled/d march.s
# sed -i 1,/options\ enabled/d native.s
Show the differences:
# diff march.s native.s
18c18
< # -m128bit-long-double -m64 -m80387 -maes -malign-stringops -mavx
---
> # -m128bit-long-double -m64 -m80387 -malign-stringops
20,23c20,23
< # -mf16c -mfancy-math-387 -mfp-ret-in-387 -mfsgsbase -mfxsr -mglibc
< # -mieee-fp -mlong-double-80 -mmmx -mpclmul -mpopcnt -mpush-args -mrdrnd
< # -mred-zone -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3
< # -mtls-direct-seg-refs -mxsave -mxsaveopt
---
> # -mfancy-math-387 -mfp-ret-in-387 -mfsgsbase -mfxsr -mglibc -mieee-fp
> # -mlong-double-80 -mmmx -mpclmul -mpopcnt -mpush-args -mred-zone -msahf
> # -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3
> # -mtls-direct-seg-refs
A close examination of the output shows that the target CPU does not support (or GCC does not like to enable) the following instructions: aes avx f16c rdrnd xsave xsaveopt. I will add -mno-aes -mno-avx -mno-f16c -mno-rdrnd -mno-xsave -mno-xsaveopt to the target configuration to disable the missing / problematic instructions.
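Reading the diff by eye works for a handful of flags. The same result can be derived with a small helper, sketched here under the assumption that march.s and native.s have already been trimmed by the sed commands from this step and that neither file is empty. The name missing_flags is my own.

```shell
# Print one -mno-<x> flag for every -m<x> option that is listed in
# $1 (output for -march=<arch>) but not in $2 (output for -march=native).
missing_flags() {
    awk '
        # First file on the command line is $2 (native): remember its options.
        FNR == NR { for (i = 1; i <= NF; i++) native[$i] = 1; next }
        # Second file is $1 (march): report -m options that native lacks.
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^-m[a-z0-9]/ && $i !~ /=/ && !($i in native) && !seen[$i]++)
                    printf "-mno-%s\n", substr($i, 3)
        }
    ' "$2" "$1"
}
```

Usage: missing_flags march.s native.s | tr '\n' ' '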
Step 5 - finalize configuration
By combining the output of all steps I get the following configuration for my machine. It is stored in make.conf, the Portage configuration file.
CFLAGS="-O2 -march=ivybridge -mtune=ivybridge -mno-aes -mno-avx -mno-f16c -mno-rdrnd -mno-xsave -mno-xsaveopt -mcx16 -mfsgsbase -mfxsr -mmmx -mpclmul -mpopcnt -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mfpmath=sse -fomit-frame-pointer -pipe"
CPU_FLAGS_X86="mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"
CHOST="x86_64-pc-linux-gnu"