Gentoo: Safe CFLAGS

From Luky-Wiki

If you build packages (ebuilds) on the same machine that uses them, it is much better (and easier) to just use -march=native. This ensures that GCC picks the optimal configuration for your CPU.

I like to pre-build packages on a different machine and then apply updates as binary packages (buildpkg / getbinpkg). My build machine has a different processor than the server, so -march=native is not an option. Building on a different machine works as long as 1) both machines are the same arch and 2) the flags (instructions) selected for the target are a subset of the flags (instructions) available on the build machine. The examples are for my configuration (Intel(R) Celeron(R) CPU G1610T @ 2.30GHz used in an HP Microserver). If you have a different CPU, tweak the options until you get the expected result.
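Condition 2 can be checked mechanically. The following is a minimal sketch, assuming you have collected the flag lists of both machines as plain space-separated words (for example from cpuid2cpuflags). The helper name is_subset and the build host flag list are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# is_subset "TARGET_FLAGS" "BUILD_FLAGS"   (hypothetical helper)
# Succeeds when every word in TARGET_FLAGS also appears in BUILD_FLAGS.
# Both lists are expected to be separated by single spaces.
is_subset() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;   # flag is available on the build machine
            *) echo "missing on build host: $flag" >&2; return 1 ;;
        esac
    done
    return 0
}

# Flags of the target (taken from this article).
target="mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"
# Flags of the build host: made up for this example.
build="aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"

if is_subset "$target" "$build"; then
    echo "ok: target flags are a subset of the build host flags"
fi
```

Words are matched whole, so sse4 is not mistaken for sse4_1.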

This article is inspired by Safe_CFLAGS on Gentoo Wiki.

Step 1 - CPU_FLAGS_X86

This is the configuration for portage and should be saved in make.conf:

# emerge -1 app-portage/cpuid2cpuflags
# cpuid2cpuflags
CPU_FLAGS_X86="mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"
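Depending on the installed version, cpuid2cpuflags may print the result as NAME: flags instead of the NAME="flags" form shown above (this is an assumption about newer releases, check your own output). A small sketch that normalizes either form for make.conf, with to_make_conf being a hypothetical helper name:

```shell
#!/bin/sh
# to_make_conf: read cpuid2cpuflags output on stdin and print it in
# make.conf syntax. Lines already in NAME="..." form pass through unchanged.
to_make_conf() {
    sed -E 's/^([A-Z0-9_]+): *(.*)$/\1="\2"/'
}

# Run only when the tool is installed.
if command -v cpuid2cpuflags >/dev/null 2>&1; then
    cpuid2cpuflags | to_make_conf
fi
```

Review the printed line before copying it into make.conf.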

Step 2 - identify correct arch

The following command shows which arch best matches the installed CPU:

# gcc -c -Q -march=native --help=target | grep march
  -march=                     		ivybridge

In my example it is ivybridge.
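If you want to reuse the detected value in a script, the line can be parsed instead of read by eye. This is only a sketch: detect_march is a hypothetical helper, and it assumes the output layout shown above (option name in the first column, value in the second).

```shell
#!/bin/sh
# detect_march: read `gcc -Q --help=target` output on stdin and print
# the value of the -march= line.
detect_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Run only when gcc is installed.
if command -v gcc >/dev/null 2>&1; then
    gcc -c -Q -march=native --help=target | detect_march
fi
```

Matching on the first column skips any other line that merely mentions march, which a plain grep would also print.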

Step 3 - identify correct options

Compare the output of gcc with -march=native against the output with -march=ivybridge. This shows all differences between auto-detection and the configured parameters.

# diff <(gcc -c -Q -march=native --help=target) <(gcc -c -Q -O2 --help=target -O2 -march=ivybridge)

After a bit of tweaking I get the following output:

# diff <(gcc -c -Q -march=native --help=target) <(gcc -c -Q -O2 --help=target -O2 -march=ivybridge -mtune=ivybridge -mcx16 -mfsgsbase -mfxsr -mmmx -mpclmul -mpopcnt -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mfpmath=sse -fomit-frame-pointer  )
<   -mfpmath=                        387
>   -mfpmath=                        sse

Note: -mfpmath=sse -fomit-frame-pointer are my preference on top of the detected configuration, which is why -mfpmath is the only remaining difference.

Step 4 - identify missing instructions

Options -march=ivybridge -mtune=ivybridge enable every instruction set defined for that CPU group. Some of them may not be present on a particular CPU, and binaries then fail with "invalid opcode" or similar messages. Celeron processors in particular are known to lack some instructions. The following commands identify those instructions and help to disable them in the configuration.

Generate compiler output for the detected and for the selected configuration:

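# touch empty.c
# LANG="en"
# gcc -fverbose-asm -march=native -S empty.c -o native.s
# gcc -fverbose-asm -march=ivybridge -S empty.c -o march.s

Note: the name of the empty source file (empty.c here) is arbitrary; native.s and march.s are the output files used in the following steps. It is important to select the English language, because the next step searches for the English text "options enabled" and localized output may result in empty files. Examine the files before continuing.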

Strip everything up to and including the "options enabled" line so that the files are easy to diff:

# sed -i 1,/options\ enabled/d march.s
# sed -i 1,/options\ enabled/d native.s

Show differences:

# diff march.s native.s
< # -m128bit-long-double -m64 -m80387 -maes -malign-stringops -mavx
> # -m128bit-long-double -m64 -m80387 -malign-stringops
< # -mf16c -mfancy-math-387 -mfp-ret-in-387 -mfsgsbase -mfxsr -mglibc
< # -mieee-fp -mlong-double-80 -mmmx -mpclmul -mpopcnt -mpush-args -mrdrnd
< # -mred-zone -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3
< # -mtls-direct-seg-refs -mxsave -mxsaveopt
> # -mfancy-math-387 -mfp-ret-in-387 -mfsgsbase -mfxsr -mglibc -mieee-fp
> # -mlong-double-80 -mmmx -mpclmul -mpopcnt -mpush-args -mred-zone -msahf
> # -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3
> # -mtls-direct-seg-refs

A close examination of the output shows that the target CPU does not support (or GCC does not want to enable) the following instructions: aes avx f16c rdrnd xsave xsaveopt. I add -mno-aes -mno-avx -mno-f16c -mno-rdrnd -mno-xsave -mno-xsaveopt to the target configuration to disable the missing / problematic instructions.
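The same list can be derived with a few lines of shell instead of reading the diff by eye. This is a rough sketch rather than a finished tool: missing_flags is a hypothetical helper, it assumes march.s and native.s were prepared as described above, and it skips options of the form -mname=value because those have no -mno- counterpart.

```shell
#!/bin/sh
# missing_flags "MARCH_FLAGS" "NATIVE_FLAGS"   (hypothetical helper)
# Prints -mno-<name> for every -m<name> found in MARCH_FLAGS
# but not in NATIVE_FLAGS.
missing_flags() {
    have=" $(printf '%s ' $2)"    # normalize whitespace to single spaces
    out=""
    for flag in $1; do
        case "$flag" in
            -m*=*) continue ;;    # valued options such as -mtune=... are skipped
            -m*) ;;               # plain on/off option, keep going
            *) continue ;;        # "#" markers and any other words
        esac
        case "$have" in
            *" $flag "*) ;;       # also enabled by -march=native
            *) out="$out -mno-${flag#-m}" ;;
        esac
    done
    echo "${out# }"
}

# Use the files from the previous step when they exist.
if [ -f march.s ] && [ -f native.s ]; then
    missing_flags "$(cat march.s)" "$(cat native.s)"
fi
```

Treat the result as a suggestion and compare it with the diff, since not every -m option is guaranteed to have a -mno- form.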

Step 5 - finalize configuration

By combining the output of all steps I get the following configuration for my machine. It is stored in portage's make.conf.

CFLAGS="-O2 -march=ivybridge -mtune=ivybridge -mno-aes -mno-avx -mno-f16c -mno-rdrnd -mno-xsave -mno-xsaveopt -mcx16 -mfsgsbase -mfxsr -mmmx -mpclmul -mpopcnt -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mfpmath=sse -fomit-frame-pointer -pipe"

CPU_FLAGS_X86="mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"