CPython Compiler Hardening Intro

Getting to First PR

I have been investigating adding default compiler and linker options to CPython for improved safety at runtime as well as uncover potential vulnerabilities by digging through warnings these compiler options would generate. OpenSSF has been developing some guidance for hardening compiler options and some time ago an issue was written suggesting CPython should consider adopting some of the suggestions of the OpenSSF and there is where I enter the story.

At first my goals are to

  1. Get pyperformance baseline for existing CPython mainline

  2. See how options suggested in the OpenSSF guidance affect benchmarks

  3. Get an understanding of new warnings

  4. Give a recommendation for a set of compiler options

For my local benchmarking machine I decided to install Fedora on a PC I built a few years ago. I chose Fedora because the Dockerfile that comes with the CPython repo specified a Fedora:40 image. The PC I will be using for benchmarking has an AMD Ryzen 7 3700X with 32GB of memory. I will use this machine as a way to gauge options to send on to run in a lab at Microsoft set up to run CPython benchmarks. More information on that here and a special thanks to Michael Droettboom for running these benchmarks!

At first I would like to figure out compiler options we can get for “free”. Warning flags that don’t generate any warnings and other compiler hardening options that have minimal impact on pyperformance benchmarks. These flags will be benchmarked on the Microsoft test machines and then if they are close to baseline benchmarks they will be included in a PR.

Benchmarks

The OpenSSF guidance document provided a TLDR; set of compiler options for C/C++ code that the working group had deemed worthy enough that they should probably be enabled for a majority of projects. The first tests should be with these options:

-O2 -Wall -Wformat -Wformat=2 -Wconversion -Wimplicit-fallthrough \
-Werror=format-security \
-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 \
-D_GLIBCXX_ASSERTIONS \
-fstrict-flex-arrays=3 \
-fstack-clash-protection -fstack-protector-strong \
-Wl,-z,nodlopen -Wl,-z,noexecstack \
-Wl,-z,relro -Wl,-z,now

Compiler and linker flags can be passed to the CPython configuration script. We know that the linker options are going to impact performance, so we will take benchmarks for each of the options we think are going to implement, observe the compile time warnings, and assess the benchmarks to get an initial PR of non-warning and non-performance impacting

Note: Benchmark numbering may not make much sense without more context. The intent is that these runs can be referenced in my backlog of benchmarks sometime in the future

Run NameConfiguration CommandOption
Run 45./configure CFLAGS="-O2 -fdiagnostics-format=json" LDFLAGS=""
Run 46./configure CFLAGS="-Wall -fdiagnostics-format=json" LDFLAGS=""Just a warning, true zero performance impact
Run 47./configure CFLAGS="-Wformat=2 -fdiagnostics-format=json" LDFLAGS=""Just a warning, true zero performance impact
Run 48./configure CFLAGS="-Wconversion -fdiagnostics-format=json" LDFLAGS=""Just a warning, true zero performance impact
Run 49./configure CFLAGS="-Wimplicit-fallthrough -fdiagnostics-format=json" LDFLAGS=""Just a warning, true zero performance impact
Run 50./configure CFLAGS="-Werror=format-security -fdiagnostics-format=json" LDFLAGS=""Just a warning, true zero performance impact
Run 51./configure CFLAGS="-fstack-protector-strong -fdiagnostics-format=json" LDFLAGS=""
Run 52./configure CFLAGS="-fcf-protection=full -fdiagnostics-format=json" LDFLAGS=""
Run 53./configure CFLAGS="-Wtrampolines -fdiagnostics-format=json" LDFLAGS=""Just a warning, true zero performance impact
Run 54./configure CFLAGS="-fno-delete-null-pointer-checks -fdiagnostics-format=json" LDFLAGS=""
Run 55./configure CFLAGS="-fno-strict-overflow -fdiagnostics-format=json" LDFLAGS=""
Run 56./configure CFLAGS="-fstrict-flex-arrays=1 -fdiagnostics-format=json" LDFLAGS=""
Run 57./configure CFLAGS="-fno-strict-aliasing -fdiagnostics-format=json" LDFLAGS=""
Run 58./configure CFLAGS="-Wbidi-chars=any -fdiagnostics-format=json" LDFLAGS=""Just a warning, true zero performance impact
Run 59./configureNone (another baseline run for fun)
Run 60./configureNone (yet another baseline run)

Above run data is run_set_20240618_011115

Results

By default CPython is compiled with -O3 optimization, so -O2 is not included, although it was tested here.

We can see that warnings are generated when some options are enabled:

Run NameNew Warnings?
Run 45No
Run 46Yes
Run 47Yes (-Wformat-nonliterl)
Run 48Yes (-Wsign-conversion, -Wconversion, -Wfloat-conversion)
Run 49No
Run 50Yes
Run 51No
Run 52No
Run 53No
Run 54Yes (-Wformat-overflow replaces a -Wstringop-overflow)
Run 55No
Run 56No
Run 57No
Run 58No
Run 59N/A
Run 60N/A

And an analysis of the benchmarks of the options that didn’t generate warnings (even just warning flags):

Run 49 vs. Run 59

-Wimplicit-fallthrough
Benchmark TagGeometric Mean
apps1.00x slower
asyncio1.01x slower
math1.00x faster
regex1.00x slower
serialize1.00x slower
startup1.00x slower
template1.01x slower
overall1.00x slower

Just a warning flag. Including in next stage.

Run 51 vs. Run 59

-fstack-protector-strong
Benchmark TagGeometric Mean
apps1.01x slower
asyncio1.01x slower
math1.01x faster
regex1.01x faster
serialize1.02x slower
startup1.01x slower
template1.04x slower
overall1.01x slower

A little suspicious but will include it in the set to be tested in the Microsoft lab.

Run 53 vs. Run 59

-Wtrampolines
Benchmark TagGeometric Mean
apps1.00x slower
asyncio1.01x slower
math1.00x faster
regex1.00x slower
serialize1.00x slower
startup1.00x slower
template1.01x slower
overall1.00x slower

Looks good to move to next step.

Run 55 vs. Run 59

-fno-strict-overflow
Benchmark TagGeometric Mean
apps1.01x slower
asyncio1.01x slower
math1.00x faster
regex1.00x faster
serialize1.01x slower
startup1.00x slower
template1.01x slower
overall1.01x slower

Looks good to include in next step.

Benchmark in Microsoft Lab

I modified the CPython autoconf configure.ac to check and enable the options above:

# Enable flags that warn and protect for potential security vulnerabilities.
# These flags should be enabled by default for all builds.
AX_CHECK_COMPILE_FLAG([-Wimplicit-fallthrough], [BASECFLAGS="$BASECFLAGS -Wimplicit-fallthrough"], [AC_MSG_WARN([-Wimplicit-fallthrough not supported])])
AX_CHECK_COMPILE_FLAG([-fstack-protector-strong], [BASECFLAGS="$BASECFLAGS -fstack-protector-strong"], [AC_MSG_WARN([-fstack-protector-strong not supported])])
AX_CHECK_COMPILE_FLAG([-fno-strict-overflow], [BASECFLAGS="$BASECFLAGS -fno-strict-overflow"], [AC_MSG_WARN([-fno-strict-overflow not supported])])
case $CC in
  *gcc*)
    # Add GCC-specific compiler flags          
    AX_CHECK_COMPILE_FLAG([-Wtrampolines], [BASECFLAGS="$BASECFLAGS -Wtrampolines"], [AC_MSG_WARN([-Wtrampolines not supported])])
esac

The benchmarks showed that the difference between my build and the baseline were below the threshold that we could reasonably measure.

Base vs. New Options

  • fork: nohlson

  • ref: enable_no_impact_def

  • machine: linux-x86_64

  • commit hash: 98d9ea0

  • commit date: 2024-06-20

  • overall geometric mean: 1.00x slower

  • HPT reliability: 70.47%

  • HPT 99th percentile: 1.00x faster

  • Memory change: 1.00x

The full set of benchmarks can be found here

Conclusion

I will create an initial PR with the following options since they do not negatively impact performance and do not generate new warnings at compile time:

-Wimplicit-fallthrough -fstack-protector-strong -fno-strict-overflow -Wtrampolines

PR: gh-112301: Enable compiler flags with low performance impact and no warnings #120975

UPDATE (6/25/24)

It was noted in the comments of the PR that -fno-strict-overflow is already enabled if it is available for the compiler. I removed my check and enabling of that flag since it is redundant.

CC BY-SA 4.0 Nate Ohlson. Last modified: June 25, 2024.