CPython Updates and First Tool

I asked Michael if he could run -D_FORTIFY_SOURCE=3 with the latest from upstream since it had been 2 weeks since the last test for that option.

The results are similar to the previous tests:

  • fork: nohlson

  • ref: enablefortifysourc

  • machine: linux-x86_64

  • commit hash: 26aa174

  • commit date: 2024-07-08

  • overall geometric mean: 1.00x faster

  • HPT reliability: 97.80%

  • HPT 99th percentile: 1.00x faster

  • Memory change: 1.00x

TagGeometric Mean
apps1.00x slower
asyncio1.01x faster
regex1.03x slower
serialize1.01x faster
startup1.00x faster
template1.00x slower

I created a PR that enables this option to generate discussion regarding the performance impacts, which are minor.

Tooling

The tooling that will accompany the new compiler options is for compiler warning tracking. When new compiler options are enabled new warnings will be generated depending on the option type. Some warnings we may want to address in code, others will be understood to not pose a security risk. The tooling will alert devs when changes are made that introduce new warnings, or when changes are made that remove warnings. The tooling will also allow developers to note that warnings in a particular file should be ignored. I suppose if I were writing user stories for such a tool they would be:

  • As a dev I would like the tool to inform me when new warnings are generated by a change

  • As a dev I would like the tool to inform me when a file that previously has warnings no longer does

  • As a dev I would like to tell the tool to ignore warnings from a particular file

This should be sufficient for a first version of the tool. Eventually keeping count of the number of warnings for each type per file would be ideal, since ignoring an entire file is too broad.

Since different platforms will emit different warnings I initially thought that implementing an additional warning check step in the buildbot pipelines would be the most robust way of checking all platforms. However this certainly seems to be overkill and would add additional processing overhead for the buildbots. The CPython docs team piggybacked off of existing GitHub actions builds to accomplish their version of this. After some consideration it seemed that this was the best course of action, since burden will be low for developing the tool and for contributors to run the checks themselves.

I decided that the warning check tooling can be attached to three GitHub actions builds:

  • Ubuntu/build and test

  • macOS/build and test

  • Windows/build and test (x86)

Ubuntu will be first.

Creating a dev Environment


To test GitHub actions locally I installed act which uses Docker containers to run jobs.

I created a simple script to launch the ubuntu job using act.

#!/bin/bash

# Stop all running containers
docker stop $(docker ps -aq)

# Remove all containers
docker rm $(docker ps -aq)

# Remove all docker volumes
docker volume prune -f

# Run act
act -j build_ubuntu --container-cap-add SYS_ADMIN

When running act initially I encountered some permissions issues, passing SYS_ADMIN privileges fixed that issue.

I created a hello world check-warnings.py script in Tools/build to test including the tool in the action job. reusable-ubuntu.yml is the configuration for the ubuntu build and test job. Toward the end of the job CPython is configured and built. I modified the CFLAGS for this configuration to include -fdiagnostics-format=json so that I can easily parse the compiler output allowing me to reuse parts of scripts I had already wrote. I then redirected the compiler output to a file. I added a new step in the job after building called “Check compiler warnings” and added the script. The check-warnings.py script will have to take the compiler output text as an argument and the specific warning ignore file. This is what the config looks like:

- name: Configure CPython out-of-tree
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: ${{ inputs.options }} CFLAGS="-fdiagnostics-format=json"
    - name: Build CPython out-of-tree
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: make -j4 &> compiler_output.txt
    - name: Display build info
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: make pythoninfo
    - name: Check compiler warnings
      run: python Tools/build/check_warnings.py --compiler-output-file-path=${{ env.CPYTHON_BUILDDIR }}/compiler_output.txt --warning-ignore-file-path ${GITHUB_WORKSPACE}/Tools/build/.warningignore_ubuntu
    - name: Remount sources writable for tests
      # some tests write to srcdir, lack of pyc files slows down testing
      run: sudo mount $CPYTHON_RO_SRCDIR -oremount,rw
    - name: Tests
      working-directory: ${{ env.CPYTHON_BUILDDIR }}
      run: xvfb-run make test

Check Warnings Script


The warning checker script can borrow from the existing docs warning checker. First grab the files and get the contents

def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--compiler-output-file-path",
        type=str,
        required=True,
        help="Path to the compiler output file"
    )
    parser.add_argument(
        "--warning-ignore-file-path",
        type=str,
        required=True,
        help="Path to the warning ignore file"
    )
    parser.add_argument(
        "--fail-on-regression",
        action="store_true",
        default=False,
        help="Flag to fail if new warnings are found"
    )
    parser.add_argument(
        "--fail-on-improvement",
        action="store_true",
        default=False,
        help="Flag to fail if files that were expected to have warnings have no warnings"
    )

    args = parser.parse_args(argv)

    exit_code = 0

    # Check that the compiler output file is a valid path
    if not Path(args.compiler_output_file_path).is_file():
        print(f"Compiler output file does not exist: {args.compiler_output_file_path}")
        return 1

    # Check that the warning ignore file is a valid path
    if not Path(args.warning_ignore_file_path).is_file():
        print(f"Warning ignore file does not exist: {args.warning_ignore_file_path}")
        return 1

    with Path(args.compiler_output_file_path).open(encoding="UTF-8") as f:
        compiler_output_file_contents = f.read()

    with Path(args.warning_ignore_file_path).open(encoding="UTF-8") as clean_files:
        files_with_expected_warnings = {
            filename.strip()
            for filename in clean_files
            if filename.strip() and not filename.startswith("#")
        }

    if not len(compiler_output_file_contents) > 0:exit_code = 1

Before we check the warnings we need to parse the output. The compiler output includes many json arrays when using -fdiagnostic-format=json. To extract the json arrays I used a regex to grab anything between [] at the highest level. Then each individual object within the array will be checked to see if the kind is warning. Then each object will be added to the list of warnings if the kind is warning. This method still includes the duplicate warnings. In my tooling I use pandas to remove duplicates, so that could be a future improvement, but for now the duplicates don’t impact the functionality of the tool.

def extract_warnings_from_compiler_output(compiler_output: str) -> list[dict]:
    """
    Extracts warnings from the compiler output when using -fdiagnostics-format=json

    Compiler output as a whole is not a valid json document, but includes many json
    objects and may include other output that is not json.
    """

    # Regex to find json arrays in the compiler output
    json_arrays = re.findall(r'\[(?:[^\[\]]|\[(?:[^\[\]]|\[[^\[\]]*\])*\])*\]', compiler_output)
    compiler_warnings = []
    for array in json_arrays:
        try:
            json_data = json.loads(array)
            json_objects_in_array = [entry for entry in json_data]
            compiler_warnings.extend([entry for entry in json_objects_in_array if entry.get('kind') == 'warning'])
        except json.JSONDecodeError:
            continue  # Skip malformed JSON

    return compiler_warnings
def extract_warnings_from_compiler_output(compiler_output: str) -> list[dict]:
    """
    Extracts warnings from the compiler output when using -fdiagnostics-format=json

    Compiler output as a whole is not a valid json document, but includes many json
    objects and may include other output that is not json.
    """
    compiler_output_json_objects = []
    stack = []
    start_index = None
    for index, char in enumerate(compiler_output):
        if char == '[':
            if len(stack) == 0:
                start_index = index  # Start of a new JSON array
            stack.append(char)
        elif char == ']':
            if len(stack) > 0:
                stack.pop()
            if len(stack) == 0 and start_index is not None:
                try:
                    json_data = json.loads(compiler_output[start_index:index+1])
                    compiler_output_json_objects.extend(json_data)
                    start_index = None
                except json.JSONDecodeError:
                    continue  # Skip malformed JSON

    compiler_warnings = [entry for entry in compiler_output_json_objects if entry.get('kind') == 'warning']

    return compiler_warnings

After that matching warnings in a file to files in the warning ignore file is straightforward for both regressions and improvements.

def get_unexpected_warnings(
    warnings: list[dict],
    files_with_expected_warnings: set[str],
) -> int:
    """
    Returns failure status if warnings discovered in list of warnings are associated with a file
    that is not found in the list of files with expected warnings
    """
    unexpected_warnings = []
    for warning in warnings:
        locations = warning['locations']
        for location in locations:
            for key in ['caret', 'start', 'end']:
                if key in location:
                    filename = location[key]['file']
                    if filename not in files_with_expected_warnings:
                        unexpected_warnings.append(warning)

    if unexpected_warnings:
        print("Unexpected warnings:")
        for warning in unexpected_warnings:
            print(warning)
        return 1

    return 0

def get_unexpected_improvements(
    warnings: list[dict],
    files_with_expected_warnings: set[str],
) -> int:
    """
    Returns failure status if there are no warnings in the list of warnings for a file
    that is in the list of files with expected warnings
    """

    # Create set of files with warnings
    files_with_warnings = set()
    for warning in warnings:
        locations = warning['locations']
        for location in locations:
            for key in ['caret', 'start', 'end']:
                if key in location:
                    filename = location[key]['file']
                    files_with_warnings.add(filename)

    unexpected_improvements = []
    for filename in files_with_expected_warnings:
        if filename not in files_with_warnings:
            unexpected_improvements.append(filename)

    if unexpected_improvements:
        print("Unexpected improvements:")
        for filename in unexpected_improvements:
            print(filename)
        return 1

    return 0

When running with act we can see that the step is run and passes, since the arguments indicate to not fail on regression or improvement.

[Ubuntu/reusable-ubuntu.yml/build and test]   ✅  Success - Main Display build info
[Ubuntu/reusable-ubuntu.yml/build and test] ⭐ Run Main Check compiler warnings
[Ubuntu/reusable-ubuntu.yml/build and test]   🐳  docker exec cmd=[bash --noprofile --norc -e -o pipefail /var/run/act/workflow/16] user= workdir=
| Returning exit code:  0
[Ubuntu/reusable-ubuntu.yml/build and test]   ✅  Success - Main Check compiler warnings
[Ubuntu/reusable-ubuntu.yml/build and test] ⭐ Run Main Remount sources writable for tests
[Ubuntu/reusable-ubuntu.yml/build and test]   🐳  docker exec cmd=[bash --noprofile --norc -e -o pipefail /var/run/act/workflow/17] user= workdir=
[Ubuntu/reusable-ubuntu.yml/build and test]   ✅  Success - Main Remount sources writable for tests

When I build locally on my Fedora 40 system (not using act and the Docker container) I get some warnings related to -Wstringop-overflow due to some compiler option defaults:

{"kind": "warning",
  "message": "writing 1 byte into a region of size 0",
  "option": "-Wstringop-overflow=",
  "option_url": "https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow",
  "children": [],
  "column-origin": 1,
  "locations": [{"caret": {"file": "./Modules/_decimal/libmpdec/io.c",
                           "line": 378,
                           "display-column": 8,
                           "byte-column": 8,
                           "column": 8},
                 "start": {"file": "./Modules/_decimal/libmpdec/io.c",
                           "line": 378,
                           "display-column": 5,
                           "byte-column": 5,
                           "column": 5},
                 "finish": {"file": "./Modules/_decimal/libmpdec/io.c",
                            "line": 378,
                            "display-column": 13,
                            "byte-column": 13,
                            "column": 13}}],
  "escape-source": false},

When I run the script without flags for failing on regression or improvement and an empty .warningignore_ubuntu the warnings are dumped and the script returns successfully:

$ python Tools/build/check_warnings.py --compiler-output-file-path=/home/nate/cpython/compiler_output.txt --warning-ignore-file-path=/home/nate/cpython/Tools/build/.warningignore_ubuntu 
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 45, 'byte-column': 45, 'column': 45}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 40, 'byte-column': 40, 'column': 40}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 59, 'byte-column': 59, 'column': 59}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 45, 'byte-column': 45, 'column': 45}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 40, 'byte-column': 40, 'column': 40}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 59, 'byte-column': 59, 'column': 59}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 8, 'byte-column': 8, 'column': 8}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 5, 'byte-column': 5, 'column': 5}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 13, 'byte-column': 13, 'column': 13}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 8, 'byte-column': 8, 'column': 8}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 5, 'byte-column': 5, 'column': 5}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 13, 'byte-column': 13, 'column': 13}}], 'escape-source': False}
Returning exit code:  0

If the flag is added to fail on regression we see that the script returns a failure status code:

$ python Tools/build/check_warnings.py --compiler-output-file-path=/home/nate/cpython/compiler_output.txt --warning-ignore-file-path=/home/nate/cpython/Tools/build/.warningignore_ubuntu --fail-on-regression
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 45, 'byte-column': 45, 'column': 45}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 40, 'byte-column': 40, 'column': 40}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 59, 'byte-column': 59, 'column': 59}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 45, 'byte-column': 45, 'column': 45}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 40, 'byte-column': 40, 'column': 40}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 59, 'byte-column': 59, 'column': 59}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 8, 'byte-column': 8, 'column': 8}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 5, 'byte-column': 5, 'column': 5}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 13, 'byte-column': 13, 'column': 13}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 8, 'byte-column': 8, 'column': 8}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 5, 'byte-column': 5, 'column': 5}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 13, 'byte-column': 13, 'column': 13}}], 'escape-source': False}
Returning exit code:  1

If I modify the .warningignore_ubuntu file to include the file where these warnings exist..

# Files listed will be ignored by the compiler warning checker
# for the Ubuntu/build and test job.
# Keep lines sorted lexicographically to help avoid merge conflicts.

Modules/_decimal/libmpdec/io.c

Then the script returns successfully and does not print out the warnings

$ python Tools/build/check_warnings.py --compiler-output-file-path=/home/nate/cpython/compiler_output.txt --warning-ignore-file-path=/home/nate/cpython/Tools/build/.warningignore_ubuntu --fail-on-regression
Returning exit code:  0

To test improvements we can add a file that doesn’t contain warnings to the .warningignore_ubuntu file

# Files listed will be ignored by the compiler warning checker
# for the Ubuntu/build and test job.
# Keep lines sorted lexicographically to help avoid merge conflicts.

Modules/_decimal/libmpdec/io.c
Modules/_decimal/libmpdec/memory.c

And we get a printout of improved files:

$ python Tools/build/check_warnings.py --compiler-output-file-path=/home/nate/cpython/compiler_output.txt --warning-ignore-file-path=/home/nate/cpython/Tools/build/.warningignore_ubuntu --fail-on-regression
Unexpected improvements:
Modules/_decimal/libmpdec/memory.c
Returning exit code:  0

We can add --fail-on-improvement to return a failure status code for this scenario

$ python Tools/build/check_warnings.py --compiler-output-file-path=/home/nate/cpython/compiler_output.txt --warning-ignore-file-path=/home/nate/cpython/Tools/build/.warningignore_ubuntu --fail-on-regression --fail-on-improvement
Unexpected improvements:
Modules/_decimal/libmpdec/memory.c
Returning exit code:  1

Putting a regression and an improvement together by removing Modules/_decimal/libmpdec/memory.c from .warningignore_ubuntu

# Files listed will be ignored by the compiler warning checker
# for the Ubuntu/build and test job.
# Keep lines sorted lexicographically to help avoid merge conflicts.

Modules/_decimal/libmpdec/memory.c

we get both

$ python Tools/build/check_warnings.py --compiler-output-file-path=/home/nate/cpython/compiler_output.txt --warning-ignore-file-path=/home/nate/cpython/Tools/build/.warningignore_ubuntu --fail-on-regression --fail-on-improvement
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 45, 'byte-column': 45, 'column': 45}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 40, 'byte-column': 40, 'column': 40}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 59, 'byte-column': 59, 'column': 59}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 45, 'byte-column': 45, 'column': 45}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 40, 'byte-column': 40, 'column': 40}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 375, 'display-column': 59, 'byte-column': 59, 'column': 59}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 8, 'byte-column': 8, 'column': 8}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 5, 'byte-column': 5, 'column': 5}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 13, 'byte-column': 13, 'column': 13}}], 'escape-source': False}
{'kind': 'warning', 'message': 'writing 1 byte into a region of size 0', 'option': '-Wstringop-overflow=', 'option_url': 'https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Warning-Options.html#index-Wno-stringop-overflow', 'children': [], 'column-origin': 1, 'locations': [{'caret': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 8, 'byte-column': 8, 'column': 8}, 'start': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 5, 'byte-column': 5, 'column': 5}, 'finish': {'file': './Modules/_decimal/libmpdec/io.c', 'line': 378, 'display-column': 13, 'byte-column': 13, 'column': 13}}], 'escape-source': False}
Unexpected improvements:
Modules/_decimal/libmpdec/memory.c
Returning exit code:  1

The current version of this tooling should be good enough for an initial PR. Further builds (macOS and Windows) will follow.

CC BY-SA 4.0 Nate Ohlson. Last modified: July 13, 2024.