CI on 100 Repos — Matrix Build, 3 Python Versions, Lessons Learned

The CI bill is one of the few cost lines that scales sharply with the number of repos. Every push triggers a build. Every build consumes GitHub Actions minutes. Every minute is paid for, either by private-repo billing or by the free-tier cap you hit.

130+ repos, each pushing 2-5 times per week, each running a 3-Python-version matrix with ~200 tests and lint, means real minutes. Here's how we structured it, what we cache, and what failures have taught us.

The Workflow

Every Dangercorn app has the same .github/workflows/ci.yml. It's ~40 lines. The skeleton:

name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: ruff check .
      - run: ruff format --check .
      - run: pytest -q --tb=short

Three Python versions because our self-host users are all over the map — some on 3.10 (Ubuntu 22.04 default), some on 3.11, some on 3.12. Breaking a self-host deploy because of a version incompatibility is the kind of bug we genuinely cannot afford.

Why 3.10 as the Floor

Python 3.9 reached end-of-life for security patches in October 2025. 3.10 is still supported. We target 3.10+ on the assumption that any serious self-hoster is on a patched Python. If someone's running 3.9, they have other problems.

Ruff Replaced Everything

18 months ago our workflow had: flake8, black, isort, mypy. Four tools, sometimes conflicting on formatting. Combined install and run time: ~40 seconds.

Ruff replaced flake8 + black + isort. Combined install and run: 4 seconds. 10x improvement from one tool swap.

mypy is still in the loop, but we run it on a separate job that's not required-for-merge. It's slow, the type-errors are often legitimate but non-urgent, and blocking on it slows the whole feedback loop. Async type checking > sync type checking.

The Caching Story

pip cache saves ~20 seconds per job. uv cache would save more — we're mid-migration to uv and the initial tests show ~35 seconds per job savings. Across 130 repos × 3 versions × 3 pushes/week, that's ~5 hours of runner time per week.

GitHub's cache has a 10GB per-repo limit. Each Python version's pip cache is ~200MB. 3 versions = ~600MB. Under the limit, but noticeable.

Flaky Tests

Across 130 repos, we see maybe 5-10 flaky test failures per week. Most are network flakes (pip install failing on a bad mirror, a test that hits a fake HTTP endpoint that's temporarily slow). A few are real concurrency bugs.

Our policy: retry-once on CI failure, via a continue-on-error + re-run pattern isn't in stock Actions, so we use a manual workflow dispatch button for re-runs. Habitually-flaky tests get quarantined with an @pytest.mark.flaky decorator and a ticket to fix.

Parallel Matrix

The matrix (3 Python versions) runs in parallel. Total wall-clock time for a PR is ~3 minutes (the slowest matrix job). Serial would be ~9 minutes. The parallel factor is worth the extra minutes on the billing side.

What Breaks at 100+ Repos

Three things.

Dependency drift. When 130 repos each pin their dependencies, they drift apart. One app is on Flask 3.0.1, another is on 3.0.2. In isolation, fine. When you're trying to share code via dangercorn-saas-template, you have to pick a version. We now pin major-minor in the template (Flask~=3.0) and let patch versions float.

Coordinated security updates. When a CVE drops for a transitive dependency, you have to push the fix to every repo. Dependabot handles the PR generation, but merging 130 PRs is... tedious. We now batch-merge Dependabot PRs on Fridays with a script that checks each one's CI and merges the passing ones.

Matrix explosion. 3 Python versions × some apps also wanting to test against multiple Postgres versions, multiple SQLite versions, multiple OS runners (ubuntu-latest + macos-latest + windows-latest). If you let it grow unchecked, you get 27-variant matrices and runner minutes evaporate. We resist this — the default is 3 Python versions on ubuntu-latest, and adding more requires a justification.

The F-String Incident

Covered separately in this post: a backslash-in-f-string bug that parsed fine on Python 3.12 and crashed on 3.10/3.11. CI caught it because we test the matrix. Exactly the class of bug the matrix exists for. Would have shipped to self-hosters otherwise.

Where Public Repos Save Money

Half of our 130 repos are public. Public repos get unlimited Actions minutes on GitHub. The other half are private (early-stage verticals we haven't promoted yet, internal tooling, hosted-tier-only modules) and consume the metered minutes.

If we made everything public, our CI bill would drop to near-zero. We've considered it. The reasons we don't:

Some private repos contain customer-specific config or partner integration code that isn't ours to publish.
Some private repos are early-stage and we want to launch before competitors clone the work-in-progress.
Some private repos contain audit logging that references customer behavior we don't want public.

The CI cost of keeping these private is, in retrospect, a reasonable price for the optionality. We could go further on the public side and probably should as the verticals stabilize.

Cost

Our GitHub Actions minutes run about 8,000-12,000/month. Free tier for private repos is 2,000 minutes/month on the Team plan. We're well over.

Total bill: around $80-110/month for CI across the whole portfolio. Cheaper than a single Circle CI seat if you wanted to self-manage runners. We've considered self-hosted runners on the fleet (Huginn has capacity) but haven't pulled the trigger — the cost is small enough that the reliability of GitHub's runners wins.

Release Workflow

When we tag a release on a repo (v1.2.3), a release workflow runs: full test suite, build a wheel, upload to internal PyPI mirror, update the landing page. Release CI is separate from commit CI because it's slower (~6 minutes) and only needs to run on tags.

What I'd Do Differently

I'd have standardized on uv from day one instead of migrating after the fact. pip is fine; uv is 10x faster. The migration is non-trivial because every requirements.txt has small differences.

I'd have set up a dashboard of cross-repo CI status from the beginning. Right now I know if CI is healthy across the portfolio by scrolling through GitHub notifications. A dashboard would take ~2 hours to build and save me an hour a week. Will do this soon.

I'd have added a "security patches only" fast-lane workflow. When a CVE drops, going through the full matrix + full test suite is overkill. A minimal "install + import" test is enough for a patch-level dep bump. Today we run full CI; that's wasted minutes.

CI is a portfolio discipline. One repo's CI is a checklist. 100 repos' CI is a system — and the system has its own failure modes, cost structure, and ongoing maintenance cost.

The f-string bug CI caught. Flask + SQLite at scale. The template walkthrough. The fleet that runs it all.