Quik2007 8deabb9d15
chore: migrate git host to code.podesta.ai
Rename git.tu-po.com -> code.podesta.ai and reorganize orgs
(auralang -> PodestaAI/akribes, runner image -> public/runner-image,
brew tap -> public/brew-tap, mirrored bases -> public/*). Product
domains aura/akribes.tu-po.com -> api.akribes.ai, studio -> podesta.studio.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 13:54:30 +02:00
.forgejo/workflows chore: migrate git host to code.podesta.ai 2026-05-31 13:54:30 +02:00
.gitignore chore: gitignore local .claude settings 2026-04-15 21:45:20 +02:00
Containerfile feat: bake MinIO client (mc) into image 2026-04-18 11:08:22 +02:00
README.md chore: migrate git host to code.podesta.ai 2026-05-31 13:54:30 +02:00
renovate.json Initial: pre-warmed Forgejo runner image 2026-04-15 21:33:36 +02:00

runner-image

Pre-warmed Forgejo Actions runner image for the tu-po cluster. Published to code.podesta.ai/public/runner-image:latest and pulled by the Kata-VM runner pods in infra/apps/forgejo/ (label runner).

Goals

  • Fast "Set up environment" step. Bake rust, sccache, uv, bun into the image so rust-toolchain, setup-bun, setup-uv are instant (no GitHub API calls, no download, no rate limits).
  • Drop-in replacement for ubuntu-24.04. Base on ghcr.io/catthehacker/ubuntu:act-24.04 (the slim variant) so workflows written for the stock label keep working; tools the slim base lacks, such as node, are installed explicitly.
  • Sane cache story. Ship sccache configured for S3; workflows opt in with RUSTC_WRAPPER=sccache and credentials from the runner env.
  • Renovate-tracked versions. Every pinned tool has a # renovate: marker so renovate.json's custom manager bumps it automatically.

What's in the image

| Tool | Source | Notes |
|------|--------|-------|
| Base OS | catthehacker/ubuntu:act-24.04 | GitHub-Actions-compatible Ubuntu 24.04 (slim variant, ~1.5GB vs 15GB for :full) |
| rustup | sh.rustup.rs, minimal profile | + clippy, rustfmt |
| sccache | mozilla/sccache release (musl) | Default RUSTC_WRAPPER=sccache |
| uv / uvx | astral-sh/uv release | Replaces pip per global CLAUDE.md |
| bun / bunx | bun.sh install script | Replaces npm/npx per global CLAUDE.md |
| playwright | npm, via bun add -g | Browsers + OS deps pre-installed at /opt/pw-browsers |
| node | nodejs from Ubuntu noble | Needed for the Playwright CLI (#!/usr/bin/env node) |

RUSTUP_HOME=/opt/rust, CARGO_HOME=/opt/cargo (world-writable so any uid can use them). SCCACHE_IDLE_TIMEOUT=0 keeps the daemon alive for the full job. PLAYWRIGHT_BROWSERS_PATH=/opt/pw-browsers points workflows at the pre-baked browsers so npx playwright install is a no-op.
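A quick way to confirm that a job really landed on the pre-warmed image is to check that the baked tools resolve on PATH. The following is a minimal sketch; the function name `missing_tools` is hypothetical, and the example tool list assumes the binaries from the table above are exposed on PATH under their usual names.

```shell
# Print the names of the given tools that are not found on PATH.
# Prints an empty line when every tool resolves.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Drop the leading space before printing.
  printf '%s\n' "${missing# }"
}

# Example call for a job that targets the runner label (assumed tool names):
# missing_tools rustc cargo sccache uv uvx bun bunx node
```

Running this as the first step of a workflow gives an early, readable failure if the job was scheduled on a different image.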

Build

TAG=$(date +%Y%m%d)   # compute once so the build and push tags cannot differ
podman build -t code.podesta.ai/public/runner-image:latest \
             -t "code.podesta.ai/public/runner-image:${TAG}" \
             -f Containerfile .
podman push code.podesta.ai/public/runner-image:latest
podman push "code.podesta.ai/public/runner-image:${TAG}"

CI does the same on push to main and on a weekly cron (see .forgejo/workflows/build.yml). Secrets PUSH_USER / PUSH_TOKEN must exist on the repo (names cannot be prefixed with FORGEJO_ or GITHUB_ — Forgejo rejects those).
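The naming restriction on secrets can be checked before creating them. This is a small sketch, not Forgejo's own validation: the function name `secret_name_ok` is hypothetical, and uppercasing the name before matching is an assumption made so that lowercase spellings are rejected too.

```shell
# Return 0 if the secret name avoids the reserved prefixes, 1 otherwise.
# Assumption: the check is case-insensitive, so the name is uppercased first.
secret_name_ok() {
  name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$name" in
    FORGEJO_*|GITHUB_*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example:
# secret_name_ok PUSH_TOKEN    && echo "ok"
# secret_name_ok FORGEJO_TOKEN || echo "rejected"
```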

Problems encountered / solutions

1. catthehacker base switches to USER runner

apt-get update and rustup need root. Symptom:

E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

Fix: USER root right after FROM.

2. podman rootless: "history lists N non-empty layers, but we have M layers on disk"

Final COMMIT step fails on this error with the catthehacker base. The base has ~18 layers with empty-layer history markers that podman's overlay driver mishandles under rootless home-directory storage. --squash alone does not fix it — the final commit still hits the check.

Fix: build in CI (Kata VM has a clean overlay store) or on a workstation with a non-home storage root. If you must build locally, falling back to --storage-driver=vfs works but is slow and disk-hungry.
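For the local fallback, a sketch of the invocation might look like the following. It uses podman's global `--root` and `--storage-driver` options; the storage path in `STORE` is a hypothetical location outside the home directory, and keeping the vfs store separate from the default store is an assumption made here to avoid mixing drivers in one store.

```shell
# Hypothetical storage root outside $HOME; override by exporting STORE.
STORE="${STORE:-/var/tmp/runner-image-store}"

# Build with the vfs driver against the separate storage root.
# Set DRY_RUN=1 to print the command instead of running it.
local_build() {
  ${DRY_RUN:+echo} podman --root "$STORE" --storage-driver vfs \
    build -f Containerfile -t code.podesta.ai/public/runner-image:latest .
}

# Example:
# DRY_RUN=1 local_build   # show the command
# local_build             # run the (slow, disk-hungry) vfs build
```

Pushing from that store needs the same `--root` and `--storage-driver` options, since the image only exists in the store it was built in.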

3. Kata sandbox crashes during long builds

CI builds that take >10 min have hit SandboxChanged events from cloud-hypervisor on the Kata runtime. The Rust toolchain layer is the long pole. Workarounds:

  • Keep the RUN steps small and fast — every layer is a checkpoint.
  • If builds start failing reproducibly, build locally and push; see infra/scripts/kata-capture.sh for post-mortem capture.

4. GitHub API rate limits on action setup

setup-bun, setup-node, dtolnay/rust-toolchain all hit api.github.com/repos/*/git/refs/tags unauthenticated and get 403'd during busy hours. That's the whole reason this image exists — bake the tools in, skip those actions entirely in workflows that target runner.

5. Forgejo registry truncates large blob uploads (UNRESOLVED)

Pushing this image to code.podesta.ai/public/runner-image fails on the playwright-browser layer (~GB-sized) with one of:

  • 499 Client Closed Request (podman, CI with docker)
  • 504 Gateway Timeout (from the registry ingress)
  • unknown: Client Closed Request after per-blob retry loop

Reproduces both from CI and from a local podman push over the same public ingress, so the bottleneck is not in this repo — it's the Forgejo registry (or the ingress-nginx/Traefik/whatever proxy sits in front of it) cutting long blob uploads. Retrying doesn't help because the connection is torn down before the blob finishes.

Actual fix lives in infra/apps/forgejo/ (or the ingress chart):

  • Raise the proxy's proxy-read-timeout / proxy-send-timeout / proxy-request-timeout past the time it takes to upload the biggest single blob at realistic bandwidth. 10 to 15 min is a safe target.
  • Raise proxy-body-size / client_max_body_size if set — individual blobs can be >500MB.
  • Check Forgejo's own [packages] / [server] upload limits in app.ini (LFS_MAX_FILE_SIZE, STORAGE.MINIO timeouts if backed by MinIO, etc.).
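To size the timeouts in the list above, it helps to estimate how long the largest blob takes to upload. The sketch below does that arithmetic; the function name `upload_seconds` and the sample numbers (a 1500 MB layer over a 20 Mbit/s uplink) are illustrative assumptions, not measurements from this cluster.

```shell
# Seconds needed to upload a blob, rounded up.
# $1 = blob size in MB, $2 = sustained upload bandwidth in Mbit/s.
upload_seconds() {
  bits=$(( $1 * 8 ))                 # size in Mbit
  echo $(( (bits + $2 - 1) / $2 ))   # ceiling division
}

# Hypothetical example: 1500 MB at 20 Mbit/s.
upload_seconds 1500 20   # prints 600
```

With those sample numbers the upload alone takes 10 minutes, which is why a proxy timeout at or below that would cut the connection before the blob finishes.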

Until that's done, builds fail at push even when the image itself is correct — there's no reasonable image-side workaround (splitting RUN steps doesn't shrink the browser blobs; playwright lays each browser down as one directory tree, and tar/zstd compresses the whole thing as a single blob per changed layer).

6. The act-24.04 base doesn't ship node in PATH

Unlike :full, the slim :act-24.04 variant has no system node. Playwright's CLI is a #!/usr/bin/env node shebang script, so playwright install fails with env: 'node': No such file or directory. Fix: install the nodejs apt package in the same RUN as the rest of the apt deps.
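The failure mode can be diagnosed by reading the interpreter out of the script's shebang and checking whether it resolves. A minimal sketch follows; the function name `env_interpreter` is hypothetical, and it only handles the `#!/usr/bin/env <name>` form described above.

```shell
# Print the interpreter named by a "#!/usr/bin/env <name>" shebang line.
# Prints nothing if the first line has a different form.
env_interpreter() {
  sed -n '1s|^#!/usr/bin/env \([^ ]*\).*|\1|p' "$1"
}

# Example (the script path is resolved at run time):
# interp=$(env_interpreter "$(command -v playwright)")
# command -v "$interp" >/dev/null || echo "missing interpreter: $interp"
```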

Renovate

Pinned versions have # renovate: datasource=github-releases depName=… markers immediately before their ARG. The renovate.json in this repo has a custom manager that picks them up.
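To see which pins Renovate should be able to find, the markers can be listed together with the ARG line that follows each one. This is a sketch under the layout described above (marker on the line immediately before the ARG); the function name `renovate_pins` is hypothetical.

```shell
# Print each "# renovate:" marker followed by its ARG line.
# A marker counts only if the very next line starts with "ARG ".
renovate_pins() {
  awk '/^# renovate:/ { marker = $0; next }
       marker != "" && /^ARG / { print marker; print $0 }
       { marker = "" }' "$1"
}

# Example:
# renovate_pins Containerfile
```

An ARG that appears in the Containerfile but not in this output has no marker directly above it and would not be bumped automatically.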

Consumer

infra/apps/forgejo/runner-config.yaml exposes the image as a runner label:

- "runner:docker://code.podesta.ai/public/runner-image:latest"

Workflows opt in with runs-on: runner. Rust jobs additionally export:

env:
  RUSTC_WRAPPER: sccache
  SCCACHE_BUCKET: sccache
  SCCACHE_ENDPOINT: http://minio.tu-po.svc.cluster.local:9000
  SCCACHE_S3_USE_SSL: "false"

AWS creds reach the job via the runner StatefulSet's env_file: .env (populated from a secretKeyRef in the runner pod).
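Because a missing variable tends to show up only as a silently cold cache, a Rust job can check the environment before compiling. The sketch below lists the variables from the env block above that are unset or empty; the function name `missing_sccache_env` is hypothetical, and the credential variables are left out because their names are not stated here.

```shell
# Print the names of sccache-related variables that are unset or empty.
missing_sccache_env() {
  for var in RUSTC_WRAPPER SCCACHE_BUCKET SCCACHE_ENDPOINT SCCACHE_S3_USE_SSL; do
    eval "value=\${$var:-}"
    [ -n "$value" ] || echo "$var"
  done
}

# Example: fail the step early if anything is missing.
# m=$(missing_sccache_env); [ -z "$m" ] || { echo "unset: $m" >&2; exit 1; }
```

After the build, `sccache --show-stats` reports hit and miss counts, which is a simple way to confirm the S3 backend is actually being used.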