Kira's brain dump 🔮

Build a containerized Python app

Suddenly, back to Python.

Recently, I've been deciding what language to use for a new set of programs I need to develop for my pet project. Some time ago I experimented with Scala before putting my work on ice and traveling to Thailand for a while… Anyway, the Scala FP ecosystem 🐈 is what I want to master, but I surprised myself by deciding against it. I found myself under time constraints and needing to be productive. 🙄 So, I decided to go with Python. No functional programming, no distractions, just basic programming! 🫠 No monads… 💔

Set up a new project with PDM.

The process is detailed in the PDM documentation. Basically:

mkdir my-project && cd my-project
pdm init

The pdm init command runs interactively, asking a few questions. One question, "Do you want to build this project for distribution (such as a wheel)?", gave me pause. Since the default is no, I wondered why I wouldn't want to create a package. Ultimately, I answered yes because I need a package for deployment within a container image. This allows me to install the package from my private repository during the build process. In reality, it will be a set of packages, which makes this approach even more sensible; the alternative would be checking out multiple repositories with git and running from the source tree.

That's pretty much it… the development of the first project ensued and a couple of days later I found myself going mad again over relative/absolute module imports…

Import modules like a pro (maybe?).

Modules can be imported using either relative or absolute imports. Long story short: relative imports? 😫 Headache! Absolute imports? 😎 Good! It seems that in all these years, nothing has changed about this import struggle. I don't want the import style to dictate how my service can run, whether it's with python, python -m, or uvicorn [1]. Absolute imports seem to solve this problem once and for all… or at least until it breaks again. 🤞
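
As a minimal sketch of what this looks like in practice, the snippet below builds a throwaway package whose __main__.py uses an absolute import and then runs it the way python -m would, using the standard runpy module. The package name demo_pkg, its modules, and both helper functions are made up for illustration. It assumes the directory containing the package is on sys.path, which is effectively what installing the package gives you.

```python
import runpy
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a tiny package; demo_pkg is a hypothetical name, not from the post."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "settings.py").write_text("PORT = 8080\n")
    # Absolute import: spelled from the package root,
    # as opposed to the relative form "from .settings import PORT".
    (pkg / "__main__.py").write_text("from demo_pkg.settings import PORT\n")


def run_like_python_m(root: Path) -> dict:
    """Execute demo_pkg the way `python -m demo_pkg` would; return its globals."""
    sys.path.insert(0, str(root))
    try:
        return runpy.run_module("demo_pkg", run_name="__main__")
    finally:
        sys.path.remove(str(root))
        # Forget the throwaway package so repeated runs start clean.
        stale = [n for n in sys.modules if n == "demo_pkg" or n.startswith("demo_pkg.")]
        for name in stale:
            del sys.modules[name]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        print(run_like_python_m(Path(tmp))["PORT"])
```

The same import line keeps working no matter which module inside the package it is written in, because it never depends on where the importing file sits relative to its siblings.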

Include a bin script.

That took some rummaging around to find! So, the way to include scripts with PDM is to add a section to your pyproject.toml file, like this:

[project.scripts]
data-dump = "data_dump.__main__:main"
pyproject.toml

Where data_dump is my package name, and __main__ [2] contains the code to import the package and run uvicorn. Basically, it's the entry point for your script.

import uvicorn


def main() -> None:
    uvicorn.run(
        app="data_dump.main:app",
        host="127.0.0.1",
        port=8080,
    )


if __name__ == "__main__":
    main()
__main__.py
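
To see how the data-dump = "data_dump.__main__:main" line is read, here is a small sketch using importlib.metadata from the standard library. It only parses the entry point string (module before the colon, callable after it) and never imports data_dump, so it runs without the package installed. The console_scripts group is the entry point group that [project.scripts] entries map to.

```python
from importlib.metadata import EntryPoint

# Same name and value as in the [project.scripts] table.
entry = EntryPoint(
    name="data-dump",
    value="data_dump.__main__:main",
    group="console_scripts",
)

# Part before the colon: the module the generated script imports.
print(entry.module)  # data_dump.__main__
# Part after the colon: the callable the script then invokes.
print(entry.attr)  # main
```

When the package is installed, the installer generates a data-dump executable that does roughly this: import the module, look up the attribute, call it.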

Undocumented Gitea setup.

It might be useful to discuss my Gitea instance here. Though the setup is currently undocumented and running on bare VPS metal, this definitely has to change. When my schedule allows, the plan is to run it in a container and finally document all settings. For now, all that's known is that I have a private Gitea instance running on a VPS, reachable via my equally private OpenVPN network. Gitea supports many repository types out of the box, including PyPI, and requires no extra configuration for them. The tricky part is act runner [3] and… yeah… it works, but I didn't document the setup either.

Package and upload to a Gitea repository.

Even though my Gitea instance isn't reachable outside my OpenVPN network, I still try to maintain basic security measures, so that if I ever need to expose any part of my little infrastructure to the outside, I won't instantly compromise the entire setup. So, to upload to Gitea, a token is required.

The process is:

  1. Generate a token in Gitea under User -> Settings -> Applications.
  2. Set up PDM to use the token for package upload:

    [repository.<repo_name>]
    url = "<gitea>/api/packages/<org>/pypi"
    username = "__token__"
    password = "<token>"
    
    ~/.config/pdm/config.toml

    When a token is used, the username must be set to __token__. Also, the Gitea docs contain additional useful documentation for pip.

  3. Publish with pdm publish. There seems to be no way to set the target repository from step 2 as the default, so either the -r <repo_name> or the --repository <repo_name> flag should be used. Alternatively, the PDM_PUBLISH_REPO env var can be set.
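
The publish step can also be wrapped in a small script. The sketch below is one possible shape, not a description of the actual setup: all three helper names are made up, and it assumes pdm is on PATH and that the [repository.<repo_name>] section from step 2 already exists. It shows both ways of selecting the repository, the --repository flag and the PDM_PUBLISH_REPO variable.

```python
import os
import subprocess
from typing import List, Optional


def gitea_pypi_url(base_url: str, owner: str) -> str:
    """Build the upload URL in the shape used in config.toml in step 2."""
    return f"{base_url.rstrip('/')}/api/packages/{owner}/pypi"


def publish_command(repo_name: Optional[str] = None) -> List[str]:
    """Return the pdm publish argv, optionally with an explicit repository."""
    command = ["pdm", "publish"]
    if repo_name:
        command += ["--repository", repo_name]
    return command


def publish_via_env(repo_name: str) -> None:
    """Run pdm publish, selecting the repository through PDM_PUBLISH_REPO."""
    env = {**os.environ, "PDM_PUBLISH_REPO": repo_name}
    # check=True raises CalledProcessError if pdm exits with a non-zero status.
    subprocess.run(publish_command(), env=env, check=True)
```

For example, publish_command("gitea") yields the argument list for pdm publish --repository gitea, where "gitea" stands in for whatever <repo_name> was chosen in config.toml.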

Build a containerd image.

Dockerfile

I don't need a production-grade setup for containers yet, so just to make sure there are no surprises, I did a quick test of building and running my app inside a container.

FROM archlinux:base

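RUN <<EOF
    # Stop at the first failing command; otherwise only the last exit status counts.
    set -e
    # -Syu instead of -Sy avoids a partial upgrade of the base image.
    pacman --noconfirm -Syu python python-pip uvicorn
    pacman --noconfirm -Scc
    rm -rf /var/cache/pacman/pkg/*
EOF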

ARG TOKEN

RUN <<EOF
    # Fail the build if pip install fails, instead of masking it with the purge below.
    set -e
    pip install \
        --break-system-packages \
        --trusted-host ${gitea} \
        --extra-index-url http://__token__:${TOKEN}@${gitea}/api/packages/${org}/pypi/simple/ \
        --upgrade data-dump
    pip cache purge
EOF

CMD ["data-dump"]
Dockerfile
  • Let's break some system packages with --break-system-packages. It's a container already, no need for a virtual environment inside it.
  • pip ignores an index that isn't served over HTTPS with a valid certificate unless the host is explicitly trusted, so we force it with --trusted-host.
  • We don't want to override the default index but add our own as extra, hence --extra-index-url.

I use nerdctl to communicate with containerd. Here is a build script:

#!/bin/sh

nerdctl build \
        --add-host ${gitea}:${ip} \
        --progress tty \
        --tag ${gitea}/${org}/data-dump:v1 \
        --tag ${gitea}/${org}/data-dump:latest \
        --build-arg TOKEN=<token> \
        .
build_image.sh

This is all pretty rustic, but it's a start. Later, the whole process will run on act runner [3]. And yes, yes, don't send secrets via environment variables.

  • The --add-host parameter is there to resolve the hostname for the Gitea/PyPI repository, as the build process will not have access to my host network where dnsmasq is running. Without it, pip install will fail.
  • --tag uses the <gitea>/<org> container repository prefix, so the images can be uploaded immediately with nerdctl push without assigning additional tags later.
  • The rest is self-explanatory, except maybe for --progress: tty is the default, but if the build process fails it helps to set it to plain and get more output. For some reason, at least on my terminals (vterm, kitty), tty eats quite a lot of output; only single lines are shown for RUN output.

Running a container

To make sure the image is correct, it can be quickly tested with:

#!/bin/sh

nerdctl run --network=host --rm ${gitea}/${org}/data-dump:latest

--network=host gives host network access to the container, so the port it binds will be accessible on the host machine without any extra work.
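
A quick way to confirm the port really is reachable from the host is to try a TCP connection. This is a minimal sketch, assuming the app listens on 127.0.0.1:8080 as in the uvicorn.run() call in __main__.py; port_is_open is a made-up helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8080 matches the host and port passed to uvicorn.run().
    print(port_is_open("127.0.0.1", 8080))
```

This only checks that something accepts connections on the port; it says nothing about whether the app behind it answers HTTP requests correctly.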

Pushing to a container registry.

  • First of all, log in with nerdctl login --insecure-registry <gitea>. The same principle applies to the credentials here: the username should be set to __token__ and the password is the actual token.
  • And we are ready to go! nerdctl push --insecure-registry <gitea>/<org>/data-dump

Footnotes:

[1] Uvicorn is an ASGI web server implementation for Python.

[2] Details on the __main__.py file.

[3] Gitea Actions is a built-in CI/CD solution.