Build a containerized Python app
Suddenly, back to Python.
Recently, I've been deciding what language to use for a new set of programs I need to develop for my pet project. Some time ago I experimented with Scala before putting my work on ice and traveling to Thailand for some time… Anyway, the Scala FP ecosystem 🐈 is what I want to master, but I surprised myself by deciding against it. I found myself under time constraints and needing to be productive. 🙄 So, I decided to go with Python. No functional programming, no distractions, just basic programming! 🫠 No monads… 💔
Set up a new project with PDM.
The process is detailed in the PDM documentation. Basically:
mkdir my-project && cd my-project
pdm init
The pdm init command runs interactively, asking a few questions. The question "Do you want to build this project for distribution (such as a wheel)?" gave me pause. Since the default is no, I wondered why I wouldn't want to create a package. Ultimately, I answered yes because I need a package for deployment within a container image. This allows me to install the package from my private repository during the build process. In reality, it will be a set of packages, which makes this approach even more sensible; the alternative would be checking out multiple repositories with git and running from the source tree.
That's pretty much it… the development of the first project ensued and a couple of days later I found myself going mad again over relative/absolute module imports…
Import modules like a pro (maybe?).
Modules can be imported using either relative or absolute imports. Long story short: relative imports? 😫 Headache! Absolute imports? 😎 Good! It seems that in all these years, nothing has changed about this import struggle. I don't want the import style to dictate how my service can run, whether it's with python, python -m, or uvicorn. Absolute imports seem to solve this problem once and for all… or at least until it breaks again. 🤞
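To make the difference concrete, here is a minimal sketch that builds a throwaway package and runs the same module in several ways. The package name demo_pkg and its files are made up for illustration, and PYTHONPATH stands in for the package being installed into the environment, which is an assumption of this sketch.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_import_styles() -> dict:
    """Run a throwaway package in several ways; True means the run succeeded."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helpers.py").write_text("VALUE = 42\n")
        (pkg / "uses_relative.py").write_text(
            "from .helpers import VALUE\nprint(VALUE)\n"
        )
        (pkg / "uses_absolute.py").write_text(
            "from demo_pkg.helpers import VALUE\nprint(VALUE)\n"
        )

        # PYTHONPATH mimics "the package is installed" (an assumption here).
        env = {**os.environ, "PYTHONPATH": tmp}
        runs = {
            "script, relative": [str(pkg / "uses_relative.py")],
            "script, absolute": [str(pkg / "uses_absolute.py")],
            "-m, relative": ["-m", "demo_pkg.uses_relative"],
            "-m, absolute": ["-m", "demo_pkg.uses_absolute"],
        }
        results = {}
        for label, args in runs.items():
            proc = subprocess.run(
                [sys.executable, *args], env=env, capture_output=True, text=True
            )
            results[label] = proc.returncode == 0
        return results


if __name__ == "__main__":
    for label, ok in run_import_styles().items():
        print(f"{label}: {'ok' if ok else 'failed'}")
```

Only the "script, relative" run fails, because a file executed directly as a script has no parent package for the relative import to resolve against. The absolute import works in all four cases as long as the package is importable.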
Include a bin script.
That took some rummaging around to find! So, the way to include scripts with PDM is to add a section to your pyproject.toml file, like this:
[project.scripts]
data-dump = "data_dump.__main__:main"
Where data_dump is my package name, and __main__ is the module that contains the code to import the package and run uvicorn. Basically, it's the entry point for your script.
import uvicorn


def main() -> None:
    # Entry point for the data-dump script: serve the app with uvicorn.
    uvicorn.run(
        app="data_dump.main:app",
        host="127.0.0.1",
        port=8080,
    )


if __name__ == "__main__":
    main()
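The "module:function" string in [project.scripts] is the standard entry-point syntax. The generated data-dump launcher does roughly what the sketch below does: import the module, look up the attribute, and call it. The helper name resolve_entry_point is made up for illustration; the real launcher is generated by the installer and looks different.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attr' entry-point string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Walk dotted attribute paths such as "SomeClass.method".
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


if __name__ == "__main__":
    # A stdlib target keeps the example runnable anywhere; with the package
    # installed, "data_dump.__main__:main" would resolve the same way.
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"ok": True}))
```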
Undocumented Gitea setup.
It might be useful to discuss my Gitea instance here. The setup is currently undocumented and runs on bare VPS metal, and this definitely has to change. When my schedule allows, the plan is to run it in a container and finally document all settings. For now, all that's known is that I have a private Gitea instance running on a VPS, reachable via my equally private OpenVPN network. Gitea supports many package registry types out of the box, including PyPI, and requires no extra configuration for them. The tricky part is act runner [3] and… yeah… it works, but I didn't document the setup either.
Package and upload to a Gitea repository.
Even though my Gitea instance isn't reachable outside my OpenVPN network, I still try to maintain basic security measures, so that if I ever need to expose any part of my little infrastructure to the outside, I won't instantly compromise the entire setup. So, to upload to Gitea, a token is required.
The process is:
1. Generate a token in Gitea under User -> Settings -> Applications.
2. Set up PDM to use the token for package upload in ~/.config/pdm/config.toml:

   [repository.<repo_name>]
   url = "<gitea>/api/packages/<org>/pypi"
   username = "__token__"
   password = "<token>"

   When a token is used, the username must be set to __token__. Also, the Gitea docs contain additional useful documentation for pip.
3. Publish with pdm publish. There seems to be no way to set the target repository from step 2 as the default, so either the -r <repo_name> or the --repository <repo_name> flag should be used. Alternatively, the PDM_PUBLISH_REPO env var can be set.
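Since the repository apparently cannot be made the default, one option is a tiny wrapper that sets PDM_PUBLISH_REPO before calling pdm publish. This is only a sketch: the helper names are made up, and the repository name passed in must match whatever <repo_name> is used in config.toml.

```python
import os
import subprocess
from typing import Mapping, Optional


def publish_env(repo_name: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with PDM_PUBLISH_REPO set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PDM_PUBLISH_REPO"] = repo_name
    return env


def publish(repo_name: str) -> int:
    """Run `pdm publish` for the given repository and return its exit code."""
    # Assumes pdm is on PATH and the current directory is the project root.
    result = subprocess.run(["pdm", "publish"], env=publish_env(repo_name))
    return result.returncode
```

Calling publish("gitea") from the project directory should then behave like pdm publish -r gitea, assuming a [repository.gitea] section exists in the PDM config.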
Build a containerd image.
Dockerfile
I don't need a production-grade setup for containers yet, so just to make sure there are no surprises, I did a quick test by building and running my app inside a container.
FROM archlinux:base
RUN <<EOF
pacman --noconfirm -Sy python python-pip uvicorn
pacman --noconfirm -Scc
rm -rf /var/cache/pacman/pkg/*
EOF
ARG TOKEN
RUN <<EOF
pip install \
  --break-system-packages \
  --trusted-host ${gitea} \
  --extra-index-url http://__token__:${TOKEN}@${gitea}/api/packages/${org}/pypi/simple/ \
  --upgrade data-dump
pip cache purge
EOF
CMD ["data-dump"]
- Let's break some system packages with --break-system-packages. It's a container already, no need for a virtual environment inside it.
- pip refuses to install from an index served over plain HTTP unless the host is marked as trusted, so we force it with --trusted-host.
- We don't want to override the default index but add our own as extra, hence --extra-index-url.
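The index URL from the Dockerfile can also be assembled programmatically, which makes its parts easier to see and percent-encodes a token that contains special characters. A minimal sketch; the function name is made up, and gitea.example and my-org are placeholders.

```python
from urllib.parse import quote


def gitea_index_url(host: str, org: str, token: str, scheme: str = "http") -> str:
    """Build the pip --extra-index-url for a Gitea PyPI registry with token auth."""
    user = "__token__"  # fixed username used together with a token
    return (
        f"{scheme}://{user}:{quote(token, safe='')}@{host}"
        f"/api/packages/{quote(org, safe='')}/pypi/simple/"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(gitea_index_url("gitea.example", "my-org", "s3cr3t/token"))
```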
I use nerdctl to communicate with containerd. Here is a build script:
#!/bin/sh
nerdctl build \
  --add-host ${}:${} \
  --progress tty \
  --tag ${}/${}/data-dump:v1 \
  --tag ${}/${}/data-dump:latest \
  --build-arg TOKEN=<token> \
  .
This is all pretty rustic, but it's a start. Later, the whole process will run on act runner [3]. And yes, yes, I know: don't pass secrets as build arguments.
- The --add-host parameter is there to resolve the hostname for the Gitea/PyPI repository, as the build process will not have access to my host network where dnsmasq is running. Without it, pip install will fail.
- --tag uses the <gitea>/<org> container repository, so the images can be pushed right away with nerdctl push, without assigning additional tags later.
- The rest is self-explanatory, except maybe for --progress. tty is the default, but if the build process fails it helps to set it to plain and get more output. For some reason, at least on my terminals (vterm, kitty), tty eats quite a lot of output; only single lines are shown for RUN output.
Running a container
To make sure the image is correct, it can be quickly tested with:
#!/bin/sh
nerdctl run --network=host --rm ${}/${}/data-dump:latest
--network=host gives host network access to the container, so the port it binds will be accessible on the host machine without any extra work.
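A quick way to confirm that the container is actually serving is to check that something accepts connections on the address from the entry point (127.0.0.1:8080). A minimal sketch using only the standard library; the helper name port_is_open is made up.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host and port match the uvicorn.run() call in the entry point.
    print("up" if port_is_open("127.0.0.1", 8080) else "down")
```

This only shows that the port accepts TCP connections; it does not check that the app answers HTTP requests correctly.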
Pushing to a container registry.
- First of all, log in with nerdctl login --insecure-registry <gitea>. Username/password: the same principle applies here; the username should be set to __token__ and the password is the actual token.
- And we are ready to go!
nerdctl push --insecure-registry <gitea>/<org>/data-dump
Footnotes:
- [3] Gitea Actions is a built-in CI/CD solution.