Background
Containerized applications often spend much of their build time downloading packages (sometimes hitting network issues, so a single version takes several build attempts to succeed), yet most dependency packages rarely change. That suggested an optimization for image builds:
Use a minimal container to pre-download the required packages locally, then build them into local repositories. In the containers that need building, point the software sources at these local repositories, cutting container build time by orders of magnitude.
Another reason for writing this post: although everything used here is existing tooling, beyond the tools' own manuals and --help output, helpful documentation is genuinely scattered. And in my experience, these tools have small "pitfalls" that the documentation never mentions and that rarely turn up even on sites like Stack Overflow; working around those pitfalls is what takes the most time.
TL;DR
I plan to open source part of this on GitHub later; the link will go here (TODO)
Currently tested compatible distributions:
- centos 6 / 7 / 8
- fedora 31 / 32 / 33
- amazonlinux 1 / 2
- ubuntu trusty (14.04) / xenial (16.04) / bionic (18.04) / focal (20.04)
- debian jessie (8) / stretch (9) / buster (10)
- opensuse leap 15
Process
- Based on a minimal container for the distribution, add some software sources that will be needed.
- For different distributions, use the corresponding package management tool to download all packages in the required package list plus their dependencies
- Place these packages in corresponding directories by distribution, use repository creation commands in the container to build local repositories
- Use a simple static web server, listening on a local port. This sets up a local http software repository
- Add the local software source to the Dockerfile of containers that need frequent rebuilds. Note that local software repositories generally have neither signature verification nor https, so you need to mark them trusted manually.
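The static-server step above needs nothing special: any file server works. A minimal sketch using python3's built-in server (port 4891 matches the openresty port used later in this post; the directory here is just an assumption for demonstration):

```shell
# Serve the repo root over HTTP and check it responds (sketch only; the
# post itself uses openresty on port 4891, not python3)
mkdir -p /tmp/repos/base
( cd /tmp/repos && python3 -m http.server 4891 >/dev/null 2>&1 & )
sleep 1
curl -s http://127.0.0.1:4891/ | grep -o 'base/' | head -n 1
# → base/
pkill -f "http.server 4891"
```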
Here I’ll only explain the more tedious steps
0x02. Package Download
yum / dnf
$ cd /path/to/dir \
&& yumdownloader --resolve pkg-1 pkg-2 ...
- `yumdownloader` is preferred here. I previously tried `dnf install --downloadonly` and hit quite a few obscure pitfalls, one being that already-downloaded packages occasionally get deleted afterwards; it feels like `dnf` / `yum` has some storage cleanup strategy of its own.
- The `--resolve` option tells `yumdownloader` to also download the dependencies of the specified packages.
- `--installroot`: not recommended for specifying the download path. With this option, macros (variables) in the repo config files are no longer resolved automatically; for example, the common `$releasever` variable would then need to be set manually.
- `yumdownloader` downloads packages straight into the working directory, so just `cd` into the target directory first.
apt-get
$ cd /path/to/dir \
&& apt-get download \
$(apt-cache depends --recurse --no-recommends --no-suggests \
--no-conflicts --no-breaks --no-replaces --no-enhances \
pkg-1 pkg-2 ... | grep "^\w")
- If you use `apt-get install --download-only --reinstall` to download packages, dependencies that are already installed in the current container won't be downloaded again.
For example, if the downloader container already has the ca-certificates and openssl packages installed, `apt-get install --download-only --reinstall ca-certificates` downloads ca-certificates (because of --reinstall) but skips openssl, even though openssl is a dependency of ca-certificates. The `apt-get download` form, by contrast, fetches every package that `apt-cache depends --recurse` lists, installed or not:
$ apt-get download \
$(apt-cache depends --recurse --no-recommends --no-suggests \
--no-conflicts --no-breaks --no-replaces --no-enhances \
ca-certificates | grep "^\w")
- Here we use `apt-get download` rather than `apt-get install --download-only` mainly because the dependencies reported by the `apt-cache depends` subcommand include both preferred and alternative choices, and the two often conflict with each other. Even with `--download-only`, `apt-get install` would fail to download anything because it cannot resolve those conflicts.
Below is an example of apt-cache depends output, where pinentry-curses is the preferred choice over the virtual package <pinentry:i386>.
Detailed explanation can be found at https://www.thecodeship.com/gnu-linux/understanding-apt-cache-depends-output/
$ apt-cache depends --recurse --no-recommends \
--no-suggests --no-conflicts --no-breaks \
--no-replaces --no-enhances --no-pre-depends \
gnupg2 | grep -E '^gnupg-agent:i386' -A10
gnupg-agent:i386
|Depends: pinentry-curses:i386
Depends: <pinentry:i386>
mew-beta-bin:i386
mew-bin:i386
pinentry-curses:i386
pinentry-gnome3:i386
pinentry-gtk2:i386
pinentry-qt:i386
pinentry-tty:i386
Depends: libassuan0:i386
- `apt-get download` also downloads packages straight into the current directory, so just use the `cd` command to switch the working directory first.
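Incidentally, the `grep "^\w"` in the download commands works because only real (top-level) package names start in column 0; `Depends:` lines, virtual packages in angle brackets, and alternative providers are all indented. A quick demonstration on a fragment like the output above:

```shell
# Keep only lines beginning with a word character, i.e. the package names
printf '%s\n' \
  'gnupg-agent:i386' \
  ' |Depends: pinentry-curses:i386' \
  '  Depends: <pinentry:i386>' \
  '    pinentry-curses:i386' \
| grep "^\w"
# → gnupg-agent:i386
```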
zypper
$ zypper --no-gpg-checks --non-interactive \
--pkg-cache-dir /path/to/dir \
install -y -f --download-only \
pkg-1 pkg-2 ...
- `--non-interactive` is mainly for scripting, preventing `zypper` from waiting on user input until it times out.
- `--pkg-cache-dir` specifies the download directory.
- `-f` forces downloading of already-installed packages. Even so, this hits the same problem as `apt-get install --download-only`: dependencies that are already installed won't be downloaded. For now I write it this way and add the missing base packages manually.
- With `zypper`, distinguish global arguments from subcommand arguments. In this command specifically, everything before `install` is a global argument and everything after it is a subcommand argument.
0x03. Directory Structure
yum
The yum repository directory structure is as follows:
base/
├── amazonlinux-1
│   └── x86_64
│       ├── audit-libs-2.6.5-3.28.amzn2.i686.rpm
│       ├── ...
│       └── repodata
...
Note: the yum repository structure is relatively simple. Under each distribution subdirectory and its CPU architecture directory, store the downloaded rpm packages, then create the local repository index in that same directory.
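As a sketch, the layout above can be pre-created before downloading (the distribution names here are just the ones shown in the tree):

```shell
# Pre-create per-distro / per-arch directories for the downloaded rpms
mkdir -p base/amazonlinux-1/x86_64 base/centos-7/x86_64
```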
Command to create yum repository index:
cd /path/to/dir \
&& createrepo --update ./
There’s also a C version createrepo_c which is faster with the same usage. Recommended for newer distributions, like centos 8 / fedora 31+ / amazonlinux
cd /path/to/dir \
&& createrepo_c --update ./
Newer distributions have some packages built with modularity. If you want to build local repositories for these packages, extra commands are needed:
Documentation at: https://docs.fedoraproject.org/en-US/modularity/hosting-modules/
cd /path/to/dir \
&& createrepo_c --update ./ \
&& repo2module -s stable -n REPO_NAME -d ./ ./repodata/modules \
&& modifyrepo_c --mdtype=modules ./repodata/modules.yaml ./repodata
Where REPO_NAME is the local repository name
A noteworthy command here is repo2module (from https://github.com/rpm-software-management/modulemd-tools), because the above documentation doesn’t mention how to generate the modules.yaml file.
fedora or centos 8 (needs additional epel repository) can install the repo2module command via dnf install -y python3-gobject modulemd-tools
apt
The apt repository directory structure is as follows:
ubuntu/
├── dists
│   ├── bionic
│   │   └── base
│   │       └── main
│   │           └── binary-amd64
│   ...
└── pool
    ├── bionic
    │   └── base
    │       └── main
    │           └── binary-amd64
    ...
Note: an apt repository has two subdirectories, dists/ and pool/: the dists/ subdirectory stores indexes, and the pool/ subdirectory stores packages.
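As with the yum layout, the tree above can be pre-created in one step:

```shell
# Pre-create the dists/ (indexes) and pool/ (packages) trees shown above
mkdir -p ubuntu/dists/bionic/base/main/binary-amd64
mkdir -p ubuntu/pool/bionic/base/main/binary-amd64
```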
Command to create apt repository index:
Here the local repository won't use a gpg-signed Release; the full command can be found at: https://medium.com/sqooba/create-your-own-custom-and-authenticated-apt-repository-1e4a4cf0b864#35dd
cd /path/to/dir
apt-ftparchive --arch amd64 packages \
    pool/bionic/base/main/binary-amd64 \
    > dists/bionic/base/main/binary-amd64/Packages
gzip -k -c \
    -f dists/bionic/base/main/binary-amd64/Packages \
    > dists/bionic/base/main/binary-amd64/Packages.gz
apt-ftparchive release dists/bionic/base > dists/bionic/base/Release
Where base is a custom repository subdirectory, convenient for future expansion.
The apt-ftparchive command can be installed via apt-get install -y dpkg-dev.
0x05. Adding Local Repository
The host.docker.internal below is a hostname added via docker build’s --add-host; 4891 is the port openresty listens on locally.
yum
printf "[local-base]\n\
name=Local Base Repo\n\
baseurl=http://host.docker.internal:4891/base/centos-7/x86_64/\n\
skip_if_unavailable=True\n\
gpgcheck=0\n\
repo_gpgcheck=0\n\
enabled=1\n\
enabled_metadata=1" > /etc/yum.repos.d/local-base.repo
zypper
printf "[local-base]\n\
name=Local Base Repo\n\
baseurl=http://host.docker.internal:4891/base/sles-12/x86_64/\n\
skip_if_unavailable=True\n\
gpgcheck=0\n\
repo_gpgcheck=0\n\
enabled=1\n\
enabled_metadata=1" > /root/local-base.repo \
&& zypper -n ar --check --refresh -G file:///root/local-base.repo \
&& zypper -n mr --gpgcheck-allow-unsigned-repo local-base \
&& zypper -n mr --gpgcheck-allow-unsigned-package local-base \
&& rm -f /root/local-base.repo
apt
echo "deb [trusted=yes] http://host.docker.internal:4891/ubuntu bionic/base main" > /etc/apt/sources.list
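Putting the apt side together, a frequently rebuilt image's Dockerfile might start like this (the base image and package list are placeholder assumptions). Remember to make the hostname resolvable at build time, e.g. `docker build --add-host host.docker.internal:host-gateway ...` on Docker 20.10+, or pass the host's actual IP on older versions:

```dockerfile
# Sketch: consume the local apt repository (image and package names assumed)
FROM ubuntu:bionic
RUN echo "deb [trusted=yes] http://host.docker.internal:4891/ubuntu bionic/base main" \
      > /etc/apt/sources.list \
 && apt-get update \
 && apt-get install -y --no-install-recommends curl
```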