Ansible's built-in 'package' module doesn't show any details when
installing the RPMs. All that can be seen is:
TASK [Install RPM packages]
fedora-rawhide | changed
Therefore, there's no way to know what version of the packages got
installed.
In this case, not knowing the go-md2man(1) version being used by the CI
makes it difficult to know why the tests are failing on Fedora Rawhide
and Fedora 39 with:
not ok 3 help: Command 'help' in 177ms
# (from function `assert_line' in file
test/system/libs/bats-assert/src/assert.bash, line 479,
# in test file test/system/002-help.bats, line 48)
# `assert_line --index 0 --partial "toolbox(1)"' failed
# /usr/bin/man
#
# -- line does not contain substring --
# index : 0
# substring : toolbox(1)
# line : troff:<standard input>:33: warning: cannot select font
'C'
# --
#
It could be either because the CI is still using an older version of
go-md2man(1) [1,2], or that there's some other problem.
[1] Fedora golang-github-cpuguy83-md2man commit 117806d50e401c19
https://src.fedoraproject.org/rpms/golang-github-cpuguy83-md2man/c/117806d50e401c19https://src.fedoraproject.org/rpms/golang-github-cpuguy83-md2man/pull-request/3
[2] go-md2man commit d85280db9b54b574
https://github.com/cpuguy83/go-md2man/commit/d85280db9b54b574https://github.com/cpuguy83/go-md2man/issues/99https://github.com/containers/toolbox/pull/1386
The Zuul executor contains Ansible 2.13.7 whose 'dnf' module is not
working as it should with Fedora Rawhide because of the DNF5 Change [1].
Unlike DNF4, DNF5 no longer pulls in the python3-dnf RPM, which causes:
TASK [Install RPM packages]
fedora-rawhide | ERROR
fedora-rawhide | {
fedora-rawhide | "msg": "Could not import the dnf python module
using /usr/bin/python3 (3.12.0b3 (main, Jun 21 2023, 00:00:00)
[GCC 13.1.1 20230614 (Red Hat 13.1.1-4)]). Please install
`python3-dnf` or `python2-dnf` package or ensure you have
specified the correct ansible_python_interpreter. (attempted
['/usr/libexec/platform-python', '/usr/bin/python3',
'/usr/bin/python2', '/usr/bin/python'])",
fedora-rawhide | "results": []
fedora-rawhide | }
This adds a workaround that explicitly installs the python3-dnf RPM
using Ansible's 'command' module. It should be removed after Zuul
contains a newer release of Ansible.
[1] https://fedoraproject.org/wiki/Changes/ReplaceDnfWithDnf5https://github.com/containers/toolbox/pull/1338
Signed-off-by: Daniel Pawlik <dpawlik@redhat.com>
Ansible's 'shell' module is almost exactly like the 'command' module,
except that it runs the command through a command line shell so that
environment variables like HOSTNAME and operations like '*', '<' and '>'
work. None of those things are necessary are here. Hence, it's better
to use the 'command' module as elsewhere.
Note that, unlike Ansible's 'shell' module, the 'command' module doesn't
support inline scripts. So, each command needs to be in its own
separate task.
https://github.com/containers/toolbox/pull/1318
This uses 'skopeo inspect' to get the size of the image on the registry,
which is usually less than the size of the image in a local
containers/storage image store after download (eg., 'podman images'),
because they are kept compressed on the registry. Skopeo >= 1.10.0 is
needed to retrieve the sizes [1].
However, this doesn't add a hard dependency on Skopeo to accommodate
size-constrained operating systems like Fedora CoreOS. If skopeo(1) is
missing or too old, then the size of the image won't be shown, but
everything else would continue to work as before.
Some changes by Debarshi Ray.
[1] Skopeo commit d9dfc44888ff71a6
https://github.com/containers/skopeo/commit/d9dfc44888ff71a6https://github.com/containers/skopeo/issues/641https://github.com/containers/toolbox/issues/752
Signed-off-by: Nieves Montero <nmontero@redhat.com>
This is meant to roughly replicate the build environments used by
downstream distributors to build toolbox(1). These can be restricted in
odd ways compared to a fully featured environment where toolbox(1) is
actually going to be used. eg., the inability to use podman(1) in the
case of Fedora or not having subordinate user and group ID ranges in the
case of openSUSE.
It's important to ensure that toolbox(1) can be built by downstream
distributors without any unnecessary hassle.
https://github.com/containers/podman/issues/17657https://github.com/containers/toolbox/issues/1246
On enterprise FreeIPA set-ups, the subordinate user and group IDs are
provided by SSSD's sss plugin for the GNU Name Service Switch (or NSS)
functionality of the GNU C Library. They are not listed in /etc/subuid
and /etc/subgid. Therefore, its necessary to use libsubid.so to check
the subordinate ID ranges.
The CGO interaction with libsubid.so is loosely based on 'readSubid' in
github.com/containers/storage/pkg/idtools [1].
However, unlike 'readSubid', this code considers the absence of any
range (ie., nRanges == 0) to be an error as well.
More importantly, this code uses dlopen(3) and friends to dynamically
load the symbols from libsubid.so, instead of linking to libsubid.so at
build-time and having the dependency noted in the /usr/bin/toolbox
binary. This is done because libsubid.so itself depends on several
other shared libraries, and indirect dependencies can't be influenced
by the RUNPATH [2] embedded in the /usr/bin/toolbox binary [3]. Hence,
when the binary is used inside Toolbx containers (eg., as the entry
point), those indirect dependencies won't be picked from the host's
runtime against which the binary was built. This can render the binary
useless due to ABI compatibility issues. Using dlopen(3) avoids this
problem, especially because libsubid.so is only used when running on the
host.
Care was taken to not load and link libsubid.so twice to separately
validate the subordinate ID ranges for the user and the group. Note
that libsubid_init() must be passed a FILE pointer for logging.
Otherwise, it will create it's own for logging, and there's no way to
close it during dlclose(3).
Version 4 of the libsubid.so API/ABI [4] was released in Shadow 4.10,
which is newer than the versions shipped on RHEL 8 and Debian 10 [5],
and even that newer version had some problems [6]. Therefore, support
for older versions, with the relevant workarounds, is necessary.
Fortunately, the oldest that needs to be support is Shadow 4.9 because
that's when libsubid.so was introduced [7].
Note that SUBID_ABI_VERSION was only introduced with version 4 of the
libsubid.so API/ABI released in Shadow 4.10 [8]. The first release of
libsubid.so in Shadow 4.9 already had an ABI version of 3.0.0 [9], since
it was bumped a few times during development, so that's what's assumed
when SUBID_ABI_VERSION is absent.
This code doesn't set the public variables Prog and shadow_logfd that
older Shadow versions used to expect for logging, because from Shadow
4.9 onwards there's a separate function [4,10] to specify these. This
can be changed if there are libsubid.so versions in the wild that really
do need those public variables to be set.
Finally, ISO C99 is required because of the use of <stdbool.h> in the
libsubid.so API.
Some changes by Debarshi Ray.
[1] https://github.com/containers/storage/blob/main/pkg/idtools/idtools_supported.go
[2] https://man7.org/linux/man-pages/man8/ld.so.8.html
[3] Commit 6063eb27b9https://github.com/containers/toolbox/issues/821
[4] Shadow commit 32f641b207f6ddff
https://github.com/shadow-maint/shadow/commit/32f641b207f6ddffhttps://github.com/shadow-maint/shadow/issues/443
[5] https://packages.debian.org/source/buster/shadow
[6] Shadow commit 79157cbad87f42cd
https://github.com/shadow-maint/shadow/commit/79157cbad87f42cdhttps://github.com/shadow-maint/shadow/issues/465
[7] Shadow commit 0a7888b1fad613a0
https://github.com/shadow-maint/shadow/commit/0a7888b1fad613a0https://github.com/shadow-maint/shadow/issues/154
[8] Shadow commit 0c9f64140852e8d5
https://github.com/shadow-maint/shadow/commit/0c9f64140852e8d5https://github.com/shadow-maint/shadow/pull/449
[9] Shadow commit 3d670ba7ed58f910
https://github.com/shadow-maint/shadow/commit/3d670ba7ed58f910https://github.com/shadow-maint/shadow/issues/339
[10] Shadow commit 2b22a6909dba60d
https://github.com/shadow-maint/shadow/commit/2b22a6909dba60dhttps://github.com/shadow-maint/shadow/issues/325https://github.com/containers/toolbox/issues/1074
Signed-off-by: Martin Jackson <martjack@redhat.com>
Building Toolbx requires a C compiler [1], which defaults to GCC on
Fedora and CentOS Stream. It's good to explicitly require it, so that
it doesn't go missing from the build.
Showing the version of the C compiler is a big help when debugging weird
build problems involving the toolchain. A following commit will use CGO
to link to libsubid.so, which will only increase the relevance of the C
compiler.
[1] Commit c8aaed52c5https://github.com/containers/toolbox/pull/923https://github.com/containers/toolbox/pull/1218
This will be used by the subsequent commit to have a separate set of
dependencies for CentOS Stream 9 builds. eg., unlike Fedora, CentOS
Stream 9 doesn't have the ShellCheck, bats and fish RPMs.
https://github.com/containers/toolbox/pull/1171
Currently, the standard error and output streams of the child commands
invoked by 'meson test' are redirected to a separate log file. When the
tests fail, it's difficult, or maybe even impossible, to access this
file from the Zuul CI, and all that can be seen is something like:
1/7 shellcheck src/go-build-wrapper OK 0.04s
2/7 shellcheck profile.d/toolbox.sh FAIL 0.06s exit status 1
>>> MALLOC_PERTURB_=241 /usr/bin/shellcheck
--shell=sh
/home/zuul-worker/src/github.com/containers/toolbox/builddir/../profile.d/toolbox.sh
3/7 go fmt FAIL 0.05s exit status 1
>>> MALLOC_PERTURB_=209 /usr/bin/python3
/home/zuul-worker/src/github.com/containers/toolbox/src/meson_go_fmt.py
/home/zuul-worker/src/github.com/containers/toolbox/src
4/7 codespell FAIL 0.31s exit status 65
>>> MALLOC_PERTURB_=180 /usr/bin/codespell
--check-filenames
--check-hidden
--context 3
--exclude-file /home/zuul-worker/src/github.com/containers/toolbox/.codespellexcludefile
--skip /home/zuul-worker/src/github.com/containers/toolbox/builddir
--skip /home/zuul-worker/src/github.com/containers/toolbox/.git
--skip /home/zuul-worker/src/github.com/containers/toolbox/test/system/libs/bats-assert
--skip /home/zuul-worker/src/github.com/containers/toolbox/test/system/libs/bats-support
/home/zuul-worker/src/github.com/containers/toolbox
5/7 shellcheck toolbox (deprecated) FAIL 1.09s exit status 1
>>> MALLOC_PERTURB_=233 /usr/bin/shellcheck
/home/zuul-worker/src/github.com/containers/toolbox/builddir/../toolbox
6/7 go test OK 1.89s
7/7 go vet OK 17.60s
This doesn't have enough information to understand what caused the tests
to fail on non-interactive CI environments.
Not redirecting the standard error and output streams of the child
commands invoked by 'meson test' will readily reveal more details about
the test failures and remove the need to find the log file created by
Meson.
https://github.com/containers/toolbox/pull/1171
Different versions of ShellCheck and codespell may treat the same code
base differently. eg., these tools are currently being used on Fedora
36 as part of the 'unit tests', but CentOS Stream 9 has newer versions
that are stricter and catch several new problems.
Knowing the versions of the tools used in the tests helps to understand
these differences, and is a step towards testing on CentOS Stream 9.
https://github.com/containers/toolbox/pull/1199
Currently, 'meson compile' and 'meson install' were being invoked from
pre-run playbooks. This meant that a genuine build failure from either
of those commands would be shown as a RETRY_LIMIT failure by the CI.
This was misleading. It made it look as if the failure was caused by
some transient networking problem or that the CI node was too slow due
to momentary heavy load, whereas the failure was actually due to a
problem in the Toolbx sources. A genuine problem in the sources should
be reflected as a FAILURE, not RETRY_LIMIT.
However, it's worth noting that 'meson compile' invokes 'go build',
which downloads all the Go modules required by the Toolbx sources. This
is worth retaining in the pre-run playbooks since it primarily depends
on Internet infrastructure beyond the Toolbx sources.
As a nice side-effect, the CI no longer gets mysteriously stuck like
this while the Go modules are being downloaded:
TASK [Build Toolbox]
ci-node-36 | ninja: Entering directory
`/home/zuul-worker/src/github.com/containers/toolbox/builddir'
...
ci-node-36 | [8/13] Generating doc/toolbox-rmi.1 with a custom command
ci-node-36 | [9/13] Generating doc/toolbox-run.1 with a custom command
ci-node-36 | [10/13] Generating doc/toolbox.conf.5 with a custom
command
ci-node-36 | [11/13] Generating src/toolbox with a custom command
https://github.com/containers/toolbox/pull/1158
... at https://containertoolbx.org/install/
There are some minor benefits to always invoking meson(1), as opposed to
directly invoking the underlying build backend, like 'ninja'.
It's one less command to be aware of. Secondly, in theory, Meson can be
used with backends other than Ninja (see 'meson configure'), even though
Ninja is the most likely option for building Toolbx because it's only
supported on Linux.
https://github.com/containers/toolbox/pull/1142
The -Dmigration_path_for_coreos_toolbox option enables a different code
path that's currently not tested by the CI at all. In fact, since it's
a build-time option, the corresponding code path is not even built by
the CI.
To properly support the -Dmigration_path_for_coreos_toolbox option, it
needs to be covered by the CI. This is a step in that direction by
running the unit tests on it.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the build and installation steps for builds with
and without the -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the installation of RPM packages, Git submodule
handling, and the listing of various debug and version information for
builds with and without -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
It's only necessary to call 'systemd-tmpfiles --create' when building
and installing from source on the host operating system.
It's not needed when using a pre-built binary downstream package,
because:
* When 'meson install' is called as part of building the package,
that's not when the temporary files need to be created. They need
to be created when the binary package is later downloaded and
installed by the user.
* Downstream tools can sometimes handle it automatically. eg., on
Fedora, the systemd RPM installs a trigger that tells RPM to call
'systemd-tmpfiles --create' automatically when a tmpfiles.d snippet
is installed.
It's also not needed when installing inside a toolbox container because
the files that 'systemd-tmpfiles --create' is supposed to create are
meant to be on the host.
Downstream distributors set the DESTDIR environment variable when
building their packages. Therefore, it's used to detect when a
downstream package is being built.
Unfortunately, environment variables are messy and, generally, Meson
doesn't support accessing them inside its scripts [1]. Therefore, this
adds a spurious build-time dependency on systemd for downstream
distributors. However, that's probably not a big problem because all
supported downstream operating systems are already expected to use
systemd for the tmpfiles.d(5) snippets to work.
[1] https://github.com/mesonbuild/meson/issues/9https://github.com/containers/toolbox/issues/955
Some downstream distributors like RHEL don't have patchelf(1). Relying
on patchelf(1) during the build will make it difficult for such
downstreams to distribute Toolbox.
Fortunately, the path of the dynamic linker (ie., PT_INTERP) is
hardcoded in the ABI specification of each architecture [1]. This means
that Toolbox's build system can keep it's own architecture to dynamic
linker mapping, and specify it during the build through the GNU ld
linker's --dynamic-linker flag, as opposed to using a tool like
patchelf(1) to change the path of the dynamic linker in the built
binary to the one inside /run/host. Currently, the list of
architectures covers the ones that Fedora builds for.
[1] https://sourceware.org/glibc/wiki/ABIListhttps://github.com/containers/toolbox/pull/942
PR #897 made adjustmnets to the Toolbx binary that it requires presence
of /run/host in both the host filesystem and the filesystem in
a container.
The presence of the directory is assured by systemd-tmpfiles by
running it before the binary is started for the first time. For the run
to be effective 'data/tmpfiles.d/toolbox.conf' has to be installed in
a location visible to systemd-tmpfiles. Therefore, the call to
'systemd-tmpfiles --create' had to be placed after the install step.
https://github.com/containers/toolbox/pull/898
There is no significant benefit in keeping this configuration separated.
Now the to-be installed packages are tracked in a single place and the
test playbooks only call the relevant tests.
This was pointed out by in 6063eb27b9https://github.com/containers/toolbox/pull/898
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.
In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.
The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
$ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
__libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_detach
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_create
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_attr_getstacksize
This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.
Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.
Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.
Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.
Based on ideas from Alexander Larsson and Ray Strode.
[1] Commit 6ad9c63180https://github.com/containers/toolbox/pull/534
[2] glibc commit 035c012e32c11e84
https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84https://sourceware.org/bugzilla/show_bug.cgi?id=23323https://github.com/containers/toolbox/issues/821
Since the rewrite of the system test suite[0] we've relied on the Zuul
playbooks for taking care of caching images using Skopeo for increasing
the reliability of the tests (in the past the instability of the Fedora
registry caused problems). This state is problematic if we want to use
the tests in other environments than the Zuul CI. This moves the caching
from Zuul into the system tests.
Currently, Bats does not support officially suite-wide setup and
teardown functions. The solution I chose was to add two new test files
that are executed before and after all tests. This may complicate the
execution of cherry-picked tests but that is not a very common use case
anyway.
The tests are now to some extent capable of adjusting to the host
environment. This is meant in the sense of: I'm running on RHEL, the
"default image" is UBI; I'm running on Fedora, the "default image" is
fedora-toolbox. This mechanism relies on os-release, which is the same
as what Toolbox itself uses.
[0] https://github.com/containers/toolbox/pull/517https://github.com/containers/toolbox/pull/774
The fedora-toolbox:32 image is the first of images in the renamed
toolbox image repository[0]. With the change we can drop the
pull_image_old() function because it was kept only for the old image.
Seems like newer version of ShellCheck checks the validity of variable
names (SC2153). This caused a false positive, so I silenced it.
[0] https://github.com/containers/toolbox/pull/615https://github.com/containers/toolbox/pull/780
Call "meson builddir" makes Meson create a build directory called
"builddir". It does not make it build the project. A subsequent call to
"meson compile" or "ninja" needs to be made. This subtle detail causes
a minor (purely visual) discrepancy in the CI output. Fix this for both
unit-test & system-test job definitions.
We now have some Go unit tests[0] and we should use them. By adding a
new test case to Meson, the existing CI job called "shellcheck" has no
longer an accurate name. With this it has been renamed to "unit-test".
Also, the job is now more important and therefore should also be used
for gating.
[0] https://github.com/containers/toolbox/pull/474https://github.com/containers/toolbox/pull/730
Since commit b27795a03e, each section of the test suite starts
and ends with a clean Podman state. This includes removing all images
from the local containers storage. Therefore, the images get downloaded
multiple times during the course of the test suite.
This commit restores the earlier behaviour where the images would get
downloaded only once, by copying them to separate directories outside
the local containers storage and then restoring them when the tests
are run.
https://github.com/containers/toolbox/pull/517https://github.com/containers/toolbox/pull/704