Currently, 'meson compile' and 'meson install' were being invoked from
pre-run playbooks. This meant that a genuine build failure from either
of those commands would be shown as a RETRY_LIMIT failure by the CI.
This was misleading. It made it look as if the failure was caused by
some transient networking problem or that the CI node was too slow due
to momentary heavy load, whereas the failure was actually due to a
problem in the Toolbx sources. A genuine problem in the sources should
be reflected as a FAILURE, not RETRY_LIMIT.
However, it's worth noting that 'meson compile' invokes 'go build',
which downloads all the Go modules required by the Toolbx sources. This
is worth retaining in the pre-run playbooks since it primarily depends
on Internet infrastructure beyond the Toolbx sources.
As a nice side-effect, the CI no longer gets mysteriously stuck like
this while the Go modules are being downloaded:
TASK [Build Toolbox]
ci-node-36 | ninja: Entering directory
`/home/zuul-worker/src/github.com/containers/toolbox/builddir'
...
ci-node-36 | [8/13] Generating doc/toolbox-rmi.1 with a custom command
ci-node-36 | [9/13] Generating doc/toolbox-run.1 with a custom command
ci-node-36 | [10/13] Generating doc/toolbox.conf.5 with a custom
command
ci-node-36 | [11/13] Generating src/toolbox with a custom command
https://github.com/containers/toolbox/pull/1158
... at https://containertoolbx.org/install/
There are some minor benefits to always invoking meson(1), as opposed to
directly invoking the underlying build backend, like 'ninja'.
It's one less command to be aware of. Secondly, in theory, Meson can be
used with backends other than Ninja (see 'meson configure'), even though
Ninja is the most likely option for building Toolbx because it's only
supported on Linux.
https://github.com/containers/toolbox/pull/1142
The -Dmigration_path_for_coreos_toolbox option enables a different code
path that's currently not tested by the CI at all. In fact, since it's
a build-time option, the corresponding code path is not even built by
the CI.
To properly support the -Dmigration_path_for_coreos_toolbox option, it
needs to be covered by the CI. This is a step in that direction by
running the unit tests on it.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the build and installation steps for builds with
and without the -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
A subsequent commit will introduce builds performed with the
-Dmigration_path_for_coreos_toolbox option to the CI. It will be good
to avoid duplicating the installation of RPM packages, Git submodule
handling, and the listing of various debug and version information for
builds with and without -Dmigration_path_for_coreos_toolbox option.
https://github.com/containers/toolbox/pull/1095
It's only necessary to call 'systemd-tmpfiles --create' when building
and installing from source on the host operating system.
It's not needed when using a pre-built binary downstream package,
because:
* When 'meson install' is called as part of building the package,
that's not when the temporary files need to be created. They need
to be created when the binary package is later downloaded and
installed by the user.
* Downstream tools can sometimes handle it automatically. eg., on
Fedora, the systemd RPM installs a trigger that tells RPM to call
'systemd-tmpfiles --create' automatically when a tmpfiles.d snippet
is installed.
It's also not needed when installing inside a toolbox container because
the files that 'systemd-tmpfiles --create' is supposed to create are
meant to be on the host.
Downstream distributors set the DESTDIR environment variable when
building their packages. Therefore, it's used to detect when a
downstream package is being built.
Unfortunately, environment variables are messy and, generally, Meson
doesn't support accessing them inside its scripts [1]. Therefore, this
adds a spurious build-time dependency on systemd for downstream
distributors. However, that's probably not a big problem because all
supported downstream operating systems are already expected to use
systemd for the tmpfiles.d(5) snippets to work.
[1] https://github.com/mesonbuild/meson/issues/9https://github.com/containers/toolbox/issues/955
Some downstream distributors like RHEL don't have patchelf(1). Relying
on patchelf(1) during the build will make it difficult for such
downstreams to distribute Toolbox.
Fortunately, the path of the dynamic linker (ie., PT_INTERP) is
hardcoded in the ABI specification of each architecture [1]. This means
that Toolbox's build system can keep it's own architecture to dynamic
linker mapping, and specify it during the build through the GNU ld
linker's --dynamic-linker flag, as opposed to using a tool like
patchelf(1) to change the path of the dynamic linker in the built
binary to the one inside /run/host. Currently, the list of
architectures covers the ones that Fedora builds for.
[1] https://sourceware.org/glibc/wiki/ABIListhttps://github.com/containers/toolbox/pull/942
PR #897 made adjustmnets to the Toolbx binary that it requires presence
of /run/host in both the host filesystem and the filesystem in
a container.
The presence of the directory is assured by systemd-tmpfiles by
running it before the binary is started for the first time. For the run
to be effective 'data/tmpfiles.d/toolbox.conf' has to be installed in
a location visible to systemd-tmpfiles. Therefore, the call to
'systemd-tmpfiles --create' had to be placed after the install step.
https://github.com/containers/toolbox/pull/898
There is no significant benefit in keeping this configuration separated.
Now the to-be installed packages are tracked in a single place and the
test playbooks only call the relevant tests.
This was pointed out by in 6063eb27b9https://github.com/containers/toolbox/pull/898
The /usr/bin/toolbox binary is not only used to interact with toolbox
containers and images from the host. It's also used as the entry point
of the containers by bind mounting the binary from the host into the
container. This means that the /usr/bin/toolbox binary on the host must
also work inside the container, even if they have different operating
systems.
In the past, this worked perfectly well with the POSIX shell
implementation because it got intepreted by whichever /bin/sh was
available. However, the Go implementation, can run into ABI
compatibility issues because binaries built on newer toolchains aren't
meant to be run against older runtimes.
The previous approach [1] of restricting the versions of the glibc
symbols that are linked against isn't actually supported by glibc, and
breaks if the early process start-up code changes. This is seen in
glibc-2.34, which is used by Fedora 35 onwards, where a new version of
the __libc_start_main symbol [2] was added as part of some security
hardening:
$ objdump -T ./usr/bin/toolbox | grep GLIBC_2.34
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
__libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_detach
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_create
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.34
pthread_attr_getstacksize
This means that /usr/bin/toolbox binaries built against glibc-2.34 on
newer Fedoras fail to run against older glibcs in older Fedoras.
Another option is to make the host's runtime available inside the
toolbox container and ensure that the binary always runs against it.
Luckily, almost all supported containers have the host's /usr available
at /run/host/usr. This is exploited by embedding RPATHs or RUNPATHs to
/run/host/usr/lib and /run/host/usr/lib64 in the binary, and changing
the path of the dynamic linker (ie., PT_INTERP) to the one inside
/run/host.
Unfortunately, there can only be one PT_INTERP entry inside the
binary, so there must be a /run/host on the host too. Therefore, a
/run/host symbolic link is created on the host that points to the
host's /.
Based on ideas from Alexander Larsson and Ray Strode.
[1] Commit 6ad9c63180https://github.com/containers/toolbox/pull/534
[2] glibc commit 035c012e32c11e84
https://sourceware.org/git/?p=glibc.git;a=commit;h=035c012e32c11e84https://sourceware.org/bugzilla/show_bug.cgi?id=23323https://github.com/containers/toolbox/issues/821
Since the rewrite of the system test suite[0] we've relied on the Zuul
playbooks for taking care of caching images using Skopeo for increasing
the reliability of the tests (in the past the instability of the Fedora
registry caused problems). This state is problematic if we want to use
the tests in other environments than the Zuul CI. This moves the caching
from Zuul into the system tests.
Currently, Bats does not support officially suite-wide setup and
teardown functions. The solution I chose was to add two new test files
that are executed before and after all tests. This may complicate the
execution of cherry-picked tests but that is not a very common use case
anyway.
The tests are now to some extent capable of adjusting to the host
environment. This is meant in the sense of: I'm running on RHEL, the
"default image" is UBI; I'm running on Fedora, the "default image" is
fedora-toolbox. This mechanism relies on os-release, which is the same
as what Toolbox itself uses.
[0] https://github.com/containers/toolbox/pull/517https://github.com/containers/toolbox/pull/774
The fedora-toolbox:32 image is the first of images in the renamed
toolbox image repository[0]. With the change we can drop the
pull_image_old() function because it was kept only for the old image.
Seems like newer version of ShellCheck checks the validity of variable
names (SC2153). This caused a false positive, so I silenced it.
[0] https://github.com/containers/toolbox/pull/615https://github.com/containers/toolbox/pull/780
Call "meson builddir" makes Meson create a build directory called
"builddir". It does not make it build the project. A subsequent call to
"meson compile" or "ninja" needs to be made. This subtle detail causes
a minor (purely visual) discrepancy in the CI output. Fix this for both
unit-test & system-test job definitions.
We now have some Go unit tests[0] and we should use them. By adding a
new test case to Meson, the existing CI job called "shellcheck" has no
longer an accurate name. With this it has been renamed to "unit-test".
Also, the job is now more important and therefore should also be used
for gating.
[0] https://github.com/containers/toolbox/pull/474https://github.com/containers/toolbox/pull/730
Since commit b27795a03e, each section of the test suite starts
and ends with a clean Podman state. This includes removing all images
from the local containers storage. Therefore, the images get downloaded
multiple times during the course of the test suite.
This commit restores the earlier behaviour where the images would get
downloaded only once, by copying them to separate directories outside
the local containers storage and then restoring them when the tests
are run.
https://github.com/containers/toolbox/pull/517https://github.com/containers/toolbox/pull/704
The bats-support[0] and bats-assert[1] libraries extend the
capabilities of bats[2]. Mainly, bats-assert is very useful for clean
checking of values/outputs/return codes.
Apart from updating the cases to use the libraries, the test cases have
been restructured in a way that they don't depend on each other anymore.
This required major changes in the helpers.bats file.
Overall, the tests are cleaner to read and easier to extend due to the
test cases being independent.
Some slight changes were made to the test cases themselves. Should not
alter their final behaviour.
There will be a follow up commit that will take care of downloading of
the tested images locally and caching them using Skopeo to speedup the
tests and try to resolve network problems when pulling the images that
we experienced in the past.
[0] https://github.com/bats-core/bats-support
[1] https://github.com/bats-core/bats-assert
[2] https://github.com/bats-core/bats-core
The Ansible playbooks are small enough as they are. Splitting things
across too many files makes it harder to remember which file does what.
https://github.com/containers/toolbox/pull/653
The system tests for Fedora 33 were failing:
not ok 21 Remove all images (2 should be present; --force should not
be necessary)
# (from function `is' in file test/system/helpers.bash, line 287,
# in test file test/system/302-rmi.bats, line 7)
# `is "$output" "" "The output should be empty"' failed
# $ /usr/local/bin/toolbox rmi --all
# Error: image
3ac100502d2123aff1cf6314760c7a89c55108b8de6ea3c10ddc79a1479f0fca
has dependent children
# Error: image
4a6adf1f2a96adf5ea0c02b61f9fa574306f77fc522f39c2ce6bb164daead882
has dependent children
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: The output should be empty
# #| expected: '[no output]'
# #| actual: 'Error: image
3ac100502d2123aff1cf6314760c7a89c55108b8de6ea3c10ddc79a1479f0fca
has dependent children'
# #| > 'Error: image
4a6adf1f2a96adf5ea0c02b61f9fa574306f77fc522f39c2ce6bb164daead882
has dependent children'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fallout from ff4e4905dahttps://github.com/containers/toolbox/pull/642
Not all tests are the same and the ones we're currently running are
system tests. Also the mention of 'podman-stable' is not that important
because we're using the version in the 'stable' stream of Fedora
releases.
https://github.com/containers/toolbox/pull/508
In the Go implementation, when the 'rm' and 'rmi' commands fail to
remove a container or image, they don't use a non-zero exit code.
There's currently no nice fix for this. So, the tests have been
adjusted as a temporary measure.
https://github.com/containers/toolbox/pull/507
Toolbox requires Go 1.13, while Fedora 30 only has Go 1.12.17.
Therefore the test environment needs to be upgraded to something more
recent.
Otherwise, the test fails with:
note: module requires Go 1.13
The name of the go-md2man package changed in Fedora 31, and hence had
to be updated.
https://github.com/containers/toolbox/pull/442
Current Rawhide is actually version 33. So the appropriate image should
be pre-pulled.
Because of the old version of image being pulled, the tests were
failing.
This change adds a pre-run task to pull the fedora-toolbox images from
the registry to reduce the number of false positives caused by
'podman pull' failing to download them during the actual test.
Each section needs a separate playbook because they use different
versions of Fedora, and hence different default images.
https://github.com/containers/toolbox/pull/375
This adds several .yaml files that specify jobs (those in folder
playbooks) and one that serves as the main config (.zuul.yaml).
Tests and builds are currently executed on every change in PRs (ie.,
check and gating) and periodically (according to the documentation
this pipeline should be run at least once a day).
There are 4 tests in total:
1. 'ninja test' - does the same thing that Travis did
2. Fedora 30 - runs the system tests with current Podman and Toolbox
in Fedora 30
3. Fedora 31 - the same but for Fedora 31
4. Fedora Rawhide - the same but for Fedora Rawhide
https://github.com/containers/toolbox/issues/68