Honey, I shrunk the npm package

Have you ever wondered what lies beneath the surface of an npm package? At its heart, it’s nothing more than a gzipped tarball. In software development, source code and binary artifacts are nearly always shipped as .tar.gz or .tgz files. And gzip compression is supported by every HTTP server and web browser out there. caniuse.com doesn’t even give statistics for support, it just says “supported in effectively all browsers”. But here’s the kicker: gzip is starting to show its age, making way for newer compression algorithms like Brotli and ZStandard. Now, imagine a world where npm embraces one of these algorithms. In this blog post, I’ll dive into the realm of compression and explore the possibilities of modernising npm’s compression strategy.

What’s the competition?

The two major players in this space are Brotli and ZStandard (or zstd for short). Brotli was released by Google in 2013 and zstd was released by Facebook in 2016. They’ve since been standardised in RFC 7932 and RFC 8478 respectively, and have seen widespread use all over the software industry. It was actually the announcement by Arch Linux that they were going to start compressing their packages with zstd by default that made me think about this in the first place. Arch Linux was by no means the first project to do this, nor is it the only one. But to find out if it makes sense for the Node ecosystem, I need to do some benchmarks. And that means breaking out tar.

Benchmarking part 1

https://xkcd.com/1168/

I’m going to start with tar and see what sort of results I can get by switching between gzip, Brotli, and zstd. I’ll test with the npm package of npm itself, as it’s a pretty popular package, averaging over 4 million downloads a week, while also being quite large at around 11MB unpacked.

$ curl --remote-name https://registry.npmjs.org/npm/-/npm-9.7.1.tgz
$ ls -l --human npm-9.7.1.tgz
-rw-r--r-- 1 jamie users 2.6M Jun 16 20:30 npm-9.7.1.tgz
$ tar --extract --gzip --file npm-9.7.1.tgz
$ du --summarize --human --apparent-size package
11M	package

gzip is already giving good results, compressing 11MB to 2.6MB for a compression ratio of around 0.24. But what can the contenders do? I’m going to stick with the default options for now:

$ brotli --version
brotli 1.0.9
$ tar --use-compress-program brotli --create --file npm-9.7.1.tar.br package
$ zstd --version
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
$ tar --use-compress-program zstd --create --file npm-9.7.1.tar.zst package
$ ls -l --human npm-9.7.1.tgz npm-9.7.1.tar.br npm-9.7.1.tar.zst
-rw-r--r-- 1 jamie users 1.6M Jun 16 21:14 npm-9.7.1.tar.br
-rw-r--r-- 1 jamie users 2.3M Jun 16 21:14 npm-9.7.1.tar.zst
-rw-r--r-- 1 jamie users 2.6M Jun 16 20:30 npm-9.7.1.tgz

Wow! With no configuration both Brotli and zstd come out ahead of gzip, but Brotli is the clear winner here. It manages a compression ratio of 0.15 versus zstd’s 0.21. In real terms that means a saving of around 1MB. That doesn’t sound like much, but at 4 million weekly downloads, that would save 4TB of bandwidth per week.
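For the curious, the arithmetic behind those numbers is simple enough to script (sizes are approximate, taken from the listings above):

// Approximate sizes from the listings above
const unpacked = 11e6;  // package, unpacked
const gzip = 2.6e6;     // npm-9.7.1.tgz
const brotli = 1.6e6;   // npm-9.7.1.tar.br
const zstd = 2.3e6;     // npm-9.7.1.tar.zst

console.log((brotli / unpacked).toFixed(2)); // 0.15
console.log((zstd / unpacked).toFixed(2));   // 0.21

// ~1MB saved per download, at ~4 million downloads per week
console.log(`${((gzip - brotli) * 4e6) / 1e12} TB/week`); // 4 TB/week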

Benchmarking part 2: Electric boogaloo

The compression ratio is only telling half of the story. Actually, it’s a third of the story, but compression speed isn’t really a concern here: compression only happens once, when a package is published, while decompression happens every time you run npm install. So any time saved decompressing packages means quicker install or build steps.

To test this, I’m going to use hyperfine, a command-line benchmarking tool. Decompressing each of the packages I created earlier 100 times should give me a good idea of the relative decompression speed.

$ hyperfine --runs 100 --export-markdown hyperfine.md \
  'tar --use-compress-program brotli --extract --file npm-9.7.1.tar.br --overwrite' \
  'tar --use-compress-program zstd --extract --file npm-9.7.1.tar.zst --overwrite' \
  'tar --use-compress-program gzip --extract --file npm-9.7.1.tgz --overwrite'

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| tar --use-compress-program brotli --extract --file npm-9.7.1.tar.br --overwrite | 51.6 ± 3.0 | 47.9 | 57.3 | 1.31 ± 0.12 |
| tar --use-compress-program zstd --extract --file npm-9.7.1.tar.zst --overwrite | 39.5 ± 3.0 | 33.5 | 51.8 | 1.00 |
| tar --use-compress-program gzip --extract --file npm-9.7.1.tgz --overwrite | 47.0 ± 1.7 | 44.0 | 54.9 | 1.19 ± 0.10 |

This time zstd comes out in front, followed by gzip and Brotli. This makes sense, as “real-time compression” is one of the big features touted in zstd’s documentation. While Brotli is 31% slower than zstd, in real terms that’s only 12ms. And compared to gzip, it’s only 5ms slower. To put that into context, you’d need a connection faster than 1Gbps before the 5ms Brotli loses in decompression outweighs the 1MB it saves in package size.
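To sanity-check that claim with some quick back-of-the-envelope maths:

// Brotli saves ~1MB over gzip, but decompresses ~5ms slower.
// How fast would a connection have to be before gzip wins overall?
const savedBytes = 1e6;     // ~1MB smaller download with Brotli
const extraSeconds = 0.005; // ~5ms slower decompression
const breakEvenBps = (savedBytes * 8) / extraSeconds; // bits per second
console.log(`${breakEvenBps / 1e9} Gbps`); // 1.6 Gbps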

Benchmarking part 3: This time it’s serious

Up until now I’ve just been looking at Brotli and zstd’s default settings, but both have a lot of knobs and dials that you can adjust to change the compression ratio and compression or decompression speed. Thankfully, the industry standard lzbench has got me covered. It can run through all of the different quality levels for each compressor, and spit out a nice table with all the data at the end.

But before I dive in, there are a few caveats I should point out. The first is that lzbench isn’t able to compress an entire directory like tar can, so I opted to use lib/npm.js for this test. The second is that lzbench doesn’t include the gzip tool; instead it uses zlib, the underlying gzip library. The last is that the versions of each compressor aren’t quite current. The latest version of zstd is 1.5.5, released on April 4th 2023, whereas lzbench uses version 1.4.5, released on May 22nd 2020. The latest version of Brotli is 1.0.9, released on August 27th 2020, whereas lzbench uses a version released on October 1st 2019.

$ lzbench -o1 -ezlib/zstd/brotli package/lib/npm.js
| Compressor name | Compression | Decompress. | Compr. size | Ratio | Filename |
|:---|---:|---:|---:|---:|:---|
| memcpy | 117330 MB/s | 121675 MB/s | 13141 | 100.00 | package/lib/npm.js |
| zlib 1.2.11 -1 | 332 MB/s | 950 MB/s | 5000 | 38.05 | package/lib/npm.js |
| zlib 1.2.11 -2 | 382 MB/s | 965 MB/s | 4876 | 37.11 | package/lib/npm.js |
| zlib 1.2.11 -3 | 304 MB/s | 986 MB/s | 4774 | 36.33 | package/lib/npm.js |
| zlib 1.2.11 -4 | 270 MB/s | 1009 MB/s | 4539 | 34.54 | package/lib/npm.js |
| zlib 1.2.11 -5 | 204 MB/s | 982 MB/s | 4452 | 33.88 | package/lib/npm.js |
| zlib 1.2.11 -6 | 150 MB/s | 983 MB/s | 4425 | 33.67 | package/lib/npm.js |
| zlib 1.2.11 -7 | 125 MB/s | 983 MB/s | 4421 | 33.64 | package/lib/npm.js |
| zlib 1.2.11 -8 | 92 MB/s | 989 MB/s | 4419 | 33.63 | package/lib/npm.js |
| zlib 1.2.11 -9 | 95 MB/s | 986 MB/s | 4419 | 33.63 | package/lib/npm.js |
| zstd 1.4.5 -1 | 594 MB/s | 1619 MB/s | 4793 | 36.47 | package/lib/npm.js |
| zstd 1.4.5 -2 | 556 MB/s | 1423 MB/s | 4881 | 37.14 | package/lib/npm.js |
| zstd 1.4.5 -3 | 510 MB/s | 1560 MB/s | 4686 | 35.66 | package/lib/npm.js |
| zstd 1.4.5 -4 | 338 MB/s | 1584 MB/s | 4510 | 34.32 | package/lib/npm.js |
| zstd 1.4.5 -5 | 275 MB/s | 1647 MB/s | 4455 | 33.90 | package/lib/npm.js |
| zstd 1.4.5 -6 | 216 MB/s | 1656 MB/s | 4439 | 33.78 | package/lib/npm.js |
| zstd 1.4.5 -7 | 140 MB/s | 1665 MB/s | 4422 | 33.65 | package/lib/npm.js |
| zstd 1.4.5 -8 | 101 MB/s | 1714 MB/s | 4416 | 33.60 | package/lib/npm.js |
| zstd 1.4.5 -9 | 97 MB/s | 1673 MB/s | 4410 | 33.56 | package/lib/npm.js |
| zstd 1.4.5 -10 | 97 MB/s | 1672 MB/s | 4410 | 33.56 | package/lib/npm.js |
| zstd 1.4.5 -11 | 37 MB/s | 1665 MB/s | 4371 | 33.26 | package/lib/npm.js |
| zstd 1.4.5 -12 | 27 MB/s | 1637 MB/s | 4336 | 33.00 | package/lib/npm.js |
| zstd 1.4.5 -13 | 20 MB/s | 1601 MB/s | 4310 | 32.80 | package/lib/npm.js |
| zstd 1.4.5 -14 | 18 MB/s | 1582 MB/s | 4309 | 32.79 | package/lib/npm.js |
| zstd 1.4.5 -15 | 18 MB/s | 1582 MB/s | 4309 | 32.79 | package/lib/npm.js |
| zstd 1.4.5 -16 | 9.03 MB/s | 1556 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -17 | 8.86 MB/s | 1559 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -18 | 8.86 MB/s | 1558 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -19 | 8.86 MB/s | 1559 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -20 | 8.85 MB/s | 1558 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -21 | 8.86 MB/s | 1559 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -22 | 8.86 MB/s | 1589 MB/s | 4305 | 32.76 | package/lib/npm.js |
| brotli 2019-10-01 -0 | 604 MB/s | 813 MB/s | 5182 | 39.43 | package/lib/npm.js |
| brotli 2019-10-01 -1 | 445 MB/s | 775 MB/s | 5148 | 39.18 | package/lib/npm.js |
| brotli 2019-10-01 -2 | 347 MB/s | 947 MB/s | 4727 | 35.97 | package/lib/npm.js |
| brotli 2019-10-01 -3 | 266 MB/s | 936 MB/s | 4645 | 35.35 | package/lib/npm.js |
| brotli 2019-10-01 -4 | 164 MB/s | 930 MB/s | 4559 | 34.69 | package/lib/npm.js |
| brotli 2019-10-01 -5 | 135 MB/s | 944 MB/s | 4276 | 32.54 | package/lib/npm.js |
| brotli 2019-10-01 -6 | 129 MB/s | 949 MB/s | 4257 | 32.39 | package/lib/npm.js |
| brotli 2019-10-01 -7 | 103 MB/s | 953 MB/s | 4244 | 32.30 | package/lib/npm.js |
| brotli 2019-10-01 -8 | 84 MB/s | 919 MB/s | 4240 | 32.27 | package/lib/npm.js |
| brotli 2019-10-01 -9 | 7.74 MB/s | 958 MB/s | 4237 | 32.24 | package/lib/npm.js |
| brotli 2019-10-01 -10 | 4.35 MB/s | 690 MB/s | 3916 | 29.80 | package/lib/npm.js |
| brotli 2019-10-01 -11 | 1.59 MB/s | 761 MB/s | 3808 | 28.98 | package/lib/npm.js |

This pretty much confirms what I’ve shown up to now. zstd provides faster decompression than either gzip or Brotli, and slightly edges out gzip in compression ratio. Brotli, on the other hand, has decompression speeds and compression ratios comparable to gzip at lower quality levels, but at levels 10 and 11 it edges out both gzip and zstd in compression ratio.

Everything is derivative

Now that I’ve finished with benchmarking, I need to step back and look at my original idea of replacing gzip as npm’s compression standard. As it turns out, Evan Hahn had a similar idea in 2022 and proposed an npm RFC. He proposed using Zopfli, a backwards-compatible gzip compression library, and Brotli’s older (and cooler 😎) sibling. Zopfli is able to produce smaller artifacts with the trade-off of a much slower compression speed. In theory, an easy win for the npm ecosystem. And if you watch the RFC meeting recording or read the meeting notes, everyone seems hugely in favour of the proposal. However, the one big roadblock that prevented this RFC from being immediately accepted, and that ultimately resulted in it being abandoned, was the lack of a native JavaScript implementation.

Learning from this earlier RFC and my results from benchmarking Brotli and zstd, what would it take to build a strong RFC of my own?

Putting it all together

Both Brotli and zstd’s reference implementations are written in C. And while there are a lot of ports on the npm registry using Emscripten or WASM, Brotli has had an implementation in Node.js’s zlib module since Node.js 10.16.0, released in May 2019. I opened an issue in Node.js’s GitHub repo to add support for zstd, but it’ll take a long time to make its way into an LTS release, never mind the rest of npm’s dependency chain. I was already leaning towards Brotli, and this just seals the deal.
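As a quick illustration, here’s a minimal sketch of round-tripping a buffer through Brotli with nothing but Node.js’s built-in zlib module:

const zlib = require('zlib');

const input = Buffer.from('{"name":"npm","version":"9.7.1"}');

const compressed = zlib.brotliCompressSync(input, {
  params: {
    // Quality 11 is Brotli's maximum (and Node.js's default)
    [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
    // Hinting the input size can improve compression
    [zlib.constants.BROTLI_PARAM_SIZE_HINT]: input.length,
  },
});

const output = zlib.brotliDecompressSync(compressed);
console.log(output.equals(input)); // true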

Deciding on an algorithm is one thing, but implementing it is another. npm’s current support for gzip compression ultimately comes from Node.js itself. But the dependency chain between npm and Node.js is long and slightly different depending on if you’re packing or unpacking a package.

The dependency chain for packing, as in npm pack or npm publish, is:

npm → libnpmpack → pacote → tar → minizlib → zlib (Node.js)

But the dependency chain for unpacking (or ‘reifying’ as npm calls it), as in npm install or npm ci is:

npm → @npmcli/arborist → pacote → tar → minizlib → zlib (Node.js)

That’s quite a few packages that need to be updated, but thankfully the first steps have already been taken. Support for Brotli was added to minizlib 1.3.0 back in September 2019. I built on top of that and contributed Brotli support to tar, which is now available in version 6.2.0. It may take a while, but I can see a clear path forward.
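Here’s a rough sketch of what that enables from JavaScript, packing and unpacking with tar 6.2.0’s Brotli support (the brotli option shown here is my reading of the new API; treat the exact shape as an assumption):

const tar = require('tar');

// Pack: roughly what npm pack would do, but Brotli-compressed
tar.create({ file: 'npm-9.7.1.tar.br', brotli: true, sync: true }, ['package']);

// Unpack: roughly what npm install would do during reification
tar.extract({ file: 'npm-9.7.1.tar.br', brotli: true, sync: true });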

The final issue is backwards compatibility. This wasn’t a concern with Evan Hahn’s RFC, as Zopfli generates backwards-compatible gzip files. However, Brotli is an entirely new compression format, so I’ll need to propose a very careful adoption plan. The process I can see is:

  1. Support for packing and unpacking is added in a minor release of the current version of npm
    1. Unpacking using Brotli is handled transparently
    2. Packing using Brotli is disabled by default and only enabled if one of the following is true:
      1. The engines field in package.json is set to a version of npm that supports Brotli
      2. The engines field in package.json is set to a version of node that bundles a version of npm that supports Brotli
      3. Brotli support is explicitly enabled in .npmrc (see the sketch after this list)
  2. Packing using Brotli is enabled by default in the next major release of npm after the LTS version of Node.js that bundles it goes out of support
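To make the opt-in concrete, here’s roughly what that could look like. The engines field is real package.json configuration, but the exact npm version and the brotli key for .npmrc are hypothetical placeholders for whatever an RFC would settle on:

// package.json: opt in via an npm version that supports Brotli (illustrative version number)
{
  "engines": {
    "npm": ">=11"
  }
}

# .npmrc: hypothetical explicit opt-in flag
brotli=true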

Let’s say that Node.js 22 comes with npm 10, which has Brotli support. Node.js 22 will stop getting LTS updates in April 2027. Then, the next major version of npm after that date should enable Brotli packing by default.

I admit that this is an incredibly long transition period. However, it will guarantee that if you’re using a version of Node.js that is still being supported, there will be no visible impact to you. And it still allows early adopters to opt-in to Brotli support. But if anyone has other ideas about how to do this transition, I am open to suggestions.

What’s next?

As I wrap up my exploration into npm compression, I must admit that my journey has only just begun. There are still quite a few steps ahead. First and foremost, I need to do some more extensive benchmarking with the top 250 most-downloaded npm packages, instead of focusing on a single package. Once that’s complete, I need to draft an npm RFC and seek feedback from the wider community. If you’re interested in helping out, or just want to see how it’s going, you can follow me on Mastodon at @[email protected], or on Twitter at @Jamie_Magee.

Container Plumbing Days 2023—Windows containers: The forgotten stepchild

When it comes to Linux containers, there are plenty of tools out there that can scan container images, generate Software Bills of Materials (SBOMs), or list vulnerabilities. However, Windows container images are more like the forgotten stepchild in the container ecosystem. And that means we’re forgetting the countless developers using Windows containers, too.

Instead of allowing this gap to widen further, container tool authors—especially SBOM tools and vulnerability scanners—need to add support for Windows container images.

In my presentation at Container Plumbing Days 2023 I showed how to extract version information from Windows container images that can be used to generate SBOMs, as well as how to integrate with the Microsoft Security Updates API, which can provide detailed vulnerability information.

Your Jest tests might be wrong

Is your Jest test suite failing you? You might not be using the testing framework’s full potential, especially when it comes to preventing state leakage between tests. The Jest settings clearMocks, resetMocks, restoreMocks, and resetModules are set to false by default. If you haven’t changed these defaults, your tests might be fragile, order-dependent, or just downright wrong. In this blog post, I’ll dig into what each setting does, and how you can fix your tests.

clearMocks

First up is clearMocks:

Automatically clear mock calls, instances, contexts and results before every test. Equivalent to calling jest.clearAllMocks() before each test. This does not remove any mock implementation that may have been provided.

Every Jest mock has some context associated with it. It’s how you’re able to call functions like mockReturnValueOnce instead of only mockReturnValue. And because clearMocks is false by default, that context can be carried between tests.

Take this example function:

export function randomNumber() {
  return Math.random();
}

And this simple test for it:

jest.mock('.');

const { randomNumber } = require('.');

describe('tests', () => {
    randomNumber.mockReturnValue(42);

    it('should return 42', () => {
        const random = randomNumber();

        expect(random).toBe(42);
        expect(randomNumber).toBeCalledTimes(1)
    });
});

The test passes and works as expected. However, if we add another test to our test suite:

jest.mock('.');

const { randomNumber } = require('.');

describe('tests', () => {
    randomNumber.mockReturnValue(42);

    it('should return 42', () => {
        const random = randomNumber();

        expect(random).toBe(42);
        expect(randomNumber).toBeCalledTimes(1)
    });

    it('should return same number', () => {
        const random1 = randomNumber();
        const random2 = randomNumber();

        expect(random1).toBe(42);
        expect(random2).toBe(42);

        expect(randomNumber).toBeCalledTimes(2)
    });
});

Our second test fails with the error:

Error: expect(jest.fn()).toBeCalledTimes(expected)

Expected number of calls: 2
Received number of calls: 3

And even worse, if we change the order of our tests:

jest.mock('.');

const { randomNumber } = require('.');

describe('tests', () => {
    randomNumber.mockReturnValue(42);

    it('should return same number', () => {
        const random1 = randomNumber();
        const random2 = randomNumber();

        expect(random1).toBe(42);
        expect(random2).toBe(42);

        expect(randomNumber).toBeCalledTimes(2)
    });

    it('should return 42', () => {
        const random = randomNumber();

        expect(random).toBe(42);
        expect(randomNumber).toBeCalledTimes(1)
    });
});

We get the same error as before, but this time for 'should return 42' instead of 'should return same number'.

Enabling clearMocks in your Jest configuration ensures that every mock’s context is reset between tests. You can achieve the same result by adding jest.clearAllMocks() to your beforeEach() functions. But this isn’t a great idea as it means you have to remember to add it to each test file to make your tests safe, instead of using clearMocks to make them all safe by default.
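If you use a jest.config.js file, it’s a one-line change:

// jest.config.js
module.exports = {
    clearMocks: true,
};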

resetMocks

Next up is resetMocks:

Automatically reset mock state before every test. Equivalent to calling jest.resetAllMocks() before each test. This will lead to any mocks having their fake implementations removed but does not restore their initial implementation.

resetMocks takes clearMocks a step further, clearing not just a mock’s state but also any fake implementation it was given.

Going back to my first example again, I’m going to move the mock setup (randomNumber.mockReturnValue(42);) inside the first test case.

jest.mock('.');

const { randomNumber } = require('.');

describe('tests', () => {
    it('should return 42', () => {
        randomNumber.mockReturnValue(42);
        const random = randomNumber();

        expect(random).toBe(42);
        expect(randomNumber).toBeCalledTimes(1)
    });

    it('should return 42 twice', () => {
        const random1 = randomNumber();
        const random2 = randomNumber();

        expect(random1).toBe(42);
        expect(random2).toBe(42);

        expect(randomNumber).toBeCalledTimes(2)
    });
});

Logically, you might expect this to fail, but it passes! Jest mocks are global to the file they’re in. It doesn’t matter what describe, it, or test scope you use. And if I change the order of tests again, they fail. This makes it very easy to write tests that leak state and are order-dependent.

Enabling resetMocks in your Jest configuration ensures that every mock implementation is reset between tests. Like before, you can also add jest.resetAllMocks() to beforeEach() in every test file. But it’s a much better idea to make your tests safe by default instead of having to opt in to safe tests.

restoreMocks

Next is restoreMocks:

Automatically restore mock state and implementation before every test. Equivalent to calling jest.restoreAllMocks() before each test. This will lead to any mocks having their fake implementations removed and restores their initial implementation.

restoreMocks takes test isolation and safety to the next level.

Let me rewrite my example a little bit. Instead of mocking the function directly, I’m going to spy on Math.random() and mock its return value.

const { randomNumber } = require('.');

const spy = jest.spyOn(Math, 'random');

describe('tests', () => {
    it('should return 42', () => {
        spy.mockReturnValue(42);
        const random = randomNumber();

        expect(random).toBe(42);
        expect(spy).toBeCalledTimes(1)
    });

    it('should return 42 twice', () => {
        spy.mockReturnValue(42);

        const random1 = randomNumber();
        const random2 = randomNumber();

        expect(random1).toBe(42);
        expect(random2).toBe(42);

        expect(spy).toBeCalledTimes(2)
    });
});

With clearMocks and resetMocks enabled, and restoreMocks disabled, my tests pass. But if I enable restoreMocks both tests fail with an error message like:

Error: expect(received).toBe(expected) // Object.is equality

Expected: 42
Received: 0.503533695686772

restoreMocks has restored the original implementation of Math.random() before each test, so now I’m getting an actual random number instead of my mocked return value of 42. This forces me to be explicit about not only the mocked return values I’m expecting, but the mocks themselves.

To fix my tests I can set up my Jest mocks in each individual test.

const { randomNumber } = require('.');

describe('tests', () => {
    it('should return 42', () => {
        const spy = jest.spyOn(Math, 'random').mockReturnValue(42);
        const random = randomNumber();

        expect(random).toBe(42);
        expect(spy).toBeCalledTimes(1)
    });

    it('should return 42 twice', () => {
        const spy = jest.spyOn(Math, 'random').mockReturnValue(42);

        const random1 = randomNumber();
        const random2 = randomNumber();

        expect(random1).toBe(42);
        expect(random2).toBe(42);

        expect(spy).toBeCalledTimes(2)
    });
});

resetModules

Finally, we have resetModules:

By default, each test file gets its own independent module registry. Enabling resetModules goes a step further and resets the module registry before running each individual test. This is useful to isolate modules for every test so that the local module state doesn’t conflict between tests. This can be done programmatically using jest.resetModules().

Again, this builds on top of clearMocks, resetMocks, and restoreMocks. I don’t think this level of isolation is required for most tests, but I’m a completionist.

Let’s take my example from above and expand it to include some initialization that needs to happen before I can call randomNumber. Maybe I need to make sure there’s enough entropy to generate random numbers? My module might look something like this:

let isInitialized = false;

export function initialize() {
    isInitialized = true;
}

export function randomNumber() {
    if (!isInitialized) {
        throw new Error();
    }

    return Math.random();
}

I also want to write some tests to make sure that this works as expected:

const random = require('.');

describe('tests', () => {
    it('does not throw when initialized', () => {
        expect(() => random.initialize()).not.toThrow();
    });

    it('throws when not initialized', () => {
        expect(() => random.randomNumber()).toThrow();
    });
});

initialize shouldn’t throw an error, but randomNumber should throw an error if initialize isn’t called first. Great! Except it doesn’t work. Instead I get:

Error: expect(received).toThrow()

Received function did not throw

That’s because without enabling resetModules, the module is shared between all tests in the file. So when I called random.initialize() in my first test, isInitialized is still true for my second test. But once again, if I were to switch the order of my tests in the file, they would both pass. So my tests are order-dependent again!

Enabling resetModules gives each test in the file a fresh version of the module. Though, this might actually be a case where you want to call jest.resetModules() in your beforeEach() instead of enabling it globally. This kind of isolation isn’t required for every test. And if you’re using import instead of require, the syntax can get very awkward very quickly if you’re trying to avoid an 'import' and 'export' may only appear at the top level error.
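Here’s a minimal sketch of that pattern, re-requiring the module in beforeEach() so every test gets a fresh, uninitialized copy:

describe('tests', () => {
    let random;

    beforeEach(() => {
        jest.resetModules();   // fresh module registry for this test...
        random = require('.'); // ...so this require gets a new, uninitialized copy
    });

    it('throws when not initialized', () => {
        expect(() => random.randomNumber()).toThrow();
    });

    it('does not throw when initialized', () => {
        random.initialize();
        expect(() => random.randomNumber()).not.toThrow();
    });
});

With this setup, the tests pass regardless of the order they run in.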

TL;DR reset all of the things

By default, Jest tests are only isolated at the file level. If you really want to make sure your tests are safe and isolated, add this to your Jest config:

{
  clearMocks: true,
  resetMocks: true,
  restoreMocks: true,
  resetModules: true // It depends
}

There is a suggestion to make this part of the default configuration. But until then, you’ll have to do it yourself.

Maintaining AUR packages with Renovate

One big advantage that Arch Linux has over other distributions, apart from being able to say “BTW I use Arch.”, is the Arch User Repository (AUR). It’s a community-driven repository with over 80,000 packages. If you’re looking for a package, chances are you’ll find it in the AUR.

Keeping all those packages up to date takes a lot of manual effort from a lot of volunteers. People have created and used tools, like urlwatch and aurpublish, to let them know when upstream releases are cut and to automate some parts of the process. I know I do. But I wanted to automate the entire process. I think Renovate can help here.

Updating versions with Renovate

Renovate is an automated dependency update tool. You might have seen it opening pull requests on GitHub and making updates for npm or other package managers, but it’s a lot more powerful than just that.

Renovate has a couple of concepts that I need to explain first: datasources and managers. Datasources define where to look for new versions of a dependency. Renovate comes with over 50 different datasources, but the one that is important for AUR packages is the git-tags datasource. Managers are the Renovate concept for package managers. There isn’t an AUR or PKGBUILD manager, but there is a regex manager that I can use.

I can create a renovate.json configuration with the following regex manager configuration:

{
  "regexManagers": [
    {
      "fileMatch": ["(^|/)PKGBUILD$"],
      "matchStrings": [
        "pkgver=(?<currentValue>.*) # renovate: datasource=(?<datasource>.*) depName=(?<depName>.*)"
      ],
      "extractVersionTemplate": "^v?(?<version>.*)$"
    }
  ]
}

Breaking that down:

  • The fileMatch setting tells Renovate to look for any PKGBUILD files in a repository
  • The matchStrings is the regex format to extract the version, datasource, and dependency name from the PKGBUILD
  • The extractVersionTemplate is to handle a “v” in front of any version number that is sometimes added to Git tags

And here’s an extract from the PKGBUILD for the bicep-bin AUR package that I maintain:

pkgver=0.15.31 # renovate: datasource=github-tags depName=Azure/bicep

Here I’m configuring Renovate to use the github-tags datasource, which means it’ll look at the list of tags in the Azure/bicep GitHub repository for new versions. If Renovate finds any, it’ll automatically update the PKGBUILD and open a pull request with the updated version.

So I’ve automated the PKGBUILD update, but that’s only half of the work. The checksums and .SRCINFO must be updated before pushing to the AUR. Unfortunately, Renovate can’t do that (yet, see Renovate issue #16923), but GitHub Actions can!

Updating checksums and .SRCINFO with GitHub Actions

Updating the checksums with updpkgsums is easy, and generating an updated .SRCINFO with makepkg --printsrcinfo > .SRCINFO is straightforward too. But doing that for a whole repository of AUR packages is going to be a little trickier. So let me build up the GitHub Actions workflow step by step.

First, I only want to run this workflow on pull requests targeting the main branch.

on:
  pull_request:
    types:
      - opened
      - synchronize
    branches:
      - main

Next, I’m going to need to check out the entire history of the repository, so I can compare the files changed in the latest commit with the Git history.

jobs:
  updpkgsums:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@ac593985615ec2ede58e132d2e21d2b1cbd6127c # v3.3.0
        with:
          fetch-depth: 0
          ref: ${{ github.ref }}

Getting the package that changed in a pull request requires a little bit of shell magic.

- name: Find updated package
  run: |
    #!/usr/bin/env bash
    set -euxo pipefail

    echo "pkgbuild=$(git diff --name-only origin/main origin/${GITHUB_HEAD_REF} "*PKGBUILD" | head -1 | xargs dirname)" >> $GITHUB_ENV

Now that I’ve found the package that changed in the Renovate pull request, I can update the files.

This step in the workflow uses a private GitHub Action that I have in my aur-packages repository. I’m not going to break it down here, but at its core it runs updpkgsums and makepkg --printsrcinfo > .SRCINFO with a little extra configuration required to run Arch Linux on GitHub Actions runners. You can check out the full code on GitHub.

- name: Validate package
  if: ${{ env.pkgbuild != '' }}
  uses: ./.github/actions/aur
  with:
    action: 'updpkgsums'
    pkgname: ${{ env.pkgbuild }}

Finally, once the PKGBUILD and .SRCINFO are updated I need to commit that change back to the pull request.

- name: Commit
  if: ${{ env.pkgbuild != '' }}
  uses: stefanzweifel/git-auto-commit-action@3ea6ae190baf489ba007f7c92608f33ce20ef04a # v4.16.0
  with:
    file_pattern: '*/PKGBUILD */.SRCINFO'

Check out this pull request for bicep-bin where Renovate opened a pull request, and my GitHub Actions workflow updated the b2sums in the PKGBUILD and updated the .SRCINFO.

But why stop there? Let’s talk about publishing.

Publishing to the AUR

Each AUR package is its own Git repository. So to update a package in the AUR, I only need to push a new commit with the updated PKGBUILD and .SRCINFO. Thankfully, KSXGitHub created the github-actions-deploy-aur GitHub Action to streamline the whole process.

If I create a new GitHub Actions workflow to publish to the AUR, I can reuse the first two steps from my previous workflow to check out the repository and find the updated package. Then all I need to do is to use the github-actions-deploy-aur GitHub Action:

- name: Publish package
  uses: KSXGitHub/github-actions-deploy-aur@065b6056b25bdd43830d5a3f01899d0ff7169819 # v2.6.0
  if: ${{ env.pkgbuild != '' }}
  with:
    pkgname: ${{ env.pkgbuild }}
    pkgbuild: ${{ env.pkgbuild }}/PKGBUILD
    commit_username: ${{ secrets.AUR_USERNAME }}
    commit_email: ${{ secrets.AUR_EMAIL }}
    ssh_private_key: ${{ secrets.AUR_SSH_PRIVATE_KEY }}

All together now

If you own any AUR packages and want to automate some of the maintenance burden, check out my AUR packages template GitHub repository. It contains all of the steps I showed in this blog post. And if you want to see how it works in practice, check out my AUR packages GitHub repository.

Scanning Windows container images is (surprisingly) easy!

When it comes to Linux containers, there are plenty of tools out there that can scan container images, generate Software Bills of Materials (SBOMs), or list vulnerabilities. However, Windows container images are more like the forgotten stepchild in the container ecosystem. And that means we’re forgetting the countless developers using Windows containers, too.

I wanted to see what I’d need to make scanning tools for Windows container images. Turns out it’s pretty easy. So easy, in fact, I think the existing container tools should add support for Windows container images.

What version of Windows am I running?

The first question I needed to answer was: what version of Windows was the container image based on? This tells me what date the container image is from, what updates are applicable, and what vulnerabilities it has.

Container images are really just tar files, and Windows container images are no different. So first I saved a Windows container image locally using skopeo:

$ skopeo --insecure-policy --override-os windows copy docker://mcr.microsoft.com/windows/nanoserver:ltsc2022 dir:///tmp/nanoserver
$ ls /tmp/nanoserver
0db1879370e5c72dae7bff5d013772cbbfb95f30bfe1660dcef99e0176752f1c  7d843aa7407d9a5b1678482851d2e81f78b08185b72c18ffb6dfabcfed383858 manifest.json version

Next, I inspected the manifest using jq to find the layer that had the Windows files.

$ jq . manifest.json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 638,
    "digest": "sha256:0db1879370e5c72dae7bff5d013772cbbfb95f30bfe1660dcef99e0176752f1c"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.foreign.diff.tar",
      "size": 304908800,
      "digest": "sha256:7d843aa7407d9a5b1678482851d2e81f78b08185b72c18ffb6dfabcfed383858"
    }
  ]
}

I then extracted the layer and fixed the permissions.

$ mkdir layer
$ tar -xf 7d843aa7407d9a5b1678482851d2e81f78b08185b72c18ffb6dfabcfed383858 -C ./layer/
$ sudo find ./layer -type f -exec chmod 0644 {} \;
$ sudo find ./layer -type d -exec chmod 0755 {} \;
$ ls -lah layer/
total 16K
drwxr-xr-x 4 jamie users 4.0K Dec 28 15:05 .
drwxr-xr-x 3 jamie users 4.0K Dec 28 15:00 ..
drwxr-xr-x 5 jamie users 4.0K Dec  9 01:18 Files
drwxr-xr-x 3 jamie users 4.0K Dec  9 01:22 UtilityVM
$ ls -lah layer/Files/
total 28K
drwxr-xr-x  5 jamie users 4.0K Dec  9 01:18 .
drwxr-xr-x  4 jamie users 4.0K Dec 28 15:05 ..
-rw-r--r--  1 jamie users 5.6K Dec  9 01:18 License.txt
drwxr-xr-x  4 jamie users 4.0K May  7  2021 ProgramData
drwxr-xr-x  6 jamie users 4.0K Dec  9 01:19 Users
drwxr-xr-x 20 jamie users 4.0K Dec  9 01:19 Windows

Inside the extracted layer there are two directories: Files and UtilityVM. Files had the filesystem of the Windows container image, while UtilityVM is used by Hyper-V behind the scenes. So I just needed to focus on Files.

How did I figure out the specific version of Windows the container is running? From the registry of course! The SOFTWARE registry hive contained information about installed software, including Windows itself, and was found at Files/Windows/System32/config/SOFTWARE.

Thankfully, there’s a great NuGet package called Registry that let me easily load and parse the registry, but there are also packages for Go, Rust, and even Node.js.

using Registry;

var registryHive = new RegistryHive("/tmp/nanoserver/layer/Files/Windows/System32/config/SOFTWARE");
registryHive.ParseHive();
var currentVersion = registryHive.GetKey(@"Microsoft\Windows NT\CurrentVersion");
var fullVersion =
    $"{currentVersion.GetValue("CurrentMajorVersionNumber")}.{currentVersion.GetValue("CurrentMinorVersionNumber")}.{currentVersion.GetValue("CurrentBuildNumber")}.{currentVersion.GetValue("UBR")}";
Console.WriteLine(fullVersion);

Running this code, I got version 10.0.20348.1366, which was apparently released on 13th December 2022.

What about Windows updates?

The version of Windows doesn’t tell the whole story. There are also updates that can be applied on top. You might have seen them referred to by their KB number, for example KB1234567. Information on what updates have been applied is also stored in the registry.

By extending my earlier code, I can find out what updates this container image has.

var packages = registryHive.GetKey(@"Microsoft\Windows\CurrentVersion\Component Based Servicing\Packages");
var updatePackageRegex = new Regex(@"^Package_\d+_for_(KB\d+)~\w{16}~\w+~~((?:\d+\.){3}\d+)$");

var updates = new Dictionary<string, string>();
foreach (var packageKey in packages.SubKeys)
{
    if (!updatePackageRegex.IsMatch(packageKey.KeyName))
    {
        continue;
    }

    var currentState = packageKey.Values.Find(v => v.ValueName == "CurrentState")?.ValueData;

    // Installed
    if (currentState == "112")
    {
        var groups = updatePackageRegex.Match(packageKey.KeyName).Groups;
        updates[groups[1].Value] = groups[2].Value;
    }
}

foreach (var update in updates)
{
    Console.WriteLine($"{update.Key}: {update.Value}");
}

Running this gave me a single update: KB5020373: 20348.1300.1.0. Searching online for KB5020373 led me to the documentation for the update. It’s the November 2022 security update for .NET Framework and has a fix for CVE-2022-41064.

Done! …Now what if we scaled this?

It turns out it’s not that difficult to find out info about Windows container images. It took me a couple of hours to figure out, but that’s only because no one seems to have done this before. The actual code is only about 30 lines.

Windows containers are widely used for legacy applications, like .NET Framework applications, that haven’t been rewritten but could benefit from the cloud. All of the big three cloud providers offer managed Kubernetes services that support Windows nodes out of the box (yes, Kubernetes supports Windows nodes). There is clearly a demand for Windows containers, but there is a gap in the kind of container tooling that has sprung up for Linux containers.

Instead of allowing this gap to widen further, I think that container tool authors—especially SBOM tools and vulnerability scanners—should add support for Windows container images. These tools should then correlate the extracted information with the Microsoft Security Research Center (MSRC) API. MSRC publishes information every month on security updates. Comparing the Windows version from a container image with the fixed versions provided by the MSRC API, you could easily see your container image’s security vulnerabilities.

As my proof-of-concept has shown, it’s low-hanging fruit. A small addition that would have a big impact for the many forgotten developers and the applications they work on.