Honey, I shrunk the npm package

Have you ever wondered what lies beneath the surface of an npm package? At its heart, it’s nothing more than a gzipped tarball. In software development, source code and binary artifacts are nearly always shipped as .tar.gz or .tgz files. And gzip compression is supported by every HTTP server and web browser out there. caniuse.com doesn’t even give statistics for support; it just says “supported in effectively all browsers”. But here’s the kicker: gzip is starting to show its age, making way for newer, more modern compression algorithms like Brotli and Zstandard. Now, imagine a world where npm embraces one of these new algorithms. In this blog post, I’ll dive into the realm of compression and explore the possibilities of modernising npm’s compression strategy.

What’s the competition?

The two major players in this space are Brotli and Zstandard (or zstd for short). Brotli was released by Google in 2013 and zstd was released by Facebook in 2016. They’ve since been standardised, in RFC 7932 and RFC 8478 respectively, and have seen widespread use all over the software industry. It was actually the announcement by Arch Linux that they were going to start compressing their packages with zstd by default that made me think about this in the first place. Arch Linux was by no means the first project, nor is it the only one. But to find out if it makes sense for the Node ecosystem, I need to do some benchmarks. And that means breaking out tar.

Benchmarking part 1

https://xkcd.com/1168/

I’m going to start with tar and see what sort of comparisons I can get by switching between gzip, Brotli, and zstd. I’ll test with the npm package of npm itself as it’s a pretty popular package, averaging over 4 million downloads a week, while also being quite large at around 11MB unpacked.

$ curl --remote-name https://registry.npmjs.org/npm/-/npm-9.7.1.tgz
$ ls -l --human npm-9.7.1.tgz
-rw-r--r-- 1 jamie users 2.6M Jun 16 20:30 npm-9.7.1.tgz
$ tar --extract --gzip --file npm-9.7.1.tgz
$ du --summarize --human --apparent-size package
11M	package

gzip is already giving good results, compressing 11MB to 2.6MB for a compression ratio of around 0.24. But what can the contenders do? I’m going to stick with the default options for now:

$ brotli --version
brotli 1.0.9
$ tar --use-compress-program brotli --create --file npm-9.7.1.tar.br package
$ zstd --version
*** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
$ tar --use-compress-program zstd --create --file npm-9.7.1.tar.zst package
$ ls -l --human npm-9.7.1.tgz npm-9.7.1.tar.br npm-9.7.1.tar.zst
-rw-r--r-- 1 jamie users 1.6M Jun 16 21:14 npm-9.7.1.tar.br
-rw-r--r-- 1 jamie users 2.3M Jun 16 21:14 npm-9.7.1.tar.zst
-rw-r--r-- 1 jamie users 2.6M Jun 16 20:30 npm-9.7.1.tgz

Wow! With no configuration, both Brotli and zstd come out ahead of gzip, but Brotli is the clear winner here. It manages a compression ratio of 0.15 versus zstd’s 0.21. In real terms that means a saving of around 1MB per download compared with gzip. That doesn’t sound like much, but at 4 million weekly downloads, that would save 4TB of bandwidth per week.
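The arithmetic behind those figures can be sketched in a few lines of Node.js. This is only a back-of-the-envelope check: the sizes are the rounded values from the ls and du output above, and the variable names are my own.

```javascript
// Sizes copied from the `ls` and `du` output above, in megabytes (rounded).
const unpackedMB = 11;
const archivesMB = { gzip: 2.6, zstd: 2.3, brotli: 1.6 };

// Compression ratio = compressed size / uncompressed size (lower is better).
const ratios = Object.fromEntries(
  Object.entries(archivesMB).map(([name, size]) => [name, size / unpackedMB])
);

// Bandwidth saved per week if every download used Brotli instead of gzip.
const weeklyDownloads = 4_000_000;
const savedPerDownloadMB = archivesMB.gzip - archivesMB.brotli;
const savedPerWeekTB = (savedPerDownloadMB * weeklyDownloads) / 1_000_000;

for (const [name, ratio] of Object.entries(ratios)) {
  console.log(`${name}: ${ratio.toFixed(2)}`);
}
console.log(`saved per week: ${savedPerWeekTB.toFixed(1)} TB`);
```

Because the inputs are rounded to one decimal place, the results are only good to about two significant figures.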

Benchmarking part 2: Electric boogaloo

The compression ratio is only telling half of the story. Actually, it’s a third of the story, but compression speed isn’t really a concern. Compression of a package only happens once, when a package is published, but decompression happens every time you run npm install. So any time saved decompressing packages means quicker install or build steps.

To test this, I’m going to use hyperfine, a command-line benchmarking tool. Decompressing each of the packages I created earlier 100 times should give me a good idea of the relative decompression speed.

$ hyperfine --runs 100 --export-markdown hyperfine.md \
  'tar --use-compress-program brotli --extract --file npm-9.7.1.tar.br --overwrite' \
  'tar --use-compress-program zstd --extract --file npm-9.7.1.tar.zst --overwrite' \
  'tar --use-compress-program gzip --extract --file npm-9.7.1.tgz --overwrite'

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `tar --use-compress-program brotli --extract --file npm-9.7.1.tar.br --overwrite` | 51.6 ± 3.0 | 47.9 | 57.3 | 1.31 ± 0.12 |
| `tar --use-compress-program zstd --extract --file npm-9.7.1.tar.zst --overwrite` | 39.5 ± 3.0 | 33.5 | 51.8 | 1.00 |
| `tar --use-compress-program gzip --extract --file npm-9.7.1.tgz --overwrite` | 47.0 ± 1.7 | 44.0 | 54.9 | 1.19 ± 0.10 |

This time zstd comes out in front, followed by gzip and Brotli. This makes sense, as “real-time compression” is one of the big features that is touted in zstd’s documentation. While Brotli is 31% slower compared to zstd, in real terms it’s only 12ms. And compared to gzip, it’s only 5ms slower. To put that into context, you’d need a connection faster than 1Gbps before the 5ms Brotli loses in decompression outweighs the 1MB it saves in package size.
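Here is a rough sketch of that break-even calculation. It assumes the rounded figures from above (1MB saved, 5ms lost) and ignores latency, TLS, and everything else that affects a real download.

```javascript
// Figures taken from the benchmarks above; both are rounded.
const savedBytes = 1_000_000; // ~1MB smaller download with Brotli vs gzip
const extraSeconds = 0.005;   // ~5ms slower decompression with Brotli vs gzip

// Brotli only loses overall if the saved bytes could have been
// downloaded in less time than the extra decompression takes.
const breakEvenBitsPerSecond = (savedBytes * 8) / extraSeconds;
const breakEvenGbps = breakEvenBitsPerSecond / 1e9;

console.log(`break-even link speed: ${breakEvenGbps.toFixed(1)} Gbps`);
```

On any connection slower than that, the smaller Brotli archive should finish sooner overall, at least for this one package.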

Benchmarking part 3: This time it’s serious

Up until now I’ve just been looking at Brotli and zstd’s default settings, but both have a lot of knobs and dials that you can adjust to change the compression ratio and compression or decompression speed. Thankfully, the industry standard lzbench has got me covered. It can run through all of the different quality levels for each compressor, and spit out a nice table with all the data at the end.

But before I dive in, there are a few caveats I should point out. The first is that lzbench isn’t able to compress an entire directory like tar, so I opted to use lib/npm.js for this test. The second is that lzbench doesn’t include the gzip tool. Instead it uses zlib, the underlying gzip library. The last is that the versions of each compressor aren’t quite current. The latest version of zstd is 1.5.5, released on April 4th 2023, whereas lzbench uses version 1.4.5, released on May 22nd 2020. The latest version of Brotli is 1.0.9, released on August 27th 2020, whereas lzbench uses a version released on October 1st 2019.

$ lzbench -o1 -ezlib/zstd/brotli package/lib/npm.js
| Compressor name | Compression | Decompress. | Compr. size | Ratio | Filename |
|:---|---:|---:|---:|---:|:---|
| memcpy | 117330 MB/s | 121675 MB/s | 13141 | 100.00 | package/lib/npm.js |
| zlib 1.2.11 -1 | 332 MB/s | 950 MB/s | 5000 | 38.05 | package/lib/npm.js |
| zlib 1.2.11 -2 | 382 MB/s | 965 MB/s | 4876 | 37.11 | package/lib/npm.js |
| zlib 1.2.11 -3 | 304 MB/s | 986 MB/s | 4774 | 36.33 | package/lib/npm.js |
| zlib 1.2.11 -4 | 270 MB/s | 1009 MB/s | 4539 | 34.54 | package/lib/npm.js |
| zlib 1.2.11 -5 | 204 MB/s | 982 MB/s | 4452 | 33.88 | package/lib/npm.js |
| zlib 1.2.11 -6 | 150 MB/s | 983 MB/s | 4425 | 33.67 | package/lib/npm.js |
| zlib 1.2.11 -7 | 125 MB/s | 983 MB/s | 4421 | 33.64 | package/lib/npm.js |
| zlib 1.2.11 -8 | 92 MB/s | 989 MB/s | 4419 | 33.63 | package/lib/npm.js |
| zlib 1.2.11 -9 | 95 MB/s | 986 MB/s | 4419 | 33.63 | package/lib/npm.js |
| zstd 1.4.5 -1 | 594 MB/s | 1619 MB/s | 4793 | 36.47 | package/lib/npm.js |
| zstd 1.4.5 -2 | 556 MB/s | 1423 MB/s | 4881 | 37.14 | package/lib/npm.js |
| zstd 1.4.5 -3 | 510 MB/s | 1560 MB/s | 4686 | 35.66 | package/lib/npm.js |
| zstd 1.4.5 -4 | 338 MB/s | 1584 MB/s | 4510 | 34.32 | package/lib/npm.js |
| zstd 1.4.5 -5 | 275 MB/s | 1647 MB/s | 4455 | 33.90 | package/lib/npm.js |
| zstd 1.4.5 -6 | 216 MB/s | 1656 MB/s | 4439 | 33.78 | package/lib/npm.js |
| zstd 1.4.5 -7 | 140 MB/s | 1665 MB/s | 4422 | 33.65 | package/lib/npm.js |
| zstd 1.4.5 -8 | 101 MB/s | 1714 MB/s | 4416 | 33.60 | package/lib/npm.js |
| zstd 1.4.5 -9 | 97 MB/s | 1673 MB/s | 4410 | 33.56 | package/lib/npm.js |
| zstd 1.4.5 -10 | 97 MB/s | 1672 MB/s | 4410 | 33.56 | package/lib/npm.js |
| zstd 1.4.5 -11 | 37 MB/s | 1665 MB/s | 4371 | 33.26 | package/lib/npm.js |
| zstd 1.4.5 -12 | 27 MB/s | 1637 MB/s | 4336 | 33.00 | package/lib/npm.js |
| zstd 1.4.5 -13 | 20 MB/s | 1601 MB/s | 4310 | 32.80 | package/lib/npm.js |
| zstd 1.4.5 -14 | 18 MB/s | 1582 MB/s | 4309 | 32.79 | package/lib/npm.js |
| zstd 1.4.5 -15 | 18 MB/s | 1582 MB/s | 4309 | 32.79 | package/lib/npm.js |
| zstd 1.4.5 -16 | 9.03 MB/s | 1556 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -17 | 8.86 MB/s | 1559 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -18 | 8.86 MB/s | 1558 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -19 | 8.86 MB/s | 1559 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -20 | 8.85 MB/s | 1558 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -21 | 8.86 MB/s | 1559 MB/s | 4305 | 32.76 | package/lib/npm.js |
| zstd 1.4.5 -22 | 8.86 MB/s | 1589 MB/s | 4305 | 32.76 | package/lib/npm.js |
| brotli 2019-10-01 -0 | 604 MB/s | 813 MB/s | 5182 | 39.43 | package/lib/npm.js |
| brotli 2019-10-01 -1 | 445 MB/s | 775 MB/s | 5148 | 39.18 | package/lib/npm.js |
| brotli 2019-10-01 -2 | 347 MB/s | 947 MB/s | 4727 | 35.97 | package/lib/npm.js |
| brotli 2019-10-01 -3 | 266 MB/s | 936 MB/s | 4645 | 35.35 | package/lib/npm.js |
| brotli 2019-10-01 -4 | 164 MB/s | 930 MB/s | 4559 | 34.69 | package/lib/npm.js |
| brotli 2019-10-01 -5 | 135 MB/s | 944 MB/s | 4276 | 32.54 | package/lib/npm.js |
| brotli 2019-10-01 -6 | 129 MB/s | 949 MB/s | 4257 | 32.39 | package/lib/npm.js |
| brotli 2019-10-01 -7 | 103 MB/s | 953 MB/s | 4244 | 32.30 | package/lib/npm.js |
| brotli 2019-10-01 -8 | 84 MB/s | 919 MB/s | 4240 | 32.27 | package/lib/npm.js |
| brotli 2019-10-01 -9 | 7.74 MB/s | 958 MB/s | 4237 | 32.24 | package/lib/npm.js |
| brotli 2019-10-01 -10 | 4.35 MB/s | 690 MB/s | 3916 | 29.80 | package/lib/npm.js |
| brotli 2019-10-01 -11 | 1.59 MB/s | 761 MB/s | 3808 | 28.98 | package/lib/npm.js |

This pretty much confirms what I’ve shown up to now. zstd is able to provide faster decompression speed than either gzip or Brotli, and slightly edge out gzip in compression ratio. Brotli, on the other hand, has comparable decompression speeds and compression ratio with gzip at lower quality levels, but at levels 10 and 11 it’s able to edge out both gzip and zstd’s compression ratio.
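A similar size-only sweep can be sketched with Node.js’s built-in zlib module, which covers gzip and Brotli but not zstd. This is not a substitute for lzbench: it doesn’t measure speed, and the input here is a stand-in (an assumption on my part) rather than lib/npm.js.

```javascript
const zlib = require('zlib');

// Stand-in input (an assumption): a few kilobytes of JSON text. Swap in
// fs.readFileSync('package/lib/npm.js') to mirror the lzbench run.
const input = Buffer.from(JSON.stringify(zlib.constants, null, 2));

// Compress the input at every level and record the resulting size.
function sweep(name, levels, compress, decompress) {
  return levels.map((level) => {
    const compressed = compress(input, level);
    // Round-trip check so a size is only reported for valid output.
    if (!decompress(compressed).equals(input)) {
      throw new Error(`${name} level ${level} failed to round-trip`);
    }
    return { name, level, bytes: compressed.length };
  });
}

// Inclusive integer range, e.g. range(1, 3) gives [1, 2, 3].
const range = (from, to) =>
  Array.from({ length: to - from + 1 }, (_, i) => from + i);

const results = [
  ...sweep('gzip', range(1, 9),
    (buf, level) => zlib.gzipSync(buf, { level }),
    zlib.gunzipSync),
  ...sweep('brotli', range(0, 11),
    (buf, quality) => zlib.brotliCompressSync(buf, {
      params: { [zlib.constants.BROTLI_PARAM_QUALITY]: quality },
    }),
    zlib.brotliDecompressSync),
];

// Print the ratio as a percentage, matching the lzbench "Ratio" column.
console.table(results.map((r) => ({
  ...r,
  ratio: ((100 * r.bytes) / input.length).toFixed(2),
})));
```

The exact numbers will differ from the table above, since the input, library versions, and default window sizes are not the same.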

Everything is derivative

Now that I’ve finished with benchmarking, I need to step back and look at my original idea of replacing gzip as npm’s compression standard. As it turns out, Evan Hahn had a similar idea in 2022 and proposed an npm RFC. He proposed using Zopfli, a backwards-compatible gzip compression library, and Brotli’s older (and cooler 😎) sibling. Zopfli is able to produce smaller artifacts with the trade-off of a much slower compression speed. In theory an easy win for the npm ecosystem. And if you watch the RFC meeting recording or read the meeting notes, everyone seems hugely in favour of the proposal. However, the one big roadblock that prevents this RFC from being immediately accepted, and ultimately results in it being abandoned, is the lack of a native JavaScript implementation.

Learning from this earlier RFC and my results from benchmarking Brotli and zstd, what would it take to build a strong RFC of my own?

Putting it all together

Both Brotli and zstd’s reference implementations are written in C. And while there are a lot of ports on the npm registry using Emscripten or WASM, Brotli has an implementation in Node.js’s zlib module, and has done since Node.js 10.16.0, released in May 2019. I opened an issue in Node.js’s GitHub repo to add support for zstd, but it’ll take a long time to make its way into an LTS release, never mind the rest of npm’s dependency chain. I was already leaning towards Brotli, but this just seals the deal.
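As a minimal sketch of what that built-in support looks like, the snippet below round-trips a buffer through both gzip and Brotli using only Node.js’s zlib module. The decompressTarball helper is hypothetical and not how npm works today. It relies on gzip’s two magic bytes; since Brotli has no magic number, a real implementation might prefer a file extension or content type.

```javascript
const zlib = require('zlib');

// gzip streams always start with the magic bytes 0x1f 0x8b.
function isGzip(buffer) {
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Hypothetical helper: decompress gzip if the magic bytes match,
// otherwise assume the buffer is a Brotli stream.
function decompressTarball(buffer) {
  return isGzip(buffer)
    ? zlib.gunzipSync(buffer)
    : zlib.brotliDecompressSync(buffer);
}

// Made-up sample content standing in for a tarball.
const original = Buffer.from(
  'module.exports = "hello from a tarball";\n'.repeat(100)
);
const asGzip = zlib.gzipSync(original);
const asBrotli = zlib.brotliCompressSync(original);

console.log(decompressTarball(asGzip).equals(original));
console.log(decompressTarball(asBrotli).equals(original));
```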

Deciding on an algorithm is one thing, but implementing it is another. npm’s current support for gzip compression ultimately comes from Node.js itself. But the dependency chain between npm and Node.js is long and slightly different depending on if you’re packing or unpacking a package.

The dependency chain for packing, as in npm pack or npm publish, is:

npm → libnpmpack → pacote → tar → minizlib → zlib (Node.js)

But the dependency chain for unpacking (or ‘reifying’ as npm calls it), as in npm install or npm ci is:

npm → @npmcli/arborist → pacote → tar → minizlib → zlib (Node.js)

That’s quite a few packages that need to be updated, but thankfully the first steps have already been taken. Support for Brotli was added to minizlib 1.3.0 back in September 2019. I built on top of that and contributed Brotli support to tar. That is now available in version 6.2.0. It may take a while, but I can see a clear path forward.

The final issue is backwards compatibility. This wasn’t a concern with Evan Hahn’s RFC, as Zopfli generates backwards-compatible gzip files. However, Brotli is an entirely new compression format, so I’ll need to propose a very careful adoption plan. The process I can see is:

  1. Support for packing and unpacking is added in a minor release of the current version of npm
    1. Unpacking using Brotli is handled transparently
    2. Packing using Brotli is disabled by default and only enabled if one of the following is true:
      1. The engines field in package.json is set to a version of npm that supports Brotli
      2. The engines field in package.json is set to a version of node that bundles a version of npm that supports Brotli
      3. Brotli support is explicitly enabled in .npmrc
  2. Packing using Brotli is enabled by default in the next major release of npm after the LTS version of Node.js that bundles it goes out of support

Let’s say that Node.js 22 comes with npm 10, which has Brotli support. Node.js 22 will stop getting LTS updates in April 2027. Then, the next major version of npm after that date should enable Brotli packing by default.
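The opt-in rules above could be sketched roughly as follows. Everything here is hypothetical: the version numbers come from the example scenario, the brotli key in .npmrc is invented for illustration, and the range parser only understands simple ">=X.Y.Z" ranges, whereas real engines fields accept full semver ranges.

```javascript
// Hypothetical: first npm and Node.js versions assumed to support Brotli,
// taken from the example scenario (Node.js 22 bundling npm 10).
const FIRST_NPM_WITH_BROTLI = [10, 0, 0];
const FIRST_NODE_WITH_BROTLI = [22, 0, 0];

// Parses a simple ">=X.Y.Z" range into [major, minor, patch].
// Anything more complex returns null; a real implementation would
// need a proper semver range parser.
function minimumVersion(range) {
  const match = /^>=\s*(\d+)(?:\.(\d+))?(?:\.(\d+))?$/.exec((range || '').trim());
  return match ? [1, 2, 3].map((i) => Number(match[i] || 0)) : null;
}

// True if `version` is greater than or equal to `floor`.
function atLeast(version, floor) {
  for (let i = 0; i < 3; i += 1) {
    if (version[i] !== floor[i]) return version[i] > floor[i];
  }
  return true;
}

// Decide whether to pack with Brotli, following the three opt-in rules.
function shouldPackWithBrotli(pkg, npmrc = {}) {
  if (npmrc.brotli === true) return true; // hypothetical .npmrc key
  const engines = pkg.engines || {};
  const npmFloor = minimumVersion(engines.npm);
  const nodeFloor = minimumVersion(engines.node);
  return Boolean(
    (npmFloor && atLeast(npmFloor, FIRST_NPM_WITH_BROTLI)) ||
    (nodeFloor && atLeast(nodeFloor, FIRST_NODE_WITH_BROTLI))
  );
}

console.log(shouldPackWithBrotli({ engines: { node: '>=22' } }));
console.log(shouldPackWithBrotli({ engines: { node: '>=18' } }));
```

Defaulting to gzip whenever the engines field is missing or unparseable keeps the sketch on the conservative side, which matches the spirit of the transition plan.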

I admit that this is an incredibly long transition period. However, it will guarantee that if you’re using a version of Node.js that is still being supported, there will be no visible impact to you. And it still allows early adopters to opt in to Brotli support. But if anyone has other ideas about how to do this transition, I am open to suggestions.

What’s next?

As I wrap up my exploration into npm compression, I must admit that my journey has only just begun. To push the boundaries further, there are a lot more steps. First and foremost, I need to do some more extensive benchmarking with the top 250 most downloaded npm packages, instead of focusing on a single package. Once that’s complete, I need to draft an npm RFC and seek feedback from the wider community. If you’re interested in helping out, or just want to see how it’s going, you can follow me on Mastodon at @[email protected], or on Twitter at @Jamie_Magee.
