This brings them in line with the other tests, and furthers my goals of
separating unit test data from code.
Doing this cleanup as part of my #13570 effort, but strictly-speaking,
this is separate as these data types' JSON never contained and store
paths or store dirs, just simple output name strings.
Enables builds with ASAN to catch memory corruption
bugs faster and in CI. This is an incredibly valuable
instrument that must be used as much as possible.
Somewhat based on jade's work from Lix, though there's a lot that
we have to do differently:
19ae87e5ce
Co-authored-by: Jade Lovelace <lix@jade.fyi>
See #13570 for details --- the idea is that included the store dir in
store paths makes systematic JSON parting with e.g. Serde, Aeson,
nlohmann, or similiar harder.
After talking to Eelco, we are changing the `Derivation` format right
away because not only is `nix derivation` technically experimental, we think it is
also less widely used in practice than, say, `nix path-info`.
Progress on #13570
This implements a special back-compat shim to specifically allow
unbracketed IPv6 addresses in store references. This is something
that is relied upon in the wild and the old parsing logic accepted
both ways (brackets were optional). This patch restores this behavior.
As always, we didn't have any tests for this.
Addresses #13937.
This is relied upon (specifically the `local` store) by existing
tooling [1] and we broke this in 3e7879e6df (which
was first released in 2.31).
To lessen the scope of the breakage we should not normalize "auto" references
and explicitly specified references like "local" or "daemon". It also makes
sense to canonicalize local://,daemon:// to be more compatible with prior
behavior.
[1]: 05e1b3cba2/lib/NOM/Builds.hs (L60-L64)
Looking at perf:
0.21 │ push %rbp
0.99 │ mov %rsp,%rbp
│ push %r15
0.25 │ push %r14
│ push %r13
0.49 │ push %r12
0.66 │ push %rbx
1.23 │ lea -0x10000(%rsp),%r11
0.23 │ 15: sub $0x1000,%rsp
1.01 │ orq $0x0,(%rsp)
59.12 │ cmp %r11,%rsp
0.27 │ ↑ jne 15
Seems like 64K is too much to have on the stack for each invocation, considering
that only a minuscule number of allocations are actually larger than 4K.
There's actually no good reason this function should use so much stack space. Or
use small_string at all. Everything can be done in small chunks that don't require
any memory allocations and use up 2K bytes on the stack.
This patch also adds a microbenchmark for tracking the unparsing performance. Here
are the results for this change:
(Before)
BM_UnparseRealDerivationFile/hello 7275 ns 7247 ns 96093 bytes_per_second=232.136Mi/s
BM_UnparseRealDerivationFile/firefox 40538 ns 40376 ns 17327 bytes_per_second=378.534Mi/s
(After)
BM_UnparseRealDerivationFile/hello 3228 ns 3218 ns 215671 bytes_per_second=522.775Mi/s
BM_UnparseRealDerivationFile/firefox 39724 ns 39584 ns 17617 bytes_per_second=386.101Mi/s
This translates into nice evaluation performance improvements (compared to 18c3d2348f):
Benchmark 1: GC_INITIAL_HEAP_SIZE=8G old-nix/bin/nix-instantiate ../nixpkgs -A nixosTests.gnome --readonly-mode
Time (mean ± σ): 3.111 s ± 0.021 s [User: 2.513 s, System: 0.580 s]
Range (min … max): 3.083 s … 3.143 s 10 runs
Benchmark 2: GC_INITIAL_HEAP_SIZE=8G result/bin/nix-instantiate ../nixpkgs -A nixosTests.gnome --readonly-mode
Time (mean ± σ): 3.037 s ± 0.038 s [User: 2.461 s, System: 0.558 s]
Range (min … max): 2.960 s … 3.086 s 10 runs
See the new extensive doxygen in `url.hh`.
This fixes fetching gitlab: flakes.
Paths are now stored as a std::vector of individual path
segments, which can themselves contain path separators '/' (%2F).
This is necessary to make the Gitlab's /projects/ API work.
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
This systematizes the way our s3:// URLs are parsed in filetransfer.cc.
Yoinked out and refactored out of [1].
[1]: https://github.com/NixOS/nix/pull/13752
Co-authored-by: Bernardo Meurer Costa <beme@anthropic.com>
Compilers in nixpkgs have caught up and major distros
should also have recent enough compilers. It would be
nice to have newer features like more full featured
ranges and deducing this.
The problem with old code was that it used getUri for both the `diskCache`
as well as logging. This is really bad because it mixes the textual human
readable representation with the caching.
Also using getUri for the cache key is really problematic for the S3 store,
since it doesn't include the `endpoint` in the cache key, so it's totally broken.
This starts separating the logging / cache concerns by introducing a
`getHumanReadableURI` that should only be used for logging. The caching
logic now instead uses `getReference().render(/*withParams=*/false)` exclusively.
This would need to be fixed in follow-ups, because that's really fragile and
broken for some store types (but it was already broken before).
Rather than having store implementations return a free-form URI string,
have them return a `StoreReference`. This reflects that fact that this
method is supposed to invert `resolveStoreConfig`, which goes from a
`StoreReference` to some `StoreConfig` concrete derived class (based on
the registry).
`StoreConfig::getUri` is kept only as a convenience for the common case
that we want to immediately render the `StoreReference`.
A few tests were changed to use `local://` not `local`, since
`StoreReference` does not encode the `local` and `daemon` shorthands
(and instead desugars them to `local://` and `unix://` right away). I
think that is fine. `local` and `daemon` still work as input.
Previously `getUri` didn't include store query parameters,
`ssh-ng` didn't include any information at all and the local
store didn't have the path:
```
$ nix store info --store "local?root=/tmp/aaa&require-sigs=false"
Store URL: local
Version: 2.31.0
Trusted: 1
$ nix store info --store "ssh-ng://localhost?remote-program=nix-daemon"
Store URL: ssh-ng://
Version: 2.31.0
Trusted: 1
$ nix store info --store "ssh://localhost?remote-program=nix-store"
Store URL: ssh://localhost
```
This commit changes this to:
```
$ nix store info --store "local?root=/tmp/aaa&require-sigs=false"
Store URL: local?require-sigs=false&root=/tmp/aaa
Version: 2.31.0
Trusted: 1
$ nix store info --store "ssh-ng://localhost?remote-program=nix-daemon"
Store URL: ssh-ng://localhost?remote-program=nix-daemon
Version: 2.31.0
Trusted: 1
$ nix store info --store "ssh://localhost?remote-program=nix-store"
Store URL: ssh://localhost?remote-program=nix-store
```
The implicit dependency on refLength (which is the StorePath::HashLen)
is not good. Also the companion tests and benchmarks are already in libstore-tests.
This patch allows users to specify the connection port
in the store URLS like so:
```
nix store info --store "ssh-ng://localhost:22" --json
```
Previously this failed with: `error: failed to start SSH connection to 'localhost:22'`,
because the code did not distinguish the port from the hostname. This
patch remedies that problem by introducing a ParsedURL::Authority type
for working with parsed authority components of URIs.
Now that the URL parsing code is less ad-hoc we can
add more long-awaited fixes for specifying SSH connection
ports in store URIs.
Builds upon the work from bd1d2d1041.
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
This keeps things fast by making the function inline, but also prevents
people from having to know about the `0xFF` implementation detail
directly, instead making one go through a `std::optional` (which could be
fused away with a sufficiently smart compiler).
Additionally, the base "nix32" implementation is moved to its own header
file pair, as it is logically distinct and prior to the `Hash` data
type. It would probably be nice to do this with all the hash format
implementations.
This benchmark should provide a relatively realistic
scenario for reference scanning.
As shown by the following results, reference scanning code
is already plenty fast and is definitely not a bottleneck:
```
BM_RefScanSinkRandom/10000 1672 ns 1682 ns 413354 bytes_per_second=5.53691Gi/s
BM_RefScanSinkRandom/100000 11217 ns 11124 ns 64341 bytes_per_second=8.37231Gi/s
BM_RefScanSinkRandom/1000000 205745 ns 204421 ns 3360 bytes_per_second=4.55591Gi/s
BM_RefScanSinkRandom/5000000 1208407 ns 1201046 ns 597 bytes_per_second=3.87713Gi/s
BM_RefScanSinkRandom/10000000 2534397 ns 2523344 ns 273 bytes_per_second=3.69083Gi/s
```
(Measurements on Ryzen 5900X via `nix build --file ci/gha/tests componentTests.nix-store-tests-run -L`)
This changes our GHA CI and nix-store-tests packaging
to build and run the benchmarks. This does not affect
the default packaging - the overrides apply only for the
GHA CI.
Instead of parsing a structured attrs at some later point, we parsed it
right away when parsing the A-Term format, and likewise serialize it to
`__json = <JSON dump>` when serializing a derivation to A-Term.
The JSON format can directly contain the JSON structured attrs without
so encoding it, so we just do that.
* It is tough to contribute to a project that doesn't use a formatter,
* It is extra hard to contribute to a project which has configured the formatter, but ignores it for some files
* Code formatting makes it harder to hide obscure / weird bugs by accident or on purpose,
Let's rip the bandaid off?
Note that PRs currently in flight should be able to be merged relatively easily by applying `clang-format` to their tip prior to merge.
This failed on macOS:
nix-store-tests-run> C++ exception with description "../nix_api_store.cc:33: nix_err_code(ctx) != NIX_OK, message: error: getting status of '/nix/var/nix/daemon-socket/socket': Operation not permitted" thrown in the test body.
Makes the behavoral change of #13263 without the underlying refactor.
Hopefully this clearly safe from a perf and GC perspective, and will
make it easier to benchmark #13263.
Unfortunately Feature is just an alias to `std::string`
and not a new-type, so a ton of code relies on it being
exactly a `std::string`.
Using transparent comparators just for StringSet necessitates
using it here as well.
The intention is to switch to transparent comparators from N3657 for
ordered set containers for strings and using the alias consistently
would simplify things.
Since we dropped fs::symlink_exists, we no longer have a need for the fs
namespace. Having less abstractions makes it easier to lookup the
functions in reference documentations.