Previous code had a sneaky bug due to which no caching
actually happened:
```cpp
auto linesForInput = (*lines)[origin->offset];
```
That should have been:
```cpp
auto & linesForInput = (*lines)[origin->offset];
```
See [1].
Now that it also makes sense to make the cache bound in side
in order not to memoize all the sources without freeing any memory.
The default cache size has been chosen somewhat arbitrarily to be ~64k
origins. For reference, 25.05 nixpkgs has ~50k .nix files.
Simple benchmark:
```nix
let
pkgs = import <nixpkgs> { };
in
builtins.foldl' (acc: el: acc + el.line) 0 (
builtins.genList (x: builtins.unsafeGetAttrPos "gcc" pkgs) 10000
)
```
(After)
```
$ hyperfine "result/bin/nix eval -f ./test.nix"
Benchmark 1: result/bin/nix eval -f ./test.nix
Time (mean ± σ): 292.7 ms ± 3.9 ms [User: 131.0 ms, System: 120.5 ms]
Range (min … max): 288.1 ms … 300.5 ms 10 runs
```
(Before)
```
hyperfine "nix eval -f ./test.nix"
Benchmark 1: nix eval -f ./test.nix
Time (mean ± σ): 666.7 ms ± 6.4 ms [User: 428.3 ms, System: 191.2 ms]
Range (min … max): 659.7 ms … 681.3 ms 10 runs
```
If the origin happens to be a `all-packages.nix` or similar in size then the
difference is much more dramatic.
[1]: 22e3f0e987
(cherry picked from commit 5ea81f5b8f)
For heavier objects it doesn't make sense to return
a std::optional with the copy of the data, when it
can be used by const reference.
(cherry picked from commit 4711720efe)
* It is tough to contribute to a project that doesn't use a formatter,
* It is extra hard to contribute to a project which has configured the formatter, but ignores it for some files
* Code formatting makes it harder to hide obscure / weird bugs by accident or on purpose,
Let's rip the bandaid off?
Note that PRs currently in flight should be able to be merged relatively easily by applying `clang-format` to their tip prior to merge.
Co-authored-by: Graham Christensen <graham@grahamc.com>
c39cc00404 has added assertions for
all Value accesses and the following case has started failing with
an `unreachable`:
(/tmp/fun.nix):
```nix
{a}: a
```
```
$ nix eval --impure --expr 'import /tmp/fun.nix {a="a";b="b";}'
```
This would crash:
```
terminating due to unexpected unrecoverable internal error: Unexpected condition in getStorage at ../include/nix/expr/value.hh:844
```
This is not a regression, but rather surfaces an existing problem, which previously
was left undiagnosed. In the case of an import `fun` is the `import` primOp, so that read is invalid
and previously this resulted in an access into an inactive union member, which is UB.
The correct thing to use is `vCur`. Identical problem also affected the case of a missing argument.
Add previously failing test cases to the functional/lang test suite.
Fixes#13448.
(cherry picked from commit 6e78cc90d3)
Previous use of symlink_status() always translated into a stat call, leading
to huge performance penalties for by-name-overlay in nixpkgs. The comment
below references the possible caching, but that seemed to be erroneous, since
the correct way to make use of the caching API is by calling a bunch of `is_*`
functions [1]. For example, here's how libstdc++ does that [2], [3].
This translates to great nixpkgs eval performance improvements:
```
Benchmark 1: GC_INITIAL_HEAP_SIZE=4G result/bin/nix-instantiate ../nixpkgs -A hello --readonly-mode
Time (mean ± σ): 186.7 ms ± 6.7 ms [User: 121.3 ms, System: 64.9 ms]
Range (min … max): 179.4 ms … 201.6 ms 16 runs
Benchmark 2: GC_INITIAL_HEAP_SIZE=4G nix-instantiate ../nixpkgs -A hello --readonly-mode
Time (mean ± σ): 230.6 ms ± 5.0 ms [User: 126.9 ms, System: 103.1 ms]
Range (min … max): 225.1 ms … 241.4 ms 13 runs
```
[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html
[2]: 8ea555b7b4/libstdc%2B%2B-v3/include/bits/fs_dir.h (L341-L348)
[3]: 8ea555b7b4/libstdc%2B%2B-v3/include/bits/fs_dir.h (L161-L163)
(cherry picked from commit 8708e9a526)
Ensure relative path inputs are relative to the parent node's _actual_
`outPath`, instead of the subtly different `sourceInfo.outPath`.
Additionally, non-flake inputs now also have a `sourceInfo` attribute.
This fixes the relationship between `self.outPath` and
`self.sourceInfo.outPath` in some edge cases.
Fixes#13164
(cherry picked from commit 46beb9af76)
Squashed commit of the following:
commit 04fff3a637d455cbb1d75937a235950e43008db9
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 12:30:32 2025 +0200
Chown structured attr files safely
commit 5417ad445e414c649d0cfc71a05661c7bf8f3ef5
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 12:14:04 2025 +0200
Replace 'bool sync' with an enum for clarity
And drop writeFileAndSync().
commit 7ae0141f328d8e8e1094be24665789c05f974ba6
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 11:35:28 2025 +0200
Drop guessOrInventPathFromFD()
No need to do hacky stuff like that when we already know the original path.
commit 45b05098bd019da7c57cd4227a89bfd0fa65bb08
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 11:15:58 2025 +0200
Tweak comment
commit 0af15b31209d1b7ec8addfae9a1a6b60d8f35848
Author: Raito Bezarius <raito@lix.systems>
Date: Thu Mar 27 12:22:26 2025 +0100
libstore: ensure that temporary directory is always 0o000 before deletion
In the case the deletion fails, we should ensure that the temporary
directory cannot be used for nefarious purposes.
Change-Id: I498a2dd0999a74195d13642f44a5de1e69d46120
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 2c20fa37b15cfa03ac6a1a6a47cdb2ed66c0827e
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 12:42:55 2025 +0100
libutil: ensure that `_deletePath` does NOT use absolute paths with dirfds
When calling `_deletePath` with a parent file descriptor, `openat` is
made effective by using relative paths to the directory file descriptor.
To avoid the problem, the signature is changed to resist misuse with an
assert in the prologue of the function.
Change-Id: I6b3fc766bad2afe54dc27d47d1df3873e188de96
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit d3c370bbcae48bb825ce19fd0f73bb4eefd2c9ea
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:07:47 2025 +0100
libstore: ensure that `passAsFile` is created in the original temp dir
This ensures that `passAsFile` data is created inside the expected
temporary build directory by `openat()` from the parent directory file
descriptor.
This avoids a TOCTOU which is part of the attack chain of CVE-????.
Change-Id: Ie5273446c4a19403088d0389ae8e3f473af8879a
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 45d3598724f932d024ef6bc2ffb00c1bb90e6018
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:06:03 2025 +0100
libutil: writeFile variant for file descriptors
`writeFile` lose its `sync` boolean flag to make things simpler.
A new `writeFileAndSync` function is created and all call sites are
converted to it.
Change-Id: Ib871a5283a9c047db1e4fe48a241506e4aab9192
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 732bd9b98cabf4aaf95a01fd318923de303f9996
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:05:34 2025 +0100
libstore: chown to builder variant for file descriptors
We use it immediately for the build temporary directory.
Change-Id: I180193c63a2b98721f5fb8e542c4e39c099bb947
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 962c65f8dcd5570dd92c72370a862c7b38942e0d
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:04:59 2025 +0100
libstore: open build directory as a dirfd as well
We now keep around a proper AutoCloseFD around the temporary directory
which we plan to use for openat operations and avoiding the build
directory being swapped out while we are doing something else.
Change-Id: I18d387b0f123ebf2d20c6405cd47ebadc5505f2a
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit c9b42462b75b5a37ee6564c2b53cff186c8323da
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:04:12 2025 +0100
libutil: guess or invent a path from file descriptors
This is useful for certain error recovery paths (no pun intended) that
does not thread through the original path name.
Change-Id: I2d800740cb4f9912e64c923120d3f977c58ccb7e
Signed-off-by: Raito Bezarius <raito@lix.systems>
Judging by the comment for `makeEmptySourceAccessor` the prefix has
to be empty:
> Return a source accessor that contains only an empty root directory.
Fixes#13295.
(cherry picked from commit fba1bb0c13)
As it turns out using `std::regex` is actually the bottleneck
for root discovery. Just substituting `std::` -> `boost::`
makes root discovery twice as fast (3x if counting only userspace time).
Some rather ad-hoc measurements to motivate the switch:
(On master)
```
nix build github:nixos/nix/1e822bd4149a8bce1da81ee2ad9404986b07914c#nix-cli --out-link result-1e822bd4149a8bce1da81ee2ad9404986b07914c
taskset -c 2,3 hyperfine "result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0"
Benchmark 1: result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run --max 0
Time (mean ± σ): 481.6 ms ± 3.9 ms [User: 336.2 ms, System: 142.0 ms]
Range (min … max): 474.6 ms … 487.7 ms 10 runs
```
(After this patch)
```
taskset -c 2,3 hyperfine "result/bin/nix store gc --dry-run --max 0"
Benchmark 1: result/bin/nix store gc --dry-run --max 0
Time (mean ± σ): 254.7 ms ± 9.7 ms [User: 111.1 ms, System: 141.3 ms]
Range (min … max): 246.5 ms … 281.3 ms 10 runs
```
`boost::regex` is a drop-in replacement for `std::regex`, but much faster.
Doing a simple before/after comparison doesn't surface any change in behavior:
```
result/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
result-1e822bd4149a8bce1da81ee2ad9404986b07914c/bin/nix store gc --dry-run -vvvvv --max 0 |& grep "got additional" | wc -l
```
(cherry picked from commit 3a1301cd6d)
Leverage #10766 to show how we can now resolve a store configuration
without actually opening the store for that resolved configuration.
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Splicing the list element to the back can be done in
a much simpler and concise way without the need for
erasing and re-inserting the element. Doing it this
way is equivalent to just moving node pointers around,
whereas inserting/erasing allocates/deallocates new nodes.
The existing header is a bit too big. Now the following use-cases are
separated, and get their own headers:
- Using or implementing an arbitrary store: remaining `store-api.hh`
This is closer to just being about the `Store` (and `StoreConfig`)
classes, as one would expect.
- Opening a store from a textual description: `store-open.hh`
Opening an aribtrary store implementation like this requires some sort
of store registration mechanism to exists, but the caller doesn't need
to know how it works. This just exposes the functions which use such a
mechanism, without exposing the mechanism itself
- Registering a store implementation: `store-registration.hh`
This requires understanding how the mechanism actually works, and the
mechanism in question involves templated machinery in headers we
rather not expose to things that don't need it, as it would slow down
compilation for no reason.
I can't find a good way to benchmark in isolation from the
git cache, but common sense dictates that creating (and destroying)
a 131KiB std::vector for each regular file from the archive imposes
quite a significant overhead regardless of the IO bound git cache.
AFAICT there is no reason to keep a copy of the data since
it always gets fed into the sink and there are no coroutines/threads
in sight.