Previous code had a sneaky bug due to which no caching
actually happened:
```cpp
auto linesForInput = (*lines)[origin->offset];
```
That should have been:
```cpp
auto & linesForInput = (*lines)[origin->offset];
```
See [1].
Now that it also makes sense to make the cache bound in side
in order not to memoize all the sources without freeing any memory.
The default cache size has been chosen somewhat arbitrarily to be ~64k
origins. For reference, 25.05 nixpkgs has ~50k .nix files.
Simple benchmark:
```nix
let
pkgs = import <nixpkgs> { };
in
builtins.foldl' (acc: el: acc + el.line) 0 (
builtins.genList (x: builtins.unsafeGetAttrPos "gcc" pkgs) 10000
)
```
(After)
```
$ hyperfine "result/bin/nix eval -f ./test.nix"
Benchmark 1: result/bin/nix eval -f ./test.nix
Time (mean ± σ): 292.7 ms ± 3.9 ms [User: 131.0 ms, System: 120.5 ms]
Range (min … max): 288.1 ms … 300.5 ms 10 runs
```
(Before)
```
hyperfine "nix eval -f ./test.nix"
Benchmark 1: nix eval -f ./test.nix
Time (mean ± σ): 666.7 ms ± 6.4 ms [User: 428.3 ms, System: 191.2 ms]
Range (min … max): 659.7 ms … 681.3 ms 10 runs
```
If the origin happens to be a `all-packages.nix` or similar in size then the
difference is much more dramatic.
[1]: 22e3f0e987
(cherry picked from commit 5ea81f5b8f)
For heavier objects it doesn't make sense to return
a std::optional with the copy of the data, when it
can be used by const reference.
(cherry picked from commit 4711720efe)
* It is tough to contribute to a project that doesn't use a formatter,
* It is extra hard to contribute to a project which has configured the formatter, but ignores it for some files
* Code formatting makes it harder to hide obscure / weird bugs by accident or on purpose,
Let's rip the bandaid off?
Note that PRs currently in flight should be able to be merged relatively easily by applying `clang-format` to their tip prior to merge.
Co-authored-by: Graham Christensen <graham@grahamc.com>
Previous use of symlink_status() always translated into a stat call, leading
to huge performance penalties for by-name-overlay in nixpkgs. The comment
below references the possible caching, but that seemed to be erroneous, since
the correct way to make use of the caching API is by calling a bunch of `is_*`
functions [1]. For example, here's how libstdc++ does that [2], [3].
This translates to great nixpkgs eval performance improvements:
```
Benchmark 1: GC_INITIAL_HEAP_SIZE=4G result/bin/nix-instantiate ../nixpkgs -A hello --readonly-mode
Time (mean ± σ): 186.7 ms ± 6.7 ms [User: 121.3 ms, System: 64.9 ms]
Range (min … max): 179.4 ms … 201.6 ms 16 runs
Benchmark 2: GC_INITIAL_HEAP_SIZE=4G nix-instantiate ../nixpkgs -A hello --readonly-mode
Time (mean ± σ): 230.6 ms ± 5.0 ms [User: 126.9 ms, System: 103.1 ms]
Range (min … max): 225.1 ms … 241.4 ms 13 runs
```
[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html
[2]: 8ea555b7b4/libstdc%2B%2B-v3/include/bits/fs_dir.h (L341-L348)
[3]: 8ea555b7b4/libstdc%2B%2B-v3/include/bits/fs_dir.h (L161-L163)
(cherry picked from commit 8708e9a526)
Squashed commit of the following:
commit 04fff3a637d455cbb1d75937a235950e43008db9
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 12:30:32 2025 +0200
Chown structured attr files safely
commit 5417ad445e414c649d0cfc71a05661c7bf8f3ef5
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 12:14:04 2025 +0200
Replace 'bool sync' with an enum for clarity
And drop writeFileAndSync().
commit 7ae0141f328d8e8e1094be24665789c05f974ba6
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 11:35:28 2025 +0200
Drop guessOrInventPathFromFD()
No need to do hacky stuff like that when we already know the original path.
commit 45b05098bd019da7c57cd4227a89bfd0fa65bb08
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Thu Jun 12 11:15:58 2025 +0200
Tweak comment
commit 0af15b31209d1b7ec8addfae9a1a6b60d8f35848
Author: Raito Bezarius <raito@lix.systems>
Date: Thu Mar 27 12:22:26 2025 +0100
libstore: ensure that temporary directory is always 0o000 before deletion
In the case the deletion fails, we should ensure that the temporary
directory cannot be used for nefarious purposes.
Change-Id: I498a2dd0999a74195d13642f44a5de1e69d46120
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 2c20fa37b15cfa03ac6a1a6a47cdb2ed66c0827e
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 12:42:55 2025 +0100
libutil: ensure that `_deletePath` does NOT use absolute paths with dirfds
When calling `_deletePath` with a parent file descriptor, `openat` is
made effective by using relative paths to the directory file descriptor.
To avoid the problem, the signature is changed to resist misuse with an
assert in the prologue of the function.
Change-Id: I6b3fc766bad2afe54dc27d47d1df3873e188de96
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit d3c370bbcae48bb825ce19fd0f73bb4eefd2c9ea
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:07:47 2025 +0100
libstore: ensure that `passAsFile` is created in the original temp dir
This ensures that `passAsFile` data is created inside the expected
temporary build directory by `openat()` from the parent directory file
descriptor.
This avoids a TOCTOU which is part of the attack chain of CVE-????.
Change-Id: Ie5273446c4a19403088d0389ae8e3f473af8879a
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 45d3598724f932d024ef6bc2ffb00c1bb90e6018
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:06:03 2025 +0100
libutil: writeFile variant for file descriptors
`writeFile` lose its `sync` boolean flag to make things simpler.
A new `writeFileAndSync` function is created and all call sites are
converted to it.
Change-Id: Ib871a5283a9c047db1e4fe48a241506e4aab9192
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 732bd9b98cabf4aaf95a01fd318923de303f9996
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:05:34 2025 +0100
libstore: chown to builder variant for file descriptors
We use it immediately for the build temporary directory.
Change-Id: I180193c63a2b98721f5fb8e542c4e39c099bb947
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit 962c65f8dcd5570dd92c72370a862c7b38942e0d
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:04:59 2025 +0100
libstore: open build directory as a dirfd as well
We now keep around a proper AutoCloseFD around the temporary directory
which we plan to use for openat operations and avoiding the build
directory being swapped out while we are doing something else.
Change-Id: I18d387b0f123ebf2d20c6405cd47ebadc5505f2a
Signed-off-by: Raito Bezarius <raito@lix.systems>
commit c9b42462b75b5a37ee6564c2b53cff186c8323da
Author: Raito Bezarius <raito@lix.systems>
Date: Wed Mar 26 01:04:12 2025 +0100
libutil: guess or invent a path from file descriptors
This is useful for certain error recovery paths (no pun intended) that
does not thread through the original path name.
Change-Id: I2d800740cb4f9912e64c923120d3f977c58ccb7e
Signed-off-by: Raito Bezarius <raito@lix.systems>
Judging by the comment for `makeEmptySourceAccessor` the prefix has
to be empty:
> Return a source accessor that contains only an empty root directory.
Fixes#13295.
(cherry picked from commit fba1bb0c13)
Splicing the list element to the back can be done in
a much simpler and concise way without the need for
erasing and re-inserting the element. Doing it this
way is equivalent to just moving node pointers around,
whereas inserting/erasing allocates/deallocates new nodes.
I can't find a good way to benchmark in isolation from the
git cache, but common sense dictates that creating (and destroying)
a 131KiB std::vector for each regular file from the archive imposes
quite a significant overhead regardless of the IO bound git cache.
AFAICT there is no reason to keep a copy of the data since
it always gets fed into the sink and there are no coroutines/threads
in sight.
Remove outdated and no longer relevant TODO. It's more confusing
now, since symbol table must now be addressed by uint32_t indices
in order to keep Attr size down to 16 bytes on 64 bit machines.
This patch finally applies the transition to std::less<>,
which is a transparent comparator. There's no functional
change and string lookups in sets are now more efficient
and don't produce temporaries (e.g. set.find(std::string_view{"key"})).
The intention is to switch to transparent comparators from N3657 for
ordered set containers for strings and using the alias consistently
would simplify things.
Since we dropped fs::symlink_exists, we no longer have a need for the fs
namespace. Having less abstractions makes it easier to lookup the
functions in reference documentations.
As it turns out the orignal implementation of symlink_exists cannot be
used in Nix because it did now std::filesystem::filesystem_error.
The new implementation fixes that but is now actually the same as
pathExists except for the path type.
Before the change `nix` was stripping warning flags
reported by `gcc-14` too eagerly:
$ nix build -f. texinfo4
error: builder for '/nix/store/i9948l91s3df44ip5jlpp6imbrcs646x-texinfo-4.13a.drv' failed with exit code 2;
last 25 log lines:
> 1495 | info_tag (mbi_iterator_t iter, int handle, size_t *plen)
> | ~~~~~~~~^~~~
> window.c:1887:39: error: passing argument 4 of 'printed_representation' from incompatible pointer type []
> 1887 | &replen);
> | ^~~~~~~
> | |
> | int *
After the change the compiler flag remains:
$ ~/patched.nix build -f. texinfo4
error: builder for '/nix/store/i9948l91s3df44ip5jlpp6imbrcs646x-texinfo-4.13a.drv' failed with exit code 2;
last 25 log lines:
> 1495 | info_tag (mbi_iterator_t iter, int handle, size_t *plen)
> | ~~~~~~~~^~~~
> window.c:1887:39: error: passing argument 4 of 'printed_representation' from incompatible pointer type [-Wincompatible-pointer-types]
> 1887 | &replen);
> | ^~~~~~~
> | |
> | int *
Note the difference in flag rendering around the warning.
https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda has a
good sumamry of why it happens. Befomre the change `nix` was handling
just one form or URL separator:
$ printf '\e]8;;http://example.com\e\\This is a link\e]8;;\e\\\n'
Now it also handled another for (used by gcc-14`):
printf '\e]8;;http://example.com\aThis is a link\e]8;;\a\n'
While at it fixed accumulation of trailing escape `\e\\` symbol.
It seems that the intention was to format a number in base 8 (as
suggested by the %o format specifier), but `perms` is a `std::string`
and not a number. Looks like `rawMode` is the correct thing to use here.
This name is close to the Nixpkgs lib function `escapeShellArg`,
making it easier to find.
A friendlier function with the same behavior as lib could be added
later.
Rather than "mounting" the store inside an empty virtual filesystem,
just return the store as a virtual filesystem. This is more modular.
(FWIW, it also supports two long term hopes of mind:
1. More capability-based Nix language mode. I dream of a "super pure
eval" where you can only use relative path literals (See #8738), and
any `fetchTree`-fetched stuff + the store are all disjoint (none is
mounted in another) file systems.
2. Windows, where the store dir may include drive letters, etc., and is
thus unsuitable to be the prefix of any `CanonPath`s.
)
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>