This shaves off a very significand amount of memory used
for evaluation as well as reduces the GC-managed heap.
Previously the union discriminator (InternalType) was
stored as a separate field in the Value, which takes up
whole 8 bytes due to padding needed for member alignment.
This effectively wasted 7 whole bytes of memory. Instead
of doing that InternalType is instead packed into pointer
alignment niches. As it turns out, there's more than enough
unused bits there for the bit packing to be effective.
See the doxygen comment in the ValueStorage specialization
for more details.
This does not add any performance overhead, even though
we now consistently assert the InternalType in all getters.
This can also be made atomic with a double width compare-and-swap
instruction on x86_64 (CMPXCHG16B instruction) for parallel evaluation.
This factors out most of the value representation into a mixin class.
`finishValue` is now gone for good and replaced with a simple template
function `setStorage` which derives the type information/disriminator from
the type of the argument. Likewise, reading of the value goes through function
template `getStorage`.
An empty type `Null` is introduced to make the bijection InternalType <-> C++ type
complete.
This prevents C++ level undefined behavior from affecting
the evaluator. Stdlib implementation details should not affect
eval, regardless of the build platform. Even erroneous usage
of `builtins.sort` should not make it possible to crash the
evaluator or produce results that depend on the host platform.
This overload isn't actually necessary anywhere and
doesn't make much sense. The pointers to `Value`s are
themselves const, but the `Value`s are mutable.
A non-const member function implies that the object itself
can be modified but this doesn't make much sense considering
the return type: `Value * const * `, which is a pointer
to a constant array of pointers to mutable values.
This adds a meson.format file that mostly mirrors the projects
meson style and a pre-commit hook to enforce this style.
Some low-diff files are formatted.
`getPrimOp` function was basically identical to existing
`Value::primOpAppPrimOp` modulo some trivial differences.
Makes sense to reuse existing code for that.
This makes the profiler much more useful by actually distiguishing
different derivations being evaluated. This does make the implementation
a bit more convoluted, but I think it's worth it.
Sometimes the profiler might want to do evaluation (e.g. for getting
derivation names). This is not ideal, but is really necessary
to make the profiler stack traces useful for end users.
This patch adds support for a native stack sampling
profiler to the evaluator, which saves a collapsed stack
profile information to a configurable location.
Introduced options (in `EvalSettings`):
- `eval-profile-file` - path to the collected profile file.
- `eval-profiler-frequency` - sampling frequency.
- `eval-profiler` - enumeration option for enabling the profiler.
Currently only `flamegraph` is supported, but having this an
enumeration rather than a boolean switch leaves the door open
for other profiler variants (e.g. tracy).
Profile includes the following information on best-effort basis (e.g. some lambdas might
have an undefined name). Callstack information contains:
- Call site location (where the function gets called).
- Primop/lambda name of the function being called.
- Functors/partial applications don't have a name attached to them unlike special-cased primops and lambads.
For cases where callsite location isn't available we have to resort to providing
the location where the lambda itself is defined. This removes some of the confusing
`«none»:0` locations in the profile from previous attempts.
Example usage with piping directly into zstd for compression:
```
nix eval --no-eval-cache nixpkgs#nixosTests.gnome \
--eval-profiler flamegraph \
--eval-profile-file >(zstd -of nix.profile.zstd)
```
Co-authored-by: Jörg Thalheim <joerg@thalheim.io>
This wires up the {pre,post}FunctionCallHook machinery
in EvalState::callFunction and migrates FunctionCallTrace
to use the new EvalProfiler mechanisms for tracing.
Note that branches when the hook gets called are marked with [[unlikely]]
as a hint to the compiler that this is not a hot path. For non-tracing
evaluation this should be a 100% predictable branch, so the performance
cost is nonexistent.
Some measurements to prove support this point:
```
nix build .#nix-cli
nix build github:nixos/nix/d692729759e4e370361cc5105fbeb0e33137ca9e#nix-cli --out-link before
```
(Before)
```
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g before/bin/nix eval nixpkgs#gnome --no-eval-cache" --warmup 4
Benchmark 1: GC_INITIAL_HEAP_SIZE=16g before/bin/nix eval nixpkgs#gnome --no-eval-cache
Time (mean ± σ): 2.517 s ± 0.032 s [User: 1.464 s, System: 0.476 s]
Range (min … max): 2.464 s … 2.557 s 10 runs
```
(After)
```
$ taskset -c 2,3 hyperfine "GC_INITIAL_HEAP_SIZE=16g result/bin/nix eval nixpkgs#gnome --no-eval-cache" --warmup 4
Benchmark 1: GC_INITIAL_HEAP_SIZE=16g result/bin/nix eval nixpkgs#gnome --no-eval-cache
Time (mean ± σ): 2.499 s ± 0.022 s [User: 1.448 s, System: 0.478 s]
Range (min … max): 2.472 s … 2.537 s 10 runs
```
This patch adds an EvalProfiler and MultiEvalProfiler that can be used
to insert hooks into the evaluation for the purposes of function tracing
(what function-trace currently does) or for flamegraph/tracy profilers.
See the following commits for how this is supposed to be integrated into
the evaluator and performance considerations.
The existing header is a bit too big. Now the following use-cases are
separated, and get their own headers:
- Using or implementing an arbitrary store: remaining `store-api.hh`
This is closer to just being about the `Store` (and `StoreConfig`)
classes, as one would expect.
- Opening a store from a textual description: `store-open.hh`
Opening an aribtrary store implementation like this requires some sort
of store registration mechanism to exists, but the caller doesn't need
to know how it works. This just exposes the functions which use such a
mechanism, without exposing the mechanism itself
- Registering a store implementation: `store-registration.hh`
This requires understanding how the mechanism actually works, and the
mechanism in question involves templated machinery in headers we
rather not expose to things that don't need it, as it would slow down
compilation for no reason.