Instead of parsing a structured attrs at some later point, we parsed it
right away when parsing the A-Term format, and likewise serialize it to
`__json = <JSON dump>` when serializing a derivation to A-Term.
The JSON format can directly contain the JSON structured attrs without
so encoding it, so we just do that.
* It is tough to contribute to a project that doesn't use a formatter,
* It is extra hard to contribute to a project which has configured the formatter, but ignores it for some files
* Code formatting makes it harder to hide obscure / weird bugs by accident or on purpose,
Let's rip the bandaid off?
Note that PRs currently in flight should be able to be merged relatively easily by applying `clang-format` to their tip prior to merge.
Makes the behavoral change of #13263 without the underlying refactor.
Hopefully this clearly safe from a perf and GC perspective, and will
make it easier to benchmark #13263.
Now, both the unit and functional tests relating to derivation options
are tested both ways -- with input addressing and content-addressing
derivations.
For example, instead of doing
#include "nix/store-config.hh"
#include "nix/derived-path.hh"
Now do
#include "nix/store/config.hh"
#include "nix/store/derived-path.hh"
This was originally planned in the issue, and also recent requested by
Eelco.
Most of the change is purely mechanical. There is just one small
additional issue. See how, in the example above, we took this
opportunity to also turn `<comp>-config.hh` into `<comp>/config.hh`.
Well, there was already a `nix/util/config.{cc,hh}`. Even though there
is not a public configuration header for libutil (which also would be
called `nix/util/config.{cc,hh}`) that's still confusing, To avoid any
such confusion, we renamed that to `nix/util/configuration.{cc,hh}`.
Finally, note that the libflake headers already did this, so we didn't
need to do anything to them. We wouldn't want to mistakenly get
`nix/flake/flake/flake.hh`!
Progress on #7876
The short answer for why we need to do this is so we can consistently do
`#include "nix/..."`. Without this change, there are ways to still make
that work, but they are hacky, and they have downsides such as making it
harder to make sure headers from the wrong Nix library (e..g.
`libnixexpr` headers in `libnixutil`) aren't being used.
The C API alraedy used `nix_api_*`, so its headers are *not* put in
subdirectories accordingly.
Progress on #7876
We resisted doing this for a while because it would be annoying to not
have the header source file pairs close by / easy to change file
path/name from one to the other. But I am ameliorating that with
symlinks in the next commit.
I refactored the way that input resolution works in `DerivationGoal`. To
be honest, it is probably unclear to the reader whether this new way is
better or worse. I suppose *intrinsic* motivation, I can say that
- the more structured use of `inputGoal` (a local variable) is better
than the shotgrun approach with `inputDrvOutputs`
- A virtual `waiteeDone` was a hack, and now it's gone.
However, the *real* motivation of this is not the above things, but that
it is needed for my mammoth refactor fixing #11897 and #11928.
It is nice that this step could come first, rather than making that
refactor even bigger.
I do not believe there is any problem with computing
`hashDerivationModulo` the normal way with impure derivations.
Conversely, the way this used to work is very suspicious because two
almost-equal derivations that only differ in depending on different
impure derivations could have the same drv hash modulo. That is very
suspicious because there is no reason to think those two different
impure derivations will end up producing the same content-addressed
data!
Co-authored-by: Alain Zscheile <zseri.devel@ytrizja.de>
"content-address*ed*" derivation is misleading because all derivations
are *themselves* content-addressed. What may or may not be
content-addressed is not derivation itself, but the *output* of the
derivation.
The outputs are not *part* of the derivation (for then the derivation
wouldn't be complete before we built it) but rather separate entities
produced by the derivation.
"content-adddress*ed*" is not correctly because it can only describe
what the derivation *is*, and that is not what we are trying to do.
"content-address*ing*" is correct because it describes what the
derivation *does* --- it produces content-addressed data.
This is the first part of rewriteDerivation() factored out into its
own method. It's not used anywhere else at the moment, but it's useful
on lazy-trees for rewriting virtual paths.
This caused nlohmann/json.hpp to leak into a lot of compilation units,
which is slow (when not using precompiled headers).
Cuts build time from 46m24s to 42m5s (real time with -j24: 2m42s to
2m24s).
... at call sites that are may be in the hot path.
I do not know how clever the compiler gets at these sites.
My primary concern is to not regress performance and I am confident
that this achieves it the easy way.
The old `std::variant` is bad because we aren't adding a new case to
`FileIngestionMethod` so much as we are defining a separate concept ---
store object content addressing rather than file system object content
addressing. As such, it is more correct to just create a fresh
enumeration.
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
The JSON format no longer uses the legacy ATerm `r:` prefixing nonsese,
but separate fields.
Progress on #9866
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
This introduces new utility functions to get elements from JSON — in an ergonomic way and with nice error messages if the expected type does not match.
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
Instead, serialize as NAR and send that over, then rehash sever side.
This is alorithmically simpler, but comes at the cost of a newer
parameter to `Store::addToStoreFromDump`.
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
"hash type" -> "hash algorithm" in all comments, documentation, and
messages.
ht -> ha, [Hh]ashType -> [HhashAlgo] for all local variables and
function arguments. No API change is made.
Continuation of 5334c9c792 and 837b889c41.
No outward facing behavior is changed.
Older methods with same names that operate on on method + algo pair (for
old-style `<method>:algo`) are renamed to `*WithAlgo`.)
The functions are unit-tested in the same way the names for the hash
algorithms are tested.
To quote the method doc:
Non-impure derivations can still behave impurely, to the degree permitted
by the sandbox. Hence why this method isn't `isPure`: impure derivations
are not the negation of pure derivations. Purity can not be ascertained
except by rather heavy tools.
many paths need not be heap-allocated, and derivation env name/valye
pairs can be moved into the map.
before:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 6.883 s ± 0.016 s [User: 5.250 s, System: 1.424 s]
Range (min … max): 6.860 s … 6.905 s 10 runs
after:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 6.868 s ± 0.027 s [User: 5.194 s, System: 1.466 s]
Range (min … max): 6.828 s … 6.913 s 10 runs
the table is very small compared to cache sizes and a single indexed
load is faster than three comparisons.
before:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 6.907 s ± 0.012 s [User: 5.272 s, System: 1.429 s]
Range (min … max): 6.893 s … 6.926 s 10 runs
after:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 6.883 s ± 0.016 s [User: 5.250 s, System: 1.424 s]
Range (min … max): 6.860 s … 6.905 s 10 runs
a bunch of derivation strings contain no escape sequences. we can
optimize for this fact by first scanning for the end of a derivation
string and simply returning the contents unmodified if no escape
sequences were found. to make this even more efficient we can also use
BackedStringViews to avoid copies, avoiding heap allocations for
transient data.
before:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 6.952 s ± 0.015 s [User: 5.294 s, System: 1.452 s]
Range (min … max): 6.926 s … 6.974 s 10 runs
after:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 6.907 s ± 0.012 s [User: 5.272 s, System: 1.429 s]
Range (min … max): 6.893 s … 6.926 s 10 runs
istream sentry objects are very expensive for single-character
operations, and since we don't configure exception masks for the
istreams used here they don't even do anything. all we need is
end-of-string checks and an advancing position in an immutable memory
buffer, both of which can be had for much cheaper than istreams allow.
the effect of this change is most apparent on empty stores.
before:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 7.167 s ± 0.013 s [User: 5.528 s, System: 1.431 s]
Range (min … max): 7.147 s … 7.182 s 10 runs
after:
Benchmark 1: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
Time (mean ± σ): 6.963 s ± 0.011 s [User: 5.330 s, System: 1.421 s]
Range (min … max): 6.943 s … 6.974 s 10 runs
This is needed for building CA deriations with a src store / dest store
split. In particular it is needed for Hydra.
https://github.com/NixOS/hydra/issues/838 currently puts realizations,
and thus build outputs, in the local store, but it should not.
This introduces some shared infrastructure for our notion of protocols.
We can then define multiple protocols in terms of that notion.
We an also express how particular protocols depend on each other.
For example, we can define a common protocol and a worker protocol,
where the second depends on the first in terms of the data types it can
read and write.
The "serve" protocol can just use the common one for now, but will
eventually need its own machinary just like the worker protocol for
version-aware serialisers
We use the same nested map representation we used for goals, again in
order to save space. We might someday want to combine with `inputDrvs`,
by doing `V = bool` instead of `V = std::set<OutputName>`, but we are
not doing that yet for sake of a smaller diff.
The ATerm format for Derivations also needs to be extended, in addition
to the in-memory format. To accomodate this, we added a new basic
versioning scheme, so old versions of Nix will get nice errors. (And
going forward, if the ATerm format changes again the errors will be even
better.)
`parsedStrings`, an internal function used as part of parsing
derivations in A-Term format, used to consume the final `]` but expect
the initial `[` to already be consumed. This made for what looked like
unbalanced brackets at callsites, which was confusing. Now it consumes
both which is hopefully less confusing.
As part of testing, we also created a unit test for the A-Term format for
regular non-experimental derivations too.
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io>
Apply suggestions from code review
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
- Don't assert: Derivation ATerms are not necessarily produced by Nix,
and parsers should always throw graceful errors
- Improve error message from `static void except(..)`, shows both what
we expected and what we actually got.
The intention is that we backport it, and then hopefully a few people
might get slightly better errors if they try out new experimental drv
files (for RFC 92) with an old version of Nix.
Types converted:
- `NixStringContextElem`
- `OutputsSpec`
- `ExtendedOutputsSpec`
- `DerivationOutput`
- `DerivationType`
Existing ones mostly conforming the pattern cleaned up:
- `ContentAddressMethod`
- `ContentAddressWithReferences`
The `DerivationGoal::derivationType` field had a bogus initialization,
now caught, so I made it `std::optional`. I think #8829 can make it
non-optional again because it will ensure we always have the derivation
when we construct a `DerivationGoal`.
See that issue (#7479) for details on the general goal.
`git grep 'Raw::Raw'` indicates the two types I didn't yet convert
`DerivedPath` and `BuiltPath` (and their `Single` variants) . This is
because @roberth and I (can't find issue right now...) plan on reworking
them somewhat, so I didn't want to churn them more just yet.
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
When loading a derivation from a JSON, malformed input would trigger
cryptic "assertion failed" errors. Simply replacing calls to `operator []`
with calls to `.at()` was not enough, as this would cause json.execptions
to be printed verbatim.
Display nice error messages instead and give some indication where the
error happened.
*Before:*
```
$ echo 4 | nix derivation add
error: [json.exception.type_error.305] cannot use operator[] with a string argument with number
$ nix derivation show nixpkgs#hello | nix derivation add
Assertion failed: (it != m_value.object->end()), function operator[], file /nix/store/8h9pxgq1776ns6qi5arx08ifgnhmgl22-nlohmann_json-3.11.2/include/nlohmann/json.hpp, line 2135.
$ nix derivation show nixpkgs#hello | jq '.[] | .name = 5' | nix derivation add
error: [json.exception.type_error.302] type must be string, but is object
$ nix derivation show nixpkgs#hello | jq '.[] | .outputs = { out: "/nix/store/8j3f8j-hello" }' | nix derivation add
error: [json.exception.type_error.302] type must be object, but is string
```
*After:*
```
$ echo 4 | nix derivation add
error: Expected JSON of derivation to be of type 'object', but it is of type 'number'
$ nix derivation show nixpkgs#hello | nix derivation add
error: Expected JSON object to contain key 'name' but it doesn't
$ nix derivation show nixpkgs#hello | jq '.[] | .name = 5' | nix derivation add
error: Expected JSON value to be of type 'string' but it is of type 'number'
$ nix derivation show nixpkgs#hello | jq '.[] | .outputs = { out: "/nix/store/8j3f8j-hello" }' | nix derivation add
error:
… while reading key 'outputs'
error: Expected JSON value to be of type 'object' but it is of type 'string'
```
Whereas `ContentAddressWithReferences` is a sum type complex because different
varieties support different notions of reference, and
`ContentAddressMethod` is a nested enum to support that,
`ContentAddress` can be a simple pair of a method and hash.
`ContentAddress` does not need to be a sum type on the outside because
the choice of method doesn't effect what type of hashes we can use.
Co-Authored-By: Cale Gibbard <cgibbard@gmail.com>
Pass this around instead of `Source &` and `Sink &` directly. This will
give us something to put the protocol version on once the time comes.
To do this ergonomically, we need to expose `RemoteStore::Connection`,
so do that too. Give it some more API docs while we are at it.
See API docs on that struct for why. The pasing as as template argument
doesn't yet happen in that commit, but will instead happen in later
commit.
Also make `WorkerOp` (now `Op`) and enum struct. This led us to catch
that two operations were not handled!
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
This is generally a fine practice: Putting implementations in headers
makes them harder to read and slows compilation. Unfortunately it is
necessary for templates, but we can ameliorate that by putting them in a
separate header. Only files which need to instantiate those templates
will need to include the header with the implementation; the rest can
just include the declaration.
This is now documenting in the contributing guide.
Also, it just happens that these polymorphic serializers are the
protocol agnostic ones. (Worker and serve protocol have the same logic
for these container types.) This means by doing this general template
cleanup, we are also getting a head start on better indicating which
code is protocol-specific and which code is shared between protocols.