mirror of
https://github.com/NixOS/nix.git
synced 2025-11-08 19:46:02 +01:00
Document "hash derivation quotiented", resolution, and build trace
Progress on #13405, which asks for an explicit characterisation of the equivalence relation like the one given here. Also progress on #11895, because we're using the term "build trace entry" instead of "realisation". Mention #9259, a future work item. Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
This commit is contained in:
parent
34ac1792f9
commit
72d0f7b619
13 changed files with 528 additions and 22 deletions
|
|
@ -7,6 +7,7 @@ additional-css = ["custom.css"]
|
|||
additional-js = ["redirects.js"]
|
||||
edit-url-template = "https://github.com/NixOS/nix/tree/master/doc/manual/{path}"
|
||||
git-repository-url = "https://github.com/NixOS/nix"
|
||||
mathjax-support = true
|
||||
|
||||
# Handles replacing @docroot@ with a path to ./source relative to that markdown file,
|
||||
# {{#include handlebars}}, and the @generated@ syntax used within these. it mostly
|
||||
|
|
|
|||
|
|
@ -92,6 +92,8 @@ manual = custom_target(
|
|||
(cd @2@; RUST_LOG=warn @1@ build -d @2@ 3>&2 2>&1 1>&3) | { grep -Fv "because fragment resolution isn't implemented" || :; } 3>&2 2>&1 1>&3
|
||||
rm -rf @2@/manual
|
||||
mv @2@/html @2@/manual
|
||||
# Remove Mathjax 2.7, because we will actually use MathJax 3.x
|
||||
find @2@/manual | grep .html | xargs sed -i -e '/2.7.1.MathJax.js/d'
|
||||
find @2@/manual -iname meson.build -delete
|
||||
'''.format(
|
||||
python.full_path(),
|
||||
|
|
|
|||
|
|
@ -26,9 +26,12 @@
|
|||
- [Derivation Outputs and Types of Derivations](store/derivation/outputs/index.md)
|
||||
- [Content-addressing derivation outputs](store/derivation/outputs/content-address.md)
|
||||
- [Input-addressing derivation outputs](store/derivation/outputs/input-address.md)
|
||||
- [Build Trace](store/build-trace.md)
|
||||
- [Derivation Resolution](store/resolution.md)
|
||||
- [Building](store/building.md)
|
||||
- [Store Types](store/types/index.md)
|
||||
{{#include ./store/types/SUMMARY.md}}
|
||||
- [Appendix: Math notation](store/math-notation.md)
|
||||
- [Nix Language](language/index.md)
|
||||
- [Data Types](language/types.md)
|
||||
- [String context](language/string-context.md)
|
||||
|
|
|
|||
|
|
@ -1,6 +1,8 @@
|
|||
# Derivation "ATerm" file format
|
||||
|
||||
For historical reasons, [store derivations][store derivation] are stored on-disk in [ATerm](https://homepages.cwi.nl/~daybuild/daily-books/technology/aterm-guide/aterm-guide.html) format.
|
||||
For historical reasons, [store derivations][store derivation] are stored on-disk in "Annotated Term" (ATerm) format
|
||||
([guide](https://homepages.cwi.nl/~daybuild/daily-books/technology/aterm-guide/aterm-guide.html),
|
||||
[paper](https://doi.org/10.1002/(SICI)1097-024X(200003)30:3%3C259::AID-SPE298%3E3.0.CO;2-Y)).
|
||||
|
||||
## The ATerm format used
|
||||
|
||||
|
|
|
|||
|
|
@ -39,9 +39,9 @@ properties:
|
|||
This is a guard that allows us to continue evolving this format.
|
||||
The choice of `3` is fairly arbitrary, but corresponds to this informal version:
|
||||
|
||||
- Version 0: A-Term format
|
||||
- Version 0: ATerm format
|
||||
|
||||
- Version 1: Original JSON format, with ugly `"r:sha256"` inherited from A-Term format.
|
||||
- Version 1: Original JSON format, with ugly `"r:sha256"` inherited from ATerm format.
|
||||
|
||||
- Version 2: Separate `method` and `hashAlgo` fields in output specs
|
||||
|
||||
|
|
|
|||
53
doc/manual/source/store/build-trace.md
Normal file
53
doc/manual/source/store/build-trace.md
Normal file
|
|
@ -0,0 +1,53 @@
|
|||
# Build Trace
|
||||
|
||||
> **Warning**
|
||||
>
|
||||
> This entire concept is currently
|
||||
> [**experimental**](@docroot@/development/experimental-features.md#xp-feature-ca-derivations)
|
||||
> and subject to change.
|
||||
|
||||
The *build trace* is a [memoization table](https://en.wikipedia.org/wiki/Memoization) for builds.
|
||||
It maps the inputs of builds to the outputs of builds.
|
||||
Concretely, that means it maps [derivations][derivation] to maps of [output] names to [store objects][store object].
|
||||
|
||||
In general the derivations used as a key should be [*resolved*](./resolution.md).
|
||||
A build trace with all-resolved-derivation keys is also called a *base build trace* for extra clarity.
|
||||
If all the resolved inputs of a derivation are content-addressed, that means the inputs will be fully determined, leaving no ambiguity for what build was performed.
|
||||
(Input-addressed inputs however are still ambiguous. They too should be locked down, but this is left as future work.)
|
||||
|
||||
Accordingly, to look up an unresolved derivation, one must first resolve it to get a resolved derivation.
|
||||
Resolving itself involves looking up entries in the build trace, so this is a mutually recursive process that will end up inspecting possibly many entries.
|
||||
|
||||
Except for the issue with input-addressed paths called out above, base build traces are trivially *coherent* -- incoherence is not possible.
|
||||
That means that the claims that each key-value base build try entry makes are independent, and no mapping invalidates another mapping.
|
||||
|
||||
Whether the mappings are *true*, i.e. the faithful recording of actual builds performed, is another matter.
|
||||
Coherence is about the multiple claims of the build trace being mutually consistent, not about whether the claims are individually true or false.
|
||||
|
||||
In general, there is no way to audit a build trace entry except for by performing the build again from scratch.
|
||||
And even in that case, a different result doesn't mean the original entry was a "lie", because the derivation being built may be non-deterministic.
|
||||
As such, the decision of whether to trust a counterparty's build trace is a fundamentally subject policy choice.
|
||||
Build trace entries are typically *signed* in order to enable arbitrary public-key-based trust polices.
|
||||
|
||||
## Derived build traces
|
||||
|
||||
Implementations that wish to memoize the above may also keep additional *derived* build trace entries that do map unresolved derivations.
|
||||
But if they do so, they *must* also keep the underlying base entries with resolved derivation keys around.
|
||||
Firstly, this ensures that the derived entries are merely cache, which could be recomputed from scratch.
|
||||
Secondly, this ensures the coherence of the derived build trace.
|
||||
|
||||
Unlike with base build traces, incoherence with derived build traces is possible.
|
||||
The key ingredient is that derivation resolution is only deterministic with respect to a fixed base build trace.
|
||||
Without fixing the base build trace, it inherits the subjectivity of base build traces themselves.
|
||||
|
||||
Concretely, suppose there are three derivations \\(a\\), \\(b\\), and \((c\\).
|
||||
Let \\(a\\) be a resolved derivation, but let \\(b\\) and \((c\\) be unresolved and both take as an input an output of \\(a\\).
|
||||
Now suppose that derived entries are made for \\(b\\) and \((c\\) based on two different entries of \\(a\\).
|
||||
(This could happen if \\(a\\) is non-deterministic, \\(a\\) and \\(b\\) are built in one store, \\(a\\) and \\(c\\) are built in another store, and then a third store substitutes from both of the first two stores.)
|
||||
|
||||
If trusting the derived build trace entries for \\(b\\) and \((c\\) requires that each's underlying entry for \\(a\\) be also trusted, the two different mappings for \\(a\\) will be caught.
|
||||
However, if \\(b\\) and \((c\\)'s entries can be combined in isolation, there will be nothing to catch the contradiction in their hidden assumptions about \\(a\\)'s output.
|
||||
|
||||
[derivation]: ./derivation/index.md
|
||||
[output]: ./derivation/outputs/index.md
|
||||
[store object]: @docroot@/store/store-object.md
|
||||
|
|
@ -245,7 +245,7 @@ If those other derivations *also* abide by this common case (and likewise for tr
|
|||
> note the ".drv"
|
||||
> ```
|
||||
|
||||
## Extending the model to be higher-order
|
||||
## Extending the model to be higher-order {#dynamic}
|
||||
|
||||
**Experimental feature**: [`dynamic-derivations`](@docroot@/development/experimental-features.md#xp-feature-dynamic-derivations)
|
||||
|
||||
|
|
|
|||
|
|
@ -167,10 +167,10 @@ It is only in the potential for that check to fail that they are different.
|
|||
>
|
||||
> In a future world where floating content-addressing is also stable, we in principle no longer need separate [fixed](#fixed) content-addressing.
|
||||
> Instead, we could always use floating content-addressing, and separately assert the precise value content address of a given store object to be used as an input (of another derivation).
|
||||
> A stand-alone assertion object of this sort is not yet implemented, but its possible creation is tracked in [Issue #11955](https://github.com/NixOS/nix/issues/11955).
|
||||
> A stand-alone assertion object of this sort is not yet implemented, but its possible creation is tracked in [issue #11955](https://github.com/NixOS/nix/issues/11955).
|
||||
>
|
||||
> In the current version of Nix, fixed outputs which fail their hash check are still registered as valid store objects, just not registered as outputs of the derivation which produced them.
|
||||
> This is an optimization that means if the wrong output hash is specified in a derivation, and then the derivation is recreated with the right output hash, derivation does not need to be rebuilt --- avoiding downloading potentially large amounts of data twice.
|
||||
> This is an optimization that means if the wrong output hash is specified in a derivation, and then the derivation is recreated with the right output hash, derivation does not need to be rebuilt — avoiding downloading potentially large amounts of data twice.
|
||||
> This optimisation prefigures the design above:
|
||||
> If the output hash assertion was removed outside the derivation itself, Nix could additionally not only register that outputted store object like today, but could also make note that derivation did in fact successfully download some data.
|
||||
For example, for the "fetch URL" example above, making such a note is tantamount to recording what data is available at the time of download at the given URL.
|
||||
|
|
|
|||
|
|
@ -6,26 +6,221 @@
|
|||
That is to say, an input-addressed output's store path is a function not of the output itself, but of the derivation that produced it.
|
||||
Even if two store paths have the same contents, if they are produced in different ways, and one is input-addressed, then they will have different store paths, and thus guaranteed to not be the same store object.
|
||||
|
||||
<!---
|
||||
## Modulo content addressed derivation outputs {#hash-quotient-drv}
|
||||
|
||||
### Modulo fixed-output derivations
|
||||
A naive implementation of an output hash computation for input-addressed outputs would be to hash the derivation hash and output together.
|
||||
This clearly has the uniqueness properties we want for input-addressed outputs, but suffers from an inefficiency.
|
||||
Specifically, new builds would be required whenever a change is made to a fixed-output derivation, despite having provably no differences in the inputs to the new derivation compared to what it used to be.
|
||||
Concretely, this would cause a "mass rebuild" whenever any fetching detail changes, including mirror lists, certificate authority certificates, etc.
|
||||
|
||||
**TODO hash derivation modulo.**
|
||||
To solve this problem, we compute output hashes differently, so that certain output hashes become identical.
|
||||
We call this concept quotient hashing, in reference to quotient types or sets.
|
||||
|
||||
So how do we compute the hash part of the output path of a derivation?
|
||||
This is done by the function `hashDrv`, shown in Figure 5.10.
|
||||
It distinguishes between two cases.
|
||||
If the derivation is a fixed-output derivation, then it computes a hash over just the `outputHash` attributes.
|
||||
So how do we compute the hash part of the output paths of an input-addressed derivation?
|
||||
This is done by the function `hashQuotientDerivation`, shown below.
|
||||
|
||||
If the derivation is not a fixed-output derivation, we replace each element in the derivation’s inputDrvs with the result of a call to `hashDrv` for that element.
|
||||
(The derivation at each store path in `inputDrvs` is converted from its on-disk ATerm representation back to a `StoreDrv` by the function `parseDrv`.) In essence, `hashDrv` partitions store derivations into equivalence classes, and for hashing purpose it replaces each store path in a derivation graph with its equivalence class.
|
||||
First, a word on inputs.
|
||||
`hashQuotientDerivation` is only defined on derivations whose [inputs](@docroot@/store/derivation/index.md#inputs) take the first-order form:
|
||||
```typescript
|
||||
type ConstantPath = {
|
||||
path: StorePath;
|
||||
};
|
||||
|
||||
The recursion in Figure 5.10 is inefficient:
|
||||
it will call itself once for each path by which a subderivation can be reached, i.e., `O(V k)` times for a derivation graph with `V` derivations and with out-degree of at most `k`.
|
||||
In the actual implementation, memoisation is used to reduce this to `O(V + E)` complexity for a graph with E edges.
|
||||
type FirstOrderOutputPath = {
|
||||
drvPath: StorePath;
|
||||
output: OutputName;
|
||||
};
|
||||
|
||||
-->
|
||||
type FirstOrderDerivingPath = ConstantPath | FirstOrderOutputPath;
|
||||
|
||||
type Inputs = Set<FirstOrderDerivingPath>;
|
||||
```
|
||||
|
||||
For the algorithm below, we adopt a derivation where the two types of (first order) derived paths are partitioned into two sets, as follows:
|
||||
```typescript
|
||||
type Derivation = {
|
||||
// inputs: Set<FirstOrderDerivingPath>; // replaced
|
||||
inputSrcs: Set<ConstantPath>; // new instead
|
||||
inputDrvOutputs: Set<FirstOrderOutputPath>; // new instead
|
||||
// ...other fields...
|
||||
};
|
||||
```
|
||||
|
||||
In the [currently-experimental][xp-feature-dynamic-derivations] higher-order case where outputs of outputs are allowed as [deriving paths][deriving-path] and thus derivation inputs, derivations using that generalization are not valid arguments to this function.
|
||||
Those derivations must be (partially) [resolved](@docroot@/store/resolution.md) enough first, to the point where no such higher-order inputs remain.
|
||||
Then, and only then, can input addresses be assigned.
|
||||
|
||||
```
|
||||
function hashQuotientDerivation(drv) -> Hash:
|
||||
assert(drv.outputs are input-addressed)
|
||||
drv′ ← drv with {
|
||||
inputDrvOutputs = ⋃(
|
||||
assert(drvPath is store path)
|
||||
case hashOutputsOrQuotientDerivation(readDrv(drvPath)) of
|
||||
drvHash : Hash →
|
||||
(drvHash.toBase16(), output)
|
||||
outputHashes : Map[String, Hash] →
|
||||
(outputHashes[output].toBase16(), "out")
|
||||
| (drvPath, output) ∈ drv.inputDrvOutputs
|
||||
)
|
||||
}
|
||||
return hashSHA256(printDrv(drv′))
|
||||
|
||||
function hashOutputsOrQuotientDerivation(drv) -> Map[String, Hash] | Hash:
|
||||
if drv.outputs are content-addressed:
|
||||
return {
|
||||
outputName ↦ hashSHA256(
|
||||
"fixed:out:" + ca.printMethodAlgo() +
|
||||
":" + ca.hash.toBase16() +
|
||||
":" + ca.makeFixedOutputPath(drv.name, outputName))
|
||||
| (outputName ↦ output) ∈ drv.outputs
|
||||
, ca = output.contentAddress // or get from build trace if floating
|
||||
}
|
||||
else: // drv.outputs are input-addressed
|
||||
return hashQuotientDerivation(drv)
|
||||
```
|
||||
|
||||
### `hashQuotientDerivation`
|
||||
|
||||
We replace each element in the derivation's `inputDrvOutputs` using data from a call to `hashOutputsOrQuotientDerivation` on the `drvPath` of that element.
|
||||
When `hashOutputsOrQuotientDerivation` returns a single drv hash (because the input derivation in question is input-addressing), we simply swap out the `drvPath` for that hash, and keep the same output name.
|
||||
When `hashOutputsOrQuotientDerivation` returns a map of content addresses per-output, we look up the output in question, and pair it with the output name `out`.
|
||||
|
||||
The resulting pseudo-derivation (with hashes instead of store paths in `inputDrvs`) is then printed (in the ["ATerm" format](@docroot@/protocols/derivation-aterm.md)) and hashed, and this becomes the hash of the "quotient derivation".
|
||||
|
||||
When calculating output hashes, `hashQuotientDerivation` is called on an almost-complete input-addressing derivation, which is just missing its input-addressed outputs paths.
|
||||
The derivation hash is then used to calculate output paths for each output.
|
||||
<!-- TODO describe how this is done. -->
|
||||
Those output paths can then be substituted into the almost-complete input-addressed derivation to complete it.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There may be an unintentional deviation from specification currently implemented in the `(outputHashes[output].toBase16(), "out")` case.
|
||||
> This is not fatal because the deviation would only apply for content-addressing derivations with more than one output, and that only occurs in the floating case, which is [experimental][xp-feature-ca-derivations].
|
||||
> Once this bug is fixed, this note will be removed.
|
||||
|
||||
### `hashOutputsOrQuotientDerivation`
|
||||
|
||||
How does `hashOutputsOrQuotientDerivation` in turn work?
|
||||
It consists of two main cases, based on whether the outputs of the derivation are to be input-addressed or content-addressed.
|
||||
|
||||
#### Input-addressed outputs case
|
||||
|
||||
In the input-addressed case, it just calls `hashQuotientDerivation`, and returns that derivation hash.
|
||||
This makes `hashQuotientDerivation` and `hashOutputsOrQuotientDerivation` mutually-recursive.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> In this case, `hashQuotientDerivation` is being called on a *complete* input-addressing derivation that already has its output paths calculated.
|
||||
> The `inputDrvs` substitution takes place anyways.
|
||||
|
||||
#### Content-addressed outputs case
|
||||
|
||||
If the outputs are [content-addressed](./content-address.md), then it computes a hash for each output derived from the content-address of that output.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> In the [fixed](./content-address.md#fixed) content-addressing case, the outputs' content addresses are statically specified in advance, so this always just works.
|
||||
> (The fixed case is what the pseudo-code shows.)
|
||||
>
|
||||
> In the [floating](./content-address.md#floating) case, the content addresses are not specified in advance.
|
||||
> This is what the "or get from [build trace](@docroot@/store/build-trace.md) if floating" comment refers to.
|
||||
> In this case, the algorithm is *stuck* until the input in question is built, and we know what the actual contents of the output in question is.
|
||||
>
|
||||
> That is OK however, because there is no problem with delaying the assigning of input addresses (which, remember, is what `hashQuotientDerivation` is ultimately for) until all inputs are known.
|
||||
|
||||
### Performance
|
||||
|
||||
The recursion in the algorithm is potentially inefficient:
|
||||
it could call itself once for each path by which a subderivation can be reached, i.e., `O(V^k)` times for a derivation graph with `V` derivations and with out-degree of at most `k`.
|
||||
In the actual implementation, [memoisation](https://en.wikipedia.org/wiki/Memoization) is used to reduce this cost to be proportional to the total number of `inputDrvOutputs` encountered.
|
||||
|
||||
### Semantic properties
|
||||
|
||||
*See [this chapter's appendix](@docroot@/store/math-notation.md) on grammar and metavariable conventions.*
|
||||
|
||||
In essence, `hashQuotientDerivation` partitions input-addressing derivations into equivalence classes: every derivation in that equivalence class is mapped to the same derivation hash.
|
||||
We can characterize this equivalence relation directly, by working bottom up.
|
||||
|
||||
We start by defining an equivalence relation on first-order output deriving paths that refer content-addressed derivation outputs. Two such paths are equivalent if they refer to the same store object:
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$d\_1$ is content-addressing}
|
||||
\\AxiomC{$d\_2$ is content-addressing}
|
||||
\\AxiomC{$
|
||||
{}^\*(\text{path}(d\_1), o\_1)
|
||||
\=
|
||||
{}^\*(\text{path}(d\_2), o\_2)
|
||||
$}
|
||||
\\TrinaryInfC{$(\text{path}(d\_1), o\_1) \\,\\sim_{\\mathrm{CA}}\\, (d\_2, o\_2)$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
where \\({}^*(s, o)\\) denotes the store object that the output deriving path refers to.
|
||||
|
||||
We will also need the following construction to lift any equivalence relation on \\(X\\) to an equivalence relation on (finite) sets of \\(X\\) (in short, \\(\\mathcal{P}(X)\\)):
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$\\forall a \\in A. \\exists b \\in B. a \\,\\sim\_X\\, b$}
|
||||
\\AxiomC{$\\forall b \\in B. \\exists a \\in A. b \\,\\sim\_X\\, a$}
|
||||
\\BinaryInfC{$A \\,\\sim_{\\mathcal{P}(X)}\\, B$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
Now we can define the equivalence relation \\(\\sim_\\mathrm{IA}\\) on input-addressed derivation outputs. Two input-addressed outputs are equivalent if their derivations are equivalent (via the yet-to-be-defined \\(\\sim_{\\mathrm{IADrv}}\\) relation) and their output names are the same:
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$d\_1$ is input-addressing}
|
||||
\\AxiomC{$d\_2$ is input-addressing}
|
||||
\\AxiomC{$d\_1 \\,\\sim_{\\mathrm{IADrv}}\\, d\_2$}
|
||||
\\AxiomC{$o\_1 = o\_2$}
|
||||
\\QuaternaryInfC{$(\text{path}(d\_1), o\_1) \\,\\sim_{\\mathrm{IA}}\\, (\text{path}(d\_2), o\_2)$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
And now we can define \\(\\sim_{\\mathrm{IADrv}}\\).
|
||||
Two input-addressed derivations are equivalent if their content-addressed inputs are equivalent, their input-addressed inputs are also equivalent, and they are otherwise equal:
|
||||
|
||||
<!-- cheating a bit with the semantics to get a good layout that fits on the page -->
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\alwaysNoLine
|
||||
\\AxiomC{$
|
||||
\\mathrm{caInputs}(d\_1)
|
||||
\\,\\sim_{\\mathcal{P}(\\mathrm{CA})}\\,
|
||||
\\mathrm{caInputs}(d\_2)
|
||||
$}
|
||||
\\AxiomC{$
|
||||
\\mathrm{iaInputs}(d\_1)
|
||||
\\,\\sim_{\\mathcal{P}(\\mathrm{IA})}\\,
|
||||
\\mathrm{iaInputs}(d\_2)
|
||||
$}
|
||||
\\BinaryInfC{$
|
||||
d\_1\left[\\mathrm{inputDrvOutputs} := \\{\\}\right]
|
||||
\=
|
||||
d\_2\left[\\mathrm{inputDrvOutputs} := \\{\\}\right]
|
||||
$}
|
||||
\\alwaysSingleLine
|
||||
\\UnaryInfC{$d\_1 \\,\\sim_{\\mathrm{IADrv}}\\, d\_2$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
where \\(\\mathrm{caInputs}(d)\\) returns the content-addressed inputs of \\(d\\) and \\(\\mathrm{iaInputs}(d)\\) returns the input-addressed inputs.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> An astute reader might notice that that nowhere does `inputSrcs` enter into these definitions.
|
||||
> That means that replacing an input derivation with its outputs directly added to `inputSrcs` always results in a derivation in a different equivalence class, despite the resulting input closure (as would be mounted in the store at build time) being the same.
|
||||
> [Issue #9259](https://github.com/NixOS/nix/issues/9259) is about creating a coarser equivalence relation to address this.
|
||||
>
|
||||
> \\(\\sim_\mathrm{Drv}\\) from [derivation resolution](@docroot@/store/resolution.md) is such an equivalence relation.
|
||||
> It is coarser than this one: any two derivations which are "'hash quotient derivation'-equivalent" (\\(\\sim_\mathrm{IADrv}\\)) are also "resolution-equivalent" (\\(\\sim_\mathrm{Drv}\\)).
|
||||
> It also relates derivations whose `inputDrvOutputs` have been rewritten into `inputSrcs`.
|
||||
|
||||
[deriving-path]: @docroot@/store/derivation/index.md#deriving-path
|
||||
[xp-feature-dynamic-derivations]: @docroot@/development/experimental-features.md#xp-feature-dynamic-derivations
|
||||
[xp-feature-ca-derivations]: @docroot@/development/experimental-features.md#xp-feature-ca-derivations
|
||||
[xp-feature-git-hashing]: @docroot@/development/experimental-features.md#xp-feature-git-hashing
|
||||
[xp-feature-impure-derivations]: @docroot@/development/experimental-features.md#xp-feature-impure-derivations
|
||||
|
|
|
|||
16
doc/manual/source/store/math-notation.md
Normal file
16
doc/manual/source/store/math-notation.md
Normal file
|
|
@ -0,0 +1,16 @@
|
|||
# Appendix: Math notation
|
||||
|
||||
A few times in this manual, formal "proof trees" are used for [natural deduction](https://en.wikipedia.org/wiki/Natural_deduction)-style definition of various [relations](https://en.wikipedia.org/wiki/Relation_(mathematics)).
|
||||
|
||||
The following grammar and assignment of metavariables to syntactic categories is used in these sections.
|
||||
|
||||
\\begin{align}
|
||||
s, t &\in \text{store-path} \\\\
|
||||
o &\in \text{output-name} \\\\
|
||||
i, p &\in \text{deriving-path} \\\\
|
||||
d &\in \text{derivation}
|
||||
\\end{align}
|
||||
|
||||
\\begin{align}
|
||||
\text{deriving-path} \quad p &::= s \mid (p, o)
|
||||
\\end{align}
|
||||
219
doc/manual/source/store/resolution.md
Normal file
219
doc/manual/source/store/resolution.md
Normal file
|
|
@ -0,0 +1,219 @@
|
|||
# Derivation Resolution
|
||||
|
||||
*See [this chapter's appendix](@docroot@/store/math-notation.md) on grammar and metavariable conventions.*
|
||||
|
||||
To *resolve* a derivation is to replace its [inputs] with the simplest inputs — plain store paths — that denote the same store objects.
|
||||
|
||||
Derivations that only have store paths as inputs are likewise called *resolved derivations*.
|
||||
(They are called that whether they are in fact the output of derivation resolution, or just made that way without non-store-path inputs to begin with.)
|
||||
|
||||
## Input Content Equivalence of Derivations
|
||||
|
||||
[Deriving paths][deriving-path] intentionally make it possible to refer to the same [store object] in multiple ways.
|
||||
This is a consequence of content-addressing, since different derivations can produce the same outputs, and the same data can also be manually added to the store.
|
||||
This is also a consequence even of input-addressing, as an output can be referred to by derivation and output name, or directly by its [computed](./derivation/outputs/input-address.md) store path.
|
||||
Since dereferencing deriving paths is thus not injective, it induces an equivalence relation on deriving paths.
|
||||
|
||||
Let's call this equivalence relation \\(\\sim\\), where \\(p_1 \\sim p_2\\) means that deriving paths \\(p_1\\) and \\(p_2\\) refer to the same store object.
|
||||
|
||||
**Content Equivalence**: Two deriving paths are equivalent if they refer to the same store object:
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{${}^*p_1 = {}^*p_2$}
|
||||
\\UnaryInfC{$p_1 \\,\\sim_\\mathrm{DP}\\, p_2$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
where \\({}^\*p\\) denotes the store object that deriving path \\(p\\) refers to.
|
||||
|
||||
This also induces an equivalence relation on sets of deriving paths:
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$\\{ {}^*p | p \\in P_1 \\} = \\{ {}^*p | p \\in P_2 \\}$}
|
||||
\\UnaryInfC{$P_1 \\,\\sim_{\\mathcal{P}(\\mathrm{DP})}\\, P_2$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
**Input Content Equivalence**: This, in turn, induces an equivalence relation on derivations: two derivations are equivalent if their inputs are equivalent, and they are otherwise equal:
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$\\mathrm{inputs}(d_1) \\,\\sim_{\\mathcal{P}(\\mathrm{DP})}\\, \\mathrm{inputs}(d_2)$}
|
||||
\\AxiomC{$
|
||||
d\_1\left[\\mathrm{inputs} := \\{\\}\right]
|
||||
\=
|
||||
d\_2\left[\\mathrm{inputs} := \\{\\}\right]
|
||||
$}
|
||||
\\BinaryInfC{$d_1 \\,\\sim_\\mathrm{Drv}\\, d_2$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
Derivation resolution always maps derivations to input-content-equivalent derivations.
|
||||
|
||||
## Resolution relation
|
||||
|
||||
Dereferencing a derived path — \\({}^\*p\\) above — was just introduced as a black box.
|
||||
But actually it is a multi-step process of looking up build results in the [build trace] that itself depends on resolving the lookup keys.
|
||||
Resolution is thus a recursive multi-step process that is worth diagramming formally.
|
||||
|
||||
We can do this with a small-step binary transition relation; let's call it \\(\rightsquigarrow\\).
|
||||
We can then conclude dereferenced equality like this:
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$p\_1 \\rightsquigarrow^* p$}
|
||||
\\AxiomC{$p\_2 \\rightsquigarrow^* p$}
|
||||
\\BinaryInfC{${}^*p\_1 = {}^*p\_2$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
I.e. by showing that both original items resolve (over 0 or more small steps, hence the \\({}^*\\)) to the same exact item.
|
||||
|
||||
With this motivation, let's now formalize a [small-step](https://en.wikipedia.org/wiki/Operational_semantics#Small-step_semantics) system of reduction rules for resolution.
|
||||
|
||||
### Formal rules
|
||||
|
||||
### \\(\text{resolved}\\) unary relation
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$s \in \text{store-path}$}
|
||||
\\UnaryInfC{$s$ resolved}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$\forall i \in \mathrm{inputs}(d). i \text{ resolved}$}
|
||||
\\UnaryInfC{$d$ resolved}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
### \\(\rightsquigarrow\\) binary relation
|
||||
|
||||
> **Remark**
|
||||
>
|
||||
> Actually, to be completely formal we would need to keep track of the build trace we are choosing to resolve against.
|
||||
>
|
||||
> We could do that by making \\(\rightsquigarrow\\) a ternary relation, which would pass the build trace to itself until it finally uses it in that one rule.
|
||||
> This would add clutter more than insight, so we didn't bother to write it.
|
||||
>
|
||||
> There are other options too, like saying the whole reduction rule system is parameterized on the build trace, essentially [currying](https://en.wikipedia.org/wiki/Currying) the ternary \\(\rightsquigarrow\\) into a function from build traces to the binary relation written above.
|
||||
|
||||
#### Core build trace lookup rule
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$s \in \text{store-path}$}
|
||||
\\AxiomC{${}^*s \in \text{derivation}$}
|
||||
\\AxiomC{${}^*s$ resolved}
|
||||
\\AxiomC{$\mathrm{build\text{-}trace}[s][o] = t$}
|
||||
\\QuaternaryInfC{$(s, o) \rightsquigarrow t$}
|
||||
\\RightLabel{\\scriptsize output path resolution}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
#### Inductive rules
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$i \\rightsquigarrow i'$}
|
||||
\\AxiomC{$i \\in \\mathrm{inputs}(d)$}
|
||||
\\BinaryInfC{$d \\rightsquigarrow d[i \\mapsto i']$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$d \\rightsquigarrow d'$}
|
||||
\\UnaryInfC{$(\\mathrm{path}(d), o) \\rightsquigarrow (\\mathrm{path}(d'), o)$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
\\[
|
||||
\\begin{prooftree}
|
||||
\\AxiomC{$p \\rightsquigarrow p'$}
|
||||
\\UnaryInfC{$(p, o) \\rightsquigarrow (p', o)$}
|
||||
\\end{prooftree}
|
||||
\\]
|
||||
|
||||
### Properties
|
||||
|
||||
Like all well-behaved evaluation relations, partial resolution is [*confluent*](https://en.wikipedia.org/wiki/Confluence_(abstract_rewriting)).
|
||||
Also, if we take the symmetric closure of \\(\\rightsquigarrow^\*\\), we end up with the equivalence relations of the previous section.
|
||||
Resolution respects content equivalence for deriving paths, and input content equivalence for derivations.
|
||||
|
||||
> **Remark**
|
||||
>
|
||||
> We chose to define from scratch an "resolved" unary relation explicitly above.
|
||||
> But it can also be defined as the normal forms of the \\(\\rightsquigarrow^\*\\) relation:
|
||||
>
|
||||
> \\[ a \text{ resolved} \Leftrightarrow \forall b. b \rightsquigarrow^* a \Rightarrow b = a\\]
|
||||
>
|
||||
> In prose, resolved terms are terms which \\(\\rightsquigarrow^\*\\) only relates on the left side to the same term on the right side; they are the terms which can be resolved no further.
|
||||
|
||||
## Partial versus Complete Resolution
|
||||
|
||||
Similar to evaluation, we can also speak of *partial* versus *complete* derivation resolution.
|
||||
Partial derivation resolution is what we've actually formalized above with \\(\\rightsquigarrow^\*\\).
|
||||
Complete resolution is resolution ending in a resolved term (deriving path or derivation).
|
||||
(Which is a normal form of the relation, per the remark above.)
|
||||
|
||||
With partial resolution, a derivation is related to equivalent derivations with the same or simpler inputs, but not all those inputs will be plain store paths.
|
||||
This is useful when the input refers to a floating content addressed output we have not yet built — we don't know what (content-address) store path will used for that derivation, so we are "stuck" trying to resolve the deriving path in question.
|
||||
(In the above formalization, this happens when the build trace is missing the keys we wish to look up in it.)
|
||||
|
||||
Complete resolution is a *functional* relation, i.e. values on the left are uniquely related with values on the right.
|
||||
It is not however, a *total* relation (in general, assuming arbitrary build traces).
|
||||
This is discussed in the next section.
|
||||
|
||||
## Termination
|
||||
|
||||
For static derivations graphs, complete resolution is indeed total, because it always terminates for all inputs.
|
||||
(A relation that is both total and functional is a function.)
|
||||
|
||||
For [dynamic][xp-feature-dynamic-derivations] derivation graphs, however, this is not the case — resolution is not guaranteed to terminate.
|
||||
The issue isn't rewriting deriving paths themselves:
|
||||
a single rewrite to normalize an output deriving path to a constant one always exists, and always proceeds in one step.
|
||||
The issue is that dynamic derivations (i.e. those that are filled-in the graph by a previous resolution) may have more transitive dependencies than the original derivation.
|
||||
|
||||
> **Example**
|
||||
>
|
||||
> Suppose we have this deriving path
|
||||
> ```json
|
||||
> {
|
||||
> "drvPath": {
|
||||
> "drvPath": "...-foo.drv",
|
||||
> "output": "bar.drv"
|
||||
> },
|
||||
> "output": "baz"
|
||||
> }
|
||||
> ```
|
||||
> and derivation `foo` is already resolved.
|
||||
> When we resolve deriving path we'll end up with something like.
|
||||
> ```json
|
||||
> {
|
||||
> "drvPath": "...-foo-bar.drv",
|
||||
> "output": "baz"
|
||||
> }
|
||||
> ```
|
||||
> So far is just an atomic single rewrite, with no termination issues.
|
||||
> But the derivation `foo-bar` may have its *own* dynamic derivation inputs.
|
||||
> Resolution must resolve that derivation first before the above deriving path can finally be normalized to a plain `...-foo-bar-baz` store path.
|
||||
|
||||
The important thing to notice is that while "build trace" *keys* must be resolved.
|
||||
The *value* those keys are mapped to have no such constraints.
|
||||
An arbitrary store object has no notion of being resolved or not.
|
||||
But, an arbitrary store object can be read back as a derivation (as will in fact be done in case for dynamic derivations / nested output deriving paths).
|
||||
And those derivations need *not* be resolved.
|
||||
|
||||
It is those dynamic non-resolved derivations which are the source of non-termination.
|
||||
By the same token, they are also the reason why dynamic derivations offer greater expressive power.
|
||||
|
||||
[store object]: @docroot@/store/store-object.md
|
||||
[inputs]: @docroot@/store/derivation/index.md#inputs
|
||||
[build trace]: @docroot@/store/build-trace.md
|
||||
[deriving-path]: @docroot@/store/derivation/index.md#deriving-path
|
||||
[xp-feature-dynamic-derivations]: @docroot@/development/experimental-features.md#xp-feature-dynamic-derivations
|
||||
15
doc/manual/theme/head.hbs
Normal file
15
doc/manual/theme/head.hbs
Normal file
|
|
@ -0,0 +1,15 @@
|
|||
<script>
|
||||
MathJax = {
|
||||
loader: {load: ['[tex]/bussproofs']},
|
||||
tex: {
|
||||
packages: {'[+]': ['bussproofs']},
|
||||
// Doesn't seem to work in mathjax 3
|
||||
//formatError: function(jax, error) {
|
||||
// console.log(`TeX error in "${jax.latex}": ${error.message}`);
|
||||
// return jax.formatError(error);
|
||||
//}
|
||||
}
|
||||
};
|
||||
</script>
|
||||
<!-- Load a newer versino of MathJax than mdbook does by default, and which in particular has working relative paths for the "bussproofs" extension. -->
|
||||
<script async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/3.0.1/es5/tex-mml-chtml.js"></script>
|
||||
Loading…
Add table
Add a link
Reference in a new issue