nixos/nix

mirror of https://github.com/NixOS/nix.git synced 2025-11-14 14:32:42 +01:00

John Ericson 4b9fb54957 Derivation doc improvements

2025-01-19 13:07:37 -05:00

Derivation and Deriving Path

So far, we have covered "inert" store objects. But the point of the Nix store layer is to be a build system. Other systems (like Git or IPFS) also store and transfer immutable data, but they don't concern themselves with how that data was created.

This is where Nix distinguishes itself. Derivations represent individual build steps, and deriving paths are needed to refer to the outputs of those build steps. The two concepts need to be introduced together because, as described below, each depends on the other.

Derivation

A derivation is a specification for running an executable on precisely defined input files to repeatably produce output files at uniquely determined file system paths.

What is the natural Unix analog for a build step in action? Answer: a process that will eventually exit, leaving behind some output data. What is the natural way to plan such a step? An execve system call.

A derivation consists of:

A (base) name
A set of outputs, consisting of names and possibly other data
A set of inputs, which are deriving paths
Everything needed for an execve system call:
1. Path to executable
2. A list of arguments (except for argv[0], which is taken from the path in the usual way)
3. A set of environment variables.
A two-component "system" name (e.g. x86_64-linux) where the executable is to run.
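As a rough illustration, the fields above can be modelled as a Python record. This is a minimal sketch, not a Nix data structure: the class and field names are hypothetical, inputs are kept as opaque encoded deriving-path strings, and argv[0] is assumed to be the base name of the builder path.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class Derivation:
    """Hypothetical model of a derivation's fields (not a Nix API)."""
    name: str                      # (base) name, e.g. "hello"
    outputs: Dict[str, dict]       # output name -> type-specific extra data
    inputs: FrozenSet[str]         # deriving paths, kept as encoded strings here
    builder: str                   # path to the executable
    args: List[str] = field(default_factory=list)      # arguments after argv[0]
    env: Dict[str, str] = field(default_factory=dict)  # environment variables
    system: str = "x86_64-linux"   # where the builder is meant to run

    def execve_arguments(self) -> Tuple[str, List[str], Dict[str, str]]:
        """Return (path, argv, env) in the shape os.execve expects.

        Assumption: argv[0] is the base name of the builder path.
        """
        argv = [os.path.basename(self.builder)] + list(self.args)
        return self.builder, argv, dict(self.env)


# Illustrative values only; <hash> stands for a real store path hash.
drv = Derivation(
    name="hello",
    outputs={"out": {}, "dev": {}},
    inputs=frozenset({"/nix/store/<hash>-bash-5.2.drv^out"}),
    builder="/nix/store/<hash>-bash-5.2/bin/bash",
    args=["-e", "builder.sh"],
    env={"name": "hello"},
)
path, argv, env = drv.execve_arguments()
print(argv)  # ['bash', '-e', 'builder.sh']
```

Nothing here actually calls os.execve; the sketch only shows that a derivation carries exactly the data such a call needs, plus a name, outputs, inputs and a system.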

The information needed for the execve system call will presumably include many store paths:

The path to the executable almost surely starts with a store path.
The arguments and environment variables likely contain many other store paths.

But just as we stored the references contained in the file data separately for store objects, so we store the set of inputs separately.

Finally, the system name is included to take advantage of the fact that Nix allows heterogeneous build plans, where not all steps can be run on the same machine or the same sort of machine.

The process's job is to produce the outputs and to have no other important side effects. The rules around this will be discussed in following sections.

Referencing derivations

Derivations are always referred to by the store path of the store object they are encoded to. See the encoding section for more details on how this encoding works, and thus exactly what store path we would end up with for a given derivation.

The store path of the store object which encodes a derivation is often called a "derivation path" for brevity.

Outputs

The outputs of a derivation are the store objects it is obligated to produce.

Outputs are assigned names, and also consist of other information based on the type of derivation.

Output names can be any string which is also a valid store path name. The store path of the output store object (also called an output path for short) has a name based on the derivation name and the output name. Most outputs are named drvName + '-' + outputName. However, an output named "out" has just the name drvName. This allows derivations with a single output to avoid a superfluous -<outputName> in their single output's name when no disambiguation is needed.

Example

A derivation is named hello and has two outputs, out and dev.

The derivation's path will be: /nix/store/<hash>-hello.drv.

The store path of out will be: /nix/store/<hash>-hello.

The store path of dev will be: /nix/store/<hash>-hello-dev.
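The naming rule behind this example can be written as a small function. This sketch only computes the name part of each store path (the text after <hash>-); how the hash itself is computed is out of scope here, and the function names are hypothetical.

```python
def derivation_path_name(drv_name: str) -> str:
    """Name part of the store path holding the encoded derivation."""
    return drv_name + ".drv"


def output_path_name(drv_name: str, output_name: str) -> str:
    """Name part of an output's store path.

    The output named "out" gets the bare derivation name; every other
    output gets "-<outputName>" appended to disambiguate it.
    """
    if output_name == "out":
        return drv_name
    return drv_name + "-" + output_name


print(derivation_path_name("hello"))     # hello.drv
print(output_path_name("hello", "out"))  # hello
print(output_path_name("hello", "dev"))  # hello-dev
```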

System

The system type on which the builder executable is meant to be run.

A necessary condition for Nix to build derivations locally is that the system attribute matches the current system configuration option. Nix can automatically build for other platforms by forwarding build requests to other machines.

Builder

Path to an executable that will perform the build.

Args

Command-line arguments to be passed to the builder executable.

Environment Variables

Environment variables which will be passed to the builder executable.

Placeholder

TODO

Two types:

Reference to own outputs
output deriving paths (see below), corresponding to store paths we haven't yet realised.

N.B. The current method of representing placeholders as hashes, which are substituted inside string fields, should be seen as an artifact of the current "ATerm" serialization format. In order to be more explicit, and to avoid gotchas analogous to SQL injection, we ought to consider switching to a different format with an explicit syntax for concatenating plain strings and placeholders.
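To make the suggested alternative more concrete, here is a sketch of a string field written as an explicit list of parts, each either a plain string or a placeholder. The types and the render function are hypothetical (this is not how the ATerm or JSON formats represent placeholders today); the point is that a plain string can never be mistaken for a placeholder, whatever characters it contains.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class OwnOutput:
    """Placeholder for one of this derivation's own outputs (hypothetical)."""
    output: str


@dataclass(frozen=True)
class InputOutput:
    """Placeholder for an output of an input derivation (hypothetical)."""
    drv_path: str
    output: str


Part = Union[str, OwnOutput, InputOutput]


def render(
    parts: List[Part],
    own: Dict[str, str],
    inputs: Dict[Tuple[str, str], str],
) -> str:
    """Concatenate the parts, substituting each placeholder by its store path."""
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)  # plain text is copied verbatim, never scanned
        elif isinstance(part, OwnOutput):
            pieces.append(own[part.output])
        else:
            pieces.append(inputs[(part.drv_path, part.output)])
    return "".join(pieces)


flag = ["--prefix=", OwnOutput("out")]
print(render(flag, own={"out": "/nix/store/<hash>-hello"}, inputs={}))
# --prefix=/nix/store/<hash>-hello
```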

Inputs

The inputs are a set of deriving paths. Each of these must be realised prior to building the derivation in question. At that point, the derivation can be normalized by replacing each input deriving path with its store path, which we now know since we've realised it.
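The normalization step can be sketched as follows. For brevity this assumes a hypothetical representation in which a constant deriving path is a plain store-path string and an output deriving path is a (drv_path, output_name) pair; realise stands in for whatever builds or substitutes an output and reports its store path.

```python
from typing import Callable, Iterable, Set, Tuple, Union

# Constant: plain store path. Output: (derivation path, output name).
DerivingPath = Union[str, Tuple[str, str]]


def normalize_inputs(
    inputs: Iterable[DerivingPath],
    realise: Callable[[Tuple[str, str]], str],
) -> Set[str]:
    """Replace every input deriving path with a concrete store path."""
    store_paths = set()
    for dp in inputs:
        if isinstance(dp, str):
            # Constant deriving path: it already is a store path.
            store_paths.add(dp)
        else:
            # Output deriving path: only known once the output is realised.
            store_paths.add(realise(dp))
    return store_paths


# Pretend this output has already been built.
realised = {("/nix/store/<hash2>-bash.drv", "out"): "/nix/store/<hash3>-bash"}
inputs = ["/nix/store/<hash1>-builder.sh", ("/nix/store/<hash2>-bash.drv", "out")]
print(sorted(normalize_inputs(inputs, realised.__getitem__)))
# ['/nix/store/<hash1>-builder.sh', '/nix/store/<hash3>-bash']
```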

Deriving path

Deriving paths are a way to refer to store objects that might not yet be realised. This is necessary because, in general and particularly for content-addressed derivations, the store path of an output is not known in advance. There are two forms:

constant: just a store path. It can be made valid by copying it into the store: from the evaluator, the command-line interface, or another store.
output: a pair of a store path to a derivation and an output name.

In pseudo code:

type OutputName = String

data DerivingPath
  = ConstantPath { path : StorePath }
  | Output {
      drvPath : StorePath,
      output  : OutputName,
    }
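The pseudocode translates fairly directly into Python dataclasses. This is only a sketch: store paths are simplified to plain strings, and known_store_path is a hypothetical helper showing why the two forms must be distinguished.

```python
from dataclasses import dataclass
from typing import Optional, Union

StorePath = str   # simplification: store paths as plain strings
OutputName = str


@dataclass(frozen=True)
class ConstantPath:
    path: StorePath


@dataclass(frozen=True)
class Output:
    drv_path: StorePath   # store path of a derivation
    output: OutputName


DerivingPath = Union[ConstantPath, Output]


def known_store_path(dp: DerivingPath) -> Optional[StorePath]:
    """Return the store path if it is known without building anything."""
    if isinstance(dp, ConstantPath):
        return dp.path
    # For an output, the store path is in general not known in advance
    # (in particular for content-addressed derivations).
    return None


src = ConstantPath("/nix/store/<hash>-hello-2.12.tar.gz")
hello_out = Output(drv_path="/nix/store/<hash>-hello.drv", output="out")
print(known_store_path(src))        # /nix/store/<hash>-hello-2.12.tar.gz
print(known_store_path(hello_out))  # None
```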

Encoding

Derivation

There are two formats, documented separately:

The legacy "ATerm" format
The modern JSON format

Currently derivations are always serialized to store objects using the "ATerm" format, but this is subject to change.

Regardless of the format used, when serializing to a store object, content-addressing is always used. In the common case, the inputs of a derivation are either:

constant deriving paths for content-addressed source objects, which are "initial inputs" rather than the outputs of some other derivation (except in the case of bootstrap binaries).
the outputs of other derivations abiding by this same invariant.

This common case makes for the following useful property: when we serialize such a derivation graph to store objects, the resulting closures are entirely content-addressed.

Here is a sketch of the proof:

The inputs which are constant deriving paths become references of the serialized derivations, but they are content-addressed per the above.
For inputs which are output deriving paths, we cannot directly reference the input because in general it is not built yet. We instead "peel back" the output deriving path to take its underlying serialized derivation (the drvPath field), and reference that. Since it is a derivation, it must be content-addressed.
There are no other ways a store object would end up in an input closure. The references of a derivation in store object form always come solely from the inputs of the derivation.
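The key step of this argument, that a serialized derivation references only its constant inputs and the derivations behind its output inputs, can be sketched as a function. It assumes a hypothetical representation: a plain string for a constant deriving path and a (drv_path, output_name) pair for an output deriving path.

```python
from typing import Iterable, Set, Tuple, Union

DerivingPath = Union[str, Tuple[str, str]]


def serialized_references(inputs: Iterable[DerivingPath]) -> Set[str]:
    """References of a derivation once it is encoded as a store object."""
    refs = set()
    for dp in inputs:
        if isinstance(dp, str):
            # Constant input: referenced directly (a content-addressed source).
            refs.add(dp)
        else:
            # Output input: "peel back" to the .drv, which is content-addressed.
            drv_path, _output_name = dp
            refs.add(drv_path)
    return refs


inputs = ["/nix/store/<hash1>-builder.sh", ("/nix/store/<hash2>-bash.drv", "out")]
print(sorted(serialized_references(inputs)))
# ['/nix/store/<hash1>-builder.sh', '/nix/store/<hash2>-bash.drv']
```

Applying the same function to each referenced .drv in turn walks the whole closure; under the invariant above, every object it reaches is content-addressed.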

Deriving Path

constant

Constant deriving paths are encoded simply as the underlying store path is. Thus, we see that every encoded store path is also a valid encoded (constant) deriving path.

output

Output deriving paths are encoded by concatenating:

encoding of a store path referring to a derivation
a separator (^ or ! depending on context)
the name of an output

Example

/nix/store/lxrn8v5aamkikg6agxwdqd1jz7746wz4-firefox-98.0.2.drv^out

This parses like so:

/nix/store/lxrn8v5aamkikg6agxwdqd1jz7746wz4-firefox-98.0.2.drv^out
|------------------------------------------------------------| |-|
store path (usual encoding)                                    output name
                                                          |--|
                                                          note the ".drv"
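Encoding and decoding this form can be sketched as below. The sketch uses the ^ separator by default and relies on the assumption that neither ^ nor ! can appear in a store path name, so splitting at the last separator is unambiguous; the function and type names are hypothetical.

```python
from typing import NamedTuple, Union


class OutputDerivingPath(NamedTuple):
    drv_path: str  # store path of a derivation, ending in ".drv"
    output: str    # output name


def encode(dp: Union[str, OutputDerivingPath], sep: str = "^") -> str:
    """A constant is its store path; an output is drv_path + sep + output."""
    if isinstance(dp, str):
        return dp
    return dp.drv_path + sep + dp.output


def parse(s: str, sep: str = "^") -> Union[str, OutputDerivingPath]:
    """Parse an encoded deriving path; constants come back as plain strings."""
    drv_path, found, output = s.rpartition(sep)
    if not found:
        return s  # no separator: a constant deriving path
    if not drv_path.endswith(".drv"):
        raise ValueError("not a derivation path: " + drv_path)
    return OutputDerivingPath(drv_path, output)


s = "/nix/store/lxrn8v5aamkikg6agxwdqd1jz7746wz4-firefox-98.0.2.drv^out"
dp = parse(s)
print(dp.output)  # out
assert encode(dp) == s
```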

Extending the model to be higher-order

Experimental feature: dynamic-derivations

We can apply the same extension discussed for the abstract model to the concrete model. Again, only the data type for Deriving Paths needs to be modified. Derivations are the same except for using the new extended deriving path data type.

type OutputName = String

data DerivingPath
  = ConstantPath { storeObj : StorePath }
  | Output {
      drv    : DerivingPath, -- Note: changed
      output : OutputName,
    }

Now, the drv field of Output is itself a DerivingPath instead of a StorePath.

Under this extended model, DerivingPaths are thus inductively built up from a ConstantPath, contained in zero or more outer Outputs.
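A Python rendering of the extended type, with two helpers that follow the inductive structure just described. As before, this is a sketch with store paths as plain strings; innermost and depth are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ConstantPath:
    store_obj: str  # a store path


@dataclass(frozen=True)
class Output:
    drv: "DerivingPath"  # changed: a deriving path, no longer a plain store path
    output: str


DerivingPath = Union[ConstantPath, Output]


def innermost(dp: DerivingPath) -> ConstantPath:
    """Strip every Output layer to reach the ConstantPath at the core."""
    while isinstance(dp, Output):
        dp = dp.drv
    return dp


def depth(dp: DerivingPath) -> int:
    """Count the Output layers wrapped around the innermost ConstantPath."""
    layers = 0
    while isinstance(dp, Output):
        layers += 1
        dp = dp.drv
    return layers


firefox = ConstantPath("/nix/store/lxrn8v5aamkikg6agxwdqd1jz7746wz4-firefox-98.0.2.drv")
dp = Output(Output(Output(firefox, "foo.drv"), "bar.drv"), "out")
print(depth(dp))                 # 3
print(innermost(dp) == firefox)  # True
```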

Encoding

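The encoding is adjusted in the simplest way: the inner deriving path is encoded recursively with the same scheme, so an encoded deriving path can end in a chain of ^<output name> suffixes rather than just one.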

/nix/store/lxrn8v5aamkikg6agxwdqd1jz7746wz4-firefox-98.0.2.drv^foo.drv^bar.drv^out
|----------------------------------------------------------------------------| |-|
inner deriving path (usual encoding)                                           output name
|--------------------------------------------------------------------| |-----|
even more inner deriving path (usual encoding)                         output name
|------------------------------------------------------------| |-----|
innermost constant store path (usual encoding)                 output name
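Because every layer just appends one ^<output name>, an encoded deriving path is equivalent to its innermost store path plus the list of output names, innermost first; an empty list is a constant deriving path. The sketch below parses and re-encodes that flattened view. It assumes ^ as the separator, and the function names are hypothetical.

```python
from typing import List, Tuple


def parse_chain(s: str, sep: str = "^") -> Tuple[str, List[str]]:
    """Split into the innermost store path and the outputs, innermost first."""
    store_path, *outputs = s.split(sep)
    return store_path, outputs


def encode_chain(store_path: str, outputs: List[str], sep: str = "^") -> str:
    """Inverse of parse_chain: append one separator and name per layer."""
    return sep.join([store_path, *outputs])


s = ("/nix/store/lxrn8v5aamkikg6agxwdqd1jz7746wz4-firefox-98.0.2.drv"
     "^foo.drv^bar.drv^out")
store_path, outputs = parse_chain(s)
print(outputs)  # ['foo.drv', 'bar.drv', 'out']
assert encode_chain(store_path, outputs) == s
```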

Extra extensions

`__structuredAttrs`

Historically speaking, most users of Nix made GNU Bash, with a script, the command that is run, regardless of what they were doing. Bash variables are automatically created from environment variables, but Bash also supports array and string-keyed map variables in addition to string variables. People also usually create derivations using languages which also support these richer data types. A way was thus desired to get this data from the language "planning" the derivation to Bash, the language evaluated at "run time".

__structuredAttrs does this by smuggling inside the core derivation format a map of named richer data. At run time, this becomes two things:

A JSON file containing that map.
A bash script setting those variables.

The bash command can be passed a script which will "source" that Nix-created bash script, setting those variables with the richer data. The outer script can then do whatever it likes with those richer variables as input.
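A rough sketch of how such a map could be turned into Bash declarations alongside a JSON file. This is not the script Nix actually generates; it is a hypothetical illustration that handles only strings, lists of strings, and string-to-string maps, and it assumes every key is a valid Bash identifier.

```python
import json
import shlex


def to_bash(attrs: dict) -> str:
    """Render simple values as Bash `declare` statements (hypothetical)."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, str):
            lines.append(f"declare {name}={shlex.quote(value)}")
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            items = " ".join(shlex.quote(v) for v in value)
            lines.append(f"declare -a {name}=({items})")
        elif isinstance(value, dict) and all(isinstance(v, str) for v in value.values()):
            items = " ".join(
                f"[{shlex.quote(k)}]={shlex.quote(v)}" for k, v in value.items()
            )
            lines.append(f"declare -A {name}=({items})")
        # Anything richer (numbers, nested data) is left to the JSON file here.
    return "\n".join(lines)


attrs = {"pname": "hello", "configureFlags": ["--enable-foo", "--with-x=a b"]}
json_file_contents = json.dumps(attrs)
print(to_bash(attrs))
# declare pname=hello
# declare -a configureFlags=(--enable-foo '--with-x=a b')
```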

However, since derivations can already contain arbitrary input sources, the vast majority of what __structuredAttrs does can be handled by upper layers. We might consider implementing __structuredAttrs in higher layers in the future, and simplifying the store layer.
