we combine closures built by different users, the resulting set may
contain multiple paths from the same output path equivalence class.
For instance, if we do
$ NIX_USER_ID=foo nix-env -i libXext
$ NIX_USER_ID=root nix-env -i libXt
$ NIX_USER_ID=foo nix-env -i libXmu
(where libXmu depends on libXext and libXt, which both depend on
libX11), then the following will happen:
* User foo builds libX11 and libXext because they don't exist
yet.
* User root builds libX11 and libXt because the latter doesn't
exist yet, while the former *does* exist but cannot be trusted.
The instance of libX11 built by root will almost certainly
differ from the one built by foo, so they are stored in separate
locations.
* User foo builds libXmu, which requires libXext and libXt. Foo
has trusted copies of both (libXext was built by himself, while
libXt was built by root, who is trusted by foo). So libXmu is
built with foo's libXext and root's libXt as inputs.
* The resulting libXmu will link against two copies of libX11,
namely the one used by foo's libXext and the one used by root's
libXt. This is bad semantically: it is observable behaviour and
might well lead to build-time or runtime failures (e.g.,
duplicate definitions of symbols). It is also bad in terms of
efficiency: the closure of libXmu contains two copies of libX11,
so both must be deployed.
The solution is to apply hash rewriting to "consolidate" the set of
input paths to a build. The invariant we wish to maintain is that
any closure may contain at most one path from each equivalence
class.
So in the case of a collision, we select one path from each class,
and *rewrite* all selected paths so that they refer only to other
selected paths. For instance, in the example above, we can rewrite foo's
libXext to link against root's libX11. That is, the hash part of
foo's libX11 is replaced by the hash part of root's libX11.
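As a rough illustration of the rewriting step, the sketch below replaces
one hash part by another inside the raw contents of a store object. The
function name and the 32-character hash parts are hypothetical
stand-ins, not Nix's actual implementation; the only property relied on
is that both hash parts have the same length, so that offsets inside the
file do not move.

```python
def rewrite_hashes(contents: bytes, rewrites: dict[bytes, bytes]) -> bytes:
    """Replace every occurrence of an old hash part by its new hash part."""
    for old, new in rewrites.items():
        # Equal lengths keep all offsets inside the file intact.
        if len(old) != len(new):
            raise ValueError("hash parts must have equal length")
        contents = contents.replace(old, new)
    return contents


# Hypothetical hash parts standing in for real store path hashes.
FOO_LIBX11 = b"f" * 32
ROOT_LIBX11 = b"r" * 32

# A made-up fragment of foo's libXext that mentions foo's libX11.
libxext = b"RPATH=/nix/store/" + FOO_LIBX11 + b"-libX11/lib"
rewritten = rewrite_hashes(libxext, {FOO_LIBX11: ROOT_LIBX11})
```

After the call, `rewritten` mentions root's libX11 only, and it has the
same length as the original contents.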
The hard part is to figure out which path to select from each
class. Some selections may be cheaper than others (i.e., require
fewer rewrites). The current implementation is rather dumb: it
tries all possible selections, and picks the cheapest. This is an
exponential time algorithm.
There are certainly more efficient common-case (heuristic)
approaches, but I don't know yet whether there is a worst-case
polynomial-time algorithm.
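The exhaustive search described above can be sketched as follows. This
is only an illustration under stated assumptions: the cost model (a
selected path counts as one rewrite if it refers to an unselected path
or to a path that is itself rewritten) and all names are hypothetical,
and the real implementation may count costs differently.

```python
from itertools import product


def rewrite_cost(selection, references):
    """Count the selected paths that would have to be rewritten.

    Assumed cost model: a path needs a rewrite if it refers to a path
    outside the selection, or to a selected path that is itself
    rewritten (since rewriting changes that path's hash).
    """
    chosen = set(selection)
    rewritten = set()
    changed = True
    while changed:
        changed = False
        for path in chosen - rewritten:
            refs = references.get(path, ())
            if any(r not in chosen or r in rewritten for r in refs):
                rewritten.add(path)
                changed = True
    return len(rewritten)


def cheapest_selection(classes, references):
    """Try every way of picking one path per class; keep the cheapest."""
    names = sorted(classes)
    # The number of candidates is the product of the class sizes,
    # which is what makes this approach exponential.
    candidates = product(*(classes[name] for name in names))
    best = min(candidates, key=lambda sel: rewrite_cost(sel, references))
    return dict(zip(names, best)), rewrite_cost(best, references)


# The libXmu example: two libX11 instances, one libXext, one libXt.
classes = {
    "libX11": ["libX11-foo", "libX11-root"],
    "libXext": ["libXext-foo"],
    "libXt": ["libXt-root"],
}
references = {
    "libXext-foo": ["libX11-foo"],
    "libXt-root": ["libX11-root"],
}
selection, cost = cheapest_selection(classes, references)
```

In this example both choices of libX11 cost exactly one rewrite, so the
sketch simply keeps the first candidate it tried.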
* Only build a derivation if there are no trusted output paths in the
equivalence classes for that derivation's outputs.
* Set the trust ID to the current user name, or use the value of the
NIX_USER_ID environment variable.
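A minimal sketch of these two rules, assuming that each path in an
equivalence class is recorded together with the trust ID of the user
who built it and that every user has some set of trust IDs it accepts
(both data structures are hypothetical, chosen only for illustration):

```python
import getpass
import os


def trust_id() -> str:
    """Return NIX_USER_ID if it is set, else the current user name."""
    return os.environ.get("NIX_USER_ID") or getpass.getuser()


def needs_build(class_builders: dict[str, str], trusted: set[str]) -> bool:
    """Decide whether a derivation output still has to be built.

    class_builders maps each existing path in the output's equivalence
    class to the trust ID that built it (an assumed representation).
    """
    return not any(builder in trusted for builder in class_builders.values())


# Only foo's instance of libX11 exists so far.
libx11_class = {"libX11-built-by-foo": "foo"}

root_must_build = needs_build(libx11_class, {"root"})        # root trusts only root
foo_must_build = needs_build(libx11_class, {"foo", "root"})  # foo trusts foo and root
```

This mirrors the earlier example: root rebuilds libX11 because the
existing instance cannot be trusted, while foo can reuse it.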
paths (e.g., `/nix/store/...random-hash...-aterm'), which are
subsequently rewritten to actual content-addressable store paths
(i.e., the hash part of the store path equals the hash of the
contents).
A complication is that the temporary output paths have to be passed
to the builder (e.g., in $out). Likewise, other environment
variables and command-line arguments cannot contain fixed store
paths because their names are no longer known in advance.
Therefore, we now put placeholder store paths in environment
variables and command-line arguments, which we *rewrite* to the
actual paths prior to running the builder.
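The rewriting of environment variables and command-line arguments could
look roughly like the sketch below. The function name and the
placeholder and temporary path strings are made up for illustration;
only the idea of substituting placeholders just before the builder is
started comes from the text above.

```python
def rewrite_invocation(env, args, actual_paths):
    """Substitute placeholder store paths in a builder's environment and arguments.

    actual_paths maps each placeholder path to the path that will
    really be used for this build.
    """
    def fix(text):
        for placeholder, actual in actual_paths.items():
            text = text.replace(placeholder, actual)
        return text

    new_env = {name: fix(value) for name, value in env.items()}
    new_args = [fix(arg) for arg in args]
    return new_env, new_args


# Hypothetical placeholder and temporary paths.
mapping = {"/nix/store/PLACEHOLDER-aterm": "/nix/store/TEMPORARY-aterm"}
env, args = rewrite_invocation(
    {"out": "/nix/store/PLACEHOLDER-aterm", "name": "aterm"},
    ["--prefix=/nix/store/PLACEHOLDER-aterm"],
    mapping,
)
```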
TODO: maintain the mapping of derivation placeholder outputs
("output path equivalence classes") to actual output paths in the
database. Right now the first build succeeds and all its
dependencies fail because they cannot find the output of the first.
TODO: locking is no longer an issue with random temporary paths, but
at the cost of having no blocking if we build the same thing twice
in parallel. Maybe the "random" path should actually be a hash of
the placeholder and the name of the user who started the build.
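The idea in the last TODO can be sketched as follows. The choice of
SHA-256, the separator and the truncation to 32 characters are
assumptions made for this example; the point is only that the
"temporary" hash becomes a deterministic function of the placeholder
and the user, so two parallel builds of the same thing by the same user
would contend for the same path and could block on a lock again.

```python
import hashlib


def temp_output_hash(placeholder: str, user: str) -> str:
    """Derive a deterministic stand-in for the random temporary hash."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") distinct.
    data = f"{placeholder}\0{user}".encode()
    return hashlib.sha256(data).hexdigest()[:32]


same_a = temp_output_hash("placeholder-aterm", "foo")
same_b = temp_output_hash("placeholder-aterm", "foo")
other = temp_output_hash("placeholder-aterm", "root")
```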
user environment, e.g.,
$ nix-env -i /nix/store/z58v41v21xd3ywrqk1vmvdwlagjx7f10-aterm-2.3.1.drv
or
$ nix-env -i /nix/store/hsyj5pbn0d9iz7q0aj0fga7cpaadvp1l-aterm-2.3.1
This is useful because it allows Nix expressions to be bypassed
entirely. For instance, if only a nix-pull manifest is provided,
plus the top-level path of some component, it can be installed
without having to supply the Nix expression (e.g., for obfuscation,
or to be independent of Nix expression language changes or context
dependencies).
This simplifies garbage collection and `nix-store --query
--requisites' since we no longer need to treat derivations
specially.
* Better maintenance of the invariants, e.g., setReferences() can only
be called on a valid/substitutable path.
closure of the referrers relation rather than the references
relation, i.e., the set of all paths that directly or indirectly
refer to the given path. Note that contrary to the references
closure this set is not fixed; it can change as paths are added to
or removed from the store.
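To make the difference concrete, here is a small sketch that computes
this closure from a table of direct references. The table layout and
function name are assumptions for illustration; the real data lives in
the Nix database.

```python
from collections import defaultdict


def referrers_closure(path, references):
    """Return all paths that directly or indirectly refer to `path`.

    references maps each store path to the set of paths it refers to.
    """
    # Invert the references relation to get direct referrers.
    referrers = defaultdict(set)
    for source, refs in references.items():
        for ref in refs:
            referrers[ref].add(source)

    seen = set()
    todo = [path]
    while todo:
        current = todo.pop()
        for referrer in referrers[current]:
            if referrer not in seen:
                seen.add(referrer)
                todo.append(referrer)
    return seen


store = {
    "libX11": set(),
    "libXext": {"libX11"},
    "libXt": {"libX11"},
}
before = referrers_closure("libX11", store)

# Adding a path to the store changes the result for libX11.
store["libXmu"] = {"libXext", "libXt"}
after = referrers_closure("libX11", store)
```

Here `before` and `after` differ even though libX11 itself is unchanged,
which is exactly why this set is not fixed.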
promise :-) This allows derivations to specify *which* output
paths of input derivations they depend on. This helps to
prevent unnecessary downloads. For instance, a build might be
dependent on the `devel' and `lib' outputs of some library
component, but not the `docs' output.
`derivations.cc', etc.
* Store the SHA-256 content hash of store paths in the database after
they have been built/added. This is so that we can check whether
the store has been messed with (a la `rpm --verify').
* When registering path validity, verify that the closure property
holds.
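The closure check in the last item might be sketched like this, assuming
validity is tracked as a table from valid paths to their references (a
hypothetical in-memory stand-in for the database) and that a path is
allowed to refer to itself:

```python
def register_valid_path(path, references, valid):
    """Mark `path` as valid, but only if all its references are valid.

    valid maps each already valid path to its set of references.
    """
    missing = sorted(r for r in references if r != path and r not in valid)
    if missing:
        raise ValueError(
            f"cannot register {path}: references not valid: {missing}"
        )
    valid[path] = set(references)


valid = {}
register_valid_path("libX11", set(), valid)
register_valid_path("libXext", {"libX11"}, valid)

try:
    # libXt has not been registered, so the closure property would break.
    register_valid_path("libXmu", {"libXext", "libXt"}, valid)
    rejected = False
except ValueError:
    rejected = True
```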
representation of closures as ATerms in the Nix store. Instead, the
file system pointer graph is now stored in the Nix database. This
has many advantages:
- It greatly simplifies the implementation (we can drop the notion
of `successors', and so on).
- It makes registering roots for the garbage collector much easier.
Instead of specifying the closure expression as a root, you can
simply specify the store path that must be retained as a root.
This could not be done previously, since there was no way to find
the closure store expression containing a given store path.
- Better traceability: it is now possible to query what paths are
referenced by a path, and what paths refer to a path.
* Formalise the notion of fixed-output derivations, i.e., derivations
for which a cryptographic hash of the output is known in advance.
Changes to such derivations should not propagate upwards through the
dependency graph. Previously this was done by specifying the hash
component of the output path through the `id' attribute, but this is
insecure since you can lie about it (i.e., you can specify any hash
and then produce a completely different output). Now the
responsibility for checking the output is moved from the builder to
Nix itself.
A fixed-output derivation can be created by specifying the
`outputHash' and `outputHashAlgo' attributes, the latter taking
values `md5', `sha1', and `sha256', and the former specifying the
actual hash in hexadecimal or in base-32 (auto-detected by looking
at the length of the attribute value). MD5 is included for
compatibility but should be considered deprecated.
* Removed the `drvPath' pseudo-attribute in derivation results. It's
no longer necessary.
* Cleaned up the support for multiple output paths in derivation store
expressions. Each output now has a unique identifier (e.g., `out',
`devel', `docs'). Previously there was no way to tell output paths
apart at the store expression level.
* `nix-hash' now has a flag `--base32' to specify that the hash should
be printed in base-32 notation.
* `fetchurl' accepts parameters `sha256' and `sha1' in addition to
`md5'.
* `nix-prefetch-url' now prints out a SHA-1 hash in base-32. (TODO: a
flag to specify the hash algorithm.)
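Since hashes can now be written in hexadecimal or in base-32, the
length-based auto-detection mentioned for `outputHash' can be
illustrated with the sketch below. It assumes that hexadecimal uses two
characters per byte and that base-32 uses one character per 5 bits,
rounded up; the function name is hypothetical.

```python
HASH_BYTES = {"md5": 16, "sha1": 20, "sha256": 32}


def hash_encoding(algo: str, value: str) -> str:
    """Guess whether `value` is hexadecimal or base-32 from its length."""
    size = HASH_BYTES[algo]
    hex_length = 2 * size
    base32_length = (size * 8 + 4) // 5  # 5 bits per character, rounded up
    if len(value) == hex_length:
        return "hex"
    if len(value) == base32_length:
        return "base32"
    raise ValueError(f"{len(value)} characters is not a valid {algo} hash")


lengths = {
    algo: (2 * size, (size * 8 + 4) // 5)
    for algo, size in HASH_BYTES.items()
}
```

Note that under these assumptions a 32-character string is a
hexadecimal MD5 hash but a base-32 SHA-1 hash, so the length only
settles the encoding once the algorithm is known.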