Valentin Gagarin 6ee9ecd950 fix typo
2023-06-01 22:31:49 +02:00



RFC 137 Nix language versioning

Summary

Introduce a convention to determine which version of the Nix language grammar to use for parsing and evaluating Nix expressions.

Add parameters to the Nix language evaluator, controlling the behavior of deprecation warnings and errors.

Motivation

The stability of the Nix language has been praised on multiple occasions, e.g. in Nix and legacy enterprise software development: an unlikely match made in heaven. Yet, as with any software system, we want to allow the Nix language to evolve in order to accommodate new insights. This sometimes involves backward-incompatible ("breaking") changes that currently cannot be made without significant downstream disruption.

Therefore we propose a mechanism and policies to introduce changes to the Nix language in a controlled and deliberate manner. They aim to avoid breaking existing setups, and to minimise maintenance burden for implementors and users. The goal is for new versions of the Nix language evaluator to stay backward compatible with existing Nix expressions, and for new Nix expressions to be deliberately incompatible with existing evaluators.

Regardless, changes to the language, especially breaking changes, should remain a rare exception.

Motivating examples

Incompatible changes from the past:

  • A changelog of Nix (language) versions, as reflected in builtins.langVersion
  • There have been other, sometimes breaking changes to the language that have not resulted in an increment of the language version (e.g. the recent fetchGit changes).
  • The builtins.toJSON 1.000001 output changed in Nix 2.12.

Possible future changes under discussion:

  • Remove URL literals (currently implemented via experimental-features)
  • Remove the old let { body = ... } syntax
  • Disallow leading zeroes in integer literals (such as umask = 0022)
  • Disallow a.x or y if a is not an attribute set
  • Simplifying semantics of builtins.toString and string interpolation
  • Remove the let { body } syntax
  • __functor and __toString, probably
  • Remove __overrides
  • Make builtins more consistent, e.g. not exposing map, removeAttrs, fetchGit and others in the global scope
  • Fix imprecision in string representation of floating point numbers
  • Make the @-pattern consistent with destructuring
  • A syntax to index into lists, e.g. [ 1 2 3 ].0 == 1
  • Use , to delimit elements of a list expression, like [ 1, 2, 3 ]
  • Do something about ? meaning two different things depending on where it occurs ({ x ? "" }: vs x ? "")
  • Better support for static analysis
  • Syntax for hexadecimal numbers

Other discussions around language changes:

Drawbacks

Allowing multiple language versions to coexist will, over time, lead to a proliferation of syntax highlighters and other tooling. Once the language version is accessible, though, tooling can at least be adapted systematically.

Alternatives

  • Keep the language as implemented by Nix compatible, but socially restrict the usage of undesirable features.

    • (+) Roughly matches the current practice, no technical change needed
    • (+) Maintains usability of old Nixpkgs versions (up to availability of fixed-output artifacts)
    • (+) Does not break third-party codebases before making a decision, keeping Nix a dependable upstream
      • (-) This proposal does not allow for breakages unless there is some eventual phase-out of support
    • (-) Strict enforcement requires extra tooling that this proposal would obviate
    • (-) The implementation of the features that are no longer desirable still incurs complexity and maintenance cost
      • (-) It's still not really possible to make changes to the language
  • Introduce changes to the language with language extensions or feature flags

    • (-) Combinatorial explosion
    • (-) Even more maintenance overhead
    • (+) Allows gradual adoption of features
      • (-) We already have experimental feature flags as an orthogonal mechanism, with the added benefit that they don't incur support costs and can be dropped without loss
  • Never make breaking changes to the language

    • (+) No additional maintenance effort required
    • (-) Blocks improvements
    • (-) Requires additions to be made very carefully
      • (-) Even incremental changes are really expensive that way
    • (-) Makes solving some well-known problems impossible
  • Continue current practice

    • (-) There is no process for breaking changes
    • (-) Breaking changes are not always announced
    • (-) There are no means of determining compatibility between expressions and evaluator versions

Detailed Design

Language versioning

  1. The language version is a natural number.

    Arguments
    • (+) Formally decouples the Nix language version from the Nix version
      • (+) The Nix language is supposed to change much less often than the rest of Nix
      • (-) There are two version numbers to keep track of
      • (-) Makes more evident that the Nix language is a distinct architectural component of the Nix ecosystem
    • (+) It's currently handled that way, no change needed apart from documentation
    • (+) Simple and unambiguous
    • (+) Concise, even in the long term, since the language is supposed to change very rarely
    Alternatives
    • Calendar Versioning
      • (+) Provides information on when changes happened
        • (-) This is not needed, since only compatibility information matters
      • (-) Requires a minimum amount of characters
        • This may be relevant depending on where it has to be encoded
      • (+) Restricting to only the year would force language changes to be rare
        • (+) This would allow obvious synchronisation points with Nixpkgs releases
        • (-) It may be too much policy encoded in a mechanism
    • Semantic Versioning
      • (+) Can distinguish additions from other changes
        • (-) This is not needed for our use case, since any addition to an expression will break for older evaluators even if the major version matches
      • (-) Requires more characters to account for the added expressiveness
        • This may be relevant depending on where it has to be encoded
    • Use version numbers of Nix stable releases for specifying the version of the Nix language
      • (+) For users, the current Nix version is more obvious to look up than builtins.langVersion
      • (-) Would tie alternative Nix language evaluators to the rest of Nix
        • (-) One can add a command line option such that it is not more effort than nix --version
          • (+) That requires adding another built-in to the public API
        • (-) Using a language feature requires an additional step from users to determine the current version
          • (-) Requires adding another command line option to the public API
      • (+) The Nix language version is decoupled from Nix version numbering
        • (+) It changes less often than the Nix version
          • (-) That was probably due to making changes being so hard
            • (+) The language changing slowly is a desirable property for wider adoption
        • (-) There are two version numbers to keep track of
  2. The language version for Nix expressions is denoted in special syntax, at the beginning of a parse unit.

    Arguments
    • (+) Will prevent older evaluators from evaluating expressions written in a newer language version following this proposal (no forward compatibility)
    • (+) Precedent: Perl use VERSION
    • (-) The errors on older evaluators will be opaque
      • (+) Syntax can be made self-describing and human-readable to alleviate that to some extent
    • (-) The syntax has to be fixed forever if one wanted to provide meaningful errors on language upgrades
      • This has the same trade-offs as when introducing the new syntax to begin with
    • (-) Editor support is made harder, since it requires switching the language based on file contents
      • (+) Making the language version accessible at all will probably outweigh the costs
    Alternatives
    • Use a magic comment at the beginning of the file

      • (+) Allows for gradual adoption: opt-in until semantics is implemented in Nix and the first backwards-incompatible change to the language is introduced
        • (-) This will produce surprising results if the next language version preserves syntax but changes semantics (forward compatibility)
        • (-) Requires the first language version following this proposal to be syntactically incompatible with the current language to avoid forward compatibility
      • (+) Can be made self-describing and human-readable
      • (+) Follows a well-known convention of using magic numbers in files
      • (-) May give the impression that changing the language is harmless
        • (+) The convention itself is harmless and independent of the development culture around the language
        • (-) There is a chance of abusing the magic comment for more metadata in the future
      • (-) At least one form of comment is forever bound to begin with # to maintain compatibility
        • Forward compatibility is undesirable anyway
      • (-) Requires support by all tooling, which otherwise loses the semantics
    • Use assert builtins.langVersion in the first line of the file

      • (+) Produces more telling error messages in existing evaluators
      • (+) Future evaluators could be augmented to treat this specially for better errors
        • (-) Special treatment may confuse users: why does assert at the beginning of a file work differently than somewhere else?
      • (-) Bulky expression that can only be replaced by the magic string solution or kept forever
    • Denote the language version in the file extension

      • (+) Sidesteps misinterpretation by keeping metadata out of the actual data
      • (-) In general it does not prevent forward compatibility with current evaluators.
      • (+) Makes accidental mixing of versions impossible at the syntax level
        • Have to specify the file extension when importing a file
        • (-) Have to rename all files in a project to change the version
          • This is somewhat worse than replacing a magic string which is fixed to the beginning of the file
      • (-) Makes filenames longer, introduces visual noise
        • This is the cost of being explicit
      • (-) Enforces narrow restrictions on what information can be encoded and how
        • The only reasonable alternative is -, e.g. default.7-nix
          • . (-) Nixpkgs has been packaging Linux kernels as linux-${major}.${minor}.nix
            • This may break backwards compatibility of newer evaluators with existing code in surprising ways
          • No separator: (-) Hard to discern visually
          • - (+) Visually not intrusive
          • _ (-) Visually more intrusive
          • ^ (-) Overlaps with derivation output syntax
          • All of the following characters will interfere with some tooling:
            • ! - shells
            • " - shells
            • # - URLs
            • $ - shells
            • % - URLs
            • & - shells, URLs
            • + - URLs
            • , - natural language
            • / - paths, URLs
            • : - URLs
            • ; - shells
            • = - URLs, Nix language
            • ? - URLs
            • @ - URLs, Nix language
            • \ - Windows paths
      • (-) default.nix resolving needs specification:
        • If for import ./foo, all of ./foo/default.nix{6,7,8} exist, pick the one matching the version used by the evaluator, otherwise fail
          • Then you'd have to specify a file using a different version explicitly
    • Language versioning per "project" in a sidecar file

      • (+) This would easily allow inheriting the language version across imports (obviating many specifications in this proposal)
      • (-) There is currently no notion of "project" in the Nix language
        • (-) Attempting to establish one would be a large undertaking and not immediately help solving the problem at hand
      • (-) Cannot be introduced gradually, particularly relevant for a large codebase like Nixpkgs
      • (-) It would require a separate language to encode project metadata such as the language version
        • The edition field was removed from the flakes schema for that reason, as it did not allow distinguishing data from metadata
        • (+) Other languages do the same (Python, Haskell, Rust, JavaScript, ...)
          • (-) Recursive (albeit smaller) problem of managing the additional language for project metadata
  3. The following syntax is used for declaring the language version of a Nix expression:

    version \d+;
    

    If no language version is specified in a Nix file, the file is assumed to be written in version 6 (the version implemented in the stable release of Nix at the time of writing this RFC).

    The syntax is open for bikeshedding. Alternatives should be very short and self-describing.

    Arguments
    • (+) Short and fairly self-descriptive
    • (+) Not an invasive change
    Alternatives
    • use v\d+;
      • (+) Shorter
      • (-) Does not explain much
    • Nix language version \d+
      • (+) Says it all
      • (-) Very long
  4. The language version declaration is optional for bare Nix expressions, and can be specified with a parameter to the evaluator. If no version is specified in a bare Nix expression, assume the most recent language version supported by the evaluator.

    Alternatives
    • Make it mandatory

      • (-) This will be very inconvenient to use in the REPL
    • Assume the evaluator's current version

      • (-) When the evaluator advances in language version, evaluation may fail on existing code
      • (-) Defeats the purpose of explicit versioning: Which evaluator to use for a given file is left unspecified
        • (-) Following the latest evaluator version may inadvertently break the code for older evaluators
      • (+) Don't have to look up the latest version of the Nix language when writing code
      • (+) Does not clutter the file names for what is supposed to be the latest version of the code
  5. Language versions prior to 6 are not supported.

    Arguments
    • (+) Does not require additional development effort
    • (+) Prior language versions are not fully supported by current code already, and the rest of this proposal argues to deprecate old versions in the future in order to keep the implementation manageable
    • (-) Legacy code will not get support for managing compatibility
  6. The Nix language evaluator provides a command to output the most recent Nix language version. This command provides options to list all Nix language versions supported by the evaluator.

    Arguments
    • (+) This is for convenience to determine which features are available
  7. builtins.langVersion returns the language version used for evaluating the given expression.

    Arguments
    • (+) Could make use of it for generating Nix expressions programmatically and annotating them with the correct version
    • (+) builtins.langVersion is already part of the stable interface, this way we can make it pure
    • (-) Requires maintaining more API surface without a clear use case
    Alternatives
    • Return the latest language version instead

      • (-) Doesn't help to determine what is used for evaluating the current expression
        • (+) Might provide opportunities for forward-compatible Nix code
        • (-) Brittle and defeats the purpose of this RFC
      • (-) Impure, the value depends on the environment
    • Don't expose builtins.langVersion at all

      • (+) No need to have this if the Nix files are versioned
      • (+) builtins.langVersion is not documented and not widely used
      • (-) Requires a Nix language version bump to implement this RFC
  8. Each time the language specification (currently as embodied by the Nix language evaluator) is changed, the language version must be incremented.

    Arguments
    • (+) The principled solution: guarantees reproducibility given a fixed language version
    • (-) Implies additional overhead in development effort:
      • Either for Nix maintainers to accommodate that practice in the release lifecycle
        • For example, one would have to batch language changes for a version bump to limit the number of increments
          • (+) This would be beneficial for alternative implementations in terms of churn and effort to keep up
      • Or implementors of alternative evaluators catching up with changes
        • (+) Specifying the language precisely via the version actually offers alternative implementations an alternative to catching up: only support a given language version
    • (+) This essentially nudges one to organise Nix language development (specification or evaluator implementation) to be more independent of the rest of Nix. This is good, since it in turn forces stronger separation of concerns and more architectural clarity
    • (-) Prohibits best-effort attempts at evaluating expressions with possibly incompatible evaluators
      • (+) With the proposed level of strictness, one doesn't have to rely on best effort but can instead be explicit
    • (-) Fixing evaluator bugs (i.e., clearly unintended behavior) after releases would technically require a version bump and therefore (theoretically) cooperation by expression authors
      • This could be communicated with an increment in the Nix patch-level version, as is already common practice
    Alternatives
    • Bump version only when evaluation result on prior version evaluators would be substantially different
      • (+) Leaves room for judgement by developers
        • (+) Allows controlling progression of versions to some degree
      • (-) Conversely, leaves room for sneaking in breaking changes unannounced
        • (-) This loses compatibility guarantees we'd get from a stricter paradigm
        • (-) Deprives expression authors of ability to be selective with evaluator versions
      • (-) Due to hashing this is often not much different from taking any change into account
    • Bump version when prior evaluators would fail
      • (-) Derivation hashes may change between evaluator versions if different evaluation results were allowed within one version
        • This amounts to giving up on specifying the language exhaustively using a version label
      • (+) Fewer version increments or precautions required given current development practice
  9. Whenever Nix drops support for evaluating prior language versions, a major version bump is required.

    Example: Assuming the current language version is 8, the Nix release version is 2.20, and support for language version 6 is dropped, the next Nix release must be version 3.0

    Arguments
    • (+) This enforces that existing code that works will not break inadvertently when upgrading Nix
  10. Semantics are preserved across file boundaries for past language versions.

    Example: Best-effort interoperability

    Arguments
    • (+) This allows for incremental evolution of code bases without having to change existing code at all.
    • (+) Expressions that are valid today will still be valid as long as a given evaluator supports the language version they are written in
    • (+) The root of evaluation is always at a language version determined by the user, and authors of expressions in newer language versions are responsible (and made aware of the fact by the file name signaling proposed here) to interoperate with existing code
    • (-) Passing around incompatible values (e.g. builtins or data types) between language versions can lead to surprising errors
      • (+) We can postpone dealing with particular issues as they arise, but the general setup should support most cases on a best(-and-minimal)-effort basis
      • (+) In any case, such breakages will only happen when adding new code, and will never break existing code
    Alternatives
    • Do not support version interoperability at all
      • (+) Avoids any unforeseen issues at no cost
      • (-) Prohibits incremental changes, as any language update will require updating all files involved
    • Do not support passing values to functions from older language versions
      • (-) Calling functions is the most common use case, and you can hardly do anything without it
  11. It is not possible to import expressions written in newer versions.

    Example: Expressions are not forward compatible

  12. The Nix evaluator provides options to issue deprecation warnings and errors against a language version newer than the one under evaluation. A detailed design is provided in the next section.

    Arguments
    • (+) This allows systematic migration of existing code written for prior evaluators to the most recent language version
    • (+) It will notify users about what's going on instead of just breaking
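The version selection in items 3 to 5 can be sketched as follows. This is a minimal Python model, not part of any Nix implementation, and it rests on several assumptions:

- It uses the proposed `version N;` syntax, which is still open for bikeshedding.
- Only whitespace may precede the declaration.
- The evaluator is hypothetical and supports versions 6 and 7.
- `split_version`, `SUPPORTED_VERSIONS` and `bare_default` are illustrative names. `bare_default` stands in for the evaluator parameter from item 4.

```python
import re
from typing import Optional, Tuple

# Hypothetical evaluator that supports language versions 6 and 7.
SUPPORTED_VERSIONS = frozenset({6, 7})
# Files without a declaration are written in version 6 (item 3).
DEFAULT_FILE_VERSION = 6
# Oldest version that can be evaluated at all (item 5).
MIN_VERSION = 6

# Proposed declaration `version <digits>;` at the start of a parse unit.
# Assumption: only whitespace may precede it (no comments).
_DECLARATION = re.compile(r"\A\s*version\s+([0-9]+)\s*;")


def split_version(
    source: str, is_file: bool, bare_default: Optional[int] = None
) -> Tuple[int, str]:
    """Return (language version, source without the declaration)."""
    match = _DECLARATION.match(source)
    if match:
        version = int(match.group(1))
        rest = source[match.end():]
    elif is_file:
        version, rest = DEFAULT_FILE_VERSION, source
    else:
        # Bare expressions (e.g. in the REPL) use the evaluator parameter if
        # given, else the newest supported version (item 4).
        version = bare_default if bare_default is not None else max(SUPPORTED_VERSIONS)
        rest = source
    if version < MIN_VERSION:
        raise ValueError(f"Nix language version {version} is not supported (prior to {MIN_VERSION})")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported Nix language version {version}")
    return version, rest
```

For example, `split_version("version 7;\n1 + 1", is_file=True)` yields `(7, "\n1 + 1")`, while the same source without the declaration yields version 6 for a file and version 7 for a bare expression.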

Deprecation warnings and errors

  1. Each language construct to deprecate relative to a prior version is given a symbolic name. There is a way to refer to all language constructs.

    Examples: url-literal, let-body, int-leading-zeroes

    These symbolic names must not be reused in future versions. Names for experimental language constructs of prior versions can be reused.

    Arguments
    • (+) Disallowing reuse precludes dealing with change of meaning across versions
    • (+) Not considering experimental features simplifies their handling and is not required by them being exempt from compatibility guarantees.
  2. The following options for issuing warnings and errors are supported:

  • Default:
    • Issue warnings on deprecated language constructs without considering imported expressions written in prior versions
  • Don't warn (selection):
    • Do not issue warnings for selected language constructs
  • Errors instead of warnings (selection):
    • Throw an error instead of a warning for selected language constructs
    • The error setting overrides the warning setting
  • Recursive (version bound):
    • Issue warnings or errors on imported expressions written in prior versions (at or above the version bound)
  • Verbose mode. During evaluation:
    • In non-verbose mode, issue a message once for each deprecated construct
    • In verbose mode, issue a message for each occurrence

For example, this can be exposed as the following flags:

--lang-no-warn=all
--lang-no-warn="url-literal let-body int-leading-zeroes non-attr-select"
--lang-error=all
--lang-error="url-literal let-body int-leading-zeroes non-attr-select"
--lang-warn-recursive=6
--lang-warn-verbose
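The options above could be resolved for each occurrence of a deprecated construct roughly as in the following Python sketch. It makes these assumptions:

- It uses the provisional flag names listed above. These are not options of any released Nix.
- The caller has already detected which deprecated construct occurred and in which file version.
- "The version under evaluation" is the version of the root expression.
- `LangWarnSettings`, `parse_flags` and `action` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

ALL = "all"  # refers to every deprecated construct


@dataclass(frozen=True)
class LangWarnSettings:
    no_warn: FrozenSet[str] = frozenset()
    error: FrozenSet[str] = frozenset()
    recursive_bound: Optional[int] = None  # None: prior versions are not checked
    verbose: bool = False


def parse_flags(argv: List[str]) -> LangWarnSettings:
    """Parse the provisional --lang-* flags; other arguments are ignored."""
    no_warn, error = set(), set()
    bound: Optional[int] = None
    verbose = False
    for arg in argv:
        name, _, value = arg.partition("=")
        if name == "--lang-no-warn":
            no_warn.update(value.split())  # space-separated names or "all"
        elif name == "--lang-error":
            error.update(value.split())
        elif name == "--lang-warn-recursive":
            bound = int(value)
        elif name == "--lang-warn-verbose":
            verbose = True
    return LangWarnSettings(frozenset(no_warn), frozenset(error), bound, verbose)


def action(construct: str, file_version: int, root_version: int,
           settings: LangWarnSettings) -> Optional[str]:
    """Return "error", "warn" or None for one deprecated construct."""
    # Files in prior versions are only checked in recursive mode, and only
    # if their version is at or above the bound.
    if file_version < root_version:
        bound = settings.recursive_bound
        if bound is None or file_version < bound:
            return None
    # The error setting overrides the warning setting.
    if construct in settings.error or ALL in settings.error:
        return "error"
    if construct in settings.no_warn or ALL in settings.no_warn:
        return None
    return "warn"
```

Checking the error set before the no-warn set encodes the rule that the error setting overrides the warning setting, so `--lang-error=all --lang-no-warn=all` still yields errors.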

The naming may need some bikeshedding. For example, one could use the same syntax as C compilers (probably not, though):

  • -Wall
  • -Wno-
  • -Werror=
Arguments
  • (+) Disallowing reuse avoids dealing with change of meaning across versions
  • (+) Not considering experimental features simplifies their handling and is not required by them being exempt from compatibility guarantees.
  • (-) Maintenance burden for everyone using these old constructs, or evaluating old revisions of Nixpkgs.
Alternatives
  • Opt-in warnings
    • (-) Can't really make breaking changes as people won't be warned ahead of time
  • No warnings, just errors
    • (-) Doesn't offer a transition window to users
    • (+) Easier to implement
  3. At the end of the evaluation, print statistics and explanations. The specifics of displaying warnings and errors are up to the implementation, but should include the symbolic name of the language construct in question.
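The reporting side can be sketched as a small collector. In non-verbose mode each construct produces one message, and in verbose mode every occurrence does. All occurrences are counted, and the end-of-evaluation summary lists constructs by symbolic name. The class name and message wording are hypothetical, loosely modeled on the sample output in the deprecation warnings example.

```python
from collections import Counter
from typing import List


class DeprecationReporter:
    """Collects deprecation messages and end-of-evaluation statistics."""

    def __init__(self, verbose: bool = False) -> None:
        self.verbose = verbose
        self.counts: Counter = Counter()
        self.messages: List[str] = []

    def report(self, construct: str, location: str) -> None:
        self.counts[construct] += 1
        # Non-verbose: one message per construct; verbose: one per occurrence.
        if self.verbose or self.counts[construct] == 1:
            self.messages.append(
                f"warning: deprecated language construct ({construct}) at {location}"
            )

    def summary(self) -> List[str]:
        """Statistics printed at the end of evaluation, keyed by symbolic name."""
        if not self.counts:
            return []
        lines = ["warning: The following deprecated features were used:"]
        for construct, n in sorted(self.counts.items()):
            lines.append(f"  - {construct}, {n} {'time' if n == 1 else 'times'}")
        return lines
```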

Examples and Interactions

Show the current Nix language version

$ nix --language-version
7
$ nix --supported-language-versions
6
7

Version interoperability

Versioned built-ins can be passed across file boundaries

# a.nix
version 6;
builtins.langVersion
# b.nix
version 7;
[ (import ./a.nix) builtins.langVersion ]
$ nix-instantiate --eval --strict b.nix
[ 6 7 ]

Expressions are not forward compatible

# a.nix
version 6;
import ./b.nix
# b.nix
version 7;
builtins.langVersion
$ nix-instantiate --eval a.nix
error: unsupported Nix language version 7
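The rule from item 11 amounts to a check on every import edge. The Python sketch below assumes each file's declared version is already known and walks a static import graph. This is a simplification: a real evaluator would perform the check when `import` is forced, because import paths can be computed during evaluation. `check_imports` is a hypothetical name.

```python
from typing import Dict, List


def check_imports(root: str, versions: Dict[str, int],
                  imports: Dict[str, List[str]]) -> None:
    """Reject any import of a file written in a newer language version."""
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        for target in imports.get(path, []):
            # Same or older versions are fine (item 10); newer ones are not (item 11).
            if versions[target] > versions[path]:
                raise ValueError(
                    f"{path}: unsupported Nix language version {versions[target]}"
                    f" in import of {target}"
                )
            stack.append(target)
```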

Best-effort interoperability

# a.nix
version 6;
{ increment }:
increment 1
# b.nix
version 7;
import ./a.nix { increment = x: x + 1; }

Since increment written in version 7 carries its own implementation with it, forcing it within an expression written in version 6 just works:

$ nix-instantiate --eval b.nix
2
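One way to see why this works is to model it with ordinary lexical closures. A function keeps the environment of the file that defined it, including that file's builtins, so calling it from an older file does not change its semantics. The Python sketch below assumes each parse unit is evaluated in a base environment matching its declared version. `base_env` and `Closure` are hypothetical stand-ins for evaluator internals.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Env = Dict[str, Any]


def base_env(version: int) -> Env:
    """Hypothetical builtins visible to a parse unit of the given version."""
    return {"langVersion": version}


@dataclass
class Closure:
    env: Env  # captured where the function was defined
    body: Callable[[Env, Any], Any]

    def __call__(self, arg: Any) -> Any:
        # The body runs in its captured environment, not the caller's.
        return self.body(self.env, arg)


# b.nix (version 7) defines the functions it passes on.
b_env = base_env(7)
increment = Closure(b_env, lambda env, x: x + 1)
defining_version = Closure(b_env, lambda env, _: env["langVersion"])


def a_nix(args: Env) -> Any:
    """a.nix (version 6): `{ increment }: increment 1`."""
    return args["increment"](1)
```

Passing `defining_version` instead of `increment` shows that the function still sees version 7 builtins when called from the version 6 file.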

Pathological example

Code in newer versions usually interacts with existing code by calling functions. When passing values from newer versions to functions from older versions of the language, interoperability can only be supported on a best-effort basis.

# a.nix
version 6;
{ value }:
value + 1

Here we pretend that language version 7 introduced a new value type and syntax for complex numbers:

# b.nix
version 7;
import ./a.nix { value = %5 + 7i%; }
$ nix-instantiate --eval b.nix
error: unsupported value type `complex` at built-in operator `+`

Deprecation warnings

$ nix eval --json --file ./test.nix
warning: URL literals are deprecated (url-literal)
         please replace this with a string: "https://nixos.org"

       at test.nix:1:1:

            1| https://nixos.org
             | ^

"https://nixos.org"

warning: The following deprecated features were used:
  - url-literal (https://nixos.org), 1 time

  Add `--lang-warn-verbose` to show all occurrences
  Use `--lang-no-warn=url-literal` to disable this warning.
  Use `--lang-error=url-literal` to issue errors instead of warnings.

Prior art

  • Perl use VERSION

    Many similarities, with versions declared per file and having to deal with interoperability.

  • Rust edition field

    Rust has an easier problem to solve. Cargo files are written in TOML, so the edition information does not have to be part of Rust itself.

  • Haskell language extensions

    Haskell allows enabling separate language features per file.

  • JavaScript modules

    • .cjs and .mjs extensions for CommonJS and ES module syntax variants
    • function() { "use strict"; return 10 }
  • Flakes edition field

    There had been an attempt to include an edition field into the Flakes schema. It did not solve the problem of having to evaluate the Nix expression using some version of the grammar.

Future work

  • Define rules deciding when a change to the language is appropriate, to avoid version proliferation and limit complexity of implementations.