mirror of
https://github.com/NixOS/rfcs.git
synced 2025-11-09 03:56:11 +01:00
[RFC 0133] Git hashing and Git-hashing-based remote stores (#133)
* ipfs: Copy Template * ipfs: Start drafting * ipfs: Finish draft * ipfs: Expand discussion of managing complexity * ipfs: Fix typos Thanks! * ipfs: Fix more typos Thanks! * ipfs: FInish motivation on source distribution and archival * ipfs: Rename now that we have number * Apply suggestions from code review Thanks! Co-authored-by: Kevin Cox <kevincox@kevincox.ca> * Fix typos Thanks! Co-authored-by: Adam Joseph <54836058+amjoseph-nixpkgs@users.noreply.github.com> * 133: Add shepherd team! Co-authored-by: Eelco Dolstra <edolstra@gmail.com> * 133: Fix shepherds list mjoseph -> amjoseph * 133: Move non-`git` steps to future work * 133: Move one more section out of future work * 133: Move IPFS-specific motivation to future work too * 133: Rename feature in light of changes * 133: Rename RFC in light of changes * 133: Discuss the downside of git's file system model being different * Split future work, clean up Nix-agnostic stores section * Fix numerious typos Thanks, all of you! Co-authored-by: Kevin Cox <kevincox@kevincox.ca> Co-authored-by: Adam Joseph <54836058+amjoseph-nixpkgs@users.noreply.github.com> Co-authored-by: Linus Heckemann <git@sphalerite.org> * Add RFC open PR date * Be clearer about not supporting references to start * Update rfcs/0133-git-hashing.md Co-authored-by: Kevin Cox <kevincox@kevincox.ca> * Rip out both RFC-scal Future Work sections They are now in an `ipfs-2` branch in this repo. * Remove "Build adoption through seamless interop" That can go in a separate blog post. * Apply suggestions from code review Thank you both!! Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io> Co-authored-by: Ryan Lahfa <masterancpp@gmail.com> * Slim down the layering section The other stuff is already in flight, we don't need to talk about it so much here. Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io> --------- Co-authored-by: Kevin Cox <kevincox@kevincox.ca> Co-authored-by: Adam Joseph <54836058+amjoseph-nixpkgs@users.noreply.github.com> Co-authored-by: Eelco Dolstra <edolstra@gmail.com> Co-authored-by: Linus Heckemann <git@sphalerite.org> Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io> Co-authored-by: Ryan Lahfa <masterancpp@gmail.com>
This commit is contained in:
parent
8c86187e6d
commit
f4706550b6
1 changed files with 164 additions and 0 deletions
164
rfcs/0133-git-hashing.md
Normal file
164
rfcs/0133-git-hashing.md
Normal file
|
|
@ -0,0 +1,164 @@
|
|||
---
|
||||
feature: git-hashing
|
||||
start-date: 2022-08-27
|
||||
author: John Ericsion (@Ericson2314) on behalf of [Obsidian Systems](https://obsidian.systems)
|
||||
co-authors: (find a buddy later to help out with the RFC)
|
||||
shepherd-team: edolstra, kevincox, gador, @amjoseph-nixpkgs
|
||||
shepherd-leader: amjoseph-nixpkgs
|
||||
related-issues: (will contain links to implementation PRs)
|
||||
---
|
||||
|
||||
# Summary
|
||||
[summary]: #summary
|
||||
|
||||
Integrate Git hashing with Nix.
|
||||
|
||||
Nix should support content-addressed store objects using git blob + tree hashing, and Nix-unaware remote stores that serve git objects.
|
||||
|
||||
This follows the work done and described in https://github.com/obsidiansystems/ipfs-nix-guide/ .
|
||||
|
||||
# Motivation
|
||||
[motivation]: #motivation
|
||||
|
||||
## Binary distribution
|
||||
|
||||
Currently distributing Nix binaries takes a lot of bandwidth and storage.
|
||||
This is a barrier to being a Nix user in areas of slower internet --- which includes the vast majority of the world's population at this time.
|
||||
This is also a barrier to users running their own caches.
|
||||
|
||||
Content-addressing opens up a *huge* design space of solutions to get around such problems.
|
||||
|
||||
The first steps proposed below do *not* tackle this problem directly, but it lays the ground-work for future experiments in this direction.
|
||||
|
||||
## Source distribution and archival
|
||||
|
||||
Source code used by Nix expressions frequently goes off-line. It would be beneficial if there was some resistance to this form of bitrot.
|
||||
The Software Heritage archive stores much of the source code that Nix expressions use. They would be a natural partner in this effort.
|
||||
|
||||
Unfortunately, as https://www.tweag.io/blog/2020-06-18-software-heritage/ describes at the end, a major challenge is the way Nix content-addresses software.
|
||||
First of all, Nix hashes sources in bespoke ways that no other project will adopt.
|
||||
Second of all, hashing tarballs instead of the underlying files leads to non-normative details (compression, odd perms, etc.).
|
||||
|
||||
We should natively support Git file hashing, which is supported both by Git repos and Software Heritage.
|
||||
This will completely obliterate these issues.
|
||||
|
||||
Overall, we are building out a uniform way to work with source code, regardless of its origins or the exact tools involved.
|
||||
|
||||
# Detailed design
|
||||
[design]: #detailed-design
|
||||
|
||||
Each item can be done separately provided its dependent items are also done.
|
||||
These are the items we wish to commit to at this time.
|
||||
(The goals mentioned under [future work](#future-work) are, in a separate document, also broken down into a dependency graph of smaller steps.)
|
||||
|
||||
## Git file hashing
|
||||
|
||||
- **Purpose**: Source distribution and archival
|
||||
|
||||
In addition to the various forms of content-addressing Nix supports today ("text", "fixed" with either "flat" or "nar" serialization of file system objects), Nix should support Git hashing.
|
||||
This support entails two basic things:
|
||||
|
||||
- Content addresses are used to compute store paths.
|
||||
- Content addresses are used to verify store object integrity.
|
||||
|
||||
Git hashing would not (in this first proposed version) support references, since references in Nix's sense are not part of Git's data model.
|
||||
This is OK for now; encoding references is not needed for the intended initial use-case of exchanging source code.
|
||||
|
||||
## Git file hashing for `buitins.fetch*`
|
||||
|
||||
- **Purpose**: Source distribution and archival
|
||||
- **Depends on**: Git file hashing
|
||||
|
||||
The built-in fetchers can also be made to work with Git file hashing just as they support the other types.
|
||||
In addition, Git repo fetching can leverage this better to than the other formats since the data in Git repos is already content-addressed in this way.
|
||||
|
||||
## Nix-agnostic content-addressing "stores"
|
||||
|
||||
- **Purpose**: All distribution
|
||||
|
||||
We want to be able to substitute from an arbitrary store (in the general, non-Nix sense) of content-addressed objects.
|
||||
For the purpose of this RFC, that means querying objects by Git hash, and being able to trust the results because we can verify them against the Git hash.
|
||||
|
||||
In the implementation, we could accomplish this in a variety of ways.
|
||||
|
||||
- On one extreme, we could have a `ContentAddressedSubstitutor` abstract interface completely separate from Nix's `Store` interface.
|
||||
|
||||
- On the other extreme, we can generalize `Store` itself to allow taking content addresses or store paths as references.
|
||||
|
||||
Exactly how this shakes out is to be determined post-RFC, but it would be nice to use Nix-agnostic persistent methods with `--store` and `--substituters`.
|
||||
|
||||
If we do go the route of modifying the `Store` class, note that these things will need to happen:
|
||||
|
||||
- Many store interface methods that today take store paths will need to also accept names & content address pairs.
|
||||
|
||||
For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done with store paths, so the current interface is fine.
|
||||
But for Nix-agnostic stores, store paths are rather useless as a key type because Nix-agnostic tools don't know about them.
|
||||
Those store can, however, understand content addresses.
|
||||
And from such a name + content address, we can always produce a store path again, so there is no loss of functionality with existing stores.
|
||||
|
||||
- Relax `ValidPathInfo` to merely require that *either* the pair of `NarHash` and `NarSize` or just `CA` alone be defined.
|
||||
|
||||
As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object.
|
||||
But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself.
|
||||
|
||||
Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compatibility and don't come with a cost.
|
||||
Only new Nix-agnostic store types would take advantage of these new, relaxed rules.
|
||||
|
||||
# Examples and Interactions
|
||||
[examples-and-interactions]: #examples-and-interactions
|
||||
|
||||
We encourage anyone interested to check our tutorial in https://github.com/obsidiansystems/ipfs-nix-guide/ which demonstrates the above functionality.
|
||||
Note at the time of writing this guide uses our original 2020 fork of Nix.
|
||||
|
||||
# Drawbacks
|
||||
[drawbacks]: #drawbacks
|
||||
|
||||
## Complexity
|
||||
|
||||
The main cost is more complexity to the store layer.
|
||||
For a few reasons we think this is not so bad.
|
||||
|
||||
Most importantly is the division of the work into a dependency graph of steps.
|
||||
This allows us to slowly try out things like IPFS that leverage Git hashing, and not commit to more change than we want to up front.
|
||||
|
||||
Even if we do end up adopting everything though, we think for the following two reasons the complexity can still be kept manageable:
|
||||
|
||||
1. Per the abstract vs concrete model of the Nix store in https://github.com/NixOS/nix/pull/6877, everything we are doing is simply flushing out alternative interpretations of the abstract model.
|
||||
This is the sense in which we are, per the Scheme mantra, "removing the weaknesses and restrictions that make additional features appear necessary":
|
||||
Instead of extending the model with new features, we are relaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same.
|
||||
|
||||
2. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it.
|
||||
Layering will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced preventing a combinatorial explosion in complexity.
|
||||
That frees up "complexity budget" for projects like this.
|
||||
|
||||
## Git and Nix's file system data models do not entirely coincide
|
||||
|
||||
Nix puts the permission info of a file (executable bit for now) with that file, whereas Git puts it with the name and hash in the directory.
|
||||
The practical effect of this discrepancy is that a root file (as opposed to directory) in Nix has permission info, but does not in Git.
|
||||
|
||||
If we are trying to convert existing Nix data into Git, this is a problem.
|
||||
Assuming we treat "no permission bits" as meaning "non-executable", we will have a partial conversion that will fail on executable files without a parent directory.
|
||||
Tricks like always wrapping everything in a directory get around this, but then we have to be careful the directory is exactly as expected when "unwrapping" in the other direction.
|
||||
|
||||
For now, we only focus on ingesting data *from* Git *to* Nix, and this side-steps the issue.
|
||||
That mapping is total, i.e. all Git data can be mapped, and injective, i.e. each Git data has a unique Nix data representative (though not surjective, i.e. not all Nix data can be represented as a piece of Git data), and so there is no problem for now.
|
||||
|
||||
# Alternatives
|
||||
[alternatives]: #alternatives
|
||||
|
||||
The dependency graph of steps can be sliced to save some for future work.
|
||||
For now they are all written together, but during the RFC meetings we will decide which steps (if any) to ratify now, and which steps to save for later.
|
||||
|
||||
# Unresolved questions
|
||||
[unresolved]: #unresolved-questions
|
||||
|
||||
None at this time.
|
||||
|
||||
# Future work
|
||||
[future]: #future-work
|
||||
|
||||
- Integrate with outside content-addressing storage/transmission like
|
||||
|
||||
- The Software Heritage archive
|
||||
|
||||
- IPFS
|
||||
Loading…
Add table
Add a link
Reference in a new issue