cURL is not very strict about validating the URLs passed to it. We
should reflect this in our handling of URLs that we get from the user
in <nix/fetchurl.nix> or builtins.fetchurl. ValidURL was an attempt to
rectify this, but it turned out to be too strict. The only good way to
resolve this is to pass (in some cases) the user-provided string
verbatim to cURL. Other usages in libfetchers still benefit from using
structured ParsedURL and validation, though.
```
nix store prefetch-file --name foo 'https://cdn.skypack.dev/big.js@^5.2.2'
error: 'https://cdn.skypack.dev/big.js@^5.2.2' is not a valid URL: leftover
```
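As a rough sketch of the idea (using libcurl directly; this is not the
actual file transfer code, and the helper name is made up), the
user-supplied string is handed to cURL verbatim rather than being
round-tripped through our own stricter parser:
```
// Sketch only: hand the user-provided string to cURL as-is instead of
// validating/normalizing it with our own URL parser first.
#include <curl/curl.h>
#include <string>

bool fetchVerbatim(const std::string & userUrl)
{
    CURL * curl = curl_easy_init();
    if (!curl) return false;

    // No pre-validation: cURL accepts strings our parser rejects, e.g. the
    // '^' in "https://cdn.skypack.dev/big.js@^5.2.2".
    curl_easy_setopt(curl, CURLOPT_URL, userUrl.c_str());

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}

int main()
{
    return fetchVerbatim("https://cdn.skypack.dev/big.js@^5.2.2") ? 0 : 1;
}
```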
(cherry picked from commit 47f427a172)
I (@Ericson2314) messed up. We were supposed to test the status quo
before landing any new changes, and also there is one change that is
not quite right (relative paths).
I am reverting for now, and then backporting the test suite to the old
situation.
This reverts commit 04ad66af5f.
Git URLs can also use scp-style syntax, just like git itself.
This change augments the function fixGitURL to better handle scp-style
URLs through a minimal parser rather than a regex, which has proven
brittle.
* Support for IPv6 added
* New test cases added for fixGitURL
* Clearer documentation on purpose and goal of function
* More `std::string_view` for performance
* A few more URL tests
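As a rough illustration of the scp-style handling (not the actual
fixGitURL code; the function name and details are made up), split
`user@host:path` on the first ':' outside a bracketed IPv6 literal and
rewrite it as an ssh:// URL:
```
// Illustrative only: rewrite scp-style "git@github.com:owner/repo.git"
// (and "user@[::1]:repo.git" for IPv6) into an ssh:// URL.
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

std::optional<std::string> scpToSshUrl(std::string_view s)
{
    // An scp-style reference has no scheme, so "://" rules it out.
    if (s.find("://") != std::string_view::npos) return std::nullopt;

    // Skip over a bracketed IPv6 literal so its ':' characters are not
    // mistaken for the host/path separator.
    std::size_t searchFrom = 0;
    if (auto open = s.find('['); open != std::string_view::npos) {
        auto close = s.find(']', open);
        if (close == std::string_view::npos) return std::nullopt;
        searchFrom = close;
    }

    auto colon = s.find(':', searchFrom);
    if (colon == std::string_view::npos) return std::nullopt;

    return "ssh://" + std::string(s.substr(0, colon)) + "/"
           + std::string(s.substr(colon + 1));
}

int main()
{
    assert(*scpToSshUrl("git@github.com:NixOS/nix.git")
           == "ssh://git@github.com/NixOS/nix.git");
    assert(*scpToSshUrl("git@[fe80::1]:repo.git")
           == "ssh://git@[fe80::1]/repo.git");
    assert(!scpToSshUrl("https://github.com/NixOS/nix.git"));
}
```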
Fixes #5958
The URL should not be normalized before handing it off to cURL,
because builtin fetchers like fetchTarball/fetchurl are expected to
work with arbitrary URLs that might not be RFC 3986 compliant. For
those cases Nix should not normalize URLs, though validation is fine.
parseURL and cURL are supposed to accept the same set of URLs, since
they implement the same RFC.
See the new extensive doxygen in `url.hh`.
This fixes fetching gitlab: flakes.
Paths are now stored as a std::vector of individual path segments,
which can themselves contain path separators '/' (%2F).
This is necessary to make GitLab's /projects/ API work.
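A small illustration of why segment-wise storage matters (hypothetical
code, not the actual ParsedURL rendering): a '/' inside a single
segment has to be emitted as %2F, which is only possible when segments
are kept separate instead of pre-joined:
```
// Illustrative only: join path segments, percent-encoding any '/' they
// contain, as needed for GitLab's /projects/<owner>%2F<repo> API.
#include <cassert>
#include <string>
#include <vector>

std::string renderPath(const std::vector<std::string> & segments)
{
    std::string out;
    for (const auto & seg : segments) {
        out += '/';
        for (char c : seg) {
            if (c == '/') out += "%2F";
            else out += c;
        }
    }
    return out;
}

int main()
{
    // The last element is a *single* segment that happens to contain '/'.
    assert(renderPath({"api", "v4", "projects", "NixOS/nix"})
           == "/api/v4/projects/NixOS%2Fnix");
}
```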
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
This allows us to replace some very hacky and incorrect string
concatenation in `HttpBinaryCacheStore`. It will be especially useful
with #13752, where today's hacks have started to cause problems in
practice, not just in theory.
Also make `fixGitURL` return a `ParsedURL`.
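As an illustration of the kind of structured manipulation this enables
(using Boost.URL directly as a stand-in for whatever `ParsedURL`
exposes; the cache URL is made up), appending a path segment through
the parsed representation keeps delimiters and the query string intact,
which naive concatenation does not:
```
// Illustrative only: append a path segment via the parsed representation
// instead of string concatenation; the query string stays where it belongs.
#include <boost/url.hpp>
#include <cassert>

int main()
{
    boost::urls::url u("https://cache.example.org/channel?priority=30");

    u.segments().push_back("nix-cache-info");

    assert(u.buffer()
           == "https://cache.example.org/channel/nix-cache-info?priority=30");
}
```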
This systematizes the way our s3:// URLs are parsed in filetransfer.cc.
Yoinked out and refactored out of [1].
[1]: https://github.com/NixOS/nix/pull/13752
Co-authored-by: Bernardo Meurer Costa <beme@anthropic.com>
Turns out we didn't have tests for some of the important behavior
introduced for flake reference fragments and URL queries [1]. This is
rather important and is relied upon by existing tooling. This fixes up
these exact cases before handing off the URL to the Boost.URL parser.
To the best of my knowledge this implements the same behavior as the
prior regex-based parser did [2]:
> fragmentRegex = "(?:" + pcharRegex + "|[/? \"^])*";
> queryRegex = "(?:" + pcharRegex + "|[/? \"])*";
[1]: 9c0a09f09f
[2]: https://github.com/NixOS/nix/blob/2.30.2/src/libutil/include/nix/util/url-parts.hh
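As a rough sketch of the kind of fix-up this describes (illustrative
code, not the actual implementation): characters that the old regexes
tolerated in fragments, such as '^', ' ' and '"', get percent-encoded
before the string reaches the strict parser:
```
// Illustrative only: pre-encode fragment characters that the old
// regex-based parser accepted but a strict RFC 3986 parser rejects, so
// that e.g. "github:NixOS/nixpkgs#hello^out" still parses.
#include <cassert>
#include <string>

std::string fixupFragment(const std::string & url)
{
    auto hash = url.find('#');
    if (hash == std::string::npos)
        return url;

    std::string out = url.substr(0, hash + 1);
    for (char c : url.substr(hash + 1)) {
        switch (c) {
        case '^': out += "%5E"; break;
        case ' ': out += "%20"; break;
        case '"': out += "%22"; break;
        default: out += c;
        }
    }
    return out;
}

int main()
{
    assert(fixupFragment("github:NixOS/nixpkgs#hello^out")
           == "github:NixOS/nixpkgs#hello%5Eout");
}
```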
Fixes usage of the `#` symbol in the reference name.
This also seems to identify several deficiencies in the libgit2
refname validation code with respect to the DEL character and a lone
`@` symbol [1].
[1]: https://git-scm.com/docs/git-check-ref-format#_description
This patch allows users to specify the connection port in store URLs
like so:
```
nix store info --store "ssh-ng://localhost:22" --json
```
Previously this failed with: `error: failed to start SSH connection to 'localhost:22'`,
because the code did not distinguish the port from the hostname. This
patch remedies that problem by introducing a ParsedURL::Authority type
for working with parsed authority components of URIs.
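A minimal sketch of the host/port split such a type performs
(illustrative; the real ParsedURL::Authority also handles userinfo and
more edge cases):
```
// Illustrative only: split an authority such as "localhost:22" or
// "[::1]:2222" into host and optional port.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

struct Authority {
    std::string host;
    std::optional<uint16_t> port;
};

Authority parseAuthority(const std::string & s)
{
    // For a bracketed IPv6 literal, the port separator can only appear
    // after the closing ']'.
    std::size_t searchFrom = (!s.empty() && s[0] == '[') ? s.find(']') : 0;
    auto colon = s.find(':', searchFrom);
    if (colon == std::string::npos)
        return {s, std::nullopt};
    return {s.substr(0, colon),
            static_cast<uint16_t>(std::stoi(s.substr(colon + 1)))};
}

int main()
{
    auto a = parseAuthority("localhost:22");
    assert(a.host == "localhost" && a.port == 22);

    auto b = parseAuthority("[fe80::1]:2222");
    assert(b.host == "[fe80::1]" && b.port == 2222);

    assert(!parseAuthority("example.org").port);
}
```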
Now that the URL parsing code is less ad-hoc we can
add more long-awaited fixes for specifying SSH connection
ports in store URIs.
Builds upon the work from bd1d2d1041.
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
Boost.URL is a significantly more RFC-compliant parser than what
libutil currently has: a bundle of incomprehensible regexes.
One aspect of this change is that RFC4007 ZoneId IPv6 literals
are represented in URIs according to RFC6874 [1].
Previously they were represented naively like so: [fe80::818c:da4d:8975:415c%enp0s25].
This is not entirely correct, because the percent itself has to be pct-encoded:
> "%" is always treated as
an escape character in a URI, so, according to the established URI
syntax [RFC3986] any occurrences of literal "%" symbols in a URI MUST
be percent-encoded and represented in the form "%25". Thus, the
scoped address fe80::a%en1 would appear in a URI as
http://[fe80::a%25en1].
[1]: https://datatracker.ietf.org/doc/html/rfc6874
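A tiny sketch of the representation change (illustrative helper, not
the actual formatting code): when rendering a scoped IPv6 host, the '%'
separating the address from the zone ID is itself emitted as "%25":
```
// Illustrative only: format an RFC 4007 scoped IPv6 address for use in
// a URI per RFC 6874 -- the '%' before the zone ID must itself be
// percent-encoded as "%25".
#include <cassert>
#include <string>

std::string formatScopedIPv6Host(const std::string & addr, const std::string & zoneId)
{
    return "[" + addr + "%25" + zoneId + "]";
}

int main()
{
    assert(formatScopedIPv6Host("fe80::a", "en1") == "[fe80::a%25en1]");
    // Previously Nix emitted the naive form "[fe80::a%en1]" (with a
    // literal '%'), which is not valid URI syntax.
}
```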
Co-authored-by: Jörg Thalheim <joerg@thalheim.io>
The myriad of hand-rolled URL parsing and validation code
is a constant source of problems. Regexes are not a great way
of writing parsers and there's a history of getting them wrong.
Boost.URL is a good library we can outsource most of the heavy
lifting to.
* It is tough to contribute to a project that doesn't use a formatter
* It is extra hard to contribute to a project which has configured the formatter, but ignores it for some files
* Code formatting makes it harder to hide obscure / weird bugs by accident or on purpose
Let's rip the bandaid off?
Note that PRs currently in flight should be able to be merged relatively easily by applying `clang-format` to their tip prior to merge.
For example, instead of doing

  #include "nix/store-config.hh"
  #include "nix/derived-path.hh"

Now do

  #include "nix/store/config.hh"
  #include "nix/store/derived-path.hh"
This was originally planned in the issue, and was also recently
requested by Eelco.
Most of the change is purely mechanical. There is just one small
additional issue. See how, in the example above, we took this
opportunity to also turn `<comp>-config.hh` into `<comp>/config.hh`.
Well, there was already a `nix/util/config.{cc,hh}`. Even though there
is no public configuration header for libutil (which would also be
called `nix/util/config.{cc,hh}`), that's still confusing. To avoid any
such confusion, we renamed that to `nix/util/configuration.{cc,hh}`.
Finally, note that the libflake headers already did this, so we didn't
need to do anything to them. We wouldn't want to mistakenly get
`nix/flake/flake/flake.hh`!
Progress on #7876
The short answer for why we need to do this is so we can consistently
do `#include "nix/..."`. Without this change, there are ways to still
make that work, but they are hacky, and they have downsides such as
making it harder to ensure that headers from the wrong Nix library
(e.g. `libnixexpr` headers in `libnixutil`) aren't being used.
The C API already used `nix_api_*`, so its headers are *not* put in
subdirectories accordingly.
Progress on #7876
We resisted doing this for a while because it would be annoying not to
have header/source file pairs close by, and easy to change a file
path/name from one to the other. But I am ameliorating that with
symlinks in the next commit.
This gets rid of unnecessary copies in range-based for loops and local
variables, when they are used solely as `const &`.
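For illustration, the pattern being changed looks like this
(hypothetical container; not a specific spot in the codebase):
```
// Iterating by value copies each element; `const &` does not.
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> storePaths{"/nix/store/aaa-foo", "/nix/store/bbb-bar"};

    for (std::string p : storePaths) (void) p;          // copies every element
    for (const std::string & p : storePaths) (void) p;  // no copy, read-only
}
```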
Also added a FIXME comment about a suspicious move out of a const
value, which might not be intended.
Known behavior changes:
- `MemorySourceAccessor`'s comparison operators no longer forget to
compare the `SourceAccessor` base class.
Progress on #10832
What remains for that issue is hopefully much easier!
Previously, the "file:./" prefix was not correctly recognized in
fixGitURL; instead, it was mistaken as a file path, which resulted in a
parsed url of the form "file://file:./".
This commit fixes the issue by properly detecting the "file:" prefix.
Note, however, that unlike "file://", the "file:./" URI is _not_
standardized, but has been widely used to referred to relative file
paths. In particular, the "git+file:./" did work for nix<=2.18, and was
broken since nix 2.19.0.
Finally, this commit fixes the issue completely for the 2.19 series, but
is still inadequate for the 2.20 series due to new behaviors from the
switch to libgit2. However, it does improve the correctness of parsing
even though it is not yet a complete solution.
Users may select specific outputs using the ^output syntax, or select
any output using ^*.
URL parsing currently doesn't support these kinds of output references:
parsing will fail.
Previously, `queryRegex` was reused for URL fragments, which didn't
include support for ^. Now `queryRegex` has been split from
`fragmentRegex`, and only `fragmentRegex` supports ^.
Two changes:
* The (probably unintentional) hack to handle paths as tarballs has
been removed. This is almost certainly not what users expect and is
inconsistent with flakeref handling everywhere else.
* The hack to support scp-style Git URLs has been moved to the Git
fetcher, so it's now supported not just by fetchTree but by flake
inputs.
Support using Nix flakes in paths with spaces or arbitrary Unicode
characters.
This introduces the convention that the path part of the URL should be
percent-encoded when dealing with `path:` URLs, and not when using file
paths (following the convention of Firefox).
Co-authored-by: Rendal <rasmus@rend.al>
If you have a URL that needs to be percent-encoded, such as
`http://localhost:8181/test/+3d.tar.gz`, and try to lock that in a Nix
flake such as the following:
  {
    inputs.test = { url = "http://localhost:8181/test/+3d.tar.gz"; flake = false; };
    outputs = { test, ... }: {
      t = builtins.readFile test;
    };
  }
running `nix flake metadata` shows that the input URL has been
incorrectly double-encoded (despite the flake.lock being correctly
encoded only once):
[...snip...]
Inputs:
└───test: http://localhost:8181/test/%252B3d.tar.gz?narHash=sha256-EFUdrtf6Rn0LWIJufrmg8q99aT3jGfLvd1//zaJEufY%3D
(Notice the `%252B`? That's just `%2B` but percent-encoded again)
With this patch, the double-encoding is gone; running `nix flake
metadata` will show the proper URL:
[...snip...]
Inputs:
└───test: http://localhost:8181/test/%2B3d.tar.gz?narHash=sha256-EFUdrtf6Rn0LWIJufrmg8q99aT3jGfLvd1//zaJEufY%3D
---
As far as I can tell, this happens because Nix already percent-encodes
the URL and stores this as the value of `inputs.asdf.url`.
However, when Nix later tries to read this out of the eval state as a
string (via `getStrAttr`), it has to run it through `parseURL` again to
get the `ParsedURL` structure.
Now, this itself isn't a problem -- the true problem arises when using
`ParsedURL::to_string` later, which then _re-escapes the path_. It is
at this point that what would have been `%2B` (`+`) becomes `%252B`
(`%2B`).
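The failure mode can be reproduced with any percent-encoder applied
twice (a self-contained sketch, independent of the actual Nix code):
encoding an already-encoded path turns the '%' of `%2B` into `%25`,
producing `%252B`:
```
// Sketch of the double-encoding failure: percent-encoding a string that
// is already encoded re-escapes the '%' itself.
#include <cassert>
#include <cstdio>
#include <string>

std::string percentEncode(const std::string & s)
{
    static const std::string keep =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~/";
    std::string out;
    for (unsigned char c : s) {
        if (keep.find(c) != std::string::npos)
            out += c;
        else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", c);
            out += buf;
        }
    }
    return out;
}

int main()
{
    std::string path = "/test/+3d.tar.gz";
    assert(percentEncode(path) == "/test/%2B3d.tar.gz");
    // Encoding again is the bug: '%' is not in the unreserved set, so it
    // becomes "%25" and the path ends up as ".../%252B3d.tar.gz".
    assert(percentEncode(percentEncode(path)) == "/test/%252B3d.tar.gz");
}
```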
Add a new `file` fetcher type, which will fetch a plain file over
http(s) or from the local filesystem.
Because plain `http(s)://` or `file://` URLs can already correspond to
`tarball` inputs (if the path ends with a known archive extension), the
URL parsing logic is a bit convoluted (sketched below) in that:
- {http,https,file}:// URLs will be interpreted as either a tarball or
  a file input, depending on the extension of the path part (so
  `https://foo.com/bar` will be a `file` input and
  `https://foo.com/bar.tar.gz` a `tarball` input)
- `file+{something}://` URLs will be interpreted as `file` URLs (with
  the `file+` part removed)
- `tarball+{something}://` URLs will be interpreted as `tarball` URLs
  (with the `tarball+` part removed)
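A rough sketch of the dispatch described above (illustrative code; the
real logic lives in the fetchers and covers more cases, and the
extension list here is made up):
```
// Illustrative only: decide between "file" and "tarball" inputs from the
// URL scheme and extension, and strip the "file+" / "tarball+" prefix.
#include <cassert>
#include <string>
#include <string_view>

struct InputKind { std::string type; std::string url; };

static bool hasArchiveExt(std::string_view path)
{
    for (std::string_view ext : {".tar", ".tar.gz", ".tgz", ".tar.xz", ".tar.bz2", ".zip"})
        if (path.size() >= ext.size() && path.substr(path.size() - ext.size()) == ext)
            return true;
    return false;
}

InputKind classify(const std::string & url)
{
    auto schemeEnd = url.find("://");
    std::string scheme = url.substr(0, schemeEnd);

    if (scheme.rfind("file+", 0) == 0)
        return {"file", url.substr(5)};       // drop "file+"
    if (scheme.rfind("tarball+", 0) == 0)
        return {"tarball", url.substr(8)};    // drop "tarball+"

    // Plain http(s)/file URLs: decide by the extension (for brevity we
    // check the whole URL rather than just the path part).
    return {hasArchiveExt(url) ? "tarball" : "file", url};
}

int main()
{
    assert(classify("https://foo.com/bar").type == "file");
    assert(classify("https://foo.com/bar.tar.gz").type == "tarball");
    assert(classify("file+https://foo.com/bar.tar.gz").url == "https://foo.com/bar.tar.gz");
    assert(classify("tarball+https://foo.com/bar").type == "tarball");
}
```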
Fix #3785
Co-Authored-By: Tony Olagbaiye <me@fron.io>
The previous regex was too strict and did not match what git allows.
It could lead to `fetchGit` not accepting valid branch names, even
though they exist in a repository (for example, branch names containing
`/`, which are pretty standard, like `release/1.0` branches).
The new regex defines what a branch name should **NOT** contain. It
takes its definitions from `refs.c` in https://github.com/git/git and
the `git help check-ref-format` page.
This change also introduces a test for ref name validity checking, which
compares the result from Nix with the result of `git check-ref-format --branch`.
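For illustration, a simplified "negative" check in the same spirit (not
the actual regex Nix uses; the forbidden set here is abridged from
`git check-ref-format`):
```
// Simplified sketch: reject what git forbids instead of enumerating
// what is allowed.
#include <cassert>
#include <regex>
#include <string>

static bool isValidRefName(const std::string & ref)
{
    // Reject (simplified): ASCII control characters, DEL, space, the
    // characters ~ ^ : ? * [ \ , the sequences ".." and "@{", a lone "@",
    // a leading or trailing '/' or '.', a ".lock" suffix, and the empty
    // string.
    static const std::regex bad(
        R"([\x00-\x20\x7f~^:?*\[\\]|\.\.|@[{]|^@$|^[/.]|[/.]$|\.lock$|^$)");
    return !std::regex_search(ref, bad);
}

int main()
{
    assert(isValidRefName("release/1.0")); // '/' is allowed, as in git
    assert(!isValidRefName("foo..bar"));   // ".." is forbidden
    assert(!isValidRefName("foo.lock"));   // ".lock" suffix is forbidden
}
```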
This provides a pluggable mechanism for defining new fetchers. It adds
a builtin function 'fetchTree' that generalizes existing fetchers like
'fetchGit', 'fetchMercurial' and 'fetchTarball'. 'fetchTree' takes a
set of attributes, e.g.
  fetchTree {
    type = "git";
    url = "https://example.org/repo.git";
    ref = "some-branch";
    rev = "abcdef...";
  }
The existing fetchers are just wrappers around this. Note that the
input attributes to fetchTree are the same as flake input
specifications and flake lock file entries.
All fetchers share a common cache stored in
~/.cache/nix/fetcher-cache-v1.sqlite. This replaces the ad hoc caching
mechanisms in fetchGit and download.cc (e.g. ~/.cache/nix/{tarballs,git-revs*}).
This also adds support for Git worktrees (c169ea5904).