1
1
Fork 0
mirror of https://github.com/NixOS/nix.git synced 2025-11-09 12:06:01 +01:00
Commit graph

50 commits

Author SHA1 Message Date
Sergei Zimmerman
11f9c59140 Remove validation of URLs passed to FileTransferRequest verbatim
CURL is not very strict about validation of URLs passed to it. We
should reflect this in our handling of URLs that we get from the user
in <nix/fetchurl.nix> or builtins.fetchurl. ValidURL was an attempt to
rectify this, but it turned out to be too strict. The only good way to
resolve this is to pass (in some cases) the user-provided string verbatim
to CURL. Other usages in libfetchers still benefit from using structured
ParsedURL and validation though.

nix store prefetch-file --name foo 'https://cdn.skypack.dev/big.js@^5.2.2'
error: 'https://cdn.skypack.dev/big.js@^5.2.2' is not a valid URL: leftover

(cherry picked from commit 47f427a172)
2025-10-13 20:48:15 +00:00
Sergei Zimmerman
98b7654390 libutil: Fix renderAuthorityAndPath unreachable for path:/ URLs
This was mistakenly triggered by path:/ URL, since the `//` would
correspond to 3 empty segments.

(cherry picked from commit 1d8dd77e1d)
2025-10-08 23:24:01 +00:00
John Ericson
d2f1860ee5 Revert "Improve Git URI handling"
I (@Ericson2314) messed up. We were supposed to test the status quo
before landing any new chnages, and also there is one change that is not
quite right (relative paths).

I am reverting for now, and then backporting the test suite to the old
situation.

This reverts commit 04ad66af5f.
2025-09-01 16:13:32 -04:00
John Ericson
7fde4f7d6f
Merge pull request #13821 from fzakaria/fzakaria/improve-fixgiturl
Improve Git URI handling
2025-09-01 14:15:17 -04:00
Farid Zakaria
04ad66af5f Improve Git URI handling
Git URI can also support scp style links similar to git itself.

This change augments the function fixGitURL to better handle the scp
style urls through a minimal parser rather than regex which has been
found to be brittle.

* Support for IPV6 added
* New test cases added for fixGitURL
* Clearer documentation on purpose and goal of function
* More `std::string_view` for performance
* A few more URL tests

Fixes #5958
2025-09-01 14:04:04 -04:00
Sergei Zimmerman
e548700010
lib{store,fetchers}: Pass URLs specified directly verbatim to FileTransferRequest
The URL should not be normalized before handing it off to cURL, because
builtin fetchers like fetchTarball/fetchurl are expected to work with
arbitrary URLs, that might not be RFC3986 compliant. For those cases
Nix should not normalize URLs, though validation is fine. ParseURL and
cURL are supposed to match the set of acceptable URLs, since they implement
the same RFC.
2025-09-01 02:22:23 +03:00
Jörg Thalheim
c436b7a32a
Fix ParsedURL handling of %2F in URL paths
See the new extensive doxygen in `url.hh`.
This fixes fetching gitlab: flakes.

Paths are now stored as a std::vector of individual path
segments, which can themselves contain path separators '/' (%2F).
This is necessary to make the Gitlab's /projects/ API work.

Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
2025-08-28 22:20:04 +03:00
John Ericson
e82210b3b2 Implement parseURLRelative, use in HttpBinaryCacheStore
This allows us to replace some very hacky and not correct string
concatentation in `HttpBinaryCacheStore`. It will especially be useful
with #13752, when today's hacks started to cause problems in practice,
not just theory.

Also make `fixGitURL` returned a `ParsedURL`.
2025-08-26 19:45:10 -04:00
Leandro Reina
7989e3192d Handle empty ports 2025-08-26 17:41:27 +02:00
John Ericson
72a548ed6a Limit to lenient parsing of non-standard URLs only where needed
This allows us to put `parseURL` in more spots without furthering
technical debt.
2025-08-22 12:37:37 -04:00
John Ericson
4083eff0c0 decodeQuery Take std::string_view not string ref 2025-08-22 12:26:48 -04:00
Sergei Zimmerman
69fcc2cfc1
libstore: Introduce ParsedS3URL type
This systematizes the way our s3:// URLs are parsed in filetransfer.cc.
Yoinked out and refactored out of [1].

[1]: https://github.com/NixOS/nix/pull/13752

Co-authored-by: Bernardo Meurer Costa <beme@anthropic.com>
2025-08-19 23:39:18 +03:00
Sergei Zimmerman
dc1b2012af
libutil: Fix handling of unescaped spaces, quotes and shevrons in queries and fragments
Turns out we didn't have tests for some of the important behavior introduced
for flake reference fragments and url queries [1]. This is rather important
and is relied upon by existing tooling. This fixes up these exact cases before
handing off the URL to the Boost.URL parser.

To the best of my knowledge this implements the same behavior as prior regex-based
parser did [2]:

> fragmentRegex = "(?:" + pcharRegex + "|[/? \"^])*";
> queryRegex = "(?:" + pcharRegex + "|[/? \"])*";

[1]: 9c0a09f09f
[2]: https://github.com/NixOS/nix/blob/2.30.2/src/libutil/include/nix/util/url-parts.hh
2025-08-16 23:00:31 +03:00
Sergei Zimmerman
e8e9376a7b
libfetchers: Remove badGitRefRegex and use libgit2 for reference validation
Fixes usage of `#` symbol in the reference name.
This also seems to identify several deficiencies in the libgit2 refname
validation code wrt to DEL symbol and a singular `@` symbol [1].

[1]: https://git-scm.com/docs/git-check-ref-format#_description
2025-08-11 02:38:45 +03:00
Maciej Krüger
49ba06175e
Add user@address:port support
This patch allows users to specify the connection port
in the store URLS like so:

```
nix store info --store "ssh-ng://localhost:22" --json
```

Previously this failed with: `error: failed to start SSH connection to 'localhost:22'`,
because the code did not distinguish the port from the hostname. This
patch remedies that problem by introducing a ParsedURL::Authority type
for working with parsed authority components of URIs.

Now that the URL parsing code is less ad-hoc we can
add more long-awaited fixes for specifying SSH connection
ports in store URIs.

Builds upon the work from bd1d2d1041.

Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
2025-08-06 23:48:14 +03:00
Sergei Zimmerman
bd1d2d1041
libutil: Use Boost.URL in parseURL
Boost.URL is a significantly more RFC-compliant parser
than what libutil currently has a bundle of incomprehensible
regexes.

One aspect of this change is that RFC4007 ZoneId IPv6 literals
are represented in URIs according to RFC6874 [1].

Previously they were represented naively like so: [fe80::818c:da4d:8975:415c\%enp0s25].
This is not entirely correct, because the percent itself has to be pct-encoded:

> "%" is always treated as
   an escape character in a URI, so, according to the established URI
   syntax [RFC3986] any occurrences of literal "%" symbols in a URI MUST
   be percent-encoded and represented in the form "%25".  Thus, the
   scoped address fe80::a%en1 would appear in a URI as
   http://[fe80::a%25en1].

[1]: https://datatracker.ietf.org/doc/html/rfc6874

Co-authored-by: Jörg Thalheim <joerg@thalheim.io>
2025-07-18 21:24:01 +03:00
Sergei Zimmerman
d020f21a2a
libutil: Use default operator== for ParsedURL
The default comparison operator can be generated
by the compiler since C++20.
2025-07-18 21:23:42 +03:00
Sergei Zimmerman
ad449c0288
libutil: Refactor percentDecode,percentEncode to use Boost.URL
The myriad of hand-rolled URL parsing and validation code
is a constant source of problems. Regexes are not a great way
of writing parsers and there's a history of getting them wrong.
Boost.URL is a good library we can outsource most of the heavy
lifting to.
2025-07-18 21:23:40 +03:00
Graham Christensen
e4f62e4608 Apply clang-format universally.
* It is tough to contribute to a project that doesn't use a formatter,
* It is extra hard to contribute to a project which has configured the formatter, but ignores it for some files
* Code formatting makes it harder to hide obscure / weird bugs by accident or on purpose,

Let's rip the bandaid off?

Note that PRs currently in flight should be able to be merged relatively easily by applying `clang-format` to their tip prior to merge.
2025-07-18 12:47:27 -04:00
Sergei Zimmerman
8ee513379a
Use StringMap instead of std::map<std::string, std::string> throughout the codebase 2025-05-19 20:33:28 +00:00
John Ericson
cc24766fa6 Expose the nix component in header include paths
For example, instead of doing

    #include "nix/store-config.hh"
    #include "nix/derived-path.hh"

Now do

    #include "nix/store/config.hh"
    #include "nix/store/derived-path.hh"

This was originally planned in the issue, and also recent requested by
Eelco.

Most of the change is purely mechanical. There is just one small
additional issue. See how, in the example above, we took this
opportunity to also turn `<comp>-config.hh` into `<comp>/config.hh`.
Well, there was already a `nix/util/config.{cc,hh}`. Even though there
is not a public configuration header for libutil (which also would be
called `nix/util/config.{cc,hh}`) that's still confusing, To avoid any
such confusion, we renamed that to `nix/util/configuration.{cc,hh}`.

Finally, note that the libflake headers already did this, so we didn't
need to do anything to them. We wouldn't want to mistakenly get
`nix/flake/flake/flake.hh`!

Progress on #7876
2025-04-01 11:40:42 -04:00
John Ericson
f3e1c47f47 Separate headers from source files
The short answer for why we need to do this is so we can consistently do
`#include "nix/..."`. Without this change, there are ways to still make
that work, but they are hacky, and they have downsides such as making it
harder to make sure headers from the wrong Nix library (e..g.
`libnixexpr` headers in `libnixutil`) aren't being used.

The C API alraedy used `nix_api_*`, so its headers are *not* put in
subdirectories accordingly.

Progress on #7876

We resisted doing this for a while because it would be annoying to not
have the header source file pairs close by / easy to change file
path/name from one to the other. But I am ameliorating that with
symlinks in the next commit.
2025-03-31 12:20:25 -04:00
Eelco Dolstra
1a38e62a09 Remove unused variable 2025-01-09 16:38:33 +01:00
Eelco Dolstra
4077aa43a8 ParsedURL: Remove base field 2025-01-07 14:52:00 +01:00
Eelco Dolstra
f705ce7f9a ParsedURL: Remove url field
This prevents a 'url' field that is out of sync with the other
fields. You can use to_string() to get the full URL.
2025-01-07 14:46:03 +01:00
Sergei Zimmerman
fafaec5ac3 fix(treewide): remove unnecessary copying in range for loops
This gets rid of unnecessary copies in range-based-for loops and
local variables, when they are used solely as `const &`.

Also added a fixme comment about a suspicious move out of const,
which might not be intended.
2024-11-26 00:06:29 +03:00
Bryan Honof
1f024ecfcd
fix: warn on malformed URI query parameter 2024-09-30 14:44:06 +02:00
Jörg Thalheim
5a5a010120 Revert "fix: Error on malformed URI query parameter"
This reverts commit c9f45677b5.

This now triggers on simple cases like `nix build .#nix`.
Reverting for now.
2024-09-05 15:18:16 +02:00
Jörg Thalheim
a81083d080 Revert "Update src/libutil/url.cc"
This reverts commit 9b1cefe27e.
2024-09-05 15:18:16 +02:00
Bryan Honof
9b1cefe27e
Update src/libutil/url.cc
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
2024-08-28 18:48:18 +02:00
Bryan Honof
c9f45677b5
fix: Error on malformed URI query parameter
Signed-off-by: Bryan Honof <bryanhonof@gmail.com>
2024-08-23 22:04:37 +02:00
John Ericson
bc83b9dc1f Remove comparator.hh and switch to <=> in a bunch of places
Known behavior changes:

- `MemorySourceAccessor`'s comparison operators no longer forget to
  compare the `SourceAccessor` base class.

Progress on #10832

What remains for that issue is hopefully much easier!
2024-07-12 14:54:18 -04:00
Bryan Lai
8594f3cd5a libutil/url: fix git+file:./ parse error
Previously, the "file:./" prefix was not correctly recognized in
fixGitURL; instead, it was mistaken as a file path, which resulted in a
parsed url of the form "file://file:./".

This commit fixes the issue by properly detecting the "file:" prefix.
Note, however, that unlike "file://", the "file:./" URI is _not_
standardized, but has been widely used to referred to relative file
paths. In particular, the "git+file:./" did work for nix<=2.18, and was
broken since nix 2.19.0.

Finally, this commit fixes the issue completely for the 2.19 series, but
is still inadequate for the 2.20 series due to new behaviors from the
switch to libgit2. However, it does improve the correctness of parsing
even though it is not yet a complete solution.
2024-02-01 10:51:22 +08:00
Eelco Dolstra
9d9d9ff0de Merge remote-tracking branch 'origin/master' into profile-names-instead-of-index 2023-12-21 16:21:26 +01:00
Robert Hensing
4eaeda6604 isValidSchemeName: Use regex
As requested by Eelco Dolstra. I think it used to be simpler.
2023-12-12 17:46:34 +01:00
Robert Hensing
2e451a663e schemeRegex -> schemeNameRegex
Scheme could be understood to include the typical `:` separator.
2023-12-12 17:25:20 +01:00
Robert Hensing
d3a85b6834 isValidSchemeName: Add function 2023-12-11 12:12:43 +01:00
Eelco Dolstra
2964a9f562 Fix relative submodule handling
Tested on

  nix flake prefetch 'git+https://github.com/blender/blender.git?rev=4ed8a360e956daf2591add4d3c9ec0719e2628fe&submodules=1'
2023-11-14 16:00:21 +01:00
Bob van der Linden
9c0a09f09f allow ^ in URLs
Users may select specific outputs using the ^output syntax or selecting
any output using ^*.

URL parsing currently doesn't support these kinds of output references:
parsing will fail.

Currently `queryRegex` was reused for URL fragments, which didn't
include support for ^. Now queryRegex has been split from fragmentRegex,
where only the fragmentRegex supports ^.
2023-11-06 21:21:20 +01:00
Eelco Dolstra
856fe13533 fetchTree cleanup
Two changes:

* The (probably unintentional) hack to handle paths as tarballs has
  been removed. This is almost certainly not what users expect and is
  inconsistent with flakeref handling everywhere else.

* The hack to support scp-style Git URLs has been moved to the Git
  fetcher, so it's now supported not just by fetchTree but by flake
  inputs.
2023-10-13 14:34:23 +02:00
John Ericson
b912f3a937 Move flakeIdRegex{,S} from libutil to flakeref.{cc,hh
It isn't used, and doesn't belong in `libutil`.
2023-09-28 20:55:41 -04:00
Théophane Hufschmitt
50e61f579c Allow special characters in flake paths
Support using nix flakes in paths with spaces or abitrary unicode characters.
This introduces the convention that the path part of the URL should be
percent-encoded when dealing with `path:` urls and not when using
filepaths (following the convention of firefox).

Co-authored-by: Rendal <rasmus@rend.al>
2023-09-22 10:06:43 +02:00
Cole Helbling
73696ec716 libutil: fix double-encoding of URLs
If you have a URL that needs to be percent-encoded, such as
`http://localhost:8181/test/+3d.tar.gz`, and try to lock that in a Nix
flake such as the following:

    {
      inputs.test = { url = "http://localhost:8181/test/+3d.tar.gz"; flake = false; };
      outputs = { test, ... }: {
        t = builtins.readFile test;
      };
    }

running `nix flake metadata` shows that the input URL has been
incorrectly double-encoded (despite the flake.lock being correctly
encoded only once):

    [...snip...]
    Inputs:
    └───test: http://localhost:8181/test/%252B3d.tar.gz?narHash=sha256-EFUdrtf6Rn0LWIJufrmg8q99aT3jGfLvd1//zaJEufY%3D

(Notice the `%252B`? That's just `%2B` but percent-encoded again)

With this patch, the double-encoding is gone; running `nix flake
metadata` will show the proper URL:

    [...snip...]
    Inputs:
    └───test: http://localhost:8181/test/%2B3d.tar.gz?narHash=sha256-EFUdrtf6Rn0LWIJufrmg8q99aT3jGfLvd1//zaJEufY%3D

---

As far as I can tell, this happens because Nix already percent-encodes
the URL and stores this as the value of `inputs.asdf.url`.

However, when Nix later tries to read this out of the eval state as a
string (via `getStrAttr`), it has to run it through `parseURL` again to
get the `ParsedURL` structure.

Now, this itself isn't a problem -- the true problem arises when using
`ParsedURL::to_string` later, which then _re-escapes the path_. It is
at this point that what would have been `%2B` (`+`) becomes `%252B`
(`%2B`).
2023-08-17 14:16:19 -07:00
Yorick van Pelt
0844856c84
url: make percentEncode stricter, expose and unit test it 2023-02-27 15:30:00 +01:00
Eric Wolf
4d50995eff Fix url parsing for urls using file+
`file+https://example.org/test.mp4` should not be rejected with
`unexpected authority`.
2023-01-20 10:31:26 +01:00
Tony Olagbaiye
5b8c1deb18 fetchTree: Allow fetching plain files
Add a new `file` fetcher type, which will fetch a plain file over
http(s), or from the local file.

Because plain `http(s)://` or `file://` urls can already correspond to
`tarball` inputs (if the path ends-up with a know archive extension),
the URL parsing logic is a bit convuluted in that:

- {http,https,file}:// urls will be interpreted as either a tarball or a
  file input, depending on the extensions of the path part (so
  `https://foo.com/bar` will be a `file` input and
  `https://foo.com/bar.tar.gz` as a `tarball` input)
- `file+{something}://` urls will be interpreted as `file` urls (with
  the `file+` part removed)
- `tarball+{something}://` urls will be interpreted as `tarball` urls (with
  the `tarball+` part removed)

Fix #3785

Co-Authored-By: Tony Olagbaiye <me@fron.io>
2022-05-19 18:24:49 +02:00
Pamplemousse
4a7a8b87cd Prefer to throw specific errors
Signed-off-by: Pamplemousse <xav.maso@gmail.com>
2021-07-01 11:09:31 -07:00
Eelco Dolstra
e8e1d420f3 Don't include <regex> in header files
This reduces compilation time by ~15 seconds (CPU time).

Issue #4045.
2020-09-21 18:22:45 +02:00
Nikola Knezevic
77007d4eab Improve ref validity checking in fetchGit
The previous regex was too strict and did not match what git was allowing. It
could lead to `fetchGit` not accepting valid branch names, even though they
exist in a repository (for example, branch names containing `/`, which are
pretty standard, like `release/1.0` branches).

The new regex defines what a branch name should **NOT** contain. It takes the
definitions from `refs.c` in https://github.com/git/git and `git help
check-ref-format` pages.

This change also introduces a test for ref name validity checking, which
compares the result from Nix with the result of `git check-ref-format --branch`.
2020-05-30 12:29:35 +02:00
Eelco Dolstra
462421d345 Backport libfetchers from the flakes branch
This provides a pluggable mechanism for defining new fetchers. It adds
a builtin function 'fetchTree' that generalizes existing fetchers like
'fetchGit', 'fetchMercurial' and 'fetchTarball'. 'fetchTree' takes a
set of attributes, e.g.

  fetchTree {
    type = "git";
    url = "https://example.org/repo.git";
    ref = "some-branch";
    rev = "abcdef...";
  }

The existing fetchers are just wrappers around this. Note that the
input attributes to fetchTree are the same as flake input
specifications and flake lock file entries.

All fetchers share a common cache stored in
~/.cache/nix/fetcher-cache-v1.sqlite. This replaces the ad hoc caching
mechanisms in fetchGit and download.cc (e.g. ~/.cache/nix/{tarballs,git-revs*}).

This also adds support for Git worktrees (c169ea5904).
2020-04-07 09:03:14 +02:00