Replace the multi-pass O(n²) algorithm with a single-pass O(n) approach
using hashmaps to track path endpoints.
Before:
- First pass: O(n*m) to join each edge with existing multiedges
- Second pass: O(m²) repeated merging until no more merges possible
- Required multiple iterations through the entire dataset
Now:
- Single O(n) pass through edges
- Two hashmaps (pathStartingAt, pathEndingAt) enable O(1) lookups
- Immediately identifies connections without linear searches
- Handles all cases in one pass (see the sketch below):
* Extending existing paths (prepend/append)
* Joining two separate paths
* Forming cycles when a path connects to itself
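A rough sketch of the approach (the `Node`/`Path` types, the helper name, and the structure below are illustrative stand-ins, not the actual code in `find-cycles.hh`; it also assumes each node has at most one incoming and one outgoing edge):
```
// Sketch: single-pass path assembly driven by two endpoint maps.
// Self-loops and duplicate edges are not handled here.
#include <deque>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

using Node = std::string;
using Path = std::deque<Node>;

struct Edge { Node from, to; };

std::vector<Path> findCycles(const std::vector<Edge> & edges)
{
    std::unordered_map<Node, std::shared_ptr<Path>> pathStartingAt, pathEndingAt;
    std::vector<Path> cycles;

    for (auto & [from, to] : edges) {
        auto atEnd = pathEndingAt.find(from);    // existing path ...→from
        auto atStart = pathStartingAt.find(to);  // existing path to→...

        if (atEnd != pathEndingAt.end() && atStart != pathStartingAt.end()
            && atEnd->second == atStart->second) {
            // The edge connects a path back to itself: a cycle is complete.
            cycles.push_back(*atEnd->second);
            pathEndingAt.erase(atEnd);
            pathStartingAt.erase(atStart);
        } else if (atEnd != pathEndingAt.end() && atStart != pathStartingAt.end()) {
            // The edge joins two separate paths into one.
            auto left = atEnd->second, right = atStart->second;
            pathEndingAt.erase(atEnd);
            pathStartingAt.erase(atStart);
            left->insert(left->end(), right->begin(), right->end());
            pathEndingAt[left->back()] = left;   // re-point the merged end
        } else if (atEnd != pathEndingAt.end()) {
            // Append: extend an existing path ...→from with `to`.
            auto p = atEnd->second;
            pathEndingAt.erase(atEnd);
            p->push_back(to);
            pathEndingAt[to] = p;
        } else if (atStart != pathStartingAt.end()) {
            // Prepend: extend an existing path to→... with `from`.
            auto p = atStart->second;
            pathStartingAt.erase(atStart);
            p->push_front(from);
            pathStartingAt[from] = p;
        } else {
            // Start a new two-node path.
            auto p = std::make_shared<Path>(Path{from, to});
            pathStartingAt[from] = p;
            pathEndingAt[to] = p;
        }
    }
    // Open (non-cyclic) paths remain in the maps and are ignored here.
    return cycles;
}
```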
Move `find-cycles.hh` to `src/libstore/include/nix/store/build/` to
ensure it is properly installed as a public header and can be used by
tests and other consumers of the library.
The "2" suffix was unclear and didn't communicate the function's purpose.
The new name better describes what it does: walks the filesystem tree and
scans each file using the provided sink.
The single-pass greedy algorithm could fail to connect all edges if
they arrived in certain orders. For example, edges A→B, C→D, D→A, B→C
would result in two paths [D→A→B→C] and [C→D] instead of one complete
cycle.
Added a second pass that repeatedly tries to merge existing multiedges
with each other until no more merges are possible. This ensures we find
complete cycle paths regardless of edge discovery order.
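A minimal sketch of that second pass, using a plain vector of node names as a stand-in for the real multiedge type; with the example above it merges [D→A→B→C] and [C→D] into the full cycle D→A→B→C→D:
```
// Sketch of the fixpoint merge pass: repeatedly join any two paths
// where one ends at the node where the other begins, until a full
// pass makes no change.
#include <cstddef>
#include <string>
#include <vector>

using Path = std::vector<std::string>;

void mergeUntilFixpoint(std::vector<Path> & paths)
{
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < paths.size() && !merged; ++i) {
            for (std::size_t j = 0; j < paths.size() && !merged; ++j) {
                if (i != j && paths[i].back() == paths[j].front()) {
                    // Append paths[j] (minus the shared node) to paths[i],
                    // then restart the scan.
                    paths[i].insert(paths[i].end(),
                        paths[j].begin() + 1, paths[j].end());
                    paths.erase(paths.begin() + j);
                    merged = true;
                }
            }
        }
    }
}
```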
After refactoring to use `RefScanSink`, we no longer manually search for
hashes in buffers, so the `refLength` constant (hash length) is unused.
`RefScanSink` handles this internally.
Replaced raw `read()` with `readFull()` helper, which properly
handles partial reads and `EINTR`. The previous code manually checked
for errors but didn't handle the case where `read()` returns fewer
bytes than requested.
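The pattern `readFull()` implements looks roughly like this (simplified sketch, not the actual helper from the util library):
```
// Keep reading until the requested byte count is satisfied, retrying
// on EINTR and handling short reads.
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <unistd.h>

void readExactly(int fd, char * buf, std::size_t count)
{
    while (count) {
        ssize_t res = read(fd, buf, count);
        if (res == -1) {
            if (errno == EINTR) continue;            // interrupted: retry
            throw std::runtime_error("read failed");
        }
        if (res == 0)
            throw std::runtime_error("unexpected end of file");
        buf += res;                                  // advance past partial read
        count -= res;
    }
}
```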
The `hashPathMap` was being passed to `CycleEdgeScanSink` and stored as
a member variable, but was never actually used. The sink only needs the
hash strings for detection via `RefScanSink`, not the full `StorePath`
mapping.
Previously, `CycleEdgeScanSink::operator()` copied the entire
`getResult()` `StringSet` twice on every 64KB chunk to detect newly
found hashes. For large files, this created O(n * chunks) overhead.
Now we track which hashes have already been recorded for the current
file in `recordedForCurrentFile`, avoiding the set copies. `insert()`
signals via its returned flag whether a hash is newly seen, so each
hash found costs a single set insertion instead of a copy of the whole
result set.
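A sketch of the de-duplication pattern, with illustrative names:
```
// Use the flag returned by insert() to detect hashes not seen before
// in the current file, instead of diffing copies of the full result set.
#include <set>
#include <string>

void onHashFound(std::set<std::string> & recordedForCurrentFile,
    const std::string & hash)
{
    // .second is true only if the hash was not already in the set.
    if (recordedForCurrentFile.insert(hash).second) {
        // record a cycle edge for this file/hash pair (elided)
    }
}
```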
In particular:
- Remove `get`; it is redundant with `valueAt` and the `get` in
  `util.hh`.
- Remove `nullableValueAt`. It is morally just the function composition
`getNullable . valueAt`, not an orthogonal combinator like the others.
- Make `optionalValueAt` return a pointer, not `std::optional`. This
  still expresses optionality, but without creating a needless copy,
  and brings it in line with the other combinators, which also return
  references.
- Delete the overloads of `valueAt` and `optionalValueAt` that take
  the map by value, as we did for `get` in 408c09a120; this prevents
  bugs and unnecessary copies.
`adl_serializer<DerivationOptions::OutputChecks>::from_json` was the one
use of `getNullable`. After switching it to use `getNullable . valueAt`,
I gave it a small static helper for the `std::optional` it ultimately
does need to create. That helper could go in `json-utils.hh` eventually,
but I didn't bother for now since only one thing needs it.
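A rough sketch of the resulting shapes and of such a helper (signatures are paraphrased, not copied from `json-utils.hh`; the helper name is made up):
```
#include <nlohmann/json.hpp>
#include <optional>
#include <string>

// Assumed declarations (the real ones live in json-utils.hh):
const nlohmann::json & valueAt(
    const nlohmann::json::object_t & map, const std::string & key);  // throws if missing
const nlohmann::json * optionalValueAt(
    const nlohmann::json::object_t & map, const std::string & key);  // nullptr if missing
const nlohmann::json * getNullable(const nlohmann::json & value);    // nullptr if JSON null

// Hypothetical helper: `getNullable . valueAt`, then copy into an optional.
static std::optional<nlohmann::json> nullableValueAtToOptional(
    const nlohmann::json::object_t & map, const std::string & key)
{
    if (auto * v = getNullable(valueAt(map, key)))
        return *v;
    return std::nullopt;
}
```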
Co-authored-by: Sergei Zimmerman <sergei@zimmerman.foo>
S3 buckets support object versioning to prevent unexpected changes,
but Nix previously lacked the ability to fetch specific versions of
S3 objects. This adds support for a `versionId` query parameter in S3
URLs, enabling users to pin to specific object versions:
```
s3://bucket/key?region=us-east-1&versionId=abc123
```
This has already been implemented in 1e709554d5
as a side-effect of mounting the accessors in storeFS. Let's test this so it
doesn't regress.
(cherry-picked from https://github.com/NixOS/nix/pull/12915)
Move HttpBinaryCacheStore class from .cc file to header to enable
inheritance by S3BinaryCacheStore. Create S3BinaryCacheStore class that
overrides upsertFile() to implement multipart upload logic.
Add a sizeHint parameter to BinaryCacheStore::upsertFile() to enable
size-based upload decisions in implementations. This lays the groundwork
for reintroducing S3 multipart upload support.
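A simplified sketch of the combined shape of these two changes (the signatures and the multipart threshold below are illustrative, not the real ones in the headers):
```
#include <cstdint>
#include <string>

struct HttpBinaryCacheStore
{
    virtual ~HttpBinaryCacheStore() = default;

    // sizeHint: expected size of the payload, so implementations can
    // pick an upload strategy without buffering the whole blob first.
    virtual void upsertFile(const std::string & path,
        const std::string & data, std::uint64_t sizeHint)
    {
        // plain HTTP PUT in the base implementation (elided)
    }
};

struct S3BinaryCacheStore : HttpBinaryCacheStore
{
    void upsertFile(const std::string & path,
        const std::string & data, std::uint64_t sizeHint) override
    {
        // Example threshold: multipart upload for large blobs only.
        constexpr std::uint64_t multipartThreshold = 100 * 1024 * 1024;
        if (sizeHint >= multipartThreshold) {
            // initiate multipart upload, send parts, then complete (elided)
        } else {
            HttpBinaryCacheStore::upsertFile(path, data, sizeHint);
        }
    }
};
```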
Add support for HTTP DELETE requests to the FileTransfer infrastructure.
This enables S3 multipart upload abort functionality via DELETE requests
to S3 endpoints.
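For illustration, S3's AbortMultipartUpload is a DELETE on the object key with an `uploadId` query parameter; a bare-libcurl sketch of such a request (the real code goes through FileTransfer, and request signing is omitted):
```
#include <curl/curl.h>
#include <string>

bool abortMultipartUpload(const std::string & endpoint,
    const std::string & key, const std::string & uploadId)
{
    CURL * curl = curl_easy_init();
    if (!curl) return false;
    std::string url = endpoint + "/" + key + "?uploadId=" + uploadId;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "DELETE");  // HTTP DELETE
    CURLcode res = curl_easy_perform(curl);                   // auth omitted
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}
```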
This reverts commit 90d1ff4805.
The initial issue with EPIPE was solved in 9f680874c5.
Now this patch does more harm than good by swallowing
`boost::io::format_error` exceptions that indicate real bugs.
addToStore(): Don't parse the NAR
* StringSource: Implement skip()
  This is slightly faster than doing a read() into a buffer just to
  discard the data (see the sketch below).
* LocalStore::addToStore(): Skip unnecessary NARs rather than parsing them
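A sketch of the idea behind `skip()`, using a simplified standalone type rather than the real `Source`/`StringSource` interface:
```
// Advance the read position directly instead of copying bytes into a
// scratch buffer.
#include <algorithm>
#include <cstddef>
#include <string_view>

struct SimpleStringSource
{
    std::string_view s;
    std::size_t pos = 0;

    std::size_t read(char * data, std::size_t len)
    {
        std::size_t n = std::min(len, s.size() - pos);
        std::copy_n(s.data() + pos, n, data);
        pos += n;
        return n;
    }

    // skip(): same effect as read()ing into a throwaway buffer,
    // but without touching the data at all.
    std::size_t skip(std::size_t len)
    {
        std::size_t n = std::min(len, s.size() - pos);
        pos += n;
        return n;
    }
};
```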
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
A few changes had cropped up with `_NIX_TEST_ACCEPT=1`:
1. Blake hashing test JSON had a different indentation
2. Store URI had spaces that were not properly percent-encoded
(1) was just fixed, as we trust nlohmann JSON to parse JSON
correctly, regardless of whitespace.
For (2), the existing URL was made a read-only test, since we very much
wish to continue parsing such invalid URLs directly. And then the
original read/write test was updated to properly percent-encode the
space, as the normal form should be.