Enabling CONFIG_NETFILTER_FAMILY_BRIDGE causes a new element,
hooks_bridge[], to be added to struct netns_nf. Since the KMI is frozen,
this field cannot be added there directly.
The only instantiation of struct netns_nf is as an embedded field
of struct net. So instead of adding the field to struct netns_nf,
a new "struct ext_net" is added that contains struct net and
the new hooks_bridge[] field. An accessor function,
get_nf_hooks_bridge(), is added to get a pointer to the new field.
There is a global init_net of type struct net which must be special
cased since it is not a member of a struct ext_net. All other
instances of struct net are allocated via net_alloc() which now
allocates a struct ext_net.
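A minimal sketch of the layout and accessor described above (the array
size and the init_nf_hooks_bridge[] fallback used for init_net are
illustrative assumptions, not taken from the actual patch):

  struct ext_net {
      struct net net;    /* must stay first so container_of() works */
      struct nf_hook_entries __rcu *hooks_bridge[NF_INET_NUMHOOKS];
  };

  /* Hypothetical backing storage for init_net, which is not embedded in
   * an ext_net. */
  static struct nf_hook_entries __rcu *init_nf_hooks_bridge[NF_INET_NUMHOOKS];

  static struct nf_hook_entries __rcu **get_nf_hooks_bridge(struct net *net)
  {
      if (net == &init_net)
          return init_nf_hooks_bridge;
      return container_of(net, struct ext_net, net)->hooks_bridge;
  }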
Since CONFIG_NETFILTER_FAMILY_BRIDGE is a hidden config that is
needed for vendor modules, it is enabled via init/Kconfig.gki.
Bug: 316040984
Fixes: 0145780bfc78 ("fix KASAN-related kernel crash by KMI W/A for NETFILTER_FAMILY_BRIDGE")
Change-Id: I2c7384e3df9b88f12464dc0138986fed12ca626a
Signed-off-by: Norihiko Hama <Norihiko.Hama@alpsalpine.com>
Commit 5aec776ef8c9 ("BACKPORT: ANDROID: dma-buf: Move sysfs work
out of DMA-BUF export path") temporarily re-purposed the kobject as a
work_struct to create the sysfs entries asynchronously. The author knew
what he was doing and rightly added a build assert for the case where the
kobject struct is smaller than the work_struct. We are hitting this build
assert on a non-GKI platform where CONFIG_ANDROID_KABI_RESERVE is not set.
Fix this problem by adding a new union with the dma_buf_sysfs_entry
structure and the temporary structure as members. We only end up
allocating more memory (because of the union) when the kobject is smaller
than the work_struct, which the original patch assumed would never be the
case anyway.
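A rough sketch of the union idea (the names below are illustrative, not
necessarily the ones used in the patch):

  struct dma_buf_sysfs_entry {
      struct kobject kobj;
      struct dma_buf *dmabuf;
  };

  /* Temporary shape, only used to defer the sysfs add. */
  struct dma_buf_sysfs_entry_work {
      struct work_struct work;
      struct dma_buf *dmabuf;
  };

  union dma_buf_sysfs_storage {
      struct dma_buf_sysfs_entry entry;
      struct dma_buf_sysfs_entry_work deferred_add;
  };

With the union, the allocation is always large enough for either use, so
the kobject-vs-work_struct size assertion no longer constrains the build.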
Bug: 261818147
Bug: 262666413
Change-Id: Ifb089bf80d8a3a44ece9f05fc0b99ee76cb11645
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
(cherry picked from commit ce18af9b5d7d0baad2ac3eea4c732d2bf128d690)
Signed-off-by: T.J. Mercier <tjmercier@google.com>
We have identified an animation lag issue on our Android 14-6.1 product
which seems to be caused by contention in the rwsem lock during the
dmabuf request process. It appears that other processes are holding
sysfs read locks, resulting in the blocking of dmabuf sysfs node
creation. We encountered an issue in android14-6.1 that is similar to
the problem described in [1]. So we cherry-pick this commit to
android14-6.1.
[1] https://android-review.googlesource.com/c/kernel/common/+/2111974
Bug: 311282169
Bug: 206979019
Link: https://lore.kernel.org/lkml/CABdmKX2dNYhgOYdrrJU6-jt6F=LjCidbKhR6t4F7yaa0SPr+-A@mail.gmail.com/T/
Signed-off-by: Dezhi Huang <huangdezhi@hihonor.com>
Conflicts:
include/linux/dma-buf.h
1. The android14-6.1 KMI is frozen, and the modification to struct
dma_buf_sysfs_entry in the original patch triggers ABI check
failures. Instead of an anonymous union, use the existing struct
kobject directly as a work_struct with type punning.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Change-Id: Ic0386849b6b248b0a72215633fc1a50782455bac
commit 7315dc1e122c85ffdfc8defffbb8f8b616c2eb1a upstream.
NFT_MSG_DELSET deactivates all elements in the set, so skip
set->ops->commit() to avoid the unnecessary clone (for the pipapo case)
as well as the sync GC cycle, which could otherwise deactivate
already-expired elements in such a set again.
Bug: 318548348
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0105571f80)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie733688e27d9568d797fc1bc477261883b7dc8c1
Under certain circumstances __get_fault_info() may resolve the faulting
address using the AT instruction. Given that this is being done outside
of the host lock critical section, it is racy and the resolution via AT
may fail. We currently BUG() in this situation, which is obviously less
than ideal. Moving the address resolution to the critical section may
have a performance impact, so let's keep it where it is, but bail out
and return to the host to try a second time.
Bug: 311830307
Change-Id: I26d61b04a4ccf040bd31802abb3c6b998ff4a48b
Signed-off-by: Quentin Perret <qperret@google.com>
commit d920abd1e7c4884f9ecd0749d1921b7ab19ddfbd upstream.
From Alon:
"Due to a logical bug in the NVMe-oF/TCP subsystem in the Linux kernel,
a malicious user can cause a UAF and a double free, which may lead to
RCE (may also lead to an LPE in case the attacker already has local
privileges)."
Hence, when a queue initialization fails after the ahash requests are
allocated, it is guaranteed that the queue removal async work will be
called, so leave the deallocation to the queue removal.
Also, be extra careful not to continue processing the socket, so set
queue rcv_state to NVMET_TCP_RECV_ERR upon a socket error.
Bug: 310114968
Cc: stable@vger.kernel.org
Reported-by: Alon Zahavi <zahavi.alon@gmail.com>
Tested-by: Alon Zahavi <zahavi.alon@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e985d78bdcf37f7ef73666a43b0d2407715f00d3)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ifd7ec8294182a6bf6d8c261aeda5d989e909f7ff
Currently, the EP0 dequeue path shares the same code as the other EPs.
However, there are some special considerations that need to be made for
EP0 transfers:
- EP0 transfers never transition into the started_list
- EP0 only has one active request at a time
In case there is a vendor-specific control message for a function over USB
FFS, there is no guarantee on when the DATA/STATUS stage will be responded
to. While this occurs, any attempt to end transfers on non-control EPs
will end up having the DWC3_EP_DELAY_STOP flag set, and defer issuing of
the end transfer command. If the USB FFS application decides to time out
the control transfer, or if the USB FFS AIO path exits, the USB FFS driver
will issue a call to usb_ep_dequeue() for the ep0 request.
In case of the AIO exit path, the AIO FS blocks until all pending USB
requests utilizing the AIO path are completed. However, since the dequeue
of the ep0 request does not happen properly, all non-control EPs with the
DWC3_EP_DELAY_STOP flag set will not be handled, and the AIO exit path
will be stuck waiting for the USB FFS data endpoints to receive a
completion callback.
The fix is to utilize dwc3_ep0_reset_state() in the dequeue API to ensure
EP0 is brought back to the SETUP state and that any deferred end transfer
commands are handled. This will also end any active transfers on EP0,
unlike the previous implementation, which only called giveback directly.
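A simplified sketch of the shape of the fix in dwc3_gadget_ep_dequeue()
(not the exact diff; locking and the request giveback are omitted):

  if (dep->number <= 1) {            /* ep0out or ep0in */
      /* EP0 requests cannot go through the generic cancellation path;
       * resetting EP0 back to SETUP also ends any active EP0 transfer
       * and handles deferred end transfer commands.
       */
      dwc3_ep0_reset_state(dwc);
      goto out;                      /* then give back the request */
  }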
Fixes: fcd2def663 ("usb: dwc3: gadget: Refactor dwc3_gadget_ep_dequeue")
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Bug: 318577849
Change-Id: Ic00684db4b502f1aab128f7e49f22510dda24f60
(cherry picked from commit 730e12fbec53ab59dd807d981a204258a4cfb29a https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing)
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
commit 7644b1a1c9a7ae8ab99175989bfc8676055edb46 upstream.
We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
dereference when the thread is cleared. Grab the SQPOLL data lock before
attempting to get the task cpu and pid for fdinfo; this ensures we have a
stable view of it.
Bug: 309790656
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9236d2ea64)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I044e0285d4535440606ff593230b873e3145db91
commit 4b7de801606e504e69689df71475d27e35336fb3 upstream.
Lee pointed out an issue found by syzkaller [0]: hitting a BUG in the prog
array map poke update in the prog_array_map_poke_run function, due to an
error value returned from the bpf_arch_text_poke function.
There's a race window where bpf_arch_text_poke can fail due to missing bpf
program kallsym symbols, which is accounted for with the check for -EINVAL
in that BUG_ON call.
The problem is that in such a case we won't update the tail call jump,
causing an imbalance for the next tail call update check, which will then
fail with -EBUSY in bpf_arch_text_poke.
I'm hitting the following race during program load:

  CPU 0                              CPU 1

  bpf_prog_load
    bpf_check
      do_misc_fixups
        prog_array_map_poke_track

                                     map_update_elem
                                       bpf_fd_array_map_update_elem
                                         prog_array_map_poke_run

                                           bpf_arch_text_poke returns -EINVAL

    bpf_prog_kallsyms_add
After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the
next poke update fails on the expected jump instruction check in
bpf_arch_text_poke with -EBUSY and triggers the BUG_ON in
prog_array_map_poke_run.
Similar race exists on the program unload.
Fix this by moving the update to the bpf_arch_poke_desc_update function,
which makes sure we call __bpf_arch_text_poke, which skips the bpf address
check.
Each architecture has a slightly different approach wrt looking up the bpf
address in bpf_arch_text_poke, so instead of splitting the function or
adding a new 'checkip' argument as in the previous version, it seems best
to move the whole map_poke_run update into arch-specific code.
[0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810
Bug: 309551558
Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 57a6b0a464)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I251c3da579e5d48cd7de4043913fd42d0671d6b5
xhci-hub.c tracks suspended ports in a suspended_port bitfield.
This is checked when responding to a Get_Status(PORT) request to see if a
port in running U0 state was recently resumed, and adds the required
USB_PORT_STAT_C_SUSPEND change bit in those cases.
The suspended_port bit was left uncleared if a device was disconnected
during suspend. The bit remained set even when a new device was connected
and enumerated. The set bit resulted in an incorrect Get_Status(PORT)
response with a bogus USB_PORT_STAT_C_SUSPEND change bit set once the new
device reached the U0 link state.
USB_PORT_STAT_C_SUSPEND change bit is only used for USB2 ports, but
xhci-hub keeps track of both USB2 and USB3 suspended ports.
Cc: stable@vger.kernel.org
Reported-by: Wesley Cheng <quic_wcheng@quicinc.com>
Closes: https://lore.kernel.org/linux-usb/d68aa806-b26a-0e43-42fb-b8067325e967@quicinc.com/
Fixes: 1d5810b692 ("xhci: Rework port suspend structures for limited ports.")
Tested-by: Wesley Cheng <quic_wcheng@quicinc.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20230915143108.1532163-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 200589374
(cherry picked from commit d7cdfc319b2bcf6899ab0a05eec0958bc802a9a1 https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-next)
[wcheng: modified change to remove dependency on updated resume timestamp tracking]
Change-Id: Icccc1778a1f193b4b4c03532d291db88772bd454
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Export sysctl_sched_min_granularity and
sysctl_sched_idle_min_granularity. The vendor module uses several static
functions from GKI. Since we do not want to export those static functions
(which would require making them non-static), we copied them into the
vendor module, so we need to export the symbols they reference. For
example, sysctl_sched_min_granularity and
sysctl_sched_idle_min_granularity are referenced in sched_slice(), and
they are only used read-only.
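For reference, the change amounts to adding exports next to the existing
definitions, roughly as below (whether EXPORT_SYMBOL_GPL or EXPORT_SYMBOL
is used, and the exact file, are assumptions here):

  EXPORT_SYMBOL_GPL(sysctl_sched_min_granularity);
  EXPORT_SYMBOL_GPL(sysctl_sched_idle_min_granularity);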
Bug: 316276520
Change-Id: I976d0a1f3a70e8e60099e55fdd3cc99a90053fbb
Signed-off-by: Rick Yiu <rickyiu@google.com>
Set the PARKMODE_DISABLE_HS bit in DWC3_USB3_GUCTL1. When this bit is set
to '1', all HS bus instances in park mode are disabled.
For some USB wifi devices, performance drops when HS park mode is
enabled. Therefore, add an option for disabling HS park mode via
device-tree.
In Synopsys's dwc3 data book:
In a few high speed devices, when an IN request is sent within 900ns of
the ACK of the previous packet, these devices send a NAK. When connected
to these devices, if required, the software can disable the park mode if
you see a performance drop in your system. When park mode is disabled,
pipelining of multiple packets is disabled and instead one packet at a
time is requested by the scheduler. This allows up to 12 NAKs in a
micro-frame and improves the performance of these slow devices.
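A sketch of how the option plugs in (the property name and the
DWC3_GUCTL1_PARKMODE_DISABLE_HS bit below follow the existing SS variant
and are assumptions here, not confirmed by this text):

  /* Parse the new device-tree option (sketch): */
  dwc->parkmode_disable_hs_quirk =
      device_property_read_bool(dev, "snps,parkmode-disable-hs-quirk");

  /* Apply it where GUCTL1 is programmed (sketch): */
  reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
  if (dwc->parkmode_disable_hs_quirk)
      reg |= DWC3_GUCTL1_PARKMODE_DISABLE_HS;
  dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);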
Bug: 300024866
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Stanley Chang <stanley_chang@realtek.com>
Link: https://lore.kernel.org/r/20230419020044.15475-1-stanley_chang@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: William Wu <william.wu@rock-chips.com>
(cherry picked from commit d21a797a3eeb2b001e07ff943e5611eab67a71a3)
Change-Id: I43ee416e54779a073a0ba4057edf4be8bd7886de
Signed-off-by: Kever Yang <kever.yang@rock-chips.com>
As we reserve only 1GB of memory for the MMIO region, don't prepopulate
the entire remaining address space with MMIO, as this is prone to failure.
Instead, let the MMIO regions be created lazily on the fault path and
keep only the RAM regions prepopulated.
Bug: 307805059
Test: Boot pKVM with CONFIG_ARM64_16K_PAGES
Change-Id: I6327f42eb17c6588335a1e04736393c9032114ab
Signed-off-by: Sebastian Ene <sebastianene@google.com>
In dup_mmap(), using __mt_dup() to duplicate the old maple tree and then
directly replacing the entries of VMAs in the new maple tree can result in
better performance. __mt_dup() uses DFS pre-order to duplicate the maple
tree, so it is efficient.
The average time complexity of __mt_dup() is O(n), where n is the number
of VMAs. The proof of the time complexity is provided in the commit log
that introduces __mt_dup(). After duplicating the maple tree, each
element is traversed and replaced (ignoring the cases of deletion, which
are rare). Since it is only a replacement operation for each element,
this process is also O(n).
Analyzing the exact time complexity of the previous algorithm is
challenging because each insertion can involve appending to a node,
pushing data to adjacent nodes, or even splitting nodes. The frequency of
each action is difficult to calculate. The worst-case scenario for a
single insertion is when the tree undergoes splitting at every level. If
we consider each insertion as the worst-case scenario, we can determine
that the upper bound of the time complexity is O(n*log(n)), although this
is a loose upper bound. However, based on the test data, it appears that
the actual time complexity is likely to be O(n).
As the entire maple tree is duplicated using __mt_dup(), if dup_mmap()
fails, there will be a portion of VMAs that have not been duplicated in
the maple tree. To handle this, we mark the failure point with
XA_ZERO_ENTRY. In exit_mmap(), if this marker is encountered, stop
releasing VMAs that have not been duplicated after this point.
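A simplified sketch of the failure handling (variable names follow
dup_mmap(); locking and iterator state handling are omitted, so treat
this only as an illustration of the marker idea):

  /* dup_mmap() failure path: mark the first VMA that was not duplicated
   * so exit_mmap() knows where to stop releasing VMAs.
   */
  if (retval) {
      mas_set_range(&vmi.mas, mpnt->vm_start, mpnt->vm_end - 1);
      mas_store(&vmi.mas, XA_ZERO_ENTRY);
  }

  /* exit_mmap() side: stop once the marker is reached. */
  vma = mas_find(&mas, ULONG_MAX);
  if (!vma || unlikely(xa_is_zero(vma)))
      goto done;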
There is a "spawn" in byte-unixbench[1], which can be used to test the
performance of fork(). I modified it slightly to make it work with
different number of VMAs.
Below are the test results. The first row shows the number of VMAs. The
second and third rows show the number of fork() calls per ten seconds,
corresponding to next-20231006 and this patchset, respectively. The
test results were obtained with CPU binding to avoid scheduler load
balancing that could cause unstable results. There are still some
fluctuations in the test results, but at least they are better than the
original performance.
     21     121     221     421     821    1621    3221    6421   12821   25621   51221
 112100   76261   54227   34035   20195   11112    6017    3161    1606     802     393
 114558   83067   65008   45824   28751   16072    8922    4747    2436    1233     599
  2.19%   8.92%  19.88%  34.64%  42.37%  44.64%  48.28%  50.17%  51.68%  53.74%  52.42%
[1] https://github.com/kdlucas/byte-unixbench/tree/master
Link: https://lkml.kernel.org/r/20231027033845.90608-11-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d2406291483775ecddaee929231a39c70c08fda2
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
[surenb: open-coded vma_iter_clear_gfp(), vma_iter_bulk_store();
replaced vma_next() with mas_find()]
Bug: 308042511
Change-Id: I42d6620e8ce6a0b16211c231a9b72ba16ba9c0d2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When destroying maple tree, preserve its attributes and then turn it into
an empty tree. This allows it to be reused without needing to be
reinitialized.
Link: https://lkml.kernel.org/r/20231027033845.90608-10-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8e50d32c7a89bde896945e4e572ef28ccd87bbf8
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: If1725d5a37dcd26bec23e6bffe95d877903dfab1
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Skip other tests when BENCH is enabled so that performance can be measured
in user space.
Link: https://lkml.kernel.org/r/20231027033845.90608-8-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f670fa1caadb4ea532a89012c5451e4c6789bfcc
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: I0a761a4b6211b19ec80c97d5aef80f3979523bcb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add test for mtree_dup().
Test by duplicating different maple trees and then comparing the two
trees. Includes tests for duplicating full trees and memory allocation
failures on different nodes.
Link: https://lkml.kernel.org/r/20231027033845.90608-6-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit a2587a7e8d37885dc063255f5400a66299b42e48
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: I7501db5735b1dfd15240ef2946b26d63ffe1d8e0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When kmem_cache_alloc_bulk() fails to allocate, leave the freed pointers
in the array. This enables a more accurate simulation of the kernel's
behavior and allows for testing potential double-free scenarios.
Link: https://lkml.kernel.org/r/20231027033845.90608-5-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 46c99e26f2f86260fed226cab217d0b3ca8dca56
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: If822e9d219066e1573b7c044ef9a7344f652e365
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Introduce the interfaces __mt_dup() and mtree_dup(), which are used to
duplicate a maple tree. They duplicate a maple tree in Depth-First Search
(DFS) pre-order traversal, using memcpy() to copy nodes in the source
tree and allocating new child nodes for non-leaf nodes. Each new node is
exactly the same as the corresponding source node except for the
addresses stored in it. This is faster than traversing all elements in
the source tree and inserting them one by one into the new tree. The
time complexity of these two functions is O(n).
The difference between __mt_dup() and mtree_dup() is that mtree_dup()
handles locks internally.
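A usage sketch (the exact signature is an assumption based on the
description above):

  struct maple_tree dst = MTREE_INIT(dst, 0);   /* empty destination */
  int ret;

  /* mtree_dup() takes and releases the locks of both trees itself;
   * __mt_dup() expects the caller to already hold them.
   */
  ret = mtree_dup(&src, &dst, GFP_KERNEL);
  if (ret)
      return ret;    /* allocation failure, duplication aborted */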
Analysis of the average time complexity of this algorithm:
For simplicity, let's assume that the maximum branching factor of all
non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a
full tree.
Under the given conditions, if there is a maple tree with n elements, the
number of its leaves is n/16. From bottom to top, the number of nodes in
each level is 1/16 of the number of nodes in the level below. So the
total number of nodes in the entire tree is given by the sum of n/16 +
n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has log(n)
terms with base 16. According to the formula for the sum of a geometric
series, the sum of this series can be calculated as (n-1)/15. Each node
has only one parent node pointer, which can be considered as an edge. In
total, there are (n-1)/15-1 edges.
This algorithm consists of two operations:
1. Traversing all nodes in DFS order.
2. For each node, making a copy and performing necessary modifications
to create a new node.
For the first part, the DFS traversal will visit each edge twice. Let
T(ascend) represent the cost of taking one step upwards, and T(descend)
the cost of taking one step downwards. Both of them are constants
(although mas_ascend() may not be, as it contains a loop, we ignore that
here and treat it as a constant). So the time spent on the first part can
be represented as ((n-1)/15-1) * (T(ascend) + T(descend)).
For the second part, each node will be copied, and the cost of copying a
node is denoted as T(copy_node). For each non-leaf node, it is necessary
to reallocate all child nodes, and the cost of this operation is denoted
as T(dup_alloc). The behavior behind memory allocation is complex and not
specific to the maple tree operation. Here, we assume that the time
required for a single allocation is constant. Since the size of a node is
fixed, both of these symbols are also constants. We can calculate that
the time spent on the second part is ((n-1)/15) * T(copy_node) + ((n-1)/15
- n/16) * T(dup_alloc).
Adding both parts together, the total time spent by the algorithm can be
represented as:
((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) -
n/16 * T(dup_alloc) - (T(ascend) + T(descend))
Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)
Let C2 = T(dup_alloc)
Let C3 = T(ascend) + T(descend)
Finally, the expression can be simplified as:
((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3).
This is a linear function, so the average time complexity is O(n).
Link: https://lkml.kernel.org/r/20231027033845.90608-4-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit fd32e4e9b7646510ee9010e0d5f8b8857d48a6f7
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: I385759a1184a202498e086458b572c203616b9b4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
In some cases, nested locks may be needed, so {mtree,mas}_lock_nested is
introduced. For example, when duplicating a maple tree, we need to hold
the locks of two trees, in which case nested locks are needed.
At the same time, add the definition of spin_lock_nested() in tools for
testing.
Link: https://lkml.kernel.org/r/20231027033845.90608-3-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b2472efe4316b2687c153919c1513a098bd82c17
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: I06f0eb0a32a2f39b7842de08a0e5ce59895345c5
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Patch series "Introduce __mt_dup() to improve the performance of fork()", v7.
This series introduces __mt_dup() to improve the performance of fork().
During the duplication process of mmap, all VMAs are traversed and
inserted one by one into the new maple tree, causing the maple tree to be
rebalanced multiple times. Balancing the maple tree is a costly
operation. To duplicate VMAs more efficiently, mtree_dup() and __mt_dup()
are introduced for the maple tree. They can efficiently duplicate a maple
tree.
Here are some algorithmic details about {mtree,__mt}_dup(). We perform a
DFS pre-order traversal of all nodes in the source maple tree. During
this process, we fully copy the nodes from the source tree to the new
tree. This involves memory allocation, and when encountering a new node,
if it is a non-leaf node, all its child nodes are allocated at once.
This idea was originally from Liam R. Howlett's Maple Tree Work email,
and I added some of my own ideas to implement it. Some previous
discussions can be found in [1]. For a more detailed analysis of the
algorithm, please refer to the logs for patch [3/10] and patch [10/10].
There is a "spawn" in byte-unixbench[2], which can be used to test the
performance of fork(). I modified it slightly to make it work with
different number of VMAs.
Below are the test results. The first row shows the number of VMAs. The
second and third rows show the number of fork() calls per ten seconds,
corresponding to next-20231006 and this patchset, respectively. The
test results were obtained with CPU binding to avoid scheduler load
balancing that could cause unstable results. There are still some
fluctuations in the test results, but at least they are better than the
original performance.
     21     121     221     421     821    1621    3221    6421   12821   25621   51221
 112100   76261   54227   34035   20195   11112    6017    3161    1606     802     393
 114558   83067   65008   45824   28751   16072    8922    4747    2436    1233     599
  2.19%   8.92%  19.88%  34.64%  42.37%  44.64%  48.28%  50.17%  51.68%  53.74%  52.42%
Thanks to Liam and Matthew for the review.
This patch (of 10):
Add two helpers:
1. mt_free_one(), used to free a maple node.
2. mt_attr(), used to obtain the attributes of maple tree.
Link: https://lkml.kernel.org/r/20231027033845.90608-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20231027033845.90608-2-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4f2267b58a22d972be98edef8e6b3c7a67c9fb91
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-unstable)
Bug: 308042511
Change-Id: Ib9b13dee357ac4c85668901c20a3c370fbdd08da
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
mas_set_range() resets the node to MAS_START, which will cause a re-walk
of the tree to the range. This is unnecessary when the maple state is
already at the correct location of the write. Add a function that only
sets the range to avoid unnecessary re-walking of the tree.
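A sketch of what such a helper looks like (the helper name here is an
assumption):

  static inline void __mas_set_range(struct ma_state *mas,
          unsigned long start, unsigned long last)
  {
      /* Unlike mas_set_range(), don't reset mas->node to MAS_START, so
       * the already-located position in the tree is kept.
       */
      mas->index = start;
      mas->last = last;
  }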
Link: https://lkml.kernel.org/r/20230724183157.3939892-6-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit c1297987cc2ada57a7faea7985c2334548d110f9)
Bug: 308042511
Change-Id: I9e026d0f103e3aa24b47998be6b83e28e7928540
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Adding the following symbols:
- __drmm_crtc_alloc_with_planes
Bug: 275278929
Change-Id: I41b6069612d44214f474ed82ee2a4b07ca739302
Signed-off-by: Ken Huang <kenbshuang@google.com>
The structures that define hyp events must be packed so they match
their format definitions in the tracefs file
hyp/events/hyp/<event>/format.
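For illustration (the event and field names below are made up), the
requirement is simply:

  struct hyp_event_example {
      u64 id;
      u8  flag;
  } __packed;    /* no compiler padding, so the in-memory layout matches
                  * the format description exported via tracefs */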
Bug: 299430621
Change-Id: Ia7e1a686744d5c9c3f8a21881f03228c8acecade
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
From the pKVM point of view, unknown SMCs are simply forwarded; we can't
tell whether they are valid or not. This was probably a typo following a
copy of the host_hcall event.
Bug: 299430621
Change-Id: Ieb53f985a5187a8b5a9feb4a95982b15cdc1b04a
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
If we return the error, there's no way to recover the status as of now, since
fsck does not fix the xattr boundary issue.
Bug: 305658663
Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 50a472bbc79ff9d5a88be8019a60e936cadf9f13
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: I55060a4eede3f5f85066aba22a6ab7155517e5c4
(cherry picked from commit 70113b9d489050d3e7a6f28e0cd6e43f104fc132)
(cherry picked from commit 2c1f3789d609bd549f14c019b6c7b311bfd2fa64)
When this pKVM module op was introduced, the documentation was
omitted.
Bug: 308373293
Change-Id: I9e471414e72a1ee04c132de4ed95d77e815ae8c9
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
A test [1] in Android test suite started failing after [2] was merged. It
turns out that after handling a major fault under per-VMA lock, the
process major fault counter does not register that fault as major. Before
[2] read faults would be done under mmap_lock, in which case
FAULT_FLAG_TRIED flag is set before retrying. That in turn causes
mm_account_fault() to account the fault as major once retry completes.
With per-VMA locks we often retry because a fault can't be handled without
locking the whole mm using mmap_lock. Therefore such retries do not set
the FAULT_FLAG_TRIED flag. This logic does not work after [2] because we
can now handle read major faults under the per-VMA lock, and upon retry
the fact that there was a major fault gets lost. Fix this by setting
FAULT_FLAG_TRIED for the retry if the per-VMA lock attempt returned
VM_FAULT_MAJOR. Ideally
we would use an additional VM_FAULT bit to indicate the reason for the
retry (could not handle under per-VMA lock vs other reason) but this
simpler solution seems to work, so keeping it simple.
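A sketch of the resulting flow in the arch fault handlers (simplified):

  /* Try the fault under the per-VMA lock first. */
  fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
  if (!(fault & VM_FAULT_RETRY))
      goto done;                  /* handled without mmap_lock */

  /* Falling back to mmap_lock: if the per-VMA attempt already performed
   * the major fault, account the retry as such.
   */
  if (fault & VM_FAULT_MAJOR)
      flags |= FAULT_FLAG_TRIED;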
[1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp
[2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/
Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com
Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 46e714c729c8d1d8110bc0545d7ffe8a759c9dc0
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-hotfixes-stable)
Bug: 317385399
Change-Id: Ic7e97bf610dcabb7d3ac2306b2f1213be0ddd269
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
I think this is a pretty rare occurrence, but for consistency handle
faults with the VMA lock held the same way that we handle other faults
with the VMA lock held.
Link: https://lkml.kernel.org/r/20231006195318.4087158-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4a68fef16df9d88d528094116f8bbd2dbfa62089)
Bug: 293665307
Change-Id: I69cec218c8a1fe14df3268722e6b1be6dffe7978
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Most file-backed faults are already handled through ->map_pages(), but if
we need to do I/O we'll come this way. Since filemap_fault() is now safe
to be called under the VMA lock, we can handle these faults under the VMA
lock now.
Link: https://lkml.kernel.org/r/20231006195318.4087158-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 12214eba1992642eee5813a9cc9f626e5b2d1815)
Bug: 293665307
Change-Id: Iee48af98b866d88d88ec01143eb26389ab373b6b
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
If the page is not currently present in the page tables, we need to call
the page fault handler to find out which page we're supposed to COW, so we
need to both check that there is already an anon_vma and that the fault
handler doesn't need the mmap_lock.
Link: https://lkml.kernel.org/r/20231006195318.4087158-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4de8c93a4751e10737b6af65db42c743228c67a6)
Bug: 293665307
Change-Id: If749a6f8fcf69d83bbf872c1d45865d1b1b77ea0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
There are many implementations of ->fault and some of them depend on
mmap_lock being held. All vm_ops that implement ->map_pages() end up
calling filemap_fault(), which I have audited to be sure it does not rely
on mmap_lock. So (for now) key off ->map_pages existing as a flag to
indicate that it's safe to call ->fault while only holding the vma lock.
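The check boils down to something like the following sketch (the helper
name used here is an assumption):

  static vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
  {
      struct vm_area_struct *vma = vmf->vma;

      /* ->map_pages implies ->fault is safe without mmap_lock. */
      if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
          return 0;

      /* Otherwise drop the VMA lock and retry under mmap_lock. */
      vma_end_read(vma);
      return VM_FAULT_RETRY;
  }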
Link: https://lkml.kernel.org/r/20231006195318.4087158-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4ed4379881aa62588aba6442a9f362a8cf7624e6)
Bug: 293665307
Change-Id: Ifb5ab3df5d05fb182d0cb52820fa24e28e2d6496
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
It is usually safe to call wp_page_copy() under the VMA lock. The only
unsafe situation is when no anon_vma has been allocated for this VMA, and
we have to look at adjacent VMAs to determine if their anon_vma can be
shared. Since this happens only for the first COW of a page in this VMA,
the majority of calls to wp_page_copy() do not need to fall back to the
mmap_sem.
Add vmf_anon_prepare() as an alternative to anon_vma_prepare() which will
return RETRY if we currently hold the VMA lock and need to allocate an
anon_vma. This lets us drop the check in do_wp_page().
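A sketch of the new helper (close to, but not necessarily identical to,
the actual implementation):

  vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
  {
      struct vm_area_struct *vma = vmf->vma;

      if (likely(vma->anon_vma))
          return 0;
      if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
          /* Allocating an anon_vma may need to look at the neighbours;
           * retry under mmap_lock instead.
           */
          vma_end_read(vma);
          return VM_FAULT_RETRY;
      }
      if (__anon_vma_prepare(vma))
          return VM_FAULT_OOM;
      return 0;
  }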
Link: https://lkml.kernel.org/r/20231006195318.4087158-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 164b06f238b986317131e6b61b2f22aabcbc2cc0)
[surenb: resolved merge conflicts due to folio/page differences]
Bug: 293665307
Change-Id: I39bdc247b375bd3dae8078b52c60fd4ce12e1850
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Patch series "Handle more faults under the VMA lock", v2.
At this point, we're handling the majority of file-backed page faults
under the VMA lock, using the ->map_pages entry point. This patch set
attempts to expand that for the following situations:
- We have to do a read. This could be because we've hit the point in
the readahead window where we need to kick off the next readahead,
or because the page is simply not present in cache.
- We're handling a write fault. Most applications don't do I/O by writes
to shared mmaps for very good reasons, but some do, and it'd be nice
to not make that slow unnecessarily.
- We're doing a COW of a private mapping (both PTE already present
and PTE not-present). These are two different codepaths and I handle
both of them in this patch set.
There is no support in this patch set for drivers to mark themselves as
being VMA lock friendly; they could implement the ->map_pages
vm_operation, but if they do, they would be the first. This is probably
something we want to change at some point in the future, and I've marked
where to make that change in the code.
There is very little performance change in the benchmarks we've run;
mostly because the vast majority of page faults are handled through the
other paths. I still think this patch series is useful for workloads that
may take these paths more often, and just for cleaning up the fault path
in general (it's now clearer why we have to retry in these cases).
This patch (of 6):
Drop the VMA lock instead of the mmap_lock if that's the one which
is held.
Link: https://lkml.kernel.org/r/20231006195318.4087158-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20231006195318.4087158-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 5d74b2ab2c15d596c470bae6626f345d5575a9d0)
Bug: 293665307
Change-Id: Ife2d11ab12fb428868cd44751784cf731fbffe62
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
For modules to reuse the default dma_map_ops implementations, they need
to be exported. Export the following functions:
dma_direct_alloc
dma_direct_free
dma_common_mmap
dma_common_get_sgtable
dma_direct_get_required_mask
Bug: 151050914
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia77b797fcd909fce01da7431bfbde282dc70b3b3
(cherry picked from commit fd31496dae939c7bf2ef874e08d4bf8c6ab738b3)
Signed-off-by: Qian-Hao Huang <qhhuang@google.com>
(cherry picked from commit cdc9f6ef94)
Let's just disable cached decompression and inplace I/Os for partial
pages as the first step in order to enable sub-page block initial
support. In other words, currently it works primarily based on
temporary short-lived pages. Don't expect too much in terms of
performance.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-6-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: I00238aa437f20c46d015bbe5ab7b706b80b8cfd7
(cherry picked from commit 0ee3a0d59e007320167a2e9f4b8bf1304ada7771
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
[dhavale: resolved conflicts in inode.c in erofs_fill_inode()]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Sub-page block support is still unusable even with previous commits if
interlaced PLAIN pclusters exist. Such pclusters can be found if the
fragment feature is enabled.
This commit tries to handle "the head part" of interlaced PLAIN
pclusters first: it was once explained in commit fdffc091e6 ("erofs:
support interlaced uncompressed data for compressed files").
It uses a unique way for both shifted and interlaced PLAIN pclusters.
As an added bonus, PLAIN pclusters larger than the block size are also
supported now for the upcoming large lclusters.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-5-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: I3d50132664f8754f56d62744420060108ed0da4f
(cherry picked from commit 192351616a9dde686492bcb9d1e4895a1411a527
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
`pageofs_in` should be the compressed data offset of the page rather
than of the block.
Acked-by: Chao Yu <chao@kernel.org>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231214161337.753049-1-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: I0997a69b22b0f42c327c810359f55f5fa6a76275
(cherry picked from commit e5aba911dee5e20fa82efbe13e0af8f38ea459e7
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Previously, the block size always equaled PAGE_SIZE; therefore,
`lclusterbits` couldn't be less than 12.
Since sub-page compressed blocks are now considered, `lobits` for
an lcluster in each pack cannot always be `lclusterbits` as before.
Otherwise, there is not enough room for the special value
`Z_EROFS_VLE_DI_D0_CBLKCNT`.
To support smaller block sizes, `lobits` for each compacted lcluster is
now calculated as:
lobits = max(lclusterbits, ilog2(Z_EROFS_VLE_DI_D0_CBLKCNT) + 1)
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-4-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: Iacd89e2b33ddf39ea40b90e88a2bf99bb5a83b31
(cherry picked from commit 8d2517aaeea3ab8651bb517bca8f3c8664d318ea
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
[dhavale: resolved conflicts in zmap.c due to older naming of constants
and updated commit message also to use the older names]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Currently, compressed sizes are recorded in pages using `pclusterpages`.
However, for tailpacking pclusters, `tailpacking_size` is used instead.
This approach doesn't work when dealing with sub-page blocks. To address
this, let's switch them to the unified `pclustersize` in bytes.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-3-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: Ia8c50a7b4adcf6cd161b1d6f8bfc5a7fd3371079
(cherry picked from commit 54ed3fdd66055d073cb1cd2c6c65bbc0683c40cf
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Add a basic I/O submission path first to support sub-page blocks:
- Temporary short-lived pages will be used entirely;
- In-place I/O pages can be used partially, but compressed pages need
to be able to be mapped in contiguous virtual memory.
As a start, currently cache decompression is explicitly disabled for
sub-page blocks, which will be supported in the future.
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206091057.87027-2-hsiangkao@linux.alibaba.com
Bug: 318378021
Change-Id: Ib2cb6120805ab479a450580fc8774af131271791
(cherry picked from commit 192351616a9dde686492bcb9d1e4895a1411a527
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git dev)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>