Commit graph

41578 commits

Author SHA1 Message Date
Liujie Xie
ec8c8f6e33 ANDROID: vendor_hooks: add hook account_process_tick_gran
this hook will allow to account tick in every third of more ticks
to save cpu time for accounting

Bug: 279549765
Change-Id: I5d18e0167fdce076d13aecc653dcf6387bcb25f2
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
2023-05-17 11:31:51 +00:00
heshuai1
a9a44851ec ANDROID: freezer: Add vendor hook to freezer for GKI purpose.
Add the vendor hook to freezer.c so that OEM's logic can be executed
when the process is about to be frozen. We need to clear the flag for
some tasks and rebind task dependencies for optimization purposes.

Bug: 187458531
Bug: 281920779

Signed-off-by: heshuai1 <heshuai1@xiaomi.com>
Change-Id: Iea42fd9604d6b33ccd6502425416f0dd28eecebb
(cherry picked from commit a1580311c36ca28344b2f03b3c8a72d9f8db5bde)
2023-05-17 00:25:38 +00:00
Zhuguangqing
632ec01905 ANDROID: freezer: export the freezer_cgrp_subsys for GKI purpose.
Exporting the symbol freezer_cgrp_subsys, in that vendor module can
add can_attach & cancel_attach member function. It is vendor-specific
tuning.

Bug: 182496370
Bug: 281920779

Signed-off-by: Zhuguangqing <zhuguangqing@xiaomi.com>
Change-Id: I153682b9d1015eed3f048b45ea6495ebb8f3c261
(cherry picked from commit ee3f4d2821f5b2a794f0a1f5ed423f561a01adae)
(cherry picked from commit 8a90e4d4e555dd5484213c6fec5061958016a194)
2023-05-17 00:25:38 +00:00
Zhuguangqing
bf4922727c ANDROID: Add vendor hooks to signal.
This hook will allow us to get signal messages so that we can set
limitations for certain tasks and restore them when receiving important signals.

Bug: 184898838
Bug: 281920779

Signed-off-by: Zhuguangqing <zhuguangqing@xiaomi.com>
Change-Id: I83a28b0a6eb413976f4c57f2314d008ad792fa0d
(cherry picked from commit 58e3f869fc3fe84fb7062496ccd049db47f3ed7f)
2023-05-16 21:44:18 +00:00
Greg Kroah-Hartman
dec77ff4b5 Merge b1644a0031 ("drm/rockchip: vop2: Use regcache_sync() to fix suspend/resume") into android14-6.1
Steps on the way to 6.1.26

Change-Id: I76647cf6aaf4db218b2013de08a01cd9d11b0bb3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-05-16 14:19:59 +00:00
Liujie Xie
44337937b1 ANDROID: vendor_hooks: Export the tracepoints sched_stat_sleep
and sched_waking to let module probe them

Get task info about sleep and waking

Bug: 190422437
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I828c93f531f84e6133c2c3a7f8faada51683afcf
(cherry picked from commit 13af062abf9b3586623adf7c3cdc5ced7bb5caed)
(cherry picked from commit 869954e72dac700580d0ea5734d07b574e41afe9)
(cherry picked from commit b7f527071c7716c1cc8dfea79353a0f00cec89d8)
2023-05-15 23:43:01 +00:00
Liujie Xie
527ffd22ad ANDROID: vendor_hooks: Export the tracepoints sched_stat_iowait, sched_stat_blocked, sched_stat_wait to let modules probe them
Get task info about scheduling delay, iowait, and block time.

It is used to get thread scheduling info when thread happened abnormal situation.

Bug: 189415303
Change-Id: Ib6b548f8a78de5b26d555e9a89e3cc79ea2d1024
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit a6bb1af39d11ef0360cb34bb31b7224ca4db031f)
(cherry picked from commit 6d8d2ab52facfd6d5de2715e2470872e6a70cf22)
(cherry picked from commit c3c29177681354b815763c3592e227b2288f341d)
2023-05-15 23:43:01 +00:00
zhengding chen
1ffd1a1c55 ANDROID: workqueue: export symbol of the function wq_worker_comm()
Export symbol of the function wq_worker_comm() in kernel/workqueue.c for dlkm to get the description of the kworker process. It is used to get the description when kworker thread happened abnormal situation.

Bug: 208394207
Signed-off-by: zhengding chen <chenzhengding@oppo.com>
Change-Id: I2e7ddd52a15e22e99e6596f16be08243af1bb473
(cherry picked from commit 28de74186185e339123c86984729818d0d2d7f43)
(cherry picked from commit 87e0e98c25ba8e121975708943335e3abad651d9)
(cherry picked from commit 38a713dc80959bd855580a51b75654fb6fe9a5dd)
2023-05-15 23:43:01 +00:00
John Stultz
da126f8d02 FROMGIT: locking/rwsem: Add __always_inline annotation to __down_read_common() and inlined callers
Apparently despite it being marked inline, the compiler
may not inline __down_read_common() which makes it difficult
to identify the cause of lock contention, as the blocked
function in traceevents will always be listed as
__down_read_common().

So this patch adds __always_inline annotation to the common
function (as well as the inlined helper callers) to force it to
be inlined so the blocking function will be listed (via Wchan)
in traceevents.

Fixes: c995e638cc ("locking/rwsem: Fold __down_{read,write}*()")
Reported-by: Tim Murray <timmurray@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230503023351.2832796-1-jstultz@google.com
Bug: 277817995
(cherry picked from commit 92cc5d00a431e96e5a49c0b97e5ad4fa7536bd4b
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/urgent)
Signed-off-by: John Stultz <jstultz@google.com>
(cherry picked from https://android-review.googlesource.com/q/commit:a6c75b2e64573cb9f49f6b89808207856fc0309b)
Merged-In: Ifad7ed7fe9f2d5a9eb0cfe7c35e45c0e86bc3ad4
Change-Id: Ifad7ed7fe9f2d5a9eb0cfe7c35e45c0e86bc3ad4
2023-05-12 18:16:24 +00:00
lijianzhong
8947e06ff7 ANDROID: cgroup: Add vendor hook for cpuset.
This hook allows us to capture information when a process is forked so
that we can stat and set some key task's CPU affinity in the ko module
later. This patch, along with aosp/2548175, is necessary for our
affinity settings.

Bug: 183674818
Signed-off-by: lijianzhong <lijianzhong@xiaomi.com>
Change-Id: Ib93e05e5f6c338c5f7ada56bfebdd705f87f1f66
(cherry picked from commit a188361628461c58a4dfc72869d9acb1dfa2542f)
2023-05-11 13:10:00 +00:00
lijianzhong
2e85c6c0fa ANDROID: export cpuset_cpus_allowed()for GKI purpose.
Exporting the symbol cpuset_cpus_allowed() so that we can adjust a
certain type of application's CPU affinity in vendor hooks according
to our tuning policy.
Related commit: aosp/2548175

Bug: 189725786

Signed-off-by: lijianzhong <lijianzhong@xiaomi.com>
Change-Id: I7919a893ab64bb441ab43cbb0b16825ed76d802d
(cherry picked from commit 5a7d01ed73e4fc812fda1d7288086dc73a283405)
2023-05-11 13:10:00 +00:00
lijianzhong
8979a0b16a ANDROID: sched: Add vendor hooks for cpu affinity.
Add vendor hooks for CPU affinity to support OEM's tuning policy, where
we can block or unblock a certain type of application's CPU affinity.

Bug: 183674818

Signed-off-by: lijianzhong <lijianzhong@xiaomi.com>
Change-Id: I3402abec4d9faa08f564409bfb8db8d7902f3aa2
(cherry picked from commit 7cf9646c245fdc63e2a3c9fad457c11fabdd2dfc)
2023-05-11 13:10:00 +00:00
xieliujie
22fd6b3625 ANDROID: vendor_hooks: Add hooks for futex
We want to use this hook to record the sleeping time due to Futex

Bug: 210947226
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I637f889dce42937116d10979e0c40fddf96cd1a2
(cherry picked from commit a7ab784f601a93a78c1c22cd0aacc2af64d8e3c8)
2023-05-11 05:22:29 +00:00
xieliujie
16f2ad77c7 ANDROID: vendor_hooks: Add hooks for oem futex optimization
If an important task is going to sleep through do_futex(),
find out it's futex-owner by the pid comes from userspace,
and boost the owner by some means to shorten the sleep time.
How to boost? Depends on these hooks:
commit 53e809978443 ("ANDROID: vendor_hooks: Add hooks for scheduler")

Bug: 243110112
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I9a315cfb414fd34e0ef7a2cf9d57df50d4dd984f
(cherry picked from commit 548da5d23d98b796cf9a478675622a606b3307c8)
2023-05-11 05:22:29 +00:00
heshuai1
0ea0d6a7a2 ANDROID: power: Add vendor hook to qos for GKI purpose.
Add vendor hooks in add/update/remove frequency QoS request process to
ensure that we can access the OEM's "frequency watchdog" logic for
abnormal frequency monitoring. This is necessary for our power tuning
policy.

Bug: 187458531

Signed-off-by: heshuai1 <heshuai1@xiaomi.com>
Change-Id: I1fb8fd6134432ecfb44ad242c66ccd8280ab9b43
(cherry picked from commit c445fe4dc67ad74dacfa548bc78876a7ce057086)
2023-05-11 05:22:29 +00:00
lijianzhong
c22b82c2e4 ANDROID: export find_user() & free_uid()for GKI purpose.
Exporting the symbols find_user() & free_uid() to access user task
information in ko module for monitoring and optimization purposes. This
is a necessary component of our scheduling policy.

Bug: 183674818

Signed-off-by: lijianzhong <lijianzhong@xiaomi.com>
Change-Id: I12135c0af312904dd21b6f074beda086ad5ece98
(cherry picked from commit 16350016d8a3678da5012343ca00fa9918340c83)
(cherry picked from commit eec2cd3df3aa2d92136658d3619dc5142155c7d4)
2023-05-11 05:22:29 +00:00
heshuai1
4b87d7254b ANDROID: user: Add vendor hook to user for GKI purpose
In order to implement our scheduling tuning policy in certain cases, we
need to initialize the variables that we have defined in the
user_struct. To achieve this, we will add a vendor hook to user.c at
alloc_uid, which will ensure that our own logic is executed during the
initialization of the user_struct.

Bug: 187458531

Signed-off-by: heshuai1 <heshuai1@xiaomi.com>
Change-Id: I078484aac2c3d396aba5971d6d0f491652f3781c
(cherry picked from commit c9b8fa644f45e9c99da85d8947f6c7e06771f205)
(cherry picked from commit 9ac0923ef565e4de4e1f35edcba6fcb7e45948c9)
2023-05-11 05:22:29 +00:00
lijianzhong
e273916482 ANDROID: sched: add trace_android_vh_map_util_freq parameter
Add "cpufreq_policy" and "need_freq_update" parameters to the vendor
hook to enable frequency calculation in certain special cases related to
OEM's frequency tuning policy.

Bug: 183674818

Signed-off-by: lijianzhong <lijianzhong@xiaomi.com>
Change-Id: I232d2e1ae885d6736eca9e4709870f4272b4873d
2023-05-11 05:22:29 +00:00
Chun-Hung Wu
c451105379 FROMLIST: time/sched_clock: Export sched_clock_register()
clocksource driver may use sched_clock_register()
to resigter itself as a sched_clock source.
Export it to support building such driver
as module, like timer-mediatek.c

Link: https://lore.kernel.org/lkml/20230421034649.15247-5-walter.chang@mediatek.com/T/
Bug: 161675989
Change-Id: Ib052d1fd7ccf6a7422eb6f1755515e1236285e01
Signed-off-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
2023-05-09 09:49:00 +00:00
Greg Kroah-Hartman
d31ed3d059 Merge 55fba69fbf ("rust: kernel: Mark rust_fmt_argument as extern "C"") into android14-6.1
Steps on the way to 6.1.26

Change-Id: Idedac255abf3273edfdc8e1c3a88cc97c7af1f41
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-05-09 03:32:41 +00:00
Liujie Xie
e80c937cd0 ANDROID: vendor_hooks: Add hooks for rwsem and mutex
Add hooks to apply oem's optimization of rwsem and mutex

Bug: 182237112
Signed-off-by: xieliujie <xieliujie@oppo.com>
(cherry picked from commit 80b4341d0520dce2ef1e33aef06c6a2bc8ada518)

Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I36895c432e5b6d6bff8781b4a7872badb693284c
Signed-off-by: Carlos Llamas <cmllamas@google.com>
[cmllamas: completes the cherry-pick of original commit 80b4341d0520
since commit 0902cc73b793 was only partial]
(cherry picked from commit d4528a28cb5be0c322031f333a6230fa3042931f)
2023-05-05 23:07:06 +00:00
Liujie Xie
eb74e5b3fe ANDROID: vendor_hooks: Add hooks for mutex and rwsem optimistic spin
These hooks help us do the following things:
a) Record the number of mutex and rwsem optimistic spin.
b) Monitor the time of mutex and rwsem optimistic spin.
c) Make it possible if oems don't want mutex and rwsem to optimistic spin
for a long time.

Bug: 267565260
Change-Id: I2bee30fb17946be85e026213b481aeaeaee2459f
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
(cherry picked from commit d01f7e1269d2651768a0bd91a88f76339a1cb427)
(cherry picked from commit 05b5ff11ad98c5896b352b4c376a84b63684e06c)
2023-05-05 23:07:06 +00:00
xieliujie
86c4152970 ANDROID: vendor_hooks: Add hooks for rwsem and mutex
Add hooks to apply oem's optimization of rwsem and mutex

Bug: 182237112
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I6332623732e2d6826b8b61087ca74e55393e0c3d
(cherry picked from commit 0902cc73b793f8b8cc2a80943d85d4ca9b98278b)
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2023-05-05 23:07:06 +00:00
Peifeng Li
6c1c1552e6 ANDROID: vendor_hook: add hooks to protect locking-tsk in cpu scheduler
Providing vendor hooks to record the start time of holding the lock, which
protects rwsem/mutex locking-process from being preemptedfor a short time
in some cases.

- android_vh_record_mutex_lock_starttime
- android_vh_record_rtmutex_lock_starttime
- android_vh_record_rwsem_lock_starttime
- android_vh_record_pcpu_rwsem_starttime

Bug: 241191475

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I0e967a1e8b77c32a1ad588acd54028fae2f90c4e
(cherry picked from commit f7294947672eb6b786f3c16b49e71e6a239101ad)
2023-05-05 19:45:34 +00:00
Sudarshan Rajagopalan
dd29657536 FROMLIST: ANDROID: GKI: psi: remove 500ms min window size limitation for triggers
Current 500ms min window size for psi triggers limits polling interval
to 50ms to prevent polling threads from using too much cpu bandwidth by
polling too frequently. However the number of cgroups with triggers is
unlimited, so this protection can be defeated by creating multiple
cgroups with psi triggers (triggers in each cgroup are served by a single
"psimon" kernel thread).
Instead of limiting min polling period, which also limits the latency of
psi events, it's better to limit psi trigger creation to authorized users
only, like we do for system-wide psi triggers (/proc/pressure/* files can
be written only by processes with CAP_SYS_RESOURCE capability). This also
makes access rules for cgroup psi files consistent with system-wide ones.
Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
remove the psi window min size limitation.

Bug: 269247660
Change-Id: I8876aa306cf2ba5acdf4daa5a5eff0665537bfeb
Suggested-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
Link: https://lore.kernel.org/all/20230303011346.3342233-1-surenb@google.com/
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
2023-05-02 23:08:35 +00:00
Isaac J. Manjarres
0f690b56fe ANDROID: iommu/dma: Add support for DMA_ATTR_SYS_CACHE_NWA
IOMMU_SYS_CACHE_NWA allows buffers for non-coherent devices to be
mapped with the correct memory attributes so that the buffers can be
cached in the system cache, with a no write allocate cache policy.
However, this property is only usable by drivers that invoke the IOMMU
API directly; it is not usable by drivers that use the DMA API.

Thus, introduce DMA_ATTR_SYS_CACHE_NWA, so that drivers for
non-coherent devices that use the DMA API can use it to specify if
they want a buffer to be cached in the system cache.

Bug: 189339242
Change-Id: Ic812a1fb144a58deb4279c2bf121fc6cc4c3b208
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
2023-05-01 21:16:31 -07:00
Isaac J. Manjarres
ed46a5c09b ANDROID: iommu/dma: Add support for DMA_ATTR_SYS_CACHE
IOMMU_SYS_CACHE allows buffers for non-coherent devices to be mapped
with the correct memory attributes so that the buffers can be cached
in the system cache. However, this property is only usable by drivers
that invoke the IOMMU API directly; it is not usable by drivers that
use the DMA API.

Thus, introduce DMA_ATTR_SYS_CACHE, so that drivers for non-coherent
devices that use the DMA API can use it to specify if they want a
buffer to be cached in the system cache.

Bug: 189339242
Change-Id: I849d7a3f36b689afd2f6ee400507223fd6395158
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
2023-05-01 21:16:30 -07:00
Yun Hsiang
b77eeebb3c ANDROID: schedutil: add vendor hook for adjusting util to freq calculation
Currently, the frequency is calculated by max freq * 1.25 * util / max cap.
Add a vendor hook to adjust the frequency when the calculation
overestimate.

android_vh_map_util_freq
	adjust util to freq calculation

Bug: 177845439

Signed-off-by: Yun Hsiang <yun.hsiang@mediatek.com>
Change-Id: I9aa9079f00af7d3380b19f2fe21b75cddd107d15
(cherry picked from commit 3122e3ec9672036384304fdeaa1b1815f60ba817)
(cherry picked from commit a2d89d4f3a)
2023-04-29 09:02:05 +00:00
Greg Kroah-Hartman
55e4f0c551 This is the 6.1.25 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmRBFW0ACgkQONu9yGCS
 aT7Jew//Ytw9+JQ71LT1TuJnQ1GayJOL1BW5hgxoYgnBFasWDwsGA9rzHs6KHqHb
 0Vjk7MX7VZB+6zWakOxY5CFVM33J4fS7wY8WE2bj8X3QQhD/J0HQDMdELvSBi3qF
 7xI6sghEQEwOuwAj2+CBqm/q7rA5FTnO1QgJuk/AKJ6PHGRiQeZ7q1zGpFvSaj7S
 cyKvY99RsjnUN+PYk4LE2+u/6DVCqiWYVDQrdjalb9zsrXg4+nmPH6ZJzZX8+bbM
 eM0xAR675V8TXqDi+8bj7tWmiS52XyjYF3Q/bu9BmU67DqslH9FFyVQxhgTHUZpN
 qWXkojEU2djIc3qt7T/bpZS/vD8Kg3Px1CgyIRN8Y5SlZfhZyqVdTZ4AQCtJuLQJ
 wDIdQCLlGzzDNFvbD+LdfJSjZt7Ig1sM/HwtPZhUA9yF0FN1XV3dcESzCOeI0/S7
 ohRh8cs1sidnxrbvVwiVNENSqbJD7G9/9vVjIfyfcnt57q+fs6xCBhpOyNoVOp74
 I5i6ALMcVZoAB50vDjnoGZsSRe9W2AmOV6UMIkVCvRCWYFqBpgVftMTAACNyljni
 UlXmO7aDQj+nbHD/auclFtU02oHQbk62FSrwoWMFS090zWztQqUhgRY7Qnl13yCM
 poEvrKlskXhvunsNtdVmI5O3N2GANWKgGwkyFIiXvgxKkw1qpUo=
 =zeN9
 -----END PGP SIGNATURE-----

Merge 6.1.25 into android14-6.1

Changes in 6.1.25
	Revert "pinctrl: amd: Disable and mask interrupts on resume"
	drm/amd/display: Pass the right info to drm_dp_remove_payload
	ALSA: emu10k1: fix capture interrupt handler unlinking
	ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
	ALSA: i2c/cs8427: fix iec958 mixer control deactivation
	ALSA: hda: patch_realtek: add quirk for Asus N7601ZM
	ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2
	ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
	ALSA: emu10k1: don't create old pass-through playback device on Audigy
	ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
	ALSA: hda/hdmi: disable KAE for Intel DG2
	Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
	Bluetooth: Fix race condition in hidp_session_thread
	bluetooth: btbcm: Fix logic error in forming the board name.
	Bluetooth: Free potentially unfreed SCO connection
	Bluetooth: hci_conn: Fix possible UAF
	btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
	btrfs: fix fast csum implementation detection
	fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
	mtdblock: tolerate corrected bit-flips
	mtd: rawnand: meson: fix bitmask for length in command word
	mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
	mtd: rawnand: stm32_fmc2: use timings.mode instead of checking tRC_min
	KVM: arm64: PMU: Restore the guest's EL0 event counting after migration
	fbcon: Fix error paths in set_con2fb_map
	fbcon: set_con2fb_map needs to set con2fb_map!
	drm/i915/dsi: fix DSS CTL register offsets for TGL+
	clk: sprd: set max_register according to mapping range
	RDMA/irdma: Do not generate SW completions for NOPs
	RDMA/irdma: Fix memory leak of PBLE objects
	RDMA/irdma: Increase iWARP CM default rexmit count
	RDMA/irdma: Add ipv4 check to irdma_find_listener()
	IB/mlx5: Add support for 400G_8X lane speed
	RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192
	RDMA/erdma: Inline mtt entries into WQE if supported
	RDMA/erdma: Defer probing if netdevice can not be found
	clk: rs9: Fix suspend/resume
	RDMA/cma: Allow UD qp_type to join multicast only
	bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
	LoongArch, bpf: Fix jit to skip speculation barrier opcode
	dmaengine: apple-admac: Handle 'global' interrupt flags
	dmaengine: apple-admac: Set src_addr_widths capability
	dmaengine: apple-admac: Fix 'current_tx' not getting freed
	9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
	bpf, arm64: Fixed a BTI error on returning to patched function
	KVM: arm64: Initialise hypervisor copies of host symbols unconditionally
	KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMs
	niu: Fix missing unwind goto in niu_alloc_channels()
	tcp: restrict net.ipv4.tcp_app_win
	bonding: fix ns validation on backup slaves
	iavf: refactor VLAN filter states
	iavf: remove active_cvlans and active_svlans bitmaps
	net: openvswitch: fix race on port output
	Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
	Bluetooth: Fix printing errors if LE Connection times out
	Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
	Bluetooth: Set ISO Data Path on broadcast sink
	drm/armada: Fix a potential double free in an error handling path
	qlcnic: check pci_reset_function result
	net: wwan: iosm: Fix error handling path in ipc_pcie_probe()
	cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
	net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
	sctp: fix a potential overflow in sctp_ifwdtsn_skip
	RDMA/core: Fix GID entry ref leak when create_ah fails
	selftests: openvswitch: adjust datapath NL message declaration
	udp6: fix potential access to stale information
	net: macb: fix a memory corruption in extended buffer descriptor mode
	skbuff: Fix a race between coalescing and releasing SKBs
	libbpf: Fix single-line struct definition output in btf_dump
	ARM: 9290/1: uaccess: Fix KASAN false-positives
	ARM: dts: qcom: apq8026-lg-lenok: add missing reserved memory
	power: supply: rk817: Fix unsigned comparison with less than zero
	power: supply: cros_usbpd: reclassify "default case!" as debug
	power: supply: axp288_fuel_gauge: Added check for negative values
	selftests/bpf: Fix progs/find_vma_fail1.c build error.
	wifi: mwifiex: mark OF related data as maybe unused
	i2c: imx-lpi2c: clean rx/tx buffers upon new message
	i2c: hisi: Avoid redundant interrupts
	efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
	block: ublk_drv: mark device as LIVE before adding disk
	ACPI: video: Add backlight=native DMI quirk for Acer Aspire 3830TG
	drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
	hwmon: (peci/cputemp) Fix miscalculated DTS for SKX
	hwmon: (xgene) Fix ioremap and memremap leak
	verify_pefile: relax wrapper length check
	asymmetric_keys: log on fatal failures in PE/pkcs7
	nvme: send Identify with CNS 06h only to I/O controllers
	wifi: iwlwifi: mvm: fix mvmtxq->stopped handling
	wifi: iwlwifi: mvm: protect TXQ list manipulation
	drm/amdgpu: add mes resume when do gfx post soft reset
	drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
	drm/amdgpu/gfx: set cg flags to enter/exit safe mode
	ACPI: resource: Add Medion S17413 to IRQ override quirk
	x86/hyperv: Move VMCB enlightenment definitions to hyperv-tlfs.h
	KVM: selftests: Move "struct hv_enlightenments" to x86_64/svm.h
	KVM: SVM: Add a proper field for Hyper-V VMCB enlightenments
	x86/hyperv: KVM: Rename "hv_enlightenments" to "hv_vmcb_enlightenments"
	KVM: SVM: Flush Hyper-V TLB when required
	tracing: Add trace_array_puts() to write into instance
	tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
	maple_tree: fix write memory barrier of nodes once dead for RCU mode
	ksmbd: avoid out of bounds access in decode_preauth_ctxt()
	riscv: add icache flush for nommu sigreturn trampoline
	HID: intel-ish-hid: Fix kernel panic during warm reset
	net: sfp: initialize sfp->i2c_block_size at sfp allocation
	net: phy: nxp-c45-tja11xx: add remove callback
	net: phy: nxp-c45-tja11xx: fix unsigned long multiplication overflow
	scsi: ses: Handle enclosure with just a primary component gracefully
	x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
	cgroup: fix display of forceidle time at root
	cgroup/cpuset: Fix partition root's cpuset.cpus update bug
	cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
	drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings
	drm/amd/pm: correct SMU13.0.7 max shader clock reporting
	mptcp: use mptcp_schedule_work instead of open-coding it
	mptcp: stricter state check in mptcp_worker
	ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
	ubi: Fix deadlock caused by recursively holding work_sem
	i2c: mchp-pci1xxxx: Update Timing registers
	powerpc/papr_scm: Update the NUMA distance table for the target node
	sched/fair: Fix imbalance overflow
	x86/rtc: Remove __init for runtime functions
	i2c: ocores: generate stop condition after timeout in polling mode
	cifs: fix negotiate context parsing
	nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
	nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
	cgroup/cpuset: Skip spread flags update on v2
	cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
	cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
	Linux 6.1.25

Change-Id: Ib4d2c49ea9bacb8d8dbdb7b3a4eecce937016427
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-26 13:13:19 +00:00
Ondrej Mosnacek
ec90129b91 kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
commit 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 upstream.

Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.

The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).

The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not.  It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.

Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.

While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.

Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 14:28:39 +02:00
Daniel Borkmann
89603f4c91 bpf: Fix incorrect verifier pruning due to missing register precision taints
[ Upstream commit 71b547f561247897a0a14f3082730156c0533fed ]

Juan Jose et al reported an issue found via fuzzing where the verifier's
pruning logic prematurely marks a program path as safe.

Consider the following program:

   0: (b7) r6 = 1024
   1: (b7) r7 = 0
   2: (b7) r8 = 0
   3: (b7) r9 = -2147483648
   4: (97) r6 %= 1025
   5: (05) goto pc+0
   6: (bd) if r6 <= r9 goto pc+2
   7: (97) r6 %= 1
   8: (b7) r9 = 0
   9: (bd) if r6 <= r9 goto pc+1
  10: (b7) r6 = 0
  11: (b7) r0 = 0
  12: (63) *(u32 *)(r10 -4) = r0
  13: (18) r4 = 0xffff888103693400 // map_ptr(ks=4,vs=48)
  15: (bf) r1 = r4
  16: (bf) r2 = r10
  17: (07) r2 += -4
  18: (85) call bpf_map_lookup_elem#1
  19: (55) if r0 != 0x0 goto pc+1
  20: (95) exit
  21: (77) r6 >>= 10
  22: (27) r6 *= 8192
  23: (bf) r1 = r0
  24: (0f) r0 += r6
  25: (79) r3 = *(u64 *)(r0 +0)
  26: (7b) *(u64 *)(r1 +0) = r3
  27: (95) exit

The verifier treats this as safe, leading to oob read/write access due
to an incorrect verifier conclusion:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (b7) r6 = 1024                     ; R6_w=1024
  1: (b7) r7 = 0                        ; R7_w=0
  2: (b7) r8 = 0                        ; R8_w=0
  3: (b7) r9 = -2147483648              ; R9_w=-2147483648
  4: (97) r6 %= 1025                    ; R6_w=scalar()
  5: (05) goto pc+0
  6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff00000000; 0xffffffff)) R9_w=-2147483648
  7: (97) r6 %= 1                       ; R6_w=scalar()
  8: (b7) r9 = 0                        ; R9=0
  9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
  10: (b7) r6 = 0                       ; R6_w=0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 9
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
  19: (55) if r0 != 0x0 goto pc+1       ; R0=0
  20: (95) exit

  from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  21: (77) r6 >>= 10                    ; R6_w=0
  22: (27) r6 *= 8192                   ; R6_w=0
  23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
  24: (0f) r0 += r6
  last_idx 24 first_idx 19
  regs=40 stack=0 before 23: (bf) r1 = r0
  regs=40 stack=0 before 22: (27) r6 *= 8192
  regs=40 stack=0 before 21: (77) r6 >>= 10
  regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
  parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  last_idx 18 first_idx 9
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  regs=40 stack=0 before 10: (b7) r6 = 0
  25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  27: (95) exit

  from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 11
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1
  frame 0: propagating r6
  last_idx 19 first_idx 11
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
  last_idx 9 first_idx 9
  regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=0 R10=fp0
  last_idx 8 first_idx 0
  regs=40 stack=0 before 8: (b7) r9 = 0
  regs=40 stack=0 before 7: (97) r6 %= 1
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=40 stack=0 before 5: (05) goto pc+0
  regs=40 stack=0 before 4: (97) r6 %= 1025
  regs=40 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  19: safe
  frame 0: propagating r6
  last_idx 9 first_idx 0
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=40 stack=0 before 5: (05) goto pc+0
  regs=40 stack=0 before 4: (97) r6 %= 1025
  regs=40 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024

  from 6 to 9: safe
  verification time 110 usec
  stack depth 4
  processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2

The verifier considers this program as safe by mistakenly pruning unsafe
code paths. In the above func#0, code lines 0-10 are of interest. In line
0-3 registers r6 to r9 are initialized with known scalar values. In line 4
the register r6 is reset to an unknown scalar given the verifier does not
track modulo operations. Due to this, the verifier can also not determine
precisely which branches in line 6 and 9 are taken, therefore it needs to
explore them both.

As can be seen, the verifier starts with exploring the false/fall-through
paths first. The 'from 19 to 21' path has both r6=0 and r9=0 and the pointer
arithmetic on r0 += r6 is therefore considered safe. Given the arithmetic,
r6 is correctly marked for precision tracking where backtracking kicks in
where it walks back the current path all the way where r6 was set to 0 in
the fall-through branch.

Next, the pruning logics pops the path 'from 9 to 11' from the stack. Also
here, the state of the registers is the same, that is, r6=0 and r9=0, so
that at line 19 the path can be pruned as it is considered safe. It is
interesting to note that the conditional in line 9 turned r6 into a more
precise state, that is, in the fall-through path at the beginning of line
10, it is R6=scalar(umin=1), and in the branch-taken path (which is analyzed
here) at the beginning of line 11, r6 turned into a known const r6=0 as
r9=0 prior to that and therefore (unsigned) r6 <= 0 concludes that r6 must
be 0 (**):

  [...]                                 ; R6_w=scalar()
  9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
  [...]

  from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
  [...]

The next path is 'from 6 to 9'. The verifier considers the old and current
state equivalent, and therefore prunes the search incorrectly. Looking into
the two states which are being compared by the pruning logic at line 9, the
old state consists of R6_rwD=Pscalar() R9_rwD=0 R10=fp0 and the new state
consists of R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968)
R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0. While r6 had the reg->precise flag
correctly set in the old state, r9 did not. Both r6'es are considered as
equivalent given the old one is a superset of the current, more precise one,
however, r9's actual values (0 vs 0x80000000) mismatch. Given the old r9
did not have reg->precise flag set, the verifier does not consider the
register as contributing to the precision state of r6, and therefore it
considered both r9 states as equivalent. However, for this specific pruned
path (which is also the actual path taken at runtime), register r6 will be
0x400 and r9 0x80000000 when reaching line 21, thus oob-accessing the map.

The purpose of precision tracking is to initially mark registers (including
spilled ones) as imprecise to help verifier's pruning logic finding equivalent
states it can then prune if they don't contribute to the program's safety
aspects. For example, if registers are used for pointer arithmetic or to pass
constant length to a helper, then the verifier sets reg->precise flag and
backtracks the BPF program instruction sequence and chain of verifier states
to ensure that the given register or stack slot including their dependencies
are marked as precisely tracked scalar. This also includes any other registers
and slots that contribute to a tracked state of given registers/stack slot.
This backtracking relies on recorded jmp_history and is able to traverse
entire chain of parent states. This process ends only when all the necessary
registers/slots and their transitive dependencies are marked as precise.

The backtrack_insn() is called from the current instruction up to the first
instruction, and its purpose is to compute a bitmask of registers and stack
slots that need precision tracking in the parent's verifier state. For example,
if a current instruction is r6 = r7, then r6 needs precision after this
instruction and r7 needs precision before this instruction, that is, in the
parent state. Hence for the latter r7 is marked and r6 unmarked.

For the class of jmp/jmp32 instructions, backtrack_insn() today only looks
at call and exit instructions and for all other conditionals the masks
remain as-is. However, in the given situation register r6 has a dependency
on r9 (as described above in **), so also that one needs to be marked for
precision tracking. In other words, if an imprecise register influences a
precise one, then the imprecise register should also be marked precise.
Meaning, in the parent state both dest and src register need to be tracked
for precision and therefore the marking must be more conservative by setting
reg->precise flag for both. The precision propagation needs to cover both
for the conditional: if the src reg was marked but not the dst reg and vice
versa.

After the fix the program is correctly rejected:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (b7) r6 = 1024                     ; R6_w=1024
  1: (b7) r7 = 0                        ; R7_w=0
  2: (b7) r8 = 0                        ; R8_w=0
  3: (b7) r9 = -2147483648              ; R9_w=-2147483648
  4: (97) r6 %= 1025                    ; R6_w=scalar()
  5: (05) goto pc+0
  6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff80000000; 0x7fffffff),u32_min=-2147483648) R9_w=-2147483648
  7: (97) r6 %= 1                       ; R6_w=scalar()
  8: (b7) r9 = 0                        ; R9=0
  9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
  10: (b7) r6 = 0                       ; R6_w=0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 9
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
  19: (55) if r0 != 0x0 goto pc+1       ; R0=0
  20: (95) exit

  from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  21: (77) r6 >>= 10                    ; R6_w=0
  22: (27) r6 *= 8192                   ; R6_w=0
  23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
  24: (0f) r0 += r6
  last_idx 24 first_idx 19
  regs=40 stack=0 before 23: (bf) r1 = r0
  regs=40 stack=0 before 22: (27) r6 *= 8192
  regs=40 stack=0 before 21: (77) r6 >>= 10
  regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
  parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
  last_idx 18 first_idx 9
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  regs=40 stack=0 before 10: (b7) r6 = 0
  25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
  27: (95) exit

  from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 11
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1
  frame 0: propagating r6
  last_idx 19 first_idx 11
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
  last_idx 9 first_idx 9
  regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
  parent didn't have regs=240 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=P0 R10=fp0
  last_idx 8 first_idx 0
  regs=240 stack=0 before 8: (b7) r9 = 0
  regs=40 stack=0 before 7: (97) r6 %= 1
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  19: safe

  from 6 to 9: R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
  9: (bd) if r6 <= r9 goto pc+1
  last_idx 9 first_idx 0
  regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  last_idx 9 first_idx 0
  regs=200 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  11: R6=scalar(umax=18446744071562067968) R9=-2147483648
  11: (b7) r0 = 0                       ; R0_w=0
  12: (63) *(u32 *)(r10 -4) = r0
  last_idx 12 first_idx 11
  regs=1 stack=0 before 11: (b7) r0 = 0
  13: R0_w=0 R10=fp0 fp-8=0000????
  13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  17: (07) r2 += -4                     ; R2_w=fp-4
  18: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=3,off=0,ks=4,vs=48,imm=0)
  19: (55) if r0 != 0x0 goto pc+1       ; R0_w=0
  20: (95) exit

  from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=scalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
  21: (77) r6 >>= 10                    ; R6_w=scalar(umax=18014398507384832,var_off=(0x0; 0x3fffffffffffff))
  22: (27) r6 *= 8192                   ; R6_w=scalar(smax=9223372036854767616,umax=18446744073709543424,var_off=(0x0; 0xffffffffffffe000),s32_max=2147475456,u32_max=-8192)
  23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
  24: (0f) r0 += r6
  last_idx 24 first_idx 21
  regs=40 stack=0 before 23: (bf) r1 = r0
  regs=40 stack=0 before 22: (27) r6 *= 8192
  regs=40 stack=0 before 21: (77) r6 >>= 10
  parent didn't have regs=40 stack=0 marks: R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_r=Pscalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
  last_idx 19 first_idx 11
  regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
  regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
  regs=40 stack=0 before 17: (07) r2 += -4
  regs=40 stack=0 before 16: (bf) r2 = r10
  regs=40 stack=0 before 15: (bf) r1 = r4
  regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
  regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
  regs=40 stack=0 before 11: (b7) r0 = 0
  parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
  last_idx 9 first_idx 0
  regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
  regs=240 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
  regs=240 stack=0 before 5: (05) goto pc+0
  regs=240 stack=0 before 4: (97) r6 %= 1025
  regs=240 stack=0 before 3: (b7) r9 = -2147483648
  regs=40 stack=0 before 2: (b7) r8 = 0
  regs=40 stack=0 before 1: (b7) r7 = 0
  regs=40 stack=0 before 0: (b7) r6 = 1024
  math between map_value pointer and register with unbounded min value is not allowed
  verification time 886 usec
  stack depth 4
  processed 49 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 2

Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Reported-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
Reported-by: Meador Inge <meadori@google.com>
Reported-by: Simon Scannell <simonscannell@google.com>
Reported-by: Nenad Stojanovski <thenenadx@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
Reviewed-by: Meador Inge <meadori@google.com>
Reviewed-by: Simon Scannell <simonscannell@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26 14:28:35 +02:00
Greg Kroah-Hartman
0fff48d6fe Merge 6.1.24 into android14-6.1
Changes in 6.1.24
	dm cache: Add some documentation to dm-cache-background-tracker.h
	dm integrity: Remove bi_sector that's only used by commented debug code
	dm: change "unsigned" to "unsigned int"
	dm: fix improper splitting for abnormal bios
	KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode
	KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
	KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run
	KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU
	gpio: GPIO_REGMAP: select REGMAP instead of depending on it
	Drivers: vmbus: Check for channel allocation before looking up relids
	ASoC: SOF: ipc4: Ensure DSP is in D0I0 during sof_ipc4_set_get_data()
	pwm: Make .get_state() callback return an error code
	pwm: hibvt: Explicitly set .polarity in .get_state()
	pwm: cros-ec: Explicitly set .polarity in .get_state()
	pwm: iqs620a: Explicitly set .polarity in .get_state()
	pwm: sprd: Explicitly set .polarity in .get_state()
	pwm: meson: Explicitly set .polarity in .get_state()
	ASoC: codecs: lpass: fix the order or clks turn off during suspend
	KVM: s390: pv: fix external interruption loop not always detected
	wifi: mac80211: fix the size calculation of ieee80211_ie_len_eht_cap()
	wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
	net: qrtr: Fix a refcount bug in qrtr_recvmsg()
	net: phylink: add phylink_expects_phy() method
	net: stmmac: check if MAC needs to attach to a PHY
	net: stmmac: remove redundant fixup to support fixed-link mode
	l2tp: generate correct module alias strings
	wifi: brcmfmac: Fix SDIO suspend/resume regression
	NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
	nfsd: call op_release, even when op_func returns an error
	icmp: guard against too small mtu
	ALSA: hda/hdmi: Preserve the previous PCM device upon re-enablement
	net: don't let netpoll invoke NAPI if in xmit context
	net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit
	sctp: check send stream number after wait_for_sndbuf
	net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT
	ipv6: Fix an uninit variable access bug in __ip6_make_skb()
	platform/x86: think-lmi: Fix memory leak when showing current settings
	platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
	platform/x86: think-lmi: Clean up display of current_value on Thinkstation
	gpio: davinci: Do not clear the bank intr enable bit in save_context
	gpio: davinci: Add irq chip flag to skip set wake
	net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
	net: stmmac: fix up RX flow hash indirection table when setting channels
	sunrpc: only free unix grouplist after RCU settles
	NFSD: callback request does not use correct credential for AUTH_SYS
	ice: fix wrong fallback logic for FDIR
	ice: Reset FDIR counter in FDIR init stage
	raw: use net_hash_mix() in hash function
	raw: Fix NULL deref in raw_get_next().
	ping: Fix potentail NULL deref for /proc/net/icmp.
	ethtool: reset #lanes when lanes is omitted
	netlink: annotate lockless accesses to nlk->max_recvmsg_len
	gve: Secure enough bytes in the first TX desc for all TCP pkts
	arm64: compat: Work around uninitialized variable warning
	net: stmmac: check fwnode for phy device before scanning for phy
	cxl/pci: Fix CDAT retrieval on big endian
	cxl/pci: Handle truncated CDAT header
	cxl/pci: Handle truncated CDAT entries
	cxl/pci: Handle excessive CDAT length
	PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
	PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
	usb: xhci: tegra: fix sleep in atomic call
	xhci: Free the command allocated for setting LPM if we return early
	xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
	usb: cdnsp: Fixes error: uninitialized symbol 'len'
	usb: dwc3: pci: add support for the Intel Meteor Lake-S
	USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
	usb: typec: altmodes/displayport: Fix configure initial pin assignment
	USB: serial: option: add Telit FE990 compositions
	USB: serial: option: add Quectel RM500U-CN modem
	drivers: iio: adc: ltc2497: fix LSB shift
	iio: adis16480: select CONFIG_CRC32
	iio: adc: qcom-spmi-adc5: Fix the channel name
	iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
	iio: dac: cio-dac: Fix max DAC write value check for 12-bit
	iio: buffer: correctly return bytes written in output buffers
	iio: buffer: make sure O_NONBLOCK is respected
	iio: light: cm32181: Unregister second I2C client if present
	tty: serial: sh-sci: Fix transmit end interrupt handler
	tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
	tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
	nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
	nilfs2: fix sysfs interface lifetime
	dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
	serial: 8250: Prevent starting up DMA Rx on THRI interrupt
	ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
	ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
	ALSA: hda/realtek: Add quirk for Clevo X370SNW
	ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
	x86/acpi/boot: Correct acpi_is_processor_usable() check
	x86/ACPI/boot: Use FADT version to check support for online capable
	KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
	KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
	mm: kfence: fix PG_slab and memcg_data clearing
	mm: kfence: fix handling discontiguous page
	coresight: etm4x: Do not access TRCIDR1 for identification
	coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
	counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
	counter: 104-quad-8: Fix Synapse action reported for Index signals
	blk-mq: directly poll requests
	iio: adc: ad7791: fix IRQ flags
	io_uring: fix return value when removing provided buffers
	io_uring: fix memory leak when removing provided buffers
	scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
	scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
	nvme: fix discard support without oncs
	cifs: sanitize paths in cifs_update_super_prepath.
	block: ublk: make sure that block size is set correctly
	block: don't set GD_NEED_PART_SCAN if scan partition failed
	perf/core: Fix the same task check in perf_event_set_output
	ftrace: Mark get_lock_parent_ip() __always_inline
	ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
	fs: drop peer group ids under namespace lock
	can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
	can: isotp: fix race between isotp_sendsmg() and isotp_release()
	can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
	can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
	ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
	ACPI: video: Make acpi_backlight=video work independent from GPU driver
	ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
	ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
	net: stmmac: Add queue reset into stmmac_xdp_open() function
	tracing/synthetic: Fix races on freeing last_cmd
	tracing/timerlat: Notify new max thread latency
	tracing/osnoise: Fix notify new tracing_max_latency
	tracing: Free error logs of tracing instances
	ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
	tracing/synthetic: Make lastcmd_mutex static
	zsmalloc: document freeable stats
	mm: vmalloc: avoid warn_alloc noise caused by fatal signal
	wifi: mt76: ignore key disable commands
	ublk: read any SQE values upfront
	drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
	drm/nouveau/disp: Support more modes by checking with lower bpc
	drm/i915: Fix context runtime accounting
	drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
	ring-buffer: Fix race while reader and writer are on the same page
	mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
	mm/hugetlb: fix uffd wr-protection for CoW optimization path
	maple_tree: fix get wrong data_end in mtree_lookup_walk()
	maple_tree: fix a potential concurrency bug in RCU mode
	blk-throttle: Fix that bps of child could exceed bps limited in parent
	drm/amd/display: Clear MST topology if it fails to resume
	drm/amdgpu: for S0ix, skip SDMA 5.x+ suspend/resume
	drm/amdgpu: skip psp suspend for IMU enabled ASICs mode2 reset
	drm/display/dp_mst: Handle old/new payload states in drm_dp_remove_payload()
	drm/i915/dp_mst: Fix payload removal during output disabling
	drm/bridge: lt9611: Fix PLL being unable to lock
	drm/i915: Use _MMIO_PIPE() for SKL_BOTTOM_COLOR
	drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
	mm: take a page reference when removing device exclusive entries
	maple_tree: remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()
	maple_tree: fix potential rcu issue
	maple_tree: reduce user error potential
	maple_tree: fix handle of invalidated state in mas_wr_store_setup()
	maple_tree: fix mas_prev() and mas_find() state handling
	maple_tree: be more cautious about dead nodes
	maple_tree: refine ma_state init from mas_start()
	maple_tree: detect dead nodes in mas_start()
	maple_tree: fix freeing of nodes in rcu mode
	maple_tree: remove extra smp_wmb() from mas_dead_leaves()
	maple_tree: add smp_rmb() to dead node detection
	maple_tree: add RCU lock checking to rcu callback functions
	mm: enable maple tree RCU mode by default.
	bpftool: Print newline before '}' for struct with padding only fields
	Linux 6.1.24

Change-Id: I475408e1166927565c7788e7095bdf2cb236c4b2
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-22 08:52:25 +00:00
Waiman Long
243d9f3a11 cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
[ Upstream commit eee87853794187f6adbe19533ed79c8b44b36a91 ]

In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.

We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.

Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:14 +02:00
Waiman Long
e33ab28395 cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
[ Upstream commit 42a11bf5c5436e91b040aeb04063be1710bb9f9c ]

By default, the clone(2) syscall spawn a child process into the same
cgroup as its parent. With the use of the CLONE_INTO_CGROUP flag
introduced by commit ef2c41cf38 ("clone3: allow spawning processes
into cgroups"), the child will be spawned into a different cgroup which
is somewhat similar to writing the child's tid into "cgroup.threads".

The current cpuset_fork() method does not properly handle the
CLONE_INTO_CGROUP case where the cpuset of the child may be different
from that of its parent.  Update the cpuset_fork() method to treat the
CLONE_INTO_CGROUP case similar to cpuset_attach().

Since the newly cloned task has not been running yet, its actual
memory usage isn't known. So it is not necessary to make change to mm
in cpuset_fork().

Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:14 +02:00
Waiman Long
ff5a4fe259 cgroup/cpuset: Skip spread flags update on v2
[ Upstream commit 18f9a4d47527772515ad6cbdac796422566e6440 ]

Cpuset v2 has no spread flags to set. So we can skip spread
flags update if cpuset v2 is being used. Also change the name to
cpuset_update_task_spread_flags() to indicate that there are multiple
spread flags.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Stable-dep-of: 42a11bf5c543 ("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:14 +02:00
Vincent Guittot
31c5ad43bd sched/fair: Fix imbalance overflow
[ Upstream commit 91dcf1e8068e9a8823e419a7a34ff4341275fb70 ]

When local group is fully busy but its average load is above system load,
computing the imbalance will overflow and local group is not the best
target for pulling this load.

Fixes: 0b0695f2b3 ("sched/fair: Rework load_balance()")
Reported-by: Tingjia Cao <tjcao980311@gmail.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingjia Cao <tjcao980311@gmail.com>
Link: https://lore.kernel.org/lkml/CABcWv9_DAhVBOq2=W=2ypKE9dKM5s2DvoV8-U0+GDwwuKZ89jQ@mail.gmail.com/T/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:14 +02:00
Waiman Long
a746ad276b cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
commit ba9182a89626d5f83c2ee4594f55cb9c1e60f0e2 upstream.

After a successful cpuset_can_attach() call which increments the
attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach()
will be called later. In cpuset_attach(), tasks in cpuset_attach_wq,
if present, will be woken up at the end. That is not the case in
cpuset_cancel_attach(). So missed wakeup is possible if the attach
operation is somehow cancelled. Fix that by doing the wakeup in
cpuset_cancel_attach() as well.

Fixes: e44193d39e ("cpuset: let hotplug propagation work wait for task attaching")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:35:13 +02:00
Waiman Long
f06c9b0154 cgroup/cpuset: Fix partition root's cpuset.cpus update bug
commit 292fd843de26c551856e66faf134512c52dd78b4 upstream.

It was found that commit 7a2127e66a00 ("cpuset: Call
set_cpus_allowed_ptr() with appropriate mask for task") introduced a bug
that corrupted "cpuset.cpus" of a partition root when it was updated.

It is because the tmp->new_cpus field of the passed tmp parameter
of update_parent_subparts_cpumask() should not be used at all as
it contains important cpumask data that should not be overwritten.
Fix it by using tmp->addmask instead.

Also update update_cpumask() to make sure that trialcs->cpu_allowed
will not be corrupted until it is no longer needed.

Fixes: 7a2127e66a00 ("cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v6.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:35:13 +02:00
Josh Don
53244494bf cgroup: fix display of forceidle time at root
commit fcdb1eda5302599045bb366e679cccb4216f3873 upstream.

We need to reset forceidle_sum to 0 when reading from root, since the
bstat we accumulate into is stack allocated.

To make this more robust, just replace the existing cputime reset with a
memset of the overall bstat.

Signed-off-by: Josh Don <joshdon@google.com>
Fixes: 1fcf54deb7 ("sched/core: add forced idle accounting for cgroups")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:35:13 +02:00
Steven Rostedt (Google)
f58574f238 tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
[ Upstream commit 9d52727f8043cfda241ae96896628d92fa9c50bb ]

If a trace instance has a failure with its snapshot code, the error
message is to be written to that instance's buffer. But currently, the
message is written to the top level buffer. Worse yet, it may also disable
the top level buffer and not the instance that had the issue.

Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Fixes: 2824f50332 ("tracing: Make the snapshot trigger work with instances")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:12 +02:00
Steven Rostedt (Google)
5620eeb379 tracing: Add trace_array_puts() to write into instance
[ Upstream commit d503b8f7474fe7ac616518f7fc49773cbab49f36 ]

Add a generic trace_array_puts() that can be used to "trace_puts()" into
an allocated trace_array instance. This is just another variant of
trace_array_printk().

Link: https://lkml.kernel.org/r/20230207173026.584717290@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 9d52727f8043 ("tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:12 +02:00
Tetsuo Handa
3756171b97 cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
[ Upstream commit 57dcd64c7e036299ef526b400a8d12b8a2352f26 ]

syzbot is reporting circular locking dependency between cpu_hotplug_lock
and freezer_mutex, for commit f5d39b0208 ("freezer,sched: Rewrite core
freezer logic") replaced atomic_inc() in freezer_apply_state() with
static_branch_inc() which holds cpu_hotplug_lock.

cpu_hotplug_lock => cgroup_threadgroup_rwsem => freezer_mutex

  cgroup_file_write() {
    cgroup_procs_write() {
      __cgroup_procs_write() {
        cgroup_procs_write_start() {
          cgroup_attach_lock() {
            cpus_read_lock() {
              percpu_down_read(&cpu_hotplug_lock);
            }
            percpu_down_write(&cgroup_threadgroup_rwsem);
          }
        }
        cgroup_attach_task() {
          cgroup_migrate() {
            cgroup_migrate_execute() {
              freezer_attach() {
                mutex_lock(&freezer_mutex);
                (...snipped...)
              }
            }
          }
        }
        (...snipped...)
      }
    }
  }

freezer_mutex => cpu_hotplug_lock

  cgroup_file_write() {
    freezer_write() {
      freezer_change_state() {
        mutex_lock(&freezer_mutex);
        freezer_apply_state() {
          static_branch_inc(&freezer_active) {
            static_key_slow_inc() {
              cpus_read_lock();
              static_key_slow_inc_cpuslocked();
              cpus_read_unlock();
            }
          }
        }
        mutex_unlock(&freezer_mutex);
      }
    }
  }

Swap locking order by moving cpus_read_lock() in freezer_apply_state()
to before mutex_lock(&freezer_mutex) in freezer_change_state().

Reported-by: syzbot <syzbot+c39682e86c9d84152f93@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=c39682e86c9d84152f93
Suggested-by: Hillf Danton <hdanton@sina.com>
Fixes: f5d39b0208 ("freezer,sched: Rewrite core freezer logic")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 12:35:09 +02:00
Kuan-Ying Lee
b28620e7db ANDROID: module: Add vendor hooks
Add vendor hook for module init, so we can get memory type and
use it to do memory type check for architecture
dependent page table setting. To make sure the architecture
dependent tables are created correctly, we need to know when
module parts are initialized and their attributes.

For releasing modules, corresponding tables and attributes should be
destroyed and restored.

These hooks may be invoked in non-atomic context, so it's
necessary to use restricted ones.

Bug: 248994334
Change-Id: Ie9f415c36bca1fb98e021522b627e562d27cdef4
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
2023-04-18 19:28:00 +00:00
Kuan-Ying Lee
05b36413b6 ANDROID: kernel: Add restricted vendor hook in creds
Add restricted vendor hook for creds, so we get the creds
information to monitor cred lifetime. During the lifetime,
we store the creds information in a standalone protected
memory and keep track of integrity.

These hooks may be invoked in non-atomic context, so it's
necessary to use restricted ones.

Bug: 248994334
Change-Id: I57fbb759452302fa1ba1e720c76bfe671eab96b5
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
2023-04-18 19:28:00 +00:00
Liam R. Howlett
1c87a6f82a mm: enable maple tree RCU mode by default.
commit 3dd4432549415f3c65dd52d5c687629efbf4ece1 upstream.

Use the maple tree in RCU mode for VMA tracking.

The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock.  This is safe as the
writes to the stack have a guard VMA which ensures there will always be
a NULL in the direction of the growth and thus will only update a pivot.

It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs.  syzbot has constructed a testcase which sets up a
VMA to grow and consume the empty space.  Overwriting the entire NULL
entry causes the tree to be altered in a way that is not safe for
concurrent readers; the readers may see a node being rewritten or one
that does not match the maple state they are using.

Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.

Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Cc: stable@vger.kernel.org
Fixes: d4af56c5c7 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: syzbot+8d95422d3537159ca390@syzkaller.appspotmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:55:40 +02:00
Zheng Yejian
3663f5d5bb ring-buffer: Fix race while reader and writer are on the same page
commit 6455b6163d8c680366663cdb8c679514d55fc30c upstream.

When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.

  Call trace:
   rb_get_reader_page+0x248/0x1300
   rb_buffer_peek+0x34/0x160
   ring_buffer_peek+0xbc/0x224
   peek_next_entry+0x98/0xbc
   __find_next_entry+0xc4/0x1c0
   trace_find_next_entry_inc+0x30/0x94
   tracing_read_pipe+0x198/0x304
   vfs_read+0xb4/0x1e0
   ksys_read+0x74/0x100
   __arm64_sys_read+0x24/0x30
   el0_svc_common.constprop.0+0x7c/0x1bc
   do_el0_svc+0x2c/0x94
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page->read is obviously abnormal:
  tail_page == commit_page == reader_page == {
    .write = 0x100d20,
    .read = 0x8f9f4805,  // Far greater than 0xd20, obviously abnormal!!!
    .entries = 0x10004c,
    .real_end = 0x0,
    .page = {
      .time_stamp = 0x857257416af0,
      .commit = 0xd20,  // This page hasn't been full filled.
      // .data[0...0xd20] seems normal.
    }
 }

The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.

To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed0c ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.

Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 77ae365eca ("ring-buffer: make lockless")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:55:36 +02:00
Steven Rostedt (Google)
dc48648699 tracing/synthetic: Make lastcmd_mutex static
commit 31c683967174b487939efaf65e41f5ff1404e141 upstream.

The lastcmd_mutex is only used in trace_events_synth.c and should be
static.

Link: https://lore.kernel.org/linux-trace-kernel/202304062033.cRStgOuP-lkp@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230406111033.6e26de93@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Fixes: 4ccf11c4e8a8e ("tracing/synthetic: Fix races on freeing last_cmd")
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:55:35 +02:00
Steven Rostedt (Google)
c0cf0f55be tracing: Free error logs of tracing instances
commit 3357c6e429643231e60447b52ffbb7ac895aca22 upstream.

When a tracing instance is removed, the error messages that hold errors
that occurred in the instance needs to be freed. The following reports a
memory leak:

 # cd /sys/kernel/tracing
 # mkdir instances/foo
 # echo 'hist:keys=x' > instances/foo/events/sched/sched_switch/trigger
 # cat instances/foo/error_log
 [  117.404795] hist:sched:sched_switch: error: Couldn't find field
   Command: hist:keys=x
                      ^
 # rmdir instances/foo

Then check for memory leaks:

 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810d8ec700 (size 192):
  comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
  hex dump (first 32 bytes):
    60 dd 68 61 81 88 ff ff 60 dd 68 61 81 88 ff ff  `.ha....`.ha....
    a0 30 8c 83 ff ff ff ff 26 00 0a 00 00 00 00 00  .0......&.......
  backtrace:
    [<00000000dae26536>] kmalloc_trace+0x2a/0xa0
    [<00000000b2938940>] tracing_log_err+0x277/0x2e0
    [<000000004a0e1b07>] parse_atom+0x966/0xb40
    [<0000000023b24337>] parse_expr+0x5f3/0xdb0
    [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
    [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
    [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
    [<000000002cadc509>] vfs_write+0x162/0x670
    [<0000000059c3b9be>] ksys_write+0xca/0x170
    [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
    [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff888170c35a00 (size 32):
  comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
  hex dump (first 32 bytes):
    0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 68 69 73 74  .  Command: hist
    3a 6b 65 79 73 3d 78 0a 00 00 00 00 00 00 00 00  :keys=x.........
  backtrace:
    [<000000006a747de5>] __kmalloc+0x4d/0x160
    [<000000000039df5f>] tracing_log_err+0x29b/0x2e0
    [<000000004a0e1b07>] parse_atom+0x966/0xb40
    [<0000000023b24337>] parse_expr+0x5f3/0xdb0
    [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
    [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
    [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
    [<000000002cadc509>] vfs_write+0x162/0x670
    [<0000000059c3b9be>] ksys_write+0xca/0x170
    [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
    [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

The problem is that the error log needs to be freed when the instance is
removed.

Link: https://lore.kernel.org/lkml/76134d9f-a5ba-6a0d-37b3-28310b4a1e91@alu.unizg.hr/
Link: https://lore.kernel.org/linux-trace-kernel/20230404194504.5790b95f@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Fixes: 2f754e771b ("tracing: Have the error logs show up in the proper instances")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:55:34 +02:00
Daniel Bristot de Oliveira
1ea5f8d1fa tracing/osnoise: Fix notify new tracing_max_latency
commit d3cba7f02cd82118c32651c73374d8a5a459d9a6 upstream.

osnoise/timerlat tracers are reporting new max latency on instances
where the tracing is off, creating inconsistencies between the max
reported values in the trace and in the tracing_max_latency. Thus
only report new tracing_max_latency on active tracing instances.

Link: https://lkml.kernel.org/r/ecd109fde4a0c24ab0f00ba1e9a144ac19a91322.1680104184.git.bristot@kernel.org

Cc: stable@vger.kernel.org
Fixes: dae181349f ("tracing/osnoise: Support a list of trace_array *tr")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:55:34 +02:00
Daniel Bristot de Oliveira
162e6e6ff2 tracing/timerlat: Notify new max thread latency
commit b9f451a9029a16eb7913ace09b92493d00f2e564 upstream.

timerlat is not reporting a new tracing_max_latency for the thread
latency. The reason is that it is not calling notify_new_max_latency()
function after the new thread latency is sampled.

Call notify_new_max_latency() after computing the thread latency.

Link: https://lkml.kernel.org/r/16e18d61d69073d0192ace07bf61e405cca96e9c.1680104184.git.bristot@kernel.org

Cc: stable@vger.kernel.org
Fixes: dae181349f ("tracing/osnoise: Support a list of trace_array *tr")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:55:34 +02:00