android_kernel_msm-6.1_noth.../kernel
Frederic Weisbecker c817f5c016 rcu: Defer RCU kthreads wakeup when CPU is dying
[ Upstream commit e787644caf7628ad3269c1fbd321c3255cf51710 ]

When the CPU goes idle for the last time during the CPU down hotplug
process, RCU reports a final quiescent state for the current CPU. If
this quiescent state propagates up to the top, some tasks may then be
woken up to complete the grace period: the main grace period kthread
and/or the expedited main workqueue (or kworker).

If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
arm the RT bandwith timer to the local offline CPU. Since this happens
after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the
timer gets ignored. Therefore if the RCU kthreads are waiting for RT
bandwidth to be available, they may never be actually scheduled.

This triggers TREE03 rcutorture hangs:

	 rcu: INFO: rcu_preempt self-detected stall on CPU
	 rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
	 rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
	 rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
	 rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
	 rcu: RCU grace-period kthread stack dump:
	 task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
	 Call Trace:
	  <TASK>
	  __schedule+0x2eb/0xa80
	  schedule+0x1f/0x90
	  schedule_timeout+0x163/0x270
	  ? __pfx_process_timeout+0x10/0x10
	  rcu_gp_fqs_loop+0x37c/0x5b0
	  ? __pfx_rcu_gp_kthread+0x10/0x10
	  rcu_gp_kthread+0x17c/0x200
	  kthread+0xde/0x110
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork+0x2b/0x40
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork_asm+0x1b/0x30
	  </TASK>

The situation can't be solved with just unpinning the timer. The hrtimer
infrastructure and the nohz heuristics involved in finding the best
remote target for an unpinned timer would then also need to handle
enqueues from an offline CPU in the most horrendous way.

So fix this on the RCU side instead and defer the wake up to an online
CPU if it's too late for the local one.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:17:05 -08:00
..
bpf bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS 2024-01-25 15:27:50 -08:00
cgroup cgroup_freezer: cgroup_freezing: Check if not frozen 2023-12-13 18:39:19 +01:00
configs Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
debug kdb: Fix a potential buffer overflow in kdb_local() 2024-01-25 15:27:51 -08:00
dma dma-mapping: clear dev->dma_mem to NULL after freeing it 2024-01-25 15:27:28 -08:00
entry entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up 2023-03-30 12:49:13 +02:00
events perf: Fix perf_event_validate_size() lockdep splat 2023-12-20 17:00:24 +01:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:51:50 +01:00
gcov gcov: add support for checksum field 2022-12-31 13:33:11 +01:00
irq genirq/affinity: Move group_cpus_evenly() into lib/ 2024-01-10 17:10:33 +01:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-19 16:21:37 +02:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:52:10 +01:00
locking lockdep: Fix block chain corruption 2023-12-03 07:32:09 +01:00
module kallsyms: Make module_kallsyms_on_each_symbol generally available 2024-01-05 15:18:40 +01:00
power PM: hibernate: Enforce ordering during image compression/decompression 2024-01-31 16:16:58 -08:00
printk printk: Consolidate console deferred printing 2023-09-23 11:11:07 +02:00
rcu rcu: Defer RCU kthreads wakeup when CPU is dying 2024-01-31 16:17:05 -08:00
sched sched: Fix stop_one_cpu_nowait() vs hotplug 2023-11-20 11:51:50 +01:00
time tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug 2024-01-25 15:27:39 -08:00
trace tracing: Ensure visibility when inserting an element into tracing_map 2024-01-31 16:17:04 -08:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-12-31 13:32:58 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-01-31 16:17:00 -08:00
audit.c audit: use time_after to compare time 2022-08-29 19:47:03 -04:00
audit.h audit: remove selinux_audit_rule_update() declaration 2022-09-07 11:30:15 -04:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 17:07:08 +00:00
auditfilter.c
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-25 12:03:04 +02:00
backtracetest.c
bounds.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
capability.c
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-06 12:10:40 +02:00
configs.c
context_tracking.c context_tracking: Fix noinstr vs KASAN 2023-03-10 09:33:45 +01:00
cpu.c cpu/SMT: Make SMT control more robust against enumeration failures 2024-01-10 17:10:26 +01:00
cpu_pm.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
crash_core.c vmcoreinfo: add kallsyms_num_syms symbol 2022-08-28 14:02:44 -07:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 17:00:20 +01:00
delayacct.c delayacct: support re-entrance detection of thrashing accounting 2022-09-26 19:46:07 -07:00
dma.c
exec_domain.c
exit.c exit: Detect and fix irq disabled state in oops 2023-03-10 09:33:45 +01:00
extable.c context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 13:55:39 +01:00
fork.c kernel/fork: beware of __put_task_struct() calling context 2023-09-23 11:11:00 +02:00
freezer.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gen_kheaders.sh kbuild: build init/built-in.a just once 2022-09-29 04:40:15 +09:00
groups.c security: Add LSM hook to setgroups() syscall 2022-07-15 18:21:49 +00:00
hung_task.c sched: Fix more TASK_state comparisons 2022-09-30 16:50:39 +02:00
iomem.c
irq_work.c
jump_label.c jump_label: make initial NOP patching the special case 2022-06-24 09:48:55 +02:00
kallsyms.c kallsyms: Add helper kallsyms_on_each_match_symbol() 2023-10-25 12:03:16 +02:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2023-10-25 12:03:16 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: kmsan: unpoison area->list in kcov_remote_area_put() 2022-10-03 14:03:23 -07:00
kexec.c kernel: kexec: copy user-array safely 2023-11-28 17:06:57 +00:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-19 16:21:08 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 16:00:55 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-11 23:03:02 +09:00
kmod.c
kprobes.c kprobes: consistent rcu api usage for kretprobe holder 2023-12-13 18:39:17 +01:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2022-09-11 21:55:06 -07:00
kthread.c signal: break out of wait loops on kthread_stop() 2022-10-09 16:01:59 -07:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
Makefile cfi: Fix CFI failure with KASAN 2022-12-31 13:33:08 +01:00
module_signature.c
notifier.c
nsproxy.c Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:38:43 -07:00
padata.c crypto: pcrypt - Fix hungtask for PADATA_RESET 2023-11-28 17:06:58 +00:00
panic.c panic: Reenable preemption in WARN slowpath 2023-09-23 11:11:09 +02:00
params.c
pid.c gfs2: Add glockfd debugfs file 2022-06-29 13:07:16 +02:00
pid_namespace.c rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() 2023-03-10 09:32:52 +01:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 17:07:13 +00:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-11 23:03:03 +09:00
resource.c PCI: Allow drivers to request exclusive config regions 2023-09-13 09:42:46 +02:00
resource_kunit.c
rseq.c rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered 2022-11-14 09:58:32 +01:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:11:00 +02:00
scs.c
seccomp.c
signal.c mm: suppress mm fault logging if fatal signal already pending 2023-08-03 10:24:01 +02:00
smp.c smp,csd: Throw an error if a CSD lock is stuck for too long 2023-11-28 17:06:55 +00:00
smpboot.c smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death 2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
stackleak.c
stacktrace.c
static_call.c
static_call_inline.c
stop_machine.c
sys.c kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() 2023-04-26 14:28:39 +02:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-20 15:17:45 -07:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-05 12:09:06 -08:00
task_work.c task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run 2022-09-11 21:55:10 -07:00
taskstats.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
torture.c torture: Fix hang during kthread shutdown phase 2023-03-10 09:34:07 +01:00
tracepoint.c tracepoint: Optimize the critical region of mutex_lock in tracepoint_module_coming() 2022-09-26 13:01:18 -04:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL 2023-02-22 12:59:50 +01:00
up.c
user-return-notifier.c
user.c
user_namespace.c ucounts: Split rlimit and ucount values and max values 2022-10-09 16:24:05 -07:00
usermode_driver.c
utsname.c
utsname_sysctl.c kernel/utsname_sysctl.c: Fix hostname polling 2022-10-23 12:01:01 -07:00
watch_queue.c kernel: watch_queue: copy user-array safely 2023-11-28 17:06:57 +00:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 17:07:09 +00:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-19 16:21:08 +02:00
workqueue.c workqueue: Provide one lock class key per work_on_cpu() callsite 2023-11-28 17:06:55 +00:00
workqueue_internal.h