android_kernel_msm-6.1_noth.../kernel
Daniel Borkmann 9cda812c76 bpf: Adjust insufficient default bpf_jit_limit
[ Upstream commit 10ec8ca8ec1a2f04c4ed90897225231c58c124a7 ]

We've seen recent AWS EKS (Kubernetes) user reports like the following:

  After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
  clusters after a few days a number of the nodes have containers stuck
  in ContainerCreating state or liveness/readiness probes reporting the
  following error:

    Readiness probe errored: rpc error: code = Unknown desc = failed to
    exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
    OCI runtime exec failed: exec failed: unable to start container process:
    unable to init seccomp: error loading seccomp filter into kernel:
    error loading seccomp filter: errno 524: unknown

  However, we had not been seeing this issue on previous AMIs and it only
  started to occur on v20230217 (following the upgrade from kernel 5.4 to
  5.10) with no other changes to the underlying cluster or workloads.

  We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
  which helped to immediately allow containers to be created and probes to
  execute but after approximately a day the issue returned and the value
  returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
  was steadily increasing.

I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
seccomp BPF and native (e)BPF programs, and the behavior all looks sane
and expected, that is nothing "leaking" from an upstream perspective.

The bpf_jit_limit knob was originally added in order to avoid a situation
where unprivileged applications loading BPF programs (e.g. seccomp BPF
policies) consuming all the module memory space via BPF JIT such that loading
of kernel modules would be prevented. The default limit was defined back in
2018 and while good enough back then, we are generally seeing far more BPF
consumers today.

Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
module memory space to better reflect today's needs and avoid more users
running into potentially hard to debug issues.

Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <sh@synk.net>
Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-30 12:49:08 +02:00
..
bpf bpf: Adjust insufficient default bpf_jit_limit 2023-03-30 12:49:08 +02:00
cgroup cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-02-14 19:11:45 +01:00
configs Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
debug mm: remove vmacache 2022-09-26 19:46:18 -07:00
dma - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
entry entry: kmsan: introduce kmsan_unpoison_entry_regs() 2022-10-03 14:03:25 -07:00
events perf: fix perf_event_context->time 2023-03-30 12:48:59 +02:00
futex futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error 2023-01-04 11:28:58 +01:00
gcov gcov: add support for checksum field 2022-12-31 13:33:11 +01:00
irq genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask() 2023-03-11 13:55:29 +01:00
kcsan kcsan: test: don't put the expect array on the stack 2023-02-01 08:34:29 +01:00
livepatch Livepatching changes for 6.1 2022-10-10 11:36:19 -07:00
locking locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath 2023-03-10 09:34:06 +01:00
module module: Don't wait for GOING modules 2023-02-01 08:34:37 +01:00
power PM: EM: fix memory leak with using debugfs_lookup() 2023-03-10 09:33:53 +01:00
printk kernel/printk/index.c: fix memory leak with using debugfs_lookup() 2023-03-11 13:55:32 +01:00
rcu rcu-tasks: Handle queue-shrink/callback-enqueue race condition 2023-03-10 09:33:48 +01:00
sched wait: Return number of exclusive waiters awaken 2023-03-10 09:34:34 +01:00
time time/debug: Fix memory leak with using debugfs_lookup() 2023-03-10 09:33:52 +01:00
trace tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr 2023-03-30 12:48:59 +02:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-12-31 13:32:58 +01:00
async.c
audit.c audit: use time_after to compare time 2022-08-29 19:47:03 -04:00
audit.h audit: remove selinux_audit_rule_update() declaration 2022-09-07 11:30:15 -04:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c
audit_watch.c audit_init_parent(): constify path 2022-09-01 17:39:30 -04:00
auditfilter.c
auditsc.c audit/stable-6.1 PR 20221003 2022-10-04 11:05:43 -07:00
backtracetest.c
bounds.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
capability.c
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c
configs.c
context_tracking.c context_tracking: Fix noinstr vs KASAN 2023-03-10 09:33:45 +01:00
cpu.c cpu/hotplug: Do not bail-out in DYING/STARTING sections 2022-12-31 13:31:59 +01:00
cpu_pm.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
crash_core.c vmcoreinfo: add kallsyms_num_syms symbol 2022-08-28 14:02:44 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: support re-entrance detection of thrashing accounting 2022-09-26 19:46:07 -07:00
dma.c
exec_domain.c
exit.c exit: Detect and fix irq disabled state in oops 2023-03-10 09:33:45 +01:00
extable.c context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 13:55:39 +01:00
fork.c fork: allow CLONE_NEWTIME in clone3 flags 2023-03-17 08:50:14 +01:00
freezer.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gen_kheaders.sh kbuild: build init/built-in.a just once 2022-09-29 04:40:15 +09:00
groups.c security: Add LSM hook to setgroups() syscall 2022-07-15 18:21:49 +00:00
hung_task.c sched: Fix more TASK_state comparisons 2022-09-30 16:50:39 +02:00
iomem.c
irq_work.c
jump_label.c jump_label: make initial NOP patching the special case 2022-06-24 09:48:55 +02:00
kallsyms.c kcfi updates for v6.1-rc1 2022-10-03 17:11:07 -07:00
kallsyms_internal.h kallsyms: move declarations to internal header 2022-07-17 17:31:39 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: kmsan: unpoison area->list in kcov_remote_area_put() 2022-10-03 14:03:23 -07:00
kexec.c panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kexec_core.c kexec: replace kmap() with kmap_local_page() 2022-09-11 21:55:08 -07:00
kexec_elf.c
kexec_file.c panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c
kmod.c
kprobes.c kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list 2023-03-10 09:34:27 +01:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2022-09-11 21:55:06 -07:00
kthread.c signal: break out of wait loops on kthread_stop() 2022-10-09 16:01:59 -07:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
Makefile cfi: Fix CFI failure with KASAN 2022-12-31 13:33:08 +01:00
module_signature.c
notifier.c notifier: Add blocking/atomic_notifier_chain_register_unique_prio() 2022-05-19 19:30:30 +02:00
nsproxy.c Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:38:43 -07:00
padata.c padata: Fix list iterator in padata_do_serial() 2022-12-31 13:32:34 +01:00
panic.c panic: fix the panic_print NMI backtrace setting 2023-03-10 09:34:25 +01:00
params.c
pid.c gfs2: Add glockfd debugfs file 2022-06-29 13:07:16 +02:00
pid_namespace.c rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() 2023-03-10 09:32:52 +01:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
range.c
reboot.c kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode 2022-10-04 15:59:36 +02:00
regset.c
relay.c relay: fix type mismatch when allocating memory in relay_create_buf() 2022-12-31 13:32:00 +01:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-10 09:34:25 +01:00
resource_kunit.c
rseq.c rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered 2022-11-14 09:58:32 +01:00
scftorture.c
scs.c
seccomp.c
signal.c Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
smp.c bitmap patches for v6.1-rc1 2022-10-10 12:49:34 -07:00
smpboot.c smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death 2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
stackleak.c
stacktrace.c
static_call.c
static_call_inline.c
stop_machine.c Scheduler changes in this cycle were: 2022-05-24 11:11:13 -07:00
sys.c prlimit: do_prlimit needs to have a speculation check 2023-01-24 07:24:34 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-20 15:17:45 -07:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-05 12:09:06 -08:00
task_work.c task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run 2022-09-11 21:55:10 -07:00
taskstats.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
torture.c torture: Fix hang during kthread shutdown phase 2023-03-10 09:34:07 +01:00
tracepoint.c tracepoint: Optimize the critical region of mutex_lock in tracepoint_module_coming() 2022-09-26 13:01:18 -04:00
tsacct.c
ucount.c ucounts: Split rlimit and ucount values and max values 2022-05-18 18:24:57 -05:00
uid16.c
uid16.h
umh.c freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL 2023-02-22 12:59:50 +01:00
up.c
user-return-notifier.c
user.c
user_namespace.c ucounts: Split rlimit and ucount values and max values 2022-10-09 16:24:05 -07:00
usermode_driver.c blob_to_mnt(): kern_unmount() is needed to undo kern_mount() 2022-05-19 23:25:47 -04:00
utsname.c
utsname_sysctl.c kernel/utsname_sysctl.c: Fix hostname polling 2022-10-23 12:01:01 -07:00
watch_queue.c watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths 2023-03-17 08:50:30 +01:00
watchdog.c powerpc updates for 6.0 2022-08-06 16:38:17 -07:00
watchdog_hld.c Revert "printk: add functions to prefer direct printing" 2022-06-23 18:41:40 +02:00
workqueue.c workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex 2023-03-10 09:32:53 +01:00
workqueue_internal.h