Kernel Source and devicetree for NOTHING Phone(3a) and Phone(3a)Pro
Find a file
John Fastabend 9f4d7efb33 bpf, sockmap: Convert schedule_work into delayed_work
[ Upstream commit 29173d07f79883ac94f5570294f98af3d4287382 ]

Sk_buffs are fed into sockmap verdict programs either from a strparser
(when the user might want to decide how framing of skb is done by attaching
another parser program) or directly through tcp_read_sock. The
tcp_read_sock is the preferred method for performance when the BPF logic is
a stream parser.

The flow for Cilium's common use case with a stream parser is,

 tcp_read_sock()
  sk_psock_verdict_recv
    ret = bpf_prog_run_pin_on_cpu()
    sk_psock_verdict_apply(sock, skb, ret)
     // if system is under memory pressure or app is slow we may
     // need to queue skb. Do this queuing through ingress_skb and
     // then kick timer to wake up handler
     skb_queue_tail(ingress_skb, skb)
     schedule_work(work);

The work queue is wired up to sk_psock_backlog(). This will then walk the
ingress_skb skb list that holds our sk_buffs that could not be handled,
but should be OK to run at some later point. However, its possible that
the workqueue doing this work still hits an error when sending the skb.
When this happens the skbuff is requeued on a temporary 'state' struct
kept with the workqueue. This is necessary because its possible to
partially send an skbuff before hitting an error and we need to know how
and where to restart when the workqueue runs next.

Now for the trouble, we don't rekick the workqueue. This can cause a
stall where the skbuff we just cached on the state variable might never
be sent. This happens when its the last packet in a flow and no further
packets come along that would cause the system to kick the workqueue from
that side.

To fix we could do simple schedule_work(), but while under memory pressure
it makes sense to back off some instead of continue to retry repeatedly. So
instead to fix convert schedule_work to schedule_delayed_work and add
backoff logic to reschedule from backlog queue on errors. Its not obvious
though what a good backoff is so use '1'.

To test we observed some flakes whil running NGINX compliance test with
sockmap we attributed these failed test to this bug and subsequent issue.

>From on list discussion. This commit

 bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock")

was intended to address similar race, but had a couple cases it missed.
Most obvious it only accounted for receiving traffic on the local socket
so if redirecting into another socket we could still get an sk_buff stuck
here. Next it missed the case where copied=0 in the recv() handler and
then we wouldn't kick the scheduler. Also its sub-optimal to require
userspace to kick the internal mechanisms of sockmap to wake it up and
copy data to user. It results in an extra syscall and requires the app
to actual handle the EAGAIN correctly.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-3-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05 09:26:18 +02:00
arch arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay 2023-05-30 14:03:33 +01:00
block block, bfq: Fix division by zero error on zero wsum 2023-05-24 17:32:38 +01:00
certs certs: Fix build error when PKCS#11 URI contains semicolon 2023-02-09 11:28:11 +01:00
crypto crypto: testmgr - fix RNG performance in fuzz tests 2023-05-24 17:32:53 +01:00
Documentation dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type 2023-05-30 14:03:19 +01:00
drivers gpio-f7188x: fix chip name and pin count on Nuvoton chip 2023-06-05 09:26:18 +02:00
fs cifs: mapchars mount option ignored 2023-05-30 14:03:21 +01:00
include bpf, sockmap: Convert schedule_work into delayed_work 2023-06-05 09:26:18 +02:00
init gcc: disable '-Warray-bounds' for gcc-13 too 2023-04-26 14:28:43 +02:00
io_uring io_uring/rsrc: use nospec'ed indexes 2023-05-11 23:03:24 +09:00
ipc ipc: fix memory leak in init_mqueue_fs() 2022-12-31 13:32:01 +01:00
kernel x86/pci/xen: populate MSI sysfs entries 2023-05-30 14:03:22 +01:00
lib debugobjects: Don't wake up kswapd from fill_pool() 2023-05-30 14:03:20 +01:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
mm mm: fix zswap writeback race condition 2023-05-24 17:32:51 +01:00
net bpf, sockmap: Convert schedule_work into delayed_work 2023-06-05 09:26:18 +02:00
rust rust: kernel: Mark rust_fmt_argument as extern "C" 2023-04-26 14:28:38 +02:00
samples samples/bpf: Fix fout leak in hbm's run_bpf_prog 2023-05-24 17:32:38 +01:00
scripts recordmcount: Fix memory leaks in the uwrite function 2023-05-24 17:32:41 +01:00
security selinux: ensure av_permissions.h is built when needed 2023-05-11 23:03:06 +09:00
sound ASoC: Intel: avs: Access path components under lock 2023-05-30 14:03:32 +01:00
tools selftests/bpf: Fix pkg-config call building sign-file 2023-06-05 09:26:17 +02:00
usr usr/gen_init_cpio.c: remove unnecessary -1 values from int file 2022-10-03 14:21:44 -07:00
virt KVM: Fix vcpu_array[0] races 2023-05-24 17:32:50 +01:00
.clang-format inet: ping: use hlist_nulls rcu iterator during lookup 2022-12-01 12:42:46 +01:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes
.gitignore Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
.mailmap 9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address 2022-12-10 17:10:52 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Remove Michal Marek from Kbuild maintainers 2022-11-16 14:53:00 +09:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS platform/x86: Move existing HP drivers to a new hp subdir 2023-05-24 17:32:42 +01:00
Makefile Linux 6.1.31 2023-05-30 14:03:33 +01:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.