android_kernel_msm-6.1_noth.../include
John Fastabend ba4fec5bd6 bpf, sockmap: Improved check for empty queue
[ Upstream commit 405df89dd52cbcd69a3cd7d9a10d64de38f854b2 ]

We noticed some rare sk_buffs were stepping past the queue when system was
under memory pressure. The general theory is to skip enqueueing
sk_buffs when its not necessary which is the normal case with a system
that is properly provisioned for the task, no memory pressure and enough
cpu assigned.

But, if we can't allocate memory due to an ENOMEM error when enqueueing
the sk_buff into the sockmap receive queue we push it onto a delayed
workqueue to retry later. When a new sk_buff is received we then check
if that queue is empty. However, there is a problem with simply checking
the queue length. When a sk_buff is being processed from the ingress queue
but not yet on the sockmap msg receive queue its possible to also recv
a sk_buff through normal path. It will check the ingress queue which is
zero and then skip ahead of the pkt being processed.

Previously we used sock lock from both contexts which made the problem
harder to hit, but not impossible.

To fix instead of popping the skb from the queue entirely we peek the
skb from the queue and do the copy there. This ensures checks to the
queue length are non-zero while skb is being processed. Then finally
when the entire skb has been copied to user space queue or another
socket we pop it off the queue. This way the queue length check allows
bypassing the queue only after the list has been completely processed.

To reproduce issue we run NGINX compliance test with sockmap running and
observe some flakes in our testing that we attributed to this issue.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05 09:26:19 +02:00
..
acpi ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type() 2023-04-13 16:55:33 +02:00
asm-generic asm-generic/io.h: suppress endianness warnings for readq() and writeq() 2023-05-11 23:02:58 +09:00
clocksource
crypto crypto: api - Add scaffolding to change completion function signature 2023-05-17 11:53:40 +02:00
drm drm: fix drmm_mutex_init() 2023-05-30 14:03:20 +01:00
dt-bindings dt-bindings: clocks: imx8mp: Add ID for usb suspend clock 2022-12-31 13:33:09 +01:00
keys
kunit kunit: fix kunit_test_init_section_suites(...) 2023-02-09 11:28:08 +01:00
kvm KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode 2023-04-13 16:55:17 +02:00
linux bpf, sockmap: Improved check for empty queue 2023-06-05 09:26:19 +02:00
math-emu
media media: uvcvideo: Add GUID for BGRA/X 8:8:8:8 2023-03-11 13:55:35 +01:00
memory memory: renesas-rpc-if: Split-off private data from struct rpcif 2023-03-11 13:55:17 +01:00
misc
net tls: rx: strp: preserve decryption status of skbs when needed 2023-06-05 09:26:18 +02:00
pcmcia
ras
rdma
rv
scsi scsi: libsas: Add sas_ata_device_link_abort() 2023-05-11 23:03:20 +09:00
soc ARM: at91: pm: avoid soft resetting AC DLL 2022-11-01 12:25:19 +02:00
sound ASoC: amd: fix ACP version typo mistake 2023-05-11 23:02:59 +09:00
target scsi: target: Fix multiple LUN_RESET handling 2023-05-11 23:03:19 +09:00
trace f2fs: refactor extent_cache to support for read and more 2023-05-17 11:53:52 +02:00
uapi ipv{4,6}/raw: fix output xfrm lookup wrt protocol 2023-06-05 09:26:16 +02:00
ufs scsi: ufs: exynos: Fix DMA alignment for PAGE_SIZE != 4096 2023-03-10 09:33:15 +01:00
vdso
video
xen ACPI: processor: Fix evaluating _PDC method when running as Xen dom0 2023-05-11 23:03:11 +09:00