android_kernel_msm-6.1_noth.../include/net
mfreemon@cloudflare.com 0796c53424 tcp: enforce receive buffer memory limits by allowing the tcp window to shrink
[ Upstream commit b650d953cd391595e536153ce30b4aab385643ac ]

Under certain circumstances, the tcp receive buffer memory limit
set by autotuning (sk_rcvbuf) is increased due to incoming data
packets as a result of the window not closing when it should be.
This can result in the receive buffer growing all the way up to
tcp_rmem[2], even for tcp sessions with a low BDP.

To reproduce:  Connect a TCP session with the receiver doing
nothing and the sender sending small packets (an infinite loop
of socket send() with 4 bytes of payload with a sleep of 1 ms
in between each send()).  This will cause the tcp receive buffer
to grow all the way up to tcp_rmem[2].

As a result, a host can have individual tcp sessions with receive
buffers of size tcp_rmem[2], and the host itself can reach tcp_mem
limits, causing the host to go into tcp memory pressure mode.

The fundamental issue is the relationship between the granularity
of the window scaling factor and the number of byte ACKed back
to the sender.  This problem has previously been identified in
RFC 7323, appendix F [1].

The Linux kernel currently adheres to never shrinking the window.

In addition to the overallocation of memory mentioned above, the
current behavior is functionally incorrect, because once tcp_rmem[2]
is reached when no remediations remain (i.e. tcp collapse fails to
free up any more memory and there are no packets to prune from the
out-of-order queue), the receiver will drop in-window packets
resulting in retransmissions and an eventual timeout of the tcp
session.  A receive buffer full condition should instead result
in a zero window and an indefinite wait.

In practice, this problem is largely hidden for most flows.  It
is not applicable to mice flows.  Elephant flows can send data
fast enough to "overrun" the sk_rcvbuf limit (in a single ACK),
triggering a zero window.

But this problem does show up for other types of flows.  Examples
are websockets and other type of flows that send small amounts of
data spaced apart slightly in time.  In these cases, we directly
encounter the problem described in [1].

RFC 7323, section 2.4 [2], says there are instances when a retracted
window can be offered, and that TCP implementations MUST ensure
that they handle a shrinking window, as specified in RFC 1122,
section 4.2.2.16 [3].  All prior RFCs on the topic of tcp window
management have made clear that sender must accept a shrunk window
from the receiver, including RFC 793 [4] and RFC 1323 [5].

This patch implements the functionality to shrink the tcp window
when necessary to keep the right edge within the memory limit by
autotuning (sk_rcvbuf).  This new functionality is enabled with
the new sysctl: net.ipv4.tcp_shrink_window

Additional information can be found at:
https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/

[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
[4] https://www.rfc-editor.org/rfc/rfc793
[5] https://www.rfc-editor.org/rfc/rfc1323

Signed-off-by: Mike Freemon <mfreemon@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-19 23:08:54 +02:00
..
9p net/9p: add 'pooled_rbuffers' flag to struct p9_trans_module 2022-10-05 07:05:41 +09:00
bluetooth Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync 2023-07-27 08:50:47 +02:00
caif net: remove the caif_hsi driver 2021-07-01 13:19:48 -07:00
iucv net/af_iucv: Use struct_group() to zero struct iucv_sock region 2021-11-19 11:52:25 +00:00
netfilter netfilter: nf_tables: fix kdoc warnings after gc rework 2023-10-06 14:57:02 +02:00
netns tcp: enforce receive buffer memory limits by allowing the tcp window to shrink 2023-10-19 23:08:54 +02:00
nfc NFC: add NCI_UNREG flag to eliminate the race 2021-11-17 20:17:05 -08:00
phonet net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
sctp sctp: add a refcnt in sctp_stream_priorities to avoid a nested loop 2023-03-11 13:55:26 +01:00
tc_act net/sched: transition act_pedit to rcu and percpu stats 2023-03-11 13:55:28 +01:00
6lowpan.h
act_api.h net: sched: act: move global static variable net_id to tc_action_ops 2022-09-09 08:24:41 +01:00
addrconf.h ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr 2022-07-28 10:42:44 -07:00
af_ieee802154.h
af_rxrpc.h rxrpc: Remove rxrpc_get_reply_time() which is no longer used 2022-09-01 11:44:13 +01:00
af_unix.h af_unix: Remove unix_table_locks. 2022-06-22 12:59:43 +01:00
af_vsock.h vsock: add API call for data ready 2022-08-23 10:43:11 +02:00
ah.h
amt.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
arp.h neighbour: switch to standard rcu, instead of rcu_bh 2023-10-10 22:00:42 +02:00
atmclip.h
ax25.h ax25: fix incorrect dev_tracker usage 2022-07-28 22:06:15 -07:00
ax88796.h ax88796: Fix some typo in a comment 2022-08-09 22:14:02 -07:00
bareudp.h bareudp: Move definition of struct bareudp_conf to bareudp.c 2021-12-13 12:34:09 +00:00
bond_3ad.h net: bonding: Share lacpdu_mcast_addr definition 2022-09-16 14:34:01 +01:00
bond_alb.h bonding (gcc13): synchronize bond_{a,t}lb_xmit() types 2023-05-11 23:03:41 +09:00
bond_options.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
bonding.h bonding: fix macvlan over alb bond support 2023-08-30 16:11:04 +02:00
bpf_sk_storage.h bpf: struct sock is declared twice in bpf_sk_storage header 2021-03-26 17:43:55 +01:00
busy_poll.h net: Fix a data-race around sysctl_net_busy_poll. 2022-08-24 13:46:58 +01:00
calipso.h
cfg80211-wext.h
cfg80211.h wifi: cfg80211: add missing kernel-doc for cqm_rssi_work 2023-10-10 22:00:40 +02:00
cfg802154.h net: wrap the wireless pointers in struct net_device in an ifdef 2022-05-22 21:51:54 +01:00
checksum.h powerpc/net: Implement powerpc specific csum_shift() to remove branch 2022-03-11 10:57:22 +00:00
cipso_ipv4.h
cls_cgroup.h
codel.h codel: remove unnecessary pkt_sched.h include 2021-12-22 15:03:51 -08:00
codel_impl.h codel: remove unnecessary sock.h include 2021-12-22 15:03:47 -08:00
codel_qdisc.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
compat.h net: copy from user before calling __get_compat_msghdr 2022-07-24 18:39:17 -06:00
datalink.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
dcbevent.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
dcbnl.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
devlink.h net: devlink: add port_init/fini() helpers to allow pre-register/post-unregister functions 2022-09-30 18:17:16 -07:00
dropreason.h net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM 2022-09-07 15:28:08 +01:00
dsa.h net: dsa: remove bool devlink_port_setup 2022-09-30 18:17:17 -07:00
dsfield.h
dst.h net: add atomic_long_t to net_device_stats fields 2022-12-31 13:33:02 +01:00
dst_cache.h wireguard: device: reset peer src endpoint when netns exits 2021-11-29 19:50:45 -08:00
dst_metadata.h Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2022-10-03 07:52:13 +01:00
dst_ops.h
erspan.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
esp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
espintcp.h
ethoc.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
failover.h net: failover: add net device refcount tracker 2021-12-06 16:06:02 -08:00
fib_notifier.h
fib_rules.h fib: expand fib_rule_policy 2021-12-16 07:18:35 -08:00
firewire.h firewire: net: Make use of get_unaligned_be48(), put_unaligned_be48() 2022-07-28 22:21:54 -07:00
flow.h net: Remove DECnet leftovers from flow.h. 2022-10-03 12:41:59 +01:00
flow_dissector.h flow_dissector: Add L2TPv3 dissectors 2022-09-20 09:13:38 +02:00
flow_offload.h flow_offload: Introduce flow_match_l2tpv3 2022-09-20 09:13:38 +02:00
fou.h
fq.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
fq_impl.h net/fq_impl: Use the bitmap API to allocate bitmaps 2022-07-11 19:49:38 -07:00
garp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
gen_stats.h net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
genetlink.h genetlink: piggy back on resv_op to default to a reject policy 2022-10-24 19:08:46 -07:00
geneve.h
gre.h ip_gre: add csum offload support for gre header 2021-01-29 20:39:14 -08:00
gro.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-08-25 16:07:42 -07:00
gro_cells.h
gtp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
gue.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
hwbm.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
icmp.h ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages 2021-06-28 14:29:45 -07:00
ieee80211_radiotap.h ieee80211: radiotap: fix -Wcast-qual warnings 2022-02-04 16:25:21 +01:00
ieee802154_netdev.h net: ieee802154: return -EINVAL for unknown addr type 2022-10-07 08:42:00 +01:00
if_inet6.h ipv6: fix locking issues with loops over idev->addr_list 2022-04-06 22:09:39 -07:00
ife.h
ila.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
inet6_connection_sock.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
inet6_hashtables.h net: allow unbound socket for packets in VRF when tcp_l3mdev_accept set 2022-07-29 11:58:54 +01:00
inet_common.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
inet_connection_sock.h net: Add a bhash2 table hashed by port and address 2022-08-24 19:30:07 -07:00
inet_dscp.h ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules 2022-02-07 20:12:45 -08:00
inet_ecn.h net: add skb_get_dsfield() helper 2021-10-15 11:33:08 +01:00
inet_frag.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
inet_hashtables.h tcp: Add TIME_WAIT sockets in bhash2. 2023-01-12 12:02:15 +01:00
inet_sock.h ipv4: fix data-races around inet->inet_id 2023-08-30 16:11:02 +02:00
inet_timewait_sock.h tcp: Add TIME_WAIT sockets in bhash2. 2023-01-12 12:02:15 +01:00
inetpeer.h
ioam6.h treewide: Replace zero-length arrays with flexible-array members 2022-02-17 07:00:39 -06:00
ip.h ipv4: ignore dst hint for multipath routes 2023-09-19 12:28:01 +02:00
ip6_checksum.h net: move gro definitions to include/net/gro.h 2021-11-16 13:16:54 +00:00
ip6_fib.h ipv6: Remove in6addr_any alternatives. 2023-09-19 12:28:10 +02:00
ip6_route.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
ip6_tunnel.h ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode 2022-04-25 11:40:45 +01:00
ip_fib.h net: fib: avoid warn splat in flow dissector 2023-09-19 12:28:01 +02:00
ip_tunnels.h ip_tunnels: use DEV_STATS_INC() 2023-09-19 12:28:03 +02:00
ip_vs.h ipvs: Update width of source for ip_vs_sync_conn_options 2023-05-24 17:32:39 +01:00
ipcomp.h xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
ipv6.h tcp: Fix bind() regression for v4-mapped-v6 wildcard address. 2023-09-19 12:28:10 +02:00
ipv6_frag.h net: don't include ndisc.h from ipv6.h 2022-02-04 14:15:11 -08:00
ipv6_stubs.h bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() 2022-09-02 20:34:32 -07:00
iw_handler.h
kcm.h
l3mdev.h
lag.h
lapb.h net: lapb: Make "lapb_t1timer_running" able to detect an already running timer 2021-03-23 14:14:50 -07:00
lib80211.h
llc.h llc: fix out-of-bound array index in llc_sk_dev_hash() 2021-11-07 19:25:29 +00:00
llc_c_ac.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_c_ev.h
llc_c_st.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_conn.h llc: add net device refcount tracker 2021-12-07 20:44:59 -08:00
llc_if.h llc/snap: constify dev_addr passing 2021-10-13 09:40:46 -07:00
llc_pdu.h net: llc: fix skb_over_panic 2021-07-27 13:05:56 +01:00
llc_s_ac.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_s_ev.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
llc_s_st.h add missing includes and forward declarations to networking includes under linux/ 2022-07-28 11:29:36 +02:00
llc_sap.h
lwtunnel.h lwt: Check LWTUNNEL_XMIT_CONTINUE strictly 2023-09-13 09:42:33 +02:00
mac80211.h mac80211: make ieee80211_tx_info padding explicit 2023-09-13 09:42:34 +02:00
mac802154.h net: mac802154: Create an error helper for asynchronous offloading errors 2022-04-25 20:51:12 +02:00
macsec.h net: macsec: indicate next pn update when offloading 2023-10-19 23:08:53 +02:00
mctp.h mctp: Use output netdev to allocate skb headroom 2022-04-01 12:04:15 +01:00
mctpdevice.h mctp: Pass flow data & flow release events to drivers 2021-10-29 13:23:51 +01:00
mip6.h
mld.h mld: add new workqueues for process mld events 2021-03-26 15:14:56 -07:00
mpls.h
mpls_iptunnel.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
mptcp.h mptcp: remove MPTCP 'ifdef' in TCP SYN cookies 2023-01-07 11:11:44 +01:00
mrp.h mrp: introduce active flags to prevent UAF when applicant uninit 2022-12-31 13:33:02 +01:00
ncsi.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
ndisc.h neighbour: switch to standard rcu, instead of rcu_bh 2023-10-10 22:00:42 +02:00
neighbour.h neighbour: fix data-races around n->output 2023-10-10 22:00:42 +02:00
net_debug.h net: add CONFIG_DEBUG_NET 2022-05-11 12:43:10 +01:00
net_failover.h
net_namespace.h netfilter: nf_flow_table: count pending offload workqueue tasks 2022-07-11 16:25:14 +02:00
net_ratelimit.h
net_trackers.h net: add networking namespace refcount tracker 2021-12-10 06:38:26 -08:00
netevent.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
netlabel.h
netlink.h netlink: split up copies in the ack construction 2023-10-10 22:00:44 +02:00
netprio_cgroup.h
netrom.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
nexthop.h ipv6: remove nexthop_fib6_nh_bh() 2023-10-10 22:00:46 +02:00
nl802154.h net: ieee802154: Fix compilation error when CONFIG_IEEE802154_NL802154_EXPERIMENTAL is disabled 2022-09-02 19:59:08 -07:00
nsh.h
p8022.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
page_pool.h page_pool: fix inconsistency for page_pool_ring_[un]lock() 2023-06-05 09:26:20 +02:00
pie.h
ping.h net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294 2023-06-14 11:15:16 +02:00
pkt_cls.h net: sched: cls_api: introduce tc_cls_bind_class() helper 2022-10-02 16:07:17 +01:00
pkt_sched.h net/sched: make psched_mtu() RTNL-less safe 2023-07-23 13:49:27 +02:00
pptp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
protocol.h tcp/udp: Make early_demux back namespacified. 2022-07-15 18:50:35 -07:00
psample.h psample: Add a fwd declaration for skbuff 2021-08-09 15:34:21 -07:00
psnap.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
raw.h raw: Fix NULL deref in raw_get_next(). 2023-04-13 16:55:23 +02:00
rawv6.h raw: convert raw sockets to RCU 2022-06-19 10:00:02 +01:00
red.h treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
regulatory.h wifi: cfg80211: fix regulatory disconnect with OCB/NAN 2023-07-19 16:21:10 +02:00
request_sock.h tcp: Use BPF timeout setting for SYN ACK RTO 2022-02-02 14:45:18 +00:00
rose.h net: rose: add netdev ref tracker to 'struct rose_sock' 2022-08-01 11:59:23 -07:00
route.h net: annotate data-races around sk->sk_mark 2023-08-11 12:08:14 +02:00
rpl.h ipv6: rpl: Fix Route of Death. 2023-06-14 11:15:20 +02:00
rsi_91x.h
rtnetlink.h net: validate veth and vxcan peer ifindexes 2023-08-30 16:11:02 +02:00
rtnh.h
sch_generic.h net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting 2023-06-21 16:01:01 +02:00
scm.h scm: fix MSG_CTRUNC setting condition for SO_PASSSEC 2023-05-11 23:03:18 +09:00
secure_seq.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
seg6.h udp6: Use Segment Routing Header for dest address if present 2022-01-04 12:17:35 +00:00
seg6_hmac.h
seg6_local.h
selftests.h net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
slhc_vj.h
smc.h net/smc: Pass on DMBE bit mask in IRQ handler 2022-07-27 13:24:42 +01:00
snmp.h
sock.h net: annotate data-races around sk->sk_forward_alloc 2023-09-19 12:28:01 +02:00
sock_reuseport.h soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-12-31 13:32:04 +01:00
Space.h wan: remove sbni/granch driver 2021-08-03 13:05:26 +01:00
stp.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
strparser.h tls: rx: remove the message decrypted tracking 2022-07-18 11:24:10 +01:00
switchdev.h net: switchdev: add reminder near struct switchdev_notifier_fdb_info 2022-06-29 20:37:36 -07:00
tcp.h tcp: fix quick-ack counting to count actual ACKs of new data 2023-10-10 22:00:43 +02:00
tcp_states.h
timewait_sock.h
tipc.h
tls.h tls: rx: strp: preserve decryption status of skbs when needed 2023-06-05 09:26:18 +02:00
tls_toe.h
transp_v6.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
tso.h
tun_proto.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
udp.h tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
udp_tunnel.h rxrpc: Fix ICMP/ICMP6 error handling 2022-09-01 11:42:12 +01:00
udplite.h tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
vsock_addr.h
vxlan.h vxlan: Fix nexthop hash size 2023-08-11 12:08:17 +02:00
wext.h
x25.h
x25device.h
xdp.h xdp: Adjust xdp_frame layout to avoid using bitfields 2022-09-26 13:28:19 -07:00
xdp_priv.h net: add missing includes and forward declarations under net/ 2022-07-22 12:53:22 +01:00
xdp_sock.h net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
xdp_sock_drv.h xsk: Remove unused xsk_buff_discard 2022-09-30 07:55:46 -07:00
xfrm.h xfrm: Treat already-verified secpath entries as optional 2023-06-28 11:12:28 +02:00
xsk_buff_pool.h xsk: Fix unaligned descriptor validation 2023-05-11 23:03:21 +09:00