android_kernel_msm-6.1_noth.../include
Lawrence Brakmo e6546ef6d8 bpf: add support for BPF_SOCK_OPS_BASE_RTT
A congestion control algorithm can make a call to the BPF socket_ops
program to request the base RTT. The base RTT can be congestion control
dependent and is meant to represent a congestion threshold such that
RTTs above it indicate congestion. This is especially useful for flows
within a DC where the base RTT is easy to obtain.

Being provided a base RTT solves a basic problem in RTT based congestion
avoidance algorithms (such as Vegas, NV and BBR). Although it is easy
to get the base RTT when the network is not congested, it is very
diffcult to do when it is very congested. Newer connections get an
inflated value of the base RTT leading to unfariness (newer flows with a
larger base RTT get more bandwidth). As a result, RTT based congestion
avoidance algorithms tend to update their base RTTs to improve fairness.
In very congested networks this can lead to base RTT inflation, reducing
the ability of these RTT based congestion control algorithms to prevent
congestion.

Note that in my experiments with TCP-NV, the base RTT provided can be
much larger than the actual hardware RTT. For example, experimenting
with hosts within a rack where the hardware RTT is 16-20us, I've used
base RTTs up to 150us. The effect of using a larger base RTT is that the
congestion avoidance algorithm will allow more queueing. When there are
only a few flows the main effect is larger measured RTTs and RPC
latencies due to the increased queueing. When there are a lot of flows,
a larger base RTT can lead to more congestion and more packet drops.
For this case, where the hardware RTT is 20us, a base RTT of 80us
produces good results.

This patch only introduces BPF_SOCK_OPS_BASE_RTT, a later patch in this
set adds support for using it in TCP-NV. Further study and testing is
needed before support can be added to other delay based congestion
avoidance algorithms.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:12:05 +01:00
..
acpi ACPI / bus: Make ACPI_HANDLE() work for non-GPL code again 2017-09-19 22:42:31 +02:00
asm-generic percpu: make this_cpu_generic_read() atomic w.r.t. interrupts 2017-09-26 07:37:33 -07:00
clocksource
crypto crypto: hash - add crypto_(un)register_ahashes() 2017-08-22 14:54:52 +08:00
drm lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
dt-bindings ARC: reset: remove the misleading v1 suffix all over 2017-09-18 13:02:03 +02:00
keys net: rxrpc: Replace time_t type with time64_t type 2017-08-29 10:16:00 +01:00
kvm KVM: arm/arm64: PMU: Fix overflow interrupt injection 2017-07-25 14:18:01 +01:00
linux drivers, connector: convert cn_callback_entry.refcnt from atomic_t to refcount_t 2017-10-22 02:22:39 +01:00
math-emu
media media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
memory
misc
net net: sched: remove unused is_classid_clsact_ingress/egress helpers 2017-10-21 03:04:08 +01:00
pcmcia
ras
rdma IB: Correct MR length field to be 64-bit 2017-09-25 11:47:23 -04:00
scsi scsi: libiscsi: Remove iscsi_destroy_session 2017-10-02 22:23:21 -04:00
soc ARM: SoC driver updates for v4.14 2017-09-10 20:40:00 -07:00
sound sound fixes for 4.14-rc4 2017-10-05 10:39:29 -07:00
target iscsi-target: Fix iscsi_np reset hung task during parallel delete 2017-08-06 14:41:41 -07:00
trace ipv6: let trace_fib6_table_lookup() dereference the fib table 2017-10-21 02:23:38 +01:00
uapi bpf: add support for BPF_SOCK_OPS_BASE_RTT 2017-10-22 03:12:05 +01:00
video
xen xen, arm64: drop dummy lookup_address() 2017-09-19 09:25:05 -04:00