Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
To get the changes in the commit Fixes: 3209f68b3c ("statx: Include a
mask for stx_attributes in struct statx")
Silencing this perf build warning:
Warning: tools/include/uapi/linux/stat.h differs from kernel
No need to change the statx syscall beautifiers in 'perf trace' at this
time.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Ahern <dsahern@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-y8bgiyzuvura62lffvh1zbg9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.
Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.
[1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
o s/bpf_bpf_get_socket_cookie/bpf_get_socket_cookie
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
add support for BPF_PROG_TEST_RUN command to libbpf.a
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will need it to build tools/perf/trace/beauty/statx.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-nin41ve2fa63lrfbdr6x57yr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Returns the owner uid of the socket inside a sk_buff. This is useful to
perform per-UID accounting of network traffic or per-UID packet
filtering. The socket need to be a fullsock otherwise overflowuid is
returned.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff
with a known socket. Generates a new cookie if one was not yet set.If
the socket pointer inside sk_buff is NULL, 0 is returned. The helper
function coud be useful in monitoring per socket networking traffic
statistics and provide a unique socket identifier per namespace.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test cases for array of maps and hash of maps.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To get PERF_AUX_FLAG_PARTIAL, introduced in:
ae0c2d995d ("perf/core: Add a flag for partial AUX records")
and that will be used to warn the user about gaps in AUX records due
to VMX being used in KVM guests.
Silences the kernel/tools file copy detector:
Warning: include/uapi/linux/perf_event.h differs from kernel
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/r/8760j941ig.fsf@ashishki-desk.ger.corp.intel.com
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
by the kernel when fork, clone, setns or unshare are invoked. And update
perf-record documentation with the new option to record namespace
events.
Committer notes:
Combined it with a later patch to allow printing it via 'perf report -D'
and be able to test the feature introduced in this patch. Had to move
here also perf_ns__name(), that was introduced in another later patch.
Also used PRIu64 and PRIx64 to fix the build in some enfironments wrt:
util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
^
Testing it:
# perf record --namespaces -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
#
# perf report -D
<SNIP>
3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
[0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
0x1151e0 [0x30]: event: 9
.
. ... raw event: size 48 bytes
. 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h....
. 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c....
. 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................
<SNIP>
NAMESPACES events: 1
<SNIP>
#
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Recent merge of 'linux-kselftest-4.11-rc1' tree broke bpf test build.
None of the tests were building and test_verifier.c had tons of compiler errors.
Fix it and add #ifdef CAP_IS_SUPPORTED to support old versions of libcap.
Tested on centos 6.8 and 7
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tools version of this header is out of date; update it to the latest
version from kernel header.
Synchronize with the following commits:
* b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
* a5e8c07059 ("bpf: add bpf_probe_read_str helper")
* d1b662adcd ("bpf: allow option for setting bpf_l4_csum_replace from scratch")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gianluca Borello <g.borello@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull perf fixes from Ingo Molnar:
"On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
plus on the tooling side there's a number of fixes and some late
updates"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf sched timehist: Fix invalid period calculation
perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
perf sched timehist: Enlarge default 'comm_width'
perf sched timehist: Honour 'comm_width' when aligning the headers
perf/x86: Fix overlap counter scheduling bug
perf/x86/pebs: Fix handling of PEBS buffer overflows
samples/bpf: Move open_raw_sock to separate header
samples/bpf: Remove perf_event_open() declaration
samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
tools lib bpf: Add bpf_prog_{attach,detach}
samples/bpf: Switch over to libbpf
perf diff: Do not overwrite valid build id
perf annotate: Don't throw error for zero length symbols
perf bench futex: Fix lock-pi help string
perf trace: Check if MAP_32BIT is defined (again)
samples/bpf: Make perf_event_read() static
uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
samples/bpf: Make samples more libbpf-centric
tools lib bpf: Add flags to bpf_create_map()
tools lib bpf: use __u32 from linux/types.h
...
The tools version of this header is out of date; update it to the latest
version from the kernel headers.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: http://lkml.kernel.org/r/20161209024620.31660-2-joe@ovn.org
[ Sync it harder, after merging with what was in net-next via perf/urgent via torvalds/master to get BPG_PROG_(AT|DE)TACH, etc ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We only support breakpoint/watchpoint of length 1, 2, 4 and 8. If we can
support other length as well, then user may watch more data with less
number of watchpoints (provided hardware supports it). For example: if we
have to watch only 4th, 5th and 6th byte from a 64 bit aligned address, we
will have to use two slots to implement it currently. One slot will watch a
half word at offset 4 and other a byte at offset 6. If we can have a
watchpoint of length 3 then we can watch it with single slot as well.
ARM64 hardware does support such functionality, therefore adding these new
definitions in generic layer.
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 747ea55e4f ("bpf: fix bpf_skb_in_cgroup helper naming") renames
BPF_FUNC_skb_in_cgroup to bpf_skb_under_cgroup, triggering this warning
while building perf:
Warning: tools/include/uapi/linux/bpf.h differs from kernel
Update the copy to ack that, no changes needed, as
BPF_FUNC_skb_in_cgroup isn't used so far.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-x67d2gq8ct6ko12ex14q8bbx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some mmap related macros have different values for different
architectures. This patch introduces uapi mman.h for each
architectures.
Three headers are cloned from kernel include to tools/include:
tools/include/uapi/asm-generic/mman-common.h
tools/include/uapi/asm-generic/mman.h
tools/include/uapi/linux/mman.h
The main part of this patch is generated by following script:
macros=`cat $0 | awk 'V==1 {print}; /^# start macro list/ {V=1}'`
for arch in `ls tools/arch`
do
[ -d tools/arch/$arch/include/uapi/asm ] || mkdir -p tools/arch/$arch/include/uapi/asm
src=arch/$arch/include/uapi/asm/mman.h
target=tools/arch/$arch/include/uapi/asm/mman.h
guard="TOOLS_ARCH_"`echo $arch | awk '{print toupper($0)}'`_UAPI_ASM_MMAN_FIX_H
echo '#ifndef '$guard > $target
echo '#define '$guard >> $target
[ -f $src ] &&
for m in $macros
do
if grep '#define[ \t]*'$m $src > /dev/null 2>&1
then
grep -h '#define[ \t]*'$m $src | sed 's/[ \t]*\/\*.*$//g' >> $target
fi
done
if [ -f $src ]
then
grep '#include <asm-generic' $src >> $target
else
echo "#include <asm-generic/mman.h>" >> $target
fi
echo '#endif' >> $target
echo "$target"
done
exit 0
# Following macros are extracted from:
# tools/perf/trace/beauty/mmap.c
#
# start macro list
MADV_DODUMP
MADV_DOFORK
MADV_DONTDUMP
MADV_DONTFORK
MADV_DONTNEED
MADV_HUGEPAGE
MADV_HWPOISON
MADV_MERGEABLE
MADV_NOHUGEPAGE
MADV_NORMAL
MADV_RANDOM
MADV_REMOVE
MADV_SEQUENTIAL
MADV_SOFT_OFFLINE
MADV_UNMERGEABLE
MADV_WILLNEED
MAP_32BIT
MAP_ANONYMOUS
MAP_DENYWRITE
MAP_EXECUTABLE
MAP_FILE
MAP_FIXED
MAP_GROWSDOWN
MAP_HUGETLB
MAP_LOCKED
MAP_NONBLOCK
MAP_NORESERVE
MAP_POPULATE
MAP_PRIVATE
MAP_SHARED
MAP_STACK
MAP_UNINITIALIZED
MREMAP_FIXED
MREMAP_MAYMOVE
PROT_EXEC
PROT_GROWSDOWN
PROT_GROWSUP
PROT_NONE
PROT_READ
PROT_SEM
PROT_WRITE
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1473684871-209320-2-git-send-email-wangnan0@huawei.com
[ Added new files to tools/perf/MANIFEST to fix the detached tarball build, add mman.h for ARC ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The way we're using kernel headers in tools/ now, with a copy that is
made to the same path prefixed by "tools/" plus checking if that copy
got stale, i.e. if the kernel counterpart changed, helps in keeping
track with new features that may be useful for tools to exploit.
For instance, looking at all the changes to bpf.h since it was last
copied to tools/include brings this to toolers' attention:
Need to investigate this one to check how to run a program via perf, setting up
a BPF event, that will take advantage of the way perf already calls clang/LLVM,
sets up the event and runs the workload in a single command line, helping in
debugging such semi cooperative programs:
96ae522795 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
This one needs further investigation about using the feature it improves
in 'perf trace' to do some tcpdumpin' mixed with syscalls, tracepoints,
probe points, callgraphs, etc:
555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")
Add tracing just packets that are related to some container to that mix:
4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
4ed8ec521e ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")
Definetely needs to have example programs accessing task_struct from a bpf proggie
started from 'perf trace':
606274c5ab ("bpf: introduce bpf_get_current_task() helper")
Core networking related, XDP:
6ce96ca348 ("bpf: add XDP_TX xdp_action for direct forwarding")
6a773a15a1 ("bpf: add XDP prog type for early driver filter")
13c5c240f7 ("bpf: add bpf_get_hash_recalc helper")
d2485c4242 ("bpf: add bpf_skb_change_type helper")
6578171a7f ("bpf: add bpf_skb_change_proto helper")
Changes detected by the tools build system:
$ make -C tools/perf O=/tmp/build/perf install-bin
make: Entering directory '/home/acme/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
Warning: tools/include/uapi/linux/bpf.h differs from kernel
INSTALL GTK UI
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
<SNIP>
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-difq4ts1xvww6eyfs9e7zlft@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To allow the build to complete on older systems, where those files are
either not uptodate, lacking some recent additions or not present at
all.
And check if the copy drifts from the kernel.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3jz31pz4nw526uko5da9e7o3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To allow the build to complete on older systems, where those files are
either not uptodate, lacking some recent additions or not present at
all.
And check if the copy drifts from the kernel, as in this synthetic test:
BUILD: Doing 'make -j4' parallel build
Warning: tools/include/linux/bpf.h differs from kernel
Warning: tools/include/linux/bpf_common.h differs from kernel
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-5plvi2gq4x469dcyybiu226q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We shouldn't use headers from the kernel sources directly, instead we
should use the system's headers or in cases where that isn't possible,
like with perf_event.h, where the introduction of kernel features such
as perf_event_attr.{write_backwards,sample_max_stack} and
PERF_EVENT_IOC_PAUSE_OUTPUT take some time to become available in
/usr/include/linux/perf_event.h we need a copy.
Do it and check for source code drift, emitting a warning when changes
are detected.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-v6aks5un3s5pehory6f42nrl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>