Commit graph

5016 commits

Author SHA1 Message Date
Florian Westphal
01cd267bff netfilter: fix fallout from xt/nf osf separation
Stephen Rothwell says:
  today's linux-next build (x86_64 allmodconfig) produced this warning:
  ./usr/include/linux/netfilter/nf_osf.h:25: found __[us]{8,16,32,64} type without #include <linux/types.h>

Fix that up and also move kernel-private struct out of uapi (it was not
exposed in any released kernel version).

tested via allmodconfig build + make headers_check.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: bfb15f2a95 ("netfilter: extract Passive OS fingerprint infrastructure from xt_osf")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-17 13:52:04 +02:00
David S. Miller
b9f672af14 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-05-17

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Provide a new BPF helper for doing a FIB and neighbor lookup
   in the kernel tables from an XDP or tc BPF program. The helper
   provides a fast-path for forwarding packets. The API supports
   IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
   implemented in this initial work, from David (Ahern).

2) Just a tiny diff but huge feature enabled for nfp driver by
   extending the BPF offload beyond a pure host processing offload.
   Offloaded XDP programs are allowed to set the RX queue index and
   thus opening the door for defining a fully programmable RSS/n-tuple
   filter replacement. Once BPF decided on a queue already, the device
   data-path will skip the conventional RSS processing completely,
   from Jakub.

3) The original sockmap implementation was array based similar to
   devmap. However unlike devmap where an ifindex has a 1:1 mapping
   into the map there are use cases with sockets that need to be
   referenced using longer keys. Hence, sockhash map is added reusing
   as much of the sockmap code as possible, from John.

4) Introduce BTF ID. The ID is allocatd through an IDR similar as
   with BPF maps and progs. It also makes BTF accessible to user
   space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
   through BPF_OBJ_GET_INFO_BY_FD, from Martin.

5) Enable BPF stackmap with build_id also in NMI context. Due to the
   up_read() of current->mm->mmap_sem build_id cannot be parsed.
   This work defers the up_read() via a per-cpu irq_work so that
   at least limited support can be enabled, from Song.

6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
   JIT conversion as well as implementation of an optimized 32/64 bit
   immediate load in the arm64 JIT that allows to reduce the number of
   emitted instructions; in case of tested real-world programs they
   were shrinking by three percent, from Daniel.

7) Add ifindex parameter to the libbpf loader in order to enable
   BPF offload support. Right now only iproute2 can load offloaded
   BPF and this will also enable libbpf for direct integration into
   other applications, from David (Beckett).

8) Convert the plain text documentation under Documentation/bpf/ into
   RST format since this is the appropriate standard the kernel is
   moving to for all documentation. Also add an overview README.rst,
   from Jesper.

9) Add __printf verification attribute to the bpf_verifier_vlog()
   helper. Though it uses va_list we can still allow gcc to check
   the format string, from Mathieu.

10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
    is a bash 4.0+ feature which is not guaranteed to be available
    when calling out to shell, therefore use a more portable variant,
    from Joe.

11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
    instead of relying on the gcc built-in, from Björn.

12) Fix a sock hashmap kmalloc warning reported by syzbot when an
    overly large key size is used in hashmap then causing overflows
    in htab->elem_size. Reject bogus attr->key_size early in the
    sock_hash_alloc(), from Yonghong.

13) Ensure in BPF selftests when urandom_read is being linked that
    --build-id is always enabled so that test_stacktrace_build_id[_nmi]
    won't be failing, from Alexei.

14) Add bitsperlong.h as well as errno.h uapi headers into the tools
    header infrastructure which point to one of the arch specific
    uapi headers. This was needed in order to fix a build error on
    some systems for the BPF selftests, from Sirio.

15) Allow for short options to be used in the xdp_monitor BPF sample
    code. And also a bpf.h tools uapi header sync in order to fix a
    selftest build failure. Both from Prashant.

16) More formally clarify the meaning of ID in the direct packet access
    section of the BPF documentation, from Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16 22:47:11 -04:00
Eric Sandeen
62750d040b fs: copy BTRFS_IOC_[SG]ET_FSLABEL to vfs
This retains 256 chars as the maximum size through the interface, which
is the btrfs limit and AFAIK exceeds any other filesystem's maximum
label size.

This just copies the ioctl for now and leaves it in place for btrfs
for the time being.  A later patch will allow btrfs to use the new
common ioctl definition, but it may be sent after this is merged.

(Note, Reviewed-by's were originally given for the combined vfs+btrfs
patch, some license taken here.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-16 08:50:16 -07:00
John Fastabend
8111038444 bpf: sockmap, add hash map support
Sockmap is currently backed by an array and enforces keys to be
four bytes. This works well for many use cases and was originally
modeled after devmap which also uses four bytes keys. However,
this has become limiting in larger use cases where a hash would
be more appropriate. For example users may want to use the 5-tuple
of the socket as the lookup key.

To support this add hash support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-15 20:41:03 +02:00
Jorge Sanjuan
6cfd839ae7 ALSA: usb-audio: UAC3. Add support for mixer unit.
This adds support for the MIXER UNIT in UAC3. All the information
is obtained from the (HIGH CAPABILITY) Cluster's header. We don't
read the rest of the logical cluster to obtain the channel config
as that wont make any difference in the current mixer behaviour.

The name of the mixer unit is not yet requested as there is not
support for the UAC3 Class Specific String requests.

Tested in an UAC3 device working as a HEADSET with a basic mixer
unit (same as the one in the BADD spec) with no controls.

Signed-off-by: Jorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Reviewed-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Tested-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-05-15 07:32:50 +02:00
Marcelo Ricardo Leitner
81c7288b17 sched: cls: enable verbose logging
Currently, when the rule is not to be exclusively executed by the
hardware, extack is not passed along and offloading failures don't
get logged. The idea was that hardware failures are okay because the
rule will get executed in software then and this way it doesn't confuse
unware users.

But this is not helpful in case one needs to understand why a certain
rule failed to get offloaded. Considering it may have been a temporary
failure, like resources exceeded or so, reproducing it later and knowing
that it is triggering the same reason may be challenging.

The ultimate goal is to improve Open vSwitch debuggability when using
flower offloading.

This patch adds a new flag to enable verbose logging. With the flag set,
extack will be passed to the driver, which will be able to log the
error. As the operation itself probably won't fail (not because of this,
at least), current iproute will already log it as a Warning.

The flag is generic, so it can be reused later. No need to restrict it
just for HW offloading. The command line will follow the syntax that
tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning.

For example:
# ./tc qdisc add dev p7p1 ingress
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
	flower verbose \
	src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
Warning: TC offload is disabled on net device.
# echo $?
0
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
	flower \
	src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
# echo $?
0

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14 16:18:27 -04:00
Richard Guy Briggs
f0b752168d audit: convert sessionid unset to a macro
Use a macro, "AUDIT_SID_UNSET", to replace each instance of
initialization and comparison to an audit session ID.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-05-14 15:56:35 -04:00
Rahul Lakkireddy
2724273e8f vmcore: add API to collect hardware dump in second kernel
The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/vmcore are as follows:

1. During probe (before hardware is initialized), device drivers
register to the vmcore module (via vmcore_add_device_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.

2. vmcore module allocates the buffer with requested size. It adds
an Elf note and invokes the device driver's registered callback
function.

3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to vmcore module.

Ensure that the device dump buffer size is always aligned to page size
so that it can be mmaped.

Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
generic and reserve NT_VMCOREDD note type to indicate vmcore device
dump.

Suggested-by: Eric Biederman <ebiederm@xmission.com>.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14 13:46:04 -04:00
Dave Airlie
110ab11d41 drm/virtio: add define for second capset to the virgl code.
Although the kernel doesn't use this, qemu imports these headers
and it's best to keep them consistent.

This define is also something userspace may want to use.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20180503021021.10694-1-airlied@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2018-05-14 11:01:29 +02:00
David S. Miller
4f6b15c3a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Fix handling of simultaneous open TCP connection in conntrack,
   from Jozsef Kadlecsik.

2) Insufficient sanitify check of xtables extension names, from
   Florian Westphal.

3) Skip unnecessary synchronize_rcu() call when transaction log
   is already empty, from Florian Westphal.

4) Incorrect destination mac validation in ebt_stp, from Stephen
   Hemminger.

5) xtables module reference counter leak in nft_compat, from
   Florian Westphal.

6) Incorrect connection reference counting logic in IPVS
   one-packet scheduler, from Julian Anastasov.

7) Wrong stats for 32-bits CPU in IPVS, also from Julian.

8) Calm down sparse error in netfilter core, also from Florian.

9) Use nla_strlcpy to fix compilation warning in nfnetlink_acct
   and nfnetlink_cthelper, again from Florian.

10) Missing module alias in icmp and icmp6 xtables extensions,
    from Florian Westphal.

11) Base chain statistics in nf_tables may be unset/null, from Florian.

12) Fix handling of large matchinfo size in nft_compat, this includes
    one preparation for before this fix. From Florian.

13) Fix bogus EBUSY error when deleting chains due to incorrect reference
    counting from the preparation phase of the two-phase commit protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-13 20:28:47 -04:00
David S. Miller
b2d6cee117 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The bpf syscall and selftests conflicts were trivial
overlapping changes.

The r8169 change involved moving the added mdelay from 'net' into a
different function.

A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts.  I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11 20:53:22 -04:00
Linus Torvalds
4bc871984f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Verify lengths of keys provided by the user is AF_KEY, from Kevin
    Easton.

 2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.

 3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R.
    Silva.

 4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most
    grateful for this fix.

 5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we
    do appreciate.

 6) Use after free in TLS code. Once again we are blessed by the
    honorable Eric Dumazet with this fix.

 7) Fix regression in TLS code causing stalls on partial TLS records.
    This fix is bestowed upon us by Andrew Tomt.

 8) Deal with too small MTUs properly in LLC code, another great gift
    from Eric Dumazet.

 9) Handle cached route flushing properly wrt. MTU locking in ipv4, to
    Hangbin Liu we give thanks for this.

10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux.
    Paolo Abeni, he gave us this.

11) Range check coalescing parameters in mlx4 driver, thank you Moshe
    Shemesh.

12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother
    David Howells.

13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens,
    you're the best!

14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata
    Benerjee saved us!

15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship
    is now water tight, thanks to Andrey Ignatov.

16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes
    everywhere!

17) Fix error path in tcf_proto_create, Jiri Pirko what would we do
    without you!

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
  net sched actions: fix refcnt leak in skbmod
  net: sched: fix error path in tcf_proto_create() when modules are not configured
  net sched actions: fix invalid pointer dereferencing if skbedit flags missing
  ixgbe: fix memory leak on ipsec allocation
  ixgbevf: fix ixgbevf_xmit_frame()'s return type
  ixgbe: return error on unsupported SFP module when resetting
  ice: Set rq_last_status when cleaning rq
  ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
  mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
  bonding: send learning packets for vlans on slave
  bonding: do not allow rlb updates to invalid mac
  net/mlx5e: Err if asked to offload TC match on frag being first
  net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
  net/mlx5: Free IRQs in shutdown path
  rxrpc: Trace UDP transmission failure
  rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
  rxrpc: Fix the min security level for kernel calls
  rxrpc: Fix error reception on AF_INET6 sockets
  rxrpc: Fix missing start of call timeout
  qed: fix spelling mistake: "taskelt" -> "tasklet"
  ...
2018-05-11 14:14:46 -07:00
David Ahern
87f5fc7e48 bpf: Provide helper to do forwarding lookups in kernel FIB table
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet continues up the stack.

If it is to be forwarded, the forwarding can be done directly if the
neighbor is already known. If the neighbor does not exist, the first
few packets go up the stack for neighbor resolution. Once resolved, the
xdp program provides the fast path.

On successful lookup the nexthop dmac, current device smac and egress
device index are returned.

The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
are implemented in this patch. The API includes layer 4 parameters if
the XDP program chooses to do deep packet inspection to allow compare
against ACLs implemented as FIB rules.

Header rewrite is left to the XDP program.

The lookup takes 2 flags:
- BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
  straight to the table associated with the device (expert setting for
  those looking to maximize throughput)

- BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
  Default is an ingress lookup.

Initial performance numbers collected by Jesper, forwarded packets/sec:

       Full stack    XDP FIB lookup    XDP Direct lookup
IPv4   1,947,969       7,074,156          7,415,333
IPv6   1,728,000       6,165,504          7,262,720

These number are single CPU core forwarding on a Broadwell
E5-1650 v4 @ 3.60GHz.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
David S. Miller
b2a9643855 We only have a few fixes this time:
* WMM element validation
  * SAE timeout
  * add-BA timeout
  * docbook parsing
  * a few memory leaks in error paths
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlrzTUQACgkQB8qZga/f
 l8QgqA/+KrZlJPuQjT0xZYYj0rXGc4GzeOJiyDqVz/25sGrmDP8pCnabJhC0l10X
 +4edB85XjgPbQ/OZsnRSRaCQDPOuBUiQqCl33u6T/qngHdbxuVoxx8UykAcr/+z1
 bmrokLBMXWijaacMine9ouCjjuMKXaiyPsZ5aobMpFAKFcVxHDK2hnlMp2bhfqVw
 fuCLtMlvc5G5FGKEfLoT0YzFGhOpcz6Zmnuk58pCoVTUTNDOdbKUVI/4TSGHUnaH
 YrS1I+ERIIIRf8NCdUHqvygigHyEYd4WlOtFnmrEEUrBf3YP/frUtf1oUlg77Kdc
 bPIKD76cO8ONaFu45z4/nBhzQQPz6idUBMfk+gYuZ6bNxrFkJCHNoSkz2AXiOxQF
 tThXUoSMnlrqyW+jJ0JdZHZQqU0vwaJvCh96F97ox3WWPoMlty+JtQ0orVJSkUF1
 LHNmKCJfCJ2LfK5xr0tSnhBBeteyE18zaPZMHkEUh7gUkQeov9+wgvtPFdX/pQ0N
 /sZCs4QQLmHn3R1f/XSytSIsEoIEneCKN/pwSc63M6SqqVDOE4+Mue7+Jzjq4xad
 oY+pFIWusxyUQmC4R01/bQzADROp1vJFUS9/4sDmwsIk+RbRxSGP3o2kHwzl+Wc2
 DxxQv2WN8V0L2i+ru7Ck+Q4zAUgxoIHqHWefKpI4MO9liCcBHFQ=
 =mJ4m
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
We only have a few fixes this time:
 * WMM element validation
 * SAE timeout
 * add-BA timeout
 * docbook parsing
 * a few memory leaks in error paths
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-10 17:34:50 -04:00
Mauro Carvalho Chehab
71db1cd7ff Linux 4.17-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlrvwLAeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGEX4H/iRAefubpMXptH55
 ru/Mr7hsbKtKQOCmH3LuYP2SwSEHclX+kxHfiaIMsBN9iOGWpRJkZVLuOG0Vl0sZ
 d68I5UTEJHPY0FZqOvfXY1sLTfG9jlhQ/SyppVyuL+QbH+cecnWqvkMkwo2p99MJ
 7OZ3wZjSObaJGXEoLUbZrEbTf8PsT8eT7MWMkmOJHO/J8rnnoDKaOiB5fBxhck0M
 xd/FdOYNZhqOfzIMUuLPWPE3THjfJuIQOYsTvEq0lcuAD2ZOesbV5hsGLC1AG6jd
 GaJJKy80mRvPXSLv7GTigBG9zZbs0mszDJeeh0FCjSlc3sjMDZn7A12N1c8AcQ52
 rLmzSQ8=
 =Nmna
 -----END PGP SIGNATURE-----

Merge tag 'v4.17-rc4' into patchwork

Linux 4.17-rc4

* tag 'v4.17-rc4': (920 commits)
  Linux 4.17-rc4
  KVM: x86: remove APIC Timer periodic/oneshot spikes
  genksyms: fix typo in parse.tab.{c,h} generation rules
  kbuild: replace hardcoded bison in cmd_bison_h with $(YACC)
  gcc-plugins: fix build condition of SANCOV plugin
  MAINTAINERS: Update Kbuild entry with a few paths
  Revert "usb: host: ehci: Use dma_pool_zalloc()"
  platform/x86: Kconfig: Fix dell-laptop dependency chain.
  platform/x86: asus-wireless: Fix NULL pointer dereference
  arm64: vgic-v2: Fix proxying of cpuif access
  KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
  KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
  MAINTAINERS & files: Canonize the e-mails I use at files
  media: imx-media-csi: Fix inconsistent IS_ERR and PTR_ERR
  tools: power/acpi, revert to LD = gcc
  bdi: Fix oops in wb_workfn()
  RDMA/cma: Do not query GID during QP state transition to RTR
  IB/mlx4: Fix integer overflow when calculating optimal MTT size
  IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
  IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
  ...
2018-05-10 07:19:23 -04:00
Arnd Bergmann
378e3f81cb media: omap3isp: support 64-bit version of omap3isp_stat_data
C libraries with 64-bit time_t use an incompatible format for
struct omap3isp_stat_data. This changes the kernel code to
support either version, by moving over the normal handling
to the 64-bit variant, and adding compatiblity code to handle
the old binary format with the existing ioctl command code.

Fortunately, the command code includes the size of the structure,
so the difference gets handled automatically. In the process of
eliminating the references to 'struct timeval' from the kernel,
I also change the way the timestamp is generated internally,
basically by open-coding the v4l2_get_timestamp() call.

[Sakari Ailus: Alphabetical order of headers, clean up compat code]

Cc: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2018-05-09 16:37:05 -04:00
Martin KaFai Lau
62dab84c81 bpf: btf: Add struct bpf_btf_info
During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's
info.info is directly filled with the BTF binary data.  It is
not extensible.  In this case, we want to add BTF ID.

This patch adds "struct bpf_btf_info" which has the BTF ID as
one of its member.  The BTF binary data itself is exposed through
the "btf" and "btf_size" members.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:13 +02:00
Martin KaFai Lau
78958fca7e bpf: btf: Introduce BTF ID
This patch gives an ID to each loaded BTF.  The ID is allocated by
the idr like the existing prog-id and map-id.

The bpf_put(map->btf) is moved to __bpf_map_put() so that the
userspace can stop seeing the BTF ID ASAP when the last BTF
refcnt is gone.

It also makes BTF accessible from userspace through the
1. new BPF_BTF_GET_FD_BY_ID command.  It is limited to CAP_SYS_ADMIN
   which is inline with the BPF_BTF_LOAD cmd and the existing
   BPF_[MAP|PROG]_GET_FD_BY_ID cmd.
2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info"

Once the BTF ID handler is accessible from userspace, freeing a BTF
object has to go through a rcu period.  The BPF_BTF_GET_FD_BY_ID cmd
can then be done under a rcu_read_lock() instead of taking
spin_lock.
[Note: A similar rcu usage can be done to the existing
       bpf_prog_get_fd_by_id() in a follow up patch]

When processing the BPF_BTF_GET_FD_BY_ID cmd,
refcount_inc_not_zero() is needed because the BTF object
could be already in the rcu dead row .  btf_get() is
removed since its usage is currently limited to btf.c
alone.  refcount_inc() is used directly instead.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-09 17:25:13 +02:00
Toke Høiland-Jørgensen
52539ca89f cfg80211: Expose TXQ stats and parameters to userspace
This adds support for exporting the mac80211 TXQ stats via nl80211 by
way of a nested TXQ stats attribute, as well as for configuring the
quantum and limits that were previously only changeable through debugfs.

This commit adds just the nl80211 API, a subsequent commit adds support to
mac80211 itself.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-05-08 13:19:24 +02:00
Greg Kroah-Hartman
58318cd4df Merge 4.17-rc4 into usb-next
We want the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-08 09:47:16 +02:00
David S. Miller
01adc4851a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Minor conflict, a CHECK was placed into an if() statement
in net-next, whilst a newline was added to that CHECK
call in 'net'.  Thanks to Daniel for the merge resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:35:08 -04:00
Balaji Pothunoori
81d5439da8 cfg80211: average ack rssi support for data frames
Average ack rssi will be given to userspace via NL80211 interface
if firmware is capable. Userspace tool ‘iw’ can process this
information and give the output as one of the fields in
‘iw dev wlanX station dump’.

Example output :

localhost ~ #iw dev wlan-5000mhz station dump Station
34:f3:9a:aa:3b:29 (on wlan-5000mhz)
        inactive time:  5370 ms
        rx bytes:       85321
        rx packets:     576
        tx bytes:       14225
        tx packets:     71
        tx retries:     0
        tx failed:      2
        beacon loss:    0
        rx drop misc:   0
        signal:         -54 dBm
        signal avg:     -53 dBm
        tx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        rx bitrate:     866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
        avg ack signal: -56 dBm
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
       short preamble: yes
       short slot time:yes
       connected time: 203 seconds

Main use case is to measure the signal strength of a connected station
to AP. Data packet transmit rates and bandwidth used by station can vary
a lot even if the station is at fixed location, especially if the rates
used are multi stream(2stream, 3stream) rates with different bandwidth(20/40/80 Mhz).
These multi stream rates are sensitive and station can use different transmit power
for each of the rate and bandwidth combinations. RSSI measured from these RX packets
on AP will be not stable and can vary a lot with in a short time.
Whereas 802.11 ack frames from station are sent relatively at a constant
rate (6/12/24 Mbps) with constant bandwidth(20 Mhz).
So average rssi of the ack packets is good and more accurate.

Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-05-07 21:37:20 +02:00
Haim Dreyfuss
50f32718e1 nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command
This will serve userspace entity to maintain its regulatory limitation.
More specifcally APs can use this data to calculate the WMM IE when
building: beacons, probe responses, assoc responses etc...

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-05-07 20:57:40 +02:00
David S. Miller
90278871d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree, more relevant updates in this batch are:

1) Add Maglev support to IPVS. Moreover, store lastest server weight in
   IPVS since this is needed by maglev, patches from from Inju Song.

2) Preparation works to add iptables flowtable support, patches
   from Felix Fietkau.

3) Hand over flows back to conntrack slow path in case of TCP RST/FIN
   packet is seen via new teardown state, also from Felix.

4) Add support for extended netlink error reporting for nf_tables.

5) Support for larger timeouts that 23 days in nf_tables, patch from
   Florian Westphal.

6) Always set an upper limit to dynamic sets, also from Florian.

7) Allow number generator to make map lookups, from Laura Garcia.

8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat.

9) Extend ip6tables SRH match to support previous, next and last SID,
   from Ahmed Abdelsalam.

10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez.

11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot.

12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables,
   from Taehee Yoo.

13) Unify meta bridge with core nft_meta, then make nft_meta built-in.
   Make rt and exthdr built-in too, again from Florian.

14) Missing initialization of tbl->entries in IPVS, from Cong Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-06 21:51:37 -04:00
Florent Fourcot
538c5672be netfilter: ctnetlink: export nf_conntrack_max
IPCTNL_MSG_CT_GET_STATS netlink command allow to monitor current number
of conntrack entries. However, if one wants to compare it with the
maximum (and detect exhaustion), the only solution is currently to read
sysctl value.

This patch add nf_conntrack_max value in netlink message, and simplify
monitoring for application built on netlink API.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-07 00:04:02 +02:00
Fernando Fernandez Mancera
bfb15f2a95 netfilter: extract Passive OS fingerprint infrastructure from xt_osf
Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for
nf_tables support.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-07 00:02:11 +02:00
Phil Sutter
3f9c56a581 netfilter: nf_tables: Provide NFT_{RT,CT}_MAX for userspace
These macros allow conveniently declaring arrays which use NFT_{RT,CT}_*
values as indexes.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-06 23:35:10 +02:00
Ahmed Abdelsalam
c1c7e44b4f netfilter: ip6t_srh: extend SRH matching for previous, next and last SID
IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed
by SR encapsulated packet. Each SID is encoded as an IPv6 prefix.

When a Firewall receives an SR encapsulated packet, it should be able
to identify which node previously processed the packet (previous SID),
which node is going to process the packet next (next SID), and which
node is the last to process the packet (last SID) which represent the
final destination of the packet in case of inline SR mode.

An example use-case of using these features could be SID list that
includes two firewalls. When the second firewall receives a packet,
it can check whether the packet has been processed by the first firewall
or not. Based on that check, it decides to apply all rules, apply just
subset of the rules, or totally skip all rules and forward the packet to
the next SID.

This patch extends SRH match to support matching previous SID, next SID,
and last SID.

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-06 23:33:03 +02:00
Laura Garcia Liebana
d734a28889 netfilter: nft_numgen: add map lookups for numgen statements
This patch includes a new attribute in the numgen structure to allow
the lookup of an element based on the number generator as a key.

For this purpose, different ops have been included to extend the
current numgen inc functions.

Currently, only supported for numgen incremental operations, but
it will be supported for random in a follow-up patch.

Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-06 23:18:44 +02:00
Linus Torvalds
eb4f959b26 First pull request for 4.17-rc
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
 - SPDX license tag cleanups (new tag Linux-OpenIB)
 - RoCE GID fixes related to default GIDs
 - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
   mlx4, mlx5, and hfi1 (medium batch)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJa7JXPAAoJELgmozMOVy/dc0AP/0i7EajAmgl1ihka6BYVj2pa
 DV8iSrVMDPulh9AVnAtwLJSbdwmgN/HeVzLzcutHyMYk6tAf8RCs6TsyoB36XiOL
 oUh5+V2GyNnyh9veWPwyGTgZKCpPJc3uQFV6502lZVDYwArMfGatumApBgQVKiJ+
 YdPEXEQZPNIs6YZB1WXkNYV/ra9u0aBByQvUrxwVZ2AND+srJYO82tqZit2wBtjK
 UXrhmZbWXGWMFg8K3/lpfUkQhkG3Arj+tMKkCfqsVzC7wUPhlTKBHR9NmvdLIiy9
 5Vhv7Xp78udcxZKtUeTFsbhaMqqK7x7sKHnpKAs7hOZNZ/Eg47BrMwMrZVLOFuDF
 nBLUL1H+nJ1mASZoMWH5xzOpVew+e9X0cot09pVDBIvsOIh97wCG7hgptQ2Z5xig
 fcDiMmg6tuakMsaiD0dzC9JI5HR6Z7+6oR1tBkQFDxQ+XkkcoFabdmkJaIRRwOj7
 CUhXRgcm0UgVd03Jdta6CtYXsjSODirWg4AvSSMt9lUFpjYf9WZP00/YojcBbBEH
 UlVrPbsKGyncgrm3FUP6kXmScESfdTljTPDLiY9cO9+bhhPGo1OHf005EfAp178B
 jGp6hbKlt+rNs9cdXrPSPhjds+QF8HyfSlwyYVWKw8VWlh/5DG8uyGYjF05hYO0q
 xhjIS6/EZjcTAh5e4LzR
 =PI8v
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "This is our first pull request of the rc cycle. It's not that it's
  been overly quiet, we were just waiting on a few things before sending
  this off.

  For instance, the 6 patch series from Intel for the hfi1 driver had
  actually been pulled in on Tuesday for a Wednesday pull request, only
  to have Jason notice something I missed, so we held off for some
  testing, and then on Thursday had to respin the series because the
  very first patch needed a minor fix (unnecessary cast is all).

  There is a sizable hns patch series in here, as well as a reasonably
  largish hfi1 patch series, then all of the lines of uapi updates are
  just the change to the new official Linux-OpenIB SPDX tag (a bunch of
  our files had what amounts to a BSD-2-Clause + MIT Warranty statement
  as their license as a result of the initial code submission years ago,
  and the SPDX folks decided it was unique enough to warrant a unique
  tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
  and core/cache/cma updates to round out the bunch.

  None of it was overly large by itself, but in the 2 1/2 weeks we've
  been collecting patches, it has added up :-/.

  As best I can tell, it's been through 0day (I got a notice about my
  last for-next push, but not for my for-rc push, but Jason seems to
  think that failure messages are prioritized and success messages not
  so much). It's also been through linux-next. And yes, we did notice in
  the context portion of the CMA query gid fix patch that there is a
  dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
  and remove it anywhere we can.

  Summary:

   - Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)

   - SPDX license tag cleanups (new tag Linux-OpenIB)

   - RoCE GID fixes related to default GIDs

   - Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
     mlx4, mlx5, and hfi1 (medium batch)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
  RDMA/cma: Do not query GID during QP state transition to RTR
  IB/mlx4: Fix integer overflow when calculating optimal MTT size
  IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
  IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
  IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
  IB/hfi1: Fix loss of BECN with AHG
  IB/hfi1 Use correct type for num_user_context
  IB/hfi1: Fix handling of FECN marked multicast packet
  IB/core: Make ib_mad_client_id atomic
  iw_cxgb4: Atomically flush per QP HW CQEs
  IB/uverbs: Fix kernel crash during MR deregistration flow
  IB/uverbs: Prevent reregistration of DM_MR to regular MR
  RDMA/mlx4: Add missed RSS hash inner header flag
  RDMA/hns: Fix a couple misspellings
  RDMA/hns: Submit bad wr
  RDMA/hns: Update assignment method for owner field of send wqe
  RDMA/hns: Adjust the order of cleanup hem table
  RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
  RDMA/hns: Remove some unnecessary attr_mask judgement
  RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
  ...
2018-05-04 20:51:10 -10:00
Kees Cook
00a02d0c50 seccomp: Add filter flag to opt-out of SSB mitigation
If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:44 +02:00
Thomas Gleixner
356e4bfff2 prctl: Add force disable speculation
For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:43 +02:00
David S. Miller
a7b15ab887 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Overlapping changes in selftests Makefile.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-04 09:58:56 -04:00
Daniel Borkmann
4e1ec56cdc bpf: add skb_load_bytes_relative helper
This adds a small BPF helper similar to bpf_skb_load_bytes() that
is able to load relative to mac/net header offset from the skb's
linear data. Compared to bpf_skb_load_bytes(), it takes a fifth
argument namely start_header, which is either BPF_HDR_START_MAC
or BPF_HDR_START_NET. This allows for a more flexible alternative
compared to LD_ABS/LD_IND with negative offset. It's enabled for
tc BPF programs as well as sock filter program types where it's
mainly useful in reuseport programs to ease access to lower header
data.

Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000698.html
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 16:49:19 -07:00
Magnus Karlsson
af75d9e02d xsk: statistics support
In this commit, a new getsockopt is added: XDP_STATISTICS. This is
used to obtain stats from the sockets.

v2: getsockopt now returns size of stats structure.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:25 -07:00
Magnus Karlsson
f61459030e xsk: add Tx queue setup and mmap support
Another setsockopt (XDP_TX_QUEUE) is added to let the process allocate
a queue, where the user process can pass frames to be transmitted by
the kernel.

The mmapping of the queue is done using the XDP_PGOFF_TX_QUEUE offset.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:24 -07:00
Magnus Karlsson
fe2308328c xsk: add umem completion queue support and mmap
Here, we add another setsockopt for registered user memory (umem)
called XDP_UMEM_COMPLETION_QUEUE. Using this socket option, the
process can ask the kernel to allocate a queue (ring buffer) and also
mmap it (XDP_UMEM_PGOFF_COMPLETION_QUEUE) into the process.

The queue is used to explicitly pass ownership of umem frames from the
kernel to user process. This will be used by the TX path to tell user
space that a certain frame has been transmitted and user space can use
it for something else, if it wishes.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:24 -07:00
Björn Töpel
fbfc504a24 bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP
The xskmap is yet another BPF map, very much inspired by
dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
adds AF_XDP sockets into the map, and by using the bpf_redirect_map
helper, an XDP program can redirect XDP frames to an AF_XDP socket.

Note that a socket that is bound to certain ifindex/queue index will
*only* accept XDP frames from that netdev/queue index. If an XDP
program tries to redirect from a netdev/queue index other than what
the socket is bound to, the frame will not be received on the socket.

A socket can reside in multiple maps.

v3: Fixed race and simplified code.
v2: Removed one indirection in map lookup.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:24 -07:00
Magnus Karlsson
965a990984 xsk: add support for bind for Rx
Here, the bind syscall is added. Binding an AF_XDP socket, means
associating the socket to an umem, a netdev and a queue index. This
can be done in two ways.

The first way, creating a "socket from scratch". Create the umem using
the XDP_UMEM_REG setsockopt and an associated fill queue with
XDP_UMEM_FILL_QUEUE. Create the Rx queue using the XDP_RX_QUEUE
setsockopt. Call bind passing ifindex and queue index ("channel" in
ethtool speak).

The second way to bind a socket, is simply skipping the
umem/netdev/queue index, and passing another already setup AF_XDP
socket. The new socket will then have the same umem/netdev/queue index
as the parent so it will share the same umem. You must also set the
flags field in the socket address to XDP_SHARED_UMEM.

v2: Use PTR_ERR instead of passing error variable explicitly.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:23 -07:00
Björn Töpel
b9b6b68e8a xsk: add Rx queue setup and mmap support
Another setsockopt (XDP_RX_QUEUE) is added to let the process allocate
a queue, where the kernel can pass completed Rx frames from the kernel
to user process.

The mmapping of the queue is done using the XDP_PGOFF_RX_QUEUE offset.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:23 -07:00
Magnus Karlsson
423f38329d xsk: add umem fill queue support and mmap
Here, we add another setsockopt for registered user memory (umem)
called XDP_UMEM_FILL_QUEUE. Using this socket option, the process can
ask the kernel to allocate a queue (ring buffer) and also mmap it
(XDP_UMEM_PGOFF_FILL_QUEUE) into the process.

The queue is used to explicitly pass ownership of umem frames from the
user process to the kernel. These frames will in a later patch be
filled in with Rx packet data by the kernel.

v2: Fixed potential crash in xsk_mmap.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:23 -07:00
Björn Töpel
c0c77d8fb7 xsk: add user memory registration support sockopt
In this commit the base structure of the AF_XDP address family is set
up. Further, we introduce the abilty register a window of user memory
to the kernel via the XDP_UMEM_REG setsockopt syscall. The memory
window is viewed by an AF_XDP socket as a set of equally large
frames. After a user memory registration all frames are "owned" by the
user application, and not the kernel.

v2: More robust checks on umem creation and unaccount on error.
    Call set_page_dirty_lock on cleanup.
    Simplified xdp_umem_reg.

Co-authored-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:23 -07:00
Thomas Gleixner
b617cfc858 prctl: Add speculation control prctls
Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.

PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:

Bit  Define           Description
0    PR_SPEC_PRCTL    Mitigation can be controlled per task by
                      PR_SET_SPECULATION_CTRL
1    PR_SPEC_ENABLE   The speculation feature is enabled, mitigation is
                      disabled
2    PR_SPEC_DISABLE  The speculation feature is disabled, mitigation is
                      enabled

If all bits are 0 the CPU is not affected by the speculation misfeature.

If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.

PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.

The common return values are:

EINVAL  prctl is not implemented by the architecture or the unused prctl()
        arguments are not 0
ENODEV  arg2 is selecting a not supported speculation misfeature

PR_SET_SPECULATION_CTRL has these additional return values:

ERANGE  arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO   prctl control of the selected speculation misfeature is disabled

The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.

Based on an initial patch from Tim Chen and mostly rewritten.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-03 13:55:50 +02:00
Christoph Hellwig
7a074e96de aio: implement io_pgetevents
This is the io_getevents equivalent of ppoll/pselect and allows to
properly mix signals and aio completions (especially with IOCB_CMD_POLL)
and atomically executes the following sequence:

	sigset_t origmask;

	pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
	ret = io_getevents(ctx, min_nr, nr, events, timeout);
	pthread_sigmask(SIG_SETMASK, &origmask, NULL);

Note that unlike many other signal related calls we do not pass a sigmask
size, as that would get us to 7 arguments, which aren't easily supported
by the syscall infrastructure.  It seems a lot less painful to just add a
new syscall variant in the unlikely case we're going to increase the
sigset size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-02 19:57:24 +02:00
Thomas Gleixner
604a98f1df Merge branch 'timers/urgent' into timers/core
Pick up urgent fixes to apply dependent cleanup patch
2018-05-02 16:11:12 +02:00
Soheil Hassas Yeganeh
b75eba76d3 tcp: send in-queue bytes in cmsg upon read
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read from the socket.

The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.

Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.

Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.

With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.

V3 change-log:
	As suggested by David Miller, added loads with barrier
	to check whether we have multiple threads calling recvmsg
	in parallel. When that happens we lock the socket to
	calculate inq.
V4 change-log:
	Removed inline from a static function.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 18:56:29 -04:00
Stefan Strogin
b086ff8725 connector: add parent pid and tgid to coredump and exit events
The intention is to get notified of process failures as soon
as possible, before a possible core dumping (which could be very long)
(e.g. in some process-manager). Coredump and exit process events
are perfect for such use cases (see 2b5faa4c55 "connector: Added
coredumping event to the process connector").

The problem is that for now the process-manager cannot know the parent
of a dying process using connectors. This could be useful if the
process-manager should monitor for failures only children of certain
parents, so we could filter the coredump and exit events by parent
process and/or thread ID.

Add parent pid and tgid to coredump and exit process connectors event
data.

Signed-off-by: Stefan Strogin <sstrogin@cisco.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 14:25:37 -04:00
Greg Kroah-Hartman
890fa45d01 Merge 4.17-rc3 into usb-next
This resolves the merge issue with drivers/usb/core/hcd.c

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-30 04:58:51 -07:00
Quentin Monnet
79552fbc0f bpf: fix formatting for bpf_get_stack() helper doc
Fix formatting (indent) for bpf_get_stack() helper documentation, so
that the doc is rendered correctly with the Python script.

Fixes: c195651e56 ("bpf: add bpf_get_stack helper")
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-30 13:53:12 +02:00
Quentin Monnet
3bd5a09b52 bpf: fix formatting for bpf_perf_event_read() helper doc
Some edits brought to the last iteration of BPF helper functions
documentation introduced an error with RST formatting. As a result, most
of one paragraph is rendered in bold text when only the name of a helper
should be. Fix it, and fix formatting of another function name in the
same paragraph.

Fixes: c6b5fb8690 ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-30 13:53:11 +02:00