Commit graph

51638 commits

Author SHA1 Message Date
Nicolas Pitre
efee95f42b ptp_clock: future-proofing drivers against PTP subsystem becoming optional
Drivers must be ready to accept NULL from ptp_clock_register() if the
PTP clock subsystem is configured out.

This patch documents that and ensures that all drivers cope well
with a NULL return.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:18:33 -04:00
Shmulik Ladkani
bfca4c520f net: skbuff: Export __skb_vlan_pop
This exports the functionality of extracting the tag from the payload,
without moving next vlan tag into hw accel tag.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 01:34:20 -04:00
Jakub Kicinski
13a27dfc66 bpf: enable non-core use of the verfier
Advanced JIT compilers and translators may want to use
eBPF verifier as a base for parsers or to perform custom
checks and validations.

Add ability for external users to invoke the verifier
and provide callbacks to be invoked for every intruction
checked.  For now only add most basic callback for
per-instruction pre-interpretation checks is added.  More
advanced users may also like to have per-instruction post
callback and state comparison callback.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 19:50:02 -04:00
Jakub Kicinski
58e2af8b3a bpf: expose internal verfier structures
Move verifier's internal structures to a header file and
prefix their names with bpf_ to avoid potential namespace
conflicts.  Those structures will soon be used by external
analyzers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 19:50:02 -04:00
Jakub Kicinski
332ae8e2f6 net: cls_bpf: add hardware offload
This patch adds hardware offload capability to cls_bpf classifier,
similar to what have been done with U32 and flower.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 19:50:02 -04:00
Yuchung Cheng
eb8329e0a0 tcp: export data delivery rate
This commit export two new fields in struct tcp_info:

  tcpi_delivery_rate: The most recent goodput, as measured by
    tcp_rate_gen(). If the socket is limited by the sending
    application (e.g., no data to send), it reports the highest
    measurement instead of the most recent. The unit is bytes per
    second (like other rate fields in tcp_info).

  tcpi_delivery_rate_app_limited: A boolean indicating if the goodput
    was measured when the socket's throughput was limited by the
    sending application.

This delivery rate information can be useful for applications that
want to know the current throughput the TCP connection is seeing,
e.g. adaptive bitrate video streaming. It can also be very useful for
debugging or troubleshooting.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:23:00 -04:00
Soheil Hassas Yeganeh
d7722e8570 tcp: track application-limited rate samples
This commit adds code to track whether the delivery rate represented
by each rate_sample was limited by the application.

Upon each transmit, we store in the is_app_limited field in the skb a
boolean bit indicating whether there is a known "bubble in the pipe":
a point in the rate sample interval where the sender was
application-limited, and did not transmit even though the cwnd and
pacing rate allowed it.

This logic marks the flow app-limited on a write if *all* of the
following are true:

  1) There is less than 1 MSS of unsent data in the write queue
     available to transmit.

  2) There is no packet in the sender's queues (e.g. in fq or the NIC
     tx queue).

  3) The connection is not limited by cwnd.

  4) There are no lost packets to retransmit.

The tcp_rate_check_app_limited() code in tcp_rate.c determines whether
the connection is application-limited at the moment. If the flow is
application-limited, it sets the tp->app_limited field. If the flow is
application-limited then that means there is effectively a "bubble" of
silence in the pipe now, and this silence will be reflected in a lower
bandwidth sample for any rate samples from now until we get an ACK
indicating this bubble has exited the pipe: specifically, until we get
an ACK for the next packet we transmit.

When we send every skb we record in scb->tx.is_app_limited whether the
resulting rate sample will be application-limited.

The code in tcp_rate_gen() checks to see when it is safe to mark all
known application-limited bubbles of silence as having exited the
pipe. It does this by checking to see when the delivered count moves
past the tp->app_limited marker. At this point it zeroes the
tp->app_limited marker, as all known bubbles are out of the pipe.

We make room for the tx.is_app_limited bit in the skb by borrowing a
bit from the in_flight field used by NV to record the number of bytes
in flight. The receive window in the TCP header is 16 bits, and the
max receive window scaling shift factor is 14 (RFC 1323). So the max
receive window offered by the TCP protocol is 2^(16+14) = 2^30. So we
only need 30 bits for the tx.in_flight used by NV.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:23:00 -04:00
Yuchung Cheng
b9f64820fb tcp: track data delivery rate for a TCP connection
This patch generates data delivery rate (throughput) samples on a
per-ACK basis. These rate samples can be used by congestion control
modules, and specifically will be used by TCP BBR in later patches in
this series.

Key state:

tp->delivered: Tracks the total number of data packets (original or not)
	       delivered so far. This is an already-existing field.

tp->delivered_mstamp: the last time tp->delivered was updated.

Algorithm:

A rate sample is calculated as (d1 - d0)/(t1 - t0) on a per-ACK basis:

  d1: the current tp->delivered after processing the ACK
  t1: the current time after processing the ACK

  d0: the prior tp->delivered when the acked skb was transmitted
  t0: the prior tp->delivered_mstamp when the acked skb was transmitted

When an skb is transmitted, we snapshot d0 and t0 in its control
block in tcp_rate_skb_sent().

When an ACK arrives, it may SACK and ACK some skbs. For each SACKed
or ACKed skb, tcp_rate_skb_delivered() updates the rate_sample struct
to reflect the latest (d0, t0).

Finally, tcp_rate_gen() generates a rate sample by storing
(d1 - d0) in rs->delivered and (t1 - t0) in rs->interval_us.

One caveat: if an skb was sent with no packets in flight, then
tp->delivered_mstamp may be either invalid (if the connection is
starting) or outdated (if the connection was idle). In that case,
we'll re-stamp tp->delivered_mstamp.

At first glance it seems t0 should always be the time when an skb was
transmitted, but actually this could over-estimate the rate due to
phase mismatch between transmit and ACK events. To track the delivery
rate, we ensure that if packets are in flight then t0 and and t1 are
times at which packets were marked delivered.

If the initial and final RTTs are different then one may be corrupted
by some sort of noise. The noise we see most often is sending gaps
caused by delayed, compressed, or stretched acks. This either affects
both RTTs equally or artificially reduces the final RTT. We approach
this by recording the info we need to compute the initial RTT
(duration of the "send phase" of the window) when we recorded the
associated inflight. Then, for a filter to avoid bandwidth
overestimates, we generalize the per-sample bandwidth computation
from:

    bw = delivered / ack_phase_rtt

to the following:

    bw = delivered / max(send_phase_rtt, ack_phase_rtt)

In large-scale experiments, this filtering approach incorporating
send_phase_rtt is effective at avoiding bandwidth overestimates due to
ACK compression or stretched ACKs.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:23:00 -04:00
Neal Cardwell
0682e6902a tcp: count packets marked lost for a TCP connection
Count the number of packets that a TCP connection marks lost.

Congestion control modules can use this loss rate information for more
intelligent decisions about how fast to send.

Specifically, this is used in TCP BBR policer detection. BBR uses a
high packet loss rate as one signal in its policer detection and
policer bandwidth estimation algorithm.

The BBR policer detection algorithm cannot simply track retransmits,
because a retransmit can be (and often is) an indicator of packets
lost long, long ago. This is particularly true in a long CA_Loss
period that repairs the initial massive losses when a policer kicks
in.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:23:00 -04:00
Neal Cardwell
6403389211 tcp: use windowed min filter library for TCP min_rtt estimation
Refactor the TCP min_rtt code to reuse the new win_minmax library in
lib/win_minmax.c to simplify the TCP code.

This is a pure refactor: the functionality is exactly the same. We
just moved the windowed min code to make TCP easier to read and
maintain, and to allow other parts of the kernel to use the windowed
min/max filter code.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:22:59 -04:00
Neal Cardwell
a4f1f9ac81 lib/win_minmax: windowed min or max estimator
This commit introduces a generic library to estimate either the min or
max value of a time-varying variable over a recent time window. This
is code originally from Kathleen Nichols. The current form of the code
is from Van Jacobson.

A single struct minmax_sample will track the estimated windowed-max
value of the series if you call minmax_running_max() or the estimated
windowed-min value of the series if you call minmax_running_min().

Nearly equivalent code is already in place for minimum RTT estimation
in the TCP stack. This commit extracts that code and generalizes it to
handle both min and max. Moving the code here reduces the footprint
and complexity of the TCP code base and makes the filter generally
available for other parts of the codebase, including an upcoming TCP
congestion control module.

This library works well for time series where the measurements are
smoothly increasing or decreasing.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:22:59 -04:00
Daniel Borkmann
36bbef52c7 bpf: direct packet write and access for helpers for clsact progs
This work implements direct packet access for helpers and direct packet
write in a similar fashion as already available for XDP types via commits
4acf6c0b84 ("bpf: enable direct packet data write for xdp progs") and
6841de8b0d ("bpf: allow helpers access the packet directly"), and as a
complementary feature to the already available direct packet read for tc
(cls/act) programs.

For enabling this, we need to introduce two helpers, bpf_skb_pull_data()
and bpf_csum_update(). The first is generally needed for both, read and
write, because they would otherwise only be limited to the current linear
skb head. Usually, when the data_end test fails, programs just bail out,
or, in the direct read case, use bpf_skb_load_bytes() as an alternative
to overcome this limitation. If such data sits in non-linear parts, we
can just pull them in once with the new helper, retest and eventually
access them.

At the same time, this also makes sure the skb is uncloned, which is, of
course, a necessary condition for direct write. As this needs to be an
invariant for the write part only, the verifier detects writes and adds
a prologue that is calling bpf_skb_pull_data() to effectively unclone the
skb from the very beginning in case it is indeed cloned. The heuristic
makes use of a similar trick that was done in 233577a220 ("net: filter:
constify detection of pkt_type_offset"). This comes at zero cost for other
programs that do not use the direct write feature. Should a program use
this feature only sparsely and has read access for the most parts with,
for example, drop return codes, then such write action can be delegated
to a tail called program for mitigating this cost of potential uncloning
to a late point in time where it would have been paid similarly with the
bpf_skb_store_bytes() as well. Advantage of direct write is that the
writes are inlined whereas the helper cannot make any length assumptions
and thus needs to generate a call to memcpy() also for small sizes, as well
as cost of helper call itself with sanity checks are avoided. Plus, when
direct read is already used, we don't need to cache or perform rechecks
on the data boundaries (due to verifier invalidating previous checks for
helpers that change skb->data), so more complex programs using rewrites
can benefit from switching to direct read plus write.

For direct packet access to helpers, we save the otherwise needed copy into
a temp struct sitting on stack memory when use-case allows. Both facilities
are enabled via may_access_direct_pkt_data() in verifier. For now, we limit
this to map helpers and csum_diff, and can successively enable other helpers
where we find it makes sense. Helpers that definitely cannot be allowed for
this are those part of bpf_helper_changes_skb_data() since they can change
underlying data, and those that write into memory as this could happen for
packet typed args when still cloned. bpf_csum_update() helper accommodates
for the fact that we need to fixup checksum_complete when using direct write
instead of bpf_skb_store_bytes(), meaning the programs can use available
helpers like bpf_csum_diff(), and implement csum_add(), csum_sub(),
csum_block_add(), csum_block_sub() equivalents in eBPF together with the
new helper. A usage example will be provided for iproute2's examples/bpf/
directory.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 23:32:11 -04:00
Herbert Xu
ca26893f05 rhashtable: Add rhlist interface
The insecure_elasticity setting is an ugly wart brought out by
users who need to insert duplicate objects (that is, distinct
objects with identical keys) into the same table.

In fact, those users have a much bigger problem.  Once those
duplicate objects are inserted, they don't have an interface to
find them (unless you count the walker interface which walks
over the entire table).

Some users have resorted to doing a manual walk over the hash
table which is of course broken because they don't handle the
potential existence of multiple hash tables.  The result is that
they will break sporadically when they encounter a hash table
resize/rehash.

This patch provides a way out for those users, at the expense
of an extra pointer per object.  Essentially each object is now
a list of objects carrying the same key.  The hash table will
only see the lists so nothing changes as far as rhashtable is
concerned.

To use this new interface, you need to insert a struct rhlist_head
into your objects instead of struct rhash_head.  While the hash
table is unchanged, for type-safety you'll need to use struct
rhltable instead of struct rhashtable.  All the existing interfaces
have been duplicated for rhlist, including the hash table walker.

One missing feature is nulls marking because AFAIK the only potential
user of it does not need duplicate objects.  Should anyone need
this it shouldn't be too hard to add.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 04:43:36 -04:00
Mahesh Bandewar
e8bffe0cf9 net: Add _nf_(un)register_hooks symbols
Add _nf_register_hooks() and _nf_unregister_hooks() calls which allow
caller to hold RTNL mutex.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 01:25:22 -04:00
Nogah Frankel
2c9d85d4d8 netdevice: Add offload statistics ndo
Add a new ndo to return statistics for offloaded operation.
Since there can be many different offloaded operation with many
stats types, the ndo gets an attribute id by which it knows which
stats are wanted. The ndo also gets a void pointer to be cast according
to the attribute id.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18 22:33:41 -04:00
David S. Miller
e812bd905a wireless-drivers-next patches for 4.9
Major changes:
 
 iwlwifi
 
 * preparation for new a000 HW continues
 * some DQA improvements
 * add support for GMAC
 * add support for 9460, 9270 and 9170 series
 
 mwifiex
 
 * support random MAC address for scanning
 * add HT aggregation support for adhoc mode
 * add custom regulatory domain support
 * add manufacturing mode support via nl80211 testmode interface
 
 bcma
 
 * support BCM53573 series of wireless SoCs
 
 bitfield.h
 
 * add FIELD_PREP() and FIELD_GET() macros
 
 mt7601u
 
 * convert to use the new bitfield.h macros
 
 brcmfmac
 
 * add support for bcm4339 chip with modalias sdio:c00v02D0d4339
 
 ath10k
 
 * add nl80211 testmode support for 10.4 firmware
 * hide kernel addresses from logs using %pK format specifier
 * implement NAPI support
 * enable peer stats by default
 
 ath9k
 
 * use ieee80211_tx_status_noskb where possible
 
 wil6210
 
 * extract firmware capabilities from the firmware file
 
 ath6kl
 
 * enable firmware crash dumps on the AR6004
 
 ath-current is also merged to fix a conflict in ath10k.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJX2rF7AAoJEG4XJFUm622bD3EH/icZDT7vVxnb0VPP8jAScA4h
 bMNrI3iFxnPohO8Rzp+edWSdxEZoxwrBVk/6BHXO9PHHZwPX7/b8/OOXmLWB2X1c
 ffj1jt83RENcsZFvd5OJfDYxIq89uOkWybdD6nIUd3umKC9KeFOI5nCju31fEZrQ
 ZptqvKGIV36bbx07K8Y/PQRL2SA6T+09WqvuljLHZD5hfPGZ+GWXV2p+HAm3Moos
 iy6HUx5+pYfC+zlcmvJvL47Wxj+HppS/48ujyQ68DD2UkjOtF620YJjVy3o+njip
 GNJtCgWFDp2ar3uvRP2BfBd9FtseDTKsKusxJQvNGoSR0ON+uGIzURCznQ+2PCM=
 =FyXw
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2016-09-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.9

Major changes:

iwlwifi

* preparation for new a000 HW continues
* some DQA improvements
* add support for GMAC
* add support for 9460, 9270 and 9170 series

mwifiex

* support random MAC address for scanning
* add HT aggregation support for adhoc mode
* add custom regulatory domain support
* add manufacturing mode support via nl80211 testmode interface

bcma

* support BCM53573 series of wireless SoCs

bitfield.h

* add FIELD_PREP() and FIELD_GET() macros

mt7601u

* convert to use the new bitfield.h macros

brcmfmac

* add support for bcm4339 chip with modalias sdio:c00v02D0d4339

ath10k

* add nl80211 testmode support for 10.4 firmware
* hide kernel addresses from logs using %pK format specifier
* implement NAPI support
* enable peer stats by default

ath9k

* use ieee80211_tx_status_noskb where possible

wil6210

* extract firmware capabilities from the firmware file

ath6kl

* enable firmware crash dumps on the AR6004

ath-current is also merged to fix a conflict in ath10k.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 09:53:29 -04:00
David S. Miller
b20b378d49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mediatek/mtk_eth_soc.c
	drivers/net/ethernet/qlogic/qed/qed_dcbx.c
	drivers/net/phy/Kconfig

All conflicts were cases of overlapping commits.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12 15:52:44 -07:00
Linus Torvalds
da499f8f53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly small sets of driver fixes scattered all over the place.

   1) Mediatek driver fixes from Sean Wang.  Forward port not written
      correctly during TX map, missed handling of EPROBE_DEFER, and
      mistaken use of put_page() instead of skb_free_frag().

   2) Fix socket double-free in KCM code, from WANG Cong.

   3) QED driver fixes from Sudarsana Reddy Kalluru, including a fix for
      using the dcbx buffers before initializing them.

   4) Mellanox Switch driver fixes from Jiri Pirko, including a fix for
      double fib removals and an error handling fix in
      mlxsw_sp_module_init().

   5) Fix kernel panic when enabling LLDP in i40e driver, from Dave
      Ertman.

   6) Fix padding of TSO packets in thunderx driver, from Sunil Goutham.

   7) TCP's rcv_wup not initialized properly when using fastopen, from
      Neal Cardwell.

   8) Don't use uninitialized flow keys in flow dissector, from Gao
      Feng.

   9) Use after free in l2tp module unload, from Sabrina Dubroca.

  10) Fix interrupt registry ordering issues in smsc911x driver, from
      Jeremy Linton.

  11) Fix crashes in bonding having to do with enslaving and rx_handler,
      from Mahesh Bandewar.

  12) AF_UNIX deadlock fixes from Linus.

  13) In mlx5 driver, don't read skb->xmit_mode after it might have been
      freed from the TX reclaim path.  From Tariq Toukan.

  14) Fix a bug from 2015 in TCP Yeah where the congestion window does
      not increase, from Artem Germanov.

  15) Don't pad frames on receive in NFP driver, from Jakub Kicinski.

  16) Fix chunk fragmenting in SCTP wrt. GSO, from Marcelo Ricardo
      Leitner.

  17) Fix deletion of VRF routes, from Mark Tomlinson.

  18) Fix device refcount leak when DAD fails in ipv6, from Wei Yongjun"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
  net/mlx4_en: Fix panic on xmit while port is down
  net/mlx4_en: Fixes for DCBX
  net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()
  net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()
  net: ethernet: renesas: sh_eth: add POST registers for rz
  drivers: net: phy: mdio-xgene: Add hardware dependency
  dwc_eth_qos: do not register semi-initialized device
  sctp: identify chunks that need to be fragmented at IP level
  mlxsw: spectrum: Set port type before setting its address
  mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init
  nfp: don't pad frames on receive
  nfp: drop support for old firmware ABIs
  nfp: remove linux/version.h includes
  tcp: cwnd does not increase in TCP YeAH
  net/mlx5e: Fix parsing of vlan packets when updating lro header
  net/mlx5e: Fix global PFC counters replication
  net/mlx5e: Prevent casting overflow
  net/mlx5e: Move an_disable_cap bit to a new position
  net/mlx5e: Fix xmit_more counter race issue
  tcp: fastopen: avoid negative sk_forward_alloc
  ...
2016-09-12 07:56:06 -07:00
Stephen Hemminger
3a8963acc7 Revert "hv_netvsc: make inline functions static"
These functions are used by other code misc-next tree.

This reverts commit 30d1de08c8.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:23:00 -07:00
Mohamad Haj Yahia
737a234bb6 net/mlx5: Introduce attach/detach to interface API
Add attach/detach callbacks to interface API.
This is crucial for implementing seamless reset flow which releases the
hardware and it's resources upon detach while keeping software
structures and state (e.g netdev) then reset and reallocate the hardware
needed resources upon attach.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Mohamad Haj Yahia
6b6adee3da net/mlx5: SRIOV core code refactoring
Simplify the code and makes it look modular and symmetric.
Split sriov enable/disable to two levels: device level and pci level.
When user enable/disable sriov (via sriov_configure driver callback) we
will enable/disable both device and pci sriov.
When driver load/unload we will enable/disable (on demand) only device
sriov while keeping the PCI sriov enabled for next driver load.
On internal/pci error, VFs will be kept enabled on PCI and the reset
is done only in device level.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Linus Torvalds
6905732c80 Fix some brown-paper-bag bugs for fscrypto, including one one which
allows a malicious user to set an encryption policy on an empty
 directory which they do not own.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJX05q4AAoJEPL5WVaVDYGjOywH/AyXoo4d1/5H/XTakNYPxYIW
 vtBOXciHai4ZE9RygL3gdZuiyY9bTx2sc80So3KboNUdiuOJBPnuAkOQMr973UCI
 yGW3eP/RYGA1XQUbtOyFvzJMIZLKXV2ytakFeRz+m1CQF2F5F7/prKQB2j4sWHff
 JigAC67LlZSiz7L8SqtPG4uG1C8K/YEorf14dG6k37fMwE/AaBYXxkyc7MmHIKeW
 Tils0ZZcTK0U0udNSel/jRSS/qENEuLvKhFsMAlIDrCETVMidCvv2OAcT0z0z5Ln
 v+Oq0Xfutd12nfb95LUfROMtTzrtILYC2qNfDChOoFtlU8UyKmY+WT1GfYUiy8g=
 =ahmA
 -----END PGP SIGNATURE-----

Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull fscrypto fixes fromTed Ts'o:
 "Fix some brown-paper-bag bugs for fscrypto, including one one which
  allows a malicious user to set an encryption policy on an empty
  directory which they do not own"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fscrypto: require write access to mount to set encryption policy
  fscrypto: only allow setting encryption policy on directories
  fscrypto: add authorization check for setting encryption policy
2016-09-10 09:18:33 -07:00
Eric Biggers
ba63f23d69 fscrypto: require write access to mount to set encryption policy
Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem.  This was handled correctly by f2fs but not by ext4.  Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-10 01:18:57 -04:00
Daniel Borkmann
f3694e0012 bpf: add BPF_CALL_x macros for declaring helpers
This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions
to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros
that are used today. Motivation for this is to hide all the register handling
and all necessary casts from the user, so that it is done automatically in the
background when adding a BPF_CALL_<n>() call.

This makes current helpers easier to review, eases to write future helpers,
avoids getting the casting mess wrong, and allows for extending all helpers at
once (f.e. build time checks, etc). It also helps detecting more easily in
code reviews that unused registers are not instrumented in the code by accident,
breaking compatibility with existing programs.

BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some
fundamental differences, for example, for generating the actual helper function
that carries all u64 regs, we need to fill unused regs, so that we always end up
with 5 u64 regs as an argument.

I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and
they look all as expected. No sparse issue spotted. We let this also sit for a
few days with Fengguang's kbuild test robot, and there were no issues seen. On
s390, it barked on the "uses dynamic stack allocation" notice, which is an old
one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
to the call wrapper, just telling that the perf raw record/frag sits on stack
(gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
and they were fine as well. All eBPF helpers are now converted to use these
macros, getting rid of a good chunk of all the raw castings.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Daniel Borkmann
f035a51536 bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros
Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit
which otherwise often result in overly long bytes_to_bpf_size(sizeof())
and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro
helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF())
check in convert_bpf_extensions(), but we should rather make that generic
as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF()
users to detect any rewriter size issues at compile time. Note, there are
currently none, but we want to assert that it stays this way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Arend Van Spriel
634faf3686 brcmfmac: add support for bcm4339 chip with modalias sdio:c00v02D0d4339
The driver already supports the bcm4339 chipset but only for the variant
that shares the same modalias as the bcm4335, ie. sdio:c00v02D0d4335.
It turns out that there are also bcm4339 devices out there that have a
more distiguishable modalias sdio:c00v02D0d4339.

Reported-by: Steve deRosier <derosier@gmail.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:12:14 +03:00
Jakub Kicinski
3e9b3112ec add basic register-field manipulation macros
Common approach to accessing register fields is to define
structures or sets of macros containing mask and shift pair.
Operations on the register are then performed as follows:

 field = (reg >> shift) & mask;

 reg &= ~(mask << shift);
 reg |= (field & mask) << shift;

Defining shift and mask separately is tedious.  Ivo van Doorn
came up with an idea of computing them at compilation time
based on a single shifted mask (later refined by Felix) which
can be used like this:

 #define REG_FIELD 0x000ff000

 field = FIELD_GET(REG_FIELD, reg);

 reg &= ~REG_FIELD;
 reg |= FIELD_PREP(REG_FIELD, field);

FIELD_{GET,PREP} macros take care of finding out what the
appropriate shift is based on compilation time ffs operation.

GENMASK can be used to define registers (which is usually
less error-prone and easier to match with datasheets).

This approach is the most convenient I've seen so to limit code
multiplication let's move the macros to a global header file.
Attempts to use static inlines instead of macros failed due
to false positive triggering of BUILD_BUG_ON()s, especially with
GCC < 6.0.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:09:24 +03:00
Yaogong Wang
9f5afeae51 tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP has increased by several orders of magnitude,
and some people are considering to reach the 2 Gbytes limit.

Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000
MSS.

In presence of packet losses (or reorders), TCP stores incoming packets
into an out of order queue, and number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to receive queue, we scan the queue
from its head.

However, in presence of heavy losses, we might have to find an arbitrary
point in this queue, involving a linear scan for every incoming packet,
throwing away cpu caches.

This patch converts it to a RB tree, to get bounded latencies.

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase, added ofo_last_skb cache, polishing and tests.

Tested with network dropping between 1 and 10 % packets, with good
success (about 30 % increase of throughput in stress tests)

Next step would be to also use an RB tree for the write queue at sender
side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:25:58 -07:00
Eric Garver
fe19c4f971 vlan: Check for vlan ethernet types for 8021.q or 802.1ad
This is to simplify using double tagged vlans. This function allows all
valid vlan ethertypes to be checked in a single function call.
Also replace some instances that check for both ETH_P_8021Q and
ETH_P_8021AD.

Patch based on one originally by Thomas F Herbert.

Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:10:28 -07:00
Bodong Wang
e7e31ca43d net/mlx5e: Move an_disable_cap bit to a new position
Previous an_disable_cap position bit31 is deprecated to be use in driver
with newer firmware.  New firmware will advertise the same capability
in bit29.

Old capability didn't allow setting more than one protocol for a
specific speed when autoneg is off, while newer firmware will allow
this and it is indicated in the new capability location.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:28 -07:00
Lorenzo Colitti
d545caca82 net: inet: diag: expose the socket mark to privileged processes.
This adds the capability for a process that has CAP_NET_ADMIN on
a socket to see the socket mark in socket dumps.

Commit a52e95abf7 ("net: diag: allow socket bytecode filters to
match socket marks") recently gave privileged processes the
ability to filter socket dumps based on mark. This patch is
complementary: it ensures that the mark is also passed to
userspace in the socket's netlink attributes.  It is useful for
tools like ss which display information about sockets.

Tested: https://android-review.googlesource.com/270210
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:13:09 -07:00
Tomer Tayar
e0971c832a qed*: Add support for the ethtool get_regs operation
Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-07 17:46:59 -07:00
Tomer Tayar
c965db4446 qed: Add support for debug data collection
This patch adds the support for dumping and formatting the HW/FW debug data.

Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-07 17:46:59 -07:00
Kees Cook
a85d6b8242 usercopy: force check_object_size() inline
Just for good measure, make sure that check_object_size() is always
inlined too, as already done for copy_*_user() and __copy_*_user().

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-07 11:33:26 -07:00
David S. Miller
60175ccdf4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree.  Most relevant updates are the removal of per-conntrack timers to
use a workqueue/garbage collection approach instead from Florian
Westphal, the hash and numgen expression for nf_tables from Laura
Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
removal of ip_conntrack sysctl and many other incremental updates on our
Netfilter codebase.

More specifically, they are:

1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
   transport area in dccp, sctp, tcp, udp and udplite protocol
   conntrackers, from Gao Feng.

2) Missing whitespace on error message in physdev match, from Hangbin Liu.

3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.

4) Add nf_ct_expires() helper function and use it, from Florian Westphal.

5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
   from Florian.

6) Rename nf_tables set implementation to nft_set_{name}.c

7) Introduce the hash expression to allow arbitrary hashing of selector
   concatenations, from Laura Garcia Liebana.

8) Remove ip_conntrack sysctl backward compatibility code, this code has
   been around for long time already, and we have two interfaces to do
   this already: nf_conntrack sysctl and ctnetlink.

9) Use nf_conntrack_get_ht() helper function whenever possible, instead
   of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.

10) Add quota expression for nf_tables.

11) Add number generator expression for nf_tables, this supports
    incremental and random generators that can be combined with maps,
    very useful for load balancing purpose, again from Laura Garcia Liebana.

12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.

13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
    configuration, this is used by a follow up patch to perform better chain
    update validation.

14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
    nft_set_hash implementation to honor the NLM_F_EXCL flag.

15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
    patch from Florian Westphal.

16) Don't use the DYING bit to know if the conntrack event has been already
    delivered, instead a state variable to track event re-delivery
    states, also from Florian.

17) Remove the per-conntrack timer, use the workqueue approach that was
    discussed during the NFWS, from Florian Westphal.

18) Use the netlink conntrack table dump path to kill stale entries,
    again from Florian.

19) Add a garbage collector to get rid of stale conntracks, from
    Florian.

20) Reschedule garbage collector if eviction rate is high.

21) Get rid of the __nf_ct_kill_acct() helper.

22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.

23) Make nf_log_set() interface assertive on unsupported families.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06 12:45:26 -07:00
Kees Cook
81409e9e28 usercopy: fold builtin_const check into inline function
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-06 12:17:29 -07:00
Mahesh Bandewar
24b27fc4cd bonding: Fix bonding crash
Following few steps will crash kernel -

  (a) Create bonding master
      > modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
	   type macvlan
  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      <crash>

Bonding does lots of things before checking if the device enslaved is
busy or not.

In this case when the notifier call-chain sends notifications, the
bond_netdev_event() assumes that the rx_handler /rx_handler_data is
registered while the bond_enslave() hasn't progressed far enough to
register rx_handler for the new slave.

This patch adds a rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04 11:41:12 -07:00
Linus Torvalds
018c81b827 Staging/IIO driver fixes for 4.8-rc5
Here are a number of small fixes for staging and IIO drivers that
 resolve reported problems.
 
 Full details are in the shortlog.  All of these have been in linux-next
 with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iFYEABECABYFAlfK4HoPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspt0MAn0wC
 dYhZOUHxOptLiEkVGXFCU9kzAJ4gETEbuGn9lgp2TFATOOAN7oqPUw==
 =6MKk
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO driver fixes from Greg KH:
 "Here are a number of small fixes for staging and IIO drivers that
  resolve reported problems.

  Full details are in the shortlog.  All of these have been in
  linux-next with no reported issues"

* tag 'staging-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (35 commits)
  arm: dts: rockchip: add reset node for the exist saradc SoCs
  arm64: dts: rockchip: add reset saradc node for rk3368 SoCs
  iio: adc: rockchip_saradc: reset saradc controller before programming it
  iio: accel: kxsd9: Fix raw read return
  iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample
  iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access
  include/linux: fix excess fence.h kernel-doc notation
  staging: wilc1000: correctly check if associatedsta has not been found
  staging: wilc1000: NULL dereference on error
  staging: wilc1000: txq_event: Fix coding error
  MAINTAINERS: Add file patterns for ion device tree bindings
  MAINTAINERS: Update maintainer entry for wilc1000
  iio: chemical: atlas-ph-sensor: fix typo in val assignment
  iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"
  staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility
  staging: comedi: dt2811: fix a precedence bug
  staging: comedi: adv_pci1760: Do not return EINVAL for CMDF_ROUND_DOWN.
  staging: comedi: ni_mio_common: fix wrong insn_write handler
  staging: comedi: comedi_test: fix timer race conditions
  staging: comedi: daqboard2000: bug fix board type matching code
  ...
2016-09-03 11:33:33 -07:00
Linus Torvalds
39da979c98 Serial driver fixes for 4.8-rc5
Here are some small serial driver fixes for 4.8-rc5.  One fixes an
 oft-reported build issue with the fintek driver, another reverts a patch
 that was causing problems, one fixes a crash, and some new device ids
 were added.
 
 All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iFYEABECABYFAlfK4ScPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspmEcAn0FE
 lUzWBQb4to15YXhl8wtNF9ZbAJ9Gi2r5MEfXlnLStI3XM/gq8BdXGw==
 =pkwv
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial driver fixes from Greg KH:
 "Here are some small serial driver fixes for 4.8-rc5.  One fixes an
  oft-reported build issue with the fintek driver, another reverts a
  patch that was causing problems, one fixes a crash, and some new
  device ids were added.

  All of these have been in linux-next for a while"

* tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250: added acces i/o products quad and octal serial cards
  serial: 8250_mid: fix divide error bug if baud rate is 0
  Revert "tty/serial/8250: use mctrl_gpio helpers"
  8250/fintek: rename IRQ_MODE macro
2016-09-03 11:29:31 -07:00
Linus Torvalds
70dad4998e USB/PHY fixes for 4.8-rc5
Here are some USB and PHY driver fixes for 4.8-rc5
 
 Nothing major, lots of little fixes for reported bugs, and a build fix
 for a missing .h file that the phy drivers needed.  All of these have
 been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iFYEABECABYFAlfK4h8PHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspLTMAoMNL
 Q3TlSgupOTzsV1MMRdRMuU6fAJ4/jMOV5h8XTx9ETmhx8N9RB63QTw==
 =RTNl
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/PHY fixes from Greg KH:
 "Here are some USB and PHY driver fixes for 4.8-rc5

  Nothing major, lots of little fixes for reported bugs, and a build fix
  for a missing .h file that the phy drivers needed.  All of these have
  been in linux-next for a while with no reported issues"

* tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
  usb: musb: Fix locking errors for host only mode
  usb: dwc3: gadget: always decrement by 1
  usb: dwc3: debug: fix ep name on trace output
  usb: gadget: udc: core: don't starve DMA resources
  USB: serial: option: add WeTelecom 0x6802 and 0x6803 products
  USB: avoid left shift by -1
  USB: fix typo in wMaxPacketSize validation
  usb: gadget: Add the gserial port checking in gs_start_tx()
  usb: dwc3: gadget: don't rely on jiffies while holding spinlock
  usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
  usb: gadget: function: f_rndis: socket buffer may be NULL
  usb: gadget: function: f_eem: socket buffer may be NULL
  usb: renesas_usbhs: gadget: fix return value check in usbhs_mod_gadget_probe()
  usb: dwc2: Add reset control to dwc2
  usb: dwc3: core: allow device to runtime_suspend several times
  usb: dwc3: pci: runtime_resume child device
  USB: serial: option: add WeTelecom WM-D200
  usb: chipidea: udc: don't touch DP when controller is in host mode
  USB: serial: mos7840: fix non-atomic allocation in write path
  USB: serial: mos7720: fix non-atomic allocation in write path
  ...
2016-09-03 11:24:23 -07:00
Rafał Miłecki
3f37ec79dd bcma: support BCM53573 series of wireless SoCs
BCM53573 seems to be the first series of Northstar family with wireless
on the chip. The base models are BCM53573-s (A0, A1) and there is also
BCM47189B0 which seems to be some small modification.

The only problem with these chipsets seems to be watchdog. It's totally
unavailable on 53573A0 / 53573A1 and preferable PMU watchdog is broken
on 53573B0 / 53573B1.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-03 12:58:42 +03:00
Linus Torvalds
0141af184a Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes for the nvme over fabrics code"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme-rdma: Get rid of redundant defines
  nvme-rdma: Get rid of duplicate variable
  nvme: fabrics drivers don't need the nvme-pci driver
  nvme-fabrics: get a reference when reusing a nvme_host structure
  nvme-fabrics: change NQN UUID to big-endian format
  nvme-loop: set sqsize to 0-based value, per spec
  nvme-rdma: fix sqsize/hsqsize per spec
  fabrics: define admin sqsize min default, per spec
  nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
  nvmet-rdma: Fix use after free
  nvme-rdma: initialize ret to zero to avoid returning garbage
2016-09-02 21:05:38 -07:00
Linus Torvalds
601b586994 ACPI fixes for v4.8-rc5
Two stable-candidate fixes for the ACPI early device probing code
 added during the 4.4 cycle, one fixing a typo in a stub macro used
 when CONFIG_ACPI is unset and one that prevents sleeping functions
 from being called under a spinlock (Lorenzo Pieralisi).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXyex0AAoJEILEb/54YlRxdlsP/jMcKQ/6dPNxaZjr+073m9/X
 J7SGlzN8JeO5JdKHoTrfjI50tb1JUf+giZD72/v5f6/LjKJTCQAPqTRBKdhvBc0B
 glqUvHVA0Pvag4tCKLlb8amF1hcYg4Gb14wzyk6IOq0bfeNdOm4vOIMZUOFcOZzO
 eTXzne5e7OUpcmRaO07wv/QCKxi0dpW4CAQobhO18CJ9dw66D/wUNfCO7Bq10Hm8
 ZcgEjxW4tcltJuffw9w8Qxi9bbGTscG7BxRcOM0E9MC32xFuk+Os5y+dq47qtCTq
 NQiar/bZC/NY9OcmSF7H0UwMVRar44LlQtxjCefcJzYS9xMbPgwgcecfe8gOBI/q
 Fmj+tPC1y2QJQFQWbJT1uW8aRJzt4mKHe+GutXUhTiJWQ0DUj7IFVR11OfR5K0Fd
 uB2Cf0waOSAnS3lOmAS4hLJAVL5fhl4bmTtDbDNU8wSTt7opMM38ltL1qfX8sgiR
 Ig9tYW4IVEZuZPqIsdcQ1FOtnitZ9RXIcWuR1aYNadqimZqVO+78KjdnfxQ0e5Fs
 KBOde+Pb7DogZbF11dbErjXTPsBfP6WYC3nIsxZmg9VefipNqt4PSy4ZQUZctVv9
 8ENuv0GMHkK1qdgcet+v7mwgDYyGHeL45j6zjuLNgN35FsbyZRhzQ0EeDQAfMCeT
 fjHCLBtYGJ6J/Du4dly3
 =tsb6
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes ffrom Rafael Wysocki:
 "Two stable-candidate fixes for the ACPI early device probing code
  added during the 4.4 cycle, one fixing a typo in a stub macro used
  when CONFIG_ACPI is unset and one that prevents sleeping functions
  from being called under a spinlock (Lorenzo Pieralisi)"

* tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / drivers: replace acpi_probe_lock spinlock with mutex
  ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
2016-09-02 15:16:04 -07:00
Lorenzo Pieralisi
3feab13c91 ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
When the ACPI_DECLARE_PROBE_ENTRY macro was added in
commit e647b53227 ("ACPI: Add early device probing infrastructure"),
a stub macro adding an unused entry was added for the !CONFIG_ACPI
Kconfig option case to make sure kernel code making use of the
macro did not require to be guarded within CONFIG_ACPI in order to
be compiled.

The stub macro was never used since all kernel code that defines
ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within
CONFIG_ACPI; it contains a typo that should be nonetheless fixed.

Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY()
macro so that it can actually be used if needed.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Fixes: e647b53227 (ACPI: Add early device probing infrastructure)
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-02 22:21:34 +02:00
Alexei Starovoitov
aa6a5f3cb2 perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs
Allow attaching BPF_PROG_TYPE_PERF_EVENT programs to sw and hw perf events
via overflow_handler mechanism.
When program is attached the overflow_handlers become stacked.
The program acts as a filter.
Returning zero from the program means that the normal perf_event_output handler
will not be called and sampling event won't be stored in the ring buffer.

The overflow_handler_context==NULL is an additional safety check
to make sure programs are not attached to hw breakpoints and watchdog
in case other checks (that prevent that now anyway) get accidentally
relaxed in the future.

The program refcnt is incremented in case perf_events are inhereted
when target task is forked.
Similar to kprobe and tracepoint programs there is no ioctl to
detach the program or swap already attached program. The user space
expected to close(perf_event_fd) like it does right now for kprobe+bpf.
That restriction simplifies the code quite a bit.

The invocation of overflow_handler in __perf_event_overflow() is now
done via READ_ONCE, since that pointer can be replaced when the program
is attached while perf_event itself could have been active already.
There is no need to do similar treatment for event->prog, since it's
assigned only once before it's accessed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:44 -07:00
Alexei Starovoitov
0515e5999a bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type
Introduce BPF_PROG_TYPE_PERF_EVENT programs that can be attached to
HW and SW perf events (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE
correspondingly in uapi/linux/perf_event.h)

The program visible context meta structure is
struct bpf_perf_event_data {
    struct pt_regs regs;
     __u64 sample_period;
};
which is accessible directly from the program:
int bpf_prog(struct bpf_perf_event_data *ctx)
{
  ... ctx->sample_period ...
  ... ctx->regs.ip ...
}

The bpf verifier rewrites the accesses into kernel internal
struct bpf_perf_event_data_kern which allows changing
struct perf_sample_data without affecting bpf programs.
New fields can be added to the end of struct bpf_perf_event_data
in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:44 -07:00
Linus Torvalds
f28929ba36 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "Most of this is regression fixes for posix acl behavior introduced in
  4.8-rc1 (these were caught by the pjd-fstest suite).  The are also
  miscellaneous fixes marked as stable material and cleanups.

  Other than overlayfs code, it touches <linux/fs.h> to add a constant
  with which to disable posix acl caching.  No changes needed to the
  actual caching code, it automatically does the right thing, although
  later we may want to optimize this case.

  I'm now testing overlayfs with the following test suites to catch
  regressions:

   - unionmount-testsuite
   - xfstests
   - pjd-fstest"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: update doc
  ovl: listxattr: use strnlen()
  ovl: Switch to generic_getxattr
  ovl: copyattr after setting POSIX ACL
  ovl: Switch to generic_removexattr
  ovl: Get rid of ovl_xattr_noacl_handlers array
  ovl: Fix OVL_XATTR_PREFIX
  ovl: fix spelling mistake: "directries" -> "directories"
  ovl: don't cache acl on overlay layer
  ovl: use cached acl on underlying layer
  ovl: proper cleanup of workdir
  ovl: remove posix_acl_default from workdir
  ovl: handle umask and posix_acl_default correctly on creation
  ovl: don't copy up opaqueness
2016-09-02 09:32:15 -07:00
Nikolay Aleksandrov
b6cb5ac833 net: bridge: add per-port multicast flood flag
Add a per-port flag to control the unknown multicast flood, similar to the
unknown unicast flood flag and break a few long lines in the netlink flag
exports.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 22:48:33 -07:00
Linus Torvalds
b9677faf45 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  rapidio/tsi721: fix incorrect detection of address translation condition
  rapidio/documentation/mport_cdev: add missing parameter description
  kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
  MAINTAINERS: Vladimir has moved
  mm, mempolicy: task->mempolicy must be NULL before dropping final reference
  printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
  treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE
  drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE
  mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
  lib/test_hash.c: fix warning in preprocessor symbol evaluation
  lib/test_hash.c: fix warning in two-dimensional array init
  kconfig: tinyconfig: provide whole choice blocks to avoid warnings
  kexec: fix double-free when failing to relocate the purgatory
  mm, oom: prevent premature OOM killer invocation for high order request
2016-09-01 18:23:22 -07:00
David Rientjes
c11600e4fe mm, mempolicy: task->mempolicy must be NULL before dropping final reference
KASAN allocates memory from the page allocator as part of
kmem_cache_free(), and that can reference current->mempolicy through any
number of allocation functions.  It needs to be NULL'd out before the
final reference is dropped to prevent a use-after-free bug:

	BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c
	CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
	...
	Call Trace:
		dump_stack
		kasan_object_err
		kasan_report_error
		__asan_report_load2_noabort
		alloc_pages_current	<-- use after free
		depot_save_stack
		save_stack
		kasan_slab_free
		kmem_cache_free
		__mpol_put		<-- free
		do_exit

This patch sets current->mempolicy to NULL before dropping the final
reference.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com
Fixes: cd11016e5f ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01 17:52:01 -07:00