Commit graph

56861 commits

Author SHA1 Message Date
Russell King
770a1ad557 phylink: add module EEPROM support
Add support for reading module EEPROMs through phylink.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06 20:55:29 -07:00
Russell King
ce0aa27ff3 sfp: add sfp-bus to bridge between network devices and sfp cages
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06 20:55:29 -07:00
Russell King
9525ae8395 phylink: add phylink infrastructure
The link between the ethernet MAC and its PHY has become more complex
as the interface evolves.  This is especially true with serdes links,
where the part of the PHY is effectively integrated into the MAC.

Serdes links can be connected to a variety of devices, including SFF
modules soldered down onto the board with the MAC, a SFP cage with
a hotpluggable SFP module which may contain a PHY or directly modulate
the serdes signals onto optical media with or without a PHY, or even
a classical PHY connection.

Moreover, the negotiation information on serdes links comes in two
varieties - SGMII mode, where the PHY provides its speed/duplex/flow
control information to the MAC, and 1000base-X mode where both ends
exchange their abilities and each resolve the link capabilities.

This means we need a more flexible means to support these arrangements,
particularly with the hotpluggable nature of SFP, where the PHY can
be attached or detached after the network device has been brought up.

Ethtool information can come from multiple sources:
- we may have a PHY operating in either SGMII or 1000base-X mode, in
  which case we take ethtool/mii data directly from the PHY.
- we may have a optical SFP module without a PHY, with the MAC
  operating in 1000base-X mode - the ethtool/mii data needs to come
  from the MAC.
- we may have a copper SFP module with a PHY whic can't be accessed,
  which means we need to take ethtool/mii data from the MAC.

Phylink aims to solve this by providing an intermediary between the
MAC and PHY, providing a safe way for PHYs to be hotplugged, and
allowing a SFP driver to reconfigure the serdes connection.

Phylink also takes over support of fixed link connections, where the
speed/duplex/flow control are fixed, but link status may be controlled
by a GPIO signal.  By avoiding the fixed-phy implementation, phylink
can provide a faster response to link events: fixed-phy has to wait for
phylib to operate its state machine, which can take several seconds.
In comparison, phylink takes milliseconds.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- remove sync status
- rework supported and advertisment handling
- add 1000base-x speed for fixed links
- use functionality exported from phy-core, reworking
  __phylink_ethtool_ksettings_set for it
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06 20:55:29 -07:00
Russell King
a81497bee7 net: phy: provide a hook for link up/link down events
Sometimes, we need to do additional work between the PHY coming up and
marking the carrier present - for example, we may need to wait for the
PHY to MAC link to finish negotiation.  This changes phylib to provide
a notification function pointer which avoids the built-in
netif_carrier_on() and netif_carrier_off() functions.

Standard ->adjust_link functionality is provided by hooking a helper
into the new ->phy_link_change method.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06 20:55:28 -07:00
Russell King
0ccb4fc65d net: phy: move phy_lookup_setting() and guts of phy_supported_speeds() to phy-core
phy_lookup_setting() provides useful functionality in ethtool code
outside phylib.  Move it to phy-core and allow it to be re-used (eg,
in phylink) rather than duplicated elsewhere.  Note that this supports
the larger linkmode space.

As we move the phy settings table, we also need to move the guts of
phy_supported_speeds() as well.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06 20:55:28 -07:00
Russell King
da4625ac26 net: phy: split out PHY speed and duplex string generation
Other code would like to make use of this, so make the speed and duplex
string generation visible, and place it in a separate file.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06 20:55:28 -07:00
Willem de Bruijn
a91dbff551 sock: ulimit on MSG_ZEROCOPY pages
Bound the number of pages that a user may pin.

Follow the lead of perf tools to maintain a per-user bound on memory
locked pages commit 789f90fcf6 ("perf_counter: per user mlock gift")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 21:37:30 -07:00
Willem de Bruijn
4ab6c99d99 sock: MSG_ZEROCOPY notification coalescing
In the simple case, each sendmsg() call generates data and eventually
a zerocopy ready notification N, where N indicates the Nth successful
invocation of sendmsg() with the MSG_ZEROCOPY flag on this socket.

TCP and corked sockets can cause send() calls to append new data to an
existing sk_buff and, thus, ubuf_info. In that case the notification
must hold a range. odify ubuf_info to store a inclusive range [N..N+m]
and add skb_zerocopy_realloc() to optionally extend an existing range.

Also coalesce notifications in this common case: if a notification
[1, 1] is about to be queued while [0, 0] is the queue tail, just modify
the head of the queue to read [0, 1].

Coalescing is limited to a few TSO frames worth of data to bound
notification latency.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 21:37:30 -07:00
Willem de Bruijn
1f8b977ab3 sock: enable MSG_ZEROCOPY
Prepare the datapath for refcounted ubuf_info. Clone ubuf_info with
skb_zerocopy_clone() wherever needed due to skb split, merge, resize
or clone.

Split skb_orphan_frags into two variants. The split, merge, .. paths
support reference counted zerocopy buffers, so do not do a deep copy.
Add skb_orphan_frags_rx for paths that may loop packets to receive
sockets. That is not allowed, as it may cause unbounded latency.
Deep copy all zerocopy copy buffers, ref-counted or not, in this path.

The exact locations to modify were chosen by exhaustively searching
through all code that might modify skb_frag references and/or the
the SKBTX_DEV_ZEROCOPY tx_flags bit.

The changes err on the safe side, in two ways.

(1) legacy ubuf_info paths virtio and tap are not modified. They keep
    a 1:1 ubuf_info to sk_buff relationship. Calls to skb_orphan_frags
    still call skb_copy_ubufs and thus copy frags in this case.

(2) not all copies deep in the stack are addressed yet. skb_shift,
    skb_split and skb_try_coalesce can be refined to avoid copying.
    These are not in the hot path and this patch is hairy enough as
    is, so that is left for future refinement.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 21:37:30 -07:00
Willem de Bruijn
52267790ef sock: add MSG_ZEROCOPY
The kernel supports zerocopy sendmsg in virtio and tap. Expand the
infrastructure to support other socket types. Introduce a completion
notification channel over the socket error queue. Notifications are
returned with ee_origin SO_EE_ORIGIN_ZEROCOPY. ee_errno is 0 to avoid
blocking the send/recv path on receiving notifications.

Add reference counting, to support the skb split, merge, resize and
clone operations possible with SOCK_STREAM and other socket types.

The patch does not yet modify any datapaths.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 21:37:29 -07:00
Willem de Bruijn
3ece782693 sock: skb_copy_ubufs support for compound pages
Refine skb_copy_ubufs to support compound pages. With upcoming TCP
zerocopy sendmsg, such fragments may appear.

The existing code replaces each page one for one. Splitting each
compound page into an independent number of regular pages can result
in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.

Instead, fill all destination pages but the last to PAGE_SIZE.
Split the existing alloc + copy loop into separate stages:
1. compute bytelength and minimum number of pages to store this.
2. allocate
3. copy, filling each page except the last to PAGE_SIZE bytes
4. update skb frag array

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 21:37:29 -07:00
Xin Long
bb96dec745 sctp: remove the typedef sctp_auth_chunk_t
This patch is to remove the typedef sctp_auth_chunk_t, and
replace with struct sctp_auth_chunk in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
96f7ef4d58 sctp: remove the typedef sctp_authhdr_t
This patch is to remove the typedef sctp_authhdr_t, and
replace with struct sctp_authhdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
68d7546946 sctp: remove the typedef sctp_addip_chunk_t
This patch is to remove the typedef sctp_addip_chunk_t, and
replace with struct sctp_addip_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
65205cc465 sctp: remove the typedef sctp_addiphdr_t
This patch is to remove the typedef sctp_addiphdr_t, and
replace with struct sctp_addiphdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
8b32f2348a sctp: remove the typedef sctp_addip_param_t
This patch is to remove the typedef sctp_addip_param_t, and
replace with struct sctp_addip_param in the places where it's
using this typedef.

It is to use sizeof(variable) instead of sizeof(type), and
also fix some indent problems.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
05b25d0ba6 sctp: remove the typedef sctp_cwr_chunk_t
Remove this typedef including the struct, there is even no places
using it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
65f7710543 sctp: remove the typedef sctp_cwrhdr_t
This patch is to remove the typedef sctp_cwrhdr_t, and
replace with struct sctp_cwrhdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
b515fd2759 sctp: remove the typedef sctp_ecne_chunk_t
This patch is to remove the typedef sctp_ecne_chunk_t, and
replace with struct sctp_ecne_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
1fb6d83bd3 sctp: remove the typedef sctp_ecnehdr_t
This patch is to remove the typedef sctp_ecnehdr_t, and
replace with struct sctp_ecnehdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
2a49321677 sctp: remove the typedef sctp_error_t
This patch is to remove the typedef sctp_error_t, and replace
with enum sctp_error in the places where it's using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:47 -07:00
Xin Long
87caeba791 sctp: remove the typedef sctp_operr_chunk_t
This patch is to remove the typedef sctp_operr_chunk_t, and
replace with struct sctp_operr_chunk in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
Xin Long
d8238d9dab sctp: remove the typedef sctp_errhdr_t
This patch is to remove the typedef sctp_errhdr_t, and replace
with struct sctp_errhdr in the places where it's using this
typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
Xin Long
ac23e68133 sctp: fix the name of struct sctp_shutdown_chunk_t
This patch is to fix the name of struct sctp_shutdown_chunk_t
, replace with struct sctp_initack_chunk in the places where
it's using it.

It is also to fix some indent problem.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
Xin Long
e61e4055b1 sctp: remove the typedef sctp_shutdownhdr_t
This patch is to remove the typedef sctp_shutdownhdr_t, and
replace with struct sctp_shutdownhdr in the places where it's
using this typedef.

It is also to use sizeof(variable) instead of sizeof(type).

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 09:45:46 -07:00
WANG Cong
b2f9d432de flow_dissector: remove unused functions
They are introduced by commit f70ea018da
("net: Add functions to get skb->hash based on flow structures")
but never gets used in tree.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:50:03 -07:00
Willem de Bruijn
c613c209c3 net: add skb_frag_foreach_page and use with kmap_atomic
Skb frags may contain compound pages. Various operations map frags
temporarily using kmap_atomic, but this function works on single
pages, not whole compound pages. The distinction is only relevant
for high mem pages that require temporary mappings.

Introduce a looping mechanism that for compound highmem pages maps
one page at a time, does not change behavior on other pages.
Use the loop in the kmap_atomic callers in net/core/skbuff.c.

Verified by triggering skb_copy_bits with

    tcpdump -n -c 100 -i ${DEV} -w /dev/null &
    netperf -t TCP_STREAM -H ${HOST}

  and by triggering __skb_checksum with

    ethtool -K ${DEV} tx off

  repeated the tests with looping on a non-highmem platform
  (x86_64) by making skb_frag_must_loop always return true.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 16:07:10 -07:00
Tom Herbert
20bf50de30 skbuff: Function to send an skbuf on a socket
Add skb_send_sock to send an skbuff on a socket within the kernel.
Arguments include an offset so that an skbuf might be sent in mulitple
calls (e.g. send buffer limit is hit).

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:26:18 -07:00
Tom Herbert
306b13eb3c proto_ops: Add locked held versions of sendmsg and sendpage
Add new proto_ops sendmsg_locked and sendpage_locked that can be
called when the socket lock is already held. Correspondingly, add
kernel_sendmsg_locked and kernel_sendpage_locked as front end
functions.

These functions will be used in zero proxy so that we can take
the socket lock in a ULP sendmsg/sendpage and then directly call the
backend transport proto_ops functions.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 15:26:18 -07:00
David S. Miller
29fda25a2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two minor conflicts in virtio_net driver (bug fix overlapping addition
of a helper) and MAINTAINERS (new driver edit overlapping revamp of
PHY entry).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 10:07:50 -07:00
Linus Torvalds
bc78d646e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle notifier registry failures properly in tun/tap driver, from
    Tonghao Zhang.

 2) Fix bpf verifier handling of subtraction bounds and add a testcase
    for this, from Edward Cree.

 3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.

 4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
    from Cong Wang.

 5) Fix SElinux regression due to recent UDP optimizations, from Paolo
    Abeni.

 6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
    paths, fix from Stefano Brivio.

 7) Fix some mem leaks in dccp, from Xin Long.

 8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
    Bergmann.

 9) Mac address length check and buffer size fixes from Cong Wang.

10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.

11) Fix return value when copy_from_user() fails in
    bpf_prog_get_info_by_fd(), from Daniel Borkmann.

12) Handle PHY_HALTED properly in phy library state machine, from
    Florian Fainelli.

13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.

14) Fix truesize calculation in virtio_net which led to performance
    regressions, from Michael S Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  samples/bpf: fix bpf tunnel cleanup
  udp6: fix jumbogram reception
  ppp: Fix a scheduling-while-atomic bug in del_chan
  Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
  virtio_net: fix truesize for mergeable buffers
  mv643xx_eth: fix of_irq_to_resource() error check
  MAINTAINERS: Add more files to the PHY LIBRARY section
  ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
  net: phy: Correctly process PHY_HALTED in phy_stop_machine()
  sunhme: fix up GREG_STAT and GREG_IMASK register offsets
  bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
  tcp: avoid bogus gcc-7 array-bounds warning
  net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
  bpf: don't indicate success when copy_from_user fails
  udp6: fix socket leak on early demux
  net: thunderx: Fix BGX transmit stall due to underflow
  Revert "vhost: cache used event for better performance"
  team: use a larger struct for mac address
  net: check dev->addr_len for dev_set_mac_address()
  phy: bcm-ns-usb3: fix MDIO_BUS dependency
  ...
2017-07-31 22:36:42 -07:00
Paolo Abeni
cb891fa6a1 udp6: fix jumbogram reception
Since commit 67a51780ae ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)

This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.

Fixes: 67a51780ae ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 22:01:21 -07:00
Florian Fainelli
f248aff86d net: phy: mdio-bcm-unimac: Allow specifying platform data
In preparation for having the bcmgenet driver migrate over the
mdio-bcm-unimac driver, add a platform data structure which allows
passing integrating specific details like bus name, wait function to
complete MDIO operations and PHY mask.

We also define what the platform device name contract is by defining
UNIMAC_MDIO_DRV_NAME and moving it to the platform_data header.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:40:58 -07:00
Florian Westphal
45f119bf93 tcp: remove header prediction
Like prequeue, I am not sure this is overly useful nowadays.

If we receive a train of packets, GRO will aggregate them if the
headers are the same (HP predates GRO by several years) so we don't
get a per-packet benefit, only a per-aggregated-packet one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:49 -07:00
Florian Westphal
e7942d0633 tcp: remove prequeue support
prequeue is a tcp receive optimization that moves part of rx processing
from bh to process context.

This only works if the socket being processed belongs to a process that
is blocked in recv on that socket.

In practice, this doesn't happen anymore that often because nowadays
servers tend to use an event driven (epoll) model.

Even normal client applications (web browsers) commonly use many tcp
connections in parallel.

This has measureable impact only in netperf (which uses plain recv and
thus allows prequeue use) from host to locally running vm (~4%), however,
there were no changes when using netperf between two physical hosts with
ixgbe interfaces.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:49 -07:00
Linus Torvalds
ff2620f778 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "Two notable fixes.

   - While adding NUMA affinity support to unbound workqueues, the
     assumption that an unbound workqueue with max_active == 1 is
     ordered was broken.

     The plan was to use explicit alloc_ordered_workqueue() for those
     cases. Unfortunately, I forgot to update the documentation properly
     and we grew a handful of use cases which depend on that assumption.

     While we want to convert them to alloc_ordered_workqueue(), we
     don't really lose anything by enforcing ordered execution on
     unbound max_active == 1 workqueues and it doesn't make sense to
     risk subtle bugs. Restore the assumption.

   - Workqueue assumes that CPU <-> NUMA node mapping remains static.

     This is a general assumption - we don't have any synchronization
     mechanism around CPU <-> node mapping. Unfortunately, powerpc may
     change the mapping dynamically leading to crashes. Michael added a
     workaround so that we at least don't crash while powerpc hotplug
     code gets updated"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Work around edge cases for calc of pool's cpumask
  workqueue: implicit ordered attribute should be overridable
  workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
2017-07-31 13:37:28 -07:00
Linus Torvalds
3dcc4c7d42 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "Dan found a really old bug where libata hotplug code wasn't sanitizing
  index value from userland and may end up indexing with a negative
  number. It is scary but fortunately can only be triggered by root.

  Other than that, minor fixes"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: fix a couple of doc build warnings
  libata: array underflow in ata_find_dev()
  ata: sata_rcar: add gen[23] fallback compatibility strings
  libata: remove unused rc in ata_eh_handle_port_resume
  libata: Cleanup ata_read_log_page()
  ata: fix gemini Kconfig dependencies
2017-07-31 13:33:21 -07:00
Linus Torvalds
e4776b8ccb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two patches addressing build warnings caused by inconsistent kernel
  doc comments"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/wait: Clean up some documentation warnings
  sched/core: Fix some documentation build warnings
2017-07-30 11:54:08 -07:00
Linus Torvalds
06efc7df37 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "Fix for a regression caused by the conversion of x86 to the generic
  hotplug code.

  Instead of doing a plain single line revert, this adds a pile of
  comments so the semantics of the force argument are clear"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
2017-07-30 11:27:33 -07:00
Vidya Sagar Ravipati
1a5f3da20b net: ethtool: add support for forward error correction modes
Forward Error Correction (FEC) modes i.e Base-R
and Reed-Solomon modes are introduced in 25G/40G/100G standards
for providing good BER at high speeds. Various networking devices
which support 25G/40G/100G provides ability to manage supported FEC
modes and the lack of FEC encoding control and reporting today is a
source for interoperability issues for many vendors.
FEC capability as well as specific FEC mode i.e. Base-R
or RS modes can be requested or advertised through bits D44:47 of
base link codeword.

This patch set intends to provide option under ethtool to manage
and report FEC encoding settings for networking devices as per
IEEE 802.3 bj, bm and by specs.

set-fec/show-fec option(s) are designed to provide control and
report the FEC encoding on the link.

SET FEC option:
root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]

Encoding: Types of encoding
Off    :  Turning off any encoding
RS     :  enforcing RS-FEC encoding on supported speeds
BaseR  :  enforcing Base R encoding on supported speeds
Auto   :  IEEE defaults for the speed/medium combination

Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we are  expecting FEC to be negotiated as on or off
  as long as protocol supports it
- if the hardware is capable of detecting the FEC encoding on it's
      receiver it will reconfigure its encoder to match
- in absence of the above, the configuration would be set to IEEE
  defaults.

>From our  understanding , this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.

SHOW FEC option:
root@tor: ethtool --show-fec  swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings:  RS | BaseR

ETHTOOL DEVNAME output modification:

ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
    Supported ports: [ FIBRE ]
    Supported link modes:   40000baseCR4/Full
                            40000baseSR4/Full
                            40000baseLR4/Full
                            100000baseSR4/Full
                            100000baseCR4/Full
                            100000baseLR4_ER4/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Supported FEC modes: [RS | BaseR | None | Not reported]
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: [RS | BaseR | None | Not reported]
<<<< One or more FEC modes
    Speed: 100000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 106
    Transceiver: internal
    Auto-negotiation: off
    Link detected: yes

This patch includes following changes
a) New ETHTOOL_SFECPARAM/SFECPARAM API, handled by
  the new get_fecparam/set_fecparam callbacks, provides support
  for configuration of forward error correction modes.
b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
  are defined so that users can configure these fec modes for supported
  and advertising fields as part of link autonegotiation.

Signed-off-by: Vidya Sagar Ravipati <vidya.chowdary@gmail.com>
Signed-off-by: Dustin Byford <dustin@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 23:23:44 -07:00
Linus Torvalds
8155469341 s390: SRCU fix.
PPC: host crash fixes.
 x86: bugfixes, including making nested posted interrupts really work.
 Generic: tweaks to kvm_stat and to uevents
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZe2EYAAoJEL/70l94x66D4JYH/AnvioKWTsplhUKt4Y4JlpJX
 EXYjQd/CIZ+MHNNUH+U+XEj6tKQymKrz4TeZSs1o0nyxCeyparR3gK27OYVpPspN
 GkPSit3hyRgW9r5uXp6pZCJuFCAMpMZ6z4sKbT1FxDhnWnpWayV9w8KA+yQT/UUX
 dNQ9JJPUxApcM4NCaj2OCQ8K1koNIDCc52+jATf0iK/Heiaf6UGqCcHXUIy5I5wM
 OWk05Qm32VBAYb6P6FfoyGdLMNAAkJtr1fyOJDkxX730CYgwpjIP0zifnJ1bt8V2
 YRnjvPO5QciDHbZ8VynwAkKi0ZAd8psjwXh0KbyahPL/2/sA2xCztMH25qweriI=
 =fsfr
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "s390:
   - SRCU fix

  PPC:
   - host crash fixes

  x86:
   - bugfixes, including making nested posted interrupts really work

  Generic:
   - tweaks to kvm_stat and to uevents"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Fix reentrancy issues with preempt notifiers
  tools/kvm_stat: add '-f help' to get the available event list
  tools/kvm_stat: use variables instead of hard paths in help output
  KVM: nVMX: Fix loss of L2's NMI blocking state
  KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
  x86: irq: Define a global vector for nested posted interrupts
  KVM: x86: do mask out upper bits of PAE CR3
  KVM: make pid available for uevents without debugfs
  KVM: s390: take srcu lock when getting/setting storage keys
  KVM: VMX: remove unused field
  KVM: PPC: Book3S HV: Fix host crash on changing HPT size
  KVM: PPC: Book3S HV: Enable TM before accessing TM registers
2017-07-28 13:36:56 -07:00
Linus Torvalds
3d9d7405c0 arm64 fixes:
- Ensure we have a guard page after the kernel image in vmalloc
 
 - Fix incorrect prefetch stride in copy_page
 
 - Ensure irqs are disabled in die()
 
 - Fix for event group validation in QCOM L2 PMU driver
 
 - Fix requesting of PMU IRQs on AMD Seattle
 
 - Minor cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJZey1iAAoJELescNyEwWM0w/0H/1RaHFUSoFUIoL+qFD0eGXcp
 hORI0sIHrUlHRONTFYMTyNko7kxELz5aDm6pc87dzBUNoUq3gxhqeEa0zsmwOPsQ
 m4iDa7r9xXT+nBITe2auAg6miEMX7Ym448dDrIyKNcRK+2SyZoFqS0vr8UVqs1P/
 NwdFGgpKHbV4r1Jeoosom+n7VnuyE0vYBKo8TlRks6NvQJoh2duiPkL+AsBgCfBq
 fznck7jIPL4z4kf4Fp/Yz1QsmMhkDSidPmGD/m97Bj4wvEbMwf0u8Dnv1tySK5wx
 NwKeN0Dn7JphtL5c5j+OGiri7gTcswjxHJ9f6d0Ez+2TwnjWFM6JNQ+xdVqFcxc=
 =EpS9
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "I'd been collecting these whilst we debugged a CPU hotplug failure,
  but we ended up diagnosing that one to tglx, who has taken a fix via
  the -tip tree separately.

  We're seeing some NFS issues that we haven't gotten to the bottom of
  yet, and we've uncovered some issues with our backtracing too so there
  might be another fixes pull before we're done.

  Summary:

   - Ensure we have a guard page after the kernel image in vmalloc

   - Fix incorrect prefetch stride in copy_page

   - Ensure irqs are disabled in die()

   - Fix for event group validation in QCOM L2 PMU driver

   - Fix requesting of PMU IRQs on AMD Seattle

   - Minor cleanups and fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mmu: Place guard page after mapping of kernel image
  drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
  arm64: sysreg: Fix unprotected macro argmuent in write_sysreg
  perf: qcom_l2: fix column exclusion check
  arm64/lib: copy_page: use consistent prefetch stride
  arm64/numa: Drop duplicate message
  perf: Convert to using %pOF instead of full_name
  arm64: Convert to using %pOF instead of full_name
  arm64: traps: disable irq in die()
  arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
  arm64: uaccess: Remove redundant __force from addr cast in __range_ok
2017-07-28 13:29:36 -07:00
Linus Torvalds
1731a47444 - A few DM integrity fixes that improve performance. One that address
inefficiencies in the on-disk journal device layout.  Another that
   makes use of the block layer's on-stack plugging when writing the
   journal.
 
 - A dm-bufio fix for the blk_status_t conversion that went in during the
   merge window.
 
 - A few DM raid fixes that address correctness when suspending the
   device and a validation fix for validation that occurs during device
   activation.
 
 - A couple DM zoned target fixes.  Important one being the fix to not
   use GFP_KERNEL in the IO path due to concerns about deadlock in
   low-memory conditions (e.g. swap over a DM zoned device, etc).
 
 - A DM DAX device fix to make sure dm_dax_flush() is called if the
   underlying DAX device is operating as a write cache.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZe13OAAoJEMUj8QotnQNav/gIAMXMUbXlYHVikVNq+6rNkXRk
 FlsltNcJEDeZCit0nJd/2nOWGpssXdz+7cJTUU28Kp+3IscIolSHS51bzfSFI05V
 7LbYqEX1EdXkTwEeYfHlAoOexvj4oarpAWWQF/ACU8rHCruaqfqIa57mstxLoyDY
 XcxsIY/fds6GZViLB0MD/jBAKaLWX90aFZ9MQcF7AmdpMr56kCO2PUhiqHcrN47t
 BjH7E5QSKGl2pMND1bR6pleWFw8HB7h82Qjaasd5bQuVWseQ4u9Illxny6bhhk2E
 BiEWjzFvZB+JL1zl7JIXnBjhdmbwgAVvoW6EqHuVzHuR0X8gylBF2gDLnSzUZu4=
 =3MxS
 -----END PGP SIGNATURE-----

Merge tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - a few DM integrity fixes that improve performance. One that address
   inefficiencies in the on-disk journal device layout. Another that
   makes use of the block layer's on-stack plugging when writing the
   journal.

 - a dm-bufio fix for the blk_status_t conversion that went in during
   the merge window.

 - a few DM raid fixes that address correctness when suspending the
   device and a validation fix for validation that occurs during device
   activation.

 - a couple DM zoned target fixes. Important one being the fix to not
   use GFP_KERNEL in the IO path due to concerns about deadlock in
   low-memory conditions (e.g. swap over a DM zoned device, etc).

 - a DM DAX device fix to make sure dm_dax_flush() is called if the
   underlying DAX device is operating as a write cache.

* tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm, dax: Make sure dm_dax_flush() is called if device supports it
  dm verity fec: fix GFP flags used with mempool_alloc()
  dm zoned: use GFP_NOIO in I/O path
  dm zoned: remove test for impossible REQ_OP_FLUSH conditions
  dm raid: bump target version
  dm raid: avoid mddev->suspended access
  dm raid: fix activation check in validate_raid_redundancy()
  dm raid: remove WARN_ON() in raid10_md_layout_to_format()
  dm bufio: fix error code in dm_bufio_write_dirty_buffers()
  dm integrity: test for corrupted disk format during table load
  dm integrity: WARN_ON if variables representing journal usage get out of sync
  dm integrity: use plugging when writing the journal
  dm integrity: fix inefficient allocation of journal space
2017-07-28 12:17:17 -07:00
Linus Torvalds
0fa8dc423c Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A small collection of fixes that should go into this series. This
  contains:

   - NVMe pull request from Christoph, with various fixes for nvme
     proper and nvme-fc.

   - disable runtime PM for blk-mq for now.

     With scsi now defaulting to using blk-mq, this reared its head as
     an issue. Longer term we'll fix up runtime PM for blk-mq, for now
     just disable it to prevent a hang on laptop resume for some folks.

   - blk-mq CPU <-> hw queue map fix from Christoph.

   - xen/blkfront pull request from Konrad, with two small fixes for the
     blkfront driver.

   - a few fixups for nbd from Joseph.

   - a stable fix for pblk from Javier"

* 'for-linus' of git://git.kernel.dk/linux-block:
  lightnvm: pblk: advance bio according to lba index
  nvme: validate admin queue before unquiesce
  nbd: clear disconnected on reconnect
  nvme-pci: fix HMB size calculation
  nvme-fc: revise TRADDR parsing
  nvme-fc: address target disconnect race conditions in fcp io submit
  nvme: fabrics commands should use the fctype field for data direction
  nvme: also provide a UUID in the WWID sysfs attribute
  xen/blkfront: always allocate grants first from per-queue persistent grants
  xen-blkfront: fix mq start/stop race
  blk-mq: map queues to all present CPUs
  block: disable runtime-pm for blk-mq
  xen-blkfront: Fix handling of non-supported operations
  nbd: only set sndtimeo if we have a timeout set
  nbd: take tx_lock before disconnecting
  nbd: allow multiple disconnects to be sent
2017-07-28 12:13:34 -07:00
Linus Torvalds
a2d48756ca MMC host:
- sunxi: Correct time phase settings
  - omap_hsmmc: Clean up some dead code
  - dw_mmc: Fix message printed for deprecated num-slots DT binding
  - dw_mmc: Fix DT documentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZeu0JAAoJEP4mhCVzWIwpECAP/0kf5o3lEneUOqSGE9d1EszW
 GpHgX0hA1+FaQpBUhGAfFCjmNigsH8rwz8dOv17PX8iFyQmal0ivgC7DXJPG2Rq1
 ofC4MwfWzZE0AKRSLButvRyNHqmwZsSM8nlFPwAtsINktJCx/WhSr6OS5pNEdz/j
 1tEGDgLzBiq9Yd3FHf07KPPkMhxut0eI1gXke8pRgFkLQIwU4/8zFb6450w0RIxQ
 BtmqEEK0p3cyZLN/FxpyMG6ZVmypTUiMFX9G0xkcKdsTxGqnYpWvCFuEbEtx6vbU
 5IjjKc2oINMs3z53tRiN/vQaSuZMn1O4dKHydADxP68Pm/ff09+pgvnpTV4D83RV
 /gw9olO1Y//ONCT+p/k2fHhOlLUa4YY2+SUCN7VZAqP5gYEjtH9/doOoWND//WPA
 BhFcZsWBoDva+M3OC2wNnZb5aCERVLuHPl3NhdiOpxyGoEEG1c0MvhegaUI4Rm0K
 hoVyuXqWsbu+3A3H+biELb0VEIlgELkCIRh7mjKSq8oicPN7PtymhAgfGyxcCkn+
 qcvlN0UxcJjYxYTXzKjEaTiCHeel0UB3toCWdfq+L5znHgRMjZoxYM/tUhF8+rts
 wTcS/NkLz4DGJGQavfJvdawddafCbFnoL19KnWd2Wl3RxNOrQ2lnvfBODDuPrVoz
 IJqzwddWO/w6gTDT5Xt6
 =xWHL
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "Here are a couple of mmc fixes intended for v4.13-rc1.

  I have also included a couple of cleanup patches in this pull request
  for OMAP2+, related to the omap_hsmmc driver. The reason is because of
  the changes are also depending on OMAP SoC specific code, so this
  simplifies how to deal with this.

  Summary:

  MMC host:
   - sunxi: Correct time phase settings
   - omap_hsmmc: Clean up some dead code
   - dw_mmc: Fix message printed for deprecated num-slots DT binding
   - dw_mmc: Fix DT documentation"

* tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  Documentation: dw-mshc: deprecate num-slots
  mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
  mmc: host: omap_hsmmc: remove unused platform callbacks
  ARM: OMAP2+: hsmmc.c: Remove dead code
  mmc: sunxi: Keep default timing phase settings for new timing mode
2017-07-28 12:04:36 -07:00
Eugenia Emantayev
fa3676885e net/mlx5e: Add field select to MTPPS register
In order to mark relevant fields while setting the MTPPS register
add field select. Otherwise it can cause a misconfiguration in
firmware.

Fixes: ee7f12205a ('net/mlx5e: Implement 1PPS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-27 16:40:17 +03:00
Eugenia Emantayev
0b794ffae7 net/mlx5: Fix mlx5_ifc_mtpps_reg_bits structure size
Fix miscalculation in reserved_at_1a0 field.

Fixes: ee7f12205a ('net/mlx5e: Implement 1PPS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-07-27 16:40:17 +03:00
Thomas Gleixner
8397913303 genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
That commit was part of the changes moving x86 to the generic CPU hotplug
interrupt migration code. The force flag was required on x86 before the
hierarchical irqdomain rework, but invoking set_affinity() with force=true
stayed and had no side effects.

At some point in the past, the force flag got repurposed to support the
exynos timer interrupt affinity setting to a not yet online CPU, so the
interrupt controller callback does not verify the supplied affinity mask
against cpu_online_mask.

Setting the flag in the CPU hotplug code causes the cpu online masking to
be blocked on these irq controllers and results in potentially affining an
interrupt to the CPU which is unplugged, i.e. instead of moving it away,
it's just reassigned to it.

As the force flags is not longer needed on x86, it's safe to revert that
patch so the ARM irqchips which use the force flag work again.

Add comments to that effect, so this won't happen again.

Note: The online mask handling should be done in the generic code and the
force flag and the masking in the irq chips removed all together, but
that's not a change possible for 4.13. 

Fixes: 77f85e66aa ("genirq/cpuhotplug: Set force affinity flag on hotplug migration")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707271217590.3109@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-07-27 15:40:02 +02:00
Will Deacon
a3287c41ff drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
Since the PMU register interface is banked per CPU, CPU PMU interrrupts
cannot be handled by a CPU other than the one with the PMU asserting the
interrupt. This means that migrating PMU SPIs, as we do during a CPU
hotplug operation doesn't make any sense and can lead to the IRQ being
disabled entirely if we route a spurious IRQ to the new affinity target.

This has been observed in practice on AMD Seattle, where CPUs on the
non-boot cluster appear to take a spurious PMU IRQ when coming online,
which is routed to CPU0 where it cannot be handled.

This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
affinity prior to requesting them, ensuring that they cannot
be migrated during hotplug events. This interacts badly with the DB8500
erratum workaround that ping-pongs the interrupt affinity from the handler,
so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
to be overridden in the platdata.

Fixes: 3cf7ee98b8 ("drivers/perf: arm_pmu: move irq request/free into probe")
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-27 13:43:22 +01:00
Rahul Verma
41822878b2 qed: enhanced per queue max coalesce value.
Maximum coalesce per Rx/Tx queue is extended from
255 to 511.

Signed-off-by: Rahul Verma <rahul.verma@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-27 00:05:22 -07:00