Abstract a general interface "of_phy_get_and_connect"
for PHY connect. User will have no bother with getting
"phy-mode" and "phy-handle" any more.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Reviewed-by: Jiancheng Xue <xuejiancheng@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for hardware offloading of ipmr/ip6mr we need an
interface that allows to check (and later update) the age of entries.
Relying on stats alone can show activity but not actual age of the entry,
furthermore when there're tens of thousands of entries a lot of the
hardware implementations only support "hit" bits which are cleared on
read to denote that the entry was active and shouldn't be aged out,
these can then be naturally translated into age timestamp and will be
compatible with the software forwarding age. Using a lastuse entry doesn't
affect performance because the members in that cache line are written to
along with the age.
Since all new users are encouraged to use ipmr via netlink, this is
exported via the RTA_EXPIRES attribute.
Also do a minor local variable declaration style adjustment - arrange them
longest to shortest.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Shrijeet Mukherjee <shm@cumulusnetworks.com>
CC: Satish Ashok <sashok@cumulusnetworks.com>
CC: Donald Sharp <sharpd@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.
We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.
If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.
This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.
The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].
[1] http://thread.gmane.org/gmane.linux.network/421294
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can't detect the FXEN (fiber mode) bootstrap pin, so configure
it via a boolean device tree property "micrel,fiber-mode".
If it is enabled, auto-negotiation is not supported.
The only available modes are 100base-fx (full duplex and half duplex).
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit utilize the ability of ConnectX-4 to bulk read flow counters.
Few bulk counter queries could be done instead of issuing thousands of
firmware commands per second to get statistics of all flows set to HW,
such as those programmed when we offload tc filters.
Counters are stored sorted by hardware id, and queried in blocks (id +
number of counters).
Due to hardware requirement, start of block and number of counters in a
block must be four aligned.
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to use bulk counters, we need to have counters sorted by id.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2016-07-13
Here's our main bluetooth-next pull request for the 4.8 kernel:
- Fixes and cleanups in 802.15.4 and 6LoWPAN code
- Fix out of bounds issue in btmrvl driver
- Fixes to Bluetooth socket recvmsg return values
- Use crypto_cipher_encrypt_one() instead of crypto_skcipher
- Cleanup of Bluetooth connection sysfs interface
- New Authentication failure reson code for Disconnected mgmt event
- New USB IDs for Atheros, Qualcomm and Intel Bluetooth controllers
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAV3zeiPSw1s6N8H32AQI9Iw//RxwhAzb+ovNIizPqcFRt4BIS3Or6unov
Bx9jtq+U1POWZWtlIHYZKDH8ndxMnC56NkEHmpIK3uEiqJ6BoQPpDdfN+hhREysV
Hoa+JOkMgufPBanU/JyKY/4vsmlvLuoOdppN1OA/kx1KECux9xJWIrvFsCUQGeat
nDsdWChHkZAm/GDPZiFvxEBVaxDe2dmnDMBFTst1RsrH2uICSqM4k5srmjc3NPAY
bsTqZeQGTIK1V9MggwBHxHFMvvERlGDpcrpoMRjeTzMmCpCg5endJoSu3hdNjHUO
o5Fi50dhLI5jo84DQiXL0wM4SLND0QQygl+QeU3zlJYtsQsF6WxPnIEGqlGr3+WV
I4wjDc5lxECyQIjCsrCo5ZwJ47Kqmm/ZQ4uGd9JooAVhqhP7/2dhFH0zXywJZzKs
zo+dWTF5Xvde+mlknm1RCTgkdx3msPH9EVkEoO4FOPOAg6lhIMQFFXLXZfGr9oX6
V99t+8t8YhDPTbL4AQnzh/aMHtbpM6be4TYiRRjT6iZLuvWOPW/zpp/1hmyeEkbU
KZNDunC2tH030Fx5toGi3b2i8M5SJdyex9Udg/YsNexpWmyHMS49PoGk9ZnRRPA+
xn9+xIVsqTh+xbiyCOPJqlQMMK9ayF7isT2N8T19qoVJxurdE/tMBdtKrJ5uTJFT
W0n8KV46a+4=
=1ZSd
-----END PGP SIGNATURE-----
Merge tag 'rxrpc-rewrite-20160706' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Improve conn/call lookup and fix call number generation [ver #3]
I've fixed a couple of patch descriptions and excised the patch that
duplicated the connections list for reconsideration at a later date.
For reference, the excised patch is sitting on the rxrpc-experimental
branch of my git tree, based on top of the rxrpc-rewrite branch. Diffing
it against yesterday's tag shows no differences.
Would you prefer the patch set to be emailed afresh instead of a git-pull
request?
David
---
Here's the next part of the AF_RXRPC rewrite. The two main purposes of
this set are to fix the call number handling and to make use of RCU when
looking up the connection or call to pass a received packet to.
Important changes in this set include:
(1) Avoidance of placing stack data into SG lists in rxkad so that kernel
stacks can become vmalloc'd (Herbert Xu).
(2) Calls cease pinning the connection they used as soon as possible,
which allows the connection to be discarded sooner and allows the call
channel on that connection to be reused earlier.
(3) Make each call channel on a connection have a separate and independent
call number space rather than having a shared number space for the
connection. Call numbers should increment monotonically per channel
on the client, and the server should ignore a call with a lower call
number for that channel than the latest it has seen. The RESPONSE
packet sets the minimum values of each call ID counter on a
connection.
(4) Look up calls by indexing the channel array on a connection rather
than by keeping calls in an rbtree on that connection. Also look up
calls using the channel array rather than using a hashtable.
The call hashtable can then be removed.
(5) Call terminal statuses are cached in the channel array for the last
call. It is assumed that if we the server have seen call N, then the
client no longer cares about call N-1 on the same channel.
This will allow retransmission of the terminal status in future
without the need to keep the rxrpc_call struct around.
(6) Peer lookups are moved out of common connection handling code and into
service connection handling code as client connections (a) must point
to a peer before they can be used and (b) are looked up by a
machine-unique connection ID directly, so we only need to look up the
peer first if we're going to deal with a service call.
(7) The reference count on a connection is held elevated by 1 whilst it is
alive (ie. idle unused connections have a refcount of 1). The reaper
will attempt to change the refcount from 1->0 and skip if this cannot
be done, whilst look ups only increment the refcount if it's non-zero.
This makes the implementation of RCU lookups easier as we don't have
to get a ref on the connection or a lock on the connection list to
prevent a connection being reaped whilst we're contemplating queueing
a packet that initiates a new service call upon it.
If we need to get a connection, but there's a dead connection in the
tree, we use rb_replace_node() to replace the dead one with a new one.
(8) Use a seqlock to validate the walk over the service connection rbtree
attached to a peer when it's being walked in RCU mode.
(9) Make the incoming call/connection packet handling code use RCU mode
and locks and make it only take a reference if the call/connection
gets queued on a workqueue.
The intention is that the next set will introduce the connection lifetime
management and capacity limits to prevent clients from overloading the
server.
There are some fixes too:
(1) Verifying that a packet coming in to a client connection came from the
expected source.
(2) Fix handling of connection failure in client call creation where we
don't reinitialise the list linkage block and a second attempt to
unlink the failed connection oopses and also we don't set the state
correctly, which causes an assertion failure.
(3) New service calls were being added to the socket's accept queue under
the wrong lock.
Changes:
(V2) In rxrpc_find_service_conn_rcu() initialised the sequence number to 0.
Fixed the RCU handling in conn_service.c by introducing and using
rb_replace_node_rcu() as an RCU-safe alternative in
rxrpc_publish_service_conn().
Modified and used rcu_dereference_raw() to avoid RCU sparse warnings
in rxrpc_find_service_conn_rcu().
Added in some missing RCU dereference wrappers. It seems to be
necessary to turn on CONFIG_PROVE_RCU_REPEATEDLY as well as
CONFIG_SPARSE_RCU_POINTER to get the static __rcu annotation checking
to happen.
Fixed some other sparse warnings, including a missing ntohs() in
jumbo packet processing.
(V3) Fixed some commit descriptions.
Excised the patch that duplicated the connection list to separate out
the procfs list for reconsideration at a later date.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices running OpenWrt need this and it makes sense to add this
to ath9k_platform_data as the next patches will add a devicetree
(boolean) property for it as well.
Suggested-by: Vittorio Gambaletta <openwrt@vittgam.net>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This patch adds ieee802154_skb_src_pan function to get the pointer
address of the source pan id at skb mac pointer.
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds ieee802154_skb_dst_pan function to get the pointer
address of the destination pan id at skb mac pointer.
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* beacon report (for radio measurement) support in cfg80211/mac80211
* hwsim: allow wmediumd in namespaces
* mac80211: extend 160MHz workaround to CSA IEs
* mesh: properly encrypt group-addressed privacy action frames
* mesh: allow setting peer AID
* first steps for MU-MIMO monitor mode
* along with various other cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJXfSCuAAoJEGt7eEactAAdaugP/ilrcELQRsIN5ZXCAKYZuwXV
T01JPwgOaWL9ILu7h1SfG/+j9kzMnyk4WmRpeoj2FGNcyfG2AvULWSLpQJ2abwgQ
8o/emuLinQwRENevaMUTRSOE0HkXoFPCbbq37+a2i6bAv1QSYY3A0xvWpcU5fZ4D
7CKYDYPBAdMXYwEwy1g4nYWfDAYqS4rthr3l3rS1Cy7Q2T1ZlMlD90GjD7oeQAEw
orKulhkkDSzvxfvZCYTzXmUoBQE8sNXGDD+OFsJyowyt+ugM/xan+2tmhCaHSnda
HpdCS2aRj779UBn9cOfELjffTNpS++PM6KFd8ZDaPcJSMginn/BAHTOeNfNUfL0Q
+Enu59I82qMDzbG2z1Qezzjv7OTzyEvyvYzNbLOqljTBSBklDa3rHwhyk+g1oVBH
+4xX1Vk5QBLde+Q0NS0gTkcqOQK8KT5+HEqiUfgLSNDETN0lSGsKbtvMfU/ikE1Y
aLRkTp7nzUd03qjIFLS6RMf7JjucWWzH1ZXTHvbpDFAG7riOhYRD3Sw+0e7madTd
+HXjH9dnOGnGPDL+FyDwtW6iclYwNjcIPQiNdOjwWfMA2Wmr7iq+aFUptRCwQTHB
WtgJ3f8OHax2JXcm4grYfxELZip5vbWJJHUC84Drvmzw3X7FRITf+OEWjdNOzsRD
Fc7w5ceThh9Id1BvcH2+
=R1qP
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2016-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
One more set of new features:
* beacon report (for radio measurement) support in cfg80211/mac80211
* hwsim: allow wmediumd in namespaces
* mac80211: extend 160MHz workaround to CSA IEs
* mesh: properly encrypt group-addressed privacy action frames
* mesh: allow setting peer AID
* first steps for MU-MIMO monitor mode
* along with various other cleanups and improvements
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/usb/r8152.c
All three conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) All users of AF_PACKET's fanout feature want a symmetric packet
header hash for load balancing purposes, so give it to them.
2) Fix vlan state synchronization in e1000e, from Jarod Wilson.
3) Use correct socket pointer in ip_skb_dst_mtu(), from Shmulik
Ladkani.
4) mlx5 bug fixes from Mohamad Haj Yahia, Daniel Jurgens, Matthew
Finlay, Rana Shahout, and Shaker Daibes. Mostly to do with
operation timeouts and PCI error handling.
5) Fix checksum handling in mirred packet action, from WANG Cong.
6) Set skb->dev correctly when transmitting in !protect_frames case of
macsec driver, from Daniel Borkmann.
7) Fix MTU calculation in geneve driver, from Haishuang Yan.
8) Missing netif_napi_del() in unregister path of qeth driver, from
Ursula Braun.
9) Handle malformed route netlink messages in decnet properly, from
Vergard Nossum.
10) Memory leak of percpu data in ipv6 routing code, from Martin KaFai
Lau.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
ipv6: Fix mem leak in rt6i_pcpu
net: fix decnet rtnexthop parsing
cxgb4: update latest firmware version supported
net/mlx5: Avoid setting unused var when modifying vport node GUID
bonding: fix enslavement slave link notifications
r8152: fix runtime function for RTL8152
qeth: delete napi struct when removing a qeth device
Revert "fsl/fman: fix error handling"
fsl/fman: fix error handling
cdc_ncm: workaround for EM7455 "silent" data interface
RDS: fix rds_tcp_init() error path
geneve: fix max_mtu setting
net: phy: dp83867: Fix initialization of PHYCR register
enc28j60: Fix race condition in enc28j60 driver
net: stmmac: Fix null-function call in ISR on stmmac1000
tipc: fix nl compat regression for link statistics
net: bcmsysport: Device stats are unsigned long
macsec: set actual real device for xmit when !protect_frames
net_sched: fix mirrored packets checksum
packet: Use symmetric hash for PACKET_FANOUT_HASH.
...
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next,
they are:
1) Don't use userspace datatypes in bridge netfilter code, from
Tobin Harding.
2) Iterate only once over the expectation table when removing the
helper module, instead of once per-netns, from Florian Westphal.
3) Extra sanitization in xt_hook_ops_alloc() to return error in case
we ever pass zero hooks, xt_hook_ops_alloc():
4) Handle NFPROTO_INET from the logging core infrastructure, from
Liping Zhang.
5) Autoload loggers when TRACE target is used from rules, this doesn't
change the behaviour in case the user already selected nfnetlink_log
as preferred way to print tracing logs, also from Liping Zhang.
6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
by cache lines, increases the size of entries in 11% per entry.
From Florian Westphal.
7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.
8) Remove useless defensive check in nf_logger_find_get() from Shivani
Bhardwaj.
9) Remove zone extension as place it in the conntrack object, this is
always include in the hashing and we expect more intensive use of
zones since containers are in place. Also from Florian Westphal.
10) Owner match now works from any namespace, from Eric Bierdeman.
11) Make sure we only reply with TCP reset to TCP traffic from
nf_reject_ipv4, patch from Liping Zhang.
12) Introduce --nflog-size to indicate amount of network packet bytes
that are copied to userspace via log message, from Vishwanath Pai.
This obsoletes --nflog-range that has never worked, it was designed
to achieve this but it has never worked.
13) Introduce generic macros for nf_tables object generation masks.
14) Use generation mask in table, chain and set objects in nf_tables.
This allows fixes interferences with ongoing preparation phase of
the commit protocol and object listings going on at the same time.
This update is introduced in three patches, one per object.
15) Check if the object is active in the next generation for element
deactivation in the rbtree implementation, given that deactivation
happens from the commit phase path we have to observe the future
status of the object.
16) Support for deletion of just added elements in the hash set type.
17) Allow to resize hashtable from /proc entry, not only from the
obscure /sys entry that maps to the module parameter, from Florian
Westphal.
18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
anymore since we tear down the ruleset whenever the netdevice
goes away.
19) Support for matching inverted set lookups, from Arturo Borrero.
20) Simplify the iptables_mangle_hook() by removing a superfluous
extra branch.
21) Introduce ether_addr_equal_masked() and use it from the netfilter
codebase, from Joe Perches.
22) Remove references to "Use netfilter MARK value as routing key"
from the Netfilter Kconfig description given that this toggle
doesn't exists already for 10 years, from Moritz Sichert.
23) Introduce generic NF_INVF() and use it from the xtables codebase,
from Joe Perches.
24) Setting logger to NONE via /proc was not working unless explicit
nul-termination was included in the string. This fixes seems to
leave the former behaviour there, so we don't break backward.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Data structures that are used both with and without RCU protection
are difficult to write in a sparse-clean manner. If you mark the
relevant pointers with __rcu, sparse will complain about all non-RCU
uses, but if you don't mark those pointers, sparse will complain about
all RCU uses.
This commit therefore suppresses sparse warnings for rcu_dereference_raw(),
allowing mixed-protection data structures to avoid these warnings.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Implement an RCU-safe variant of rb_replace_node() and rearrange
rb_replace_node() to do things in the same order.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
L2 upper device needs to propagate neigh_construct/destroy calls down to
lower devices. Do this by defining default ndo functions and use them in
team, bond, bridge and vlan.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the following patch will allow upper devices to follow the call down
lower devices, we need to add dev here and not rely on n->dev.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement etrhtool set_rxnfc callback to support ethtool flow spec
direct steering. This patch adds only the support of ether flow type
spec. L3/L4 flow specs support will be added in downstream patches.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having all steering private name spaces and
steering module fields flat in mlx5_core_priv, we wrap
them in mlx5_flow_steering for better modularity and
API exposure.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the set of arguments passed to mlx5_add_flow_rule
by introducing flow_spec structure.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functions that iterate over lower devices and find port device.
As a dependency add netdev_for_each_all_lower_dev and
netdev_for_each_all_lower_dev_rcu macro with
netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers.
Also, add functions to return mlxsw struct according to lower device
found and mlxsw_port struct with a reference to lower device.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter uses multiple FWINV #defines with identical form that hide a
specific structure variable and dereference it with a invflags member.
$ git grep "#define FWINV"
include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))
Consolidate these macros into a single NF_INVF macro.
Miscellanea:
o Neaten the alignment around these uses
o A few lines are > 80 columns for intelligibility
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
To allow creating more than one netdev over the same PCI function, we
change the driver such that global NIC resources are created once and
later be shared amongst all the mlx5e netdevs running over that port.
Move the CQ UAR, PD (pdn), Transport Domain (tdn), MKey resources from
being kept in the mlx5e priv part to a new resources structure
(mlx5e_resources) placed under the mlx5_core device.
This patch doesn't add any new functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new namespace (MLX5_FLOW_NAMESPACE_OFFLOADS) to be populated
with flow steering rules that deal with rules that have have to
be executed before the EN NIC steering rules are matched.
The namespace is located after the bypass name-space and before the
kernel name-space. Therefore, it precedes the HW processing done for
rules set for the kernel NIC name-space.
Under SRIOV, it would allow us to match on e-switch missed packet
and forward them to the relevant VF representor TIR.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper function to get a cgroup2 from a fd. It will be
stored in a bpf array (BPF_MAP_TYPE_CGROUP_ARRAY) which will
be introduced in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to commit 9b368814b3 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
they want packets going in both directions on a flow to hash to the
same bucket.
The core kernel SKB hash became non-symmetric when the ipv6 flow label
and other entities were incorporated into the standard flow hash order
to increase entropy.
But there are no users of PACKET_FANOUT_HASH who want an assymetric
hash, they all want a symmetric one.
Therefore, use the flow dissector to compute a flat symmetric hash
over only the protocol, addresses and ports. This hash does not get
installed into and override the normal skb hash, so this change has
no effect whatsoever on the rest of the stack.
Reported-by: Eric Leblond <eric@regit.org>
Tested-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since bpf_prog_get() and program type check is used in a couple of places,
refactor this into a small helper function that we can make use of. Since
the non RO prog->aux part is not used in performance critical paths and a
program destruction via RCU is rather very unlikley when doing the put, we
shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
check, but actually not taking the ref at all (due to being in fdget() /
fdput() section of the bpf fd) is even cleaner and makes the diff smaller
as well, so just go for that. Callsites are changed to make use of the new
helper where possible.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jann Horn reported following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race, to quote his
event timeline:
- Set up a process with threads T1, T2 and T3
- Let T1 set up a socket filter F1 that invokes another filter F2
through a BPF map [tail call]
- Let T1 trigger the socket filter via a unix domain socket write,
don't wait for completion
- Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
- Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
- Let T3 close the file descriptor for F2, dropping the reference
count of F2 to 2
- At this point, T1 should have looked up F2 from the map, but not
finished executing it
- Let T3 remove F2 from the BPF map, dropping the reference count of
F2 to 1
- Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
via schedule_work()
- At this point, the BPF program could be freed
- BPF execution is still running in a freed BPF program
While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
event fd we're doing the syscall on doesn't disappear from underneath us
for whole syscall time, it may not be the case for the bpf fd used as
an argument only after we did the put. It needs to be a valid fd pointing
to a BPF program at the time of the call to make the bpf_prog_get() and
while T2 gets preempted, F2 must have dropped reference to 1 on the other
CPU. The fput() from the close() in T3 should also add additionally delay
to the reference drop via exit_task_work() when bpf_prog_release() gets
called as well as scheduling bpf_prog_free_deferred().
That said, it makes nevertheless sense to move the BPF prog destruction
generally after RCU grace period to guarantee that such scenario above,
but also others as recently fixed in ceb5607035 ("bpf, perf: delay release
of BPF prog after grace period") with regards to tail calls won't happen.
Integrating bpf_prog_free_deferred() directly into the RCU callback is
not allowed since the invocation might happen from either softirq or
process context, so we're not permitted to block. Reviewing all bpf_prog_put()
invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
their destruction) with call_rcu() look good to me.
Since we don't know whether at the time of attaching the program, we're
already part of a tail call map, we need to use RCU variant. However, due
to this, there won't be severely more stress on the RCU callback queue:
situations with above bpf_prog_get() and bpf_prog_put() combo in practice
normally won't lead to releases, but even if they would, enough effort/
cycles have to be put into loading a BPF program into the kernel already.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are a number of small USB and PHY driver fixes for 4.7-rc6.
Nothing major here, all are described in the shortlog below. All have
been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iFYEABECABYFAld2ljcPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspN+sAn28Z
+GbDzyY5XwZ7Qs5/5uPYoGONAKCObDpgZr1A+/EMvDd57gfHWmFqTw==
=uYl3
-----END PGP SIGNATURE-----
Merge tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and PHY fixes from Greg KH:
"Here are a number of small USB and PHY driver fixes for 4.7-rc6.
Nothing major here, all are described in the shortlog below. All have
been in linux-next with no reported issues"
* tag 'usb-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: don't free bandwidth_mutex too early
USB: EHCI: declare hostpc register as zero-length array
phy-sun4i-usb: Fix irq free conditions to match request conditions
phy: bcm-ns-usb2: checking the wrong variable
phy-sun4i-usb: fix missing __iomem *
phy: phy-sun4i-usb: Fix optional gpios failing probe
phy: rockchip-dp: fix return value check in rockchip_dp_phy_probe()
phy: rcar-gen3-usb2: fix unexpected repeat interrupts of VBUS change
usb: common: otg-fsm: add license to usb-otg-fsm
There are code duplications of a masked ethernet address comparison here
so make it a separate function instead.
Miscellanea:
o Neaten alignment of FWINV macro uses to make it clearer for the reader
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The current implementation does not handle timeout in case of command
with callback request, and this can lead to deadlock if the command
doesn't get fw response.
Add delayed callback timeout work before posting the command to fw.
In case of real fw command completion we will cancel the delayed work.
In case of fw command timeout the callback timeout handler will be
called and it will simulate fw completion with timeout error.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to queue tx packets in sk_receive_queue, this is less
efficient since it requires spinlocks to synchronize between producer
and consumer.
This patch tries to address this by:
- switch from sk_receive_queue to a skb_array, and resize it when
tx_queue_len was changed.
- introduce a new proto_ops peek_len which was used for peeking the
skb length.
- implement a tun version of peek_len for vhost_net to use and convert
vhost_net to use peek_len if possible.
Pktgen test shows about 15.3% improvement on guest receiving pps for small
buffers:
Before: ~1300000pps
After : ~1500000pps
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this
will be triggered when tx_queue_len. It could be used by net device
who want to do some processing at that time. An example is tun who may
want to resize tx array when tx_queue_len is changed.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes, we need support resizing multiple queues at once. This is
because it was not easy to recover to recover from a partial failure
of multiple queues resizing.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes, we need zero length ring. But current code will crash since
we don't do any check before accessing the ring. This patch fixes this.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One more fix for some fallout observed after the introduction of the
atomic API.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXdODRAAoJEN0jrNd/PrOhGF0P/Ay5Vmf5oypPjY/2L/SE5gX4
RWDt1/UA4HqhuWU56DmJbYQVPl8f0rrZF2Dnp/xrNudm4H6ivYHUXQ2HxYykmTQJ
DU1ciqj5oeTpXfe+fqEC4jnrti3w3rjIGmiOtFT3iDrlqsPyNlk/ljoPCx3PIZ7C
J3Ip9uhe4dRR1uFrM6UN/dqBT7ipyOA2L1TM0DJoDMc0BC1i+mIiLyHagOoR6C6k
DuIuJvZIH/bGF3xd/U2bNxDHcXdiMhCRwnpot9XOUgh1yEwkuwJyTFS24ElYn8Ce
AcWkZUAbRI8MZxEGtW6EV9Pni9ueB8QUFdkttuuRrzkfzi/wYSG7YSetgwYSfWE9
InVbWtmb2MIEjoAcgk8hRtNeCTLchvBVZRywf5H15a5KcPW0kblm0H87tOcbdy5v
qncu+DNu7cj4aTLF670toczJRzA09yHHusQDJzwcLAmL8Yj1ktiXJLwLcgOveNDf
522uuS2DtEm0XLkvQ7L7dc1xD0dmM+exelFmwxrh6jrGHzQCqjj2MDpvENffFFYw
+pVYpbUXQhtwd1T+IeIDazpelhmELT2d17KiaCgFuUi5OzOo/bhHG0vrs7+0qs8x
gyfYY++rdvRq8OTYplminUohsCHasJthO/0UTW/a/vYwf1JxbJkAaISYAMbcAOe+
PHuI115cjKJr59iuPRyg
=cXTI
-----END PGP SIGNATURE-----
Merge tag 'pwm/for-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm fixes from Thierry Reding:
"One more fix for some fallout observed after the introduction of the
atomic API"
* tag 'pwm/for-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Fix pwm_apply_args()
- Use new reset_*_get_shared() variant to prevent reset line obtainment failure
- Fixes: 0b52297 ("reset: Add support for shared reset controls")
- Fix unintentional switch() fall-through into error path
- Fix uninitialised variable compiler warning
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXdMAeAAoJEFGvii+H/Hdh2JsQAJ5ez/+WlfVY5enmZTCPTfjv
YAZOhGuKyK1RcNKVmhRTXKDDlEQtbzX01AG73UOsMuR6Gjfq4q+lsK88ybE6Y3MS
YuIBa0TXicyywbLhAh4bCqbImFQZ2MhU07mJWn5n6qX1LwrxncKN5MdRj2DKzldv
s9AG1EKUKf/83k6kuODaP8dXutvWPEGKYKXnq9MQPBliU2j7DT7mfEp3jLGlJvXy
UkW2uq1RAhu27fe7TKVefatXLIyqo74fi3fuHVZpWfPsoUTFlQd3RzeFL7Q/orWH
JqiUk55xAbsmtkEeKNbpph1gPyR1pzKKHiNvSfRt/yIjsxOVPGZQZA2y4OpIcOKV
mbxPUrurhcdI+sK2+7JpcbjKYJpCsGgCXtwbLubLcgler+RatmFWOB8+ayH32eVC
DELYFtc80YBaDS+2KrgY4la1Mk1agbfDpyhq4Vp5uiRZFhQyldd1XrPPQssMJNc+
Zme+LRu+EQiullbHDRb6cb0dnV2Tcek4IjI1CsNA6UMAMLcVb8IZ1gX39vWr+BNt
ZWxxoBOcskPJ/t3h4ddPbuMzSA0ctBkJct9bdjg+bpZeA0aJ46lTQCJk4fZfjViu
YcDgO8/ZVaDLz5f4f44xJ+Hwnzxr2fezBUirD3UtPvs57BSVnFbHMBXUdy0NCBHk
mZi1lFOjy7eKuW1+kbrx
=mo7m
-----END PGP SIGNATURE-----
Merge tag 'mfd-fixes-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fixes from Lee Jones:
"Contained are some standard fixes and unusually an extension to the
Reset API. Some of those changes are required to fix a bug introduced
in -rc1, which introduces extra 'reset line checks' i.e. whether the
line is shared or not. If a line is shared and the new *_shared() API
is not used, the request fails with an error. This breaks USB in v4.7
for ST's platforms.
Admittedly, there are some patches contained in our (MFD/Reset)
immutable branch which are not true -fixes, but there isn't anything I
can do about that. Rest assured though, there aren't any API
'changes'. Everything is the same from the consumer's perspective.
- Use new reset_*_get_shared() variant to prevent reset line
obtainment failure (Fixes commit 0b52297f22: "reset: Add support
for shared reset controls")
- Fix unintentional switch() fall-through into error path
- Fix uninitialised variable compiler warning"
* tag 'mfd-fixes-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: da9053: Fix compiler warning message for uninitialised variable
mfd: max77620: Fix FPS switch statements
phy: phy-stih407-usb: Inform the reset framework that our reset line may be shared
usb: dwc3: st: Inform the reset framework that our reset line may be shared
usb: host: ehci-st: Inform the reset framework that our reset line may be shared
usb: host: ohci-st: Inform the reset framework that our reset line may be shared
reset: TRIVIAL: Add line break at same place for similar APIs
reset: Supply *_shared variant calls when using *_optional APIs
reset: Supply *_shared variant calls when using of_* API
reset: Ensure drivers are explicit when requesting reset lines
reset: Reorder inline reset_control_get*() wrappers
Previously, the action frames to group address was not encrypted. But
[1] "Table 8-38 Category values" indicates "Mesh" and "Multihop" category
action frames should be encrypted (Group addressed privacy == yes). And the
encyption key should be MGTK ([1] 10.13 Group addressed robust management frame
procedures). So this patch modifies the code to make it suitable for spec.
[1] IEEE Std 802.11-2012
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix compiler warning caused by an uninitialised variable inside
da9052_group_write() function. Defaulting the value to zero covers
the trivial case.
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Pull audit fixes from Paul Moore:
"Two small patches to fix audit problems in 4.7-rcX: the first fixes a
potential kref leak, the second removes some header file noise.
The first is an important bug fix that really should go in before 4.7
is released, the second is not critical, but falls into the very-nice-
to-have category so I'm including in the pull request.
Both patches are straightforward, self-contained, and pass our
testsuite without problem"
* 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_get_tty to reduce scope and kabi changes
audit: move calcs after alloc and check when logging set loginuid