Commit graph

36499 commits

Author SHA1 Message Date
William Manley
cab70040df net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3
If an IGMP join packet is lost you will not receive data sent to the
multicast group so if no data arrives from that multicast group in a
period of time after the IGMP join a second IGMP join will be sent.  The
delay between joins is the "IGMP Unsolicited Report Interval".

Previously this value was hard coded to be chosen randomly between 0-10s.
This can be too long for some use-cases, such as IPTV as it can cause
channel change to be slow in the presence of packet loss.

The value 10s has come from IGMPv2 RFC2236, which was reduced to 1s in
IGMPv3 RFC3376.  This patch makes the kernel use the 1s value from the
later RFC if we are operating in IGMPv3 mode.  IGMPv2 behaviour is
unaffected.

Tested with Wireshark and a simple program to join a (non-existent)
multicast group.  The distribution of timings for the second join differ
based upon setting /proc/sys/net/ipv4/conf/eth0/force_igmp_version.

Signed-off-by: William Manley <william.manley@youview.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:27:46 -07:00
Eric Dumazet
e11aada32b net: flow_dissector: add 802.1ad support
Same behavior than 802.1q : finds the encapsulated protocol and
skip 32bit header.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:06:23 -07:00
Timo Teräs
77a482bdb2 ip_gre: fix ipgre_header to return correct offset
Fix ipgre_header() (header_ops->create) to return the correct
amount of bytes pushed. Most callers of dev_hard_header() seem
to care only if it was success, but af_packet.c uses it as
offset to the skb to copy from userspace only once. In practice
this fixes packet socket sendto()/sendmsg() to gre tunnels.

Regression introduced in c544193214
("GRE: Refactor GRE tunneling code.")

Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09 11:05:24 -07:00
Simon Wunderlich
ab1e8ad3b4 mac80211: fix ieee80211_sta_process_chanswitch for 5/10 MHz channels
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-09 15:16:39 +02:00
Chun-Yeow Yeoh
0670307992 mac80211: allow lowest basic rate for unicast management for mesh
Allow lowest basic rate to be used for unicast management frame in
mesh. Otherwise, the lowest supported rate is used for unicast
management frame, such as 1Mbps for 2.4GHz and 6Mbps for 5GHz. Rename
the rc_send_low_broadcast to re_send_low_basicrate since now it is
also applied to unicast management frame in mesh.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-09 15:11:27 +02:00
Florian Westphal
c655bc6896 netfilter: nf_conntrack: don't send destroy events from iterator
Let nf_ct_delete handle delivery of the DESTROY event.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-09 12:03:33 +02:00
Eric Dumazet
1f07d03e20 net: add SNMP counters tracking incoming ECN bits
With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
counters do not count the number of frames, but number of aggregated
segments.

Its probably too late to change this now.

This patch adds four new counters, tracking number of frames, regardless
of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.

Ip[6]NoECTPkts : Number of packets received with NOECT
Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
Ip[6]CEPkts    : Number of packets received with Congestion Experienced

lph37:~# nstat | egrep "Pkts|InReceive"
IpInReceives                    1634137            0.0
Ip6InReceives                   3714107            0.0
Ip6InNoECTPkts                  19205              0.0
Ip6InECT0Pkts                   52651828           0.0
IpExtInNoECTPkts                33630              0.0
IpExtInECT0Pkts                 15581379           0.0
IpExtInCEPkts                   6                  0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-08 22:24:59 -07:00
Tejun Heo
d99c8727e7 cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroup
cgroup is in the process of converting to css (cgroup_subsys_state)
from cgroup as the principal subsystem interface handle.  This is
mostly to prepare for the unified hierarchy support where css's will
be created and destroyed dynamically but also helps cleaning up
subsystem implementations as css is usually what they are interested
in anyway.

cgroup_taskset which is used by the subsystem attach methods is the
last cgroup subsystem API which isn't using css as the handle.  Update
cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.

The conversions are pretty mechanical.  One exception is
cpuset::cgroup_cs(), which lost its last user and got removed.

This patch shouldn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08 20:11:27 -04:00
Tejun Heo
182446d087 cgroup: pass around cgroup_subsys_state instead of cgroup in file methods
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.

This patch converts all cftype file operations to take @css instead of
@cgroup.  cftypes for the cgroup core files don't have their subsytem
pointer set.  These will automatically use the dummy_css added by the
previous patch and can be converted the same way.

Most subsystem conversions are straight forwards but there are some
interesting ones.

* freezer: update_if_frozen() is also converted to take @css instead
  of @cgroup for consistency.  This will make the code look simpler
  too once iterators are converted to use css.

* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
  vmpressure while mem_cgroup_from_cont() can be made static.
  Updated accordingly.

* cpu: cgroup_tg() doesn't have any user left.  Removed.

* cpuacct: cgroup_ca() doesn't have any user left.  Removed.

* hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
  Removed.

* net_cls: cgrp_cls_state() doesn't have any user left.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08 20:11:24 -04:00
Tejun Heo
eb95419b02 cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.

* With unified hierarchy, subsystems will be dynamically bound and
  unbound from cgroups and thus css's (cgroup_subsys_state) may be
  created and destroyed dynamically over the lifetime of a cgroup,
  which is different from the current state where all css's are
  allocated and destroyed together with the associated cgroup.  This
  in turn means that cgroup_css() should be synchronized and may
  return NULL, making it more cumbersome to use.

* Differing levels of per-subsystem granularity in the unified
  hierarchy means that the task and descendant iterators should behave
  differently depending on the specific subsystem the iteration is
  being performed for.

* In majority of the cases, subsystems only care about its part in the
  cgroup hierarchy - ie. the hierarchy of css's.  Subsystem methods
  often obtain the matching css pointer from the cgroup and don't
  bother with the cgroup pointer itself.  Passing around css fits
  much better.

This patch converts all cgroup_subsys methods to take @css instead of
@cgroup.  The conversions are mostly straight-forward.  A few
noteworthy changes are

* ->css_alloc() now takes css of the parent cgroup rather than the
  pointer to the new cgroup as the css for the new cgroup doesn't
  exist yet.  Knowing the parent css is enough for all the existing
  subsystems.

* In kernel/cgroup.c::offline_css(), unnecessary open coded css
  dereference is replaced with local variable access.

This patch shouldn't cause any behavior differences.

v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that ->css_free() invocation added by da0a12caff ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too.  Suggested
    by Li Zefan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08 20:11:23 -04:00
Tejun Heo
6387698699 cgroup: add css_parent()
Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css.  cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.

This patch implements css_parent() which, given a css, returns its
parent.  The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.

freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup->parent
directly.

* __parent_ca() is dropped from cpuacct and its usage is replaced with
  parent_ca().  The only difference between the two was NULL test on
  cgroup->parent which is now embedded in css_parent() making the
  distinction moot.  Note that eventually a css->parent field will be
  added to css and the NULL check in css_parent() will go away.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08 20:11:23 -04:00
Tejun Heo
a7c6d554aa cgroup: add/update accessors which obtain subsys specific data from css
css (cgroup_subsys_state) is usually embedded in a subsys specific
data structure.  Subsystems either use container_of() directly to cast
from css to such data structure or has an accessor function wrapping
such cast.  As cgroup as whole is moving towards using css as the main
interface handle, add and update such accessors to ease dealing with
css's.

All accessors explicitly handle NULL input and return NULL in those
cases.  While this looks like an extra branch in the code, as all
controllers specific data structures have css as the first field, the
casting doesn't involve any offsetting and the compiler can trivially
optimize out the branch.

* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
  accessor.  Added.

* memory, hugetlb and devices already had one but didn't explicitly
  handle NULL input.  Updated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08 20:11:23 -04:00
Tejun Heo
6d37b97428 netprio_cgroup: pass around @css instead of @cgroup and kill struct cgroup_netprio_state
cgroup controller API will be converted to primarily use struct
cgroup_subsys_state instead of struct cgroup.  In preparation, make
the internal functions of netprio_cgroup pass around @css instead of
@cgrp.

While at it, kill struct cgroup_netprio_state which only contained
struct cgroup_subsys_state without serving any purpose.  All functions
are converted to deal with @css directly.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
2013-08-08 20:11:22 -04:00
Tejun Heo
8af01f56a0 cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/
The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.

We're about to revamp large portion of cgroup API, so, let's rename
them so that they're less awkward.  Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.

Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css().  This patch is pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08 20:11:22 -04:00
John W. Linville
1826ff2357 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-08-08 13:12:42 -04:00
Hannes Frederic Sowa
3e3be27585 ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match
In case a subtree did not match we currently stop backtracking and return
NULL (root table from fib_lookup). This could yield in invalid routing
table lookups when using subtrees.

Instead continue to backtrack until a valid subtree or node is found
and return this match.

Also remove unneeded NULL check.

Reported-by: Teco Boot <teco@inf-net.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Lamparter <equinox@diac24.net>
Cc: <boutier@pps.univ-paris-diderot.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 17:17:19 -07:00
David S. Miller
09effa67a1 packet: Revert recent header parsing changes.
This reverts commits:

0f75b09c79
cbd89acb9e
c483e02614

Amongst other things, it's modifies the SKB header
to pull the ethernet headers off via eth_type_trans()
on the output path which is bogus.

It's causing serious regressions for people.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 17:11:00 -07:00
Jason Wang
3d9953a2ef net: use skb_copy_datagram_from_iovec() in zerocopy_sg_from_iovec()
Use skb_copy_datagram_from_iovec() to avoid code duplication and make it easy to
be read. Also we can do the skipping inside the zero-copy loop.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:38 -07:00
Jason Wang
0433547aa7 net: use release_pages() in zerocopy_sg_from_iovec()
To reduce the duplicated codes.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:38 -07:00
Jason Wang
234a426708 net: remove the useless comment in zerocopy_sg_from_iovec()
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:37 -07:00
Jason Wang
0a57ec62df net: use skb_fill_page_desc() in zerocopy_sg_from_iovec()
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:35 -07:00
Jason Wang
c3bdeb5c7c net: move zerocopy_sg_from_iovec() to net/core/datagram.c
To let it be reused and reduce code duplication. Also document this function.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:34 -07:00
Jason Wang
b4bf07771f net: move iov_pages() to net/core/iovec.c
To let it be reused and reduce code duplication.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:52:33 -07:00
stephen hemminger
6261d983f2 ip_tunnel: embed hash list head
The IP tunnel hash heads can be embedded in the per-net structure
since it is a fixed size. Reduce the size so that the total structure
fits in a page size. The original size was overly large, even NETDEV_HASHBITS
is only 8 bits!

Also, add some white space for readability.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 16:47:52 -07:00
Trond Myklebust
786615bc1c SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister
If rpcbind causes our connection to the AF_LOCAL socket to close after
we've registered a service, then we want to be careful about reconnecting
since the mount namespace may have changed.

By simply refusing to reconnect the AF_LOCAL socket in the case of
unregister, we avoid the need to somehow save the mount namespace. While
this may lead to some services not unregistering properly, it should
be safe.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.9.x
2013-08-07 17:07:18 -04:00
Eric Dumazet
cd6b423afd tcp: cubic: fix bug in bictcp_acked()
While investigating about strange increase of retransmit rates
on hosts ~24 days after boot, Van found hystart was disabled
if ca->epoch_start was 0, as following condition is true
when tcp_time_stamp high order bit is set.

(s32)(tcp_time_stamp - ca->epoch_start) < HZ

Quoting Van :

 At initialization & after every loss ca->epoch_start is set to zero so
 I believe that the above line will turn off hystart as soon as the 2^31
 bit is set in tcp_time_stamp & hystart will stay off for 24 days.
 I think we've observed that cubic's restart is too aggressive without
 hystart so this might account for the higher drop rate we observe.

Diagnosed-by: Van Jacobson <vanj@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 10:35:08 -07:00
Wang Sheng-Hui
15401946f9 bridge: correct the comment for file br_sysfs_br.c
br_sysfs_if.c is for sysfs attributes of bridge ports, while br_sysfs_br.c
is for sysfs attributes of bridge itself. Correct the comment here.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 10:35:06 -07:00
Eric Dumazet
2ed0edf909 tcp: cubic: fix overflow error in bictcp_update()
commit 17a6e9f1aa ("tcp_cubic: fix clock dependency") added an
overflow error in bictcp_update() in following code :

/* change the unit from HZ to bictcp_HZ */
t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) -
      ca->epoch_start) << BICTCP_HZ) / HZ;

Because msecs_to_jiffies() being unsigned long, compiler does
implicit type promotion.

We really want to constrain (tcp_time_stamp - ca->epoch_start)
to a signed 32bit value, or else 't' has unexpected high values.

This bugs triggers an increase of retransmit rates ~24 days after
boot [1], as the high order bit of tcp_time_stamp flips.

[1] for hosts with HZ=1000

Big thanks to Van Jacobson for spotting this problem.

Diagnosed-by: Van Jacobson <vanj@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07 10:35:05 -07:00
Fan Du
af83fde751 xfrm: Remove rebundant address family checking
present_and_same_family has checked addresses family validness for both
SADB_EXT_ADDRESS_SRC and SADB_EXT_ADDRESS_DST in the beginning.
Thereafter pfkey_sadb_addr2xfrm_addr doesn't need to do the checking again.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-07 10:12:58 +02:00
Daniel Borkmann
54e35cc523 ipvs: ip_vs_sh: ip_vs_sh_get_port: check skb_header_pointer for NULL
skb_header_pointer could return NULL, so check for it as we do it
everywhere else in ipvs code. This fixes a coverity warning.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-07 08:57:57 +09:00
Johannes Berg
e7f1935c11 wireless: make TU conversion macros available
A few places in the code (mac80211 and iwlmvm) use the same
TU_TO_JIFFIES() macro and could use TU_TO_EXP_TIME() that
mac80211 has. Make these available to everyone and use them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-06 11:00:59 +02:00
Dragos Foianu
70e3ca79cd ipvs: fixed spacing at for statements
found using checkpatch.pl

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-06 15:09:58 +09:00
Fan Du
0659eea912 xfrm: Delete hold_timer when destroy policy
Both policy timer and hold_timer need to be deleted when destroy policy

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-06 06:59:18 +02:00
Trond Myklebust
00326ed644 SUNRPC: Don't auto-disconnect from the local rpcbind socket
There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.9.x
2013-08-05 22:17:00 -04:00
Linus Lüssing
248ba8ec05 bridge: don't try to update timers in case of broken MLD queries
Currently we are reading an uninitialized value for the max_delay
variable when snooping an MLD query message of invalid length and would
update our timers with that.

Fixing this by simply ignoring such broken MLD queries (just like we do
for IGMP already).

This is a regression introduced by:
"bridge: disable snooping if there is no querier" (b00589af3b)

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 15:43:39 -07:00
Eric Dumazet
aab515d7c3 fib_trie: remove potential out of bound access
AddressSanitizer [1] dynamic checker pointed a potential
out of bound access in leaf_walk_rcu()

We could allocate one more slot in tnode_new() to leave the prefetch()
in-place but it looks not worth the pain.

Bug added in commit 82cfbb0085 ("[IPV4] fib_trie: iterator recode")

[1] :
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 15:26:11 -07:00
Veaceslav Falico
fc7f8f5c53 neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so
populate them before calling it.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 15:19:04 -07:00
Daniel Borkmann
7921895a5e net: esp{4,6}: fix potential MTU calculation overflows
Commit 91657eafb ("xfrm: take net hdr len into account for esp payload
size calculation") introduced a possible interger overflow in
esp{4,6}_get_mtu() handlers in case of x->props.mode equals
XFRM_MODE_TUNNEL. Thus, the following expression will overflow

  unsigned int net_adj;
  ...
  <case ipv{4,6} XFRM_MODE_TUNNEL>
         net_adj = 0;
  ...
  return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
           net_adj) & ~(align - 1)) + (net_adj - 2);

where (net_adj - 2) would be evaluated as <foo> + (0 - 2) in an unsigned
context. Fix it by simply removing brackets as those operations here
do not need to have special precedence.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Benjamin Poirier <bpoirier@suse.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 12:26:50 -07:00
nikolay@redhat.com
07ce76aa9b net_sched: make dev_trans_start return vlan's real dev trans_start
Vlan devices are LLTX and don't update their own trans_start, so if
dev_trans_start has to be called with a vlan device then 0 or a stale
value will be returned. Currently the bonding is the only such user, and
it's needed for proper arp monitoring when the slaves are vlans.
Fix this by extracting the vlan's real device trans_start.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 12:17:42 -07:00
nikolay@redhat.com
0369722f02 vlan: make vlan_dev_real_dev work over stacked vlans
Sometimes we might have stacked vlans on top of each other, and we're
interested in the first non-vlan real device on the path, so transform
vlan_dev_real_dev to go over the stacked vlans and extract the first
non-vlan device.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 12:17:42 -07:00
Julia Lawall
d9af2d67e4 net/vmw_vsock/af_vsock.c: drop unneeded semicolon
Drop the semicolon at the end of the list_for_each_entry loop header.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05 11:07:44 -07:00
Dan Carpenter
e4d091d7bf netfilter: nfnetlink_{log,queue}: fix information leaks in netlink message
These structs have a "_pad" member.  Also the "phw" structs have an 8
byte "hw_addr[]" array but sometimes only the first 6 bytes are
initialized.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05 17:36:04 +02:00
Florian Westphal
d8b3bfc253 netfilter: tproxy: fix build with IP6_NF_IPTABLES=n
after commit 93742cf (netfilter: tproxy: remove nf_tproxy_core.h)

CONFIG_IPV6=y
CONFIG_IP6_NF_IPTABLES=n

gives us:

net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6':
net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05 12:57:38 +02:00
Mathias Krause
8603b9556e af_key: constify lookup tables
The lookup tables for minimum sizes of extensions and for the pfkey
handler functions are read only, therefore can be const.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-05 11:14:00 +02:00
Mathias Krause
e473fcb472 xfrm: constify mark argument of xfrm_find_acq()
The mark argument is read only, so constify it. Also make dummy_mark in
af_key const -- only used as dummy argument for this very function.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-05 11:13:53 +02:00
stephen hemminger
762a3d89eb bridge: fix rcu check warning in multicast port group
Use of RCU here with out marked pointer and function doesn't match prototype
with sparse.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04 18:41:52 -07:00
David S. Miller
0e76a3a587 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 21:36:46 -07:00
Linus Torvalds
72a67a94bc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't ignore user initiated wireless regulatory settings on cards
    with custom regulatory domains, from Arik Nemtsov.

 2) Fix length check of bluetooth information responses, from Jaganath
    Kanakkassery.

 3) Fix misuse of PTR_ERR in btusb, from Adam Lee.

 4) Handle rfkill properly while iwlwifi devices are offline, from
    Emmanuel Grumbach.

 5) Fix r815x devices DMA'ing to stack buffers, from Hayes Wang.

 6) Kernel info leak in ATM packet scheduler, from Dan Carpenter.

 7) 8139cp doesn't check for DMA mapping errors, from Neil Horman.

 8) Fix bridge multicast code to not snoop when no querier exists,
    otherwise mutlicast traffic is lost.  From Linus Lüssing.

 9) Avoid soft lockups in fib6_run_gc(), from Michal Kubecek.

10) Fix races in automatic address asignment on ipv6, which can result
    in incorrect lifetime assignments.  From Jiri Benc.

11) Cure build bustage when CONFIG_NET_LL_RX_POLL is not set and rename
    it CONFIG_NET_RX_BUSY_POLL to eliminate the last reference to the
    original naming of this feature.  From Cong Wang.

12) Fix crash in TIPC when server socket creation fails, from Ying Xue.

13) macvlan_changelink() silently succeeds when it shouldn't, from
    Michael S Tsirkin.

14) HTB packet scheduler can crash due to sign extension, fix from
    Stephen Hemminger.

15) With the cable unplugged, r8169 prints out a message every 10
    seconds, make it netif_dbg() instead of netif_warn().  From Peter
    Wu.

16) Fix memory leak in rtm_to_ifaddr(), from Daniel Borkmann.

17) sis900 gets spurious TX queue timeouts due to mismanagement of link
    carrier state, from Denis Kirjanov.

18) Validate somaxconn sysctl to make sure it fits inside of a u16.
    From Roman Gushchin.

19) Fix MAC address filtering on qlcnic, from Shahed Shaikh.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
  qlcnic: Fix for flash update failure on 83xx adapter
  qlcnic: Fix link speed and duplex display for 83xx adapter
  qlcnic: Fix link speed display for 82xx adapter
  qlcnic: Fix external loopback test.
  qlcnic: Removed adapter series name from warning messages.
  qlcnic: Free up memory in error path.
  qlcnic: Fix ingress MAC learning
  qlcnic: Fix MAC address filter issue on 82xx adapter
  net: ethernet: davinci_emac: drop IRQF_DISABLED
  netlabel: use domain based selectors when address based selectors are not available
  net: check net.core.somaxconn sysctl values
  sis900: Fix the tx queue timeout issue
  net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails
  r8169: remove "PHY reset until link up" log spam
  net: ethernet: cpsw: drop IRQF_DISABLED
  htb: fix sign extension bug
  macvlan: handle set_promiscuity failures
  macvlan: better mode validation
  tipc: fix oops when creating server socket fails
  net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
  ...
2013-08-03 15:00:23 -07:00
Linus Torvalds
83aaf3b39c Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "Most of this is due to a screwup on my part -- some gss-proxy crashes
  got fixed before the merge window but somehow never made it out of a
  temporary git repo on my laptop...."

* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
  svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcall
  svcrpc: fix kfree oops in gss-proxy code
  svcrpc: fix gss-proxy xdr decoding oops
  svcrpc: fix gss_rpc_upcall create error
  NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.
2013-08-03 11:15:03 -07:00
Stefan Tomanek
73f5698e77 fib_rules: fix suppressor names and default values
This change brings the suppressor attribute names into line; it also changes
the data types to provide a more consistent interface.

While -1 indicates that the suppressor is not enabled, values >= 0 for
suppress_prefixlen or suppress_ifgroup  reject routing decisions violating the
constraint.

This changes the previously presented behaviour of suppress_prefixlen, where a
prefix length _less_ than the attribute value was rejected. After this change,
a prefix length less than *or* equal to the value is considered a violation of
the rule constraint.

It also changes the default values for default and newly added rules (disabling
any suppression for those).

Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 10:40:23 -07:00