Commit graph

35473 commits

Author SHA1 Message Date
Jeff Layton
3999e49364 locks: add a new "lm_owner_key" lock operation
Currently, the hashing that the locking code uses to add these values
to the blocked_hash is simply calculated using fl_owner field. That's
valid in most cases except for server-side lockd, which validates the
owner of a lock based on fl_owner and fl_pid.

In the case where you have a small number of NFS clients doing a lot
of locking between different processes, you could end up with all
the blocked requests sitting in a very small number of hash buckets.

Add a new lm_owner_key operation to the lock_manager_operations that
will generate an unsigned long to use as the key in the hashtable.
That function is only implemented for server-side lockd, and simply
XORs the fl_owner and fl_pid.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:45 +04:00
Jeff Layton
139ca04ee5 locks: convert fl_link to a hlist_node
Testing has shown that iterating over the blocked_list for deadlock
detection turns out to be a bottleneck. In order to alleviate that,
begin the process of turning it into a hashtable. We start by turning
the fl_link into a hlist_node and the global lists into hlists. A later
patch will do the conversion of the blocked_list to a hashtable.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:44 +04:00
Jeff Layton
1c8c601a8c locks: protect most of the file_lock handling with i_lock
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the  blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:42 +04:00
Jeff Layton
1cb3601259 locks: comment cleanups and clarifications
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:39 +04:00
Jeff Layton
1a9e64a711 cifs: use posix_unblock_lock instead of locks_delete_block
commit 66189be74 (CIFS: Fix VFS lock usage for oplocked files) exported
the locks_delete_block symbol. There's already an exported helper
function that provides this capability however, so make cifs use that
instead and turn locks_delete_block back into a static function.

Note that if fl->fl_next == NULL then this lock has already been through
locks_delete_block(), so we should be OK to ignore an ENOENT error here
and simply not retry the lock.

Cc: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:38 +04:00
Jeff Layton
f891a29f46 locks: drop the unused filp argument to posix_unblock_lock
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:37 +04:00
Linus Torvalds
da53be12bb Don't pass inode to ->d_hash() and ->d_compare()
Instances either don't look at it at all (the majority of cases) or
only want it to find the superblock (which can be had as dentry->d_sb).
A few cases that want more are actually safe with dentry->d_inode -
the only precaution needed is the check that it hadn't been replaced with
NULL by rmdir() or by overwriting rename(), which case should be simply
treated as cache miss.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:36 +04:00
Al Viro
68d70d03f8 constify rw_verify_area()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:34 +04:00
Al Viro
1bf9d14dff new helper: fixed_size_llseek()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:26 +04:00
Al Viro
0b3fca1fd1 kill find_inode_number()
the only remaining caller (in ncpfs) is guaranteed to return 0 -
we only hit it if we'd just checked that there's no dentry with
such name.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:20 +04:00
David Howells
c77cecee52 Replace a bunch of file->dentry->d_inode refs with file_inode()
Replace a bunch of file->dentry->d_inode refs with file_inode().

In __fput(), use file->f_inode instead so as not to be affected by any tricks
that file_inode() might grow.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:13 +04:00
Al Viro
f4e0c30c19 allow the temp files created by open() to be linked to
O_TMPFILE | O_CREAT => linkat() with AT_SYMLINK_FOLLOW and /proc/self/fd/<n>
as oldpath (i.e. flink()) will create a link
O_TMPFILE | O_CREAT | O_EXCL => ENOENT on attempt to link those guys

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:11 +04:00
Al Viro
60545d0d46 [O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:10 +04:00
Al Viro
ac6614b764 [readdir] constify ->actor
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:05 +04:00
Al Viro
2233f31aad [readdir] ->readdir() is gone
everything's converted to ->iterate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:04 +04:00
Al Viro
5ded75ec4c [readdir] convert ext3
new helper: dir_relax(inode).  Call when you are in location that will
_not_ be invalidated by directory modifications (block boundary, in case
of ext*).  Returns whether the directory has survived (dropping i_mutex
allows rmdir to kill the sucker; if it returns false to us, ->iterate()
is obviously done)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:49 +04:00
Al Viro
5f99f4e79a [readdir] switch dcache_readdir() users to ->iterate()
new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:48 +04:00
Al Viro
bb6f619b3a [readdir] introduce ->iterate(), ctx->pos, dir_emit()
New method - ->iterate(file, ctx).  That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...).  It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.

Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:47 +04:00
Al Viro
5c0ba4e076 [readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:46 +04:00
Al Viro
83a8761142 move linux/loop.h to drivers/block
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:45 +04:00
Al Viro
7995bd2871 splice: don't pass the address of ->f_pos to methods
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-20 19:02:45 +04:00
David Daney
f21afc25f9 smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().
Thanks to commit f91eb62f71 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.

With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function.  This gets called in early as a
result of the time_init() call.  Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:

    WARNING: at init/main.c:560 start_kernel+0x250/0x410()
    Interrupts were enabled early
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
    Call Trace:
      show_stack+0x68/0x80
      warn_slowpath_common+0x78/0xb0
      warn_slowpath_fmt+0x38/0x48
      start_kernel+0x250/0x410

Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore.  Because we
need a flags variable, make it a static inline to avoid name space
issues.

[ Change from v1: Convert on_each_cpu to a static inline function, add
  #include <linux/irqflags.h> to avoid build breakage on some files.

  on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
  on_each_cpu(), but they are not causing !SMP bugs for me, so I will
  defer changing them to a less urgent patch. ]

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-14 19:24:42 -10:00
Linus Torvalds
cb7e9704d5 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fixes from Paul McKenney:
 "I must confess that this past merge window was not RCU's best showing.
  This series contains three more fixes for RCU regressions:

   1.   A fix to __DECLARE_TRACE_RCU() that causes it to act as an
        interrupt from idle rather than as a task switch from idle.
        This change is needed due to the recent use of _rcuidle()
        tracepoints that can be invoked from interrupt handlers as well
        as from idle.  Without this fix, invoking _rcuidle() tracepoints
        from interrupt handlers results in splats and (more seriously)
        confusion on RCU's part as to whether a given CPU is idle or not.
        This confusion can in turn result in too-short grace periods and
        therefore random memory corruption.

   2.   A fix to a subtle deadlock that could result due to RCU doing
        a wakeup while holding one of its rcu_node structure's locks.
        Although the probability of occurrence is low, it really
        does happen.  The fix, courtesy of Steven Rostedt, uses
        irq_work_queue() to avoid the deadlock.

   3.   A fix to a silent deadlock (invisible to lockdep) due to the
        interaction of timeouts posted by RCU debug code enabled by
        CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
        hotplug operations.  This will not occur in production kernels,
        but really does occur in randconfig testing.  Diagnosis courtesy
        of Steven Rostedt"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
  rcu: Don't call wakeup() with rcu_node structure ->lock held
  trace: Allow idle-safe tracepoints to be called from irq
2013-06-13 12:36:42 -07:00
Linus Torvalds
26e04462c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking update from David Miller:

 1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
    dump all objects properly, from Pablo Neira Ayuso.

 2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
    present.  Fix from Phil Oester.

 3) qdisc_get_rtab() looks for an existing matching rate table and uses
    that instead of creating a new one.  However, it's key matching is
    incomplete, it fails to check to make sure the ->data[] array is
    identical too.  Fix from Eric Dumazet.

 4) ip_vs_dest_entry isn't fully initialized before copying back to
    userspace, fix from Dan Carpenter.

 5) Fix ubuf reference counting regression in vhost_net, from Jason
    Wang.

 6) When sock_diag dumps a socket filter back to userspace, we have to
    translate it out of the kernel's internal representation first.
    From Nicolas Dichtel.

 7) davinci_mdio holds a spinlock while calling pm_runtime, which
    sleeps.  Fix from Sebastian Siewior.

 8) Timeout check in sh_eth_check_reset is off by one, from Sergei
    Shtylyov.

 9) If sctp socket init fails, we can NULL deref during cleanup.  Fix
    from Daniel Borkmann.

10) netlink_mmap() does not propagate errors properly, from Patrick
    McHardy.

11) Disable powersave and use minstrel by default in ath9k.  From Sujith
    Manoharan.

12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
    which prevents vhost from being able to use zerocopy.  From Jason
    Wang.

13) Fix race between port lookup and TX path in team driver, from Jiri
    Pirko.

14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
    Hedberg.

15) rtlwifi fails to connect to networking using any encryption method
    other than WPA2.  Fix from Larry Finger.

16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
    management stuff.  From Yijing Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
  b43: stop format string leaking into error msgs
  ath9k: Use minstrel rate control by default
  Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
  ath9k: Disable PowerSave by default
  net: wireless: iwlegacy: fix build error for il_pm_ops
  rtlwifi: Fix a false leak indication for PCI devices
  wl12xx/wl18xx: scan all 5ghz channels
  wl12xx: increase minimum singlerole firmware version required
  wl12xx: fix minimum required firmware version for wl127x multirole
  rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
  mwifiex: debugfs: Fix out of bounds array access
  Bluetooth: Fix mgmt handling of power on failures
  Bluetooth: Fix missing length checks for L2CAP signalling PDUs
  Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
  Bluetooth: Fix checks for LE support on LE-only controllers
  team: fix checks in team_get_first_port_txable_rcu()
  team: move add to port list before port enablement
  team: check return value of team_get_port_by_index_rcu() for NULL
  tuntap: set SOCK_ZEROCOPY flag during open
  netlink: fix error propagation in netlink_mmap()
  ...
2013-06-12 17:18:29 -07:00
Linus Torvalds
b2cc9c19e4 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "Outside of bcache (which really isn't super big), these are all
  few-liners.  There are a few important fixes in here:

   - Fix blk pm sleeping when holding the queue lock

   - A small collection of bcache fixes that have been done and tested
     since bcache was included in this merge window.

   - A fix for a raid5 regression introduced with the bio changes.

   - Two important fixes for mtip32xx, fixing an oops and potential data
     corruption (or hang) due to wrong bio iteration on stacked devices."

* 'for-linus' of git://git.kernel.dk/linux-block:
  scatterlist: sg_set_buf() argument must be in linear mapping
  raid5: Initialize bi_vcnt
  pktcdvd: silence static checker warning
  block: remove refs to XD disks from documentation
  blkpm: avoid sleep when holding queue lock
  mtip32xx: Correctly handle bio->bi_idx != 0 conditions
  mtip32xx: Fix NULL pointer dereference during module unload
  bcache: Fix error handling in init code
  bcache: clarify free/available/unused space
  bcache: drop "select CLOSURES"
  bcache: Fix incompatible pointer type warning
2013-06-12 16:42:39 -07:00
Alex Shi
c2853c8df5 include/linux/math64.h: add div64_ul()
There is div64_long() to handle the s64/long division, but no mocro do
u64/ul division.  It is necessary in some scenarios, so add this
function.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:47 -07:00
Naoya Horiguchi
30dad30922 mm: migration: add migrate_entry_wait_huge()
When we have a page fault for the address which is backed by a hugepage
under migration, the kernel can't wait correctly and do busy looping on
hugepage fault until the migration finishes.  As a result, users who try
to kick hugepage migration (via soft offlining, for example) occasionally
experience long delay or soft lockup.

This is because pte_offset_map_lock() can't get a correct migration entry
or a correct page table lock for hugepage.  This patch introduces
migration_entry_wait_huge() to solve this.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>	[2.6.35+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:46 -07:00
Kees Cook
637241a900 kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections.  Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions.  With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.

To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:

 - /proc/kmsg allows:
  - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
    single-reader interface (SYSLOG_ACTION_READ).
  - everything, after an open.

 - syslog syscall allows:
  - anything, if CAP_SYSLOG.
  - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
    dmesg_restrict==0.
  - nothing else (EPERM).

The use-cases were:
 - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
 - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
   destructive SYSLOG_ACTION_READs.

AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.

Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.

To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.

 - /dev/kmsg allows:
  - open if CAP_SYSLOG or dmesg_restrict==0
  - reading/polling, after open

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Srivatsa S. Bhat
16e53dbf10 CPU hotplug: provide a generic helper to disable/enable CPU hotplug
There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation.  Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.

Restructure the code and make it generic enough to be useful for other
usecases too.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Jiri Pirko
b79462a8b9 team: fix checks in team_get_first_port_txable_rcu()
should be checked if "cur" is txable, not "port".

Introduced by commit 6e88e1357c "team: use function team_port_txable()
for determing enabled and up port"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-12 00:56:27 -07:00
Nicolas Dichtel
ed13998c31 sock_diag: fix filter code sent to userspace
Filters need to be translated to real BPF code for userland, like SO_GETFILTER.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 22:23:32 -07:00
Paul E. McKenney
d62840995a trace: Allow idle-safe tracepoints to be called from irq
__DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
may safely be invoked from what RCU considers to be an idle CPU.
However, these _rcuidle() tracepoints may -not- be invoked from the
handler of an irq taken from idle, because rcu_idle_enter() zeroes
RCU's nesting-level counter, so that the rcu_irq_exit() returning to
idle will trigger a WARN_ON_ONCE().

This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
tracepoints usable from irq handlers as well as from process context.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-10 13:37:10 -07:00
Linus Torvalds
14d0ee0517 This contains 4 fixes.
The first two fix the case where full RCU debugging is enabled, enabling
 function tracing causes a live lock of the system. This is due to the added
 debug checks in rcu_dereference_raw() that is used by the function tracer.
 These checks are also traced by the function tracer as well as cause enough
 overhead to the function tracer to slow down the system enough that
 the time to finish an interrupt can take longer than when the next
 interrupt is triggered, causing a live lock from the timer interrupt.
 
 Talking this over with Paul McKenney, we came up with a fix that adds
 a new rcu_dereference_raw_notrace() that does not perform these added checks,
 and let the function tracer use that.
 
 The third commit fixes a failed compile when branch tracing is enabled,
 due to the conversion of the trace_test_buffer() selftest that the
 branch trace wasn't converted for.
 
 The forth patch fixes a bug caught by the RCU lockdep code where a
 rcu_read_lock() is performed when rcu is disabled (either going to
 or from idle, or user space). This happened on the irqsoff tracer
 as it calls task_uid(). The fix here was to use current_uid() when
 possible that doesn't use rcu locking. Which luckily, is always used
 when irqsoff calls this code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJRsQZhAAoJEOdOSU1xswtMquIH/0zyrqrLTnkc5MsNnnJ8kH5R
 z1cULts4FqBTUNZ1hdb3BTOu4zywjREIkWfM9qqpBmq9Mq6PBxX7gxWTqYvD4jiX
 EatiiCKa7Fyddx4iHJNfvtWgKVYt9WKSNeloRugS9h7NxIZ1wpz21DUpENFQzW2f
 jWRnq/AKXFmZ0vn1953mPePtRsg61RYpb7DCkTE1gtUnvL43wMd/Mo6p6BLMEG26
 1dDK6EWO/uewl8A4oP5JZYP+AP5Ckd4x1PuQK682AtQw+8S6etaGfeJr0WZmKQoD
 0aDZ/NXXSNKChlUFGJusBNJCWryONToa+sdiKuk1h/lW/k9Mail/FChiHBzMiwk=
 =uvlD
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This contains 4 fixes.

  The first two fix the case where full RCU debugging is enabled,
  enabling function tracing causes a live lock of the system.  This is
  due to the added debug checks in rcu_dereference_raw() that is used by
  the function tracer.  These checks are also traced by the function
  tracer as well as cause enough overhead to the function tracer to slow
  down the system enough that the time to finish an interrupt can take
  longer than when the next interrupt is triggered, causing a live lock
  from the timer interrupt.

  Talking this over with Paul McKenney, we came up with a fix that adds
  a new rcu_dereference_raw_notrace() that does not perform these added
  checks, and let the function tracer use that.

  The third commit fixes a failed compile when branch tracing is
  enabled, due to the conversion of the trace_test_buffer() selftest
  that the branch trace wasn't converted for.

  The forth patch fixes a bug caught by the RCU lockdep code where a
  rcu_read_lock() is performed when rcu is disabled (either going to or
  from idle, or user space).  This happened on the irqsoff tracer as it
  calls task_uid().  The fix here was to use current_uid() when possible
  that doesn't use rcu locking.  Which luckily, is always used when
  irqsoff calls this code."

* tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Use current_uid() for critical time tracing
  tracing: Fix bad parameter passed in branch selftest
  ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends
  rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
2013-06-07 18:46:51 -07:00
Andy Lutomirski
a7526eb5d0 net: Unbreak compat_sys_{send,recv}msg
I broke them in this commit:

    commit 1be374a051
    Author: Andy Lutomirski <luto@amacapital.net>
    Date:   Wed May 22 14:07:44 2013 -0700

        net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg

This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept
MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints.  It
also reverts some unnecessary checks in sys_socketcall.

Apparently I was suffering from underscore blindness the first time around.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 11:52:14 -07:00
Linus Torvalds
4d3797d7e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix timeouts with direct mode authentication in mac80211, from
    Stanislaw Gruszka.

 2) Aggregation sessions can deadlock in ath9k, from Felix Fietkau.

 3) Netfilter's xt_addrtype doesn't work with ipv6 due to route lookups
    creating undesirable cache entries, from Florian Westphal.

 4) Fix netfilter's ipt_ULOG from generating non-NULL terminated
    strings.

 5) Fix netdev transmit queue crashes in mac80211, from Johannes Berg.

 6) Fix copy and paste error in 802.11 stack that broke reporting of
    64-bit station tx statistics, from Felix Fietkau.

 7) When qlge_probe fails, it leaks the netdev.  Fix from Wei Yongjun.

 8) SKB control block (where we store the IP options information,
    amongst other things) must be cleared properly otherwise ICMP
    sending can crash for IP tunnels.  Fix from Eric Dumazet.

 9) Verification of Energy Efficient Ether support was coded wrongly,
    the test was inversed.  Fix from Giuseppe CAVALLARO.

10) TCP handles redirects improperly because the wrong flow key is used
    for the route lookup.  From Michal Kubecek.

11) Don't interpret MSG_CMSG_COMPAT from userspace, fix from Andy
    Lutomirski.

12) The new AF_VSOCK was missing from the lockdep string table, fix from
    Federico Vaga.

13) be2net doesn't handle checksumming of IP fragments properly, from
    Somnath Kotur.

14) Fix several bugs in the device address list code that lead to
    crashes and other misbehaviors.  From Jay Vosburgh.

15) Fix ipv6 segmentation handling of fragmented GRE tunnel traffic,
    from Pravin B Shalr.

16) Fix usage of stale policies in IPSEC layer, from Paul Moore.

17) Fix team driver dump of ports when there are a large number of them,
    from Jiri Pirko.

18) Fix softlockups in UDP ipv4 socket lookup causes by and error in the
    hlist_nulls_for_each_entry_rcu() macro.  From Eric Dumazet.

19) Fix several regressions added by the high rate accuracy changes to
    the htb packet scheduler.  From Eric Dumazet.

20) Fix DMA'ing onto the stack in esd_usb2 and peak_usb CAN drivers,
    from Olivier Sobrie and Marc Kleine-Budde.

21) Fix unremovable network devices due to missing route pointer
    installation in the per-device ipv6 address list entries.  From Gao
    feng.

22) Apply the tg3 5719 DMA workaround on 5720 chips as well, otherwise
    we get stalls.  From Nithin Sujir.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
  net_sched: htb: do not mix 1ns and 64ns time units
  net: fix sk_buff head without data area
  tg3: Add read dma workaround for 5720
  net: ethernet: xilinx_emaclite: set protocol selector bits when writing ANAR
  bnx2x: Fix bridged GSO for 57710/57711 chips
  net: fec: add fallback to random MAC address
  bnx2x: fix TCP offload for tunneling ipv4 over ipv6
  ipv6: assign rt6_info to inet6_ifaddr in init_loopback
  net/mlx4_core: Keep VF assigned MAC in the PF admin table
  net/mlx4_en: Handle unassigned VF MAC address correctly
  net/mlx4_core: Return -EPROBE_DEFER when a VF is probed before PF is sufficiently initialized
  net/mlx4_en: Fix adaptive moderation cq update
  net: can: peak_usb: Do not do dma on the stack
  net: can: esd_usb2: Do not do dma on the stack
  net: can: kvaser_usb: fix reception on "USBcan Pro" and "USBcan R" type hardware.
  net_sched: restore "overhead xxx" handling
  net: force a reload of first item in hlist_nulls_for_each_entry_rcu
  hyperv: Fix vlan_proto setting in netvsc_recv_callback()
  team: fix port list dump for big number of ports
  list: introduce list_first_entry_or_null
  ...
2013-06-05 19:19:04 +09:00
Linus Torvalds
7d80fea426 Merge branch 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Fix for yet another xattr bug which may lead to NULL deref.

 - A subtle bug in for_each_descendant_pre().  This bug requires quite
   specific conditions to trigger and isn't too likely to actually
   happen in the wild, but maybe that just makes it that much more
   nastier.

 - A warning message added for silly cgroup re-mount (not -o remount,
   but unmount followed by mount) behavior.

* 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: warn about mismatching options of a new mount of an existing hierarchy
  cgroup: fix a subtle bug in descendant pre-order walk
  cgroup: initialize xattr before calling d_instantiate()
2013-06-03 17:57:16 +09:00
Eric Dumazet
c87a124a5d net: force a reload of first item in hlist_nulls_for_each_entry_rcu
Roman Gushchin discovered that udp4_lib_lookup2() was not reloading
first item in the rcu protected list, in case the loop was restarted.

This produced soft lockups as in https://lkml.org/lkml/2013/4/16/37

rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is
ptr->field :

In some cases, gcc caches the value or ptr->field in a register.

Use a barrier() to disallow such caching, as documented in
Documentation/atomic_ops.txt line 114

Thanks a lot to Roman for providing analysis and numerous patches.

Diagnosed-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-02 20:53:59 -07:00
Jiri Pirko
6d7581e62f list: introduce list_first_entry_or_null
non-rcu variant of list_first_or_null_rcu

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:31:52 -07:00
Pravin B Shelar
1e2bd517c1 udp6: Fix udp fragmentation for tunnel traffic.
udp6 over GRE tunnel does not work after to GRE tso changes. GRE
tso handler passes inner packet but keeps track of outer header
start in SKB_GSO_CB(skb)->mac_offset.  udp6 fragment need to
take care of outer header, which start at the mac_offset, while
adding fragment header.
This bug is introduced by commit 68c3316311 (GRE: Add TCP
segmentation offload for GRE).

Reported-by: Dmitry Kravkov <dkravkov@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Tested-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 17:06:07 -07:00
David S. Miller
73ce00d4d6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for 3.10-rc3,
they are:

* fix xt_addrtype with IPv6, from Florian Westphal. This required
  a new hook for IPv6 functions in the netfilter core to avoid
  hard dependencies with the ipv6 subsystem when this match is
  only used for IPv4.

* fix connection reuse case in IPVS. Currently, if an reused
  connection are directed to the same server. If that server is
  down, those connection would fail. Therefore, clear the
  connection and choose a new server among the available ones.

* fix possible non-nul terminated string sent to user-space if
  ipt_ULOG is used as the default netfilter logging stub, from
  Chen Gang.

* fix mark logging of IPv6 packets in xt_LOG, from Michal Kubecek.
  This bug has been there since 2.6.26.

* Fix breakage ip_vs_sh due to incorrect structure layout for
  RCU, from Jan Beulich.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30 16:38:38 -07:00
Lance Ortiz
37448adfc7 aerdrv: Move cper_print_aer() call out of interrupt context
The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.

WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()

This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to setup for the call to cper_print_aer().  The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.

The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-05-30 10:51:20 -07:00
Rusty Russell
ac4e97abce scatterlist: sg_set_buf() argument must be in linear mapping
Add a check behind CONFIG_DEBUG_SG to verify this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-05-30 09:20:20 +02:00
Steven Rostedt
12bcbe66d7 rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
As rcu_dereference_raw() under RCU debug config options can add quite a
bit of checks, and that tracing uses rcu_dereference_raw(), these checks
happen with the function tracer. The function tracer also happens to trace
these debug checks too. This added overhead can livelock the system.

Add a new interface to RCU for both rcu_dereference_raw_notrace() as well
as hlist_for_each_entry_rcu_notrace() as the hlist iterator uses the
rcu_dereference_raw() as well, and is used a bit with the function tracer.

Link: http://lkml.kernel.org/r/20130528184209.304356745@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-28 22:47:13 -04:00
Linus Torvalds
27a24cfa04 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dma fixes from Vinod Koul:
 "We have two patches from Andy & Rafael fixing the Lynxpoint dma"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  ACPI / LPSS: register clock device for Lynxpoint DMA properly
  dma: acpi-dma: parse CSRT to extract additional resources
2013-05-25 20:30:31 -07:00
Linus Torvalds
9cf1848278 Merge branch 'akpm' (incoming from Andrew Morton)
Merge fixes from Andrew Morton:
 "A bunch of fixes and one simple fbdev driver which missed the merge
  window because people will still talking about it (to no great
  effect)."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (30 commits)
  aio: fix kioctx not being freed after cancellation at exit time
  mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
  drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing
  ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()
  random: fix accounting race condition with lockless irq entropy_count update
  drivers/char/random.c: fix priming of last_data
  mm/memory_hotplug.c: fix printk format warnings
  nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
  drivers/block/brd.c: fix brd_lookup_page() race
  fbdev: FB_GOLDFISH should depend on HAS_DMA
  drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq()
  auditfilter.c: fix kernel-doc warnings
  aio: fix io_getevents documentation
  revert "selftest: add simple test for soft-dirty bit"
  drivers/leds/leds-ot200.c: fix error caused by shifted mask
  mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer
  linux/kernel.h: fix kernel-doc warning
  mm compaction: fix of improper cache flush in migration code
  rapidio/tsi721: fix bug in MSI interrupt handling
  hfs: avoid crash in hfs_bnode_create
  ...
2013-05-24 18:12:15 -07:00
Linus Torvalds
00cec111ac ARM: SoC fixes for 3.10-rc
We didn't have any fixes sent up for -rc2, so this is a slightly larger
 batch. A bit all over the place platform-wise; OMAP, at91, marvell,
 renesas, sunxi, ux500, etc.
 
 I tried to summarize highlights but there isn't a whole lot to point
 out. Lots of little things fixed all over. A couple of defconfig updates
 due to new/changing options.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRn+66AAoJEIwa5zzehBx3f/4P/3sqK2z7u5SSa+tpkKYkxezO
 MykUOpUc4tuwrKiuUEeXPh89pjIrclQVzKYDqdaXIcezKB7IXFfQSyLNxDzGM7Y3
 NrqrURvNpDmUi6F/xP89gXWkbvg2zr563mxSMpkF4G8HTYxvCv7sY0W/PDzb48Qg
 q3Efc/AfQsOGM0Zl8WXX9jZBdCNTquZYWd7YEFxCe9oVRGfmNlIYKirPOLnk9MAZ
 iIrbfFQG+cOos0NKrjM+tbtQNnPUBQeZdy3MvR7DtXCpKNdxs5EJL9EyHHqGGzPk
 Jd1CG3hNrPiuKVhmVSDkELNDYNT7J9+/rya1Mc33pwMrReAIKWUU/sitvsOkKypi
 PnTvlJkUv3UM2fOtoF8mOhQLTGdKSWfaF0BfHNmnCqJFDPs8vuQjx7O1WGKKUdC4
 SbYZetUnL3AoMrEbza5EoMyqQwpTXiALWp4/6MunLhNyXo0OQvsqexvT6QIrhKyQ
 0iExuSSbXDZAw2GXsqzOlCJecxHYG4qkGbf1DmeTW31GRQQ3nhpiu0GVAvHIEAhT
 SJGD57P0gbEXMpnSIKLNKIAYlLdFkGx6AhX1/Xb+AL7Jsod+UEwgz2ya4GMI4JI/
 0lUe8fPglU3ws8Up1y5FcIq4gjXTsvsEy317zeW/oq2vac0ACKBika2EVukdMD/5
 SYr+m40EzQtVdTKuaZ+e
 =8tMi
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "We didn't have any fixes sent up for -rc2, so this is a slightly
  larger batch.  A bit all over the place platform-wise; OMAP, at91,
  marvell, renesas, sunxi, ux500, etc.

  I tried to summarize highlights but there isn't a whole lot to point
  out.  Lots of little things fixed all over.  A couple of defconfig
  updates due to new/changing options."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits)
  ARM: at91/sama5: fix incorrect PMC pcr div definition
  ARM: at91/dt: fix macb pinctrl_macb_rmii_mii_alt definition
  ARM: at91: at91sam9n12: move external irq declatation to DT
  ARM: shmobile: marzen: Use error values in usb_power_*
  ARM: tegra: defconfig fixes
  ARM: nomadik: fix IRQ assignment for SMC ethernet
  ARM: vt8500: Add missing NULL terminator in dt_compat
  clk: tegra: add ac97 controller clock
  clk: tegra: remove USB from clk init table
  ARM: dts: mvebu: Fix wrong the address reg value for the L2-cache node
  ARM: plat-orion: Fix num_resources and id for ge10 and ge11
  ARM: OMAP2+: hwmod: Remove sysc slave idle and auto idle apis
  SERIAL: OMAP: Remove the slave idle handling from the driver
  ARM: OMAP2+: serial: Remove the un-used slave idle hooks
  ARM: OMAP2+: hwmod-data: UART IP needs software control to manage sidle modes
  ARM: OMAP2+: hwmod: Add a new flag to handle SIDLE in SWSUP only in active
  ARM: OMAP2+: hwmod: Fix sidle programming in _enable_sysc()/_idle_sysc()
  arm: mvebu: fix the 'ranges' property to handle PCIe
  ARM: mvebu: select ARCH_REQUIRE_GPIOLIB for mvebu platform
  ARM: AM33XX: Add missing .clkdm_name to clkdiv32k_ick clock
  ...
2013-05-24 16:27:37 -07:00
Randy Dunlap
7450231fb3 linux/kernel.h: fix kernel-doc warning
Fix kernel-doc warning in <linux/kernel.h>:

  Warning(include/linux/kernel.h:590): No description found for parameter 'ip'

scripts/kernel-doc cannot handle macros, functions, or function
prototypes between the function or macro that is being documented and
its definition, so move these prototypes above the function that is
being documented.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:51 -07:00
Imre Deak
4c663cfc52 wait: fix false timeouts when using wait_event_timeout()
Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses.  However, at the moment this isn't guaranteed.  If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.

Fix this by returning at least 1 if the condition becomes true.  This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load").

Daniel said "We have 3 instances of this bug in drm/i915.  One case even
where we switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics.  I very much like this."

One such bug is reported at
  https://bugs.freedesktop.org/show_bug.cgi?id=64133

Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: "Paul E.  McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Alexandre Bounine
bc8fcfea18 rapidio: add enumeration/discovery start from user space
Add RapidIO enumeration/discovery start from user space.  User space
start allows to defer RapidIO fabric scan until the moment when all
participating endpoints are initialized avoiding mandatory synchronized
start of all endpoints (which may be challenging in systems with large
number of RapidIO endpoints).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00
Alexandre Bounine
a11650e110 rapidio: make enumeration/discovery configurable
Systems that use RapidIO fabric may need to implement their own
enumeration and discovery methods which are better suitable for needs of
a target application.

The following set of patches is intended to simplify process of
introduction of new RapidIO fabric enumeration/discovery methods.

The first patch offers ability to add new RapidIO enumeration/discovery
methods using kernel configuration options.  This new configuration
option mechanism allows to select statically linked or modular
enumeration/discovery method(s) from the list of existing methods or use
external module(s).

This patch also updates the currently existing enumeration/discovery
code to be used as a statically linked or modular method.

The corresponding configuration option is named "Basic
enumeration/discovery" method.  This is the only one configuration
option available today but new methods are expected to be introduced
after adoption of provided patches.

The second patch address a long time complaint of RapidIO subsystem
users regarding fabric enumeration/discovery start sequence.  Existing
implementation offers only a boot-time enumeration/discovery start which
requires synchronized boot of all endpoints in RapidIO network.  While
it works for small closed configurations with limited number of
endpoints, using this approach in systems with large number of endpoints
is quite challenging.

To eliminate requirement for synchronized start the second patch
introduces RapidIO enumeration/discovery start from user space.

For compatibility with the existing RapidIO subsystem implementation,
automatic boot time enumeration/discovery start can be configured in by
specifying "rio-scan.scan=1" command line parameter if statically linked
basic enumeration method is selected.

This patch:

Rework to implement RapidIO enumeration/discovery method selection
combined with ability to use enumeration/discovery as a kernel module.

This patch adds ability to introduce new RapidIO enumeration/discovery
methods using kernel configuration options.  Configuration option
mechanism allows to select statically linked or modular
enumeration/discovery method from the list of existing methods or use
external modules.  If a modular enumeration/discovery is selected each
RapidIO mport device can have its own method attached to it.

The existing enumeration/discovery code was updated to be used as
statically linked or modular method.  This configuration option is named
"Basic enumeration/discovery" method.

Several common routines have been moved from rio-scan.c to make them
available to other enumeration methods and reduce number of exported
symbols.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24 16:22:50 -07:00