Expand task_subsys_state()'s rcu_dereference_check() to include the full
locking rule as documented in Documentation/cgroups/cgroups.txt by adding
a check for task->alloc_lock being held.
This fixes an RCU false positive when resuming from suspend. The warning
comes from freezer cgroup in cgroup_freezing_or_frozen().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The ftrace.h file contains several functions as macros when the
functions are disabled due to config options. This patch converts
most of them to static inlines.
There are two exceptions:
register_ftrace_function() and unregister_ftrace_function()
This is because their parameter "ops" must not be evaluated since
code using the function is allowed to #ifdef out the creation of
the parameter.
This also fixes an error caused by recent changes:
kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer':
kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do'
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Using a single list for all userspace devices leads to a dead lock
on multiplexed buses in some circumstances (mux chip instantiated
from userspace). This is solved by using a separate list for each
bus segment.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Michael Lawnick <ml.lawnick@gmx.de>
This patch implements a simple Keypad driver that functions
as an I2C client. It handles key press events for keys
connected to TCA6416 I2C based IO expander.
Signed-off-by: Sriramakrishnan <srk@ti.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Add hlist_for_each_entry_rcu_bh() and
hlist_for_each_entry_continue_rcu_bh() macros, and use them in
ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdeps
warnings.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a cpumask affinity hint to the irq_desc structure,
along with a registration function and a read-only proc entry for each
interrupt.
This affinity_hint handle for each interrupt can be used by underlying
drivers that need a better mechanism to control interrupt affinity.
The underlying driver can register a cpumask for the interrupt, which
will allow the driver to provide the CPU mask for the interrupt to
anything that requests it. The intent is to extend the userspace
daemon, irqbalance, to help hint to it a preferred CPU mask to balance
the interrupt into.
[ tglx: Fixed compile warnings, added WARN_ON, made SMP only ]
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Cc: davem@davemloft.net
Cc: arjan@linux.jf.intel.com
Cc: bhutchings@solarflare.com
LKML-Reference: <20100430214445.3992.41647.stgit@ppwaskie-hc2.jf.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
virtio added mergeable buffers mode where 2 bytes of extra info is put
after vnet header but before actual data (tun does not need this data).
In hindsight, it would have been better to add the new info *before* the
packet: as it is, users need a lot of tricky code to skip the extra 2
bytes in the middle of the iovec, and in fact applications seem to get
it wrong, and only work with specific iovec layout. The fact we might
need to split iovec also means we might in theory overflow iovec max
size.
This patch adds a simpler way for applications to handle this,
and future proofs the interface against further extensions,
by making the size of the virtio net header configurable
from userspace. As a result, tun driver will simply
skip the extra 2 bytes on both input and output.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.
This patch renames softnet_data.total to softnet_data.processed: the number of
packets processed in uppper levels(IP stacks).
softnet_stat data is moved into softnet_data.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/linux/netdevice.h | 17 +++++++----------
net/core/dev.c | 26 ++++++++++++--------------
net/sched/sch_generic.c | 2 +-
3 files changed, 20 insertions(+), 25 deletions(-)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot")
we uninlined skb_pull.
But in some critical paths it makes sense to inline this thing
and it helps performance significantly.
Create an skb_pull_inline() so that we can do this in a way that
serves also as annotation.
Based upon a patch by Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.
RCU conversion is pretty much needed :
1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).
[Future patch will add a list anchor for wakeup coalescing]
2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().
3) Respect RCU grace period when freeing a "struct socket_wq"
4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"
5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep
6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.
7) Change all sk_has_sleeper() callers to :
- Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
- Use wq_has_sleeper() to eventually wakeup tasks.
- Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
8) sock_wake_async() is modified to use rcu protection as well.
9) Exceptions :
macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.
Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The breakpoint generic layer assumes that archs always know in advance
the static number of address registers available to host breakpoints
through the HBP_NUM macro.
However this is not true for every archs. For example Arm needs to get
this information dynamically to handle the compatiblity between
different versions.
To solve this, this patch proposes to drop the static HBP_NUM macro
and let the arch provide the number of available slots through a
new hw_breakpoint_slots() function. For archs that have
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS selected, it will be called once
as the number of registers fits for instruction and data breakpoints
together.
For the others it will be called first to get the number of
instruction breakpoint registers and another time to get the
data breakpoint registers, the targeted type is given as a
parameter of hw_breakpoint_slots().
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
There are two outstanding fashions for archs to implement hardware
breakpoints.
The first is to separate breakpoint address pattern definition
space between data and instruction breakpoints. We then have
typically distinct instruction address breakpoint registers
and data address breakpoint registers, delivered with
separate control registers for data and instruction breakpoints
as well. This is the case of PowerPc and ARM for example.
The second consists in having merged breakpoint address space
definition between data and instruction breakpoint. Address
registers can host either instruction or data address and
the access mode for the breakpoint is defined in a control
register. This is the case of x86 and Super H.
This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config
that archs can select if they belong to the second case. Those
will have their slot allocation merged for instructions and
data breakpoints.
The others will have a separate slot tracking between data and
instruction breakpoints.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Tag ptrace breakpoints with the exclude_kernel attribute set. This
will make it easier to set generic policies on breakpoints, when it
comes to ensure nobody unpriviliged try to breakpoint on the kernel.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
__print_hex() prints values in an array in hex (w/o '0x') (space separated)
EX) 92 33 32 f3 ee 4d
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
Signed-off-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
For more clearance what the functions actually do,
usb_buffer_alloc() is renamed to usb_alloc_coherent()
usb_buffer_free() is renamed to usb_free_coherent()
They should only be used in code which really needs DMA coherency.
[added compatibility macros so we can convert things easier - gregkh]
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Pedro Ribeiro <pedrib@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
nfs: fix some issues in nfs41_proc_reclaim_complete()
NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
NFS: Fix an unstable write data integrity race
nfs: testing for null instead of ERR_PTR()
NFS: rsize and wsize settings ignored on v4 mounts
NFSv4: Don't attempt an atomic open if the file is a mountpoint
SUNRPC: Fix a bug in rpcauth_prune_expired
The struct device_node *of_node pointer is moving out of dev->archdata
and into the struct device proper. of_i2c.c needs to set the of_node
pointer before the device is registered. Since the i2c subsystem
doesn't allow 2 stage allocation and registration of i2c devices, the
of_node pointer needs to be passed via the i2c_board_info structure
so that it is set prior to registration.
This patch adds of_node to struct i2c_board_info (conditional on
CONFIG_OF), sets of_node in i2c_new_device(), and modifies of_i2c.c
to use the new parameter. The calling of dev_archdata_set_node()
from of_i2c will be removed in a subsequent patch when of_node is
removed from archdata and all users are converted over.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Currently, platforms using CONFIG_OF add a 'struct device_node *of_node'
to dev->archdata. However, with CONFIG_OF becoming generic for all
architectures, it makes sense for commonality to move it out of archdata
and into struct device proper.
This patch adds a struct device_node *of_node member to struct device
and updates all locations which currently write the device_node pointer
into archdata to also update dev->of_node. Subsequent patches will
modify callers to use the archdata location and ultimately remove
the archdata member entirely.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
CC: Michal Simek <monstr@monstr.eu>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: "David S. Miller" <davem@davemloft.net>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Jeremy Kerr <jeremy.kerr@canonical.com>
CC: microblaze-uclinux@itee.uq.edu.au
CC: linux-kernel@vger.kernel.org
CC: linuxppc-dev@ozlabs.org
CC: sparclinux@vger.kernel.org
This patch makes unflatten_device_tree() safe to call from any arch
setup code with the following changes:
- Make sure initial_boot_params actually points to a device tree blob
before unflattening
- Make sure the initial_boot_params->magic field is correct
- If CONFIG_OF_FLATTREE is not set, then make unflatten_device_tree()
an empty static inline function.
This patch also adds some additional debug output to the top of
unflatten_device_tree().
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Changes:
This is a complete re-write of the socket layer. Making the socket
implementation more aligned with the other socket layers and using more
of the support functions available in sock.c. Lots of code is copied
from af_unix (and some from af_irda).
Non-blocking mode should be working as well.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add bio_batch helper primitive. This is rather generic primitive
for submitting/waiting a complex request which consists of several
bios.
- blkdev_issue_zeroout() generate number of zero filed write bios.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
coda: move backing-dev.h kernel include inside __KERNEL__
mtd: ensure that bdi entries are properly initialized and registered
Move mtd_bdi_*mappable to mtdcore.c
btrfs: convert to using bdi_setup_and_register()
Catch filesystems lacking s_bdi
drbd: Terminate a connection early if sending the protocol fails
drbd: fix memory leak
Fix JFFS2 sync silent failure
smbfs: add bdi backing to mount session
ncpfs: add bdi backing to mount session
exofs: add bdi backing to mount session
ecryptfs: add bdi backing to mount session
coda: add bdi backing to mount session
cifs: add bdi backing to mount session
afs: add bdi backing to mount session.
9p: add bdi backing to mount session
bdi: add helper function for doing init and register of a bdi for a file system
block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer
Most of the LSM common audit work uses LSM_AUDIT_DATA_* for the naming.
This was not so for LSM_AUDIT_NO_AUDIT which means the generic initializer
cannot be used. This patch just renames the flag so the generic
initializer can be used.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention when RPS is enabled.
Note: in the worst case, the number of packets in a softnet_data may
be double of netdev_max_backlog.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
reimplement softnet_data.output_queue as a FIFO queue to keep the
fairness among the qdiscs rescheduled.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
----
include/linux/netdevice.h | 1 +
net/core/dev.c | 22 ++++++++++++----------
2 files changed, 13 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
This is used to configure APs to not bridge traffic between connected stations.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This change corrects the fact that we were not reporting Gen2 link speeds
when we were in fact connected at Gen2 rates.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When performing a non-consuming read, a synchronize_sched() is
performed once for every cpu which is actively tracing.
This is very expensive, and can make it take several seconds to open
up the 'trace' file with lots of cpus.
Only one synchronize_sched() call is actually necessary. What is
desired is for all cpus to see the disabling state change. So we
transform the existing sequence:
for_each_cpu() {
ring_buffer_read_start();
}
where each ring_buffer_start() call performs a synchronize_sched(),
into the following:
for_each_cpu() {
ring_buffer_read_prepare();
}
ring_buffer_read_prepare_sync();
for_each_cpu() {
ring_buffer_read_start();
}
wherein only the single ring_buffer_read_prepare_sync() call needs to
do the synchronize_sched().
The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
memory and increments ->record_disabled.
In the second phase, ring_buffer_read_prepare_sync() makes sure this
->record_disabled state is visible fully to all cpus.
And in the final third phase, the ring_buffer_read_start() calls reset
the 'iter' objects allocated in the first phase since we now know that
none of the cpus are adding trace entries any more.
This makes openning the 'trace' file nearly instantaneous on a
sparc64 Niagara2 box with 128 cpus tracing.
Signed-off-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add function graph output to irqsoff tracer.
The graph output is enabled by setting new 'display-graph' trace option.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There has been quite a confusion in userspace about
XT_FUNCTION_MAXNAMELEN; because struct xt_entry_match used MAX-1,
userspace would have to do an awkward MAX-2 for maximum length
checking (due to '\0'). This patch adds a new define that matches the
definition of XT_TABLE_MAXNAMELEN - being the size of the actual
struct member, not one off.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add suspend/resume hooks for HID drivers so these can do some
additional state adjustment when device gets suspended/resumed.
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Currently, device claiming for exclusive open is done after low level
open - disk->fops->open() - has completed successfully. This means
that exclusive open attempts while a device is already exclusively
open will fail only after disk->fops->open() is called.
cdrom driver issues commands during open() which means that O_EXCL
open attempt can unintentionally inject commands to in-progress
command stream for burning thus disturbing burning process. In most
cases, this doesn't cause problems because the first command to be
issued is TUR which most devices can process in the middle of burning.
However, depending on how a device replies to TUR during burning,
cdrom driver may end up issuing further commands.
This can't be resolved trivially by moving bd_claim() before doing
actual open() because that means an open attempt which will end up
failing could interfere other legit O_EXCL open attempts.
ie. unconfirmed open attempts can fail others.
This patch resolves the problem by introducing claiming block which is
started by bd_start_claiming() and terminated either by bd_claim() or
bd_abort_claiming(). bd_claim() from inside a claiming block is
guaranteed to succeed and once a claiming block is started, other
bd_start_claiming() or bd_claim() attempts block till the current
claiming block is terminated.
bd_claim() can still be used standalone although now it always
synchronizes against claiming blocks, so the existing users will keep
working without any change.
blkdev_open() and open_bdev_exclusive() are converted to use claiming
blocks so that exclusive open attempts from these functions don't
interfere with the existing exclusive open.
This problem was discovered while investigating bko#15403.
https://bugzilla.kernel.org/show_bug.cgi?id=15403
The burning problem itself can be resolved by updating userspace
probing tools to always open w/ O_EXCL.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Matthias-Christian Ott <ott@mirix.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Our offset handling becomes even a little more hackish now. For some reason I
do not understand all offsets as inrelative. It assumes base offset is 0x1000
but it will work for now as we make offsets relative anyway by removing base
0x1000. Should be cleaner however.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Attempting to read registers that don't exist on the SSB bus can cause
hangs on some boxes. At least some b43 devices are 'in the wild' that
don't have SPROMs at all. When the SSB bus support loads, it attempts
to read these (non-existant) SPROMs and causes hard hangs on the box --
no console output, etc.
This patch adds some intelligence to determine whether or not the SPROM
is present before attempting to read it. This avoids those hard hangs
on those devices with no SPROM attached to their SSB bus. The
SSB-attached devices (e.g. b43, et al.) won't work, but at least the box
will survive to test further patches. :-)
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Michael Buesch <mb@bu3sch.de>
Decouple rtnetlink address families from real address families in socket.h to
be able to add rtnetlink interfaces to code that is not a real address family
without increasing AF_MAX/NPROTO.
This will be used to add support for multicast route dumping from all tables
as the proc interface can't be extended to support anything but the main table
without breaking compatibility.
This partialy undoes the patch to introduce independant families for routing
rules and converts ipmr routing rules to a new rtnetlink family. Similar to
that patch, values up to 127 are reserved for real address families, values
above that may be used arbitrarily.
Signed-off-by: Patrick McHardy <kaber@trash.net>