Simplify the timespec to nsec/usec conversions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Simplify the only user of this data by removing the timespec
conversion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
A lot of code converts either timespecs or ktime_t to
nanoseconds. Provide helper functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
ktime based conversion function to map a monotonic time stamp to a
different CLOCK.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Provide a helper function which lets us implement ktime_t based
interfaces for real, boot and tai clocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The ktime_t based interfaces are used a lot in performance critical
code pathes. Add ktime_t based data so the interfaces don't have to
convert from the xtime/timespec based data.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
struct timekeeper is quite badly sorted for the hot readout path. Most
time access functions need to load two cache lines.
Rearrange it so ktime_get() and getnstimeofday() are happy with a
single cache line.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
To convert callers of the core code to timespec64 we need to provide
the proper interfaces.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Right now we have time related prototypes in 3 different header
files. Move it to a single timekeeping header file and move the core
internal stuff into a core private header.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Convert the core timekeeping logic to use timespec64s. This moves the
2038 issues out of the core logic and into all of the accessor
functions.
Future changes will need to push the timespec64s out to all
timekeeping users, but that can be done interface by interface.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Helper and conversion functions for timespec64.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Define the timespec64 structure and standard helper functions.
[ tglx: Make it 32bit only. 64bit really can map timespec to timespec64 ]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In order to support dates past 2038 on 32bit systems, ktime_set()
needs to handle 64bit second values.
[ tglx: Removed the BITS_PER_LONG check ]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
With the plain nanoseconds based ktime_t we can simply use
ktime_divns() instead of going through loops and hoops of
timespec/timeval conversion.
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The non-scalar ktime_t implementation is basically a timespec
which has to be changed to support dates past 2038 on 32bit
systems.
This patch removes the non-scalar ktime_t implementation, forcing
the scalar s64 nanosecond version on all architectures.
This may have additional performance overhead on some 32bit
systems when converting between ktime_t and timespec structures,
however the majority of 32bit systems (arm and i386) were already
using scalar ktime_t, so no performance regressions will be seen
on those platforms.
On affected platforms, I'm open to finding optimizations, including
avoiding converting to timespecs where possible.
[ tglx: We can now cleanup the ktime_t.tv64 mess, but thats a
different issue and we can throw a coccinelle script at it ]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Rather then having two similar but totally different implementations
that provide timekeeping state to the hrtimer code, try to unify the
two implementations to be more simliar.
Thus this clarifies ktime_get_update_offsets to
ktime_get_update_offsets_now and changes get_xtime... to
ktime_get_update_offsets_tick.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Pull clockevents from Danel Lezcano:
* New timer driver for the Cirrus Logic CLPS711X SoC
* New driver for the Mediatek SoC which includes:
* A new function for of, acked by Rob Herring
* Move the PXA driver to drivers/clocksource, add DT support
* Optimization of the exynos_mct driver
* DT support for the renesas timers family.
* Some Kconfig and driver fixlets
A call to of_iomap does not request the memory region. This patch adds the
function of_io_request_and_map which requests the memory region before
mapping it.
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Rob Herring <robh@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The read() of timerfd files allows to fetch the number of timer ticks
while there is no way to set it back from userspace.
To restore the timer's state as it was at checkpoint moment we need
a path to bring @ticks back. Initially I thought about writing ticks
back via write() interface but it seems such API is somehow obscure.
Instead implement timerfd_ioctl() method with TFD_IOC_SET_TICKS
command which allows to adjust @ticks into non-zero value waking
up the waiters.
I wrapped code with CONFIG_CHECKPOINT_RESTORE which can be
dropped off if there users except c/r camp appear.
v2 (by akpm@):
- Use define timerfd_ioctl NULL for non c/r config
v3:
- Use copy_from_user for @ticks fetching since
not all arch support get_user for 8 byte argument
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christopher Covington <cov@codeaurora.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Link: http://lkml.kernel.org/r/20140715215703.285617923@openvz.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTwvRvAAoJEHm+PkMAQRiG8CoIAJucWkj+MJFFoDXjR9hfI8U7
/WeQLJP0GpWGMXd2KznX9epCuw5rsuaPAxCy1HFDNOa7OtNYacWrsIhByxOIDLwL
YjDB9+fpMMPFWsr+LPJa8Ombh/TveCS77w6Pt5VMZFwvIKujiNK/C3MdxjReH5Gr
iTGm8x7nEs2D6L2+5sQVlhXot/97phxIlBSP6wPXEiaztNZ9/JZi905Xpgq+WU16
ZOA8MlJj1TQD4xcWyUcsQ5REwIOdQ6xxPF00wv/12RFela+Puy4JLAilnV6Mc12U
fwYOZKbUNBS8rjfDDdyX3sljV1L5iFFqKkW3WFdnv/z8ZaZSo5NupWuavDnifKw=
=6Q8o
-----END PGP SIGNATURE-----
Merge tag 'v3.16-rc5' into timers/core
Reason: Bring in upstream modifications, so the pending changes which
depend on them can be queued.
Pull cgroup fixes from Tejun Heo:
"Mostly fixes for the fallouts from the recent cgroup core changes.
The decoupled nature of cgroup dynamic hierarchy management
(hierarchies are created dynamically on mount but may or may not be
reused once unmounted depending on remaining usages) led to more
ugliness being added to kernfs.
Hopefully, this is the last of it"
* 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: break kernfs active protection in cpuset_write_resmask()
cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
kernfs: introduce kernfs_pin_sb()
cgroup: fix mount failure in a corner case
cpuset,mempolicy: fix sleeping function called from invalid context
cgroup: fix broken css_has_online_children()
Pull percpu fix from Tejun Heo:
"One patch to fix a typo in percpu section name. Given how long the
bug has been there and that there hasn't been any report of brekage,
it's unlikely to cause actual issues"
* 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
core: fix typo in percpu read_mostly section
Here's a round of USB bugfixes, quirk additions, and new device ids for
3.16-rc4. Nothing major in here at all, just a bunch of tiny changes.
All have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlO1+wwACgkQMUfUDdst+ylcuACfbIytKMfxNcmq2xG6AExXbVSl
IQAAnAvv4QjqzbtSVWaNZB/Kkxzh4xeN
=Igpa
-----END PGP SIGNATURE-----
Merge tag 'usb-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB bugfixes from Greg KH:
"Here's a round of USB bugfixes, quirk additions, and new device ids
for 3.16-rc4. Nothing major in here at all, just a bunch of tiny
changes. All have been in linux-next with no reported issues"
* tag 'usb-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
usb: chipidea: udc: delete td from req's td list at ep_dequeue
usb: Kconfig: make EHCI_MSM selectable for QCOM SOCs
usb-storage/SCSI: Add broken_fua blacklist flag
usb: musb: dsps: fix the base address for accessing the mode register
tools: ffs-test: fix header values endianess
usb: phy: msm: Do not do runtime pm if the phy is not idle
usb: musb: Ensure that cppi41 timer gets armed on premature DMA TX irq
usb: gadget: gr_udc: Fix check for invalid number of microframes
usb: musb: Fix panic upon musb_am335x module removal
usb: gadget: f_fs: resurect usb_functionfs_descs_head structure
Revert "tools: ffs-test: convert to new descriptor format fixing compilation error"
xhci: Fix runtime suspended xhci from blocking system suspend.
xhci: clear root port wake on bits if controller isn't wake-up capable
xhci: correct burst count field for isoc transfers on 1.0 xhci hosts
xhci: Use correct SLOT ID when handling a reset device command
MAINTAINERS: update e-mail address
usb: option: add/modify Olivetti Olicard modems
USB: ftdi_sio: fix null deref at port probe
MAINTAINERS: drop two usb-serial subdriver entries
USB: option: add device ID for SpeedUp SU9800 usb 3g modem
...
Well, one drivercore fix for kernfs to resolve a reported issue with
sysfs files being updated from atomic contexts, and another lz4 bugfix
for testing potential buffer overflows.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlO1/FEACgkQMUfUDdst+ynRPACfWcssJKICc2N7g9/0XXGVTjVT
PwwAnjQ8bjOfu6i2z/lViLtZGjOnzKor
=qtjB
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Well, one drivercore fix for kernfs to resolve a reported issue with
sysfs files being updated from atomic contexts, and another lz4 bugfix
for testing potential buffer overflows"
* tag 'driver-core-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
lz4: add overrun checks to lz4_uncompress_unknownoutputsize()
kernfs: kernfs_notify() must be useable from non-sleepable contexts
The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values. That is
very much part of why it's faster than 'iret'.
Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.
However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'. Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.
Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d911d98748 ("kernfs: make kernfs_notify() trigger inotify events
too") added fsnotify triggering to kernfs_notify() which requires a
sleepable context. There are already existing users of
kernfs_notify() which invoke it from an atomic context and in general
it's silly to require a sleepable context for triggering a
notification.
The following is an invalid context bug triggerd by md invoking
sysfs_notify() from IO completion path.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
2 locks held by swapper/1/0:
#0: (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
#1: (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
irq event stamp: 33518
hardirqs last enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
softirqs last enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
Call Trace:
<IRQ> [<ffffffff81807b4c>] dump_stack+0x4d/0x66
[<ffffffff810d4f14>] __might_sleep+0x184/0x240
[<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
[<ffffffff812d76a0>] kernfs_notify+0x90/0x150
[<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
[<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
[<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
[<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
[<ffffffff813acb8b>] bio_endio+0x6b/0xa0
[<ffffffff813b46c4>] blk_update_request+0x94/0x420
[<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
[<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
[<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
[<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
[<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
[<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
[<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
[<ffffffff8111647d>] handle_irq_event+0x3d/0x60
[<ffffffff81119436>] handle_edge_irq+0x66/0x130
[<ffffffff8101c3e4>] handle_irq+0x84/0x150
[<ffffffff818146ad>] do_IRQ+0x4d/0xe0
[<ffffffff818122f2>] common_interrupt+0x72/0x72
<EOI> [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
[<ffffffff81025454>] default_idle+0x24/0x230
[<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
[<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
[<ffffffff8104df1b>] start_secondary+0x25b/0x300
This patch fixes it by punting the notification delivery through a
work item. This ends up adding an extra pointer to kernfs_elem_attr
enlarging kernfs_node by a pointer, which is not ideal but not a very
big deal either. If this turns out to be an actual issue, we can move
kernfs_elem_attr->size to kernfs_node->iattr later.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The name, channel_offset, timer_bit, clockevent_rating and
clocksource_rating members are unused. Remove them.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Simon Horman <horms+renesas@verge.net.au>
This fixes a typo that named the read_mostly section of percpu as
readmostly. It works fine with SMP because the linker script specifies
.data..percpu..readmostly. However, UP kernel builds don't have percpu
sections defined and the non-percpu version of the section is called
data..read_mostly, so .data..readmostly will float around and may break
things unexpectedly.
Looking at the original change that introduced data..percpu..readmostly
(commit c957ef2c59), it looks like this
was the original intention.
Tested: Built UP kernel and confirmed the sections got merged.
- Before the patch:
$ objdump -h vmlinux.o | grep '\.data\.\.read.*mostly'
38 .data..read_mostly 00004418 0000000000000000 0000000000000000 00431ac0 2**6
50 .data..readmostly 00000014 0000000000000000 0000000000000000 00444000 2**3
- After the patch:
$ objdump -h vmlinux.o | grep '\.data\.\.read.*mostly'
38 .data..read_mostly 00004438 0000000000000000 0000000000000000 00431ac0 2**6
Signed-off-by: Zhengyu He <hzy@google.com>
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Some buggy JMicron USB-ATA bridges don't know how to translate the FUA
bit in READs or WRITEs. This patch adds an entry in unusual_devs.h
and a blacklist flag to tell the sd driver not to use FUA.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Büsch <m@bues.ch>
Tested-by: Michael Büsch <m@bues.ch>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernfs_pin_sb() tries to get a refcnt of the superblock.
This will be used by cgroupfs.
v2:
- make kernfs_pin_sb() return the superblock.
- drop kernfs_drop_sb().
tj: Updated the comment a bit.
[ This is a prerequisite for a bugfix. ]
Cc: <stable@vger.kernel.org> # 3.15
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull SCSI target fixes from Nicholas Bellinger:
"Mostly minor fixes this time around. The highlights include:
- iscsi-target CHAP authentication fixes to enforce explicit key
values (Tejas Vaykole + rahul.rane)
- fix a long-standing OOPs in target-core when a alua configfs
attribute is accessed after port symlink has been removed.
(Sebastian Herbszt)
- fix a v3.10.y iscsi-target regression causing the login reject
status class/detail to be ignored (Christoph Vu-Brugier)
- fix a v3.10.y iscsi-target regression to avoid rejecting an
existing ITT during Data-Out when data-direction is wrong (Santosh
Kulkarni + Arshad Hussain)
- fix a iscsi-target related shutdown deadlock on UP kernels (Mikulas
Patocka)
- fix a v3.16-rc1 build issue with vhost-scsi + !CONFIG_NET (MST)"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi-target: fix iscsit_del_np deadlock on unload
iovec: move memcpy_from/toiovecend to lib/iovec.c
iscsi-target: Avoid rejecting incorrect ITT for Data-Out
tcm_loop: Fix memory leak in tcm_loop_submission_work error path
iscsi-target: Explicily clear login response PDU in exception path
target: Fix left-over se_lun->lun_sep pointer OOPs
iscsi-target; Enforce 1024 byte maximum for CHAP_C key value
iscsi-target: Convert chap_server_compute_md5 to use kstrtoul
ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined!
commit 9f977ef7b6
vhost-scsi: Include prot_bytes into expected data transfer length
in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend().
This function is not available when CONFIG_NET is not enabled.
socket.h already includes uio.h, so no callers need updating.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull block fixes from Jens Axboe:
"A small collection of fixes/changes for the current series. This
contains:
- Removal of dead code from Gu Zheng.
- Revert of two bad fixes that went in earlier in this round, marking
things as __init that were not purely used from init.
- A fix for blk_mq_start_hw_queue() using the __blk_mq_run_hw_queue(),
which could place us wrongly. Make it use the non __ variant,
which handles cases where we are called from the wrong CPU set.
From me.
- A fix for drbd, which allocates discard requests without room for
the SCSI payload. From Lars Ellenberg.
- A fix for user-after-free in the blkcg code from Tejun.
- Addition of limiting gaps in SG lists, if the hardware needs it.
This is the last pre-req patch for blk-mq to enable the full NVMe
conversion. Could wait until 3.17, but it's simple enough so would
be nice to have everything we need for the NVMe port in the 3.17
release. From me"
* 'for-linus' of git://git.kernel.dk/linux-block:
drbd: fix NULL pointer deref in blk_add_request_payload
blk-mq: blk_mq_start_hw_queue() should use blk_mq_run_hw_queue()
block: add support for limiting gaps in SG lists
bio: remove unused macro bip_vec_idx()
Revert "block: add __init to elv_register"
Revert "block: add __init to blkcg_policy_register"
blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt an atomic_t
floppy: format block0 read error message properly
blkdev_read_iter() wants to cap the iov_iter by the amount of data
remaining to the end of device. That's what iov_iter_truncate() is for
(trim iter->count if it's above the given limit). So far, so good, but
the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in
case of a large device) we end up with that upper limit truncated down
to 32 bits *before* comparing it with iter->count.
Easily fixed by making iov_iter_truncate() take 64bit argument - it does
the right thing after such change (we only reach the assignment in there
when the current value of iter->count is greater than the limit, i.e.
for anything that would get truncated we don't reach the assignment at
all) and that argument is not the new value of iter->count - it's an
upper limit for such.
The overhead of passing u64 is not an issue - the thing is inlined, so
callers passing size_t won't pay any penalty.
Reported-and-tested-by: Theodore Tso <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Fix crash in ipvs tot_stats estimator, from Julian Anastasov.
2) Fix OOPS in nf_nat on netns removal, from Florian Westphal.
3) Really really really fix locking issues in slip and slcan tty write
wakeups, from Tyler Hall.
4) Fix checksum offloading in fec driver, from Fugang Duan.
5) Off by one in BPF instruction limit test, from Kees Cook.
6) Need to clear all TSO capability flags when doing software TSO in
tg3 driver, from Prashant Sreedharan.
7) Fix memory leak in vlan_reorder_header() error path, from Li
RongQing.
8) Fix various bugs in xen-netfront and xen-netback multiqueue support,
from David Vrabel and Wei Liu.
9) Fix deadlock in cxgb4 driver, from Li RongQing.
10) Prevent double free of no-cache DST entries, from Eric Dumazet.
11) Bad csum_start handling in skb_segment() leads to crashes when
forwarding, from Tom Herbert.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: fix setting csum_start in skb_segment()
ipv4: fix dst race in sk_dst_get()
net: filter: Use kcalloc/kmalloc_array to allocate arrays
trivial: net: filter: Change kerneldoc parameter order
trivial: net: filter: Fix typo in comment
net: allwinner: emac: Add missing free_irq
cxgb4: use dev_port to identify ports
xen-netback: bookkeep number of active queues in our own module
tg3: Change nvram command timeout value to 50ms
cxgb4: Not need to hold the adap_rcu_lock lock when read adap_rcu_list
be2net: fix qnq mode detection on VFs
of: mdio: fixup of_phy_register_fixed_link parsing of new bindings
at86rf230: fix irq setup
net: phy: at803x: fix coccinelle warnings
net/mlx4_core: Fix the error flow when probing with invalid VF configuration
tulip: Poll link status more frequently for Comet chips
net: huawei_cdc_ncm: increase command buffer size
drivers: net: cpsw: fix dual EMAC stall when connected to same switch
xen-netfront: recreate queues correctly when reconnecting
xen-netfront: fix oops when disconnected from backend
...
Another restriction inherited for NVMe - those devices don't support
SG lists that have "gaps" in them. Gaps refers to cases where the
previous SG entry doesn't end on a page boundary. For NVMe, all SG
entries must start at offset 0 (except the first) and end on a page
boundary (except the last).
Signed-off-by: Jens Axboe <axboe@fb.com>
Macro bip_vec_idx() was used by bio integrity originally, but no longer
used now. So remove it.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.
Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.
On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e. the owner/holder of the contended resource) is
reported to the user. Often this information has proven to be
insufficient to assist debugging efforts.
To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting. This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.
If NMI is not supported we just revert back to the old method. A sysctl
and boot-time parameter is available to toggle this feature.
[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current. For
instance if one was previously captured recently.
This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.
Patch x86 and sparc to support new routine.
[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To allow filtering of huge pages, makedumpfile must be able to identify
them in the dump. This can be done by checking the appropriate page
flag, so communicate its value to makedumpfile through the VMCOREINFO
interface.
There's only one small catch. Depending on how many page flags are
available on a given architecture, this bit can be called PG_head or
PG_compound.
I sent a similar patch back in 2012, but Eric Biederman did not like
using an #ifdef. So, this time I'm adding a common symbol
(PG_head_mask) instead.
See https://lkml.org/lkml/2012/11/28/91 for the previous version.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In lowres mode, hrtimers are serviced by the tick instead of a clock
event. Now it works well as long as the tick stays periodic but we
must also make sure that the hrtimers are serviced in dynticks mode.
Part of that job consist in kicking a dynticks hrtimer target in order
to make it reconsider the next tick to schedule to correctly handle the
hrtimer's expiring time. And that part isn't handled by the hrtimers
subsystem.
To prepare for fixing this, we need __hrtimer_start_range_ns() to be
able to resolve the CPU target associated to a hrtimer's object
'cpu_base' so that the kick can be centralized there.
So lets store it in the 'struct hrtimer_cpu_base' to resolve the CPU
without overhead. It is set once at CPU's online notification.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1403393357-2070-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This reverts commit b5097e956a.
The original commit is buggy, we do use the registration functions
at runtime, for instance when loading IO schedulers through sysfs.
Reported-by: Damien Wyart <damien.wyart@gmail.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTpiwZAAoJEAAOaEEZVoIVplsP/383a9q3eXonbsi+Ea8CGbRl
tdjjVhM1OY4NZYFAoulILDt3HqPTC6MBnqKlHz+BuMziwd/1+3w8S4E7IEwm/KtM
ghNYX8ct5Bf1nc5QEdDmwf4PX48QbRwTuT1uIcXaJ+KtxTzI9qN7mnjRN91TUtq4
WRGvOl0AsGCVq8YqxjztgD3TYbu7AG/72Em+DE9f81PTArAPTo2ySc3gxPuJJAsg
G1x46Gx46sfqFX2FY4SPsXen+J/67Og67y6eBawxnT2Bp6ZGDuW+jyPRmkhf0yth
pWAtkUi3XmEe6kk6GHiICsS0Yn0RG4jbz39+Ja+X7jibQVJ8Iz6b+Optw9RNQwYt
jDWHKFS2AaL/CDejHYOQ1shHcozpRojtIbDLIZ9vTNTQ2r5cdaBvkXMmQzdoktmN
wQtQ9AzBl8fHOFOQeCAwd/ZfCLIotvLoLds3K/CSqmpsyK2+9IyriQLKKZ2xm6Iu
8+UUspGQcNVwcMP6YWtI6G+u58/mVanmK6dtpiyXrncZLAfU4H7ETL2IPu8jJTbv
kTFCOJtXQzNZa5Xqur1hIewOG9/RlvAZAnii0Ghc3nXTWCCeNeI8re3jw4g9KdRv
33t4sYfJld8LQ1NSMqIDyAs+fvytmGurYt+uhVpb58G/4CqBLNpmmIGIQ6LFLMfs
75FQnbAezrD0H/JyAHUk
=AJD3
-----END PGP SIGNATURE-----
Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux
Pull file locking fixes from Jeff Layton:
"File locking related bugfixes
Nothing too earth-shattering here. A fix for a potential regression
due to a patch in pile #1, and the addition of a memory barrier to
prevent a race condition between break_deleg and generic_add_lease"
* tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux:
locks: set fl_owner for leases back to current->files
locks: add missing memory barrier in break_deleg
Add a notify callback to inform phy drivers when the core is about to
do its link adjustment. No change for drivers that do not implement
this callback.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>