This set of changes contains a bunch of cleanups and minor fixes along
with some new clocks, mainly on Tegra210, in preparation for supporting
DisplayPort and HDMI 2.0.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXI3fsAAoJEN0jrNd/PrOho7sP/3W87IOP5Ga+0CAuiBfl3oyx
99nJMzloiHSSe9aH1w9CZJEXr47iCmfN7yoXp0xCx0CAT/6lTlnzIE9cpblvxJLY
GXwxpIHDFWndmwvnBTaw5YN8C/DjfgE8KPIYArE9yvP0X1lnU0IdbcMXT5Gu31ny
9Sh9csgZNcKJyTJ4VgzVJkpHWjE5/ngcud0JfuUiQNc3VsJInGroYxdE16WeYDPq
zP+5LXPEZAJu/GJPFBtySnhaBcr6Nk/HQ4X1M8/fC5ocA4TfxWZTHqXg6RyHIgJM
flM69aeh0uBlK5TEX99W7OiOTpXog8HSukrjMw53lJT69uitxRkt4RLm5PStp4Gr
fMClyJzujF2FTO3+TXvLnyj0MwrFvmQHboFqDJUBFJ4XFwZAZH43v9dXVFlGltib
qThUiSyzljjeco6XPRLTkHNjntA3rwixCb4Lq2J8MHgf7O2DSQuCBtuBswVXKyQQ
JfQAOTPNzzCXA7NWJ34pbNaM6Ex0pPJFGQs+5Mpzo5XAXc9od/79+4ht4tJSZ+hb
Su5oRlv0+EU44MddKOBX5FPBbBnI6lBl8fADiM+ld9oDdlHHgkqMxlradhjIWY0Q
9uz+qjUu6VrmvkCxNe6s0yepwHYoeBpbrmRIj2ZiokXcH4kyltpL8XpOvuWhm4pP
tYmTNJJmCb/hd6MZhiKv
=HcjN
-----END PGP SIGNATURE-----
Merge tag 'tegra-for-4.7-clk' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into clk-next
Pull tegra clk driver changes from Thierry Reding:
This set of changes contains a bunch of cleanups and minor fixes along
with some new clocks, mainly on Tegra210, in preparation for supporting
DisplayPort and HDMI 2.0.
* tag 'tegra-for-4.7-clk' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
clk: tegra: dfll: Reformat CVB frequency table
clk: tegra: dfll: Properly clean up on failure and removal
clk: tegra: dfll: Make code more comprehensible
clk: tegra: dfll: Reference CVB table instead of copying data
clk: tegra: dfll: Update kerneldoc
clk: tegra: Fix PLL_U post divider and initial rate on Tegra30
clk: tegra: Initialize PLL_C to sane rate on Tegra30
clk: tegra: Fix pllre Tegra210 and add pll_re_out1
clk: tegra: Add sor_safe clock
clk: tegra: dpaux and dpaux1 are fixed factor clocks
clk: tegra: Add dpaux1 clock
clk: tegra: Use correct parent for dpaux clock
clk: tegra: Add fixed factor peripheral clock type
clk: tegra: Special-case mipi-cal parent on Tegra114
clk: tegra: Remove trailing blank line
clk: tegra: Constify peripheral clock registers
clk: tegra: Add interface to enable hardware control of SATA/XUSB PLLs
On Tegra210, hardware control of the SATA and XUSB pad PLLs must be
done during the UPHY enable sequence rather than the PLLE enable
sequence. Export functions to do this so that hardware control can
be enabled from the XUSB padctl driver.
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Rhyland Klein <rklein@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The composite clock didn't have any unregistration function, which forced
us to use clk_unregister directly on it.
While it was already not great from an API point of view, it also meant
that we were leaking the clk_composite structure allocated in
clk_register_composite.
Add a clk_unregister_composite function to fix this.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
- Export the CPG/MSSR and CPG/MSTP attach/detach_dev callbacks, so
they can be called by the R-Car SYSC PM Domain driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXF2zRAAoJEEgEtLw/Ve77lQkP/0ePeLyPLr2kqmMWe3+5MaVe
suKU0i3GAV1fQJpl6Z3CibiDdjXrT9/7S+VVDaBme3VHwlE22QvBD0f2yn4RyL11
f5wJIyCf1TXMdfsL0MSnX+6efcfwfdDDgUe/aoMj2ybOwOfFwIbKEPDhVjkMeqQo
OcbIG8qBN0Hh237Lkb70SC0Oc27djSPpdoxd5+3B5ytpHo1q0ER1qQfdwy0MmG2J
vtcXnZxHoigxk6uanH8BwpdYbFhhRA+KRmyxfS4MYmOVdaVvUZPCddbLBDfcE6d9
xYpno+7TKVYni9rwd48Vq0EASVkRSyBJplCfLYOWCieIvrdHbFVpxGFTeW+uxzBS
b+mRKnuM7fGm8j5At+i7bVbKVAXVvWaRr9K4j4E/XukT2B1/dNfRoc0HEBJzNsnO
+S/X2JoHDyeiGva8cESlnQgO2rE9YiktxMVPX+Ey0k0ab7UigM6SzUle7wt1PVOg
PfNLfJUkGxv77GpPKjhb/BbhImBmMIBLeHKOwMKM8WFCeLUjk6h8CsPFmsFPgAE7
rTk87G1bMWBjjpk0d0nTggIStErF91yTZW+4kSMK9LRg+1//b+NO3xe9kQABLcmX
5qaZoQDRqHiq+6loqsOW0xzvMjAYOxUmvE0oeqLamX20Qg5SzZmwcMtC+JceWSFG
GE5VOJ8cBWcaBLFUmCmq
=jmQf
-----END PGP SIGNATURE-----
Merge tag 'clk-renesas-for-v4.7-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into clk-next
clk: renesas: R-Car SYSC PM Domain Preparation
- Export the CPG/MSSR and CPG/MSTP attach/detach_dev callbacks, so
they can be called by the R-Car SYSC PM Domain driver.
* tag 'clk-renesas-for-v4.7-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers:
clk: renesas: cpg-mssr: Export cpg_mssr_{at,de}tach_dev()
clk: renesas: mstp: Provide dummy attach/detach_dev callbacks
clk: renesas: Provide Kconfig symbols for CPG/MSSR and CPG/MSTP support
The R-Car SYSC PM Domain driver has to power manage devices in power
areas using clocks. To reuse code and to share knowledge of clocks
suitable for power management, this is ideally done through the existing
cpg_mssr_attach_dev() and cpg_mssr_detach_dev() callbacks.
Hence these callbacks can no longer rely on their "domain" parameter
pointing to the CPG/MSSR Clock Domain. To handle this, keep a pointer to
the clock domain in a static variable. cpg_mssr_attach_dev() has to
support probe deferral, as the R-Car SYSC PM Domain may be initialized,
and devices may be added to it, before the CPG/MSSR Clock Domain is
initialized.
Dummy callbacks are provided for the case where CPG/MSTP support is not
included, so the rcar-sysc driver won't have to care about this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Provide dummy cpg_mstp_{at,de}tach_dev() PM Domain callbacks if CPG/MSTP
support is not included, so the rcar-sysc driver won't have to care
about this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Add registration APIs in the clk fixed-rate code to return struct
clk_hw pointers instead of struct clk pointers. This way we hide
the struct clk pointer from providers unless they need to use
consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Add registration APIs in the clk gpio code to return struct
clk_hw pointers instead of struct clk pointers. This way we hide
the struct clk pointer from providers unless they need to use
consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Add registration APIs in the clk composite code to return struct
clk_hw pointers instead of struct clk pointers. This way we hide
the struct clk pointer from providers unless they need to use
consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Add registration APIs in the clk fractional divider code to
return struct clk_hw pointers instead of struct clk pointers.
This way we hide the struct clk pointer from providers unless
they need to use consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Add registration APIs in the clk fixed-factor code to return
struct clk_hw pointers instead of struct clk pointers. This way
we hide the struct clk pointer from providers unless they need to
use consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Add registration APIs in the clk mux code to return struct clk_hw
pointers instead of struct clk pointers. This way we hide the
struct clk pointer from providers unless they need to use
consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Add registration APIs in the clk gate code to return struct
clk_hw pointers instead of struct clk pointers. This way we hide
the struct clk pointer from providers unless they need to use
consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Add registration APIs in the clk divider code to return struct
clk_hw pointers instead of struct clk pointers. This way we hide
the struct clk pointer from providers unless they need to use
consumer facing APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Now that we have a clk registration API that doesn't return
struct clks, we need to have some way to hand out struct clks via
the clk_get() APIs that doesn't involve associating struct clk
pointers with a struct clk_lookup. Luckily, clkdev already
operates on struct clk_hw pointers, except for the registration
facing APIs where it converts struct clk pointers into struct
clk_hw pointers almost immediately.
Let's add clk_hw based registration APIs so that we can skip the
conversion step and provide a way for clk provider drivers to
operate exclusively on clk_hw structs. This way we clearly
split the API between consumers and providers.
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Now that we have a clk registration API that doesn't return
struct clks, we need to have some way to hand out struct clks via
the clk_get() APIs that doesn't involve associating struct clk
pointers with an OF node. Currently we ask the OF provider to
give us a struct clk pointer for some clkspec, turn that struct
clk into a struct clk_hw and then allocate a new struct clk to
return to the caller.
Let's add a clk_hw based OF provider hook that returns a struct
clk_hw directly, so that we skip the intermediate step of
converting from struct clk to struct clk_hw. Eventually when
we've converted all OF clk providers to struct clk_hw based APIs
we can remove the struct clk based ones.
It should also be noted that we change the onecell provider to
have a flex array instead of a pointer for the array of clk_hw
pointers. This allows providers to allocate one structure of the
correct length in one step instead of two.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
We've largely split the clk consumer and provider APIs along
struct clk and struct clk_hw, but clk_register() still returns a
struct clk pointer for each struct clk_hw that's registered.
Eventually we'd like to only allocate struct clks when there's a
user, because struct clk is per-user now, so clk_register() needs
to change.
Let's add new APIs to register struct clk_hws, but this time
we'll hide the struct clk from the caller by returning an int
error code. Also add an unregistration API that takes the clk_hw
structure that was passed to the registration API. This way
provider drivers never have to deal with a struct clk pointer
unless they're using the clk consumer APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Now that we've converted the only caller over to another clkdev
API, remove this one.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
DPLLs typically have a maximum rate they can support, and this varies
from DPLL to DPLL. Add support of the maximum rate value to the DPLL
data struct, and also add check for this in the DPLL round_rate function.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Nishanth Menon <nm@ti.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Pull renesas clk driver updates from Geert Uytterhoeven:
- Support for the PWM module clock and watchdog related clocks on R-Car H3,
- Cleanups and clarifications.
* 'clk-renesas-for-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers:
clk: renesas: mstp: Clarify cpg_mstp_{at,de}tach_dev() domain parameter
clk: renesas: cpg-mssr: Drop check for CONFIG_PM_GENERIC_DOMAINS_OF
clk: renesas: mstp: Drop check for CONFIG_PM_GENERIC_DOMAINS_OF
clk: renesas: r8a7795: add RWDT clock
clk: renesas: r8a7795: add R clk
clk: renesas: r8a7795: add OSC and RINT clocks
clk: renesas: cpg-mssr: add generic support for read-only DIV6 clocks
clk: renesas: r8a7795: make SD clk definition specific for GEN3
clk: renesas: r8a7795: add PWM clock
This call matches clocks which have been marked as critical in DT
and sets the appropriate flag. These flags can then be used to
mark the clock core flags appropriately prior to registration.
Legacy bindings requiring this feature must add the clock-critical
property to their binding descriptions, as it is not a part of
common-clock binding.
Cc: devicetree@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Link: lkml.kernel.org/r/1455225554-13267-4-git-send-email-mturquette@baylibre.com
Critical clocks are those which must not be gated, else undefined
or catastrophic failure would occur. Here we have chosen to
ensure the prepare/enable counts are correctly incremented, so as
not to confuse users with enabled clocks with no visible users.
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Link: lkml.kernel.org/r/1455225554-13267-2-git-send-email-mturquette@baylibre.com
Make it clear that the "domain" parameter of the cpg_mstp_attach_dev()
and cpg_mstp_detach_dev() functions is not used.
The cpg_mstp_attach_dev() and cpg_mstp_detach_dev() callbacks are not
only used by the CPG/MSTP Clock Domain driver, but also by the R-Mobile
SYSC PM Domain driver.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
As of commit 71d076ceb2 ("ARM: shmobile: Enable PM and
PM_GENERIC_DOMAINS for SoCs with PM Domains"),
CONFIG_PM_GENERIC_DOMAINS_OF is always enabled for SoCs with MSTP
clocks.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Pull Ceph updates from Sage Weil:
"There is quite a bit here, including some overdue refactoring and
cleanup on the mon_client and osd_client code from Ilya, scattered
writeback support for CephFS and a pile of bug fixes from Zheng, and a
few random cleanups and fixes from others"
[ I already decided not to pull this because of it having been rebased
recently, but ended up changing my mind after all. Next time I'll
really hold people to it. Oh well. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
libceph: use KMEM_CACHE macro
ceph: use kmem_cache_zalloc
rbd: use KMEM_CACHE macro
ceph: use lookup request to revalidate dentry
ceph: kill ceph_get_dentry_parent_inode()
ceph: fix security xattr deadlock
ceph: don't request vxattrs from MDS
ceph: fix mounting same fs multiple times
ceph: remove unnecessary NULL check
ceph: avoid updating directory inode's i_size accidentally
ceph: fix race during filling readdir cache
libceph: use sizeof_footer() more
ceph: kill ceph_empty_snapc
ceph: fix a wrong comparison
ceph: replace CURRENT_TIME by current_fs_time()
ceph: scattered page writeback
libceph: add helper that duplicates last extent operation
libceph: enable large, variable-sized OSD requests
libceph: osdc->req_mempool should be backed by a slab pool
libceph: make r_request msg_size calculation clearer
...
translation window setup, NULL ptr dereference, and ntb-perf errors.
Also, a modification to the driver API that makes _addr functions
optional.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW9rVgAAoJEG5mS6x6i9IjutEP/11ufWLOGHOOD4zAZ/Bb62MS
YQXr0I/CUViEiV0Jaj254FmlYYafDAb5LfYVneVGalW3HvLMED069M7WEQyP7UqG
4dTNWyktS6GkjrlejGzY2xrPXr8hG4GTrpkMgBo4uZol1n95VEe3IlOn4VheEwPz
+RNioK1oBDV6f6JXjD5IH0FP+/Gn1VwxVPq/lXpFIZfw4FX1+pbTfjbNB2kkNOXM
0adUEZdyritg3cqrecH1P+ewGYT5//S/ai/AgxghddJAwh2IHlf46ZdGiVH37Awd
r6+EgqzdPiG5Qxlo2CTnu1gb4zsqXJYViWsM+vNIqJYCweD7ZoMuYy89mAf+l0kb
a4kBi657qmEsalw4nqeMoZHuHdW2G9gA8UHmNyENRWmtHG7odVXGnR4PA0nE7kyw
JPrNSHQ7mGgo+9wsvLYT6BAatpAjBIhE2wjrR8svS8qnP8jU9mjT3nTKAWKl1XO+
YXbUwlqbiMaFzBNd327iyEBqoGU3j1ba+AiG3IBbiOxNJz1WnfFNvnuYMu4zuruZ
KmdZlQTO5s2YPBIV+BgzPK6oRBCTxQrVBsRN2jn/i+02gAgB3uZeT3IhLCTLZhtf
mT5/3yyvk6O8Hcs5VoSzyyzbAHIL0y/NEUR0jnOqZowwTx8ajg+owHAUgggUloHj
wXALu1jc80GsOglBKU0Z
=7DcD
-----END PGP SIGNATURE-----
Merge tag 'ntb-4.6' of git://github.com/jonmason/ntb
Pull NTB bug fixes from Jon Mason:
"NTB bug fixes for tasklet from spinning forever, link errors,
translation window setup, NULL ptr dereference, and ntb-perf errors.
Also, a modification to the driver API that makes _addr functions
optional"
* tag 'ntb-4.6' of git://github.com/jonmason/ntb:
NTB: Remove _addr functions from ntb_hw_amd
NTB: Make _addr functions optional in the API
NTB: Fix incorrect clean up routine in ntb_perf
NTB: Fix incorrect return check in ntb_perf
ntb: fix possible NULL dereference
ntb: add missing setup of translation window
ntb: stop link work when we do not have memory
ntb: stop tasklet from spinning forever during shutdown.
ntb: perf test: fix address space confusion
Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks. The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.
IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.
Right now stackdepot support is only enabled in SLAB allocator. Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.
This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.
Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.
[akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabinin@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.
Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>. Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add GFP flags to KASAN hooks for future patches to use.
This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add KASAN hooks to SLAB allocator.
This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A leftover from commit c32b3cbe0d ("oom, PM: make OOM detection in the
freezer path raceless").
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried
to protect oom_reaper_list using MMF_OOM_KILLED flag. But we can do it
by simply checking tsk->oom_reaper_list != NULL.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Entries are only added/removed from oom_reaper_list at head so we can
use a single linked list and hence save a word in task_struct.
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo has reported that oom_kill_allocating_task=1 will cause
oom_reaper_list corruption because oom_kill_process doesn't follow
standard OOM exclusion (aka ignores TIF_MEMDIE) and allows to enqueue
the same task multiple times - e.g. by sacrificing the same child
multiple times.
This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
which is set in oom_kill_process atomically and oom reaper is disabled
if the flag was already set.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
wake_oom_reaper has allowed only 1 oom victim to be queued. The main
reason for that was the simplicity as other solutions would require some
way of queuing. The current approach is racy and that was deemed
sufficient as the oom_reaper is considered a best effort approach to
help with oom handling when the OOM victim cannot terminate in a
reasonable time. The race could lead to missing an oom victim which can
get stuck
out_of_memory
wake_oom_reaper
cmpxchg // OK
oom_reaper
oom_reap_task
__oom_reap_task
oom_victim terminates
atomic_inc_not_zero // fail
out_of_memory
wake_oom_reaper
cmpxchg // fails
task_to_reap = NULL
This race requires 2 OOM invocations in a short time period which is not
very likely but certainly not impossible. E.g. the original victim
might have not released a lot of memory for some reason.
The situation would improve considerably if wake_oom_reaper used a more
robust queuing. This is what this patch implements. This means adding
oom_reaper_list list_head into task_struct (eat a hole before embeded
thread_struct for that purpose) and a oom_reaper_lock spinlock for
queuing synchronization. wake_oom_reaper will then add the task on the
queue and oom_reaper will dequeue it.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When oom_reaper manages to unmap all the eligible vmas there shouldn't
be much of the freable memory held by the oom victim left anymore so it
makes sense to clear the TIF_MEMDIE flag for the victim and allow the
OOM killer to select another task.
The lack of TIF_MEMDIE also means that the victim cannot access memory
reserves anymore but that shouldn't be a problem because it would get
the access again if it needs to allocate and hits the OOM killer again
due to the fatal_signal_pending resp. PF_EXITING check. We can safely
hide the task from the OOM killer because it is clearly not a good
candidate anymore as everyhing reclaimable has been torn down already.
This patch will allow to cap the time an OOM victim can keep TIF_MEMDIE
and thus hold off further global OOM killer actions granted the oom
reaper is able to take mmap_sem for the associated mm struct. This is
not guaranteed now but further steps should make sure that mmap_sem for
write should be blocked killable which will help to reduce such a lock
contention. This is not done by this patch.
Note that exit_oom_victim might be called on a remote task from
__oom_reap_task now so we have to check and clear the flag atomically
otherwise we might race and underflow oom_victims or wake up waiters too
early.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (of 5):
This is based on the idea from Mel Gorman discussed during LSFMM 2015
and independently brought up by Oleg Nesterov.
The OOM killer currently allows to kill only a single task in a good
hope that the task will terminate in a reasonable time and frees up its
memory. Such a task (oom victim) will get an access to memory reserves
via mark_oom_victim to allow a forward progress should there be a need
for additional memory during exit path.
It has been shown (e.g. by Tetsuo Handa) that it is not that hard to
construct workloads which break the core assumption mentioned above and
the OOM victim might take unbounded amount of time to exit because it
might be blocked in the uninterruptible state waiting for an event (e.g.
lock) which is blocked by another task looping in the page allocator.
This patch reduces the probability of such a lockup by introducing a
specialized kernel thread (oom_reaper) which tries to reclaim additional
memory by preemptively reaping the anonymous or swapped out memory owned
by the oom victim under an assumption that such a memory won't be needed
when its owner is killed and kicked from the userspace anyway. There is
one notable exception to this, though, if the OOM victim was in the
process of coredumping the result would be incomplete. This is
considered a reasonable constrain because the overall system health is
more important than debugability of a particular application.
A kernel thread has been chosen because we need a reliable way of
invocation so workqueue context is not appropriate because all the
workers might be busy (e.g. allocating memory). Kswapd which sounds
like another good fit is not appropriate as well because it might get
blocked on locks during reclaim as well.
oom_reaper has to take mmap_sem on the target task for reading so the
solution is not 100% because the semaphore might be held or blocked for
write but the probability is reduced considerably wrt. basically any
lock blocking forward progress as described above. In order to prevent
from blocking on the lock without any forward progress we are using only
a trylock and retry 10 times with a short sleep in between. Users of
mmap_sem which need it for write should be carefully reviewed to use
_killable waiting as much as possible and reduce allocations requests
done with the lock held to absolute minimum to reduce the risk even
further.
The API between oom killer and oom reaper is quite trivial.
wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only
NULL->mm transition and oom_reaper clear this atomically once it is done
with the work. This means that only a single mm_struct can be reaped at
the time. As the operation is potentially disruptive we are trying to
limit it to the ncessary minimum and the reaper blocks any updates while
it operates on an mm. mm_struct is pinned by mm_count to allow parallel
exit_mmap and a race is detected by atomic_inc_not_zero(mm_users).
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This will be needed in the patch "mm, oom: introduce oom reaper".
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.
The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)
Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This helper duplicates last extent operation in OSD request, then
adjusts the new extent operation's offset and length. The helper
is for scatterd page writeback, which adds nonconsecutive dirty
pages to single OSD request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests. The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.
r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once. ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This avoids defining large array of r_reply_op_{len,result} in
in struct ceph_osd_request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Unless we are in the process of setting up a client (i.e. connecting to
the monitor cluster for the first time), apply a backoff: every time we
want to reopen a session, increase our timeout by a multiple (currently
2); when we complete the connection, reduce that multipler by 50%.
Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Split ping interval and ping timeout: ping interval is 10s; keepalive
timeout is 30s.
Make monc_ping_timeout a constant while at it - it's not actually
exported as a mount option (and the rest of tick-related settings won't
be either), so it's got no place in ceph_options.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime". To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs. Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.
Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).
Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull drm fixes from Dave Airlie:
"Just a couple of dma-buf related fixes and some amdgpu fixes, along
with a regression fix for radeon off but default feature, but makes my
30" monitor happy again"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/mst: cleanup code indentation
drm/radeon/mst: fix regression in lane/link handling.
drm/amdgpu: add invalidate_page callback for userptrs
drm/amdgpu: Revert "remove the userptr rmn->lock"
drm/amdgpu: clean up path handling for powerplay
drm/amd/powerplay: fix memory leak of tdp_table
dma-buf/fence: fix fence_is_later v2
dma-buf: Update docs for SYNC ioctl
drm: remove excess description
dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access()
drm/atmel-hlcdc: use helper to get crtc state
drm/atomic: use helper to get crtc state
- Fix for an intel_pstate driver issue related to the handling of
MSR updates uncovered by the recent cpufreq rework (Rafael Wysocki).
- cpufreq core cleanups related to starting governors and frequency
synchronization during resume from system suspend and a locking
fix for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).
- acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
Michael Neuling, Richard Cochran, Shilpasri Bhat).
- intel_idle driver update preventing some Skylake-H systems
from hanging during initialization by disabling deep C-states
mishandled by the platform in the problematic configurations (Len
Brown).
- Intel Xeon Phi Processor x200 support for intel_idle (Dasaratharaman
Chandramouli).
- cpuidle menu governor updates to make it always honor PM QoS
latency constraints (and prevent C1 from being used as the
fallback C-state on x86 when they are set below its exit latency)
and to restore the previous behavior to fall back to C1 if the next
timer event is set far enough in the future that was changed in 4.4
which led to an energy consumption regression (Rik van Riel, Rafael
Wysocki).
- New device ID for a future AMD UART controller in the ACPI driver
for AMD SoCs (Wang Hongcheng).
- Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
scaling (AVS) driver (David Wu).
- ACPI PCI resources management fix for the handling of IO space
resources on architectures where the IO space is memory mapped
(IA64 and ARM64) broken by the introduction of common ACPI
resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).
- Fix for the ACPI backend of the generic device properties API
to make it parse non-device (data node only) children of an
ACPI device correctly (Irina Tirdea).
- Fixes for the handling of global suspend flags (introduced in 4.4)
during hibernation and resume from it (Lukas Wunner).
- Support for obtaining configuration information from Device Trees
in the PM clocks framework (Jon Hunter).
- ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
King, Geert Uytterhoeven).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJW9JaRAAoJEILEb/54YlRx/GAQAJujANWilWHZYm24a9JDcIE9
rsNZIC/FdeBVilPtRTZQnig/Pj32Z4Jm7IZ/DLOq0Deu1YK/9uv3y59M3BcX6WyL
H5VR80L8geUJZ7RRk0WfM5D4X82ovzwpE/kWt2Z7HDuvJSCBmFBZOvNrXbaRncKD
jIvat/p6uCuxt5c08+ebnBLQ6tOs8wLTWiCx3fO128GIrGRGN2xFV6hzRWVGnJ4g
WXGAR+AdLxRMZz4PPmqdTfRj4TNSR071GjKyaeKfZUjQGAsf5O9A77JFjeNVomDx
g1K37Byid2bTByzVavlEXPJZ7eKb5dAhlo7IJ9HAcOAXChLqH2Czjrpd+1XjR9MF
SV/78rCnF8eet83QYLbGV/Mzf7gbJP2Xp6wiaM22VAPpGe+sYfphJoQka9XRTfId
OgAjyYMYdWAKo5DhxVNI8WyN0W5dsoBFPxnaUFhHSGDCIJH7Ksy20m6y3plG2Bxf
ahoiQhmd9ohjtB5JbRnf4MY0hjekp8Srdf+DoNKsk/+JscIyROpYY3msQ3smUKo+
f628MC/wAosMpSV+l+KOYkbjCbtB49IabWtZ//NVD9hYB3E1f6aTN59yFbWB+1rp
L7Y8iaxzSkyJy/yYVuBal3rSk356+BvvoXBlLXmBsyu1TMlcDjALIYztSiTVT5MB
RZBhgNwdkxNCYJfU3ex+
=hUVj
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management and ACPI updates from Rafael Wysocki:
"The second batch of power management and ACPI updates for v4.6.
Included are fixups on top of the previous PM/ACPI pull request and
other material that didn't make into it but still should go into 4.6.
Among other things, there's a fix for an intel_pstate driver issue
uncovered by recent cpufreq changes, a workaround for a boot hang on
Skylake-H related to the handling of deep C-states by the platform and
a PCI/ACPI fix for the handling of IO port resources on non-x86
architectures plus some new device IDs and similar.
Specifics:
- Fix for an intel_pstate driver issue related to the handling of MSR
updates uncovered by the recent cpufreq rework (Rafael Wysocki).
- cpufreq core cleanups related to starting governors and frequency
synchronization during resume from system suspend and a locking fix
for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).
- acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
Michael Neuling, Richard Cochran, Shilpasri Bhat).
- intel_idle driver update preventing some Skylake-H systems from
hanging during initialization by disabling deep C-states mishandled
by the platform in the problematic configurations (Len Brown).
- Intel Xeon Phi Processor x200 support for intel_idle
(Dasaratharaman Chandramouli).
- cpuidle menu governor updates to make it always honor PM QoS
latency constraints (and prevent C1 from being used as the fallback
C-state on x86 when they are set below its exit latency) and to
restore the previous behavior to fall back to C1 if the next timer
event is set far enough in the future that was changed in 4.4 which
led to an energy consumption regression (Rik van Riel, Rafael
Wysocki).
- New device ID for a future AMD UART controller in the ACPI driver
for AMD SoCs (Wang Hongcheng).
- Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
scaling (AVS) driver (David Wu).
- ACPI PCI resources management fix for the handling of IO space
resources on architectures where the IO space is memory mapped
(IA64 and ARM64) broken by the introduction of common ACPI
resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).
- Fix for the ACPI backend of the generic device properties API to
make it parse non-device (data node only) children of an ACPI
device correctly (Irina Tirdea).
- Fixes for the handling of global suspend flags (introduced in 4.4)
during hibernation and resume from it (Lukas Wunner).
- Support for obtaining configuration information from Device Trees
in the PM clocks framework (Jon Hunter).
- ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
King, Geert Uytterhoeven)"
* tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
PM / AVS: rockchip-io: add io selectors and supplies for rk3399
intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
ACPI / PM: Runtime resume devices when waking from hibernate
PM / sleep: Clear pm_suspend_global_flags upon hibernate
cpufreq: governor: Always schedule work on the CPU running update
cpufreq: Always update current frequency before startig governor
cpufreq: Introduce cpufreq_update_current_freq()
cpufreq: Introduce cpufreq_start_governor()
cpufreq: powernv: Add sysfs attributes to show throttle stats
cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
PCI: ACPI: IA64: fix IO port generic range check
ACPI / util: cast data to u64 before shifting to fix sign extension
cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
cpuidle: menu: Fall back to polling if next timer event is near
cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
cpufreq: Make cpufreq_quick_get() safe to call
ACPI / property: fix data node parsing in acpi_get_next_subnode()
ACPI / APD: Add device HID for future AMD UART controller
...