For iptable module to load a bpf program from a pinned location, it
only retrieve a loaded program and cannot change the program content so
requiring a write permission for it might not be necessary.
Also when adding or removing an unrelated iptable rule, it might need to
flush and reload the xt_bpf related rules as well and triggers the inode
permission check. It might be better to remove the write premission
check for the inode so we won't need to grant write access to all the
processes that flush and restore iptables rules.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2019-05-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a use after free in __dev_map_entry_free(), from Eric.
2) Several sockmap related bug fixes: a splat in strparser if
it was never initialized, remove duplicate ingress msg list
purging which can race, fix msg->sg.size accounting upon
skb to msg conversion, and last but not least fix a timeout
bug in tcp_bpf_wait_data(), from John.
3) Fix LRU map to avoid messing with eviction heuristics upon
syscall lookup, e.g. map walks from user space side will
then lead to eviction of just recently created entries on
updates as it would mark all map entries, from Daniel.
4) Don't bail out when libbpf feature probing fails. Also
various smaller fixes to flow_dissector test, from Stanislav.
5) Fix missing brackets for BTF_INT_OFFSET() in UAPI, from Gary.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly clean ups but there are also a couple of out-of-bounds accesses
(including a potential write to the byte before a static buffer).
The main changes are:
* Fixes those out-of-bounds access (empty string to configure
test module could write the byte before a buffer, high cpu counts
could read outside of per-cpu structures).
* Improvements to string handling problems picked up by new compiler
warnings and other static checks. Most are fixing benign issues that
can't be tickled without code changes but still reduce the wtf factor
a little.
* Tidy up the terminal output.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAlza6CEACgkQfOMlXTn3
iKHkwA/8CmCq7UA3ZZsItHYn48Xa4YzxySErhpocsqa7U7tPh7f7ETv6NkfmjxEQ
MGmQflKw/yU3avR6eHUfJcd4AO3+x0Vi5wR9fhevAVu3sfnjTjTjP0F2tDjWe+Y/
TJob4/gSssmDp1+DtFvOQdOmGVz9C7xMQ2wZFjQcpAwesLLbwu+KPZDcBPpyRnl1
zSOIUtTxHH6ay2PX57CNAWEmE4SFoUha7712GTHVHe3rykrC+CLYNeCRzUuyTSAz
OaPTNzd/OFz+uLcsvTPothpc3wUfM4MUkuCkAVKpuMcB2D+/7WqqszUfHYWJ27bH
oeYqRyeQaRa6COFkxZ4XZQRMBOYzzidVboTDjlTj391qq5L32tje75TtfHgCJ/p7
vlg0QdbrHOFaDF9aXcmqLr7kNRi83NxNPhg4XHA75MRHFBGvRkMd3jmuZdShFKc7
Yegr+pR0FJwivZz8+UcRPAdI5gdmSWLdpeB2GhtZqDk3975Qjvsy90ieG7GQnjR8
/ewoYsFQ5/qXwyZzJ+kHxAmTMWpGYx8Kge77j5UMljPsuSuHU56vUt4ovzSiSzuX
dTAxLRmrYi6Dlri76EHEoE+1mx301ymK4MXMz8WnNVQhnPnkCcEXx7P3neCB/wuX
w1O2VvMQ8b4si1/M71QFcAgQjMcJz7z8wGwhfnnqFPgZmVT5n/8=
=SsRX
-----END PGP SIGNATURE-----
Merge tag 'kgdb-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Mostly cleanups but there are also a couple of fixes for out-of-bounds
accesses (including a potential write to the byte before a static
buffer).
The main changes are:
- Fixes to those out-of-bounds access (empty string to configure test
module could write the byte before a buffer, high cpu counts could
read outside of per-cpu structures).
- Improvements to string handling problems picked up by new compiler
warnings and other static checks. Most are fixing benign issues
that can't be tickled without code changes but still reduce the wtf
factor a little.
- Tidy up the terminal output"
* tag 'kgdb-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: Fix bound check compiler warning
kdb: do a sanity check on the cpu in kdb_per_cpu()
kdb: Get rid of broken attempt to print CCVERSION in kdb summary
misc: kgdbts: fix out-of-bounds access in function param_set_kgdbts_var
kdb: kdb_support: replace strcpy() by strscpy()
gdbstub: Replace strcpy() by strscpy()
gdbstub: mark expected switch fall-throughs
Summary of modules changes for the 5.2 merge window:
- Use a separate table to store symbol types instead of
hijacking fields in struct Elf_Sym
- Trivial code cleanups
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJc2W9gAAoJEMBFfjjOO8FyAqkP+wbCB42LJ8kD2H+pbxpsf7vU
T1+5fZ3rfyLqHDc9Pzt1K6SBR04aLq8OXW3nOIQ5kmirSstiznokAExQBjeMegVJ
14Q5IxOSQ44rBPg5ggCA6s9qNyw2fqgwKWQKXLW+VmYAHiBfIBV0mO576KWIEgE1
vOCQPqW0ccfmjEssumEYMSePg01LATjso3PMcdJm6yI74xSGd8n872y36BleAP8M
232L8B4fuZEuHJ1QaBnXdapKibyMX3k3WbfK9U2RzSWaGjmuBXwihCB+ZA6HCiRX
MVwXQwh63UGCZLYkXnGMwPNgX5TTZRiPT4oIhB0tGIXkEFX/DlA+NGfuinEAYuKS
M0zhjgjSOKYEhNy1GtirbHjNCC4ULF0OZvD/dB8InQON0/t5Duq3sOHSp9+rWMg7
yuHy3NJGHHd65nGd7u7vXmdZXct/NVDdahKhqRA8i+HOxUBdF9vvMMnSASXufWFt
j5GSFA5/IZKpYofho/jvJK2rmQQVkVpD9zWCZcv8BNT124W4cZ5swOOSVYOnSAJb
Wvy1oknjF/HrKIft3UIFR0FU8uIueP0mtnna2B9SBVfZm7rM+/3+a1D/UOHaLtuO
++b1sIOpyApevRZcXpTnH63ZaO2tRC7KnDzaBT8FTZimEyV5KdD6Ec2MKJdi8Nkb
Pr099LJ2i/i8rQbkjM1i
=NO31
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
- Use a separate table to store symbol types instead of hijacking
fields in struct Elf_Sym
- Trivial code cleanups
* tag 'modules-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: add stubs for within_module functions
kallsyms: store type information in its own array
vmlinux.lds.h: drop unused __vermagic
One of the biggest issues we face right now with picking LRU map over
regular hash table is that a map walk out of user space, for example,
to just dump the existing entries or to remove certain ones, will
completely mess up LRU eviction heuristics and wrong entries such
as just created ones will get evicted instead. The reason for this
is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
system call lookup side as well. Thus upon walk, all entries are
being marked, so information of actual least recently used ones
are "lost".
In case of Cilium where it can be used (besides others) as a BPF
based connection tracker, this current behavior causes disruption
upon control plane changes that need to walk the map from user space
to evict certain entries. Discussion result from bpfconf [0] was that
we should simply just remove marking from system call side as no
good use case could be found where it's actually needed there.
Therefore this patch removes marking for regular LRU and per-CPU
flavor. If there ever should be a need in future, the behavior could
be selected via map creation flag, but due to mentioned reason we
avoid this here.
[0] http://vger.kernel.org/bpfconf.html
Fixes: 29ba732acb ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Fixes: 8f8449384e ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a callback map_lookup_elem_sys_only() that map implementations
could use over map_lookup_elem() from system call side in case the
map implementation needs to handle the latter differently than from
the BPF data path. If map_lookup_elem_sys_only() is set, this will
be preferred pick for map lookups out of user space. This hook is
used in a follow-up fix for LRU map, but once development window
opens, we can convert other map types from map_lookup_elem() (here,
the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
the callback to simplify and clean up the latter.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Merge misc updates from Andrew Morton:
- a few misc things and hotfixes
- ocfs2
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (139 commits)
kernel/memremap.c: remove the unused device_private_entry_fault() export
mm: delete find_get_entries_tag
mm/huge_memory.c: make __thp_get_unmapped_area static
mm/mprotect.c: fix compilation warning because of unused 'mm' variable
mm/page-writeback: introduce tracepoint for wait_on_page_writeback()
mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags
mm/Kconfig: update "Memory Model" help text
mm/vmscan.c: don't disable irq again when count pgrefill for memcg
mm: memblock: make keeping memblock memory opt-in rather than opt-out
hugetlbfs: always use address space in inode for resv_map pointer
mm/z3fold.c: support page migration
mm/z3fold.c: add structure for buddy handles
mm/z3fold.c: improve compression by extending search
mm/z3fold.c: introduce helper functions
mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist
mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig
mm/vmscan.c: simplify shrink_inactive_list()
fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
xen/privcmd-buf.c: convert to use vm_map_pages_zero()
xen/gntdev.c: convert to use vm_map_pages()
...
This export has been entirely unused since it was added more than 1 1/2
years ago.
Link: http://lkml.kernel.org/r/20190429115535.12793-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most architectures do not need the memblock memory after the page
allocator is initialized, but only few enable ARCH_DISCARD_MEMBLOCK in the
arch Kconfig.
Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
logic makes it clear which architectures actually use memblock after
system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK
to the architectures that are still missing that option.
Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch_add_memory, __add_pages take a want_memblock which controls whether
the newly added memory should get the sysfs memblock user API (e.g.
ZONE_DEVICE users do not want/need this interface). Some callers even
want to control where do we allocate the memmap from by configuring
altmap.
Add a more generic hotplug context for arch_add_memory and __add_pages.
struct mhp_restrictions contains flags which contains additional features
to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and
altmap for alternative memmap allocator.
This patch shouldn't introduce any functional change.
[akpm@linux-foundation.org: build fix]
Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This updates each existing invalidation to use the correct mmu notifier
event that represent what is happening to the CPU page table. See the
patch which introduced the events to see the rational behind this.
Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CPU page table update can happens for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as
a result of kernel activities (memory compression, reclaim, migration,
...).
Users of mmu notifier API track changes to the CPU page table and take
specific action for them. While current API only provide range of virtual
address affected by the change, not why the changes is happening.
This patchset do the initial mechanical convertion of all the places that
calls mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP
event as well as the vma if it is know (most invalidation happens against
a given vma). Passing down the vma allows the users of mmu notifier to
inspect the new vma page protection.
The MMU_NOTIFY_UNMAP is always the safe default as users of mmu notifier
should assume that every for the range is going away when that event
happens. A latter patch do convert mm call path to use a more appropriate
events for each call.
This is done as 2 patches so that no call site is forgotten especialy
as it uses this following coccinelle patch:
%<----------------------------------------------------------------------
@@
identifier I1, I2, I3, I4;
@@
static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
+enum mmu_notifier_event event,
+unsigned flags,
+struct vm_area_struct *vma,
struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }
@@
@@
-#define mmu_notifier_range_init(range, mm, start, end)
+#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)
@@
expression E1, E3, E4;
identifier I1;
@@
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, I1,
I1->vm_mm, E3, E4)
...>
@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(..., struct vm_area_struct *VMA, ...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...> }
@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(...) {
struct vm_area_struct *VMA;
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...> }
@@
expression E1, E2, E3, E4;
identifier FN;
@@
FN(...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, NULL,
E2, E3, E4)
...> }
---------------------------------------------------------------------->%
Applied with:
spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
spatch --sp-file mmu-notifier.spatch --dir mm --in-place
Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.
This patch does not change any functionality. New functionality will
follow in subsequent patches.
Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.
NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted. This breaks the current
GUP call site convention of having the returned pages be the final
parameter. So the suggestion was rejected.
Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Userfaultfd can be misued to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available. By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit. While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.
We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.
Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users. When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.
Andrea said:
: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd. In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.
[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long). These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.
As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL. From the image dump of a program:
| "mm_start_code": "0x400000",
| "mm_end_code": "0x8f5fb4",
| "mm_start_data": "0xf1bfb0",
| "mm_end_data": "0xf1bfb0",
Thus we need to change validate_prctl_map from strictly less to less or
equal operator use.
Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
Fixes: f606b77f1a ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The strncpy() function may leave the destination string buffer
unterminated, better use strscpy() instead.
This fixes the following warning with gcc 8.2:
kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr':
kernel/debug/kdb/kdb_io.c:449:3: warning: 'strncpy' specified bound 256 equals destination size [-Wstringop-truncation]
strncpy(kdb_prompt_str, prompt, CMD_BUFLEN);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Both of them are not declared in the headers and not used outside
of bpf_trace.c file.
Fixes: a38d1107f9 ("bpf: support raw tracepoints in modules")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull networking fixes from David Miller:
"Fixes all over:
1) Netdev refcnt leak in nf_flow_table, from Taehee Yoo.
2) Fix RCU usage in nf_tables, from Florian Westphal.
3) Fix DSA build when NET_DSA_TAG_BRCM_PREPEND is not set, from Yue
Haibing.
4) Add missing page read/write ops to realtek driver, from Heiner
Kallweit.
5) Endianness fix in qrtr code, from Nicholas Mc Guire.
6) Fix various bugs in DSA_SKB_* macros, from Vladimir Oltean.
7) Several BPF documentation cures, from Quentin Monnet.
8) Fix undefined behavior in narrow load handling of BPF verifier,
from Krzesimir Nowak.
9) DMA ops crash in SGI Seeq driver due to not set netdev parent
device pointer, from Thomas Bogendoerfer.
10) Flow dissector has to disable preemption when invoking BPF
program, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
net: ethernet: stmmac: dwmac-sun8i: enable support of unicast filtering
net: ethernet: ti: netcp_ethss: fix build
flow_dissector: disable preemption around BPF calls
bonding: fix arp_validate toggling in active-backup mode
net: meson: fixup g12a glue ephy id
net: phy: realtek: Replace phy functions with non-locked version in rtl8211e_config_init()
net: seeq: fix crash caused by not set dev.parent
of_net: Fix missing of_find_device_by_node ref count drop
net: mvpp2: cls: Add missing NETIF_F_NTUPLE flag
bpf: fix undefined behavior in narrow load handling
libbpf: detect supported kernel BTF features and sanitize BTF
selftests: bpf: Add files generated after build to .gitignore
tools: bpf: synchronise BPF UAPI header with tools
bpf: fix minor issues in documentation for BPF helpers.
bpf: fix recurring typo in documentation for BPF helpers
bpf: fix script for generating man page on BPF helpers
bpf: add various test cases for backward jumps
net: dccp : proto: remove Unneeded variable "err"
net: dsa: Remove the now unused DSA_SKB_CB_COPY() macro
net: dsa: Remove dangerous DSA_SKB_CLONE() macro
...
Commit 31fd85816d ("bpf: permits narrower load from bpf program
context fields") made the verifier add AND instructions to clear the
unwanted bits with a mask when doing a narrow load. The mask is
computed with
(1 << size * 8) - 1
where "size" is the size of the narrow load. When doing a 4 byte load
of a an 8 byte field the verifier shifts the literal 1 by 32 places to
the left. This results in an overflow of a signed integer, which is an
undefined behavior. Typically, the computed mask was zero, so the
result of the narrow load ended up being zero too.
Cast the literal to long long to avoid overflows. Note that narrow
load of the 4 byte fields does not have the undefined behavior,
because the load size can only be either 1 or 2 bytes, so shifting 1
by 8 or 16 places will not overflow it. And reading 4 bytes would not
be a narrow load of a 4 bytes field.
Fixes: 31fd85816d ("bpf: permits narrower load from bpf program context fields")
Reviewed-by: Alban Crequy <alban@kinvolk.io>
Reviewed-by: Iago López Galeiras <iago@kinvolk.io>
Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The "whichcpu" comes from argv[3]. The cpu_online() macro looks up the
cpu in a bitmap of online cpus, but if the value is too high then it
could read beyond the end of the bitmap and possibly Oops.
Fixes: 5d5314d679 ("kdb: core for kgdb back end (1 of 2)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
If you drop into kdb and type "summary", it prints out a line that
says this:
ccversion CCVERSION
...and I don't mean that it actually prints out the version of the C
compiler. It literally prints out the string "CCVERSION".
The version of the C Compiler is already printed at boot up and it
doesn't seem useful to replicate this in kdb. Let's just delete it.
We can also delete the bit of the Makefile that called the C compiler
in an attempt to pass this into kdb. This will remove one extra call
to the C compiler at Makefile parse time and (very slightly) speed up
builds.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Core changes:
- The gpiolib MMIO driver has been enhanced to handle two direction
registers, i.e. one register to set lines as input and one register
to set lines as output. It turns out some silicon engineer thinks
the ability to configure a line as input and output at the same
time makes sense, this can be debated but includes a lot of analog
electronics reasoning, and the registers are there and need to
be handled consistently. Unsurprisingly, we enforce the lines to
be either inputs or outputs in such schemes.
- Send in the proper argument value to .set_config() dispatched to
the pin control subsystem. Nobody used it before, now someone
does, so fix it to work as expected.
- The ACPI gpiolib portions can now handle pin bias setting (pull up
or pull down). This has been in the ACPI spec for years and we
finally have it properly integrated with Linux GPIOs. It was based
on an observation from Andy Schevchenko that Thomas Petazzoni's
changes to the core for biasing the PCA950x GPIO expander actually
happen to fit hand-in-glove with what the ACPI core needed.
Such nice synergies happen sometimes.
New drivers:
- A new driver for the Mellanox BlueField GPIO controller. This is
using 64bit MMIO registers and can configure lines as inputs
and outputs at the same time and after improving the MMIO library
we handle it just fine. Interesting.
- A new IXP4xx proper gpiochip driver with hierarchical interrupts
should be coming in from the ARM SoC tree as well.
Driver enhancements:
- The PCA053x driver handles the CAT9554 GPIO expander.
- The PCA053x driver handles the NXP PCAL6416 GPIO expander.
- Wake-up support on PCA053x GPIO lines.
- OMAP now does a nice asynchronous IRQ handling on wake-ups by
letting everything wake up on edges, and this makes runtime PM
work as expected too.
Misc:
- Several cleanups such as devres fixes.
- Get rid of some languager comstructs that cause problems when
compiling with LLVMs clang.
- Documentation review and update.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJc1olZAAoJEEEQszewGV1zEU4P/RmTf3hG8xmNPS3MDTmR6gAy
/YJOXjXBf3CD/dmEAyyaNLnUQismrtRNvHSoEGbno7gkU+htzp9UfUJkj6+HIXs2
RpF+Hi78HzZNDxGWuBLu6OZolpmBtx+sRKOhHk/XfNS45qd1FgXWDuulzsYa9Xsr
hYMXdtdv9wY/vcc68q1rtKAbzlu5ZNCa3Zj1iNOr/XQt3Nl2BW66hGLgjK4mOvgx
fJy4rFXuDIMfDvo69U1Opz2b39sfE7XMhfZS/MOgg4yEV9zGRgDoI1tyMcTqGb8Q
8LQbp5dXkP+3dJQB8tgbu3Vk4WC1Rd/pmIli5sMgsk0HYQ6XegfT6HJKozSmwN9r
0s8jKlrocWZvdPo1aJwQgtRS56t2rFWcrcRye8bLqxkkW5cYIq9CwkE8USwB31Kv
PFpoOwRuCtj0gkCxf7WIEcC5NAkYPow3K1KPdk3E0Si6I3pj0NqqlaAD0JAlkC2V
aPq3xbTuFCAdmcADEt2Z+dUJ7WIs5Y9oQgosMAx+A2AD4K3QDBMu3pZsT6SCu4XZ
mK0eWJi9/CvOj/s7bA0BEJVxQA+p8KYsNRBOULg/8aAOqGcLnSydQjqrxDTE8YrL
xmmRG7i7ht0B9CchZuIB5hqdvjbCgvcVa5OnCUDfLxE0GdCx8iJ9y9OrsMXbabYq
8FcPDo1N38cTYLnLqvKI
=rhto
-----END PGP SIGNATURE-----
Merge tag 'gpio-v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull gpio updates from Linus Walleij:
"This is the bulk of the GPIO changes for the v5.2 kernel cycle. A bit
later than usual because I was ironing out my own mistakes. I'm
holding some stuff back for the next kernel as a result, and this
should be a healthy and well tested batch.
Core changes:
- The gpiolib MMIO driver has been enhanced to handle two direction
registers, i.e. one register to set lines as input and one register
to set lines as output. It turns out some silicon engineer thinks
the ability to configure a line as input and output at the same
time makes sense, this can be debated but includes a lot of analog
electronics reasoning, and the registers are there and need to be
handled consistently. Unsurprisingly, we enforce the lines to be
either inputs or outputs in such schemes.
- Send in the proper argument value to .set_config() dispatched to
the pin control subsystem. Nobody used it before, now someone does,
so fix it to work as expected.
- The ACPI gpiolib portions can now handle pin bias setting (pull up
or pull down). This has been in the ACPI spec for years and we
finally have it properly integrated with Linux GPIOs. It was based
on an observation from Andy Schevchenko that Thomas Petazzoni's
changes to the core for biasing the PCA950x GPIO expander actually
happen to fit hand-in-glove with what the ACPI core needed. Such
nice synergies happen sometimes.
New drivers:
- A new driver for the Mellanox BlueField GPIO controller. This is
using 64bit MMIO registers and can configure lines as inputs and
outputs at the same time and after improving the MMIO library we
handle it just fine. Interesting.
- A new IXP4xx proper gpiochip driver with hierarchical interrupts
should be coming in from the ARM SoC tree as well.
Driver enhancements:
- The PCA053x driver handles the CAT9554 GPIO expander.
- The PCA053x driver handles the NXP PCAL6416 GPIO expander.
- Wake-up support on PCA053x GPIO lines.
- OMAP now does a nice asynchronous IRQ handling on wake-ups by
letting everything wake up on edges, and this makes runtime PM work
as expected too.
Misc:
- Several cleanups such as devres fixes.
- Get rid of some languager comstructs that cause problems when
compiling with LLVMs clang.
- Documentation review and update"
* tag 'gpio-v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (85 commits)
gpio: Update documentation
docs: gpio: convert docs to ReST and rename to *.rst
gpio: sch: Remove write-only core_base
gpio: pxa: Make two symbols static
gpiolib: acpi: Respect pin bias setting
gpiolib: acpi: Add acpi_gpio_update_gpiod_lookup_flags() helper
gpiolib: acpi: Set pin value, based on bias, more accurately
gpiolib: acpi: Change type of dflags
gpiolib: Introduce GPIO_LOOKUP_FLAGS_DEFAULT
gpiolib: Make use of enum gpio_lookup_flags consistent
gpiolib: Indent entry values of enum gpio_lookup_flags
gpio: pca953x: add support for pca6416
dt-bindings: gpio: pca953x: document the nxp,pca6416
gpio: pca953x: add pcal6416 to the of_device_id table
gpio: gpio-omap: Remove conditional pm_runtime handling for GPIO interrupts
gpio: gpio-omap: configure edge detection for level IRQs for idle wakeup
tracing: stop making gpio tracing configurable
gpio: pca953x: Configure wake-up path when wake-up is enabled
gpio: of: Optimize quirk checks
gpio: mmio: Drop bgpio_dir_inverted
...
Pull cgroup updates from Tejun Heo:
"This includes Roman's cgroup2 freezer implementation.
It's a separate machanism from cgroup1 freezer. Instead of blocking
user tasks in arbitrary uninterruptible sleeps, the new implementation
extends jobctl stop - frozen tasks are trapped in jobctl stop until
thawed and can be killed and ptraced. Lots of thanks to Oleg for
sheperding the effort.
Other than that, there are a few trivial changes"
* 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: never call do_group_exit() with task->frozen bit set
kernel: cgroup: fix misuse of %x
cgroup: get rid of cgroup_freezer_frozen_exit()
cgroup: prevent spurious transition into non-frozen state
cgroup: Remove unused cgrp variable
cgroup: document cgroup v2 freezer interface
cgroup: add tracing points for cgroup v2 freezer
cgroup: make TRACE_CGROUP_PATH irq-safe
kselftests: cgroup: add freezer controller self-tests
kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy()
cgroup: cgroup v2 freezer
cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock
cgroup: implement __cgroup_task_count() helper
cgroup: rename freezer.c into legacy_freezer.c
cgroup: remove extra cgroup_migrate_finish() call
Pull workqueue updates from Tejun Heo:
"Only three commits, of which two are trivial.
The non-trivial chagne is Thomas's patch to switch workqueue from
sched RCU to regular one. The use of sched RCU is mostly historic and
doesn't really buy us anything noticeable"
* 'for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Use normal rcu
kernel/workqueue: Document wq_worker_last_func() argument
kernel/workqueue: Use __printf markup to silence compiler in function 'alloc_workqueue'
Pull intgrity updates from James Morris:
"This contains just three patches, the remainder were either included
in other pull requests (eg. audit, lockdown) or will be upstreamed via
other subsystems (eg. kselftests, Power).
Included here is one bug fix, one documentation update, and extending
the x86 IMA arch policy rules to coordinate the different kernel
module signature verification methods"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
doc/kernel-parameters.txt: Deprecate ima_appraise_tcb
x86/ima: add missing include
x86/ima: require signed kernel modules
- remove the already broken support for NULL dev arguments to the
DMA API calls
- Kconfig tidyups
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlzT00YLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYO66hAAx2kCUIh+K2gFB5uxHqZiG62UmRjkPzolxcR5/Jx9
4Rz6NRAE+rp8v2fbBr2bveDx7cF5bm1L+pRyRsFMfwkm3a8dCHQ51ldIm5VFoI3e
NiX6Zoxk02BCXP/Qk//aHeNW9dBmuiemiXzdPEhOvWvVzqTO5JZrECQpkHEkG+8A
R/IWU15sr5xzw9Td/HVN9CRJri/qiTAuB9nSoP6BGjZeHkQjREJKNMGKDTvSzH4L
tlyD1G7yEymQvLBqGGO64ztuav00l8sqjI3tn1mmwpw4VTajabeRHPnWh+7g9Od+
sH1pRvIOTvEMc456fizufYIOedB5Ze344kgfrxhngRbBVXmMfShr8ZLzdIUGhGjY
1cdGqIUOEKywiDf13KrHVkNU+lJtvjMCMxvV93mAYRLOIQg0Jf4T2kklgKyEhqrG
rqFdbbtSBzmLjPyqc1FS0heDWmA+yJsKAumGcH4blJXCpsD1rHWGe0AJ34x+OHPT
tw5l+P4zAH1eO1qHCtmxN9s0lXZv1VLcFkOrJH91LPvAhZsUCrdqDjyJpTUYaIao
yzkiLbDwFO7SVoqzaVNlVZIJ/9LX0qfAnl2Atty+sAQomrQMoviNSzGbLSLQqhHN
FbTIEBMxrxS49+3lfzHOS/lYPpJp6B31yotNM+6YpXmbRQZN5gjGNYBqhKD+7Rgn
L0Y=
=IdsP
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.2' of git://git.infradead.org/users/hch/dma-mapping
Pull DMA mapping updates from Christoph Hellwig:
- remove the already broken support for NULL dev arguments to the DMA
API calls
- Kconfig tidyups
* tag 'dma-mapping-5.2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: add a Kconfig symbol to indicate arch_dma_prep_coherent presence
dma-mapping: remove an unnecessary NULL check
x86/dma: Remove the x86_dma_fallback_dev hack
dma-mapping: remove leftover NULL device support
arm: use a dummy struct device for ISA DMA use of the DMA API
pxa3xx-gcu: pass struct device to dma_mmap_coherent
gbefb: switch to managed version of the DMA allocator
da8xx-fb: pass struct device to DMA API functions
parport_ip32: pass struct device to DMA API functions
dma: select GENERIC_ALLOCATOR for DMA_REMAP
I've got two independent reports that cgroup_task_frozen() check
in cgroup_exit() has been triggered by lkp libhugetlbfs-test and
LTP ptrace01 tests.
For example:
[ 44.576072] WARNING: CPU: 1 PID: 3028 at kernel/cgroup/cgroup.c:5932 cgroup_exit+0x148/0x160
[ 44.577724] Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sr_mod cdrom
bochs_drm sg ttm ata_generic pata_acpi ppdev drm_kms_helper snd_pcm syscopyarea aesni_intel snd_timer
sysfillrect sysimgblt snd crypto_simd cryptd glue_helper soundcore fb_sys_fops joydev drm serio_raw pcspkr
ata_piix libata i2c_piix4 floppy parport_pc parport ip_tables
[ 44.583106] CPU: 1 PID: 3028 Comm: ptrace-write-hu Not tainted 5.1.0-rc3-00053-g9262503 #5
[ 44.584600] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 44.586116] RIP: 0010:cgroup_exit+0x148/0x160
[ 44.587135] Code: 0f 84 50 ff ff ff 48 8b 85 c8 0c 00 00 48 8b 78 70 e8 ec 2e 00 00 e9 3b ff ff ff f0 ff 43 60
0f 88 72 21 89 00 e9 48 ff ff ff <0f> 0b e9 1b ff ff ff e8 3c 73 f4 ff 66 90 66 2e 0f 1f 84 00 00 00
[ 44.590113] RSP: 0018:ffffb25702dcfd30 EFLAGS: 00010002
[ 44.591167] RAX: ffff96a7fee32410 RBX: ffff96a7ff1d6000 RCX: dead000000000200
[ 44.592446] RDX: ffff96a7ff1d6080 RSI: ffff96a7fec75290 RDI: ffff96a7fec75290
[ 44.593715] RBP: ffff96a7fec745c0 R08: ffff96a7fec74658 R09: 0000000000000000
[ 44.594985] R10: 0000000000000000 R11: 0000000000000001 R12: ffff96a7fec75101
[ 44.596266] R13: ffff96a7fec745c0 R14: ffff96a7ff3bde30 R15: ffff96a7fec75130
[ 44.597550] FS: 0000000000000000(0000) GS:ffff96a7dd700000(0000) knlGS:0000000000000000
[ 44.598950] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 44.600098] CR2: 00000000f7a00000 CR3: 000000000d20e000 CR4: 00000000000406e0
[ 44.601417] Call Trace:
[ 44.602777] do_exit+0x337/0xc40
[ 44.603677] do_group_exit+0x3a/0xa0
[ 44.604610] get_signal+0x12e/0x8d0
[ 44.605533] ? __switch_to_asm+0x40/0x70
[ 44.606503] do_signal+0x36/0x650
[ 44.607409] ? __switch_to_asm+0x40/0x70
[ 44.608383] ? __schedule+0x267/0x860
[ 44.609329] exit_to_usermode_loop+0x89/0xf0
[ 44.610349] do_fast_syscall_32+0x251/0x2e3
[ 44.611357] entry_SYSENTER_compat+0x7f/0x91
[ 44.612376] ---[ end trace e4ca5cfc4b7f7964 ]---
The problem is caused by the ptrace_signal() call in the for loop
in get_signal(). There is a cgroup_enter_frozen() call inside
ptrace_signal(), so after exit from ptrace_signal() the task->frozen
bit might be set. In this case do_group_exit() can be called with the
task->frozen bit set and trigger the warning. This is only place where
we can leave the loop with the task->frozen bit set and without
setting JOBCTL_TRAP_FREEZE and TIF_SIGPENDING.
To resolve this problem, let's move cgroup_leave_frozen(true) call to
just after the fatal label. If the task is going to die, the frozen
bit must be cleared no matter how we get into this point.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Reported-by: Qian Cai <cai@lca.pw>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull networking updates from David Miller:
"Highlights:
1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.
2) Add fib_sync_mem to control the amount of dirty memory we allow to
queue up between synchronize RCU calls, from David Ahern.
3) Make flow classifier more lockless, from Vlad Buslov.
4) Add PHY downshift support to aquantia driver, from Heiner
Kallweit.
5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
contention on SLAB spinlocks in heavy RPC workloads.
6) Partial GSO offload support in XFRM, from Boris Pismenny.
7) Add fast link down support to ethtool, from Heiner Kallweit.
8) Use siphash for IP ID generator, from Eric Dumazet.
9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
entries, from David Ahern.
10) Move skb->xmit_more into a per-cpu variable, from Florian
Westphal.
11) Improve eBPF verifier speed and increase maximum program size,
from Alexei Starovoitov.
12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
spinlocks. From Neil Brown.
13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.
14) Improve link partner cap detection in generic PHY code, from
Heiner Kallweit.
15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
Maguire.
16) Remove SKB list implementation assumptions in SCTP, your's truly.
17) Various cleanups, optimizations, and simplifications in r8169
driver. From Heiner Kallweit.
18) Add memory accounting on TX and RX path of SCTP, from Xin Long.
19) Switch PHY drivers over to use dynamic featue detection, from
Heiner Kallweit.
20) Support flow steering without masking in dpaa2-eth, from Ioana
Ciocoi.
21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
Pirko.
22) Increase the strict parsing of current and future netlink
attributes, also export such policies to userspace. From Johannes
Berg.
23) Allow DSA tag drivers to be modular, from Andrew Lunn.
24) Remove legacy DSA probing support, also from Andrew Lunn.
25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
Haabendal.
26) Add a generic tracepoint for TX queue timeouts to ease debugging,
from Cong Wang.
27) More indirect call optimizations, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
cxgb4: Fix error path in cxgb4_init_module
net: phy: improve pause mode reporting in phy_print_status
dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
net: macb: Change interrupt and napi enable order in open
net: ll_temac: Improve error message on error IRQ
net/sched: remove block pointer from common offload structure
net: ethernet: support of_get_mac_address new ERR_PTR error
net: usb: smsc: fix warning reported by kbuild test robot
staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
net: dsa: support of_get_mac_address new ERR_PTR error
net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
vrf: sit mtu should not be updated when vrf netdev is the link
net: dsa: Fix error cleanup path in dsa_init_module
l2tp: Fix possible NULL pointer dereference
taprio: add null check on sched_nest to avoid potential null pointer dereference
net: mvpp2: cls: fix less than zero check on a u32 variable
net_sched: sch_fq: handle non connected flows
net_sched: sch_fq: do not assume EDT packets are ordered
net: hns3: use devm_kcalloc when allocating desc_cb
net: hns3: some cleanup for struct hns3_enet_ring
...
Pull misc dcache updates from Al Viro:
"Most of this pile is putting name length into struct name_snapshot and
making use of it.
The beginning of this series ("ovl_lookup_real_one(): don't bother
with strlen()") ought to have been split in two (separate switch of
name_snapshot to struct qstr from overlayfs reaping the trivial
benefits of that), but I wanted to avoid a rebase - by the time I'd
spotted that it was (a) in -next and (b) close to 5.1-final ;-/"
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
audit_compare_dname_path(): switch to const struct qstr *
audit_update_watch(): switch to const struct qstr *
inotify_handle_event(): don't bother with strlen()
fsnotify: switch send_to_group() and ->handle_event to const struct qstr *
fsnotify(): switch to passing const struct qstr * for file_name
switch fsnotify_move() to passing const struct qstr * for old_name
ovl_lookup_real_one(): don't bother with strlen()
sysv: bury the broken "quietly truncate the long filenames" logics
nsfs: unobfuscate
unexport d_alloc_pseudo()
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAlzRrzoUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNc7hAApgsi+3Jf9i29mgrKdrTciZ35TegK
C8pTlOIndpBcmdwDakR50/PgfMHdHll8M9TReVNEjbe0S+Ww5GTE7eWtL3YqoPC2
MuXEqcriz6UNi5Xma6vCZrDznWLXkXnzMDoDoYGDSoKuUYxef0fuqxDBnERM60Ht
s52+0XvR5ZseBw7I1KIv/ix2fXuCGq6eCdqassm0rvLPQ7bq6nWzFAlNXOLud303
DjIWu6Op2EL0+fJSmG+9Z76zFjyEbhMIhw5OPDeH4eO3pxX29AIv0m0JlI7ZXxfc
/VVC3r5G4WrsWxwKMstOokbmsQxZ5pB3ZaceYpco7U+9N2e3SlpsNM9TV+Y/0ac/
ynhYa//GK195LpMXx1BmWmLpjBHNgL8MvQkVTIpDia0GT+5sX7+haDxNLGYbocmw
A/mR+KM2jAU3QzNseGh6c659j3K4tbMIFMNxt7pUBxVPLafcccNngFGTpzCwu5GU
b7y4d21g6g/3Irj14NYU/qS8dTjW0rYrCMDquTpxmMfZ2xYuSvQmnBw91NQzVBp2
98L2/fsUG3yOa5MApgv+ryJySsIM+SW+7leKS5tjy/IJINzyPEZ85l3o8ck8X4eT
nohpKc/ELmeyi3omFYq18ecvFf2YRS5jRnz89i9q65/3ESgGiC0wyGOhNTvjvsyv
k4jT0slIK614aGk=
=p8Fp
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"We've got a reasonably broad set of audit patches for the v5.2 merge
window, the highlights are below:
- The biggest change, and the source of all the arch/* changes, is
the patchset from Dmitry to help enable some of the work he is
doing around PTRACE_GET_SYSCALL_INFO.
To be honest, including this in the audit tree is a bit of a
stretch, but it does help move audit a little further along towards
proper syscall auditing for all arches, and everyone else seemed to
agree that audit was a "good" spot for this to land (or maybe they
just didn't want to merge it? dunno.).
- We can now audit time/NTP adjustments.
- We continue the work to connect associated audit records into a
single event"
* tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: (21 commits)
audit: fix a memory leak bug
ntp: Audit NTP parameters adjustment
timekeeping: Audit clock adjustments
audit: purge unnecessary list_empty calls
audit: link integrity evm_write_xattrs record to syscall event
syscall_get_arch: add "struct task_struct *" argument
unicore32: define syscall_get_arch()
Move EM_UNICORE to uapi/linux/elf-em.h
nios2: define syscall_get_arch()
nds32: define syscall_get_arch()
Move EM_NDS32 to uapi/linux/elf-em.h
m68k: define syscall_get_arch()
hexagon: define syscall_get_arch()
Move EM_HEXAGON to uapi/linux/elf-em.h
h8300: define syscall_get_arch()
c6x: define syscall_get_arch()
arc: define syscall_get_arch()
Move EM_ARCOMPACT and EM_ARCV2 to uapi/linux/elf-em.h
audit: Make audit_log_cap and audit_copy_inode static
audit: connect LOGIN record to its syscall record
...
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"Cleanups in the swiotlb code and extra debugfs knobs to help with the
field diagnostics"
* 'stable/for-linus-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb-xen: ensure we have a single callsite for xen_dma_map_page
swiotlb-xen: simplify the DMA sync method implementations
swiotlb-xen: use ->map_page to implement ->map_sg
swiotlb-xen: make instances match their method names
swiotlb: save io_tlb_used to local variable before leaving critical section
swiotlb: dump used and total slots when swiotlb buffer is full
Here is the "big" set of driver core patches for 5.2-rc1
There are a number of ACPI patches in here as well, as Rafael said they
should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.
There are also a number of small subsystem-specific changes in here, due
to some changes to the kobject core code. Those too have all been acked
by the various subsystem maintainers.
As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXNHDbw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynDAgCfbb4LBR6I50wFXb8JM/R6cAS7qrsAn1unshKV
8XCYcif2RxjtdJWXbjdm
=/rLh
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core/kobject updates from Greg KH:
"Here is the "big" set of driver core patches for 5.2-rc1
There are a number of ACPI patches in here as well, as Rafael said
they should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.
There are also a number of small subsystem-specific changes in here,
due to some changes to the kobject core code. Those too have all been
acked by the various subsystem maintainers.
As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
kobject: clean up the kobject add documentation a bit more
kobject: Fix kernel-doc comment first line
kobject: Remove docstring reference to kset
firmware_loader: Fix a typo ("syfs" -> "sysfs")
kobject: fix dereference before null check on kobj
Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
init/config: Do not select BUILD_BIN2C for IKCONFIG
Provide in-kernel headers to make extending kernel easier
kobject: Improve doc clarity kobject_init_and_add()
kobject: Improve docs for kobject_add/del
driver core: platform: Fix the usage of platform device name(pdev->name)
livepatch: Replace klp_ktype_patch's default_attrs with groups
cpufreq: schedutil: Replace default_attrs field with groups
padata: Replace padata_attr_type default_attrs field with groups
irqdesc: Replace irq_kobj_type's default_attrs field with groups
net-sysfs: Replace ktype default_attrs field with groups
block: Replace all ktype default_attrs with groups
samples/kobject: Replace foo_ktype's default_attrs field with groups
kobject: Add support for default attribute groups to kobj_type
driver core: Postpone DMA tear-down until after devres release for probe failure
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlzReuoACgkQjrBW1T7s
sS1uvBAA16pgnhRNxNTrp3LYft6lUWmF4n0baOTVtQNLhPjpwaOxHIrCBugkQCJB
QcQ9IQSOvIkaEW0XAQoPBaeLviiKhHOFw1Fv89OtW6xUidSfSV15lcI9f1F2pCm2
4yCL/8XvL6M0NhxiwftJAkWOXeDNLfjFnLwyLxBfgg3EeyqMgUB8raeosEID0ORR
gm2/g8DYS2r+KNqM/F4xvMSgabfi2bGk+8BtAaVnftJfstpRNrqKwWnSK3Wspj1l
5gkb8gSsiY6ns3V6RgNHrFlhevFg8V+VjcJt7FR+aUEjOkcoiXas/PhvamMzdsn/
FM1F/A0pM8FSybIUClhnnnxNPc+p8ZN/71YQAPs+Mnh3xvbtKea2lkhC+Xv4OpK3
edutSZWFaiIery82Rk00H3vqiSF1+kRIXSpZSS4mElk4FsVljkyH+nSP7rbmE2MR
EQe+kKnZl8QzWrVbnODC+EVvvVpA2bXDvENJmvKqus+t2G0OdV7Iku3F5E3KjF8k
S5RRV1zuBF3ugqnjmYrVmJtpEA8mxClmqvg6okru+qW6ngO5oOgVpPLjWn1CXcdj
wcuQ6Pe1QwAHS54e9WSWgCHVssLvm9nCdCqypdNaoyGWmbTWntwlrY7Y0JUQnAbB
6/G/DQQiCWY9y8bMZlTEydhIpgcsdROuPYv+oHF5+eQQthsWwHc=
=LH11
-----END PGP SIGNATURE-----
Merge tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner:
"This patchset makes it possible to retrieve pidfds at process creation
time by introducing the new flag CLONE_PIDFD to the clone() system
call. Linus originally suggested to implement this as a new flag to
clone() instead of making it a separate system call.
After a thorough review from Oleg CLONE_PIDFD returns pidfds in the
parent_tidptr argument. This means we can give back the associated pid
and the pidfd at the same time. Access to process metadata information
thus becomes rather trivial.
As has been agreed, CLONE_PIDFD creates file descriptors based on
anonymous inodes similar to the new mount api. They are made
unconditional by this patchset as they are now needed by core kernel
code (vfs, pidfd) even more than they already were before (timerfd,
signalfd, io_uring, epoll etc.). The core patchset is rather small.
The bulky looking changelist is caused by David's very simple changes
to Kconfig to make anon inodes unconditional.
A pidfd comes with additional information in fdinfo if the kernel
supports procfs. The fdinfo file contains the pid of the process in
the callers pid namespace in the same format as the procfs status
file, i.e. "Pid:\t%d".
To remove worries about missing metadata access this patchset comes
with a sample/test program that illustrates how a combination of
CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free
access to process metadata through /proc/<pid>.
Further work based on this patchset has been done by Joel. His work
makes pidfds pollable. It finished too late for this merge window. I
would prefer to have it sitting in linux-next for a while and send it
for inclusion during the 5.3 merge window"
* tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
samples: show race-free pidfd metadata access
signal: support CLONE_PIDFD with pidfd_send_signal
clone: add CLONE_PIDFD
Make anon_inodes unconditional
Pull vfs stable fodder fixes from Al Viro:
- acct_on() fix for deadlock caught by overlayfs folks
- autofs RCU use-after-free SNAFU (->d_manage() can be called
locklessly, so we need to RCU-delay freeing the objects it looks at)
- (hopefully) the end of "do we need freeing this dentry RCU-delayed"
whack-a-mole.
* 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
autofs: fix use-after-free in lockless ->d_manage()
dcache: sort the freeing-without-RCU-delay mess for good.
acct_on(): don't mess with freeze protection
Pull vfs inode freeing updates from Al Viro:
"Introduction of separate method for RCU-delayed part of
->destroy_inode() (if any).
Pretty much as posted, except that destroy_inode() stashes
->free_inode into the victim (anon-unioned with ->i_fops) before
scheduling i_callback() and the last two patches (sockfs conversion
and folding struct socket_wq into struct socket) are excluded - that
pair should go through netdev once davem reopens his tree"
* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
orangefs: make use of ->free_inode()
shmem: make use of ->free_inode()
hugetlb: make use of ->free_inode()
overlayfs: make use of ->free_inode()
jfs: switch to ->free_inode()
fuse: switch to ->free_inode()
ext4: make use of ->free_inode()
ecryptfs: make use of ->free_inode()
ceph: use ->free_inode()
btrfs: use ->free_inode()
afs: switch to use of ->free_inode()
dax: make use of ->free_inode()
ntfs: switch to ->free_inode()
securityfs: switch to ->free_inode()
apparmor: switch to ->free_inode()
rpcpipe: switch to ->free_inode()
bpf: switch to ->free_inode()
mqueue: switch to ->free_inode()
ufs: switch to ->free_inode()
coda: switch to ->free_inode()
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA
lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0
m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO
EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK
r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k
FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq
uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj
lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR
waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w
wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55
Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10
c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY=
=WZfC
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Allow state reset of printk_once() calls.
- Prevent crashes when dereferencing invalid pointers in vsprintf().
Only the first byte is checked for simplicity.
- Make vsprintf warnings consistent and inlined.
- Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
modifiers.
- Some clean up of vsprintf and test_printf code.
* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
lib/vsprintf: Make function pointer_string static
vsprintf: Limit the length of inlined error messages
vsprintf: Avoid confusion between invalid address and value
vsprintf: Prevent crash when dereferencing invalid pointers
vsprintf: Consolidate handling of unknown pointer specifiers
vsprintf: Factor out %pO handler as kobject_string()
vsprintf: Factor out %pV handler as va_format()
vsprintf: Factor out %p[iI] handler as ip_addr_string()
vsprintf: Do not check address of well-known strings
vsprintf: Consistent %pK handling for kptr_restrict == 0
vsprintf: Shuffle restricted_pointer()
printk: Tie printk_once / printk_deferred_once into .data.once for reset
treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
lib/test_printf: Switch to bitmap_zalloc()
Pull livepatching updates from Jiri Kosina:
- livepatching kselftests improvements from Joe Lawrence and Miroslav
Benes
- making use of gcc's -flive-patching option when available, from
Miroslav Benes
- kobject handling cleanups, from Petr Mladek
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Remove duplicated code for early initialization
livepatch: Remove custom kobject state handling
livepatch: Convert error about unsupported reliable stacktrace into a warning
selftests/livepatch: Add functions.sh to TEST_PROGS_EXTENDED
kbuild: use -flive-patching when CONFIG_LIVEPATCH is enabled
selftests/livepatch: use TEST_PROGS for test scripts
Pull security subsystem updates from James Morris:
"Just a few bugfixes and documentation updates"
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
seccomp: fix up grammar in comment
Revert "security: inode: fix a missing check for securityfs_create_file"
Yama: mark function as static
security: inode: fix a missing check for securityfs_create_file
keys: safe concurrent user->{session,uid}_keyring access
security: don't use RCU accessors for cred->session_keyring
Yama: mark local symbols as static
LSM: lsm_hooks.h: fix documentation format
LSM: fix documentation for the shm_* hooks
LSM: fix documentation for the sem_* hooks
LSM: fix documentation for the msg_queue_* hooks
LSM: fix documentation for the audit_* hooks
LSM: fix documentation for the path_chmod hook
LSM: fix documentation for the socket_getpeersec_dgram hook
LSM: fix documentation for the task_setscheduler hook
LSM: fix documentation for the socket_post_create hook
LSM: fix documentation for the syslog hook
LSM: fix documentation for sb_copy_data hook
Let pidfd_send_signal() use pidfds retrieved via CLONE_PIDFD. With this
patch pidfd_send_signal() becomes independent of procfs. This fullfils
the request made when we merged the pidfd_send_signal() patchset. The
pidfd_send_signal() syscall is now always available allowing for it to
be used by users without procfs mounted or even users without procfs
support compiled into the kernel.
Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call. Linus originally suggested to implement this as a
new flag to clone() instead of making it a separate system call. As
spotted by Linus, there is exactly one bit for clone() left.
CLONE_PIDFD creates file descriptors based on the anonymous inode
implementation in the kernel that will also be used to implement the new
mount api. They serve as a simple opaque handle on pids. Logically,
this makes it possible to interpret a pidfd differently, narrowing or
widening the scope of various operations (e.g. signal sending). Thus, a
pidfd cannot just refer to a tgid, but also a tid, or in theory - given
appropriate flag arguments in relevant syscalls - a process group or
session. A pidfd does not represent a privilege. This does not imply it
cannot ever be that way but for now this is not the case.
A pidfd comes with additional information in fdinfo if the kernel supports
procfs. The fdinfo file contains the pid of the process in the callers
pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".
As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
parent_tidptr argument of clone. This has the advantage that we can
give back the associated pid and the pidfd at the same time.
To remove worries about missing metadata access this patchset comes with
a sample program that illustrates how a combination of CLONE_PIDFD, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>. The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Pull crypto update from Herbert Xu:
"API:
- Add support for AEAD in simd
- Add fuzz testing to testmgr
- Add panic_on_fail module parameter to testmgr
- Use per-CPU struct instead multiple variables in scompress
- Change verify API for akcipher
Algorithms:
- Convert x86 AEAD algorithms over to simd
- Forbid 2-key 3DES in FIPS mode
- Add EC-RDSA (GOST 34.10) algorithm
Drivers:
- Set output IV with ctr-aes in crypto4xx
- Set output IV in rockchip
- Fix potential length overflow with hashing in sun4i-ss
- Fix computation error with ctr in vmx
- Add SM4 protected keys support in ccree
- Remove long-broken mxc-scc driver
- Add rfc4106(gcm(aes)) cipher support in cavium/nitrox"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits)
crypto: ccree - use a proper le32 type for le32 val
crypto: ccree - remove set but not used variable 'du_size'
crypto: ccree - Make cc_sec_disable static
crypto: ccree - fix spelling mistake "protedcted" -> "protected"
crypto: caam/qi2 - generate hash keys in-place
crypto: caam/qi2 - fix DMA mapping of stack memory
crypto: caam/qi2 - fix zero-length buffer DMA mapping
crypto: stm32/cryp - update to return iv_out
crypto: stm32/cryp - remove request mutex protection
crypto: stm32/cryp - add weak key check for DES
crypto: atmel - remove set but not used variable 'alg_name'
crypto: picoxcell - Use dev_get_drvdata()
crypto: crypto4xx - get rid of redundant using_sd variable
crypto: crypto4xx - use sync skcipher for fallback
crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
crypto: crypto4xx - fix ctr-aes missing output IV
crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA
crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o
crypto: ccree - handle tee fips error during power management resume
crypto: ccree - add function to handle cryptocell tee fips error
...
- Fix the handling of Performance and Energy Bias Hint (EPB) on
Intel processors and expose it to user space via sysfs to avoid
having to access it through the generic MSR I/F (Rafael Wysocki).
- Improve the handling of global turbo changes made by the platform
firmware in the intel_pstate driver (Rafael Wysocki).
- Convert some slow-path static_cpu_has() callers to boot_cpu_has()
in cpufreq (Borislav Petkov).
- Fix the frequency calculation loop in the armada-37xx cpufreq
driver (Gregory CLEMENT).
- Fix possible object reference leaks in multuple cpufreq drivers
(Wen Yang).
- Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
- Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
Kumar).
- Add support for lx2160a and ls1028a to the qoriq cpufreq driver
(Vabhav Sharma, Yuantian Tang).
- Fix kobject memory leak in the cpufreq core (Viresh Kumar).
- Simplify the IOwait boosting in the schedutil cpufreq governor
and rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
- Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
- Improve the cpufreq documentation, add SPDX license tags to
some PM documentation files and unify copyright notices in
them (Rafael Wysocki).
- Add support for "CPU" domains to the generic power domains (genpd)
framework and provide low-level PSCI firmware support for that
feature (Ulf Hansson).
- Rearrange the PSCI firmware support code and add support for
SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
- Improve genpd support for devices in multiple power domains (Ulf
Hansson).
- Unify target residency for the AFTR and coupled AFTR states in the
exynos cpuidle driver (Marek Szyprowski).
- Introduce new helper routine in the operating performance points
(OPP) framework (Andrew-sh.Cheng).
- Add support for passing on-die termination (ODT) and auto power
down parameters from the kernel to Trusted Firmware-A (TF-A) to
the rk3399_dmc devfreq driver (Enric Balletbo i Serra).
- Add tracing to devfreq (Lukasz Luba).
- Make the exynos-bus devfreq driver suspend all devices on system
shutdown (Marek Szyprowski).
- Fix a few minor issues in the devfreq subsystem and clean it up
somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
Saravana Kannan, Yangtao Li).
- Improve system wakeup diagnostics (Stephen Boyd).
- Rework filesystem sync messages emitted during system suspend and
hibernation (Harry Pan).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAlzQEwUSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxxXwP/jrxikIXdCOV3CJVioV0NetyebwlOqYp
UsIA7lQBfZ/DY6dHw/oKuAT9LP01vcFg6XGe83Alkta9qczR5KZ/MYHFNSZXjXjL
kEvIMBCS/oykaBuW+Xn9am8Ke3Yq/rBSTKWVom3vzSQY0qvZ9GBwPDrzw+k63Zhz
P3afB4ThyY0e9ftgw4HvSSNm13Kn0ItUIQOdaLatXMMcPqP5aAdnUma5Ibinbtpp
rpTHuHKYx7MSjaCg6wl3kKTJeWbQP4wYO2ISZqH9zEwQgdvSHeFAvfPKTegUkmw9
uUsQnPD1JvdglOKovr2muehD1Ur+zsjKDf2OKERkWsWXHPyWzA/AqaVv1mkkU++b
KaWaJ9pE86kGlJ3EXwRbGfV0dM5rrl+dUUQW6nPI1XJnIOFlK61RzwAbqI26F0Mz
AlKxY4jyPLcM3SpQz9iILqyzHQqB67rm29XvId/9scoGGgoqEI4S+v6LYZqI3Vx6
aeSRu+Yof7p5w4Kg5fODX+HzrtMnMrPmLUTXhbExfsYZMi7hXURcN6s+tMpH0ckM
4yiIpnNGCKUSV4vxHBm8XJdAuUnR4Vcz++yFslszgDVVvw5tkvF7SYeHZ6HqcQVm
af9HdWzx3qajs/oyBwdRBedZYDnP1joC5donBI2ofLeF33NA7TEiPX8Zebw8XLkv
fNikssA7PGdv
=nY9p
-----END PGP SIGNATURE-----
Merge tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These fix the (Intel-specific) Performance and Energy Bias Hint (EPB)
handling and expose it to user space via sysfs, fix and clean up
several cpufreq drivers, add support for two new chips to the qoriq
cpufreq driver, fix, simplify and clean up the cpufreq core and the
schedutil governor, add support for "CPU" domains to the generic power
domains (genpd) framework and provide low-level PSCI firmware support
for that feature, fix the exynos cpuidle driver and fix a couple of
issues in the devfreq subsystem and clean it up.
Specifics:
- Fix the handling of Performance and Energy Bias Hint (EPB) on Intel
processors and expose it to user space via sysfs to avoid having to
access it through the generic MSR I/F (Rafael Wysocki).
- Improve the handling of global turbo changes made by the platform
firmware in the intel_pstate driver (Rafael Wysocki).
- Convert some slow-path static_cpu_has() callers to boot_cpu_has()
in cpufreq (Borislav Petkov).
- Fix the frequency calculation loop in the armada-37xx cpufreq
driver (Gregory CLEMENT).
- Fix possible object reference leaks in multuple cpufreq drivers
(Wen Yang).
- Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
- Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
Kumar).
- Add support for lx2160a and ls1028a to the qoriq cpufreq driver
(Vabhav Sharma, Yuantian Tang).
- Fix kobject memory leak in the cpufreq core (Viresh Kumar).
- Simplify the IOwait boosting in the schedutil cpufreq governor and
rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
- Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
- Improve the cpufreq documentation, add SPDX license tags to some PM
documentation files and unify copyright notices in them (Rafael
Wysocki).
- Add support for "CPU" domains to the generic power domains (genpd)
framework and provide low-level PSCI firmware support for that
feature (Ulf Hansson).
- Rearrange the PSCI firmware support code and add support for
SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
- Improve genpd support for devices in multiple power domains (Ulf
Hansson).
- Unify target residency for the AFTR and coupled AFTR states in the
exynos cpuidle driver (Marek Szyprowski).
- Introduce new helper routine in the operating performance points
(OPP) framework (Andrew-sh.Cheng).
- Add support for passing on-die termination (ODT) and auto power
down parameters from the kernel to Trusted Firmware-A (TF-A) to the
rk3399_dmc devfreq driver (Enric Balletbo i Serra).
- Add tracing to devfreq (Lukasz Luba).
- Make the exynos-bus devfreq driver suspend all devices on system
shutdown (Marek Szyprowski).
- Fix a few minor issues in the devfreq subsystem and clean it up
somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
Saravana Kannan, Yangtao Li).
- Improve system wakeup diagnostics (Stephen Boyd).
- Rework filesystem sync messages emitted during system suspend and
hibernation (Harry Pan)"
* tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
cpufreq: Fix kobject memleak
cpufreq: armada-37xx: fix frequency calculation for opp
cpufreq: centrino: Fix centrino_setpolicy() kerneldoc comment
cpufreq: qoriq: add support for lx2160a
x86: tsc: Rework time_cpufreq_notifier()
PM / Domains: Allow to attach a CPU via genpd_dev_pm_attach_by_id|name()
PM / Domains: Search for the CPU device outside the genpd lock
PM / Domains: Drop unused in-parameter to some genpd functions
PM / Domains: Use the base device for driver_deferred_probe_check_state()
cpufreq: qoriq: Add ls1028a chip support
PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain
PM / Domains: Allow OF lookup for multi PM domain case from ->attach_dev()
PM / Domains: Don't kfree() the virtual device in the error path
cpufreq: Move ->get callback check outside of __cpufreq_get()
PM / Domains: remove unnecessary unlikely()
cpufreq: Remove needless bios_limit check in show_bios_limit()
drivers/cpufreq/acpi-cpufreq.c: This fixes the following checkpatch warning
firmware/psci: add support for SYSTEM_RESET2
PM / devfreq: add tracing for scheduling work
trace: events: add devfreq trace event file
...
Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user handlers
- Non-critical fixes and cleanup
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFGgACgkQt6xw3ITB
YzTicAf/TX1h1+ecbx4WJAa4qeiOCPoNpG9efldQumqJhKL44MR5bkhuShna5mwE
ptm5qUXkZCxLTjzssZKnbdbgwa3t+emW8Of3D91IfI9akiZbMoDx5FGgcNbqjazb
RLrhOFHwgontA38yppZN+DrL+sXbvif/CVELdHahkEx6KepSGaS2lmPXRmz/W56v
4yIRy/zxc3Dhjgfm3wKh72nBwoZdLiIc4mchd5pthNlR9E2idrYkQegG1C+gA00r
o8uZRVOWgoh7H+QJE+xLUc8PaNCg8xqRRXOuZYg9GOz6hh7zSWhm+f1nRz9S2tIR
gIgsCHNqoO2I3E1uJpAQXDGtt2kFhA==
=ulpJ
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via
sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user
handlers
- Non-critical fixes and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
Documentation: Add ARM64 to kernel-parameters.rst
arm64/speculation: Support 'mitigations=' cmdline option
arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
arm64: enable generic CPU vulnerabilites support
arm64: add sysfs vulnerability show for speculative store bypass
arm64: Fix size of __early_cpu_boot_status
clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters
clocksource/arm_arch_timer: Remove use of workaround static key
clocksource/arm_arch_timer: Drop use of static key in arch_timer_reg_read_stable
clocksource/arm_arch_timer: Direcly assign set_next_event workaround
arm64: Use arch_timer_read_counter instead of arch_counter_get_cntvct
watchdog/sbsa: Use arch_timer_read_counter instead of arch_counter_get_cntvct
ARM: vdso: Remove dependency with the arch_timer driver internals
arm64: Apply ARM64_ERRATUM_1188873 to Neoverse-N1
arm64: Add part number for Neoverse N1
arm64: Make ARM64_ERRATUM_1188873 depend on COMPAT
arm64: Restrict ARM64_ERRATUM_1188873 mitigation to AArch32
arm64: mm: Remove pte_unmap_nested()
arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable
arm64: compat: Reduce address limit for 64K pages
...
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFaUACgkQt6xw3ITB
YzRICQgAiv7wF/yIbBhDOmCNCAKDO59chvFQWxXWdGk/aAB56kwKAMXJgLOvlMG/
VRuuLyParTFQETC3jaxKgnO/1hb+PZLDt2Q2KqixtjIzBypKUPWvK2sf6THhSRF1
GK0DBVUd1rCrWrR815+SPb8el4xXtdBzvAVB+Fx35PXVNpdRdqCkK+EQ6UnXGokm
rXXHbnfsnquBDtmb4CR4r2beH+aNElXbdt0Kj8VcE5J7f7jTdW3z6Q9WFRvdKmK7
yrsxXXB2w/EsWXOwFp0SLTV5+fgeGgTvv8uLjDw+SG6t0E0PebxjNAflT7dPrbYL
WecjKC9WqBxrGY+4ew6YJP70ijLBCw==
=aC8m
-----END PGP SIGNATURE-----
Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull mmiowb removal from Will Deacon:
"Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
The only relatively recent changes have been addressing review
comments on the documentation, which is in a much better shape thanks
to the efforts of Ben and Ingo.
I was initially planning to split this into two pull requests so that
you could run the coccinelle script yourself, however it's been plain
sailing in linux-next so I've just included the whole lot here to keep
things simple"
* tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits)
docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread
docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section
arch: Remove dummy mmiowb() definitions from arch code
net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
i40iw: Redefine i40iw_mmiowb() to do nothing
scsi/qla1280: Remove stale comment about mmiowb()
drivers: Remove explicit invocations of mmiowb()
drivers: Remove useless trailing comments from mmiowb() invocations
Documentation: Kill all references to mmiowb()
riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code
powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
m68k/io: Remove useless definition of mmiowb()
nds32/io: Remove useless definition of mmiowb()
x86/io: Remove useless definition of mmiowb()
arm64/io: Remove useless definition of mmiowb()
ARM/io: Remove useless definition of mmiowb()
mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
...
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this
will allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc0CCEAAoJEDjwexyKj9rgjmkH/A3e2drvuP/hSF3xfCKTQFdx
/PoLHQVCqENB3HU3FA/ljoXuG6jMgwj61looqlxBNumXFpIfTg0E1JC5S4wRGJ+K
cOVhIKV53gcuZkRcCJQp0WMnGzpk1Daf7iYXYmAl+7e+mREUPxOuJ0Ei6vXvRGZS
8cQrUCGrtPgkAeLlndypHI2M2TDDGJIMczOGbOZau8+8Lo7Wq9zt5y0h/v0ew37g
ogA0eGh6koU1435dt2pclZRiZ1XOcar3Uin9ioT+RnSgJ4pr1Pza/F6IGO0RdQa+
rva990lqGFp5r9lE4rMCwK9LWb/rfHdVPd35t9XPwphnQ/ORoWUwLk3uc5XOHow=
=dbuy
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this will
allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
* tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (98 commits)
s390/vdso: drop unnecessary cc-ldoption
s390: fix clang -Wpointer-sign warnigns in boot code
s390: drop CONFIG_VIRT_TO_BUS
s390: boot, purgatory: pass $(CLANG_FLAGS) where needed
s390: only build for new CPUs with clang
s390: simplify disabled_wait
s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
s390/unwind: introduce stack unwind API
s390/opcodes: add missing instructions to the disassembler
s390/bug: add entry size to the __bug_table section
s390: use proper expoline sections for .dma code
s390/nospec: rename assembler generated expoline thunks
s390: add missing ENDPROC statements to assembler functions
locking/lockdep: check for freed initmem in static_obj()
s390/kernel: add support for kernel address space layout randomization (KASLR)
s390/kernel: introduce .dma sections
s390/sclp: do not use static sccbs
s390/kprobes: use static buffer for insn_page
s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad
s390/kernel: build a relocatable kernel
...
Pull x86 mm updates from Ingo Molnar:
"The changes in here are:
- text_poke() fixes and an extensive set of executability lockdowns,
to (hopefully) eliminate the last residual circumstances under
which we are using W|X mappings even temporarily on x86 kernels.
This required a broad range of surgery in text patching facilities,
module loading, trampoline handling and other bits.
- tweak page fault messages to be more informative and more
structured.
- remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
default.
- reduce KASLR granularity on 5-level paging kernels from 512 GB to
1 GB.
- misc other changes and updates"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/mm: Initialize PGD cache during mm initialization
x86/alternatives: Add comment about module removal races
x86/kprobes: Use vmalloc special flag
x86/ftrace: Use vmalloc special flag
bpf: Use vmalloc special flag
modules: Use vmalloc special flag
mm/vmalloc: Add flag for freeing of special permsissions
mm/hibernation: Make hibernation handle unmapped pages
x86/mm/cpa: Add set_direct_map_*() functions
x86/alternatives: Remove the return value of text_poke_*()
x86/jump-label: Remove support for custom text poker
x86/modules: Avoid breaking W^X while loading modules
x86/kprobes: Set instruction page as executable
x86/ftrace: Set trampoline pages as executable
x86/kgdb: Avoid redundant comparison of patched code
x86/alternatives: Use temporary mm for text poking
x86/alternatives: Initialize temporary mm for patching
fork: Provide a function for copying init_mm
uprobes: Initialize uprobes earlier
x86/mm: Save debug registers when loading a temporary mm
...
Pull timer updates from Ingo Molnar:
"This cycle had the following changes:
- Timer tracing improvements (Anna-Maria Gleixner)
- Continued tasklet reduction work: remove the hrtimer_tasklet
(Thomas Gleixner)
- Fix CPU hotplug remove race in the tick-broadcast mask handling
code (Thomas Gleixner)
- Force upper bound for setting CLOCK_REALTIME, to fix ABI
inconsistencies with handling values that are close to the maximum
supported and the vagueness of when uptime related wraparound might
occur. Make the consistent maximum the year 2232 across all
relevant ABIs and APIs. (Thomas Gleixner)
- various cleanups and smaller fixes"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Fix typos in comments
tick/broadcast: Fix warning about undefined tick_broadcast_oneshot_offline()
timekeeping: Force upper bound for setting CLOCK_REALTIME
timer/trace: Improve timer tracing
timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
timer: Move trace point to get proper index
tick/sched: Update tick_sched struct documentation
tick: Remove outgoing CPU from broadcast masks
timekeeping: Consistently use unsigned int for seqcount snapshot
softirq: Remove tasklet_hrtimer
xfrm: Replace hrtimer tasklet with softirq hrtimer
mac80211_hwsim: Replace hrtimer tasklet with softirq hrtimer