Commit graph

8977 commits

Author SHA1 Message Date
Linus Torvalds
f3f19f939c Networking fixes for 5.18-rc7, including fixes from wireless,
and bluetooth. No outstanding fires.
 
 Current release - regressions:
 
  - eth: atlantic: always deep reset on pm op, fix null-deref
 
 Current release - new code bugs:
 
  - rds: use maybe_get_net() when acquiring refcount on TCP sockets
    [refinement of a previous fix]
 
  - eth: ocelot: mark traps with a bool instead of guessing type based
    on list membership
 
 Previous releases - regressions:
 
  - net: fix skipping features in for_each_netdev_feature()
 
  - phy: micrel: fix null-derefs on suspend/resume and probe
 
  - bcmgenet: check for Wake-on-LAN interrupt probe deferral
 
 Previous releases - always broken:
 
  - ipv4: drop dst in multicast routing path, prevent leaks
 
  - ping: fix address binding wrt vrf
 
  - net: fix wrong network header length when BPF protocol translation
    is used on skbs with a fraglist
 
  - bluetooth: fix the creation of hdev->name
 
  - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
 
  - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing
 
  - wifi: ath11k: reduce the wait time of 11d scan and hw scan while
    adding an interface
 
  - mac80211: fix rx reordering with non explicit / psmp ack policy
 
  - mac80211: reset MBSSID parameters upon connection
 
  - nl80211: fix races in nl80211_set_tx_bitrate_mask()
 
  - tls: fix context leak on tls_device_down
 
  - sched: act_pedit: really ensure the skb is writable
 
  - batman-adv: don't skb_split skbuffs with frag_list
 
  - eth: ocelot: fix various issues with TC actions (null-deref; bad
    stats; ineffective drops; ineffective filter removal)
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJ9VBwACgkQMUZtbf5S
 IrumIg/+M4ZTUjz/34v2djiwivBHjfn3Aa2SepJ3516AqxauLUr4h4yfIcFXW8vA
 81PyZkpqLBdgGklt+1GQfNCGrJKgC4+sM0eyRATMl1nf5zgrc0bZfp+ZN1bZeuoy
 iRB7vwCnAPdLg997inxQeG11RYS+DUDDz68MJReqnqB6o5+xX87vcDEpyKaISTVo
 SlUe/SYEk7S/Erk37l4wlz8bpr3EoIy1pYc4fDZOhBa8gKOVmmXidI5pwcjZXSo5
 RQ9oH74/aaI7dgEQ4UvyAksCUrj17GRygBDyLE/p0eMc/2oV0lO6DQ3RGvh6ucPj
 wU4AAOI5SRrXiqSzbDFXLFvtN7wLuXU+s4a/xq0z/Z4w7CQb8HVH1S92lYIA+jkA
 Q8RCT3XY523Me8JWAyGJGl2PnbLqlGhFdqucmvHpxXOKy2F+QC69Z44kGWIaBliR
 C3zoUuMnAV41DIp7YXbxvxjMk8GtNUXU4Vdg7+v1DlZxxe+DDH4cxB5YmReq+bNC
 60p79/vpNzJgYuZgtlk4xi1YNcOywVJb0L6zXus5p4SU/Ud63H7NHufndCnf/X6Q
 9W2vWrTJb9r1+cWRBKvxyWdqyUT68UgBAbhY2WvPdmjbP5UkW7YLCMfBOsAg7vPi
 GZeroxYIg/NwyqWm/aUuPdiNgD73Qq4Zvan5JmavP/PwX8AP9eA=
 =jt7U
 -----END PGP SIGNATURE-----

Merge tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless, and bluetooth.

  No outstanding fires.

  Current release - regressions:

   - eth: atlantic: always deep reset on pm op, fix null-deref

  Current release - new code bugs:

   - rds: use maybe_get_net() when acquiring refcount on TCP sockets
     [refinement of a previous fix]

   - eth: ocelot: mark traps with a bool instead of guessing type based
     on list membership

  Previous releases - regressions:

   - net: fix skipping features in for_each_netdev_feature()

   - phy: micrel: fix null-derefs on suspend/resume and probe

   - bcmgenet: check for Wake-on-LAN interrupt probe deferral

  Previous releases - always broken:

   - ipv4: drop dst in multicast routing path, prevent leaks

   - ping: fix address binding wrt vrf

   - net: fix wrong network header length when BPF protocol translation
     is used on skbs with a fraglist

   - bluetooth: fix the creation of hdev->name

   - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition

   - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing

   - wifi: ath11k: reduce the wait time of 11d scan and hw scan while
     adding an interface

   - mac80211: fix rx reordering with non explicit / psmp ack policy

   - mac80211: reset MBSSID parameters upon connection

   - nl80211: fix races in nl80211_set_tx_bitrate_mask()

   - tls: fix context leak on tls_device_down

   - sched: act_pedit: really ensure the skb is writable

   - batman-adv: don't skb_split skbuffs with frag_list

   - eth: ocelot: fix various issues with TC actions (null-deref; bad
     stats; ineffective drops; ineffective filter removal)"

* tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  tls: Fix context leak on tls_device_down
  net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
  net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
  net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
  mlxsw: Avoid warning during ip6gre device removal
  net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
  net: ethernet: mediatek: ppe: fix wrong size passed to memset()
  Bluetooth: Fix the creation of hdev->name
  i40e: i40e_main: fix a missing check on list iterator
  net/sched: act_pedit: really ensure the skb is writable
  s390/lcs: fix variable dereferenced before check
  s390/ctcm: fix potential memory leak
  s390/ctcm: fix variable dereferenced before check
  net: atlantic: verify hw_head_ lies within TX buffer ring
  net: atlantic: add check for MAX_SKB_FRAGS
  net: atlantic: reduce scope of is_rsc_complete
  net: atlantic: fix "frag[0] not initialized"
  net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
  net: phy: micrel: Fix incorrect variable type in micrel
  decnet: Use container_of() for struct dn_neigh casts
  ...
2022-05-12 11:51:45 -07:00
Feng Zhou
07343110b2 bpf: add bpf_map_lookup_percpu_elem for percpu map
Add new ebpf helpers bpf_map_lookup_percpu_elem.

The implementation method is relatively simple, refer to the implementation
method of map_lookup_elem of percpu map, increase the parameters of cpu, and
obtain it according to the specified cpu.

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-11 18:16:54 -07:00
Jakub Kicinski
8bf6008c8b wireless fixes for v5.18
Second set of fixes for v5.18 and hopefully the last one. We have a
 new iwlwifi maintainer, a fix to rfkill ioctl interface and important
 fixes to both stack and two drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmJ72akRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZso0wf+PmQev6QTWG/LPBfcIp7H6upRMS09/+Se
 SEhS9UAE/qh4GJM0Mn1XE6T5mcokQZ/Ck5uaWT3Be9Dwbbk/ucAvYGEvf4OxmUEM
 wwtDaA0BaFwS417iW5FLLAsu2ascN8yeje/+yK+Uu9DpB2KxXSIQB7OJpy3/HVAj
 jEavgZN/fQEiTba9/JDa6DBMm2RVAZrmc+1sB5FakUocVTuN2pZAkM+lOBXvlHS4
 4jd/KEFDyto2BMOR46IOwXTNKgBk2UovqeYFrTdonMz7W7nhzWJcguFU0e6rej5q
 MCWxT8PryZ5yD5wl7pfOKZRTqFf+Mb+Up+yFipEEgd2SYnwxjplkaQ==
 =qa3F
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v5.18

Second set of fixes for v5.18 and hopefully the last one. We have a
new iwlwifi maintainer, a fix to rfkill ioctl interface and important
fixes to both stack and two drivers.

* tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
  nl80211: fix locking in nl80211_set_tx_bitrate_mask()
  mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection
  mac80211_hwsim: fix RCU protected chanctx access
  mailmap: update Kalle Valo's email
  mac80211: Reset MBSSID parameters upon connection
  cfg80211: retrieve S1G operating channel number
  nl80211: validate S1G channel width
  mac80211: fix rx reordering with non explicit / psmp ack policy
  ath11k: reduce the wait time of 11d scan and hw scan while add interface
  MAINTAINERS: update iwlwifi driver maintainer
  iwlwifi: iwl-dbg: Use del_timer_sync() before freeing
====================

Link: https://lore.kernel.org/r/20220511154535.A1A12C340EE@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11 17:33:01 -07:00
Anuj Gupta
f569add471 nvme: add vectored-io support for uring-cmd
wire up support for async passthru that takes an array of buffers (using
iovec). Exposed via a new op NVME_URING_CMD_IO_VEC. Same 'struct
nvme_uring_cmd' is to be used with -

1. cmd.addr as base address of user iovec array
2. cmd.data_len as count of iovec array elements

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-6-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-11 07:41:19 -06:00
Kanchan Joshi
456cba386e nvme: wire-up uring-cmd support for io-passthru on char-device.
Introduce handler for fops->uring_cmd(), implementing async passthru
on char device (/dev/ngX). The handler supports newly introduced
operation NVME_URING_CMD_IO. This operates on a new structure
nvme_uring_cmd, which is similar to struct nvme_passthru_cmd64 but
without the embedded 8b result field. This field is not needed since
uring-cmd allows to return additional result via big-CQE.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-5-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-11 07:41:13 -06:00
Jens Axboe
ee692a21e9 fs,io_uring: add infrastructure for uring-cmd
file_operations->uring_cmd is a file private handler.
This is somewhat similar to ioctl but hopefully a lot more sane and
useful as it can be used to enable many io_uring capabilities for the
underlying operation.

IORING_OP_URING_CMD is a file private kind of request. io_uring doesn't
know what is in this command type, it's for the provider of ->uring_cmd()
to deal with.

Co-developed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-2-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-11 07:40:47 -06:00
Basavaraj Natikar
a8641d7d85 HID: amd_sfh: Move bus declaration outside of amd-sfh
This should allow external drivers to reference this bus ID
reservation and detect data coming from amd-sfh.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-05-11 14:16:26 +02:00
Kui-Feng Lee
2fcc82411e bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
Pass a cookie along with BPF_LINK_CREATE requests.

Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie.
The cookie of a bpf_tracing_link is available by calling
bpf_get_attach_cookie when running the BPF program of the attached
link.

The value of a cookie will be set at bpf_tramp_run_ctx by the
trampoline of the link.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com
2022-05-10 21:58:31 -07:00
Kui-Feng Lee
f7e0beaf39 bpf, x86: Generate trampolines from bpf_tramp_links
Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect
struct bpf_tramp_link(s) for a trampoline.  struct bpf_tramp_link
extends bpf_link to act as a linked list node.

arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to
collects all bpf_tramp_link(s) that a trampoline should call.

Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links
instead of bpf_tramp_progs.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-2-kuifeng@fb.com
2022-05-10 17:50:40 -07:00
Linus Torvalds
feb9c5e19e virtio: last minute fixup
A last minute fixup of the transitional ID numbers.
 Important to get these right - if users start to depend on the
 wrong ones they are very hard to fix.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmJ6U4YPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpIHgH/01k2geAxpoi/dFydlaQzNXzQHiVaredPVNY
 sR5iudoVQJvpQpCHi5pTcesVl3xPwKzgm4WPRO1OonQ/F435b9AmpZT/z+VJUkuR
 tBBK3TPhROCcSB2p0hvYbJdJFESnDPKUHdhAsqqwMz/BrdAMEvDfPKXF39/i61bP
 YEEbJHCrH2BcwAcq2VcRtG7d1ODF/ETNEZEqod3K9LFk8ruEnzSpwl6zAxycJG0c
 n8Hrsx3CotgY7PQ89kNsIp41JFiEmHCC01luGOSGUKJOKnKbIrsocVk8O2mPbtS/
 UuJ3m/6BsBPIJZ4aUEOZSmWu6aaQkY04VUgbuBawozlWCiasMfY=
 =fmKH
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fix from Michael Tsirkin:
 "A last minute fixup of the transitional ID numbers.

  Important to get these right - if users start to depend on the wrong
  ones they are very hard to fix"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: fix virtio transitional ids
2022-05-10 11:15:05 -07:00
Kaixi Fan
26101f5ab6 bpf: Add source ip in "struct bpf_tunnel_key"
Add tunnel source ip field in "struct bpf_tunnel_key". Add related code
to set and get tunnel source field.

Signed-off-by: Kaixi Fan <fankaixi.li@bytedance.com>
Link: https://lore.kernel.org/r/20220430074844.69214-2-fankaixi.li@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-10 10:49:03 -07:00
Christoph Hellwig
c23d47abee loop: remove most the top-of-file boilerplate comment from the UAPI header
Just leave the SPDX marker and the copyright notice and remove the
irrelevant rest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220419063303.583106-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-10 06:30:05 -06:00
Shunsuke Mie
7ff960a6fe virtio: fix virtio transitional ids
This commit fixes the transitional PCI device ID.

Fixes: d61914ea6a ("virtio: update virtio id table, add transitional ids")
Signed-off-by: Shunsuke Mie <mie@igel.co.jp>
Link: https://lore.kernel.org/r/20220510102723.87666-1-mie@igel.co.jp
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-10 07:22:28 -04:00
Carlos Llamas
bd32889e84 binder: add BINDER_GET_EXTENDED_ERROR ioctl
Provide a userspace mechanism to pull precise error information upon
failed operations. Extending the current error codes returned by the
interfaces allows userspace to better determine the course of action.
This could be for instance, retrying a failed transaction at a later
point and thus offloading the error handling from the driver.

Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220429235644.697372-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09 15:43:24 +02:00
Stefan Roesch
7a51e5b44b io_uring: support CQE32 in io_uring_cqe
This adds the big_cqe array to the struct io_uring_cqe to support large
CQE's.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-2-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09 06:35:33 -06:00
Jens Axboe
ebdeb7c01d io_uring: add support for 128-byte SQEs
Normal SQEs are 64-bytes in length, which is fine for all the commands
we support. However, in preparation for supporting passthrough IO,
provide an option for setting up a ring with 128-byte SQEs.

We continue to use the same type for io_uring_sqe, it's marked and
commented with a zero sized array pad at the end. This provides up
to 80 bytes of data for a passthrough command - 64 bytes for the
extra added data, and 16 bytes available at the end of the existing
SQE.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09 06:35:33 -06:00
Jens Axboe
b5ba65df47 Merge branch 'for-5.19/io_uring-socket' into for-5.19/io_uring-passthrough
* for-5.19/io_uring-socket:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr
2022-05-09 06:35:28 -06:00
Jens Axboe
1308689906 Merge branch 'for-5.19/io_uring' into for-5.19/io_uring-passthrough
* for-5.19/io_uring: (85 commits)
  io_uring: don't clear req->kbuf when buffer selection is done
  io_uring: eliminate the need to track provided buffer ID separately
  io_uring: move provided buffer state closer to submit state
  io_uring: move provided and fixed buffers into the same io_kiocb area
  io_uring: abstract out provided buffer list selection
  io_uring: never call io_buffer_select() for a buffer re-select
  io_uring: get rid of hashed provided buffer groups
  io_uring: always use req->buf_index for the provided buffer group
  io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set
  io_uring: kill io_rw_buffer_select() wrapper
  io_uring: make io_buffer_select() return the user address directly
  io_uring: kill io_recv_buffer_select() wrapper
  io_uring: use 'sr' vs 'req->sr_msg' consistently
  io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg
  io_uring: check IOPOLL/ioprio support upfront
  io_uring: replace smp_mb() with smp_mb__after_atomic() in io_sq_thread()
  io_uring: add IORING_SETUP_TASKRUN_FLAG
  io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
  io_uring: set task_work notify method at init time
  io-wq: use __set_notify_signal() to wake workers
  ...
2022-05-09 06:35:11 -06:00
Gleb Fotengauer-Malinovskiy
a36e07dfe6 rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
The definition of RFKILL_IOCTL_MAX_SIZE introduced by commit
54f586a915 ("rfkill: make new event layout opt-in") is unusable
since it is based on RFKILL_IOC_EXT_SIZE which has not been defined.
Fix that by replacing the undefined constant with the constant which
is intended to be used in this definition.

Fixes: 54f586a915 ("rfkill: make new event layout opt-in")
Cc: stable@vger.kernel.org # 5.11+
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Link: https://lore.kernel.org/r/20220506172454.120319-1-glebfm@altlinux.org
[add commit message provided later by Dmitry]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-05-09 14:00:07 +02:00
Mickaël Salaün
6cc2df8e3a
landlock: Add clang-format exceptions
In preparation to a following commit, add clang-format on and
clang-format off stanzas around constant definitions.  This enables to
keep aligned values, which is much more readable than packed
definitions.

Link: https://lore.kernel.org/r/20220506160513.523257-2-mic@digikod.net
Cc: stable@vger.kernel.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2022-05-09 12:31:05 +02:00
Arnd Bergmann
728c0d2941 TEE cleanup
Removes the old and unused TEE_IOCTL_SHM_* flags
 Removes unused the unused tee_shm_va2pa() and tee_shm_pa2va() functions
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCgA4FiEEFV+gSSXZJY9ZyuB5LinzTIcAHJcFAmJ0xD4aHGplbnMud2lr
 bGFuZGVyQGxpbmFyby5vcmcACgkQLinzTIcAHJfWjA/+PG699Ct5WRnorEtCliYU
 NY2YCt5HhKgIMqLbglNLXvWZ5DW/xE8JTfUsBJt7WRG9arha+AZVhJ8wwuoK+MaW
 mEj0NxQLjPCFsxaCtu1fSX8ZJUQKRyqZzNtU91S1/qTz9YBOcmS5shFXT/+EOEwG
 i6B5Y4bXzAsB8h7Lznt9/IyP5VSLHtdgl1auCTdv57m5bwNjsMQBqKJZsR/rgoAo
 4pBAEd1YhDGOuFN2bSONfJx2618753jCA1oxpj5/FLKG93R5Iypu36yRvzu+ujO5
 O1cY4kYNsGCZLHeVgqDkJwDQglfU5GcwSBnGHo+SeJxi8Zj8VRPbm5QECaRASTk9
 y1QSlgBLITxKYJBMjO7HuZ3RrA1wjcCgUrJvXg5KKbkQwttmciFuSewh7/ubRp3V
 tGvEnIH6bjiYMkb4Aiolc9CBhQfg3yCtiJk9VUpK0KiV3Cs1C8cBjATvYQoDCXQX
 OjxcgIXcFnkVzEHb6px7/kAuZghUPSr1A+qJDKIFeaEk9csrYhP+Gmi0XFDMV6Ws
 wuSMUJ9bWg0xw1xo+TVyYy8AQcwWBeSIUC/yq0djW54hPJmTQY9PNdSdYUK/dIN4
 ERN9ncf+BQXd9rr9q0+af2oNOG9Ee3D6jqVNBxMfn1p9ntI5W7jjIKhVDJxOM4R6
 x++gc+FREDet86HKOvOjhhY=
 =7v+l
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmJ1fsQACgkQmmx57+YA
 GNlmThAAvxZ+sXy1ZzpGnn3R04DlD/rCt7whxYfvcuHMRhV4uPc0cPdrFaVFnoh7
 3uGpG+A9TCK9/g77p+pZ4yKpDPUO8HXWci5ptV8TMeUaZDXf+UaXM6y2qpsxBYuF
 wB5aAa0vUZ8ljqbn6KskaiLVvRmYaSSNaEAh8M1cO1dbQHY6D8vtKjkBtmh2zDMZ
 dblGYhaiPEUAyCMOUdVuR0RTQWTQ+AbtGI+n88DnhgnREnuYo1qQMRwrIerTDetn
 f+hMqU6u8dfoo/GeK732YA0DU3Y2PXfE3hRgWlBTYYKClA50O35rHYlYtoOYTmCC
 i1BdgAWeR+oiMJYxaWasIr4gxfFbWfdNSBchVqr/byiLOwhnL0++Jp4YO8ESbery
 T0cDyvG784CagRABw+Yxe0mSMsz6BznQqYGBh+L5vkNXHsEj0KSoj4l/zzL9e0bj
 anqYYxf8glwT13yeWsvHQufOl7XWoLhZVrwB7nUJFt2gJH6CHITyFiFRKw8/PkWb
 lf6KJ7z8gzGeNM7euDEcGUV3emADV/yN1OHohWMI4CjfeKxLfAAkIweKJAHg9/2S
 SZEoa/68LW1+Dp76QMC+ZZBxqIgWzjZcYfZMEuwP6iYwU2EtFSn0OZZ3XUOb7ngm
 fY2MV9ZE03NDXRdP1u4jgLtHKMIOW/c4RXC938WCPHzCQfXgyvo=
 =khay
 -----END PGP SIGNATURE-----

Merge tag 'tee-cleanup-for-v5.19' of https://git.linaro.org/people/jens.wiklander/linux-tee into arm/drivers

TEE cleanup

Removes the old and unused TEE_IOCTL_SHM_* flags
Removes unused the unused tee_shm_va2pa() and tee_shm_pa2va() functions

* tag 'tee-cleanup-for-v5.19' of https://git.linaro.org/people/jens.wiklander/linux-tee:
  tee: remove flags TEE_IOCTL_SHM_MAPPED and TEE_IOCTL_SHM_DMA_BUF
  tee: remove tee_shm_va2pa() and tee_shm_pa2va()

Link: https://lore.kernel.org/r/20220506070328.GA1344495@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-06 22:02:12 +02:00
Jens Axboe
0455d4ccec io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg
If IORING_RECVSEND_POLL_FIRST is set for recv/recvmsg or send/sendmsg,
then we arm poll first rather than attempt a receive or send upfront.
This can be useful if we expect there to be no data (or space) available
for the request, as we can then avoid wasting time on the initial
issue attempt.

Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-05 17:09:31 -06:00
Jakub Kicinski
c4a67a21a6 Revert "Merge branch 'mlxsw-line-card-model'"
This reverts commit 5e927a9f4b, reversing
changes made to cfc1d91a7d.

The discussion is still ongoing so let's remove the uAPI
until the discussion settles.

Link: https://lore.kernel.org/all/20220425090021.32e9a98f@kernel.org/
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220504154037.539442-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 15:47:23 -07:00
Jakub Kicinski
c8227d568d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/testing/selftests/net/forwarding/Makefile
  f62c5acc80 ("selftests/net/forwarding: add missing tests to Makefile")
  50fe062c80 ("selftests: forwarding: new test, verify host mdb entries")
https://lore.kernel.org/all/20220502111539.0b7e4621@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 13:03:18 -07:00
Muna Sinada
36f8423597 cfg80211: support disabling EHT mode
Allow userspace to disable EHT mode during association.

Signed-off-by: Muna Sinada <quic_msinada@quicinc.com>
Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Link: https://lore.kernel.org/r/20220323224636.20211-1-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-05-04 22:50:01 +02:00
Florian Westphal
702c2f646d mptcp: netlink: allow userspace-driven subflow establishment
This allows userspace to tell kernel to add a new subflow to an existing
mptcp connection.

Userspace provides the token to identify the mptcp-level connection
that needs a change in active subflows and the local and remote
addresses of the new or the to-be-removed subflow.

MPTCP_PM_CMD_SUBFLOW_CREATE requires the following parameters:
{ token, { loc_id, family, loc_addr4 | loc_addr6 }, { family, rem_addr4 |
rem_addr6, rem_port }

MPTCP_PM_CMD_SUBFLOW_DESTROY requires the following parameters:
{ token, { family, loc_addr4 | loc_addr6, loc_port }, { family, rem_addr4 |
rem_addr6, rem_port }

Acked-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04 10:49:32 +01:00
Kishen Maloor
d9a4594eda mptcp: netlink: Add MPTCP_PM_CMD_REMOVE
This change adds a MPTCP netlink command for issuing a
REMOVE_ADDR signal for an address over the chosen MPTCP
connection from a userspace path manager.

The command requires the following parameters: {token, loc_id}.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04 10:49:31 +01:00
Kishen Maloor
9ab4807c84 mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE
This change adds a MPTCP netlink interface for issuing
ADD_ADDR advertisements over the chosen MPTCP connection from a
userspace path manager.

The command requires the following parameters:
{ token, { loc_id, family, daddr4 | daddr6 [, dport] } [, if_idx],
flags[signal] }.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-04 10:49:31 +01:00
Dipen Patel
2068339a6c gpiolib: cdev: Add hardware timestamp clock type
This patch adds new clock type for the GPIO controller which can
timestamp gpio lines in using hardware means. To expose such
functionalities to the userspace, code has been added where
during line create or set config API calls, it checks for new
clock type and if requested, calls HTE API. During line change
event, the HTE subsystem pushes timestamp data to userspace
through gpiolib-cdev.

Signed-off-by: Dipen Patel <dipenp@nvidia.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2022-05-04 11:06:13 +02:00
Marc Zyngier
4b88524c47 Merge remote-tracking branch 'arm64/for-next/sme' into kvmarm-master/next
Merge arm64's SME branch to resolve conflicts with the WFxT branch.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 09:38:32 +01:00
Oliver Upton
bfbab44568 KVM: arm64: Implement PSCI SYSTEM_SUSPEND
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM.

Unfortunately, there really is no good way to implement a system-wide
PSCI call in KVM. Any precondition checks done in the kernel will need
to be repeated by userspace since there is no good way to protect a
critical section that spans an exit to userspace. SYSTEM_RESET and
SYSTEM_OFF are equally plagued by this issue, although no users have
seemingly cared for the relatively long time these calls have been
supported.

The solution is to just make the whole implementation userspace's
problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that
indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND.
Additionally, add a CAP to get buy-in from userspace for this new exit
type.

Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in.
If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide
explicit documentation of userspace's responsibilites for the exit and
point to the PSCI specification to describe the actual PSCI call.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
2022-05-04 09:28:45 +01:00
Oliver Upton
7b33a09d03 KVM: arm64: Add support for userspace to suspend a vCPU
Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
is in a suspended state. In the suspended state the vCPU will block
until a wakeup event (pending interrupt) is recognized.

Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
userspace that KVM has recognized one such wakeup event. It is the
responsibility of userspace to then make the vCPU runnable, or leave it
suspended until the next wakeup event.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
2022-05-04 09:28:45 +01:00
Kishen Maloor
41b3c69bf9 mptcp: expose server_side attribute in MPTCP netlink events
This change records the 'server_side' attribute of MPTCP_EVENT_CREATED
and MPTCP_EVENT_ESTABLISHED events to inform their recipient about the
Client/Server role of the running MPTCP application.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/246
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03 16:54:55 -07:00
Sargun Dhillon
c2aa2dfef2 seccomp: Add wait_killable semantic to seccomp user notifier
This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
that makes it so that when notifications are received by the supervisor the
notifying process will transition to wait killable semantics. Although wait
killable isn't a set of semantics formally exposed to userspace, the
concept is searchable. If the notifying process is signaled prior to the
notification being received by the userspace agent, it will be handled as
normal.

One quirk about how this is handled is that the notifying process
only switches to TASK_KILLABLE if it receives a wakeup from either
an addfd or a signal. This is to avoid an unnecessary wakeup of
the notifying task.

The reasons behind switching into wait_killable only after userspace
receives the notification are:
* Avoiding unncessary work - Often, workloads will perform work that they
  may abort (request racing comes to mind). This allows for syscalls to be
  aborted safely prior to the notification being received by the
  supervisor. In this, the supervisor doesn't end up doing work that the
  workload does not want to complete anyways.
* Avoiding side effects - We don't want the syscall to be interruptible
  once the supervisor starts doing work because it may not be trivial
  to reverse the operation. For example, unmounting a file system may
  take a long time, and it's hard to rollback, or treat that as
  reentrant.
* Avoid breaking runtimes - Various runtimes do not GC when they are
  during a syscall (or while running native code that subsequently
  calls a syscall). If many notifications are blocked, and not picked
  up by the supervisor, this can get the application into a bad state.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503080958.20220-2-sargun@sargun.me
2022-05-03 14:11:58 -07:00
Linus Torvalds
b6b2648911 ARM:
* Take care of faults occuring between the PARange and
   IPA range by injecting an exception
 
 * Fix S2 faults taken from a host EL0 in protected mode
 
 * Work around Oops caused by a PMU access from a 32bit
   guest when PMU has been created. This is a temporary
   bodge until we fix it for good.
 
 x86:
 
 * Fix potential races when walking host page table
 
 * Fix shadow page table leak when KVM runs nested
 
 * Work around bug in userspace when KVM synthesizes leaf
   0x80000021 on older (pre-EPYC) or Intel processors
 
 Generic (but affects only RISC-V):
 
 * Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJuxI4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNjfQf/X4Rn6+sTkXRS0UHWEu+q9FjJ+mIx
 ZUWdbncf0brUB1RPAFfKaiQHo0t2Req+iTlpqZL0nVQ4myNUelHYube/sZdK/aBR
 WOjKZE0hugGyMH3js2bsTdgzbcphThyYAX97qGZNb7tsPGhBiw7c98KhjxlieJab
 D8LMNtM3uzPDxg422GfOm8ge2VbpySS5oRoGHfbD+4FiLYlXoCYfZuzlFwFFIGxw
 uHm5zzfX5jshayFpFYVSJHtARXlpwJWKz9yl63QjHrhVitW4m5j4re3aNfboL6Pd
 F5Z9K+DKhJLAH5cqmgiPPe2CGMvmRwKrN3F9MqV91xDPBT8J4rrowEeboQ==
 =SwSU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Take care of faults occuring between the PARange and IPA range by
     injecting an exception

   - Fix S2 faults taken from a host EL0 in protected mode

   - Work around Oops caused by a PMU access from a 32bit guest when PMU
     has been created. This is a temporary bodge until we fix it for
     good.

  x86:

   - Fix potential races when walking host page table

   - Fix shadow page table leak when KVM runs nested

   - Work around bug in userspace when KVM synthesizes leaf 0x80000021
     on older (pre-EPYC) or Intel processors

  Generic (but affects only RISC-V):

   - Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: work around QEMU issue with synthetic CPUID leaves
  Revert "x86/mm: Introduce lookup_address_in_mm()"
  KVM: x86/mmu: fix potential races when walking host page table
  KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
  KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
  KVM: arm64: Inject exception on out-of-IPA-range translation fault
  KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set
  KVM: arm64: Handle host stage-2 faults from 32-bit EL0
2022-05-01 11:49:32 -07:00
Alexandru Tachici
3da8ffd854 net: phy: Add 10BASE-T1L support in phy-c45
This patch is needed because the BASE-T1 uses different registers
for status, control and advertisement to those already
employed in the existing phy-c45 functions.

Where required, genphy_c45 functions will now check whether
the device supports BASE-T1 and use the specific registers
instead: 45.2.7.19 BASE-T1 AN control register,
45.2.7.20 BASE-T1 AN status, 45.2.7.21 BASE-T1 AN
advertisement register, 45.2.7.22 BASE-T1 AN LP Base
Page ability register, 45.2.1.185 BASE-T1 PMA/PMD control
register.

Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-01 17:45:35 +01:00
Alexandru Tachici
1b020e448e net: phy: Add BaseT1 auto-negotiation registers
Added BASE-T1 AN advertisement register (Registers 7.514, 7.515, and
7.516) and BASE-T1 AN LP Base Page ability register (Registers 7.517,
7.518, and 7.519).

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-01 17:45:35 +01:00
Alexandru Tachici
909b4f2bf7 net: phy: Add 10-BaseT1L registers
The 802.3gc specification defines the 10-BaseT1L link
mode for ethernet trafic on twisted wire pair.

PMA status register can be used to detect if the phy supports
2.4 V TX level and PCS control register can be used to
enable/disable PCS level loopback.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-01 17:45:35 +01:00
Alexandru Tachici
3254e0b9eb ethtool: Add 10base-T1L link mode entry
Add entry for the 10base-T1L full duplex mode.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-01 17:45:35 +01:00
Jens Axboe
ef060ea9e4 io_uring: add IORING_SETUP_TASKRUN_FLAG
If IORING_SETUP_COOP_TASKRUN is set to use cooperative scheduling for
running task_work, then IORING_SETUP_TASKRUN_FLAG can be set so the
application can tell if task_work is pending in the kernel for this
ring. This allows use cases like io_uring_peek_cqe() to still function
appropriately, or for the task to know when it would be useful to
call io_uring_wait_cqe() to run pending events.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-7-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-30 08:39:54 -06:00
Jens Axboe
e1169f06d5 io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
If this is set, io_uring will never use an IPI to deliver a task_work
notification. This can be used in the common case where a single task or
thread communicates with the ring, and doesn't rely on
io_uring_cqe_peek().

This provides a noticeable win in performance, both from eliminating
the IPI itself, but also from avoiding interrupting the submitting
task unnecessarily.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-6-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-30 08:39:54 -06:00
Jens Axboe
f548a12efd io_uring: return hint on whether more data is available after receive
For now just use a CQE flag for this, with big CQE support we could
return the actual number of bytes left.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29 21:12:12 -06:00
Jens Axboe
27738039fc Merge branch 'for-5.19/io_uring-socket' into for-5.19/io_uring-net
* for-5.19/io_uring-socket: (73 commits)
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr
  io_uring: return an error when cqe is dropped
  io_uring: use constants for cq_overflow bitfield
  io_uring: rework io_uring_enter to simplify return value
  io_uring: trace cqe overflows
  io_uring: add trace support for CQE overflow
  io_uring: allow re-poll if we made progress
  io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)
  io_uring: add support for IORING_ASYNC_CANCEL_ANY
  io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' key
  ...
2022-04-29 21:10:52 -06:00
Dr. Thomas Orgis
0e0af57e0e taskstats: version 12 with thread group and exe info
The task exit struct needs some crucial information to be able to provide
an enhanced version of process and thread accounting.  This change
provides:

1. ac_tgid in additon to ac_pid
2. thread group execution walltime in ac_tgetime
3. flag AGROUP in ac_flag to indicate the last task
   in a thread group / process
4. device ID and inode of task's /proc/self/exe in
   ac_exe_dev and ac_exe_inode
5. tools/accounting/procacct as demonstrator

When a task exits, taskstats are reported to userspace including the
task's pid and ppid, but without the id of the thread group this task is
part of.  Without the tgid, the stats of single tasks cannot be correlated
to each other as a thread group (process).

The taskstats documentation suggests that on process exit a data set
consisting of accumulated stats for the whole group is produced.  But such
an additional set of stats is only produced for actually multithreaded
processes, not groups that had only one thread, and also those stats only
contain data about delay accounting and not the more basic information
about CPU and memory resource usage.  Adding the AGROUP flag to be set
when the last task of a group exited enables determination of process end
also for single-threaded processes.

My applicaton basically does enhanced process accounting with summed
cputime, biggest maxrss, tasks per process.  The data is not available
with the traditional BSD process accounting (which is not designed to be
extensible) and the taskstats interface allows more efficient on-the-fly
grouping and summing of the stats, anyway, without intermediate disk
writes.

Furthermore, I do carry statistics on which exact program binary is used
how often with associated resources, getting a picture on how important
which parts of a collection of installed scientific software in different
versions are, and how well they put load on the machine.  This is enabled
by providing information on /proc/self/exe for each task.  I assume the
two 64-bit fields for device ID and inode are more appropriate than the
possibly large resolved path to keep the data volume down.

Add the tgid to the stats to complete task identification, the flag AGROUP
to mark the last task of a group, the group wallclock time, and
inode-based identification of the associated executable file.

Add tools/accounting/procacct.c as a simplified fork of getdelays.c to
demonstrate process and thread accounting.

[thomas.orgis@uni-hamburg.de: fix version number in comment]
  Link: https://lkml.kernel.org/r/20220405003601.7a5f6008@plasteblaster
Link: https://lkml.kernel.org/r/20220331004106.64e5616b@plasteblaster
Signed-off-by: Dr. Thomas Orgis <thomas.orgis@uni-hamburg.de>
Reviewed-by: Ismael Luceno <ismael@iodev.co.uk>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:38:03 -07:00
Linus Torvalds
66c2112b74 arm64 fix for -rc5
- Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmJryRkQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNOS/B/43vhg0Bz0sH9poU0r3YE6TfEeCTd2HwCBq
 Caf6VAM6K9Y7JwPSFkAgnmW2DOG1hif83mCaX47Dkb4bJZaUQpmzJ+ZsK0nCe/Wo
 GIBzTdjRLrl6LQAqcW860s2eE4Ac212LszDxdp5DXFp4HjzbSfsteUFWipnpyXEs
 KtA4Z991fbsj+FsWY42zqhFkMnnPO+6twT5okYN/S6tONrojCTZhRTKupBr4VRAk
 94xD+pJI/8EfyakEhNN2PabhCvlq0bEI/EB6pJMrL6GjMs5t0pbpowXHzMoctZuf
 /SkRgyILDNUPbAa6wGCJoV4ZcgFH7PDy80iDMwNscb/DgAO8xo6t
 =hafH
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Will Deacon:
 "Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type.

  This is a fix to the MTE ELF ABI for a bug that was added during the
  most recent merge window as part of the coredump support.

  The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE
  segment type has already been allocated to PT_AARCH64_UNWIND by the
  ELF ABI, so we've bumped the value and changed the name of the
  identifier to be better aligned with the existing one"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  elf: Fix the arm64 MTE ELF segment name and value
2022-04-29 10:36:47 -07:00
Paolo Bonzini
71d7c575a6 Merge branch 'kvm-fixes-for-5.18-rc5' into HEAD
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees.

The merge reconciles the ABI fixes for KVM_EXIT_SYSTEM_EVENT between
5.18 and commit c24a950ec7 ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata
for SEV-ES", 2022-04-13).
2022-04-29 12:47:59 -04:00
Paolo Bonzini
73331c5d84 Merge branch 'kvm-fixes-for-5.18-rc5' into HEAD
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees:

* Fix potential races when walking host page table

* Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT

* Fix shadow page table leak when KVM runs nested
2022-04-29 12:39:34 -04:00
Paolo Bonzini
d495f942f4 KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused.  Unfortunately this extensibility
mechanism has several issues:

- x86 is not writing the member, so it would not be possible to use it
  on x86 except for new events

- the member is not aligned to 64 bits, so the definition of the
  uAPI struct is incorrect for 32- on 64-bit userspace.  This is a
  problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
  usage of flags was only introduced in 5.18.

Since padding has to be introduced, place a new field in there
that tells if the flags field is valid.  To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid.  The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.

To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:22 -04:00
Jakub Kicinski
0e55546b18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/linux/netdevice.h
net/core/dev.c
  6510ea973d ("net: Use this_cpu_inc() to increment net->core_stats")
  794c24e992 ("net-core: rx_otherhost_dropped to core_stats")
https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/

drivers/net/wan/cosa.c
  d48fea8401 ("net: cosa: fix error check return value of register_chrdev()")
  89fbca3307 ("net: wan: remove support for COSA and SRP synchronous serial boards")
https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28 13:02:01 -07:00
Catalin Marinas
c35fe2a68f elf: Fix the arm64 MTE ELF segment name and value
Unfortunately, the name/value choice for the MTE ELF segment type
(PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by
PT_AARCH64_UNWIND, as defined in the AArch64 ELF ABI
(https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst).

Update the ELF segment type value to LOPROC+2 and also change the define
to PT_AARCH64_MEMTAG_MTE to match the AArch64 ELF ABI namespace. The
AArch64 ELF ABI document is updating accordingly (segment type not
previously mentioned in the document).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 761b9b366c ("elf: Introduce the ARM MTE ELF segment type")
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Machado <luis.machado@arm.com>
Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>
Link: https://lore.kernel.org/r/20220425151833.2603830-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-28 11:37:06 +01:00