Add hooks at process waking up and exiting routines so that oems
can control these procedures. One possible benifit is the peak of
system load can be shaved and load can be more smooth when a large
number of threads is killed once upon a time, while a sudden peak
of system load can probably lead to user junk issues.
Bug: 296493318
Change-Id: Ide5f9e63a4f50d6a9e3ffbc9516de9ce48ededef
Signed-off-by: xieliujie <xieliujie@oppo.com>
Add vendors hooks for recording memory used
Vendor modules allocate and manages the memory itself.
These memories might not be included in kernel memory
statistics. Also, detailed references and vendor-specific
information are managed only inside modules. When
various problems such as memory leaks occurs, these
information should be showed in real-time.
Bug: 182443489
Bug: 234407991
Bug: 277799025
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Changes in 6.1.16
HID: asus: use spinlock to protect concurrent accesses
HID: asus: use spinlock to safely schedule workers
powerpc/mm: Rearrange if-else block to avoid clang warning
ata: ahci: Revert "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"
ARM: OMAP2+: Fix memory leak in realtime_counter_init()
arm64: dts: qcom: qcs404: use symbol names for PCIe resets
arm64: dts: qcom: msm8996-tone: Fix USB taking 6 minutes to wake up
arm64: dts: qcom: sm8150-kumano: Panel framebuffer is 2.5k instead of 4k
arm64: dts: qcom: sm6350: Fix up the ramoops node
arm64: dts: qcom: sm6125: Reorder HSUSB PHY clocks to match bindings
arm64: dts: qcom: sm6125-seine: Clean up gpio-keys (volume down)
arm64: dts: imx8m: Align SoC unique ID node unit address
ARM: zynq: Fix refcount leak in zynq_early_slcr_init
arm64: dts: mediatek: mt8195: Add power domain to U3PHY1 T-PHY
arm64: dts: mediatek: mt8183: Fix systimer 13 MHz clock description
arm64: dts: mediatek: mt8192: Fix systimer 13 MHz clock description
arm64: dts: mediatek: mt8195: Fix systimer 13 MHz clock description
arm64: dts: mediatek: mt8186: Fix systimer 13 MHz clock description
arm64: dts: qcom: sdm845-db845c: fix audio codec interrupt pin name
x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC
arm64: dts: qcom: sc7180: correct SPMI bus address cells
arm64: dts: qcom: sc7280: correct SPMI bus address cells
arm64: dts: qcom: sc8280xp: correct SPMI bus address cells
arm64: dts: qcom: sc8280xp: Vote for CX in USB controllers
arm64: dts: meson-gxl: jethub-j80: Fix WiFi MAC address node
arm64: dts: meson-gxl: jethub-j80: Fix Bluetooth MAC node name
arm64: dts: meson-axg: jethub-j1xx: Fix MAC address node names
arm64: dts: meson-gx: Fix Ethernet MAC address unit name
arm64: dts: meson-g12a: Fix internal Ethernet PHY unit name
arm64: dts: meson-gx: Fix the SCPI DVFS node name and unit address
cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*
arm64: dts: ti: k3-am62: Enable SPI nodes at the board level
arm64: dts: ti: k3-am62-main: Fix clocks for McSPI
arm64: tegra: Fix duplicate regulator on Jetson TX1
arm64: dts: msm8992-bullhead: add memory hole region
arm64: dts: qcom: msm8992-bullhead: Fix cont_splash_mem size
arm64: dts: qcom: msm8992-bullhead: Disable dfps_data_mem
arm64: dts: qcom: ipq8074: correct USB3 QMP PHY-s clock output names
arm64: dts: qcom: ipq8074: fix Gen2 PCIe QMP PHY
arm64: dts: qcom: ipq8074: fix Gen3 PCIe QMP PHY
arm64: dts: qcom: ipq8074: correct Gen2 PCIe ranges
arm64: dts: qcom: ipq8074: fix Gen3 PCIe node
arm64: dts: qcom: ipq8074: correct PCIe QMP PHY output clock names
arm64: dts: meson: remove CPU opps below 1GHz for G12A boards
ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init()
arm64: dts: mediatek: mt8192: Mark scp_adsp clock as broken
ARM: bcm2835_defconfig: Enable the framebuffer
ARM: s3c: fix s3c64xx_set_timer_source prototype
arm64: dts: ti: k3-j7200: Fix wakeup pinmux range
ARM: dts: exynos: correct wr-active property in Exynos3250 Rinato
ARM: imx: Call ida_simple_remove() for ida_simple_get
arm64: dts: amlogic: meson-gx: fix SCPI clock dvfs node name
arm64: dts: amlogic: meson-axg: fix SCPI clock dvfs node name
arm64: dts: amlogic: meson-gx: add missing SCPI sensors compatible
arm64: dts: amlogic: meson-axg-jethome-jethub-j1xx: fix supply name of USB controller node
arm64: dts: amlogic: meson-gxl-s905d-sml5442tw: drop invalid clock-names property
arm64: dts: amlogic: meson-gx: add missing unit address to rng node name
arm64: dts: amlogic: meson-gxl-s905w-jethome-jethub-j80: fix invalid rtc node name
arm64: dts: amlogic: meson-axg-jethome-jethub-j1xx: fix invalid rtc node name
arm64: dts: amlogic: meson-gxl: add missing unit address to eth-phy-mux node name
arm64: dts: amlogic: meson-gx-libretech-pc: fix update button name
arm64: dts: amlogic: meson-sm1-bananapi-m5: fix adc keys node names
arm64: dts: amlogic: meson-gxl-s905d-phicomm-n1: fix led node name
arm64: dts: amlogic: meson-gxbb-kii-pro: fix led node name
arm64: dts: amlogic: meson-sm1-odroid-hc4: fix active fan thermal trip
locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
arm64: dts: renesas: beacon-renesom: Fix gpio expander reference
arm64: dts: meson: radxa-zero: allow usb otg mode
arm64: dts: meson: bananapi-m5: switch VDDIO_C pin to OPEN_DRAIN
ARM: dts: sun8i: nanopi-duo2: Fix regulator GPIO reference
ublk_drv: remove nr_aborted_queues from ublk_device
ublk_drv: don't probe partitions if the ubq daemon isn't trusted
ARM: dts: imx7s: correct iomuxc gpr mux controller cells
sbitmap: remove redundant check in __sbitmap_queue_get_batch
sbitmap: Use single per-bitmap counting to wake up queued tags
sbitmap: correct wake_batch recalculation to avoid potential IO hung
arm64: dts: mt8195: Fix CPU map for single-cluster SoC
arm64: dts: mt8192: Fix CPU map for single-cluster SoC
arm64: dts: mt8186: Fix CPU map for single-cluster SoC
arm64: dts: mediatek: mt7622: Add missing pwm-cells to pwm node
arm64: dts: mediatek: mt8186: Fix watchdog compatible
arm64: dts: mediatek: mt8195: Fix watchdog compatible
arm64: dts: mediatek: mt7986: Fix watchdog compatible
ARM: dts: stm32: Update part number NVMEM description on stm32mp131
blk-mq: avoid sleep in blk_mq_alloc_request_hctx
blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
blk-mq: Fix potential io hung for shared sbitmap per tagset
blk-mq: correct stale comment of .get_budget
arm64: dts: qcom: msm8996: support using GPLL0 as kryocc input
arm64: dts: qcom: msm8996 switch from RPM_SMD_BB_CLK1 to RPM_SMD_XO_CLK_SRC
arm64: dts: qcom: sm8350: drop incorrect cells from serial
arm64: dts: qcom: sm8450: drop incorrect cells from serial
arm64: dts: qcom: msm8992-lg-bullhead: Correct memory overlaps with the SMEM and MPSS memory regions
arm64: dts: qcom: msm8953: correct TLMM gpio-ranges
arm64: dts: qcom: msm8992-*: Fix up comments
arm64: dts: qcom: msm8992-lg-bullhead: Enable regulators
s390/dasd: Fix potential memleak in dasd_eckd_init()
sched/rt: pick_next_rt_entity(): check list_entry
perf/x86/intel/ds: Fix the conversion from TSC to perf time
x86/perf/zhaoxin: Add stepping check for ZXC
KEYS: asymmetric: Fix ECDSA use via keyctl uapi
block: ublk: check IO buffer based on flag need_get_data
arm64: dts: qcom: pmk8350: Specify PBS register for PON
arm64: dts: qcom: pmk8350: Use the correct PON compatible
erofs: relinquish volume with mutex held
block: sync mixed merged request's failfast with 1st bio's
block: Fix io statistics for cgroup in throttle path
block: bio-integrity: Copy flags when bio_integrity_payload is cloned
block: use proper return value from bio_failfast()
wifi: mt76: mt7915: add missing of_node_put()
wifi: mt76: mt7921s: fix slab-out-of-bounds access in sdio host
wifi: mt76: mt7915: check return value before accessing free_block_num
wifi: mt76: mt7915: drop always true condition of __mt7915_reg_addr()
wifi: mt76: mt7915: fix unintended sign extension of mt7915_hw_queue_read()
wifi: mt76: fix coverity uninit_use_in_call in mt76_connac2_reverse_frag0_hdr_trans()
wifi: rsi: Fix memory leak in rsi_coex_attach()
wifi: rtlwifi: rtl8821ae: don't call kfree_skb() under spin_lock_irqsave()
wifi: rtlwifi: rtl8188ee: don't call kfree_skb() under spin_lock_irqsave()
wifi: rtlwifi: rtl8723be: don't call kfree_skb() under spin_lock_irqsave()
wifi: iwlegacy: common: don't call dev_kfree_skb() under spin_lock_irqsave()
wifi: libertas: fix memory leak in lbs_init_adapter()
wifi: rtl8xxxu: don't call dev_kfree_skb() under spin_lock_irqsave()
wifi: rtw89: 8852c: rfk: correct DACK setting
wifi: rtw89: 8852c: rfk: correct DPK settings
wifi: rtlwifi: Fix global-out-of-bounds bug in _rtl8812ae_phy_set_txpower_limit()
libbpf: Fix btf__align_of() by taking into account field offsets
wifi: ipw2x00: don't call dev_kfree_skb() under spin_lock_irqsave()
wifi: ipw2200: fix memory leak in ipw_wdev_init()
wifi: wilc1000: fix potential memory leak in wilc_mac_xmit()
wifi: wilc1000: add missing unregister_netdev() in wilc_netdev_ifc_init()
wifi: brcmfmac: fix potential memory leak in brcmf_netdev_start_xmit()
wifi: brcmfmac: unmap dma buffer in brcmf_msgbuf_alloc_pktid()
wifi: libertas_tf: don't call kfree_skb() under spin_lock_irqsave()
wifi: libertas: if_usb: don't call kfree_skb() under spin_lock_irqsave()
wifi: libertas: main: don't call kfree_skb() under spin_lock_irqsave()
wifi: libertas: cmdresp: don't call kfree_skb() under spin_lock_irqsave()
wifi: wl3501_cs: don't call kfree_skb() under spin_lock_irqsave()
libbpf: Fix invalid return address register in s390
crypto: x86/ghash - fix unaligned access in ghash_setkey()
ACPICA: Drop port I/O validation for some regions
genirq: Fix the return type of kstat_cpu_irqs_sum()
rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose
rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls
rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
lib/mpi: Fix buffer overrun when SG is too long
crypto: ccp - Avoid page allocation failure warning for SEV_GET_ID2
platform/chrome: cros_ec_typec: Update port DP VDO
ACPICA: nsrepair: handle cases without a return value correctly
selftests/xsk: print correct payload for packet dump
selftests/xsk: print correct error codes when exiting
arm64/cpufeature: Fix field sign for DIT hwcap detection
kselftest/arm64: Fix syscall-abi for systems without 128 bit SME
workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex
s390/early: fix sclp_early_sccb variable lifetime
s390/vfio-ap: fix an error handling path in vfio_ap_mdev_probe_queue()
x86/signal: Fix the value returned by strict_sas_size()
thermal/drivers/tsens: Drop msm8976-specific defines
thermal/drivers/tsens: Sort out msm8976 vs msm8956 data
thermal/drivers/tsens: fix slope values for msm8939
thermal/drivers/tsens: limit num_sensors to 9 for msm8939
wifi: rtw89: fix potential leak in rtw89_append_probe_req_ie()
wifi: rtw89: Add missing check for alloc_workqueue
wifi: rtl8xxxu: Fix memory leaks with RTL8723BU, RTL8192EU
wifi: orinoco: check return value of hermes_write_wordrec()
thermal/drivers/imx_sc_thermal: Drop empty platform remove function
thermal/drivers/imx_sc_thermal: Fix the loop condition
wifi: ath9k: htc_hst: free skb in ath9k_htc_rx_msg() if there is no callback function
wifi: ath9k: hif_usb: clean up skbs if ath9k_hif_usb_rx_stream() fails
wifi: ath9k: Fix potential stack-out-of-bounds write in ath9k_wmi_rsp_callback()
wifi: ath11k: Fix memory leak in ath11k_peer_rx_frag_setup
wifi: cfg80211: Fix extended KCK key length check in nl80211_set_rekey_data()
ACPI: battery: Fix missing NUL-termination with large strings
selftests/bpf: Fix build errors if CONFIG_NF_CONNTRACK=m
crypto: ccp - Failure on re-initialization due to duplicate sysfs filename
crypto: essiv - Handle EBUSY correctly
crypto: seqiv - Handle EBUSY correctly
powercap: fix possible name leak in powercap_register_zone()
x86/microcode: Add a parameter to microcode_check() to store CPU capabilities
x86/microcode: Check CPU capabilities after late microcode update correctly
x86/microcode: Adjust late loading result reporting message
selftests/bpf: Use consistent build-id type for liburandom_read.so
selftests/bpf: Fix vmtest static compilation error
crypto: xts - Handle EBUSY correctly
leds: led-class: Add missing put_device() to led_put()
s390/bpf: Add expoline to tail calls
wifi: iwlwifi: mei: fix compilation errors in rfkill()
kselftest/arm64: Fix enumeration of systems without 128 bit SME
can: rcar_canfd: Fix R-Car V3U GAFLCFG field accesses
selftests/bpf: Initialize tc in xdp_synproxy
crypto: ccp - Flush the SEV-ES TMR memory before giving it to firmware
bpftool: profile online CPUs instead of possible
wifi: mt76: mt7915: call mt7915_mcu_set_thermal_throttling() only after init_work
wifi: mt76: mt7915: fix memory leak in mt7915_mcu_exit
wifi: mt76: mt7915: fix WED TxS reporting
wifi: mt76: add memory barrier to SDIO queue kick
wifi: mt76: mt7921: fix error code of return in mt7921_acpi_read
net/mlx5: Enhance debug print in page allocation failure
irqchip: Fix refcount leak in platform_irqchip_probe
irqchip/alpine-msi: Fix refcount leak in alpine_msix_init_domains
irqchip/irq-mvebu-gicp: Fix refcount leak in mvebu_gicp_probe
irqchip/ti-sci: Fix refcount leak in ti_sci_intr_irq_domain_probe
s390/mem_detect: fix detect_memory() error handling
s390/vmem: fix empty page tables cleanup under KASAN
s390/boot: cleanup decompressor header files
s390/mem_detect: rely on diag260() if sclp_early_get_memsize() fails
s390/boot: fix mem_detect extended area allocation
net: add sock_init_data_uid()
tun: tun_chr_open(): correctly initialize socket uid
tap: tap_open(): correctly initialize socket uid
OPP: fix error checking in opp_migrate_dentry()
cpufreq: davinci: Fix clk use after free
Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds
Bluetooth: L2CAP: Fix potential user-after-free
Bluetooth: hci_qca: get wakeup status from serdev device handle
net: ipa: generic command param fix
s390: vfio-ap: tighten the NIB validity check
s390/ap: fix status returned by ap_aqic()
s390/ap: fix status returned by ap_qact()
libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()
xen/grant-dma-iommu: Implement a dummy probe_device() callback
rds: rds_rm_zerocopy_callback() correct order for list_add_tail()
crypto: rsa-pkcs1pad - Use akcipher_request_complete
m68k: /proc/hardware should depend on PROC_FS
RISC-V: time: initialize hrtimer based broadcast clock event device
clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
wifi: iwl3945: Add missing check for create_singlethread_workqueue
wifi: iwl4965: Add missing check for create_singlethread_workqueue()
wifi: mwifiex: fix loop iterator in mwifiex_update_ampdu_txwinsize()
selftests/bpf: Fix out-of-srctree build
ACPI: resource: Add IRQ overrides for MAINGEAR Vector Pro 2 models
ACPI: resource: Do IRQ override on all TongFang GMxRGxx
crypto: octeontx2 - Fix objects shared between several modules
crypto: crypto4xx - Call dma_unmap_page when done
wifi: mac80211: move color collision detection report in a delayed work
wifi: mac80211: make rate u32 in sta_set_rate_info_rx()
wifi: mac80211: fix non-MLO station association
wifi: mac80211: Don't translate MLD addresses for multicast
wifi: mac80211: avoid u32_encode_bits() warning
wifi: mac80211: fix off-by-one link setting
tools/lib/thermal: Fix thermal_sampling_exit()
thermal/drivers/hisi: Drop second sensor hi3660
selftests/bpf: Fix map_kptr test.
wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()
bpf: Zeroing allocated object from slab in bpf memory allocator
selftests/bpf: Fix xdp_do_redirect on s390x
can: esd_usb: Move mislocated storage of SJA1000_ECC_SEG bits in case of a bus error
can: esd_usb: Make use of can_change_state() and relocate checking skb for NULL
xsk: check IFF_UP earlier in Tx path
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: Fix global subprog context argument resolution logic
irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()
net/smc: fix application data exception
selftests/net: Interpret UDP_GRO cmsg data as an int value
l2tp: Avoid possible recursive deadlock in l2tp_tunnel_register()
net: bcmgenet: fix MoCA LED control
net: lan966x: Fix possible deadlock inside PTP
net/mlx4_en: Introduce flexible array to silence overflow warning
selftest: fib_tests: Always cleanup before exit
sefltests: netdevsim: wait for devlink instance after netns removal
drm: Fix potential null-ptr-deref due to drmm_mode_config_init()
drm/fourcc: Add missing big-endian XRGB1555 and RGB565 formats
drm/bridge: ti-sn65dsi83: Fix delay after reset deassert to match spec
drm: mxsfb: DRM_IMX_LCDIF should depend on ARCH_MXC
drm: mxsfb: DRM_MXSFB should depend on ARCH_MXS || ARCH_MXC
drm/bridge: megachips: Fix error handling in i2c_register_driver()
drm/vkms: Fix memory leak in vkms_init()
drm/vkms: Fix null-ptr-deref in vkms_release()
drm/vc4: dpi: Fix format mapping for RGB565
drm: tidss: Fix pixel format definition
gpu: ipu-v3: common: Add of_node_put() for reference returned by of_graph_get_port_by_id()
drm/vc4: drop all currently held locks if deadlock happens
hwmon: (ftsteutates) Fix scaling of measurements
drm/msm/dpu: check for null return of devm_kzalloc() in dpu_writeback_init()
drm/msm/hdmi: Add missing check for alloc_ordered_workqueue
pinctrl: qcom: pinctrl-msm8976: Correct function names for wcss pins
pinctrl: stm32: Fix refcount leak in stm32_pctrl_get_irq_domain
pinctrl: rockchip: Fix refcount leak in rockchip_pinctrl_parse_groups
drm/vc4: hvs: Set AXI panic modes
drm/vc4: hvs: SCALER_DISPBKGND_AUTOHS is only valid on HVS4
drm/vc4: hvs: Correct interrupt masking bit assignment for HVS5
drm/vc4: hvs: Fix colour order for xRGB1555 on HVS5
drm/vc4: hdmi: Correct interlaced timings again
drm/msm: clean event_thread->worker in case of an error
drm/panel-edp: fix name for IVO product id 854b
scsi: qla2xxx: Fix exchange oversubscription
scsi: qla2xxx: Fix exchange oversubscription for management commands
scsi: qla2xxx: edif: Fix clang warning
ASoC: fsl_sai: initialize is_dsp_mode flag
drm/bridge: tc358767: Set default CLRSIPO count
drm/msm/adreno: Fix null ptr access in adreno_gpu_cleanup()
ALSA: hda/ca0132: minor fix for allocation size
drm/amdgpu: Use the sched from entity for amdgpu_cs trace
drm/msm/gem: Add check for kmalloc
drm/msm/dpu: Disallow unallocated resources to be returned
drm/bridge: lt9611: fix sleep mode setup
drm/bridge: lt9611: fix HPD reenablement
drm/bridge: lt9611: fix polarity programming
drm/bridge: lt9611: fix programming of video modes
drm/bridge: lt9611: fix clock calculation
drm/bridge: lt9611: pass a pointer to the of node
regulator: tps65219: use IS_ERR() to detect an error pointer
drm/mipi-dsi: Fix byte order of 16-bit DCS set/get brightness
drm: exynos: dsi: Fix MIPI_DSI*_NO_* mode flags
drm/msm/dsi: Allow 2 CTRLs on v2.5.0
scsi: ufs: exynos: Fix DMA alignment for PAGE_SIZE != 4096
drm/msm/dpu: sc7180: add missing WB2 clock control
drm/msm: use strscpy instead of strncpy
drm/msm/dpu: Add check for cstate
drm/msm/dpu: Add check for pstates
drm/msm/mdp5: Add check for kzalloc
habanalabs: bugs fixes in timestamps buff alloc
pinctrl: bcm2835: Remove of_node_put() in bcm2835_of_gpio_ranges_fallback()
pinctrl: mediatek: Initialize variable pullen and pullup to zero
pinctrl: mediatek: Initialize variable *buf to zero
gpu: host1x: Fix mask for syncpoint increment register
gpu: host1x: Don't skip assigning syncpoints to channels
drm/tegra: firewall: Check for is_addr_reg existence in IMM check
pinctrl: renesas: rzg2l: Fix configuring the GPIO pins as interrupts
drm/msm/dpu: set pdpu->is_rt_pipe early in dpu_plane_sspp_atomic_update()
drm/mediatek: dsi: Reduce the time of dsi from LP11 to sending cmd
drm/mediatek: Use NULL instead of 0 for NULL pointer
drm/mediatek: Drop unbalanced obj unref
drm/mediatek: mtk_drm_crtc: Add checks for devm_kcalloc
drm/mediatek: Clean dangling pointer on bind error path
ASoC: soc-compress.c: fixup private_data on snd_soc_new_compress()
dt-bindings: display: mediatek: Fix the fallback for mediatek,mt8186-disp-ccorr
gpio: vf610: connect GPIO label to dev name
ASoC: topology: Properly access value coming from topology file
spi: dw_bt1: fix MUX_MMIO dependencies
ASoC: mchp-spdifrx: fix controls which rely on rsr register
ASoC: mchp-spdifrx: fix return value in case completion times out
ASoC: mchp-spdifrx: fix controls that works with completion mechanism
ASoC: mchp-spdifrx: disable all interrupts in mchp_spdifrx_dai_remove()
dm: improve shrinker debug names
regmap: apply reg_base and reg_downshift for single register ops
ASoC: rsnd: fixup #endif position
ASoC: mchp-spdifrx: Fix uninitialized use of mr in mchp_spdifrx_hw_params()
ASoC: dt-bindings: meson: fix gx-card codec node regex
regulator: tps65219: use generic set_bypass()
hwmon: (asus-ec-sensors) add missing mutex path
hwmon: (ltc2945) Handle error case in ltc2945_value_store
ALSA: hda: Fix the control element identification for multiple codecs
drm/amdgpu: fix enum odm_combine_mode mismatch
scsi: mpt3sas: Fix a memory leak
scsi: aic94xx: Add missing check for dma_map_single()
HID: multitouch: Add quirks for flipped axes
HID: retain initial quirks set up when creating HID devices
ASoC: qcom: q6apm-lpass-dai: unprepare stream if its already prepared
ASoC: qcom: q6apm-dai: fix race condition while updating the position pointer
ASoC: qcom: q6apm-dai: Add SNDRV_PCM_INFO_BATCH flag
ASoC: codecs: lpass: register mclk after runtime pm
ASoC: codecs: lpass: fix incorrect mclk rate
drm/amd/display: don't call dc_interrupt_set() for disabled crtcs
HID: logitech-hidpp: Hard-code HID++ 1.0 fast scroll support
spi: bcm63xx-hsspi: Fix multi-bit mode setting
hwmon: (mlxreg-fan) Return zero speed for broken fan
ASoC: tlv320adcx140: fix 'ti,gpio-config' DT property init
dm: remove flush_scheduled_work() during local_exit()
nfs4trace: fix state manager flag printing
NFS: fix disabling of swap
spi: synquacer: Fix timeout handling in synquacer_spi_transfer_one()
ASoC: soc-dapm.h: fixup warning struct snd_pcm_substream not declared
HID: bigben: use spinlock to protect concurrent accesses
HID: bigben_worker() remove unneeded check on report_field
HID: bigben: use spinlock to safely schedule workers
hid: bigben_probe(): validate report count
ALSA: hda/hdmi: Register with vga_switcheroo on Dual GPU Macbooks
drm/shmem-helper: Fix locking for drm_gem_shmem_get_pages_sgt()
NFSD: enhance inter-server copy cleanup
NFSD: fix leaked reference count of nfsd4_ssc_umount_item
nfsd: fix race to check ls_layouts
nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
NFSD: fix problems with cleanup on errors in nfsd4_copy
nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
nfsd: don't fsync nfsd_files on last close
NFSD: copy the whole verifier in nfsd_copy_write_verifier
cifs: Fix lost destroy smbd connection when MR allocate failed
cifs: Fix warning and UAF when destroy the MR list
cifs: use tcon allocation functions even for dummy tcon
gfs2: jdata writepage fix
perf llvm: Fix inadvertent file creation
leds: led-core: Fix refcount leak in of_led_get()
leds: is31fl319x: Wrap mutex_destroy() for devm_add_action_or_rest()
leds: simatic-ipc-leds-gpio: Make sure we have the GPIO providing driver
tools/tracing/rtla: osnoise_hist: use total duration for average calculation
perf inject: Use perf_data__read() for auxtrace
perf intel-pt: Do not try to queue auxtrace data on pipe
perf test bpf: Skip test if kernel-debuginfo is not present
perf tools: Fix auto-complete on aarch64
sparc: allow PM configs for sparc32 COMPILE_TEST
selftests: find echo binary to use -ne options
selftests/ftrace: Fix bash specific "==" operator
selftests: use printf instead of echo -ne
perf record: Fix segfault with --overwrite and --max-size
printf: fix errname.c list
perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
objtool: add UACCESS exceptions for __tsan_volatile_read/write
mfd: cs5535: Don't build on UML
mfd: pcf50633-adc: Fix potential memleak in pcf50633_adc_async_read()
dmaengine: idxd: Set traffic class values in GRPCFG on DSA 2.0
RDMA/erdma: Fix refcount leak in erdma_mmap
dmaengine: HISI_DMA should depend on ARCH_HISI
RDMA/hns: Fix refcount leak in hns_roce_mmap
iio: light: tsl2563: Do not hardcode interrupt trigger type
usb: gadget: fusb300_udc: free irq on the error path in fusb300_probe()
i2c: designware: fix i2c_dw_clk_rate() return size to be u32
soundwire: cadence: Don't overflow the command FIFOs
driver core: fix potential null-ptr-deref in device_add()
kobject: modify kobject_get_path() to take a const *
kobject: Fix slab-out-of-bounds in fill_kobj_path()
alpha/boot/tools/objstrip: fix the check for ELF header
media: uvcvideo: Check for INACTIVE in uvc_ctrl_is_accessible()
media: uvcvideo: Implement mask for V4L2_CTRL_TYPE_MENU
media: uvcvideo: Refactor uvc_ctrl_mappings_uvcXX
media: uvcvideo: Refactor power_line_frequency_controls_limited
coresight: etm4x: Fix accesses to TRCSEQRSTEVR and TRCSEQSTR
coresight: cti: Prevent negative values of enable count
coresight: cti: Add PM runtime call in enable_store
usb: typec: intel_pmc_mux: Don't leak the ACPI device reference count
PCI/IOV: Enlarge virtfn sysfs name buffer
PCI: switchtec: Return -EFAULT for copy_to_user() errors
PCI: endpoint: pci-epf-vntb: Clean up kernel_doc warning
PCI: endpoint: pci-epf-vntb: Add epf_ntb_mw_bar_clear() num_mws kernel-doc
hwtracing: hisi_ptt: Only add the supported devices to the filters list
tty: serial: fsl_lpuart: disable Rx/Tx DMA in lpuart32_shutdown()
tty: serial: fsl_lpuart: clear LPUART Status Register in lpuart32_shutdown()
serial: tegra: Add missing clk_disable_unprepare() in tegra_uart_hw_init()
Revert "char: pcmcia: cm4000_cs: Replace mdelay with usleep_range in set_protocol"
eeprom: idt_89hpesx: Fix error handling in idt_init()
applicom: Fix PCI device refcount leak in applicom_init()
firmware: stratix10-svc: add missing gen_pool_destroy() in stratix10_svc_drv_probe()
firmware: stratix10-svc: fix error handle while alloc/add device failed
VMCI: check context->notify_page after call to get_user_pages_fast() to avoid GPF
mei: pxp: Use correct macros to initialize uuid_le
misc/mei/hdcp: Use correct macros to initialize uuid_le
misc: fastrpc: Fix an error handling path in fastrpc_rpmsg_probe()
driver core: fix resource leak in device_add()
driver core: location: Free struct acpi_pld_info *pld before return false
drivers: base: transport_class: fix possible memory leak
drivers: base: transport_class: fix resource leak when transport_add_device() fails
firmware: dmi-sysfs: Fix null-ptr-deref in dmi_sysfs_register_handle
fotg210-udc: Add missing completion handler
dmaengine: dw-edma: Fix missing src/dst address of interleaved xfers
fpga: microchip-spi: move SPI I/O buffers out of stack
fpga: microchip-spi: rewrite status polling in a time measurable way
usb: early: xhci-dbc: Fix a potential out-of-bound memory access
tty: serial: fsl_lpuart: Fix the wrong RXWATER setting for rx dma case
RDMA/cxgb4: add null-ptr-check after ip_dev_find()
usb: musb: mediatek: don't unregister something that wasn't registered
usb: gadget: configfs: Restrict symlink creation is UDC already binded
phy: mediatek: remove temporary variable @mask_
PCI: mt7621: Delay phy ports initialization
iommu: dart: Add suspend/resume support
iommu: dart: Support >64 stream IDs
iommu/dart: Fix apple_dart_device_group for PCI groups
iommu/vt-d: Set No Execute Enable bit in PASID table entry
power: supply: remove faulty cooling logic
RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()
usb: max-3421: Fix setting of I/O pins
RDMA/irdma: Cap MSIX used to online CPUs + 1
serial: fsl_lpuart: fix RS485 RTS polariy inverse issue
tty: serial: imx: Handle RS485 DE signal active high
tty: serial: imx: disable Ageing Timer interrupt request irq
driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
driver core: fw_devlink: Don't purge child fwnode's consumer links
driver core: fw_devlink: Allow marking a fwnode link as being part of a cycle
driver core: fw_devlink: Consolidate device link flag computation
driver core: fw_devlink: Improve check for fwnode with no device/driver
driver core: fw_devlink: Make cycle detection more robust
mtd: mtdpart: Don't create platform device that'll never probe
usb: host: fsl-mph-dr-of: reuse device_set_of_node_from_dev
dmaengine: dw-edma: Fix readq_ch() return value truncation
PCI: Fix dropping valid root bus resources with .end = zero
phy: rockchip-typec: fix tcphy_get_mode error case
PCI: qcom: Fix host-init error handling
iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()
iommu: Fix error unwind in iommu_group_alloc()
iommu/amd: Do not identity map v2 capable device when snp is enabled
dmaengine: sf-pdma: pdma_desc memory leak fix
dmaengine: dw-axi-dmac: Do not dereference NULL structure
dmaengine: ptdma: check for null desc before calling pt_cmd_callback
iommu/vt-d: Fix error handling in sva enable/disable paths
iommu/vt-d: Allow to use flush-queue when first level is default
RDMA/rxe: cleanup some error handling in rxe_verbs.c
RDMA/rxe: Fix missing memory barriers in rxe_queue.h
IB/hfi1: Fix math bugs in hfi1_can_pin_pages()
IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
Revert "remoteproc: qcom_q6v5_mss: map/unmap metadata region before/after use"
remoteproc: qcom_q6v5_mss: Use a carveout to authenticate modem headers
media: ti: cal: fix possible memory leak in cal_ctx_create()
media: platform: ti: Add missing check for devm_regulator_get
media: imx: imx7-media-csi: fix missing clk_disable_unprepare() in imx7_csi_init()
powerpc: Remove linker flag from KBUILD_AFLAGS
s390/vdso: Drop '-shared' from KBUILD_CFLAGS_64
builddeb: clean generated package content
media: max9286: Fix memleak in max9286_v4l2_register()
media: ov2740: Fix memleak in ov2740_init_controls()
media: ov5675: Fix memleak in ov5675_init_controls()
media: ov5640: Fix soft reset sequence and timings
media: ov5640: Handle delays when no reset_gpio set
media: mc: Get media_device directly from pad
media: i2c: ov772x: Fix memleak in ov772x_probe()
media: i2c: imx219: Split common registers from mode tables
media: i2c: imx219: Fix binning for RAW8 capture
media: platform: mtk-mdp3: Fix return value check in mdp_probe()
media: camss: csiphy-3ph: avoid undefined behavior
media: platform: mtk-mdp3: remove unused VIDEO_MEDIATEK_VPU config
media: platform: mtk-mdp3: fix Kconfig dependencies
media: v4l2-jpeg: correct the skip count in jpeg_parse_app14_data
media: v4l2-jpeg: ignore the unknown APP14 marker
media: hantro: Fix JPEG encoder ENUM_FRMSIZE on RK3399
media: imx-jpeg: Apply clk_bulk api instead of operating specific clk
media: amphion: correct the unspecified color space
media: drivers/media/v4l2-core/v4l2-h264 : add detection of null pointers
media: rc: Fix use-after-free bugs caused by ene_tx_irqsim()
media: atomisp: Only set default_run_mode on first open of a stream/asd
media: i2c: ov7670: 0 instead of -EINVAL was returned
media: usb: siano: Fix use after free bugs caused by do_submit_urb
media: saa7134: Use video_unregister_device for radio_dev
rpmsg: glink: Avoid infinite loop on intent for missing channel
rpmsg: glink: Release driver_override
ARM: OMAP2+: omap4-common: Fix refcount leak bug
arm64: dts: qcom: msm8996: Add additional A2NoC clocks
udf: Define EFSCORRUPTED error code
context_tracking: Fix noinstr vs KASAN
exit: Detect and fix irq disabled state in oops
ARM: dts: exynos: Use Exynos5420 compatible for the MIPI video phy
fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected
blk-iocost: fix divide by 0 error in calc_lcoefs()
blk-cgroup: dropping parent refcount after pd_free_fn() is done
blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()
trace/blktrace: fix memory leak with using debugfs_lookup()
btrfs: scrub: improve tree block error reporting
arm64: zynqmp: Enable hs termination flag for USB dwc3 controller
cpuidle, intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE
x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads
cpuidle: drivers: firmware: psci: Dont instrument suspend code
cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
perf/x86/intel/uncore: Add Meteor Lake support
wifi: ath9k: Fix use-after-free in ath9k_hif_usb_disconnect()
wifi: ath11k: fix monitor mode bringup crash
wifi: brcmfmac: Fix potential stack-out-of-bounds in brcmf_c_preinit_dcmds()
rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait()
srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL
rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
rcu-tasks: Handle queue-shrink/callback-enqueue race condition
wifi: ath11k: debugfs: fix to work with multiple PCI devices
thermal: intel: Fix unsigned comparison with less than zero
timers: Prevent union confusion from unexpected restart_syscall()
x86/bugs: Reset speculation control settings on init
bpftool: Always disable stack protection for BPF objects
wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds
wifi: mt7601u: fix an integer underflow
inet: fix fast path in __inet_hash_connect()
ice: restrict PTP HW clock freq adjustments to 100, 000, 000 PPB
ice: add missing checks for PF vsi type
ACPI: Don't build ACPICA with '-Os'
bpf, docs: Fix modulo zero, division by zero, overflow, and underflow
thermal: intel: intel_pch: Add support for Wellsburg PCH
clocksource: Suspend the watchdog temporarily when high read latency detected
crypto: hisilicon: Wipe entire pool on error
net: bcmgenet: Add a check for oversized packets
m68k: Check syscall_trace_enter() return code
s390/mm,ptdump: avoid Kasan vs Memcpy Real markers swapping
netfilter: nf_tables: NULL pointer dereference in nf_tables_updobj()
can: isotp: check CAN address family in isotp_bind()
gcc-plugins: drop -std=gnu++11 to fix GCC 13 build
tools/power/x86/intel-speed-select: Add Emerald Rapid quirk
wifi: mt76: dma: free rx_head in mt76_dma_rx_cleanup
ACPI: video: Fix Lenovo Ideapad Z570 DMI match
net/mlx5: fw_tracer: Fix debug print
coda: Avoid partial allocation of sig_inputArgs
uaccess: Add minimum bounds check on kernel buffer size
s390/idle: mark arch_cpu_idle() noinstr
time/debug: Fix memory leak with using debugfs_lookup()
PM: domains: fix memory leak with using debugfs_lookup()
PM: EM: fix memory leak with using debugfs_lookup()
Bluetooth: Fix issue with Actions Semi ATS2851 based devices
Bluetooth: btusb: Add new PID/VID 0489:e0f2 for MT7921
Bluetooth: btusb: Add VID:PID 13d3:3529 for Realtek RTL8821CE
wifi: rtw89: debug: avoid invalid access on RTW89_DBG_SEL_MAC_30
hv_netvsc: Check status in SEND_RNDIS_PKT completion message
s390/kfence: fix page fault reporting
devlink: Fix TP_STRUCT_entry in trace of devlink health report
scm: add user copy checks to put_cmsg()
drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Tab 3 X90F
drm: panel-orientation-quirks: Add quirk for DynaBook K50
drm/amd/display: Reduce expected sdp bandwidth for dcn321
drm/amd/display: Revert Reduce delay when sink device not able to ACK 00340h write
drm/amd/display: Fix potential null-deref in dm_resume
drm/omap: dsi: Fix excessive stack usage
HID: Add Mapping for System Microphone Mute
drm/tiny: ili9486: Do not assume 8-bit only SPI controllers
drm/amd/display: Defer DIG FIFO disable after VID stream enable
drm/radeon: free iio for atombios when driver shutdown
drm/amd: Avoid BUG() for case of SRIOV missing IP version
drm/amdkfd: Page aligned memory reserve size
scsi: lpfc: Fix use-after-free KFENCE violation during sysfs firmware write
Revert "fbcon: don't lose the console font across generic->chip driver switch"
drm/amd: Avoid ASSERT for some message failures
drm: amd: display: Fix memory leakage
drm/amd/display: fix mapping to non-allocated address
HID: uclogic: Add frame type quirk
HID: uclogic: Add battery quirk
HID: uclogic: Add support for XP-PEN Deco Pro SW
HID: uclogic: Add support for XP-PEN Deco Pro MW
drm/msm/dsi: Add missing check for alloc_ordered_workqueue
drm: rcar-du: Add quirk for H3 ES1.x pclk workaround
drm: rcar-du: Fix setting a reserved bit in DPLLCR
drm/drm_print: correct format problem
drm/amd/display: Set hvm_enabled flag for S/G mode
habanalabs: extend fatal messages to contain PCI info
habanalabs: fix bug in timestamps registration code
docs/scripts/gdb: add necessary make scripts_gdb step
drm/msm/dpu: Add DSC hardware blocks to register snapshot
ASoC: soc-compress: Reposition and add pcm_mutex
ASoC: kirkwood: Iterate over array indexes instead of using pointer math
regulator: max77802: Bounds check regulator id against opmode
regulator: s5m8767: Bounds check id indexing into arrays
Revert "drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled"
drm/amd/display: fix FCLK pstate change underflow
gfs2: Improve gfs2_make_fs_rw error handling
hwmon: (coretemp) Simplify platform device handling
hwmon: (nct6775) Directly call ASUS ACPI WMI method
hwmon: (nct6775) B650/B660/X670 ASUS boards support
pinctrl: at91: use devm_kasprintf() to avoid potential leaks
drm/amd/display: Do not commit pipe when updating DRR
scsi: snic: Fix memory leak with using debugfs_lookup()
scsi: ufs: core: Fix device management cmd timeout flow
HID: logitech-hidpp: Don't restart communication if not necessary
drm/amd/display: Enable P-state validation checks for DCN314
drm: panel-orientation-quirks: Add quirk for Lenovo IdeaPad Duet 3 10IGL5
drm/amd/display: Disable HUBP/DPP PG on DCN314 for now
dm thin: add cond_resched() to various workqueue loops
dm cache: add cond_resched() to various workqueue loops
nfsd: zero out pointers after putting nfsd_files on COPY setup error
nfsd: don't hand out delegation on setuid files being opened for write
cifs: prevent data race in smb2_reconnect()
drm/shmem-helper: Revert accidental non-GPL export
driver core: fw_devlink: Avoid spurious error message
wifi: rtl8xxxu: fixing transmisison failure for rtl8192eu
scsi: mpt3sas: Remove usage of dma_get_required_mask() API
firmware: coreboot: framebuffer: Ignore reserved pixel color bits
block: don't allow multiple bios for IOCB_NOWAIT issue
block: clear bio->bi_bdev when putting a bio back in the cache
block: be a bit more careful in checking for NULL bdev while polling
rtc: pm8xxx: fix set-alarm race
ipmi: ipmb: Fix the MODULE_PARM_DESC associated to 'retry_time_ms'
ipmi:ssif: resend_msg() cannot fail
ipmi_ssif: Rename idle state and check
io_uring: Replace 0-length array with flexible array
io_uring: use user visible tail in io_uring_poll()
io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
io_uring: add a conditional reschedule to the IOPOLL cancelation loop
io_uring: add reschedule point to handle_tw_list()
io_uring/rsrc: disallow multi-source reg buffers
io_uring: remove MSG_NOSIGNAL from recvmsg
io_uring: fix fget leak when fs don't support nowait buffered read
s390/extmem: return correct segment type in __segment_load()
s390: discard .interp section
s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
s390/kprobes: fix current_kprobe never cleared after kprobes reenter
KVM: s390: disable migration mode when dirty tracking is disabled
cifs: Fix uninitialized memory read in smb3_qfs_tcon()
cifs: Fix uninitialized memory reads for oparms.mode
cifs: fix mount on old smb servers
cifs: introduce cifs_io_parms in smb2_async_writev()
cifs: split out smb3_use_rdma_offload() helper
cifs: don't try to use rdma offload on encrypted connections
cifs: Check the lease context if we actually got a lease
cifs: return a single-use cfid if we did not get a lease
scsi: mpi3mr: Fix missing mrioc->evtack_cmds initialization
scsi: mpi3mr: Fix issues in mpi3mr_get_all_tgt_info()
scsi: mpi3mr: Remove unnecessary memcpy() to alltgt_info->dmi
btrfs: hold block group refcount during async discard
locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
ksmbd: fix wrong data area length for smb2 lock request
ksmbd: do not allow the actual frame length to be smaller than the rfc1002 length
ksmbd: fix possible memory leak in smb2_lock()
torture: Fix hang during kthread shutdown phase
ARM: dts: exynos: correct HDMI phy compatible in Exynos4
io_uring: mark task TASK_RUNNING before handling resume/task work
hfs: fix missing hfs_bnode_get() in __hfs_bnode_create
fs: hfsplus: fix UAF issue in hfsplus_put_super
exfat: fix reporting fs error when reading dir beyond EOF
exfat: fix unexpected EOF while reading dir
exfat: redefine DIR_DELETED as the bad cluster number
exfat: fix inode->i_blocks for non-512 byte sector size device
fs: dlm: don't set stop rx flag after node reset
fs: dlm: move sending fin message into state change handling
fs: dlm: send FIN ack back in right cases
f2fs: fix information leak in f2fs_move_inline_dirents()
f2fs: retry to update the inode page given data corruption
f2fs: fix cgroup writeback accounting with fs-layer encryption
f2fs: fix kernel crash due to null io->bio
ocfs2: fix defrag path triggering jbd2 ASSERT
ocfs2: fix non-auto defrag path not working issue
fs/cramfs/inode.c: initialize file_ra_state
selftests/landlock: Skip overlayfs tests when not supported
selftests/landlock: Test ptrace as much as possible with Yama
udf: Truncate added extents on failed expansion
udf: Do not bother merging very long extents
udf: Do not update file length for failed writes to inline files
udf: Preserve link count of system files
udf: Detect system inodes linked into directory hierarchy
udf: Fix file corruption when appending just after end of preallocated extent
md: don't update recovery_cp when curr_resync is ACTIVE
RDMA/siw: Fix user page pinning accounting
KVM: Destroy target device if coalesced MMIO unregistration fails
KVM: VMX: Fix crash due to uninitialized current_vmcs
KVM: Register /dev/kvm as the _very_ last thing during initialization
KVM: x86: Purge "highest ISR" cache when updating APICv state
KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps
KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled
KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID
KVM: SVM: Flush the "current" TLB when activating AVIC
KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target
KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
KVM: x86: Inject #GP if WRMSR sets reserved bits in APIC Self-IPI
KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32
KVM: SVM: Fix potential overflow in SEV's send|receive_update_data()
KVM: SVM: hyper-v: placate modpost section mismatch error
selftests: x86: Fix incorrect kernel headers search path
x86/virt: Force GIF=1 prior to disabling SVM (for reboot flows)
x86/crash: Disable virt in core NMI crash handler to avoid double shootdown
x86/reboot: Disable virtualization in an emergency if SVM is supported
x86/reboot: Disable SVM, not just VMX, when stopping CPUs
x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter
x86/microcode/AMD: Add a @cpu parameter to the reloading functions
x86/microcode/AMD: Fix mixed steppings support
x86/speculation: Allow enabling STIBP with legacy IBRS
Documentation/hw-vuln: Document the interaction between IBRS and STIBP
virt/sev-guest: Return -EIO if certificate buffer is not large enough
brd: mark as nowait compatible
brd: return 0/-error from brd_insert_page()
brd: check for REQ_NOWAIT and set correct page allocation mask
ima: fix error handling logic when file measurement failed
ima: Align ima_file_mmap() parameters with mmap_file LSM hook
selftests/powerpc: Fix incorrect kernel headers search path
selftests/ftrace: Fix eprobe syntax test case to check filter support
selftests: sched: Fix incorrect kernel headers search path
selftests: core: Fix incorrect kernel headers search path
selftests: pid_namespace: Fix incorrect kernel headers search path
selftests: arm64: Fix incorrect kernel headers search path
selftests: clone3: Fix incorrect kernel headers search path
selftests: pidfd: Fix incorrect kernel headers search path
selftests: membarrier: Fix incorrect kernel headers search path
selftests: kcmp: Fix incorrect kernel headers search path
selftests: media_tests: Fix incorrect kernel headers search path
selftests: gpio: Fix incorrect kernel headers search path
selftests: filesystems: Fix incorrect kernel headers search path
selftests: user_events: Fix incorrect kernel headers search path
selftests: ptp: Fix incorrect kernel headers search path
selftests: sync: Fix incorrect kernel headers search path
selftests: rseq: Fix incorrect kernel headers search path
selftests: move_mount_set_group: Fix incorrect kernel headers search path
selftests: mount_setattr: Fix incorrect kernel headers search path
selftests: perf_events: Fix incorrect kernel headers search path
selftests: ipc: Fix incorrect kernel headers search path
selftests: futex: Fix incorrect kernel headers search path
selftests: drivers: Fix incorrect kernel headers search path
selftests: dmabuf-heaps: Fix incorrect kernel headers search path
selftests: vm: Fix incorrect kernel headers search path
selftests: seccomp: Fix incorrect kernel headers search path
irqdomain: Fix association race
irqdomain: Fix disassociation race
irqdomain: Look for existing mapping only once
irqdomain: Drop bogus fwspec-mapping error handling
irqdomain: Refactor __irq_domain_alloc_irqs()
irqdomain: Fix mapping-creation race
irqdomain: Fix domain registration race
crypto: qat - fix out-of-bounds read
mm/damon/paddr: fix missing folio_put()
ALSA: ice1712: Do not left ice->gpio_mutex locked in aureon_add_controls()
ALSA: hda/realtek: Add quirk for HP EliteDesk 800 G6 Tower PC
jbd2: fix data missing when reusing bh which is ready to be checkpointed
ext4: optimize ea_inode block expansion
ext4: refuse to create ea block when umounted
cxl/pmem: Fix nvdimm registration races
mtd: spi-nor: sfdp: Fix index value for SCCR dwords
mtd: spi-nor: spansion: Consider reserved bits in CFR5 register
mtd: spi-nor: Fix shift-out-of-bounds in spi_nor_set_erase_type
dm: send just one event on resize, not two
dm: add cond_resched() to dm_wq_work()
dm: add cond_resched() to dm_wq_requeue_work()
wifi: rtw88: use RTW_FLAG_POWERON flag to prevent to power on/off twice
wifi: rtl8xxxu: Use a longer retry limit of 48
wifi: ath11k: allow system suspend to survive ath11k
wifi: cfg80211: Fix use after free for wext
wifi: cfg80211: Set SSID if it is not already set
cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
qede: fix interrupt coalescing configuration
thermal: intel: powerclamp: Fix cur_state for multi package system
dm flakey: fix logic when corrupting a bio
dm cache: free background tracker's queued work in btracker_destroy
dm flakey: don't corrupt the zero page
dm flakey: fix a bug with 32-bit highmem systems
hwmon: (peci/cputemp) Fix off-by-one in coretemp_label allocation
hwmon: (nct6775) Fix incorrect parenthesization in nct6775_write_fan_div()
ARM: dts: qcom: sdx65: Add Qcom SMMU-500 as the fallback for IOMMU node
ARM: dts: qcom: sdx55: Add Qcom SMMU-500 as the fallback for IOMMU node
ARM: dts: exynos: correct TMU phandle in Exynos4210
ARM: dts: exynos: correct TMU phandle in Exynos4
ARM: dts: exynos: correct TMU phandle in Odroid XU3 family
ARM: dts: exynos: correct TMU phandle in Exynos5250
ARM: dts: exynos: correct TMU phandle in Odroid XU
ARM: dts: exynos: correct TMU phandle in Odroid HC1
arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP
fuse: add inode/permission checks to fileattr_get/fileattr_set
rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() fails
ceph: update the time stamps and try to drop the suid/sgid
regulator: core: Use ktime_get_boottime() to determine how long a regulator was off
panic: fix the panic_print NMI backtrace setting
mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
alpha: fix FEN fault handling
dax/kmem: Fix leak of memory-hotplug resources
mips: fix syscall_get_nr
media: ipu3-cio2: Fix PM runtime usage_count in driver unbind
remoteproc/mtk_scp: Move clk ops outside send_lock
docs: gdbmacros: print newest record
mm: memcontrol: deprecate charge moving
mm/thp: check and bail out if page in deferred queue already
ktest.pl: Give back console on Ctrt^C on monitor
kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
ktest.pl: Fix missing "end_monitor" when machine check fails
ktest.pl: Add RUN_TIMEOUT option with default unlimited
memory tier: release the new_memtier in find_create_memory_tier()
ring-buffer: Handle race between rb_move_tail and rb_check_pages
tools/bootconfig: fix single & used for logical condition
tracing/eprobe: Fix to add filter on eprobe description in README file
iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter
iommu/amd: Improve page fault error reporting
scsi: aacraid: Allocate cmd_priv with scsicmd
scsi: qla2xxx: Fix link failure in NPIV environment
scsi: qla2xxx: Check if port is online before sending ELS
scsi: qla2xxx: Fix DMA-API call trace on NVMe LS requests
scsi: qla2xxx: Remove unintended flag clearing
scsi: qla2xxx: Fix erroneous link down
scsi: qla2xxx: Remove increment of interface err cnt
scsi: ses: Don't attach if enclosure has no components
scsi: ses: Fix slab-out-of-bounds in ses_enclosure_data_process()
scsi: ses: Fix possible addl_desc_ptr out-of-bounds accesses
scsi: ses: Fix possible desc_ptr out-of-bounds accesses
scsi: ses: Fix slab-out-of-bounds in ses_intf_remove()
RISC-V: add a spin_shadow_stack declaration
riscv: Avoid enabling interrupts in die()
riscv: mm: fix regression due to update_mmu_cache change
riscv: jump_label: Fixup unaligned arch_static_branch function
riscv, mm: Perform BPF exhandler fixup on page fault
riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
riscv: ftrace: Reduce the detour code size to half
MIPS: DTS: CI20: fix otg power gpio
PCI/PM: Observe reset delay irrespective of bridge_d3
PCI: Unify delay handling for reset and resume
PCI: hotplug: Allow marking devices as disconnected during bind/unbind
PCI: Avoid FLR for AMD FCH AHCI adapters
PCI/DPC: Await readiness of secondary bus after reset
bus: mhi: ep: Only send -ENOTCONN status if client driver is available
bus: mhi: ep: Move chan->lock to the start of processing queued ch ring
bus: mhi: ep: Save channel state locally during suspend and resume
iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
iommu/vt-d: Fix PASID directory pointer coherency
vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
vfio/type1: prevent underflow of locked_vm via exec()
vfio/type1: track locked_vm per dma
vfio/type1: restore locked_vm
drm/amd: Fix initialization for nbio 7.5.1
drm/i915/quirks: Add inverted backlight quirk for HP 14-r206nv
drm/radeon: Fix eDP for single-display iMac11,2
drm/i915: Don't use stolen memory for ring buffers with LLC
drm/i915: Don't use BAR mappings for ring buffers with LLC
drm/gud: Fix UBSAN warning
drm/edid: fix AVI infoframe aspect ratio handling
drm/edid: fix parsing of 3D modes from HDMI VSDB
qede: avoid uninitialized entries in coal_entry array
brd: use radix_tree_maybe_preload instead of radix_tree_preload
sbitmap: Advance the queue index before waking up a queue
wait: Return number of exclusive waiters awaken
sbitmap: Try each queue to wake up at least one waiter
kbuild: Port silent mode detection to future gnu make.
net: avoid double iput when sock_alloc_file fails
Linux 6.1.16
Change-Id: I705caf70ee547e6d55f38d133bdcd50713aed745
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 001c28e57187570e4b5aa4492c7a957fb6d65d7b ]
If a task oopses with irqs disabled, this can cause various cascading
problems in the oops path such as sleep-from-invalid warnings, and
potentially worse.
Since commit 0258b5fd7c ("coredump: Limit coredumps to a single
thread group"), the unconditional irq enable in coredump_task_exit()
will "fix" the irq state to be enabled early in do_exit(), so currently
this may not be triggerable, but that is coincidental and fragile.
Detect and fix the irqs_disabled() condition in the oops path before
calling do_exit(), similarly to the way in_atomic() is handled.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/lkml/20221004094401.708299-1-npiggin@gmail.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 7535b832c6399b5ebfc5b53af5c51dd915ee2538 upstream.
Use a temporary variable to take full advantage of READ_ONCE() behavior.
Without this, the report (and even the test) might be out of sync with
the initial test.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.net
Fixes: 9fc9e278a5c0 ("panic: Introduce warn_limit")
Fixes: d4ccd54d28d3 ("exit: Put an upper limit on how often we can oops")
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9db89b41117024f80b38b15954017fb293133364 upstream.
Since Oops count is now tracked and is a fairly interesting signal, add
the entry /sys/kernel/oops_count to expose it to userspace.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-3-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4ccd54d28d3c8598e2354acc13e28c060961dbb upstream.
Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.
The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)
So let's panic the system if the kernel is constantly oopsing.
The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)
It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.
12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Valentin Schneider makes crash-kexec work properly when invoked from
an NMI-time panic.
- ntfs bugfixes from Hawkins Jiawei
- Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
percpu counters.
- nilfs2 cleanups from Minghao Chi
- lots of other single patches all over the tree!
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
=lUQY
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
linux-next for a couple of months without, to my knowledge, any negative
reports (or any positive ones, come to that).
- Also the Maple Tree from Liam R. Howlett. An overlapping range-based
tree for vmas. It it apparently slight more efficient in its own right,
but is mainly targeted at enabling work to reduce mmap_lock contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
(https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
This has yet to be addressed due to Liam's unfortunately timed
vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down to
the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
=xfWx
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
- Debuggability:
- Change most occurances of BUG_ON() to WARN_ON_ONCE()
- Reorganize & fix TASK_ state comparisons, turn it into a bitmap
- Update/fix misc scheduler debugging facilities
- Load-balancing & regular scheduling:
- Improve the behavior of the scheduler in presence of lot of
SCHED_IDLE tasks - in particular they should not impact other
scheduling classes.
- Optimize task load tracking, cleanups & fixes
- Clean up & simplify misc load-balancing code
- Freezer:
- Rewrite the core freezer to behave better wrt thawing and be simpler
in general, by replacing PF_FROZEN with TASK_FROZEN & fixing/adjusting
all the fallout.
- Deadline scheduler:
- Fix the DL capacity-aware code
- Factor out dl_task_is_earliest_deadline() & replenish_dl_new_period()
- Relax/optimize locking in task_non_contending()
- Cleanups:
- Factor out the update_current_exec_runtime() helper
- Various cleanups, simplifications
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/01cRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1geZA/+PB4KC1T9aVxzaTHI36R03YgJYZmIdtxw
wTf02MixePmz+gQCbepJbempGOh5ST28aOcI0xhdYOql5B63MaUBBMlB0HvGUyDG
IU3zETqLMRtAbnSTdQFv8m++ECUtZYp8/x1FCel4WO7ya4ETkRu1NRfCoUepEhpZ
aVAlae9LH3NBaF9t7s0PT2lTjf3pIzMFRkddJ0ywJhbFR3VnWat05fAK+J6fGY8+
LS54coefNlJD4oDh5TY8uniL1j5SmWmmwbk9Cdj7bLU5P3dFSS0/+5FJNHJPVGDE
srGT7wstRUcDrN0CnZo48VIUBiApJCCDqTfJYi9wNYd0NAHvwY6MIJJgEIY8mKsI
L/qH26H81Wt+ezSZ/5JIlGlZ/LIeNaa6OO/fbWEYABBQogvvx3nxsRNUYKSQzumH
CnSBasBjLnjWyLlK4qARM9cI7NFSEK6NUigrEx/7h8JFu/8T4DlSy6LsF1HUyKgq
4+FJLAqG6cL0tcwB/fHYd0oRESN8dStnQhGxSojgufwLc7dlFULvCYF5JM/dX+/V
IKwbOfIOeOn6ViMtSOXAEGdII+IQ2/ZFPwr+8Z5JC7NzvTVL6xlu/3JXkLZR3L7o
yaXTSaz06h1vil7Z+GRf7RHc+wUeGkEpXh5vnarGZKXivhFdWsBdROIJANK+xR0i
TeSLCxQxXlU=
=KjMD
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Debuggability:
- Change most occurances of BUG_ON() to WARN_ON_ONCE()
- Reorganize & fix TASK_ state comparisons, turn it into a bitmap
- Update/fix misc scheduler debugging facilities
Load-balancing & regular scheduling:
- Improve the behavior of the scheduler in presence of lot of
SCHED_IDLE tasks - in particular they should not impact other
scheduling classes.
- Optimize task load tracking, cleanups & fixes
- Clean up & simplify misc load-balancing code
Freezer:
- Rewrite the core freezer to behave better wrt thawing and be
simpler in general, by replacing PF_FROZEN with TASK_FROZEN &
fixing/adjusting all the fallout.
Deadline scheduler:
- Fix the DL capacity-aware code
- Factor out dl_task_is_earliest_deadline() &
replenish_dl_new_period()
- Relax/optimize locking in task_non_contending()
Cleanups:
- Factor out the update_current_exec_runtime() helper
- Various cleanups, simplifications"
* tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
sched: Fix more TASK_state comparisons
sched: Fix TASK_state comparisons
sched/fair: Move call to list_last_entry() in detach_tasks
sched/fair: Cleanup loop_max and loop_break
sched/fair: Make sure to try to detach at least one movable task
sched: Show PF_flag holes
freezer,sched: Rewrite core freezer logic
sched: Widen TAKS_state literals
sched/wait: Add wait_event_state()
sched/completion: Add wait_for_completion_state()
sched: Add TASK_ANY for wait_task_inactive()
sched: Change wait_task_inactive()s match_state
freezer,umh: Clean up freezer/initrd interaction
freezer: Have {,un}lock_system_sleep() save/restore flags
sched: Rename task_running() to task_on_cpu()
sched/fair: Cleanup for SIS_PROP
sched/fair: Default to false in test_idle_cores()
sched/fair: Remove useless check in select_idle_core()
sched/fair: Avoid double search on same cpu
sched/fair: Remove redundant check in select_idle_smt()
...
Recently I had a conversation where it was pointed out to me that
SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
difficult for a tracer to handle.
Keeping SIGKILL working after the process has been killed is pain
from an implementation point of view.
So since the debuggers don't want this behavior let's see if we can
remove this wart for the userspace API
If a regression is detected it should only need to be the last change
that is the reverted. The other two are just general cleanups that
make the last patch simpler.
Eric W. Biederman (3):
signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
signal: Drop signals received after a fatal signal has been processed
fs/coredump.c | 2 +-
include/linux/sched/signal.h | 1 +
kernel/exit.c | 20 +++++++++++++++++++-
kernel/fork.c | 2 ++
kernel/signal.c | 3 ++-
5 files changed, 25 insertions(+), 3 deletions(-)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmM7U40ACgkQC/v6Eiaj
j0B7eRAAst6EW4nkxiDBN/PRo43Kz7tkYCZGLAq73vZAaKkyTFSwghT85cZwTsuc
px/iDPWZpSGdIeHhdt1giJeuFG0FuoMR9prQVx3z/fM6KLXEJb86OtW1c1uW00Bh
TkhVPQiF/9bc+Eb/nRKF61NA4sP0OXwO1+t/zBL4clekO9vFLP1DpRBE9OrNlHq2
NlDqoPqq6SsKYG8f+J2LJKKzRWLICvHCtz4uUt6O11Wt/PH2TZYdYnXf/2vvZyzS
SOs8kQjw4QPSptcNH/LIZz0WNHagn1uU/2/T106LOgPgcr515T4MhqOK+Wp95BjA
0RrhsfdMD+7HArqLmt9VKgKO9Gs+T/M6jQZgpUyzlw3qPorKUGu4s9UnUgS7l4uz
oNV9no1Ei2fQ+YR6RBR74579a45FWkqRBsaia59KFCzZxRsQz8VB1cqcIgl9dFUA
f81qt9FiX8duVYZIcoT79BvV1bQ3LGwyygrH+X3sDTQSdN/aeZ24JdGP2MNtYO/c
jWN/mMVHc1xOsYmACVGETWhZf0Y7lGGBYSIJ92jT2HxIuEiqmFE4kngRjKcYgL/G
1/o9z8VntMq5t+ZcQY5WfK/WpSeSPXa6TsP80PEm/hvdMwZLEO4IzCR+gPJB6oJS
KPQ3EZfRCB2Jb1qBmcpbleA/RBA0iJUZtBvtIPI7QmMTR+VRdhc=
=gX4y
-----END PGP SIGNATURE-----
Merge tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace update from Eric Biederman:
"ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
Recently I had a conversation where it was pointed out to me that
SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
difficult for a tracer to handle.
Keeping SIGKILL working after the process has been killed is pain from
an implementation point of view.
So since the debuggers don't want this behavior let's see if we can
remove this wart for the userspace API
If a regression is detected it should only need to be the last change
that is the reverted. The other two are just general cleanups that
make the last patch simpler"
* tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Drop signals received after a fatal signal has been processed
signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
To further exploit spatial locality, the aging prefers to walk page tables
to search for young PTEs and promote hot pages. A kill switch will be
added in the next patch to disable this behavior. When disabled, the
aging relies on the rmap only.
NB: this behavior has nothing similar with the page table scanning in the
2.4 kernel [1], which searches page tables for old PTEs, adds cold pages
to swapcache and unmaps them.
To avoid confusion, the term "iteration" specifically means the traversal
of an entire mm_struct list; the term "walk" will be applied to page
tables and the rmap, as usual.
An mm_struct list is maintained for each memcg, and an mm_struct follows
its owner task to the new memcg when this task is migrated. Given an
lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot pages
before it increments max_seq.
When multiple page table walkers iterate the same list, each of them gets
a unique mm_struct; therefore they can run concurrently. Page table
walkers ignore any misplaced pages, e.g., if an mm_struct was migrated,
pages it left in the previous memcg will not be promoted when its current
memcg is under reclaim. Similarly, page table walkers will not promote
pages from nodes other than the one under reclaim.
This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
page table walkers can skip processes that have been sleeping since
the last iteration.
2. It uses generational Bloom filters to record populated branches so
that page table walkers can reduce their search space based on the
query results, e.g., to skip page tables containing mostly holes or
misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
spanning multiple VMAs. IOW, it finishes all the VMAs within the
range of the same PMD table before it returns to a PGD table. This
improves the cache performance for workloads that have large
numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.
Server benchmark results:
Single workload:
fio (buffered I/O): no change
Single workload:
memcached (anon): +[8, 10]%
Ops/sec KB/sec
patch1-7: 1147696.57 44640.29
patch1-8: 1245274.91 48435.66
Configurations:
no change
Client benchmark results:
kswapd profiles:
patch1-7
48.16% lzo1x_1_do_compress (real work)
8.20% page_vma_mapped_walk (overhead)
7.06% _raw_spin_unlock_irq
2.92% ptep_clear_flush
2.53% __zram_bvec_write
2.11% do_raw_spin_lock
2.02% memmove
1.93% lru_gen_look_around
1.56% free_unref_page_list
1.40% memset
patch1-8
49.44% lzo1x_1_do_compress (real work)
6.19% page_vma_mapped_walk (overhead)
5.97% _raw_spin_unlock_irq
3.13% get_pfn_folio
2.85% ptep_clear_flush
2.42% __zram_bvec_write
2.08% do_raw_spin_lock
1.92% memmove
1.44% alloc_zspage
1.36% memset
Configurations:
no change
Thanks to the following developers for their efforts [3].
kernel test robot <lkp@intel.com>
[1] https://lwn.net/Articles/23732/
[2] https://llvm.org/docs/ScudoHardenedAllocator.html
[3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/
Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rewrite the core freezer to behave better wrt thawing and be simpler
in general.
By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.
As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay!).
Specifically; the current scheme works a little like:
freezer_do_not_count();
schedule();
freezer_count();
And either the task is blocked, or it lands in try_to_freezer()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.
However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.
That is, thawing tries to thaw things in explicit order; kernel
threads and workqueues before doing bringing SMP back before userspace
etc.. However due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.
This can be a fatal problem in asymmetric ISA architectures (eg ARMv9)
where the userspace task requires a special CPU to run.
As said; replace this with a special task state TASK_FROZEN and add
the following state transitions:
TASK_FREEZABLE -> TASK_FROZEN
__TASK_STOPPED -> TASK_FROZEN
__TASK_TRACED -> TASK_FROZEN
The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
(IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
is already required to deal with spurious wakeups and the freezer
causes one such when thawing the task (since the original state is
lost).
The special __TASK_{STOPPED,TRACED} states *can* be restored since
their canonical state is in ->jobctl.
With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
free of undue (early / spurious) wakeups.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
Track how many threads have not started exiting and when the last
thread starts exiting set SIGNAL_GROUP_EXIT.
This guarantees that SIGNAL_GROUP_EXIT will get set when a process
exits. In practice this achieves nothing as glibc's implementation of
_exit calls sys_group_exit then sys_exit. While glibc's implemenation
of pthread_exit calls exit (which cleansup and calls _exit) if it is
the last thread and sys_exit if it is the last thread.
This means the only way the kernel might observe a process that does
not set call exit_group is if the language runtime does not use glibc.
With more cleanups I hope to move the decrement of quick_threads
earlier.
Link: https://lkml.kernel.org/r/87bkukd4tc.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function do_group_exit has an optimization that avoids taking
siglock and doing the work to find other threads in the signal group
and shutting them down.
It is very desirable for SIGNAL_GROUP_EXIT to always been set whenever
it is decided for the process to exit. That ensures only a single
place needs to be tested, and a single bit of state needs to be looked
at. This makes the optimization in do_group_exit counter productive.
Make the code and maintenance simpler by removing this unnecessary
option.
Link: https://lkml.kernel.org/r/87letod4v3.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
As Chris explains, the comment above exit_itimers() is not correct,
we can race with proc_timers_seq_ops. Change exit_itimers() to clear
signal->posix_timers with ->siglock held.
Cc: <stable@vger.kernel.org>
Reported-by: chris@accessvector.net
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
limited to 2 instructions (and typically fewer) on branch targets not starting
with ENDBR. CET-IBT also limits speculation of the next sequential instruction
after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides, as
described above, speculation limits itself.
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
beB/7u2KIUbKEkSN
=jZfK
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
"Add support for Intel CET-IBT, available since Tigerlake (11th gen),
which is a coarse grained, hardware based, forward edge
Control-Flow-Integrity mechanism where any indirect CALL/JMP must
target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
is limited to 2 instructions (and typically fewer) on branch targets
not starting with ENDBR. CET-IBT also limits speculation of the next
sequential instruction after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides,
as described above, speculation limits itself"
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
kvm/emulate: Fix SETcc emulation for ENDBR
x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
kbuild: Fixup the IBT kbuild changes
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
x86: Remove toolchain check for X32 ABI capability
x86/alternative: Use .ibt_endbr_seal to seal indirect calls
objtool: Find unused ENDBR instructions
objtool: Validate IBT assumptions
objtool: Add IBT/ENDBR decoding
objtool: Read the NOENDBR annotation
x86: Annotate idtentry_df()
x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
x86: Annotate call_on_stack()
objtool: Rework ASM_REACHABLE
x86: Mark __invalid_creds() __noreturn
exit: Mark do_group_exit() __noreturn
x86: Mark stop_this_cpu() __noreturn
objtool: Ignore extra-symbol code
objtool: Rename --duplicate to --lto
...
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout
the stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower
iTLB pressure and prevents identity mapping huge pages from
getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop
the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called
from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling
of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
/DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
=ELpe
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
Add a return hook framework which hooks the function return. Most of the
logic came from the kretprobe, but this is independent from kretprobe.
Note that this is expected to be used with other function entry hooking
feature, like ftrace, fprobe, adn kprobes. Eventually this will replace
the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment,
this is just an additional hook.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.
Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
There are no remaining callers of set_fs(), so CONFIG_SET_FS
can be removed globally, along with the thread_info field and
any references to it.
This turns access_ok() into a cheaper check against TASK_SIZE_MAX.
As CONFIG_SET_FS is now gone, drop all remaining references to
set_fs()/get_fs(), mm_segment_t, user_addr_max() and uaccess_kernel().
Acked-by: Sam Ravnborg <sam@ravnborg.org> # for sparc32 changes
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> # for arc changes
Acked-by: Stafford Horne <shorne@gmail.com> # [openrisc, asm-generic]
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There is no need to perform the stack accounting of the outgoing task in
its final schedule() invocation which happens with preemption disabled.
The task is leaving, the resources will be freed and the accounting can
happen in do_exit() before the actual schedule invocation which
frees the stack memory.
Move the accounting of the stack memory from release_task_stack() to
exit_task_stack_account() which then can be invoked from do_exit().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220217102406.3697941-7-bigeasy@linutronix.de
blk_needs_flush_plug fails to account for the cb_list, which needs
flushing as well. Remove it and just check if there is a plug instead
of poking into the internals of the plug structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220127070549.1377856-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The function wait_task_zombie is defined to always returns the process not
thread exit status. Unfortunately when process group exit support
was added to wait_task_zombie the WNOWAIT case was overlooked.
Usually tsk->exit_code and tsk->signal->group_exit_code will be in sync
so fixing this is bug probably has no effect in practice. But fix
it anyway so that people aren't scratching their heads about why
the two code paths are different.
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 2c66151cbc2c ("[PATCH] sys_exit() threading improvements, BK-curr")
Link: https://lkml.kernel.org/r/20220103213312.9144-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The comment about coredumps not reaching do_group_exit and the
corresponding BUG_ON are bogus.
What happens and has happened for years is that get_signal calls
do_coredump (which sets SIGNAL_GROUP_EXIT and group_exit_code) and
then do_group_exit passing the signal number. Then do_group_exit
ignores the exit_code it is passed and uses signal->group_exit_code
from the coredump.
The comment and BUG_ON were correct when they were added during the
2.5 development cycle, but became obsolete and incorrect when
get_signal was changed to fall through to do_group_exit after
do_coredump in 2.6.10-rc2.
So remove the stale comment and BUG_ON
Fixes: 63bd6144f191 ("[PATCH] Invalid BUG_ONs in signal.c")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/20220103213312.9144-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When I say remove I mean remove. All profile_task_exit and
profile_munmap do is call a blocking notifier chain. The helpers
profile_task_register and profile_task_unregister are not called
anywhere in the tree. Which means this is all dead code.
So remove the dead code and make it easier to read do_exit.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/20220103213312.9144-1-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This helper is misleading. It tests for an ongoing exec as well as
the process having received a fatal signal.
Sometimes it is appropriate to treat an on-going exec differently than
a process that is shutting down due to a fatal signal. In particular
taking the fast path out of exit_signals instead of retargeting
signals is not appropriate during exec, and not changing the the exit
code in do_group_exit during exec.
Removing the helper makes it more obvious what is going on as both
cases must be coded for explicitly.
While removing the helper fix the two cases where I have observed
using signal_group_exit resulted in the wrong result.
In exit_signals only test for SIGNAL_GROUP_EXIT so that signals are
retargetted during an exec.
In do_group_exit use 0 as the exit code during an exec as de_thread
does not set group_exit_code. As best as I can determine
group_exit_code has been is set to 0 most of the time during
de_thread. During a thread group stop group_exit_code is set to the
stop signal and when the thread group receives SIGCONT group_exit_code
is reset to 0.
Link: https://lkml.kernel.org/r/20211213225350.27481-8-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The only remaining user of group_exit_task is exec. Rename the field
so that it is clear which part of the code uses it.
Update the comment above the definition of group_exec_task to document
how it is currently used.
Link: https://lkml.kernel.org/r/20211213225350.27481-7-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With kernel threads on architectures that still have set_fs/get_fs
running as KERNEL_DS moving force_uaccess_begin does not appear safe.
Calling force_uaccess_begin is a noop on anything people care about.
Update the comment to explain why this code while looking like an
obvious candidate for moving to make_task_dead probably needs to
remain in do_exit until set_fs/get_fs are entirely removed from the
kernel.
Fixes: 05ea0424f0 ("exit: Move oops specific logic from do_exit into make_task_dead")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/YdUxGKRcSiDy8jGg@zeniv-ca.linux.org.uk
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change the task state to EXIT_DEAD and take an extra rcu_refernce
to guarantee the task will not be reaped and that it will not be
freed.
Link: https://lkml.kernel.org/r/YdUzjrLAlRiNLQp2@zeniv-ca.linux.org.uk
Pointed-out-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 7f80a2fd7d ("exit: Stop poorly open coding do_task_dead in make_task_dead")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Update complete_and_exit to call kthread_exit instead of do_exit.
Change the name to reflect this change in functionality. All of the
users of complete_and_exit are causing the current kthread to exit so
this change makes it clear what is happening.
Move the implementation of kthread_complete_and_exit from
kernel/exit.c to to kernel/kthread.c. As this function is kthread
specific it makes most sense to live with the kthread functions.
There are no functional change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Now that there are no more modular uses of do_exit remove the EXPORT_SYMBOL.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When the kernel detects it is oops or otherwise force killing a task
while it exits the code poorly attempts to permanently stop the task
from scheduling.
I say poorly because it is possible for a task in TASK_UINTERRUPTIBLE
to be woken up.
As it makes no sense for the task to continue call do_task_dead
instead which actually does the work and permanently removes the task
from the scheduler. Guaranteeing the task will never be woken
up again.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The beginning of do_exit has become cluttered and difficult to read as
it is filled with checks to handle things that can only happen when
the kernel is operating improperly.
Now that we have a dedicated function for cleaning up a task when the
kernel is operating improperly move the checks there.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.
Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.
Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.
As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull per signal_struct coredumps from Eric Biederman:
"Current coredumps are mixed up with the exit code, the signal handling
code, and the ptrace code making coredumps much more complicated than
necessary and difficult to follow.
This series of changes starts with ptrace_stop and cleans it up,
making it easier to follow what is happening in ptrace_stop. Then
cleans up the exec interactions with coredumps. Then cleans up the
coredump interactions with exit. Finally the coredump interactions
with the signal handling code is cleaned up.
The first and last changes are bug fixes for minor bugs.
I believe the fact that vfork followed by execve can kill the process
the called vfork if exec fails is sufficient justification to change
the userspace visible behavior.
In previous discussions some of these changes were organized
differently and individually appeared to make the code base worse. As
currently written I believe they all stand on their own as cleanups
and bug fixes.
Which means that even if the worst should happen and the last change
needs to be reverted for some unimaginable reason, the code base will
still be improved.
If the worst does not happen there are a more cleanups that can be
made. Signals that generate coredumps can easily become eligible for
short circuit delivery in complete_signal. The entire rendezvous for
generating a coredump can move into get_signal. The function
force_sig_info_to_task be written in a way that does not modify the
signal handling state of the target task (because coredumps are
eligible for short circuit delivery). Many of these future cleanups
can be done another way but nothing so cleanly as if coredumps become
per signal_struct"
* 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
coredump: Limit coredumps to a single thread group
coredump: Don't perform any cleanups before dumping core
exit: Factor coredump_exit_mm out of exit_mm
exec: Check for a pending fatal signal instead of core_state
ptrace: Remove the unnecessary arguments from arch_ptrace_stop
signal: Remove the bogus sigkill_pending in ptrace_stop
- Revert the printk format based wchan() symbol resolution as it can leak
the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset and
__sched_setscheduler() introduced a new lock dependency which is now
triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
/1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
d2qWG7UvoseT+g==
=fgtS
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- Revert the printk format based wchan() symbol resolution as it can
leak the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset
and __sched_setscheduler() introduced a new lock dependency which is
now triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
sched/fair: Cleanup newidle_balance
sched/fair: Remove sysctl_sched_migration_cost condition
sched/fair: Wait before decaying max_newidle_lb_cost
sched/fair: Skip update_blocked_averages if we are defering load balance
sched/fair: Account update_blocked_averages in newidle_balance cost
x86: Fix __get_wchan() for !STACKTRACE
sched,x86: Fix L2 cache mask
sched/core: Remove rq_relock()
sched: Improve wake_up_all_idle_cpus() take #2
irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
sched: Add cluster scheduler level for x86
sched: Add cluster scheduler level in core and related Kconfig for ARM64
topology: Represent clusters of CPUs within a die
sched: Disable -Wunused-but-set-variable
sched: Add wrapper for get_wchan() to keep task blocked
x86: Fix get_wchan() to support the ORC unwinder
proc: Use task_is_running() for wchan in /proc/$pid/stat
...
Various files have acquired spurious includes of <linux/blkdev.h> over
time. Remove them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Today when a signal is delivered with a handler of SIG_DFL whose
default behavior is to generate a core dump not only that process but
every process that shares the mm is killed.
In the case of vfork this looks like a real world problem. Consider
the following well defined sequence.
if (vfork() == 0) {
execve(...);
_exit(EXIT_FAILURE);
}
If a signal that generates a core dump is received after vfork but
before the execve changes the mm the process that called vfork will
also be killed (as the mm is shared).
Similarly if the execve fails after the point of no return the kernel
delivers SIGSEGV which will kill both the exec'ing process and because
the mm is shared the process that called vfork as well.
As far as I can tell this behavior is a violation of people's
reasonable expectations, POSIX, and is unnecessarily fragile when the
system is low on memory.
Solve this by making a userspace visible change to only kill a single
process/thread group. This is possible because Jann Horn recently
modified[1] the coredump code so that the mm can safely be modified
while the coredump is happening. With LinuxThreads long gone I don't
expect anyone to have a notice this behavior change in practice.
To accomplish this move the core_state pointer from mm_struct to
signal_struct, which allows different thread groups to coredump
simultatenously.
In zap_threads remove the work to kill anything except for the current
thread group.
v2: Remove core_state from the VM_BUG_ON_MM print to fix
compile failure when CONFIG_DEBUG_VM is enabled.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
[1] a07279c9a8 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133
Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rename coredump_exit_mm to coredump_task_exit and call it from do_exit
before PTRACE_EVENT_EXIT, and before any cleanup work for a task
happens. This ensures that an accurate copy of the process can be
captured in the coredump as no cleanup for the process happens before
the coredump completes. This also ensures that PTRACE_EVENT_EXIT
will not be visited by any thread until the coredump is complete.
Add a new flag PF_POSTCOREDUMP so that tasks that have passed through
coredump_task_exit can be recognized and ignored in zap_process.
Now that all of the coredumping happens before exit_mm remove code to
test for a coredump in progress from mm_release.
Replace "may_ptrace_stop()" with a simple test of "current->ptrace".
The other tests in may_ptrace_stop all concern avoiding stopping
during a coredump. These tests are no longer necessary as it is now
guaranteed that fatal_signal_pending will be set if the code enters
ptrace_stop during a coredump. The code in ptrace_stop is guaranteed
not to stop if fatal_signal_pending returns true.
Until this change "ptrace_event(PTRACE_EVENT_EXIT)" could call
ptrace_stop without fatal_signal_pending being true, as signals are
dequeued in get_signal before calling do_exit. This is no longer
an issue as "ptrace_event(PTRACE_EVENT_EXIT)" is no longer reached
until after the coredump completes.
Link: https://lkml.kernel.org/r/874kaax26c.fsf@disp2133
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Separate the coredump logic from the ordinary exit_mm logic
by moving the coredump logic out of exit_mm into it's own
function coredump_exit_mm.
Link: https://lkml.kernel.org/r/87a6k2x277.fsf@disp2133
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>