Commit graph

698 commits

Author SHA1 Message Date
xieliujie
7afa84fbb9 ANDROID: vendor_hooks: Add hooks for waking up and exiting control
Add hooks at process waking up and exiting routines so that oems
can control these procedures. One possible benifit is the peak of
system load can be shaved and load can be more smooth when a large
number of threads is killed once upon a time, while a sudden peak
of system load can probably lead to user junk issues.

Bug: 296493318
Change-Id: Ide5f9e63a4f50d6a9e3ffbc9516de9ce48ededef
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-08-25 18:32:06 +00:00
Liujie Xie
573ba7b6e6 ANDROID: vendor_hooks: Add hooks for memory when debug
Add vendors hooks for recording memory used

Vendor modules allocate and manages the memory itself.

These memories might not be included in kernel memory
statistics. Also, detailed references and vendor-specific
information are managed only inside modules. When
various problems such as memory leaks occurs, these
information should be showed in real-time.

Bug: 182443489
Bug: 234407991
Bug: 277799025

Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
2023-05-25 21:06:40 +00:00
Greg Kroah-Hartman
2cb73a87e4 Merge 6.1.16 into android14-6.1
Changes in 6.1.16
	HID: asus: use spinlock to protect concurrent accesses
	HID: asus: use spinlock to safely schedule workers
	powerpc/mm: Rearrange if-else block to avoid clang warning
	ata: ahci: Revert "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"
	ARM: OMAP2+: Fix memory leak in realtime_counter_init()
	arm64: dts: qcom: qcs404: use symbol names for PCIe resets
	arm64: dts: qcom: msm8996-tone: Fix USB taking 6 minutes to wake up
	arm64: dts: qcom: sm8150-kumano: Panel framebuffer is 2.5k instead of 4k
	arm64: dts: qcom: sm6350: Fix up the ramoops node
	arm64: dts: qcom: sm6125: Reorder HSUSB PHY clocks to match bindings
	arm64: dts: qcom: sm6125-seine: Clean up gpio-keys (volume down)
	arm64: dts: imx8m: Align SoC unique ID node unit address
	ARM: zynq: Fix refcount leak in zynq_early_slcr_init
	arm64: dts: mediatek: mt8195: Add power domain to U3PHY1 T-PHY
	arm64: dts: mediatek: mt8183: Fix systimer 13 MHz clock description
	arm64: dts: mediatek: mt8192: Fix systimer 13 MHz clock description
	arm64: dts: mediatek: mt8195: Fix systimer 13 MHz clock description
	arm64: dts: mediatek: mt8186: Fix systimer 13 MHz clock description
	arm64: dts: qcom: sdm845-db845c: fix audio codec interrupt pin name
	x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC
	arm64: dts: qcom: sc7180: correct SPMI bus address cells
	arm64: dts: qcom: sc7280: correct SPMI bus address cells
	arm64: dts: qcom: sc8280xp: correct SPMI bus address cells
	arm64: dts: qcom: sc8280xp: Vote for CX in USB controllers
	arm64: dts: meson-gxl: jethub-j80: Fix WiFi MAC address node
	arm64: dts: meson-gxl: jethub-j80: Fix Bluetooth MAC node name
	arm64: dts: meson-axg: jethub-j1xx: Fix MAC address node names
	arm64: dts: meson-gx: Fix Ethernet MAC address unit name
	arm64: dts: meson-g12a: Fix internal Ethernet PHY unit name
	arm64: dts: meson-gx: Fix the SCPI DVFS node name and unit address
	cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*
	arm64: dts: ti: k3-am62: Enable SPI nodes at the board level
	arm64: dts: ti: k3-am62-main: Fix clocks for McSPI
	arm64: tegra: Fix duplicate regulator on Jetson TX1
	arm64: dts: msm8992-bullhead: add memory hole region
	arm64: dts: qcom: msm8992-bullhead: Fix cont_splash_mem size
	arm64: dts: qcom: msm8992-bullhead: Disable dfps_data_mem
	arm64: dts: qcom: ipq8074: correct USB3 QMP PHY-s clock output names
	arm64: dts: qcom: ipq8074: fix Gen2 PCIe QMP PHY
	arm64: dts: qcom: ipq8074: fix Gen3 PCIe QMP PHY
	arm64: dts: qcom: ipq8074: correct Gen2 PCIe ranges
	arm64: dts: qcom: ipq8074: fix Gen3 PCIe node
	arm64: dts: qcom: ipq8074: correct PCIe QMP PHY output clock names
	arm64: dts: meson: remove CPU opps below 1GHz for G12A boards
	ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init()
	arm64: dts: mediatek: mt8192: Mark scp_adsp clock as broken
	ARM: bcm2835_defconfig: Enable the framebuffer
	ARM: s3c: fix s3c64xx_set_timer_source prototype
	arm64: dts: ti: k3-j7200: Fix wakeup pinmux range
	ARM: dts: exynos: correct wr-active property in Exynos3250 Rinato
	ARM: imx: Call ida_simple_remove() for ida_simple_get
	arm64: dts: amlogic: meson-gx: fix SCPI clock dvfs node name
	arm64: dts: amlogic: meson-axg: fix SCPI clock dvfs node name
	arm64: dts: amlogic: meson-gx: add missing SCPI sensors compatible
	arm64: dts: amlogic: meson-axg-jethome-jethub-j1xx: fix supply name of USB controller node
	arm64: dts: amlogic: meson-gxl-s905d-sml5442tw: drop invalid clock-names property
	arm64: dts: amlogic: meson-gx: add missing unit address to rng node name
	arm64: dts: amlogic: meson-gxl-s905w-jethome-jethub-j80: fix invalid rtc node name
	arm64: dts: amlogic: meson-axg-jethome-jethub-j1xx: fix invalid rtc node name
	arm64: dts: amlogic: meson-gxl: add missing unit address to eth-phy-mux node name
	arm64: dts: amlogic: meson-gx-libretech-pc: fix update button name
	arm64: dts: amlogic: meson-sm1-bananapi-m5: fix adc keys node names
	arm64: dts: amlogic: meson-gxl-s905d-phicomm-n1: fix led node name
	arm64: dts: amlogic: meson-gxbb-kii-pro: fix led node name
	arm64: dts: amlogic: meson-sm1-odroid-hc4: fix active fan thermal trip
	locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
	arm64: dts: renesas: beacon-renesom: Fix gpio expander reference
	arm64: dts: meson: radxa-zero: allow usb otg mode
	arm64: dts: meson: bananapi-m5: switch VDDIO_C pin to OPEN_DRAIN
	ARM: dts: sun8i: nanopi-duo2: Fix regulator GPIO reference
	ublk_drv: remove nr_aborted_queues from ublk_device
	ublk_drv: don't probe partitions if the ubq daemon isn't trusted
	ARM: dts: imx7s: correct iomuxc gpr mux controller cells
	sbitmap: remove redundant check in __sbitmap_queue_get_batch
	sbitmap: Use single per-bitmap counting to wake up queued tags
	sbitmap: correct wake_batch recalculation to avoid potential IO hung
	arm64: dts: mt8195: Fix CPU map for single-cluster SoC
	arm64: dts: mt8192: Fix CPU map for single-cluster SoC
	arm64: dts: mt8186: Fix CPU map for single-cluster SoC
	arm64: dts: mediatek: mt7622: Add missing pwm-cells to pwm node
	arm64: dts: mediatek: mt8186: Fix watchdog compatible
	arm64: dts: mediatek: mt8195: Fix watchdog compatible
	arm64: dts: mediatek: mt7986: Fix watchdog compatible
	ARM: dts: stm32: Update part number NVMEM description on stm32mp131
	blk-mq: avoid sleep in blk_mq_alloc_request_hctx
	blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
	blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
	blk-mq: Fix potential io hung for shared sbitmap per tagset
	blk-mq: correct stale comment of .get_budget
	arm64: dts: qcom: msm8996: support using GPLL0 as kryocc input
	arm64: dts: qcom: msm8996 switch from RPM_SMD_BB_CLK1 to RPM_SMD_XO_CLK_SRC
	arm64: dts: qcom: sm8350: drop incorrect cells from serial
	arm64: dts: qcom: sm8450: drop incorrect cells from serial
	arm64: dts: qcom: msm8992-lg-bullhead: Correct memory overlaps with the SMEM and MPSS memory regions
	arm64: dts: qcom: msm8953: correct TLMM gpio-ranges
	arm64: dts: qcom: msm8992-*: Fix up comments
	arm64: dts: qcom: msm8992-lg-bullhead: Enable regulators
	s390/dasd: Fix potential memleak in dasd_eckd_init()
	sched/rt: pick_next_rt_entity(): check list_entry
	perf/x86/intel/ds: Fix the conversion from TSC to perf time
	x86/perf/zhaoxin: Add stepping check for ZXC
	KEYS: asymmetric: Fix ECDSA use via keyctl uapi
	block: ublk: check IO buffer based on flag need_get_data
	arm64: dts: qcom: pmk8350: Specify PBS register for PON
	arm64: dts: qcom: pmk8350: Use the correct PON compatible
	erofs: relinquish volume with mutex held
	block: sync mixed merged request's failfast with 1st bio's
	block: Fix io statistics for cgroup in throttle path
	block: bio-integrity: Copy flags when bio_integrity_payload is cloned
	block: use proper return value from bio_failfast()
	wifi: mt76: mt7915: add missing of_node_put()
	wifi: mt76: mt7921s: fix slab-out-of-bounds access in sdio host
	wifi: mt76: mt7915: check return value before accessing free_block_num
	wifi: mt76: mt7915: drop always true condition of __mt7915_reg_addr()
	wifi: mt76: mt7915: fix unintended sign extension of mt7915_hw_queue_read()
	wifi: mt76: fix coverity uninit_use_in_call in mt76_connac2_reverse_frag0_hdr_trans()
	wifi: rsi: Fix memory leak in rsi_coex_attach()
	wifi: rtlwifi: rtl8821ae: don't call kfree_skb() under spin_lock_irqsave()
	wifi: rtlwifi: rtl8188ee: don't call kfree_skb() under spin_lock_irqsave()
	wifi: rtlwifi: rtl8723be: don't call kfree_skb() under spin_lock_irqsave()
	wifi: iwlegacy: common: don't call dev_kfree_skb() under spin_lock_irqsave()
	wifi: libertas: fix memory leak in lbs_init_adapter()
	wifi: rtl8xxxu: don't call dev_kfree_skb() under spin_lock_irqsave()
	wifi: rtw89: 8852c: rfk: correct DACK setting
	wifi: rtw89: 8852c: rfk: correct DPK settings
	wifi: rtlwifi: Fix global-out-of-bounds bug in _rtl8812ae_phy_set_txpower_limit()
	libbpf: Fix btf__align_of() by taking into account field offsets
	wifi: ipw2x00: don't call dev_kfree_skb() under spin_lock_irqsave()
	wifi: ipw2200: fix memory leak in ipw_wdev_init()
	wifi: wilc1000: fix potential memory leak in wilc_mac_xmit()
	wifi: wilc1000: add missing unregister_netdev() in wilc_netdev_ifc_init()
	wifi: brcmfmac: fix potential memory leak in brcmf_netdev_start_xmit()
	wifi: brcmfmac: unmap dma buffer in brcmf_msgbuf_alloc_pktid()
	wifi: libertas_tf: don't call kfree_skb() under spin_lock_irqsave()
	wifi: libertas: if_usb: don't call kfree_skb() under spin_lock_irqsave()
	wifi: libertas: main: don't call kfree_skb() under spin_lock_irqsave()
	wifi: libertas: cmdresp: don't call kfree_skb() under spin_lock_irqsave()
	wifi: wl3501_cs: don't call kfree_skb() under spin_lock_irqsave()
	libbpf: Fix invalid return address register in s390
	crypto: x86/ghash - fix unaligned access in ghash_setkey()
	ACPICA: Drop port I/O validation for some regions
	genirq: Fix the return type of kstat_cpu_irqs_sum()
	rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose
	rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls
	rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
	lib/mpi: Fix buffer overrun when SG is too long
	crypto: ccp - Avoid page allocation failure warning for SEV_GET_ID2
	platform/chrome: cros_ec_typec: Update port DP VDO
	ACPICA: nsrepair: handle cases without a return value correctly
	selftests/xsk: print correct payload for packet dump
	selftests/xsk: print correct error codes when exiting
	arm64/cpufeature: Fix field sign for DIT hwcap detection
	kselftest/arm64: Fix syscall-abi for systems without 128 bit SME
	workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex
	s390/early: fix sclp_early_sccb variable lifetime
	s390/vfio-ap: fix an error handling path in vfio_ap_mdev_probe_queue()
	x86/signal: Fix the value returned by strict_sas_size()
	thermal/drivers/tsens: Drop msm8976-specific defines
	thermal/drivers/tsens: Sort out msm8976 vs msm8956 data
	thermal/drivers/tsens: fix slope values for msm8939
	thermal/drivers/tsens: limit num_sensors to 9 for msm8939
	wifi: rtw89: fix potential leak in rtw89_append_probe_req_ie()
	wifi: rtw89: Add missing check for alloc_workqueue
	wifi: rtl8xxxu: Fix memory leaks with RTL8723BU, RTL8192EU
	wifi: orinoco: check return value of hermes_write_wordrec()
	thermal/drivers/imx_sc_thermal: Drop empty platform remove function
	thermal/drivers/imx_sc_thermal: Fix the loop condition
	wifi: ath9k: htc_hst: free skb in ath9k_htc_rx_msg() if there is no callback function
	wifi: ath9k: hif_usb: clean up skbs if ath9k_hif_usb_rx_stream() fails
	wifi: ath9k: Fix potential stack-out-of-bounds write in ath9k_wmi_rsp_callback()
	wifi: ath11k: Fix memory leak in ath11k_peer_rx_frag_setup
	wifi: cfg80211: Fix extended KCK key length check in nl80211_set_rekey_data()
	ACPI: battery: Fix missing NUL-termination with large strings
	selftests/bpf: Fix build errors if CONFIG_NF_CONNTRACK=m
	crypto: ccp - Failure on re-initialization due to duplicate sysfs filename
	crypto: essiv - Handle EBUSY correctly
	crypto: seqiv - Handle EBUSY correctly
	powercap: fix possible name leak in powercap_register_zone()
	x86/microcode: Add a parameter to microcode_check() to store CPU capabilities
	x86/microcode: Check CPU capabilities after late microcode update correctly
	x86/microcode: Adjust late loading result reporting message
	selftests/bpf: Use consistent build-id type for liburandom_read.so
	selftests/bpf: Fix vmtest static compilation error
	crypto: xts - Handle EBUSY correctly
	leds: led-class: Add missing put_device() to led_put()
	s390/bpf: Add expoline to tail calls
	wifi: iwlwifi: mei: fix compilation errors in rfkill()
	kselftest/arm64: Fix enumeration of systems without 128 bit SME
	can: rcar_canfd: Fix R-Car V3U GAFLCFG field accesses
	selftests/bpf: Initialize tc in xdp_synproxy
	crypto: ccp - Flush the SEV-ES TMR memory before giving it to firmware
	bpftool: profile online CPUs instead of possible
	wifi: mt76: mt7915: call mt7915_mcu_set_thermal_throttling() only after init_work
	wifi: mt76: mt7915: fix memory leak in mt7915_mcu_exit
	wifi: mt76: mt7915: fix WED TxS reporting
	wifi: mt76: add memory barrier to SDIO queue kick
	wifi: mt76: mt7921: fix error code of return in mt7921_acpi_read
	net/mlx5: Enhance debug print in page allocation failure
	irqchip: Fix refcount leak in platform_irqchip_probe
	irqchip/alpine-msi: Fix refcount leak in alpine_msix_init_domains
	irqchip/irq-mvebu-gicp: Fix refcount leak in mvebu_gicp_probe
	irqchip/ti-sci: Fix refcount leak in ti_sci_intr_irq_domain_probe
	s390/mem_detect: fix detect_memory() error handling
	s390/vmem: fix empty page tables cleanup under KASAN
	s390/boot: cleanup decompressor header files
	s390/mem_detect: rely on diag260() if sclp_early_get_memsize() fails
	s390/boot: fix mem_detect extended area allocation
	net: add sock_init_data_uid()
	tun: tun_chr_open(): correctly initialize socket uid
	tap: tap_open(): correctly initialize socket uid
	OPP: fix error checking in opp_migrate_dentry()
	cpufreq: davinci: Fix clk use after free
	Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds
	Bluetooth: L2CAP: Fix potential user-after-free
	Bluetooth: hci_qca: get wakeup status from serdev device handle
	net: ipa: generic command param fix
	s390: vfio-ap: tighten the NIB validity check
	s390/ap: fix status returned by ap_aqic()
	s390/ap: fix status returned by ap_qact()
	libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()
	xen/grant-dma-iommu: Implement a dummy probe_device() callback
	rds: rds_rm_zerocopy_callback() correct order for list_add_tail()
	crypto: rsa-pkcs1pad - Use akcipher_request_complete
	m68k: /proc/hardware should depend on PROC_FS
	RISC-V: time: initialize hrtimer based broadcast clock event device
	clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
	wifi: iwl3945: Add missing check for create_singlethread_workqueue
	wifi: iwl4965: Add missing check for create_singlethread_workqueue()
	wifi: mwifiex: fix loop iterator in mwifiex_update_ampdu_txwinsize()
	selftests/bpf: Fix out-of-srctree build
	ACPI: resource: Add IRQ overrides for MAINGEAR Vector Pro 2 models
	ACPI: resource: Do IRQ override on all TongFang GMxRGxx
	crypto: octeontx2 - Fix objects shared between several modules
	crypto: crypto4xx - Call dma_unmap_page when done
	wifi: mac80211: move color collision detection report in a delayed work
	wifi: mac80211: make rate u32 in sta_set_rate_info_rx()
	wifi: mac80211: fix non-MLO station association
	wifi: mac80211: Don't translate MLD addresses for multicast
	wifi: mac80211: avoid u32_encode_bits() warning
	wifi: mac80211: fix off-by-one link setting
	tools/lib/thermal: Fix thermal_sampling_exit()
	thermal/drivers/hisi: Drop second sensor hi3660
	selftests/bpf: Fix map_kptr test.
	wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()
	bpf: Zeroing allocated object from slab in bpf memory allocator
	selftests/bpf: Fix xdp_do_redirect on s390x
	can: esd_usb: Move mislocated storage of SJA1000_ECC_SEG bits in case of a bus error
	can: esd_usb: Make use of can_change_state() and relocate checking skb for NULL
	xsk: check IFF_UP earlier in Tx path
	LoongArch, bpf: Use 4 instructions for function address in JIT
	bpf: Fix global subprog context argument resolution logic
	irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
	irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
	net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()
	net/smc: fix application data exception
	selftests/net: Interpret UDP_GRO cmsg data as an int value
	l2tp: Avoid possible recursive deadlock in l2tp_tunnel_register()
	net: bcmgenet: fix MoCA LED control
	net: lan966x: Fix possible deadlock inside PTP
	net/mlx4_en: Introduce flexible array to silence overflow warning
	selftest: fib_tests: Always cleanup before exit
	sefltests: netdevsim: wait for devlink instance after netns removal
	drm: Fix potential null-ptr-deref due to drmm_mode_config_init()
	drm/fourcc: Add missing big-endian XRGB1555 and RGB565 formats
	drm/bridge: ti-sn65dsi83: Fix delay after reset deassert to match spec
	drm: mxsfb: DRM_IMX_LCDIF should depend on ARCH_MXC
	drm: mxsfb: DRM_MXSFB should depend on ARCH_MXS || ARCH_MXC
	drm/bridge: megachips: Fix error handling in i2c_register_driver()
	drm/vkms: Fix memory leak in vkms_init()
	drm/vkms: Fix null-ptr-deref in vkms_release()
	drm/vc4: dpi: Fix format mapping for RGB565
	drm: tidss: Fix pixel format definition
	gpu: ipu-v3: common: Add of_node_put() for reference returned by of_graph_get_port_by_id()
	drm/vc4: drop all currently held locks if deadlock happens
	hwmon: (ftsteutates) Fix scaling of measurements
	drm/msm/dpu: check for null return of devm_kzalloc() in dpu_writeback_init()
	drm/msm/hdmi: Add missing check for alloc_ordered_workqueue
	pinctrl: qcom: pinctrl-msm8976: Correct function names for wcss pins
	pinctrl: stm32: Fix refcount leak in stm32_pctrl_get_irq_domain
	pinctrl: rockchip: Fix refcount leak in rockchip_pinctrl_parse_groups
	drm/vc4: hvs: Set AXI panic modes
	drm/vc4: hvs: SCALER_DISPBKGND_AUTOHS is only valid on HVS4
	drm/vc4: hvs: Correct interrupt masking bit assignment for HVS5
	drm/vc4: hvs: Fix colour order for xRGB1555 on HVS5
	drm/vc4: hdmi: Correct interlaced timings again
	drm/msm: clean event_thread->worker in case of an error
	drm/panel-edp: fix name for IVO product id 854b
	scsi: qla2xxx: Fix exchange oversubscription
	scsi: qla2xxx: Fix exchange oversubscription for management commands
	scsi: qla2xxx: edif: Fix clang warning
	ASoC: fsl_sai: initialize is_dsp_mode flag
	drm/bridge: tc358767: Set default CLRSIPO count
	drm/msm/adreno: Fix null ptr access in adreno_gpu_cleanup()
	ALSA: hda/ca0132: minor fix for allocation size
	drm/amdgpu: Use the sched from entity for amdgpu_cs trace
	drm/msm/gem: Add check for kmalloc
	drm/msm/dpu: Disallow unallocated resources to be returned
	drm/bridge: lt9611: fix sleep mode setup
	drm/bridge: lt9611: fix HPD reenablement
	drm/bridge: lt9611: fix polarity programming
	drm/bridge: lt9611: fix programming of video modes
	drm/bridge: lt9611: fix clock calculation
	drm/bridge: lt9611: pass a pointer to the of node
	regulator: tps65219: use IS_ERR() to detect an error pointer
	drm/mipi-dsi: Fix byte order of 16-bit DCS set/get brightness
	drm: exynos: dsi: Fix MIPI_DSI*_NO_* mode flags
	drm/msm/dsi: Allow 2 CTRLs on v2.5.0
	scsi: ufs: exynos: Fix DMA alignment for PAGE_SIZE != 4096
	drm/msm/dpu: sc7180: add missing WB2 clock control
	drm/msm: use strscpy instead of strncpy
	drm/msm/dpu: Add check for cstate
	drm/msm/dpu: Add check for pstates
	drm/msm/mdp5: Add check for kzalloc
	habanalabs: bugs fixes in timestamps buff alloc
	pinctrl: bcm2835: Remove of_node_put() in bcm2835_of_gpio_ranges_fallback()
	pinctrl: mediatek: Initialize variable pullen and pullup to zero
	pinctrl: mediatek: Initialize variable *buf to zero
	gpu: host1x: Fix mask for syncpoint increment register
	gpu: host1x: Don't skip assigning syncpoints to channels
	drm/tegra: firewall: Check for is_addr_reg existence in IMM check
	pinctrl: renesas: rzg2l: Fix configuring the GPIO pins as interrupts
	drm/msm/dpu: set pdpu->is_rt_pipe early in dpu_plane_sspp_atomic_update()
	drm/mediatek: dsi: Reduce the time of dsi from LP11 to sending cmd
	drm/mediatek: Use NULL instead of 0 for NULL pointer
	drm/mediatek: Drop unbalanced obj unref
	drm/mediatek: mtk_drm_crtc: Add checks for devm_kcalloc
	drm/mediatek: Clean dangling pointer on bind error path
	ASoC: soc-compress.c: fixup private_data on snd_soc_new_compress()
	dt-bindings: display: mediatek: Fix the fallback for mediatek,mt8186-disp-ccorr
	gpio: vf610: connect GPIO label to dev name
	ASoC: topology: Properly access value coming from topology file
	spi: dw_bt1: fix MUX_MMIO dependencies
	ASoC: mchp-spdifrx: fix controls which rely on rsr register
	ASoC: mchp-spdifrx: fix return value in case completion times out
	ASoC: mchp-spdifrx: fix controls that works with completion mechanism
	ASoC: mchp-spdifrx: disable all interrupts in mchp_spdifrx_dai_remove()
	dm: improve shrinker debug names
	regmap: apply reg_base and reg_downshift for single register ops
	ASoC: rsnd: fixup #endif position
	ASoC: mchp-spdifrx: Fix uninitialized use of mr in mchp_spdifrx_hw_params()
	ASoC: dt-bindings: meson: fix gx-card codec node regex
	regulator: tps65219: use generic set_bypass()
	hwmon: (asus-ec-sensors) add missing mutex path
	hwmon: (ltc2945) Handle error case in ltc2945_value_store
	ALSA: hda: Fix the control element identification for multiple codecs
	drm/amdgpu: fix enum odm_combine_mode mismatch
	scsi: mpt3sas: Fix a memory leak
	scsi: aic94xx: Add missing check for dma_map_single()
	HID: multitouch: Add quirks for flipped axes
	HID: retain initial quirks set up when creating HID devices
	ASoC: qcom: q6apm-lpass-dai: unprepare stream if its already prepared
	ASoC: qcom: q6apm-dai: fix race condition while updating the position pointer
	ASoC: qcom: q6apm-dai: Add SNDRV_PCM_INFO_BATCH flag
	ASoC: codecs: lpass: register mclk after runtime pm
	ASoC: codecs: lpass: fix incorrect mclk rate
	drm/amd/display: don't call dc_interrupt_set() for disabled crtcs
	HID: logitech-hidpp: Hard-code HID++ 1.0 fast scroll support
	spi: bcm63xx-hsspi: Fix multi-bit mode setting
	hwmon: (mlxreg-fan) Return zero speed for broken fan
	ASoC: tlv320adcx140: fix 'ti,gpio-config' DT property init
	dm: remove flush_scheduled_work() during local_exit()
	nfs4trace: fix state manager flag printing
	NFS: fix disabling of swap
	spi: synquacer: Fix timeout handling in synquacer_spi_transfer_one()
	ASoC: soc-dapm.h: fixup warning struct snd_pcm_substream not declared
	HID: bigben: use spinlock to protect concurrent accesses
	HID: bigben_worker() remove unneeded check on report_field
	HID: bigben: use spinlock to safely schedule workers
	hid: bigben_probe(): validate report count
	ALSA: hda/hdmi: Register with vga_switcheroo on Dual GPU Macbooks
	drm/shmem-helper: Fix locking for drm_gem_shmem_get_pages_sgt()
	NFSD: enhance inter-server copy cleanup
	NFSD: fix leaked reference count of nfsd4_ssc_umount_item
	nfsd: fix race to check ls_layouts
	nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
	NFSD: fix problems with cleanup on errors in nfsd4_copy
	nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
	nfsd: don't fsync nfsd_files on last close
	NFSD: copy the whole verifier in nfsd_copy_write_verifier
	cifs: Fix lost destroy smbd connection when MR allocate failed
	cifs: Fix warning and UAF when destroy the MR list
	cifs: use tcon allocation functions even for dummy tcon
	gfs2: jdata writepage fix
	perf llvm: Fix inadvertent file creation
	leds: led-core: Fix refcount leak in of_led_get()
	leds: is31fl319x: Wrap mutex_destroy() for devm_add_action_or_rest()
	leds: simatic-ipc-leds-gpio: Make sure we have the GPIO providing driver
	tools/tracing/rtla: osnoise_hist: use total duration for average calculation
	perf inject: Use perf_data__read() for auxtrace
	perf intel-pt: Do not try to queue auxtrace data on pipe
	perf test bpf: Skip test if kernel-debuginfo is not present
	perf tools: Fix auto-complete on aarch64
	sparc: allow PM configs for sparc32 COMPILE_TEST
	selftests: find echo binary to use -ne options
	selftests/ftrace: Fix bash specific "==" operator
	selftests: use printf instead of echo -ne
	perf record: Fix segfault with --overwrite and --max-size
	printf: fix errname.c list
	perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
	objtool: add UACCESS exceptions for __tsan_volatile_read/write
	mfd: cs5535: Don't build on UML
	mfd: pcf50633-adc: Fix potential memleak in pcf50633_adc_async_read()
	dmaengine: idxd: Set traffic class values in GRPCFG on DSA 2.0
	RDMA/erdma: Fix refcount leak in erdma_mmap
	dmaengine: HISI_DMA should depend on ARCH_HISI
	RDMA/hns: Fix refcount leak in hns_roce_mmap
	iio: light: tsl2563: Do not hardcode interrupt trigger type
	usb: gadget: fusb300_udc: free irq on the error path in fusb300_probe()
	i2c: designware: fix i2c_dw_clk_rate() return size to be u32
	soundwire: cadence: Don't overflow the command FIFOs
	driver core: fix potential null-ptr-deref in device_add()
	kobject: modify kobject_get_path() to take a const *
	kobject: Fix slab-out-of-bounds in fill_kobj_path()
	alpha/boot/tools/objstrip: fix the check for ELF header
	media: uvcvideo: Check for INACTIVE in uvc_ctrl_is_accessible()
	media: uvcvideo: Implement mask for V4L2_CTRL_TYPE_MENU
	media: uvcvideo: Refactor uvc_ctrl_mappings_uvcXX
	media: uvcvideo: Refactor power_line_frequency_controls_limited
	coresight: etm4x: Fix accesses to TRCSEQRSTEVR and TRCSEQSTR
	coresight: cti: Prevent negative values of enable count
	coresight: cti: Add PM runtime call in enable_store
	usb: typec: intel_pmc_mux: Don't leak the ACPI device reference count
	PCI/IOV: Enlarge virtfn sysfs name buffer
	PCI: switchtec: Return -EFAULT for copy_to_user() errors
	PCI: endpoint: pci-epf-vntb: Clean up kernel_doc warning
	PCI: endpoint: pci-epf-vntb: Add epf_ntb_mw_bar_clear() num_mws kernel-doc
	hwtracing: hisi_ptt: Only add the supported devices to the filters list
	tty: serial: fsl_lpuart: disable Rx/Tx DMA in lpuart32_shutdown()
	tty: serial: fsl_lpuart: clear LPUART Status Register in lpuart32_shutdown()
	serial: tegra: Add missing clk_disable_unprepare() in tegra_uart_hw_init()
	Revert "char: pcmcia: cm4000_cs: Replace mdelay with usleep_range in set_protocol"
	eeprom: idt_89hpesx: Fix error handling in idt_init()
	applicom: Fix PCI device refcount leak in applicom_init()
	firmware: stratix10-svc: add missing gen_pool_destroy() in stratix10_svc_drv_probe()
	firmware: stratix10-svc: fix error handle while alloc/add device failed
	VMCI: check context->notify_page after call to get_user_pages_fast() to avoid GPF
	mei: pxp: Use correct macros to initialize uuid_le
	misc/mei/hdcp: Use correct macros to initialize uuid_le
	misc: fastrpc: Fix an error handling path in fastrpc_rpmsg_probe()
	driver core: fix resource leak in device_add()
	driver core: location: Free struct acpi_pld_info *pld before return false
	drivers: base: transport_class: fix possible memory leak
	drivers: base: transport_class: fix resource leak when transport_add_device() fails
	firmware: dmi-sysfs: Fix null-ptr-deref in dmi_sysfs_register_handle
	fotg210-udc: Add missing completion handler
	dmaengine: dw-edma: Fix missing src/dst address of interleaved xfers
	fpga: microchip-spi: move SPI I/O buffers out of stack
	fpga: microchip-spi: rewrite status polling in a time measurable way
	usb: early: xhci-dbc: Fix a potential out-of-bound memory access
	tty: serial: fsl_lpuart: Fix the wrong RXWATER setting for rx dma case
	RDMA/cxgb4: add null-ptr-check after ip_dev_find()
	usb: musb: mediatek: don't unregister something that wasn't registered
	usb: gadget: configfs: Restrict symlink creation is UDC already binded
	phy: mediatek: remove temporary variable @mask_
	PCI: mt7621: Delay phy ports initialization
	iommu: dart: Add suspend/resume support
	iommu: dart: Support >64 stream IDs
	iommu/dart: Fix apple_dart_device_group for PCI groups
	iommu/vt-d: Set No Execute Enable bit in PASID table entry
	power: supply: remove faulty cooling logic
	RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()
	usb: max-3421: Fix setting of I/O pins
	RDMA/irdma: Cap MSIX used to online CPUs + 1
	serial: fsl_lpuart: fix RS485 RTS polariy inverse issue
	tty: serial: imx: Handle RS485 DE signal active high
	tty: serial: imx: disable Ageing Timer interrupt request irq
	driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
	driver core: fw_devlink: Don't purge child fwnode's consumer links
	driver core: fw_devlink: Allow marking a fwnode link as being part of a cycle
	driver core: fw_devlink: Consolidate device link flag computation
	driver core: fw_devlink: Improve check for fwnode with no device/driver
	driver core: fw_devlink: Make cycle detection more robust
	mtd: mtdpart: Don't create platform device that'll never probe
	usb: host: fsl-mph-dr-of: reuse device_set_of_node_from_dev
	dmaengine: dw-edma: Fix readq_ch() return value truncation
	PCI: Fix dropping valid root bus resources with .end = zero
	phy: rockchip-typec: fix tcphy_get_mode error case
	PCI: qcom: Fix host-init error handling
	iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()
	iommu: Fix error unwind in iommu_group_alloc()
	iommu/amd: Do not identity map v2 capable device when snp is enabled
	dmaengine: sf-pdma: pdma_desc memory leak fix
	dmaengine: dw-axi-dmac: Do not dereference NULL structure
	dmaengine: ptdma: check for null desc before calling pt_cmd_callback
	iommu/vt-d: Fix error handling in sva enable/disable paths
	iommu/vt-d: Allow to use flush-queue when first level is default
	RDMA/rxe: cleanup some error handling in rxe_verbs.c
	RDMA/rxe: Fix missing memory barriers in rxe_queue.h
	IB/hfi1: Fix math bugs in hfi1_can_pin_pages()
	IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
	Revert "remoteproc: qcom_q6v5_mss: map/unmap metadata region before/after use"
	remoteproc: qcom_q6v5_mss: Use a carveout to authenticate modem headers
	media: ti: cal: fix possible memory leak in cal_ctx_create()
	media: platform: ti: Add missing check for devm_regulator_get
	media: imx: imx7-media-csi: fix missing clk_disable_unprepare() in imx7_csi_init()
	powerpc: Remove linker flag from KBUILD_AFLAGS
	s390/vdso: Drop '-shared' from KBUILD_CFLAGS_64
	builddeb: clean generated package content
	media: max9286: Fix memleak in max9286_v4l2_register()
	media: ov2740: Fix memleak in ov2740_init_controls()
	media: ov5675: Fix memleak in ov5675_init_controls()
	media: ov5640: Fix soft reset sequence and timings
	media: ov5640: Handle delays when no reset_gpio set
	media: mc: Get media_device directly from pad
	media: i2c: ov772x: Fix memleak in ov772x_probe()
	media: i2c: imx219: Split common registers from mode tables
	media: i2c: imx219: Fix binning for RAW8 capture
	media: platform: mtk-mdp3: Fix return value check in mdp_probe()
	media: camss: csiphy-3ph: avoid undefined behavior
	media: platform: mtk-mdp3: remove unused VIDEO_MEDIATEK_VPU config
	media: platform: mtk-mdp3: fix Kconfig dependencies
	media: v4l2-jpeg: correct the skip count in jpeg_parse_app14_data
	media: v4l2-jpeg: ignore the unknown APP14 marker
	media: hantro: Fix JPEG encoder ENUM_FRMSIZE on RK3399
	media: imx-jpeg: Apply clk_bulk api instead of operating specific clk
	media: amphion: correct the unspecified color space
	media: drivers/media/v4l2-core/v4l2-h264 : add detection of null pointers
	media: rc: Fix use-after-free bugs caused by ene_tx_irqsim()
	media: atomisp: Only set default_run_mode on first open of a stream/asd
	media: i2c: ov7670: 0 instead of -EINVAL was returned
	media: usb: siano: Fix use after free bugs caused by do_submit_urb
	media: saa7134: Use video_unregister_device for radio_dev
	rpmsg: glink: Avoid infinite loop on intent for missing channel
	rpmsg: glink: Release driver_override
	ARM: OMAP2+: omap4-common: Fix refcount leak bug
	arm64: dts: qcom: msm8996: Add additional A2NoC clocks
	udf: Define EFSCORRUPTED error code
	context_tracking: Fix noinstr vs KASAN
	exit: Detect and fix irq disabled state in oops
	ARM: dts: exynos: Use Exynos5420 compatible for the MIPI video phy
	fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected
	blk-iocost: fix divide by 0 error in calc_lcoefs()
	blk-cgroup: dropping parent refcount after pd_free_fn() is done
	blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()
	trace/blktrace: fix memory leak with using debugfs_lookup()
	btrfs: scrub: improve tree block error reporting
	arm64: zynqmp: Enable hs termination flag for USB dwc3 controller
	cpuidle, intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE
	x86/fpu: Don't set TIF_NEED_FPU_LOAD for PF_IO_WORKER threads
	cpuidle: drivers: firmware: psci: Dont instrument suspend code
	cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
	perf/x86/intel/uncore: Add Meteor Lake support
	wifi: ath9k: Fix use-after-free in ath9k_hif_usb_disconnect()
	wifi: ath11k: fix monitor mode bringup crash
	wifi: brcmfmac: Fix potential stack-out-of-bounds in brcmf_c_preinit_dcmds()
	rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
	rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait()
	srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALL
	rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
	rcu-tasks: Handle queue-shrink/callback-enqueue race condition
	wifi: ath11k: debugfs: fix to work with multiple PCI devices
	thermal: intel: Fix unsigned comparison with less than zero
	timers: Prevent union confusion from unexpected restart_syscall()
	x86/bugs: Reset speculation control settings on init
	bpftool: Always disable stack protection for BPF objects
	wifi: brcmfmac: ensure CLM version is null-terminated to prevent stack-out-of-bounds
	wifi: mt7601u: fix an integer underflow
	inet: fix fast path in __inet_hash_connect()
	ice: restrict PTP HW clock freq adjustments to 100, 000, 000 PPB
	ice: add missing checks for PF vsi type
	ACPI: Don't build ACPICA with '-Os'
	bpf, docs: Fix modulo zero, division by zero, overflow, and underflow
	thermal: intel: intel_pch: Add support for Wellsburg PCH
	clocksource: Suspend the watchdog temporarily when high read latency detected
	crypto: hisilicon: Wipe entire pool on error
	net: bcmgenet: Add a check for oversized packets
	m68k: Check syscall_trace_enter() return code
	s390/mm,ptdump: avoid Kasan vs Memcpy Real markers swapping
	netfilter: nf_tables: NULL pointer dereference in nf_tables_updobj()
	can: isotp: check CAN address family in isotp_bind()
	gcc-plugins: drop -std=gnu++11 to fix GCC 13 build
	tools/power/x86/intel-speed-select: Add Emerald Rapid quirk
	wifi: mt76: dma: free rx_head in mt76_dma_rx_cleanup
	ACPI: video: Fix Lenovo Ideapad Z570 DMI match
	net/mlx5: fw_tracer: Fix debug print
	coda: Avoid partial allocation of sig_inputArgs
	uaccess: Add minimum bounds check on kernel buffer size
	s390/idle: mark arch_cpu_idle() noinstr
	time/debug: Fix memory leak with using debugfs_lookup()
	PM: domains: fix memory leak with using debugfs_lookup()
	PM: EM: fix memory leak with using debugfs_lookup()
	Bluetooth: Fix issue with Actions Semi ATS2851 based devices
	Bluetooth: btusb: Add new PID/VID 0489:e0f2 for MT7921
	Bluetooth: btusb: Add VID:PID 13d3:3529 for Realtek RTL8821CE
	wifi: rtw89: debug: avoid invalid access on RTW89_DBG_SEL_MAC_30
	hv_netvsc: Check status in SEND_RNDIS_PKT completion message
	s390/kfence: fix page fault reporting
	devlink: Fix TP_STRUCT_entry in trace of devlink health report
	scm: add user copy checks to put_cmsg()
	drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Tab 3 X90F
	drm: panel-orientation-quirks: Add quirk for DynaBook K50
	drm/amd/display: Reduce expected sdp bandwidth for dcn321
	drm/amd/display: Revert Reduce delay when sink device not able to ACK 00340h write
	drm/amd/display: Fix potential null-deref in dm_resume
	drm/omap: dsi: Fix excessive stack usage
	HID: Add Mapping for System Microphone Mute
	drm/tiny: ili9486: Do not assume 8-bit only SPI controllers
	drm/amd/display: Defer DIG FIFO disable after VID stream enable
	drm/radeon: free iio for atombios when driver shutdown
	drm/amd: Avoid BUG() for case of SRIOV missing IP version
	drm/amdkfd: Page aligned memory reserve size
	scsi: lpfc: Fix use-after-free KFENCE violation during sysfs firmware write
	Revert "fbcon: don't lose the console font across generic->chip driver switch"
	drm/amd: Avoid ASSERT for some message failures
	drm: amd: display: Fix memory leakage
	drm/amd/display: fix mapping to non-allocated address
	HID: uclogic: Add frame type quirk
	HID: uclogic: Add battery quirk
	HID: uclogic: Add support for XP-PEN Deco Pro SW
	HID: uclogic: Add support for XP-PEN Deco Pro MW
	drm/msm/dsi: Add missing check for alloc_ordered_workqueue
	drm: rcar-du: Add quirk for H3 ES1.x pclk workaround
	drm: rcar-du: Fix setting a reserved bit in DPLLCR
	drm/drm_print: correct format problem
	drm/amd/display: Set hvm_enabled flag for S/G mode
	habanalabs: extend fatal messages to contain PCI info
	habanalabs: fix bug in timestamps registration code
	docs/scripts/gdb: add necessary make scripts_gdb step
	drm/msm/dpu: Add DSC hardware blocks to register snapshot
	ASoC: soc-compress: Reposition and add pcm_mutex
	ASoC: kirkwood: Iterate over array indexes instead of using pointer math
	regulator: max77802: Bounds check regulator id against opmode
	regulator: s5m8767: Bounds check id indexing into arrays
	Revert "drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled"
	drm/amd/display: fix FCLK pstate change underflow
	gfs2: Improve gfs2_make_fs_rw error handling
	hwmon: (coretemp) Simplify platform device handling
	hwmon: (nct6775) Directly call ASUS ACPI WMI method
	hwmon: (nct6775) B650/B660/X670 ASUS boards support
	pinctrl: at91: use devm_kasprintf() to avoid potential leaks
	drm/amd/display: Do not commit pipe when updating DRR
	scsi: snic: Fix memory leak with using debugfs_lookup()
	scsi: ufs: core: Fix device management cmd timeout flow
	HID: logitech-hidpp: Don't restart communication if not necessary
	drm/amd/display: Enable P-state validation checks for DCN314
	drm: panel-orientation-quirks: Add quirk for Lenovo IdeaPad Duet 3 10IGL5
	drm/amd/display: Disable HUBP/DPP PG on DCN314 for now
	dm thin: add cond_resched() to various workqueue loops
	dm cache: add cond_resched() to various workqueue loops
	nfsd: zero out pointers after putting nfsd_files on COPY setup error
	nfsd: don't hand out delegation on setuid files being opened for write
	cifs: prevent data race in smb2_reconnect()
	drm/shmem-helper: Revert accidental non-GPL export
	driver core: fw_devlink: Avoid spurious error message
	wifi: rtl8xxxu: fixing transmisison failure for rtl8192eu
	scsi: mpt3sas: Remove usage of dma_get_required_mask() API
	firmware: coreboot: framebuffer: Ignore reserved pixel color bits
	block: don't allow multiple bios for IOCB_NOWAIT issue
	block: clear bio->bi_bdev when putting a bio back in the cache
	block: be a bit more careful in checking for NULL bdev while polling
	rtc: pm8xxx: fix set-alarm race
	ipmi: ipmb: Fix the MODULE_PARM_DESC associated to 'retry_time_ms'
	ipmi:ssif: resend_msg() cannot fail
	ipmi_ssif: Rename idle state and check
	io_uring: Replace 0-length array with flexible array
	io_uring: use user visible tail in io_uring_poll()
	io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
	io_uring: add a conditional reschedule to the IOPOLL cancelation loop
	io_uring: add reschedule point to handle_tw_list()
	io_uring/rsrc: disallow multi-source reg buffers
	io_uring: remove MSG_NOSIGNAL from recvmsg
	io_uring: fix fget leak when fs don't support nowait buffered read
	s390/extmem: return correct segment type in __segment_load()
	s390: discard .interp section
	s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
	s390/kprobes: fix current_kprobe never cleared after kprobes reenter
	KVM: s390: disable migration mode when dirty tracking is disabled
	cifs: Fix uninitialized memory read in smb3_qfs_tcon()
	cifs: Fix uninitialized memory reads for oparms.mode
	cifs: fix mount on old smb servers
	cifs: introduce cifs_io_parms in smb2_async_writev()
	cifs: split out smb3_use_rdma_offload() helper
	cifs: don't try to use rdma offload on encrypted connections
	cifs: Check the lease context if we actually got a lease
	cifs: return a single-use cfid if we did not get a lease
	scsi: mpi3mr: Fix missing mrioc->evtack_cmds initialization
	scsi: mpi3mr: Fix issues in mpi3mr_get_all_tgt_info()
	scsi: mpi3mr: Remove unnecessary memcpy() to alltgt_info->dmi
	btrfs: hold block group refcount during async discard
	locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
	ksmbd: fix wrong data area length for smb2 lock request
	ksmbd: do not allow the actual frame length to be smaller than the rfc1002 length
	ksmbd: fix possible memory leak in smb2_lock()
	torture: Fix hang during kthread shutdown phase
	ARM: dts: exynos: correct HDMI phy compatible in Exynos4
	io_uring: mark task TASK_RUNNING before handling resume/task work
	hfs: fix missing hfs_bnode_get() in __hfs_bnode_create
	fs: hfsplus: fix UAF issue in hfsplus_put_super
	exfat: fix reporting fs error when reading dir beyond EOF
	exfat: fix unexpected EOF while reading dir
	exfat: redefine DIR_DELETED as the bad cluster number
	exfat: fix inode->i_blocks for non-512 byte sector size device
	fs: dlm: don't set stop rx flag after node reset
	fs: dlm: move sending fin message into state change handling
	fs: dlm: send FIN ack back in right cases
	f2fs: fix information leak in f2fs_move_inline_dirents()
	f2fs: retry to update the inode page given data corruption
	f2fs: fix cgroup writeback accounting with fs-layer encryption
	f2fs: fix kernel crash due to null io->bio
	ocfs2: fix defrag path triggering jbd2 ASSERT
	ocfs2: fix non-auto defrag path not working issue
	fs/cramfs/inode.c: initialize file_ra_state
	selftests/landlock: Skip overlayfs tests when not supported
	selftests/landlock: Test ptrace as much as possible with Yama
	udf: Truncate added extents on failed expansion
	udf: Do not bother merging very long extents
	udf: Do not update file length for failed writes to inline files
	udf: Preserve link count of system files
	udf: Detect system inodes linked into directory hierarchy
	udf: Fix file corruption when appending just after end of preallocated extent
	md: don't update recovery_cp when curr_resync is ACTIVE
	RDMA/siw: Fix user page pinning accounting
	KVM: Destroy target device if coalesced MMIO unregistration fails
	KVM: VMX: Fix crash due to uninitialized current_vmcs
	KVM: Register /dev/kvm as the _very_ last thing during initialization
	KVM: x86: Purge "highest ISR" cache when updating APICv state
	KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps
	KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabled
	KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID
	KVM: SVM: Flush the "current" TLB when activating AVIC
	KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target
	KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
	KVM: x86: Inject #GP if WRMSR sets reserved bits in APIC Self-IPI
	KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32
	KVM: SVM: Fix potential overflow in SEV's send|receive_update_data()
	KVM: SVM: hyper-v: placate modpost section mismatch error
	selftests: x86: Fix incorrect kernel headers search path
	x86/virt: Force GIF=1 prior to disabling SVM (for reboot flows)
	x86/crash: Disable virt in core NMI crash handler to avoid double shootdown
	x86/reboot: Disable virtualization in an emergency if SVM is supported
	x86/reboot: Disable SVM, not just VMX, when stopping CPUs
	x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
	x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
	x86/microcode/amd: Remove load_microcode_amd()'s bsp parameter
	x86/microcode/AMD: Add a @cpu parameter to the reloading functions
	x86/microcode/AMD: Fix mixed steppings support
	x86/speculation: Allow enabling STIBP with legacy IBRS
	Documentation/hw-vuln: Document the interaction between IBRS and STIBP
	virt/sev-guest: Return -EIO if certificate buffer is not large enough
	brd: mark as nowait compatible
	brd: return 0/-error from brd_insert_page()
	brd: check for REQ_NOWAIT and set correct page allocation mask
	ima: fix error handling logic when file measurement failed
	ima: Align ima_file_mmap() parameters with mmap_file LSM hook
	selftests/powerpc: Fix incorrect kernel headers search path
	selftests/ftrace: Fix eprobe syntax test case to check filter support
	selftests: sched: Fix incorrect kernel headers search path
	selftests: core: Fix incorrect kernel headers search path
	selftests: pid_namespace: Fix incorrect kernel headers search path
	selftests: arm64: Fix incorrect kernel headers search path
	selftests: clone3: Fix incorrect kernel headers search path
	selftests: pidfd: Fix incorrect kernel headers search path
	selftests: membarrier: Fix incorrect kernel headers search path
	selftests: kcmp: Fix incorrect kernel headers search path
	selftests: media_tests: Fix incorrect kernel headers search path
	selftests: gpio: Fix incorrect kernel headers search path
	selftests: filesystems: Fix incorrect kernel headers search path
	selftests: user_events: Fix incorrect kernel headers search path
	selftests: ptp: Fix incorrect kernel headers search path
	selftests: sync: Fix incorrect kernel headers search path
	selftests: rseq: Fix incorrect kernel headers search path
	selftests: move_mount_set_group: Fix incorrect kernel headers search path
	selftests: mount_setattr: Fix incorrect kernel headers search path
	selftests: perf_events: Fix incorrect kernel headers search path
	selftests: ipc: Fix incorrect kernel headers search path
	selftests: futex: Fix incorrect kernel headers search path
	selftests: drivers: Fix incorrect kernel headers search path
	selftests: dmabuf-heaps: Fix incorrect kernel headers search path
	selftests: vm: Fix incorrect kernel headers search path
	selftests: seccomp: Fix incorrect kernel headers search path
	irqdomain: Fix association race
	irqdomain: Fix disassociation race
	irqdomain: Look for existing mapping only once
	irqdomain: Drop bogus fwspec-mapping error handling
	irqdomain: Refactor __irq_domain_alloc_irqs()
	irqdomain: Fix mapping-creation race
	irqdomain: Fix domain registration race
	crypto: qat - fix out-of-bounds read
	mm/damon/paddr: fix missing folio_put()
	ALSA: ice1712: Do not left ice->gpio_mutex locked in aureon_add_controls()
	ALSA: hda/realtek: Add quirk for HP EliteDesk 800 G6 Tower PC
	jbd2: fix data missing when reusing bh which is ready to be checkpointed
	ext4: optimize ea_inode block expansion
	ext4: refuse to create ea block when umounted
	cxl/pmem: Fix nvdimm registration races
	mtd: spi-nor: sfdp: Fix index value for SCCR dwords
	mtd: spi-nor: spansion: Consider reserved bits in CFR5 register
	mtd: spi-nor: Fix shift-out-of-bounds in spi_nor_set_erase_type
	dm: send just one event on resize, not two
	dm: add cond_resched() to dm_wq_work()
	dm: add cond_resched() to dm_wq_requeue_work()
	wifi: rtw88: use RTW_FLAG_POWERON flag to prevent to power on/off twice
	wifi: rtl8xxxu: Use a longer retry limit of 48
	wifi: ath11k: allow system suspend to survive ath11k
	wifi: cfg80211: Fix use after free for wext
	wifi: cfg80211: Set SSID if it is not already set
	cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
	qede: fix interrupt coalescing configuration
	thermal: intel: powerclamp: Fix cur_state for multi package system
	dm flakey: fix logic when corrupting a bio
	dm cache: free background tracker's queued work in btracker_destroy
	dm flakey: don't corrupt the zero page
	dm flakey: fix a bug with 32-bit highmem systems
	hwmon: (peci/cputemp) Fix off-by-one in coretemp_label allocation
	hwmon: (nct6775) Fix incorrect parenthesization in nct6775_write_fan_div()
	ARM: dts: qcom: sdx65: Add Qcom SMMU-500 as the fallback for IOMMU node
	ARM: dts: qcom: sdx55: Add Qcom SMMU-500 as the fallback for IOMMU node
	ARM: dts: exynos: correct TMU phandle in Exynos4210
	ARM: dts: exynos: correct TMU phandle in Exynos4
	ARM: dts: exynos: correct TMU phandle in Odroid XU3 family
	ARM: dts: exynos: correct TMU phandle in Exynos5250
	ARM: dts: exynos: correct TMU phandle in Odroid XU
	ARM: dts: exynos: correct TMU phandle in Odroid HC1
	arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP
	fuse: add inode/permission checks to fileattr_get/fileattr_set
	rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() fails
	ceph: update the time stamps and try to drop the suid/sgid
	regulator: core: Use ktime_get_boottime() to determine how long a regulator was off
	panic: fix the panic_print NMI backtrace setting
	mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
	alpha: fix FEN fault handling
	dax/kmem: Fix leak of memory-hotplug resources
	mips: fix syscall_get_nr
	media: ipu3-cio2: Fix PM runtime usage_count in driver unbind
	remoteproc/mtk_scp: Move clk ops outside send_lock
	docs: gdbmacros: print newest record
	mm: memcontrol: deprecate charge moving
	mm/thp: check and bail out if page in deferred queue already
	ktest.pl: Give back console on Ctrt^C on monitor
	kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
	ktest.pl: Fix missing "end_monitor" when machine check fails
	ktest.pl: Add RUN_TIMEOUT option with default unlimited
	memory tier: release the new_memtier in find_create_memory_tier()
	ring-buffer: Handle race between rb_move_tail and rb_check_pages
	tools/bootconfig: fix single & used for logical condition
	tracing/eprobe: Fix to add filter on eprobe description in README file
	iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter
	iommu/amd: Improve page fault error reporting
	scsi: aacraid: Allocate cmd_priv with scsicmd
	scsi: qla2xxx: Fix link failure in NPIV environment
	scsi: qla2xxx: Check if port is online before sending ELS
	scsi: qla2xxx: Fix DMA-API call trace on NVMe LS requests
	scsi: qla2xxx: Remove unintended flag clearing
	scsi: qla2xxx: Fix erroneous link down
	scsi: qla2xxx: Remove increment of interface err cnt
	scsi: ses: Don't attach if enclosure has no components
	scsi: ses: Fix slab-out-of-bounds in ses_enclosure_data_process()
	scsi: ses: Fix possible addl_desc_ptr out-of-bounds accesses
	scsi: ses: Fix possible desc_ptr out-of-bounds accesses
	scsi: ses: Fix slab-out-of-bounds in ses_intf_remove()
	RISC-V: add a spin_shadow_stack declaration
	riscv: Avoid enabling interrupts in die()
	riscv: mm: fix regression due to update_mmu_cache change
	riscv: jump_label: Fixup unaligned arch_static_branch function
	riscv, mm: Perform BPF exhandler fixup on page fault
	riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
	riscv: ftrace: Reduce the detour code size to half
	MIPS: DTS: CI20: fix otg power gpio
	PCI/PM: Observe reset delay irrespective of bridge_d3
	PCI: Unify delay handling for reset and resume
	PCI: hotplug: Allow marking devices as disconnected during bind/unbind
	PCI: Avoid FLR for AMD FCH AHCI adapters
	PCI/DPC: Await readiness of secondary bus after reset
	bus: mhi: ep: Only send -ENOTCONN status if client driver is available
	bus: mhi: ep: Move chan->lock to the start of processing queued ch ring
	bus: mhi: ep: Save channel state locally during suspend and resume
	iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
	iommu/vt-d: Fix PASID directory pointer coherency
	vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
	vfio/type1: prevent underflow of locked_vm via exec()
	vfio/type1: track locked_vm per dma
	vfio/type1: restore locked_vm
	drm/amd: Fix initialization for nbio 7.5.1
	drm/i915/quirks: Add inverted backlight quirk for HP 14-r206nv
	drm/radeon: Fix eDP for single-display iMac11,2
	drm/i915: Don't use stolen memory for ring buffers with LLC
	drm/i915: Don't use BAR mappings for ring buffers with LLC
	drm/gud: Fix UBSAN warning
	drm/edid: fix AVI infoframe aspect ratio handling
	drm/edid: fix parsing of 3D modes from HDMI VSDB
	qede: avoid uninitialized entries in coal_entry array
	brd: use radix_tree_maybe_preload instead of radix_tree_preload
	sbitmap: Advance the queue index before waking up a queue
	wait: Return number of exclusive waiters awaken
	sbitmap: Try each queue to wake up at least one waiter
	kbuild: Port silent mode detection to future gnu make.
	net: avoid double iput when sock_alloc_file fails
	Linux 6.1.16

Change-Id: I705caf70ee547e6d55f38d133bdcd50713aed745
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-13 15:45:34 +00:00
Nicholas Piggin
711bd1b553 exit: Detect and fix irq disabled state in oops
[ Upstream commit 001c28e57187570e4b5aa4492c7a957fb6d65d7b ]

If a task oopses with irqs disabled, this can cause various cascading
problems in the oops path such as sleep-from-invalid warnings, and
potentially worse.

Since commit 0258b5fd7c ("coredump: Limit coredumps to a single
thread group"), the unconditional irq enable in coredump_task_exit()
will "fix" the irq state to be enabled early in do_exit(), so currently
this may not be triggerable, but that is coincidental and fragile.

Detect and fix the irqs_disabled() condition in the oops path before
calling do_exit(), similarly to the way in_atomic() is handled.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/lkml/20221004094401.708299-1-npiggin@gmail.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10 09:33:45 +01:00
Neill Kapron
bd772a54d2 Revert "exit: Remove profile_task_exit & profile_munmap"
This reverts commit 2d4bcf886e, which
removed hooks required for Android uid_sys_stats.c.

Bug: 219790626
Change-Id: Icea00bbf9abe2fb17312b25f2d575d29aa360999
[nkapron: resolve conflict with 2873cd31a2 exit: Remove profile_handoff_task]
Signed-off-by: Neill Kapron <nkapron@google.com>
2023-03-09 23:13:08 +00:00
Kees Cook
a18417e27e exit: Use READ_ONCE() for all oops/warn limit reads
commit 7535b832c6399b5ebfc5b53af5c51dd915ee2538 upstream.

Use a temporary variable to take full advantage of READ_ONCE() behavior.
Without this, the report (and even the test) might be out of sync with
the initial test.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.net
Fixes: 9fc9e278a5c0 ("panic: Introduce warn_limit")
Fixes: d4ccd54d28d3 ("exit: Put an upper limit on how often we can oops")
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:24:41 +01:00
Kees Cook
e0738725bb exit: Allow oops_limit to be disabled
commit de92f65719cd672f4b48397540b9f9eff67eca40 upstream.

In preparation for keeping oops_limit logic in sync with warn_limit,
have oops_limit == 0 disable checking the Oops counter.

Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:24:41 +01:00
Kees Cook
46cacd7913 exit: Expose "oops_count" to sysfs
commit 9db89b41117024f80b38b15954017fb293133364 upstream.

Since Oops count is now tracked and is a fairly interesting signal, add
the entry /sys/kernel/oops_count to expose it to userspace.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-3-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:24:41 +01:00
Jann Horn
767997ef5d exit: Put an upper limit on how often we can oops
commit d4ccd54d28d3c8598e2354acc13e28c060961dbb upstream.

Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.

The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)

So let's panic the system if the kernel is constantly oopsing.

The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)

It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.

12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.

Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:24:41 +01:00
Linus Torvalds
676cb49573 - hfs and hfsplus kmap API modernization from Fabio Francesco
- Valentin Schneider makes crash-kexec work properly when invoked from
   an NMI-time panic.
 
 - ntfs bugfixes from Hawkins Jiawei
 
 - Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
   percpu counters.
 
 - nilfs2 cleanups from Minghao Chi
 
 - lots of other single patches all over the tree!
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
 joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
 DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
 =lUQY
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - hfs and hfsplus kmap API modernization (Fabio Francesco)

 - make crash-kexec work properly when invoked from an NMI-time panic
   (Valentin Schneider)

 - ntfs bugfixes (Hawkins Jiawei)

 - improve IPC msg scalability by replacing atomic_t's with percpu
   counters (Jiebin Sun)

 - nilfs2 cleanups (Minghao Chi)

 - lots of other single patches all over the tree!

* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
  proc: test how it holds up with mapping'less process
  mailmap: update Frank Rowand email address
  ia64: mca: use strscpy() is more robust and safer
  init/Kconfig: fix unmet direct dependencies
  ia64: update config files
  nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
  fork: remove duplicate included header files
  init/main.c: remove unnecessary (void*) conversions
  proc: mark more files as permanent
  nilfs2: remove the unneeded result variable
  nilfs2: delete unnecessary checks before brelse()
  checkpatch: warn for non-standard fixes tag style
  usr/gen_init_cpio.c: remove unnecessary -1 values from int file
  ipc/msg: mitigate the lock contention with percpu counter
  percpu: add percpu_counter_add_local and percpu_counter_sub_local
  fs/ocfs2: fix repeated words in comments
  relay: use kvcalloc to alloc page array in relay_alloc_page_array
  proc: make config PROC_CHILDREN depend on PROC_FS
  fs: uninline inode_maybe_inc_iversion()
  ...
2022-10-12 11:00:22 -07:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds
30c999937f Scheduler changes for v6.1:
- Debuggability:
 
      - Change most occurances of BUG_ON() to WARN_ON_ONCE()
 
      - Reorganize & fix TASK_ state comparisons, turn it into a bitmap
 
      - Update/fix misc scheduler debugging facilities
 
  - Load-balancing & regular scheduling:
 
      - Improve the behavior of the scheduler in presence of lot of
        SCHED_IDLE tasks - in particular they should not impact other
        scheduling classes.
 
      - Optimize task load tracking, cleanups & fixes
 
      - Clean up & simplify misc load-balancing code
 
  - Freezer:
 
      - Rewrite the core freezer to behave better wrt thawing and be simpler
        in general, by replacing PF_FROZEN with TASK_FROZEN & fixing/adjusting
        all the fallout.
 
  - Deadline scheduler:
 
      - Fix the DL capacity-aware code
 
      - Factor out dl_task_is_earliest_deadline() & replenish_dl_new_period()
 
      - Relax/optimize locking in task_non_contending()
 
  - Cleanups:
 
      - Factor out the update_current_exec_runtime() helper
 
      - Various cleanups, simplifications
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/01cRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1geZA/+PB4KC1T9aVxzaTHI36R03YgJYZmIdtxw
 wTf02MixePmz+gQCbepJbempGOh5ST28aOcI0xhdYOql5B63MaUBBMlB0HvGUyDG
 IU3zETqLMRtAbnSTdQFv8m++ECUtZYp8/x1FCel4WO7ya4ETkRu1NRfCoUepEhpZ
 aVAlae9LH3NBaF9t7s0PT2lTjf3pIzMFRkddJ0ywJhbFR3VnWat05fAK+J6fGY8+
 LS54coefNlJD4oDh5TY8uniL1j5SmWmmwbk9Cdj7bLU5P3dFSS0/+5FJNHJPVGDE
 srGT7wstRUcDrN0CnZo48VIUBiApJCCDqTfJYi9wNYd0NAHvwY6MIJJgEIY8mKsI
 L/qH26H81Wt+ezSZ/5JIlGlZ/LIeNaa6OO/fbWEYABBQogvvx3nxsRNUYKSQzumH
 CnSBasBjLnjWyLlK4qARM9cI7NFSEK6NUigrEx/7h8JFu/8T4DlSy6LsF1HUyKgq
 4+FJLAqG6cL0tcwB/fHYd0oRESN8dStnQhGxSojgufwLc7dlFULvCYF5JM/dX+/V
 IKwbOfIOeOn6ViMtSOXAEGdII+IQ2/ZFPwr+8Z5JC7NzvTVL6xlu/3JXkLZR3L7o
 yaXTSaz06h1vil7Z+GRf7RHc+wUeGkEpXh5vnarGZKXivhFdWsBdROIJANK+xR0i
 TeSLCxQxXlU=
 =KjMD
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Debuggability:

   - Change most occurances of BUG_ON() to WARN_ON_ONCE()

   - Reorganize & fix TASK_ state comparisons, turn it into a bitmap

   - Update/fix misc scheduler debugging facilities

  Load-balancing & regular scheduling:

   - Improve the behavior of the scheduler in presence of lot of
     SCHED_IDLE tasks - in particular they should not impact other
     scheduling classes.

   - Optimize task load tracking, cleanups & fixes

   - Clean up & simplify misc load-balancing code

  Freezer:

   - Rewrite the core freezer to behave better wrt thawing and be
     simpler in general, by replacing PF_FROZEN with TASK_FROZEN &
     fixing/adjusting all the fallout.

  Deadline scheduler:

   - Fix the DL capacity-aware code

   - Factor out dl_task_is_earliest_deadline() &
     replenish_dl_new_period()

   - Relax/optimize locking in task_non_contending()

  Cleanups:

   - Factor out the update_current_exec_runtime() helper

   - Various cleanups, simplifications"

* tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  sched: Fix more TASK_state comparisons
  sched: Fix TASK_state comparisons
  sched/fair: Move call to list_last_entry() in detach_tasks
  sched/fair: Cleanup loop_max and loop_break
  sched/fair: Make sure to try to detach at least one movable task
  sched: Show PF_flag holes
  freezer,sched: Rewrite core freezer logic
  sched: Widen TAKS_state literals
  sched/wait: Add wait_event_state()
  sched/completion: Add wait_for_completion_state()
  sched: Add TASK_ANY for wait_task_inactive()
  sched: Change wait_task_inactive()s match_state
  freezer,umh: Clean up freezer/initrd interaction
  freezer: Have {,un}lock_system_sleep() save/restore flags
  sched: Rename task_running() to task_on_cpu()
  sched/fair: Cleanup for SIS_PROP
  sched/fair: Default to false in test_idle_cores()
  sched/fair: Remove useless check in select_idle_core()
  sched/fair: Avoid double search on same cpu
  sched/fair: Remove redundant check in select_idle_smt()
  ...
2022-10-10 09:10:28 -07:00
Linus Torvalds
e572410e47 ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
Recently I had a conversation where it was pointed out to me that
 SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
 difficult for a tracer to handle.
 
 Keeping SIGKILL working after the process has been killed is pain
 from an implementation point of view.
 
 So since the debuggers don't want this behavior let's see if we can
 remove this wart for the userspace API
 
 If a regression is detected it should only need to be the last change
 that is the reverted.  The other two are just general cleanups that
 make the last patch simpler.
 
 Eric W. Biederman (3):
       signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
       signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
       signal: Drop signals received after a fatal signal has been processed
 
  fs/coredump.c                |  2 +-
  include/linux/sched/signal.h |  1 +
  kernel/exit.c                | 20 +++++++++++++++++++-
  kernel/fork.c                |  2 ++
  kernel/signal.c              |  3 ++-
  5 files changed, 25 insertions(+), 3 deletions(-)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmM7U40ACgkQC/v6Eiaj
 j0B7eRAAst6EW4nkxiDBN/PRo43Kz7tkYCZGLAq73vZAaKkyTFSwghT85cZwTsuc
 px/iDPWZpSGdIeHhdt1giJeuFG0FuoMR9prQVx3z/fM6KLXEJb86OtW1c1uW00Bh
 TkhVPQiF/9bc+Eb/nRKF61NA4sP0OXwO1+t/zBL4clekO9vFLP1DpRBE9OrNlHq2
 NlDqoPqq6SsKYG8f+J2LJKKzRWLICvHCtz4uUt6O11Wt/PH2TZYdYnXf/2vvZyzS
 SOs8kQjw4QPSptcNH/LIZz0WNHagn1uU/2/T106LOgPgcr515T4MhqOK+Wp95BjA
 0RrhsfdMD+7HArqLmt9VKgKO9Gs+T/M6jQZgpUyzlw3qPorKUGu4s9UnUgS7l4uz
 oNV9no1Ei2fQ+YR6RBR74579a45FWkqRBsaia59KFCzZxRsQz8VB1cqcIgl9dFUA
 f81qt9FiX8duVYZIcoT79BvV1bQ3LGwyygrH+X3sDTQSdN/aeZ24JdGP2MNtYO/c
 jWN/mMVHc1xOsYmACVGETWhZf0Y7lGGBYSIJ92jT2HxIuEiqmFE4kngRjKcYgL/G
 1/o9z8VntMq5t+ZcQY5WfK/WpSeSPXa6TsP80PEm/hvdMwZLEO4IzCR+gPJB6oJS
 KPQ3EZfRCB2Jb1qBmcpbleA/RBA0iJUZtBvtIPI7QmMTR+VRdhc=
 =gX4y
 -----END PGP SIGNATURE-----

Merge tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace update from Eric Biederman:
 "ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

  Recently I had a conversation where it was pointed out to me that
  SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
  difficult for a tracer to handle.

  Keeping SIGKILL working after the process has been killed is pain from
  an implementation point of view.

  So since the debuggers don't want this behavior let's see if we can
  remove this wart for the userspace API

  If a regression is detected it should only need to be the last change
  that is the reverted. The other two are just general cleanups that
  make the last patch simpler"

* tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Drop signals received after a fatal signal has been processed
  signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
  signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
2022-10-09 16:14:15 -07:00
Alexander Potapenko
50b5e49ca6 kmsan: handle task creation and exiting
Tell KMSAN that a new task is created, so the tool creates a backing
metadata structure for that task.

Link: https://lkml.kernel.org/r/20220915150417.722975-17-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:20 -07:00
Yu Zhao
bd74fdaea1 mm: multi-gen LRU: support page table walks
To further exploit spatial locality, the aging prefers to walk page tables
to search for young PTEs and promote hot pages.  A kill switch will be
added in the next patch to disable this behavior.  When disabled, the
aging relies on the rmap only.

NB: this behavior has nothing similar with the page table scanning in the
2.4 kernel [1], which searches page tables for old PTEs, adds cold pages
to swapcache and unmaps them.

To avoid confusion, the term "iteration" specifically means the traversal
of an entire mm_struct list; the term "walk" will be applied to page
tables and the rmap, as usual.

An mm_struct list is maintained for each memcg, and an mm_struct follows
its owner task to the new memcg when this task is migrated.  Given an
lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot pages
before it increments max_seq.

When multiple page table walkers iterate the same list, each of them gets
a unique mm_struct; therefore they can run concurrently.  Page table
walkers ignore any misplaced pages, e.g., if an mm_struct was migrated,
pages it left in the previous memcg will not be promoted when its current
memcg is under reclaim.  Similarly, page table walkers will not promote
pages from nodes other than the one under reclaim.

This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
   page table walkers can skip processes that have been sleeping since
   the last iteration.
2. It uses generational Bloom filters to record populated branches so
   that page table walkers can reduce their search space based on the
   query results, e.g., to skip page tables containing mostly holes or
   misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when
   CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
   spanning multiple VMAs. IOW, it finishes all the VMAs within the
   range of the same PMD table before it returns to a PGD table. This
   improves the cache performance for workloads that have large
   numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.

Server benchmark results:
  Single workload:
    fio (buffered I/O): no change

  Single workload:
    memcached (anon): +[8, 10]%
                Ops/sec      KB/sec
      patch1-7: 1147696.57   44640.29
      patch1-8: 1245274.91   48435.66

  Configurations:
    no change

Client benchmark results:
  kswapd profiles:
    patch1-7
      48.16%  lzo1x_1_do_compress (real work)
       8.20%  page_vma_mapped_walk (overhead)
       7.06%  _raw_spin_unlock_irq
       2.92%  ptep_clear_flush
       2.53%  __zram_bvec_write
       2.11%  do_raw_spin_lock
       2.02%  memmove
       1.93%  lru_gen_look_around
       1.56%  free_unref_page_list
       1.40%  memset

    patch1-8
      49.44%  lzo1x_1_do_compress (real work)
       6.19%  page_vma_mapped_walk (overhead)
       5.97%  _raw_spin_unlock_irq
       3.13%  get_pfn_folio
       2.85%  ptep_clear_flush
       2.42%  __zram_bvec_write
       2.08%  do_raw_spin_lock
       1.92%  memmove
       1.44%  alloc_zspage
       1.36%  memset

  Configurations:
    no change

Thanks to the following developers for their efforts [3].
  kernel test robot <lkp@intel.com>

[1] https://lwn.net/Articles/23732/
[2] https://llvm.org/docs/ScudoHardenedAllocator.html
[3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/

Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:09 -07:00
Kefeng Wang
2be9880dc8 kernel: exit: cleanup release_thread()
Only x86 has own release_thread(), introduce a new weak release_thread()
function to clean empty definitions in other ARCHs.

Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Guo Ren <guoren@kernel.org>				[csky]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Brian Cain <bcain@quicinc.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
Acked-by: Stafford Horne <shorne@gmail.com>			[openrisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>		[arm64]
Acked-by: Huacai Chen <chenhuacai@kernel.org>			[LoongArch]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org> [csky]
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 21:55:07 -07:00
Peter Zijlstra
f5d39b0208 freezer,sched: Rewrite core freezer logic
Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay!).

Specifically; the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freezer()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in explicit order; kernel
threads and workqueues before doing bringing SMP back before userspace
etc.. However due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem in asymmetric ISA architectures (eg ARMv9)
where the userspace task requires a special CPU to run.

As said; replace this with a special task state TASK_FROZEN and add
the following state transitions:

	TASK_FREEZABLE	-> TASK_FROZEN
	__TASK_STOPPED	-> TASK_FROZEN
	__TASK_TRACED	-> TASK_FROZEN

The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
(IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
is already required to deal with spurious wakeups and the freezer
causes one such when thawing the task (since the original state is
lost).

The special __TASK_{STOPPED,TRACED} states *can* be restored since
their canonical state is in ->jobctl.

With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
free of undue (early / spurious) wakeups.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
2022-09-07 21:53:50 +02:00
Ingo Molnar
dcca34754a exit: Fix typo in comment: s/sub-theads/sub-threads
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-08-03 10:44:54 +02:00
Eric W. Biederman
d80f7d7b2c signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
Track how many threads have not started exiting and when the last
thread starts exiting set SIGNAL_GROUP_EXIT.

This guarantees that SIGNAL_GROUP_EXIT will get set when a process
exits.  In practice this achieves nothing as glibc's implementation of
_exit calls sys_group_exit then sys_exit.  While glibc's implemenation
of pthread_exit calls exit (which cleansup and calls _exit) if it is
the last thread and sys_exit if it is the last thread.

This means the only way the kernel might observe a process that does
not set call exit_group is if the language runtime does not use glibc.

With more cleanups I hope to move the decrement of quick_threads
earlier.

Link: https://lkml.kernel.org/r/87bkukd4tc.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-07-20 10:23:51 -05:00
Eric W. Biederman
cbe9dac379 signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
The function do_group_exit has an optimization that avoids taking
siglock and doing the work to find other threads in the signal group
and shutting them down.

It is very desirable for SIGNAL_GROUP_EXIT to always been set whenever
it is decided for the process to exit.  That ensures only a single
place needs to be tested, and a single bit of state needs to be looked
at.  This makes the optimization in do_group_exit counter productive.

Make the code and maintenance simpler by removing this unnecessary
option.

Link: https://lkml.kernel.org/r/87letod4v3.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-07-20 10:21:52 -05:00
Oleg Nesterov
d5b36a4dbd fix race between exit_itimers() and /proc/pid/timers
As Chris explains, the comment above exit_itimers() is not correct,
we can race with proc_timers_seq_ops. Change exit_itimers() to clear
signal->posix_timers with ->siglock held.

Cc: <stable@vger.kernel.org>
Reported-by: chris@accessvector.net
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-11 09:52:59 -07:00
Linus Torvalds
1930a6e739 ptrace: Cleanups for v5.18
This set of changes removes tracehook.h, moves modification of all of
 the ptrace fields inside of siglock to remove races, adds a missing
 permission check to ptrace.c
 
 The removal of tracehook.h is quite significant as it has been a major
 source of confusion in recent years.  Much of that confusion was
 around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
 making the semantics clearer).
 
 For people who don't know tracehook.h is a vestiage of an attempt to
 implement uprobes like functionality that was never fully merged, and
 was later superseeded by uprobes when uprobes was merged.  For many
 years now we have been removing what tracehook functionaly a little
 bit at a time.  To the point where now anything left in tracehook.h is
 some weird strange thing that is difficult to understand.
 
 Eric W. Biederman (15):
       ptrace: Move ptrace_report_syscall into ptrace.h
       ptrace/arm: Rename tracehook_report_syscall report_syscall
       ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
       ptrace: Remove arch_syscall_{enter,exit}_tracehook
       ptrace: Remove tracehook_signal_handler
       task_work: Remove unnecessary include from posix_timers.h
       task_work: Introduce task_work_pending
       task_work: Call tracehook_notify_signal from get_signal on all architectures
       task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
       signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
       resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
       resume_user_mode: Move to resume_user_mode.h
       tracehook: Remove tracehook.h
       ptrace: Move setting/clearing ptrace_message into ptrace_stop
       ptrace: Return the signal to continue with from ptrace_stop
 
 Jann Horn (1):
       ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
 
 Yang Li (1):
       ptrace: Remove duplicated include in ptrace.c
 
  MAINTAINERS                          |   1 -
  arch/Kconfig                         |   5 +-
  arch/alpha/kernel/ptrace.c           |   5 +-
  arch/alpha/kernel/signal.c           |   4 +-
  arch/arc/kernel/ptrace.c             |   5 +-
  arch/arc/kernel/signal.c             |   4 +-
  arch/arm/kernel/ptrace.c             |  12 +-
  arch/arm/kernel/signal.c             |   4 +-
  arch/arm64/kernel/ptrace.c           |  14 +--
  arch/arm64/kernel/signal.c           |   4 +-
  arch/csky/kernel/ptrace.c            |   5 +-
  arch/csky/kernel/signal.c            |   4 +-
  arch/h8300/kernel/ptrace.c           |   5 +-
  arch/h8300/kernel/signal.c           |   4 +-
  arch/hexagon/kernel/process.c        |   4 +-
  arch/hexagon/kernel/signal.c         |   1 -
  arch/hexagon/kernel/traps.c          |   6 +-
  arch/ia64/kernel/process.c           |   4 +-
  arch/ia64/kernel/ptrace.c            |   6 +-
  arch/ia64/kernel/signal.c            |   1 -
  arch/m68k/kernel/ptrace.c            |   5 +-
  arch/m68k/kernel/signal.c            |   4 +-
  arch/microblaze/kernel/ptrace.c      |   5 +-
  arch/microblaze/kernel/signal.c      |   4 +-
  arch/mips/kernel/ptrace.c            |   5 +-
  arch/mips/kernel/signal.c            |   4 +-
  arch/nds32/include/asm/syscall.h     |   2 +-
  arch/nds32/kernel/ptrace.c           |   5 +-
  arch/nds32/kernel/signal.c           |   4 +-
  arch/nios2/kernel/ptrace.c           |   5 +-
  arch/nios2/kernel/signal.c           |   4 +-
  arch/openrisc/kernel/ptrace.c        |   5 +-
  arch/openrisc/kernel/signal.c        |   4 +-
  arch/parisc/kernel/ptrace.c          |   7 +-
  arch/parisc/kernel/signal.c          |   4 +-
  arch/powerpc/kernel/ptrace/ptrace.c  |   8 +-
  arch/powerpc/kernel/signal.c         |   4 +-
  arch/riscv/kernel/ptrace.c           |   5 +-
  arch/riscv/kernel/signal.c           |   4 +-
  arch/s390/include/asm/entry-common.h |   1 -
  arch/s390/kernel/ptrace.c            |   1 -
  arch/s390/kernel/signal.c            |   5 +-
  arch/sh/kernel/ptrace_32.c           |   5 +-
  arch/sh/kernel/signal_32.c           |   4 +-
  arch/sparc/kernel/ptrace_32.c        |   5 +-
  arch/sparc/kernel/ptrace_64.c        |   5 +-
  arch/sparc/kernel/signal32.c         |   1 -
  arch/sparc/kernel/signal_32.c        |   4 +-
  arch/sparc/kernel/signal_64.c        |   4 +-
  arch/um/kernel/process.c             |   4 +-
  arch/um/kernel/ptrace.c              |   5 +-
  arch/x86/kernel/ptrace.c             |   1 -
  arch/x86/kernel/signal.c             |   5 +-
  arch/x86/mm/tlb.c                    |   1 +
  arch/xtensa/kernel/ptrace.c          |   5 +-
  arch/xtensa/kernel/signal.c          |   4 +-
  block/blk-cgroup.c                   |   2 +-
  fs/coredump.c                        |   1 -
  fs/exec.c                            |   1 -
  fs/io-wq.c                           |   6 +-
  fs/io_uring.c                        |  11 +-
  fs/proc/array.c                      |   1 -
  fs/proc/base.c                       |   1 -
  include/asm-generic/syscall.h        |   2 +-
  include/linux/entry-common.h         |  47 +-------
  include/linux/entry-kvm.h            |   2 +-
  include/linux/posix-timers.h         |   1 -
  include/linux/ptrace.h               |  81 ++++++++++++-
  include/linux/resume_user_mode.h     |  64 ++++++++++
  include/linux/sched/signal.h         |  17 +++
  include/linux/task_work.h            |   5 +
  include/linux/tracehook.h            | 226 -----------------------------------
  include/uapi/linux/ptrace.h          |   2 +-
  kernel/entry/common.c                |  19 +--
  kernel/entry/kvm.c                   |   9 +-
  kernel/exit.c                        |   3 +-
  kernel/livepatch/transition.c        |   1 -
  kernel/ptrace.c                      |  47 +++++---
  kernel/seccomp.c                     |   1 -
  kernel/signal.c                      |  62 +++++-----
  kernel/task_work.c                   |   4 +-
  kernel/time/posix-cpu-timers.c       |   1 +
  mm/memcontrol.c                      |   2 +-
  security/apparmor/domain.c           |   1 -
  security/selinux/hooks.c             |   1 -
  85 files changed, 372 insertions(+), 495 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
 j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
 Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
 Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
 DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
 0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
 S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
 3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
 /HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
 4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
 eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
 disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
 =uEro
 -----END PGP SIGNATURE-----

Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace cleanups from Eric Biederman:
 "This set of changes removes tracehook.h, moves modification of all of
  the ptrace fields inside of siglock to remove races, adds a missing
  permission check to ptrace.c

  The removal of tracehook.h is quite significant as it has been a major
  source of confusion in recent years. Much of that confusion was around
  task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
  semantics clearer).

  For people who don't know tracehook.h is a vestiage of an attempt to
  implement uprobes like functionality that was never fully merged, and
  was later superseeded by uprobes when uprobes was merged. For many
  years now we have been removing what tracehook functionaly a little
  bit at a time. To the point where anything left in tracehook.h was
  some weird strange thing that was difficult to understand"

* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Remove duplicated include in ptrace.c
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  ptrace: Return the signal to continue with from ptrace_stop
  ptrace: Move setting/clearing ptrace_message into ptrace_stop
  tracehook: Remove tracehook.h
  resume_user_mode: Move to resume_user_mode.h
  resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
  signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
  task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
  task_work: Call tracehook_notify_signal from get_signal on all architectures
  task_work: Introduce task_work_pending
  task_work: Remove unnecessary include from posix_timers.h
  ptrace: Remove tracehook_signal_handler
  ptrace: Remove arch_syscall_{enter,exit}_tracehook
  ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
  ptrace/arm: Rename tracehook_report_syscall report_syscall
  ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-28 17:29:53 -07:00
Linus Torvalds
7001052160 Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
 where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
 
 Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
 limited to 2 instructions (and typically fewer) on branch targets not starting
 with ENDBR. CET-IBT also limits speculation of the next sequential instruction
 after the indirect CALL/JMP [1].
 
 CET-IBT is fundamentally incompatible with retpolines, but provides, as
 described above, speculation limits itself.
 
 [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
 bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
 1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
 urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
 BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
 y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
 go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
 9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
 ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
 x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
 dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
 EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
 beB/7u2KIUbKEkSN
 =jZfK
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
 "Add support for Intel CET-IBT, available since Tigerlake (11th gen),
  which is a coarse grained, hardware based, forward edge
  Control-Flow-Integrity mechanism where any indirect CALL/JMP must
  target an ENDBR instruction or suffer #CP.

  Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
  is limited to 2 instructions (and typically fewer) on branch targets
  not starting with ENDBR. CET-IBT also limits speculation of the next
  sequential instruction after the indirect CALL/JMP [1].

  CET-IBT is fundamentally incompatible with retpolines, but provides,
  as described above, speculation limits itself"

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  kvm/emulate: Fix SETcc emulation for ENDBR
  x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
  x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
  kbuild: Fixup the IBT kbuild changes
  x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
  x86: Remove toolchain check for X32 ABI capability
  x86/alternative: Use .ibt_endbr_seal to seal indirect calls
  objtool: Find unused ENDBR instructions
  objtool: Validate IBT assumptions
  objtool: Add IBT/ENDBR decoding
  objtool: Read the NOENDBR annotation
  x86: Annotate idtentry_df()
  x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
  x86: Annotate call_on_stack()
  objtool: Rework ASM_REACHABLE
  x86: Mark __invalid_creds() __noreturn
  exit: Mark do_group_exit() __noreturn
  x86: Mark stop_this_cpu() __noreturn
  objtool: Ignore extra-symbol code
  objtool: Rename --duplicate to --lto
  ...
2022-03-27 10:17:23 -07:00
Linus Torvalds
169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Linus Torvalds
194dfe88d6 asm-generic updates for 5.18
There are three sets of updates for 5.18 in the asm-generic tree:
 
  - The set_fs()/get_fs() infrastructure gets removed for good. This
    was already gone from all major architectures, but now we can
    finally remove it everywhere, which loses some particularly
    tricky and error-prone code.
    There is a small merge conflict against a parisc cleanup, the
    solution is to use their new version.
 
  - The nds32 architecture ends its tenure in the Linux kernel. The
    hardware is still used and the code is in reasonable shape, but
    the mainline port is not actively maintained any more, as all
    remaining users are thought to run vendor kernels that would never
    be updated to a future release.
    There are some obvious conflicts against changes to the removed
    files.
 
  - A series from Masahiro Yamada cleans up some of the uapi header
    files to pass the compile-time checks.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
 GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
 IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
 cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
 sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
 DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
 G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
 a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
 ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
 QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
 CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
 h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
 =vtCN
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "There are three sets of updates for 5.18 in the asm-generic tree:

   - The set_fs()/get_fs() infrastructure gets removed for good.

     This was already gone from all major architectures, but now we can
     finally remove it everywhere, which loses some particularly tricky
     and error-prone code. There is a small merge conflict against a
     parisc cleanup, the solution is to use their new version.

   - The nds32 architecture ends its tenure in the Linux kernel.

     The hardware is still used and the code is in reasonable shape, but
     the mainline port is not actively maintained any more, as all
     remaining users are thought to run vendor kernels that would never
     be updated to a future release.

   - A series from Masahiro Yamada cleans up some of the uapi header
     files to pass the compile-time checks"

* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
  nds32: Remove the architecture
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  sparc64: fix building assembly files
  ...
2022-03-23 18:03:08 -07:00
Linus Torvalds
616355cc81 for-5.18/block-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0+GcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUpD/9aTJEnj7VCw7UouSsg098sdjtoy9ilslU3
 ew47K8CIXHbCB4CDqLnFyvCwAdG1XGgS+fUmFAxvTr29R9SZeS5d+bXL6sZzEo0C
 bwxsJy9MM2QRtMvB+giAt1myXbwB8cG+ketMBWXqwXXRHRzPbbQfMZia7FqWMnfY
 KQanH9IwYHp1oa5U/W6Qcjm4oCnLgBMRwqByzUCtiF3y9qgaLkK+3IgkNwjJQjLA
 DTeUJ/9CgxGQQbzA+LPktbw2xfTqiUfcKq0mWx6Zt4wwNXn1ClqUDUXX6QSM8/5u
 3OimbscSkEPPTIYZbVBPkhFnAlQb4JaJEgOrbXvYKVV2Dh+eZY81XwNeE/E8gdBY
 TnHOTOCjkN/4sR3hIrWazlJzPLdpPA0eOYrhguCraQsX9mcsYNxlJ9otRv/Ve99g
 uqL0RZg3+NoK84fm79FCGy/ZmPQJvJttlBT9CKVwylv/Lky42xWe7AdM3OipKluY
 2nh+zN5Ai7WxZdTKXQFRhCSWfWQ+1qW51tB3dcGW+BooZr/oox47qKQVcHsEWbq1
 RNR45F5a4AuPwYUHF/P36WviLnEuq9AvX7OTTyYOplyVQohKIoDXp9chVzLNzBiZ
 KBR00W6MLKKKN+8foalQWgNyb2i2PH7Ib4xRXvXj/22Vwxg5UmUoBmSDSas9SZUS
 +dMo7CtNgA==
 =DpgP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Masami Hiramatsu
54ecbe6f1e rethook: Add a generic return hook
Add a return hook framework which hooks the function return. Most of the
logic came from the kretprobe, but this is independent from kretprobe.

Note that this is expected to be used with other function entry hooking
feature, like ftrace, fprobe, adn kprobes. Eventually this will replace
the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment,
this is just an additional hook.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
2022-03-17 20:16:29 -07:00
Peter Zijlstra
eae654f1c2 exit: Mark do_group_exit() __noreturn
vmlinux.o: warning: objtool: get_signal()+0x108: unreachable instruction

0000 000000000007f930 <get_signal>:
...
0103    7fa33:  e8 00 00 00 00          call   7fa38 <get_signal+0x108> 7fa34: R_X86_64_PLT32   do_group_exit-0x4
0108    7fa38:  41 8b 45 74             mov    0x74(%r13),%eax

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.351270711@infradead.org
2022-03-15 10:32:43 +01:00
Eric W. Biederman
355f841a3f tracehook: Remove tracehook.h
Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.

Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.

Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10 16:51:51 -06:00
Arnd Bergmann
967747bbc0 uaccess: remove CONFIG_SET_FS
There are no remaining callers of set_fs(), so CONFIG_SET_FS
can be removed globally, along with the thread_info field and
any references to it.

This turns access_ok() into a cheaper check against TASK_SIZE_MAX.

As CONFIG_SET_FS is now gone, drop all remaining references to
set_fs()/get_fs(), mm_segment_t, user_addr_max() and uaccess_kernel().

Acked-by: Sam Ravnborg <sam@ravnborg.org> # for sparc32 changes
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> # for arc changes
Acked-by: Stafford Horne <shorne@gmail.com> # [openrisc, asm-generic]
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25 09:36:06 +01:00
Sebastian Andrzej Siewior
1a03d3f13f fork: Move task stack accounting to do_exit()
There is no need to perform the stack accounting of the outgoing task in
its final schedule() invocation which happens with preemption disabled.
The task is leaving, the resources will be freed and the accounting can
happen in do_exit() before the actual schedule invocation which
frees the stack memory.

Move the accounting of the stack memory from release_task_stack() to
exit_task_stack_account() which then can be invoked from do_exit().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220217102406.3697941-7-bigeasy@linutronix.de
2022-02-22 22:25:02 +01:00
Christoph Hellwig
b1f866b013 block: remove blk_needs_flush_plug
blk_needs_flush_plug fails to account for the cb_list, which needs
flushing as well.  Remove it and just check if there is a plug instead
of poking into the internals of the plug structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220127070549.1377856-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:50:00 -07:00
Eric W. Biederman
907c311f37 exit: Fix the exit_code for wait_task_zombie
The function wait_task_zombie is defined to always returns the process not
thread exit status.  Unfortunately when process group exit support
was added to wait_task_zombie the WNOWAIT case was overlooked.

Usually tsk->exit_code and tsk->signal->group_exit_code will be in sync
so fixing this is bug probably has no effect in practice.  But fix
it anyway so that people aren't scratching their heads about why
the two code paths are different.

History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 2c66151cbc2c ("[PATCH] sys_exit() threading improvements, BK-curr")
Link: https://lkml.kernel.org/r/20220103213312.9144-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 12:43:57 -06:00
Eric W. Biederman
270b6541e6 exit: Coredumps reach do_group_exit
The comment about coredumps not reaching do_group_exit and the
corresponding BUG_ON are bogus.

What happens and has happened for years is that get_signal calls
do_coredump (which sets SIGNAL_GROUP_EXIT and group_exit_code) and
then do_group_exit passing the signal number.  Then do_group_exit
ignores the exit_code it is passed and uses signal->group_exit_code
from the coredump.

The comment and BUG_ON were correct when they were added during the
2.5 development cycle, but became obsolete and incorrect when
get_signal was changed to fall through to do_group_exit after
do_coredump in 2.6.10-rc2.

So remove the stale comment and BUG_ON

Fixes: 63bd6144f191 ("[PATCH] Invalid BUG_ONs in signal.c")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/20220103213312.9144-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 12:43:57 -06:00
Eric W. Biederman
2d4bcf886e exit: Remove profile_task_exit & profile_munmap
When I say remove I mean remove.  All profile_task_exit and
profile_munmap do is call a blocking notifier chain.  The helpers
profile_task_register and profile_task_unregister are not called
anywhere in the tree.  Which means this is all dead code.

So remove the dead code and make it easier to read do_exit.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/20220103213312.9144-1-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 12:43:57 -06:00
Eric W. Biederman
49697335e0 signal: Remove the helper signal_group_exit
This helper is misleading.  It tests for an ongoing exec as well as
the process having received a fatal signal.

Sometimes it is appropriate to treat an on-going exec differently than
a process that is shutting down due to a fatal signal.  In particular
taking the fast path out of exit_signals instead of retargeting
signals is not appropriate during exec, and not changing the the exit
code in do_group_exit during exec.

Removing the helper makes it more obvious what is going on as both
cases must be coded for explicitly.

While removing the helper fix the two cases where I have observed
using signal_group_exit resulted in the wrong result.

In exit_signals only test for SIGNAL_GROUP_EXIT so that signals are
retargetted during an exec.

In do_group_exit use 0 as the exit code during an exec as de_thread
does not set group_exit_code.  As best as I can determine
group_exit_code has been is set to 0 most of the time during
de_thread.  During a thread group stop group_exit_code is set to the
stop signal and when the thread group receives SIGCONT group_exit_code
is reset to 0.

Link: https://lkml.kernel.org/r/20211213225350.27481-8-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 12:43:57 -06:00
Eric W. Biederman
60700e38fb signal: Rename group_exit_task group_exec_task
The only remaining user of group_exit_task is exec.  Rename the field
so that it is clear which part of the code uses it.

Update the comment above the definition of group_exec_task to document
how it is currently used.

Link: https://lkml.kernel.org/r/20211213225350.27481-7-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 12:43:57 -06:00
Eric W. Biederman
de77c3a5b9 exit: Move force_uaccess back into do_exit
With kernel threads on architectures that still have set_fs/get_fs
running as KERNEL_DS moving force_uaccess_begin does not appear safe.
Calling force_uaccess_begin is a noop on anything people care about.

Update the comment to explain why this code while looking like an
obvious candidate for moving to make_task_dead probably needs to
remain in do_exit until set_fs/get_fs are entirely removed from the
kernel.

Fixes: 05ea0424f0 ("exit: Move oops specific logic from do_exit into make_task_dead")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/YdUxGKRcSiDy8jGg@zeniv-ca.linux.org.uk
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 10:53:07 -06:00
Eric W. Biederman
912616f142 exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
Change the task state to EXIT_DEAD and take an extra rcu_refernce
to guarantee the task will not be reaped and that it will not be
freed.

Link: https://lkml.kernel.org/r/YdUzjrLAlRiNLQp2@zeniv-ca.linux.org.uk
Pointed-out-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 7f80a2fd7d ("exit: Stop poorly open coding do_task_dead in make_task_dead")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 10:51:23 -06:00
Eric W. Biederman
cead185526 exit: Rename complete_and_exit to kthread_complete_and_exit
Update complete_and_exit to call kthread_exit instead of do_exit.

Change the name to reflect this change in functionality.  All of the
users of complete_and_exit are causing the current kthread to exit so
this change makes it clear what is happening.

Move the implementation of kthread_complete_and_exit from
kernel/exit.c to to kernel/kthread.c.  As this function is kthread
specific it makes most sense to live with the kthread functions.

There are no functional change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Eric W. Biederman
eb55e716ac exit: Stop exporting do_exit
Now that there are no more modular uses of do_exit remove the EXPORT_SYMBOL.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Eric W. Biederman
7f80a2fd7d exit: Stop poorly open coding do_task_dead in make_task_dead
When the kernel detects it is oops or otherwise force killing a task
while it exits the code poorly attempts to permanently stop the task
from scheduling.

I say poorly because it is possible for a task in TASK_UINTERRUPTIBLE
to be woken up.

As it makes no sense for the task to continue call do_task_dead
instead which actually does the work and permanently removes the task
from the scheduler.  Guaranteeing the task will never be woken
up again.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Eric W. Biederman
05ea0424f0 exit: Move oops specific logic from do_exit into make_task_dead
The beginning of do_exit has become cluttered and difficult to read as
it is filled with checks to handle things that can only happen when
the kernel is operating improperly.

Now that we have a dedicated function for cleaning up a task when the
kernel is operating improperly move the checks there.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Eric W. Biederman
0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Linus Torvalds
a602285ac1 Merge branch 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull per signal_struct coredumps from Eric Biederman:
 "Current coredumps are mixed up with the exit code, the signal handling
  code, and the ptrace code making coredumps much more complicated than
  necessary and difficult to follow.

  This series of changes starts with ptrace_stop and cleans it up,
  making it easier to follow what is happening in ptrace_stop. Then
  cleans up the exec interactions with coredumps. Then cleans up the
  coredump interactions with exit. Finally the coredump interactions
  with the signal handling code is cleaned up.

  The first and last changes are bug fixes for minor bugs.

  I believe the fact that vfork followed by execve can kill the process
  the called vfork if exec fails is sufficient justification to change
  the userspace visible behavior.

  In previous discussions some of these changes were organized
  differently and individually appeared to make the code base worse. As
  currently written I believe they all stand on their own as cleanups
  and bug fixes.

  Which means that even if the worst should happen and the last change
  needs to be reverted for some unimaginable reason, the code base will
  still be improved.

  If the worst does not happen there are a more cleanups that can be
  made. Signals that generate coredumps can easily become eligible for
  short circuit delivery in complete_signal. The entire rendezvous for
  generating a coredump can move into get_signal. The function
  force_sig_info_to_task be written in a way that does not modify the
  signal handling state of the target task (because coredumps are
  eligible for short circuit delivery). Many of these future cleanups
  can be done another way but nothing so cleanly as if coredumps become
  per signal_struct"

* 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  coredump: Limit coredumps to a single thread group
  coredump:  Don't perform any cleanups before dumping core
  exit: Factor coredump_exit_mm out of exit_mm
  exec: Check for a pending fatal signal instead of core_state
  ptrace: Remove the unnecessary arguments from arch_ptrace_stop
  signal: Remove the bogus sigkill_pending in ptrace_stop
2021-11-03 12:15:29 -07:00
Linus Torvalds
9a7e0a90a4 Scheduler updates:
- Revert the printk format based wchan() symbol resolution as it can leak
    the raw value in case that the symbol is not resolvable.
 
  - Make wchan() more robust and work with all kind of unwinders by
    enforcing that the task stays blocked while unwinding is in progress.
 
  - Prevent sched_fork() from accessing an invalid sched_task_group
 
  - Improve asymmetric packing logic
 
  - Extend scheduler statistics to RT and DL scheduling classes and add
    statistics for bandwith burst to the SCHED_FAIR class.
 
  - Properly account SCHED_IDLE entities
 
  - Prevent a potential deadlock when initial priority is assigned to a
    newly created kthread. A recent change to plug a race between cpuset and
    __sched_setscheduler() introduced a new lock dependency which is now
    triggered. Break the lock dependency chain by moving the priority
    assignment to the thread function.
 
  - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
 
  - Improve idle balancing in general and especially for NOHZ enabled
    systems.
 
  - Provide proper interfaces for live patching so it does not have to
    fiddle with scheduler internals.
 
  - Add cluster aware scheduling support.
 
  - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
    scheduler options and delaying mmdrop)
 
  - The usual small tweaks and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
 iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
 /1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
 aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
 3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
 ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
 vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
 mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
 V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
 s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
 i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
 d2qWG7UvoseT+g==
 =fgtS
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - Revert the printk format based wchan() symbol resolution as it can
   leak the raw value in case that the symbol is not resolvable.

 - Make wchan() more robust and work with all kind of unwinders by
   enforcing that the task stays blocked while unwinding is in progress.

 - Prevent sched_fork() from accessing an invalid sched_task_group

 - Improve asymmetric packing logic

 - Extend scheduler statistics to RT and DL scheduling classes and add
   statistics for bandwith burst to the SCHED_FAIR class.

 - Properly account SCHED_IDLE entities

 - Prevent a potential deadlock when initial priority is assigned to a
   newly created kthread. A recent change to plug a race between cpuset
   and __sched_setscheduler() introduced a new lock dependency which is
   now triggered. Break the lock dependency chain by moving the priority
   assignment to the thread function.

 - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.

 - Improve idle balancing in general and especially for NOHZ enabled
   systems.

 - Provide proper interfaces for live patching so it does not have to
   fiddle with scheduler internals.

 - Add cluster aware scheduling support.

 - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
   scheduler options and delaying mmdrop)

 - The usual small tweaks and improvements all over the place

* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  sched/fair: Cleanup newidle_balance
  sched/fair: Remove sysctl_sched_migration_cost condition
  sched/fair: Wait before decaying max_newidle_lb_cost
  sched/fair: Skip update_blocked_averages if we are defering load balance
  sched/fair: Account update_blocked_averages in newidle_balance cost
  x86: Fix __get_wchan() for !STACKTRACE
  sched,x86: Fix L2 cache mask
  sched/core: Remove rq_relock()
  sched: Improve wake_up_all_idle_cpus() take #2
  irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
  irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
  irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
  sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
  sched: Add cluster scheduler level for x86
  sched: Add cluster scheduler level in core and related Kconfig for ARM64
  topology: Represent clusters of CPUs within a die
  sched: Disable -Wunused-but-set-variable
  sched: Add wrapper for get_wchan() to keep task blocked
  x86: Fix get_wchan() to support the ORC unwinder
  proc: Use task_is_running() for wchan in /proc/$pid/stat
  ...
2021-11-01 13:48:52 -07:00
Christoph Hellwig
545c6647d2 kernel: remove spurious blkdev.h includes
Various files have acquired spurious includes of <linux/blkdev.h> over
time.  Remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:01 -06:00
Eric W. Biederman
0258b5fd7c coredump: Limit coredumps to a single thread group
Today when a signal is delivered with a handler of SIG_DFL whose
default behavior is to generate a core dump not only that process but
every process that shares the mm is killed.

In the case of vfork this looks like a real world problem.  Consider
the following well defined sequence.

	if (vfork() == 0) {
		execve(...);
		_exit(EXIT_FAILURE);
	}

If a signal that generates a core dump is received after vfork but
before the execve changes the mm the process that called vfork will
also be killed (as the mm is shared).

Similarly if the execve fails after the point of no return the kernel
delivers SIGSEGV which will kill both the exec'ing process and because
the mm is shared the process that called vfork as well.

As far as I can tell this behavior is a violation of people's
reasonable expectations, POSIX, and is unnecessarily fragile when the
system is low on memory.

Solve this by making a userspace visible change to only kill a single
process/thread group.  This is possible because Jann Horn recently
modified[1] the coredump code so that the mm can safely be modified
while the coredump is happening.  With LinuxThreads long gone I don't
expect anyone to have a notice this behavior change in practice.

To accomplish this move the core_state pointer from mm_struct to
signal_struct, which allows different thread groups to coredump
simultatenously.

In zap_threads remove the work to kill anything except for the current
thread group.

v2: Remove core_state from the VM_BUG_ON_MM print to fix
    compile failure when CONFIG_DEBUG_VM is enabled.
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>

[1] a07279c9a8 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133
Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-08 12:06:02 -05:00
Eric W. Biederman
9230738308 coredump: Don't perform any cleanups before dumping core
Rename coredump_exit_mm to coredump_task_exit and call it from do_exit
before PTRACE_EVENT_EXIT, and before any cleanup work for a task
happens.  This ensures that an accurate copy of the process can be
captured in the coredump as no cleanup for the process happens before
the coredump completes.  This also ensures that PTRACE_EVENT_EXIT
will not be visited by any thread until the coredump is complete.

Add a new flag PF_POSTCOREDUMP so that tasks that have passed through
coredump_task_exit can be recognized and ignored in zap_process.

Now that all of the coredumping happens before exit_mm remove code to
test for a coredump in progress from mm_release.

Replace "may_ptrace_stop()" with a simple test of "current->ptrace".
The other tests in may_ptrace_stop all concern avoiding stopping
during a coredump.  These tests are no longer necessary as it is now
guaranteed that fatal_signal_pending will be set if the code enters
ptrace_stop during a coredump.  The code in ptrace_stop is guaranteed
not to stop if fatal_signal_pending returns true.

Until this change "ptrace_event(PTRACE_EVENT_EXIT)" could call
ptrace_stop without fatal_signal_pending being true, as signals are
dequeued in get_signal before calling do_exit.  This is no longer
an issue as "ptrace_event(PTRACE_EVENT_EXIT)" is no longer reached
until after the coredump completes.

Link: https://lkml.kernel.org/r/874kaax26c.fsf@disp2133
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-06 11:28:39 -05:00
Eric W. Biederman
d67e03e361 exit: Factor coredump_exit_mm out of exit_mm
Separate the coredump logic from the ordinary exit_mm logic
by moving the coredump logic out of exit_mm into it's own
function coredump_exit_mm.

Link: https://lkml.kernel.org/r/87a6k2x277.fsf@disp2133
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-06 11:28:21 -05:00