To allow more flexible arrangements while still provide a single kernel
for distros, provide a boot time parameter to enable/disable lazy RCU.
Specify:
rcutree.enable_rcu_lazy=[y|1|n|0]
Which also requires
rcu_nocbs=all
at boot time to enable/disable lazy RCU.
To disable it by default at build time when CONFIG_RCU_LAZY=y, the new
CONFIG_RCU_LAZY_DEFAULT_OFF can be used.
Bug: 258241771
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/20231203011252.233748-1-qyousef@layalina.io/
[Fix trivial conflicts rejecting newer code that doesn't exist on 6.1]
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Ib5585ae717a2ba7749f2802101b785c4e5de8a90
On many systems, a great deal of boot (in userspace) happens after the
kernel thinks the boot has completed. It is difficult to determine if
the system has really booted from the kernel side. Some features like
lazy-RCU can risk slowing down boot time if, say, a callback has been
added that the boot synchronously depends on. Further expedited callbacks
can get unexpedited way earlier than it should be, thus slowing down
boot (as shown in the data below).
For these reasons, this commit adds a config option
'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay.
Userspace can also make RCU's view of the system as booted, by writing the
time in milliseconds to: /sys/module/rcupdate/parameters/rcu_boot_end_delay
Or even just writing a value of 0 to this sysfs node.
However, under no circumstance will the boot be allowed to end earlier
than just before init is launched.
The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This
suites ChromeOS and also a PREEMPT_RT system below very well, which need
no config or parameter changes, and just a simple application of this patch. A
system designer can also choose a specific value here to keep RCU from marking
boot completion. As noted earlier, RCU's perspective of the system as booted
will not be marker until at least rcu_boot_end_delay milliseconds have passed
or an update is made via writing a small value (or 0) in milliseconds to:
/sys/module/rcupdate/parameters/rcu_boot_end_delay.
One side-effect of this patch is, there is a risk that a real-time workload
launched just after the kernel boots will suffer interruptions due to expedited
RCU, which previous ended just before init was launched. However, to mitigate
such an issue (however unlikely), the user should either tune
CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value
of 0 to /sys/module/rcupdate/parameters/rcu_boot_end_delay, once userspace
boots, and before launching the real-time workload.
Qiuxu also noted impressive boot-time improvements with earlier version
of patch. An excerpt from the data he shared:
1) Testing environment:
OS : CentOS Stream 8 (non-RT OS)
Kernel : v6.2
Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads)
Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, …
2) OS boot time definition:
The time from the start of the kernel boot to the shell command line
prompt is shown from the console. [ Different people may have
different OS boot time definitions. ]
3) Measurement method (very rough method):
A timer in the kernel periodically prints the boot time every 100ms.
As soon as the shell command line prompt is shown from the console,
we record the boot time printed by the timer, then the printed boot
time is the OS boot time.
4) Measured OS boot time (in seconds)
a) Measured 10 times w/o this patch:
8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s
The average OS boot time was: ~8.7s
b) Measure 10 times w/ this patch:
8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s
The average OS boot time was: ~8.3s.
(CHROMIUM tag rationale: Submitted upstream but got lots of pushback as
it may harm a PREEMPT_RT system -- the concern is VERY theoretical and
this improves things for ChromeOS. Plus we are not a PREEMPT_RT system.
So I am strongly suggesting this mostly simple change for ChromeOS.)
Bug: 258241771
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4350228
Commit-Queue: Joel Fernandes <joelaf@google.com>
Commit-Queue: Vineeth Pillai <vineethrp@google.com>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Joel Fernandes <joelaf@google.com>
Reviewed-by: Vineeth Pillai <vineethrp@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4909180
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Ibd262189d7f92dbcc57f1508efe90fcfba95a6cc
Changes in 6.1.51
ACPI: thermal: Drop nocrt parameter
module: Expose module_init_layout_section()
arm64: module-plts: inline linux/moduleloader.h
arm64: module: Use module_init_layout_section() to spot init sections
ARM: module: Use module_init_layout_section() to spot init sections
lockdep: fix static memory detection even more
parisc: Cleanup mmap implementation regarding color alignment
parisc: sys_parisc: parisc_personality() is called from asm code
io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc
kallsyms: Fix kallsyms_selftest failure
thunderbolt: Fix a backport error for display flickering issue
Linux 6.1.51
Change-Id: I8bc79fc29ebf10ba654c16b771af1519eea39b38
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 6.1.47
mmc: sdhci-f-sdh30: Replace with sdhci_pltfm
cpuidle: psci: Extend information in log about OSI/PC mode
cpuidle: psci: Move enabling OSI mode after power domains creation
zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks
zsmalloc: fix races between modifications of fullness and isolated
selftests: forwarding: tc_actions: cleanup temporary files when test is aborted
selftests: forwarding: tc_actions: Use ncat instead of nc
net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore
net/smc: Fix setsockopt and sysctl to specify same buffer size again
net: phy: at803x: Use devm_regulator_get_enable_optional()
net: phy: at803x: fix the wol setting functions
drm/amdgpu: fix calltrace warning in amddrm_buddy_fini
drm/amdgpu: Fix integer overflow in amdgpu_cs_pass1
drm/amdgpu: fix memory leak in mes self test
ASoC: Intel: sof_sdw: add quirk for MTL RVP
ASoC: Intel: sof_sdw: add quirk for LNL RVP
PCI: tegra194: Fix possible array out of bounds access
ASoC: SOF: amd: Add pci revision id check
drm/stm: ltdc: fix late dereference check
drm: rcar-du: remove R-Car H3 ES1.* workarounds
ASoC: amd: vangogh: Add check for acp config flags in vangogh platform
ARM: dts: imx6dl: prtrvt, prtvt7, prti6q, prtwd2: fix USB related warnings
ASoC: Intel: sof_sdw_rt_sdca_jack_common: test SOF_JACK_JDSRC in _exit
ASoC: Intel: sof_sdw: Add support for Rex soundwire
iopoll: Call cpu_relax() in busy loops
ASoC: SOF: Intel: fix SoundWire/HDaudio mutual exclusion
dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
accel/habanalabs: add pci health check during heartbeat
HID: logitech-hidpp: Add USB and Bluetooth IDs for the Logitech G915 TKL Keyboard
iommu/amd: Introduce Disable IRTE Caching Support
drm/amdgpu: install stub fence into potential unused fence pointers
drm/amd/display: Apply 60us prefetch for DCFCLK <= 300Mhz
RDMA/mlx5: Return the firmware result upon destroying QP/RQ
drm/amd/display: Skip DPP DTO update if root clock is gated
drm/amd/display: Enable dcn314 DPP RCO
ASoC: SOF: core: Free the firmware trace before calling snd_sof_shutdown()
HID: intel-ish-hid: ipc: Add Arrow Lake PCI device ID
ALSA: hda/realtek: Add quirks for ROG ALLY CS35l41 audio
smb: client: fix warning in cifs_smb3_do_mount()
cifs: fix session state check in reconnect to avoid use-after-free issue
serial: stm32: Ignore return value of uart_remove_one_port() in .remove()
led: qcom-lpg: Fix resource leaks in for_each_available_child_of_node() loops
media: v4l2-mem2mem: add lock to protect parameter num_rdy
media: camss: set VFE bpl_alignment to 16 for sdm845 and sm8250
usb: gadget: u_serial: Avoid spinlock recursion in __gs_console_push
usb: gadget: uvc: queue empty isoc requests if no video buffer is available
media: platform: mediatek: vpu: fix NULL ptr dereference
thunderbolt: Read retimer NVM authentication status prior tb_retimer_set_inbound_sbtx()
usb: chipidea: imx: don't request QoS for imx8ulp
usb: chipidea: imx: add missing USB PHY DPDM wakeup setting
gfs2: Fix possible data races in gfs2_show_options()
pcmcia: rsrc_nonstatic: Fix memory leak in nonstatic_release_resource_db()
thunderbolt: Add Intel Barlow Ridge PCI ID
thunderbolt: Limit Intel Barlow Ridge USB3 bandwidth
firewire: net: fix use after free in fwnet_finish_incoming_packet()
watchdog: sp5100_tco: support Hygon FCH/SCH (Server Controller Hub)
Bluetooth: L2CAP: Fix use-after-free
Bluetooth: btusb: Add MT7922 bluetooth ID for the Asus Ally
ceph: try to dump the msgs when decoding fails
drm/amdgpu: Fix potential fence use-after-free v2
fs/ntfs3: Enhance sanity check while generating attr_list
fs: ntfs3: Fix possible null-pointer dereferences in mi_read()
fs/ntfs3: Mark ntfs dirty when on-disk struct is corrupted
ALSA: hda/realtek: Add quirks for Unis H3C Desktop B760 & Q760
ALSA: hda: fix a possible null-pointer dereference due to data race in snd_hdac_regmap_sync()
ALSA: hda/realtek: Add quirk for ASUS ROG GX650P
ALSA: hda/realtek: Add quirk for ASUS ROG GA402X
ALSA: hda/realtek: Add quirk for ASUS ROG GZ301V
powerpc/kasan: Disable KCOV in KASAN code
Bluetooth: MGMT: Use correct address for memcpy()
ring-buffer: Do not swap cpu_buffer during resize process
igc: read before write to SRRCTL register
drm/amd/display: save restore hdcp state when display is unplugged from mst hub
drm/amd/display: phase3 mst hdcp for multiple displays
drm/amd/display: fix access hdcp_workqueue assert
KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption
ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
fbdev/hyperv-fb: Do not set struct fb_info.apertures
video/aperture: Only remove sysfb on the default vga pci device
btrfs: move out now unused BG from the reclaim list
btrfs: convert btrfs_block_group::needs_free_space to runtime flag
btrfs: convert btrfs_block_group::seq_zone to runtime flag
btrfs: fix use-after-free of new block group that became unused
virtio-mmio: don't break lifecycle of vm_dev
vduse: Use proper spinlock for IRQ injection
vdpa/mlx5: Fix mr->initialized semantics
vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
cifs: fix potential oops in cifs_oplock_break
i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issue
i2c: hisi: Only handle the interrupt of the driver's transfer
i2c: tegra: Fix i2c-tegra DMA config option processing
fbdev: mmp: fix value check in mmphw_probe()
powerpc/rtas_flash: allow user copy to flash block cache objects
vdpa: Add features attr to vdpa_nl_policy for nlattr length check
vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check
vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check
vdpa: Enable strict validation for netlinks ops
tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_mux
tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms
btrfs: fix incorrect splitting in btrfs_drop_extent_map_range
btrfs: fix BUG_ON condition in btrfs_cancel_balance
i2c: designware: Correct length byte validation logic
i2c: designware: Handle invalid SMBus block data response length value
net: xfrm: Fix xfrm_address_filter OOB read
net: af_key: fix sadb_x_filter validation
net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
xfrm: fix slab-use-after-free in decode_session6
ip6_vti: fix slab-use-after-free in decode_session6
ip_vti: fix potential slab-use-after-free in decode_session6
xfrm: add NULL check in xfrm_update_ae_params
xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH
virtio_net: notify MAC address change on device initialization
virtio-net: set queues after driver_ok
net: pcs: Add missing put_device call in miic_create
net: phy: fix IRQ-based wake-on-lan over hibernate / power off
selftests: mirror_gre_changes: Tighten up the TTL test match
drm/panel: simple: Fix AUO G121EAN01 panel timings according to the docs
net: macb: In ZynqMP resume always configure PS GTR for non-wakeup source
octeon_ep: cancel tx_timeout_task later in remove sequence
netfilter: nf_tables: fix false-positive lockdep splat
netfilter: nf_tables: deactivate catchall elements in next generation
ipvs: fix racy memcpy in proc_do_sync_threshold
netfilter: nft_dynset: disallow object maps
net: phy: broadcom: stub c45 read/write for 54810
team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
net: openvswitch: reject negative ifindex
iavf: fix FDIR rule fields masks validation
i40e: fix misleading debug logs
net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset
sfc: don't unregister flow_indr if it was never registered
sock: Fix misuse of sk_under_memory_pressure()
net: do not allow gso_size to be set to GSO_BY_FRAGS
qede: fix firmware halt over suspend and resume
ice: Block switchdev mode when ADQ is active and vice versa
bus: ti-sysc: Flush posted write on enable before reset
arm64: dts: qcom: qrb5165-rb5: fix thermal zone conflict
arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4
arm64: dts: rockchip: Disable HS400 for eMMC on ROCK 4C+
ARM: dts: imx: align LED node names with dtschema
ARM: dts: imx6: phytec: fix RTC interrupt level
arm64: dts: imx8mm: Drop CSI1 PHY reference clock configuration
ARM: dts: imx: Set default tuning step for imx6sx usdhc
arm64: dts: imx93: Fix anatop node size
ASoC: rt5665: add missed regulator_bulk_disable
ASoC: meson: axg-tdm-formatter: fix channel slot allocation
ALSA: hda/realtek: Add quirks for HP G11 Laptops
soc: aspeed: uart-routing: Use __sysfs_match_string
soc: aspeed: socinfo: Add kfree for kstrdup
ALSA: hda/realtek - Remodified 3k pull low procedure
riscv: uaccess: Return the number of bytes effectively not copied
serial: 8250: Fix oops for port->pm on uart_change_pm()
ALSA: usb-audio: Add support for Mythware XA001AU capture and playback interfaces.
cifs: Release folio lock on fscache read hit.
virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case
arm64: dts: rockchip: Fix Wifi/Bluetooth on ROCK Pi 4 boards
blk-crypto: dynamically allocate fallback profile
mmc: wbsd: fix double mmc_free_host() in wbsd_init()
mmc: block: Fix in_flight[issue_type] value error
drm/qxl: fix UAF on handle creation
drm/i915/sdvo: fix panel_type initialization
drm/amd: flush any delayed gfxoff on suspend entry
drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
drm/amdgpu/pm: fix throttle_status for other than MP1 11.0.7
ASoC: amd: vangogh: select CONFIG_SND_AMD_ACP_CONFIG
drm/amd/display: disable RCO for DCN314
zsmalloc: allow only one active pool compaction context
sched/fair: unlink misfit task from cpu overutilized
sched/fair: Remove capacity inversion detection
drm/amd/display: Implement workaround for writing to OTG_PIXEL_RATE_DIV register
hugetlb: do not clear hugetlb dtor until allocating vmemmap
netfilter: set default timeout to 3 secs for sctp shutdown send and recv state
arm64/ptrace: Ensure that SME is set up for target when writing SSVE state
drm/amd/pm: skip the RLC stop when S0i3 suspend for SMU v13.0.4/11
drm/amdgpu: keep irq count in amdgpu_irq_disable_all
af_unix: Fix null-ptr-deref in unix_stream_sendpage().
drm/nouveau/disp: fix use-after-free in error handling of nouveau_connector_create
net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled
mmc: f-sdh30: fix order of function calls in sdhci_f_sdh30_remove
Linux 6.1.47
Change-Id: I7c55c71f43f88a1d44d39c835e3f6e58d4c86279
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 6.1.44
init: Provide arch_cpu_finalize_init()
x86/cpu: Switch to arch_cpu_finalize_init()
ARM: cpu: Switch to arch_cpu_finalize_init()
ia64/cpu: Switch to arch_cpu_finalize_init()
loongarch/cpu: Switch to arch_cpu_finalize_init()
m68k/cpu: Switch to arch_cpu_finalize_init()
mips/cpu: Switch to arch_cpu_finalize_init()
sh/cpu: Switch to arch_cpu_finalize_init()
sparc/cpu: Switch to arch_cpu_finalize_init()
um/cpu: Switch to arch_cpu_finalize_init()
init: Remove check_bugs() leftovers
init: Invoke arch_cpu_finalize_init() earlier
init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
x86/init: Initialize signal frame size late
x86/fpu: Remove cpuinfo argument from init functions
x86/fpu: Mark init functions __init
x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
x86/speculation: Add Gather Data Sampling mitigation
x86/speculation: Add force option to GDS mitigation
x86/speculation: Add Kconfig option for GDS
KVM: Add GDS_NO support to KVM
x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
x86/xen: Fix secondary processors' FPU initialization
x86/mm: fix poking_init() for Xen PV guests
x86/mm: Use mm_alloc() in poking_init()
mm: Move mm_cachep initialization to mm_init()
x86/mm: Initialize text poking earlier
Documentation/x86: Fix backwards on/off logic about YMM support
x86/bugs: Increase the x86 bugs vector size to two u32s
x86/cpu, kvm: Add support for CPUID_80000021_EAX
x86/srso: Add a Speculative RAS Overflow mitigation
x86/srso: Add IBPB_BRTYPE support
x86/srso: Add SRSO_NO support
x86/srso: Add IBPB
x86/srso: Add IBPB on VMEXIT
x86/srso: Fix return thunks in generated code
x86/srso: Add a forgotten NOENDBR annotation
x86/srso: Tie SBPB bit setting to microcode patch detection
xen/netback: Fix buffer overrun triggered by unusual packet
x86: fix backwards merge of GDS/SRSO bit
Linux 6.1.44
Change-Id: Ia40e37c806ae2a2daf2127415aa28d0151660667
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 5f641174a12b8a876a4101201a21ef4675ecc014 upstream.
The `nocrt` module parameter has no code associated with it and does
nothing. As `crt=-1` has same functionality as what nocrt should be
doing drop `nocrt` and associated documentation.
This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.
Fixes: 8c99fdce30 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 66419036f68a838c00cbccacd6cb2e99da6e5710 ]
An Interrupt Remapping Table (IRT) stores interrupt remapping configuration
for each device. In a normal operation, the AMD IOMMU caches the table
to optimize subsequent data accesses. This requires the IOMMU driver to
invalidate IRT whenever it updates the table. The invalidation process
includes issuing an INVALIDATE_INTERRUPT_TABLE command following by
a COMPLETION_WAIT command.
However, there are cases in which the IRT is updated at a high rate.
For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every
vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large
amount of vcpus and VFIO PCI pass-through devices, the invalidation
process could potentially become a performance bottleneck.
Introducing a new kernel boot option:
amd_iommu=irtcachedis
which disables IRTE caching by setting the IRTCachedis bit in each IOMMU
Control register, and bypass the IRT invalidation process.
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Upstream commit: fb3bd914b3ec28f5fb697ac55c4846ac2d542855
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 553a5c03e90a6087e88f8ff878335ef0621536fb upstream
The Gather Data Sampling (GDS) vulnerability allows malicious software
to infer stale data previously stored in vector registers. This may
include sensitive data such as cryptographic keys. GDS is mitigated in
microcode, and systems with up-to-date microcode are protected by
default. However, any affected system that is running with older
microcode will still be vulnerable to GDS attacks.
Since the gather instructions used by the attacker are part of the
AVX2 and AVX512 extensions, disabling these extensions prevents gather
instructions from being executed, thereby mitigating the system from
GDS. Disabling AVX2 is sufficient, but we don't have the granularity
to do this. The XCR0[2] disables AVX, with no option to just disable
AVX2.
Add a kernel parameter gather_data_sampling=force that will enable the
microcode mitigation if available, otherwise it will disable AVX on
affected systems.
This option will be ignored if cmdline mitigations=off.
This is a *big* hammer. It is known to break buggy userspace that
uses incomplete, buggy AVX enumeration. Unfortunately, such userspace
does exist in the wild:
https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html
[ dhansen: add some more ominous warnings about disabling AVX ]
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8974eb588283b7d44a7c91fa09fcbaf380339f3a upstream
Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.
Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.
This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.
Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.
The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:
/sys/devices/system/cpu/vulnerabilities/gather_data_sampling
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SoCs featuring peripherals that can issue non-coherent DMA traffic
beyond the point of coherency (PoC) present multiple challenges for the
DMA-API implementation in Linux. Many of these challenges can be
overcome by suitable configuration of the interconnect, however the
presence of a cacheable alias for non-cacheable buffers can still lead
to coherence issues arising when stale clean lines are back-snooped from
the cache hierarchy to satisfy a non-cacheable transaction at the PoC.
Removing all cacheable aliases on a case-by-cases basis is both
error-prone and expensive. Instead, leverage the stage-2 identity
mapping installed by pKVM to enforce consistent cacheability for all
stage-1 aliases.
Bug: 240786634
Change-Id: I78b0aa51fe3e23811bbd25481173086aa957c4bf
Signed-off-by: Will Deacon <willdeacon@google.com>
kvm-arm.protected_modules="" takes a list of modules that pKVM will load
before de-privileging the host. This is necessary as no loading will be
allowed later.
We can't rely on request_module() that might be disabled by umh's
configuration. Instead, create our own version, locked by the pKVM/KVM
static keys and marked as __init to be cleared once the kernel init is
done. Belt and braces.
Keep the previous kvm-arm.protected_modules for compatibility.
Bug: 254835242
Change-Id: Ia6881b4c7a60cf81d19ead12c5d4638a27eff3eb
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
ZONE_DMA32 is enabled by default on android14-6.1, yet it is not
needed for all devices, nor is it desirable to have if not needed. For
instance, if a partner in GKI 1.0 did not use ZONE_DMA32, memory can
be lower for ZONE_NORMAL relative to older targets, such that memory
would run out more quickly in ZONE_NORMAL leading kswapd to be invoked
unnecessarily.
Correspondingly, provide a means of making ZONE_DMA32 empty via the
kernel command line when it is compiled in via CONFIG_ZONE_DMA32.
P.S. The following two patches are squashed into this one,
1. bf96382 ("ANDROID: dma-direct: Make DMA32 disablement work for CONFIG_NUMA")
2. 135406c ("ANDROID: dma-direct: Document disable_dma32")
Bug: 199917449
Bug: 268587627
Change-Id: I70ec76914b92e518d61a61072f0b3cb41cb28646
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmO5RX4ACgkQONu9yGCS
aT7NFRAAlqi2Wwx1NZU3HE9nr/fgdGFDlEJbKXP9jiwIjiIx5mAPaysb9uhOG5qj
11uG/S3+xsb5ryhSpCR5I0rPnymK/XWDQtfYlKMuB+bLr+w03BBYzZX5v+YihUmw
mWa32+xiei1KzIjxMGVFciiSYoMxWR5smqe559LPiabB3dcRfRPSHQDJCOWE5T2y
2Tlts/gUpfqMh+MPNQOYgB0TZmUhzin9XW4AcDqLsyupKRLKEusDVIA5QfznuCyN
UFjTem9h+4qPSZOgzdmSX9QljYL8Jqh4gwXUcl4/EUoObPN/tTbjCYZkuyOT4r3F
FsN/w6+32C/TjQSBg50d4yT4TFC4bjnc3VCb2dI+6WazLqAjS7RrkoqTl37K/rwC
Gb3FmjwQNNx/iNq5kM5NvuJgLWuLAVZBn+WxWBk1hqE5f8l0l5/NgxdHfSwjMvJL
toqXT98yoSTMMGaQ8QA4MLh9Gx2CYC0JeyaWnFavMOBs9Oxy+SbNyoUKo0Wn8kHh
z/6/Pdk6Qd3PKv9pXsrTpWAPzQs5w23+XbZ2pJw+K5yVKvmNQJEhSJ9fzFRksldW
ykDMzVmZ0kswcNuJgQFUi4QTu17p3FA4UJG1xvJNCKPsgelLe8153TFDds0yXfu1
IwqZiHpwBF1tyGHYEBdqmivj77eWtNsWMSZHHDN9jJ9sNxS+kSM=
=CeYO
-----END PGP SIGNATURE-----
Merge 6.1.4 into android14-6.1
Changes in 6.1.4
drm/amdgpu: skip MES for S0ix as well since it's part of GFX
drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0
media: stv0288: use explicitly signed char
cxl/region: Fix memdev reuse check
arm64: dts: qcom: sc8280xp: fix UFS DMA coherency
arm64: Prohibit instrumentation on arch_stack_walk()
soc: qcom: Select REMAP_MMIO for LLCC driver
soc: qcom: Select REMAP_MMIO for ICC_BWMON driver
kest.pl: Fix grub2 menu handling for rebooting
ktest.pl minconfig: Unset configs instead of just removing them
jbd2: use the correct print format
perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D
perf/x86/intel/uncore: Clear attr_update properly
arm64: dts: qcom: sdm845-db845c: correct SPI2 pins drive strength
arm64: dts: qcom: sc8280xp: fix UFS reference clocks
mmc: sdhci-sprd: Disable CLK_AUTO when the clock is less than 400K
phy: qcom-qmp-combo: fix out-of-bounds clock access
drm/amd/pm: update SMU13.0.0 reported maximum shader clock
drm/amd/pm: correct SMU13.0.0 pstate profiling clock settings
btrfs: fix uninitialized parent in insert_state
btrfs: fix extent map use-after-free when handling missing device in read_one_chunk
btrfs: fix resolving backrefs for inline extent followed by prealloc
ARM: ux500: do not directly dereference __iomem
arm64: dts: qcom: sdm850-samsung-w737: correct I2C12 pins drive strength
random: use rejection sampling for uniform bounded random integers
x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnostics
arm64: dts: qcom: sdm850-lenovo-yoga-c630: correct I2C12 pins drive strength
cxl/region: Fix missing probe failure
EDAC/mc_sysfs: Increase legacy channel support to 12
selftests: Use optional USERCFLAGS and USERLDFLAGS
x86/MCE/AMD: Clear DFR errors found in THR handler
random: add helpers for random numbers with given floor or range
PM/devfreq: governor: Add a private governor_data for governor
cpufreq: Init completion before kobject_init_and_add()
ext2: unbugger ext2_empty_dir()
media: s5p-mfc: Fix to handle reference queue during finishing
media: s5p-mfc: Clear workbit to handle error condition
media: s5p-mfc: Fix in register read and write for H264
bpf: Resolve fext program type when checking map compatibility
ALSA: patch_realtek: Fix Dell Inspiron Plus 16
ALSA: hda/realtek: Apply dual codec fixup for Dell Latitude laptops
platform/x86: thinkpad_acpi: Fix max_brightness of thinklight
platform/x86: ideapad-laptop: Revert "check for touchpad support in _CFG"
platform/x86: ideapad-laptop: Add new _CFG bit numbers for future use
platform/x86: ideapad-laptop: support for more special keys in WMI
ACPI: video: Simplify __acpi_video_get_backlight_type()
ACPI: video: Prefer native over vendor
platform/x86: ideapad-laptop: Refactor ideapad_sync_touchpad_state()
platform/x86: ideapad-laptop: Do not send KEY_TOUCHPAD* events on probe / resume
platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models
platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models
platform/x86: ideapad-laptop: Stop writing VPCCMD_W_TOUCHPAD at probe time
platform/x86: intel-uncore-freq: add Emerald Rapids support
ALSA: hda/cirrus: Add extra 10 ms delay to allow PLL settle and lock.
platform/x86: x86-android-tablets: Add Medion Lifetab S10346 data
platform/x86: x86-android-tablets: Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data
platform/x86: x86-android-tablets: Add Advantech MICA-071 extra button
HID: Ignore HP Envy x360 eu0009nv stylus battery
ALSA: usb-audio: Add new quirk FIXED_RATE for JBL Quantum810 Wireless
fs: dlm: fix sock release if listen fails
fs: dlm: retry accept() until -EAGAIN or error returns
mptcp: netlink: fix some error return code
mptcp: remove MPTCP 'ifdef' in TCP SYN cookies
mptcp: dedicated request sock for subflow in v6
mptcp: use proper req destructor for IPv6
dm cache: Fix ABBA deadlock between shrink_slab and dm_cache_metadata_abort
dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata
dm thin: Use last transaction's pmd->root when commit failed
dm thin: resume even if in FAIL mode
dm thin: Fix UAF in run_timer_softirq()
dm integrity: Fix UAF in dm_integrity_dtr()
dm clone: Fix UAF in clone_dtr()
dm cache: Fix UAF in destroy()
dm cache: set needs_check flag after aborting metadata
ata: ahci: fix enum constants for gcc-13
PCI/DOE: Fix maximum data object length miscalculation
tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
perf/core: Call LSM hook after copying perf_event_attr
xtensa: add __umulsidi3 helper
of/kexec: Fix reading 32-bit "linux,initrd-{start,end}" values
ima: Fix hash dependency to correct algorithm
KVM: VMX: Resume guest immediately when injecting #GP on ECREATE
KVM: nVMX: Inject #GP, not #UD, if "generic" VMXON CR0/CR4 check fails
KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
KVM: nVMX: Properly expose ENABLE_USR_WAIT_PAUSE control to L1
x86/microcode/intel: Do not retry microcode reloading on the APs
ftrace/x86: Add back ftrace_expected for ftrace bug reports
x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNK
x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK
tracing: Fix race where eprobes can be called before the event
powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1
tracing: Fix complicated dependency of CONFIG_TRACER_MAX_TRACE
tracing/hist: Fix wrong return value in parse_action_params()
tracing/probes: Handle system names with hyphens
tracing: Fix issue of missing one synthetic field
tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
staging: media: tegra-video: fix chan->mipi value on error
staging: media: tegra-video: fix device_node use after free
arm64: dts: mediatek: mt8195-demo: fix the memory size of node secmon
ARM: 9256/1: NWFPE: avoid compiler-generated __aeabi_uldivmod
media: dvb-core: Fix double free in dvb_register_device()
media: dvb-core: Fix UAF due to refcount races at releasing
cifs: fix confusing debug message
cifs: fix missing display of three mount options
cifs: set correct tcon status after initial tree connect
cifs: set correct ipc status after initial tree connect
cifs: set correct status of tcon ipc when reconnecting
ravb: Fix "failed to switch device to config mode" message during unbind
rtc: ds1347: fix value written to century register
drm/amdgpu: fix mmhub register base coding error
block: mq-deadline: Fix dd_finish_request() for zoned devices
block: mq-deadline: Do not break sequential write streams to zoned HDDs
md/bitmap: Fix bitmap chunk size overflow issues
efi: Add iMac Pro 2017 to uefi skip cert quirk
wifi: wilc1000: sdio: fix module autoloading
ASoC: jz4740-i2s: Handle independent FIFO flush bits
ipu3-imgu: Fix NULL pointer dereference in imgu_subdev_set_selection()
ipmi: fix long wait in unload when IPMI disconnect
mtd: spi-nor: Check for zero erase size in spi_nor_find_best_erase_type()
ima: Fix a potential NULL pointer access in ima_restore_measurement_list
ipmi: fix use after free in _ipmi_destroy_user()
mtd: spi-nor: gigadevice: gd25q256: replace gd25q256_default_init with gd25q256_post_bfpt
ima: Fix memory leak in __ima_inode_hash()
um: virt-pci: Avoid GCC non-NULL warning
crypto: ccree,hisilicon - Fix dependencies to correct algorithm
PCI: Fix pci_device_is_present() for VFs by checking PF
PCI/sysfs: Fix double free in error path
RISC-V: kexec: Fix memory leak of fdt buffer
riscv: Fixup compile error with !MMU
RISC-V: kexec: Fix memory leak of elf header buffer
riscv: stacktrace: Fixup ftrace_graph_ret_addr retp argument
riscv: mm: notify remote harts about mmu cache updates
crypto: n2 - add missing hash statesize
crypto: ccp - Add support for TEE for PCI ID 0x14CA
driver core: Fix bus_type.match() error handling in __driver_attach()
bus: mhi: host: Fix race between channel preparation and M0 event
phy: qcom-qmp-combo: fix sdm845 reset
phy: qcom-qmp-combo: fix sc8180x reset
iommu/amd: Fix ivrs_acpihid cmdline parsing code
iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options
test_kprobes: Fix implicit declaration error of test_kprobes
hugetlb: really allocate vma lock for all sharable vmas
remoteproc: imx_dsp_rproc: Add mutex protection for workqueue
remoteproc: core: Do pm_relax when in RPROC_OFFLINE state
remoteproc: imx_rproc: Correct i.MX93 DRAM mapping
parisc: led: Fix potential null-ptr-deref in start_task()
parisc: Drop locking in pdc console code
parisc: Fix locking in pdc_iodc_print() firmware call
parisc: Add missing FORCE prerequisites in Makefile
parisc: Drop duplicate kgdb_pdc console
parisc: Drop PMD_SHIFT from calculation in pgtable.h
device_cgroup: Roll back to original exceptions after copy failure
drm/connector: send hotplug uevent on connector cleanup
drm/vmwgfx: Validate the box size for the snooped cursor
drm/mgag200: Fix PLL setup for G200_SE_A rev >=4
drm/etnaviv: move idle mapping reaping into separate function
drm/i915/dsi: fix VBT send packet port selection for dual link DSI
drm/ingenic: Fix missing platform_driver_unregister() call in ingenic_drm_init()
drm/etnaviv: reap idle mapping if it doesn't match the softpin address
ext4: silence the warning when evicting inode with dioread_nolock
ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop
ext4: remove trailing newline from ext4_msg() message
ext4: correct inconsistent error msg in nojournal mode
fs: ext4: initialize fsdata in pagecache_write()
ext4: fix use-after-free in ext4_orphan_cleanup
ext4: fix undefined behavior in bit shift for ext4_check_flag_values
ext4: add EXT4_IGET_BAD flag to prevent unexpected bad inode
ext4: add helper to check quota inums
ext4: fix bug_on in __es_tree_search caused by bad quota inode
ext4: fix reserved cluster accounting in __es_remove_extent()
ext4: journal_path mount options should follow links
ext4: check and assert if marking an no_delete evicting inode dirty
ext4: fix bug_on in __es_tree_search caused by bad boot loader inode
ext4: don't allow journal inode to have encrypt flag
ext4: disable fast-commit of encrypted dir operations
ext4: fix leaking uninitialized memory in fast-commit journal
ext4: don't set up encryption key during jbd2 transaction
ext4: add missing validation of fast-commit record lengths
ext4: fix unaligned memory access in ext4_fc_reserve_space()
ext4: fix off-by-one errors in fast-commit block filling
ext4: fix uninititialized value in 'ext4_evict_inode'
ext4: init quota for 'old.inode' in 'ext4_rename'
ext4: don't fail GETFSUUID when the caller provides a long buffer
ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline
ext4: fix corruption when online resizing a 1K bigalloc fs
ext4: fix error code return to user-space in ext4_get_branch()
ext4: fix bad checksum after online resize
ext4: dont return EINVAL from GETFSUUID when reporting UUID length
ext4: fix corrupt backup group descriptors after online resize
ext4: avoid BUG_ON when creating xattrs
ext4: fix deadlock due to mbcache entry corruption
ext4: fix kernel BUG in 'ext4_write_inline_data_end()'
ext4: fix inode leak in ext4_xattr_inode_create() on an error path
ext4: initialize quota before expanding inode in setproject ioctl
ext4: avoid unaccounted block allocation when expanding inode
ext4: allocate extended attribute value in vmalloc area
drm/i915/ttm: consider CCS for backup objects
drm/amd/display: Add DCN314 display SG Support
drm/amdgpu: handle polaris10/11 overlap asics (v2)
drm/amdgpu: make display pinning more flexible (v2)
drm/i915: improve the catch-all evict to handle lock contention
drm/i915/migrate: Account for the reserved_space
drm/amd/pm: add missing SMU13.0.0 mm_dpm feature mapping
drm/amd/pm: add missing SMU13.0.7 mm_dpm feature mapping
drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34
drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics
Linux 6.1.4
Change-Id: I79c26bc73b1275fab4e9984d2a32a8b915bbfa1c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 1198d2316dc4265a97d0e8445a22c7a6d17580a4 upstream.
Currently, these options cause the following libkmod error:
libkmod: ERROR ../libkmod/libkmod-config.c:489 kcmdline_parse_result: \
Ignoring bad option on kernel command line while parsing module \
name: 'ivrs_xxxx[XX:XX'
Fix by introducing a new parameter format for these options and
throw a warning for the deprecated format.
Users are still allowed to omit the PCI Segment if zero.
Adding a Link: to the reason why we're modding the syntax parsing
in the driver and not in libkmod.
Fixes: ca3bf5d47c ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-modules/20200310082308.14318-2-lucas.demarchi@intel.com/
Reported-by: Kim Phillips <kim.phillips@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Link: https://lore.kernel.org/r/20220919155638.391481-2-kim.phillips@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add some initial documentation for the Protected KVM (pKVM) feature on
arm64, describing the user ABI for creating protected VMs as well as
their limitations.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Change-Id: I152af404f24b9aba3cc9be6acd8e26afcfa4b0a5
Signed-off-by: Quentin Perret <qperret@google.com>
Should a guest desire to enroll into the MMIO guard, allow it to
do so with a command-line option.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Bug: 233587962
Change-Id: Ia9a77f693531740500739693c52b4959abacafd4
[willdeacon@: Add hypercall IDs]
Signed-off-by: Will Deacon <willdeacon@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Add a new amd pstate driver command line option to enable driver passive
working mode via MSR and shared memory interface to request desired
performance within abstract scale and the power management firmware
(SMU) convert the perf requests into actual hardware pstates.
Also the `disable` parameter can disable the pstate driver loading by
adding `amd_pstate=disable` to kernel command line.
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY0ZjFAAKCRCAXGG7T9hj
vjEsAP4rFMnqc6AXy4Mpvv8cxBtEuQZbwEqgBrMJUvK1jZQrBQD/dOJK2GBCVcfD
2yaVlefFiJGTw5WUlbPeohUlTZ8pJwg=
=xsHV
-----END PGP SIGNATURE-----
Merge tag 'for-linus-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- Some minor typo fixes
- A fix of the Xen pcifront driver for supporting the device model to
run in a Linux stub domain
- A cleanup of the pcifront driver
- A series to enable grant-based virtio with Xen on x86
- A cleanup of Xen PV guests to distinguish between safe and faulting
MSR accesses
- Two fixes of the Xen gntdev driver
- Two fixes of the new xen grant DMA driver
* tag 'for-linus-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Kconfig: Fix spelling mistake "Maxmium" -> "Maximum"
xen/pv: support selecting safe/unsafe msr accesses
xen/pv: refactor msr access functions to support safe and unsafe accesses
xen/pv: fix vendor checks for pmu emulation
xen/pv: add fault recovery control to pmu msr accesses
xen/virtio: enable grant based virtio on x86
xen/virtio: use dom0 as default backend for CONFIG_XEN_VIRTIO_FORCE_GRANT
xen/virtio: restructure xen grant dma setup
xen/pcifront: move xenstore config scanning into sub-function
xen/gntdev: Accommodate VMA splitting
xen/gntdev: Prevent leaking grants
xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices
xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()
xen/xenbus: Fix spelling mistake "hardward" -> "hardware"
xen-pcifront: Handle missed Connected state
Instead of always doing the safe variants for reading and writing MSRs
in Xen PV guests, make the behavior controllable via Kconfig option
and a boot parameter.
The default will be the current behavior, which is to always use the
safe variant.
Signed-off-by: Juergen Gross <jgross@suse.com>
linux-next for a couple of months without, to my knowledge, any negative
reports (or any positive ones, come to that).
- Also the Maple Tree from Liam R. Howlett. An overlapping range-based
tree for vmas. It it apparently slight more efficient in its own right,
but is mainly targeted at enabling work to reduce mmap_lock contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
(https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
This has yet to be addressed due to Liam's unfortunately timed
vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down to
the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
=xfWx
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
Including:
- Removal of the bus_set_iommu() interface which became
unnecesary because of IOMMU per-device probing
- Make the dma-iommu.h header private
- Intel VT-d changes from Lu Baolu:
- Decouple PASID and PRI from SVA
- Add ESRTPS & ESIRTPS capability check
- Cleanups
- Apple DART support for the M1 Pro/MAX SOCs
- Support for AMD IOMMUv2 page-tables for the DMA-API layer. The
v2 page-tables are compatible with the x86 CPU page-tables.
Using them for DMA-API prepares support for hardware-assisted
IOMMU virtualization
- Support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver
- Some smaller fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmNEC5oACgkQK/BELZcB
GuNcOQ/6A5SXmcvDRLYZW1ENM5Z6xsZ1LabSZkjhYSpmbJyu8Uny/Z2aRWqxPMLJ
hJeHTsWSLhrTq1VfjFhELHB3kgT2DRr7H3LXXaMNC6qz690EcavX1wKX2AxH0m22
8YrktkyAmFQ3BG6rsQLdlMMasLph/x06ix/xO9opQZVFdj/fV0Jx7ekX1JK+U3hx
MI96i5W3G5PBVHBypAvjxSlmA4saj9Fhk7l3IZL7py9AOKz7NypuwWRs+86PMBiO
EzLt5aF4g8pmKChF/c9BsoIbjBYvTG/s3NbycIng0ACc2SOvf+EvtoVZQclWifbT
lwti9PLdsoVUnPOZHLYOTx4xSf/UyoLVzaLxJ52aoXnNYe2qaX5DANXhT2mWIY/Y
z1mzOkShmK7WF7a8arRyqJeLJ4SvDx8GrbvLiom3DAzmqVHzzFGadHtt5fvGYN4F
Jet/JIN3HjECQbamqtPBpWquBFhLmgusPksIiyMFscRvYdZqkaVkTkElcF3WqAMm
QkeecfoTQ9Vdtdz44ZVLRjKpS77yRZmHshp1r/rfSI+9Ok8uRI+xmmcyrAI6ElqH
DH14tLHPzw694rTHF+bTCd+pPMGOoFLi0xAfUXAeGWm1uzC1JIRrVu5JeQNOUOSD
5SQDXB7dPrhXngaws5Fx2u3amCO3688mslcGgM7q54kC+LyVo0E=
=h0sT
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- remove the bus_set_iommu() interface which became unnecesary because
of IOMMU per-device probing
- make the dma-iommu.h header private
- Intel VT-d changes from Lu Baolu:
- Decouple PASID and PRI from SVA
- Add ESRTPS & ESIRTPS capability check
- Cleanups
- Apple DART support for the M1 Pro/MAX SOCs
- support for AMD IOMMUv2 page-tables for the DMA-API layer.
The v2 page-tables are compatible with the x86 CPU page-tables. Using
them for DMA-API prepares support for hardware-assisted IOMMU
virtualization
- support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver
- some smaller fixes and cleanups
* tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (59 commits)
iommu/vt-d: Avoid unnecessary global DMA cache invalidation
iommu/vt-d: Avoid unnecessary global IRTE cache invalidation
iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_support
iommu/vt-d: Remove pasid_set_eafe()
iommu/vt-d: Decouple PASID & PRI enabling from SVA
iommu/vt-d: Remove unnecessary SVA data accesses in page fault path
dt-bindings: iommu: arm,smmu-v3: Relax order of interrupt names
iommu: dart: Support t6000 variant
iommu/io-pgtable-dart: Add DART PTE support for t6000
iommu/io-pgtable: Add DART subpage protection support
iommu/io-pgtable: Move Apple DART support to its own file
iommu/mediatek: Add support for MT6795 Helio X10 M4Us
iommu/mediatek: Introduce new flag TF_PORT_TO_ADDR_MT8173
dt-bindings: mediatek: Add bindings for MT6795 M4U
iommu/iova: Fix module config properly
iommu/amd: Fix sparse warning
iommu/amd: Remove outdated comment
iommu/amd: Free domain ID after domain_flush_pages
iommu/amd: Free domain id in error path
iommu/virtio: Fix compile error with viommu_capable()
...
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe
Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva,
Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof
Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure,
Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram
Sang, ye xingchen, Zheng Yongjun.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNCpBMTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDx3EACCf86iumFF3RyvENtDwoTRgH3H0z2E
/ZC4LKrtxgaPFJzKUT4F0kLK85Hw5GzMEKK42NIhAB0o5vFwmEzxOtnlHOyEufAm
EDIZDIfxV2J9Qx/cW2DSojPj/o9O6noXwhw9SBqMwiDWd8gXmNgOUEklAO7aR7Vq
Ne2N2FLMNthZydCoHR6dAEjfe2ceFXP5cALwzQO+ILDdZQ0UcF2Yq4yw/gEDoCrB
FH7mmE7UaQQHvYzo85VTZu7XfUys1P7kUcnhVurOg7/07ITnvnQR+itKZXC+bSft
1K7ULtjd2QiCgxZA/apFc3lO46kqHVFsB3onRQw12/Ku5vfGFfY0L0iK97OgM4s0
0u4r+J7A+MM5YBJVVjwZ6woYO5CWMHYKBZepxOpcvftPxj1LNkiHsryqKILGISEC
aIY/lI0hpeNU4QshDMXzSTgeb/VF9O5cGPncTPkOFbXxD4RpVyz8tSngsG1+D8lj
S6B2h3k4A14rnblLOxP22jcedBlTYQcRQS4vwr0a7+63QTjfSJ12xT3ucIAKU9f7
65rVSS/igbrfxqHDmrd60WWZBMXeK0Zy7YIG6iYPTxpP31eFpSp9wtDlV7V2+EH2
F2p+TJY8aTA8UW+2L5gigN3RsBeeEB8zxJkB14ivICM7+XzVu11PxPDqjDZYkfzC
ueKKvCcHhHAYqQ==
=TFBA
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting
API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only
sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.
* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
KVM: PPC: Book3S HV: Fix stack frame regs marker
powerpc: Don't add __powerpc_ prefix to syscall entry points
powerpc/64s/interrupt: Fix stack frame regs marker
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
powerpc/pseries: Add firmware details to the hardware description
powerpc/powernv: Add opal details to the hardware description
powerpc: Add device-tree model to the hardware description
powerpc/64: Add logical PVR to the hardware description
powerpc: Add PVR & CPU name to hardware description
powerpc: Add hardware description string
powerpc/configs: Enable PPC_UV in powernv_defconfig
powerpc/configs: Update config files for removed/renamed symbols
powerpc/mm: Fix UBSAN warning reported on hugetlb
powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
powerpc: Drops STABS_DEBUG from linker scripts
powerpc/64s: Remove lost/old comment
powerpc/64s: Remove old STAB comment
powerpc: remove orphan systbl_chk.sh
...
This KUnit update for Linux 6.1-rc1 consists of several documentation
fixes, UML related cleanups, and a feature to enable/disable KUnit
tests. This update includes the following change to
- rename all_test_uml.config, use it for --alltests
Note: if anyone was using all_tests_uml.config, this change breaks them.
This change simplifies the usage and eliminates the need to type:
--kunitconfig=tools/testing/kunit/configs/all_tests_uml.config.
A simple workaround to create a symlink to the new name can solve the
problem for anyone using all_tests_uml.config.
all_tests_uml.config should work across ~all architectures.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmM97okACgkQCwJExA0N
Qxwe9BAA4rUh/CdaDpGxSoknamlSlXbcsxyHQK32OoqtIQPJfPcqxZwZhqGzpZAE
z7qqnfN7dGWACcJjaZYZtkjJ1tsJ7OADf6m81drM6lmHSDRldAhdXMK9qqJAA2pk
OMciu7TcoSezaMw3eKZZ8jeQ2OIEUirQQXFIuMkb9r+E65tyIGCVo9njY19is6bk
7TcrUNlzgTednzkJ7B5BY54vZjvET/kwwBhXUgygMoi0JkKIoksJcxF2y7x1UKUx
Kfu/3bqiBAAEiOdMLxnxqwNH/Uhny1IWV6w+p0gtEsVGS47QvRew7gtIuKgCZWpf
S/w7QL+zCaXGB3jM8KVk7p4L9qU6EIttqu7kwXSs5OZz8QVrw6oc1l0RDUI5NIWi
TLRO4QJHWssKNrSyldtk2J9wFEHOF48Hc6HYqHxJLFYwfutt5hzsKotN6+H60Uth
TVAE+oUWknQks1Imv7x3Avq5k6CyfH9dFqqH+IrM2hDbqkZN1hfHvm1RJgcXFaHf
f4UpAYwCgkb3Maa7mTBq2/x98UUhh051EQuHDUO6H0tzccxJUbwkeeGK+WH51aZu
d6RD4ZnH4BlB+EMbfXgkXdsGqcsQMneoLh5/V0y1L8aSyjJQ8FYlK5lQYwu9VBXQ
6g6AH4nlhvVplXgCDeGnnTa7HfTD87Vx9Dws+uO+J/QbwqTatI4=
=Q5W1
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"Several documentation fixes, UML related cleanups, and a feature to
enable/disable KUnit tests
This includes the change to rename all_test_uml.config, and use it for
'--alltests'. Note: if anyone was using all_tests_uml.config, this
change breaks them.
This change simplifies the usage and eliminates the need to type:
--kunitconfig=tools/testing/kunit/configs/all_tests_uml.config
A simple workaround to create a symlink to the new name can solve the
problem for anyone using all_tests_uml.config.
all_tests_uml.config should work across ~all architectures"
* tag 'linux-kselftest-kunit-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: Kunit: Use full path to .kunitconfig
kunit: tool: rename all_test_uml.config, use it for --alltests
kunit: tool: remove UML specific options from all_tests_uml.config
lib: stackinit: update reference to kunit-tool
lib: overflow: update reference to kunit-tool
Documentation: KUnit: update links in the index page
Documentation: KUnit: add intro to the getting-started page
Documentation: KUnit: Reword start guide for selecting tests
Documentation: KUnit: add note about mrproper in start.rst
Documentation: KUnit: avoid repeating "kunit.py run" in start.rst
Documentation: KUnit: remove duplicated docs for kunit_tool
Documentation: Kunit: Add ref for other kinds of tests
Documentation: KUnit: Fix non-uml anchor
Documentation: Kunit: Fix inconsistent titles
Documentation: kunit: fix trivial typo
kunit: no longer call module_info(test, "Y") for kunit modules
kunit: add kunit.enable to enable/disable KUnit test
kunit: tool: make --raw_output=kunit (aka --raw_output) preserve leading spaces
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
vector granule register added to the user regs together with SVE perf
extensions documentation.
- SVE updates: add HWCAP for SVE EBF16, update the SVE ABI documentation
to match the actual kernel behaviour (zeroing the registers on syscall
rather than "zeroed or preserved" previously).
- More conversions to automatic system registers generation.
- vDSO: use self-synchronising virtual counter access in gettimeofday()
if the architecture supports it.
- arm64 stacktrace cleanups and improvements.
- arm64 atomics improvements: always inline assembly, remove LL/SC
trampolines.
- Improve the reporting of EL1 exceptions: rework BTI and FPAC exception
handling, better EL1 undefs reporting.
- Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
result.
- arm64 defconfig updates: build CoreSight as a module, enable options
necessary for docker, memory hotplug/hotremove, enable all PMUs
provided by Arm.
- arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
extensions).
- arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
unused function.
- kselftest updates for arm64: simple HWCAP validation, FP stress test
improvements, validation of ZA regs in signal handlers, include larger
SVE and SME vector lengths in signal tests, various cleanups.
- arm64 alternatives (code patching) improvements to robustness and
consistency: replace cpucap static branches with equivalent
alternatives, associate callback alternatives with a cpucap.
- Miscellaneous updates: optimise kprobe performance of patching
single-step slots, simplify uaccess_mask_ptr(), move MTE registers
initialisation to C, support huge vmalloc() mappings, run softirqs on
the per-CPU IRQ stack, compat (arm32) misalignment fixups for
multiword accesses.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmM9W4cACgkQa9axLQDI
XvEy3w/+LJ3KCFowWiz5gTAWikjv+UVssHjLMJixn47V7hsEFQ26Xnam/438rTMI
kE95u6DHUpw2SMIxKzFRO7oI5cQtP+cWGwTtOUnjVO+U1oN+HqDOIbO9DbylWDcU
eeeqMMmawMfTPuZrYklpOhXscsorbrKIvYBg7wHYOcwBYV3EPhWr89lwMvTVRuyJ
qpX628KlkGMaBcONNhv3nS3qZcAOs0oHQCAVS4C8czLDL+vtJlumXUS3xr1Mqm72
xtFe7sje8Djr2kZ8mzh0GbFiZEBoBD3F/l7ayq8gVRaVpToUt8sk36Stjs4LojF1
6imuAfji/5TItkScq5KhGqj6MIugwp/eUVbRN74OLNTYx7msF1ZADNFQ+Q0UuY0H
SYK13KvmOji0xjS8qAfhqrwNB79sk3fb+zF9LjETbdz4ZJCgg9gcFbSUTY0DvMfS
MXZk/jVeB07olA8xYbjh0BRt4UV9xU628FPQzK5k7e4Nzl4jSvgtJZCZanfuVtjy
/ZS1vbN8o7tQLBAlVnw+Exi/VedkKxkkMgm8tPKsMgERTFDx0Pc4Gs72hRpDnPWT
MRbeCCGleAf3JQ5vF0coBDNOCEVvweQgShHOyHTz0GyhWXLCFx3RJICo5I4EIpps
LLUk4JK0fO3LVrf1AEpu5ZP4+Sact0zfsH3gB7qyLPYFDmjDXD8=
=jl3Z
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
vector granule register added to the user regs together with SVE perf
extensions documentation.
- SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
documentation to match the actual kernel behaviour (zeroing the
registers on syscall rather than "zeroed or preserved" previously).
- More conversions to automatic system registers generation.
- vDSO: use self-synchronising virtual counter access in gettimeofday()
if the architecture supports it.
- arm64 stacktrace cleanups and improvements.
- arm64 atomics improvements: always inline assembly, remove LL/SC
trampolines.
- Improve the reporting of EL1 exceptions: rework BTI and FPAC
exception handling, better EL1 undefs reporting.
- Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
result.
- arm64 defconfig updates: build CoreSight as a module, enable options
necessary for docker, memory hotplug/hotremove, enable all PMUs
provided by Arm.
- arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
extensions).
- arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
unused function.
- kselftest updates for arm64: simple HWCAP validation, FP stress test
improvements, validation of ZA regs in signal handlers, include
larger SVE and SME vector lengths in signal tests, various cleanups.
- arm64 alternatives (code patching) improvements to robustness and
consistency: replace cpucap static branches with equivalent
alternatives, associate callback alternatives with a cpucap.
- Miscellaneous updates: optimise kprobe performance of patching
single-step slots, simplify uaccess_mask_ptr(), move MTE registers
initialisation to C, support huge vmalloc() mappings, run softirqs on
the per-CPU IRQ stack, compat (arm32) misalignment fixups for
multiword accesses.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
arm64/kprobe: Optimize the performance of patching single-step slot
arm64: defconfig: Add Coresight as module
kselftest/arm64: Handle EINTR while reading data from children
kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
kselftest/arm64: Don't repeat termination handler for fp-stress
ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
arm64/mm: fold check for KFENCE into can_set_direct_map()
arm64: ftrace: fix module PLTs with mcount
arm64: module: Remove unused plt_entry_is_initialized()
arm64: module: Make plt_equals_entry() static
arm64: fix the build with binutils 2.27
kselftest/arm64: Don't enable v8.5 for MTE selftest builds
arm64: uaccess: simplify uaccess_mask_ptr()
arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
kselftest/arm64: Fix typo in hwcap check
arm64: mte: move register initialization to C
arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
arm64/sve: Add Perf extensions documentation
...
Core
----
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO.
This significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
BPF
---
- Add BPF-specific, any-context-safe memory allocator.
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions.
Expose crash_kexec() to allow BPF to trigger a kernel dump.
Use CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs.
Only structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols
---------
- WiFi: more Extremely High Throughput (EHT) and Multi-Link
Operation (MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state
and RST packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API
----------
- Allow selecting the conduit interface used by each port
in DSA switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting
per traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one
of the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much
as possible. It's one of those magic knobs which seemed like
a good idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers
----------------------
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- Ethernet SFPs / modules:
- RollBall / Hilink / Turris 10G copper SFPs
- HALNy GPON module
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers
-------
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay
window offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Ethernet high-speed switches:
- Marvell (prestera):
- support SPAN port features (traffic mirroring)
- nexthop object offloading
- Microchip (sparx5):
- multicast forwarding offload
- QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- support RGMII cmode
- NXP (felix):
- standardized ethtool counters
- Microchip (lan966x):
- QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
- traffic policing and mirroring
- link aggregation / bonding offload
- QUSGMII PHY mode support
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
- MediaTek WiFi (mt76):
- WiFi-to-Ethernet bridging offload for MT7986 chips
- RealTek WiFi (rtw89):
- P2P support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmM7vtkACgkQMUZtbf5S
Irvotg//dmh53rC+UMKO3OgOqPlSMnaqzbUdDEfN6mj4Mpox7Csb8zERVURHhBHY
fvlXWsDgxmvgTebI5fvNC5+f1iW5xcqgJV2TWnNmDOKWwvQwb6qQfgixVmunvkpe
IIukMXYt0dAf9bXeeEfbNXcCb85cPwB76stX0tMV6BX7osp3T0TL1fvFk0NJkL0j
TeydLad/yAQtPb4TbeWYjNDoxPVDf0cVpUrevLGmWE88UMYmgTqPze+h1W5Wri52
bzjdLklY/4cgcIZClHQ6F9CeRWqEBxvujA5Hj/cwOcn/ptVVJWUGi7sQo3sYkoSs
HFu+F8XsTec14kGNC0Ab40eVdqs5l/w8+E+4jvgXeKGOtVns8DwoiUIzqXpyty89
Ib04mffrwWNjFtHvo/kIsNwP05X2PGE9HUHfwsTUfisl/ASvMmQp7D7vUoqQC/4B
AMVzT5qpjkmfBHYQQGuw8FxJhMeAOjC6aAo6censhXJyiUhIfleQsN0syHdaNb8q
9RZlhAgQoVb6ZgvBV8r8unQh/WtNZ3AopwifwVJld2unsE/UNfQy2KyqOWBES/zf
LP9sfuX0JnmHn8s1BQEUMPU1jF9ZVZCft7nufJDL6JhlAL+bwZeEN4yCiAHOPZqE
ymSLHI9s8yWZoNpuMWKrI9kFexVnQFKmA3+quAJUcYHNMSsLkL8=
=Gsio
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO. This
significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
BPF:
- Add BPF-specific, any-context-safe memory allocator.
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions. Expose
crash_kexec() to allow BPF to trigger a kernel dump. Use
CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs. Only
structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols:
- WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
(MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API:
- Allow selecting the conduit interface used by each port in DSA
switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one of
the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much as
possible. It's one of those magic knobs which seemed like a good
idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers:
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- Ethernet SFPs / modules:
- RollBall / Hilink / Turris 10G copper SFPs
- HALNy GPON module
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers:
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay window
offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Ethernet high-speed switches:
- Marvell (prestera):
- support SPAN port features (traffic mirroring)
- nexthop object offloading
- Microchip (sparx5):
- multicast forwarding offload
- QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- support RGMII cmode
- NXP (felix):
- standardized ethtool counters
- Microchip (lan966x):
- QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
- traffic policing and mirroring
- link aggregation / bonding offload
- QUSGMII PHY mode support
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
- MediaTek WiFi (mt76):
- WiFi-to-Ethernet bridging offload for MT7986 chips
- RealTek WiFi (rtw89):
- P2P support"
* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
eth: pse: add missing static inlines
once: rename _SLOW to _SLEEPABLE
net: pse-pd: add regulator based PSE driver
dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
ethtool: add interface to interact with Ethernet Power Equipment
net: mdiobus: search for PSE nodes by parsing PHY nodes.
net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
net: add framework to support Ethernet PSE and PDs devices
dt-bindings: net: phy: add PoDL PSE property
net: marvell: prestera: Propagate nh state from hw to kernel
net: marvell: prestera: Add neighbour cache accounting
net: marvell: prestera: add stub handler neighbour events
net: marvell: prestera: Add heplers to interact with fib_notifier_info
net: marvell: prestera: Add length macros for prestera_ip_addr
net: marvell: prestera: add delayed wq and flush wq on deinit
net: marvell: prestera: Add strict cleanup of fib arbiter
net: marvell: prestera: Add cleanup of allocated fib_nodes
net: marvell: prestera: Add router nexthops ABI
eth: octeon: fix build after netif_napi_add() changes
net/mlx5: E-Switch, Return EBUSY if can't get mode lock
...
The swapaccounting= commandline option already does very little today. To
close a trivial containment failure case, the swap ownership tracking part
of the swap controller has recently become mandatory (see commit
2d1c498072 ("mm: memcontrol: make swap tracking an integral part of
memory control") for details), which makes up the majority of the work
during swapout, swapin, and the swap slot map.
The only thing left under this flag is the page_counter operations and the
visibility of the swap control files in the first place, which are rather
meager savings. There also aren't many scenarios, if any, where
controlling the memory of a cgroup while allowing it unlimited access to a
global swap space is a workable resource isolation strategy.
On the other hand, there have been several bugs and confusion around the
many possible swap controller states (cgroup1 vs cgroup2 behavior, memory
accounting without swap accounting, memcg runtime disabled).
This puts the maintenance overhead of retaining the toggle above its
practical benefits. Deprecate it.
Link: https://lkml.kernel.org/r/20220926135704.400818-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch adds the kunit.enable module parameter that will need to be
set to true in addition to KUNIT being enabled for KUnit tests to run.
The default value is true giving backwards compatibility. However, for
the production+testing use case the new config option
KUNIT_DEFAULT_ENABLED can be set to N requiring the tester to opt-in
by passing kunit.enable=1 to the kernel.
Signed-off-by: Joe Fradley <joefradley@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
As commit 559089e0a9 ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc
is an opt-in strategy, so it is saftly to support huge vmalloc
mappings on arm64, for now, it is used in kvmalloc() and
alloc_large_system_hash().
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220911044423.139229-1-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In commit 2f1ee0913c ("Revert "mm: use early_pfn_to_nid in
page_ext_init""), we call page_ext_init() after page_alloc_init_late() to
avoid some panic problem. It seems that we cannot track early page
allocations in current kernel even if page structure has been initialized
early.
This patch introduces a new boot parameter 'early_page_ext' to resolve
this problem. If we pass it to the kernel, page_ext_init() will be moved
up and the feature 'deferred initialization of struct pages' will be
disabled to initialize the page allocator early and prevent the panic
problem above. It can help us to catch early page allocations. This is
useful especially when we find that the free memory value is not the same
right after different kernel booting.
[akpm@linux-foundation.org: fix section issue by removing __meminitdata]
Link: https://lkml.kernel.org/r/20220825102714.669-1-lizhe.67@bytedance.com
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In our environment, it was found that the mitigation BHB has a great
impact on the benchmark performance. For example, in the lmbench test,
the "process fork && exit" test performance drops by 20%.
So it is necessary to have the ability to turn off the mitigation
individually through cmdline, thus avoiding having to compile the
kernel by adjusting the config.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1661514050-22263-1-git-send-email-liusong@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen
time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled.
Implement this using the VPA accumulated wait counters.
Note this will not work on current KVM hosts because KVM does not
implement the VPA dispatch counters (yet). It could be implemented
with the dispatch trace log as it is for VIRT_CPU_ACCOUNTING_NATIVE,
but that is not necessary for the more limited accounting provided
by PARAVIRT_TIME_ACCOUNTING, and it is more expensive, complex, and
has downsides like potential log wrap.
From Shrikanth:
[...] it was tested on Power10 [PowerVM] Shared LPAR. system has two
LPAR. we will call first one LPAR1 and second one as LPAR2. Test was
carried out in SMT=1. Similar observation was seen in SMT=8 as well.
LPAR config header from each LPAR is below. LPAR1 is twice as big as
LPAR2. Since Both are sharing the same underlying hardware, work
stealing will happen when both the LPAR's are contending for the same
resource.
LPAR1:
type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00
LPAR2:
type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00
mpstat was used to check for the utilization. stress-ng has been used
as the workload. Few cases are tested. when the both LPAR are idle
there is no steal time. when LPAR1 starts running at 100% which
consumes all of the physical resource, steal time starts to get
accounted. With LPAR1 running at 100% and LPAR2 starts running, steal
time starts increasing. This is as expected. When the LPAR2 Load is
increased further, steal time increases further.
Case 1: 0% LPAR1; 0% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 99.95
Case 2: 100% LPAR1; 0% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
97.68 0.00 0.00 0.00 0.00 0.00 2.32 0.00 0.00 0.00
Case 3: 100% LPAR1; 50% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
86.34 0.00 0.10 0.00 0.00 0.03 13.54 0.00 0.00 0.00
Case 4: 100% LPAR1; 100% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
78.54 0.00 0.07 0.00 0.00 0.02 21.36 0.00 0.00 0.00
Case 5: 50% LPAR1; 100% LPAR2
%usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
49.37 0.00 0.00 0.00 0.00 0.00 1.17 0.00 0.00 49.47
Patch is accounting for the steal time and basic tests are holding
good.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
[mpe: Add SPDX tag to new paravirt_api_clock.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com
The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC
(or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but
it disables the memory-mapped APIC interface in favor of one that uses
MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR.
The MMIO/xAPIC interface has some problems, most notably the APIC LEAK
[1]. This bug allows an attacker to use the APIC MMIO interface to
extract data from the SGX enclave.
Introduce support for a new feature that will allow the BIOS to lock
the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the
kernel tries to disable the APIC or revert to legacy APIC mode a GP
fault will occur.
Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle
the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by
preventing the kernel from trying to disable the x2APIC.
On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are
enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If
legacy APIC is required, then it SGX and TDX need to be disabled in the
BIOS.
[1]: https://aepicleak.com/aepicleak.pdf
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/20220816231943.1152579-1-daniel.sneddon@linux.intel.com
Steps on the way to 6.0-rc1
Resolves merge conflicts with:
init/Kconfig
kernel/module/Kconfig
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I92b7f12c0ba853ee8bfc81d072fb6eced54dbfa9