android_kernel_msm-6.1_noth.../include
Finn Thain 11f6aadd1f sched/core: Optimize in_task() and in_interrupt() a bit
[ Upstream commit 87c3a5893e865739ce78aa7192d36011022e0af7 ]

Except on x86, preempt_count is always accessed with READ_ONCE().
Repeated invocations in macros like irq_count() produce repeated loads.
These redundant instructions appear in various fast paths. In the one
shown below, for example, irq_count() is evaluated during kernel entry
if !tick_nohz_full_cpu(smp_processor_id()).

0001ed0a <irq_enter_rcu>:
   1ed0a:       4e56 0000       linkw %fp,#0
   1ed0e:       200f            movel %sp,%d0
   1ed10:       0280 ffff e000  andil #-8192,%d0
   1ed16:       2040            moveal %d0,%a0
   1ed18:       2028 0008       movel %a0@(8),%d0
   1ed1c:       0680 0001 0000  addil #65536,%d0
   1ed22:       2140 0008       movel %d0,%a0@(8)
   1ed26:       082a 0001 000f  btst #1,%a2@(15)
   1ed2c:       670c            beqs 1ed3a <irq_enter_rcu+0x30>
   1ed2e:       2028 0008       movel %a0@(8),%d0
   1ed32:       2028 0008       movel %a0@(8),%d0
   1ed36:       2028 0008       movel %a0@(8),%d0
   1ed3a:       4e5e            unlk %fp
   1ed3c:       4e75            rts

This patch doesn't prevent the pointless btst and beqs instructions
above, but it does eliminate 2 of the 3 pointless move instructions
here and elsewhere.

On x86, preempt_count is per-cpu data and the problem does not arise
presumably because the compiler is free to optimize more effectively.

This patch was tested on m68k and x86. I was expecting no changes
to object code for x86 and mostly that's what I saw. However, there
were a few places where code generation was perturbed for some reason.

The performance issue addressed here is minor on uniprocessor m68k. I
got a 0.01% improvement from this patch for a simple "find /sys -false"
benchmark. For architectures and workloads susceptible to cache line bounce
the improvement is expected to be larger. The only SMP architecture I have
is x86, and as x86 unaffected I have not done any further measurements.

Fixes: 15115830c8 ("preempt: Cleanup the macro maze a bit")
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/0a403120a682a525e6db2d81d1a3ffcc137c3742.1694756831.git.fthain@linux-m68k.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28 17:07:03 +00:00
..
acpi ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep() 2023-06-28 11:12:22 +02:00
asm-generic word-at-a-time: use the same return type for has_zero regardless of endianness 2023-08-11 12:08:11 +02:00
clocksource
crypto crypto: api - Use work queue in crypto_destroy_instance 2023-09-13 09:42:32 +02:00
drm gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSET 2023-11-08 14:11:00 +01:00
dt-bindings dt-bindings: clock: qcom,gcc-sc8280xp: Add missing GDSCs 2023-09-13 09:42:45 +02:00
keys
kunit kunit: add macro to allow conditionally exposing static symbols to tests 2023-11-20 11:52:08 +01:00
kvm KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption 2023-08-23 17:52:28 +02:00
linux sched/core: Optimize in_task() and in_interrupt() a bit 2023-11-28 17:07:03 +00:00
math-emu
media media: cec: core: add adap_unconfigured() callback 2023-09-13 09:42:54 +02:00
memory memory: renesas-rpc-if: Split-off private data from struct rpcif 2023-03-11 13:55:17 +01:00
misc
net net: annotate data-races around sk->sk_dst_pending_confirm 2023-11-28 17:06:56 +00:00
pcmcia
ras
rdma RDMA/cma: Always set static rate to 0 for RoCE 2023-06-21 16:00:59 +02:00
rv
scsi scsi: sd: Introduce manage_shutdown device flag 2023-11-02 09:35:29 +01:00
soc net: mscc: ocelot: don't keep PTP configuration of all ports in single structure 2023-07-19 16:22:01 +02:00
sound ASoC: SOF: Pass PCI SSID to machine driver 2023-11-28 17:06:58 +00:00
target scsi: target: Fix multiple LUN_RESET handling 2023-05-11 23:03:19 +09:00
trace neighbor: tracing: Move pin6 inside CONFIG_IPV6=y section 2023-10-25 12:03:07 +02:00
uapi vsock: read from socket's error queue 2023-11-28 17:06:56 +00:00
ufs scsi: ufs: exynos: Fix DMA alignment for PAGE_SIZE != 4096 2023-03-10 09:33:15 +01:00
vdso
video
xen ACPI: processor: Fix evaluating _PDC method when running as Xen dom0 2023-05-11 23:03:11 +09:00