android_kernel_msm-6.1_noth.../kernel
Don Zickus 58687acba5 lockup_detector: Combine nmi_watchdog and softlockup detector
The new nmi_watchdog (which uses the perf event subsystem) is very
similar in structure to the softlockup detector.  Using Ingo's
suggestion, I combined the two functionalities into one file:
kernel/watchdog.c.

Now both the nmi_watchdog (or hardlockup detector) and softlockup
detector sit on top of the perf event subsystem, which is run every
60 seconds or so to see if there are any lockups.

To detect hardlockups, cpus not responding to interrupts, I
implemented an hrtimer that runs 5 times for every perf event
overflow event.  If that stops counting on a cpu, then the cpu is
most likely in trouble.

To detect softlockups, tasks not yielding to the scheduler, I used the
previous kthread idea that now gets kicked every time the hrtimer fires.
If the kthread isn't being scheduled neither is anyone else and the
warning is printed to the console.

I tested this on x86_64 and both the softlockup and hardlockup paths
work.

V2:
- cleaned up the Kconfig and softlockup combination
- surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
- seperated out the softlockup case from perf event subsystem
- re-arranged the enabling/disabling nmi watchdog from proc space
- added cpumasks for hardlockup failure cases
- removed fallback to soft events if no PMU exists for hard events

V3:
- comment cleanups
- drop support for older softlockup code
- per_cpu cleanups
- completely remove software clock base hardlockup detector
- use per_cpu masking on hard/soft lockup detection
- #ifdef cleanups
- rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
- documentation additions

V4:
- documentation fixes
- convert per_cpu to __get_cpu_var
- powerpc compile fixes

V5:
- split apart warn flags for hard and soft lockups

TODO:
- figure out how to make an arch-agnostic clock2cycles call
  (if possible) to feed into perf events as a sample period

[fweisbec: merged conflict patch]

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-12 23:55:33 +02:00
..
gcov microblaze: Enable GCOV_PROFILE_ALL 2009-09-21 14:29:21 +02:00
irq Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-04-06 13:03:22 -07:00
power PM / Hibernate: user.c, fix SNAPSHOT_SET_SWAP_AREA handling 2010-04-10 22:28:56 +02:00
time include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
trace Merge branch 'master' into export-slabh 2010-04-05 11:37:28 +09:00
.gitignore
acct.c copy_signal() cleanup: kill taskstats_tgid_init() and acct_init_pacct() 2010-03-12 15:52:39 -08:00
async.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
audit.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
audit.h
audit_tree.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
audit_watch.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
auditfilter.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
auditsc.c audit: preface audit printk with audit 2010-04-05 13:19:45 -07:00
backtracetest.c
bounds.c kbuild: move bounds.h to include/generated 2009-12-12 13:08:14 +01:00
capability.c capabilities: Use RCU to protect task lookup in sys_capget 2009-12-10 09:42:48 +11:00
cgroup.c cgroup: Fix an RCU warning in alloc_css_id() 2010-05-04 09:25:00 -07:00
cgroup_freezer.c rcu: Fix RCU lockdep splat on freezer_fork path 2010-04-30 12:03:17 +02:00
compat.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
configs.c
cpu.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
cpuset.c cpuset: alloc nodemask_t on the heap rather than the stack 2010-03-24 16:31:21 -07:00
cred-internals.h
cred.c CRED: Fix a race in creds_are_invalid() in credentials debugging 2010-04-22 09:14:29 +10:00
delayacct.c headers: taskstats_kern.h trim 2009-09-18 09:48:52 -07:00
dma.c
early_res.c x86: Do not free zero sized per cpu areas 2010-03-29 18:55:40 +02:00
elfcore.c elf coredump: add extended numbering support 2010-03-06 11:26:46 -08:00
exec_domain.c
exit.c mm: avoid null-pointer deref in sync_mm_rss() 2010-04-07 08:38:02 -07:00
extable.c
fork.c mm: avoid null-pointer deref in sync_mm_rss() 2010-04-07 08:38:02 -07:00
freezer.c
futex.c futex: Handle futex value corruption gracefully 2010-02-03 15:13:22 +01:00
futex_compat.c futex: Protect pid lookup in compat code with RCU 2009-12-09 14:22:14 +01:00
groups.c
hrtimer.c hrtimers: Convert to raw_spinlocks 2009-12-14 23:55:34 +01:00
hung_task.c softlockup: Fix hung_task_check_count sysctl 2009-11-27 06:21:57 +01:00
hw_breakpoint.c Merge branch 'perf/core' into perf/urgent 2010-03-04 11:47:52 +01:00
itimer.c itimers: Fix racy writes to cpu_itimer fields 2009-11-18 16:32:12 +01:00
kallsyms.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
Kconfig.preempt
kexec.c percpu: add __percpu sparse annotations to core kernel subsystems 2010-02-17 11:17:38 +09:00
kfifo.c kfifo: Don't use integer as NULL pointer 2010-02-16 15:11:08 -08:00
kgdb.c kgdb: Turn off tracing while in the debugger 2010-04-02 14:58:19 -05:00
kmod.c kmod: fix resource leak in call_usermodehelper_pipe() 2010-01-11 09:34:04 -08:00
kprobes.c kprobes: Calculate the index correctly when freeing the out-of-line execution slot 2010-03-11 14:06:16 +01:00
ksysfs.c Merge branch 'for-next' into for-linus 2010-03-08 16:55:37 +01:00
kthread.c cpuset: fix the problem that cpuset_mem_spread_node() returns an offline node 2010-03-24 16:31:21 -07:00
latencytop.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
lockdep.c Merge branch 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc 2010-04-05 09:39:11 -07:00
lockdep_internals.h
lockdep_proc.c seq_file: constify seq_operations 2009-09-23 07:39:29 -07:00
lockdep_states.h
Makefile lockup_detector: Combine nmi_watchdog and softlockup detector 2010-05-12 23:55:33 +02:00
module.c Fix up possibly racy module refcounting 2010-04-05 19:50:02 -07:00
mutex-debug.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
mutex-debug.h locking: Implement new raw_spinlock 2009-12-14 23:55:32 +01:00
mutex.c mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
mutex.h
nmi_watchdog.c nmi_watchdog: Tell the world we're active 2010-03-02 15:02:05 +01:00
notifier.c sched: Use lockdep-based checking on rcu_dereference() 2010-02-25 10:34:26 +01:00
ns_cgroup.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
nsproxy.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
padata.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
panic.c panic: fix panic_timeout accuracy when running on a hypervisor 2010-03-06 11:26:33 -08:00
params.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-03-12 16:04:50 -08:00
perf_event.c perf: Fix resource leak in failure path of perf_event_open() 2010-05-01 13:11:25 +02:00
pid.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-13 14:43:01 -08:00
pid_namespace.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
pm_qos_params.c pm_qos: clean up racy global "name" variable 2009-10-14 15:31:10 +02:00
posix-cpu-timers.c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-26 15:10:38 -07:00
posix-timers.c posix-timers.c: Don't export local functions 2010-02-05 14:54:10 +01:00
printk.c printk: avoid warning when CONFIG_PRINTK is disabled 2010-03-06 11:26:33 -08:00
profile.c kernel/profile.c: Switch /proc/irq/prof_cpu_mask to seq_file 2009-09-20 20:15:40 +02:00
ptrace.c ptrace: Fix ptrace_regset() comments and diagnose errors specifically 2010-02-23 13:45:26 -08:00
range.c x86: Change range end to start+size 2010-02-10 17:47:17 -08:00
rcupdate.c rcu: create rcu_my_thread_group_empty() wrapper 2010-05-06 09:28:41 -07:00
rcutiny.c rcu: Eliminate unneeded function wrapping 2009-11-22 18:58:16 +01:00
rcutorture.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-03-03 07:34:18 -08:00
rcutree.c rcu: Fix accelerated grace periods for last non-dynticked CPU 2010-02-27 09:53:52 +01:00
rcutree.h rcu: Increase RCU CPU stall timeouts if PROVE_RCU 2010-03-11 13:38:01 +01:00
rcutree_plugin.h rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU 2010-02-28 09:17:42 +01:00
rcutree_trace.c rcu: Stop overflowing signed integers 2010-02-25 10:34:57 +01:00
relay.c splice: comparing unsigned int < 0 2010-03-06 11:26:32 -08:00
res_counter.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
resource.c resources: add interfaces that return conflict information 2010-03-23 13:33:50 -07:00
rtmutex-debug.c sched: Convert pi_lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c rtmutes: Convert rtmutex.lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c rcu: Fix RCU lockdep splat in set_task_cpu on fork path 2010-04-30 12:03:17 +02:00
sched_clock.c sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 2009-12-15 09:04:36 +01:00
sched_cpupri.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
sched_cpupri.h sched: Convert cpupri lock to raw_spinlock 2009-12-14 23:55:33 +01:00
sched_debug.c sched: Fix an RCU warning in print_task() 2010-05-04 09:25:01 -07:00
sched_fair.c sched, rcu: Fix rcu_dereference() for RCU-lockdep 2010-03-01 09:29:58 +01:00
sched_features.h sched: Discard some old bits 2009-12-09 10:03:07 +01:00
sched_idletask.c sched: Remove the sched_class load_balance methods 2010-01-21 13:40:09 +01:00
sched_rt.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-13 14:46:18 -08:00
sched_stats.h
seccomp.c
semaphore.c
signal.c kernel core: use helpers for rlimits 2010-03-06 11:26:33 -08:00
slow-work-debugfs.c SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
slow-work.c slow-work: use get_ref wrapper instead of directly calling get_ref 2010-03-29 09:13:30 -07:00
slow-work.h SLOW_WORK: CONFIG_SLOW_WORK_PROC should be CONFIG_SLOW_WORK_DEBUG 2010-03-29 09:14:47 -07:00
smp.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
softirq.c hrtimer, softirq: Fix hrtimer->softirq trampoline 2010-02-03 18:17:40 +01:00
softlockup.c softlockup: Stop spurious softlockup messages due to overflow 2010-03-21 19:30:13 +01:00
spinlock.c locking: Cleanup the name space completely 2009-12-14 23:55:33 +01:00
srcu.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
stacktrace.c
stop_machine.c percpu: add __percpu sparse annotations to core kernel subsystems 2010-02-17 11:17:38 +09:00
sys.c kernel/sys.c: fix compat uname machine 2010-04-24 11:31:24 -07:00
sys_ni.c Add generic sys_ipc wrapper 2010-03-12 15:52:32 -08:00
sysctl.c lockup_detector: Combine nmi_watchdog and softlockup detector 2010-05-12 23:55:33 +02:00
sysctl_binary.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
sysctl_check.c ipv4 05/05: add sysctl to accept packets with local source addresses 2009-12-03 12:14:38 -08:00
taskstats.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
test_kprobes.c
time.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
timeconst.pl
timer.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
tracepoint.c trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
tsacct.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
uid16.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
up.c
user-return-notifier.c core: Clean up user return notifers use of per_cpu 2009-12-02 10:22:59 +01:00
user.c sched: Remove USER_SCHED 2010-01-21 13:40:18 +01:00
user_namespace.c
utsname.c
utsname_sysctl.c sysctl kernel: Remove binary sysctl logic 2009-11-12 02:04:55 -08:00
wait.c
watchdog.c lockup_detector: Combine nmi_watchdog and softlockup detector 2010-05-12 23:55:33 +02:00
workqueue.c workqueue: flush_delayed_work: keep the original workqueue for re-queueing 2010-04-30 07:24:51 +02:00