Commit graph

29165 commits

Author SHA1 Message Date
Mike Galbraith
4b39fd9685 perfcounters: ratelimit performance counter interrupts
Ratelimit performance counter interrupts to 100KHz per CPU.

This replaces the irq-delta-time based method.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 14:50:02 +01:00
Mike Galbraith
1b023a96d9 perfcounters: throttle on too high IRQ rates
Starting kerneltop with only -c 100 seems to be a bad idea, it can
easily lock the system due to perfcounter IRQ overload.

So add throttling: if a new IRQ arrives in a shorter than
PERFMON_MIN_PERIOD_NS time, turn off perfcounters and untrottle them
from the next timer tick.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 11:33:18 +01:00
Ingo Molnar
bfe2a3c3b5 Merge branch 'core/percpu' into perfcounters/core
Conflicts:
	arch/x86/include/asm/hardirq_32.h
	arch/x86/include/asm/hardirq_64.h

Semantic merge:
	arch/x86/include/asm/hardirq.h
	[ added apic_perf_irqs field. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 10:20:15 +01:00
Ingo Molnar
77835492ed Merge commit 'v2.6.29-rc2' into perfcounters/core
Conflicts:
	include/linux/syscalls.h
2009-01-21 16:37:27 +01:00
Ingo Molnar
198030782c Merge branch 'x86/mm' into core/percpu
Conflicts:
	arch/x86/mm/fault.c
2009-01-21 10:39:51 +01:00
Tejun Heo
6b7c38d555 linker script: kill PERCPU_VADDR_PREALLOC()
Impact: cleanup

With .data.percpu.first in place, PERCPU_VADDR_PREALLOC() is no longer
necessary.  Kill it.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-20 12:29:20 +09:00
Brian Gerst
0bd74fa8e2 percpu: refactor percpu.h
Impact: cleanup

Refactor the DEFINE_PER_CPU_* macros and add .data.percpu.first
section.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-20 12:29:19 +09:00
Ingo Molnar
b2b062b816 Merge branch 'core/percpu' into stackprotector
Conflicts:
	arch/x86/include/asm/pda.h
	arch/x86/include/asm/system.h

Also, moved include/asm-x86/stackprotector.h to arch/x86/include/asm.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-18 18:37:14 +01:00
Ingo Molnar
af37501c79 Merge branch 'core/percpu' into perfcounters/core
Conflicts:
	arch/x86/include/asm/pda.h

We merge tip/core/percpu into tip/perfcounters/core because of a
semantic and contextual conflict: the former eliminates the PDA,
while the latter extends it with apic_perf_irqs field.

Resolve the conflict by moving the new field to the irq_cpustat
structure on 64-bit too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-18 18:15:49 +01:00
Paul Mackerras
d859e29fe3 perf_counter: Add counter enable/disable ioctls
Impact: New perf_counter features

This primarily adds a way for perf_counter users to enable and disable
counters and groups.  Enabling or disabling a counter or group also
enables or disables all of the child counters that have been cloned
from it to monitor children of the task monitored by the top-level
counter.  The userspace interface to enable/disable counters is via
ioctl on the counter file descriptor.

Along the way this extends the code that handles child counters to
handle child counter groups properly.  A group with multiple counters
will be cloned to child tasks if and only if the group leader has the
hw_event.inherit bit set - if it is set the whole group is cloned as a
group in the child task.

In order to be able to enable or disable all child counters of a given
top-level counter, we need a way to find them all.  Hence I have added
a child_list field to struct perf_counter, which is the head of the
list of children for a top-level counter, or the link in that list for
a child counter.  That list is protected by the perf_counter.mutex
field.

This also adds a mutex to the perf_counter_context struct.  Previously
the list of counters was protected just by the lock field in the
context, which meant that perf_counter_init_task had to take that lock
and then take whatever lock/mutex protects the top-level counter's
child_list.  But the counter enable/disable functions need to take
that lock in order to traverse the list, then for each counter take
the lock in that counter's context in order to change the counter's
state safely, which will lead to a deadlock.

To solve this, we now have both a mutex and a spinlock in the context,
and taking either is sufficient to ensure the list of counters can't
change - you have to take both before changing the list.  Now
perf_counter_init_task takes the mutex instead of the lock (which
incidentally means that inherit_counter can use GFP_KERNEL instead of
GFP_ATOMIC) and thus avoids the possible deadlock.  Similarly the new
enable/disable functions can take the mutex while traversing the list
of child counters without incurring a possible deadlock when the
counter manipulation code locks the context for a child counter.

We also had an misfeature that the first counter added to a context
would possibly not go on until the next sched-in, because we were
using ctx->nr_active to detect if the context was running on a CPU.
But nr_active is the number of active counters, and if that was zero
(because the context didn't have any counters yet) it would look like
the context wasn't running on a cpu and so the retry code in
__perf_install_in_context wouldn't retry.  So this adds an 'is_active'
field that is set when the context is on a CPU, even if it has no
counters.  The is_active field is only used for task contexts, not for
per-cpu contexts.

If we enable a subsidiary counter in a group that is active on a CPU,
and the arch code can't enable the counter, then we have to pull the
whole group off the CPU.  We do this with group_sched_out, which gets
moved up in the file so it comes before all its callers.  This also
adds similar logic to __perf_install_in_context so that the "all on,
or none" invariant of groups is preserved when adding a new counter to
a group.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-17 18:10:22 +11:00
Tejun Heo
145cd30bac linker script: add missing VMLINUX_SYMBOL
The newly added PERCPU_*() macros define and use __per_cpu_load but
VMLINUX_SYMBOL() was missing from usages causing build failures on
archs where linker visible symbol is different from C symbols
(e.g. blackfin).

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-17 15:26:12 +09:00
Len Brown
88d998c264 Merge branch 'misc' into release 2009-01-16 14:45:34 -05:00
David Brownell
c3407710b7 ACPI: fix ACPI_FADT_S4_RTC_WAKE comment
Make the comment for ACPI_FADT_S4_RTC_WAKE match the ACPI spec;
that bit has nothing to do with status bits.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-16 14:32:17 -05:00
Linus Torvalds
e58d4fd89a Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  sata_fsl: Return non-zero on error in probe()
  drivers/ata/pata_ali.c: s/isa_bridge/ali_isa_bridge/ to fix alpha build
  libata: New driver for OCTEON SOC Compact Flash interface (v7).
  libata: Add another column to the ata_timing table.
  sata_via: Add VT8261 support
  pata_atiixp: update port enabledness test handling
  [libata] get-identity ioctl: Fix use of invalid memory pointer
2009-01-16 08:40:57 -08:00
Linus Torvalds
a11d9b623e Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] Skip deleted devices in __scsi_device_lookup_by_target()
  [SCSI] Add SUN Universal Xport to no attach blacklist
  [SCSI] iscsi_tcp: make padbuf non-static
  [SCSI] mpt fusion: Add Firmware debug support
  [SCSI] mpt fusion: Add separate msi enable disable for FC,SPI,SAS
  [SCSI] mpt fusion: Update MPI Headers to version 01.05.19
  [SCSI] qla2xxx: Fix ISP restart bug in multiq code
2009-01-16 08:40:40 -08:00
Linus Torvalds
4c44323db1 Merge branch 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/i915: lock correct mutex around object unreference.
  drm/i915: add support for physical memory objects
  drm/i915: make LVDS fixed mode a preferred mode
  drm: handle depth & bpp changes correctly
  drm: initial KMS config fixes
  drm/i915: setup sarea properly in master_priv
  drm/i915: set vblank enabled flag correctly across IRQ install/uninstall
  drm/i915: don't enable vblanks on disabled pipes
2009-01-16 08:39:52 -08:00
David Daney
3ada9c1264 libata: Add another column to the ata_timing table.
The forthcoming OCTEON SOC Compact Flash driver needs an additional
timing value that was not available in the ata_timing table.  I add a
new column for dmack_hold time.  The values were obtained from the
Compact Flash specification Rev 4.1.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-01-16 10:23:37 -05:00
Jeff Garzik
94be9a58d7 [libata] get-identity ioctl: Fix use of invalid memory pointer
for SAS drivers.

Caught by Ke Wei (and team?) at Marvell.

Also, move the ata_scsi_ioctl export to libata-scsi.c, as that seems to be the
general trend.

Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-01-16 10:17:09 -05:00
Ingo Molnar
6dbde35308 percpu: add optimized generic percpu accessors
It is an optimization and a cleanup, and adds the following new
generic percpu methods:

  percpu_read()
  percpu_write()
  percpu_add()
  percpu_sub()
  percpu_and()
  percpu_or()
  percpu_xor()

and implements support for them on x86. (other architectures will fall
back to a default implementation)

The advantage is that for example to read a local percpu variable,
instead of this sequence:

 return __get_cpu_var(var);

 ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
 ffffffff8102ca32:	81
 ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
 ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax

We can get a single instruction by using the optimized variants:

 return percpu_read(var);

 ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax

I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.

tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
    * added percpu_and() for completeness's sake
    * made generic percpu ops atomic against preemption

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-16 14:20:31 +01:00
Tejun Heo
1a51e3a0ae x86: fold pda into percpu area on SMP
[ Based on original patch from Christoph Lameter and Mike Travis. ]

Currently pdas and percpu areas are allocated separately.  %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.

Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40.  To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.

After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda.  This means that setup_per_cpu_areas() need
to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
already has modified it when control reaches setup_per_cpu_areas().

This patch also removes now unnecessary get_local_pda() and its call
sites.

A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-16 14:19:46 +01:00
Tejun Heo
3e5d8f9784 x86: make percpu symbols zerobased on SMP
[ Based on original patch from Christoph Lameter and Mike Travis. ]

This patch makes percpu symbols zerobased on x86_64 SMP by adding
PERCPU_VADDR() to vmlinux.lds.h which helps setting explicit vaddr on
the percpu output section and using it in vmlinux_64.lds.S.  A new
PHDR is added as existing ones cannot contain sections near address
zero.  PERCPU_VADDR() also adds a new symbol __per_cpu_load which
always points to the vaddr of the loaded percpu data.init region.

The following adjustments have been made to accomodate the address
change.

* code to locate percpu gdt_page in head_64.S is updated to add the
  load address to the gdt_page offset.

* __per_cpu_load is used in places where access to the init data area
  is necessary.

* pda->data_offset is initialized soon after C code is entered as zero
  value doesn't work anymore.

This patch is mostly taken from Mike Travis' "x86_64: Base percpu
variables at zero" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-16 14:19:14 +01:00
Jesse Barnes
40a518d9f1 drm: initial KMS config fixes
When mode setting is first initialized, the driver will call into
drm_helper_initial_config() to set up an initial output and framebuffer
configuration.  This routine is responsible for probing the available
connectors, encoders, and crtcs, looking for modes and putting together
something reasonable (where reasonable is defined as "allows kernel
messages to be visible on as many displays as possible").

However, the code was a bit too aggressive in setting default modes when
none were found on a given connector.  Even if some connectors had modes,
any connectors found lacking modes would have the default 800x600 mode added
to their mode list, which in some cases could cause problems later down the
line.  In my case, the LVDS was perfectly available, but the initial config
code added 800x600 modes to both of the detected but unavailable HDMI
connectors (which are on my non-existent docking station).  This ended up
preventing later code from setting a mode on my LVDS, which is bad.

This patch fixes that behavior by making the initial config code walk
through the connectors first, counting the available modes, before it decides
to add any default modes to a possibly connected output.  It also fixes the
logic in drm_target_preferred() that was causing zeroed out modes to be set
as the preferred mode for a given connector, even if no modes were available.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-01-16 18:40:54 +10:00
Linus Torvalds
3feeba1e53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (95 commits)
  b44: GFP_DMA skb should not escape from driver
  korina: do not use IRQF_SHARED with IRQF_DISABLED
  korina: do not stop queue here
  korina: fix handling tx_chain_tail
  korina: do tx at the right position
  korina: do schedule napi after testing for it
  korina: rework korina_rx() for use with napi
  korina: disable napi on close and restart
  korina: reset resource buffer size to 1536
  korina: fix usage of driver_data
  bnx2x: First slow path interrupt race
  bnx2x: MTU Filter
  bnx2x: Indirection table initialization index
  bnx2x: Missing brackets
  bnx2x: Fixing the doorbell size
  bnx2x: Endianness issues
  bnx2x: VLAN tagged packets without VLAN offload
  bnx2x: Protecting the link change indication
  bnx2x: Flow control updated before reporting the link
  bnx2x: Missing mask when calculating flow control
  ...
2009-01-15 16:53:15 -08:00
Jaswinder Singh Rajput
00bfddaf7f include of <linux/types.h> is preferred over <asm/types.h>
Impact: fix 15 make headers_check warnings:

include of <linux/types.h> is preferred over <asm/types.h>

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:41 -08:00
Ivan Kokshaysky
5f7dc5d750 alpha: fix RTC on marvel
Unlike other alphas, marvel doesn't have real PC-style CMOS clock hardware
- RTC accesses are emulated via PAL calls.  Unfortunately, for unknown
reason these calls work only on CPU #0.  So current implementation for
arbitrary CPU makes CMOS_READ/WRITE to be executed on CPU #0 via IPI.
However, for obvious reason this doesn't work with standard
get/set_rtc_time() functions, where a bunch of CMOS accesses is done with
disabled interrupts.

Solved by making the IPI calls for entire get/set_rtc_time() functions,
not for individual CMOS accesses.  Which is also a lot more effective
performance-wise.

The patch is largely based on the code from Jay Estabrook.
My changes:
- tweak asm-generic/rtc.h by adding a couple of #defines to
  avoid a massive code duplication in arch/alpha/include/asm/rtc.h;
- sys_marvel.c: fix get/set_rtc_time() return values (Jay's FIXMEs).

NOTE: this fixes *only* LIB_RTC drivers.  Legacy (CONFIG_RTC) driver
wont't work on marvel.  Actually I think that we should just disable
CONFIG_RTC on alpha (maybe in 2.6.30?), like most other arches - AFAIK,
all modern distributions use LIB_RTC anyway.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:40 -08:00
Qinghuang Feng
1bcbf31337 btrfs & squashfs: Move btrfs and squashfsto's magic number to <linux/magic.h>
Use the standard magic.h for btrfs and squashfs.

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:38 -08:00
Randy Dunlap
6ae301e85c resources: fix parameter name and kernel-doc
Fix __request_region() parameter kernel-doc notation and parameter name:

Warning(linux-2.6.28-git10//kernel/resource.c:627): No description found for parameter 'flags'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:38 -08:00
Randy Dunlap
3eabdb76a0 jbd: fix missing kernel-doc
Fix jbd header file kernel-doc notation:

Warning(linux-2.6.28-git13//include/linux/jbd.h:823): No description found for parameter 'j_average_commit_time'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:37 -08:00
Li Zefan
45ce80fb6b cgroups: consolidate cgroup documents
Move Documentation/cpusets.txt and Documentation/controllers/* to
Documentation/cgroups/

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:37 -08:00
Ingo Molnar
7f268f4352 Merge branches 'cpus4096', 'x86/cleanups' and 'x86/urgent' into x86/percpu 2009-01-15 13:18:57 +01:00
Benjamin Herrenschmidt
937f1ba56b net: Add init_dummy_netdev() and fix EMAC driver using it
This adds an init_dummy_netdev() function that gets a network device
structure (allocation and lifetime entirely under caller's control) and
initialize the minimum amount of fields so it can be used to schedule
NAPI polls without registering a full blown interface. This is to be
used by drivers that need to tie several hardware interfaces to a single
NAPI poll scheduler due to HW limitations.

It also updates the ibm_newemac driver to use that, this fixing the
oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add()

Symbol is exported GPL only a I don't think we want binary drivers doing
that sort of acrobatics (if we want them at all).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-14 21:05:05 -08:00
Linus Torvalds
5393f78027 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (29 commits)
  powerpc/83xx: Move mcu_mpc8349emitx driver out of drivers/i2c/chips/
  powerpc/83xx: Make serial ports work on MPC8315E-RDB w/ FSL U-Boots
  powerpc/e500mc: Doorbells need to be taken w/exceptions disabled
  powerpc: Enable PS3 options and QPACE in ppc64_defconfig
  powerpc/powermac: Fix occasional SMP boot failure
  powerpc/cacheinfo: Rename cache_dir per-cpu variable
  hvc_console: Use kzalloc() instead of kmalloc() + memset()
  hvc_console: Do not set low_latency when using interrupts
  hvc_console: Call free_irq() only if request_irq() was successful
  hvc_console: Change an mb() to smp_mb() and add some comments
  powerpc: Cleanup from l64 to ll64 change: drivers/net
  powerpc: Cleanup from l64 to ll64 change: drivers/char
  powerpc: Cleanup from l64 to ll64 change: arch code
  powerpc: Change u64/s64 to a long long integer type
  powerpc/kexec: Check crash_base for relocatable kernel
  powerpc: Make dummy section a valid note header
  Xilinx: SPI: updated driver for device tree
  drivers/of: Add the of_find_i2c_device_by_node function.
  powerpc/xsysace: add compatible string for non-ipcore instance
  powerpc/mpc52xx: remove dead code from GPIO driver
  ...
2009-01-14 20:00:28 -08:00
Linus Torvalds
bca268565f Merge branch 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
  [CVE-2009-0029] s390 specific system call wrappers
  [CVE-2009-0029] System call wrappers part 33
  [CVE-2009-0029] System call wrappers part 32
  [CVE-2009-0029] System call wrappers part 31
  [CVE-2009-0029] System call wrappers part 30
  [CVE-2009-0029] System call wrappers part 29
  [CVE-2009-0029] System call wrappers part 28
  [CVE-2009-0029] System call wrappers part 27
  [CVE-2009-0029] System call wrappers part 26
  [CVE-2009-0029] System call wrappers part 25
  [CVE-2009-0029] System call wrappers part 24
  [CVE-2009-0029] System call wrappers part 23
  [CVE-2009-0029] System call wrappers part 22
  [CVE-2009-0029] System call wrappers part 21
  [CVE-2009-0029] System call wrappers part 20
  [CVE-2009-0029] System call wrappers part 19
  [CVE-2009-0029] System call wrappers part 18
  [CVE-2009-0029] System call wrappers part 17
  [CVE-2009-0029] System call wrappers part 16
  [CVE-2009-0029] System call wrappers part 15
  ...
2009-01-14 19:58:40 -08:00
Harvey Harrison
74d96f0186 byteorder: make swab.h include asm/swab.h like a regular header
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-14 19:56:50 -08:00
Linus Torvalds
c2919f2ab9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  IDE: fix sparse signed-ness errors with host->host_busy
  ide: fix suspend regression
  tx4938ide: Fix build error due to read_sff_dma_status moving
  ide: remove unused CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ
  sl82c105: remove dead code
  via82cxxx: fix cable warning message
  ide: can't use SSD/non-rotational queue flag for all CFA devices
  it821x.c: use dev->revision instead of pci_read_config_byte
  it821x: Add ultra_mask quirk for Vortex86SX
  ide: fix accidental LOCKDEP breakage caused by local_irq_set() removal
2009-01-14 19:51:26 -08:00
Ben Dooks
e720b9e498 IDE: fix sparse signed-ness errors with host->host_busy
The host_busy field in struct ide_host defaults to a
signed-long, where most arch's test_and_set_bit_*
macros use an unsigned long.

Change to using an unsigned long, which on ARM removes
the following sparse errors:

drivers/ide/ide-io.c:681:8: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:681:8:    expected unsigned long volatile *p
drivers/ide/ide-io.c:681:8:    got long volatile *<noident>
drivers/ide/ide-io.c:681:8: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:681:8:    expected unsigned long volatile *p
drivers/ide/ide-io.c:681:8:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>
drivers/ide/ide-io.c:695:3: warning: incorrect type in argument 2 (different signedness)
drivers/ide/ide-io.c:695:3:    expected unsigned long volatile *p
drivers/ide/ide-io.c:695:3:    got long volatile *<noident>

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-14 19:19:04 +01:00
Brandon Philips
b94b898f31 it821x: Add ultra_mask quirk for Vortex86SX
On Vortex86SX with IDE controller revision 0x11 ultra DMA must be
disabled. This patch was tested by DMP and seems to work.

It is a cleaned up version of their older Kernel patch: 
 http://www.dmp.com.tw/tech/vortex86sx/patch-2.6.24-DMP.gz

Tested-by: Shawn Lin <shawn@dmp.com.tw>
Signed-off-by: Brandon Philips <bphilips@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-01-14 19:19:02 +01:00
Nick Piggin
18e6959c38 mm: fix assertion
This assertion is incorrect for lockless pagecache.  By definition if we
have an unpinned page that we are trying to take a speculative reference
to, it may become the tail of a compound page at any time (if it is
freed, then reallocated as a compound page).

It was still a valid assertion for the vmscan.c LRU isolation case, but
it doesn't seem incredibly helpful...  if somebody wants it, they can
put it back directly where it applies in the vmscan code.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-14 07:32:44 -08:00
Heiko Carstens
2b66421995 [CVE-2009-0029] System call wrappers part 33
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:32 +01:00
Heiko Carstens
d4e82042c4 [CVE-2009-0029] System call wrappers part 32
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:31 +01:00
Benjamin Herrenschmidt
ee6a093222 [CVE-2009-0029] powerpc: Enable syscall wrappers for 64-bit
This enables the use of syscall wrappers to do proper sign extension
for 64-bit programs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:17 +01:00
Heiko Carstens
1a94bc3476 [CVE-2009-0029] System call wrapper infrastructure
From: Martin Schwidefsky <schwidefsky@de.ibm.com>

By selecting HAVE_SYSCALL_WRAPPERS architectures can activate
system call wrappers in order to sign extend system call arguments.

All architectures where the ABI defines that the caller of a function
has to perform sign extension probably need this.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:16 +01:00
Heiko Carstens
e55380edf6 [CVE-2009-0029] Rename old_readdir to sys_old_readdir
This way it matches the generic system call name convention.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:15 +01:00
Heiko Carstens
2ed7c03ec1 [CVE-2009-0029] Convert all system calls to return a long
Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:14 +01:00
Heiko Carstens
4c696ba798 [CVE-2009-0029] Move compat system call declarations to compat header file
Move declarations to correct header file.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-14 14:15:14 +01:00
Paul Mackerras
3b6f9e5cb2 perf_counter: Add support for pinned and exclusive counter groups
Impact: New perf_counter features

A pinned counter group is one that the user wants to have on the CPU
whenever possible, i.e. whenever the associated task is running, for
a per-task group, or always for a per-cpu group.  If the system
cannot satisfy that, it puts the group into an error state where
it is not scheduled any more and reads from it return EOF (i.e. 0
bytes read).  The group can be released from error state and made
readable again using prctl(PR_TASK_PERF_COUNTERS_ENABLE).  When we
have finer-grained enable/disable controls on counters we'll be able
to reset the error state on individual groups.

An exclusive group is one that the user wants to be the only group
using the CPU performance monitor hardware whenever it is on.  The
counter group scheduler will not schedule an exclusive group if there
are already other groups on the CPU and will not schedule other groups
onto the CPU if there is an exclusive group scheduled (that statement
does not apply to groups containing only software counters, which can
always go on and which do not prevent an exclusive group from going on).
With an exclusive group, we will be able to let users program PMU
registers at a low level without the concern that those settings will
perturb other measurements.

Along the way this reorganizes things a little:
- is_software_counter() is moved to perf_counter.h.
- cpuctx->active_oncpu now records the number of hardware counters on
  the CPU, i.e. it now excludes software counters.  Nothing was reading
  cpuctx->active_oncpu before, so this change is harmless.
- A new cpuctx->exclusive field records whether we currently have an
  exclusive group on the CPU.
- counter_sched_out moves higher up in perf_counter.c and gets called
  from __perf_counter_remove_from_context and __perf_counter_exit_task,
  where we used to have essentially the same code.
- __perf_counter_sched_in now goes through the counter list twice, doing
  the pinned counters in the first loop and the non-pinned counters in
  the second loop, in order to give the pinned counters the best chance
  to be scheduled in.

Note that only a group leader can be exclusive or pinned, and that
attribute applies to the whole group.  This avoids some awkwardness in
some corner cases (e.g. where a group leader is closed and the other
group members get added to the context list).  If we want to relax that
restriction later, we can, and it is easier to relax a restriction than
to apply a new one.

This doesn't yet handle the case where a pinned counter is inherited
and goes into error state in the child - the error state is not
propagated up to the parent when the child exits, and arguably it
should.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-14 21:00:30 +11:00
venkatesh.pallipadi@intel.com
e4b866ed19 x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
Impact: cleanup

Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
Subsequent patch changes the x86 PAT handling to return a compatible
memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
No fuctionality change in this patch.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:01 +01:00
Andi Kleen
c8399943bd x86, generic: mark complex bitops.h inlines as __always_inline
Impact: reduce kernel image size

Hugh Dickins noticed that older gcc versions when the kernel
is built for code size didn't inline some of the bitops.

Mark all complex x86 bitops that have more than a single
asm statement or two as always inline to avoid this problem.

Probably should be done for other architectures too.

Ingo then found a better fix that only requires
a single line change, but it unfortunately only
works on gcc 4.3.

On older gccs the original patch still makes a ~0.3% defconfig
difference with CONFIG_OPTIMIZE_INLINING=y.

With gcc 4.1 and a defconfig like build:

    6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
    6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining

~20k / 0.3% difference.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 18:56:30 +01:00
Karen Xie
2856830bd3 [SCSI] iscsi_tcp: make padbuf non-static
virt_to_page() call should not be used on kernel text and data
addresses.  virt_to_page() is used by sg_init_one(). So change padbuf
to be allocated within iscsi_segment.

Signed-off-by: Karen Xie <kxie@chelsio.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-01-13 10:41:34 -06:00
Richard Kennedy
daaf83d2b9 netfilter 09/09: remove padding from struct xt_match on 64bit builds
reorder struct xt_match to remove 8 bytes of padding and make its size
128 bytes.

This saves a small amount of data space in each of the xt netfilter
modules and fits xt_match in one 128 byte cache line.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-12 21:18:37 -08:00