Commit graph

29965 commits

Author SHA1 Message Date
Michal Hocko
dac23b0d05 memcg swap: use mem_cgroup_uncharge_swap fix
Although mem_cgroup_uncharge_swap has an empty placeholder for
!CONFIG_CGROUP_MEM_RES_CTLR_SWAP the definition is placed in the
CONFIG_SWAP ifdef block so we are missing the same definition for
!CONFIG_SWAP which implies !CONFIG_CGROUP_MEM_RES_CTLR_SWAP.

This has not been an issue before, because mem_cgroup_uncharge_swap was
not called from !CONFIG_SWAP context.  But Hugh Dickins has a cleanup
patch to call __mem_cgroup_commit_charge_swapin which is defined also
for !CONFIG_SWAP.

Let's move both the empty definition and declaration outside of the
CONFIG_SWAP block to avoid the following compilation error:

  mm/memcontrol.c: In function '__mem_cgroup_commit_charge_swapin':
  mm/memcontrol.c:2837: error: implicit declaration of function 'mem_cgroup_uncharge_swap'

if CONFIG_SWAP is disabled.

Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:51 -07:00
Stephen Boyd
20955e891d libfs: add simple_open()
debugfs and a few other drivers use an open-coded version of
simple_open() to pass a pointer from the file to the read/write file
ops.  Add support for this simple case to libfs so that we can remove
the many duplicate copies of this simple function.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Amir Vadai
08f10affe4 net/dcb: Add an optional max rate attribute
Although not specified in 8021Qaz spec, it could be useful to enable drivers
whose HW supports setting a rate limit for an ETS TC. This patch adds this
optional attribute to DCB netlink. To use it, drivers should implement and
register the callbacks ieee_setmaxrate and ieee_getmaxrate. The units are 64
bits long and specified in Kbps to enable usage over both slow and very fast
networks.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai
e5395e92a4 net/mlx4_core: set port QoS attributes
Adding QoS firmware commands:
- mlx4_en_SET_PORT_PRIO2TC - set UP <=> TC
- mlx4_en_SET_PORT_SCHEDULER - set promised BW, max BW and PG number

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:03 -04:00
Amir Vadai
0e98b523c4 net/mlx4_en: Force user priority by QP attribute
Instead of relying on HW to change schedule queue by UP, schedule
queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect.  This
resolves two issues with untagged traffic:
1. untagged traffic has no UP in packet which is needed for QoS. The change
   above allows setting the schedule queue (and by that the UP) of such a stream.
2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC
   allows using BF for untagged but prioritized traffic.

In old firmware that force UP is not supported, untagged traffic will not subject to
QoS.

Because UP is set by QP, need to always have a tx ring per UP, even if pfcrx
module paramter is false.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:03 -04:00
Mike Sinkovsky
9899b81e7c Ethernet driver for the WIZnet W5300 chip
Based on original driver from chip manufacturer, but nearly full rewite.
Tested and used in production with Blackfin BF531 embedded processor.

Signed-off-by: Mike Sinkovsky <msink@permonline.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 01:43:02 -04:00
Linus Torvalds
43f63c8711 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  Fix UNC parsing on mount
  Remove unnecessary check for NULL in password parser
  CIFS: Fix VFS lock usage for oplocked files
  Revert "CIFS: Fix VFS lock usage for oplocked files"
  cifs: writing past end of struct in cifs_convert_address()
  cifs: silence compiler warnings showing up with gcc-4.7.0
  CIFS: Fix VFS lock usage for oplocked files
2012-04-04 18:37:09 -07:00
Jiri Pirko
2615598fc1 team: add binary option type
For transfering generic binary data (e.g. BPF code), introduce new
binary option type.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 20:30:40 -04:00
Linus Torvalds
6c216ec636 KGDB/KDB regression fixes
3.4: Fix an an Smatch warning that appeared in the 3.4 merge window
    3.0: Fix kgdb test suite with SMP for all archs without HW single stepping
 2.6.36: Fix kgdb sw breakpoints with CONFIG_DEBUG_RODATA=y limitations on x86
 2.6.26: Fix oops on kgdb test suite with CONFIG_DEBUG_RODATA
         Fix kgdb test suite with SMP for all archs with HW single stepping
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPedocAAoJEIciOldedpOjn7EP/397Rh0zmRlG8oQwMEJcK3E5
 gaRyBNpkGoU3ekHXHx/nzgQ/CS9opzW7nBZDu8weWLjRKMT4RyHfuJcWyu525GvQ
 SnoiX2ZUzP315d8llCYwXmaCEYA7lHQi4T2bGMlDSn1J8kS235EQxllgEfhXDdEC
 DxRWgHABG2UR62C62sGKbPaMMDO9TcNcrAQK27LDLTS7pKLmYqBWBdZKgWzBM/Pr
 AF8vakqSgUw3Aq9qrLge+483uT7uhMoUJofxRppWtm1QgnDcTmri9LOagiazDotz
 RQliRGwVxj9hEo5mLEiQtI0N1kIGCAsK0+9aUJEZRXovRBR9kvqaqHT4c5xdhznr
 VKYvqqTcHBkKLIfNXFvQZnn2cXtNVNqve9CZZwdBJaFYEkaR7ZVQqE6f2xq8KAb2
 RmhvzlEUyLU+89YKkH66uSa22VLSazkeH+4b8AJ4JxYDEab3BHoBCe8axcBQrTsj
 7X5NOs7V3Oj+4J3bS1fbUbxq4t0dfpLLyg8e/lELWtT+Kq7nQRzA2XHRZAMTve8M
 T0cTdrwtUbgY9ZMTpywNB2KlPgTvhWOyfYbH6/Kcks7ecSXlkow3edXoiUbw79iE
 hP8vcMWbT2Rv3IbLkSMFZEQGAG9qL1YyGv4NDmLOoljO1c/Bi3WQIR5aI+di6asV
 Z5q5s/bmGa4+OhFFITSd
 =SW2N
 -----END PGP SIGNATURE-----

Merge tag 'for_linus-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb

Pull KGDB/KDB regression fixes from Jason Wessel:
 - Fix a Smatch warning that appeared in the 3.4 merge window
 - Fix kgdb test suite with SMP for all archs without HW single stepping
 - Fix kgdb sw breakpoints with CONFIG_DEBUG_RODATA=y limitations on x86
 - Fix oops on kgdb test suite with CONFIG_DEBUG_RODATA
 - Fix kgdb test suite with SMP for all archs with HW single stepping

* tag 'for_linus-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
  x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()
  kgdb,debug_core: pass the breakpoint struct instead of address and memory
  kgdbts: (2 of 2) fix single step awareness to work correctly with SMP
  kgdbts: (1 of 2) fix single step awareness to work correctly with SMP
  kgdbts: Fix kernel oops with CONFIG_DEBUG_RODATA
  kdb: Fix smatch warning on dbg_io_ops->is_console
2012-04-04 17:26:08 -07:00
Linus Torvalds
58bca4a8fa Merge branch 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA mapping branch from Marek Szyprowski:
 "Short summary for the whole series:

  A few limitations have been identified in the current dma-mapping
  design and its implementations for various architectures.  There exist
  more than one function for allocating and freeing the buffers:
  currently these 3 are used dma_{alloc, free}_coherent,
  dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent.

  For most of the systems these calls are almost equivalent and can be
  interchanged.  For others, especially the truly non-coherent ones
  (like ARM), the difference can be easily noticed in overall driver
  performance.  Sadly not all architectures provide implementations for
  all of them, so the drivers might need to be adapted and cannot be
  easily shared between different architectures.  The provided patches
  unify all these functions and hide the differences under the already
  existing dma attributes concept.  The thread with more references is
  available here:

    http://www.spinics.net/lists/linux-sh/msg09777.html

  These patches are also a prerequisite for unifying DMA-mapping
  implementation on ARM architecture with the common one provided by
  dma_map_ops structure and extending it with IOMMU support.  More
  information is available in the following thread:

    http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819

  More works on dma-mapping framework are planned, especially in the
  area of buffer sharing and managing the shared mappings (together with
  the recently introduced dma_buf interface: commit d15bd7ee44
  "dma-buf: Introduce dma buffer sharing mechanism").

  The patches in the current set introduce a new alloc/free methods
  (with support for memory attributes) in dma_map_ops structure, which
  will later replace dma_alloc_coherent and dma_alloc_writecombine
  functions."

People finally started piping up with support for merging this, so I'm
merging it as the last of the pending stuff from the merge window.
Looks like pohmelfs is going to wait for 3.5 and more external support
for merging.

* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  common: DMA-mapping: add NON-CONSISTENT attribute
  common: DMA-mapping: add WRITE_COMBINE attribute
  common: dma-mapping: introduce mmap method
  common: dma-mapping: remove old alloc_coherent and free_coherent methods
  Hexagon: adapt for dma_map_ops changes
  Unicore32: adapt for dma_map_ops changes
  Microblaze: adapt for dma_map_ops changes
  SH: adapt for dma_map_ops changes
  Alpha: adapt for dma_map_ops changes
  SPARC: adapt for dma_map_ops changes
  PowerPC: adapt for dma_map_ops changes
  MIPS: adapt for dma_map_ops changes
  X86 & IA64: adapt for dma_map_ops changes
  common: dma-mapping: introduce generic alloc() and free() methods
2012-04-04 17:13:43 -07:00
Giuseppe CAVALLARO
18f05d64ec stmmac: extend CSR Clock Range programming
The CSR Clock Range has been reworked and new macros has
been added in the platform header to allow the CSR Clock
Range selection in the GMII Address Register.
The previous work didn't add the other fields
that can be used to achieve MDC clock of frequency
higher than the IEEE 802.3 specified frequency limit
of 2.5 MHz and program a clock divider of lower value.
On such platforms, these are used indeed so this patch
adds them.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 18:39:24 -04:00
Deepak SIKRI
8327eb65e7 stmmac: re-work the internal GMAC DMA platf parameters
This patch re-works the internal GMAC DMA parameters
passed from the platform.
In the past, we only passed the pbl but, with new core,
other parameters can be passed and are mandatory on some
platforms.

New parameters are documented in stmmac.txt because this
patch has an impact for many platforms.

Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Signed-off-by: Vikas Manocha <vikas.manocha@st.com>
Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Hacked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 18:39:24 -04:00
Deepak SIKRI
faeae3fa0f stmmac: Define MDC clock selection macros
The patch adds the macros to be used for MDC clock selection. The MDC clock
frequency is based on scaled system clock, and has to be confined to a range
of 1-2.5 MHz. Based on the input CSR clock, the scaling factor has to be
selected.
The platform specific code will provide the default value of this scaling
factor, based on the input CSR clock.
There is an option to set MDC clock higher than the IEEE 802.3 specified
frequency limit of 2.5 MHz. This applies for the interfacing chips that
support higher MDC clocks. The resultant higher clock of 12.5 MHz requires
additional Macros to be defined for the clock divider corresponding to the
to the following selection.
-----------------------------------------
	Selection	MDC Clock
-----------------------------------------
	1000 		clk_csr_i/4
	1001 		clk_csr_i/6
	1010 		clk_csr_i/8
	1011 		clk_csr_i/10
	1100 		clk_csr_i/12
	1101	 	clk_csr_i/14
	1110 		clk_csr_i/16
	1111 		clk_csr_i/18

This support has to be added both in the include file, as well as driver. The
driver need to program the registers based on the interfacing chips. This would
be more board specific information and needs to be passed through the platform
code to the driver. This work would be carried out in the future patch set
release.

Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 18:39:23 -04:00
Deepak SIKRI
55f9a4d6fa stmmac: Define CSUM offload engine Types
This patch explicitly defines the CSUM offload engine type which need
(not mandatory) to be passed from the platform code.
STMMAC core supports two check sum offload engine types- Type-1 & Type-2.
Also, there are STMMAC cores that do not have the check sum offload
capabilities.

The behaviour of Type-1 & Type-2 cores related to provision of checksum
increases the packet length for Type-1 cores by 2, as the checksum is appended
at the end of data packet and the same is made accountable in the DMA status.
The STMMAC cores beyond Version-3.5 provide HW interface registers which allows
the user to read the HW capabilities, while to support the previous cores the
information related to HW capabilities has to be provided from the platform
code.

The Type-1 cores which do not have the HW register interface need this
information.

This patch also updates the driver's doc.

Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Hacked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 18:39:23 -04:00
Srinivas Kandagatla
f142af2e20 stmmac: Allow stmmac to work with other PHY buses(v3).
As stmmac mdio bus name prefix is hardcoded in the driver, this allows
only phys on stmmac mdio buses to connect, however stmmac should allow
phys on other mdio buses too.

This patch adds new variable phy_bus_name to plat_stmmacenet_data
struct to let the BSP decide which phy bus to be used by stmmac driver.
A typical use-case is to have generic MDIO buses like mdio-gpio on top
of stmmac.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 18:39:23 -04:00
Linus Torvalds
64ebe98731 More power management updates for 3.4
Fixes mostly, including:
 
 * Patch series that hopefully fixes races between the freezer and request_firmware()
   and request_firmware_nowait() for good, with two cleanups from Stephen Boyd on top.
 
 * Runtime PM fix from Alan Stern preventing tasks from getting stuck indefinitely
   in the runtime PM wait queue.
 
 * Device PM QoS update from MyungJoo Ham introducing a new variant of
   pm_qos_update_request() allowing the callers to specify a timeout.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJPdmPZAAoJEKhOf7ml8uNsvcgQAIKBya3ESVg2PbB1riIRJ0M5
 3R5ntbQ0sxa631lIoipZLP6HeN2fgTcfTqhHpr9/dtt80Zh/HbNWee4XEmkJvGOK
 UuG/Vzg2IJA2LKYbRDEALm9GwvlG8ylIrz1mWOSt77K+seyjnvCyfQsoVd5S/+sz
 bzDCwIJlV/lvtynvAMfaZ+O75XW1uYRJ6a1ABviEU4o+J7OC9UCp0h/b9c1WZqDJ
 1X0pBU0/28ZFnYnK+zuAqwJg7pua/HrC0nT/pQTRSZ0kXAgt7uuqIlpVz9HXiqzu
 TVbu3uW6FPWT0TP/iFmKMA1eiQJHLXgshECaccVOoMzIG/pqYTNbfu9BzEho3tL9
 w716ruo1JoythvnlIz4j8R2RtiE8SxTzCqGm4OHcie72VUSqduIhWgRyZOFhebUo
 xqiUSN2cyYUf9SJoeg0TSmQdutoa7vnswZgq4qjlOz39OPxHrwAe5ROXIBwoHvnz
 akmBtnabyNVsRiLe9eIH5N5C9TxHDgZwS70SMYqo1D09Qo+NTUtvSVgC/NiIjhXb
 yY3UliDqGlkUhHJ+8ydntNb39VU4L1MO0IGzEvmvfXvSIcXavGkkmd9RV9yytLEK
 1ujq99NHITzxyuF2+bNGpPQVEVH3sQgAv/doFTiEZiUHIIAy5Fmy/+ipcurslXLm
 urlq4RLG+JXgPjw4XO14
 =ligR
 -----END PGP SIGNATURE-----

Merge tag 'pm-for-3.4-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 - Patch series that hopefully fixes races between the freezer and
   request_firmware() and request_firmware_nowait() for good, with two
   cleanups from Stephen Boyd on top.
 - Runtime PM fix from Alan Stern preventing tasks from getting stuck
   indefinitely in the runtime PM wait queue.
 - Device PM QoS update from MyungJoo Ham introducing a new variant of
   pm_qos_update_request() allowing the callers to specify a timeout.

* tag 'pm-for-3.4-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / QoS: add pm_qos_update_request_timeout() API
  firmware_class: Move request_firmware_nowait() to workqueues
  firmware_class: Reorganize fw_create_instance()
  PM / Sleep: Mitigate race between the freezer and request_firmware()
  PM / Sleep: Move disabling of usermode helpers to the freezer
  PM / Hibernate: Disable usermode helpers right before freezing tasks
  firmware_class: Do not warn that system is not ready from async loads
  firmware_class: Split _request_firmware() into three functions, v2
  firmware_class: Rework usermodehelper check
  PM / Runtime: don't forget to wake up waitqueue on failure
2012-04-04 14:26:40 -07:00
Linus Torvalds
a5149bf3fe Merge branch 'selinux' ("struct common_audit_data" sanitizer)
Merge common_audit_data cleanup patches from Eric Paris.

This is really too late, but it's a long-overdue cleanup of the costly
wrapper functions for the security layer.

The "struct common_audit_data" is used all over in critical paths,
allocated and initialized on the stack.  And used to be much too large,
causing not only unnecessarily big stack frames but the clearing of the
(mostly useless) data was also very visible in profiles.

As a particular example, in one microbenchmark for just doing "stat()"
over files a lot, selinux_inode_permission() used 7% of the CPU time.
That's despite the fact that it doesn't actually *do* anything: it is
just a helper wrapper function in the selinux security layer.

This patch-series shrinks "struct common_audit_data" sufficiently that
code generation for these kinds of wrapper functions is improved
noticeably, and we spend much less time just initializing data that we
will never use.

The functions still get called all the time, and it still shows up at
3.5+% in my microbenchmark, but it's quite a bit lower down the list,
and much less noticeable.

* Emailed patches from Eric Paris <eparis@redhat.com>:
  lsm_audit: don't specify the audit pre/post callbacks in 'struct common_audit_data'
  SELinux: do not allocate stack space for AVC data unless needed
  SELinux: remove avd from slow_avc_audit()
  SELinux: remove avd from selinux_audit_data
  LSM: shrink the common_audit_data data union
  LSM: shrink sizeof LSM specific portion of common_audit_data
2012-04-04 10:11:24 -07:00
Linus Torvalds
4a1e8ebc5e regulator: Fixes for -rc1
A bunch of smallish fixes that came up during the merge window as
 things got more testing - even more fixes from Axel, a fix for error
 handling in more complex systems using -EPROBE_DEFER and a couple of
 small fixes for the new dummy regulators.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPeDhzAAoJEBus8iNuMP3dh3sP/2i2M34Y66HkNGJCbkLgzGsu
 DlwbiNdh+o/uUcALCcyt7tYF06MhsJ4n7fOBRZ+hKr3VMTguHi6B36IKxGrfzBJs
 9XoYQkyO/92g3UjdWQSWKSQfx/qZhw167gmmeL8jQfEQOZZ2MhnCLbbZVsRMFGf6
 PzVDyW0StDWh0uBP6P7UU64pogFfUqSO/guIAqppUI9Ll/ulx0OqY+eov2mQzlmg
 Rqwjt9mF9UMZn4/Br0mQx+iUqqSugtA+2VAAiSx7K8JDjeMPh47S31p/FGCLXrmv
 Lbn2vQGn23RDthdxQlcisY/rJU5WsJbgdjjoppucyYJbYUWzVVMNM0djxtpGIQ7j
 x9X6ZTdG4RuphI5FDku/vKkNS4O/Y0+6bumpqFfXyBoAGOXMKdUmTymCNCogFnOy
 F6vMzrtA27M5HaFaRP3xnMSNU/aBegsgCoEjyQ1b5D1TAKvc5v614cZzyrWaCd+G
 gVE1lDIAo2snraRjZl9LDQwGKOkzEdLXJSzGZ7Pk9n54+QvRBJcYu3l/Wdrn4JES
 jtIprAyWbY4z6n5wuucwwjFZa2UtBukRTV8ZA/jMq+eig3VdN8CSKJCJUQTG+UgV
 fyCNq7WWAhK8rldcjgJhPspEbRazAEnaNwfZUWZAmmnb8bVVRO6pJOl2TiRN0DFt
 8MqdrQtJFusRXgrc+Wlq
 =h5VJ
 -----END PGP SIGNATURE-----

Merge tag 'regulator-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A bunch of smallish fixes that came up during the merge window as
  things got more testing - even more fixes from Axel, a fix for error
  handling in more complex systems using -EPROBE_DEFER and a couple of
  small fixes for the new dummy regulators."

* tag 'regulator-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Remove non-existent parameter from fixed-helper.c kernel doc
  regulator: Fix setting new voltage in s5m8767_set_voltage
  regulator: fix sysfs name collision between dummy and fixed dummy regulator
  regulator: Fix deadlock on removal of regulators with supplies
  regulator: Fix comments in include/linux/regulator/machine.h
  regulator: Only update [LDOx|DCx]_HIB_MODE bits in wm8350_[ldo|dcdc]_set_suspend_disable
  regulator: Fix setting low power mode for wm831x aldo
  regulator: Return microamps in wm8350_isink_get_current
  regulator: wm8350: Fix the logic to choose best current limit setting
  regulator: wm831x-isink: Fix the logic to choose best current limit setting
  regulator: wm831x-dcdc: Fix the logic to choose best current limit setting
  regulator: anatop: patching to device-tree property "reg".
  regulator: Do proper shift to set correct bit for DC[2|5]_HIB_MODE setting
  regulator: Fix restoring pmic.dcdcx_hib_mode settings in wm8350_dcdc_set_suspend_enable
  regulator: Fix unbalanced lock/unlock in mc13892_regulator_probe error path
  regulator: Fix set and get current limit for wm831x_buckv
  regulator: tps6586x: Fix list minimal voltage setting for LDO0
2012-04-04 10:09:30 -07:00
Axel Lin
45b2604eaa Input: gameport - add helper macro for gameport_driver boilerplate
This patch introduces the module_gameport_driver macro which is a
convenience macro for gameport driver modules similar to
module_platform_driver. It is intended to be used by drivers
which init/exit section does nothing but registers/unregisters the
gameport driver. By using this macro it is possible to eliminate a
few lines of boilerplate code per gameport driver.

Based on work done by Lars-Peter Clausen <lars@metafoo.de> for
other buses (i2c and spi).

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2012-04-04 09:25:44 -07:00
Axel Lin
fa7f86d157 Input: serio - add helper macro for serio_driver boilerplate
This patch introduces the module_serio_driver macro which is a
convenience macro for serio driver modules similar to
module_platform_driver. It is intended to be used by drivers
which init/exit section does nothing but registers/unregisters
the serio driver. By using this macro it is possible to eliminate
a few lines of boilerplate code per serio driver.

Based on work done by Lars-Peter Clausen <lars@metafoo.de> for
other buses (i2c and spi).

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2012-04-04 09:25:42 -07:00
Wolfram Sang
0bf25a4538 Input: add support for LM8333 keypads
This driver adds support for the keypad part of the LM8333 and is
prepared for possible GPIO/PWM drivers. Note that this is not a MFD
because you cannot disable the keypad functionality which, thus,
has to be handled by the core anyhow.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2012-04-04 09:25:39 -07:00
Laxman Dewangan
6ffc327021 regulator: Add support for RICOH PMIC RC5T583 regulator
The RC5T583 PMIC from RICOH consists of 4 DCDC and 10
LDOs. This driver supports the control of different
regulator output through regulator interface.
This driver depends on MFD driver of RC5T583 and uses
mfd rc5t583 apis to communicate to device for accessing
different device's registers.

Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-04-04 11:48:02 +01:00
Mark Brown
65f26846b9 regulator: core: Constify the regulator_desc passed in when registering
Drivers should be able to declare their descriptors const and the framework
shouldn't ever be modifying the desciptor. Make the parameter and the
pointer in regulator_dev const to enforce this.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-04-04 11:43:26 +01:00
Richard Cochran
02eacbd0c4 ethtool: Add a common function for drivers with transmit time stamping.
Currently, most drivers do not support transmit SO_TIMESTAMPING. For those
that do support it, there is one appropriate response to the get_ts_info
query. This patch adds a common function providing this response.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 05:28:46 -04:00
Richard Cochran
c8f3a8c310 ethtool: Introduce a method for getting time stamping capabilities.
This commit adds a new ethtool ioctl that exposes the SO_TIMESTAMPING
capabilities of a network interface. In addition, user space programs
can use this ioctl to discover the PTP Hardware Clock (PHC) device
associated with the interface.

Since software receive time stamps are handled by the stack, the generic
ethtool code can answer the query correctly in case the MAC or PHY
drivers lack special time stamping features.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 05:28:45 -04:00
Richard Cochran
995a9090b2 ptp: Add a method for obtaining the device index.
This commit adds a method that MAC drivers may call in order to find out
the device number of their associated PTP Hardware Clock.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 05:28:45 -04:00
David S. Miller
9b461783d3 Merge branch 'master' of git://1984.lsi.us.es/net 2012-04-03 19:15:48 -04:00
Jiri Pirko
ffe06c17af filter: add XOR operation
Add XOR instruction fo BPF machine. Needed for computing packet hashes.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 18:36:20 -04:00
Jiri Pirko
302d663740 filter: Allow to create sk-unattached filters
Today, BPF filters are bind to sockets. Since BPF machine becomes handy
for other purposes, this patch allows to create unattached filter.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 18:36:20 -04:00
Haiyang Zhang
33be96e47c net/hyperv: Add flow control based on hi/low watermark
In the existing code, we only stop queue when the ringbuffer is full,
so the current packet has to be dropped or retried from upper layer.

This patch stops the tx queue when available ringbuffer is below
the low watermark. So the ringbuffer still has small amount of space
available for the current packet. This will reduce the overhead of
retries on sending.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 17:47:15 -04:00
Eric Dumazet
2def16ae6b net: fix /proc/net/dev regression
Commit f04565ddf5 (dev: use name hash for dev_seq_ops) added a second
regression, as some devices are missing from /proc/net/dev if many
devices are defined.

When seq_file buffer is filled, the last ->next/show() method is
canceled (pos value is reverted to value prior ->next() call)

Problem is after above commit, we dont restart the lookup at right
position in ->start() method.

Fix this by removing the internal 'pos' pointer added in commit, since
we need to use the 'loff_t *pos' provided by seq_file layer.

This also reverts commit 5cac98dd0 (net: Fix corruption
in /proc/*/net/dev_mcast), since its not needed anymore.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mihai Maruseac <mmaruseac@ixiacom.com>
Tested-by:  Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 17:23:23 -04:00
Linus Torvalds
b61c37f579 lsm_audit: don't specify the audit pre/post callbacks in 'struct common_audit_data'
It just bloats the audit data structure for no good reason, since the
only time those fields are filled are just before calling the
common_lsm_audit() function, which is also the only user of those
fields.

So just make them be the arguments to common_lsm_audit(), rather than
bloating that structure that is passed around everywhere, and is
initialized in hot paths.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:49:59 -07:00
Eric Paris
48c62af68a LSM: shrink the common_audit_data data union
After shrinking the common_audit_data stack usage for private LSM data I'm
not going to shrink the data union.  To do this I'm going to move anything
larger than 2 void * ptrs to it's own structure and require it to be declared
separately on the calling stack.  Thus hot paths which don't need more than
a couple pointer don't have to declare space to hold large unneeded
structures.  I could get this down to one void * by dealing with the key
struct and the struct path.  We'll see if that is helpful after taking care of
networking.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:49:10 -07:00
Eric Paris
3b3b0e4fc1 LSM: shrink sizeof LSM specific portion of common_audit_data
Linus found that the gigantic size of the common audit data caused a big
perf hit on something as simple as running stat() in a loop.  This patch
requires LSMs to declare the LSM specific portion separately rather than
doing it in a union.  Thus each LSM can be responsible for shrinking their
portion and don't have to pay a penalty just because other LSMs have a
bigger space requirement.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:48:40 -07:00
Eric W. Biederman
57a39aa3e3 userns: Kill bogus declaration of function release_uids
There is no release_uids function remove the declaration from sched.h

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-04-03 04:28:52 -07:00
Linus Torvalds
ed359a3b7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Provide device string properly for USB i2400m wimax devices, also
    don't OOPS when providing firmware string.  From Phil Sutter.

 2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu.

 3) Add another device ID to USB zaurus driver, from Guan Xin.

 4) Loop index start in pool vector iterator is wrong causing MAC to not
    get configured in bnx2x driver, fix from Dmitry Kravkov.

 5) EQL driver assumes HZ=100, fix from Eric Dumazet.

 6) Now that skb_add_rx_frag() can specify the truesize increment
    separately, do so in f_phonet and cdc_phonet, also from Eric
    Dumazet.

 7) virtio_net accidently uses net_ratelimit() not only on the kernel
    warning but also the statistic bump, fix from Rick Jones.

 8) ip_route_input_mc() uses fixed init_net namespace, oops, use
    dev_net(dev) instead.  Fix from Benjamin LaHaise.

 9) dev_forward_skb() needs to clear the incoming interface index of the
    SKB so that it looks like a new incoming packet, also from Benjamin
    LaHaise.

10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of
    5GHZ, fix from Stanislav Yakovlev.

11) Missing kmalloc() return value checks in orinoco, from Santosh
    Nayak.

12) ath9k doesn't check for HT capabilities in the right way, it is
    checking ht_supported instead of the ATH9K_HW_CAP_HT flag.  Fix from
    Sujith Manoharan.

13) Fix x86 BPF JIT emission of 16-bit immediate field of AND
    instructions, from Feiran Zhuang.

14) Avoid infinite loop in GARP code when registering sysfs entries.
    From David Ward.

15) rose protocol uses memcpy instead of memcmp in a device address
    comparison, oops.  Fix from Daniel Borkmann.

16) Fix build of lpc_eth due to dev_hw_addr_rancom() interface being
    renamed to eth_hw_addr_random().  From Roland Stigge.

17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way
    that ipv4 does.  Fix from Shmulik Ladkani.

18) via-rhine has an inverted bit test, causing suspend/resume
    regressions.  Fix from Andreas Mohr.

19) RIONET assumes 4K page size, fix from Akinobu Mita.

20) Initialization of imask register in sky2 is buggy, because bits are
    "or'd" into an uninitialized local variable.  Fix from Lino
    Sanfilippo.

21) Fix FCOE checksum offload handling, from Yi Zou.

22) Fix VLAN processing regression in e1000, from Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  sky2: dont overwrite settings for PHY Quick link
  tg3: Fix 5717 serdes powerdown problem
  net: usb: cdc_eem: fix mtu
  net: sh_eth: fix endian check for architecture independent
  usb/rtl8150 : Remove duplicated definitions
  rionet: fix page allocation order of rionet_active
  via-rhine: fix wait-bit inversion.
  ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4
  net: lpc_eth: Fix rename of dev_hw_addr_random
  net/netfilter/nfnetlink_acct.c: use linux/atomic.h
  rose_dev: fix memcpy-bug in rose_set_mac_address
  Fix non TBI PHY access; a bad merge undid bug fix in a previous commit.
  net/garp: avoid infinite loop if attribute already exists
  x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND
  bonding: emit event when bonding changes MAC
  mac80211: fix oper channel timestamp updation
  ath9k: Use HW HT capabilites properly
  MAINTAINERS: adding maintainer for ipw2x00
  net: orinoco: add error handling for failed kmalloc().
  net/wireless: ipw2x00: fix a typo in wiphy struct initilization
  ...
2012-04-02 17:53:39 -07:00
Laxman Dewangan
a4d9f179cc regulator: fixed: Support for open drain gpio pin
Adding flag on fixed regulator board configuration structure
to specify whether gpio is open drain type or not.
Passing this information to gpio library when requesting
gpio so that gpio driver can set the pin state accordingly,
for open drain type:
- Pin can be set HIGH as setting as input, PULL UP on
  pin make this as HIGH.
- Pin can be set LOW as setting it as output and drive to LOW.

The non-open drain pin can be  set HIGH/LOW by setting it to
output and driving it to HIGH/LOW.

Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-04-02 23:17:53 +01:00
Linus Torvalds
95694129b4 Merge branch 'paul' (Fixups from Paul Gortmaker)
This merges some of the fixes from Paul Gortmaker for the header file
cleanup fallout.

Some of the patches are going through arch maintainer trees, and David
Howells suggested another be done differently, but this at least fixes a
few cases.

* emailed from Paul Gortmaker <paul.gortmaker@windriver.com>:
  asm-generic: add linux/types.h to cmpxchg.h
  firewire: restore the device.h include in linux/firewire.h
  frv: fix warnings in mb93090-mb00/pci-dma.c about implicit EXPORT_SYMBOL
  parisc: fix missing cmpxchg file error from system.h split
  blackfin: fix cmpxchg build fails from system.h fallout
  avr32: fix build failures from mis-naming of atmel_nand.h
  ARM: mach-msm: fix compile fail from system.h fallout
  irq_work: fix compile failure on MIPS from system.h split
2012-04-02 14:41:43 -07:00
Paul Gortmaker
f68c56b7d2 firewire: restore the device.h include in linux/firewire.h
Commit 313162d0b8 ("device.h: audit and cleanup users in main include
dir") exchanged an include <linux/device.h> for a struct *device but in
actuality I misread this file when creating 313162d and it should have
remained an include.

There were no build regressions since all consumers were already getting
device.h anyway, but make it right regardless.

Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-02 14:41:27 -07:00
Paul Gortmaker
3d92e05118 avr32: fix build failures from mis-naming of atmel_nand.h
Commit bf4289cba0 ("ATMEL: fix nand ecc support") indicated that it
wanted to "Move platform data to a common header
include/linux/platform_data/atmel_nand.h" and the new header even had
re-include protectors with:

    #ifndef __ATMEL_NAND_H__

However, the file that was added was simply called atmel.h
and this caused avr32 defconfig to fail with:

  In file included from arch/avr32/boards/atstk1000/setup.c:22:
  arch/avr32/mach-at32ap/include/mach/board.h:10:44: error: linux/platform_data/atmel_nand.h: No such file or directory
  In file included from arch/avr32/boards/atstk1000/setup.c:22:
  arch/avr32/mach-at32ap/include/mach/board.h:121: warning: 'struct atmel_nand_data' declared inside parameter list
  arch/avr32/mach-at32ap/include/mach/board.h:121: warning: its scope is only this definition or declaration, which is probably not what you want
  make[2]: *** [arch/avr32/boards/atstk1000/setup.o] Error 1

It seems the scope of the file contents will expand beyond
just nand, so ignore the original intention, and fix up the
users who reference the bad name with the _nand suffix.

CC: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
CC: David Woodhouse <dwmw2@infradead.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-02 14:41:25 -07:00
Linus Torvalds
b1a808ff43 Merge branch 'for-next' of git://gitorious.org/kernel-hsi/kernel-hsi
Pull HSI (High Speed Synchronous Serial Interface) framework from Carlos Chinea:
 "The High Speed Synchronous Serial Interface (HSI) is a serial
  interface mainly used for connecting application engines (APE) with
  cellular modem engines (CMT) in cellular handsets.

  The framework is currently being used for some people and we would
  like to see it integrated into the kernel for 3.3.  There is no HW
  controller drivers in this pull, but some people have already some of
  them pending which they would like to push as soon as this integrated.
  I am also working on the acceptance for an TI OMAP one, based on a
  compatible legacy version of the interface called SSI."

Ok, so it didn't get into 3.3, but here it is pulled into 3.4.

Several people piped up to say "yeah, we want this".

* 'for-next' of git://gitorious.org/kernel-hsi/kernel-hsi:
  HSI: hsi_char: Update ioctl-number.txt
  HSI: Add HSI API documentation
  HSI: hsi_char: Add HSI char device kernel configuration
  HSI: hsi_char: Add HSI char device driver
  HSI: hsi: Introducing HSI framework
2012-04-02 09:50:40 -07:00
Linus Torvalds
8f6b7676ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 - Fix for CPU hotplug hang in padata.
 - Avoid using cpu_active inappropriately in pcrypt and padata.
 - Fix for user-space algorithm lookup hang with IV generators.
 - Fix for netlink dump of algorithms where stuff went missing due to
   incorrect calculation of message size.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: user - Fix size of netlink dump message
  crypto: user - Fix lookup of algorithms with IV generator
  crypto: pcrypt - Use the online cpumask as the default
  padata: Fix cpu hotplug
  padata: Use the online cpumask as the default
  padata: Add a reference to the api documentation
2012-04-02 09:40:24 -07:00
Linus Torvalds
deb74f5ca1 Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPc+5PAAoJENkgDmzRrbjx8qwQAIRGDWGAJ7fiu8QBVbjycXJG
 7828enxrbBQodNmc+uAkYvTv3KEoi8tlweMsk/lWDv8WovZV4IlQDEFCX/f4hWVY
 S+2PmqJkN/alsG3dXd00zotK9mOJD+mQPAdjUBaNnRdp3QoV3YrjgihkWiL23DyT
 dZTgqXdbUJkHk/d9YD1qcDvWdSr1EufSLYa52PhLJqYiYVk8zCdX82deJX1MWh64
 v9I6htA73ORoX4JBGsFAOHO8fmLaq1yhBUMHOL4+gfEJVv4kSTU05GgepBHQP1fm
 BbG2hN6G4vqqiqhV5A59+h271o/2d/KBGKx8/twRGk8tNJIwTIVnr/qcGuUfytC3
 vA1fmq3vul0bzbqRgph8bGJyoVIg8CHjq24BFJQOXiQ1/6HOvjxnKBYs+3sVA829
 ZYQYuEoRKmTsD3vv3nmcqAdZZDzehBQ499bEqDNsnQRLOjOVNag/pJSaENkeVC4T
 CKYXt9BEabYnermPLdrjiabPE27GaEznX11SzCSXiWJsKX2kJnvz5RxVo8nlh1fc
 /KQWJyWi/QVmAdy4eCJFp48513BqncHvKtPZ6zN9+Y6NHKmnmAqieZhh4yV/SCqi
 EcK2oHQXmioKldn5DANQjeUCWlmEYXHbR08ahGRLNc7GZ1qKCgDr8+WEC0XYB/gQ
 XLH3KKLM+VmvtonqjDV7
 =W59/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://github.com/rustyrussell/linux

Pull cpumask cleanups from Rusty Russell:
 "(Somehow forgot to send this out; it's been sitting in linux-next, and
  if you don't want it, it can sit there another cycle)"

I'm a sucker for things that actually delete lines of code.

Fix up trivial conflict in arch/arm/kernel/kprobes.c, where Rusty fixed
a user of &cpu_online_map to be cpu_online_mask, but that code got
deleted by commit b21d55e98a ("ARM: 7332/1: extract out code patch
function from kprobes").

* tag 'for-linus' of git://github.com/rustyrussell/linux:
  cpumask: remove old cpu_*_map.
  documentation: remove references to cpu_*_map.
  drivers/cpufreq/db8500-cpufreq: remove references to cpu_*_map.
  remove references to cpu_*_map in arch/
2012-04-02 08:53:24 -07:00
Ben Greear
edbc0bb3fb net: Report dev->promiscuity in netlink reports.
The standard ways of probing a device's promiscuity
(ifi_flags, for instance) does not report the actual
state of the device.  This patch adds dev->promiscuity
to the netlink netdevice report so that users can know
for certain if the device is acting PROMISC or not.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:46 -04:00
David S. Miller
7cf7899d9e ipset: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:41 -04:00
Tejun Heo
959d851caa Merge branch 'for-3.5' of ../cgroup into block/for-3.5/core-merged
cgroup/for-3.5 contains the following changes which blk-cgroup needs
to proceed with the on-going cleanup.

* Dynamic addition and removal of cftypes to make config/stat file
  handling modular for policies.

* cgroup removal update to not wait for css references to drain to fix
  blkcg removal hang caused by cfq caching cfqgs.

Pull in cgroup/for-3.5 into block/for-3.5/core.  This causes the
following conflicts in block/blk-cgroup.c.

* 761b3ef50e "cgroup: remove cgroup_subsys argument from callbacks"
  conflicts with blkiocg_pre_destroy() addition and blkiocg_attach()
  removal.  Resolved by removing @subsys from all subsys methods.

* 676f7c8f84 "cgroup: relocate cftype and cgroup_subsys definitions in
  controllers" conflicts with ->pre_destroy() and ->attach() updates
  and removal of modular config.  Resolved by dropping forward
  declarations of the methods and applying updates to the relocated
  blkio_subsys.

* 4baf6e3325 "cgroup: convert all non-memcg controllers to the new
  cftype interface" builds upon the previous item.  Resolved by adding
  ->base_cftypes to the relocated blkio_subsys.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 12:55:00 -07:00
Tejun Heo
48ddbe1946 cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.

This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).

Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.

Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.

  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251

Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2012-04-01 12:09:56 -07:00
Tejun Heo
28b4c27b8e cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.

This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.

This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.

This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.

css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().

This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01 12:09:56 -07:00
Tejun Heo
79578621b4 cgroup: implement cgroup_rm_cftypes()
Implement cgroup_rm_cftypes() which removes an array of cftypes from a
subsystem.  It can be called whether the target subsys is attached or
not.  cgroup core will remove the specified file from all existing
cgroups.

This will be used to improve sub-subsys modularity and will be helpful
for unified hierarchy.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01 12:09:56 -07:00
Tejun Heo
05ef1d7c4a cgroup: introduce struct cfent
This patch adds cfent (cgroup file entry) which is the association
between a cgroup and a file.  This is in-cgroup representation of
files under a cgroup directory.  This simplifies walking walking
cgroup files and thus cgroup_clear_directory(), which is now
implemented in two parts - cgroup_rm_file() and a loop around it.

cgroup_rm_file() will be used to implement cftype removal and cfent is
scheduled to serve cgroup specific per-file data (e.g. for sysfs-like
"sever" semantics).

v2: - cfe was freed from cgroup_rm_file() which led to use-after-free
      if the file had openers at the time of removal.  Moved to
      cgroup_diput().

    - cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs
      wasn't empty after removing all files.  This triggered
      spuriously if some files were open during directory clearing.
      Removed.

v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be
      spuriously triggered for root cgroups because they don't go
      through cgroup_clear_directory() on unmount.  Don't trigger WARN
      for root cgroups.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
2012-04-01 12:09:56 -07:00