android_kernel_msm-6.1_noth.../drivers/base
David Hildenbrand 3fcebf9020 mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy
Currently, the "auto-movable" online policy does not allow hotplugged
KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we
can have, primarily because there is no coordination across memory
devices and we don't want to accidentally create zone imbalances when
unplugging memory.

However, within a single memory device it's different.  Let's allow
KERNEL memory within a dynamic memory group to allow for more MOVABLE
memory within the same memory group.  The only thing we have to take care
of is that the managing driver avoids zone imbalances by unplugging MOVABLE
memory first; otherwise, there can be corner cases where unplugging memory
could result in (accidental) zone imbalances.

virtio-mem is the only user of dynamic memory groups and recently added
support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we
don't need a new toggle to enable it for dynamic memory groups.

We limit this handling to dynamic memory groups, because:

* We want to keep the runtime overhead for collecting stats when
  onlining a single memory block small.  We tend to have only a handful of
  dynamic memory groups, but we can have quite a few static memory groups
  (e.g., 256 DIMMs).

* It doesn't make too much sense for static memory groups, as we try
  onlining all applicable memory blocks either completely to ZONE_MOVABLE
  or not.  In ordinary operation, we won't have a mixture of zones within
  a static memory group.

When adding memory to a dynamic memory group, we'll first online memory to
ZONE_MOVABLE as long as early KERNEL memory allows for it.  Then, we'll
online the next unit(s) to ZONE_NORMAL, until we can online the next
unit(s) to ZONE_MOVABLE.

For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will
result in a layout like:

  [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]...
  ^ movable memory due to early kernel memory
			   ^ allows for more movable memory ...
			      ^-----^ ... here
				       ^ allows for more movable memory ...
				          ^-----^ ... here

While the created layout is sub-optimal when it comes to contiguous zones,
it gives us the maximum flexibility when dynamically growing/shrinking a
device: we can grow small VMs really big in small steps, and still shrink
them reliably to, e.g., 1/4 of the maximum VM size in this example, removing
full memory blocks along with their metadata more reliably.

Mark dynamic memory groups in the xarray such that we can efficiently
iterate over them when collecting stats.  In usual setups, we have one
virtio-mem device per NUMA node, and usually only a small number of NUMA
nodes.

Note: for now, there seems to be no compelling reason to make this
behavior configurable.

Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
firmware_loader firmware_loader: fix use-after-free in firmware_fallback_sysfs 2021-07-29 17:22:15 +02:00
power PM: domains: Improve runtime PM performance state handling 2021-08-25 20:15:54 +02:00
regmap regmap: mdio: Reject invalid addresses 2021-06-14 15:00:29 +01:00
test device property: Remove some casts in property-entry-test 2021-06-23 16:37:21 -06:00
arch_numa.c arch_numa: fix common code printing of phys_addr_t 2021-02-18 23:18:04 -08:00
arch_topology.c arch_topology: Avoid use-after-free for scale_freq_data 2021-07-01 07:32:14 +05:30
attribute_container.c driver core: attribute_container: fix W=1 warnings 2021-05-14 13:37:10 +02:00
auxiliary.c driver core: auxiliary bus: Fix memory leak when driver_register() fail 2021-07-21 16:36:06 +02:00
base.h driver core: Export device_driver_attach() 2021-06-21 15:29:24 -06:00
bus.c driver core: Flow the return code from ->probe() through to sysfs bind 2021-06-21 15:29:24 -06:00
cacheinfo.c drivers core: Use sysfs_emit for shared_cpu_map_show and shared_cpu_list_show 2020-10-02 13:24:40 +02:00
class.c drivers: base: fix some kernel-doc markups 2020-11-09 18:56:49 +01:00
component.c component: Rename 'dev' to 'parent' 2021-05-27 15:49:59 +02:00
container.c
core.c PCI/MSI: Protect msi_desc::masked for multi-MSI 2021-08-10 10:59:20 +02:00
cpu.c drivers/base: Constify static attribute_group structs 2021-06-04 15:06:28 +02:00
dd.c drivers core: Fix oops when driver probe fails 2021-07-27 14:44:43 +02:00
devcoredump.c devcoredump: remove contact information 2021-06-04 15:05:44 +02:00
devres.c devres: Enable trace events 2021-06-15 17:14:36 +02:00
devtmpfs.c devtmpfs: actually reclaim some init memory 2021-03-23 14:57:35 +01:00
driver.c drivers: base: Convert to printk alias functions 2020-07-10 14:16:44 +02:00
firmware.c
hypervisor.c
init.c driver core: auxiliary bus: Fix calling stage for auxiliary bus init 2021-02-11 08:43:03 +01:00
isa.c isa: Make the remove callback for isa drivers return void 2021-01-26 07:42:27 +01:00
Kconfig RISC-V Patches for the 5.12 Merge Window 2021-02-26 10:28:35 -08:00
Makefile devres: Enable trace events 2021-06-15 17:14:36 +02:00
map.c
memory.c mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy 2021-09-08 11:50:23 -07:00
module.c
node.c mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE 2021-09-08 11:50:22 -07:00
pinctrl.c
platform-msi.c platform-msi: fix kernel-doc warnings 2021-04-02 16:40:08 +02:00
platform.c drivers/base: Constify static attribute_group structs 2021-06-04 15:06:28 +02:00
property.c Driver core changes for 5.14-rc1 2021-07-05 13:51:41 -07:00
soc.c soc: fix comment for freeing soc_dev_attr 2020-12-09 19:46:31 +01:00
swnode.c software node: Handle software node injection to an existing device properly 2021-06-23 19:34:58 +02:00
syscore.c syscore: Use pm_pr_dbg() for syscore_{suspend,resume}() 2020-09-08 13:32:06 +02:00
topology.c drivers core: Miscellaneous changes for sysfs_emit 2020-10-02 13:12:07 +02:00
trace.c devres: Enable trace events 2021-06-15 17:14:36 +02:00
trace.h devres: Enable trace events 2021-06-15 17:14:36 +02:00
transport_class.c scsi: drivers: base: Propagate errors through the transport component 2020-01-15 22:55:37 -05:00