To balance memory usage and performance, GenieZone supports larger
granularity demand paging, called block-based demand paging.
Gzvm driver uses enable_cap to query the hypervisor if it supports
block-based demand paging and the given granularity or not. Meanwhile,
the gzvm driver allocates a shared buffer for storing the physical
pages later.
If the hypervisor supports, every time the gzvm driver handles guest
page faults, it allocates more memory in advance (default: 2MB) for
demand paging. And fills those physical pages into the allocated shared
memory, then calls the hypervisor to map to guest's memory.
The physical pages allocated for block-based demand paging is not
necessary to be contiguous because in many cases, 2MB block is not
followed. 1st, the memory is allocated because of VMM's page fault
(VMM loads kernel image to guest memory before running). In this case,
the page is allocated by the host kernel and using PAGE_SIZE. 2nd is
that guest may return memory to host via ballooning and that is still
4KB (or PAGE_SIZE) granularity. Therefore, we do not have to allocate
physically contiguous 2MB pages.
Change-Id: Id101da5a8ff73dbf5c9a3ab03526cdf5d2f6006e
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: kevenny hsieh <kevenny.hsieh@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-16-yi-de.wu@mediatek.com/
This page fault handler helps GenieZone hypervisor to do demand paging.
On a lower level translation fault, GenieZone hypervisor will first
check the fault GPA (guest physical address or IPA in ARM) is valid
e.g. within the registered memory region, then it will setup the
vcpu_run->exit_reason with necessary information for returning to
gzvm driver.
With the fault information, the gzvm driver looks up the physical
address and call the MT_HVC_GZVM_MAP_GUEST to request the hypervisor
maps the found PA to the fault GPA (IPA).
There is one exception, for protected vm, we will populate full VM's
memory region in advance in order to improve performance.
Change-Id: Ia295ef4c43201d731202d03e74889fa6403e7176
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-14-yi-de.wu@mediatek.com/
- Consolidate address translation functions to gzvm_mmu.c.
- To improve performance (especially for booting) we allocate memory
pages in advance for protected VM. When the virtual machine monitor
(e.g., crosvm) sets the virtual machine as protected via
enable_cap ioctl, we allocate full memory pages in advance for the VM.
Change-Id: Ifee061e55e8cfa0e46d2b6dd4b84388a875580cb
Signed-off-by: Jerry Wang <ze-yu.wang@mediatek.com>
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-12-yi-de.wu@mediatek.com/
Refactor the implementation of virtual gic as below.
- Remove the codes related to vgic ppi because the virtual interrupt
injection interface is only exposed to VMM which does peripheral
device emulation, we can assume only spi is used.
- Simplify the api for virtual interrupt injection.
Change-Id: I18ece99f8a678b72483a93ab47b080871d2bb81c
Signed-off-by: kevenny hsieh <kevenny.hsieh@mediatek.com>
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-9-yi-de.wu@mediatek.com/
The used definitions related to GZVM_REG_* are below and move them to
uapi header file `gzvm.h`.
- Keep `GZVM_REG_SIZE_SHIFT` and `GZVM_REG_SIZE_MASK` so that we can
determine the reg size.
- The other definitions related to GZVM_REG_* are used by crosvm.
- Rename the definitions to make it easy to understand.
Also removed the unused definitions related to GZVM_REG_* in
gzvm_arch.h and fixed sparse warnings.
Change-Id: I7f709f2532b95d418d22beeb4a46257afa4d92f2
Signed-off-by: kevenny hsieh <kevenny.hsieh@mediatek.com>
Signed-off-by: Yingshiuan Pan <yingshiuan.pan@mediatek.com>
Signed-off-by: Liju-Clr Chen <liju-clr.chen@mediatek.com>
Signed-off-by: Yi-De Wu <yi-de.wu@mediatek.com>
Bug: 301179926
Link: https://lore.kernel.org/all/20230919111210.19615-8-yi-de.wu@mediatek.com/
The purpose of the symbol is for getting runnable latency.
Bug: 275806676
Change-Id: I34d8450df16f63d133aaf3e9773a1f5436cbae58
Signed-off-by: Ziyi Cui <ziyic@google.com>
Get direct reclaim info.
Bug: 190795589
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ie66a3c87484a364a918c19b8e044c82f1afd6749
Signed-off-by: Jack Lee <liangjlee@google.com>
(cherry picked from commit d705ab99ab73b3dd4f2131d69277870d2f48163b)
Since Android has pcp list for MIGRATE_CMA[1], it could cause
CMA allocation latency due to not freeing the MIGRATE_ISOLATE
page immediately.
Originally, MIGRATE_ISOLATED page is supposed to go buddy list
with skipping pcp list. Otherwise, the page could be reallocated
from pcp list or staying on the pcp list until the pcp is drained
so that CMA keeps retrying since it couldn't find the freed page
from buddy list. That worked before since the CMA pfnblocks changed
only from MIGRATE_CMA to MIGRATE_ISOLATE and free function logic
in page allocator has checked MIGRATE_ISOLATEness on every CMA
pages using below.
free_unref_page_commit
if (migratetype >= MIGRATE_PCPTYPES)
if(is_migrate_isolate(migratetype))
free_one_page(page);
It worked since enum MIGRATE_CMA was bigger than enum
MIGRATE_PCPTYPES but since [1], the enum MIGRATE_CMA is less than
MIGRATE_PCPTYPES so the logic above doesn't work any more.
It could cause following race
CPU 0 CPU 1
free_unref_page
migratetype = get_pfnblock_migratetype()
set_pcppage_migratetype(MIGRATE_CMA)
cma_alloc
alloc_contig_range
set_migrate_isolate(MIGRATE_ISOLATE)
add the page into pcp list
the page could be reallocated
This patch couldn't fix the race completely due to missing zone->lock
in order-0 page free(for performance reason). However, it's not a new
problem so we need to deal with the issue separately.
[1] ANDROID: mm: add cma pcp list
Bug: 218731671
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ibea20085ce5bfb4b74b83b041f9bda9a380120f9
(cherry picked from commit d9e4b67784866047e8cfb5598cdf1ebc0c71f3d9)
Signed-off-by: Richard Chang <richardycc@google.com>
Allow pKVM to set device attributes (nGnRE) on stage-2 entries when
KVM_PGTABLE_PROT_DEVICE is used.
Bug: 303529066
Change-Id: I19ddbd627cb67fb4ad295af6ea5fff129d7a94f7
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
The hyp tracing support depends on CONFIG_TRACING, not CONFIG_FTRACE.
Also, TRACING might be selected while FTRACE is not leading to a build
error.
Bug: 306320920
Change-Id: I69614b6d1eb0e3d9013e00c2d10836b37034b929
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Current implementation of iommu busses driver has temporarily enforce global restriction to a single driver.
This was already the de-facto behaviour, since any possible combination of existing drivers would compete
for at least the PCI or platform bus. Due to this restriction we are not able to probe SMMU v3 driver for PCI bus.
There is an ongoing work in upstream(https://lore.kernel.org/linux-iommu/cover.1696253096.git.robin.murphy@arm.com/#t)
to fix this but we can't backport now as still review in progress.
However, Some of our targets have both SMMU v2 (used by all peripherals except PCIe) and SMMU v3 (PCie) and
they are expected to co-exit in current kernel version.
To cater this requirement, we have implemented a vendor hook, which will skip the pci bus probe for smmuv2 and in
smmuv3 skips all iommu buses execept pcie bus.
Bug: 305648820
Signed-off-by: Venkata Rao Kakani <vkakani@qti.qualcomm.com>
Change-Id: I9304962a7fc7afad93295cc08b3c68f8e340ffe8
commit e99476497687ef9e850748fe6d232264f30bc8f9 upstream.
sctp_mt_check doesn't validate the flag_count field. An attacker can
take advantage of that to trigger a OOB read and leak memory
information.
Add the field validation in the checkentry function.
Bug: 304913898
Fixes: 2e4e6a17af ("[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables")
Cc: stable@vger.kernel.org
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4921f9349b)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ife4e69f6218fdaca2a8647b5ed00d875a5ed0d34
lru_gen_rotate_memcg() can happen in softirq if memory.soft_limit_in_bytes
is set. This requires memcg_lru->lock to be irq safe. Lockdep warns on
this.
This problem only affects memcg v1.
Bug: 254441685
Link: https://lkml.kernel.org/r/20230619193821.2710944-1-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: syzbot+87c490fd2be656269b6a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=87c490fd2be656269b6a
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 814bc1de03ea4361101408e63a68e4b82aef22cb)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8ae02e92135faad78be6509c3bf18109b0f97a13
Some qdiscs and classifiers have recently been retired from kernel.
However, tc-testing config is still cluttered with them which causes noise
when using merge_config.sh script to update existing config for tc-testing
compatibility. Remove the config settings for affected qdiscs and
classifiers.
Bug: 254441685
Fixes: fb38306ceb9e ("net/sched: Retire ATM qdisc")
Fixes: 051d44209842 ("net/sched: Retire CBQ qdisc")
Fixes: bbe77c14ee61 ("net/sched: Retire dsmark qdisc")
Fixes: 265b4da82dbf ("net/sched: Retire rsvp classifier")
Fixes: 8c710f75256b ("net/sched: Retire tcindex classifier")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 11b8b2e70a9b4a1b60eefc0cd79cd2d3c08545f1)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ifd5de69f462686075f43c7c98937eb7999a4bd17
This patch adds GKI symbol list for Exynosauto SoC. We need to add
below 3 function symbols and it required by DRM(Direct Rendering Manager)
driver.
3 function symbol(s) added
'void display_timings_release(struct display_timings*)'
'struct display_timings* of_get_display_timings(const struct device_node*)'
'int videomode_from_timings(const struct display_timings*, struct videomode*, unsigned int)'
Bug: 305126879
Change-Id: Ieaf3b82e18c0a3a90c274ed752af8ed84df5c150
Signed-off-by: Hoyoung Lee <hy_fifty.lee@samsung.com>
Currently, while calculating residency and latency values, right
operands may overflow if resulting values are big enough.
To prevent this, albeit unlikely case, play it safe and convert
right operands to left ones' type s64.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Bug: 296029082
Fixes: 30f604283e ("PM / Domains: Allow domain power states to be read from DT")
Change-Id: Id0355d95ff18dc2273fca719aa64e2d32b1f9da5
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit e5d1c8722083f0332dcd3c85fa1273d85fb6bed8)
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Add vendor hook for compaction begin/end. The first use would be
to measure compaction durations.
Bug: 229927848
Test: echo 1 > /proc/sys/vm/compact_memory and observe output change in
/sys/kernel/pixel_stat/mm/compaction/mm_compaction_duration
Signed-off-by: Robin Hsu <robinhsu@google.com>
Change-Id: I3d95434bf49b37199056dc9ddfc36a59a7de17b7
Signed-off-by: Richard Chang <richardycc@google.com>
(cherry picked from commit 13b6bd38bb1f43bfffdb08c8f3a4a20d36ccd670)
Signed-off-by: liangjlee <liangjlee@google.com>
commit 69c5d284f67089b4750d28ff6ac6f52ec224b330 upstream.
The xt_u32 module doesn't validate the fields in the xt_u32 structure.
An attacker may take advantage of this to trigger an OOB read by setting
the size fields with a value beyond the arrays boundaries.
Add a checkentry function to validate the structure.
This was originally reported by the ZDI project (ZDI-CAN-18408).
Bug: 304913716
Fixes: 1b50b8a371 ("[NETFILTER]: Add u32 match")
Cc: stable@vger.kernel.org
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1c164c1e9e)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ic2ff70b303f55f9c3c5db24295bcb223ed7175a7
[ Upstream commit f4f8a7803119005e87b716874bec07c751efafec ]
The opt_num field is controlled by user mode and is not currently
validated inside the kernel. An attacker can take advantage of this to
trigger an OOB read and potentially leak information.
BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
Read of size 2 at addr ffff88804bc64272 by task poc/6431
CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1
Call Trace:
nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281
nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47
expr_call_ops_eval net/netfilter/nf_tables_core.c:214
nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264
nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23
[..]
Also add validation to genre, subtype and version fields.
Bug: 304913642
Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7bb8d52b42)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If79c79e3f55de8c81b70c19661cb0084b02c3da2
[ Upstream commit 0113d9c9d1ccc07f5a3710dac4aa24b6d711278c ]
Currently, we assume the skb is associated with a device before calling
__ip_options_compile, which is not always the case if it is re-routed by
ipvs.
When skb->dev is NULL, dev_net(skb->dev) will become null-dereference.
This patch adds a check for the edge case and switch to use the net_device
from the rtable when skb->dev is NULL.
Bug: 304913674
Fixes: ed0de45a10 ("ipv4: recompile ip options in ipv4_link_failure")
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Cc: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2712545e53)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie840ff3351e487f7095c49fac4fdd1e81021a982
commit 265b4da82dbf5df04bee5a5d46b7474b1aaf326a upstream.
The rsvp classifier has served us well for about a quarter of a century but has
has not been getting much maintenance attention due to lack of known users.
Bug: 304913975
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 08569c92f7f339de21b7a68d43d6795fc0aa24f2)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I4716954c5e8d5414a580eb34c699908028aa754b
When a broken USB accessory connects to a USB host, usbcore might
keep doing enumeration retries. If the host has a watchdog mechanism,
the kernel panic will happen on the host.
This patch provides an attribute early_stop to limit the numbers of retries
for each port of a hub. If a port was marked with early_stop attribute,
unsuccessful connection attempts will fail quickly. In addition, if an
early_stop port has failed to initialize, it will ignore all future
connection events until early_stop attribute is clear.
Signed-off-by: Ray Chi <raychi@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20221107072754.3336357-1-raychi@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 282876796
Change-Id: I48eff1dbbc341ef893c8abc20953b7e9a62244da
(cherry picked from commit 430d57f53eb1cdbf9ba9bbd397317912b3cd2de5)
Signed-off-by: Ray Chi <raychi@google.com>
(cherry picked from commit 278999b347af30f37e67314d7a198478b89be2dc)
__kvm_hyp_host_forward_smc() forwards SMCs to EL3, which means we
exit and enter the hypervisor without tracing those.
Add missing hyp events.
Bug: 304445720
Change-Id: I0b66c37f1521702764b12c038324c3fec3e499a6
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Store the hyp address in kvm_arm_hyp_host_fp_state[], to avoid having to
calculate it with kern_hyp_va() on every access.
Bug: 303684934
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I52902c297f9b957a8d035be942e3cbeb32fed0a2
Allocate and map hyp memory to maintain the host's fp/simd state,
which is also used for SVE and SME, later in the initialization
process. The amount of memory needed to track the host's state
varies depending on the number of cpus in the system, whether
there's SVE support, as well as the SVE vector size. Much of the
state needed to extract this information isn't initialized yet at
kvm_hyp_reserve().
Fixes: 6dc9af85f7 ("ANDROID: KVM: arm64: Allocate host fp state at pkvm init rather than per cpu")
Bug: 303684934
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: I744be685a107ddd92c6975bafb0149aebad7bb55
[ Upstream commit f15f29fd4779be8a418b66e9d52979bb6d6c2325 ]
Chain binding only requires the rule addition/insertion command within
the same transaction. Removal of rules from chain bindings within the
same transaction makes no sense, userspace does not utilize this
feature. Replace nft_chain_is_bound() check to nft_chain_binding() in
rule deletion commands. Replace command implies a rule deletion, reject
this command too.
Rule flush command can also safely rely on this nft_chain_binding()
check because unbound chains are not allowed since 62e1e94b246e
("netfilter: nf_tables: reject unbound chain set before commit phase").
Bug: 302085977
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9af8bb2afe)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I8b05dc37062824db4c2901000fdf701b38605d32
commit e6e43b8aa7cd3c3af686caf0c2e11819a886d705 upstream.
Forget to reset ctx->password to NULL will lead to bug like double free
Bug: 303146572
Cc: stable@vger.kernel.org
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f555a50808)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iec1eb857124e3b6ffa6dbbeb5f796087a6194057
Adding the following symbols:
- kmemdup_nul
Bug: 304675894
Change-Id: Ib6ab20a2c034d3e8dc9aff7384876d10468cd15b
Signed-off-by: David Chiang <davidchiang@google.com>
Currently if ucsi_send_command() fails, then we bail out without
clearing EVENT_PENDING flag. So when the next connector change
event comes, ucsi_connector_change() won't queue the con->work,
because of which none of the new events will be processed.
Fix this by clearing EVENT_PENDING flag if ucsi_send_command()
fails.
Cc: stable@vger.kernel.org # 5.16
Fixes: 512df95b94 ("usb: typec: ucsi: Better fix for missing unplug events issue")
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/1694423055-8440-1-git-send-email-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 304466904
(cherry picked from commit a00e197daec52bcd955e118f5f57d706da5bfe50
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/ usb-linus)
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Change-Id: I4d3eef684a04e73b060cf242c5943c4ac7e05b2e
While backporting, a check for vma locking inside do_wp_page() was
missed. Add it.
Fixes: 3ebafb7b46 ("BACKPORT: FROMGIT: mm: handle faults that merely update the accessed bit under the VMA lock")
Bug: 293665307
Change-Id: Ibd7f21ae8fec7b8edc6e3d88954714b5fad41516
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
After commit f5d39b0208 ("freezer,sched: Rewrite core freezer logic"),
tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN are
always woken up on the thaw path. Prior to that commit, tasks could ask
freezer to consider them "frozen enough" via freezer_do_not_count(). The
commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
allows freezer to immediately mark the task as TASK_FROZEN without
waking up the task. This is efficient for the suspend path, but on the
thaw path, the task is always woken up even if the task didn't need to
wake up and goes back to its TASK_(UN)INTERRUPTIBLE state. Although
these tasks are capable of handling of the wakeup, we can observe a
power/perf impact from the extra wakeup.
We observed on Android many tasks wait in the TASK_FREEZABLE state
(particularly due to many of them being binder clients). We observed
nearly 4x the number of tasks and a corresponding linear increase in
latency and power consumption when thawing the system. The latency
increased from ~15ms to ~50ms.
Avoid the spurious wakeups by saving the state of TASK_FREEZABLE tasks.
If the task was running before entering TASK_FROZEN state
(__refrigerator()) or if the task received a wake up for the saved
state, then the task is woken on thaw. saved_state from PREEMPT_RT locks
can be re-used because freezer would not stomp on the rtlock wait flow:
TASK_RTLOCK_WAIT isn't considered freezable.
Reported-by: Prakash Viswalingam <quic_prakashv@quicinc.com>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8f0eed4a78a81668bc78923ea09f51a7a663c2b0)
(cherry picked from commit e4d93065a5085dbb862aa4bd06fb3e51b02e8857
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Bug: 292064955
Change-Id: I121cfff46536a13e59b5eb60842984aed0d73faa
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
In preparation for freezer to also use saved_state, remove the
CONFIG_PREEMPT_RT compilation guard around saved_state.
On the arm64 platform I tested which did not have CONFIG_PREEMPT_RT,
there was no statistically significant deviation by applying this patch.
Test methodology:
perf bench sched message -g 40 -l 40
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit fa14aa2c23d31eb39bc615feb920f28d32d2a87e
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Bug: 292064955
Change-Id: I9c11ab7ce31ba3b48b304229898d4c7c18a6cb2c
[eberman: Use KABI reservation to preserve CRC/ABI of struct task_struct and
preserved raw_spin_(un)lock instead of new guard(...) syntax in task_state_match]
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Set the block size to that specified in on-disk superblock.
Also remove the hard constraint of PAGE_SIZE block size for the
uncompressed device backend. This constraint is temporarily remained
for compressed device and fscache backend, as there is more work needed
to handle the condition where the block size is not equal to PAGE_SIZE.
It is worth noting that the on-disk block size is read prior to
erofs_superblock_csum_verify(), as the read block size is needed in the
latter.
Besides, later we are going to make erofs refer to tar data blobs (which
is 512-byte aligned) for OCI containers, where the block size is 512
bytes. In this case, the 512-byte block size may not be adequate for a
directory to contain enough dirents. To fix this, we are also going to
introduce directory block size independent on the block size.
Due to we have already supported block size smaller than PAGE_SIZE now,
disable all these images with such separated directory block size until
we supported this feature later.
Bug: 303691233
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230313135309.75269-3-jefflexu@linux.alibaba.com
[ Gao Xiang: update documentation. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit d3c4bdcc756e60b95365c66ff58844ce75d1c8f8)
[dhavale: resolved minor conflict in Documentation/filesystems/erofs.rst]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Change-Id: I9a46bfb7ec9f79751e0df8f9d6369192dc861736
As the first step of converting hardcoded blocksize to that specified in
on-disk superblock, convert all call sites of hardcoded blocksize to
sb->s_blocksize except for:
1) use sbi->blkszbits instead of sb->s_blocksize in
erofs_superblock_csum_verify() since sb->s_blocksize has not been
updated with the on-disk blocksize yet when the function is called.
2) use inode->i_blkbits instead of sb->s_blocksize in erofs_bread(),
since the inode operated on may be an anonymous inode in fscache mode.
Currently the anonymous inode is allocated from an anonymous mount
maintained in erofs, while in the near future we may allocate anonymous
inodes from a generic API directly and thus have no access to the
anonymous inode's i_sb. Thus we keep the block size in i_blkbits for
anonymous inodes in fscache mode.
Be noted that this patch only gets rid of the hardcoded blocksize, in
preparation for actually setting the on-disk block size in the following
patch. The hard limit of constraining the block size to PAGE_SIZE still
exists until the next patch.
Bug: 303691233
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230313135309.75269-2-jefflexu@linux.alibaba.com
[ Gao Xiang: fold a patch to fix incorrect truncated offsets. ]
Link: https://lore.kernel.org/r/20230413035734.15457-1-zhujia.zj@bytedance.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit 3acea5fc335420ba7ef53947cf2d98d07fac39f7)
[dhavale: resolved few conflicts due to missing other features]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Change-Id: I4751529e42e4a646b1a2dda75981cdc41b39b6d4
erofs_inode_datablocks() has the only one caller, let's just get
rid of it entirely. No logic changes.
Bug: 303691233
Change-Id: I15f4e5df8ddd53c570408cc80b255b6934c06fdb
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Change-Id: I96195a960204c313649c510766e6a54d49a01784
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-1-hsiangkao@linux.alibaba.com
(cherry picked from commit 4efdec36dc9907628e590a68193d6d8e5e74d032)
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Actually we could pass in inodes directly to clean up all callers.
Also rename iloc() as erofs_iloc().
Bug: 303691233
Link: https://lore.kernel.org/r/20230114150823.432069-1-xiang@kernel.org
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(cherry picked from commit b780d3fc6107464dcc43631a6208c43b6421f1e6)
[dhavale: resolved minor conflict in fs/erofs/zmap.c]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Change-Id: Iea7a97040cebdc984e2956d421755230263c97ae
Adding the following symbols:
- reserve_iova
Bug: 303000855
Change-Id: I84b0eeb179b4194d0d8294b789f2cecc388f0963
Signed-off-by: Junghoon Jang <junghoonjang@google.com>
To monitor the reclaiming ability of kswapd, add vendor hook recording when the kswapd finish the reclaiming job and the reclaim progress.
android_vh_vmscan_kswpad_done(int, unsigned int, unsigned int, unsigned int)
Bug: 301044280
Change-Id: Id6e0a97003f0a156cff4d0996bc38bcd89b1dc69
Signed-off-by: John Hsu <john.hsu@mediatek.com>