From cc647e05db6729084914bbea54df9389d16b9d76 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Wed, 5 Apr 2023 18:02:35 +0200 Subject: [PATCH 01/28] mm/userfaultfd: fix uffd-wp handling for THP migration entries commit 24bf08c4376be417f16ceb609188b16f461b0443 upstream. Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()") similarly applies to THP. Setting/clearing uffd-wp on THP migration entries is not implemented properly. Further, while removing migration PMDs considers the uffd-wp bit, inserting migration PMDs does not consider the uffd-wp bit. We have to set/clear the uffd-wp bit independently of the migration entry type in change_huge_pmd() and properly copy the uffd-wp bit in set_pmd_migration_entry(). Verified, using a simple reproducer that triggers migration of a THP, that set_pmd_migration_entry() no longer loses the uffd-wp bit. Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: David Hildenbrand Reviewed-by: Peter Xu Cc: Cc: Muhammad Usama Anjum Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/huge_memory.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b8d654963df8..cfe8d77af7ab 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1805,10 +1805,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, if (is_swap_pmd(*pmd)) { swp_entry_t entry = pmd_to_swp_entry(*pmd); struct page *page = pfn_swap_entry_to_page(entry); + pmd_t newpmd; VM_BUG_ON(!is_pmd_migration_entry(*pmd)); if (is_writable_migration_entry(entry)) { - pmd_t newpmd; /* * A protection check is difficult so * just be safe and disable write @@ -1822,8 +1822,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, newpmd = pmd_swp_mksoft_dirty(newpmd); if (pmd_swp_uffd_wp(*pmd)) newpmd = pmd_swp_mkuffd_wp(newpmd); - set_pmd_at(mm, addr, pmd, newpmd); + } else { + newpmd = *pmd; } + + if (uffd_wp) + newpmd = pmd_swp_mkuffd_wp(newpmd); + else if (uffd_wp_resolve) + newpmd = pmd_swp_clear_uffd_wp(newpmd); + if (!pmd_same(*pmd, newpmd)) + set_pmd_at(mm, addr, pmd, newpmd); goto unlock; } #endif @@ -3233,6 +3241,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, pmdswp = swp_entry_to_pmd(entry); if (pmd_soft_dirty(pmdval)) pmdswp = pmd_swp_mksoft_dirty(pmdswp); + if (pmd_uffd_wp(pmdval)) + pmdswp = pmd_swp_mkuffd_wp(pmdswp); set_pmd_at(mm, address, pvmw->pmd, pmdswp); page_remove_rmap(page, vma, true); put_page(page); From 519dbe737f0d26b2c34a1f3990bf27dd82d3778d Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Wed, 5 Apr 2023 11:51:20 -0400 Subject: [PATCH 02/28] mm/khugepaged: check again on anon uffd-wp during isolation commit dd47ac428c3f5f3bcabe845f36be870fe6c20784 upstream. Khugepaged collapses an anonymous THP in two rounds of scans. The 2nd round is done in __collapse_huge_page_isolate() after hpage_collapse_scan_pmd(), and between the two rounds all the locks are released temporarily. It means the pgtable can change during this window, before the 2nd round starts. It's logically possible that some ptes get wr-protected during this window, and we can erroneously collapse a THP without noticing that some of its ptes are wr-protected by userfaultfd. Commit e1e267c7928f wanted to avoid this, but it only did so for the 1st round, not the 2nd.
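The ingredients of that window look roughly like this from userspace (a sketch only, not a reliable reproducer: actually landing inside the gap between the two scan rounds is timing dependent, and it assumes a kernel with CONFIG_USERFAULTFD plus per-arch uffd-wp support; unprivileged use may also need vm.unprivileged_userfaultfd=1):

  #include <fcntl.h>
  #include <linux/userfaultfd.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
          long uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
          struct uffdio_api api = { .api = UFFD_API };
          size_t len = 2UL << 20; /* one PMD-sized region */
          char *p;

          if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api))
                  return 1;

          p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED)
                  return 1;
          madvise(p, len, MADV_HUGEPAGE); /* make it a khugepaged target */
          memset(p, 1, len);              /* populate small pages first */

          struct uffdio_register reg = {
                  .range = { .start = (unsigned long)p, .len = len },
                  .mode = UFFDIO_REGISTER_MODE_WP,
          };
          if (ioctl(uffd, UFFDIO_REGISTER, &reg))
                  return 1;

          /* wr-protect a single pte; if khugepaged sits between its two
             scan rounds, an unfixed kernel can collapse the range and
             silently drop this bit */
          struct uffdio_writeprotect wp = {
                  .range = { .start = (unsigned long)p, .len = 4096 },
                  .mode = UFFDIO_WRITEPROTECT_MODE_WP,
          };
          if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
                  return 1;
          pause(); /* keep the registration alive while khugepaged runs */
          return 0;
  }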
Since __collapse_huge_page_isolate() happens after a round of small page swapins, we don't need to worry about any !present ptes - if one existed, khugepaged would already have bailed out. So we only need to check present ptes with the uffd-wp bit set there. This is something I found by inspection and never had a reproducer for. I thought it was the cause of a bug in Muhammad's recent pagemap new ioctl work, but it turned out that was a userspace bug instead. However, this still seems to be a real bug, even if the race window is very small, so it is worth fixing and copying to stable. Link: https://lkml.kernel.org/r/20230405155120.3608140-1-peterx@redhat.com Fixes: e1e267c7928f ("khugepaged: skip collapse if uffd-wp detected") Signed-off-by: Peter Xu Reviewed-by: David Hildenbrand Reviewed-by: Yang Shi Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: Mike Rapoport Cc: Nadav Amit Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/khugepaged.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 77a76bcf15f5..ef72d3df4b65 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -561,6 +561,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_PTE_NON_PRESENT; goto out; } + if (pte_uffd_wp(pteval)) { + result = SCAN_PTE_UFFD_WP; + goto out; + } page = vm_normal_page(vma, address, pteval); if (unlikely(!page) || unlikely(is_zone_device_page(page))) { result = SCAN_PAGE_NULL; From e8a7bdb6f76cdaef4183669554ad76e5ed197d92 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 6 Apr 2023 17:20:04 +0900 Subject: [PATCH 03/28] mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO commit 4737edbbdd4958ae29ca6a310a6a2fa4e0684b01 upstream. split_huge_page_to_list() WARNs when called for huge zero pages, which sounds too harsh to me because it does not imply a kernel bug, but merely notifies admins of the event. On the other hand, this is considered critical by syzkaller and makes its testing less efficient, which seems harmful. So replace the VM_WARN_ON_ONCE_FOLIO with pr_warn_ratelimited. Link: https://lkml.kernel.org/r/20230406082004.2185420-1-naoya.horiguchi@linux.dev Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page") Signed-off-by: Naoya Horiguchi Reported-by: syzbot+07a218429c8d19b1fb25@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/000000000000a6f34a05e6efcd01@google.com/ Reviewed-by: Yang Shi Cc: Miaohe Lin Cc: Tetsuo Handa Cc: Xu Yu Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/huge_memory.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index cfe8d77af7ab..b20fef29e5bb 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2655,9 +2655,10 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); is_hzp = is_huge_zero_page(&folio->page); - VM_WARN_ON_ONCE_FOLIO(is_hzp, folio); - if (is_hzp) + if (is_hzp) { + pr_warn_ratelimited("Called split_huge_page for huge zero page\n"); return -EBUSY; + } if (folio_test_writeback(folio)) return -EBUSY; From 433a7ecaed4b41e0bd2857a7b1a11aea9f8c8955 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 13 Apr 2023 15:12:21 +0200 Subject: [PATCH 04/28] mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() commit fdea03e12aa2a44a7bb34144208be97fc25dfd90 upstream.
Similarly to kmsan_vmap_pages_range_noflush(), kmsan_ioremap_page_range() must also properly handle allocation/mapping failures. If such a failure occurs, it must clean up the already created metadata mappings and return an error code, so that the error can be propagated to ioremap_page_range(). Without doing so, KMSAN may silently fail to bring the metadata for the page range into a consistent state, which will result in user-visible crashes when that range is later accessed. Link: https://lkml.kernel.org/r/20230413131223.4135168-2-glider@google.com Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko Reported-by: Dipanjan Das Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver Cc: Christoph Hellwig Cc: Dmitry Vyukov Cc: Uladzislau Rezki (Sony) Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- include/linux/kmsan.h | 19 ++++++++------- mm/kmsan/hooks.c | 55 ++++++++++++++++++++++++++++++++++++------- mm/vmalloc.c | 4 ++-- 3 files changed, 59 insertions(+), 19 deletions(-) diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index e38ae3c34618..6167257d219d 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -159,11 +159,12 @@ void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end); * @page_shift: page_shift argument passed to vmap_range_noflush(). * * KMSAN creates new metadata pages for the physical pages mapped into the - * virtual memory. + * virtual memory. Returns 0 on success, callers must check for non-zero return + * value. */ -void kmsan_ioremap_page_range(unsigned long addr, unsigned long end, - phys_addr_t phys_addr, pgprot_t prot, - unsigned int page_shift); +int kmsan_ioremap_page_range(unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + unsigned int page_shift); /** * kmsan_iounmap_page_range() - Notify KMSAN about a iounmap_page_range() call. @@ -294,12 +295,12 @@ static inline void kmsan_vunmap_range_noflush(unsigned long start, { } -static inline void kmsan_ioremap_page_range(unsigned long start, - unsigned long end, - phys_addr_t phys_addr, - pgprot_t prot, - unsigned int page_shift) +static inline int kmsan_ioremap_page_range(unsigned long start, + unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + unsigned int page_shift) { + return 0; } static inline void kmsan_iounmap_page_range(unsigned long start, diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c index 3807502766a3..ec0da72e65aa 100644 --- a/mm/kmsan/hooks.c +++ b/mm/kmsan/hooks.c @@ -148,35 +148,74 @@ void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end) * into the virtual memory. If those physical pages already had shadow/origin, * those are ignored.
*/ -void kmsan_ioremap_page_range(unsigned long start, unsigned long end, - phys_addr_t phys_addr, pgprot_t prot, - unsigned int page_shift) +int kmsan_ioremap_page_range(unsigned long start, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + unsigned int page_shift) { gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO; struct page *shadow, *origin; unsigned long off = 0; - int nr; + int nr, err = 0, clean = 0, mapped; if (!kmsan_enabled || kmsan_in_runtime()) - return; + return 0; nr = (end - start) / PAGE_SIZE; kmsan_enter_runtime(); - for (int i = 0; i < nr; i++, off += PAGE_SIZE) { + for (int i = 0; i < nr; i++, off += PAGE_SIZE, clean = i) { shadow = alloc_pages(gfp_mask, 1); origin = alloc_pages(gfp_mask, 1); - __vmap_pages_range_noflush( + if (!shadow || !origin) { + err = -ENOMEM; + goto ret; + } + mapped = __vmap_pages_range_noflush( vmalloc_shadow(start + off), vmalloc_shadow(start + off + PAGE_SIZE), prot, &shadow, PAGE_SHIFT); - __vmap_pages_range_noflush( + if (mapped) { + err = mapped; + goto ret; + } + shadow = NULL; + mapped = __vmap_pages_range_noflush( vmalloc_origin(start + off), vmalloc_origin(start + off + PAGE_SIZE), prot, &origin, PAGE_SHIFT); + if (mapped) { + __vunmap_range_noflush( + vmalloc_shadow(start + off), + vmalloc_shadow(start + off + PAGE_SIZE)); + err = mapped; + goto ret; + } + origin = NULL; + } + /* Page mapping loop finished normally, nothing to clean up. */ + clean = 0; + +ret: + if (clean > 0) { + /* + * Something went wrong. Clean up shadow/origin pages allocated + * on the last loop iteration, then delete mappings created + * during the previous iterations. + */ + if (shadow) + __free_pages(shadow, 1); + if (origin) + __free_pages(origin, 1); + __vunmap_range_noflush( + vmalloc_shadow(start), + vmalloc_shadow(start + clean * PAGE_SIZE)); + __vunmap_range_noflush( + vmalloc_origin(start), + vmalloc_origin(start + clean * PAGE_SIZE)); } flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end)); flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end)); kmsan_leave_runtime(); + return err; } void kmsan_iounmap_page_range(unsigned long start, unsigned long end) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index d606e53c650e..e703586dcb3d 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -321,8 +321,8 @@ int ioremap_page_range(unsigned long addr, unsigned long end, ioremap_max_page_shift); flush_cache_vmap(addr, end); if (!err) - kmsan_ioremap_page_range(addr, end, phys_addr, prot, - ioremap_max_page_shift); + err = kmsan_ioremap_page_range(addr, end, phys_addr, prot, + ioremap_max_page_shift); return err; } From bd6f3421a586ee75437630e6ceabea9564ec9cbb Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 13 Apr 2023 15:12:20 +0200 Subject: [PATCH 05/28] mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() commit 47ebd0310e89c087f56e58c103c44b72a2f6b216 upstream. As reported by Dipanjan Das, when KMSAN is used together with kernel fault injection (or, generally, even without the latter), calls to kcalloc() or __vmap_pages_range_noflush() may fail, leaving the metadata mappings for the virtual mapping in an inconsistent state. When these metadata mappings are accessed later, the kernel crashes. To address the problem, we return a non-zero error code from kmsan_vmap_pages_range_noflush() in the case of any allocation/mapping failure inside it, and make vmap_pages_range_noflush() return an error if KMSAN fails to allocate the metadata. 
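Both this patch and the previous one converge on the same error-handling shape: try to set up metadata, unwind exactly the work that already succeeded, and propagate the error. A condensed, self-contained model of that unwind idiom (map_one()/unmap_one() are made-up stand-ins for the per-page shadow/origin allocation and mapping pair, not kernel APIs):

  #include <stdio.h>

  /* Stand-in per-unit operations; the third one fails with -ENOMEM. */
  static int map_one(int i) { return i == 3 ? -12 /* -ENOMEM */ : 0; }
  static void unmap_one(int i) { printf("unwound unit %d\n", i); }

  static int map_all(int nr)
  {
          int i, err = 0;

          for (i = 0; i < nr; i++) {
                  err = map_one(i);
                  if (err)
                          goto unwind;
          }
          return 0;

  unwind:
          /* undo only the iterations that fully completed */
          while (i--)
                  unmap_one(i);
          return err;
  }

  int main(void)
  {
          printf("map_all() = %d\n", map_all(8));
          return 0;
  }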
This patch also removes KMSAN_WARN_ON() from vmap_pages_range_noflush(), as these allocation failures are not fatal anymore. Link: https://lkml.kernel.org/r/20230413131223.4135168-1-glider@google.com Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko Reported-by: Dipanjan Das Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver Cc: Christoph Hellwig Cc: Dmitry Vyukov Cc: Uladzislau Rezki (Sony) Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- include/linux/kmsan.h | 20 +++++++++++--------- mm/kmsan/shadow.c | 27 ++++++++++++++++++--------- mm/vmalloc.c | 6 +++++- 3 files changed, 34 insertions(+), 19 deletions(-) diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index 6167257d219d..30b17647ce3c 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -134,11 +134,12 @@ void kmsan_kfree_large(const void *ptr); * @page_shift: page_shift passed to vmap_range_noflush(). * * KMSAN maps shadow and origin pages of @pages into contiguous ranges in - * vmalloc metadata address range. + * vmalloc metadata address range. Returns 0 on success, callers must check + * for non-zero return value. */ -void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, - pgprot_t prot, struct page **pages, - unsigned int page_shift); +int kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, + pgprot_t prot, struct page **pages, + unsigned int page_shift); /** * kmsan_vunmap_kernel_range_noflush() - Notify KMSAN about a vunmap. @@ -282,12 +283,13 @@ static inline void kmsan_kfree_large(const void *ptr) { } -static inline void kmsan_vmap_pages_range_noflush(unsigned long start, - unsigned long end, - pgprot_t prot, - struct page **pages, - unsigned int page_shift) +static inline int kmsan_vmap_pages_range_noflush(unsigned long start, + unsigned long end, + pgprot_t prot, + struct page **pages, + unsigned int page_shift) { + return 0; } static inline void kmsan_vunmap_range_noflush(unsigned long start, diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c index a787c04e9583..b8bb95eea5e3 100644 --- a/mm/kmsan/shadow.c +++ b/mm/kmsan/shadow.c @@ -216,27 +216,29 @@ void kmsan_free_page(struct page *page, unsigned int order) kmsan_leave_runtime(); } -void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, - pgprot_t prot, struct page **pages, - unsigned int page_shift) +int kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, + pgprot_t prot, struct page **pages, + unsigned int page_shift) { unsigned long shadow_start, origin_start, shadow_end, origin_end; struct page **s_pages, **o_pages; - int nr, mapped; + int nr, mapped, err = 0; if (!kmsan_enabled) - return; + return 0; shadow_start = vmalloc_meta((void *)start, KMSAN_META_SHADOW); shadow_end = vmalloc_meta((void *)end, KMSAN_META_SHADOW); if (!shadow_start) - return; + return 0; nr = (end - start) / PAGE_SIZE; s_pages = kcalloc(nr, sizeof(*s_pages), GFP_KERNEL); o_pages = kcalloc(nr, sizeof(*o_pages), GFP_KERNEL); - if (!s_pages || !o_pages) + if (!s_pages || !o_pages) { + err = -ENOMEM; goto ret; + } for (int i = 0; i < nr; i++) { s_pages[i] = shadow_page_for(pages[i]); o_pages[i] = origin_page_for(pages[i]); @@ -249,10 +251,16 @@ void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, kmsan_enter_runtime(); mapped = __vmap_pages_range_noflush(shadow_start, shadow_end, prot, s_pages, page_shift); - 
KMSAN_WARN_ON(mapped); + if (mapped) { + err = mapped; + goto ret; + } mapped = __vmap_pages_range_noflush(origin_start, origin_end, prot, o_pages, page_shift); - KMSAN_WARN_ON(mapped); + if (mapped) { + err = mapped; + goto ret; + } kmsan_leave_runtime(); flush_tlb_kernel_range(shadow_start, shadow_end); flush_tlb_kernel_range(origin_start, origin_end); @@ -262,6 +270,7 @@ void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, ret: kfree(s_pages); kfree(o_pages); + return err; } /* Allocate metadata for pages allocated at boot time. */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index e703586dcb3d..d5dc361dc104 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -613,7 +613,11 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end, int vmap_pages_range_noflush(unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, unsigned int page_shift) { - kmsan_vmap_pages_range_noflush(addr, end, prot, pages, page_shift); + int ret = kmsan_vmap_pages_range_noflush(addr, end, prot, pages, + page_shift); + + if (ret) + return ret; return __vmap_pages_range_noflush(addr, end, prot, pages, page_shift); } From 059f24aff65cf82956cd48ce010505943f391c78 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 14 Apr 2023 15:14:29 +0100 Subject: [PATCH 06/28] mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages commit 4d73ba5fa710fe7d432e0b271e6fecd252aef66e upstream. A bug was reported by Yuanxi Liu where allocating 1G pages at runtime takes an excessive amount of time for large amounts of memory. Further testing showed that the cost of allocating huge pages is linear, i.e. if allocating 1G pages in batches of 10, then the time to allocate nr_hugepages from 10->20->30->etc increases linearly even though 10 pages are allocated at each step. Profiles indicated that much of the time is spent checking the validity of ranges within already existing huge pages and then attempting a migration that fails after isolating the range, draining pages and doing a whole lot of other useless work. Commit eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") removed two checks, one of which skipped huge pages for contiguous allocations, on the grounds that huge pages can sometimes migrate. While there may be value in migrating a 2M page to satisfy a 1G allocation, it's potentially expensive if the 1G allocation fails, and it's pointless to try moving a 1G page for a new 1G allocation or to scan the tail pages for valid PFNs. Reintroduce the PageHuge check and assume any contiguous region with hugetlbfs pages is unsuitable for a new 1G allocation. The hpagealloc test allocates huge pages in batches and reports the average latency per page over time. This test happens just after boot when fragmentation is not an issue. Units are in milliseconds.
hpagealloc
                                  6.3.0-rc6             6.3.0-rc6             6.3.0-rc6
                                    vanilla  hugeallocrevert-v1r1  hugeallocsimple-v1r2
Min       Latency      26.42 (  0.00%)       5.07 ( 80.82%)      18.94 ( 28.30%)
1st-qrtle Latency     356.61 (  0.00%)       5.34 ( 98.50%)      19.85 ( 94.43%)
2nd-qrtle Latency     697.26 (  0.00%)       5.47 ( 99.22%)      20.44 ( 97.07%)
3rd-qrtle Latency     972.94 (  0.00%)       5.50 ( 99.43%)      20.81 ( 97.86%)
Max-1     Latency      26.42 (  0.00%)       5.07 ( 80.82%)      18.94 ( 28.30%)
Max-5     Latency      82.14 (  0.00%)       5.11 ( 93.78%)      19.31 ( 76.49%)
Max-10    Latency     150.54 (  0.00%)       5.20 ( 96.55%)      19.43 ( 87.09%)
Max-90    Latency    1164.45 (  0.00%)       5.53 ( 99.52%)      20.97 ( 98.20%)
Max-95    Latency    1223.06 (  0.00%)       5.55 ( 99.55%)      21.06 ( 98.28%)
Max-99    Latency    1278.67 (  0.00%)       5.57 ( 99.56%)      22.56 ( 98.24%)
Max       Latency    1310.90 (  0.00%)       8.06 ( 99.39%)      26.62 ( 97.97%)
Amean     Latency     678.36 (  0.00%)       5.44 * 99.20%*      20.44 * 96.99%*

                    6.3.0-rc6  6.3.0-rc6        6.3.0-rc6
                      vanilla  revert-v1  hugeallocfix-v2
Duration User            0.28       0.27             0.30
Duration System        808.66      17.77            35.99
Duration Elapsed       830.87      18.08            36.33

The vanilla kernel is poor, taking up to 1.3 seconds to allocate a huge page and almost 10 minutes in total to run the test. Reverting the problematic commit reduces the worst case to 8ms, and with this patch it takes 26ms. This patch fixes the main issue by skipping huge pages, but leaves the page_count() check out because a page with an elevated count can potentially migrate. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022 Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.net Fixes: eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") Signed-off-by: Mel Gorman Reported-by: Yuanxi Liu Acked-by: Vlastimil Babka Reviewed-by: David Hildenbrand Acked-by: Michal Hocko Reviewed-by: Oscar Salvador Cc: Matthew Wilcox Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/page_alloc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5cae08963984..6806a4824ae1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -9418,6 +9418,9 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn, if (PageReserved(page)) return false; + + if (PageHuge(page)) + return false; } return true; } From 7e6631f782a16dcf528679478a9a6a8f71215d6f Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Fri, 14 Apr 2023 14:59:19 -0400 Subject: [PATCH 07/28] mm/mmap: regression fix for unmapped_area{_topdown} commit 58c5d0d6d522112577c7eeb71d382ea642ed7be4 upstream. The maple tree limits the gap returned to a window that specifically fits what was asked. This may not be optimal in the case of switching search directions or a gap that does not satisfy the requested space for other reasons. Fix the search by retrying the operation and limiting the search window on the rare occasion that a conflict occurs. Link: https://lkml.kernel.org/r/20230414185919.4175572-1-Liam.Howlett@oracle.com Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}") Signed-off-by: Liam R.
Howlett Reported-by: Rick Edgecombe Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/mmap.c | 48 +++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 43 insertions(+), 5 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index fe1db604dc49..14ca259189b7 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1565,7 +1565,8 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags) */ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) { - unsigned long length, gap; + unsigned long length, gap, low_limit; + struct vm_area_struct *tmp; MA_STATE(mas, &current->mm->mm_mt, 0, 0); @@ -1574,12 +1575,29 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) if (length < info->length) return -ENOMEM; - if (mas_empty_area(&mas, info->low_limit, info->high_limit - 1, - length)) + low_limit = info->low_limit; +retry: + if (mas_empty_area(&mas, low_limit, info->high_limit - 1, length)) return -ENOMEM; gap = mas.index; gap += (info->align_offset - gap) & info->align_mask; + tmp = mas_next(&mas, ULONG_MAX); + if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */ + if (vm_start_gap(tmp) < gap + length - 1) { + low_limit = tmp->vm_end; + mas_reset(&mas); + goto retry; + } + } else { + tmp = mas_prev(&mas, 0); + if (tmp && vm_end_gap(tmp) > gap) { + low_limit = vm_end_gap(tmp); + mas_reset(&mas); + goto retry; + } + } + return gap; } @@ -1595,7 +1613,8 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info) */ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) { - unsigned long length, gap; + unsigned long length, gap, high_limit, gap_end; + struct vm_area_struct *tmp; MA_STATE(mas, &current->mm->mm_mt, 0, 0); /* Adjust search length to account for worst case alignment overhead */ @@ -1603,12 +1622,31 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info) if (length < info->length) return -ENOMEM; - if (mas_empty_area_rev(&mas, info->low_limit, info->high_limit - 1, + high_limit = info->high_limit; +retry: + if (mas_empty_area_rev(&mas, info->low_limit, high_limit - 1, length)) return -ENOMEM; gap = mas.last + 1 - info->length; gap -= (gap - info->align_offset) & info->align_mask; + gap_end = mas.last; + tmp = mas_next(&mas, ULONG_MAX); + if (tmp && (tmp->vm_flags & VM_GROWSDOWN)) { /* Avoid prev check if possible */ + if (vm_start_gap(tmp) <= gap_end) { + high_limit = vm_start_gap(tmp); + mas_reset(&mas); + goto retry; + } + } else { + tmp = mas_prev(&mas, 0); + if (tmp && vm_end_gap(tmp) > gap) { + high_limit = tmp->vm_start; + mas_reset(&mas); + goto retry; + } + } + return gap; } From fe1c982958c51507cbb3371a8365905f931860fa Mon Sep 17 00:00:00 2001 From: Qais Yousef Date: Tue, 18 Apr 2023 15:04:52 +0100 Subject: [PATCH 08/28] sched/fair: Detect capacity inversion commit: 44c7b80bffc3a657a36857098d5d9c49d94e652b upstream. Check each performance domain to see if thermal pressure is causing its capacity to be lower than that of another performance domain. We assume that each performance domain has CPUs with the same capacities, which is similar to an assumption made in energy_model.c. We also assume that thermal pressure impacts all CPUs in a performance domain equally. If there are multiple performance domains with the same capacity_orig, we will trigger a capacity inversion if the domain is under thermal pressure.
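A worked example of that definition, with invented numbers (a sketch, not kernel code; capacities use the usual 0..1024 scale):

  #include <stdio.h>

  /* Capacity inversion: a big domain under heavy thermal pressure ends
   * up with less usable capacity than a mid domain whose capacity_orig
   * is lower. */
  int main(void)
  {
          unsigned long big_cap_orig = 1024, big_thermal_pressure = 400;
          unsigned long mid_cap_orig = 768;
          unsigned long big_effective = big_cap_orig - big_thermal_pressure;

          if (big_effective < mid_cap_orig)
                  printf("big domain inverted: effective %lu < mid capacity_orig %lu\n",
                         big_effective, mid_cap_orig);
          return 0;
  }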
The new cpu_in_capacity_inversion() should help users know when information about capacity_orig is not reliable, so they can opt in to using the inverted capacity as the 'actual' capacity_orig. Signed-off-by: Qais Yousef Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com (cherry picked from commit 44c7b80bffc3a657a36857098d5d9c49d94e652b) Signed-off-by: Qais Yousef (Google) Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 63 +++++++++++++++++++++++++++++++++++++++++--- kernel/sched/sched.h | 19 +++++++++++++ 2 files changed, 79 insertions(+), 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ec2d913280e6..3e2f77afc75f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8866,16 +8866,73 @@ static unsigned long scale_rt_capacity(int cpu) static void update_cpu_capacity(struct sched_domain *sd, int cpu) { + unsigned long capacity_orig = arch_scale_cpu_capacity(cpu); unsigned long capacity = scale_rt_capacity(cpu); struct sched_group *sdg = sd->groups; + struct rq *rq = cpu_rq(cpu); - cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(cpu); + rq->cpu_capacity_orig = capacity_orig; if (!capacity) capacity = 1; - cpu_rq(cpu)->cpu_capacity = capacity; - trace_sched_cpu_capacity_tp(cpu_rq(cpu)); + rq->cpu_capacity = capacity; + + /* + * Detect if the performance domain is in capacity inversion state. + * + * Capacity inversion happens when another perf domain with equal or + * lower capacity_orig_of() ends up having higher capacity than this + * domain after subtracting thermal pressure. + * + * We only take into account thermal pressure in this detection as it's + * the only metric that actually results in *real* reduction of + * capacity due to performance points (OPPs) being dropped/become + * unreachable due to thermal throttling. + * + * We assume: + * * That all cpus in a perf domain have the same capacity_orig + * (same uArch). + * * Thermal pressure will impact all cpus in this perf domain + * equally. + */ + if (static_branch_unlikely(&sched_asym_cpucapacity)) { + unsigned long inv_cap = capacity_orig - thermal_load_avg(rq); + struct perf_domain *pd = rcu_dereference(rq->rd->pd); + + rq->cpu_capacity_inverted = 0; + + for (; pd; pd = pd->next) { + struct cpumask *pd_span = perf_domain_span(pd); + unsigned long pd_cap_orig, pd_cap; + + cpu = cpumask_any(pd_span); + pd_cap_orig = arch_scale_cpu_capacity(cpu); + + if (capacity_orig < pd_cap_orig) + continue; + + /* + * handle the case of multiple perf domains have the + * same capacity_orig but one of them is under higher + * thermal pressure. We record it as capacity + * inversion.
+ */ if (capacity_orig == pd_cap_orig) { + pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu)); + + if (pd_cap > inv_cap) { + rq->cpu_capacity_inverted = inv_cap; + break; + } + } else if (pd_cap_orig > inv_cap) { + rq->cpu_capacity_inverted = inv_cap; + break; + } + } + } + + trace_sched_cpu_capacity_tp(rq); sdg->sgc->capacity = capacity; sdg->sgc->min_capacity = capacity; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index d6d488e8eb55..5f18460f62f0 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1041,6 +1041,7 @@ struct rq { unsigned long cpu_capacity; unsigned long cpu_capacity_orig; + unsigned long cpu_capacity_inverted; struct balance_callback *balance_callback; @@ -2878,6 +2879,24 @@ static inline unsigned long capacity_orig_of(int cpu) return cpu_rq(cpu)->cpu_capacity_orig; } +/* + * Returns inverted capacity if the CPU is in capacity inversion state. + * 0 otherwise. + * + * Capacity inversion detection only considers thermal impact where actual + * performance points (OPPs) gets dropped. + * + * Capacity inversion state happens when another performance domain that has + * equal or lower capacity_orig_of() becomes effectively larger than the perf + * domain this CPU belongs to due to thermal pressure throttling it hard. + * + * See comment in update_cpu_capacity(). + */ +static inline unsigned long cpu_in_capacity_inversion(int cpu) +{ + return cpu_rq(cpu)->cpu_capacity_inverted; +} + /** * enum cpu_util_type - CPU utilization type * @FREQUENCY_UTIL: Utilization used to select frequency From 799c7301ded6cc44c5b7b716f3fe707a41722ed1 Mon Sep 17 00:00:00 2001 From: Qais Yousef Date: Tue, 18 Apr 2023 15:04:53 +0100 Subject: [PATCH 09/28] sched/fair: Consider capacity inversion in util_fits_cpu() commit: aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8 upstream. We only consider thermal pressure in util_fits_cpu() for uclamp_min, with the exception of the biggest cores, which by definition are the max performance point of the system and on which all tasks by definition should fit. Even under thermal pressure, the capacity of the biggest CPU is the highest in the system and should still fit every task. Except when it reaches the capacity inversion point; then this is no longer true. We can handle this by using the inverted capacity as capacity_orig in util_fits_cpu(). This not only addresses the problem above, but also ensures uclamp_max now considers the inverted capacity. Force fitting a task when a CPU is in this adverse state will contribute to making the thermal throttling last longer. Signed-off-by: Qais Yousef Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com (cherry picked from commit aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8) Signed-off-by: Qais Yousef (Google) Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3e2f77afc75f..4759728da281 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4465,12 +4465,16 @@ static inline int util_fits_cpu(unsigned long util, * For uclamp_max, we can tolerate a drop in performance level as the * goal is to cap the task. So it's okay if it's getting less. * - * In case of capacity inversion, which is not handled yet, we should - * honour the inverted capacity for both uclamp_min and uclamp_max all - * the time.
+ * In case of capacity inversion we should honour the inverted capacity + * for both uclamp_min and uclamp_max all the time. */ - capacity_orig = capacity_orig_of(cpu); - capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu); + capacity_orig = cpu_in_capacity_inversion(cpu); + if (capacity_orig) { + capacity_orig_thermal = capacity_orig; + } else { + capacity_orig = capacity_orig_of(cpu); + capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu); + } /* * We want to force a task to fit a cpu as implied by uclamp_max. From d362a03d920edc0bb9a85c5150ef4ff5fd365c72 Mon Sep 17 00:00:00 2001 From: Qais Yousef Date: Tue, 18 Apr 2023 15:04:54 +0100 Subject: [PATCH 10/28] sched/fair: Fixes for capacity inversion detection commit: da07d2f9c153e457e845d4dcfdd13568d71d18a4 upstream. Traversing the Perf Domains requires rcu_read_lock() to be held and is conditional on sched_energy_enabled(). Ensure right protections applied. Also skip capacity inversion detection for our own pd; which was an error. Fixes: 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Reported-by: Dietmar Eggemann Signed-off-by: Qais Yousef (Google) Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Vincent Guittot Link: https://lore.kernel.org/r/20230112122708.330667-3-qyousef@layalina.io (cherry picked from commit da07d2f9c153e457e845d4dcfdd13568d71d18a4) Signed-off-by: Qais Yousef (Google) Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4759728da281..f70c4a7fb4ef 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8900,16 +8900,23 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu) * * Thermal pressure will impact all cpus in this perf domain * equally. */ - if (static_branch_unlikely(&sched_asym_cpucapacity)) { + if (sched_energy_enabled()) { unsigned long inv_cap = capacity_orig - thermal_load_avg(rq); - struct perf_domain *pd = rcu_dereference(rq->rd->pd); + struct perf_domain *pd; + rcu_read_lock(); + + pd = rcu_dereference(rq->rd->pd); rq->cpu_capacity_inverted = 0; for (; pd; pd = pd->next) { struct cpumask *pd_span = perf_domain_span(pd); unsigned long pd_cap_orig, pd_cap; + /* We can't be inverted against our own pd */ + if (cpumask_test_cpu(cpu_of(rq), pd_span)) + continue; + cpu = cpumask_any(pd_span); pd_cap_orig = arch_scale_cpu_capacity(cpu); @@ -8934,6 +8941,8 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu) break; } } + + rcu_read_unlock(); } trace_sched_cpu_capacity_tp(rq); From 9e7976c0cd634e23dda2b6849f33f44e5cbd4728 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 18 Apr 2023 13:57:37 +0100 Subject: [PATCH 11/28] KVM: arm64: Make vcpu flag updates non-preemptible commit 35dcb3ac663a16510afc27ba2725d70c15e012a5 upstream. Per-vcpu flags are updated using a non-atomic RMW operation. Which means it is possible to get preempted between the read and write operations. Another interesting thing to note is that preemption also updates flags, as we have some flag manipulation in both the load and put operations. It is thus possible to lose information communicated by either load or put, as the preempted flag update will overwrite the flags when the thread is resumed. This is specially critical if either load or put has stored information which depends on the physical CPU the vcpu runs on. 
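A deterministic toy model of that lost update (plain C; a single thread stands in for "vcpu path preempted by load/put", and the names are illustrative, not kernel APIs):

  #include <stdio.h>

  static unsigned long flags;

  int main(void)
  {
          unsigned long tmp = flags; /* vcpu path reads flags (0)           */
          flags |= 2;                /* "preemption": load/put sets bit 1   */
          flags = tmp | 1;           /* vcpu path resumes its RMW: the
                                        write overwrites bit 1, losing it   */
          printf("flags = %lu (expected 3)\n", flags);
          return 0;
  }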
This results in really elusive bugs, and kudos must be given to Mostafa for the long hours of debugging, and finally spotting the problem. Fix it by disabling preemption during the RMW operation, which ensures that the state stays consistent. Also upgrade vcpu_get_flag path to use READ_ONCE() to make sure the field is always atomically accessed. Fixes: e87abb73e594 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set") Reported-by: Mostafa Saleh Signed-off-by: Marc Zyngier Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230418125737.2327972-1-maz@kernel.org Signed-off-by: Oliver Upton Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/kvm_host.h | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 45e2136322ba..e2b45c937c58 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -449,9 +449,22 @@ struct kvm_vcpu_arch { ({ \ __build_check_flag(v, flagset, f, m); \ \ - v->arch.flagset & (m); \ + READ_ONCE(v->arch.flagset) & (m); \ }) +/* + * Note that the set/clear accessors must be preempt-safe in order to + * avoid nesting them with load/put which also manipulate flags... + */ +#ifdef __KVM_NVHE_HYPERVISOR__ +/* the nVHE hypervisor is always non-preemptible */ +#define __vcpu_flags_preempt_disable() +#define __vcpu_flags_preempt_enable() +#else +#define __vcpu_flags_preempt_disable() preempt_disable() +#define __vcpu_flags_preempt_enable() preempt_enable() +#endif + #define __vcpu_set_flag(v, flagset, f, m) \ do { \ typeof(v->arch.flagset) *fset; \ @@ -459,9 +472,11 @@ struct kvm_vcpu_arch { __build_check_flag(v, flagset, f, m); \ \ fset = &v->arch.flagset; \ + __vcpu_flags_preempt_disable(); \ if (HWEIGHT(m) > 1) \ *fset &= ~(m); \ *fset |= (f); \ + __vcpu_flags_preempt_enable(); \ } while (0) #define __vcpu_clear_flag(v, flagset, f, m) \ @@ -471,7 +486,9 @@ struct kvm_vcpu_arch { __build_check_flag(v, flagset, f, m); \ \ fset = &v->arch.flagset; \ + __vcpu_flags_preempt_disable(); \ *fset &= ~(m); \ + __vcpu_flags_preempt_enable(); \ } while (0) #define vcpu_get_flag(v, ...) __vcpu_get_flag((v), __VA_ARGS__) From 8d6a870a428f3164dc437d183077c40e835c8fcb Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 19 Apr 2023 13:16:13 +0300 Subject: [PATCH 12/28] KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg() commit a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 upstream. The KVM_REG_SIZE() comes from the ioctl and it can be a power of two between 0-32768 but if it is more than sizeof(long) this will corrupt memory. 
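For reference, the size really is fully under userspace control: bits 52..55 of the register id encode log2 of the size, giving sizes from 1 to 32768 bytes. A small demonstration (the SHIFT/MASK values below are from the KVM UAPI headers; 1ULL replaces the UAPI's 1U so the macro is self-contained here):

  #include <stdio.h>

  #define KVM_REG_SIZE_SHIFT 52
  #define KVM_REG_SIZE_MASK  0x00f0000000000000ULL
  #define KVM_REG_SIZE(id) \
          (1ULL << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))

  int main(void)
  {
          unsigned long long id = 15ULL << KVM_REG_SIZE_SHIFT; /* hostile id */
          unsigned long long val; /* the 8-byte buffer in the handler */

          printf("KVM_REG_SIZE(id) = %llu bytes into a %zu-byte buffer\n",
                 KVM_REG_SIZE(id), sizeof(val));
          /* copy_from_user(&val, uaddr, 32768) would smash the stack,
             hence the new size check before the copy */
          return 0;
  }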
Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state") Signed-off-by: Dan Carpenter Reviewed-by: Steven Price Reviewed-by: Eric Auger Reviewed-by: Marc Zyngier Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain Signed-off-by: Oliver Upton Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kvm/hypercalls.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index c9f401fa01a9..950e35b993d2 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -397,6 +397,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) u64 val; int wa_level; + if (KVM_REG_SIZE(reg->id) != sizeof(val)) + return -ENOENT; if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id))) return -EFAULT; From f9a20ef5e83c4ae3a4b2deb5535b3913680768f2 Mon Sep 17 00:00:00 2001 From: Jiaxun Yang Date: Sat, 8 Apr 2023 21:33:48 +0100 Subject: [PATCH 13/28] MIPS: Define RUNTIME_DISCARD_EXIT in LD script commit 6dcbd0a69c84a8ae7a442840a8cf6b1379dc8f16 upstream. MIPS's exit sections are discarded at runtime as well. Fixes link error: `.exit.text' referenced in section `__jump_table' of fs/fuse/inode.o: defined in discarded section `.exit.text' of fs/fuse/inode.o Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv") Reported-by: "kernelci.org bot" Signed-off-by: Jiaxun Yang Signed-off-by: Thomas Bogendoerfer Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/vmlinux.lds.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S index 1f98947fe715..91d6a5360bb9 100644 --- a/arch/mips/kernel/vmlinux.lds.S +++ b/arch/mips/kernel/vmlinux.lds.S @@ -15,6 +15,8 @@ #define EMITS_PT_NOTE #endif +#define RUNTIME_DISCARD_EXIT + #include #undef mips From 7ca973d830c0050ed8def693f7a8b2ef588b6142 Mon Sep 17 00:00:00 2001 From: Jiachen Zhang Date: Wed, 28 Sep 2022 20:19:34 +0800 Subject: [PATCH 14/28] fuse: always revalidate rename target dentry commit ccc031e26afe60d2a5a3d93dabd9c978210825fb upstream. The previous commit df8629af2934 ("fuse: always revalidate if exclusive create") ensures that the dentries are revalidated on O_EXCL creates. This commit complements it by also performing revalidation for rename target dentries. Otherwise, a rename target file that only exists in kernel dentry cache but not in the filesystem will result in EEXIST if RENAME_NOREPLACE flag is used. Signed-off-by: Jiachen Zhang Signed-off-by: Zhang Tianci Signed-off-by: Miklos Szeredi Signed-off-by: Yang Bo Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index bb97a384dc5d..904673a4f690 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -214,7 +214,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) if (inode && fuse_is_bad(inode)) goto invalid; else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) || - (flags & (LOOKUP_EXCL | LOOKUP_REVAL))) { + (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET))) { struct fuse_entry_out outarg; FUSE_ARGS(args); struct fuse_forget_link *forget; From 588d682251e64444bfab28baa86c6befb9d7ad05 Mon Sep 17 00:00:00 2001 From: Alyssa Ross Date: Sun, 26 Mar 2023 18:21:21 +0000 Subject: [PATCH 15/28] purgatory: fix disabling debug info commit d83806c4c0cccc0d6d3c3581a11983a9c186a138 upstream. 
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS. Instead, it includes -g, the appropriate -gdwarf-* flag, and also the -Wa versions of both of those if building with Clang and GNU as. As a result, debug info was being generated for the purgatory objects, even though the intention was that it not be. Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files") Signed-off-by: Alyssa Ross Cc: stable@vger.kernel.org Acked-by: Nick Desaulniers Signed-off-by: Masahiro Yamada Signed-off-by: Greg Kroah-Hartman --- arch/riscv/purgatory/Makefile | 4 +--- arch/x86/purgatory/Makefile | 3 +-- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile index dd58e1d99397..659e21862077 100644 --- a/arch/riscv/purgatory/Makefile +++ b/arch/riscv/purgatory/Makefile @@ -74,9 +74,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS) CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE) CFLAGS_ctype.o += $(PURGATORY_CFLAGS) -AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2 -AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2 -AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2 +asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x)) $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE $(call if_changed,ld) diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile index 17f09dc26381..82fec66d46d2 100644 --- a/arch/x86/purgatory/Makefile +++ b/arch/x86/purgatory/Makefile @@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS) CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE) CFLAGS_string.o += $(PURGATORY_CFLAGS) -AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2 -AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2 +asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x)) $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE $(call if_changed,ld) From a8cf1141057a04e9630e73a4d543f10a939980b2 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 19 Oct 2022 15:35:59 -0700 Subject: [PATCH 16/28] inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). commit b5fc29233d28be7a3322848ebe73ac327559cdb9 upstream. After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources. Now we can remove unnecessary inet6_destroy_sock() calls in sk->sk_prot->destroy(). DCCP and SCTP have their own sk->sk_destruct() function, so we change them separately in the following patches. Signed-off-by: Kuniyuki Iwashima Reviewed-by: Matthieu Baerts Signed-off-by: David S. 
Miller Signed-off-by: Ziyang Xuan Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ping.c | 6 ------ net/ipv6/raw.c | 2 -- net/ipv6/tcp_ipv6.c | 8 +------- net/ipv6/udp.c | 2 -- net/l2tp/l2tp_ip6.c | 2 -- net/mptcp/protocol.c | 7 ------- 6 files changed, 1 insertion(+), 26 deletions(-) diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 86c26e48d065..808983bc2ec9 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -23,11 +23,6 @@ #include #include -static void ping_v6_destroy(struct sock *sk) -{ - inet6_destroy_sock(sk); -} - /* Compatibility glue so we can support IPv6 when it's compiled as a module */ static int dummy_ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) @@ -205,7 +200,6 @@ struct proto pingv6_prot = { .owner = THIS_MODULE, .init = ping_init_sock, .close = ping_close, - .destroy = ping_v6_destroy, .pre_connect = ping_v6_pre_connect, .connect = ip6_datagram_connect_v6_only, .disconnect = __udp_disconnect, diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 9ee1506e23ab..4fc511bdf176 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -1175,8 +1175,6 @@ static void raw6_destroy(struct sock *sk) lock_sock(sk); ip6_flush_pending_frames(sk); release_sock(sk); - - inet6_destroy_sock(sk); } static int rawv6_init_sk(struct sock *sk) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index ea1ecf5fe947..81afb40bfc0b 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1951,12 +1951,6 @@ static int tcp_v6_init_sock(struct sock *sk) return 0; } -static void tcp_v6_destroy_sock(struct sock *sk) -{ - tcp_v4_destroy_sock(sk); - inet6_destroy_sock(sk); -} - #ifdef CONFIG_PROC_FS /* Proc filesystem TCPv6 sock list dumping. */ static void get_openreq6(struct seq_file *seq, @@ -2149,7 +2143,7 @@ struct proto tcpv6_prot = { .accept = inet_csk_accept, .ioctl = tcp_ioctl, .init = tcp_v6_init_sock, - .destroy = tcp_v6_destroy_sock, + .destroy = tcp_v4_destroy_sock, .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 17d721a6add7..0b8127988adb 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1668,8 +1668,6 @@ void udpv6_destroy_sock(struct sock *sk) udp_encap_disable(); } } - - inet6_destroy_sock(sk); } /* diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index 9db7f4f5a441..5137ea1861ce 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -257,8 +257,6 @@ static void l2tp_ip6_destroy_sock(struct sock *sk) if (tunnel) l2tp_tunnel_delete(tunnel); - - inet6_destroy_sock(sk); } static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 19f35869a164..b1bbb0b75a13 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3939,12 +3939,6 @@ static const struct proto_ops mptcp_v6_stream_ops = { static struct proto mptcp_v6_prot; -static void mptcp_v6_destroy(struct sock *sk) -{ - mptcp_destroy(sk); - inet6_destroy_sock(sk); -} - static struct inet_protosw mptcp_v6_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_MPTCP, @@ -3960,7 +3954,6 @@ int __init mptcp_proto_v6_init(void) mptcp_v6_prot = mptcp_prot; strcpy(mptcp_v6_prot.name, "MPTCPv6"); mptcp_v6_prot.slab = NULL; - mptcp_v6_prot.destroy = mptcp_v6_destroy; mptcp_v6_prot.obj_size = sizeof(struct mptcp6_sock); err = proto_register(&mptcp_v6_prot, 1); From a530b33fe98691b88c46128e9c07019696ab247e Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 19 Oct 2022 15:36:00 -0700 Subject: [PATCH 17/28] dccp: Call 
inet6_destroy_sock() via sk->sk_destruct(). commit 1651951ebea54970e0bda60c638fc2eee7a6218f upstream. After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources. DCCP sets its own sk->sk_destruct() in the dccp_init_sock(), and DCCPv6 socket shares it by calling the same init function via dccp_v6_init_sock(). To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we export it and set dccp_v6_sk_destruct() in the init function. Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller Signed-off-by: Ziyang Xuan Signed-off-by: Greg Kroah-Hartman --- net/dccp/dccp.h | 1 + net/dccp/ipv6.c | 15 ++++++++------- net/dccp/proto.c | 8 +++++++- net/ipv6/af_inet6.c | 1 + 4 files changed, 17 insertions(+), 8 deletions(-) diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index 7dfc00c9fb32..9ddc3a9e89e4 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -278,6 +278,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb, int dccp_rcv_established(struct sock *sk, struct sk_buff *skb, const struct dccp_hdr *dh, const unsigned int len); +void dccp_destruct_common(struct sock *sk); int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized); void dccp_destroy_sock(struct sock *sk); diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 7a736c352dc4..b9d7c3dd1cb3 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1004,6 +1004,12 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = { .sockaddr_len = sizeof(struct sockaddr_in6), }; +static void dccp_v6_sk_destruct(struct sock *sk) +{ + dccp_destruct_common(sk); + inet6_sock_destruct(sk); +} + /* NOTE: A lot of things set to zero explicitly by call to * sk_alloc() so need not be done here. 
*/ @@ -1016,17 +1022,12 @@ static int dccp_v6_init_sock(struct sock *sk) if (unlikely(!dccp_v6_ctl_sock_initialized)) dccp_v6_ctl_sock_initialized = 1; inet_csk(sk)->icsk_af_ops = &dccp_ipv6_af_ops; + sk->sk_destruct = dccp_v6_sk_destruct; } return err; } -static void dccp_v6_destroy_sock(struct sock *sk) -{ - dccp_destroy_sock(sk); - inet6_destroy_sock(sk); -} - static struct timewait_sock_ops dccp6_timewait_sock_ops = { .twsk_obj_size = sizeof(struct dccp6_timewait_sock), }; @@ -1049,7 +1050,7 @@ static struct proto dccp_v6_prot = { .accept = inet_csk_accept, .get_port = inet_csk_get_port, .shutdown = dccp_shutdown, - .destroy = dccp_v6_destroy_sock, + .destroy = dccp_destroy_sock, .orphan_count = &dccp_orphan_count, .max_header = MAX_DCCP_HEADER, .obj_size = sizeof(struct dccp6_sock), diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 85e35c5e8890..a06b5641287a 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -171,12 +171,18 @@ const char *dccp_packet_name(const int type) EXPORT_SYMBOL_GPL(dccp_packet_name); -static void dccp_sk_destruct(struct sock *sk) +void dccp_destruct_common(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); dp->dccps_hc_tx_ccid = NULL; +} +EXPORT_SYMBOL_GPL(dccp_destruct_common); + +static void dccp_sk_destruct(struct sock *sk) +{ + dccp_destruct_common(sk); inet_sock_destruct(sk); } diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index fb1bf6eb0ff8..b5309ae87fd7 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -114,6 +114,7 @@ void inet6_sock_destruct(struct sock *sk) inet6_cleanup_sock(sk); inet_sock_destruct(sk); } +EXPORT_SYMBOL_GPL(inet6_sock_destruct); static int inet6_create(struct net *net, struct socket *sock, int protocol, int kern) From a09b9383b7495681c9bae41752ee456cf42e41f0 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 19 Oct 2022 15:36:01 -0700 Subject: [PATCH 18/28] sctp: Call inet6_destroy_sock() via sk->sk_destruct(). commit 6431b0f6ff1633ae598667e4cdd93830074a03e8 upstream. After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources. SCTP sets its own sk->sk_destruct() in the sctp_init_sock(), and SCTPv6 socket reuses it as the init function. To call inet6_sock_destruct() from SCTPv6 sk->sk_destruct(), we set sctp_v6_destruct_sock() in a new init function. Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller Signed-off-by: Ziyang Xuan Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 507b2ad5ef7c..17185200079d 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -5102,13 +5102,17 @@ static void sctp_destroy_sock(struct sock *sk) } /* Triggered when there are no references on the socket anymore */ -static void sctp_destruct_sock(struct sock *sk) +static void sctp_destruct_common(struct sock *sk) { struct sctp_sock *sp = sctp_sk(sk); /* Free up the HMAC transform. 
*/ crypto_free_shash(sp->hmac); +} +static void sctp_destruct_sock(struct sock *sk) +{ + sctp_destruct_common(sk); inet_sock_destruct(sk); } @@ -9431,7 +9435,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk, sctp_sk(newsk)->reuse = sp->reuse; newsk->sk_shutdown = sk->sk_shutdown; - newsk->sk_destruct = sctp_destruct_sock; + newsk->sk_destruct = sk->sk_destruct; newsk->sk_family = sk->sk_family; newsk->sk_protocol = IPPROTO_SCTP; newsk->sk_backlog_rcv = sk->sk_prot->backlog_rcv; @@ -9666,11 +9670,20 @@ struct proto sctp_prot = { #if IS_ENABLED(CONFIG_IPV6) -#include -static void sctp_v6_destroy_sock(struct sock *sk) +static void sctp_v6_destruct_sock(struct sock *sk) { - sctp_destroy_sock(sk); - inet6_destroy_sock(sk); + sctp_destruct_common(sk); + inet6_sock_destruct(sk); +} + +static int sctp_v6_init_sock(struct sock *sk) +{ + int ret = sctp_init_sock(sk); + + if (!ret) + sk->sk_destruct = sctp_v6_destruct_sock; + + return ret; } struct proto sctpv6_prot = { @@ -9680,8 +9693,8 @@ struct proto sctpv6_prot = { .disconnect = sctp_disconnect, .accept = sctp_accept, .ioctl = sctp_ioctl, - .init = sctp_init_sock, - .destroy = sctp_v6_destroy_sock, + .init = sctp_v6_init_sock, + .destroy = sctp_destroy_sock, .shutdown = sctp_shutdown, .setsockopt = sctp_setsockopt, .getsockopt = sctp_getsockopt, From a93c20f5832221c2bf5f80199c4eaebc0ba28e16 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 23 Apr 2023 09:56:20 -0700 Subject: [PATCH 19/28] gcc: disable '-Warray-bounds' for gcc-13 too commit 0da6e5fd6c3726723e275603426e09178940dace upstream. We started disabling '-Warray-bounds' for gcc-12 originally on s390, because it resulted in some warnings that weren't realistically fixable (commit 8b202ee21839: "s390: disable -Warray-bounds"). That s390-specific issue was then found to be less common elsewhere, but generic (see f0be87c42cbd: "gcc-12: disable '-Warray-bounds' universally for now"), and then later expanded the version check was expanded to gcc-11 (5a41237ad1d4: "gcc: disable -Warray-bounds for gcc-11 too"). And it turns out that I was much too optimistic in thinking that it's all going to go away, and here we are with gcc-13 showing all the same issues. So instead of expanding this one version at a time, let's just disable it for gcc-11+, and put an end limit to it only when we actually find a solution. Yes, I'm sure some of this is because the kernel just does odd things (like our "container_of()" use, but also knowingly playing games with things like linker tables and array layouts). And yes, some of the warnings are likely signs of real bugs, but when there are hundreds of false positives, that doesn't really help. Oh well. Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- init/Kconfig | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 0c214af99085..2028ed4d50f5 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -892,18 +892,14 @@ config CC_IMPLICIT_FALLTHROUGH default "-Wimplicit-fallthrough=5" if CC_IS_GCC && $(cc-option,-Wimplicit-fallthrough=5) default "-Wimplicit-fallthrough" if CC_IS_CLANG && $(cc-option,-Wunreachable-code-fallthrough) -# Currently, disable gcc-11,12 array-bounds globally. -# We may want to target only particular configurations some day. +# Currently, disable gcc-11+ array-bounds globally. +# It's still broken in gcc-13, so no upper bound yet. 
config GCC11_NO_ARRAY_BOUNDS def_bool y -config GCC12_NO_ARRAY_BOUNDS - def_bool y - config CC_NO_ARRAY_BOUNDS bool - default y if CC_IS_GCC && GCC_VERSION >= 110000 && GCC_VERSION < 120000 && GCC11_NO_ARRAY_BOUNDS - default y if CC_IS_GCC && GCC_VERSION >= 120000 && GCC_VERSION < 130000 && GCC12_NO_ARRAY_BOUNDS + default y if CC_IS_GCC && GCC_VERSION >= 110000 && GCC11_NO_ARRAY_BOUNDS # # For architectures that know their GCC __int128 support is sound From 342c1db4fa8c005d96efaf78cc87e3876b55cfe3 Mon Sep 17 00:00:00 2001 From: Soumya Negi Date: Sun, 9 Apr 2023 19:12:04 -0700 Subject: [PATCH 20/28] Input: pegasus-notetaker - check pipe type when probing commit b3d80fd27a3c2d8715a40cbf876139b56195f162 upstream. Fix WARNING in pegasus_open/usb_submit_urb Syzbot bug: https://syzkaller.appspot.com/bug?id=bbc107584dcf3262253ce93183e51f3612aaeb13 Warning raised because pegasus_driver submits transfer request for bogus URB (pipe type does not match endpoint type). Add sanity check at probe time for pipe value extracted from endpoint descriptor. Probe will fail if sanity check fails. Reported-and-tested-by: syzbot+04ee0cb4caccaed12d78@syzkaller.appspotmail.com Signed-off-by: Soumya Negi Link: https://lore.kernel.org/r/20230404074145.11523-1-soumya.negi97@gmail.com Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/tablet/pegasus_notetaker.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/input/tablet/pegasus_notetaker.c b/drivers/input/tablet/pegasus_notetaker.c index d836d3dcc6a2..a68da2988f9c 100644 --- a/drivers/input/tablet/pegasus_notetaker.c +++ b/drivers/input/tablet/pegasus_notetaker.c @@ -296,6 +296,12 @@ static int pegasus_probe(struct usb_interface *intf, pegasus->intf = intf; pipe = usb_rcvintpipe(dev, endpoint->bEndpointAddress); + /* Sanity check that pipe's type matches endpoint's type */ + if (usb_pipe_type_check(dev, pipe)) { + error = -EINVAL; + goto err_free_mem; + } + pegasus->data_len = usb_maxpacket(dev, pipe); pegasus->data = usb_alloc_coherent(dev, pegasus->data_len, GFP_KERNEL, From f8c3eb751a9bdbd1371da17f856d030bcde91f8e Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 29 Mar 2023 07:35:32 +0300 Subject: [PATCH 21/28] iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger() commit 73a428b37b9b538f8f8fe61caa45e7f243bab87c upstream. The at91_adc_allocate_trigger() function is supposed to return error pointers. Returning a NULL will cause an Oops. 
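The bug is a convention mismatch: callers of this helper test the result with IS_ERR(), and IS_ERR(NULL) is false. A userspace model of the kernel's ERR_PTR convention (macros simplified from the kernel's err.h) showing why the NULL slipped through:

  #include <stdio.h>

  #define MAX_ERRNO 4095
  #define ERR_PTR(err) ((void *)(long)(err))
  #define PTR_ERR(ptr) ((long)(ptr))
  #define IS_ERR(ptr)  ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

  static void *allocate_trigger(int buggy)
  {
          /* simulate the allocation failing */
          if (buggy)
                  return NULL;               /* IS_ERR(NULL) is false! */
          return ERR_PTR(-12 /* -ENOMEM */); /* caller sees the error  */
  }

  int main(void)
  {
          void *trig = allocate_trigger(1);

          if (IS_ERR(trig))
                  printf("caught error %ld\n", PTR_ERR(trig));
          else
                  printf("%p passes IS_ERR() and would be dereferenced\n",
                         trig);
          return 0;
  }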
Fixes: 5e1a1da0f8c9 ("iio: adc: at91-sama5d2_adc: add hw trigger and buffer support")
Signed-off-by: Dan Carpenter
Link: https://lore.kernel.org/r/5d728f9d-31d1-410d-a0b3-df6a63a2c8ba@kili.mountain
Signed-off-by: Jonathan Cameron
Signed-off-by: Greg Kroah-Hartman
---
 drivers/iio/adc/at91-sama5d2_adc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iio/adc/at91-sama5d2_adc.c b/drivers/iio/adc/at91-sama5d2_adc.c
index 870f4cb60923..3ad5678f2613 100644
--- a/drivers/iio/adc/at91-sama5d2_adc.c
+++ b/drivers/iio/adc/at91-sama5d2_adc.c
@@ -1409,7 +1409,7 @@ static struct iio_trigger *at91_adc_allocate_trigger(struct iio_dev *indio,
 	trig = devm_iio_trigger_alloc(&indio->dev, "%s-dev%d-%s", indio->name,
 				      iio_device_id(indio), trigger_name);
 	if (!trig)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	trig->dev.parent = indio->dev.parent;
 	iio_trigger_set_drvdata(trig, indio);

From 71b6df69f17e5dc31aa25a8d292980aabc8a703c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexis=20Lothor=C3=A9?=
Date: Tue, 4 Apr 2023 15:31:02 +0200
Subject: [PATCH 22/28] fpga: bridge: properly initialize bridge device before populating children
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit dc70eb868b9cd2ca01313e5a394e6ea001d513e9 upstream.

The current code path can lead to warnings because of an uninitialized device, which in turn contains an uninitialized kobject. The uninitialized device is passed to of_platform_populate, which at some point, while creating child devices, tries to get a reference on the uninitialized parent, resulting in the following warning:

kobject: '(null)' ((ptrval)): is not initialized, yet kobject_get() is being called.

The warning is observed after migrating a kernel from 5.10.x to 6.1.x. Reverting commit 0d70af3c2530 ("fpga: bridge: Use standard dev_release for class driver") seems to remove the warning. That commit aggregates device_initialize() and device_add() into device_register(), but this new call is made AFTER of_platform_populate. Fix the issue by populating child nodes only once the bridge device has been properly initialized and registered.

Fixes: 0d70af3c2530 ("fpga: bridge: Use standard dev_release for class driver")
Signed-off-by: Alexis Lothoré
Acked-by: Xu Yilun
Link: https://lore.kernel.org/r/20230404133102.2837535-2-alexis.lothore@bootlin.com
Signed-off-by: Xu Yilun
Signed-off-by: Greg Kroah-Hartman
---
 drivers/fpga/fpga-bridge.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/fpga/fpga-bridge.c b/drivers/fpga/fpga-bridge.c
index 727704431f61..13918c8c839e 100644
--- a/drivers/fpga/fpga-bridge.c
+++ b/drivers/fpga/fpga-bridge.c
@@ -360,7 +360,6 @@ fpga_bridge_register(struct device *parent, const char *name,
 	bridge->dev.parent = parent;
 	bridge->dev.of_node = parent->of_node;
 	bridge->dev.id = id;
-	of_platform_populate(bridge->dev.of_node, NULL, NULL, &bridge->dev);
 
 	ret = dev_set_name(&bridge->dev, "br%d", id);
 	if (ret)
@@ -372,6 +371,8 @@ fpga_bridge_register(struct device *parent, const char *name,
 		return ERR_PTR(ret);
 	}
 
+	of_platform_populate(bridge->dev.of_node, NULL, NULL, &bridge->dev);
+
 	return bridge;
 
 error_device:
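
The warning quoted in the commit message above comes from the kobject core refusing to take a reference on an object that has not been initialized yet; of_platform_populate() creates children that grab a reference on their parent. A tiny userspace model of the ordering bug and its fix (hypothetical names, not the kernel implementation):

  #include <stdio.h>

  struct kobj_model {
      int initialized;
      int refcount;
  };

  static void kobject_init_model(struct kobj_model *k)
  {
      k->initialized = 1;
      k->refcount = 1;
  }

  static void kobject_get_model(struct kobj_model *k)
  {
      if (!k->initialized)
          printf("kobject '(null)': is not initialized, "
                 "yet kobject_get() is being called.\n");
      k->refcount++;
  }

  int main(void)
  {
      struct kobj_model bridge = { 0 };

      /* Populating children before registering the parent: a child takes
       * a reference on the uninitialized parent and triggers the warning. */
      kobject_get_model(&bridge);

      /* The fix: initialize/register first, then populate children. */
      kobject_init_model(&bridge);
      kobject_get_model(&bridge);     /* fine now */
      return 0;
  }
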
From b528537d131f99e2c3f0231bb8a216b3743e6043 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa
Date: Tue, 4 Apr 2023 23:31:58 +0900
Subject: [PATCH 23/28] mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 1007843a91909a4995ee78a538f62d8665705b66 upstream.

syzbot is reporting circular locking dependency which involves zonelist_update_seq seqlock [1], for this lock is checked by memory allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
    some_timer_func() {
      kmalloc(GFP_ATOMIC) {
        __alloc_pages_slowpath() {
          read_seqbegin(&zonelist_update_seq) {
            // spins forever because zonelist_update_seq.seqcount is odd
          }
        }
      }
    }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation requests. But Michal Hocko does not know whether we should go with this approach.

Another deadlock scenario which syzbot is reporting is a race between kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with port->lock held and printk() from __build_all_zonelists() with zonelist_update_seq held.

  CPU0                                          CPU1
  ----                                          ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                                __build_all_zonelists() {
                                                  write_seqlock(&zonelist_update_seq);
                                                  build_zonelists() {
                                                    printk() {
                                                      vprintk() {
                                                        vprintk_default() {
                                                          vprintk_emit() {
                                                            console_unlock() {
                                                              console_flush_all() {
                                                                console_emit_next_record() {
                                                                  con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq);
                    // spins forever because zonelist_update_seq.seqcount is odd
                                                                    spin_lock_irqsave(&port->lock, flags);
                                                                    // spins forever because port->lock is held
                  }
                }
              }
            }
          }
        }
      }
      spin_unlock_irqrestore(&port->lock, flags);
      // message is printed to console
                                                                    spin_unlock_irqrestore(&port->lock, flags);
                                                                  }
                                                                }
                                                              }
                                                            }
                                                          }
                                                        }
                                                      }
                                                    }
                                                  }
                                                  write_sequnlock(&zonelist_update_seq);
                                                }
    }
  }

This deadlock scenario can be eliminated by preventing interrupt context from calling kmalloc(GFP_ATOMIC) and preventing printk() from calling console_flush_all() while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a candidate for deferring printk() [2], let's address this problem by disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) and disabling synchronous printk() in order to avoid console_flush_all().

As a side effect of minimizing the duration of zonelist_update_seq.seqcount being odd by disabling synchronous printk(), latency at read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from a lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. not recording an unnecessary locking dependency) from interrupt context is still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside the write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) section...
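
Both scenarios hinge on the same seqlock property: read_seqbegin() spins while the sequence count is odd, i.e. while a writer is inside the write section. A minimal userspace model of that behaviour (illustrative only; the real implementation in include/linux/seqlock.h also handles retries, memory barriers and lockdep):

  #include <stdio.h>

  static unsigned int seqcount;   /* stands in for zonelist_update_seq */

  static unsigned int read_seqbegin(void)
  {
      unsigned int seq;

      /* Readers wait for an even count (no writer active). If the writer
       * is stuck behind us on the same CPU, or spinning on a lock we
       * hold, this loop never terminates. */
      while ((seq = seqcount) & 1)
          ;
      return seq;
  }

  static void write_seqlock(void)   { seqcount++; /* count becomes odd  */ }
  static void write_sequnlock(void) { seqcount++; /* count becomes even */ }

  int main(void)
  {
      write_seqlock();
      /* An interrupt handler doing kmalloc(GFP_ATOMIC) here would enter
       * read_seqbegin() and spin forever: the interrupted writer cannot
       * resume until the handler returns. */
      write_sequnlock();

      printf("reader sees seq=%u\n", read_seqbegin());
      return 0;
  }
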
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa
Acked-by: Michal Hocko
Acked-by: Mel Gorman
Cc: Petr Mladek
Cc: David Hildenbrand
Cc: Ilpo Järvinen
Cc: John Ogness
Cc: Patrick Daly
Cc: Sergey Senozhatsky
Cc: Steven Rostedt
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 mm/page_alloc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6806a4824ae1..69668817fed3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6599,7 +6599,21 @@ static void __build_all_zonelists(void *data)
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+	unsigned long flags;
 
+	/*
+	 * Explicitly disable this CPU's interrupts before taking seqlock
+	 * to prevent any IRQ handler from calling into the page allocator
+	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 */
+	local_irq_save(flags);
+	/*
+	 * Explicitly disable this CPU's synchronous printk() before taking
+	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
+	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+	 */
+	printk_deferred_enter();
 	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
@@ -6638,6 +6652,8 @@ static void __build_all_zonelists(void *data)
 	}
 
 	write_sequnlock(&zonelist_update_seq);
+	printk_deferred_exit();
+	local_irq_restore(flags);
 }
 
 static noinline void __init

From 7a6593b5d7ad19ef61f5199e3c1420c829529b5d Mon Sep 17 00:00:00 2001
From: Daniel Baluta
Date: Wed, 5 Apr 2023 12:26:55 +0300
Subject: [PATCH 24/28] ASoC: SOF: pm: Tear down pipelines only if DSP was active

commit 0b186bb06198653d74a141902a7739e0bde20cf4 upstream.

With PCI, if the device was suspended it is brought back to full power and then suspended again. This doesn't happen when the device is described via DT.

We need to make sure that we tear down pipelines only if the device was previously active (and thus the pipelines were set up). Otherwise, we can break the use_count:

[ 219.009743] sof-audio-of-imx8m 3b6e8000.dsp: sof_ipc3_tear_down_all_pipelines: widget PIPELINE.2.SAI3.IN is still in use: count -1

and after this everything stops working.
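
The reported "count -1" is exactly what a second teardown without an intervening setup produces. A trivial standalone model of the underflow (a hypothetical counter, not the SOF code itself):

  #include <stdio.h>

  static int use_count;   /* widget use count, e.g. PIPELINE.2.SAI3.IN */

  static void setup_pipeline(void)     { use_count++; }
  static void tear_down_pipeline(void) { use_count--; }

  int main(void)
  {
      setup_pipeline();       /* DSP becomes active, count = 1 */

      tear_down_pipeline();   /* first suspend, count = 0 */
      tear_down_pipeline();   /* second suspend without a resume in
                               * between: count = -1, matching the
                               * "still in use: count -1" log above */
      printf("count %d\n", use_count);
      return 0;
  }
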
Fixes: d185e0689abc ("ASoC: SOF: pm: Always tear down pipelines before DSP suspend")
Reviewed-by: Pierre-Louis Bossart
Reviewed-by: Ranjani Sridharan
Signed-off-by: Daniel Baluta
Link: https://lore.kernel.org/r/20230405092655.19587-1-daniel.baluta@oss.nxp.com
Signed-off-by: Mark Brown
Signed-off-by: Greg Kroah-Hartman
---
 sound/soc/sof/pm.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/sound/soc/sof/pm.c b/sound/soc/sof/pm.c
index 8722bbd7fd3d..26ffcbb6e30f 100644
--- a/sound/soc/sof/pm.c
+++ b/sound/soc/sof/pm.c
@@ -183,6 +183,7 @@ static int sof_suspend(struct device *dev, bool runtime_suspend)
 	const struct sof_ipc_tplg_ops *tplg_ops = sdev->ipc->ops->tplg;
 	pm_message_t pm_state;
 	u32 target_state = snd_sof_dsp_power_target(sdev);
+	u32 old_state = sdev->dsp_power_state.state;
 	int ret;
 
 	/* do nothing if dsp suspend callback is not set */
@@ -192,7 +193,12 @@ static int sof_suspend(struct device *dev, bool runtime_suspend)
 	if (runtime_suspend && !sof_ops(sdev)->runtime_suspend)
 		return 0;
 
-	if (tplg_ops && tplg_ops->tear_down_all_pipelines)
+	/* we need to tear down pipelines only if the DSP hardware is
+	 * active, which happens for PCI devices. if the device is
+	 * suspended, it is brought back to full power and then
+	 * suspended again
+	 */
+	if (tplg_ops && tplg_ops->tear_down_all_pipelines && (old_state == SOF_DSP_PM_D0))
 		tplg_ops->tear_down_all_pipelines(sdev, false);
 
 	if (sdev->fw_state != SOF_FW_BOOT_COMPLETE)

From 6cb818ed5f08777e971cddda21a632490083aa1e Mon Sep 17 00:00:00 2001
From: Nikita Zhandarovich
Date: Mon, 17 Apr 2023 06:32:42 -0700
Subject: [PATCH 25/28] ASoC: fsl_asrc_dma: fix potential null-ptr-deref

commit 86a24e99c97234f87d9f70b528a691150e145197 upstream.

dma_request_slave_channel() may return NULL, which would lead to a NULL pointer dereference on 'tmp_chan->private'. Correct this behaviour by, first, switching from the deprecated function dma_request_slave_channel() to dma_request_chan(). Second, add a sanity check on the resulting value of dma_request_chan(). Also, fix the code comment that concerns the use of dma_request_slave_channel() so it matches the enacted changes.

Fixes: 706e2c881158 ("ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End")
Co-developed-by: Natalia Petrova
Signed-off-by: Nikita Zhandarovich
Acked-by: Shengjiu Wang
Link: https://lore.kernel.org/r/20230417133242.53339-1-n.zhandarovich@fintech.ru
Signed-off-by: Mark Brown
Signed-off-by: Greg Kroah-Hartman
---
 sound/soc/fsl/fsl_asrc_dma.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
index 3b81a465814a..05a7d1588d20 100644
--- a/sound/soc/fsl/fsl_asrc_dma.c
+++ b/sound/soc/fsl/fsl_asrc_dma.c
@@ -209,14 +209,19 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
 		be_chan = soc_component_to_pcm(component_be)->chan[substream->stream];
 		tmp_chan = be_chan;
 	}
-	if (!tmp_chan)
-		tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
+	if (!tmp_chan) {
+		tmp_chan = dma_request_chan(dev_be, tx ? "tx" : "rx");
+		if (IS_ERR(tmp_chan)) {
+			dev_err(dev, "failed to request DMA channel for Back-End\n");
+			return -EINVAL;
+		}
+	}
 
 	/*
 	 * An EDMA DEV_TO_DEV channel is fixed and bound with DMA event of each
 	 * peripheral, unlike SDMA channel that is allocated dynamically. So no
 	 * need to configure dma_request and dma_request2, but get dma_chan of
-	 * Back-End device directly via dma_request_slave_channel.
+	 * Back-End device directly via dma_request_chan.
 	 */
 	if (!asrc->use_edma) {
 		/* Get DMA request of Back-End */
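
The behavioural difference between the two DMA request APIs is the crux of the fix above: dma_request_slave_channel() returned NULL on failure, while dma_request_chan() returns an ERR_PTR() and never NULL, so failures must be tested with IS_ERR(). A condensed sketch of the two calling patterns (kernel context, abbreviated from the patch, not compilable standalone):

  struct dma_chan *tmp_chan;

  /* Old, deprecated API: NULL on failure. Here the NULL was never
   * checked before being dereferenced via tmp_chan->private. */
  tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");

  /* New API: an ERR_PTR() encodes the failure reason, so the result
   * must be tested with IS_ERR() rather than compared against NULL. */
  tmp_chan = dma_request_chan(dev_be, tx ? "tx" : "rx");
  if (IS_ERR(tmp_chan))
      return -EINVAL;
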
From 1831d8cbaea8c168ba1ff9ad0b2ab3879bc76b40 Mon Sep 17 00:00:00 2001
From: Chancel Liu
Date: Tue, 18 Apr 2023 17:42:59 +0800
Subject: [PATCH 26/28] ASoC: fsl_sai: Fix pins setting for i.MX8QM platform

commit 238787157d83969e5149c8e99787d5d90e85fbe5 upstream.

SAI on the i.MX8QM platform supports up to 4 data lines, so the pins setting should be corrected to 4.

Fixes: eba0f0077519 ("ASoC: fsl_sai: Enable combine mode soft")
Signed-off-by: Chancel Liu
Acked-by: Shengjiu Wang
Reviewed-by: Iuliana Prodan
Link: https://lore.kernel.org/r/20230418094259.4150771-1-chancel.liu@nxp.com
Signed-off-by: Mark Brown
Signed-off-by: Greg Kroah-Hartman
---
 sound/soc/fsl/fsl_sai.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index df7c0bf37245..6d88af5b287f 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -1541,7 +1541,7 @@ static const struct fsl_sai_soc_data fsl_sai_imx8qm_data = {
 	.use_imx_pcm = true,
 	.use_edma = true,
 	.fifo_depth = 64,
-	.pins = 1,
+	.pins = 4,
 	.reg_offset = 0,
 	.mclk0_is_mclk1 = false,
 	.flags = 0,

From ab91b09f399fe50de43c36549ee2c72b66ca1d3b Mon Sep 17 00:00:00 2001
From: Ekaterina Orlova
Date: Fri, 21 Apr 2023 15:35:39 +0100
Subject: [PATCH 27/28] ASN.1: Fix check for strdup() success

commit 5a43001c01691dcbd396541e6faa2c0077378f48 upstream.

It seems there is a misprint in the check of the strdup() return code that can lead to a NULL pointer dereference: the code tests 'p' (already known to be non-NULL at that point) instead of 'grammar_name', the freshly allocated copy.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 4520c6a49af8 ("X.509: Add simple ASN.1 grammar compiler")
Signed-off-by: Ekaterina Orlova
Cc: David Woodhouse
Cc: James Bottomley
Cc: Jarkko Sakkinen
Cc: keyrings@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Link: https://lore.kernel.org/r/20230315172130.140-1-vorobushek.ok@gmail.com/
Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 scripts/asn1_compiler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c
index 71d4a7c87900..c3e501451b41 100644
--- a/scripts/asn1_compiler.c
+++ b/scripts/asn1_compiler.c
@@ -625,7 +625,7 @@ int main(int argc, char **argv)
 	p = strrchr(argv[1], '/');
 	p = p ? p + 1 : argv[1];
 	grammar_name = strdup(p);
-	if (!p) {
+	if (!grammar_name) {
 		perror(NULL);
 		exit(1);
 	}

From ca1c9012c941ab1520851938d5f695f5a4d23634 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman
Date: Wed, 26 Apr 2023 14:28:44 +0200
Subject: [PATCH 28/28] Linux 6.1.26

Link: https://lore.kernel.org/r/20230424131133.829259077@linuxfoundation.org
Tested-by: Takeshi Ogasawara
Tested-by: Guenter Roeck
Tested-by: Markus Reichelt
Tested-by: Bagas Sanjaya
Tested-by: Salvatore Bonaccorso
Tested-by: Conor Dooley
Tested-by: Ron Economos
Tested-by: Chris Paterson (CIP)
Tested-by: Jon Hunter
Tested-by: Linux Kernel Functional Testing
Tested-by: Florian Fainelli
Tested-by: Shuah Khan
Signed-off-by: Greg Kroah-Hartman
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 1a4c4af370db..d2eff3747f2f 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 VERSION = 6
 PATCHLEVEL = 1
-SUBLEVEL = 25
+SUBLEVEL = 26
 EXTRAVERSION =
 NAME = Hurr durr I'ma ninja sloth