I encountered a race issue after lengthy (~594647 sec) stress tests on
a 64k-page arm64 VM with several 4k-block EROFS images. The timing
is like below:
z_erofs_try_inplace_io z_erofs_fill_bio_vec
cmpxchg(&compressed_bvecs[].page,
NULL, ..)
[access bufvec]
compressed_bvecs[] = *bvec;
Previously, z_erofs_submit_queue() just accessed bufvec->page only, so
other fields in bufvec didn't matter. After the subpage block support
is landed, .offset and .end can be used too, but filling bufvec isn't
an atomic operation which can cause inconsistency.
Let's use a spinlock to keep the atomicity of each bufvec.
Fixes: 192351616a9d ("erofs: support I/O submission for sub-page compressed blocks")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20240125120039.3228103-1-hsiangkao@linux.alibaba.com
Bug: 324640522
(cherry picked from commit cc4b2dd95f0d1eba8c691b36e8f4d1795582f1ff
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ master)
[dhavale: introduced spinlock in struct erofs_workgroup, upstream has a
change where atomic is replaced by lockref but pulling that and related
changes will cause unnecessary churn. Adding spinlock keeps the spirit
of the change in-tact by fixing the race. Also updated commit message
as we are not using lockref.]
Change-Id: Id20a0a433277ab71d46bce48c81824564d1b391d
Signed-off-by: Sandeep Dhavale <dhavale@google.com>