android_kernel_msm-6.1_nothing_sm7635

NothingOSS/android_kernel_msm-6.1_nothing_sm7635

Author	SHA1	Message	Date
Yan, Zheng	ad88f23f42	ceph: drop CAP_LINK_SHARED when sending "link" request to MDS To handle "link" request, the MDS need to xlock inode's linklock, which requires revoking any CAP_LINK_SHARED. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-08-09 17:54:32 -07:00
Nathaniel Yazdani	c338c07c51	ceph: fix null pointer dereference When register_session() is given an out-of-range argument for mds, ceph_mdsmap_get_addr() will return a null pointer, which would be given to ceph_con_open() & be dereferenced, causing a kernel oops. This fixes bug #4685 in the Ceph bug tracker <http://tracker.ceph.com/issues/4685>. Signed-off-by: Nathaniel Yazdani <n1ght.4nd.d4y@gmail.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-08-09 17:52:58 -07:00
majianpeng	494ddd11be	ceph: Don't forget the 'up_read(&osdc->map_sem)' if met error. CC: stable@vger.kernel.org Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-08-09 17:49:39 -07:00
Zach Brown	db62efbbf8	btrfs: don't loop on large offsets in readdir When btrfs readdir() hits the last entry it sets the readdir offset to a huge value to stop buggy apps from breaking when the same name is returned by readdir() with concurrent rename()s. But unconditionally setting the offset to INT_MAX causes readdir() to loop returning any entries with offsets past INT_MAX. It only takes a few hours of constant file creation and removal to create entries past INT_MAX. So let's set the huge offset to LLONG_MAX if the last entry has already overflowed 32bit loff_t. Without large offsets behaviour is identical. With large offsets 64bit apps will work and 32bit apps will be no more broken than they currently are if they see large offsets. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:34:56 -04:00
Josef Bacik	cfad392b22	Btrfs: check to see if root_list is empty before adding it to dead roots A user reported a panic when running with autodefrag and deleting snapshots. This is because we could end up trying to add the root to the dead roots list twice. To fix this check to see if we are empty before adding ourselves to the dead roots list. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:23 -04:00
Josef Bacik	f3b15ccdbb	Btrfs: release both paths before logging dir/changed extents The ceph guys tripped over this bug where we were still holding onto the original path that we used to copy the inode with when logging. This is based on Chris's fix which was reported to fix the problem. We need to drop the paths in two cases anyway so just move the drop up so that we don't have duplicate code. Thanks, Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:16 -04:00
Josef Bacik	ee20a98314	Btrfs: allow splitting of hole em's when dropping extent cache I noticed while running multi-threaded fsync tests that sometimes fsck would complain about an improper gap. This happens because we fail to add a hole extent to the file, which was happening when we'd split a hole EM because btrfs_drop_extent_cache was just discarding the whole em instead of splitting it. So this patch fixes this by allowing us to split a hole em properly, which means that added holes actually get logged properly and we no longer see this fsck error. Thankfully we're tolerant of these sort of problems so a user would not see any adverse effects of this bug, other than fsck complaining. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:09 -04:00
Josef Bacik	ed8c4913da	Btrfs: make sure the backref walker catches all refs to our extent Because we don't mess with the offset into the extent for compressed we will properly find both extents for this case [extent a][extent b][rest of extent a] but because we already added a ref for the front half we won't add the inode information for the second half. This causes us to leak that memory and not print out the other offset when we do logical-resolve. So fix this by calling ulist_add_merge and then add our eie to the existing entry if there is one. With this patch we get both offsets out of logical-resolve. With this and the other 2 patches I've sent we now pass btrfs/276 on my vm with compress-force=lzo set. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:03 -04:00
Josef Bacik	8ca15e05e6	Btrfs: fix backref walking when we hit a compressed extent If you do btrfs inspect-internal logical-resolve on a compressed extent that has been partly overwritten it won't find anything. This is because we try and match the extent offset we've searched for based on the extent offset in the data extent entry. However this doesn't work for compressed extents because the offsets are for the uncompressed size, not the compressed size. So instead only do this check if we are not compressed, that way we can get an actual entry for the physical offset rather than nothing for compressed. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:56 -04:00
Josef Bacik	b76bb70136	Btrfs: do not offset physical if we're compressed xfstest btrfs/276 was freaking out on slower boxes partly because fiemap was offsetting the physical based on the extent offset. This is perfectly fine with uncompressed extents, however the extent offset is into the uncompressed area, not the compressed. So we can return a physical value that isn't at all within the area we have allocated on disk. Fix this by returning the start of the extent if it is compressed no matter what the offset. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:50 -04:00
Liu Bo	b5b9b5b318	Btrfs: fix extent buffer leak after backref walking commit 47fb091fb787420cd195e66f162737401cce023f(Btrfs: fix unlock after free on rewinded tree blocks) takes an extra increment on the reference of allocated dummy extent buffer, so now we cannot free this dummy one, and end up with extent buffer leak. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:42 -04:00
Liu Bo	e68afa49ae	Btrfs: fix a bug of snapshot-aware defrag to make it work on partial extents For partial extents, snapshot-aware defrag does not work as expected, since a) we use the wrong logical offset to search for parents, which should be disk_bytenr + extent_offset, not just disk_bytenr, b) 'offset' returned by the backref walking just refers to key.offset, not the 'offset' stored in btrfs_extent_data_ref which is (key.offset - extent_offset). The reproducer: $ mkfs.btrfs sda $ mount sda /mnt $ btrfs sub create /mnt/sub $ for i in `seq 5 -1 1`; do dd if=/dev/zero of=/mnt/sub/foo bs=5k count=1 seek=$i conv=notrunc oflag=sync; done $ btrfs sub snap /mnt/sub /mnt/snap1 $ btrfs sub snap /mnt/sub /mnt/snap2 $ sync; btrfs filesystem defrag /mnt/sub/foo; $ umount /mnt $ btrfs-debug-tree sda (Here we can check whether the defrag operation is snapshot-awared. This addresses the above two problems. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:17 -04:00
Jie Liu	7cddc19392	btrfs: fix file truncation if FALLOC_FL_KEEP_SIZE is specified Create a small file and fallocate it to a big size with FALLOC_FL_KEEP_SIZE option, then truncate it back to the small size again, the disk free space is not changed back in this case. i.e, total 4 -rw-r--r-- 1 root root 512 Jun 28 11:35 test Filesystem Size Used Avail Use% Mounted on .... /dev/sdb1 8.0G 56K 7.2G 1% /mnt -rw-r--r-- 1 root root 512 Jun 28 11:35 /mnt/test Filesystem Size Used Avail Use% Mounted on .... /dev/sdb1 8.0G 5.1G 2.2G 70% /mnt Filesystem Size Used Avail Use% Mounted on .... /dev/sdb1 8.0G 5.1G 2.2G 70% /mnt With this fix, the truncated up space is back as: Filesystem Size Used Avail Use% Mounted on .... /dev/sdb1 8.0G 56K 7.2G 1% /mnt Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:28:56 -04:00
Oleg Nesterov	201d3dfa4d	dlm: kill the unnecessary and wrong device_close()->recalc_sigpending() device_close()->recalc_sigpending() is not needed, sigprocmask() takes care of TIF_SIGPENDING correctly. And without ->siglock it is racy and wrong, it can wrongly clear TIF_SIGPENDING and miss a signal. But even with this patch device_close() is still buggy: 1. sigprocmask() should not be used, we have set_task_blocked(), but this is minor. 2. We should never block SIGKILL or SIGSTOP, and this is what the code tries to do. 3. This can't protect against SIGKILL or SIGSTOP anyway. Another thread can do signal_wake_up(), say, do_signal_stop() or complete_signal() or debugger. 4. sigprocmask(SIG_BLOCK, allsigs) doesn't necessarily clears TIF_SIGPENDING, say, freezing() or ->jobctl. 5. device_write() looks equally wrong by the same reason. Looks like, this tries to protect some wait_event_interruptible() logic from signals, it should be turned into uninterruptible wait. Or we need to implement something like signals_stop/start for such a use-case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-09 10:48:20 -07:00
Jan Kara	97a2847d06	Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeffm/linux-reiserfs into for_next_testing Reiserfs locking fixes.	2013-08-09 10:49:58 +02:00
Paul Gortmaker	e99a03c6f5	jbd: use a single printk for jbd_debug() Backport of jbd2 commit `169f1a2a87` ("jbd2: use a single printk for jbd_debug()") Since the jbd_debug() is implemented with two separate printk() calls, it can lead to corrupted and misleading debug output like the following (see lines marked with ""): [ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes [ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104 [ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ [ 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103 [* 290.339382] JBD2: starting commit of transaction 42104 [ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records [ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079 i.e. the debug output from log_wait_commit and journal_commit_transaction have become interleaved. The output should have been: (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103 (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104 It is expected that this is not easy to replicate -- I was only able to cause it on preempt-rt kernels, and even then only under heavy I/O load. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jan Kara <jack@suse.cz>	2013-08-09 10:49:00 +02:00
Jaegeuk Kim	d71b5564c0	f2fs: introduce cur_cp_version function to reduce code size This patch introduces a new inline function, cur_cp_version, to reduce redundant codes. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:37 +09:00
Jaegeuk Kim	e518ff81c3	f2fs: fix inconsistency between xattr node blocks and its inode Previously xattr node blocks are stored to the COLD_NODE log, which means that our roll-forward mechanism doesn't recover the xattr node blocks at all. Only the direct node blocks in the WARM_NODE log can be recovered. So, let's resolve the issue simply by conducting checkpoint during fsync when a file has a modified xattr node block. This approach is able to degrade the performance, but normally the checkpoint overhead is shown at the initial fsync call after the xattr entry changes. Once the checkpoint is done, no additional overhead would be occurred. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:24 +09:00
Jaegeuk Kim	dbe6a5ff4f	f2fs: fix the use of XATTR_NODE_OFFSET This patch fixes the use of XATTR_NODE_OFFSET. o The offset should not use several MSB bits which are used by marking node blocks. o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality during the fsync call. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 14:57:56 +09:00
Piotr Sarna	6ae6514b33	ext4: fix mount/remount error messages for incompatible mount options Commit `5688978` ("ext4: improve handling of conflicting mount options") introduced incorrect messages shown while choosing wrong mount options. First of all, both cases of incorrect mount options, "data=journal,delalloc" and "data=journal,dioread_nolock" result in the same error message. Secondly, the problem above isn't solved for remount option: the mismatched parameter is simply ignored. Moreover, ext4_msg states that remount with options "data=journal,delalloc" succeeded, which is not true. To fix it up, I added a simple check after parse_options() call to ensure that data=journal and delalloc/dioread_nolock parameters are not present at the same time. Signed-off-by: Piotr Sarna <p.sarna@partner.samsung.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2013-08-08 23:02:24 -04:00
Theodore Ts'o	59d9fa5c2e	ext4: allow the mount options nodelalloc and data=journal Commit `26092bf` ("ext4: use a table-driven handler for mount options") wrongly disallows the specifying the mount options nodelalloc and data=journal simultaneously. This is incorrect; it should have only disallowed the combination of delalloc and data=journal simultaneously. Reported-by: Piotr Sarna <p.sarna@partner.samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2013-08-08 23:01:24 -04:00
Tejun Heo	8af01f56a0	cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-08-08 20:11:22 -04:00
Jeff Mahoney	d2d0395fd1	reiserfs: locking, release lock around quota operations Previous commits released the write lock across quota operations but missed several places. In particular, the free operations can also call into the file system code and take the write lock, causing deadlocks. This patch introduces some more helpers and uses them for quota call sites. Without this patch applied, reiserfs + quotas runs into deadlocks under anything more than trivial load. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2013-08-08 17:34:47 -04:00
Jeff Mahoney	278f6679f4	reiserfs: locking, handle nested locks properly The reiserfs write lock replaced the BKL and uses similar semantics. Frederic's locking code makes a distinction between when the lock is nested and when it's being acquired/released, but I don't think that's the right distinction to make. The right distinction is between the lock being released at end-of-use and the lock being released for a schedule. The unlock should return the depth and the lock should restore it, rather than the other way around as it is now. This patch implements that and adds a number of places where the lock should be dropped. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2013-08-08 17:34:46 -04:00
Jeff Mahoney	4c05141df5	reiserfs: locking, push write lock out of xattr code The reiserfs xattr code doesn't need the write lock and sleeps all over the place. We can simplify the locking by releasing it and reacquiring after the xattr call. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2013-08-08 17:27:19 -04:00
Linus Torvalds	55f5bfd4c9	ext4 bugfixes for v3.11-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJR+YUoAAoJENNvdpvBGATwxsgQAJzxe8x1SWW8jrvasMELW3bS I7gl3lu789GROmZo5MI6+XI1DSCjHcMTisoPnPGx8q9LNhiYQGjp9Loxgl+L+Cem wbDfkSHrhxtDoEw2rdpPb/hFHOpsv8t6TT53GVbdc5l/lxwyvdrblXxhWbLuTRCQ G0X2VoxvTph7SyVPpRqoU4U99vW3g6LRI0dsxyU8kLc/kp7/t89gwh6Gj0wlb6tb 4pswvEuxK4HauMPjajCIaEeGpQoHbvCVpB4P9HvEtL/wklc5hNIAWy5+KYjMFWil 8Vc6RsJv6DZc66z+X4PT1oYrx6bkhNKNRGBETV2/dzeLU/7o2ttUIQ0vlOlN9gKV xSne19OKwqhhRMuWrc3LDzunlFiLex2zHbZJVDOeKNXmEJVpO9ZPVkvpoi/g0vOE DPwcc89J+763CVmXu8uyPqgQRfJTp2dXNggP3lINCDi1mFayrBqR0S6rQ1B2m/6R 8HYDjJ6gBpXBcs0FTGh1cfbpI8GpPx9Xu3Q/FtjvczEXUO3831KCm+k7O3cUjB72 I7I3wmyYQvUanOc39fg+EZ0cXpiSIZG4Mjv+8vWgADFlCMUKMaC+vp2PuKmBxTuk 7dAkxKr2kXfMirGWF1R/vo/CaKnCWmJbAJDFJ8Mt/Nk3zn3NOl/kCRtxndlj7k+m /s8DHlVLyD8dgvzRO9gp =xqB3 -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o. Misc ext4 fixes, delayed by Ted moving mail servers and email getting marked as spam due to bad spf records. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add WARN_ON to check the length of allocated blocks ext4: fix retry handling in ext4_ext_truncate() ext4: destroy ext4_es_cachep on module unload ext4: make sure group number is bumped after a inode allocation race	2013-08-08 09:38:19 -07:00
Andy Adamson	97431204ea	NFSv4.1 Use clientid management rpc_clnt for secinfo_no_name As per RFC 5661 Security Considerations Commit `4edaa308` "NFS: Use "krb5i" to establish NFSv4 state whenever possible" uses the nfs_client cl_rpcclient for all clientid management operations. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-08 11:46:25 -04:00
Andy Adamson	5ec16a8500	NFSv4.1 Use clientid management rpc_clnt for secinfo As per RFC 3530 and RFC 5661 Security Considerations Commit `4edaa308` "NFS: Use "krb5i" to establish NFSv4 state whenever possible" uses the nfs_client cl_rpcclient for all clientid management operations. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-08 11:46:25 -04:00
Jaegeuk Kim	c2d715d144	f2fs: fix a build failure due to missing the kobject header This patch should resolve the following error reported by kbuild test robot. All error/warnings: In file included from fs/f2fs/dir.c:13:0: >> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type struct kobject s_kobj; The failure was caused by missing the kobject header file in dir.c. So, this patch move the header file to the right location, f2fs.h. CC: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-08 15:56:49 +09:00
Trond Myklebust	b72888cb0b	NFSv4: Fix up nfs4_proc_lookup_mountpoint Currently, we do not check the return value of client = rpc_clone_client(), nor do we shut down the resulting cloned rpc_clnt in the case where a NFS4ERR_WRONGSEC has caused nfs4_proc_lookup_common() to replace the original value of 'client' (causing a memory leak). Fix both issues and simplify the code by moving the call to rpc_clone_client() until after nfs4_proc_lookup_common() has done its business. Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 20:47:26 -04:00
Benjamin LaHaise	f30d704fe1	aio: table lookup: verify ctx pointer Another shortcoming of the table lookup patch was revealed where the pointer was not being tested before being dereferenced. Verify this to avoid the NULL pointer dereference. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2013-08-07 18:23:48 -04:00
Trond Myklebust	eddffa4084	NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget() We only need to call it on the creation of the inode. Reported-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Steve Dickson <SteveD@redhat.com> Cc: Dave Quigley <dpquigl@davequigley.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 17:07:41 -04:00
Scott Mayhew	e890db0104	NFSv4: Fix the sync mount option for nfs4 mounts The sync mount option stopped working for NFSv4 mounts after commit `c02d7adf8c` (NFSv4: Replace nfs4_path_walk() with FS path lookup in a private namespace). If MS_SYNCHRONOUS is set in the super_block that we're cloning from, then it should be set in the new super_block as well. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 17:07:41 -04:00
Trond Myklebust	f8806c843f	NFS: Fix writeback performance issue on cache invalidation If a cache invalidation is triggered, and we happen to have a lot of writebacks cached at the time, then the call to invalidate_inode_pages2() will end up calling ->launder_page() on each and every dirty page in order to sync its contents to disk, thus defeating write coalescing. The following patch ensures that we try to sync the inode to disk before calling invalidate_inode_pages2() so that we do the writeback as efficiently as possible. Reported-by: William Dauchy <william@gandi.net> Reported-by: Pascal Bouchareine <pascal@gandi.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: William Dauchy <william@gandi.net> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2013-08-07 17:07:40 -04:00
Linus Torvalds	b7bc9e7d80	Oleg Nesterov has been working hard in closing all the holes that can lead to race conditions between deleting an event and accessing an event debugfs file. This included a fix to the debugfs system (acked by Greg Kroah-Hartman). We think that all the holes have been patched and hopefully we don't find more. I haven't marked all of them for stable because I need to examine them more to figure out how far back some of the changes need to go. Along the way, some other fixes have been made. Alexander Z Lam fixed some logic where the wrong buffer was being modifed. Andrew Vagin found a possible corruption for machines that actually allocate cpumask, as a reference to one was being zeroed out by mistake. Dhaval Giani found a bad prototype when tracing is not configured. And I not only had some changes to help Oleg, but also finally fixed a long standing bug that Dave Jones and others have been hitting, where a module unload and reload can cause the function tracing accounting to get screwed up. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQEcBAABAgAGBQJR/7FvAAoJEOdOSU1xswtMCAgH/RpxS5mQcQnhrGfu1qnSk+3v CyL1X9IkvgP5f+N59tbU3ZuE8Q3YQvTII35na38fgbHbwWrhYjLSvlf3Pwtre8zr Rjm9b8zA6iSobHu3DyuuupzMsLE+SpBakVSmG6mi8izZyuCi0YHMZMnHTGNnW9Vv YFqkfGgQQWMqiUHLcueetOqWAjR2JiJaLfr47rsRLNflF6dB2MbG/7YbYJ0TBk7E hemVmR7JH7606eJZ6nR7bh/hYv0sL2SSqVVaOHO5nWFLFbI8CybpNVGcoD+nihZY M7LrZT1C1mn3nKrfDhyXy4dwwx6wBON0fY23eJTHMVNnc0tvY1xqH52vTdPILWg= =UfsE -----END PGP SIGNATURE----- Merge tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Oleg Nesterov has been working hard in closing all the holes that can lead to race conditions between deleting an event and accessing an event debugfs file. This included a fix to the debugfs system (acked by Greg Kroah-Hartman). We think that all the holes have been patched and hopefully we don't find more. I haven't marked all of them for stable because I need to examine them more to figure out how far back some of the changes need to go. Along the way, some other fixes have been made. Alexander Z Lam fixed some logic where the wrong buffer was being modifed. Andrew Vagin found a possible corruption for machines that actually allocate cpumask, as a reference to one was being zeroed out by mistake. Dhaval Giani found a bad prototype when tracing is not configured. And I not only had some changes to help Oleg, but also finally fixed a long standing bug that Dave Jones and others have been hitting, where a module unload and reload can cause the function tracing accounting to get screwed up" * tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix reset of time stamps during trace_clock changes tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set tracing: Fix fields of struct trace_iterator that are zeroed by mistake tracing/uprobes: Fail to unregister if probe event files are in use tracing/kprobes: Fail to unregister if probe event files are in use tracing: Add comment to describe special break case in probe_remove_event_call() tracing: trace_remove_event_call() should fail if call/file is in use debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs) ftrace: Check module functions being traced on reload ftrace: Consolidate some duplicate code for updating ftrace ops tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private tracing: Introduce remove_event_file_dir() tracing: Change f_start() to take event_mutex and verify i_private != NULL tracing: Change event_filter_read/write to verify i_private != NULL tracing: Change event_enable/disable_read() to verify i_private != NULL tracing: Turn event/id->i_private into call->event.type	2013-08-07 13:01:30 -07:00
Andy Adamson	bc4b2a86a5	NFSv4.1 Increase NFS4_DEF_SLOT_TABLE_SIZE Increase NFS4_DEF_SLOT_TABLE_SIZE which is used as the client ca_maxreequests value in CREATE_SESSION. Current non-dynamic session slot server implementations use the client ca_maxrequests as a maximum slot number: 64 session slots can handle most workloads. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 13:10:40 -04:00
Andy Adamson	f8407299f6	NFS Remove unused authflavour parameter from init_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 13:09:30 -04:00
Chuck Lever	73d8bde5e4	NFS: Never use user credentials for lease renewal Never try to use a non-UID 0 user credential for lease management, as that credential can change out from under us. The server will block NFSv4 lease recovery with NFS4ERR_CLID_INUSE. Since the mechanism to acquire a credential for lease management is now the same for all minor versions, replace the minor version- specific callout with a single function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 13:06:08 -04:00
Chuck Lever	d688f7b8f6	NFS: Use root's credential for lease management when keytab is missing Commit `05f4c350` "NFS: Discover NFSv4 server trunking when mounting" Fri Sep 14 17:24:32 2012 introduced Uniform Client String support, which forces our NFS client to establish a client ID immediately during a mount operation rather than waiting until a user wants to open a file. Normally machine credentials (eg. from a keytab) are used to perform a mount operation that is protected by Kerberos. Before 05fc350, SETCLIENTID used a machine credential, or fell back to a regular user's credential if no keytab is available. On clients that don't have a keytab, performing SETCLIENTID early means there's no user credential to fall back on, since no regular user has kinit'd yet. `05f4c350` seems to have broken the ability to mount with sec=krb5 on clients that don't have a keytab in kernels 3.7 - 3.10. To address this regression, commit `4edaa308` (NFS: Use "krb5i" to establish NFSv4 state whenever possible), Sat Mar 16 15:56:20 2013, was merged in 3.10. This commit forces the NFS client to fall back to AUTH_SYS for lease management operations if no keytab is available. Neil Brown noticed that, since root is required to kinit to do a sec=krb5 mount when a client doesn't have a keytab, we can try to use root's Kerberos credential before AUTH_SYS. Now, when determining a principal and flavor to use for lease management, the NFS client tries in this order: 1. Flavor: AUTH_GSS, krb5i Principal: service principal (via keytab) 2. Flavor: AUTH_GSS, krb5i Principal: user principal established for UID 0 (via kinit) 3. Flavor: AUTH_SYS Principal: UID 0 / GID 0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 13:05:10 -04:00
Trond Myklebust	6da1a03436	NFSv4: Refuse mount attempts with proto=udp RFC3530 disallows the use of udp as a transport protocol for NFSv4. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 12:37:04 -04:00
Jeff Layton	9597c13b2f	nfs: verify open flags before allowing an atomic open Currently, you can open a NFSv4 file with O_APPEND\|O_DIRECT, but cannot fcntl(F_SETFL,...) with those flags. This flag combination is explicitly forbidden on NFSv3 opens, and it seems like it should also be on NFSv4. Reported-by: Chao Ye <cye@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-08-07 12:16:22 -04:00
Weston Andros Adamson	58cd57bfd9	nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID - don't BUG_ON() when not SP4_NONE - calculate recv and send reserve sizes correctly Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-08-07 12:06:07 -04:00
J. Bruce Fields	c47205914c	nfsd4: Fix MACH_CRED NULL dereference Fixes a NULL-dereference on attempts to use MACH_CRED protection over auth_sys. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-08-07 12:05:51 -04:00
Jeff Layton	757c4f6260	cifs: don't instantiate new dentries in readdir for inodes that need to be revalidated immediately David reported that commit `c2b93e06` (cifs: only set ops for inodes in I_NEW state) caused a regression with mfsymlinks. Prior to that patch, if a mfsymlink dentry was instantiated at readdir time, the inode would get a new set of ops when it was revalidated. After that patch, this did not occur. This patch addresses this by simply skipping instantiating dentries in the readdir codepath when we know that they will need to be immediately revalidated. The next attempt to use that dentry will cause a new lookup to occur (which is basically what we want to happen anyway). Cc: <stable@vger.kernel.org> Cc: "Stefan (metze) Metzmacher" <metze@samba.org> Cc: Sachin Prabhu <sprabhu@redhat.com> Reported-and-Tested-by: David McBride <dwm37@cam.ac.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2013-08-07 10:57:06 -05:00
Jin Xu	a569469e96	f2fs: fix a deadlock in fsync This patch fixes a deadlock bug that occurs quite often when there are concurrent write and fsync on a same file. Following is the simplified call trace when tasks get hung. fsync thread: - f2fs_sync_file ... - f2fs_write_data_pages ... - update_extent_cache ... - update_inode - wait_on_page_writeback bdi writeback thread - __writeback_single_inode - f2fs_write_data_pages - mutex_lock(sbi->writepages) The deadlock happens when the fsync thread waits on a inode page that has been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately, no one else could be able to submit the cached bio to block layer for writeback. This is because the fsync thread already hold a sbi->fs_lock and the sbi->writepages lock, causing the bdi thread being blocked when attempt to write data pages for the same inode. At the same time, f2fs_gc thread does not notice the situation and could not help. Even the sync syscall gets blocked. To fix it, we could submit the cached bio first before waiting on a inode page that is being written back. Signed-off-by: Jin Xu <jinuxstyle@gmail.com> [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 22:00:36 +09:00
Namjae Jeon	df273efc36	f2fs: remove redundant code from f2fs_write_begin This code is being used for nobh_write_end() function. But since now f2fs_write_end function is added so there is no need for this code. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 22:00:35 +09:00
Namjae Jeon	d2dc095f42	f2fs: add sysfs entries to select the gc policy Add sysfs entry gc_idle to control the gc policy. Where gc_idle = 1 corresponds to selecting a cost benefit approach, while gc_idle = 2 corresponds to selecting a greedy approach to garbage collection. The selection is mutually exclusive one approach will work at any point. If gc_idle = 0, then this option is disabled. Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: change the select_gc_type() flow slightly] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 22:00:18 +09:00
Namjae Jeon	b59d0bae6c	f2fs: add sysfs support for controlling the gc_thread Add sysfs entries to control the timing parameters for f2fs gc thread. Various Sysfs options introduced are: gc_min_sleep_time: Min Sleep time for GC in ms gc_max_sleep_time: Max Sleep time for GC in ms gc_no_gc_sleep_time: Default Sleep time for GC in ms Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: fix an umount bug and some minor changes] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-06 21:53:34 +09:00
Trond Myklebust	9a1b6bf818	LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in which case we're in entirely the wrong namespace. Secondly, commit `8aac62706a` (move exit_task_namespaces() outside of exit_notify()) now means that exit_task_work() is called after exit_task_namespaces(), which triggers an Oops when we're freeing up the locks. Fix this by ensuring that we initialise the nlm_host's rpc_client at mount time, so that the cl_nodename field is initialised to the value of utsname()->nodename that the net namespace uses. Then replace the lockd callers of utsname()->nodename. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Toralf Förster <toralf.foerster@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Nix <nix@esperi.org.uk> Cc: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.10.x	2013-08-05 15:03:46 -04:00
Benjamin LaHaise	da90382c2e	aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" In the patch "aio: convert the ioctx list to table lookup v3", incorrect handling in the ioctx_alloc() error path was introduced that lead to an ioctx being added via ioctx_add_table() while freed when the ioctx_alloc() call returned -EAGAIN due to hitting the aio_max_nr limit. Fix this by only calling ioctx_add_table() as the last step in ioctx_alloc(). Also, several unnecessary rcu_dereference() calls were added that lead to RCU warnings where the system was already protected by a spin lock for accessing mm->ioctx_table. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2013-08-05 13:21:43 -04:00

... 138 139 140 141 142 ...

39703 commits