Commit graph

39703 commits

Author SHA1 Message Date
Jaegeuk Kim
ae51fb31b8 f2fs: fix to call WRITE_FLUSH at the end of fsync
The fsync call should be ended after flushing the in-device caches.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20 18:30:14 +09:00
Jaegeuk Kim
04431c44e5 f2fs: fix not to allocate max_nid
The build_free_nid should not add free nids over nm_i->max_nid.
But, there was a hole that invalid free nid was added by the following scenario.

Let's suppose nm_i->max_nid = 150 and the last NAT page has 100 ~ 200 nids.

build_free_nids
  - get_current_nat_page loads the last NAT page
  - scan_nat_page can add 100 ~ 200 nids
    -> Bug here!
So, when scanning an NAT page, we should check each candidate whether it is
over max_nid or not.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20 18:30:13 +09:00
Jaegeuk Kim
c3850aa1cb f2fs: fix return value of releasepage for node and data
If the return value of releasepage is equal to zero, the page cannot be reclaimed.
Instead, we should return 1 in order to reclaim clean pages.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20 18:30:13 +09:00
Jaegeuk Kim
48cb76c7be f2fs: scan next nat page to reuse free nids in there
When we build new free nids, let's scan the just next NAT page instead of
skipping a couple of previously scanned pages in order to reuse free nids in
there.
Otherwise, we can use too much wide range of nids even though several nids were
deallocated, and also their node pages can be cached in the node_inode's address
space.
This means that we can retain lots of clean pages in the main memory, which
induces mm's reclaiming overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20 18:30:12 +09:00
Jaegeuk Kim
08d8058be6 f2fs: should check the node page was truncated first
Currently, f2fs doesn't reclaim any node pages.
However, if we found that a node page was truncated by checking its block
address with zero during f2fs_write_node_page, we should not skip that node
page and return zero to reclaim it.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20 18:30:12 +09:00
Jaegeuk Kim
393ff91f57 f2fs: reduce unncessary locking pages during read
This patch reduces redundant locking and unlocking pages during read operations.
In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page.
And then, when we need to modify any data finally, let's lock the page so that
we can avoid lock contention.

[readpage rule]
- The f2fs_readpage returns unlocked page, or released page too in error cases.
- Its caller should handle read error, -EIO, after locking the page, which
  indicates read completion.
- Its caller should check PageUptodate after grab_cache_page.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20 18:30:06 +09:00
Linus Torvalds
10b38669d6 - Fix for a potential infinite loop which was introduced in 4d559a3bcb
- Fix for the return type of xfs_iomap_eof_prealloc_initial_size
   from a1e16c2666
 - Fix for a failed buffer readahead causing subsequent callers to
   fail incorrectly
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRSOIAAAoJENaLyazVq6ZODqQP/2m1iZVIA9CXFf5hS2QZgkc2
 MHq+QaQ1aaZlAIRCnZO4XrWoLw4tH7AmsHA7dVJVz/ZhVrJg4ahfdSS6qR5EGWFb
 I5uE8LD8ZhpIiW6mBytJ7g9ST6xnaeean2sMwa0BcVK3uF84nO/uBopntZVrVlZE
 sMuklZe8GfxDpF6SBxVGG+5+OeLXzFmf+s+xoCYN410uuzYoT8/jveFP6a5ARcmH
 xEcOJA2+3o2z4/fsdx/Euf6LnDMSyOsAFUJCtnmBdKUA5w9DrJJqGpDDPEkg9h6d
 /DTPYXEWx6+w4xoMnIf09oEdCSamBVTWcRFXtftN03VNrbRNtyVwAc8HUaSNmt0p
 I3P/b5NJ5guH7uK72jp61N2RP7D5KOqwkwR58Y1SJWuwcgatYuB3NM5UeUyJBILj
 ViZ4DsKGE6BCl8T3hwkN+mxSxB+o7O8AypjWdEviBXbVIG9CwOxr1IEatl3eyV5T
 8QsNFb0LJcWzl1+F/uUYe1Goeqxvzupt7omUaRONdMnac3uFIk0ARrdxXFgawIJ9
 lgeftBCmMkqqLZUACSfmfCYNwyupz3E6bYB7Azwx01qg7CzTPUfIL2SxqDYp2dup
 /s+R7HL4HOJ0FCzjCZxHHO/1jsWgu265dJdpaQw/UcIe2IuEFGr558deHEM62bDW
 rWCVHj5eY5NRGyzSwzqB
 =41Vk
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.9-rc4' of git://oss.sgi.com/xfs/xfs

Pull XFS fixes from Ben Myers:

 - Fix for a potential infinite loop which was introduced in commit
   4d559a3bcb ("xfs: limit speculative prealloc near ENOSPC
   thresholds")

 - Fix for the return type of xfs_iomap_eof_prealloc_initial_size from
   commit a1e16c2666 ("xfs: limit speculative prealloc size on sparse
   files")

 - Fix for a failed buffer readahead causing subsequent callers to fail
   incorrectly

* tag 'for-linus-v3.9-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: ensure we capture IO errors correctly
  xfs: fix xfs_iomap_eof_prealloc_initial_size type
  xfs: fix potential infinite loop in xfs_iomap_prealloc_size()
2013-03-19 15:17:40 -07:00
Masanari Iida
434720fa98 f2fs: Fix typo in comments
Correct spelling typo in comments

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-19 09:44:55 +01:00
Alexandru Gheorghiu
c53026722b pstore: Replace calls to kmalloc and memcpy with kmemdup
Replaced calls to kmalloc and memcpy with a single call to kmemdup. This
patch was found using coccicheck.

Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anton Vorontsov <anton@enomsg.org>
2013-03-18 19:38:26 -07:00
Jeff Layton
ac534ff2d5 nfsd: fix startup order in nfsd_reply_cache_init
If we end up doing "goto out_nomem" in this function, we'll call
nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and
free entries, but that list may not be initialized yet if the server is
starting up for the first time. It's also possible for the shrinker to
kick in before we've initialized the LRU list.

Rearrange the initialization so that the LRU list_head and cache size
are initialized before doing any of the allocations that might fail.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18 17:21:30 -04:00
Jeff Layton
a517b608fa nfsd: only unhash DRC entries that are in the hashtable
It's not safe to call hlist_del() on a newly initialized hlist_node.
That leads to a NULL pointer dereference. Only do that if the entry
is hashed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18 14:58:32 -04:00
Dave Chinner
e001873853 xfs: ensure we capture IO errors correctly
Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorectly
detect an error during IO completion due to the stale error value
left on the buffer.

Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected those captured
from are those captured from bio submission or completion.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit c163f9a176)
2013-03-18 13:39:10 -05:00
Mark Tinguely
3325beed46 xfs: fix xfs_iomap_eof_prealloc_initial_size type
Fix the return type of xfs_iomap_eof_prealloc_initial_size() to
xfs_fsblock_t to reflect the fact that the return value may be an
unsigned 64 bits if XFS_BIG_BLKNOS is defined.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit e8108cedb1)
2013-03-18 13:38:50 -05:00
Brian Foster
83cdadd8b0 xfs: fix potential infinite loop in xfs_iomap_prealloc_size()
If freesp == 0, we could end up in an infinite loop while squashing
the preallocation. Break the loop when we've killed the prealloc
entirely.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit e78c420bfc)
2013-03-18 13:30:38 -05:00
Dmitry Monakhov
0e401101db ext4: fix memory leakage in mext_check_coverage
Regression was introduced by following commit 8c854473
TESTCASE (git://oss.sgi.com/xfs/cmds/xfstests.git):
#while true;do ./check 301 || break ;done

Also fix potential memory leakage in get_ext_path() once
ext4_ext_find_extent() have failed.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-18 11:40:19 -04:00
Zhang Yanfei
ee68a3c625 fs: befs: remove cast for kmalloc return value
remove cast for kmalloc return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-18 14:15:59 +01:00
Zhang Yanfei
194c8767ce fs: ufs: remove cast for kmalloc return value
remove cast for kmalloc return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-18 14:15:58 +01:00
Namjae Jeon
25c0a6e529 f2fs: avoid extra ++ while returning from get_node_path
In all the breaking conditions in get_node_path, 'n' is used to
track index in offset[] array, but while breaking out also, in all
paths n++ is done.
So, remove the ++ from breaking paths. Also, avoid
reset of 'level=0' in first case.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-18 21:00:36 +09:00
Jaegeuk Kim
5a20d339c7 f2fs: align f2fs maximum name length to linux based filesystem
The maximum filename length supported in linux is 255 characters.
So let's follow that.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-18 21:00:35 +09:00
Namjae Jeon
3aa770a9c9 f2fs: optimize and change return path in lookup_free_nid_list
Optimize and change return path in lookup_free_nid_list

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-18 21:00:35 +09:00
Namjae Jeon
e0f56cb44b f2fs: optimize get node page readahead part
We can remove the call to find_get_page to get a page from the cache
and check for up-to-date, instead we can make use of grab_cache_page
part itself to fetch the page from the cache.
So, removing the call and moving the PageUptodate at proper place, also
taken care of moving the lock_page condition in the page_hit part.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-18 21:00:34 +09:00
Changman Lee
52c2db3f95 f2fs: check the level before calling get_nid function
The caller of get_nid should be careful not to put lower value than
NODE_DIR1_BLOCK in case of level is zero.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-18 21:00:34 +09:00
Jaegeuk Kim
266e97a81c f2fs: introduce readahead mode of node pages
Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
with RDONLY_NODE flag.
And, this flag is set by the following functions.
- get_data_block_ro
- get_lock_data_page
- do_write_data_page
- truncate_blocks
- truncate_hole

However, this readahead mechanism is initially introduced for the use of
get_data_block_ro to enhance the sequential read performance.

So, let's clarify all the cases with the additional modes as follows.

enum {
	ALLOC_NODE,	/* allocate a new node page if needed */
	LOOKUP_NODE,	/* look up a node without readahead */
	LOOKUP_NODE_RA,	/*
			 * look up a node with readahead called
			 * by get_datablock_ro.
			 */
}

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-03-18 21:00:33 +09:00
Jaegeuk Kim
66d36a2944 f2fs: read with READ_SYNC when getting dnode page
The get_node_page_ra tries to:
1. grab or read a target node page for the given nid,
2. then, call ra_node_page to read other adjacent node pages in advance.

So, when we try to read a target node page by #1, we should submit bio with
READ_SYNC instead of READA.
And, in #2, READA should be used.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-03-18 21:00:33 +09:00
Jaegeuk Kim
12faafe454 f2fs: fix to unlock node page when it was truncated
If the node page was truncated, its block address became zero.
This means that we don't need to write the node page, but have to unlock
NODE_WRITE, decrease the number of dirty node pages, and then unlock_page
before returning the f2fs_write_node_page with zero.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-18 21:00:09 +09:00
Linus Torvalds
08637024ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Eric's rcu barrier patch fixes a long standing problem with our
  unmount code hanging on to devices in workqueue helpers.  Liu Bo
  nailed down a difficult assertion for in-memory extent mappings."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix warning of free_extent_map
  Btrfs: fix warning when creating snapshots
  Btrfs: return as soon as possible when edquot happens
  Btrfs: return EIO if we have extent tree corruption
  btrfs: use rcu_barrier() to wait for bdev puts at unmount
  Btrfs: remove btrfs_try_spin_lock
  Btrfs: get better concurrency for snapshot-aware defrag work
2013-03-17 11:04:14 -07:00
Liu Bo
3b2775942d Btrfs: fix warning of free_extent_map
Users report that an extent map's list is still linked when it's actually
going to be freed from cache.

The story is that

a) when we're going to drop an extent map and may split this large one into
smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
that it's on the list to be logged, then the smaller ems split from it will also
be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.

b) we'll keep ems from unlinking the list and freeing when they are flagged with
EXTENT_FLAG_LOGGING, because the log code holds one reference.

The end result is the warning, but the truth is that we set the flag
EXTENT_FLAG_LOGGING only during fsync.

So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.

Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-15 21:51:49 -04:00
Christoph Hellwig
56cea2d088 xfs: take inode version into account in XFS_LITINO
Add a version argument to XFS_LITINO so that it can return different values
depending on the inode version.  This is required for the upcoming v3 inodes
with a larger fixed layout dinode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-14 16:19:14 -05:00
Dave Chinner
c163f9a176 xfs: ensure we capture IO errors correctly
Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorectly
detect an error during IO completion due to the stale error value
left on the buffer.

Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected those captured
from are those captured from bio submission or completion.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-14 15:56:53 -05:00
Jeff Liu
d8ddfe81c7 xfs: Remove obsoleted m_inode_shrink from xfs_mount structure
Looks the old m_inode_shrink is obsoleted as we perform inodes reclaim per AG via
m_reclaim_workqueue, this patch remove it from the xfs_mount structure if so.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-03-14 15:55:32 -05:00
Linus Torvalds
40e4591d94 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3, reiserfs, quota fixes from Jan Kara:
 "A fix for regression in ext2, and a format string issue in ext3.  The
  rest isn't too serious."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix BUG_ON in evict() on inode deletion
  reiserfs: Use kstrdup instead of kmalloc/strcpy
  ext3: Fix format string issues
  quota: add missing use of dq_data_lock in __dquot_initialize
2013-03-14 12:11:28 -07:00
Liu Bo
7c2ec3f073 Btrfs: fix warning when creating snapshots
Creating snapshot passes extent_root to commit its transaction,
but it can lead to the warning of checking root for quota in
the __btrfs_end_transaction() when someone else is committing
the current transaction.  Since we've recorded the needed root
in trans_handle, just use it to get rid of the warning.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:30 -04:00
Wang Shilong
720f1e2060 Btrfs: return as soon as possible when edquot happens
If one of qgroup fails to reserve firstly, we should return immediately,
it is unnecessary to continue check.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:29 -04:00
Josef Bacik
492104c866 Btrfs: return EIO if we have extent tree corruption
The callers of lookup_inline_extent_info all handle getting an error back
properly, so return an error if we have corruption instead of being a jerk and
panicing.  Still WARN_ON() since this is kind of crucial and I've been seeing it
a bit too much recently for my taste, I think we're doing something wrong
somewhere.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:29 -04:00
Eric Sandeen
bc178622d4 btrfs: use rcu_barrier() to wait for bdev puts at unmount
Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
	__btrfs_close_devices
		call_rcu(&device->rcu, free_device);
			free_device
				INIT_WORK(&device->rcu_work, __free_device);
				schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:29 -04:00
Liu Bo
d340d2475c Btrfs: remove btrfs_try_spin_lock
Remove a useless function declaration

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:10 -04:00
Liu Bo
a09a0a705d Btrfs: get better concurrency for snapshot-aware defrag work
Using spinning case instead of blocking will result in better concurrency
overall.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:50:19 -04:00
Artem Bityutskiy
67e753ca41 UBIFS: make space fixup work in the remount case
The UBIFS space fixup is a useful feature which allows to fixup the "broken"
flash space at the time of the first mount. The "broken" space is usually the
result of using a "dumb" industrial flasher which is not able to skip empty
NAND pages and just writes all 0xFFs to the empty space, which has grave
side-effects for UBIFS when UBIFS trise to write useful data to those empty
pages.

The fix-up feature works roughly like this:
1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image
   (see -F option)
2. when the file-system is mounted for the first time, UBIFS notices the fixup
   flag and re-writes the entire media atomically, which may take really a lot
   of time.
3. UBIFS clears the fixup flag in the superblock.

This works fine when the file system is mounted R/W for the very first time.
But it did not really work in the case when we first mount the file-system R/O,
and then re-mount R/W. The reason was that we started the fixup procedure too
late, which we cannot really do because we have to fixup the space before it
starts being used.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reported-by: Mark Jackson <mpfj-list@mimc.co.uk>
Cc: stable@vger.kernel.org # 3.0+
2013-03-14 11:20:22 +02:00
Linus Torvalds
aea8b5d1e5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace bugfixes from Eric Biederman:
 "This tree includes a partial revert for "fs: Limit sys_mount to only
  request filesystem modules." When I added the new style module aliases
  to the filesystems I deleted the old ones.  A bad move.  It turns out
  that distributions like Arch linux use module aliases when
  constructing ramdisks.  Which meant ultimately that an ext3 filesystem
  mounted with ext4 would not result in the ext4 module being put into
  the ramdisk.

  The other change in this tree adds a handful of filesystem module
  alias I simply failed to add the first time.  Which inconvinienced a
  few folks using cifs.

  I don't want to inconvinience folks any longer than I have to so here
  are these trivial fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fs: Readd the fs module aliases.
  fs: Limit sys_mount to only request filesystem modules. (Part 3)
2013-03-13 15:47:50 -07:00
Tejun Heo
ebd6c70714 nfsd: convert to idr_alloc()
idr_get_new*() and friends are about to be deprecated.  Convert to the
new idr_alloc() interface.

Only compile-tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13 15:21:45 -07:00
Tejun Heo
801cb2d62d nfsd: remove unused get_new_stid()
get_new_stid() is no longer used since commit 3abdb60712 ("nfsd4:
simplify idr allocation").  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13 15:21:45 -07:00
Mateusz Guzik
24261fc23d cifs: delay super block destruction until all cifsFileInfo objects are gone
cifsFileInfo objects hold references to dentries and it is possible that
these will still be around in workqueues when VFS decides to kill super
block during unmount.

This results in panics like this one:
BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:943!
[..]
Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0)
[..]
Call Trace:
 [<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60
 [<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0
 [<ffffffff8119f946>] kill_anon_super+0x16/0x30
 [<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs]
 [<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80
 [<ffffffff811a085e>] deactivate_super+0x4e/0x70
 [<ffffffff811bb417>] mntput_no_expire+0xd7/0x130
 [<ffffffff811bc30c>] sys_umount+0x9c/0x3c0
 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b

Fix this by making each cifsFileInfo object hold a reference to cifs
super block, which implicitly keeps VFS super block around as well.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-13 14:12:06 -05:00
Sachin Prabhu
47c78f4a70 cifs: map NT_STATUS_SHARING_VIOLATION to EBUSY instead of ETXTBSY
NT_SHARING_VIOLATION errors are mapped to ETXTBSY which is unexpected
for operations such as unlink where we can hit these errors.

The patch maps the error NT_SHARING_VIOLATION to EBUSY instead. The
patch also replaces all instances of ETXTBSY in
cifs_rename_pending_delete() with EBUSY.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-13 14:09:20 -05:00
Jan Kara
c288d29696 ext2: Fix BUG_ON in evict() on inode deletion
Commit 8e3dffc6 introduced a regression where deleting inode with
large extended attributes leads to triggering
  BUG_ON(inode->i_state != (I_FREEING | I_CLEAR))
in fs/inode.c:evict(). That happens because freeing of xattr block
dirtied the inode and it happened after clear_inode() has been called.

Fix the issue by moving removal of xattr block into ext2_evict_inode()
before clear_inode() call close to a place where data blocks are
truncated. That is also more logical place and removes surprising
requirement that ext2_free_blocks() mustn't dirty the inode.

Reported-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-13 15:23:44 +01:00
Eric W. Biederman
fa7614ddd6 fs: Readd the fs module aliases.
I had assumed that the only use of module aliases for filesystems
prior to "fs: Limit sys_mount to only request filesystem modules."
was in request_module.  It turns out I was wrong.  At least mkinitcpio
in Arch linux uses these aliases.

So readd the preexising aliases, to keep from breaking userspace.

Userspace eventually will have to follow and use the same aliases the
kernel does.  So at some point we may be delete these aliases without
problems.  However that day is not today.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-12 18:55:21 -07:00
Mathieu Desnoyers
8aec0f5d41 Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys
Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
compat_process_vm_rw() shows that the compatibility code requires an
explicit "access_ok()" check before calling
compat_rw_copy_check_uvector(). The same difference seems to appear when
we compare fs/read_write.c:do_readv_writev() to
fs/compat.c:compat_do_readv_writev().

This subtle difference between the compat and non-compat requirements
should probably be debated, as it seems to be error-prone. In fact,
there are two others sites that use this function in the Linux kernel,
and they both seem to get it wrong:

Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
also ends up calling compat_rw_copy_check_uvector() through
aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
be missing. Same situation for
security/keys/compat.c:compat_keyctl_instantiate_key_iov().

I propose that we add the access_ok() check directly into
compat_rw_copy_check_uvector(), so callers don't have to worry about it,
and it therefore makes the compat call code similar to its non-compat
counterpart. Place the access_ok() check in the same location where
copy_from_user() can trigger a -EFAULT error in the non-compat code, so
the ABI behaviors are alike on both compat and non-compat.

While we are here, fix compat_do_readv_writev() so it checks for
compat_rw_copy_check_uvector() negative return values.

And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
handling.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12 11:05:45 -07:00
Lukas Czerner
4f42f80a8f ext4: use s_extent_max_zeroout_kb value as number of kb
Currently when converting extent to initialized, we have to decide
whether to zeroout part/all of the uninitialized extent in order to
avoid extent tree growing rapidly.

The decision is made by comparing the size of the extent with the
configurable value s_extent_max_zeroout_kb which is in kibibytes units.

However when converting it to number of blocks we currently use it as it
was in bytes. This is obviously bug and it will result in ext4 _never_
zeroout extents, but rather always split and convert parts to
initialized while leaving the rest uninitialized in default setting.

Fix this by using s_extent_max_zeroout_kb as kibibytes.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-03-12 12:40:04 -04:00
Al Viro
a930d87905 vfs: fix pipe counter breakage
If you open a pipe for neither read nor write, the pipe code will not
add any usage counters to the pipe, causing the 'struct pipe_inode_info"
to be potentially released early.

That doesn't normally matter, since you cannot actually use the pipe,
but the pipe release code - particularly fasync handling - still expects
the actual pipe infrastructure to all be there.  And rather than adding
NULL pointer checks, let's just disallow this case, the same way we
already do for the named pipe ("fifo") case.

This is ancient going back to pre-2.4 days, and until trinity, nobody
naver noticed.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12 08:29:17 -07:00
Theodore Ts'o
90ba983f68 ext4: use atomic64_t for the per-flexbg free_clusters count
A user who was using a 8TB+ file system and with a very large flexbg
size (> 65536) could cause the atomic_t used in the struct flex_groups
to overflow.  This was detected by PaX security patchset:

http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551

This bug was introduced in commit 9f24e4208f, so it's been around
since 2.6.30.  :-(

Fix this by using an atomic64_t for struct orlav_stats's
free_clusters.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
2013-03-11 23:39:59 -04:00
Ionut-Gabriel Radu
af591ad896 reiserfs: Use kstrdup instead of kmalloc/strcpy
Signed-off-by: Ionut-Gabriel Radu <ihonius@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11 22:05:57 +01:00