#
df152c24 |
| 22-Jun-2009 |
Sunil Mushran <sunil.mushran@oracle.com> |
ocfs2: Disable orphan scanning for local and hard-ro mounts Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-
ocfs2: Disable orphan scanning for local and hard-ro mounts Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
show more ...
|
#
83273932 |
| 03-Jun-2009 |
Srinivas Eeda <srinivas.eeda@oracle.com> |
ocfs2: timer to queue scan of all orphan slots When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes
ocfs2: timer to queue scan of all orphan slots When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
show more ...
|
#
dfa13f39 |
| 29-Apr-2009 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Fix a missing credit when deleting from indexed directories. The ocfs2 directory index updates two blocks when we remove an entry - the dx root and the dx leaf. OCFS2_DELETE_INOD
ocfs2: Fix a missing credit when deleting from indexed directories. The ocfs2 directory index updates two blocks when we remove an entry - the dx root and the dx leaf. OCFS2_DELETE_INODE_CREDITS was only accounting for the dx leaf. This shows up when ocfs2_delete_inode() runs out of credits in jbd2_journal_dirty_metadata() at "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);". The test that caught this was running dirop_file_racer from the ocfs2-test suite with a 250-character filename PREFIX. Run on a 512B blocksize, it forces the orphan dir index to grow large enough to trigger. Signed-off-by: Joel Becker <joel.becker@oracle.com>
show more ...
|
#
9140db04 |
| 06-Mar-2009 |
Srinivas Eeda <srinivas.eeda@oracle.com> |
ocfs2: recover orphans in offline slots during recovery and mount During recovery, a node recovers orphans in it's slot and the dead node(s). But if the dead nodes were holding orphans i
ocfs2: recover orphans in offline slots during recovery and mount During recovery, a node recovers orphans in it's slot and the dead node(s). But if the dead nodes were holding orphans in offline slots, they will be left unrecovered. If the dead node is the last one to die and is holding orphans in other slots and is the first one to mount, then it only recovers it's own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
Revision tags: v2.6.29-rc4 |
|
#
e7c17e43 |
| 29-Jan-2009 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Introduce dir free space list The only operation which doesn't get faster with directory indexing is insert, which still has to walk the entire unindexed directory portion to
ocfs2: Introduce dir free space list The only operation which doesn't get faster with directory indexing is insert, which still has to walk the entire unindexed directory portion to find a free block. This patch provides an improvement in directory insert performance by maintaining a singly linked list of directory leaf blocks which have space for additional dirents. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
show more ...
|
Revision tags: v2.6.29-rc3, v2.6.29-rc2, v2.6.29-rc1, v2.6.28, v2.6.28-rc9, v2.6.28-rc8, v2.6.28-rc7 |
|
#
4ed8a6bb |
| 24-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Store dir index records inline Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves us a disk read on small to medium sized direc
ocfs2: Store dir index records inline Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves us a disk read on small to medium sized directories (less than about 250 entries). The inline root is automatically turned into a root block with extents if the directory size increases beyond it's capacity. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
show more ...
|
Revision tags: v2.6.28-rc6, v2.6.28-rc5 |
|
#
9b7895ef |
| 12-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Add a name indexed b-tree to directory inodes This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of sm
ocfs2: Add a name indexed b-tree to directory inodes This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
show more ...
|
#
96a6c64b |
| 16-Dec-2008 |
Sunil Mushran <sunil.mushran@oracle.com> |
ocfs2: Move struct recovery_map to a header file Move the definition of struct recovery_map from journal.c to journal.h. This is preparation for the next patch. Signed-off-by: S
ocfs2: Move struct recovery_map to a header file Move the definition of struct recovery_map from journal.c to journal.h. This is preparation for the next patch. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
#
7f5aa215 |
| 10-Feb-2009 |
Jan Kara <jack@suse.cz> |
jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper lockin
jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: ocfs2-devel@oss.oracle.com CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com>
show more ...
|
Revision tags: v2.6.28-rc4, v2.6.28-rc3, v2.6.28-rc2, v2.6.28-rc1 |
|
#
13723d00 |
| 17-Oct-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2 commit triggers and allow us to compute metadata ec
ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extened attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions. Let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
Revision tags: v2.6.27, v2.6.27-rc9, v2.6.27-rc8, v2.6.27-rc7 |
|
#
50655ae9 |
| 11-Sep-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Add journal_access functions with jbd2 triggers. We create wrappers for ocfs2_journal_access() that are specific to the type of metadata block. This allows us to associate jbd2 c
ocfs2: Add journal_access functions with jbd2 triggers. We create wrappers for ocfs2_journal_access() that are specific to the type of metadata block. This allows us to associate jbd2 commit triggers with the block. The triggers will compute metadata ecc in a future commit. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
#
2205363d |
| 20-Oct-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Implement quota recovery Implement functions for recovery after a crash. Functions just read local quota file and sync info to global quota file. Signed-off-by: Jan Kara
ocfs2: Implement quota recovery Implement functions for recovery after a crash. Functions just read local quota file and sync info to global quota file. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
#
a90714c1 |
| 09-Oct-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits fo
ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
#
53ef99ca |
| 18-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Remove JBD compatibility layer JBD2 is fully backwards compatible with JBD and it's been tested enough with Ocfs2 that we can clean this code up now. Signed-off-by: Mark
ocfs2: Remove JBD compatibility layer JBD2 is fully backwards compatible with JBD and it's been tested enough with Ocfs2 that we can clean this code up now. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
Revision tags: v2.6.27-rc6 |
|
#
2b4e30fb |
| 03-Sep-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Switch over to JBD2. ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most fu
ocfs2: Switch over to JBD2. ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formated for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
Revision tags: v2.6.27-rc5, v2.6.27-rc4 |
|
#
cf1d6c76 |
| 18-Aug-2008 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: Add extended attribute support This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that
ocfs2: Add extended attribute support This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
#
811f933d |
| 18-Aug-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and ocfs2_reserve_new_metadata() are all useful for extent tree operations.
ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and ocfs2_reserve_new_metadata() are all useful for extent tree operations. But they are all limited to an inode btree because they use a struct ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list (the part of an ocfs2_dinode they actually use) so that the xattr btree code can use these functions. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
Revision tags: v2.6.27-rc3, v2.6.27-rc2, v2.6.27-rc1 |
|
#
539d8264 |
| 14-Jul-2008 |
Sunil Mushran <sunil.mushran@oracle.com> |
[PATCH 2/2] ocfs2: Fix race between mount and recovery As the fs recovery is asynchronous, there is a small chance that another node can mount (and thus recover) the slot before the reco
[PATCH 2/2] ocfs2: Fix race between mount and recovery As the fs recovery is asynchronous, there is a small chance that another node can mount (and thus recover) the slot before the recovery thread gets to it. If this happens, the recovery thread will block indefinitely on the journal/slot lock as that lock will be held for the duration of the mount (by design) by the node assigned to that slot. The solution implemented is to keep track of the journal replays using a recovery generation in the journal inode, which will be incremented by the thread replaying that journal. The recovery thread, before attempting the blocking lock on the journal/slot lock, will compare the generation on disk with what it has cached and skip recovery if it does not match. This bug appears to have been inadvertently introduced during the mount/umount vote removal by mainline commit 34d024f84345807bf44163fac84e921513dde323. In the mount voting scheme, the messaging would indirectly indicate that the slot was being recovered. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
Revision tags: v2.6.26, v2.6.26-rc9, v2.6.26-rc8, v2.6.26-rc7, v2.6.26-rc6, v2.6.26-rc5, v2.6.26-rc4, v2.6.26-rc3, v2.6.26-rc2, v2.6.26-rc1, v2.6.25, v2.6.25-rc9, v2.6.25-rc8, v2.6.25-rc7, v2.6.25-rc6, v2.6.25-rc5, v2.6.25-rc4, v2.6.25-rc3, v2.6.25-rc2, v2.6.25-rc1 |
|
#
553abd04 |
| 01-Feb-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Change the recovery map to an array of node numbers. The old recovery map was a bitmap of node numbers. This was sufficient for the maximum node number of 254. Going forward, we
ocfs2: Change the recovery map to an array of node numbers. The old recovery map was a bitmap of node numbers. This was sufficient for the maximum node number of 254. Going forward, we want node numbers to be UINT32. Thus, we need a new recovery map. Note that we can't keep track of slots here. We must write down the node number to recovery *before* we get the locks needed to convert a node number into a slot number. The recovery map is now an array of unsigned ints, max_slots in size. It moves to journal.c with the rest of recovery. Because it needs to be initialized, we move all of recovery initialization into a new function, ocfs2_recovery_init(). This actually cleans up ocfs2_initialize_super() a little as well. Following on, recovery cleaup becomes part of ocfs2_recovery_exit(). A number of node map functions are rendered obsolete and are removed. Finally, waiting on recovery is wrapped in a function rather than naked checks on the recovery_event. This is a cleanup from Mark. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
show more ...
|
Revision tags: v2.6.24, v2.6.24-rc8, v2.6.24-rc7, v2.6.24-rc6 |
|
#
7909f2bf |
| 18-Dec-2007 |
Tao Ma <tao.ma@oracle.com> |
[PATCH 2/2] ocfs2: Implement group add for online resize This patch adds the ability for a userspace program to request that a properly formatted cluster group be added to the main alloc
[PATCH 2/2] ocfs2: Implement group add for online resize This patch adds the ability for a userspace program to request that a properly formatted cluster group be added to the main allocation bitmap for an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD. On a high level, this is similar to ext3, but we use a different ioctl as the structure which has to be passed through is different. During an online resize, tunefs.ocfs2 will format any new cluster groups which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on each one. Kernel verifies that the core cluster group information is valid and then does the work of linking it into the global allocation bitmap. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
show more ...
|
#
d659072f |
| 18-Dec-2007 |
Tao Ma <tao.ma@oracle.com> |
[PATCH 1/2] ocfs2: Add group extend for online resize This patch adds the ability for a userspace program to request an extend of last cluster group on an Ocfs2 file system. The request
[PATCH 1/2] ocfs2: Add group extend for online resize This patch adds the ability for a userspace program to request an extend of last cluster group on an Ocfs2 file system. The request is made via ioctl, OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is obviously Ocfs2 specific. tunefs.ocfs2 would call this for an online-resize operation if the last cluster group isn't full. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
show more ...
|
Revision tags: v2.6.24-rc5, v2.6.24-rc4, v2.6.24-rc3, v2.6.24-rc2, v2.6.24-rc1, v2.6.23, v2.6.23-rc9, v2.6.23-rc8, v2.6.23-rc7, v2.6.23-rc6 |
|
#
1afc32b9 |
| 07-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Write support for inline data This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline inode data. For the most part, the changes to the core write co
ocfs2: Write support for inline data This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline inode data. For the most part, the changes to the core write code can be relied on to do the heavy lifting. Any code calling ocfs2_write_begin (including shared writeable mmap) can count on it doing the right thing with respect to growing inline data to an extent tree. Size reducing truncates, including UNRESVP can simply zero that portion of the inode block being removed. Size increasing truncatesm, including RESVP have to be a little bit smarter and grow the inode to an extent tree if necessary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
show more ...
|
Revision tags: v2.6.23-rc5, v2.6.23-rc4, v2.6.23-rc3, v2.6.23-rc2, v2.6.23-rc1, v2.6.22 |
|
#
063c4561 |
| 03-Jul-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: support for removing file regions Provide an internal interface for the removal of arbitrary file regions. ocfs2_remove_inode_range() takes a byte range within a file and wil
ocfs2: support for removing file regions Provide an internal interface for the removal of arbitrary file regions. ocfs2_remove_inode_range() takes a byte range within a file and will remove existing extents within that range. Partial clusters will be zeroed so that any read from within the region will return zeros. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
show more ...
|
Revision tags: v2.6.22-rc7, v2.6.22-rc6, v2.6.22-rc5, v2.6.22-rc4, v2.6.22-rc3, v2.6.22-rc2, v2.6.22-rc1, v2.6.21, v2.6.21-rc7, v2.6.21-rc6, v2.6.21-rc5, v2.6.21-rc4 |
|
#
e48edee2 |
| 07-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: make room for unwritten extents flag Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up
ocfs2: make room for unwritten extents flag Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes whose length references all the child nodes beneath it can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
show more ...
|
Revision tags: v2.6.21-rc3, v2.6.21-rc2, v2.6.21-rc1, v2.6.20 |
|
#
e051fda4 |
| 01-Feb-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: ocfs2_link() journal credits update Commit 592282cf2eaa33409c6511ddd3f3ecaa57daeaaa fixed some missing directory c/mtime updates in part by introducing a dinode update in ocfs2_ad
ocfs2: ocfs2_link() journal credits update Commit 592282cf2eaa33409c6511ddd3f3ecaa57daeaaa fixed some missing directory c/mtime updates in part by introducing a dinode update in ocfs2_add_entry(). Unfortunately, ocfs2_link() (which didn't update the directory inode before) is now missing a single journal credit. Fix this by doubling the number of inode updates expected during hard link creation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
show more ...
|