History log of /openbmc/linux/fs/gfs2/lops.c (Results 176 – 200 of 316)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# d14e1ca3 27-Feb-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Warn when a journal replay overwrites a rgrp with buffers

This patch adds some instrumentation in gfs2's journal replay that
indicates when we're about to overwrite a rgrp for whic

gfs2: Warn when a journal replay overwrites a rgrp with buffers

This patch adds some instrumentation in gfs2's journal replay that
indicates when we're about to overwrite a rgrp for which we already
have a valid buffer_head.

When this problem occurs, it's a situation in which this node has
been granted a rgrp glock and subsequently read in buffer_heads for
it, and possibly even made changes to the rgrp bits and/or
allocation values. But now another node has failed and forced us to
replay its journal, but its journal contains a copy of the same
rgrp, without a revoke, which means we're about to overwrite a
rgrp that we now rightfully own, with an obsolete copy. That is
always a problem. It means the other node (which failed and left
its journal to be replayed) failed to flush out its rgrp buffers,
write out the revoke, and invalidate its copy before it released
the glock to our possession.

No node should ever release a glock until its metadata has been
written to the journal and revoked and invalidated..

We also kludge around the problem and refuse to replace our good
copy with the journals bad copy by not marking the buffer dirty,
but never do it silently. That's wallpapering over a larger problem
that still exists. IOW, if this situation can happen to this node,
it can also happen to a different node and we wouldn't even know it
or be able to circumvent it: Suppose we have a 3-node cluster:
Node 1 fails, leaving an obsolete rgrp block in its journal without
a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is
released and starts making changes, allocating and freeing blocks
from the rgrp, etc. Node 3 replays the journal from node 1,
oblivious and unaware that it's about to overwrite node 2's changes.
So we still need to be vocal and log the error to make it apparent
that a corruption path still exists in gfs2.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

show more ...


# 9331b674 08-Jun-2019 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull yet more SPDX updates from Greg KH:
"Another round of SPDX header file fixes for 5.2-rc

Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull yet more SPDX updates from Greg KH:
"Another round of SPDX header file fixes for 5.2-rc4

These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.

We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0

I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through"

* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
...

show more ...


# 638803d4 06-Jun-2019 Bob Peterson <rpeterso@redhat.com>

Revert "gfs2: Replace gl_revokes with a GLF flag"

Commit 73118ca8baf7 introduced a glock reference counting bug in
gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF

Revert "gfs2: Replace gl_revokes with a GLF flag"

Commit 73118ca8baf7 introduced a glock reference counting bug in
gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is
no longer useful, so revert that commit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

show more ...


# 7336d0e6 31-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
mod

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# ef75bd71 08-May-2019 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'gfs2-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Andreas Gruenbacher:
"We've got the following patches ready for this mer

Merge tag 'gfs2-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Andreas Gruenbacher:
"We've got the following patches ready for this merge window:

- "gfs2: Fix loop in gfs2_rbm_find (v2)"

A rework of a fix we ended up reverting in 5.0 because of an
iozone performance regression.

- "gfs2: read journal in large chunks"
"gfs2: fix race between gfs2_freeze_func and unmount"

An improved version of a commit we also ended up reverting in 5.0
because of a regression in xfstest generic/311. It turns out that
the journal changes were mostly innocent and that unfreeze didn't
wait for the freeze to complete, which caused the filesystem to be
unmounted before it was actually idle.

- "gfs2: Fix occasional glock use-after-free"
"gfs2: Fix iomap write page reclaim deadlock"
"gfs2: Fix lru_count going negative"

Fixes for various problems reported and partially fixed by Citrix
engineers. Thank you very much.

- "gfs2: clean_journal improperly set sd_log_flush_head"

Another fix from Bob.

- .. and a few other minor cleanups"

* tag 'gfs2-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: read journal in large chunks
gfs2: Fix iomap write page reclaim deadlock
gfs2: fix race between gfs2_freeze_func and unmount
gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}
gfs2: Rename sd_log_le_{revoke,ordered}
gfs2: Remove unnecessary extern declarations
gfs2: Remove misleading comments in gfs2_evict_inode
gfs2: Replace gl_revokes with a GLF flag
gfs2: Fix occasional glock use-after-free
gfs2: clean_journal improperly set sd_log_flush_head
gfs2: Fix lru_count going negative
gfs2: Fix loop in gfs2_rbm_find (v2)

show more ...


# f4686c26 02-May-2019 Abhi Das <adas@redhat.com>

gfs2: read journal in large chunks

Use bios to read in the journal into the address space of the journal inode
(jd_inode), sequentially and in large chunks. This is faster for locating

gfs2: read journal in large chunks

Use bios to read in the journal into the address space of the journal inode
(jd_inode), sequentially and in large chunks. This is faster for locating the
journal head that the previous binary search approach. When performing
recovery, we keep the journal in the address space until recovery is done,
which further speeds up things.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

show more ...


# a5b1d3fc 05-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename sd_log_le_{revoke,ordered}

Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to
sd_log_ordered: not sure what le stands for here, but it doesn't add
clarit

gfs2: Rename sd_log_le_{revoke,ordered}

Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to
sd_log_ordered: not sure what le stands for here, but it doesn't add
clarity, and if it stands for list entry, it's actually confusing as
those are both list heads but not list entries.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

show more ...


# 32ac43f6 04-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove unnecessary extern declarations

Make log operations statuc; they are only used locally.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 73118ca8 04-Apr-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Replace gl_revokes with a GLF flag

The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log

gfs2: Replace gl_revokes with a GLF flag

The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log
flushes. However, gl_revokes is only ever tested for being zero, and it's
only decremented in revoke_lo_after_commit, which removes all revokes
from the list, so we know that the gl_revoke values of all the glocks on
the list will reach zero. Therefore, we can replace gl_revokes with a
bit flag. This saves an atomic counter in struct gfs2_glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

show more ...


# 9287c645 04-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix occasional glock use-after-free

This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata

gfs2: Fix occasional glock use-after-free

This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list. Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.

So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke. At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node. This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.

Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list. The
evict decremented the glock reference count to zero, which allowed the
glock to be freed. After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.

This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.

Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

show more ...


# 7c70b896 25-Mar-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: clean_journal improperly set sd_log_flush_head

This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef.
Due to that patch, function clean_journal was setting the v

gfs2: clean_journal improperly set sd_log_flush_head

This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef.
Due to that patch, function clean_journal was setting the value of
sd_log_flush_head, but that's only valid if it is replaying the node's
own journal. If it's replaying another node's journal, that's completely
wrong and will lead to multiple problems. This patch tries to clean up
the mess by passing the value of the logical journal block number into
gfs2_write_log_header so the function can treat non-owned journals
generically. For the local journal, the journal extent map is used for
best performance. For other nodes from other journals, new function
gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get.

This patch also tries to establish more consistency when passing journal
block parameters by changing several unsigned int types to a consistent
u32.

Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

show more ...


# 2b070cfe 25-Apr-2019 Christoph Hellwig <hch@lst.de>

block: remove the i argument to bio_for_each_segment_all

We only have two callers that need the integer loop iterator, and they
can easily maintain it themselves.

Suggested-by:

block: remove the i argument to bio_for_each_segment_all

We only have two callers that need the integer loop iterator, and they
can easily maintain it themselves.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 80201fe1 08-Mar-2019 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we

Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we
finally have Mings multi-page bvec support merged. Apart from that,
this pull request contains:

- Small series that avoids quiescing the queue for sysfs changes that
match what we currently have (Aleksei)

- Series of bcache fixes (via Coly)

- Series of lightnvm fixes (via Mathias)

- NVMe pull request from Christoph. Nothing major, just SPDX/license
cleanups, RR mp policy (Hannes), and little fixes (Bart,
Chaitanya).

- BFQ series (Paolo)

- Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
for the fast path (Jianchao)

- fops->iopoll() added for async IO polling, this is a feature that
the upcoming io_uring interface will use (Christoph, me)

- Partition scan loop fixes (Dongli)

- mtip32xx conversion from managed resource API (Christoph)

- cdrom registration race fix (Guenter)

- MD pull from Song, two minor fixes.

- Various documentation fixes (Marcos)

- Multi-page bvec feature. This brings a lot of nice improvements
with it, like more efficient splitting, larger IOs can be supported
without growing the bvec table size, and so on. (Ming)

- Various little fixes to core and drivers"

* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
block: fix updating bio's front segment size
block: Replace function name in string with __func__
nbd: propagate genlmsg_reply return code
floppy: remove set but not used variable 'q'
null_blk: fix checking for REQ_FUA
block: fix NULL pointer dereference in register_disk
fs: fix guard_bio_eod to check for real EOD errors
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
block: optimize bvec iteration in bvec_iter_advance
block: introduce mp_bvec_for_each_page() for iterating over page
block: optimize blk_bio_segment_split for single-page bvec
block: optimize __blk_segment_map_sg() for single-page bvec
block: introduce bvec_nth_page()
iomap: wire up the iopoll method
block: add bio_set_polled() helper
block: wire up block device iopoll method
fs: add an iopoll method to struct file_operations
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
loop: do not print warn message if partition scan is successful
block: bounce: make sure that bvec table is updated
...

show more ...


# 6dc4f100 15-Feb-2019 Ming Lei <ming.lei@redhat.com>

block: allow bio_for_each_segment_all() to iterate over multi-page bvec

This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_se

block: allow bio_for_each_segment_all() to iterate over multi-page bvec

This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 23e93c9b 13-Feb-2019 Bob Peterson <rpeterso@redhat.com>

Revert "gfs2: read journal in large chunks to locate the head"

This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241.

This patch causes xfstests generic/311 to fail. Revertin

Revert "gfs2: read journal in large chunks to locate the head"

This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241.

This patch causes xfstests generic/311 to fail. Reverting this for
now until we have a proper fix.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 2a5f14f2 09-Nov-2018 Abhi Das <adas@redhat.com>

gfs2: read journal in large chunks to locate the head

Use bio(s) to read in the journal sequentially in large chunks and
locate the head of the journal.

This version addresses t

gfs2: read journal in large chunks to locate the head

Use bio(s) to read in the journal sequentially in large chunks and
locate the head of the journal.

This version addresses the issues Christoph pointed out w.r.t error handling
and using deprecated API.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>

show more ...


# 5b846095 09-Nov-2018 Abhi Das <adas@redhat.com>

gfs2: changes to gfs2_log_XXX_bio

Change gfs2_log_XXX_bio family of functions so they can be used
with different bios, not just sdp->sd_log_bio.

This patch also contains some cl

gfs2: changes to gfs2_log_XXX_bio

Change gfs2_log_XXX_bio family of functions so they can be used
with different bios, not just sdp->sd_log_bio.

This patch also contains some clean up suggested by Andreas.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

show more ...


Revision tags: v4.18.11
# 281b4952 26-Sep-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename bitmap.bi_{len => bytes}

This field indicates the size of the bitmap in bytes, similar to how the
bi_blocks field indicates the size of the bitmap in blocks.

In cou

gfs2: Rename bitmap.bi_{len => bytes}

This field indicates the size of the bitmap in bytes, similar to how the
bi_blocks field indicates the size of the bitmap in blocks.

In count_unlinked, replace an instance of bi_bytes * GFS2_NBBY by
bi_blocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>

show more ...


Revision tags: v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1
# 9e1a9ecd 07-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Don't withdraw under a spin lock

In two places, the gfs2_io_error_bh macro is called while holding the
sd_ail_lock spin lock. This isn't allowed because gfs2_io_error_bh
withd

gfs2: Don't withdraw under a spin lock

In two places, the gfs2_io_error_bh macro is called while holding the
sd_ail_lock spin lock. This isn't allowed because gfs2_io_error_bh
withdraws the filesystem, which can sleep because it issues a uevent.
To fix that, add a gfs2_io_error_bh_wd macro that does withdraw the
filesystem and change gfs2_io_error_bh to not withdraw the filesystem.
In those places where the new gfs2_io_error_bh is used, withdraw the
filesystem after releasing sd_ail_lock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>

show more ...


Revision tags: v4.17, v4.16, v4.15
# 4519eaad 25-Jan-2018 Bob Peterson <rpeterso@redhat.com>

GFS2: Fix minor comment typo

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# c1696fb8 16-Jan-2018 Bob Peterson <rpeterso@redhat.com>

GFS2: Introduce new gfs2_log_header_v2

This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers

GFS2: Introduce new gfs2_log_header_v2

This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers
(i.e., this change is backwards compatible). Some of these are used for
debug purposes so we can backtrack when problems occur. Others are
reserved for future expansion.

This patch is based on a prototype from Steve Whitehouse.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

show more ...


# a0725ab0 07-Sep-2017 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
ch

Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:

- Fix for a registration race in loop, from Anton Volkov.

- Overflow complaint fix from Arnd for DAC960.

- Series of drbd changes from the usual suspects.

- Conversion of the stec/skd driver to blk-mq. From Bart.

- A few BFQ improvements/fixes from Paolo.

- CFQ improvement from Ritesh, allowing idling for group idle.

- A few fixes found by Dan's smatch, courtesy of Dan.

- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.

- A few nbd fixes from Josef.

- Support for cgroup info in blktrace, from Shaohua.

- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.

- Various corner cases and error handling fixes from Weiping Zhang.

- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.

- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.

- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...

show more ...


Revision tags: v4.13.16, v4.14, v4.13.5, v4.13
# 942b0cdd 16-Aug-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Withdraw for IO errors writing to the journal or statfs

Before this patch, if GFS2 encountered IO errors while writing to
the journal, it would not report the problem, so they woul

GFS2: Withdraw for IO errors writing to the journal or statfs

Before this patch, if GFS2 encountered IO errors while writing to
the journal, it would not report the problem, so they would go
unnoticed, sometimes for many hours. Sometimes this would only be
noticed later, when recovery tried to do journal replay and failed
due to invalid metadata at the blocks that resulted in IO errors.

This patch makes GFS2's log daemon check for IO errors. If it
encounters one, it withdraws from the file system and reports
why in dmesg. A similar action is taken when IO errors occur when
writing to the system statfs file.

These errors are also reported back to any callers of fsync, since
that requires the journal to be flushed. Therefore, any IO errors
that would previously go unnoticed are now noticed and the file
system is withdrawn as early as possible, thus preventing further
file system damage.

Also note that this reintroduces superblock variable sd_log_error,
which Christoph removed with commit f729b66fca.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>

show more ...


# 74d46992 23-Aug-2017 Christoph Hellwig <hch@lst.de>

block: replace bi_bdev with a gendisk pointer and partitions index

This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the

block: replace bi_bdev with a gendisk pointer and partitions index

This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 088737f4 07-Jul-2017 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of th

Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of the writeback error handling fixes
that I have for this cycle. Some of the earlier patches in this pile
may look trivial but they are prerequisites for later patches in the
series.

The aim of this set is to improve how we track and report writeback
errors to userland. Most applications that care about data integrity
will periodically call fsync/fdatasync/msync to ensure that their
writes have made it to the backing store.

For a very long time, we have tracked writeback errors using two flags
in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
writeback error occurs (via mapping_set_error) and are cleared as a
side-effect of filemap_check_errors (as you noted yesterday). This
model really sucks for userland.

Only the first task to call fsync (or msync or fdatasync) will see the
error. Any subsequent task calling fsync on a file will get back 0
(unless another writeback error occurs in the interim). If I have
several tasks writing to a file and calling fsync to ensure that their
writes got stored, then I need to have them coordinate with one
another. That's difficult enough, but in a world of containerized
setups that coordination may even not be possible.

But wait...it gets worse!

The calls to filemap_check_errors can be buried pretty far down in the
call stack, and there are internal callers of filemap_write_and_wait
and the like that also end up clearing those errors. Many of those
callers ignore the error return from that function or return it to
userland at nonsensical times (e.g. truncate() or stat()). If I get
back -EIO on a truncate, there is no reason to think that it was
because some previous writeback failed, and a subsequent fsync() will
(incorrectly) return 0.

This pile aims to do three things:

1) ensure that when a writeback error occurs that that error will be
reported to userland on a subsequent fsync/fdatasync/msync call,
regardless of what internal callers are doing

2) report writeback errors on all file descriptions that were open at
the time that the error occurred. This is a user-visible change,
but I think most applications are written to assume this behavior
anyway. Those that aren't are unlikely to be hurt by it.

3) document what filesystems should do when there is a writeback
error. Today, there is very little consistency between them, and a
lot of cargo-cult copying. We need to make it very clear what
filesystems should do in this situation.

To achieve this, the set adds a new data type (errseq_t) and then
builds new writeback error tracking infrastructure around that. Once
all of that is in place, we change the filesystems to use the new
infrastructure for reporting wb errors to userland.

Note that this is just the initial foray into cleaning up this mess.
There is a lot of work remaining here:

1) convert the rest of the filesystems in a similar fashion. Once the
initial set is in, then I think most other fs' will be fairly
simple to convert. Hopefully most of those can in via individual
filesystem trees.

2) convert internal waiters on writeback to use errseq_t for
detecting errors instead of relying on the AS_* flags. I have some
draft patches for this for ext4, but they are not quite ready for
prime time yet.

This was a discussion topic this year at LSF/MM too. If you're
interested in the gory details, LWN has some good articles about this:

https://lwn.net/Articles/718734/
https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
btrfs: minimal conversion to errseq_t writeback error reporting on fsync
xfs: minimal conversion to errseq_t writeback error reporting
ext4: use errseq_t based error handling for reporting data writeback errors
fs: convert __generic_file_fsync to use errseq_t based reporting
block: convert to errseq_t based writeback error tracking
dax: set errors in mapping when writeback fails
Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
fs: new infrastructure for writeback error handling and reporting
lib: add errseq_t type and infrastructure for handling it
mm: don't TestClearPageError in __filemap_fdatawait_range
mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
jbd2: don't clear and reset errors after waiting on writeback
buffer: set errors in mapping at the time that the error occurs
fs: check for writeback errors after syncing out buffers in generic_file_fsync
buffer: use mapping_set_error instead of setting the flag
mm: fix mapping_set_error call in me_pagecache_dirty

show more ...


12345678910>>...13