History log of /openbmc/linux/fs/btrfs/space-info.c (Results 51 – 75 of 156)
Revision Date Author Comments
# 9cd8dcdc 09-Nov-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: check for priority ticket granting before flushing

Since we're dropping locks before we enter the priority flushing loops
we could have had our ticket granted before we got the space_info->lo

btrfs: check for priority ticket granting before flushing

Since we're dropping locks before we enter the priority flushing loops
we could have had our ticket granted before we got the space_info->lock.
So add this check to avoid doing some extra flushing in the priority
flushing cases.

The case in priority_reclaim_metadata_space is an optimization. Think
we came in to reserve, we didn't have the space, we added our ticket to
the list. But at the same time somebody was waiting on the space_info
lock to add space and do btrfs_try_granting_ticket(), so we drop the
lock, get satisfied, come in to do our loop, and we have been
satisfied.

This is the priority reclaim path, so to_reclaim could be !0 still
because we may have only satisfied the priority tickets and still left
non priority tickets on the list. We would then have to_reclaim but
->bytes == 0.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add note about the optimization ]
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 9f35f76d 09-Nov-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: handle priority ticket failures in their respective helpers

Currently the error case for the priority tickets is handled where we
deal with all of the tickets, priority and non-priority. Thi

btrfs: handle priority ticket failures in their respective helpers

Currently the error case for the priority tickets is handled where we
deal with all of the tickets, priority and non-priority. This is OK in
general, but it makes for some awkward locking. We take and drop the
space_info->lock back to back because of these different types of
tickets.

Rework the code to handle priority ticket failures in their respective
helpers. This allows us to be less wonky with our space_info->lock
usage, and means that the main handler simply has to check
ticket->error, as the ticket is guaranteed to be off any list and
completely handled by the time it exits one of the handlers.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 0e24f6d8 05-Oct-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: do not infinite loop in data reclaim if we aborted

Error injection stressing uncovered a busy loop in our data reclaim
loop. There are two cases here, one where we loop creating block groups

btrfs: do not infinite loop in data reclaim if we aborted

Error injection stressing uncovered a busy loop in our data reclaim
loop. There are two cases here, one where we loop creating block groups
until space_info->full is set, or in the main loop we will skip erroring
out any tickets if space_info->full == 0. Unfortunately if we aborted
the transaction then we will never allocate chunks or reclaim any space
and thus never get ->full, and you'll see stack traces like this:

watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/u4:4:139]
CPU: 0 PID: 139 Comm: kworker/u4:4 Tainted: G W 5.13.0-rc1+ #328
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: events_unbound btrfs_async_reclaim_data_space
RIP: 0010:btrfs_join_transaction+0x12/0x20
RSP: 0018:ffffb2b780b77de0 EFLAGS: 00000246
RAX: ffffb2b781863d58 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000801 RSI: ffff987952b57400 RDI: ffff987940aa3000
RBP: ffff987954d55000 R08: 0000000000000001 R09: ffff98795539e8f0
R10: 000000000000000f R11: 000000000000000f R12: ffffffffffffffff
R13: ffff987952b574c8 R14: ffff987952b57400 R15: 0000000000000008
FS: 0000000000000000(0000) GS:ffff9879bbc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0703da4000 CR3: 0000000113398004 CR4: 0000000000370ef0
Call Trace:
flush_space+0x4a8/0x660
btrfs_async_reclaim_data_space+0x55/0x130
process_one_work+0x1e9/0x380
worker_thread+0x53/0x3e0
? process_one_work+0x380/0x380
kthread+0x118/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x1f/0x30

Fix this by checking to see if we have a btrfs fs error in either of the
reclaim loops, and if so fail the tickets and bail. In addition to
this, fix maybe_fail_all_tickets() to not try to grant tickets if we've
aborted, simply fail everything.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 0619b790 16-Sep-2021 Qu Wenruo <wqu@suse.com>

btrfs: prevent __btrfs_dump_space_info() to underflow its free space

It's not uncommon where __btrfs_dump_space_info() gets called
under over-commit situations.

In that case free space would underf

btrfs: prevent __btrfs_dump_space_info() to underflow its free space

It's not uncommon where __btrfs_dump_space_info() gets called
under over-commit situations.

In that case free space would underflow as total allocated space is not
enough to handle all the over-committed space.

Such underflow values can sometimes cause confusion for users enabled
enospc_debug mount option, and takes some seconds for developers to
convert the underflow value to signed result.

Just output the free space as s64 to avoid such problem.

Reported-by: Eli V <eliventer@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAJtFHUSy4zgyhf-4d9T+KdJp9w=UgzC2A0V=VtmaeEpcGgm1-Q@mail.gmail.com/
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 11462397 11-Aug-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: do not do preemptive flushing if the majority is global rsv

A common characteristic of the bug report where preemptive flushing was
going full tilt was the fact that the vast majority of the

btrfs: do not do preemptive flushing if the majority is global rsv

A common characteristic of the bug report where preemptive flushing was
going full tilt was the fact that the vast majority of the free metadata
space was used up by the global reserve. The hard 90% threshold would
cover the majority of these cases, but to be even smarter we should take
into account how much of the outstanding reservations are covered by the
global block reserve. If the global block reserve accounts for the vast
majority of outstanding reservations, skip preemptive flushing, as it
will likely just cause churn and pain.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212185
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 93c60b17 11-Aug-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: reduce the preemptive flushing threshold to 90%

The preemptive flushing code was added in order to avoid needing to
synchronously wait for ENOSPC flushing to recover space. Once we're
almost

btrfs: reduce the preemptive flushing threshold to 90%

The preemptive flushing code was added in order to avoid needing to
synchronously wait for ENOSPC flushing to recover space. Once we're
almost full however we can essentially flush constantly. We were using
98% as a threshold to determine if we were simply full, however in
practice this is a really high bar to hit. For example reports of
systems running into this problem had around 94% usage and thus
continued to flush. Fix this by lowering the threshold to 90%, which is
a more sane value, especially for smaller file systems.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212185
CC: stable@vger.kernel.org # 5.12+
Fixes: 576fa34830af ("btrfs: improve preemptive background space flushing")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# e1646070 14-Jul-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: wait on async extents when flushing delalloc

I've been debugging an early ENOSPC problem in production and finally
root caused it to this problem. When we switched to the per-inode in
38d715

btrfs: wait on async extents when flushing delalloc

I've been debugging an early ENOSPC problem in production and finally
root caused it to this problem. When we switched to the per-inode in
38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in
shrink_delalloc") I pulled out the async extent handling, because we
were doing the correct thing by calling filemap_flush() if we had async
extents set. This would properly wait on any async extents by locking
the page in the second flush, thus making sure our ordered extents were
properly set up.

However when I switched us back to page based flushing, I used
sync_inode(), which allows us to pass in our own wbc. The problem here
is that sync_inode() is smarter than the filemap_* helpers, it tries to
avoid calling writepages at all. This means that our second call could
skip calling do_writepages altogether, and thus not wait on the pagelock
for the async helpers. This means we could come back before any ordered
extents were created and then simply continue on in our flushing
mechanisms and ENOSPC out when we have plenty of space to use.

Fix this by putting back the async pages logic in shrink_delalloc. This
allows us to bulk write out everything that we need to, and then we can
wait in one place for the async helpers to catch up, and then wait on
any ordered extents that are created.

Fixes: e076ab2a2ca7 ("btrfs: shrink delalloc pages instead of full inodes")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 03fe78cc 14-Jul-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: use delalloc_bytes to determine flush amount for shrink_delalloc

We have been hitting some early ENOSPC issues in production with more
recent kernels, and I tracked it down to us simply not f

btrfs: use delalloc_bytes to determine flush amount for shrink_delalloc

We have been hitting some early ENOSPC issues in production with more
recent kernels, and I tracked it down to us simply not flushing delalloc
as aggressively as we should be. With tracing I was seeing us failing
all tickets with all of the block rsvs at or around 0, with very little
pinned space, but still around 120MiB of outstanding bytes_may_used.
Upon further investigation I saw that we were flushing around 14 pages
per shrink call for delalloc, despite having around 2GiB of delalloc
outstanding.

Consider the example of a 8 way machine, all CPUs trying to create a
file in parallel, which at the time of this commit requires 5 items to
do. Assuming a 16k leaf size, we have 10MiB of total metadata reclaim
size waiting on reservations. Now assume we have 128MiB of delalloc
outstanding. With our current math we would set items to 20, and then
set to_reclaim to 20 * 256k, or 5MiB.

Assuming that we went through this loop all 3 times, for both
FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and then did the full loop
twice, we'd only flush 60MiB of the 128MiB delalloc space. This could
leave a fair bit of delalloc reservations still hanging around by the
time we go to ENOSPC out all the remaining tickets.

Fix this two ways. First, change the calculations to be a fraction of
the total delalloc bytes on the system. Prior to this change we were
calculating based on dirty inodes so our math made more sense, now it's
just completely unrelated to what we're actually doing.

Second add a FLUSH_DELALLOC_FULL state, that we hold off until we've
gone through the flush states at least once. This will empty the system
of all delalloc so we're sure to be truly out of space when we start
failing tickets.

I'm tagging stable 5.10 and forward, because this is where we started
using the page stuff heavily again. This affects earlier kernel
versions as well, but would be a pain to backport to them as the
flushing mechanisms aren't the same.

CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# fcdef39c 14-Jul-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: enable a tracepoint when we fail tickets

When debugging early enospc problems it was useful to have a tracepoint
where we failed all tickets so I could check the state of the enospc
counters

btrfs: enable a tracepoint when we fail tickets

When debugging early enospc problems it was useful to have a tracepoint
where we failed all tickets so I could check the state of the enospc
counters at failure time to validate my fixes. This adds the tracpoint
so you can easily get that information.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 138a12d8 22-Jun-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: rip out btrfs_space_info::total_bytes_pinned

We used this in may_commit_transaction() in order to determine if we
needed to commit the transaction. However we no longer have that logic
and t

btrfs: rip out btrfs_space_info::total_bytes_pinned

We used this in may_commit_transaction() in order to determine if we
needed to commit the transaction. However we no longer have that logic
and thus have no use of this counter anymore, so delete it.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 3ffad696 22-Jun-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: rip the first_ticket_bytes logic from fail_all_tickets

This was a trick implemented to handle the case where we had a giant
reservation in front of a bunch of little reservations in the ticke

btrfs: rip the first_ticket_bytes logic from fail_all_tickets

This was a trick implemented to handle the case where we had a giant
reservation in front of a bunch of little reservations in the ticket
queue. If the giant reservation was too large for the transaction
commit to make a difference we'd ENOSPC everybody out instead of
committing the transaction. This logic was put in to force us to go
back and re-try the transaction commit logic to see if we could make
progress.

Instead now we know we've committed the transaction, so any space that
would have been recovered is now available, and would be caught by the
btrfs_try_granting_tickets() in this loop, so we no longer need this
code and can simply delete it.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 04808553 22-Jun-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: remove FLUSH_DELAYED_REFS from data ENOSPC flushing

Since we unconditionally commit the transaction now we no longer need to
run the delayed refs to make sure our total_bytes_pinned value is

btrfs: remove FLUSH_DELAYED_REFS from data ENOSPC flushing

Since we unconditionally commit the transaction now we no longer need to
run the delayed refs to make sure our total_bytes_pinned value is
uptodate, we can simply commit the transaction. Remove this stage from
the data flushing list.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# c416a30c 22-Jun-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: rip out may_commit_transaction

may_commit_transaction was introduced before the ticketing
infrastructure existed. There was a problem where we'd legitimately be
out of space, but every reser

btrfs: rip out may_commit_transaction

may_commit_transaction was introduced before the ticketing
infrastructure existed. There was a problem where we'd legitimately be
out of space, but every reservation would trigger a transaction commit
and then fail. Thus if you had 1000 things trying to make a
reservation, they'd all do the flushing loop and thus commit the
transaction 1000 times before they'd get their ENOSPC.

This helper was introduced to short circuit this, if there wasn't space
that could be reclaimed by committing the transaction then simply ENOSPC
out. This made true ENOSPC tests much faster as we didn't waste a bunch
of time.

However many of our bugs over the years have been from cases where we
didn't account for some space that would be reclaimed by committing a
transaction. The delayed refs rsv space, delayed rsv, many pinned bytes
miscalculations, etc. And in the meantime the original problem has been
solved with ticketing. We no longer will commit the transaction 1000
times. Instead we'll get 1000 waiters, we will go through the flushing
mechanisms, and if there's no progress after 2 loops we ENOSPC everybody
out. The ticketing infrastructure gives us a deterministic way to see
if we're making progress or not, thus we avoid a lot of extra work.

So simplify this step by simply unconditionally committing the
transaction. This removes what is arguably our most common source of
early ENOSPC bugs and will allow us to drastically simplify many of the
things we track because we simply won't need them with this stuff gone.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 1a9fd417 21-May-2021 David Sterba <dsterba@suse.com>

btrfs: fix typos in comments

Fix typos that have snuck in since the last round. Found by codespell.

Signed-off-by: David Sterba <dsterba@suse.com>


# 385f421f 28-Apr-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: handle preemptive delalloc flushing slightly differently

If we decide to flush delalloc from the preemptive flusher, we really do
not want to wait on ordered extents, as it gains us nothing.

btrfs: handle preemptive delalloc flushing slightly differently

If we decide to flush delalloc from the preemptive flusher, we really do
not want to wait on ordered extents, as it gains us nothing. However
there was logic to go ahead and wait on ordered extents if there was
more ordered bytes than delalloc bytes. We do not want this behavior,
so pass through whether this flushing is for preemption, and do not wait
for ordered extents if that's the case. Also break out of the shrink
loop after the first flushing, as we just want to one shot shrink
delalloc.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 3e101569 28-Apr-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: only ignore delalloc if delalloc is much smaller than ordered

While testing heavy delalloc workloads I noticed that sometimes we'd
just stop preemptively flushing when we had loads of delallo

btrfs: only ignore delalloc if delalloc is much smaller than ordered

While testing heavy delalloc workloads I noticed that sometimes we'd
just stop preemptively flushing when we had loads of delalloc available
to flush. This is because we skip preemptive flushing if delalloc <=
ordered. However if we start with say 4gib of delalloc, and we flush
2gib of that, we'll stop flushing there, when we still have 2gib of
delalloc to flush.

Instead adjust the ordered bytes down by half, this way if 2/3 of our
outstanding delalloc reservations are tied up by ordered extents we
don't bother preemptive flushing, as we're getting close to the state
where we need to wait on ordered extents.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 30acce4e 28-Apr-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: don't include the global rsv size in the preemptive used amount

When deciding if we should preemptively flush space, we will add in the
amount of space used by all block rsvs. However this a

btrfs: don't include the global rsv size in the preemptive used amount

When deciding if we should preemptively flush space, we will add in the
amount of space used by all block rsvs. However this also includes the
global block rsv, which isn't flushable so shouldn't be accounted for in
this calculation. If we decide to use ->bytes_may_use in our used
calculation we need to subtract the global rsv size from this amount so
it most closely matches the flushable space.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 1239e2da 28-Apr-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: use the global rsv size in the preemptive thresh calculation

We calculate the amount of "free" space available for normal
reservations by taking the total space and subtracting out the hard u

btrfs: use the global rsv size in the preemptive thresh calculation

We calculate the amount of "free" space available for normal
reservations by taking the total space and subtracting out the hard used
space, which is readonly, used, and reserved space.

However we weren't taking into account the global block rsv, which is
essentially hard used space. Handle this by subtracting it from the
available free space, so that our threshold more closely mirrors
reality.

We need to do the check because it's possible that the global_rsv_size +
used is > total_bytes, sometimes the global reserve can end up being
calculated as larger than the available size (think small filesystems
where we only have the original 8MiB chunk of metadata). It doesn't
usually happen, but that can get us into trouble so this is safer.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 610a6ef4 28-Apr-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: take into account global rsv in need_preemptive_reclaim

Global rsv can't be used for normal allocations, and for very full file
systems we can decide to try and async flush constantly even th

btrfs: take into account global rsv in need_preemptive_reclaim

Global rsv can't be used for normal allocations, and for very full file
systems we can decide to try and async flush constantly even though
there's really not a lot of space to reclaim. Deal with this by
including the global block rsv size in the "total used" calculation.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 0aae4ca9 28-Apr-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: only clamp the first time we have to start flushing

We were clamping the threshold for preemptive reclaim any time we added
a ticket to wait on, which if we have a lot of threads means we'd
e

btrfs: only clamp the first time we have to start flushing

We were clamping the threshold for preemptive reclaim any time we added
a ticket to wait on, which if we have a lot of threads means we'd
essentially max out the clamp the first time we start to flush.

Instead of doing this, simply do it every time we have to start
flushing, this will make us ramp up gradually instead of going to max
clamping as soon as we start needing to do flushing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# ed738ba7 28-Apr-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: check worker before need_preemptive_reclaim

need_preemptive_reclaim() does some calculations, which aren't heavy,
but if we're already running preemptive reclaim there's no reason to do
them

btrfs: check worker before need_preemptive_reclaim

need_preemptive_reclaim() does some calculations, which aren't heavy,
but if we're already running preemptive reclaim there's no reason to do
them at all, so re-order the checks so that we don't do the calculation
if we're already doing reclaim.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 2cdb3909 24-Mar-2021 Josef Bacik <josef@toxicpanda.com>

btrfs: use percpu_read_positive instead of sum_positive for need_preempt

Looking at perf data for a fio workload I noticed that we were spending
a pretty large chunk of time (around 5%) doing percpu

btrfs: use percpu_read_positive instead of sum_positive for need_preempt

Looking at perf data for a fio workload I noticed that we were spending
a pretty large chunk of time (around 5%) doing percpu_counter_sum() in
need_preemptive_reclaim. This is silly, as we only want to know if we
have more ordered than delalloc to see if we should be counting the
delayed items in our threshold calculation. Change this to
percpu_read_positive() to avoid the overhead.

I ran this through fsperf to validate the changes, obviously the latency
numbers in dbench and fio are quite jittery, so take them as you wish,
but overall the improvements on throughput, iops, and bw are all
positive. Each test was run two times, the given value is the average
of both runs for their respective column.

btrfs ssd normal test results

bufferedrandwrite16g results
metric baseline current diff
==========================================================
write_io_kbytes 16777216 16777216 0.00%
read_clat_ns_p99 0 0 0.00%
write_bw_bytes 1.04e+08 1.05e+08 1.12%
read_iops 0 0 0.00%
write_clat_ns_p50 13888 11840 -14.75%
read_io_kbytes 0 0 0.00%
read_io_bytes 0 0 0.00%
write_clat_ns_p99 35008 29312 -16.27%
read_bw_bytes 0 0 0.00%
elapsed 170 167 -1.76%
write_lat_ns_min 4221.50 3762.50 -10.87%
sys_cpu 39.65 35.37 -10.79%
write_lat_ns_max 2.67e+10 2.50e+10 -6.63%
read_lat_ns_min 0 0 0.00%
write_iops 25270.10 25553.43 1.12%
read_lat_ns_max 0 0 0.00%
read_clat_ns_p50 0 0 0.00%

dbench60 results
metric baseline current diff
==================================================
qpathinfo 11.12 12.73 14.52%
throughput 416.09 445.66 7.11%
flush 3485.63 1887.55 -45.85%
qfileinfo 0.70 1.92 173.86%
ntcreatex 992.60 695.76 -29.91%
qfsinfo 2.43 3.71 52.48%
close 1.67 3.14 88.09%
sfileinfo 66.54 105.20 58.10%
rename 809.23 619.59 -23.43%
find 16.88 15.46 -8.41%
unlink 820.54 670.86 -18.24%
writex 3375.20 2637.91 -21.84%
deltree 386.33 449.98 16.48%
readx 3.43 3.41 -0.60%
mkdir 0.05 0.03 -38.46%
lockx 0.26 0.26 -0.76%
unlockx 0.81 0.32 -60.33%

dio4kbs16threads results
metric baseline current diff
================================================================
write_io_kbytes 5249676 3357150 -36.05%
read_clat_ns_p99 0 0 0.00%
write_bw_bytes 89583501.50 57291192.50 -36.05%
read_iops 0 0 0.00%
write_clat_ns_p50 242688 263680 8.65%
read_io_kbytes 0 0 0.00%
read_io_bytes 0 0 0.00%
write_clat_ns_p99 15826944 36732928 132.09%
read_bw_bytes 0 0 0.00%
elapsed 61 61 0.00%
write_lat_ns_min 42704 42095 -1.43%
sys_cpu 5.27 3.45 -34.52%
write_lat_ns_max 7.43e+08 9.27e+08 24.71%
read_lat_ns_min 0 0 0.00%
write_iops 21870.97 13987.11 -36.05%
read_lat_ns_max 0 0 0.00%
read_clat_ns_p50 0 0 0.00%

randwrite2xram results
metric baseline current diff
================================================================
write_io_kbytes 24831972 28876262 16.29%
read_clat_ns_p99 0 0 0.00%
write_bw_bytes 83745273.50 92182192.50 10.07%
read_iops 0 0 0.00%
write_clat_ns_p50 13952 11648 -16.51%
read_io_kbytes 0 0 0.00%
read_io_bytes 0 0 0.00%
write_clat_ns_p99 50176 52992 5.61%
read_bw_bytes 0 0 0.00%
elapsed 314 332 5.73%
write_lat_ns_min 5920.50 5127 -13.40%
sys_cpu 7.82 7.35 -6.07%
write_lat_ns_max 5.27e+10 3.88e+10 -26.44%
read_lat_ns_min 0 0 0.00%
write_iops 20445.62 22505.42 10.07%
read_lat_ns_max 0 0 0.00%
read_clat_ns_p50 0 0 0.00%

untarfirefox results
metric baseline current diff
==============================================
elapsed 47.41 47.40 -0.03%

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 169e0da9 04-Feb-2021 Naohiro Aota <naohiro.aota@wdc.com>

btrfs: zoned: track unusable bytes for zones

In a zoned filesystem a once written then freed region is not usable
until the underlying zone has been reset. So we need to distinguish such
unusable sp

btrfs: zoned: track unusable bytes for zones

In a zoned filesystem a once written then freed region is not usable
until the underlying zone has been reset. So we need to distinguish such
unusable space from usable free space.

Therefore we need to introduce the "zone_unusable" field to the block
group structure, and "bytes_zone_unusable" to the space_info structure
to track the unusable space.

Pinned bytes are always reclaimed to the unusable space. But, when an
allocated region is returned before using e.g., the block group becomes
read-only between allocation time and reservation time, we can safely
return the region to the block group. For the situation, this commit
introduces "btrfs_add_free_space_unused". This behaves the same as
btrfs_add_free_space() on regular filesystem. On zoned filesystems, it
rewinds the allocation offset.

Because the read-only bytes tracks free but unusable bytes when the block
group is read-only, we need to migrate the zone_unusable bytes to
read-only bytes when a block group is marked read-only.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# e5ad49e2 09-Oct-2020 Josef Bacik <josef@toxicpanda.com>

btrfs: add a trace class for dumping the current ENOSPC state

Often when I'm debugging ENOSPC related issues I have to resort to
printing the entire ENOSPC state with trace_printk() in different spo

btrfs: add a trace class for dumping the current ENOSPC state

Often when I'm debugging ENOSPC related issues I have to resort to
printing the entire ENOSPC state with trace_printk() in different spots.
This gets pretty annoying, so add a trace state that does this for us.
Then add a trace point at the end of preemptive flushing so you can see
the state of the space_info when we decide to exit preemptive flushing.
This helped me figure out we weren't kicking in the preemptive flushing
soon enough.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 4b02b00f 09-Oct-2020 Josef Bacik <josef@toxicpanda.com>

btrfs: adjust the flush trace point to include the source

Since we have normal ticketed flushing and preemptive flushing, adjust
the tracepoint so that we know the source of the flushing action to m

btrfs: adjust the flush trace point to include the source

Since we have normal ticketed flushing and preemptive flushing, adjust
the tracepoint so that we know the source of the flushing action to make
it easier to debug problems.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


1234567