How to tell if linux disk IO is causing excessive (> 1 second) application stalls
Well one easy test would be to mount that ext3 fs as ext2 and then profile the application's performance.
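A minimal sketch of that A/B test, assuming a hypothetical device `/dev/sdb1` mounted at `/mnt/data` (substitute your own); ext3 is backward-compatible with ext2, so the same filesystem can be mounted without its journal and no reformat is needed:

```shell
# Requires root. Mount the ext3 filesystem as ext2 (journal unused):
umount /mnt/data
mount -t ext2 /dev/sdb1 /mnt/data

# ... run your application benchmark here and record latencies ...

# Restore journaled operation afterwards:
umount /mnt/data
mount -t ext3 /dev/sdb1 /mnt/data
```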
The answer is "Yes" (journaling ALWAYS adds latency :-)
The question of how significant it is can really only be answered by a direct test, but as a rough rule of thumb, assume each journaled operation takes about twice as long as it would with journaling disabled.
Since you mentioned in your comments on another answer that you can't run a direct test in your production environment (and presumably don't have a dev/test environment you can use), you do have one other option: look at your disk statistics and see how much time you spend writing to the journal device.
Unfortunately this only really helps if your journal device is discrete and can be instrumented separately from the "main" disk.
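If the journal does live on its own device, its per-write latency can be derived from `/proc/diskstats`: field 8 is writes completed and field 11 is milliseconds spent writing. A sketch using two canned snapshot lines (illustrative numbers; on a live system you would take two timed reads of `/proc/diskstats` for the journal device instead):

```shell
# Two sample /proc/diskstats lines for a hypothetical device "sda",
# captured some interval apart. Fields: ... $8=writes, $11=write ms.
snap1='8 0 sda 100 0 800 50 1000 0 8000 2000 0 300 2050'
snap2='8 0 sda 100 0 800 50 1400 0 11200 3200 0 420 3250'

# Average write latency = delta(write ms) / delta(writes completed).
avg_write_ms=$(printf '%s\n%s\n' "$snap1" "$snap2" | awk '
  NR == 1 { w0 = $8; ms0 = $11 }
  NR == 2 { dw = $8 - w0; dms = $11 - ms0;
            if (dw > 0) printf "%.1f", dms / dw }')

echo "avg write latency: ${avg_write_ms} ms"
```

With the sample numbers above, 400 writes took 1200 ms of device time, i.e. 3.0 ms per write; sustained values well above your stall threshold point at the journal device.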
Second time I'm plugging a McKusick video today, but if you wade through this video there's a great discussion of some of the work a journaling filesystem has to do (and the performance impact involved).
Not directly useful/relevant to you and your particular question, but a great general background on filesystems and journaling.
Yes, journaling causes latency. But it's a small piece of the equation. I'd consider it the 5th or 6th item to look at... However, this is another in a trend of systems storage questions that do not include enough relevant information.
- What type of server hardware are you using? (make and model)
- Please describe the storage setup (RAID controller, cache configuration, number and arrangement of disks)
- What operating system are you using? Distribution and kernel versions would be helpful.
Why do I ask for this information?
Your hardware setup and RAID level can have a HUGE impact on your observed performance. Read and write caching on hardware RAID controllers can and should be tuned to accommodate your workload and I/O patterns. The operating system matters because it impacts the tool recommendations and tuning techniques that would be helpful to you. Different distributions and kernels have different default settings, thus performance characteristics vary between them.
So in this case, there are a number of possibilities:
- Your RAID array may not be able to keep up with the workload (not enough spindles).
- Or you could benefit from write caching.
- You may have fragmentation issues (how full is the filesystem?).
- You could have an ill-fitting RAID level that's counter to the requisite performance characteristics.
- Your RAID controller may need tuning.
- You may need to change your system's I/O scheduler and run some block-device tuning.
- You could consider a more performance-optimized filesystem like XFS.
- You could drop the journal and mount your filesystems as ext2. Since ext3 is backward-compatible with ext2, this needs no reformat.
- You might have cheap SATA disks that may be experiencing bus timeouts.
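For the scheduler and journal items above, a hedged sketch of the usual commands, assuming a hypothetical device `sda` (substitute your own; both steps require root):

```shell
# Show the available I/O schedulers; the bracketed one is active:
cat /sys/block/sda/queue/scheduler

# Switch the scheduler at runtime, e.g. to deadline:
echo deadline > /sys/block/sda/queue/scheduler

# Remove the ext3 journal entirely (filesystem must be unmounted first),
# turning it back into plain ext2:
tune2fs -O ^has_journal /dev/sda1
```

The scheduler change takes effect immediately but does not survive a reboot; persist it via your boot parameters or distribution's tuning mechanism if it helps.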
But as-is, we don't have enough information to go on.