Am I reading the iostat command output correctly?

The iostat command ships with the sysstat package:

# rpm -qf `which iostat`
sysstat-11.7.3-6.el8.x86_64

It mainly reads its data from /proc/diskstats:

# cat /proc/diskstats
259       0 nvme1n1 147 0 6536 2888 0 0 0 0 0 798 2888 0 0 0 0
259       1 nvme0n1 33608 16 1588184 382879 2001 376 142595 20368 0 195069 403248 0 0 0 0
259       2 nvme0n1p1 41 0 328 402 0 0 0 0 0 331 402 0 0 0 0
259       3 nvme0n1p2 33524 16 1585472 382298 2001 376 142595 20368 0 195013 402667 0 0 0 0

Now let's look at each column bit by bit.

Even labeled, this output is not very useful on its own, because these are just cumulative counters since boot; they're not telling us rates, request sizes, or latencies.

To solve that, we need to use the -tkx options along with -p with iostat.

Also check which I/O scheduler the system is using. To know more about schedulers, check the following doc: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/managing_storage_devices/index#available-disk-schedulers_setting-the-disk-scheduler

# cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq

If you read the iostat man page, it uses the term "device" a lot. But what iostat reports for a device actually breaks down into the following sub-parts.

rareq-sz/wareq-sz: The average size of the read/write requests issued to the device, always in sectors (512-byte sectors). If you want to calculate the average read I/O size yourself, use (rkB/s * 2) / (r/s), and the same for writes: (wkB/s * 2) / (w/s). If you want the combined average over both reads and writes, it can be calculated as ((rkB/s + wkB/s) * 2) / (r/s + w/s).

aqu-sz: There are two levels of queues being maintained: one at the scheduler level and one at the device level. The driver doesn't maintain a queue of its own, but it keeps track of all the I/O requests passing through it. So aqu-sz is the average number of requests in the I/O scheduler queue (bounded by nr_requests) plus the number of requests outstanding on the storage (the LUN queue). The scheduler queue depth can be read from sysfs:

# cat /sys/block/nvme0n1/queue/nr_requests

r_await/w_await: The average time, in milliseconds, for a read/write request to be completed by the storage. This includes the time spent in the scheduler queue and the time the storage spent servicing the request.

svctm: The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.

%util: Percentage of elapsed time during which I/O requests were issued to the device. Device saturation occurs when this value gets close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs (i.e., anything that is not a single-disk device), this number does not reflect their performance limits.

So the bottom line: as long as the aqu-sz value remains below the LUN queue depth, I/O is passed quickly from the scheduler to the driver, which passes it on to the HBA. The moment it exceeds that depth, you will start seeing I/O latency issues, and there isn't much you can do, because your storage is not capable of handling that much traffic (so either move to faster storage or ask your developers to change the I/O pattern of the application).
