One of the most vicious and hard-to-detect causes of database performance deterioration is I/O. When a database's I/O is lagging, multiple unpredictable issues occur.
Some of the most common are:
- Increased amount of slow queries
- Write operations become inexplicably slow
- Because of the two issues above, queries start piling up and the database eventually comes to a halt
The immediate reaction of a person troubleshooting a growing list of pending queries is to check the slow query log. If the slow log contains queries (it probably will), one starts investigating which of them caused the problem.
However, when machine I/O is the problem, it is likely that none of the queries is actually problematic.
This is why I/O issues are so difficult to detect: infrastructure is the last thing that comes to mind as the root of the problem.
Detecting I/O issues using AWS metrics
When using AWS RDS, one does not have access to traditional OS tools such as sysstat, iostat, dstat or sar. The only way to understand what is happening in RDS is through CloudWatch metrics and the graphs they provide.
Read and Write IOPS metrics
The IOPS CloudWatch metrics provide great insight into how many I/O operations per second your database performs.
You can view them by opening CloudWatch, selecting RDS and then finding the ReadIOPS and WriteIOPS metrics for your database.
Once the graph shows up, select the 1-minute granularity and “Average” from the dropdown.
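The same series can also be pulled programmatically. The following is a minimal sketch, not a definitive implementation. It assumes the boto3 library is installed and AWS credentials are configured, and `iops_query` and `fetch_averages` are hypothetical helper names. It requests a 60-second period and the Average statistic, mirroring the console settings described above.

```python
from datetime import datetime, timedelta, timezone


def iops_query(db_instance_id, metric_name, end, hours=12):
    """Build the parameters for CloudWatch get_metric_statistics.

    metric_name is e.g. "ReadIOPS", "WriteIOPS" or "DiskQueueDepth".
    A single call returns at most 1,440 datapoints, so keep hours <= 24
    when using a 60-second period.
    """
    return {
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,                # 1-minute granularity
        "Statistics": ["Average"],   # same as "Average" in the console
    }


def fetch_averages(db_instance_id, metric_name, hours=12):
    """Return the per-minute averages of one metric, oldest first."""
    import boto3  # imported lazily: only needed when actually calling AWS

    cloudwatch = boto3.client("cloudwatch")
    params = iops_query(
        db_instance_id, metric_name, datetime.now(timezone.utc), hours
    )
    response = cloudwatch.get_metric_statistics(**params)
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Average"] for p in points]
```

For example, `fetch_averages("mydb", "ReadIOPS")` would return the last 12 hours of read IOPS for an instance named `mydb` (a placeholder identifier).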
DiskQueueDepth metric
The DiskQueueDepth metric provides the number of outstanding I/Os (read/write requests) waiting to access the disk. If this metric is frequently above 2, you should expect to face performance issues sooner or later.
Using this metric, you can immediately see how many requests are queued at your disk.
Do I have an I/O problem?
Using the graphs above, it is easy to tell whether you are under-provisioned or over-provisioned in IOPS.
- If your DiskQueueDepth is consistently between 0 and 0.5, you are over-provisioned
- If your DiskQueueDepth is consistently above 2, you are under-provisioned
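The two rules above can be sketched as a small check over a list of DiskQueueDepth samples. The thresholds (0.5 and 2) come from the rules above. The `share` parameter is an assumption introduced here to give "consistently" a concrete meaning: the fraction of samples that must fall in a band. The function name is hypothetical.

```python
def classify_queue_depth(samples, share=0.9):
    """Classify DiskQueueDepth samples as over- or under-provisioned.

    share is an assumed definition of "consistently": at least this
    fraction of the samples must fall in the band.
    """
    if not samples:
        raise ValueError("need at least one DiskQueueDepth sample")
    n = len(samples)
    low = sum(1 for s in samples if s <= 0.5) / n   # between 0 and 0.5
    high = sum(1 for s in samples if s > 2) / n     # above 2
    if high >= share:
        return "under-provisioned"
    if low >= share:
        return "over-provisioned"
    return "inconclusive"
```

Anything that falls between the two bands is reported as inconclusive rather than forced into a verdict, since the text only defines the two extremes.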
How many IOPS do I need and how do I acquire them?
To see how many IOPS are needed for steady performance, sum the ReadIOPS and WriteIOPS values. Choose a decent time interval, or a day that is typical from a performance point of view, and remove outliers. Compare the result with the IOPS you have provisioned.
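As a rough sketch of that calculation, the function below sums paired ReadIOPS and WriteIOPS samples and averages them after dropping the highest few percent. The outlier rule (discarding the top `trim_percent` of samples) is an assumption made for illustration. The text does not prescribe a method, and the function name is hypothetical.

```python
def required_iops(read_iops, write_iops, trim_percent=5):
    """Estimate steady-state IOPS from paired per-minute samples.

    Assumption: the highest trim_percent of the summed samples are
    treated as outliers and dropped before averaging.
    """
    if len(read_iops) != len(write_iops):
        raise ValueError("read and write series must have the same length")
    if not read_iops:
        raise ValueError("need at least one sample")
    totals = sorted(r + w for r, w in zip(read_iops, write_iops))
    drop = len(totals) * trim_percent // 100  # number of top samples to drop
    kept = totals[: len(totals) - drop]
    return sum(kept) / len(kept)
```

With 19 samples of 400 reads and 500 writes plus one spike, the spike is discarded and the estimate stays at 900 IOPS.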
Once you have calculated how many IOPS you need, there are two ways to acquire them.
The first is to purchase Provisioned IOPS (PIOPS), which is more reliable but a lot more costly. The second is to use a gp2 disk for your RDS instance, which provides 3 IOPS per GB of storage.
Let's take an example.
Assuming that on a typical day we have an average of 400 ReadIOPS and 500 WriteIOPS, our disk is consuming 900 IOPS. It therefore makes sense to acquire approximately that amount of IOPS.
Using the two approaches above, we have the following options:
- Purchase 900 PIOPS
- Use a 300 GB SSD (gp2) disk that comes with 3 IOPS / GB (therefore 300 GB * 3 IOPS / GB = 900 IOPS).
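The arithmetic behind the two options can be written down as a short sketch. It uses only the 3 IOPS per GB ratio stated above, and the function names are hypothetical.

```python
import math

GP2_IOPS_PER_GB = 3  # ratio stated in the text


def gp2_storage_for(iops):
    """Smallest gp2 volume size (in GB) whose baseline covers iops."""
    return math.ceil(iops / GP2_IOPS_PER_GB)


def provisioning_options(read_iops, write_iops):
    """Return both ways of acquiring the summed IOPS."""
    total = read_iops + write_iops
    return {"piops": total, "gp2_storage_gb": gp2_storage_for(total)}
```

Calling `provisioning_options(400, 500)` gives 900 PIOPS or a 300 GB gp2 disk, matching the example.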
PIOPS is considered more reliable; however, it is more costly.
I hope this guide provided a good understanding of how IOPS work in RDS.
For more information, see http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html