Hard lesson from Dell’s PERC H700 Battery Write Cache

We ran into a huge issue recently… At about 9PM on a Friday night our primary MySQL database began slowing to a crawl. This is an extremely slow time of day for our web application so we were all quite confused. All of our tables live on a pretty beefy SAN and everything checked out clear there.

Lo and behold, the battery on our Dell RAID controller (PERC H700 Integrated) had decided it was time to run its learn cycle.

/var/log/messages

 Sep 16 17:36:31 DB01 Server Administrator: Storage Service EventID: 2176 The controller battery Learn cycle has started.: Battery 0 Controller 0
 Sep 16 17:37:36 DB01 Server Administrator: Storage Service EventID: 2415 Controller battery is discharging: Battery 0 Controller 0
 Sep 16 17:37:36 DB01 Server Administrator: Storage Service EventID: 2248 The controller battery is executing a Learn cycle.: Battery 0 Controller 0
 Sep 16 18:37:52 DB01 Server Administrator: Storage Service EventID: 2278 The controller battery charge level is below a normal threshold.: Battery 0 Controller 0
 Sep 16 18:37:52 DB01 Server Administrator: Storage Service EventID: 2188 The controller write policy has been changed to Write Through.: Battery 0 Controller 0
 Sep 16 18:37:53 DB01 Server Administrator: Storage Service EventID: 2199 The virtual disk cache policy has changed.: Virtual Disk 0 (Virtual Disk 0) Controller 0 (PERC H700 Integrated)
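If you suspect the same thing is happening to you, a quick filter over syslog can confirm it. The following is only a sketch: the function name `learn_cycle_events` is made up for illustration, and the event IDs are simply the ones that show up in the excerpt above, so other controllers or Server Administrator versions may log different ones.

```shell
#!/bin/sh
# Print learn-cycle related storage events from a syslog file.
# Usage: learn_cycle_events [logfile]   (defaults to /var/log/messages)
learn_cycle_events() {
    logfile="${1:-/var/log/messages}"
    # Event IDs taken from the log excerpt in this post:
    #   2176 = learn cycle started
    #   2278 = battery charge below normal threshold
    #   2188 = write policy changed to Write Through
    grep -E 'Storage Service EventID: (2176|2278|2188) ' "$logfile"
}
```

Event 2188 is the one to watch for, since that is the moment the controller stops caching writes.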

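It turns out that this learn cycle runs by default on Dell servers every 90 days. While the battery is discharged, the controller drops from write-back to write-through caching, as the last two log lines above show. Our data didn’t reside on local disk, but our binlogs did, and that was apparently enough to bring MySQL to a crawl.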

The only way to change the behavior permanently is in the controller’s BIOS, where we can set it to only warn us that the battery needs to be checked. In the meantime, Dell’s OpenManage tools can give us more information on the status of the battery.

The first command shows the battery’s status (with example output):

$ omreport storage battery controller=0 battery=0
Battery 0 on Controller PERC H700 Integrated (Embedded)

Controller PERC H700 Integrated (Slot Embedded)
ID                        : 0
Status                    : Non-Critical
Name                      : Battery 0
State                     : Degraded
Recharge Count            : Not Applicable
Max Recharge Count        : Not Applicable
Predicted Capacity Status : Ready
Learn State               : Due
Next Learn Time           : 13 days 2 hours
Maximum Learn Delay       : 7 days 0 hours
Learn Mode                : Auto
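For scripting, it can be handy to pull a single value out of that report. Here is a minimal sketch, assuming the output keeps the `Name : Value` layout shown above; `battery_field` is a hypothetical helper name, not part of OpenManage.

```shell
#!/bin/sh
# Extract one field from `omreport storage battery` output read on stdin.
# Usage:
#   omreport storage battery controller=0 battery=0 | battery_field "Learn State"
battery_field() {
    # Split each line on the colon and the padding around it, then print
    # the value of the first line whose name matches exactly.
    awk -F' *: *' -v key="$1" '$1 == key { print $2; exit }'
}
```

Fed the example output above, asking for "Learn State" gives `Due` and asking for "Next Learn Time" gives `13 days 2 hours`.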

We’re unable to disable the learn cycle altogether. However, we can push it out by 7 days (the Maximum Learn Delay reported above) with the command below, which adds 7 days to the scheduled learn time. The cycle should still be run, just preferably at a very off-peak time when we can monitor it.

 omconfig storage battery action=delaylearn controller=0 battery=0 days=7
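Since the whole point is to not be surprised again, a small check run from cron could warn us when a learn cycle is getting close. This is a sketch under assumptions: `learn_cycle_soon` is a made-up name, the 3-day default is arbitrary, and it relies on "Next Learn Time" being reported as "N days M hours" like in the example output. It only warns; running `delaylearn` is still left to a human.

```shell
#!/bin/sh
# Exit 0 if the "Next Learn Time" read on stdin is fewer than $1 days away
# (default 3); exit 1 otherwise, including when the field is missing or
# not in the "N days ..." form.
learn_cycle_soon() {
    awk -F' *: *' -v limit="${1:-3}" '
        $1 == "Next Learn Time" && $2 ~ /^[0-9]+ day/ {
            days = $2 + 0      # numeric prefix of "13 days 2 hours" is 13
            found = 1
            exit
        }
        END { exit !(found && days < limit) }
    '
}

# Example use from a cron job:
#   if omreport storage battery controller=0 battery=0 | learn_cycle_soon 3; then
#       echo "PERC battery learn cycle is due within 3 days on $(hostname)"
#   fi
```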

 

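The last command forces write-back caching even when the battery is not available. Keep in mind that while it is set, anything sitting in the controller cache is unprotected if the server loses power.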

omconfig storage vdisk action=changepolicy writepolicy=fwb controller=0 vdisk=0

If you’re not much of a command-line guru, Dell’s OpenManage GUI can also show you the status.
If you see this first image, a battery learn cycle is about to begin:

 

If you see this second image, a battery learn cycle is currently in progress! You may see some of the symptoms that I mentioned above.

 

For some more info on this issue, I found a nice blog posting here: http://yo61.com/dell-drac-bbu-auto-learn-tests-kill-disk-performance.html
