Solid State Hard Drives for database performance
Solid State Disks (SSDs) aren’t just for netbooks: they have the potential to transform server specification, especially for database systems. SSDs aren’t much in the business eye at present but we expect them to become a hot topic over the coming year.
A solid state drive is a block of memory that can replace a hard drive on a computer. The name SSD is, incidentally, a great example of skeuomorphism; the D stands for ‘disk’ or ‘drive’ even though an SSD contains exactly zero driven disks.
The reason this matters for databases is that CPU and hard disk technology have advanced at dramatically different rates. For many years, CPU performance has followed Moore’s Law and doubled every 18-24 months. The net result is that the processing power available on even a modest modern machine is massive. However, while the capacity of hard disks has scaled remarkably well, their I/O (input/output) performance has not increased at the same rate.

SSDs offer an alternative to the spinning hard disk. Their I/O rates have the potential to be much, much higher, and they’re more resistant to shock, quieter and consume less energy. All of which is highly desirable in laptops and netbooks, if for no other reason than longer battery life and cooler knees (or other bits). The only downside to these paragons of virtue is the cost but, at least for now, prices are in the process of tumbling.

However, even at today’s higher prices, SSDs could offer huge gains on servers, especially in database and Business Intelligence (BI) systems. The benefits are less obvious, and a slightly deeper understanding of SSDs is required before the advantages become apparent.
Current SSD technology
We’re concentrating on the latest batch of SSDs, like Intel’s X25-M Mainstream and X25-E Extreme SATA SSDs.
Earlier SSDs were certainly robust but failed to deliver truly impressive performance or energy savings. The current generation uses NAND flash memory. Flash memory is non-volatile (i.e. it does not require power to maintain its content); NAND stands for Not AND, the Boolean logic operator that describes how the device stores data, deep in the underworld of bits and electrical impulses.
NAND flash comes in two flavours: multi-level cell (MLC), which stores two bits per cell, and single-level cell (SLC), with one bit per cell. Both display the same rapid read rate, which is so rapid that the disks can easily saturate a 3 Gbit per second SATA bus (roughly 300 MB/sec of usable bandwidth). In other ways, however, the two forms are very different. For instance, you can write to SLC memory twice as fast as you can to MLC. This makes SLC sound immediately more attractive, but there are many applications (such as the majority of Business Intelligence applications) where the write rate, whilst not immaterial, is not particularly important.
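As a rough sanity check on the bus-saturation claim, the arithmetic looks like this (the drive read rate is assumed, based on the class of SSD discussed here; SATA uses 8b/10b line encoding, so 10 line bits carry 8 data bits):

```python
# Illustrative back-of-envelope: can an SSD saturate a 3 Gbit/s SATA link?
line_rate_gbit = 3.0
# 8b/10b encoding: 80% of line bits are data; divide by 8 for bytes.
usable_bytes_per_sec = line_rate_gbit * 1e9 * (8 / 10) / 8

ssd_read_mb_s = 250  # assumed sequential read rate for this class of SSD

print(usable_bytes_per_sec / 1e6)                    # 300.0 MB/s usable
print(ssd_read_mb_s / (usable_bytes_per_sec / 1e6))  # ~0.83 of the bus
```

One drive of this class already uses most of the link; a couple of them behind a shared controller would leave the bus, not the flash, as the bottleneck.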
And then there is the price. In general terms MLC is about a third of the price of SLC.
SSDs for Servers
Server cooling for large systems has become a major issue so the energy usage characteristics of SSDs are of particular interest. The amount of waste heat is significantly reduced. In turn this means that the cooling system can be simplified and the disks can be packed closer together. More importantly, the amount of energy required to drive the system also drops dramatically. There is even a measure of this efficiency – IOPS (I/O operations per second) per Watt. A cynic might suggest that this measure was invented by those with a vested interest in selling SSDs; nevertheless the figures from companies like Intel are fascinating. These suggest that 10Krpm hard disks deliver in the region of 300 IOPS per Watt, while SSDs can produce 35,000 IOPS per Watt. No, that’s not a misprint: it’s a comfortable two orders of magnitude increase in efficiency.
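The "two orders of magnitude" claim is easy to check against the quoted figures (which are vendor-supplied, so treat them with the usual scepticism):

```python
import math

# Figures quoted in the text above (vendor-supplied, illustrative only).
hdd_iops_per_watt = 300      # 10K rpm hard disk
ssd_iops_per_watt = 35_000   # current-generation SSD

improvement = ssd_iops_per_watt / hdd_iops_per_watt
print(improvement)              # ~117x
print(math.log10(improvement))  # ~2.07, i.e. just over two orders of magnitude
```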
For robustness and reliability, SSDs are hard to fault. With no moving parts, an SSD is much less susceptible to damage than a relatively fragile spinning hard disk, and although they have a fixed life span their reliability within that is excellent, remarkably so for a new technology.
Database servers are frequently bound by disk I/O – in many cases you can actually hear the heads in the disk noisily thrashing themselves to death. One obvious approach is to put the entire data stack on SSDs, but this is highly unlikely to be the most cost-effective answer for the simple reason that SSD is more expensive than conventional rotating technology.
A better solution is to start by looking at data usage. In many cases only a tiny proportion of the stored data is accessed with any frequency, so we can describe data as hot (frequently accessed) or cold (infrequently accessed). You’ll be way ahead of me by this time: hot data on SSD, cold data on hard disk. It is, of course, perfectly possible to do this manually, but a company called Teradata is ahead of this curve. The Teradata engine automatically recognises hot and cold data and dynamically migrates the hot data onto fast disks. Teradata built this technology in order to optimise across slow and fast rotating disks but guess what? It works fabulously if you add SSDs to the mix of disks you are using.
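Teradata’s migration logic is proprietary, but the core idea is simple enough to sketch. The following is a minimal, hypothetical illustration (function names, the access-log shape and the 10% threshold are all assumptions, not Teradata’s actual algorithm): count how often each block is touched over some window, then place the most frequently accessed blocks on the fast tier.

```python
from collections import Counter

def plan_tiers(access_log, hot_fraction=0.1):
    """Split accessed block IDs into hot and cold sets by access frequency.

    access_log  -- sequence of block IDs, one entry per access
    hot_fraction -- proportion of distinct blocks to treat as hot (assumed 10%)
    """
    counts = Counter(access_log)
    ranked = [blk for blk, _ in counts.most_common()]  # busiest first
    cutoff = max(1, int(len(ranked) * hot_fraction))   # always at least one hot block
    hot = set(ranked[:cutoff])
    cold = set(counts) - hot
    return hot, cold

# Example: block 7 dominates the workload, so it lands on the SSD tier.
log = [7, 7, 7, 7, 3, 7, 9, 7, 3, 7]
hot, cold = plan_tiers(log)
print(hot)   # {7}
print(cold)  # {3, 9}
```

A real implementation would run continuously, decay old counts so yesterday’s hot data can cool off, and account for the cost of the migration itself.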
There’s even more to this than meets the eye because not all hard disks are created equal. To simplify a somewhat complex picture, there are fast, expensive, power-hungry hard disks (typically 15Krpm), and there are slow, cost-effective, less power-hungry disks (typically 7.2Krpm). An ideal combination for many companies might turn out to be hot data on SSD and cold data on those cost-effective slower hard disks.
Of course, the size of the database is a consideration here. Many small- to medium-sized enterprises have databases in the 100GB range or smaller. For these it may already be cost effective to move the entire database onto SSD. At the top end of database sizing, Teradata builds systems for the largest organisations in the world: the company has already built a complete SSD data warehouse system.
Many BI applications read data intensively but have little need to write it. This is because most such systems have a strong bias towards analysis, working with the data already stored in the database by slicing, dicing, mining and displaying it graphically. Happily, the cheaper MLC SSDs are well suited to this type of task.
Planning for SSDs
As RAM has become cheaper it is not uncommon to populate servers with 4, 8, 16 or 32GB of RAM, particularly for database applications. This enables the database to cache significant parts of the data in RAM and, given only rotating disks, it is a highly effective solution. However, RAM is expensive and generates a lot of heat. One effect of using SSD technology is that we can obtain the same or better performance with significantly less RAM, and hence significantly less heat production. In itself this is not a reason to move to SSDs, but the savings in electricity, in disposing of waste heat and in server room space all serve to offset the initially high cost of SSDs and can improve your TCO.
SSDs are just reaching the point where they become mainstream, and now is the time to take them seriously. Suppose, for example, that you work for an SME with, say, 100GB of data in your BI system or 50GB in an operational system. For about £500 you can essentially guarantee a significant speed increase simply by moving the former to MLC and the latter to SLC. At significantly less than the cost of a day’s database tuning consultancy, that has to be a bargain.
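For the curious, here is one set of per-gigabyte prices that makes the £500 figure work. The prices themselves are assumptions (period pricing, with SLC at roughly three times the cost of MLC, consistent with the ratio quoted earlier); only the £500 total comes from the text:

```python
# Back-of-envelope for the £500 figure. Per-GB prices are assumed;
# SLC is taken at ~3x the MLC price, matching the ratio quoted earlier.
mlc_gbp_per_gb = 2.0
slc_gbp_per_gb = 6.0

bi_system_gb = 100   # BI data moved to MLC
oltp_system_gb = 50  # operational data moved to SLC

total = bi_system_gb * mlc_gbp_per_gb + oltp_system_gb * slc_gbp_per_gb
print(total)  # 500.0
```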