Jeff Kabachinski, MS-T, BS-ETE, MCNE

There always seems to be a need for more hard drives, making this a perennial and pertinent topic of discussion. This month, we take up where the October column left off. Last time, we covered the growing need for more storage capacity, compared hard drives and solid-state drives (SSDs) along with their pros and cons, and described storage area networks (SANs). We continue now with more ways to increase storage capacity without buying more drives. You will see some newly popular terms below. Read on to find out what they are all about.

Deduplication

As we discussed last month, there are tools, methods, and concepts that can help you get smarter about storing data. One way is to employ optimization technologies like data deduplication. Deduplication makes a lot of sense because it eliminates redundancy in data files, and in data blocks within those files, that may be repeated hundreds or even thousands of times. When a memo is e-mailed to all employees and then individually saved hundreds of times on hundreds of computers, that is redundant storage. Or perhaps a piece of information, like the corporate logo, appears on every page of every PowerPoint presentation. Deduplication scans every file for repeated scraps of data and replaces each repeat with a pointer to the original. It continues to learn over time, removing redundant stores of files, data, and data scraps. This can have the effect of reducing storage needs by a factor of 20 to 50!
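
To make the concept concrete, here is a minimal sketch of block-level deduplication in Python. The chunk size, the BlockStore class, and the sample memo are illustrative assumptions, not any vendor's actual implementation; commercial systems typically add variable-size chunking and compression on top of the same basic move of swapping a pointer for a duplicate.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size blocks; real systems often use variable-size chunking


class BlockStore:
    """Toy content-addressed store: each unique block is kept exactly once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)  # store only if never seen before
        return digest  # the "pointer" to the original copy

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]


def dedupe_file(store: BlockStore, payload: bytes) -> list[str]:
    """A file becomes a list of pointers to unique blocks."""
    return [store.put(payload[i:i + CHUNK_SIZE])
            for i in range(0, len(payload), CHUNK_SIZE)]


def restore_file(store: BlockStore, pointers: list[str]) -> bytes:
    return b"".join(store.get(p) for p in pointers)


# The memo e-mailed to all employees is stored once, not hundreds of times
store = BlockStore()
memo = b"All-hands meeting Friday at 9 am. " * 200
saved_copies = [dedupe_file(store, memo) for _ in range(100)]
assert restore_file(store, saved_copies[0]) == memo
print(len(store.blocks), "unique blocks back 100 saved copies")
```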

With new technologies, hospitals can increase storage capacity without buying more hard drives.

Deduplication makes disk backups much smaller and up to 75% faster. It allows IT to back up across the network (to avoid storing backed-up data in the same physical location as the working data) without hogging network bandwidth in the process. International Data Corp found that the average time for a data deduplication system to pay for itself, through reduced data storage needs, improved IT productivity, and shorter backup and restore times, is 6.6 months. For example, according to the Virginia Commonwealth University Health System (VCUHS),1 110TB of data can be found at any moment spread across its network. That data needs to be backed up and immediately available in an emergency. In addition, each backup instance must be kept for 90 days, increasing the total amount of data three or four times over. Greg Johnson, VCUHS’s chief technology officer and director of engineering services, says that even with a conservative five-times data reduction from deduplication, that original 110TB shrinks to 22TB of actual storage. For size comparison, it is often said that the entire contents of the Library of Congress would consume a mere 10TB.
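
The arithmetic behind Johnson’s estimate is easy to check. Here is a back-of-the-envelope sketch using only the figures quoted above; the variable names are ours:

```python
# Back-of-the-envelope math for the VCUHS example, using the article's figures
primary_tb = 110       # data spread across the network at any moment
retention_factor = 4   # 90-day backup retention multiplies the total 3-4x
dedupe_ratio = 5       # Johnson's "conservative" five-times reduction

raw_with_retention = primary_tb * retention_factor  # up to 440TB to protect
deduped_primary = primary_tb / dedupe_ratio         # the quoted 22TB
print(f"{raw_with_retention}TB raw with retention; "
      f"{deduped_primary:.0f}TB of actual storage after deduplication")
```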

The IT staff also learns, along with the tiering software (more on tiering below), to make intelligent decisions about what kinds of information need to be considered top-tier storage, which affects the information’s availability and backup speeds. Initial space savings might be in the 20% range as you get started with deduplication. As the deduplication engine becomes more familiar with its clients’ consumption patterns, the savings can grow to 50%. Do not create storage pools based on departments or individual functions; let the storage system do its thing. It is much better at creating an efficient, fast, and cost-effective data center!

Tiering and Thin Provisioning

To build a solid storage plan, strategize to compress data and improve efficiency instead of automatically adding more disk drives.

Another part of the puzzle is automatic tiering. Let the storage technology also determine what data is accessed most often. Make that data available on the fastest portion of the SAN, and sock away rarely accessed data on the slowest portion—back in the cobwebs somewhere.
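
Under the hood, an automatic tiering engine boils down to watching access frequency and placing data accordingly. Here is a toy sketch of that decision in Python; the threshold and tier names are invented for illustration, since real tiering engines tune these parameters themselves.

```python
from collections import Counter

HOT_THRESHOLD = 100  # reads per period that earn a spot on the fast tier (illustrative)


class TieringMonitor:
    """Toy access tracker that sorts data blocks into fast and slow tiers."""

    def __init__(self):
        self.reads = Counter()

    def record_read(self, block_id: str) -> None:
        self.reads[block_id] += 1

    def placement(self, block_id: str) -> str:
        # Frequently read data earns SSD; the rest goes back in the cobwebs
        if self.reads[block_id] >= HOT_THRESHOLD:
            return "fast tier (SSD)"
        return "slow tier (nearline disk)"


monitor = TieringMonitor()
for _ in range(150):
    monitor.record_read("radiology-image-001")
print(monitor.placement("radiology-image-001"))   # fast tier (SSD)
print(monitor.placement("2009-meeting-minutes"))  # slow tier (nearline disk)
```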

Thin provisioning stems from virtualization. To virtualize storage is to pool the locations of all storage disks or partitions and let the virtualization layer dole out only what is really needed. Instead of carving out disk space for the sole use of an online app, a range is set aside and the system makes capacity available as it is needed, as a just-in-time storage capability. The app and its virtual machine think they have sole access to a big storage allocation, while the storage servers commit only what is actually being used.
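
A toy sketch makes the trick plain. The ThinVolume class below is our stand-in for the SAN’s volume manager, not a real API: it advertises a large virtual size to the application but commits capacity only when blocks are actually written.

```python
class ThinVolume:
    """Toy thin-provisioned volume: promises more space than it commits."""

    def __init__(self, virtual_size_gb: int):
        self.virtual_size_gb = virtual_size_gb  # what the app thinks it owns
        self.allocated = {}  # block number -> data, filled in only on write

    def write(self, block: int, data: bytes) -> None:
        self.allocated[block] = data  # capacity is committed just in time

    def blocks_in_use(self) -> int:
        return len(self.allocated)  # what the storage servers actually commit


vol = ThinVolume(virtual_size_gb=500)  # the virtual machine sees 500GB
vol.write(0, b"boot sector")
print(f"{vol.virtual_size_gb}GB promised; "
      f"{vol.blocks_in_use()} block(s) actually allocated")
```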

When implementing storage strategies, continue to be vigilant about security. If you consolidate data onto one system and then fail to protect it, you are leaving yourself open to criminals. In an interview with InformationWeek on February 5, 2011, Doug Davis, IS coordinator for Monical’s Pizza’s 65 restaurants in the Midwest, said, “Stored data is one of the most vulnerable parts of an organization. Data at rest being captured in small bytes is one of the hardest things to control. As the head of IT, it is my job to be sure nobody is removing needles from any of the haystacks in my whole field.”2

Also consider a survey that the Ponemon Institute conducted this past June, sponsored by Juniper Networks,3 which reported that 90% of organizations had experienced a security breach within the past year. Ninety percent!

Below you will find general keys to a solid storage plan:

  • Strategize to compress data and improve efficiency. Squeeze what you have instead of throwing more disk drives at the problem.
  • Ensure that deduplication is near the top of any storage-strategy list.
  • Consider SSDs. Will operational savings and end-user satisfaction pay back the initial costs? Convince yourself (and others) by doing a return-on-investment exercise.
  • Don’t pool on purpose—let the storage system manage the storage resources.
  • Consider virtualization and the cloud to allow thin provisioning. Do not listen to FUD (fear, uncertainty, and doubt). After all, deduplicated files in the cloud are much better than thousands of individual storage points on employees’ machines, where possible data loss sits outside of IT’s control!

Finally, are you using iSCSI4 yet? Check it out!


Jeff Kabachinski, MS-T, BS-ETE, MCNE, has more than 20 years of experience as an organizational development and training professional. Visit his Web site at kabachinski.vpweb.com. For more information, contact .

References
  1. Kugler L. Built-In Dedupe. EdTech eNewsletter. June 2010. Available at: www.edtechmagazine.com/higher/june-2010/built-in-dedupe.html. Accessed October 12, 2011.
  2. Marko K. State of storage survey. InformationWeek. February 5, 2011. Available at: www.informationweek.com/news/storage/virtualization/229201146. Accessed September 1, 2011.
  3. Ponemon Institute Research Report. Perceptions About Network Security. Sponsored by Juniper Networks, independently conducted by Ponemon Institute LLC. June 2011. Available at: www.juniper.net/us/en/…/ponemon-perceptions-network-security.pdf. Accessed October 11, 2011.
  4. For more information on iSCSI, or Internet Small Computer System Interface, visit Wikipedia: http://en.wikipedia.org/wiki/ISCSI.