The goal of any cloud initiative is to create a cost-effective, flexible environment. These architectures typically store large data sets for long periods of time, so one of the challenges to being cost-effective is the physical cost of storage. Deduplication is critical to extracting maximum value from a cloud-first initiative, but the cloud requires a different, more flexible, software-defined implementation.
Why We Still Need Deduplication
While the cost per GB of hard disk and even flash storage continues to plummet, when purchased in the quantities needed to meet a typical cloud architecture's capacity demands, storage remains the most expensive aspect of the design. And it is not just the per-GB cost; it is also the physical space that each additional storage node consumes. Too many nodes can force the construction of a new data center, which is a much bigger cost concern than the price per GB of storage.
Deduplication provides a return on the investment by ensuring that the architecture stores only unique data. That not only reduces the capacity requirement, it also reduces the physical storage footprint.
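As a rough illustration of how storing only unique data works, the sketch below breaks incoming data into fixed-size segments, fingerprints each one, and keeps a single copy per fingerprint. The 4 MiB segment size and SHA-256 fingerprinting are illustrative assumptions, not any particular product's implementation.

```python
import hashlib

SEGMENT_SIZE = 4 * 1024 * 1024  # illustrative fixed 4 MiB segments


class DedupeStore:
    """Stores each unique segment once, keyed by its content fingerprint."""

    def __init__(self):
        self.segments = {}   # fingerprint -> segment bytes
        self.manifests = {}  # object name -> ordered list of fingerprints

    def put(self, name, data):
        fingerprints = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()
            # Only segments not seen before consume additional capacity.
            self.segments.setdefault(fp, segment)
            fingerprints.append(fp)
        self.manifests[name] = fingerprints

    def get(self, name):
        return b"".join(self.segments[fp] for fp in self.manifests[name])
```

Writing the same data set a second time adds only a new manifest, no new segments, which is where the capacity and footprint savings come from.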
Organizations will have different cloud strategies. A few may use only the public cloud. Some may use only private cloud architectures. Most, however, will take a hybrid approach, leveraging the public cloud when it makes sense and a private cloud when performance or data retention concerns require it. In the hybrid model, data should flow seamlessly and frequently between the public and private architectures.
If the same deduplication technology is implemented in both the private and public cloud architectures, then its understanding of the data can be leveraged to limit how much data has to be transferred: only unique data segments need to cross the network connection between the two, making that connection far more efficient.
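A minimal sketch of that exchange, building on the segment fingerprints from the previous example. The peer interface here (`missing`, `store`, `commit`) is a hypothetical stand-in for the public-cloud side, not a real product's protocol: the two sides first compare fingerprints, which costs only a few bytes per segment, and bulk data moves only for segments the remote does not already hold.

```python
class RemotePeer:
    """Illustrative stand-in for the public-cloud side of the link."""

    def __init__(self):
        self.segments = {}   # fingerprint -> segment bytes
        self.manifests = []  # committed manifests (ordered fingerprint lists)

    def missing(self, fingerprints):
        # Report which fingerprints are not already stored on this side.
        return [fp for fp in fingerprints if fp not in self.segments]

    def store(self, fp, segment):
        self.segments[fp] = segment

    def commit(self, manifest):
        self.manifests.append(list(manifest))


def replicate(manifest, local_segments, remote):
    """Send one object across the private/public link, transferring only
    the segments the remote side does not already hold."""
    needed = remote.missing(set(manifest))   # fingerprint exchange: a few bytes per segment
    for fp in needed:                        # bulk transfer only for missing segments
        remote.store(fp, local_segments[fp])
    remote.commit(manifest)                  # the manifest completes the copy
    return len(needed)                       # segments actually transferred
```

Replicating the same data set again, or one that shares most of its segments, moves almost nothing over the wire, which is what makes the hybrid link efficient.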
Why We Need Software-Defined Deduplication
The other aspect of a cloud initiative is flexibility, so IT can respond more quickly to any issue. Part of that flexibility is defined in the hybrid model itself. The storage architecture is split into two parts: the public cloud owns one section of it and a private cloud owns the rest. While the public cloud has the advantage of low upfront costs, IT cannot specify what types of storage hardware, if any, it uses.
The public cloud’s consumption-only model requires that all storage services be delivered as software. That includes deduplication, so it has to be available as a software-defined component of the overall data management solution. Software-defined deduplication allows the data management software to execute and manage the data efficiency process itself, which lets it run on anyone’s hardware.
Most private cloud solutions will leverage an object storage system as part of the architecture. It may or may not come with its own deduplication feature, but it is unlikely to include a robust data management engine. Implementing a data management software solution that includes deduplication on top of the object storage system provides more flexibility: the organization is free to select any storage hardware. And because it is software, IT can implement the same data efficiency in the public cloud, so redundant data does not need to be retransmitted between the private and public clouds, improving network efficiency.
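A simplified sketch of layering deduplication above an object store. Only a generic three-call interface (exists, put, get) is assumed; the key layout and the in-memory stand-in are illustrative, and any S3-compatible or on-premises object system could sit underneath.

```python
import hashlib


class InMemoryObjectStore:
    """Stand-in for any object backend; only three calls are assumed."""

    def __init__(self):
        self._objects = {}

    def exists(self, key):
        return key in self._objects

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def write_deduplicated(store, name, data, segment_size=4 * 1024 * 1024):
    """Deduplication layered above the object store: each unique segment
    becomes one object keyed by its fingerprint, plus a manifest object."""
    fingerprints = []
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        fp = hashlib.sha256(segment).hexdigest()
        key = f"segments/{fp}"
        if not store.exists(key):          # unique segments are written once
            store.put(key, segment)
        fingerprints.append(fp)
    store.put(f"manifests/{name}", "\n".join(fingerprints).encode())
    return fingerprints
```

Because the efficiency logic lives entirely in software above the store, the same code path can target private object storage or a public cloud bucket without change.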