
The Missing Piece in Containers: Storage!

Container technologies have revolutionized the computing world by allowing developers and operators to be more agile.

Particularly powerful at first for making stateless applications elastic, Docker has now reached a level of popularity where developers want their applications to do more, always more. And this requires state, hence storage in one form or another: for databases, for media files served by websites, for logs and more.

The storage problem has existed for decades and still represents a $40B market spanning very different fields, from object storage (e.g. AWS S3) and all-flash arrays (e.g. Pure Storage) to cloud gateways, distributed caching, backup and more.

The inherent limitations of traditional appliance-based centralized storage

One might think that storage for containers is no different from storage for virtual machines, but that could not be further from the truth.

In order to understand this statement, one must first grasp the fundamental differences between virtual machines (e.g. Xen) and containers (e.g. Docker).

What makes Docker extremely powerful is its ability to spin up containers and scale them very quickly. This ability comes from the nature of containers, which are orders of magnitude lighter than virtual machines.

While Docker is still a young technology, companies already use it in production to deploy and scale applications over hundreds, if not thousands, of nodes. This was so complicated to achieve with virtual machines, given their heavier nature, that only very large technology companies such as Amazon, Google and Microsoft could afford to do it.

In addition, container technologies like Docker reinforce the important microservices philosophy. This philosophy, in contrast to monolithic architectures, allows developers to break applications into several independent services.

It is important to understand that historically, all storage solutions were (and, for a very large portion of the enterprise market, still are) monolithic, often represented by disk arrays.

The problems with such solutions are threefold:

  • Price: Storage vendors keep selling these solutions because their margins are high, with the underlying hardware sold at 3 to 10 times its price.
  • Scalability: Existing storage appliances are limited in terms of scalability and performance, both because of their centralized architecture and their use of aging protocols like NFS, and because they are intrinsically limited in storage capacity.
  • Flexibility: Hardware-based storage solutions are more prone to vulnerabilities since their software is updated less often. Even more important is the impossibility of programmatically controlling your storage system to spawn multiple storage infrastructures with different properties.

Many storage solutions for virtual machines have been developed over the years, but all essentially follow the same philosophy: a monolithic infrastructure that is set up once and then used by a set of client machines to store and access data.

The need for modern software-defined distributed storage

While traditional storage systems may be distributed in nature, they assume that their underlying components are reliable, powerful servers with large storage capacity.

Containers, however, scale over thousands of nodes that are often small, not particularly powerful and likely to fail at some point.

As such, containers require a natively elastic and distributed storage system that adapts to the behavior of a Docker application.

Even more important is the programmatic aspect, which is lacking in almost all existing storage systems. Applications cannot be assumed to all have the same needs. As such, DevOps teams should be able to spawn multiple storage infrastructures, each with specific properties serving an application's needs: encryption, datacenter-aware replication, an object/file storage interface and more.

In other words, containers need an infrastructure that spins up storage with specific properties, as fast as containers themselves (i.e. within seconds), and that scales just as easily.
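
To make this concrete, here is a minimal sketch using the docker-py Python client. The "sds" driver name and its option keys are hypothetical placeholders for whatever properties a given software-defined storage plugin actually exposes.

    # Minimal sketch using the docker-py client (pip install docker).
    # The "sds" driver and its option keys are hypothetical placeholders.
    import docker

    client = docker.from_env()

    # Programmatically spin up a volume with application-specific properties.
    volume = client.volumes.create(
        name="orders-db",
        driver="sds",                     # hypothetical storage plugin
        driver_opts={
            "encryption": "on",           # encrypt data at rest
            "replication": "datacenter",  # datacenter-aware replication
            "interface": "block",         # block interface for a database
        },
    )

    # Attach the volume to a container like any other Docker volume.
    client.containers.run(
        "postgres",
        detach=True,
        volumes={volume.name: {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
    )

Provisioning the volume is a single API call, so an orchestrator or DevOps script can set up per-application storage in seconds, exactly as it would provision the containers themselves.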

While software-defined storage is already making its way into the traditional storage market, its flexibility, scalability and potential programmability make it the obvious choice when it comes to containers.

Some software-defined storage solutions have been on the market for several years, most notably Ceph (object/block storage) and GlusterFS (file storage), both now backed by Red Hat.

Even though these solutions have done a lot to democratize software-defined storage, their monolithic (master/slave) architecture, their requirements (an initial number of servers) and their lack of programmability make them complicated to integrate with containers.

A new wave of solutions is coming, however, specifically tackling the challenge of storage for containerized applications.

On one end are solutions with a traditional (master/slave) architecture that provide programmability but impose too many requirements (number of servers, number of disks, etc.), the most well-known vendor being Hedvig.

On the other end of the spectrum are solutions like Infinit, whose decentralized (i.e. peer-to-peer) architecture removes single points of failure and bottlenecks while allowing for better scalability. Because of its high programmability, Infinit can scale alongside an application, following Docker Swarm as it expands and shrinks for instance, essentially providing a hyper-converged infrastructure (scaling both storage and compute together).
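
To illustrate the hyper-converged idea, here is a hedged sketch, again with docker-py and again with a hypothetical "sds" volume driver: a Swarm service whose replicas each mount a volume from the distributed storage layer, so that scaling the service scales the storage consumers along with it.

    # Sketch of a Swarm service backed by a distributed volume driver.
    # Requires a recent docker-py and a connection to a Swarm manager;
    # the "sds" driver is a hypothetical placeholder.
    import docker
    from docker.types import DriverConfig, Mount

    client = docker.from_env()

    # Each replica mounts a volume provisioned by the distributed storage
    # driver on whichever node the task happens to be scheduled.
    mount = Mount(
        target="/usr/share/nginx/html",
        source="web-content",
        type="volume",
        read_only=True,
        driver_config=DriverConfig(name="sds"),
    )

    service = client.services.create("nginx", name="web", mounts=[mount])

    # Scaling the service scales the storage consumers with it; a storage
    # layer that follows Swarm can grow its capacity alongside new replicas.
    service.scale(5)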

Obviously, many other solutions exist, from ClusterHQ to Rancher Labs to Portworx and more.

And expect to see more of them in the future, as storage is by far the number one challenge in the container world right now.

Conclusion

Thanks to software-defined storage and the advent of container technologies like Docker, developers and operators can now benefit from the capabilities of a large-scale cloud infrastructure like AWS while relying on the hosting provider of their choice.

Interestingly, even bare metal is becoming appealing given that it is orders of magnitude more cost-effective than virtual machines in the cloud.

Enterprises have been told, repeatedly, of the advantages of the public cloud, without ever being able to truly benefit from it because of various barriers (legal, security, etc.). After a decade of promises from cloud providers, along with expensive investments in on-premise storage appliances from NetApp, Dell, EMC and others, enterprises are now ready to embrace container technologies.
