DevOps Chat: Mohit Bhatnagar of ClusterHQ
This episode of DevOps Chat features Mohit Bhatnagar, vice president of product at ClusterHQ. We spoke about the recent survey DevOps.com did in partnership with ClusterHQ.
What are the biggest concerns regarding container deployments? Where and what is being deployed? Bhatnagar speaks about that and other topics. He is very knowledgeable about the market and what issues and challenges organizations are facing in deploying containers. He is also seeing some great results as container adoption continues to soar.
The streaming audio of our conversation is below and the transcript of our discussion is below that. Enjoy!
Alan Shimel: Hi. This is Alan Shimel, DevOps.com (and Container Journal) here for another DevOps chat. This episode’s guest is Mohit Bhatnagar, VP Products and Solutions at ClusterHQ. Hi Mohit, how are you?
Bhatnagar: Hi, Alan. Thanks for having me on the show.
Shimel: Thank you. So Mohit, we’re going to spend a little bit of time talking about a recent survey that ClusterHQ partnered with us here at DevOps.com on around container usage, persistent storage and so forth.
But before we do just a quick—maybe some of our listeners are not familiar with ClusterHQ, and in the interest of transparency and just to let them know, tell us a little bit about ClusterHQ.
Bhatnagar: Yeah. ClusterHQ is a startup based out of Silicon Valley in San Francisco. We are a very passionate bunch of engineers and product people who want to solve the problem of container data management. In a sense we want to do for container data what Docker did for compute.
We look at the data management aspects of containers as a vital part of the container evolution, and that means that data management will go across the entire DevOps cycle. So the notion that containers are for stateless services and not for stateful ones is a false notion, and it is actually gratifying for us to see how the industry at large is beginning to come to that realization, which is the reason why the company was started three years back.
So anytime you think of containers, anytime you think of intersection of that with persistent storage data that’s the kind of problems we solve.
Our most commonly known product at this point is Flocker, which is open-source software. This is a contribution we made to the community last year, and since then Flocker has become the leading persistent storage volume driver. We support 20-plus underlying storage systems such as AWS, GCE and OpenStack, et cetera.
Shimel: Excellent. And then—well, with that being said let’s dive into the survey. You know I thought there were some really interesting—and of course this was actually a second—a follow-on survey to a similar one we did last year, so to me what was kind of most interesting was looking at the year-to-year changes in some of these responses from people and seeing sort of the evolution if you will of the container marketplace.
I have my own favorites but since you’re the guest I’ll let you go first. What do you think are some of the most important nuggets for people to take away from this?
Bhatnagar: Yeah, so this is one of the beauties of surveys—they do two things: (A) they bring a statistical rigor to things that we might know or we might think are true, and the second it gives you trends in a much more quantifiable manner.
So a couple of things. First of all, everyone knows that the container is a hot technology, lots of organizations and developers are adopting them, et cetera, so the survey this year—actually 79 percent of the respondents said that their organizations run container technology, and 76 percent of them said they are using it in a production environment.
Okay, so here’s the first nugget for us. The number we are looking at is four-fifths of the people surveyed—which is more than 300 people—who said that they are using it, and this is in a production environment. Well, what does it mean?
What it means is that this is the beginning of containers becoming a mainstream technology; they are getting used in production environments ranging from stateful services to CI/CD (continuous integration/continuous delivery) kinds of environments. So the use of containers in production environments was the first interesting insight.
Just to calibrate, this is a significant increase from last year’s survey where approximately one-half of the respondents said they were deploying containers in production. So there is substantial usage, that usage is taking place in production, and it is growing significantly fast. What was very interesting for me is that when we asked this question last year, “How many of you will be using containers in production as a forecast in 2016?” we actually found that technology adoption is comparable to or higher than what people thought it would be, which is remarkable because quite often technology adoption forecasts can be faster than the reality, so in this case technology is being adopted. So that was the first major insight.
I think the one which I found very interesting was a question we asked in 2015 and this year: “What are your barriers to container adoption?” It was an open-ended question; people could choose from a range of options they had, and what was remarkable is that in both the surveys—last year and this year’s—networking, security and persistent storage and data management came out to be the top three barriers to adoption, and they accounted for more than 55 percent of the responses.
What was surprising was the order. Last year security was the No. 1 barrier to adoption, then there was networking and then there was persistent storage/data management. This year the same three categories were there but the order got flipped, and it was actually persistent storage that more than 25 percent of people said was the No. 1 barrier to adoption.
So clearly it is very gratifying in a sense because that is something which we believed in and it is good to see people beginning to realize it.
What is also very important for us is to drill down and say, “Well, what does that barrier mean and what kind of solutions are there?”
So those were the two important ones. There are a couple others; time permitting, I can drill down on those as well.
Shimel: Got it. So you know what? I’m going to give you a chance to drill down because I know time is of the essence and I feel guilty taking up our time here with kind of my views on it, so why don’t we drill down into a couple of these if it’s okay with you.
Bhatnagar: Right. So let’s talk about persistent storage, networking, security. So first of all these three are all technically hard problems, and I think what happened is that as customers and developers started using containers they went and started using them in stateless services, they used them probably in a non-clustered model and they used them in environments where security was either not a high concern or it was not a critical amount of data that was being included.
Well, as the technology adoption takes place several things happen: Scale becomes important. Well, suddenly the notion of networking becomes important and the system needs to understand containers as a part of a cluster as opposed to individual containers or set of containers sitting on a single host, they need to think of it across a cluster. So that means networking becomes important.
Security obviously becomes important as people start using these systems in a production environment; you want to make sure that there are—security vulnerabilities are not creeping up—and so security becomes an issue.
And then the data part of it—or data management—persistent storage becomes important because as containers get adoption they’re no longer just being used for stateless services. If I’m an organization that is building a next-generation PaaS (platform as a service) for my company I need to think of that as a large cloud.
Well, if it is a container-based PaaS it makes complete sense to actually deliver the whole range of stateless and stateful services so that I have a single environment that I’m using for managing my entire cloud. What I don’t want is certain pieces to be sitting on a next generation PaaS and other pieces—some kind of stateful services—sitting on old-fashioned, siloed, isolated clusters environment.
So then you start talking about the notion of a large PaaS or an environment where stateful services such as database services or for that matter, actually if you think about it, a CI/CD Jenkins Master is a stateful service, so you want to make sure that you can preserve that state, and that is making persistent storage the key problem.
So that is an example of why networking, security and persistent storage/data management are consistently appearing as top barriers to adoption.
Shimel: Yeah, they are.
Bhatnagar: Yeah. Our focus clearly is data management and persistent storage and we are passionate about it.
So let me switch from the problems to the kinds of solutions and approaches that are there.
So the first thing was the notion of—Docker has a notion of volume—and what ClusterHQ did working with the community is we made a contribution last year to open source where we brought the concept of volume drivers—and Flocker is the leading volume driver—and what it does is that traditionally when a container had a volume associated with it and that container died or that host died the container’s associated bits were lost.
The notion of volume driver and what it does is that in a shared-storage context if a container moves from host one to host two the volume that was associated—and consequently the data that was associated with it—gets unmounted and remounted so that the storage persists.
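The unmount-and-remount behavior described above can be illustrated with a toy model. This is only a sketch of the idea and not Flocker's or Docker's API: the names `SharedVolume` and `move_container` are hypothetical, and the "shared storage" is just an in-memory dictionary that lives independently of any host.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedVolume:
    """Hypothetical volume on shared storage; its data outlives any one host."""
    name: str
    data: Dict[str, str] = field(default_factory=dict)
    mounted_on: Optional[str] = None  # host that currently mounts the volume

    def mount(self, host: str) -> None:
        # A volume may be mounted on only one host at a time in this model.
        if self.mounted_on is not None:
            raise RuntimeError(
                f"{self.name} is already mounted on {self.mounted_on}"
            )
        self.mounted_on = host

    def unmount(self) -> None:
        self.mounted_on = None


def move_container(volume: SharedVolume, new_host: str) -> None:
    """Model a container moving hosts: unmount its volume, then remount it."""
    volume.unmount()
    volume.mount(new_host)


volume = SharedVolume("pgdata")
volume.mount("host-1")
volume.data["orders"] = "42 rows"   # written by the container on host-1
move_container(volume, "host-2")    # container is rescheduled onto host-2
print(volume.mounted_on, volume.data["orders"])
```

The point of the model is that the data belongs to the volume rather than to the host, so it is still there after the move.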
And the beauty of this solution is that Flocker integrates with Kubernetes, Mesos, Swarm or any of the orchestration frameworks as well as Docker Compose and Docker Registry in the north, and with shared storage in the south—and the shared storage can be a public cloud—as I mentioned, it could be AWS, GCE, or it could be OpenStack Cinder, or it could be a whole range of partnerships we have done across EMC, HP, NetApp, Pure Storage, Hedvig, et cetera.
So what this means is a customer can actually take any underlying storage infrastructure, any of the orchestration frameworks, and we can actually ensure that Flocker can provide persistent storage across these environments.
So that is the first step in the solution that we are bringing to the market.
The other piece which gets very interesting is—well, think about microservice architectures. In a 12-factor application development model microservices are being developed where you want to separate the stateful component from the stateless component, but guess what? During the development cycle, either as an individual developer or the QA or the CI/CD or staging, testing needs to be done against known test fixtures or state.
Well, what if we can actually allow those states to be captured? And instead of trying to re-create the test from an issue, or team one doing testing using one set of test fixtures and the other team something else, we allow that state, if you will, to be—or the data—to be managed in a manner similar to how the code is managed; so the notion of git for data becomes very powerful. And I now have GitHub for my code, I have Docker Registry for my Docker components, and you have a git for data—which is one of the capabilities we are building—where the state is captured.
Now suddenly a combination of code plus Docker Registry plus the data can allow the development teams or the CI/CD automated tools or staging environments to be able to actually move—to use the data which is appropriate for the development process.
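The "git for data" idea above, where a known test state is captured once and shared between teams, can be sketched as a tiny content-addressed snapshot store. This is an illustrative assumption only and not ClusterHQ's product: `DataRepo`, `commit` and `checkout` are hypothetical names, and the state is a small JSON-serializable dictionary rather than a real volume.

```python
import hashlib
import json
from typing import Dict


class DataRepo:
    """Hypothetical snapshot store: identical states get identical IDs."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, str] = {}

    def commit(self, state: dict) -> str:
        # Serialize deterministically so the hash depends only on content.
        blob = json.dumps(state, sort_keys=True)
        snapshot_id = hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]
        self._snapshots[snapshot_id] = blob
        return snapshot_id

    def checkout(self, snapshot_id: str) -> dict:
        # Each checkout returns a fresh copy decoded from the stored blob.
        return json.loads(self._snapshots[snapshot_id])


repo = DataRepo()
fixture = {"users": [{"id": 1, "name": "test-user"}], "orders": []}
fixture_id = repo.commit(fixture)

team_a = repo.checkout(fixture_id)    # team one tests against the fixture
team_b = repo.checkout(fixture_id)    # team two gets the same known state
team_a["orders"].append({"id": 100})  # a local change stays local
```

Because both teams check out the same snapshot ID, they test against the same data, and a change made by one team does not leak into the other's copy.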
So the notion of persistent storage on the Ops side and the notion of git for data and a concept of volume or data hub for the Dev community becomes a very powerful set of tools to then solve the problem of data management and persistent storage.
Shimel: Got it. So Mohit, as I warned you before we started recording our issue always comes down to one of time with these podcasts and that’s why I hesitated to jump in with a lot of my thoughts, only because I knew time was short—it always is.
But let me—we have a minute or two left on our time here and I want to try to pivot a little bit and say, okay, let’s say when we do the third iteration of this survey next year, what—and I realize it’s a little crystal balling—but what do you think the biggest—what’s going to jump out next year?
Bhatnagar: Yes, that’s a great question. Love it. So I think one thing that will happen, Alan, and I would love to hear your thoughts, is that the number of people who said that they are running containers in production today—76 percent—will continue to increase; that’s one thing that I expect to happen.
Secondly, this survey today said that the number of companies that are making financial investments was 52 percent—where the majority of them did that only in less than a year. I expect a lot more of those people will start investing in the container-based technology.
I think what I will expect—and this is going to be interesting—I do think that storage, security and networking will remain the top three areas. What is going to happen, however, is the problems that are being solved are going to be addressed.
So what I would expect in the container data management and persistent storage space is the survey to show that companies such as ClusterHQ, working with Docker or Kubernetes or Mesos and working with our storage partners indeed solve a large number of problems in persistent storage.
But people will then—obviously, as happens with technology—they’ll be expecting the next set of capabilities, such as, “Well, what does—in a container stack—what does workload movement mean? If I want to move my workload from data center one to data center two, how do I actually make that movement happen?” And it’s not just a movement of, “Oh, I can move the Docker Compose file.” Well, you need to take the elements which are with Compose and Swarm, but also the appropriate data.
Likewise, as we talk about the notion of git for data and volume hub, a more sophisticated set of capabilities will be needed so as to do automated integrations where a staging environment can instantly be duplicated for testing and staging purposes, and when those things are done it will automatically shut down.
Now think about the cost saving, flexibility, agility, the speed of development.
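The duplicate-then-shut-down staging flow described above maps naturally onto a context manager. The following is a minimal sketch under stated assumptions: `ephemeral_staging` and `running_envs` are hypothetical names, and "duplicating" an environment is modeled as deep-copying a dictionary of baseline data.

```python
import copy
from contextlib import contextmanager
from typing import Iterator, List

running_envs: List[str] = []  # names of environments that are currently up


@contextmanager
def ephemeral_staging(name: str, baseline: dict) -> Iterator[dict]:
    """Yield a private copy of the baseline data; shut down on exit."""
    running_envs.append(name)
    try:
        yield copy.deepcopy(baseline)
    finally:
        # Runs even if the tests raise, so the environment never lingers.
        running_envs.remove(name)


baseline = {"inventory": {"widget": 10}}
with ephemeral_staging("staging-test-1", baseline) as data:
    data["inventory"]["widget"] -= 3  # tests mutate only the copy
    during = list(running_envs)       # environment is up while tests run
```

After the `with` block ends, the environment is gone and the baseline data is untouched, which is where the cost saving would come from in a real system.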
So I expect the categories in terms of barriers will still remain—persistent storage data management, networking, security—the order may move in bits and pieces here and there, but what will really happen is that many of the kinds of problems that we are talking about today will get solved, and a new class of problems is what vendors like us will be solving.
Shimel: Got it. Well, Mohit unfortunately we’re over the 15 minutes—that always happens to us—but we’re going to have to call it a wrap on this episode of DevOps Chat. Maybe we can have you back on and we’ll talk more about—I’d love to talk more about the container market and the container space as a whole.
Bhatnagar: Would love to. It’s a highly dynamic space and let’s discuss thoughts and hear other people’s perspective.
Shimel: Absolutely. Look, we could do webinars and stuff with that. Well, we’ll discuss it offline, but for now we’re going to have to call it a wrap on this episode of DevOps Chat.
Mohit Bhatnagar, VP Products and Solutions at ClusterHQ, thanks for being our guest today.
Bhatnagar: Thank you Alan, my pleasure.
Shimel: Okay. Continued success to you and the ClusterHQ team. This is Alan Shimel of DevOps.com. Thanks and have a great day.