Big Data 2018: Cloud storage becomes the de facto data lake
As we bat cleanup with our 2018 predictions for big data, we’re going to pick up where Big on Data bros Andrew Brust and George Anadiotis have left off. Yes, it’s getting harder and harder to stay oblivious to the impact of AI, with implications from the geopolitical to the mundane and the positively creepy. It’s getting harder to miss the growing impact of IoT on everything from our homes to the way hospitals deliver care, autonomous cars are driven, factories are run, and smart cities are managed. And the arrival of GDPR, which will start taking effect in 2018, is forcing organizations to confront the privacy and national sovereignty implications of the data sitting in everything from transaction databases to data lakes and cloud storage. But beneath the surface, we’re seeing the beginnings of tectonic shifts in how enterprises manage their cloud, streaming analytics, and data lake strategies.
For our look ahead, we’re focusing on how the data is being managed. Rewind the tape to a year ago and we stated that “increasingly, Big Data, whether from IoT or more traditional sources, is going to live and be processed in the cloud.” Last year, we forecast that 35-40 percent of new big data workloads would be deployed in the cloud, and that by year end 2018, new deployments would pass the 50 percent threshold.
Our predictions weren’t far off the mark; Ovum’s latest global survey for all big data workloads shows that 27.5 percent of them are already deployed in the cloud. And according to Ovum research, big data is hardly an outlier for enterprise cloud adoption, which ranges from 26-30 percent across different workloads.
By inertia, most organizations have ended up with the same polyglot environments in the cloud that characterize their data centers. Most organizations use more than one cloud provider, just like on premises where they often have one of everything. Like history repeating itself, this is the consequence of a combination of top-down policies mandating a corporate standard, and departmental decisions made for expedience.
So, just as your organization might have SAP for its accounts payable, different segments might have Workday for HR or Salesforce for CRM. Or maybe they have multiple ERP systems that have not yet been converged as the legacy of M&A. In the cloud, your corporate email system might be on Office 365 while departmental IT groups use AWS for DevTest, and corporate marketing uses Google Analytics.
In 2018, we expect the early majority to start formalizing multi-cloud strategies as cloud evolves from a target for running standalone workloads to a home for enterprise-critical applications. So, just as we saw cloud deployment as the sleeper issue for big data in 2017, multi-cloud will become the looming issue for 2018. That’s the back story for why Oracle doubled prices for running its database on Amazon’s RDS service and why the Aurora OLTP database is now Amazon’s fastest growing service (succeeding Redshift before it).
More than a reactive decision about the fears of cloud vendor lock-in, multi-cloud decisions will be about platform choices. When you decide to run an Oracle database or Hadoop cluster on EC2, that is a tactical choice that can be revisited should Azure or Google Cloud change their pricing.
When you choose Aurora, Cosmos DB, Google BigQuery, Oracle Autonomous database 18c, or the IBM Analytics system on the IBM cloud, you are not just choosing your cloud, but your data platform. You are choosing whether the value-add of running a data platform that is native to a specific cloud outweighs concerns over relying on a specific cloud provider. It’s like making your Oracle or SQL Server platform decision all over again.
And that’s why Amazon and Microsoft are offering database migration services almost as freebies. They want your enterprise database. It’s also why we expect Google Cloud, Oracle, and IBM to actively promote loss-leader database migration offerings in the coming year, and why more enterprises will move to the front burner the issue of how many eggs to put in each cloud basket.
Multi-cloud strategies will also figure heavily in how organizations manage the reality of hybrid cloud. Just as few organizations of any size are likely to rely on a single cloud provider, few organizations (apart from startups) are likely to go 100 percent cloud. The transparency of maintaining sensitive customer records on premises, either by design or because of data sovereignty issues, while running analytics in the cloud will become a major factor in cloud platform selection.