
If Hadoop is leaving your data lake project all wet, you may be a good candidate for an emerging architectural concept called the big data fabric. As industry experts explain, big data fabrics bring a degree of cohesion and automation to the processes and tools that companies are adopting as they try to get value out of big data.

Forrester analyst Noel Yuhanna explained the genesis of big data fabrics in his recent report on the matter. According to Yuhanna, big data fabrics arose from the gap between the expectations companies have for big data technologies like Hadoop and the real-world challenges of working with those technologies.

Conceptually, a big data fabric is a way of architecting a disparate collection of data tools so that they address key pain points in big data projects in a cohesive, self-service manner. Specifically, data fabric solutions deliver capabilities in the areas of data access, discovery, transformation, integration, security, governance, lineage, and orchestration, according to Forrester.

Yuhanna writes: “The solution must be able to process and curate large amounts of structured, semi-structured, and unstructured data stored in big data platforms such as Apache Hadoop, MPP EDWs, NoSQL, Apache Spark, in-memory technologies, and other related commercial and open source platforms, including Apache projects. In addition, it must leverage big data technologies such as Spark, Hadoop, and in-memory as a compute and storage layer to assist the big data fabric with aggregation, transformation, and curation processing.”

Some data fabric vendors can tick off all of these capabilities in their products, while others tackle only a portion of the overall data fabric requirement. In any case, a common thread runs through all big data fabric solutions: they are working toward a cohesive vision of data accessibility while respecting the needs of automation, security, integration, and self-service.
Stitching Clouds into Fabric

The rise of cloud repositories also plays heavily into the emergence of big data fabrics, according to Ravi Shankar, chief marketing officer of Denodo Technologies, which was one of 11 big data fabric vendors profiled in Forrester’s recent report.

“The big data fabric is needed because, if you look at the underlying problem, before big data, data was divergent and located in many different systems,” Shankar tells Datanami. “Twenty years back, it was all on-premise. In the last 10 years, it has been evolving more to the cloud. Now it’s more into big data [platforms like Hadoop]. So data continues to be bifurcated across all these different points, and each of them adds some challenge.”
