Demystifying data fabrics – bridging the gap between data sources and workloads
The term “data fabric” is used across the tech industry, yet its definition and implementation can vary. I have seen this across vendors: in autumn last year, British Telecom (BT) talked about their data fabric at an analyst event; meanwhile, in storage, NetApp has been re-orienting its brand towards intelligent data infrastructure but previously used the term data fabric. Application platform vendor Appian has a data fabric product, and database provider MongoDB has also been talking about data fabrics and similar ideas.
At its core, a data fabric is a unified architecture that abstracts and integrates disparate data sources to create a seamless data layer. The principle is to create a unified, synchronized layer between disparate sources of data and the workloads that need access to it: your applications and, increasingly, your AI algorithms or learning engines.
There are plenty of reasons to want such an overlay. The data fabric acts as a generalized integration layer: it plugs into different data sources and adds capabilities that make access easier for applications, workloads, and models, such as exposing those sources through one layer while keeping them synchronized.
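To make the idea of an integration layer concrete, here is a minimal sketch in Python. It assumes a deliberately tiny model: each source is wrapped in a reader function, and the layer simply routes a request to whichever source owns the dataset. The names `DataFabric`, `register` and `read`, and the sample data, are hypothetical and not taken from any vendor's product.

```python
from typing import Callable, Dict, Optional

# Hypothetical names throughout: this is an illustration, not a product API.
Record = Dict[str, object]
Reader = Callable[[str], Optional[Record]]


class DataFabric:
    """A toy access layer that routes reads to the source owning a dataset."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}

    def register(self, dataset: str, reader: Reader) -> None:
        # Each source is plugged in behind a common "read by key" function.
        self._readers[dataset] = reader

    def read(self, dataset: str, key: str) -> Optional[Record]:
        if dataset not in self._readers:
            raise KeyError(f"no source registered for dataset {dataset!r}")
        return self._readers[dataset](key)


# Two stand-in sources with different shapes: a dict and a list of rows.
crm_store = {"c1": {"name": "Ada"}}
billing_rows = [{"id": "c1", "balance": 42}]


def read_billing(key: str) -> Optional[Record]:
    # Linear scan over rows; returns None when the key is absent.
    return next((row for row in billing_rows if row["id"] == key), None)


fabric = DataFabric()
fabric.register("customers", crm_store.get)
fabric.register("billing", read_billing)
```

A workload then calls `fabric.read("billing", "c1")` without knowing how billing data is stored. A real fabric would add synchronization, security and metadata on top; this sketch covers only the routing.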
So far, so good. The challenge, however, is that we have a gap between the principle of a data fabric and its actual implementation. People are using the term to represent different things. To return to our four examples:
- BT defines data fabric as a network-level overlay designed to optimize data transmission across long distances.
- NetApp’s interpretation (now under the banner of intelligent data infrastructure) emphasizes storage efficiency and centralized management.
- Appian positions its data fabric product as a tool for unifying data at the application layer, enabling faster development and customization of user-facing tools.
- MongoDB (like other structured data solution providers) considers data fabric principles in the context of data management infrastructure.
How do we cut through all of this? One answer is to accept that we can approach it from multiple angles. You can talk about data fabric conceptually—recognizing the need to bring together data sources—but without overreaching. You don’t need a universal “uber-fabric” that covers absolutely everything. Instead, focus on the specific data you need to manage.
If we rewind a couple of decades, we can see similarities with the principles of service-oriented architecture, which looked to decouple service provision from database systems. Back then, we discussed the difference between services, processes, and data. The same applies now: you can request a service or request data as a service, focusing on what’s needed for your workload. Create, read, update and delete remain the most straightforward of data services!
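As a rough illustration of data as a service, the four operations can be written as a small interface that any backing store could implement. This is a sketch under stated assumptions: `DataService` and `InMemoryService` are hypothetical names, and the in-memory dictionary stands in for whatever system actually holds the data.

```python
from typing import Dict, Optional, Protocol


class DataService(Protocol):
    """Hypothetical interface: the four basic data operations."""

    def create(self, key: str, value: dict) -> None: ...
    def read(self, key: str) -> Optional[dict]: ...
    def update(self, key: str, changes: dict) -> None: ...
    def delete(self, key: str) -> None: ...


class InMemoryService:
    """Stand-in implementation backed by a plain dictionary."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def create(self, key: str, value: dict) -> None:
        if key in self._rows:
            raise KeyError(f"{key!r} already exists")
        self._rows[key] = dict(value)  # copy so callers cannot mutate storage

    def read(self, key: str) -> Optional[dict]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def update(self, key: str, changes: dict) -> None:
        self._rows[key].update(changes)  # raises KeyError if the key is missing

    def delete(self, key: str) -> None:
        self._rows.pop(key, None)  # deleting a missing key is a no-op
```

The workload depends only on the interface, so the store behind it can change without the caller noticing, which is the decoupling that service-oriented architecture was after.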
I am also reminded of the origins of network acceleration, which would use caching to speed up data transfers by holding versions of data locally rather than repeatedly accessing the source. Akamai built its business on how to transfer unstructured content like music and films efficiently and over long distances.
That’s not to suggest data fabrics are reinventing the wheel. We are in a different (cloud-based) world technologically; plus, they bring new aspects, not least around metadata management, lineage tracking, compliance and security features. These are especially critical for AI workloads, where data governance, quality and provenance directly impact model performance and trustworthiness.
If you are considering deploying a data fabric, the best starting point is to think about what you want the data for. Not only will this help orient you towards what kind of data fabric might be the most appropriate, but this approach also helps avoid the trap of trying to manage all the data in the world. Instead, you can prioritize the most valuable subset of data and consider what level of data fabric works best for your needs:
- Network level: To integrate data across multi-cloud, on-premises, and edge environments.
- Infrastructure level: If your data is centralized with one storage vendor, focus on the storage layer to serve coherent data pools.
- Application level: To pull together disparate datasets for specific applications or platforms.
For example, in BT’s case, they’ve found internal value in using their data fabric to consolidate data from multiple sources. This reduces duplication and helps streamline operations, making data management more efficient. It’s clearly a useful tool for consolidating silos and improving application rationalization.
In the end, data fabric isn’t a monolithic, one-size-fits-all solution. It’s a strategic conceptual layer, backed up by products and features, that you can apply where it makes the most sense to add flexibility and improve data delivery. Deploying a data fabric isn’t a “set it and forget it” exercise: it requires ongoing effort to scope, deploy, and maintain, covering not only the software itself but also the configuration and integration of data sources.
While a data fabric can exist conceptually in multiple places, it’s important not to duplicate delivery efforts unnecessarily. So, whether you’re pulling data together across the network, within infrastructure, or at the application level, the principles remain the same: use it where it’s most appropriate for your needs, and enable it to evolve with the data it serves.
The post Demystifying data fabrics – bridging the gap between data sources and workloads appeared first on Gigaom.