3 Pathways to Data Fabric Realization
For decades we have witnessed data being compared to commodities like oil, gold, and plastic. Additionally, the endless number of buzzwords that have come into use over the years has increased the complexity of data management. Despite our aversion to buzzwords, we acknowledge that the concept of data fabric is a game changer for modern data architectures. A modern data architecture is the backbone of a sustainable data-driven organization. Reusable data production facilities are essential, especially when it comes to data product management. Yet many CDOs and data leads struggle to define the critical technology capabilities needed to future-proof their data architecture. Data fabric has been proposed as a way to enable federated data governance. It favors federated computing resources over monolithic data lakes / warehouses / lakehouses.
Data-driven companies have deeply rooted the idea of data as a product in their organizational culture. Their goal is to democratize data and give power to business departments to govern their data and provide consumable data products to other units. This cultural change is further accelerated by self-service big data and analytics, as well as AI, since individuals now consume information and data whenever and however they want to. While data mesh describes the organizational perspective toward modern data architectures, a data fabric encompasses the technological aspects (see Power to the Business Users with Data Mesh).
What Is a Data Fabric?
A data fabric describes a modern data architecture with the required capabilities that encompass composable technologies and provide services across hybrid multi-cloud environments. In simple terms, a data fabric is a net that spans multiple data sources and applies machine learning to provide access and meaning to distributed data. While companies realize the challenges and shortcomings of monolithic data lakes / warehouses / lakehouses as a single source of truth, a data fabric enables the management of the data where it resides. The core engine of a data fabric is metadata. Gathering, analyzing, and enriching metadata, paired with the ability to automate these processes powered by machine learning, allows for the analysis of the underlying data without the need to move and transform it right away. A data fabric is not provided by a single vendor or solution; it is a composable, flexible, and scalable architecture.
Advantages of a Data Fabric
Organizations can benefit from data fabrics in multiple ways: greater efficiency, better scalability, easier integration, and more control and agility in their architectures. However, the ultimate goal of a data fabric is to maximize the value of data and accelerate digital transformation through data democratization. This means giving power to business departments to govern their data and provide consumable data products to other units. Before data fabric, the gap between data and business users as data consumers had been artificially closed by expert data teams. Without their intervention, business users were unable to use, understand, or apply data.
Pathways to Data Fabric Realization
There are three pathways to building a data fabric. The right pathway depends on the organization’s previous architecture decisions and use case. These factors determine the prioritization of capabilities. With a case-by-case approach, you ensure faster results and keep the motivation level of your team high. Ideally, you can start with the most advanced business department to serve as a leading example.
Path 1: Start with the basics
Don’t get overwhelmed by the complexity of building a data fabric. Lay the foundation by collecting the various kinds of metadata your systems already produce. Previously unused metadata (passive metadata) is activated by putting it into context. Metadata enrichment should follow the governance structures of your organization (domain-driven). This pathway is the best choice for companies that struggle to find, inventory, search, integrate, and deliver data from heterogeneous sources.
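To make the idea of activating passive metadata more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the `DatasetMetadata` record, the `DOMAIN_BY_SOURCE` mapping, and the column-name rule for tagging personal data are all hypothetical and stand in for whatever catalog and governance rules an organization actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DatasetMetadata:
    """Technical (passive) metadata plus the context added during enrichment."""
    name: str
    source: str                       # system the dataset lives in
    columns: Dict[str, str]           # column name -> data type
    domain: Optional[str] = None      # owning business domain (added on activation)
    tags: Set[str] = field(default_factory=set)


# Hypothetical governance mapping: which business domain owns which source system.
DOMAIN_BY_SOURCE = {"crm_db": "sales", "erp_db": "finance"}


def activate(meta: DatasetMetadata) -> DatasetMetadata:
    """Put passive metadata into context: assign a domain and add simple tags."""
    meta.domain = DOMAIN_BY_SOURCE.get(meta.source, "unassigned")
    for column in meta.columns:
        # Assumed naming rule: columns mentioning email or phone hold personal data.
        if "email" in column.lower() or "phone" in column.lower():
            meta.tags.add("pii")
    return meta


def search(catalog: List[DatasetMetadata], term: str) -> List[str]:
    """Find datasets whose name, domain, tags, or columns match the term."""
    term = term.lower()
    return [
        m.name
        for m in catalog
        if term in m.name.lower()
        or term in (m.domain or "")
        or term in m.tags
        or any(term in c.lower() for c in m.columns)
    ]
```

Once the records are enriched, a business user can search by domain or tag rather than by table name, which is the kind of findability this pathway aims for.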
Path 2: Explore new insights
With the ambition of investigating new unstructured or multi-structured data sources, organizations should build capabilities in machine learning to enrich their metadata semantics. This will allow them to explore data sets where the schema has not yet been assigned. Activated metadata knowledge graphs can present multi-relationship data and allow the enrichment of data models with semantics. The semantics layer of knowledge graphs adds additional context and meaning to the models.
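As a rough illustration of how a metadata knowledge graph can carry multi-relationship data and a semantic layer, the sketch below stores metadata as subject-predicate-object triples. The `MetadataGraph` class, the predicate names (`has_column`, `means`, `owned_by`), and the example concept are assumptions made for this example; a real data fabric would typically rely on a graph database and on machine learning to propose the `means` links.

```python
from collections import defaultdict
from typing import List, Optional


class MetadataGraph:
    """A tiny in-memory triple store: subject -> {(predicate, object)}."""

    def __init__(self) -> None:
        self._edges = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one relationship, e.g. a dataset having a column."""
        self._edges[subject].add((predicate, obj))

    def related(self, subject: str, predicate: Optional[str] = None) -> List[str]:
        """Objects linked to a subject, optionally filtered by relationship type."""
        return sorted(
            o
            for p, o in self._edges.get(subject, ())
            if predicate is None or p == predicate
        )

    def datasets_about(self, concept: str) -> List[str]:
        """Datasets with at least one column annotated as meaning the concept."""
        columns = {
            s for s, edges in self._edges.items() if ("means", concept) in edges
        }
        return sorted(
            {
                s
                for s, edges in self._edges.items()
                for p, o in edges
                if p == "has_column" and o in columns
            }
        )
```

Because the semantic link sits on the column rather than on a fixed schema, a structured CRM table and a loosely structured ticket feed can both be found under the same business concept.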
Path 3: Increase productivity
Organizations that struggle with excessive cost or productivity loss in their data management should start building automation capabilities. Pipeline preparation can be sped up by strengthening data integration with machine learning, recommendations, and self-service data. Abstraction layers allow business users to create data integrations themselves, supported by automated recommendations for the next-best integration jobs. These integrations can be ETL/ELT, data replication, data virtualization, or stream data integration.
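One simple way to picture a recommendation for the next-best integration job is to rank pairs of datasets by how often they are used together but are not yet integrated. The sketch below does this with plain co-occurrence counting, which is a stand-in for the machine learning a real platform would apply; the function name, the shape of the query log, and the `existing` set are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def recommend_integrations(
    query_log: Iterable[Set[str]],
    existing: Set[FrozenSet[str]],
    top_n: int = 3,
) -> List[Tuple[str, str]]:
    """Suggest dataset pairs that are often queried together but not yet integrated.

    query_log: one set of dataset names per observed query or report.
    existing:  pairs that already have an integration job.
    """
    counts: Counter = Counter()
    for datasets in query_log:
        # Sort so each pair is always counted under the same (a, b) ordering.
        for pair in combinations(sorted(datasets), 2):
            if frozenset(pair) not in existing:
                counts[pair] += 1
    # Most frequent first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:top_n]]
```

The output is only a ranked list of candidate pairs. Choosing whether a given pair is best served by ETL/ELT, replication, virtualization, or streaming remains a separate decision.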
The key to becoming a data-driven organization is to empower business users and thus democratize data. Other areas of IT management have adopted a more business-oriented approach for decades. For example, operating systems are aligned with business capabilities, thus easily justifying their need and existence. Similarly, efforts in agile IT management focus on products with the end user in mind. Data-driven organizations understand data as a product in their organizational culture. All of this has given rise to the emerging concept of a data fabric. Advancements in metadata management have further accelerated the transition toward modern architectures governed across distributed data sources.