What is a Data Product?

To unlock the full potential of data, organizations are looking to apply product management practices to make their data assets consumable. These organizations aim to increase the utilization of high-quality (trusted) data sets and the methods of analyzing them. The idea behind managing data as a product is to generate more value from data while ensuring a high level of sustainability through efficient and effective product design and the reuse of production facilities. This entails the development of more compelling data products with reusable patterns. Organizations typically apply either a gradual or a big-bang data strategy. In a gradual approach, individual teams build up their data and technology in isolation, which results in duplicate effort. In a big-bang approach, the data and technology architecture is built more broadly but is not aligned to the needs of specific use cases. Applying product thinking can help create reusable data assets while building sustainable production facilities.

Data Product Definition

A data product is a reusable data asset that makes a trusted data set or AI and analytics method accessible to any authorized data consumer. A data product comprises one or more digital assets or services that support transactions between data product owners and data consumers, as well as the ongoing consumption of the assets. The transactions are controlled and scalable. Data products vary in the combination of assets they contain and in the digital format they use. They can be static or updating, and of any size or volume. Some data products incorporate AI and analytics while others do not; for this reason, some organizations use two terms: data products (data sets suitable for reuse) and analytics products (which incorporate analytics or AI methods to analyze the data). Our own definition of data products includes both data and analytics/AI, but as long as an organization is clear on its terminology there should be no confusion.

Examples

Some examples of data products are data sets (tables, columns, views), reports, dashboards, data streams, data feeds, and APIs. As noted above, data products may include code or data models, or AI or analytics models that can be embedded into consumers’ workflows.  

Benefits of Data Products

The aim of a data product is to reduce the time to value and cost of ownership for the data consumer, while providing the data product owner with control, auditability, and ease of receiving feedback. Organizations involved in data product management are able to build high-quality, democratized data assets, which results in improved efficiency and fosters collaboration. Teams that use data products spend less time searching for data, ensuring data quality, building new data pipelines, and making decisions. These efficiencies become significant when added up across an organization’s data ecosystem and life cycle. Additionally, data products speed up time to insight because they can be reused and repurposed. The overall effect is to increase trust in an organization’s data.

Data Product Characteristics

A recent Forbes article by Sanjeev Mohan adroitly defined five characteristics of data products, which are summarized below.

1. Discoverable

One goal of data products should be reusability. For example, if an organization has invested to develop a cross-functional customer-360 data product, then it should be leveraged by various departments. For this to happen, products need to be stored in a registry with adequate metadata description so that users can easily search.

Data catalogs have been used to link technical and business metadata while providing capabilities like lineage and integration with data quality, security and BI tools. As data catalogs are a single pane-of-glass to discover data, they should also be extended to include data products.
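As a rough illustration of what "stored in a registry with adequate metadata" can mean in practice, the following minimal Python sketch keeps catalog entries in memory and searches them by keyword. The class names, fields, and sample products are assumptions made for this example; they do not describe any particular data catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Catalog entry for one data product (all fields are illustrative)."""
    name: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)


class Registry:
    """In-memory stand-in for a data catalog (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, term: str) -> list[str]:
        """Return sorted names of products whose name, description, or tags contain the term."""
        needle = term.lower()
        hits = []
        for p in self._products.values():
            haystack = [p.name, p.description, *p.tags]
            if any(needle in text.lower() for text in haystack):
                hits.append(p.name)
        return sorted(hits)


registry = Registry()
registry.register(DataProduct(
    "customer_360", "crm-team", "Cross-functional customer view", {"customer", "sales"}))
registry.register(DataProduct(
    "branch_sales_daily", "finance-team", "Daily revenue per branch", {"sales", "branch"}))

print(registry.search("customer"))  # ['customer_360']
print(registry.search("sales"))     # ['branch_sales_daily', 'customer_360']
```

The point of the sketch is that discoverability depends on the metadata: a product without a description or tags would only be found by its exact name.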

2. Quality

There is no bigger kiss of death to the adoption of data products than the loss of trust in the information’s veracity. As a data product collates data from various sources to provide a value-add, domain-driven decentralized data quality rises as a key data product consideration.

The data team must invest in modern data quality approaches to detect and fix anomalies before productionalizing data products. Data quality should be treated as a business initiative with its primary focus on context, instead of technical dimensions.
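One way to picture "detect anomalies before productionalizing" is a quality gate that a data set must pass before it is published. The Python sketch below checks two common dimensions, completeness and uniqueness, against a threshold that would come from the business use case. The function names, the sample rows, and the 0.9 threshold are assumptions chosen for illustration.

```python
def completeness(rows, column):
    """Share of rows that have a non-empty value in `column`."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def duplicate_keys(rows, key):
    """Return the sorted key values that occur more than once."""
    seen, dupes = set(), set()
    for r in rows:
        k = r.get(key)
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return sorted(dupes)


def passes_quality_gate(rows, key, required_column, min_completeness):
    """Publish only if the key is unique and the required column is complete enough."""
    return (not duplicate_keys(rows, key)
            and completeness(rows, required_column) >= min_completeness)


# Hypothetical sample: a shipping use case needs a postal address on (almost) every row.
customers = [
    {"customer_id": 1, "postal_address": "1 Main St"},
    {"customer_id": 2, "postal_address": ""},
    {"customer_id": 2, "postal_address": "5 Oak Ave"},
    {"customer_id": 3, "postal_address": None},
]

print(completeness(customers, "postal_address"))   # 0.5
print(duplicate_keys(customers, "customer_id"))    # [2]
print(passes_quality_gate(customers, "customer_id", "postal_address", 0.9))  # False
```

The threshold is passed in as a parameter on purpose: the same data set might pass for one use case and fail for another, which is what a context-first view of data quality implies.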

3. Secure

Self-service analytics adoption requires security across two dimensions: granting dynamic access and authorization only to the right people, and ensuring adherence to data privacy regulations, such as HIPAA and GDPR, for sensitive data and personally identifiable information (PII).

The principles Mohan described in a previous data security modernization article also apply to data products. Data security products control access and allow different consumers to see different results from the same data product because they enforce specific security policies to protect sensitive data and meet data sovereignty laws.
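The idea that different consumers see different results from the same data product can be sketched as policy-based masking. In the Python example below, the roles, the policy table, and the column names are hypothetical; a real deployment would rely on the access controls of its data platform or security product rather than on application code like this.

```python
# Hypothetical policy table: which columns are masked for which role.
POLICIES = {
    "analyst": {"mask": {"email", "ssn"}},
    "compliance_officer": {"mask": set()},
}

ROWS = [
    {"customer_id": 1, "email": "ann@example.com", "ssn": "123-45-6789", "country": "DE"},
    {"customer_id": 2, "email": "bob@example.com", "ssn": "987-65-4321", "country": "US"},
]


def read_product(rows, role):
    """Return the rows as the given role is allowed to see them."""
    policy = POLICIES.get(role)
    if policy is None:
        # Roles without a policy get no access at all.
        raise PermissionError(f"role {role!r} is not authorized")
    masked = policy["mask"]
    return [
        {col: ("***" if col in masked else value) for col, value in row.items()}
        for row in rows
    ]


print(read_product(ROWS, "analyst")[0])
# {'customer_id': 1, 'email': '***', 'ssn': '***', 'country': 'DE'}
print(read_product(ROWS, "compliance_officer")[0]["email"])
# ann@example.com
```

Both roles query the same product; only the policy applied at read time differs, so there is no need to maintain a separate copy of the data per audience.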

4. Observability

Unlike software applications, data constantly changes. These changes emanate, without warning, from the various sources and SaaS applications used to build the data products. These “anomalies” may pertain to changes in schema, late or out-of-order data, or data entry errors. In addition, breakdowns in the pipelines and infrastructure may cause some tasks to fail and go undetected for a long time.

As a result, it can be helpful to invest in data observability tools. Their capabilities can include automated and proactive discovery of anomalies, root cause analysis, monitoring, notifications and recommendations to fix anomalies. The end result is higher reliability of data products and expedited remediation of errors.
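Two of the anomalies mentioned above, schema changes and late-arriving data, can be illustrated with a few lines of Python. This is a simplified sketch, not how any specific observability tool works; the expected schema, the one-hour delay threshold, and the function names are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical contract for one record of an "orders" data product.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "event_time": datetime}


def schema_drift(record, expected=EXPECTED_SCHEMA):
    """List the ways a record deviates from the expected schema."""
    issues = []
    for column, expected_type in expected.items():
        if column not in record:
            issues.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            issues.append(f"type changed: {column}")
    for column in record.keys() - expected.keys():
        issues.append(f"unexpected column: {column}")
    return sorted(issues)


def is_late(event_time, received_at, max_delay=timedelta(hours=1)):
    """Flag data that arrived more than `max_delay` after the event occurred."""
    return received_at - event_time > max_delay


# An upstream source silently changed the ID format and added a column.
record = {"order_id": "A-17", "amount": 9.5, "currency": "EUR"}
print(schema_drift(record))
# ['missing column: event_time', 'type changed: order_id', 'unexpected column: currency']

print(is_late(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 10, 0)))  # True
```

Checks like these are only the detection step; the root cause analysis, notifications, and recommendations described above would build on their output.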

5. Operations

Good data skills are hard to find and architectures are becoming ever more complex. Mature organizations should adopt a factory-style assembly line for building and deploying data products to increase agility of decision-making.

DataOps has evolved as the necessary capability to deliver efficient, agile data engineering. Its many features include automation, low/no-code development, continuous integration, testing and deployment. The end goal of DataOps tools should be to speed up development of reliable data products.

Recommendations for Data Product Management

1. Develop empathy for your data customer – Understanding customer needs is essential for introducing compelling data products. Data must be fit for purpose and meet quality constraints to realize the intended use cases. The business context and the needs of the customer (the data user) must be well understood to derive quality requirements. Dimensions of data quality are accuracy, completeness, timeliness, consistency, integrity, reliability, uniqueness, and accessibility (see What is Data Quality and Why is it Important for Business?). For example, a data set may meet the requirements of a shipping use case because it contains the required location information in the form of a postal address, yet be unsuitable for use cases that require more precise information in the form of the customer's geographic position. Developing empathy for the data user and analyzing the use case makes it possible to define fitness for an intended purpose.

2. Allow data product customization – Empowering customers to give the final product their own flavor is common for products such as cars and is a growing trend for sneakers. Similarly, data consumers should have flexibility in the design of the final product so that the data fits their specific needs. For example, a data set should be applicable to multiple use cases and therefore compatible with a variety of end systems, such as business applications, advanced analytics, reporting, or external sharing. Additionally, the final analysis of the data set may vary based on business context and use case.

3. Design sustainable data products – Sustainability is more relevant than ever, and it should be for data products too. Each product has a design that determines its functionality, performance, and cost. For data products, the production and maintenance efforts are decisive for efficiency and effectiveness. For example, a highly customized data set that fits only one use case while requiring high maintenance (with cost exceeding the value of the output) would need a redesign. Levers could be to make the data product applicable to more use cases and end systems (see recommendation 2) or to find efficiencies in the data production and maintenance process.

4. Leverage data product families – Successful fast-moving consumer goods companies demonstrate the value of product variations, e.g., if Coke doesn’t serve your need, take Sprite, Fanta, or Mezzo Mix. Similarly, data products should evolve into product families. If there is demand for customer sales data, a variety of sub data sets, reports, and insights could be offered under this product segment. Other product families could be created around employees, product lines/service lines, branches, and vendors, to name a few.

5. Reuse production facilities and processes – Like other products, data products can evolve into product families, and with that they offer a wealth of synergies. Facilities and processes that have been established to develop one data product can be reused to produce the entire product family. Finding the right level of abstraction from specific products, so as to create a tool set of composable units, is essential. For example, while the processes and facilities for capturing, ingesting, and cleansing data are the same for a product family, the retrieval, distribution, and presentation may vary for each specific product. Analytics models, once developed, can be adjusted with minor effort to cover additional use cases.

6. Manage the data productization process – To ensure high-quality products, it is important to have a clearly defined production process with dedicated roles. The data product manager is an emerging role in the data and analytics domain. Data product managers need the ability to manage cross-functional teams for the development and deployment of data products. They need technical skills to design the data production process and business skills to communicate effectively with business leaders. A recent Harvard Business Review article addresses this topic. Beyond dedicated roles, the process requires funding, best practices, performance tracking, and quality assurance.

7. Continuously enhance data products – Do not ignore the fact that data products have an entire life cycle too. Like other products, data products undergo several life cycle stages (introduction, growth, maturity, decline). Constantly enhance the data product according to its stage and deploy regular updates. For digital products, agile methodologies emphasize the importance of rapid development of first (minimum viable) products to test their acceptance by customers and to constantly enhance with new versions. These learnings should be applied to data products as well. 
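Recommendations 4 and 5 describe a product family that shares its production steps and differs only in presentation. The Python sketch below illustrates that split with composable functions. The input format ("customer,amount" lines), the function names, and the two sample products are assumptions made for the example, not a prescribed architecture.

```python
from collections import defaultdict


def ingest(raw_lines):
    """Shared step: parse 'customer,amount' lines into records."""
    records = []
    for line in raw_lines:
        customer, _, amount = line.partition(",")
        records.append({"customer": customer.strip(), "amount": amount.strip()})
    return records


def cleanse(records):
    """Shared step: drop records with a missing customer or a non-numeric amount."""
    clean = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        if r["customer"]:
            clean.append({"customer": r["customer"], "amount": amount})
    return clean


def sales_by_customer(records):
    """Product-specific step: total sales per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def total_sales(records):
    """Product-specific step: a single headline figure."""
    return sum(r["amount"] for r in records)


def build(raw_lines, present):
    """Run the shared steps, then the presentation step of one family member."""
    return present(cleanse(ingest(raw_lines)))


raw = ["acme, 100.0", "globex, 40.5", "acme, 25", ", 10", "initech, n/a"]
print(build(raw, sales_by_customer))  # {'acme': 125.0, 'globex': 40.5}
print(build(raw, total_sales))        # 165.5
```

Adding another member to the family means writing one more presentation function; the ingestion and cleansing steps stay untouched, which is where the synergies come from.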

Fostering Data Product Collaboration

The purpose of managing data as a product is to connect data providers and consumers by making data findable, understandable, and accessible. First and foremost, the ambition should be to foster collaboration and establish a data-driven culture. A data product collaboration platform like Assefy cultivates data as an asset, enables data collaboration, and makes data products findable, accessible, and understandable. The core components of a data product collaboration platform include a data inventory for data governance and metadata management, as well as an access layer in the form of a storefront that exposes data products and connects people.
