If data sharing is to reach its full potential, the data sharing process and its supporting solutions must be clearly defined and commonly accepted.
Data sharing was recently identified as a top transformational trend by Gartner in their data and analytics programs and practices hype cycle. Gartner research conducted over the last two years shows data sharing can provide exceptional value to organizations and business leaders, especially CDOs, who embrace it as part of their data strategy.
As transformational as data sharing may be, it’s currently not well-defined. Gartner calls data sharing a “business KPI” that promotes data reuse – but provides no specifics on data sharing use cases. This means data sharing could encompass any situation where data moves between people, databases, or companies. And if it’s truly just a key performance indicator (KPI), then one could argue that data sharing doesn’t require the discipline needed to manage a new or novel business process – which surely cannot be true given the stated potential benefits.
This rather squishy definition of data sharing was highlighted in a recent blog, which illustrated that data sharing spans every potential form of data distribution or exchange, from for-profit data monetization to “data for good” and everything in between. “Sharing” can include sharing data internally or outside a company, so a traditional ETL process between two database tables could be considered “data sharing”. That is most certainly not novel, and it most certainly can’t represent the transformational nature of data sharing that Gartner touts in its recent hype cycle.
Problems with Poor Definitions
Many problems arise from poorly defined data sharing practices and solutions. First, and most importantly, if data experts can’t describe data sharing (or quantify it), it will be extremely difficult for companies to justify investments in the process and technologies. It would be like asking a company to invest in World Peace. On paper it may sound like a great idea, but if one does not know what it involves or how to define it, solutions will be impossible to specify, and proving results will be hopeless. Something that means everything ultimately means nothing.
A second problem with a poor data sharing definition is that it allows almost any software vendor to claim it sells a sharing solution. Data sharing, much like Master Data Management (MDM), is a software-enabled discipline. However, thanks to a horrible definition, no single solution yet exists in the market to support it. That’s not stopping several large data and analytics vendors from marketing data sharing as a key function of their solutions – many of which are in the cloud data warehouse/data lake markets. Creating a common data store in the cloud with an access/permissions layer on top is an extremely basic approach to sharing. It may provide some value by negating the need for complex integration processes, but simply having access to data is a LONG way from enabling a true business transformation.
This doesn’t mean that software doesn’t have a role to play here – it most certainly will. However, if the definition of data sharing remains nebulous, it’s impossible to drive clarity on how the software market can deliver a true data sharing solution. As the current definition suggests, any software that allows data to move from one entity to another could be considered a sharing solution, even email.
As murky as the data sharing landscape may be, its business benefits are very real – and have been for decades. A well-known example is the various data consortiums that facilitate global trade or credit (such as those behind product data standards, including UPC codes). However, if data sharing is to reach its full potential, the data sharing process and its supporting solutions must be clearly defined and commonly accepted.
Guiding Principles for Effective Data Sharing
Before providing specifics on a new definition, it’s important to highlight eight guiding principles that must be supported to describe what data sharing is (or isn’t):
- Organizations will share data with the expectation of receiving something of value in return. This means that donating data without an expectation of any value exchange or realization is not data sharing, but rather, “data for good”, or data charity.
- Data sharing requires that at least two parties are involved in contributing/sharing data, where both parties benefit from sharing. It also means that a one-way flow of data, from a single data producer to an individual consumer, is not data sharing. This flow from a data source to a destination, or from a producer to a consumer, is what it’s always been – a data transfer (or a data integration). Data sharing requires two datasets to be combined (physically or virtually), so that the combined data can generate additional insights and achievable value.
- Value from data sharing can be accelerated through network effects created by the combination of data and metadata from multiple contributors/sharers. The more participants in the network, the greater the potential network effects.
- At scale, data sharing creates data sharing ecosystems. These can enable additional benefits, surpassing the stated value from original expectations, and should be evaluated to revise the original approach.
- Data sharing is an IT-supported business discipline, and not simply a KPI or the “plumbing” between two data sets. Since data sharing is a discipline, that means there are proven ways to optimize the benefits of data sharing by improving the underlying business and technical processes supporting it.
- Data sharing results in the creation of a shared data asset. In other words, data sharing creates something new to the benefit of all contributors.
- If data sharing happens between / across corporate entities, legal agreements which define the ownership rights of the new shared asset should be required. These agreements should protect each party’s interests and may be brokered between individual participants, or through an intermediary.
- The value of data sharing is optimized when all parties participate in a shared data governance framework, which will operationalize any constraints of the legal agreement (including data access, usage limits, and redistribution), and enforce any data quality or governance policies. A lack of this framework will severely limit the ability for parties to realize value from sharing.
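The first two principles work as a simple classification rule, which can be sketched in a few lines of Python. This is only an illustrative sketch: the `Exchange` type, its field names, and the `classify` function are hypothetical names introduced here, not part of any product or standard, and the remaining principles (legal agreements, shared governance) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    """Hypothetical description of one data exchange (illustrative only)."""
    contributors: int        # number of parties contributing data
    value_expected: bool     # do contributors expect value in return?
    datasets_combined: bool  # are contributions combined into a shared asset?


def classify(exchange: Exchange) -> str:
    """Label an exchange using the first two guiding principles."""
    # Donating data with no expectation of value in return is "data for good".
    if not exchange.value_expected:
        return "data for good"
    # A one-way flow, or data that is never combined, is a data transfer.
    if exchange.contributors < 2 or not exchange.datasets_combined:
        return "data transfer"
    # Two or more contributors, a value exchange, and a combined asset.
    return "data sharing"


etl_job = Exchange(contributors=1, value_expected=True, datasets_combined=False)
consortium = Exchange(contributors=5, value_expected=True, datasets_combined=True)
print(classify(etl_job))     # data transfer
print(classify(consortium))  # data sharing
```

Under these assumptions, a traditional ETL job between two tables is labeled a data transfer, while a multi-party consortium that pools its data qualifies as data sharing.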
With these guiding principles in mind, the following definition for data sharing emerges:
If data sharing represents a transformational business force, then it must be defined in a way that represents something beyond the simple distribution or monetization of data, since companies have tried, and typically failed, for many years to realize transformational value through data marketplaces or other data monetization schemes [1]. One-way flows of data between tables, people, databases, or companies have been supported via legacy data pipelines for as long as computers have existed, and certainly do not represent anything worthy of the “hype” that data sharing has garnered.
Focusing a data sharing definition on an act of reciprocity for mutual benefit will enable all participants, from vendors to data leaders, to better understand what’s required to move beyond the hype and into true business value for this essential function.
1. The exception here is companies whose core business *is* data monetization.