Effective data-focused self-service needs close collaboration between IT, business, and data management
Although much has been written about self-service capabilities, most of the content focuses on the positive or potential benefits. Much misunderstanding remains about the concept of self-service, especially for data, and IT often ignores (or is unaware of) the users’ experience and needs, as well as the need for consistent data management practices.
What is data democratization?
The simple answer to that question is: “bringing data to users”. It means providing any data to any user, or perhaps all data to all users. Going one step further, democratization means enabling users to seamlessly access the data they need, without major bottlenecks and in a timely fashion, and to get real value from it by applying the necessary analytics.
While doing this, there is also the need to establish the right controls to ensure data security, to provide robust data catalogs that help users navigate and find the data they are looking for, and to guarantee a certain level of data quality. These points may seem easy (or at least not too complicated), but the additional questions to ask are: Why does this strategy fail repeatedly? Why is there an illusion of self-service data management that never becomes reality?
Failed experiences gathered from customers
There are three major topics that usually appear when discussing self-service data with organizations.
Wrong paradigm from IT. Too many times IT does things “just because”. It is not unusual to hear statements like “Self-service access to data is what every organization is doing, so we need to do it to be perceived as a modern IT organization”. Meanwhile, users are still struggling to get the information they need.
Many IT organizations don’t understand that for users to be served effectively, IT must change its processes to a collaborative approach where users can “do what they need, while IT cares for their interests and needs”. Instead, most IT organizations let users “do what they want, because IT doesn’t care”. For example, IT brings in a (shiny, new) data catalog tool to enable searching, but provides no control or data governance. Sometimes IT simply lets users bring their data to a production box for their own analysis without further guidance. This may look like self-service, but it is probably the opposite if users can’t get value from it or leverage certified, governed information.
These situations happen when IT is too focused on tools and technology instead of being accountable for data strategy and data architecture. Tools are useful in helping to narrow the data accessibility gap but are useless if there’s no serious practice for governing and managing the data proactively. Without a formal data strategy that includes guidelines for product selection and implementation, the organization must “reinvent the wheel” each time the user needs to create a new analysis. Separation of production and discovery data is also key to success as this enables a faster way of developing insights that can be operationalized only after full value realization is confirmed.
No data governance. Having a catalog or data dictionary tool is different from having a data governance practice in place. This commonly seen issue is tightly related to the previous one, and the two together can lead to massive failure. What value can anyone get from the data if there is no assurance of what it means or contains? Worse, having the same KPI built from different sources of information is an issue that should not exist in this era, yet it is still easily found.
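The “same KPI, different answers” problem can be made concrete with a small sketch. This is a hypothetical illustration (the field names and business rule are invented, not from the article): two teams compute “revenue” from the same raw orders but with different, undocumented rules, and a single governed definition resolves the conflict.

```python
# Hypothetical example: two teams derive the "revenue" KPI from the
# same raw orders, each with its own undocumented rule.
orders = [
    {"amount": 100, "status": "paid"},
    {"amount": 50, "status": "refunded"},
    {"amount": 200, "status": "paid"},
]

# Team A counts every order; Team B excludes refunded orders.
revenue_team_a = sum(o["amount"] for o in orders)
revenue_team_b = sum(o["amount"] for o in orders if o["status"] == "paid")
# The same KPI now has two values (350 vs. 300), and neither team knows why.

def governed_revenue(orders):
    """Certified KPI definition: paid orders only (per the business glossary)."""
    return sum(o["amount"] for o in orders if o["status"] == "paid")
```

The point is not the arithmetic but the ownership: once one definition is documented and certified, every report traces back to it instead of to a team’s private query.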
The key foundation for enabling self-service is having curated and properly governed data that is known and that can be trusted. No user wants to have information that has no value, no matter how easy it was to access it. A worse scenario occurs when the users don’t know where to find the data and struggle for days trying to reach the repository. Eventually, they discover there’s no guarantee that the repository contains the data needed because there’s no definition, or other metadata to support understanding what is presented. Metadata is key to success, and metadata can (and should) be hosted in a tool. However, the practice of getting and supporting metadata management goes beyond any tool and is the basis for data governance.
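What “metadata to support understanding” might look like can be sketched as a minimal catalog entry. The fields below are illustrative assumptions, not a standard schema; the idea is simply that a user should be able to judge, from the entry alone, whether a data set is trustworthy and fit for purpose.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the metadata a catalog entry should carry.
# Field names are illustrative, not from any particular catalog tool.
@dataclass
class CatalogEntry:
    name: str                 # business-friendly data set name
    definition: str           # what the data actually means
    owner: str                # accountable data steward
    source_system: str        # where the data originates
    refresh_cadence: str      # how current the data is
    quality_checks: list = field(default_factory=list)  # validations applied
    certified: bool = False   # governed / trusted flag

entry = CatalogEntry(
    name="customer_orders",
    definition="One row per confirmed order, net of cancellations.",
    owner="sales-data-stewards",
    source_system="order-management",
    refresh_cadence="daily",
    quality_checks=["no_null_order_id", "amount_non_negative"],
    certified=True,
)
```

Any tool can host entries like this; the governance practice is what keeps the definition, owner, and checks accurate over time.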
Lack of efficiency and waste of money. When there is no data governance and no data integration, the same work is redone each time a user needs it. Loading new data, building a data set, copying rows from one place to another, or even keeping the same data duplicated across multiple repositories are common problems without effective data management. Large sums of money are wasted each year for lack of an integrated solution that can accommodate the needs of different lines of business. Having a practice, supported by tools, that can federate or virtualize data is useful although not sufficient. Some organizations may want to integrate data and put it into action by enabling different views. Cross-domain and cross-platform integration help build a more efficient ecosystem by allowing information to be reused. Self-service, supported by enterprise data management, is key to increasing value and efficiency, not the other way around.
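The “different views over one integrated data set” idea can be sketched in a few lines. This is a simplified assumption of how virtualization behaves, not a real federation product: instead of each line of business copying the data, a shared data set is exposed through lightweight, filtered, column-pruned views.

```python
# Hypothetical sketch: one integrated customer data set, reused through
# views rather than copied into per-department repositories.
customers = [
    {"id": 1, "region": "EMEA", "segment": "enterprise", "email": "a@x.com"},
    {"id": 2, "region": "AMER", "segment": "smb", "email": "b@y.com"},
]

def view(dataset, predicate, columns):
    """Return a filtered, column-pruned view over the shared data set."""
    return [{c: row[c] for c in columns} for row in dataset if predicate(row)]

# Marketing sees contact data for its region; finance sees segments only.
marketing_emea = view(customers, lambda r: r["region"] == "EMEA", ["id", "email"])
finance_segments = view(customers, lambda r: True, ["id", "segment"])
```

Each consumer gets exactly the slice it needs, while the underlying data is loaded, governed, and paid for once.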
What should be done?
Having the ability to load some tables in a repository connected to a catalog tool and a dashboarding tool is clearly not all there is to self-service. Both IT and business units must collaborate to prioritize which information is key to enable the company analytics, bringing the appropriate data to a centralized repository where it can be curated and easily accessed while being governed. Approaches like Data Mesh are getting more attention because they can get the data quickly into action but should be interpreted based on the enterprise’s data strategy and data governance program.
Users may find it easier to fulfill their analytics needs if they do not need to recreate the same process to access and understand the data. In addition, effectively governed processes can allow experimentation on the data, built on the foundation already in place. Such a balance is needed. It doesn’t mean that it is right to spend months or years trying to create the perfect data warehouse without tangible value. It is also wrong to have only sparse data marts (or a data mesh) in a chaotic, ungoverned environment, or a data lake rapidly turning into a swamp. IT should leverage tools and practices to allow a better user experience for discovery analytics based on both centralized and decentralized data, but everything comes back to a good data strategy. The key to self-service data is not the tools or technologies. It’s the process that leads to quality information that is simple to access, analyze, and get value from.