5 Best Practices for Managing Reference Data

Article Summary: Effective management of reference data is crucial for ensuring operational efficiency and enhancing outcomes in business intelligence, data analytics, and AI initiatives. The article outlines five best practices for managing reference data, including formalizing reference data management, subscribing to external reference data, governing internal reference data, managing it at the enterprise level, and versioning reference data to maintain accuracy and relevance across the organization.

Last Updated: September 22, 2024

Poorly managed reference data can have a profound impact on operations and on the outcomes of business intelligence, data analytics, machine learning (ML), and artificial intelligence (AI) initiatives. In short, reference data needs to be managed because a lot is at stake and the organization depends on it. Here are 5 best practices for managing reference data:

1. Formalize reference data management (RDM)

Reference data often goes unmaintained when accountability and ownership have not been assigned. Usually the IT team performs the initial load into the application or central repository (for example, during a one-time data quality effort, the adoption of a new application, or a data integration project), but then does not keep it up to date. Business users lack the time, the resources, and sometimes the understanding needed to track the required changes and apply them. As a result, reference data is usually managed at the system or report level rather than enterprise-wide.

This challenge can be addressed in multiple ways. The best-case scenario is to create an Enterprise Reference Data Unit (also called reference data working group, reference data owners, reference data stewardship team, reference data committee or council, etc.). This central group oversees reference data management across the enterprise, supporting the business needs accordingly. Another way is to ensure that reference data management is in scope of data governance or that the RDM responsibilities are included in data stewards’ job descriptions.

If reference data management is not included in the data governance scope, make sure it is approved as an ongoing program and not as a one-time project.

2. Subscribe to external reference data

As a default, it is recommended to first consult reference data provided by third-party standards authorities, such as ISO, SWIFT, ACORD, ICD, and so on. Defaulting to these standards saves a lot of internal time, apart from the resources required to discover, understand, adopt, and sometimes purchase the data. Organizations can also be confident that they are adopting standards from a reliable source.

Adopting this data does not by itself provide a subscription to updates, so it is important to make sure that any changes made by the third-party authority are quickly detected and integrated into the organization's master tables. For example, country codes change on average 3 times per year. Sometimes a paid subscription is available that provides instant notification or integration with the technical environment via an API. Remember that the ramifications of updating this data don't stop at the table level. They carry over into documentation and training, metadata and definitions, and anything hard-coded, such as forms, reports, and dashboards.
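Detecting what changed between the organization's master table and a newly published external list can be sketched as a simple comparison. The following is a minimal illustration only: the function name `diff_code_lists`, the `CodeListDiff` structure, and the sample codes are hypothetical and are not taken from any standard or vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeListDiff:
    """Differences between the local master table and a published list."""
    added: dict[str, str]                 # codes present only in the published list
    removed: dict[str, str]               # codes present only in the master table
    renamed: dict[str, tuple[str, str]]   # code -> (old name, new name)


def diff_code_lists(current: dict[str, str], published: dict[str, str]) -> CodeListDiff:
    """Compare the organization's master table with a newly published list."""
    added = {code: name for code, name in published.items() if code not in current}
    removed = {code: name for code, name in current.items() if code not in published}
    renamed = {
        code: (current[code], published[code])
        for code in current.keys() & published.keys()
        if current[code] != published[code]
    }
    return CodeListDiff(added, removed, renamed)


# Illustrative data only, not an extract of any real standard.
master = {"AA": "Alpha", "BB": "Beta", "CC": "Gamma"}
latest = {"AA": "Alpha", "BB": "Beta Republic", "DD": "Delta"}

diff = diff_code_lists(master, latest)
print("added:", diff.added)
print("removed:", diff.removed)
print("renamed:", diff.renamed)
```

A report like this could be reviewed by the responsible data steward before the master table is updated, and it also gives a starting list of the forms, reports, and documentation that may reference the changed codes.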

3. Govern internal reference data

Internal reference data are completely specific to the enterprise. In contrast to external reference data, the team responsible for internal reference data must focus on developing good reference data itself. Managing this data requires a defined process, guidelines, and ownership. Reference data management needs to be formalized (see the first best practice), and the individuals responsible for this data need to work with the:

data governance team in ensuring the development of standards and guidelines, ownership identification, and following the operational model
technical team (IT) to be aware of any technical considerations and ensure the IT delivery aligns with the reference data standards
executives for acquiring support and resources
business stakeholders to acquire definitions, use cases, information and feedback, as well as adoption

4. Manage reference data at the enterprise level

Reference data management needs to support multiple domains to avoid reference data silos. This outcome depends on the chosen data governance operating model. As reference data becomes widely used across different enterprise systems, the distribution of updates and new entries needs to be addressed. A data hub is a common solution, since it provides a central location from which any application can import its reference data. This import can happen automatically, by pulling the data through an API subscription or an automatic read/write function, or manually, through a batch or static file. If the import occurs manually due to technical limitations, any updates occurring at the data hub level should be communicated to the data stewards within the impacted business domains and to the data custodians of the impacted applications.
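As a rough sketch of the automatic pull described above, the example below models a hub that keeps a version number per reference data set and an application that imports a set only when the hub holds something newer. The classes `ReferenceDataHub` and `Application` are hypothetical, in-memory stand-ins; a real hub would typically sit behind an API or a database.

```python
from typing import Optional


class ReferenceDataHub:
    """In-memory stand-in for a central reference data hub (hypothetical)."""

    def __init__(self) -> None:
        # set name -> (version number, entries)
        self._sets: dict[str, tuple[int, dict[str, str]]] = {}

    def publish(self, name: str, entries: dict[str, str]) -> int:
        """Store a new version of a reference data set and return its number."""
        version = self._sets.get(name, (0, {}))[0] + 1
        self._sets[name] = (version, dict(entries))
        return version

    def pull(self, name: str, known_version: int = 0) -> Optional[tuple[int, dict[str, str]]]:
        """Return (version, entries) if the hub holds a newer version, else None.

        Raises KeyError if the set has never been published.
        """
        version, entries = self._sets[name]
        if version <= known_version:
            return None
        return version, dict(entries)


class Application:
    """A consuming system that keeps a local copy of one reference data set."""

    def __init__(self, hub: ReferenceDataHub, set_name: str) -> None:
        self.hub = hub
        self.set_name = set_name
        self.version = 0
        self.entries: dict[str, str] = {}

    def sync(self) -> bool:
        """Pull from the hub; return True if the local copy was updated."""
        update = self.hub.pull(self.set_name, self.version)
        if update is None:
            return False
        self.version, self.entries = update
        return True


hub = ReferenceDataHub()
hub.publish("currency", {"USD": "US Dollar"})

billing = Application(hub, "currency")
first_sync = billing.sync()    # local copy is empty, so the hub's version is imported
second_sync = billing.sync()   # nothing new on the hub, so nothing is imported

hub.publish("currency", {"USD": "US Dollar", "EUR": "Euro"})
third_sync = billing.sync()    # picks up the newly published version
print(first_sync, second_sync, third_sync, billing.entries)
```

Where an application can only be updated manually, the same version number gives data stewards and data custodians a simple way to see which systems are behind the hub.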

5. Version reference data

Because reference data is prevalent across every organization's systems, the RDM team must ensure that it is current across the enterprise and that the team can keep track of its changes. This is particularly useful in data integration projects, in addressing master data management needs, and in business intelligence or data analytics deliverables. To address all of these, one needs to know the effective date of each change and what the value was changed from. The more metadata is provided, the better.
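One possible way to record the effective date and the previous value of each change is sketched below. The names `CodeVersion` and `VersionedCodeSet`, the `changed_by` and `reason` metadata fields, and the department codes are assumptions made for illustration, not a prescribed model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeVersion:
    """One recorded state of a reference data code."""
    code: str
    value: str
    effective_from: date
    previous_value: Optional[str] = None  # what the value was changed from
    changed_by: str = ""                  # optional metadata about the change
    reason: str = ""


class VersionedCodeSet:
    """Keeps the full change history of each code, ordered by effective date."""

    def __init__(self) -> None:
        self._history: dict[str, list[CodeVersion]] = {}

    def set_value(self, code: str, value: str, effective_from: date,
                  changed_by: str = "", reason: str = "") -> CodeVersion:
        versions = self._history.setdefault(code, [])
        if versions and effective_from <= versions[-1].effective_from:
            raise ValueError("changes must be recorded in chronological order")
        previous = versions[-1].value if versions else None
        version = CodeVersion(code, value, effective_from, previous, changed_by, reason)
        versions.append(version)
        return version

    def value_as_of(self, code: str, as_of: date) -> Optional[str]:
        """Return the value in effect on a given date, or None if none existed yet."""
        result = None
        for version in self._history.get(code, []):
            if version.effective_from > as_of:
                break
            result = version.value
        return result


# Hypothetical internal department codes.
departments = VersionedCodeSet()
departments.set_value("D10", "Marketing", date(2020, 1, 1))
change = departments.set_value(
    "D10", "Marketing and Communications", date(2023, 7, 1),
    changed_by="data steward", reason="reorganization",
)

print(change.previous_value)
print(departments.value_as_of("D10", date(2022, 6, 30)))
print(departments.value_as_of("D10", date(2023, 7, 1)))
```

Looking up a value as of a given date is what lets a report or an integration job interpret historical records with the code meaning that applied at the time.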

Conclusion

Reference data management is a complex function with several processes and a variety of teams involved in its success. Managing reference data is an essential activity for ensuring the continued viability of the data used for operations and decision-making.

Article originally published on https://www.lightsondata.com/

George Firican

George Firican is an exuberant advocate for the importance of data, a frequent conference speaker and a YouTuber. George has been ranked among Top 10 Global Thought Leaders and Influencers on Digital Disruption and Top 15 on Innovation and Big Data. His innovative approach to data management received international recognition through award-winning program and project implementations in data governance, data quality, business intelligence and data analytics. In his spare time, he loves to create informative, practical and engaging educational content, and help organizations get more visibility on social media. George is also the proud founder of LightsOnData.com. He holds degrees from the University of British Columbia in Computer Science and in Business Management.