Data Governance in the Digital Age

Initially developed in the 1980s with an IT-centric approach that focused on hierarchical structures, data governance has grown in importance as data has become more and more integral to a company’s success. “Data is like garbage. You’d better know what you are going to do with it before you collect it,” the American humorist Mark Twain once said. Although he wasn’t talking about the 0 and 1 bits of data we know today, his words still ring true: you had better know what you’re going to do with your data if you’re taking the trouble and paying the expense of collecting it.

From the digital transformation to the evolution of big data to the current frenzy about AI, machine learning, deep learning, and generative AI, data is the lifeblood of a corporation. Get data right and your shareholders will be handsomely remunerated. Get it wrong and the CEO’s job is jeopardized.

According to the 2024 Microsoft Data Security Index annual report, “On average, organizations are juggling 12 different data security solutions, creating complexity that increases their vulnerability. This is especially true for the largest organizations: On average, medium enterprises use nine tools, large enterprises use 11, and extra-large enterprises use 14. In addition, 21% of decision-makers cite the lack of consolidated and complete visibility caused by disparate tools as their biggest challenge and risk.”

The only thing that can address this level of complexity is a well-run data governance (DG) program, but what exactly is that?

Figure: Average number of data security solutions in use by organization size: medium enterprises 9, large enterprises 11, extra-large enterprises 14. Source: Microsoft Data Security Index annual report, 2024.

What is Data Governance?

In An Overview of Data Governance, Ning Zhang and Qin Jian Yuan claim, DG “refers to the overall management of the availability, usability, integrity, and security of the data used in an organization. Since the initial emergence of data governance as an important and fundamental issue to organizations, the data governance community and researchers have published several definitions of the data governance. Although the definition of data governance is still evolving, current usage describes this discipline as being a facilitator for managers to take control over all aspects of their data resource.”

As I explain in Data Governance in Business Intelligence and Analytics:

Data governance refers to the overall management of data availability, usability, integrity, and security within an organization. It encompasses the processes, policies, and standards that ensure data is effectively managed and utilized to support the business’s objectives.

Core Components of Data Governance

Effective data governance frameworks consist of these six essential components:

  1. Data Policies: Data management rules and guidelines that document data usage and access across an organization.
  2. Data Quality Management: Focuses on ensuring data accuracy, consistency, and reliability. This includes processes for data cleansing, validation, and monitoring.
  3. Data Stewardship: Involves assigning roles and responsibilities to individuals or teams (data stewards) who oversee data governance initiatives and maintain data quality.
  4. Compliance and Security: Ensures adherence to legal and regulatory requirements and implements security measures to protect sensitive data.
  5. Metadata Management: Maintains the data about data that defines corporate data sources, structures, and relationships to develop better data understanding and data usage.
  6. Data Architecture: Defines the structure and organization of data within the organization, including data models and storage capacity.

History of Data Governance

In the early 1980s, data governance developed as a response to corporate data management needs. Increased regulatory requirements heightened awareness of data and stoked the need for data governance frameworks. In my article, Evolution of Data Governance Principles with AI, I discuss the importance of regulation in the growth of data governance, specifically the enactment of the Health Insurance Portability and Accountability Act (HIPAA) of 1996 and the Sarbanes-Oxley Act (SOX) of 2002.

These two pieces of legislation “highlighted the need for formal data governance structures to ensure compliance with legal standards in terms of data privacy, data security, and data accuracy. These regulations laid the foundations for more structured data management practices that would follow.”

Once data became integral to a business’s operation, the need for structured frameworks, such as the Zachman Framework for enterprise architecture, increased significantly. “In the late 1990s and early 2000s, organizations started focusing on data quality as a critical component of effective decision-making. This led to initiatives aimed at improving data accuracy, consistency, and reliability. This, in turn, laid the groundwork for more formalized and far-ranging governance practices,” I argued.

The 2000s

The early 2000s saw the establishment of frameworks like centralized data governance and decentralized data governance. Best practices evolved from these frameworks. The Data Governance Institute issued guidelines and principles for organizations to utilize when building data governance programs, marking a shift from ad-hoc data management practices to structured governance approaches, I stated.

In the 2010s, big data exploded, and organizations had to face the mounting challenges that the deluge of data presented. Structured, unstructured, and semi-structured data had to be combined to support analytical modeling, AI, customer relationship management, and supply chain management. Companies wanted a 360-degree view of their customers to help with cross-selling, up-selling, and brand loyalty. This required more sophisticated data governance frameworks. Data was the new oil, and it seemed to be running everything.

“Organizations soon recognized the importance of accountability in managing data assets. Data roles, such as data stewards, data custodians, and data governance specialists, emerged. Organizations created data councils to provide corporate-wide governance over the entire data process. These councils were responsible for overseeing specific data domains, ensuring adherence to governance policies while maintaining data quality across a company’s entire IT system,” I added.

Centralized Data Governance

Centralized data governance (CDG) provides a structured approach to data management. A single, central authority or team within the organization controls all decisions and policies. The model aims to ensure consistency, compliance, and data quality across an organization. Although rigid, a CDG promotes uniformity in data definitions, quality standards, and usage across an organization, which helps maintain data integrity. It streamlines decision-making, reduces redundancy, and provides a clear framework for managing data resources, which helps with auditing and compliance.

On the downside, centralized control can cause decision-making bottlenecks. It is not the most responsive to a business’s specific data needs, and it provides limited flexibility. Different internal departments may find it challenging to adapt policies to their unique circumstances. A CDG often results in over-reliance on the central data team. This might lead to a lack of empowerment among individual departments, which could result in users ignoring data governance practices and procedures.

Centralized Data Governance Case Study: Georgia-Pacific

About a decade ago, Georgia-Pacific, the consumer-packaged goods company, realized its digital transformation would only succeed if it implemented a strong data governance framework. According to a Georgia-Pacific case study, the company “adopted a centralized data management structure, bridging any gaps by creating a community among data stewards and other data owners in different divisions. Universal communications and collaboration became the rule rather than the exception within the data governance community.”

“We were very siloed within the consumer products division,” said Lindsay Savage, director of CPG Data Governance for Georgia-Pacific. “We decided to centralize our data governance decisions, bringing together the owners of the sales platform and marketing platform, as well as the reporting lead and data governance lead.”

Before the new data governance initiative, the data creation processes as well as reporting were separated across the businesses and under IT’s jurisdiction. In the new process, “A reporting lead is responsible for analysis and reporting methodologies; and the data governance lead creates and maintains data governance processes within the organization, adhering to GS1 Standards,” says Georgia-Pacific.

Georgia-Pacific uses a Cubiscan machine that reads the weight and dimensions of each package it sends out. It audits over 100 products each month, comparing the readings with the values in its system to ensure the data falls within pre-set tolerance levels. The company targets a 98% accuracy rate.
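The audit described above can be illustrated with a minimal sketch. It is not Georgia-Pacific's actual process: the `Measurement` fields, the 5% relative tolerance, and the sample values are all assumptions chosen for illustration. Only the idea of comparing readings with recorded values and the 98% target come from the case study.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One audited attribute: the scanned value and the value on record."""
    sku: str
    attribute: str    # hypothetical attribute name, e.g. "weight_kg"
    measured: float   # reading from the dimensioning machine
    recorded: float   # value stored in the system of record


def within_tolerance(m: Measurement, tolerance: float = 0.05) -> bool:
    """True if the reading deviates from the record by at most `tolerance` (relative)."""
    if m.recorded == 0:
        return m.measured == 0
    return abs(m.measured - m.recorded) / abs(m.recorded) <= tolerance


def audit_accuracy(measurements: list[Measurement], tolerance: float = 0.05) -> float:
    """Share of audited measurements that fall within tolerance (0.0 to 1.0)."""
    if not measurements:
        raise ValueError("nothing to audit")
    passed = sum(within_tolerance(m, tolerance) for m in measurements)
    return passed / len(measurements)


# Made-up sample data for illustration only.
sample = [
    Measurement("SKU-001", "weight_kg", 10.2, 10.0),  # 2% off: pass
    Measurement("SKU-002", "weight_kg", 4.0, 5.0),    # 20% off: fail
    Measurement("SKU-003", "length_cm", 30.0, 30.0),  # exact: pass
    Measurement("SKU-004", "length_cm", 41.0, 40.0),  # 2.5% off: pass
]
rate = audit_accuracy(sample)
print(f"accuracy: {rate:.0%}, meets 98% target: {rate >= 0.98}")
```

In practice the tolerance would likely differ per attribute, and failed items would feed a correction workflow rather than just a percentage.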

“With GS1 Standards in place, companies can publish product data and provide greater product transparency to information hungry customers,” claims Georgia-Pacific. “Transportation costs are lower because trucks can carry more cases,” said Savage. She added, “Now that we’ve cracked the code for a strong data governance program, we are looking to apply the benefits of GS1 Standards to inventory management, the reduction of operational costs and many more future business priorities.”

Five Point Best Practice

Georgia-Pacific implemented a five-point best practice for its data governance system. This included:

  1. Adhere to GS1 standards and rules. GS1 is the most widely used system of supply chain standards in the world, designed to enable efficient business communication between trading partners.
  2. Assign data owners company-wide.
  3. Designate one entity, department, or individual as the sole owner of product data.
  4. Audit all new items produced in a sustainable production environment ready for shipment (i.e., finished goods).
  5. Execute communication of initial attributes and package measurements, both internally and externally.

Decentralized Data Governance

Decentralized data governance (DDG) frameworks gained traction in the late 1990s and early 2000s. The shift was influenced by the increasing complexity of data, a sharper focus on data quality, the growing availability of inexpensive data management tools that made decentralization practical, and an unending thirst for agility and innovation. Organizations prioritized flexibility so they could keep up with the rapidly changing business environment around them. They wanted to operate in real time, which prompted a move away from the rigid models of old to newer, more flexible ones.

A DDG framework distributes data management responsibilities and decision-making responsibilities across various business entities within an organization. This model empowers individual teams to govern their own data while still collaborating with a centralized data governance framework or set of guidelines. Each department or business unit has control over its own data governance processes, allowing for tailored approaches based on specific needs.

With a DDG model, corporate departments can adapt their own policies and procedures to their unique data requirements and operational contexts. DDG encourages an innovative and rapid response mentality, enabling departments to adjust practices without waiting for central approval. This fosters a sense of employee data ownership, leading to increased engagement and accountability. Individual teams have the power to experiment with new data practices and technologies that suit their unique operational needs. However, while governance is decentralized, there is often a central oversight team that provides guidelines and ensures some level of consistency across the organization.

Decentralization brings some inherent challenges. Different policies across departments can lead to discrepancies in data quality and management practices. Managing a wide array of governance practices can become unwieldy and complicated, making it difficult to maintain an overarching view of data across the organization.

Federated Governance Models

Federated governance models gained prominence in the late 2000s and early 2010s, as organizations sought to balance centralized control with the need for decentralized, flexible data management practices. A federated governance model (FGM) sits between the centralized and decentralized frameworks: it establishes overarching governance standards while empowering individual business units to implement those standards in ways that suit their specific needs.

Federated models scale more effectively than purely centralized ones because they distribute responsibilities across multiple teams. This reduces bottlenecks, which is extremely important in today’s real-time streaming world. Organizations can adapt rapidly to changes in the business environment, and data requirements aren’t held up by bureaucratic processes. By involving local teams with specialized knowledge, federated models ensure that data management practices are tailored to the specific needs of different domains. This model promotes a culture of self-service data access, enabling users to utilize data effectively without a heavy reliance on IT resources.
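One way to picture the federated balance is as a policy merge: a central team publishes defaults, locks a few of them, and lets each domain override the rest. The sketch below is purely illustrative. The policy keys, the `LOCKED_KEYS` set, and the `effective_policy` helper are hypothetical names, not part of any standard or product.

```python
# Hypothetical central standards that every domain starts from.
CENTRAL_POLICY = {
    "pii_encryption": True,
    "retention_days": 365,
    "quality_threshold": 0.95,
}

# Settings the central team does not allow domains to change (an assumption).
LOCKED_KEYS = {"pii_encryption"}


def effective_policy(domain_overrides: dict) -> dict:
    """Merge central defaults with a domain's overrides.

    Overrides of unknown keys or centrally locked keys are rejected.
    """
    unknown = set(domain_overrides) - set(CENTRAL_POLICY)
    if unknown:
        raise KeyError(f"unknown policy keys: {sorted(unknown)}")
    locked = set(domain_overrides) & LOCKED_KEYS
    if locked:
        raise PermissionError(f"centrally locked keys: {sorted(locked)}")
    # Domain values win for every key the domain is allowed to set.
    return {**CENTRAL_POLICY, **domain_overrides}


# A marketing domain shortens retention but inherits everything else.
marketing = effective_policy({"retention_days": 90})
print(marketing)
```

Moving more keys into `LOCKED_KEYS` pushes this sketch toward a centralized model, while an empty set approximates a decentralized one.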

Data Mesh

A data mesh is a decentralized approach that treats data as a product, with individual teams responsible for their own data domains. This framework emphasizes cross-functional teams, self-serve data infrastructure, and domain-oriented ownership.

According to AWS, business units still maintain tight control over who shares the data, how it is accessed, and in what formats. A data mesh framework adds complexity, AWS notes, but it also improves data access, security, and scalability.

“A data mesh transfers data control to domain experts who create meaningful data products within a decentralized governance framework,” contends AWS. This results in faster access to relevant data, and removing central data pipelines reduces operational bottlenecks while promoting real-time data streaming availability, AWS adds.

Data Fabric

While Pure Storage believes a data mesh architecture is designed to promote data access and collaboration, “A data fabric architecture is a more automated approach to bringing data from various sources and systems together to derive insights from that data,” they say. A data fabric architecture integrates data across various silos and environments, providing seamless data access and management. It utilizes a unified approach to data management, allowing for real-time access and governance across both on-premises and cloud environments.

Pure Storage adds, “Data fabric is a type of data architecture in which data is provisioned through a unified integrated access layer that is available across an organization’s IT infrastructure. The fabric provides a unified, real-time view of data, enabling the business to integrate data management processes with its data from various sources, including hybrid cloud environments, web applications, and edge devices.”

A data fabric solution enables processes like data integration, governance, cataloging, discovery, and orchestration. The solution includes a data transport layer for moving data across the fabric, data analysis algorithms, and APIs to surface data and insights into BI and data visualization tools.

Active Governance

Active data governance is a proactive approach to managing data throughout its lifecycle, focusing on continuous oversight and the integration of governance practices into daily workflows. Unlike traditional, passive data governance methods that react to issues as they arise, active data governance anticipates and mitigates potential problems before they impact the organization. It utilizes AI and machine learning to identify risks and ensure compliance dynamically, rather than relying on periodic audits.

Atlan, the active metadata platform for the modern data stack, claims, “Active data governance is a scalable way of securing your data, upholding its privacy, maintaining its integrity, and promoting data democratization.” Active data governance allows companies to ensure data enablement while encouraging collective responsibility for a company’s data assets. Real-time data validation ensures that only accurate and complete information enters the Atlan system. Atlan’s self-service data catalogs allow users to easily discover, understand, and trust the data they need, which fosters data-driven decision-making.
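The idea of validating data as it arrives, rather than auditing it later, can be sketched in a few lines. This is a generic illustration and not Atlan's API: the field names, the `validate_record` and `ingest` helpers, and the deliberately crude email check are all assumptions.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"customer_id": str, "email": str, "signup_year": int}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record may be ingested."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing {field}")
        elif not isinstance(value, expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    email = record.get("email")
    # Deliberately simple check; real validation would be stricter.
    if isinstance(email, str) and email and "@" not in email:
        problems.append("email is malformed")
    return problems


def ingest(records: list[dict], store: list, rejected: list) -> None:
    """Admit clean records to `store`; route the rest to `rejected` with reasons."""
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            store.append(record)


store, rejected = [], []
ingest(
    [
        {"customer_id": "C1", "email": "a@example.com", "signup_year": 2021},
        {"customer_id": "C2", "email": "not-an-email", "signup_year": "2020"},
    ],
    store,
    rejected,
)
print(len(store), "accepted;", len(rejected), "rejected")
```

Keeping the rejection reasons next to each rejected record gives data stewards something actionable, which fits the proactive posture described above.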

Collaborative Governance

Collaborative governance puts collaboration between different stakeholders front and center. IT, the corporate business units, and data consumers all work synergistically. This type of governance encourages open communication, shared responsibilities, and collective decision-making among various stakeholders within an organization regarding data usage, data management practices, and data policies.

Augmented Governance

Going forward, AI will dominate data governance. Augmented governance leverages advanced technologies like AI and analytics to enhance traditional governance practices. It automates data lineage, quality checks, and compliance monitoring, making governance more transparent, efficient, and insightful.

Technologies like Robotic Process Automation (RPA), AIOps (Artificial Intelligence for IT Operations), data lineage and provenance tracking tools, Natural Language Processing (NLP), and large language models (LLMs) will help improve data quality, data lineage, and data compliance.

RPA is software that automates repetitive, rule-based tasks typically performed by humans. This streamlines workflows and enhances operational efficiency. AIOps automates processes, analyzes vast amounts of operational data, and provides insights that help with decision-making. It streamlines workflows, predicts potential system issues, and automates incident responses. AIOps collects and analyzes data from multiple sources, including logs, performance metrics, and network data, enabling comprehensive visibility into IT environments. It can quickly identify and even resolve incidents by analyzing historical data, predicting potential failures, and then acting proactively to prevent those failures from occurring.

Data lineage and provenance tracking tools follow data through its lifecycle, providing insights into a piece of data’s origin, any transformations it underwent, and even its retirement from service. This transparency is crucial for understanding corporate data flows while ensuring full accountability in data management.
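At its core, lineage tracking can be modeled as a directed graph in which each dataset points back to the sources it was derived from. The following is a minimal sketch under that assumption. The `LineageGraph` class and the dataset names are hypothetical, and commercial tools capture far more detail, such as column-level lineage and timestamps.

```python
from collections import defaultdict


class LineageGraph:
    """Records which datasets each dataset was derived from, and by what step."""

    def __init__(self) -> None:
        # target dataset -> list of (source dataset, transformation label)
        self._parents = defaultdict(list)

    def record(self, source: str, target: str, transformation: str) -> None:
        """Note that `target` was produced from `source` via `transformation`."""
        self._parents[target].append((source, transformation))

    def upstream(self, dataset: str) -> set[str]:
        """All datasets that `dataset` depends on, directly or indirectly."""
        seen: set[str] = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for source, _ in self._parents.get(current, []):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


lineage = LineageGraph()
lineage.record("crm.contacts", "staging.customers", "deduplicate")
lineage.record("web.signups", "staging.customers", "normalize emails")
lineage.record("staging.customers", "reports.churn", "aggregate by month")

# Which sources feed the churn report?
print(sorted(lineage.upstream("reports.churn")))
```

A traversal like `upstream` is what lets an auditor ask where a reported number came from, or what might be affected when a source changes.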

NLP & LLMs

NLP technologies enable the automated classification of data by identifying and categorizing information based on its content. This helps organizations efficiently manage vast amounts of data, ensuring that sensitive information is properly classified and handled according to all necessary regulatory rules.

NLP and LLMs can improve communication around data governance policies by allowing users to query data governance systems using natural language, making it easier for non-technical stakeholders to engage with the data governance processes. By reducing manual effort, organizations can minimize human error and enhance compliance with data protection standards like GDPR and CCPA. NLP can also help with governance by automating compliance checks as well as monitoring the constantly changing regulatory landscape. 
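To make automated classification concrete, here is a deliberately simplified sketch. It uses regular-expression pattern matching as a stand-in for a real NLP model, so it illustrates only the shape of the workflow (text in, sensitivity label out). The pattern set, labels, and `classify` function are illustrative assumptions.

```python
import re

# Illustrative patterns only; production classifiers cover far more cases
# and typically rely on trained models rather than regular expressions.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify(text: str) -> dict:
    """Label text as 'restricted' if any sensitive pattern appears, else 'general'."""
    found = sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))
    return {"sensitivity": "restricted" if found else "general", "matches": found}


print(classify("Contact jane.doe@example.com, SSN 123-45-6789"))
print(classify("Quarterly revenue grew 4%"))
```

A label like this could then drive handling rules, for example masking restricted fields before they reach analysts.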

Top Data Governance Providers

The IT software providers offering data governance and metadata management solutions include:

  • IBM Cloud Pak for Data – Offers a comprehensive data governance solution that integrates various data management capabilities.
  • Oracle’s Enterprise Metadata Management – According to Oracle, “OEMM can harvest and catalog metadata from virtually any metadata provider, including relational, Hadoop, ETL, BI, data modeling, and many more.”
  • SAP’s Master Data Governance – SAP claims its “Master Data Governance application helps you pull together master data and manage it centrally using a master data management layer based on SAP Business Technology Platform.”
  • Microsoft Purview – A unified data governance solution that helps organizations effectively manage their data landscape.
  • Snowflake Horizon – Snowflake, the cloud data platform provider, describes Horizon as “Built-in data governance and discovery for the AI Data Cloud, including powerful compliance, security, privacy, discovery and collaboration capabilities that help data governors, stewards, CISO’s/security admins and data teams both protect and unlock the value of sensitive data, apps and models.”
  • Alation Data Governance App – Alation’s app includes a policy center to get complete visibility into how policies are mapped to data, a governance workflow, a stewardship workbench that uses AI to automate the discovery of candidate data stewards who actually use the data, and governance dashboards that actively measure progress on open tasks as well as on policy usage.
  • Informatica’s Axon Data Governance – With Axon, users can easily define data connections, identify gaps and link policies to the items they affect as well as create a common data dictionary for a consistent source of business context across tools.
  • Denodo Compliance Management – Utilizing a data virtualization system, Denodo “unifies disparate data into a single access layer, serving as the single place where all data consumers in the business can discover and consume the data they need.”

Conclusion

No matter what framework one uses, the mission of data governance has always been to efficiently and securely deliver the right data, to the right people, at the right time, in the right system to support the company’s business goals. This mission is multifold: reduce the IT department’s footprint, make data sharable yet secure, limit or eliminate data silos, and minimize application complexity. These are lofty goals, no question.

Decentralized data governance is beneficial for organizations that prioritize flexibility and responsiveness, but it requires careful coordination to ensure that overall data quality and compliance standards are upheld. Centralized governance provides uniformity and control. A federated governance model fits somewhere between the two. It balances centralized control with decentralized autonomy in managing data governance across an organization. A data mesh, data fabric, and active, collaborative, and augmented governance all have their place in today’s highly complicated tech stack. However, an organization’s size, structure, and specific data management needs must be taken into account when choosing between these frameworks.

Mark Twain once stated that data was like garbage: you needed to know what you were going to do with it before you collected it. Paul Gillin, the award-winning writer, warned that “Data quality is corporate America’s dirty little secret.” This might be so, but the reality is data quality doesn’t have to be so difficult. Ted Friedman of Gartner believes, “Data is useful. High-quality, well-understood, auditable data is priceless.” He’s probably right, and with so many ways for companies to improve their data through strong data governance systems and frameworks, there really is no excuse for them to ignore the priceless data flowing through their IT systems any longer. You never know, one CDO’s garbage could turn into another CDO’s digital gold.