Is there a connection between Artificial Intelligence / Machine Learning and Data Governance? Of course: without well-governed, accurate data, no AI or ML project can succeed.
What Is Data Governance?
Data governance is the organizational framework that applies to how data is obtained, managed, used, and secured by your organization. A strong data governance strategy empowers an organization to trust the integrity of its applications and databases, including artificial intelligence and machine learning models, by ensuring the data originates from valid sources. Effective enterprise data governance also ensures that machine learning models are built to follow the organization’s policies and standards for data management and usage. Since most data used in artificial intelligence and machine learning efforts comes from multiple sources, an established, strong data governance program is essential to the success of any artificial intelligence or machine learning project.
What is the Internet of Things (IoT)?
The Internet of Things (IoT) is a system of interrelated computing devices and mechanical and digital machines that are provided with unique identifiers (UIDs) and can transfer data over a network without requiring human-to-human or human-to-computer interaction.
The objects may be “dumb” and only activated when they enter a predefined area or require scanning. Or, the objects may be “smart”, communicating and interacting over the Internet. These objects can even be remotely monitored, interacted with, and controlled.
IoT’s Impact on Data Governance
The rush to push IoT objects into the market is causing data volumes to grow at an accelerating pace. Effective policies, practices, standards, and processes for managing this data are essential to realizing the potential of the data that IoT objects collect.
Much IoT data will not be under our control (e.g., a parent lets their teenager try out their fitness tracker for a week), so the trustworthiness of the data’s source and associated metadata should be questioned before the data is used to make decisions.
There are some mitigating actions data governance can take to reduce the difficulty of managing the data collected by these IoT sources:
- Ensure the data governance team is actively working with IoT engineers, and that the engineers understand the importance of designing with data governance / metadata / data quality best practices
- Standardize and define the IoT data to be captured, and define how that data will actually be used
- Build IoT hardware and software correctly from the start with a focus on the proper management of data (data quality, metadata management, data integration, etc.) to allow an organization to integrate IoT data into their enterprise systems
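As one illustration of what standardizing and defining IoT data might look like in practice, the sketch below checks an incoming reading against a small agreed definition before it reaches enterprise systems. Everything in it is hypothetical and chosen only for illustration: the `STANDARD` table, the field names (`device_id`, `metric`, `unit`, `value`, `source`), and the allowed ranges are assumptions, not part of any real IoT platform.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical standard agreed between data governance and IoT engineers:
# each metric has one approved unit and a plausible value range.
STANDARD = {
    "heart_rate": {"unit": "bpm", "min": 20.0, "max": 250.0},
    "temperature": {"unit": "celsius", "min": -50.0, "max": 150.0},
}


@dataclass
class Reading:
    device_id: str
    metric: str
    unit: str
    value: float
    source: str  # provenance metadata, e.g. "registered_owner" or "unknown"


def validate_reading(reading: Reading) -> List[str]:
    """Return a list of governance issues; an empty list means the reading passes."""
    issues = []
    rule = STANDARD.get(reading.metric)
    if rule is None:
        # The metric was never defined, so no further checks are possible.
        issues.append(f"undefined metric: {reading.metric}")
        return issues
    if reading.unit != rule["unit"]:
        issues.append(f"unexpected unit: {reading.unit}")
    if not rule["min"] <= reading.value <= rule["max"]:
        issues.append(f"value out of range: {reading.value}")
    if reading.source == "unknown":
        issues.append("unknown data source")
    return issues
```

A reading that passes every check returns an empty list, while a reading with the wrong unit, an implausible value, or unknown provenance returns one message per problem, which a data steward could review before the data is used for decisions.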
Artificial Intelligence and Machine Learning Definitions
Artificial Intelligence (AI) is defined as “machines that respond to stimulation consistent with traditional responses from humans, given the human capacity for contemplation, judgment and intention”.
Machines that mimic human behavior are a common vision of artificial intelligence, but creating such “human” robots or systems is still more science fiction than science fact.
According to research, 85% of business leaders believe that AI is a strategic competency for business as they work to discover the relevant business cases for this capability. Many organizations are adopting a strategy for incorporating AI into their business operations, but are neglecting the inclusion of a companion data strategy.
Machine Learning (ML) is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions. ML uses models, decision trees, neural networks, natural language processors, etc. to perform its operations.
Three major types of ML
Unsupervised Learning: learns from data that has not been labeled, classified, or categorized. Unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data.
Reinforcement Learning: enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences. The agent is not given the correct answer in advance; it learns from the rewards or penalties its actions produce.
Supervised Learning: an algorithm uses input variables (x) and an output variable (Y) to learn the mapping function from the input to the output. The goal is to approximate the mapping function so well that when new input data (x) arrives, one can predict the output variable (Y) for that data. Supervised machine learning, which uses labeled data with historical examples, has demonstrated some success and could provide a truly disruptive force for data management professionals.
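To make the idea of learning a mapping function from input to output concrete, here is a minimal sketch of supervised learning using ordinary least squares on a single input variable. It is only one simple instance of the technique, not a recommended production approach; the function names `fit_line` and `predict` and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Fit y ~ slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the inputs, and how inputs and outputs vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping function to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Labeled historical examples: inputs (x) with known outputs (Y).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data that follows Y = 2x + 1

model = fit_line(xs, ys)
```

With this made-up data the fitted model recovers a slope of 2 and an intercept of 1, so `predict(model, 5.0)` returns 11.0. If the labeled examples were mislabeled or drawn from an untrusted source, the learned mapping would be wrong in the same way, which is why well-governed training data matters.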
Data Governance with Artificial Intelligence and Machine Learning
Data governance (and other areas of enterprise data management) helps an organization manage four things about its data: availability, usability, integrity, and security.
The practice of data governance starts with clearly defined data policies, data standards, and data processes, and with the identification of data stewards and data owners to support the development and implementation of these policies, practices, and standards for all initiatives. Good data policies manage the coordination of data management across the organization; data standards manage the quality of data and metadata for integrations and data lineage; and effective processes ensure that quality and consistent results can be sustained.
Additionally, meeting regulatory compliance obligations is an important part of building and using a data governance program across the organization. A robust data governance program can support compliance with existing laws (e.g., the European Union’s General Data Protection Regulation (GDPR)).
Managing data properly can save data scientists and machine learning engineers effort and time. Most of the time spent in information technology and data science projects is devoted to cleansing data, identifying data that can be used, and connecting and integrating the right data. An effective data management program that includes data governance policies, metadata management standards, and data quality dimensions and metrics can significantly reduce the time needed for an IT or data science project.
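As a rough sketch of what data quality dimensions and metrics can mean in code, the example below computes two common dimensions, completeness and validity, for one field across a set of records. The helper names, the sample records, and the deliberately simple email pattern are all assumptions made for illustration; a real program would take its rules from the organization’s own standards.

```python
import re

# Deliberately simple, illustrative rule for what counts as a valid email.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def validity(records, field, pattern):
    """Share of non-empty values that match the expected pattern."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


# Hypothetical sample records.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "not-an-email"},
    {"id": 4, "email": "li@example.org"},
]

email_completeness = completeness(records, "email")
email_validity = validity(records, "email", EMAIL_PATTERN)
```

Here three of the four records have an email, so completeness is 0.75, and two of those three values match the pattern, so validity is two thirds. Tracking numbers like these over time gives a project team an early signal of how much cleansing a data source will need.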
Data Stewardship teams have the task of understanding data in their functional areas. Many large companies have over 10,000 applications with millions of data elements. To expect data stewards to oversee the data and metadata associated with the thousands of critical data elements is not realistic. Machine Learning, especially the supervised version, holds the promise of dramatically reducing the tasks of the data stewardship teams, allowing them to work on the critical issues instead of the simpler tasks that can be delegated to the machines / algorithms.
Value of Data Governance in Artificial Intelligence
Effective enterprise data management and data governance can use AI, ML, etc., to:
- Reduce the time needed to cleanse data and associate the correct metadata for contextual understanding
- Improve organizational reliance on accurate, well-governed data to support expanded capability
- Achieve higher-quality, more precise AI and ML by using good data to train machine learning models or neural networks, so that systems make more precise decisions
- Enable faster and more efficient online inference by trained models that have “learned” through the application of the approved policies and standards
Data governance is critical to the successful development and implementation of artificial intelligence, machine learning, and other emerging technologies. Without an effective enterprise data management initiative grounded in data governance, metadata management, and data quality management, the promise of these capabilities will never be realized.