Introduction
In an October 11, 2023 Gartner press release, Arun Chandrasekaran, Distinguished VP Analyst at Gartner, stated, “Generative AI has become a top priority for the C-suite and has sparked tremendous innovation in new tools beyond foundation models.” Gartner sees demand for generative AI increasing substantially across a multitude of sectors, including healthcare, life sciences, legal, financial services and the public sector.
According to Deloitte, 43% of CEOs have implemented Gen AI in their organizations to drive innovation. Many CEOs recognize the importance of this new technology. Some of the early use cases show it has promise and should have a lasting presence in multiple industries.
Unquestionably, Gen AI technology is impressive. However, as with most of today’s other cutting-edge technology, Gen AI requires data, lots and lots of data. This makes data governance all the more important. In its Data governance perspectives on Generative AI, Deloitte argues, “organisations can only unlock the full potential value from their Generative AI use cases through robust data governance capabilities.” Deloitte also sees data governance as an essential safeguard against the risks inherent in Gen AI. “Without robust data governance capabilities, the potential impact and value added by Generative AI will be severely limited and may even expose organisations to data and cybersecurity risks,” they warn.
What is Gen AI?
Gen AI is a type of artificial intelligence (AI) that excels in areas like text, image, audio, video, coding, and 3-D creation. Although Gen AI has been around for decades, recent advances in machine learning (ML), natural language processing (NLP), convolutional neural networks (CNNs), and generative adversarial networks (GANs) have really put it on the map.
With simple text prompts, users can create images, audio, animation, music and videos. The results are so good that it is hard to tell Gen AI-created content from human-created content. Besides the creative aspect, Gen AI tools can also help with video editing, visual effects, and other video postproduction work. Needless to say, Gen AI tools are revolutionizing creativity.
Gen AI is capable of producing various types of content by learning patterns from existing data. It utilizes advanced ML models to generate outputs that resemble the training data. This technology can create anything from written stories and poems to realistic images, animation, videos, and even coding datasets.
In his article What is generative AI? Everything you need to know, George Lawton claims, “The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.” Figure 1 shows Gen AI output from Midjourney; all the images were created with a simple text prompt, usually about a sentence long.
Data Governance is Critical for Gen AI
Gen AI is only as good as the data going into the models creating it, so data quality is paramount. Data governance frameworks ensure the highest standards of data quality and integrity. This ensures proper model governance, which helps with the transparency of all model outputs. Organizations that maintain a culture of data literacy are more successful in consistently unlocking value from their Gen AI platforms. Data governance promotes legal, ethical, secure, unambiguous AI-driven decision-making, including for Gen AI.
As Dr. David P. Marco explains in his article, Data Governance in Business Intelligence and Analytics, “The principles that drive a data governance effort usually involve components such as data integrity, data standardization and metadata, standardized change management, and audit capabilities.”
For Deloitte, “Data governance plays a pivotal role in fostering innovation in the evolving AI landscape by ensuring responsible data practices, mitigating biases, and safeguarding privacy. A robust data governance strategy is the key to unlocking the full potential of your Generative AI use cases.”
To achieve accurate and quality outputs, Gen AI models should train on data fit for purpose, recommends Deloitte. “Explainable outcomes are essential to instill trust in Generative AI systems, enabling users to comprehend the reasoning behind the outcomes they produce. Effective data governance practices are necessary to ensure quality, integrity, and representativeness of the data,” adds Deloitte. Gen AI models must be free of biased data, inaccurate data, and confidential information. In his article, Generative AI shines spotlight on data governance and trust, Stephen Catanzano warns, “Without robust data governance in place, organizations risk exposing themselves to significant financial, reputational and legal liabilities.”
The Gen AI Differentiator
In their Data governance in the age of generative AI , Rupanagunta, Arni, and Sayed claim, “data is the generative AI differentiator.” They add, “A successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach.” While LLM enterprise use cases require the implementation of quality data along with privacy considerations, enterprise data generated from siloed sources coupled with the lack of a data integration strategy can create serious challenges in provisioning data for generative AI applications, say Rupanagunta, Arni, and Sayed. “The need for an end-to-end strategy for data management and data governance at every step of the journey—from ingesting, storing, and querying data to analyzing, visualizing, and running artificial intelligence (AI) and machine learning (ML) models—continues to be of paramount importance for enterprises,” contend Rupanagunta, Arni, and Sayed.
Many Gen AI use cases utilize enterprise knowledge drawn from unstructured data, such as documents, transcripts, images, and video, as well as typical structured data from corporate data warehouses. “Unstructured data is typically stored across siloed systems in varying formats, and generally not managed or governed with the same level of rigor as structured data,” say Rupanagunta, Arni, and Sayed. Gen AI “applications introduce a higher number of data interactions than conventional applications, which requires that the data security, privacy, and access control policies be implemented as part of the generative AI user workflows,” they add.
Challenges in Implementing Gen AI Data Governance
Gen AI is still in its infancy, which means the field lacks robust governance frameworks and overarching guidelines. Organizations still struggle to define standardized practices for handling data quality, data governance, privacy, and security. This complicates regulatory compliance and increases the risk of bias in any Gen AI model. Poor data quality can lead to unreliable outputs, which undermines trust in the technology.
Beyond this, Deloitte warns Gen AI can perpetuate harmful and untrue stereotypes if the underlying training data contains this bias. Organizations must source data properly and be wary of any potential biases. IT security protocols and risk management frameworks with mitigation plans can help protect against any pernicious use of Gen AI.
Many organizations building Gen AI tools face difficulties in tracking and understanding the data used in their Gen AI models. The complexity of these models and the prevalence of unstructured data across various systems hinder transparency. This makes it a challenge to identify potential biases or errors in any AI-generated outputs.
Sensitive and private information put into Gen AI systems could pose a risk of accidental release. To ensure this does not occur, organizations must implement stringent data security measures that prevent unauthorized access. Compliance with internal privacy regulations and external global privacy laws must also be ensured.
Hallucinations
Gen AI is not a perfect technology. It suffers from what has become known as the “hallucination” problem. These are instances where AI models, particularly LLMs, produce outputs that are false, misleading, or nonsensical while being presented as completely credible. This occurs because these models generate responses based on patterns gleaned from the training data without a true understanding of the underlying reality of this data. This leads to inaccuracies that can misinform users or perpetuate biases. Gen AI users must be diligent about their data use. Gen AI users should implement processes like data quality monitoring and data quality issue remediation to ensure the data they use is fit for purpose.
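The data quality monitoring mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (a dictionary with a "text" field), the three checks, and the `min_chars` threshold are all hypothetical choices, not a method prescribed by any of the sources cited here.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int
    missing_text: int
    duplicates: int
    too_short: int

    @property
    def passed(self) -> bool:
        # Hypothetical policy: a single flagged record fails the whole batch.
        return (self.missing_text + self.duplicates + self.too_short) == 0


def check_records(records: list[dict], min_chars: int = 20) -> QualityReport:
    """Count records with missing, duplicated, or very short text."""
    seen: set[str] = set()
    missing = duplicates = too_short = 0
    for record in records:
        text = (record.get("text") or "").strip()
        if not text:
            missing += 1
            continue
        key = text.lower()  # case-insensitive duplicate detection
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        if len(text) < min_chars:
            too_short += 1
    return QualityReport(len(records), missing, duplicates, too_short)


if __name__ == "__main__":
    sample = [
        {"id": 1, "text": "Return policy: items may be returned within 30 days."},
        {"id": 2, "text": ""},
        {"id": 3, "text": "return policy: items may be returned within 30 days."},
        {"id": 4, "text": "Too short"},
    ]
    print(check_records(sample))
```

A report that does not pass would then feed a remediation step, such as routing the flagged records to a data steward, before the data reaches a model.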
Strategies for Effective Data Governance in Gen AI
In his 7 Data Governance Guiding Principles, Jose Almeida argues, “The principles that drive a data governance effort usually involve components such as data integrity, data standardization and metadata, standardized change management, and audit capabilities. These components are especially important in any cross-organizational effort and are essential in business intelligence and analytics.”
For Deloitte, data governance practices must ensure the foundation model training data used is accurate, reliable, and fit-for-purpose. Users should be able to track the data used to train a model back to its original source. This makes spotting potential sources of bias, errors, and unethical content easier, contends Deloitte.
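Tracing training data back to its original source is essentially a walk through lineage records. The sketch below assumes a simple in-memory catalog in which each dataset names its upstream parent; the `Dataset` fields, the catalog structure, and the example dataset names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    name: str
    source: str                    # where this version of the data came from
    parent: Optional[str] = None   # name of the upstream dataset, if any


def trace_lineage(name: str, catalog: dict[str, Dataset]) -> list[str]:
    """Return the chain of dataset names from `name` back to its origin."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in lineage at {current!r}")
        if current not in catalog:
            raise KeyError(f"no lineage record for {current!r}")
        seen.add(current)
        chain.append(current)
        current = catalog[current].parent
    return chain


if __name__ == "__main__":
    catalog = {
        "raw_tickets": Dataset("raw_tickets", source="CRM export"),
        "cleaned_tickets": Dataset("cleaned_tickets", source="cleaning job", parent="raw_tickets"),
        "training_set": Dataset("training_set", source="sampling job", parent="cleaned_tickets"),
    }
    print(" <- ".join(trace_lineage("training_set", catalog)))
```

A missing lineage record raises an error rather than being skipped, on the assumption that an untraceable dataset is itself a governance finding.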
Corporations must “Ensure sufficient data access controls are in place for input/training data depending on the level of sensitivity, and in line with GDPR compliance for personal data relating to data subjects,” says Deloitte. This will mitigate security risks as well as ensure only approved users can access the Gen AI data, adds Deloitte. Well-defined processes promote transparency and help stakeholders understand the data needed for their Gen AI models.
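Access controls tied to data sensitivity can be illustrated with a small deny-by-default check. The sensitivity levels, role names, and clearance mapping below are assumptions made up for this sketch; it shows the gating idea only and is not by itself a GDPR compliance mechanism.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PERSONAL = 3  # personal data relating to data subjects


# Hypothetical mapping from role to the highest level that role may read.
CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "ml_engineer": Sensitivity.CONFIDENTIAL,
    "privacy_officer": Sensitivity.PERSONAL,
}


def can_access(role: str, level: Sensitivity) -> bool:
    # Roles missing from the mapping get no access at all (deny by default).
    clearance = CLEARANCE.get(role)
    return clearance is not None and clearance >= level


def filter_training_data(role: str, records: list[dict]) -> list[dict]:
    """Keep only the records the given role is cleared to use."""
    return [r for r in records if can_access(role, r["sensitivity"])]


if __name__ == "__main__":
    records = [
        {"id": "faq", "sensitivity": Sensitivity.PUBLIC},
        {"id": "pricing_notes", "sensitivity": Sensitivity.CONFIDENTIAL},
        {"id": "customer_emails", "sensitivity": Sensitivity.PERSONAL},
    ]
    print([r["id"] for r in filter_training_data("ml_engineer", records)])
```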
Case Studies and Real-World Applications
“Organizations building GenAI-powered applications should start by defining a use case, such as a GenAI-powered knowledge base where employees and customers can get company and product answers quickly,” advises Stephen Catanzano. The data foundation includes product catalogs, customer service logs, training documents, or one of a thousand other business documents. “This data is processed into a vector-enabled database, using techniques such as retrieval-augmented generation and embeddings from a large language model or foundational model, such as OpenAI’s GPT, Google’s Gemini or a front-end chatbot,” says Catanzano. Users can query the system and receive answers in natural language based on the specific enterprise data foundation, adds Catanzano. The quality and representativeness of responses tie directly back to the accuracy, fairness and reliability of the Gen AI tool, claims Catanzano.
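The retrieval step of such a knowledge base can be sketched without any external service. In the example below, a bag-of-words count vector stands in for real model embeddings and a plain Python list stands in for the vector-enabled database; both are deliberate simplifications, and the function names and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    # The retrieved passages become the context handed to the language model.
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Returns are accepted within 30 days of purchase.",
        "Our support line is open on weekdays.",
        "The warranty covers parts for two years.",
    ]
    print(build_prompt("How long does the warranty cover parts?", docs))
```

Because the model only sees what retrieval returns, any stale, biased, or mislabeled document in the data foundation flows straight into the prompt, which is why governance of that foundation matters.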
Ferrari Revs Up Personalization with Gen AI
In its article, Ferrari Advances Generative AI for Customer Personalization and Production Efficiency, AWS, the cloud provider, explains, “To bring the luxury experience to customers around the world, Ferrari developed the car configurator on AWS, giving its customers the ability to personalize their own Ferrari, from wheel selection to paint colors to interior options.” Customers can test out various configurations while customizing their vehicle to suit their needs. Users can rotate the 3D car imagery and zoom in and out of their chosen Ferrari. The Italian sports car manufacturer uses LLMs to personalize the images. Since introducing the car configurator, Ferrari’s sales leads have risen. The company has also seen a 20% reduction in car configuration times.
Gen AI chatbots enhance Ferrari’s after-sales experience as well. They assist sales professionals and technicians with customer issues. Ferrari uses AWS to train “its chatbot to classify and summarize customer care tickets and answer commonly asked questions that help to reduce human error while improving productivity.” They also use generative AI to increase productivity and “make it simpler for our fans, dealers, and employees to have the best digital experiences with Ferrari,” says Silvia Gabrielli, Chief Digital and Data Officer, Ferrari.
Conclusion
Gen AI is a subset of AI that excels in creating various types of content that are often indistinguishable from human-generated content. As organizations increasingly adopt Gen AI, robust data governance becomes essential to ensure ethical use. Deloitte believes, “The fast-evolving landscape of Generative AI presents a massive opportunity for organisations to revolutionise their business and build on new opportunities, but this comes with the obligation to ensure that Generative AI is used in a responsible and ethical manner that minimises risk to organisations and individuals.”
Gen AI is a highly impressive technology, but without proper data governance, Gen AI capabilities could expose the organization to a number of unnecessary risks. Data governance ensures that organizations comply with relevant data protection regulations when using Gen AI. Data governance can help maintain data quality as well as mitigate risks associated with data biases and compliance violations.