Talking about artificial intelligence and data governance

1. Description

        Generative AI has already started shaking up the world of data governance and will continue to do so.

        It's only been 6 months since ChatGPT was released, but it already feels like there is a lot to look back on. In this post, I'll explore how generative AI affects data governance and where it might take us in the near future. Let me stress that because things move quickly, they could go in many different directions. This article is not about predicting what data governance will look like in 100 years, but about understanding what is changing now and what is about to happen.

        Before diving in, let's remind ourselves what data governance involves.

        In simple terms, data governance is the set of rules or processes an organization follows to ensure that data is trustworthy. It addresses 5 key areas:

  • Metadata and Documentation
  • Search and Discovery
  • Policies and Standards
  • Data Privacy and Security
  • Data Quality

        In this post, we'll look at how each of these areas changes once generative AI enters the picture.

2. Metadata and Documentation

        Metadata and documentation are probably the most important part of data governance, and the other areas depend heavily on getting this one right. AI has started to change the way we create context around data and will continue to do so. But don't set your expectations too high: when it comes to documentation, we still need humans involved.

        There are two parts to generating context around data, that is, to documenting it. The first accounts for about 70% of the work and involves recording general information that is common to many companies. A very basic example is the definition of "email", which is the same everywhere. The second part is about capturing expertise that is unique to your company.

        Here's the exciting part: AI can do much of the heavy lifting for that first 70%. Because this part relies on general knowledge, it is exactly the kind of work generative AI handles well.

        Now, what about your company-specific knowledge? Every organization is unique, and that uniqueness produces its own company language: your metrics, KPIs, and business definitions. This is not something that can be imported from outside. It originates with the people who know the business best: your employees.

        In conversations with data leaders, I often discuss how to reach agreement on these business concepts. Many of them find that the way to achieve this alignment is to bring domain teams into the same room to discuss, debate, and agree on the definition that best fits their business model.

        Let's take the definition of "customer" as an example. For a subscription-based business, a customer could be someone who is currently subscribed to their service. But for a retail business, a customer could be anyone who has made a purchase within the past 12 months. Every company defines "customer" in the way that makes the most sense to them, and this understanding usually comes from within the organization.
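        To make the contrast concrete, here is a minimal Python sketch of the two "customer" definitions above, written as functions. The `Account` fields and function names are hypothetical, and "the past 12 months" is approximated as 365 days. The point is only that the same word maps to different logic in each business, and that choosing the logic is a decision made by people inside the company.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Account:
    # Hypothetical account record; field names are illustrative only.
    account_id: str
    has_active_subscription: bool
    last_purchase: Optional[date]  # None if the account has never purchased


def is_customer_subscription(acct: Account) -> bool:
    """Subscription business: a customer is anyone currently subscribed."""
    return acct.has_active_subscription


def is_customer_retail(acct: Account, today: date, window_days: int = 365) -> bool:
    """Retail business: a customer is anyone who purchased within the window.

    Assumption: "the past 12 months" is approximated as 365 days.
    """
    if acct.last_purchase is None:
        return False
    return today - acct.last_purchase <= timedelta(days=window_days)


# The same account counts as a customer under one definition but not the other.
today = date(2023, 7, 1)
lapsed = Account("a1", has_active_subscription=False, last_purchase=date(2023, 1, 15))
print(is_customer_subscription(lapsed))  # False
print(is_customer_retail(lapsed, today))  # True
```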

        When it comes to this company-specific knowledge, AI, however smart, can't do the job yet. It can't sit in on your meetings, join discussions, or help new concepts take shape. According to Andreessen Horowitz, that may become possible with the second wave of AI. For now, we are still in wave 1.

        I also want to touch on a question raised by Benn Stancil: if a robot can write our documentation for us on demand, what's the point of writing it in advance?

        The argument has some logic to it: if generative AI can produce content on demand, why not generate documentation when it is needed instead of bothering to record everything up front? Unfortunately, it doesn't work that way, for two reasons.

        First, as I explained above, part of the documentation covers aspects unique to the company that AI cannot yet capture. That part requires human expertise and cannot be generated on the fly.

        Second, advanced as it is, AI is not foolproof. The content it generates is not always accurate, so all AI-generated documentation needs to be reviewed and confirmed by a human.
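        One simple way to enforce that review step is to treat every AI-generated definition as a draft that stays out of the published catalog until a person approves or corrects it. The Python sketch below assumes a hypothetical in-memory `DocEntry` record with a two-state status; a real catalog would have its own workflow and storage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    AI_DRAFT = "ai_draft"   # generated by AI, not yet reviewed
    APPROVED = "approved"   # confirmed (or corrected) by a human reviewer


@dataclass
class DocEntry:
    # Hypothetical documentation record for a single business term.
    term: str
    definition: str
    status: Status = Status.AI_DRAFT
    reviewer: Optional[str] = None


def approve(entry: DocEntry, reviewer: str, corrected_definition: Optional[str] = None) -> None:
    """Record a human sign-off, optionally replacing the AI-drafted text."""
    if corrected_definition is not None:
        entry.definition = corrected_definition
    entry.status = Status.APPROVED
    entry.reviewer = reviewer


def publishable(entries: List[DocEntry]) -> List[DocEntry]:
    """Only human-approved entries are exposed to catalog users."""
    return [e for e in entries if e.status is Status.APPROVED]


entries = [
    DocEntry("email", "The email address a person uses to log in."),
    DocEntry("customer", "Anyone who has created an account."),
]
# The generic term is accepted as drafted; the company-specific term stays a
# draft until someone who knows the business reviews it.
approve(entries[0], reviewer="data-steward")
print([e.term for e in publishable(entries)])  # ['email']
```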

3. Search and Discovery

        Generative AI is changing not only the way we create documentation but also the way we consume it. In fact, we are witnessing a paradigm shift in search and discovery. The traditional approach, in which analysts search through data catalogs to find relevant information, is rapidly becoming obsolete.

        The real game-changer is AI's ability to become a personal data assistant for everyone in the company. In some data catalogs, you can already ask an AI specific questions about your data, such as "Is it possible to perform operation X on this data?", "Why can't I use this data to achieve Y?", or "Do we have data that shows Z?" If your data is enriched with the right context, AI will help propagate that context across the company.

        Another development we look forward to is AI turning data catalogs from passive repositories into active assistants. Think of it this way: if you use a formula incorrectly, an AI assistant can give you a hint. Likewise, if you start writing a query that already exists, AI can let you know and point you to the existing work.
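        The "this query already exists" hint can be illustrated without any AI at all. The Python sketch below normalizes SQL text (case and whitespace) and compares it against a hypothetical dictionary of saved queries. It is a deliberately crude stand-in: an AI assistant would presumably also recognize queries that are equivalent but written differently, which exact matching after normalization cannot do.

```python
import re
from typing import Dict, Optional


def normalize_sql(sql: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon.

    Caveat: lowercasing also changes string literals; acceptable for a sketch.
    """
    text = re.sub(r"\s+", " ", sql.strip().lower())
    return text.rstrip(";").strip()


def find_existing_query(new_sql: str, saved: Dict[str, str]) -> Optional[str]:
    """Return the name of a saved query that matches new_sql after normalization."""
    target = normalize_sql(new_sql)
    for name, sql in saved.items():
        if normalize_sql(sql) == target:
            return name
    return None


# Hypothetical saved query in the catalog.
saved_queries = {
    "monthly_active_users": "SELECT COUNT(DISTINCT user_id)\nFROM events\nWHERE event_month = '2023-06';",
}
draft = "select count(distinct user_id) from events where event_month = '2023-06'"
print(find_existing_query(draft, saved_queries))  # monthly_active_users
```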

        Data catalogs used to just sit there, waiting for you to sift through them for answers. With AI, catalogs can start actively helping you, offering insights and solutions before you even realize you need them. This would be a complete shift in the way we work with data, and it may happen soon.

        However, there is one condition for AI assistants to work effectively: the data catalog must be well maintained. For AI assistants to give stakeholders reliable guidance, the underlying documentation must be 100% trustworthy. If catalogs are not properly maintained, or policies are not clearly defined, AI assistants will spread incorrect information throughout the company. That is more harmful than no information at all, because it can lead to poor decisions based on the wrong context.

        You can probably see where this is going: AI and data governance are interdependent. AI can enhance data governance, but in turn, strong data governance is needed to get the most out of AI. This creates a virtuous cycle in which each reinforces the other. But remember that neither can replace the other.

   

4. Data Policies and Standards

        Another key component of data governance is the development and enforcement of governance rules.

        This typically involves defining data ownership and domains within the organization. Currently, AI is not up to the task of defining these policies and standards. AI shines at enforcing rules and flagging violations, but it falls short when it comes to creating the rules themselves.

        The reason is simple: defining ownership and domains is a matter of human politics. Ownership, for example, means deciding who within the organization is responsible for a particular dataset. That responsibility can include the power to decide how and when the data is used, who has access to it, and how it is maintained and protected. Making these decisions often involves negotiations between individuals, teams, or departments, each with their own interests and perspectives. For obvious reasons, AI cannot take over human politics.

        We therefore expect humans to keep playing an important role in this aspect of governance in the near future. Generative AI can help draft ownership frameworks or suggest data domains, but human involvement is still a must.
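        The division of labor described in this section, where people write the rules and software enforces them, can be sketched in a few lines of Python. The rules and `Dataset` fields below are hypothetical examples of policies a governance team might agree on. The code only checks them and reports violations; it does not decide what the rules should be.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Dataset:
    # Hypothetical catalog entry; fields are illustrative only.
    name: str
    owner: Optional[str]
    domain: Optional[str]
    contains_pii: bool = False
    pii_tagged: bool = False


# Each rule is (rule name, predicate that returns True when the dataset complies).
# These rules are assumed to have been agreed on by people, not generated.
Rule = Tuple[str, Callable[[Dataset], bool]]

RULES: List[Rule] = [
    ("has_owner", lambda d: d.owner is not None),
    ("has_domain", lambda d: d.domain is not None),
    ("pii_is_tagged", lambda d: not d.contains_pii or d.pii_tagged),
]


def flag_violations(datasets: List[Dataset], rules: List[Rule] = RULES) -> List[Tuple[str, str]]:
    """Return (dataset name, rule name) for every rule a dataset breaks."""
    return [
        (d.name, rule_name)
        for d in datasets
        for rule_name, complies in rules
        if not complies(d)
    ]


catalog = [
    Dataset("orders", owner="sales-team", domain="sales"),
    Dataset("web_events", owner=None, domain="marketing", contains_pii=True),
]
print(flag_violations(catalog))
# [('web_events', 'has_owner'), ('web_events', 'pii_is_tagged')]
```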

5. Data Privacy and Security

        Generative AI will, however, transform privacy management. Managing privacy has traditionally been one of the most dreaded aspects of governance; no one likes it. It involves manually building complex permission architectures to ensure that sensitive data is protected.

        The good news: AI can automate much of this process. Given parameters such as the number of users and their respective roles, AI can generate access-control rules. Access control is largely defined in code, which aligns well with what AI is good at: it can take these parameters, generate the relevant code, and apply it to manage data access efficiently.
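        As an illustration of how role parameters turn into access-control code, here is a minimal Python sketch that renders PostgreSQL-style `GRANT` statements from a role-to-privilege mapping and a user-to-role mapping. The roles, users, and table names are hypothetical. In the scenario described above, an AI would help produce and maintain these mappings; here they are written by hand so the output is predictable. Generated statements should still be reviewed before they are run.

```python
import re
from typing import Dict, List

# Plain identifier, optionally schema-qualified (e.g. analytics.orders).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")
ALLOWED_PRIVILEGES = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def _check_ident(name: str) -> str:
    # Names are interpolated directly into SQL text, so reject anything unusual.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def generate_grants(role_privileges: Dict[str, Dict[str, List[str]]],
                    user_roles: Dict[str, List[str]]) -> List[str]:
    """Render GRANT statements; assumes the roles and users already exist."""
    statements = []
    for role, tables in sorted(role_privileges.items()):
        for table, privileges in sorted(tables.items()):
            privs = {p.upper() for p in privileges}
            if not privs <= ALLOWED_PRIVILEGES:
                raise ValueError(f"unsupported privileges: {privs - ALLOWED_PRIVILEGES}")
            statements.append(
                f"GRANT {', '.join(sorted(privs))} ON {_check_ident(table)} TO {_check_ident(role)};"
            )
    for user, roles in sorted(user_roles.items()):
        for role in sorted(roles):
            if role not in role_privileges:
                raise ValueError(f"user {user!r} references unknown role {role!r}")
            statements.append(f"GRANT {role} TO {_check_ident(user)};")
    return statements


role_privileges = {
    "analyst": {"analytics.orders": ["select"]},
    "engineer": {"analytics.orders": ["select", "insert"]},
}
user_roles = {"alice": ["analyst"], "bob": ["engineer"]}
for stmt in generate_grants(role_privileges, user_roles):
    print(stmt)
# GRANT SELECT ON analytics.orders TO analyst;
# GRANT INSERT, SELECT ON analytics.orders TO engineer;
# GRANT analyst TO alice;
# GRANT engineer TO bob;
```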

        Another area where AI could have a significant impact is the management of personally identifiable information (PII). Today, PII tagging is often done manually, which is a burden for the people responsible. This is something AI can fully automate. Thanks to AI's pattern-recognition capabilities, PII can be labeled more accurately than when humans do it by hand. In this sense, AI can actually improve how we protect privacy.
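        The sketch below shows the general shape of automated PII tagging in Python: inspect sample values from each column and tag the column when most values match a known pattern. It uses two hand-written regular expressions (email addresses and E.164-formatted phone numbers) as a simple, rule-based stand-in for the learned pattern recognition described above; the patterns, the 80% threshold, and the column names are assumptions for illustration.

```python
import re
from typing import Dict, List

# Hand-written patterns standing in for learned pattern recognition (assumption).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone_e164": re.compile(r"\+[1-9]\d{7,14}"),
}


def tag_pii_columns(samples: Dict[str, List[str]], threshold: float = 0.8) -> Dict[str, str]:
    """Tag a column with the first PII label whose pattern fully matches at
    least `threshold` of its non-empty sample values."""
    tags: Dict[str, str] = {}
    for column, values in samples.items():
        non_empty = [v.strip() for v in values if v and v.strip()]
        if not non_empty:
            continue
        for label, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in non_empty if pattern.fullmatch(v))
            if hits / len(non_empty) >= threshold:
                tags[column] = label
                break
    return tags


samples = {
    "contact": ["alice@example.com", "bob@example.org", ""],
    "mobile": ["+14155550123", "+447700900123"],
    "signup_date": ["2023-06-01", "2023-06-02"],
}
print(tag_pii_columns(samples))  # {'contact': 'email', 'mobile': 'phone_e164'}
```

Fixed patterns like these miss PII with no fixed format, such as names or free-text addresses, which is where model-based approaches are typically used.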

        This does not mean that AI will completely replace human involvement. Despite the capabilities of AI, we still need human oversight to manage the unexpected and make judgment calls when needed.

6. Data Quality

        Let's not forget data quality, an important pillar of governance. Data quality ensures that the information a company uses is accurate, consistent, and reliable. Maintaining it has always been a complex endeavor, but that is changing with the development of generative AI.

        As I mentioned above, AI is good at applying rules and flagging violations, which makes it well suited to identifying anomalies in the data. You can find a detailed explanation of how AI affects different aspects of data quality in this article.
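        A classic baseline for flagging anomalies is a z-score test: compare a new observation with the mean and standard deviation of recent history. The Python sketch below applies it to a hypothetical series of daily row counts. It is a plain statistical rule rather than generative AI, shown only to make the idea of automatically flagged violations concrete; the threshold of 3 standard deviations is a common convention, not a recommendation.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(history: List[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for a table over the past week.
daily_rows = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 10_000]
print(is_anomalous(daily_rows, 4_312))   # True: the load looks incomplete
print(is_anomalous(daily_rows, 10_100))  # False
```

Keeping the new value out of the baseline matters: if an outlier were included in the mean and standard deviation, it would inflate both and could mask itself, especially in short histories.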

        AI can also lower the technical barrier to data quality work. Soda already offers this: its new tool, SodaGPT, provides a no-code way to express data quality checks, letting users define checks in plain natural language. This makes maintaining data quality more intuitive and accessible.

7. Conclusion

        We have seen that AI can enhance data governance in ways that mark the beginning of a paradigm shift. Many changes have already taken place, and they are here to stay.

        However, AI can only build on foundations that are already solid. For AI to transform your company's search and discovery experience, you must already be maintaining your documentation. AI is powerful, but it cannot magically fix a flawed system.

        The second point to remember is that even though AI can be used to generate much of the context around data, it cannot fully replace the human element. We still need humans in the loop to validate and record knowledge unique to each company. Thus, our one-sentence prediction for the future of governance: powered by artificial intelligence, grounded in human discernment and cognition.


Origin blog.csdn.net/gongdiwudu/article/details/131743713