Years ago, a senior leader busted my biggest myth. Being a data person, I thought spotting data quality issues was the most exciting part of the job. It turns out executives barely acknowledge data quality problems. They treat them as routine data engineering, unwilling to invest beyond the existing budget. Won’t spare a dime! We’ve been cleaning and fixing data for years, yet business decision makers still don’t trust what they see.
Look under the hood of most data quality engines, and it’s all routine – formats, lengths, nulls, checksums. Accuracy and timeliness? Largely ignored. Vendors call this “state of the art.” Add unstructured data, and you realize it covers barely 30% of the landscape. The other 70%? Still in the shadows.
In my view, without a clear metadata strategy, we’re just doing “data laundry” all over again. And now there’s Data Observability, the superstar buzzword everyone’s fond of talking about. I hear you! Let’s ask a few strategic, high-level questions and see what resonates.
Metadata is not just “data about data.” It is also information about how you can find that data object “in a messy file structure,” information about how you can or ought to use it, and data about the use of data. This is where most data catalogs fall short.
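To make those three facets concrete, here is a minimal sketch in Python. The field names and values are illustrative assumptions, not from any particular catalog product: one record holds descriptive facts, the object’s location, guidance on permitted use, and data about actual use.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataRecord:
    # "Data about data": descriptive facts
    name: str
    description: str
    # Where to find the object in a messy file structure
    location: str
    # How you can or ought to use it
    allowed_uses: list
    classification: str
    # Data about the use of data
    last_accessed_by: list = field(default_factory=list)
    access_count: int = 0

# Hypothetical example record
record = MetadataRecord(
    name="customer_master",
    description="Golden record of active customers",
    location="s3://corp-lake/crm/customer_master/",
    allowed_uses=["analytics", "marketing"],
    classification="PII",
)

# Usage metadata accumulates as consumers touch the object
record.access_count += 1
record.last_accessed_by.append("churn_model_v2")
```

A catalog that stores only the first block of fields answers “what is this?”; the later fields answer “may I use it?” and “is anyone actually using it?”, which is where most catalogs fall short.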
A classic question: “How many customers are truly active, and not just sitting in the database?” But what if I asked instead: “Where exactly do these active customers exist across company data stores?” That’s a metadata question, and a far harder one to answer. Metadata has existed for a long time but was rarely put into practice until recently. What if I organized my photography archive by its metadata, by location, date, or subject? Sorting by location, for example, brings to my mind Prof. Yuval Noah Harari’s sharp critique in Nexus: A Brief History of Information, where he argues that AI-driven information networks are pushing us toward a total surveillance society. I’ll save that one for later. Let’s get a handle on metadata.
Here’s a simple example of metadata from daily life: packaged food. Working at a large CPG company, I learned to read the labels carefully – sodium in pasta, expiry dates on deli meat, added sugar in yogurt, the source of fruits and vegetables, storage instructions for canned goods. I want to consume these products, but I also rely on their metadata. Now imagine the labels were wrong or smudged. Would you trust the package? What if lab tests showed more sodium than listed? Misreading this metadata can have real consequences. The food industry is tightly regulated by bodies like the FDA, ensuring oversight of both quality and quantity. Could we take a page from the packaging industry and apply similar rigor to our data (metadata) strategy?
In the packaging industry, metadata adds context to products, streamlines supply chains, ensures compliance, and improves discoverability. Similarly, data catalog platforms allow us to apply metadata – structured, contextual information – to datasets, data products, and key endpoints like reports and models. If we don’t know whether the data in our business reports comes from authoritative sources, can it be trusted for decision-making? The answer is clearly no.
Data represents a thing, concept, or event and is interpreted by an agent, whether human or machine. This relationship introduces quality considerations on both sides. Data quality refers to the degree to which data represents the underlying thing, concept, or event accurately, completely, and in a timely manner.
However, accurate interpretation depends not only on the data itself but also on the availability and quality of its metadata. For humans and machines to correctly understand and interpret data, we must rely on well-defined, consistent, and trustworthy metadata. Metadata quality therefore plays a critical role in enabling meaningful interpretation and effective use of data.
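The three properties above – well-defined, consistent, trustworthy – can be checked mechanically. Here is a rough sketch; the field names, controlled vocabulary, and staleness threshold are all assumptions for illustration, not a standard:

```python
from datetime import datetime, timedelta, timezone

def metadata_quality_issues(entry: dict, max_age_days: int = 365) -> list:
    """Return a list of quality problems found in one catalog entry."""
    issues = []
    # Well-defined: required descriptive fields must be present and non-empty
    for field_name in ("name", "description", "owner", "source_system"):
        if not entry.get(field_name):
            issues.append(f"missing {field_name}")
    # Consistent: classification must come from a controlled vocabulary
    if entry.get("classification") not in {"public", "internal", "PII"}:
        issues.append("invalid classification")
    # Trustworthy: the metadata itself must not be stale
    updated = entry.get("last_updated")
    if updated is None or datetime.now(timezone.utc) - updated > timedelta(days=max_age_days):
        issues.append("stale metadata")
    return issues
```

Running such a check before publishing entries to a catalog is one way to treat metadata quality as a first-class gate, the same way data quality rules gate a pipeline.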
I’m not saying Data Quality initiatives don’t matter; they absolutely do. But in the age of unified data, both structured and unstructured, we need to pay much closer attention to Metadata Quality. Take customer data: it’s not just clean names or addresses; it’s how the data is used, classified, and protected. That’s what brings customer data properly under governance.
Let’s walk through some real use cases that show why trusted, high-quality metadata is so important.
A senior manager once approached me about migrating a legacy Oracle application to Hadoop. Given the age and complexity of the application, he admitted he lacked clarity on which data assets actually needed to be migrated. Without a comprehensive metadata catalog, he had limited visibility into what he knew, and did not know, about the existing Oracle data inventory. He needed richer metadata, including insight into which data objects were actively used and which had sat dormant for years. Simple as it sounds, this information was essential for prioritizing critical data assets over redundant or inactive ones. He also wanted to capture data transformations to support requirement analysis and ensure continuity in the target system. Poor metadata quality would have posed a significant risk to the success of this migration.
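The triage he needed can be sketched as a simple filter over access metadata. The object names, dates, and the “dormant” cutoff below are invented for illustration; the point is that none of this is possible unless last-access metadata was captured in the first place:

```python
from datetime import date

# Hypothetical inventory harvested from the legacy Oracle system
inventory = [
    {"object": "CUST_MASTER", "last_accessed": date(2024, 11, 2)},
    {"object": "TMP_LOAD_1998", "last_accessed": date(2001, 3, 15)},
    {"object": "ORDERS", "last_accessed": date(2024, 12, 1)},
]

# Anything untouched since the cutoff is a candidate to leave behind
cutoff = date(2020, 1, 1)
active = [o["object"] for o in inventory if o["last_accessed"] >= cutoff]
dormant = [o["object"] for o in inventory if o["last_accessed"] < cutoff]
```

Migrating `active` first and reviewing `dormant` separately is exactly the prioritization the manager was after.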
Accurate and complete metadata plays an important role in the discovery of datasets and data products. High-quality metadata improves the user experience, particularly within a data catalog. Challenges in catalog adoption are often linked to gaps in metadata quality. One common issue is the absence of sufficient lineage and traceability, which are key to understanding business and operational metadata. In many cases, catalogs contain data dictionaries but lack meaningful context, such as clear descriptions and usage details. This limits the practical value of the catalog. Well-maintained metadata supports reuse and discoverability of data.
I used to rush to my manager every time I discovered something shocking or interesting about the data. He’d just smile and say:
“Sure, finding it feels good… but do you know when, who, or how it’ll ever get fixed?”
When metadata quality is weak, data issue management suffers. If you don’t know who owns the data, who understands it, where it comes from, or how it flows, tackling data issues is basically climbing Mount Everest without sherpas and with limited oxygen. Without good metadata, the whole data producer–consumer model starts to fall apart. No clear origin and no authoritative source mean no way to hold anyone accountable for data quality.
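Ownership metadata is what makes issue routing possible at all. A toy sketch (dataset names and owners are invented) shows how accountability breaks the moment the owner field is missing:

```python
# Hypothetical ownership metadata maintained in a catalog
owners = {"customer_master": "crm-data-team", "orders": "sales-eng"}

def route_issue(dataset: str, description: str) -> str:
    """Assign a data issue to the dataset's recorded owner, or flag the metadata gap."""
    owner = owners.get(dataset)
    if owner is None:
        # With no owner on record, there is no one to hold accountable
        return f"UNROUTABLE: no owner recorded for '{dataset}' ({description})"
    return f"Assigned to {owner}: {description}"
```

Every “UNROUTABLE” result is itself a metadata-quality finding, not just a process failure.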
The age of early AI is already upon us, where the success of intelligent systems increasingly depends on understanding the structure, lineage, relationships, and semantics of data. For AI to be effective and trustworthy, it is critical to know what the data represents, where it originates, how it was collected, and how it should be used. Sufficient, complete metadata is essential to support explainable AI. Models such as LLMs rely on vast volumes of unstructured data—text, images, and documents. Any uncertainty or unreliability in data sources can compromise a model’s business outcomes.
Metadata also underpins training datasets, feature management, and model performance evaluation. By embedding rules, ownership, and access controls, metadata ensures compliance with regulatory and ethical standards. Furthermore, rich metadata enhances data discovery, understanding, and reuse, enabling the creation of modular “data products” that accelerate AI development and deployment.
So, where do we go from here? To manage the growing data landscape effectively, metadata, quality, and issue management must work together. Remember, metadata quality isn’t only a technical issue; it’s a strategic imperative. Metadata quality isn’t just theory; it’s a practical cornerstone for successful Data and AI Governance. And that’s where 1lessclick® comes in.