Investing Backward for Effective Data Quality

Effective business intelligence, analytics, and AI/ML require attention to legacy systems to improve data quality and data management

Business intelligence, analytics, cloud, ML/AI, and digital and business transformations dominate IT industry trends. All insights derived from data, across every facet of analytics, depend on the underlying data. More organizations need to address data quality and related data management issues stemming from historical (legacy) systems. While advanced technology and expanding algorithms have provided unprecedented options for storing data, analyzing information, and formulating insights, a lack of data quality continues to keep companies from reaping the maximum benefits from such breakthroughs.

The Challenges of Poor Data Quality

Many would agree that technological advancements do not automatically resolve the deficiencies in data quality. If left unresolved, such deficiencies will likely get passed on to the new platforms/algorithms or even magnified in the process. Data quality problems must be addressed at the outset before data moves to the cloud, or before advanced modeling techniques are used, and, most definitely, before the insights and conclusions drawn from such data become formal recommendations for strategic decision making.
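As one illustration of addressing quality "at the outset," the sketch below profiles a legacy extract before it is moved anywhere. It is a minimal example under stated assumptions, not a prescribed method: the function name `profile_records`, the field names, and the sample rows are all hypothetical, and only two basic checks (field completeness and duplicate keys) are shown.

```python
from collections import Counter


def is_filled(value):
    """Treat None and blank strings as missing; anything else counts as filled."""
    return value is not None and str(value).strip() != ""


def profile_records(records, key_field, required_fields):
    """Return simple quality metrics for a list of dict records (hypothetical helper)."""
    total = len(records)

    # Completeness: share of records with a usable value in each required field.
    completeness = {}
    for field in required_fields:
        filled = sum(1 for r in records if is_filled(r.get(field)))
        completeness[field] = filled / total if total else 0.0

    # Duplicates: records beyond the first occurrence of each repeated key.
    key_counts = Counter(r.get(key_field) for r in records)
    duplicate_records = sum(n - 1 for n in key_counts.values() if n > 1)

    return {
        "total": total,
        "completeness": completeness,
        "duplicate_records": duplicate_records,
    }


# Made-up sample rows standing in for a legacy system extract.
legacy_extract = [
    {"customer_id": "C001", "email": "a@example.com", "open_date": "2001-03-14"},
    {"customer_id": "C002", "email": "", "open_date": "1998-07-02"},
    {"customer_id": "C002", "email": "b@example.com", "open_date": None},
    {"customer_id": "C003", "email": "c@example.com", "open_date": "2010-11-30"},
]

report = profile_records(legacy_extract, "customer_id", ["email", "open_date"])
print(report)
```

A report like this does not fix anything by itself, but it makes the limitations visible before the data reaches a new platform or model, which is the point the paragraph above argues for.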

The old saying “garbage in, garbage out” still applies. Insights drawn from imperfect data require, at a minimum, an understanding of the limitations and implications of its quality. Conclusions drawn from data with severe quality limitations should be made carefully. In the worst case, such conclusions and insights are not an appropriate basis for decision-making, and the time and resources invested in them are wasted. Companies are often more inclined to remediate data quality issues in the middle of the process and justify the outcome afterward than to address them at the root cause, although the latter would significantly reduce the effort the former requires. There is no shortage of spending on data-related initiatives. However, most of that effort and investment aims to solve problems going forward only. The assumption is that, once a framework has been (hopefully successfully) rebuilt, the outcome will be data that is much more viable, usable, and appropriate for its intended purposes.

A Few Problems with That Assumption

The requirements and planning are built on today’s understanding and intentions. Not only are the benefits unavailable until possibly 5 or even 10 years into the future, but the usability of today’s framework by then is also very questionable. Technologies, requirements (regulatory or strategic), environments (micro or macro), industry trends, the workforce, and more are all destined to change.

If companies are always looking forward, trying to resolve today’s data quality challenges over the next 10 years, it is not hard to see that this becomes a constant game of catch-up. The result may not be available to assess until years later, so there is no effective way to measure success until a substantial investment has already been committed. In the meantime, companies make do with existing imperfect data, and from a cost-benefit perspective the effort that requires substantially outweighs the benefits derived from the data. Careful consideration of implications, thorough documentation of deficiencies and their perceived impact, and constant justification of decisions (or the lack thereof) all make this approach questionable. Even after all this effort, none of the data quality issues are actually resolved. Ultimately, this is a never-ending cycle that repeats itself while consistently demanding attention and effort.

(One) of the Solutions

There are many possible solutions. One is investing backward. Instead of always looking forward and trying to build a “perfect” future, attention should be given to the current state and to how historical challenges can be resolved now, not 10 years from now. Inevitably, there will be plenty of resistance to this approach. Of course, it is not easy to look back. Change not only makes today’s effort fruitless and obsolete years from now, it also makes piecing together the past extremely difficult. Given legacy systems and the organizational issues left unresolved along the way, to say that improving legacy data quality is a time-consuming and difficult task is an understatement. To succeed in this effort, resources, time, and management support are crucial. Compared with implementing a new framework or transformation initiative, investing in and investigating the past requires even more cultural and organizational alignment to succeed.

Appropriate value propositions and approaches to this recommendation need to be carefully tailored and planned. One possibility is to look at the alternative. What if nothing is addressed and the quality issues have to continue to exist? A closer look at existing practices and aggregation of the effort and resulting benefits could help paint a more accurate picture.
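The "look at the alternative" comparison can be made concrete with simple arithmetic. The sketch below is illustrative only: the function names and every figure are hypothetical, and it assumes a one-time remediation cost plus a small yearly upkeep on one side and a constant yearly workaround cost on the other. Real assessments would need each company's own numbers.

```python
def cumulative_costs(one_time_fix, upkeep_per_year, workaround_per_year, years):
    """Return (year, fix_total, workaround_total) tuples for each year in the horizon."""
    rows = []
    for year in range(1, years + 1):
        # Fixing at the source: pay once, then a smaller recurring upkeep.
        fix_total = one_time_fix + upkeep_per_year * year
        # Doing nothing: keep paying for patching, documenting, and justifying.
        workaround_total = workaround_per_year * year
        rows.append((year, fix_total, workaround_total))
    return rows


def break_even_year(rows):
    """Return the first year in which fixing costs no more than working around, or None."""
    for year, fix_total, workaround_total in rows:
        if fix_total <= workaround_total:
            return year
    return None


# Hypothetical figures, chosen only to show the mechanics.
rows = cumulative_costs(
    one_time_fix=500_000,
    upkeep_per_year=50_000,
    workaround_per_year=200_000,
    years=10,
)

for year, fix_total, workaround_total in rows:
    print(f"Year {year}: fix={fix_total:,} workaround={workaround_total:,}")

print("Break-even year:", break_even_year(rows))
```

If `break_even_year` returns `None` for a realistic horizon, that supports leaving a given legacy system alone; an early break-even supports investing backward.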

There is no one-size-fits-all approach. Each scenario must be examined individually, and the effort involved adequately understood, before deciding whether improving the data quality of legacy systems is worthwhile. In some situations, it is nearly impossible to go back in time. If the company has gone through mergers and acquisitions and lacks the documentation and personnel needed to support data integration, remediating decades of poor-quality data would be a very daunting task.

In other situations, though, sufficient management support to dedicate resources and time to this challenge could enable success. As with any assessment of return on investment, a holistic and realistic understanding of the benefits should be developed. In some cases, imperfect data requires analysts to continuously revise the models’ algorithms to address quality concerns. While some deficiencies can be addressed this way, in other cases constantly changing algorithms or data sources could lead to unreliable models whose output requires substantial adjustment and management explanation. This would defeat the intent of building quantitative models in the first place.

Ultimately, advanced technology does not necessarily address data quality concerns. On the contrary, imperfections could be magnified into results that are, at best, inappropriate to use in decision making or, at worst, the basis for misguided initiatives. Rather than spending a substantial amount of effort on workarounds, perhaps some energy would be better spent addressing the quality issue at its inception by fixing the data in the legacy systems. Furthermore, such effort could have a long-lasting effect on related efforts, so the benefit would not exist in isolation. At a minimum, improving legacy data quality should help companies understand how to collect high-quality data through policies designed to prevent such challenges in the future. Improved data governance, attention to business metadata, and other data management framework functions can help improve data quality. Requirements and strategies should be developed and implemented to support this approach to high data quality.


Investing backward is not a popular or preferred approach. It is much more enticing to look at the present, promise a better future, and pay little attention to legacy systems and their data quality. However, at least considering such an unconventional perspective could be fruitful. A little time and effort given to the legacy systems’ data could bring unforeseen benefits while substantially reducing the effort to “patch up” and “make do” with whatever is available. It could also help build a better, brighter, and more attainable data future.


Wenbo Wendy Zhang

Wenbo (Wendy) Zhang is a Data & Analytics leader with extensive experience in both the financial sector and the US Federal Government. She has extensive experience in leading compelling and successful projects, driving strategic business outcomes leveraging data, and using predictive analytics. Wendy earned a BA in Economics from the University of California, Santa Cruz, and holds dual Master’s degrees in Accounting and Business Analytics from Ohio State and George Washington University, respectively.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved
