Most data quality experts identify 16 major characteristics that affect the quality of data and information. It is important to understand their context before examining each characteristic.
Information quality can be defined in many ways. When users describe problems with data, they mention data that is inaccurate, irrelevant, or untimely, as well as having too much information to work with.
Research on data quality at the Massachusetts Institute of Technology (MIT) by Richard Wang, Yang Lee, Diane Strong, and Leo Pipino identifies 16 characteristics that affect the overall quality of the information people are expected to use in carrying out their jobs and tasks. That does not mean that every information quality situation involves all 16 characteristics. It does mean that one has to listen carefully to the person describing the situation and identify which of these characteristics (or dimensions) apply in the specific context being explained. The 16 characteristics are listed in the article published in the Communications of the ACM.
The number of instances in which data quality has affected decisions keeps growing. These cases are a useful starting point for thinking about different data quality issues and characteristics, and about what individuals and organizations can do to address them. Although some people equate information quality with “accuracy,” research and experience have shown that concerns about information quality extend far beyond that single characteristic.
Master Data Management and Business Intelligence Data Quality
One example of the need for improved data quality is the challenge facing master data management (MDM) and business intelligence (BI) efforts. Many organizations can relate to the statement “Data quality and data integrity are not going away. There’s no easy way to solve them.” BI software vendors have tried to address data quality and integration issues with MDM solutions, but they have had limited success in cleansing and reconciling data.
One of the main reasons BI and analytics efforts fall short in this regard is that they usually deal with only part of the organization, and data quality issues arise when they attempt to integrate and use information from across the entire organization. Within each particular context, the information is perceived, and measured, as having a high level of quality. It is when one tries to integrate the information in an organization-wide context that troubles appear, and they are not easy to overcome. In every organization, some sources of data will always be dirty. In such instances the user should first determine whether the data is really needed. Is it relevant for the purpose at hand? If it is, think about the best way to handle the data to make it meaningful.
Often, two people can use the same information to run reports with two different tools and get different results. It reminds one of a recent advertisement by a BI vendor in which three people walk into a meeting with the CEO, each with a different answer to the same question. The question could be as simple as “What is the firm’s gross profit for the current fiscal period?” Depending on the specific context used to calculate the answers, all three could be correct or none of them could be correct. An all too frequent experience is attending a meeting and spending most of it determining whose numbers are correct. That time should have been spent analyzing the numbers and deciding what action the organization should take. Consistency and usefulness of the information are important to any organization, as is understanding the context (meaning) of the data.
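To make the point concrete, here is a minimal sketch of how one set of records can yield three different "gross profit" figures. The ledger accounts, the amounts, and the three definitions are all invented for illustration; they do not come from any particular organization or BI tool.

```python
from decimal import Decimal

# Hypothetical ledger lines for one fiscal period; all figures are invented.
LEDGER = [
    ("sales", Decimal("1000.00")),
    ("returns", Decimal("-50.00")),
    ("cost_of_goods_sold", Decimal("-600.00")),
    ("freight_in", Decimal("-40.00")),
]


def total(*accounts):
    """Sum the ledger amounts posted to the named accounts."""
    return sum(
        (amount for account, amount in LEDGER if account in accounts),
        Decimal("0"),
    )


# Three plausible, but different, definitions of "gross profit" (assumed).
DEFINITIONS = {
    "sales less cost of goods": lambda: total("sales", "cost_of_goods_sold"),
    "net sales less cost of goods": lambda: total(
        "sales", "returns", "cost_of_goods_sold"
    ),
    "net sales less cost of goods and freight": lambda: total(
        "sales", "returns", "cost_of_goods_sold", "freight_in"
    ),
}

answers = {name: compute() for name, compute in DEFINITIONS.items()}
for name, value in answers.items():
    print(f"{name}: {value}")
```

Each figure is correct under its own definition, which is why the meeting stalls: the disagreement is about context, not arithmetic. Agreeing on one definition and recording it alongside the number removes the ambiguity.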
Flawed Critical Data
A Gartner Group study indicated that more than 25% of critical data held by Fortune 1000 companies was flawed. The data was inaccurate, incomplete, or duplicated. Think about the implications for an organization! These data issues are addressed in some fashion when the firm's financial statements are prepared. Unfortunately, when someone tries to build aggregates from the original source data, complications arise, and inconsistencies and misunderstandings follow.
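Two of the flaws named above, incomplete and duplicated records, can be detected mechanically; inaccuracy usually cannot, because it requires comparison with a trusted source. The following is a rough sketch only. The function name `profile_records`, the field names, and the sample customer records are hypothetical, and the duplicate test (case-insensitive, whitespace-trimmed match on key fields) is one simple assumption among many possible matching rules.

```python
def profile_records(records, required_fields, key_fields):
    """Flag records that are incomplete or that repeat an earlier record's key."""
    incomplete = set()
    duplicated = set()
    first_seen = {}
    for index, record in enumerate(records):
        # Incomplete: any required field is missing or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            incomplete.add(index)
        # Duplicated: same normalized key as a record seen earlier.
        key = tuple(
            str(record.get(field, "")).strip().lower() for field in key_fields
        )
        if key in first_seen:
            duplicated.add(index)
        else:
            first_seen[key] = index
    flawed = incomplete | duplicated
    return {
        "incomplete": sorted(incomplete),
        "duplicated": sorted(duplicated),
        "flawed_share": len(flawed) / len(records) if records else 0.0,
    }


# Invented sample data for illustration.
customers = [
    {"name": "Acme Corp", "country": "US", "tax_id": "11"},
    {"name": "acme corp ", "country": "US", "tax_id": "11"},
    {"name": "Globex", "country": "DE", "tax_id": ""},
    {"name": "Initech", "country": "GB", "tax_id": "33"},
]

report = profile_records(
    customers,
    required_fields=["name", "country", "tax_id"],
    key_fields=["name", "country"],
)
print(report)
```

A profile like this gives a first estimate of how much of a data set is flawed, but it says nothing about whether the remaining records are accurate or relevant.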
A personal experience involved the development of an initial data warehouse for global financial information. The initial effort was to build a new source of global information that would be more available and would allow senior management to monitor the current month’s progress toward budget goals for gross revenue and other profit and loss (P&L) items. The effort was to build the information from the source systems that feed the process used to develop the P&L statements. To deliver information that would be believable to the senior executives, a stated goal was to match the published P&L information.
After a great deal of effort, the initial goal was narrowed to delivering the capability for gross revenue only. This change was necessary because there was no consistent source data for the other P&L items. Even the new goal proved elusive, as the definition of gross revenue varied among the more than 75 corporate subsidiaries. Initial attempts to aggregate a subsidiary's sales so that they matched the reported amounts proved extremely challenging. The team had to develop a different aggregation process for each subsidiary. Unfortunately, that process did not always succeed in matching the published revenue amounts. It took almost two years to match the revenue numbers for every “major subsidiary” except one, the United States. Needless to say, the senior executives were not thrilled with the progress or the results, and they sent the team back to find a more complete solution.
Information quality is a complex concept, and within most organizations there are usually no quick and simple solutions. That is why the metaphor of a “journey” is so apt, and why continuous improvement is a fundamental idea to accept when talking about solving information quality problems. It took a long time to create the problems, and it will take time to correct them.
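The reconciliation described above can be pictured as a per-subsidiary rule applied to source transactions, with the result compared to the published figure. This is an illustrative sketch, not the process the team actually used: the subsidiary codes, transaction types, inclusion rules, amounts, and tolerance are all assumptions.

```python
from decimal import Decimal

# Hypothetical published gross revenue per subsidiary.
PUBLISHED = {"DE": Decimal("500.00"), "US": Decimal("900.00")}

# Hypothetical source transactions: (subsidiary, type, amount).
TRANSACTIONS = [
    ("DE", "invoice", Decimal("300.00")),
    ("DE", "invoice", Decimal("200.00")),
    ("DE", "credit_note", Decimal("-20.00")),
    ("US", "invoice", Decimal("1000.00")),
    ("US", "credit_note", Decimal("-80.00")),
]

# Each subsidiary defines gross revenue differently (assumed rules).
RULES = {
    "DE": lambda kind: kind == "invoice",
    "US": lambda kind: kind in {"invoice", "credit_note"},
}


def reconcile(transactions, rules, published, tolerance=Decimal("0.01")):
    """Aggregate revenue per subsidiary and compare it with the published amount."""
    totals = {subsidiary: Decimal("0") for subsidiary in published}
    for subsidiary, kind, amount in transactions:
        rule = rules.get(subsidiary)
        if subsidiary in totals and rule is not None and rule(kind):
            totals[subsidiary] += amount
    result = {}
    for subsidiary, computed in totals.items():
        difference = computed - published[subsidiary]
        result[subsidiary] = {
            "computed": computed,
            "difference": difference,
            "matched": abs(difference) <= tolerance,
        }
    return result


reconciliation = reconcile(TRANSACTIONS, RULES, PUBLISHED)
for subsidiary, row in reconciliation.items():
    print(subsidiary, row)
```

In this invented data the German figure matches and the United States figure does not, which mirrors the situation in the text: a rule that works for one subsidiary says little about the next, and an unmatched difference signals that the assumed definition of revenue is still wrong or incomplete.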