It is essential to consider the enterprise view when designing any data model. There are many benefits to adopting an enterprise perspective in data modeling.
Developing accurate, complete conceptual and logical data models forces the data architect to embrace the concepts behind structuring data to minimize the impact of change. Adopting disciplined approaches to structuring data, including the use of third normal form, is essential when designing a data model at any level. Traditionally, many developers and data architects have been accustomed to thinking in terms of data fields and the operations performed on them, rather than the overall structure of the data and the relationships within it.
Developers in many organizations still focus on the data needs of their own code, with little or no attention to structuring data to minimize the impact of change or to support other business processes. A useful technique for overcoming this developer perspective is to encourage organizations to think about data as it exists in the real world, beyond the immediate task at hand, when they design the data structures for their applications. This approach has proven effective in persuading organizations to adopt more stable and sustainable data structures.
Reasons for Application versus Enterprise Focus
Perhaps the continued focus on application requirements rather than enterprise needs stems from the perspective of the application developer. Most people involved in information systems development begin their careers as programmers. Typically, a programmer works in relative isolation, receiving program specifications from a business analyst (BA) or working iteratively with a BA, with the requirement to develop code that satisfies the specification, all under a tight deadline. The programmer knows there is a deadline to meet, yet also wants the job to be interesting and challenging. The programmer cannot make the activity too interesting or challenging without missing the deadline, so they fall back on their core competency, which is programming. As a result, the programmer writes the code in the most creative way they can to exhibit their talent. There is no incentive for the programmer to define the fuel of their code (the data) correctly, as this is outside the scope of the assignment. The programmer, supervisor, and manager all want to move past any data requirements as quickly and easily as possible so they can concentrate on completing the coding tasks. The data architect, if there is one, is pressured by the BA and the programmer to deliver a design as fast as possible, often with no direct business involvement.
Data Architects and an Enterprise Perspective
This lack of a data focus is a cause for concern for two reasons. First, since most people in IT are not concerned about data, the data management professional’s job remains challenging throughout their career. Second, most data architects start their careers as programmers and evolve into the data architect role over several years or decades. Unfortunately, there are few enterprise data architect roles responsible for establishing strategies for the use of data in the workplace. By definition, every enterprise, regardless of size, should have one individual or team with that responsibility. The enterprise data architect may have many data architects supporting him or her in this capacity; these data architects have a more limited scope of responsibility, usually a subject or functional area within the enterprise.
What impact does this have on enterprise information management? Most organizations employ developers who do not understand data, and most data architects are less experienced staff who may lack formal training and who have moved into their data role from a function where they learned bad data habits.
Bad data habits include the inclination to focus locally (a program or application view) when defining data, and the failure to separate business data from application data. These habits, often the result of poor data management training, produce database representations of data that make the developer’s job easier while obscuring the value of the data for its users and the rest of the organization.
This local focus has several sources. The IT business analyst, tasked with interfacing with the actual business person, writes down exactly what that person says they need. The business person also presents a local view of the data, since that is what they need to perform their specific business process. The data architect, caught between the developer and the business analyst, is therefore pressured from both sides to create a data definition that is expedient from a development point of view and that the business analyst can relate to. As a result, the data model usually represents the physical/application view of the data rather than the normalized enterprise/business view.
Challenges with Locally Focused Data Design
There are many challenges with adopting a local (application or program) focus for data design. The data representation will reflect not how the data exists in the real world, but how it is used in one IT application. Taking an enterprise information management perspective exposes the problems that this lack of real-world focus causes. Consider a simple example: a data model produced for a rudimentary purchasing process in a municipal government.
Figure 1. Purchase Table
This table records a purchase keyed by invoice number. It identifies the vendor, the product purchased, the quantity, the cost, the purchase date, and the payment status. Most developers would be happy with this simple table, since it is easy to code against and requires no joins to other tables. The business analysts can understand it easily, so it poses no problems for development or presentation. The code is developed and put into production, and everybody is happy.
Soon afterward, however, an invoice arrives that cannot be entered because it has the same invoice number as a record already in the database. Each vendor assigns its own invoice numbers, so nothing prevents two vendors from using the same one.
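The collision is easy to reproduce. The following is a minimal sketch using Python’s built-in sqlite3 module; because Figure 1 is not reproduced here, the column names and the sample rows (including the vendor “Example Chemical Co.”) are hypothetical stand-ins for the Purchase table described above.

```python
import sqlite3

# Hypothetical approximation of the flat Purchase table in Figure 1.
FLAT_SCHEMA = """
CREATE TABLE purchase (
    invoice_number TEXT PRIMARY KEY,  -- the vendor's own invoice number
    vendor_name    TEXT,              -- free text, keyed in per invoice
    product        TEXT,              -- free text
    quantity       TEXT,              -- free text, units mixed in
    cost           REAL,
    purchase_date  TEXT,
    payment_status TEXT
)
"""


def try_insert(conn, row):
    """Insert one purchase; return False if the invoice number is taken."""
    try:
        conn.execute("INSERT INTO purchase VALUES (?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(FLAT_SCHEMA)

# Two unrelated vendors happen to issue the same invoice number.
first = ("1001", "Andy's Pool Corporation", "Weber 3 Tabs", "50lbs",
         120.0, "2024-05-01", "PAID")
second = ("1001", "Example Chemical Co.", "Blue Wave Shock", "2kg",
          45.0, "2024-05-03", "OPEN")

first_ok = try_insert(conn, first)
second_ok = try_insert(conn, second)
print(first_ok, second_ok)  # True False
```

Neither invoice is wrong; the key simply assumes that invoice numbers are unique across all vendors, which they are not.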
What happens now? The angry business process owner calls the business analyst and insists that the process be fixed immediately. This insistence on an immediate fix precludes any lasting resolution of the problem. The business analyst devises a quick, cheap, simple solution: the documentation is changed to tell users that, when a conflict occurs, they should append a sequence number to the invoice number on entry to get around the unique key constraint. Everybody is happy again.
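A sketch of that workaround, against a hypothetical, cut-down version of the flat table, shows the hidden cost. The suffix format (“-2”, “-3”, and so on) is an assumption made for illustration; the point is that the stored key stops matching the number the vendor actually printed.

```python
import sqlite3

# Hypothetical, reduced version of the flat Purchase table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase ("
             "invoice_number TEXT PRIMARY KEY, vendor_name TEXT, cost REAL)")


def enter_invoice(conn, invoice_number, vendor_name, cost):
    """Mimic the documented workaround: on a key clash, append a sequence
    number (-2, -3, ...) to the invoice number until the insert succeeds."""
    key, seq = invoice_number, 1
    while True:
        try:
            conn.execute("INSERT INTO purchase VALUES (?, ?, ?)",
                         (key, vendor_name, cost))
            return key
        except sqlite3.IntegrityError:
            seq += 1
            key = f"{invoice_number}-{seq}"


k1 = enter_invoice(conn, "1001", "Andy's Pool Corporation", 120.0)
k2 = enter_invoice(conn, "1001", "Example Chemical Co.", 45.0)
print(k1, k2)  # 1001 1001-2

# Looking up the second vendor's invoice by the number it printed
# returns the first vendor's purchase instead.
row = conn.execute("SELECT vendor_name FROM purchase WHERE invoice_number = ?",
                   ("1001",)).fetchone()
print(row[0])  # Andy's Pool Corporation
```

Which invoice receives the suffix depends only on the order of data entry, so the same vendor invoice could be stored under different keys in different circumstances.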
What happens next? The director of procurement wants to know how much chlorine the city is buying to maintain its swimming pools, with a view to getting a better deal by consolidating purchases. IT is asked for a report showing how much chlorine has been purchased and from which vendors. Can this be determined from the table structure above? Probably not. First, there are no standard vendor names. The vendor field contains whatever was keyed in for each invoice, so “Andy’s Pool Corporation” may appear as “Andy’s”, “Andy Pool”, “A’s Pool on Poplar”, “Andy Corp”, and so on. Second, the product field may contain brand-specific descriptions such as “Weber 3 Tabs”, “Splashes 1.25lb”, or “Blue Wave Shock”, with no standard product identifier. Third, the quantity field may contain “2kg”, “50lbs”, or “100”, mixing units and bare numbers that cannot simply be added together.
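The reporting problem can be sketched the same way. The rows below are hypothetical, built from the example values in the text, and assume that all four purchases are chlorine products bought from the same supplier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (invoice_number TEXT PRIMARY KEY, "
             "vendor_name TEXT, product TEXT, quantity TEXT)")

# Hypothetical rows: assume all four are chlorine bought from one supplier,
# keyed in by different clerks with different spellings and units.
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)", [
    ("1001", "Andy's Pool Corporation", "Weber 3 Tabs", "50lbs"),
    ("1002", "Andy's", "Splashes 1.25lb", "2kg"),
    ("1003", "Andy Pool", "Blue Wave Shock", "100"),
    ("1004", "A's Pool on Poplar", "Weber 3 Tabs", "25lbs"),
])

# A naive search for chlorine finds nothing: no product text contains it.
chlorine_rows = conn.execute(
    "SELECT * FROM purchase WHERE product LIKE '%chlorine%'").fetchall()

# Grouping by vendor name yields one group per spelling, not per supplier.
vendor_groups = conn.execute(
    "SELECT vendor_name, COUNT(*) FROM purchase GROUP BY vendor_name").fetchall()

# Quantities mix units and bare numbers, so they cannot be summed safely.
quantities = [q for (q,) in conn.execute(
    "SELECT quantity FROM purchase ORDER BY invoice_number")]

print(chlorine_rows)       # []
print(len(vendor_groups))  # 4
print(quantities)          # ['50lbs', '2kg', '100', '25lbs']
```

Any fix at this stage means hand-cleaning free text after the fact, which is exactly the work a better model would have avoided.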
Effective Data Model Solutions Use Enterprise Perspective
How could these problems have been avoided? A good data architect models the information required to drive business processes as it exists in the real world. This means the data model is based on the inherent characteristics of the data, independent of the business process or the program being developed to automate it. In this example, the data architect should have modeled the following tables (caveat: this is for illustrative purposes only; it is not a completely accurate model).
Figure 2. Example Data Model Illustration
In this example, the Business table contains a surrogate key assigned for uniqueness, but it also contains the Tax ID number, which may be used to identify the business uniquely. The data architect modeled the supplier’s product catalog so the organization can communicate effectively with its suppliers. The organization standardized its products on an internal surrogate key so IT can provide consistent reporting within the organization. The data architect also mapped the products to industry-standard codes, in this case the NIGP Code (National Institute of Governmental Purchasing), so the organization can communicate effectively with external organizations that follow this standard. Finally, the data architect addressed the problem of relying on suppliers’ invoice numbers to identify purchases uniquely by creating a cross-reference to those numbers.
Adopting a “real world” approach to data representation provides many benefits. The organization can implement stable data structures that effectively support all operational business processes and meet both tactical and strategic requirements for internal and external reporting and communication. These goals can be accomplished by looking beyond the immediate task and considering the data from a broader usage perspective.
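A rough translation of this model into tables follows, again as a Python sqlite3 sketch. The table and column names, the sample Tax IDs, the placeholder NIGP code value, and the use of a single standard quantity unit (quantity_kg) are all assumptions for illustration; they are not taken from Figure 2.

```python
import sqlite3

# Hypothetical schema loosely following the description of Figure 2.
SCHEMA = """
CREATE TABLE business (
    business_id INTEGER PRIMARY KEY,            -- internal surrogate key
    tax_id      TEXT NOT NULL UNIQUE,           -- natural identifier
    legal_name  TEXT NOT NULL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,            -- internal standard product
    description TEXT NOT NULL,
    nigp_code   TEXT                            -- industry-standard code
);
CREATE TABLE supplier_product (                 -- each supplier's catalog
    business_id  INTEGER NOT NULL REFERENCES business,
    supplier_sku TEXT NOT NULL,
    product_id   INTEGER NOT NULL REFERENCES product,
    PRIMARY KEY (business_id, supplier_sku)
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,            -- our key, not the vendor's
    business_id INTEGER NOT NULL REFERENCES business,
    product_id  INTEGER NOT NULL REFERENCES product,
    quantity_kg REAL NOT NULL                   -- assumed standard unit
);
CREATE TABLE supplier_invoice_xref (            -- vendors' invoice numbers
    business_id    INTEGER NOT NULL REFERENCES business,
    invoice_number TEXT NOT NULL,
    purchase_id    INTEGER NOT NULL REFERENCES purchase,
    PRIMARY KEY (business_id, invoice_number)   -- unique per vendor only
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample data; the Tax IDs and NIGP value are placeholders.
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", [
    (1, "11-1111111", "Andy's Pool Corporation"),
    (2, "22-2222222", "Example Chemical Co."),
])
conn.execute("INSERT INTO product VALUES (1, 'Chlorine tablets', 'NIGP-PLACEHOLDER')")
conn.executemany("INSERT INTO supplier_product VALUES (?, ?, ?)", [
    (1, "Weber 3 Tabs", 1),
    (2, "Blue Wave Shock", 1),
])
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 20.0),
    (2, 2, 1, 2.0),
    (3, 1, 1, 10.0),
])
# Both vendors used invoice number 1001; the composite key accepts that.
conn.executemany("INSERT INTO supplier_invoice_xref VALUES (?, ?, ?)", [
    (1, "1001", 1),
    (2, "1001", 2),
    (1, "1002", 3),
])

# Translate a vendor's catalog entry to the internal product.
(internal_id,) = conn.execute(
    "SELECT product_id FROM supplier_product "
    "WHERE business_id = ? AND supplier_sku = ?", (2, "Blue Wave Shock")).fetchone()

# Consolidated chlorine report: grouped by supplier key, filtered by code.
report = conn.execute("""
    SELECT b.legal_name, SUM(p.quantity_kg)
    FROM purchase AS p
    JOIN business AS b ON b.business_id = p.business_id
    JOIN product AS pr ON pr.product_id = p.product_id
    WHERE pr.nigp_code = ?
    GROUP BY b.business_id
    ORDER BY b.legal_name
""", ("NIGP-PLACEHOLDER",)).fetchall()
print(internal_id)  # 1
print(report)  # [("Andy's Pool Corporation", 30.0), ('Example Chemical Co.', 2.0)]
```

Because the cross-reference key is the supplier plus that supplier’s invoice number, two suppliers can both send invoice 1001 without conflict, and the report groups on the business surrogate key rather than on whatever name happened to be typed in.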