Michael Scofield: Connecting Geospatial Data With Business Intelligence - Part 1
Most information technology professionals have been immersed in a tabular data paradigm, and thus are unfamiliar with the significant differences geospatial data brings. This article, the first in a short series, attempts to explain the differences, and explain at a very high level how data (or "information") may move across the gap between BI and GIS for added value to decision-making.
In many business areas, spatial data is finding greater appreciation. We like to say that "proximity matters" in many retail business relationships, as well as in supply-chain management. Only on the internet, and in transactions where no physical movement of goods occurs, is proximity or location trivial. However, in retail marketing, epidemiology, transportation and delivery, communications, and the criminal-justice system, where something (or someone) is located can be very important.
Perhaps the most common business intelligence (BI) application of geospatial systems is in retail marketing: the proximity of customers (or the best customers) to a retail outlet. This is the application that we will explore in more detail.
But first, we must understand the nature of geospatial systems and GIS.
A Different Data Paradigm
The traditional IT tabular data paradigm thinks not only in terms of rows and columns, but also requires that relationships are established by foreign keys which contain symbolic values. We know an employee is in a particular department because someone placed the department code in his employee record. The association was conscious, deliberate, and expressed with a code.
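As a minimal sketch of the tabular paradigm described above, the following uses Python's built-in sqlite3 module. The table names, column names, and values are hypothetical, invented only to show a relationship established by a stored code.

```python
import sqlite3

# Hypothetical tables and codes, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  TEXT,
        dept_code TEXT REFERENCES department (dept_code)  -- the foreign key
    );
    INSERT INTO department VALUES ('FIN', 'Finance'), ('MKT', 'Marketing');
    INSERT INTO employee VALUES (1, 'A. Jones', 'FIN'), (2, 'B. Smith', 'MKT');
""")

# The association exists only because someone stored a department code
# in each employee row; the join simply matches those symbolic values.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON d.dept_code = e.dept_code
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```

Nothing about location or shape is involved: the link between employee and department is purely a matching code.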
Another characteristic of the traditional tabular approach is that the data is stored in a nearly human-readable form (if you discount binary and packed-decimal issues). If you write a "SELECT * FROM CUST_MASTER" command, the output generally will be human readable.
Geospatial databases are somewhat different. The most significant difference is the special field which stores the shape and location of a geospatial object. Often, this field is called the "geometry" of the object, and generally is not "human-readable". It is usually stored in one of several special data types, both in general GIS files and in spatial databases.
GIS data is made useful in two ways: by expression as a map (readable to the human eye, and subject to intuitive human analysis) and by processing in a GIS tool (which can perform a wide variety of computations to yield meaningful information, such as the number of customers within a certain radius of a store).
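To illustrate why a geometry field is not human-readable, here is a sketch that encodes a single point in Well-Known Binary (WKB), one widely used binary encoding for geometries. The article does not say which encoding any particular product uses, so WKB is an assumption chosen for illustration, and the coordinates are approximate, hypothetical values.

```python
import struct

def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2-D point as little-endian Well-Known Binary."""
    # Layout: 1 byte for byte order (1 = little-endian),
    # 4 bytes for geometry type (1 = Point),
    # then two 8-byte IEEE 754 doubles for X and Y.
    return struct.pack("<BIdd", 1, 1, x, y)

# Hypothetical coordinates (longitude, latitude).
wkb = point_to_wkb(-117.18, 34.06)
print(wkb.hex())  # a run of hex digits, not recognizable as a location

# Only software that knows the layout can recover the coordinates.
byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
print(geom_type, x, y)
```

A "SELECT *" against such a column returns bytes like these rather than readable text, which is why the geometry must pass through GIS software before a person can interpret it.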
Either way, the data must be read by GIS software to form graphic or computational expressions.
The gap between the stored data and human understanding is bridged by the GIS software tool. The raw data is impossible for humans to comprehend without being expressed graphically by the software.
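The kind of computation mentioned above, counting customers within a certain radius of a store, can be sketched as follows. This is a simplified illustration using the haversine great-circle formula on latitude and longitude, not how any particular GIS product implements it. The store and customer coordinates and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def customers_within(store, customers, radius_km):
    """Return the customers whose distance from the store is at most radius_km."""
    return [c for c in customers
            if haversine_km(store[0], store[1], c["lat"], c["lon"]) <= radius_km]

# Hypothetical store and customer locations (latitude, longitude).
store = (34.05, -117.18)
customers = [
    {"name": "C1", "lat": 34.06, "lon": -117.19},
    {"name": "C2", "lat": 34.10, "lon": -117.10},
    {"name": "C3", "lat": 34.50, "lon": -117.18},
]

nearby = customers_within(store, customers, radius_km=5.0)
print(len(nearby), [c["name"] for c in nearby])
```

Note that no foreign key links a customer to a store here; the association is derived from the coordinates at query time.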
Components Of A Geodatabase
Geospatial databases employ the same kind of logical data modeling used in traditional database design. Generally, one finds one table for every class of map object (e.g. street, building, property, stream, power line, etc.). All or some of these objects may be found on a map, depending upon the purpose and usage of the map.
And all map objects may be stored and expressed either as points, lines, or polygons. (We are taking a simple approach here; 3-dimensional maps are a little more complex.)
Points represent map objects which have position only, but no significant shape (from the map perspective) or area. These include hydrants, utility poles, radio towers, and the junctions of lines (such as streets). At some map scales (covering a wider area) even towns may be expressed as points. While having no shape, a point has location, sometimes expressed in X and Y coordinates but more often expressed in latitude and longitude.
In the above sketch, three buildings are expressed as points, and each stored as a separate record in the Building table in the GIS database.
Lines have shape but no area. A line can connect several consecutive points. Map objects expressed as lines (in their simple form) include streets, streams, power lines, pipes underground, and routes. Lines can be straight (connecting just a starting point and an ending point) or more complex (requiring many discrete points to express their shape).
Polygons have shape and area. Basically, a polygon is a line which wraps around an area, and connects back at its starting point. Polygons include lakes, swamps, properties, building footprints, habitats, government boundaries, and jurisdictions. On larger scale maps (covering less area, but showing each object with greater size on the paper, hence "larger scale") many objects which were points or lines also may be expressed as polygons. Buildings may have been points on one map, but now have shape as well as position on a larger scale map, and thus are stored and expressed as polygons.
Below we find a simple example of a street table. Each segment of street is expressed as a line, and a distinct record in the table.
Let us look at some differences in how elements of a geospatial database table are structured and named. The ID (the first field, at the left end of the table) never has a business meaning. It is unique only within the table. The second field is usually the "geometry". The contents of this field are unreadable to the human eye. Most graphical representations of a GIS table simply color this field green, and state what kind of geometry (point, line, or polygon) it holds.
To the right of the geometry are "attributes" (in the GIS lexicon) which most readers will recognize as typical non-key attributes.
Each record represents a discrete instance of the object class. In the table above, there are three segments of "Main Street", perhaps each connecting two intersections. Those sections of Main Street could have different widths, or different house number ranges. Again, this is a very simple example.
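The street table described above can be sketched as a simple in-memory structure. All field names and values here are hypothetical, and the geometry is shown as a plain list of coordinates for readability, whereas a real geodatabase would hold it in a binary geometry type.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (x, y), or (longitude, latitude)

@dataclass
class StreetSegment:
    segment_id: int             # surrogate ID: unique within this table, no business meaning
    geometry: List[Coordinate]  # line geometry as ordered vertices
    street_name: str            # attribute
    width_m: float              # attribute (hypothetical)
    from_house_no: int          # attribute (hypothetical)
    to_house_no: int            # attribute (hypothetical)

# Invented values for illustration only.
street_table = [
    StreetSegment(1, [(0.0, 0.0), (100.0, 0.0)], "Main Street", 12.0, 100, 198),
    StreetSegment(2, [(100.0, 0.0), (200.0, 0.0)], "Main Street", 12.0, 200, 298),
    StreetSegment(3, [(200.0, 0.0), (260.0, 30.0), (300.0, 80.0)], "Main Street", 9.0, 300, 398),
    StreetSegment(4, [(100.0, 0.0), (100.0, 120.0)], "Oak Avenue", 8.0, 1, 99),
]

# Three separate records share the name "Main Street", each with its own
# geometry, width, and house number range.
main_street = [s for s in street_table if s.street_name == "Main Street"]
for seg in main_street:
    print(seg.segment_id, seg.width_m, f"{seg.from_house_no}-{seg.to_house_no}")
```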
Graphical Expression Of Digitized Shapes
Nearly all objects on a digital map are stored as points, lines, or polygons. While points have position only, lines and polygons may employ multiple points to accurately describe their shape.
A line can be described with two or more points, each point expressed as coordinates (either X and Y, or latitude and longitude). A curved street segment may require several points to faithfully describe its shape (below).
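As a small sketch of how a line's shape follows from its points, the length of a line can be computed by summing the distances between consecutive vertices. This assumes planar X and Y coordinates in consistent units. The vertex values and the function name are hypothetical.

```python
from math import hypot

def line_length(vertices):
    """Sum of straight-line distances between consecutive vertices."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))

# A straight segment needs only a start point and an end point.
straight = [(0.0, 0.0), (3.0, 4.0)]

# A bending segment needs intermediate points to describe its shape.
bending = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0), (6.0, 8.0)]

print(line_length(straight))  # 5.0
print(line_length(bending))   # 12.0
```

The more points used to describe a curve, the more closely the stored line follows the real street.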
A polygon (such as a property boundary on which a house sits) may be described in terms of points (often corresponding to actual survey markers).
The data on the house, the main road, and the driveway would be contained in other (non property) tables, and expressed as other layers.
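For a polygon such as the property boundary described above, the enclosed area can be derived from the corner points alone. The sketch below uses the shoelace formula and assumes a simple (non-self-intersecting) polygon in planar X and Y coordinates. The parcel dimensions are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    The ring is closed implicitly: the last vertex connects back to the first.
    """
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

# Hypothetical rectangular parcel, 30 by 20 units, listed corner by corner.
parcel = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0)]
print(polygon_area(parcel))  # 600.0
```

This formula is not suitable for raw latitude and longitude values; those would first need to be projected to a planar coordinate system.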
How these records (containing digitized shapes) are created is generally beyond the scope of this article. There are several methods of data capture or digitization, and each has varying degrees of accuracy. Some map data companies now have vans equipped with GPS to take photographs in all directions as they drive down streets, counting the lanes, and plotting the exact position of every aspect of the street and other observable features.
In the next installment, we will see how these various kinds of spatial datasets (employing points, lines, and polygons representing a wide variety of features) are combined by a geographic information system to form a rich, complex map.
About The Author
Michael Scofield is manager of Data Asset Development at ESRI, Inc. in Redlands, California. He also holds a faculty appointment in Health Information Management at Loma Linda University in southern California. Michael is the recipient of the 2008 DAMA Community Award, and was a 2007 nominee for the DAMA International award for Professional Achievement. Mr. Scofield has given more than 160 lectures to professional audiences in data management all over the United States, in the UK, and Australia, including 18 DAMA chapters and numerous database user groups. He also has numerous published articles on data management, data quality assessment, and related topics; his humor articles are published in the Los Angeles Times and other journals. Michael can be reached at nmscofield@aol.com