Rather than applying new words or phrases to poor data architecture concepts, learn how a data strategy should shape effective data architecture, especially for analytics and business intelligence.
People love buzzwords. Sometimes people think the quality of a solution improves with the number of flashy words or acronyms used to describe it. That is so wrong! Having to learn new words or acronyms just to stay up to date is a challenge, and often an unnecessary one. But are things really changing that much at their core? Or is the old mantra “the more things change, the more they stay the same” still applicable? These questions are worth exploring and answering.
Setting the Background for Data Architecture
In the 1980s a new paradigm emerged: the Data Warehouse architecture. It was built to consolidate data from different systems into a reporting infrastructure that served business users, and to save some dollars by offloading processing from expensive mainframes and transactional platforms.
Over time, as business requirements grew more complex, the Data Warehouse matured into a more robust architecture that served more use cases, from batch reporting to complex near-real-time event processing. An example from Teradata is depicted in the image below.
The 1990s and 2000s brought an increased focus on the speed and complexity of the data already used for analytics, and extended those capabilities with non-traditional sources (such as web application logs or CRM outputs). Operational Intelligence was a key theme that made Data Warehouses more “active”, with intra-day or even intra-hour data loads alongside the standard daily batches. There was also a shift from “looking in the mirror” to “looking ahead”, with predictive analytics as a key enabler of business decisions. As the analogy goes, “it makes no sense to drive a car by only looking in the rear-view mirror”.
This review may seem superfluous, but it is important to know history in order to plan a future state. Data warehousing enjoyed a prosperous era, with relational databases dominating the market and delivering a great deal of business value and customer satisfaction. On the dark side, there were many failed implementations that delivered little value because they were too “IT-centric”, with little connection to business needs. The lesson learned was that architecture deployments must be aligned with and focused on meeting business needs. Spoiler alert: does this sound familiar? Stay tuned.
The “Data Something” Architecture Issue
Over the years, data volumes have exploded across numerous data sources, some producing massive numbers of records at high speed. This variety and amount of information introduced new requirements for capturing and curating it. New ways of consuming information increased the need for real-time decisions, alongside other challenges such as higher concurrency and more complex data structures. Congratulations! This is the Big Data Era! Buckle up, the ride gets bumpy.
Architecture principles started to shift to a different paradigm. A one-size-fits-all or multi-purpose platform was no longer enough, so the hybrid analytical ecosystem, the “Logical Data Warehouse”, the “Enterprise Data Hub” and many other names emerged. Many of these terms originated in the lexicons of various analyst firms. Another approach gained traction as well: the distributed file system, popularized especially by the Hadoop project. It became the great enabler of a bright new architecture pattern: the Data Lake.
The Data Lake became the “de facto” standard for many analytical ecosystems, with a prevailing view that only ecosystems built around a Data Lake were cool, and that environments without one were not modern. Most implementations failed (“if we build it, they will come”, but they didn’t), although not because the design pattern itself was flawed. Most Data Lakes were built without any clear business purpose and without proper data management practices. They were “cheap” and “good enough”, built on open source software running on commodity hardware.
Whatever the perception, even open source software and commodity hardware have a cost, and they take significant effort to deliver real business value. Perhaps some companies in the social network and content streaming businesses have succeeded on this journey, but most companies do business and build teams very differently from these few organizations. No recipe is good for everyone.
Fast forward to the present: Data Warehouses still play their part, alongside modern approaches such as the Data Lakehouse and the Data Mesh, among a few others.
Data is the Air in the Wheel
This is a highly competitive era, with a marketplace full of great data solutions (and some not-so-great ones), but it is important to focus on what really matters: the data. Sadly, many solution providers try to differentiate themselves by adding new spokes to the data wheel, trying to shake up the “status quo” without much real substance. Is change bad? Certainly not. Change enables innovation. However, remember that data is what powers the wheel of progress. Improving the wheel’s efficiency, modernizing its design to make it more secure, and adapting it to different environments still leaves you with a wheel. Well-architected data is what inflates the wheel, enabling a business to drive to new heights safely with the fewest bumps in the road.
Data What? Data Strategy!
“Data Strategy” is the essential component for success with data, regardless of architecture. Keep looking at business requirements and build the data strategy based on them. What data will bring the most value? Which sources exist that could complement this new initiative? How can the requirements be prioritized? What funding will be available to accomplish this? Has a good ROI been determined to secure more funding? Will building this first provide value faster?
After answering those questions, it is important to apply architecture patterns that make the most intelligent use of the company’s technological and monetary resources. Based on the value of the data and the characteristics of the required analytics, the company can identify which architecture patterns are a good fit. Reuse what has already proven useful. There is no need to stick to a single pattern; instead, embrace what is useful in each one.
It’s not the other way around. Trying to fit the data, the analytics and even the business decisions into an architecture pattern just because it sounds cool is not good. Tying the company’s architecture strategy and decisions to “just a word” is dangerous, and usually a recipe for failure.
Remember, in the end, a proper data strategy will deliver more business value than chasing a shiny new architecture buzzword. Oh, by the way, how many buzzwords have you spotted in this article?