Realizing the most value from a cloud migration means adopting an architecture that suits both legacy and current databases
Cloud adoption rates continue to increase due to the cost savings, flexibility, reliability, scalability, and often improved security that a cloud solution can provide. A PaaS (Platform as a Service) approach can host existing legacy databases, but organizations obtain the most value from a cloud migration by revising their data architecture to fit modern data needs, perhaps even progressing to providing DaaS (Data as a Service) for the enterprise.
Cloud Migration Techniques
There are many things to consider when migrating a platform to the cloud – networking, security, user access, administration, etc. – but an important consideration is how data will be ingested, stored, and managed. One approach is simply to copy the databases over and be done with it (“lift and shift”). However, this approach has challenges, including the issue of legacy mainframe databases – does the database administrator (DBA) place the data in a relational database (e.g., Amazon Redshift, SQL Server) without restructuring it (e.g., normalizing) to align with relational theory and business rules?
If the organization has a highly normalized database that is needed for analytics, it might benefit from some level of de-normalization (e.g., a dimensional model) to improve ease of use and query performance. There is, of course, no one-size-fits-all model – the important thing is to identify the optimal data architecture for each organization.
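To make the de-normalization idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (customers, products, sales) and all table/column names are hypothetical, invented purely for illustration; the point is how pre-joining dimension attributes onto fact rows yields a flat analytic table that is simpler to query.

```python
import sqlite3

# Hypothetical normalized schema: sales reference customers and products.
# De-normalizing into a flat analytic table trades storage for simpler,
# faster analytical queries (a flattened dimensional model).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY,
                       customer_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'East'), (2, 'West');
INSERT INTO product  VALUES (10, 'Widgets');
INSERT INTO sale     VALUES (100, 1, 10, 25.0), (101, 2, 10, 40.0);

-- De-normalized analytic table: dimension attributes pre-joined onto the
-- fact rows, so analysts query one table instead of repeating the joins.
CREATE TABLE sale_flat AS
SELECT s.sale_id, c.region, p.category, s.amount
FROM sale s
JOIN customer c ON c.customer_id = s.customer_id
JOIN product  p ON p.product_id  = s.product_id;
""")
for row in cur.execute(
        "SELECT region, SUM(amount) FROM sale_flat GROUP BY region ORDER BY region"):
    print(row)
```

An analyst can now aggregate by region or category with a single-table query, at the cost of duplicating dimension values on each fact row.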
“Lift and shift” is appealing – and viable in some cases – but the choice needs to be considered carefully. It might be easier at first, since one does not have to go through all the data architecture and data analysis necessary to restructure and integrate the data according to its inherent nature (i.e., aligned with business understanding) rather than the existing structure. Eventually, though, not taking the time to model the data appropriately can cause increased expense, data quality issues, and increased difficulty in using the data, especially if the organization wants to do new things with it. For example, simply copying hundreds of Enterprise Resource Planning (ERP) tables into a separate database probably won’t provide out-of-the-box performance for analytics.
Of course, how the designer structures the data depends upon the application – but providing flexible data structures that can serve more than one application is an ideal way to optimize a cloud migration’s value. With a DaaS approach, the data is stored, managed, and quality-assured centrally, then accessed or written as needed by multiple applications (e.g., via SOA or Kafka topics, ODBC/JDBC). Designing an enterprise data store for analytics by simply copying the data over without restructuring it appropriately can lead to double counting, poor-quality data, and poor performance. This underscores the need to know the purpose of the data store and to apply the proper design approach to the requirements and the structure’s intended use.
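The DaaS pattern above can be sketched in a few lines of plain Python. This is a toy illustration, not a real service: the store is an in-memory dict, and the function and field names (`write_member`, `read_member`, `name`) are hypothetical. What it shows is the essential point – every application goes through the same central read/write path, so quality rules are enforced once rather than per application.

```python
# Minimal sketch of a DaaS-style access layer (all names illustrative):
# data lives in one central store, and every application reads and writes
# through the same service functions, so quality rules are enforced once.

_central_store: dict[str, dict] = {}  # stand-in for the managed data store

def write_member(member_id: str, record: dict) -> None:
    """Central write path: validate once, on behalf of every application."""
    if not record.get("name"):
        raise ValueError("name is required")  # shared data-quality rule
    record = dict(record, member_id=member_id)
    _central_store[member_id] = record

def read_member(member_id: str) -> dict:
    """Central read path used by reporting, billing, web apps, etc."""
    return _central_store[member_id]

# Two different "applications" sharing the same governed record:
write_member("M1", {"name": "Ada"})
print(read_member("M1")["name"])  # billing application
print(read_member("M1")["name"])  # analytics application, same vetted data
```

In a real enterprise the central store would be a governed database or service exposed over SOA endpoints, Kafka topics, or ODBC/JDBC, but the design principle – one quality-assured path shared by all consumers – is the same.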
An often-appealing option is a Data Lake, where all the data is simply copied into Hadoop on the cloud (e.g., Azure HDInsight, Amazon EMR) and made accessible in a single place. One advantage is being able to host all the data in Hadoop without restructuring it. However, just because the data is stored in a central location does not make it an integrated database, so the benefits of storing all the data centrally are quickly overcome by the increased expense of trying to understand and tie that data together in a meaningful way.
Strong data governance, metadata management, and data models (which help to curate the data, even if the data isn’t physically structured according to the conceptual or logical data models) are necessary to achieve value from a data lake. Combining Hadoop with a traditional Enterprise Data Warehouse (EDW) where business rules and integration patterns are implemented may be the best approach for an enterprise-level data store for reporting and analytics. Hadoop can serve as an effective staging area for an EDW and can also augment the EDW by enabling offloading of very large tables to Hadoop Hive and then a BI tool (e.g., Qlik, Tableau) can tie that information together for presentation (assuming a common keying paradigm is in place).
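The Hive-offload pattern can be illustrated with a small sketch, using pandas to stand in for the BI tool's join layer. The data, column names (`claim_id`, `member_id`, `amount`), and values are hypothetical; the point is that as long as the EDW's recent rows and the history offloaded to Hive share a common keying paradigm, a presentation layer can combine them into one seamless view.

```python
import pandas as pd

# Sketch: the EDW keeps recent, conformed rows while very large history is
# offloaded to Hive. A BI layer can union and aggregate the two sides as
# long as both use the same keys (hypothetical claim_id / member_id here).
edw_recent = pd.DataFrame({
    "claim_id":  [201, 202],
    "member_id": ["M1", "M2"],
    "amount":    [120.0, 80.0],
})
hive_history = pd.DataFrame({
    "claim_id":  [101, 102],
    "member_id": ["M1", "M1"],
    "amount":    [50.0, 75.0],
})

# Because the keys conform, the two stores present as one logical dataset:
combined = pd.concat([hive_history, edw_recent], ignore_index=True)
per_member = combined.groupby("member_id")["amount"].sum()
print(per_member)
```

Member M1's total (245.0) spans both stores transparently – exactly the "tie it together for presentation" role the BI tool plays, and exactly what breaks if the offloaded tables do not share the EDW's keys.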
At one client (a healthcare insurer), mainframe data was copied into a relational database (RDBMS) without restructuring or normalizing it, and there was no business-oriented data model accompanying the data. One table with over 600 columns embedded multiple levels of granularity (claim header, claim line, etc.), with numeric measures at each of these levels as well as multiple sets of repeating groups (col1, col2, col3…), which made it very hard to use accurately. In fact, a program had to be written for each end-user request. Additionally, this table was stored in a row-based (vs. columnar) RDBMS – so the entire record had to be read into memory even if just a few columns were needed, causing performance issues. So while it was (relatively) easy to move the mainframe data into a relational structure, the increased cost of using this structure necessitated development of a new data store to handle analytical queries.
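The repeating-group problem described above is mechanical to fix once it is recognized. Here is a minimal sketch with pandas, using hypothetical columns (`diag1`–`diag3` on a `claim_id`) rather than the client's actual schema: unpivoting the repeated columns into one row per value restores a single level of granularity, so ordinary queries work without a custom program per request.

```python
import pandas as pd

# Hypothetical mainframe-style table: a repeating group (diag1..diag3)
# flattened across columns, the pattern that made the 600-column table
# so hard to query accurately.
wide = pd.DataFrame({
    "claim_id": [1, 2],
    "diag1": ["A01", "B02"],
    "diag2": ["C03", None],
    "diag3": [None, None],
})

# Unpivot the repeating group: one row per (claim, sequence) pair.
long = pd.wide_to_long(wide, stubnames="diag", i="claim_id", j="seq")
long = long.dropna(subset=["diag"]).reset_index()
print(long.sort_values(["claim_id", "seq"]).to_string(index=False))
```

After the unpivot, "find all claims with diagnosis C03" is a single filter on one column instead of an OR across every colN position – which is precisely the kind of restructuring a lift-and-shift skips.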
Of course, each situation is different, and sometimes “lift and shift” is the best approach. But be sure to consider the data architecture when starting a cloud migration strategy, to ensure you realize the most value from the migration.