Developed by Ralph Kimball in 1996, dimensional modeling is a data warehouse design technique optimized for online analytical processing (OLAP) rather than online transactional processing (OLTP). Most commonly implemented using star or snowflake schemas, it is designed for speed and ease of use in reporting and analytics. By separating numerical measures from descriptive dimensions, dimensional modeling enables business users and analysts to explore data from multiple angles. It also supports fast query performance and improves flexibility as a business’s data needs evolve.
Dimensional models are one of the most popular data modeling techniques for building a modern data warehouse, allowing users to quickly develop facts and dimensions based on the business needs of an enterprise, says Data Engineer Journey, a website dedicated to sharing best practices in data modeling, ETL processing, and data governance.
According to Science Direct, “Dimensional models (DMs) are intuitive and identify the data required for business analysis and decision support. The DM is a logical design technique often used for data warehouses. It is the only viable technique for databases that are designed to support end-user queries in a data warehouse. Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multi-part key in the fact table.”
The Relational Model and Early Data Challenges
In June 1970, Edgar F. Codd, a researcher at IBM, published the paper A Relational Model of Data for Large Shared Data Banks, which established the theoretical foundation for relational databases. Before Codd’s paper, the dominant database models were, from a modern perspective, profoundly difficult to use and maintain. They were complex, inflexible, and tightly coupled: every data relationship had to be predefined, and data was stored in records connected in parent-child relationships. To access any piece of data, an application had to navigate this tree from the root down, one step at a time.
A user who wanted data not on a pre-defined path was either out of luck or had to develop a complex workaround through a labyrinth of linked records and pointers. Any change to the data structure often required rewriting application programs, because the logical structure of the data was intertwined with its physical storage on disk. User friendly, it was not.
The relational model was revolutionary not because it did something slightly better, but because it fundamentally changed the whole philosophy of data management. Based on a powerful mathematical foundation, it allowed users to query the data in a declarative query language. Instead of specifying how to get the data, a user simply declared what he or she wanted. It democratized data access. Suddenly, non-programmers, like business analysts, could be trained to ask complex questions about a company’s data without writing low-level code. As a result, productivity skyrocketed.
Key Principles
Codd believed this complexity was a major barrier to building large, scalable, and flexible data systems, so he proposed that all data in a database be represented as simple relations (tables). His key principles were:
Data Independence: the logical view of data (the tables users see) was separated from the physical storage details (how it’s stored on disk). This meant users could change how data was stored without breaking applications that queried it.
Simple Data Structure: Everything is a table. Each table has rows and columns. This simple structure was easy for users and applications to understand.
Set-Based Processing: Instead of navigating record-by-record, users could manipulate entire sets of data at once using high-level, declarative operations.
Entity-Relationship Modeling
In 1976, Peter Chen developed Entity-Relationship (ER) modeling, which, as Dremio explains, “is a theoretical and conceptual way of representing data objects, and the relationships they share. Used primarily in database design, the ER Model simplifies the process of constructing relational databases by pictorially presenting data objects and associations, making it an essential tool for data analysts and software engineers.” The ER model became the standard for designing operational, transaction-processing (OLTP) systems.
These systems were highly normalized to eliminate data redundancy and ensure integrity for processes like order entry. However, they had a problem. While excellent for transaction processing, ER models were terrible for end-user querying and analysis. Writing queries to answer simple business questions required complex joins across dozens of tables. These systems also required deep technical expertise to operate.
In the late 1980s, a movement emerged to build “Executive Information Systems” (EIS), systems designed to give non-technical business leaders easy access to company data. However, developers wrestled with the question, “How should data be structured to allow a busy executive to get an answer quickly without writing SQL?” The initial answer was to de-normalize data into wide, easy-to-query tables. With that answer, the seed of what would later become dimensional modeling was sown.
Facts, Dimensions, and Star Schemas
In the early 1990s, Ralph Kimball and his team created data marts to analyze retail sales figures. Facing the ER model query problem head-on, the team realized that all business questions could be framed as a simple pattern: “Measure by Context.” For example, what are the sales (measure) by product, store, and day (context)? What is the inventory count (measure) by warehouse and week (context)? Their breakthrough came when they formalized everything into a new structure that broke down into fact tables and dimension tables in the following ways:
Fact tables, which store quantitative data about business processes, such as sales, orders, transactions, or events, and typically include metrics (like sales amount) alongside foreign keys to dimension tables.
Dimension tables, which provide descriptive context to those facts, such as who, what, when, and where (e.g., customer, product, date, and location). They often include attributes that allow users to filter, group, or drill down into the data.
Fact Tables vs. Dimension Tables
| Facts | Dimensions |
| --- | --- |
| Numeric, measurable data (e.g., sales, clicks) | Descriptive attributes (e.g., time, location, product) |
| Stored in fact tables | Stored in dimension tables |
| Example: Revenue = $500 | Example: Product = "iPhone 15" |
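To make these concepts concrete, here is a minimal sketch of what a star schema for retail sales might look like in SQL. The table and column names (fact_sales, dim_date, dim_product, dim_store) are illustrative assumptions rather than a prescribed design, and a real warehouse would adapt the keys and data types to its platform.

```sql
-- Illustrative star schema for retail sales (all names are hypothetical).
-- The fact table's grain is one row per product, per store, per day.

CREATE TABLE dim_date (
    date_key     INTEGER PRIMARY KEY,  -- e.g., 20240115
    full_date    DATE,
    day_of_week  VARCHAR(10),
    month_name   VARCHAR(10),
    year_number  INTEGER
);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key
    product_name VARCHAR(100),
    category     VARCHAR(50),
    brand        VARCHAR(50)
);

CREATE TABLE dim_store (
    store_key    INTEGER PRIMARY KEY,
    store_name   VARCHAR(100),
    city         VARCHAR(50),
    region       VARCHAR(50)
);

-- The fact table holds the numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date (date_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    store_key    INTEGER REFERENCES dim_store (store_key),
    sales_amount DECIMAL(12, 2),        -- measure
    units_sold   INTEGER                -- measure
);
```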
Kimball’s seminal work, The Data Warehouse Toolkit, became the dimensional modeling bible, providing practical instructions for modeling data. It democratized data warehousing, making the concepts of star schemas and fact and dimension tables accessible to a new generation of data professionals. In the late 1980s and early 1990s, Kimball and his team independently recognized that the star schema wasn’t just a database trick; it was the perfect model for business intelligence (BI) and data visualization. Its simple structure mirrored how businesspeople actually thought, i.e., in terms of measures and the context for those measures.
Kimball formalized a complete, practical methodology around the star schema and defined the precise terminology used today. He outlined a framework for building an integrated, enterprise-wide data warehouse by creating a set of standardized, shared dimensions that all data marts should use. It was a “blueprint” or “constitution” that ensured all the different parts of a data warehouse worked together. Known as the Bus Architecture, it takes its name from an analogy to a computer’s hardware bus: in a computer, different components (CPU, memory, hard drive) plug into a common bus with a standard interface to communicate. Similarly, in data warehousing, different data marts “plug into” a standard set of shared dimensions (the “data bus”) to ensure they can work together.
Kimball spread his ideas far and wide. Through his influential books and talks, a generation of data professionals learned the how and why of star schemas. Kimball’s dimensional data marts are still relevant today because their core principles, especially the star schema and conformed dimensions, provide a clear, consistent, and optimized framework for BI and analytics, including AI. While modern platforms and big data have changed the landscape, the fundamental need for structured, understandable data for reporting and decision-making remains, making Kimball’s methodology a valuable foundation for data warehousing.
Star Schemas
“A star schema is a type of relational database schema that is composed of a single, central fact table that is surrounded by dimension tables,” states IBM. It is a popular data modeling technique that organizes data into a simple and intuitive structure optimized for fast querying and analysis. The arrangement, with a fact table at the center surrounded by dimension tables, was named the “star schema” (see Figure 1) due to its resemblance to a star.
According to Databricks, the U.S. data and AI company, “A star schema is a multi-dimensional data model used to organize data in a database so that it is easy to understand and analyze. Star schemas can be applied to data warehouses, databases, data marts, and other tools. The star schema design is optimized for querying large data sets.”
Figure 1: Star Schema (Dimensional Model) for property sales of DreamHome. Source: Database Systems by Thomas Connolly
In its article 3NF vs Dimensional Modeling, Matillion, the intelligent data integration platform, claims, “The main advantage of a well-designed star schema is that multiple fact tables can use the same dimensions. This makes it simple for users to link different events and gather insights. A good hint that your dimensional model is well-designed is when the same dimensions are used repeatedly.”
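As a hedged sketch of what that looks like in practice, the hypothetical query below combines two fact tables (fact_sales from the earlier sketch, plus an assumed fact_inventory with a units_on_hand measure) through the same shared dim_date and dim_product dimensions — a pattern often called a drill-across query.

```sql
-- Hypothetical drill-across query: two fact tables share the same conformed
-- dimensions, so their aggregates can be combined on common dimension attributes.
WITH sales_by_month AS (
    SELECT d.month_name, p.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month_name, p.category
),
inventory_by_month AS (
    SELECT d.month_name, p.category, SUM(i.units_on_hand) AS total_units_on_hand
    FROM fact_inventory i
    JOIN dim_date d    ON i.date_key = d.date_key
    JOIN dim_product p ON i.product_key = p.product_key
    GROUP BY d.month_name, p.category
)
SELECT s.month_name, s.category, s.total_sales, inv.total_units_on_hand
FROM sales_by_month s
JOIN inventory_by_month inv
  ON s.month_name = inv.month_name
 AND s.category = inv.category;
```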
The Universal Language of Analytics
A star schema is the fundamental bridge that allows everyone inside an organization to have a coherent, data-driven conversation. The star schema is considered the universal language of analytics because it:
Standardizes how users structure data for decision-making.
Simplifies complex technical concepts for business users.
Optimizes for both human understanding and machine performance.
Unifies an organization around a single version of the truth.
“Star schemas are efficient at storing data, maintaining history, and updating data by reducing the duplication of repetitive business definitions, making it fast to aggregate and filter data in the data warehouse,” adds Databricks.
Snowflake Schema
While a star schema is like a flat, one-page company directory, a snowflake schema is like a multi-page, hierarchical org chart. A snowflake schema (see Figure 2) “is a multi-dimensional data model that is an extension of a star schema, where dimension tables are broken down into subdimensions. Snowflake schemas are commonly used for business intelligence and reporting in OLAP data warehouses, data marts, and relational databases,” says Databricks.
Normally, analytical queries must scan large volumes of data to sum, average, or count values. However, with a star schema, a query only needs to join one large fact table to several small dimension tables. These are usually straightforward key-based joins (e.g., Product_ID in the fact table to Product_ID in the dimension table). The fact table is often designed like a long, thin table, which can be scanned quickly and efficiently. As a result, complex reports that would take minutes to run on a transactional database surface data in seconds.
Figure 2: Snowflake schema
“The main practical difference between star and snowflake schema from the perspective of an analyst has to do with querying the data. You need more joins for a snowflake schema to gain access to the deeper levels of the hierarchical dimensions, which can reduce query performance over a star schema. Thus, data analysts and data scientists tend to prefer the simpler star schema,” claims Data Engineer Journey.
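To illustrate the extra joins, here is a hedged comparison using the hypothetical tables sketched earlier. In the star version, the product category is a denormalized attribute of dim_product; in the snowflake version, the product hierarchy is assumed to be normalized into a separate dim_category table, adding one join per level.

```sql
-- Star schema: category lives directly on dim_product, so one join reaches it.
SELECT p.category, SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY p.category;

-- Snowflake schema: the hierarchy is normalized, so dim_product is assumed to
-- carry a category_key instead of the category text, and the same question
-- needs an extra join through the hypothetical dim_category table.
SELECT c.category_name, SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_product p  ON f.product_key = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category_name;
```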
Tale of the Tape: Star Schema vs. Snowflake Schema
| Attribute | Star schema | Snowflake schema |
| --- | --- | --- |
| Read speed | Fast | Moderate |
| Write speed | Moderate | Fast |
| Storage space | Moderate to high | Low to moderate |
| Data integrity risk | Low to moderate | Low |
| Query complexity | Simple to moderate | Moderate to complex |
| Schema complexity | Simple to moderate | Moderate to complex |
| Dimension hierarchies | Denormalized single tables | Normalized over multiple tables |
| Joins per dimension hierarchy | One | One per level |
| Ideal use | OLAP systems, data marts | OLTP systems |
Source: Data Engineer Journey
The Star Schema Ecosystem
The power of the star schema lies in the fact that it’s not just a data model; it’s a target. An entire ecosystem of tools built specifically to create, manage, and query data structured in this way has sprung up. These tools can be categorized by their function in the data pipeline, including:
Data Warehouse Platforms — The foundational databases housing the star schemas. Modern cloud data warehouses are particularly optimized for this structure, including:
Snowflake: Heavily optimized for star schema queries. Its separation of compute and storage means it can perform massive joins between large fact tables and dimension tables very efficiently.
Amazon Redshift: Originally designed for OLAP workloads, it excels at querying star schemas, especially with its columnar storage architecture.
Google BigQuery: A serverless data warehouse that automatically scales to run fast queries on star schemas using its powerful internal execution engine.
Microsoft Azure Synapse Analytics: Dedicated SQL pools within Synapse are built for running high-performance analytics on dimensional models.
Teradata: A classic enterprise data warehouse platform that has been a workhorse for large-scale star schemas for decades.
These platforms use columnar storage, which is perfect for star schema queries. When you run a query like SELECT SUM(sales) FROM fact_sales GROUP BY product_category (with the fact table joined to the product dimension to supply product_category), the database only needs to read two columns: the sales amount from the fact table and the product category from the dimension table. This is an incredibly fast process.
BI and data visualization tools are where the star schema truly shines. These tools are built to intuitively understand the relationship between facts and dimensions. Some of the best BI tools include:
Tableau: The drag-and-drop interface is a perfect match for star schemas. Users can drag a dimension (e.g., Month) to columns and a measure (e.g., Sales) to rows to instantly create a chart. Its “data model” view allows users to define these relationships visually.
Microsoft Power BI: Its internal “Tabular” model is essentially a semantic layer built on star schema principles. The “Manage Relationships” feature is designed to connect fact and dimension tables. Features like Q&A (“show me sales by product”) work best with a well-defined star schema.
Qlik (Qlik Sense & QlikView): Its associative engine inherently understands the links between tables, making it ideal for exploring data in a star schema without writing complex SQL.
Looker (now part of Google Cloud): Relies on a defined semantic layer where developers explicitly define joins between facts and dimensions, which then empowers business users to explore data easily.
Domo: Cloud-based BI and data visualization platform whose engine is optimized to query data most efficiently when that data is structured like a star schema. Its core purpose is to help organizations integrate data from various sources, turn it into actionable insights, and share those insights in real-time through easy-to-understand dashboards and apps.
These tools automatically generate the complex SQL JOIN statements needed to query the star schema. Figure 3 shows a user dragging and dropping a measure into the “card builder”, the main workspace for creating visualizations.
Figure 3: Spider chart in Domo, showing a dashboard under construction
Data transformation tools are what create the star schemas in the first place. They help transform raw data from source systems into clean fact and dimension tables. They include:
dbt (data build tool): The modern standard for transforming data in the warehouse. Analysts use dbt to build models such as the following (see the sketch after this list):
Dimension models: dim_customer.sql, dim_date.sql
Fact models: fct_orders.sql, fct_sales.sql
SQL scripts: The classic method. Data engineers write complex SQL CREATE TABLE AS or INSERT statements to build fact and dimension tables.
Traditional ETL tools: These have built-in components for slowly changing dimensions (SCDs) and surrogate key generation.
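As a rough, hedged sketch of what one of these transformations might look like, here is a minimal dbt-style model for the dim_customer.sql file mentioned above. The staging model name (stg_customers) and its columns are assumptions for illustration, and real projects typically generate surrogate keys with more robust techniques than a simple ROW_NUMBER().

```sql
-- dim_customer.sql: minimal dbt-style dimension model (names are hypothetical).
{{ config(materialized='table') }}

SELECT
    ROW_NUMBER() OVER (ORDER BY customer_id) AS customer_key,  -- simple surrogate key for illustration
    customer_id,                                                -- natural key from the source system
    customer_name,
    city,
    region
FROM {{ ref('stg_customers') }}
```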
The Choice of Schema
The reason for dimensional modeling is to enable BI reporting, data querying, and data analysis. The key concepts in dimensional modeling are facts, dimensions, and attributes, but it’s not as simple as that. There are different types of facts (additive, semi-additive, and non-additive), depending on whether they can be summed across dimensions. Dimensions can have different hierarchies. Attributes can define the who, what, where, and why of a dimensional model. The grain, or level of granularity, is another key concept in dimensional modeling because it determines the level of detail each fact table row represents. Facts, dimensions, and attributes can be organized in several different types of schemas, including star, snowflake, multidimensional, and multifact star.
The choice of schema depends on variables such as the type of reporting the model needs to facilitate and the type of BI tool being used. Building a dimensional model also involves additional puzzle pieces such as calendar and time dimensions; more complicated pieces such as conformed, degenerate, and role-playing dimensions; and consolidated fact tables.
Separating Data Mart Fact from Data Warehouse Fiction
Of course, data warehouse implementations are where the rubber meets the road. The internet is filled with thousands of articles saying that you can build a data warehouse using dimensional modeling. However, they rarely discuss the slowly changing dimensions (SCDs) that regularly scheduled loads must handle. An SCD situation occurs when a dimension attribute changes slowly and unpredictably over time, like an address change on a customer record. These changes are usually impossible to predict, as are many data record changes.
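One common way to handle this, sketched below under the same hypothetical naming as the earlier examples, is a Type 2 slowly changing dimension: instead of overwriting the old value, each change adds a new row, and effective-date columns record which version was current when.

```sql
-- Hypothetical Type 2 SCD customer dimension: each attribute change adds a new
-- row, so facts can join to the version that was current at transaction time.
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY,  -- surrogate key, unique per version
    customer_id    INTEGER,              -- natural key from the source system
    customer_name  VARCHAR(100),
    address        VARCHAR(200),
    effective_date DATE,                 -- when this version became current
    end_date       DATE,                 -- when it was superseded (NULL if still current)
    is_current     BOOLEAN
);

-- A customer who moved appears twice; only the latest row is flagged current:
-- (1001, 42, 'Jane Doe', '12 Oak St',  '2022-01-01', '2024-03-14', FALSE)
-- (1877, 42, 'Jane Doe', '98 Elm Ave', '2024-03-15', NULL,         TRUE)
```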
Speaking from personal experience, almost all of the data warehouses that are built using dimensional modeling are not true data warehouses. Instead, they are small to large data marts, rarely reaching hundreds of terabytes. Data marts also sit at the front end of data warehouses, so many professionals don’t truly understand the architecture sitting behind them. It is a rare case when I see an analytical implementation that isn’t a massive group of small, independent data marts. Independent data marts are one of the biggest problems in our industry, and they are a chief reason why 80% of analytics are no longer used or are completely redundant.
Having personally implemented several dimensional models, I can say they are designed for extremely fast end-user access; however, the time it takes to load them is significant if you have a lot of data. Finally, the best data warehouse architecture is one where a 3NF, integrated, non-volatile, subject-oriented, and time-variant data warehouse is built with dependent data marts that source their data from that warehouse.
The Universal Language of Analytics
Pioneered by Ralph Kimball in 1996, dimensional modeling revolutionized data warehousing by providing a design optimized for analytical processing rather than transactional systems. By separating measurable facts from contextual dimensions and typically organizing them in a star or snowflake schema, dimensional modeling enables business users and analysts to explore their data effortlessly. Building on foundational concepts from the relational model, it addresses the complexity that traditional normalized models and entity-relationship designs impose on analytical queries.
Dimensional modeling’s strength lies in its intuitive simplicity and performance efficiency, making it the preferred approach for modern data warehouses and BI solutions. Tools like Tableau, Power BI, Domo, Qlik, and Looker are helping companies deep dive into their data, turning it into an asset they can better monetize.