DATA WAREHOUSE VERSUS DATA MART: THE GREAT DEBATE
by Daniel T. Graham
Customers exploring the field of business intelligence for the first time often lead with the question: "What is the difference between a data warehouse and a data mart?" The next question follows as predictably as night follows day: "Which one does my company need?"
Let me start by saying that the two terms are often confused. Indeed, some people in the industry use them virtually interchangeably, which is unfortunate, because they do reflect a valuable hierarchical difference.
The Data Warehouse
A "data warehouse" will typically contain the full range of business intelligence available to a company from all sources. That data consists of transaction-processing records, corporate and marketing data, and other business operations information; for example, a bank might include loans, credit card statements, and demand deposits data, along with basic customer information. This internal data is frequently combined with statistical and demographic information obtained from outside sources. The cross-divisional nature of the data on file explains why a data warehouse is often called an "enterprise warehouse" -- because the wealth of data it gathers supports the informational needs of the corporate enterprise as a whole.
The Data Mart
Here we move to the next level down in the information hierarchy. A company's marketing, purchasing and finance departments will all make use of data stored in the enterprise warehouse. In many cases they will use the same data, but each department will massage that data in different ways. So each department sets up its own "data mart" designed to extract data from the enterprise warehouse. The key point here is that each mart processes the data in a form which suits its own departmental needs.
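To make the hierarchy concrete, here is a minimal Python sketch of two departmental marts drawing on the same warehouse data and shaping it differently. The sample rows, field names, and the two `build_*_mart` functions are all hypothetical, invented purely for illustration; a real warehouse would sit in a database, not in a list.

```python
from collections import defaultdict

# Hypothetical warehouse rows: one shared, application-neutral store.
WAREHOUSE = [
    {"customer": "C1", "product": "loan", "region": "East", "amount": 1200.0},
    {"customer": "C2", "product": "credit_card", "region": "East", "amount": 300.0},
    {"customer": "C1", "product": "credit_card", "region": "West", "amount": 150.0},
    {"customer": "C3", "product": "loan", "region": "West", "amount": 5000.0},
]


def build_marketing_mart(rows):
    """Marketing view (assumed): how many distinct products each customer holds."""
    products = defaultdict(set)
    for row in rows:
        products[row["customer"]].add(row["product"])
    return {customer: len(held) for customer, held in products.items()}


def build_finance_mart(rows):
    """Finance view (assumed): total amount per product line."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["amount"]
    return dict(totals)


# Same source data, two department-specific shapes.
marketing_mart = build_marketing_mart(WAREHOUSE)
finance_mart = build_finance_mart(WAREHOUSE)
```

Both marts read the same rows, yet neither can stand in for the other: marketing has discarded the amounts, and finance has discarded the customers.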
Differences defined
Here we see the difference between the two hierarchical levels. At the top of the information chain, a data or enterprise warehouse is "application-neutral." The task of the warehouse is first to store, and then supply, information to different users.
By contrast, a data mart is "application-specific." Data held in the warehouse will be accessed by several departments or divisions, each of which has a specific "single-subject" interest in the warehoused data, be it finance, human resources or marketing. Each area will set up its own "data mart" to service closely defined user-specific needs. From this we see that a mart is designed to solve one business problem well; it is not set up to cope with a variety of needs.
Ideally, warehouse and marts should coexist, complementing each other's roles. Technology cannot keep pace with the ever more complex business demands that human minds dream up, so we have to evolve strategies to determine when to build marts and when to build warehouses. Let's assume that a major corporate division's IT group finds that several marts are using much the same data. The question arises: "Why not combine the marts?" The reason is usually that one of the marts is delivering a specific price/performance objective, an objective which would be impossible to meet if data for other subjects were merged into that mart.
In such cases, the desired analysis depends on some specific denormalization to achieve its goal. But even if no such overriding objective stands in the way of merging two marts, one still has to ask: "Does the combined data model give me an analytical capacity I didn't have before?" This question usually yields several answers, many of which point in the direction of creating a warehouse. On sober second thought, the debate usually turns to considerations more important than an immediate need to reduce "redundant" storage costs. In other words, if combining the loans data mart with the credit card mart and the checking account mart will enhance our understanding of customers' purchasing habits, we should be thinking about setting up a warehouse to service several needs.
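The "analytical capacity I didn't have before" test can be illustrated with a small Python sketch. The three marts, their balances, and the `combined_customer_view` helper are hypothetical stand-ins; the point is only that the merged view answers a question none of the single-subject marts can.

```python
# Hypothetical single-subject marts, each keyed by customer.
loans_mart = {"C1": 12000.0, "C3": 5000.0}
credit_card_mart = {"C1": 800.0, "C2": 300.0}
checking_mart = {"C1": 1500.0, "C2": 200.0, "C3": 50.0}


def combined_customer_view(**marts):
    """Merge per-customer balances from several marts into one cross-subject view."""
    view = {}
    for subject, balances in marts.items():
        for customer, balance in balances.items():
            view.setdefault(customer, {})[subject] = balance
    return view


view = combined_customer_view(
    loans=loans_mart,
    credit_card=credit_card_mart,
    checking=checking_mart,
)

# A cross-subject question no single mart can answer: who holds all three products?
holds_all_three = sorted(c for c, subjects in view.items() if len(subjects) == 3)
```

If questions like this one matter to the business, that is a sign the combined model belongs in a warehouse rather than in three separate marts.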
The decision to set up a mart usually originates in that part of an organization with the most business "pain", and, thus, the opportunity for greatest gain.
A warehouse usually comes into being when a senior executive notices a business problem recurring in several departments. Subsequent discussion reveals a greater need for cross-divisional data analysis than anyone thought. So a warehouse is born to collectively help all divisions behave as a single corporation.
In summary, a mart is born from a single department's urgent need to solve a problem. A warehouse is born when executives discover a common problem affecting different departments AND decide that they can obtain added value by getting a cross-departmental view of the data. Ideally, the warehouse is the best place to start, but that may not reflect the real world. Ultimately you pick your starting point knowing that, over time, you may end up with several warehouses and marts.
So the debate should not be mart versus warehouse, but rather which applications are best served by a mart design and which by a warehouse design, followed by how the highest-priority implementation will fit into a three- to five-year plan. The difference itself, summarized in business terms, depends on whether the system is inter-departmental, pulling data from multiple major subsystems (loans, credit, trust, checking, etc.). A mart might model a small number of major entities; a warehouse by definition models several. Thus, a mart will model just customer loans, payments and forecasts, whereas the warehouse would combine this with checking account transactions, credit card transactions, and so on.
Marts tend toward heavy summarization. The epitome of this characteristic is represented by the Essbase OLAP cube -- all summary, no details. Thus marts focus on solving predefined questions. Knowing that lets us fine-tune their performance responses.
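As a rough illustration of "all summary, no details," the Python sketch below rolls detail rows up into a pre-computed summary table. It is not how Essbase or any particular OLAP product works internally; the rows, the `summarize` function, and the `monthly_total` lookup are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical detail-level rows, as a warehouse might hold them.
LOAN_PAYMENTS = [
    {"branch": "North", "month": "Jan", "amount": 100.0},
    {"branch": "North", "month": "Jan", "amount": 250.0},
    {"branch": "North", "month": "Feb", "amount": 75.0},
    {"branch": "South", "month": "Jan", "amount": 400.0},
]


def summarize(rows):
    """Pre-aggregate payments by (branch, month); individual payments are dropped."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["branch"], row["month"])] += row["amount"]
    return dict(totals)


# The mart stores only the summary, sized by the predefined question.
summary_mart = summarize(LOAN_PAYMENTS)


def monthly_total(mart, branch, month):
    """Answer the predefined question with a single lookup."""
    return mart.get((branch, month), 0.0)
```

The predefined question ("what did this branch collect this month?") becomes a single lookup, which is where the performance tuning comes from. The price is that a new question, such as the size of the largest single payment, can no longer be answered from the mart alone.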
Let's phrase the question "either/or," not "versus"
I hope by now it is clear that we are not looking at a "Warehouse versus Mart" debate. There is nothing adversarial about the process. Selection revolves around complementary roles.
Accuracy of data marts and data warehouses
As long as I'm on the subject of warehouses and marts, I feel it's my duty to talk about the importance of transforming data into useful context. For data analysis to be carried out efficiently and effectively, it's critical for a data warehouse and/or data mart to provide "a single version of the truth," a term one hears in the industry. To accomplish this, data must be extracted and transformed in one place, for the marts and the warehouses alike, so that they stay consistent with one another. That way, Joe Brown in the mart is not confused with J.P. Brown in the warehouse. Eliminating these differences in accuracy and context saves the company considerable expense -- mailing costs are high as it is, so why mail the same piece twice to a single person?
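One way to picture "extracted and transformed in one place" is a single shared transform step that feeds both stores. The Python sketch below assumes each raw record carries a `customer_id` that can serve as the matching key; that field, the sample records, and the `transform` function are hypothetical, and real name matching is considerably harder than this.

```python
def transform(raw_records):
    """Produce one canonical record per customer_id (first spelling seen wins)."""
    canonical = {}
    for rec in raw_records:
        # Normalize the assumed key so "c-100" and "C-100 " match.
        cid = rec["customer_id"].strip().upper()
        # Collapse stray whitespace and standardize capitalization.
        name = " ".join(rec["name"].split()).title()
        canonical.setdefault(cid, {"customer_id": cid, "name": name})
    return list(canonical.values())


# Hypothetical raw feeds in which the same person appears under two spellings.
RAW = [
    {"customer_id": "c-100", "name": "joe  brown"},
    {"customer_id": "C-100 ", "name": "J.P. Brown"},
    {"customer_id": "c-200", "name": "ann lee"},
]

clean = transform(RAW)

# Both stores load from the same cleaned output, so they cannot disagree.
warehouse_customers = list(clean)
mart_customers = [c for c in clean if c["customer_id"] == "C-100"]

# One entry per person: the mail piece goes out once.
mailing_list = [c["name"] for c in clean]
```

Because the warehouse and the mart both load from `clean`, the customer appears under one name everywhere and receives one mailing.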
Analysis: good for what ails you
Some situations are best analyzed at a warehouse level; other, more specific conditions are best examined in a data mart. A useful analogy is to imagine the difference between a data warehouse and a data mart as equivalent to the difference between a general practitioner and a specialist. The GP has a broad knowledge of many disciplines, not least of which is understanding human nature. But one does not expect a GP to perform open-heart surgery or cataract operations, procedures appropriate to specialist training. However, one does expect a GP to diagnose heart disease or eye problems, along with a host of other conditions, referring patients to specialists for treatment, as appropriate. Similarly, competent analysis at a data warehouse level -- the GP -- will give senior management an excellent overview of the whole operation, or of customers' buying habits. That may be enough to prescribe an overall treatment and a few symptom-specific prescriptions, too. Beyond that, if you want to solve a single problem very well, send for a specialist: call in the mart.
At IBM, we come at each customer's challenge from a very specific point of view -- the customer's point of view. It is essential to establish exactly what the customer wants. Only when vendor and customer have worked out the precise parameters of a business problem should they move to the next step: identifying the appropriate solution. We think that if we as a vendor do our homework right, both we and our customer will come out of negotiations happy -- and our customer will prosper.
---
For more information, see http://www.ibm.com/bi