What Is a Data Mart?
A data mart is a warehouse that collects data from operational data and other data sources serving a particular group of professionals. In terms of scope, data is extracted from an enterprise-wide database or a more specialized data warehouse. The point of a data center is that it caters to the specific needs of professional user groups in terms of analytics, content, performance, and ease of use. Users of data centers expect data to be represented in terms they are familiar with.
Data Mart vs Data Warehouse:
A data warehouse is an integrated, subject-oriented collection of data designed to support DSS (Decision Support System) function. In a data warehouse, each data unit is associated with a specific time. A data warehouse, consisting of atomic-level data and lightly aggregated data, is a topic-oriented, integrated, non-updatable (stable), time-varying collection of data to support the decision-making process in business management.
Data Mart vs Data Warehouse
Then the data mart is a subset of enterprise-level data warehouse, which is mainly oriented to department-level business and only oriented to a specific topic. In order to solve the contradiction between flexibility and performance, a data mart is a small department or workgroup level data warehouse added to the data warehouse architecture. Data marts store pre-computed data for specific users to meet users’ performance needs. They can alleviate the bottleneck of accessing data warehouses to a certain extent.
Data Mart Characteristics:
- small in size;
- have specific applications;
- defined, designed, and developed by business units;
- managed and maintained by business units;
- can be implemented quickly;
- cheaper to buy;
- rapid investment recovery;
- tight integration of toolsets;
- provides a more detailed, preexisting, summary subset of the data warehouse;
- upgradable to a full data warehouse;
The Data Structure of A Data Mart:
The structure of data in a data mart is often described as a star or snowflake structure. A star structure consists of two basic parts – a fact table and various supporting dimension tables.
The fact table describes the densest data in the data mart. In a telephone company, the data used for calls is typically the most dense. In banks, data related to reconciliation and automated teller machines is typically the most intensive. For retail, sales and inventory data are the most dense and so on.
A fact table is a combination of multiple types of data that are pre-joined together, including: a primary key of the entity that reflects the purpose of the fact table, such as an order, a sale, a phone call, etc., primary key information, foreign keys that connect the fact table to the dimension table, and non-key external data carried by the foreign keys.
If this non-key external data is frequently used for data analysis in the fact table, it is included in the scope of the fact table. Fact tables are highly indexed. It is very common to have 30 to 40 indexes on a fact table. Sometimes each column of the fact table is indexed, and the result is that the data in the fact table is very easy to read. However, the number of resources required to import the index must factor into the equation. Normally, fact table data cannot be changed, but data can be entered, and once a record is entered correctly, nothing can be changed for that record.
2. Dimension Table
Dimension tables are built around fact tables. The dimension table contains non-intensive data that is linked to the fact table via a foreign key. Typical dimension tables are based on data marts, including product catalogs, customer lists, vendor lists, and so on.
The data in the data mart comes from the enterprise data warehouse. All data, with one exception, should go through an enterprise data warehouse before being imported into a data mart. The exception to this is the specific data used in the data mart, which cannot be used elsewhere in the data warehouse. External data usually falls into this category. If this is not the case, and the data is used elsewhere in the decision support system, the data must go through the enterprise data warehouse.
Data marts contain two types of data, usually detailed data and aggregated data.
1. Detailed Data
As described earlier, the detailed data in the data mart is contained in a star structure. It is worth mentioning that the star schema is well aggregated as the data passes through the enterprise data warehouse. In this case, the enterprise data warehouse contains the necessary basic data, and the data mart contains the data of higher interval size. However, in the minds of data mart users, the star-structured data is as detailed as it was when it was acquired.
2. Aggregated Data
The second type of data that a data mart contains is aggregated data. Analysts typically create various aggregated data from data in a star schema. A typical rollup might be the total monthly sales for the sales territories. Because the basis of aggregation is constantly evolving, historical data is in the data mart. But the advantage of this historical data lies in the level of generalization it stores. Very little historical data is kept in the star schema.
Data marts are updated based on enterprise data warehouses. It is not uncommon for them to be updated about once a week. However, the update time of the data mart can be less than one week or more than one week, which is mainly determined by the needs of the department to which the data mart belongs.
Data Mart Types:
The data of the independent data mart comes from the operational database, which is an analytical environment established to meet the needs of special users. The development cycle of this kind of data mart is generally short and flexible, but because it is separated from the data warehouse, an independent data mart may lead to the existence of information islands, and data cannot be analyzed from a global perspective.
The data of the subordinate data mart comes from the data warehouse of the enterprise, which will prolong the development cycle, but the subordinate data mart is more stable in architecture than the independent data mart, which can improve the quality of data analysis and ensure the data consistency.
Data Mart Advantages:
- It is one of the most cost-effective alternatives for data warehouses where you only need to process a small subset of data.
- Separating data from sources will make data marts efficient because a specific group of people can work on data from a specific source, rather than everyone using the data warehouse.
- If we know which subset we need to access, we can use data marts to access the data faster.
- Easier to use so end users can easily query it.
- Because the data is segregated in groups, it takes less time to enter the implementation-time data mart than in a data warehouse.
- Historical data from specific topics can be used for easy trend analysis.
Steps to Implement a Data Mart:
Step 1. Designing:
This will be the first step of implementation, in which all the tasks and sources needed to collect technical and business information are identified. A logical plan is implemented later, and after review, this is converted to a physical plan. Also, here the logical and physical structure of the data is decided like how to partition the data and partition fields like date or any other file.
Step 2. Construction:
This is the second stage of the implementation, the generation of the physical database with the help of the RDBMS is determined as part of the design process and logical structure. Create all objects like schemas, indexes, tables, views, etc.
Step 3. Populating:
This is the third stage, where you populate the data as you fetch it. All necessary transformations are implemented before data is populated.
Step 4. Accessing:
This is the next step in the implementation, we will use the populated data to query to create a report. End users use this step to understand the data using the query.
Step 5. Managing:
This is the final stage of the data mart implementation, where tasks such as access management, system optimization and tuning, managing and adding new data to the data mart, and planning recovery scenarios to handle any failure scenarios are handled here.
Thank you for reading our artile and we hope you’ve enjoyed it. If you want to learn more about data goveranance, we would like to advise you to visit Gudu SQLFlow for more information.
As one of the best data lineage tools available on the market today, Gudu SQLFlow can not only analyze SQL script files, obtain data lineage, and perform visual display, but also allow users to provide data lineage in CSV format and perform visual display. (Published by Ryan on Jun 22, 2022)