A data warehouse is a database that is designed for querying, analyzing and reporting. A data warehouse usually gathers data from various sources, but those data will be sampled and cleaned, and then only is put into the datawarehouse application.
IMPORTANT POINTS ABOUT DATA WAREHOUSE
- A data warehouse is usually separate from your application database.
- While a regular application database only keeps enough information required to meet the current needs and archive the rest, a data warehouse will have current as well as historical data to support historical analysis.
- A data warehouse is generally read only as it is used only for querying, analyzing and reporting.
- Unlike an application database where data will be mostly normalized, data in a data warehouse is usually de-normalized to speed up the retrieval process.
- Having a good data warehouse is essential for successful business intelligence or analytics initiatives.
- Bill Inmon is considered as the father of data warehouse.
ADVANTAGES OF DATA WAREHOUSE
- A data warehouse is usually separate from your application database, and by having a separate data warehouse, we can query, analyze data and generate reports without affecting the performance of the application database.
- A data warehouse gathers data from various sources and provides a common data model, making it easier to report and analyze information from a single system rather than multiple sources.
- Any inconsistencies between data from different sources are identified and resolved while loading data into the data warehouse.
- Even if the source application data is purged over time, the data warehouse hold data for much larger period to support historical analysis.
- Data warehouses can be used for applications involving data mining, online analytical processing, market research etc. and they primarily help in taking complex business decisions.
OLTP AND OLAP SYSTEMS
- Traditional application databases are usually referred to as OLTP systems whereas as data warehouses can be considered as OLAP systems.
- OLTP stands for ‘On-Line transaction processing’, and is a systems that handles transaction-oriented applications like daily business transactions.
- A ticket reservation system is an example for an OLTP system.
- In an OLTP system every transaction that occurs may be recorded in real time.
- OLAP stands for On-Line Analytical Processing and is an approach to make querying of data faster.
- Databases configured for OLAP enable users to analyze multidimensional data interactively from multiple perspectives. Therefore data warehouses are often designed as OLAP systems.
- OLTP systems are generally normalized whereas OLAP systems are generally de normalized to increase query speed.
TYPES OF DATA WAREHOUSE SYSTEMS
There are many variants or types of data warehouse systems.
- In a time variant data warehouse, changes in information can be monitored over a period of time.
- In a real time data warehouse transactions are recorded into the data warehouse when they occur.
- A federated data warehouse is an integration of heterogeneous business intelligence systems.
- In distributed data warehouse, data warehouse is distributed across different physical systems.
- A small data warehouse focused on a specific area of interest is called a data mart.
- An enterprise data warehouse might be divided into smaller data marts to improve performance and to simplify the usage as only a specific area is concerned. You can also start with a data mart and then improve upon to build an enterprise data warehouse.
BIG DATA AND DATA WAREHOUSE (LIMITATIONS OF DATA WAREHOUSE)
Data warehouse applications has below limitations:
- Data is usually sampled. So you don’t get to look into the complete data.
- Data had to be sampled and cleaned before analyzing it. This might introduce some delay, which might make the data stale and hence might not capture the most recent events.