Data Warehouse vs. Data Mart vs. Data Lake

  • Data Warehouse
    • Aggregates data collected from multiple sources into a single, central repository with unified data quality and format.
    • Holds highly curated data that serves as the single version of the truth.
    • Meant to store structured data.
    • Mostly used for BI, analytics, data mining, artificial intelligence (AI), and machine learning.
    • Example use cases: big data integration, NLP, auditing, reporting systems, tactical business analytics, etc.
    • Size: Typically larger than 100 GB.
  • Data Mart
    • A subset of a data warehouse that serves a specific set of users within the business, or a single business unit.
    • Used to segment a large data warehouse into smaller, more manageable ones.
    • The data held in a data mart typically aligns with a particular business unit, such as sales, finance, or marketing.
    • Example use case: a financial analyst can use a finance data mart to carry out financial reporting.
    • Size: Typically less than 100 GB.
  • Data Lake
    • A large repository of raw data that can hold structured, semi-structured, or unstructured data.
    • Data is aggregated from various sources and stored as-is.
    • Not built to suit a specific purpose or to fit into a particular format.
    • Stores any data, whether curated or not (i.e., raw data).
    • Example use cases: machine learning, predictive analytics, data discovery and profiling.
    • Size: Typically in the petabyte (PB) range.
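
To make the three roles concrete, here is a minimal Python sketch under simplifying assumptions. The two source systems ("crm" and "erp"), their field names, and the Fact schema are all hypothetical, and in-memory lists stand in for real storage. Raw records sit in the "lake" unchanged, a transform step unifies them into one schema for the "warehouse", and a "mart" is just the warehouse rows for one business unit.

```python
import json
from dataclasses import dataclass

# Hypothetical raw records from two source systems with different formats.
# The "data lake" keeps them exactly as received (raw, semi-structured JSON).
data_lake = [
    '{"src": "crm", "dept": "Sales", "amount": "1200.50", "date": "2023-01-15"}',
    '{"src": "erp", "department": "FINANCE", "amt_cents": 99900, "day": "15/01/2023"}',
    '{"src": "erp", "department": "sales", "amt_cents": 25000, "day": "16/01/2023"}',
]


@dataclass(frozen=True)
class Fact:
    """Hypothetical unified warehouse schema."""
    department: str  # normalized, lower-case business unit
    amount: float    # amount in currency units
    date: str        # ISO 8601 date (YYYY-MM-DD)


def to_fact(raw: str) -> Fact:
    """Transform one raw lake record into the warehouse's unified schema."""
    rec = json.loads(raw)
    if rec["src"] == "crm":
        return Fact(rec["dept"].lower(), float(rec["amount"]), rec["date"])
    # "erp" source: amounts are in cents and dates are DD/MM/YYYY.
    d, m, y = rec["day"].split("/")
    return Fact(rec["department"].lower(), rec["amt_cents"] / 100, f"{y}-{m}-{d}")


# "Data warehouse": curated, structured, one format across all sources.
data_warehouse = [to_fact(r) for r in data_lake]


def build_mart(warehouse: list[Fact], department: str) -> list[Fact]:
    """"Data mart": the subset of warehouse rows for one business unit."""
    return [f for f in warehouse if f.department == department]


finance_mart = build_mart(data_warehouse, "finance")
print(finance_mart)  # [Fact(department='finance', amount=999.0, date='2023-01-15')]
```

Note that the lake keeps both source formats side by side. Unification happens only when data is loaded into the warehouse, and the mart adds no new data, only a narrower view for one group of users.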

Thank you!
