Introduction
eBev, an online brewery, was facing challenges in managing its operations and sales effectively. They had a vast amount of data but lacked the necessary tools and expertise to turn it into actionable insights. I was brought on board to help eBev leverage data analytics and warehousing to optimize their operations and sales.
Overview
- The client wanted to create a centralized repository for all organizational data.
- The goal was to make the data accessible to everyone and to create an organized data source.
- The client wanted the repository refreshed continuously with incoming data.
- The main objective was to convert raw data into actionable insights for better decision-making.
- To achieve this, the client wanted to build a data lakehouse architecture on the cloud.
Requirements
- Ingest data from different sources.
- Store data securely in Azure Data Lake Storage.
- Perform data transformations using Azure Databricks.
- Utilize Apache Airflow for orchestrating and scheduling the entire workflow.
- Ensure scalability and fault tolerance.
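The orchestration requirement above can be sketched in plain Python. This is a hedged stand-in for an Airflow DAG, assuming three hypothetical tasks (ingest, transform, publish) that must run in order; in Airflow itself each function would become a task (for example a `PythonOperator`) and the ordering would be declared with `ingest >> transform >> publish` inside a DAG file.

```python
# Minimal stand-in for an Airflow DAG: three hypothetical tasks run in
# dependency order, each recording its name so the order can be verified.
# In a real deployment these would be Airflow operators inside a DAG file.

def run_pipeline():
    """Run the ingest -> transform -> publish tasks in order."""
    log = []

    def ingest():
        # In production: pull data from source systems into ADLS.
        log.append("ingest")

    def transform():
        # In production: trigger an Azure Databricks job.
        log.append("transform")

    def publish():
        # In production: expose curated tables for downstream analytics.
        log.append("publish")

    # Equivalent to Airflow's `ingest >> transform >> publish` dependency chain.
    for task in (ingest, transform, publish):
        task()
    return log

print(run_pipeline())  # ['ingest', 'transform', 'publish']
```

The point of the sketch is the dependency chain: each downstream task runs only after its upstream task has completed, which is exactly what Airflow's scheduler enforces, with retries and alerting layered on top.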
Architecture
Description
- Data Sources: Multiple data sources feed into the system, and data is ingested into Azure Data Lake Storage (ADLS).
- Data Processing: Azure Databricks is used for batch processing and data transformations.
- Workflow Orchestration: Apache Airflow orchestrates the entire workflow, scheduling tasks and triggering data processing jobs.
- Data Analysis: Processed data in ADLS can be queried and analyzed using Power BI for advanced analytics.
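To illustrate the batch-processing step, here is a small, hedged Python sketch of a typical cleaning transformation. The record layout and field names (`order_id`, `quantity`, `unit_price`) are assumptions for illustration only; in the actual pipeline this logic would run as a PySpark job on Azure Databricks over DataFrames rather than plain dictionaries.

```python
# Illustrative batch transformation: drop malformed rows and compute a
# derived revenue column. Field names are hypothetical; on Databricks the
# same steps would be expressed as PySpark DataFrame operations.

def transform_orders(raw_rows):
    """Clean raw order rows and add a derived revenue column."""
    cleaned = []
    for row in raw_rows:
        # Skip rows missing required fields -- a basic data-quality gate.
        if row.get("order_id") is None or row.get("quantity") is None:
            continue
        qty = int(row["quantity"])
        price = float(row.get("unit_price", 0.0))
        cleaned.append({
            "order_id": row["order_id"],
            "quantity": qty,
            "revenue": round(qty * price, 2),
        })
    return cleaned

raw = [
    {"order_id": "A1", "quantity": "3", "unit_price": "4.50"},
    {"order_id": None, "quantity": "2", "unit_price": "9.00"},  # dropped
]
print(transform_orders(raw))  # [{'order_id': 'A1', 'quantity': 3, 'revenue': 13.5}]
```

Once rows like these are written back to a curated zone in ADLS, Power BI can query them directly for reporting.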