Problem Statement: A leading online retailer faces a sudden dip in profitability for a $999 product at the beginning of 2022, despite a consistent trend in 2021. Our analysis aims to uncover the reasons behind this decline.
Key data points include:
- Shopping Event (Binary): Indicates special sales events.
- Ad Spend (Numeric): Investment in advertising campaigns.
- Page Views (Numeric): Visits to the product detail page.
- Unit Price (Numeric): Price, factoring in temporary discounts.
- Sold Units (Numeric): Quantity of units sold.
- Revenue (Numeric): Daily sales revenue.
- Operational Cost (Numeric): Daily operational expenses.
- Profit (Numeric): Daily net profit.
Review Data:
- Shape: 455 rows and 9 columns.
- Null Values: The dataset contains no null values.
- Columns: The unique-value counts show that there is one categorical column, Shopping Event?; the rest of the columns hold continuous data.
- Data Types: Date is stored as an object and Shopping Event? as a boolean. Neither can be fed to a learning algorithm directly, so both need to be transformed.
- Head: The date column needs to be split into day, month, and year.
Exploratory Data Analysis
The goal of the exploratory data analysis is to find the reason behind the dip in 2022.
To plot the data against the date, we first converted the date column from object to datetime with the pandas to_datetime method, and then extracted the year and month with the pandas DatetimeIndex class.
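The date handling described above can be sketched as follows. This is a minimal illustration on a tiny made-up sample; the column names `Date` and `Profit` and all values are assumptions, not the article's actual data.

```python
import pandas as pd

# Hypothetical sample standing in for the real dataset (names and values assumed).
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-06-15", "2022-01-01", "2022-02-10"],
    "Profit": [1200.0, 1350.0, 800.0, 760.0],
})

# Convert the object (string) column to datetime64.
df["Date"] = pd.to_datetime(df["Date"])

# Extract the calendar components through a DatetimeIndex.
date_index = pd.DatetimeIndex(df["Date"])
df["Year"] = date_index.year
df["Month"] = date_index.month

# Average profit per year gives a quick numeric check before plotting.
yearly_profit = df.groupby("Year")["Profit"].mean()
print(yearly_profit)
```

With the year and month available as ordinary columns, they can be used directly as grouping keys or as plot axes.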
Profit Vs Date:
Plotting profit versus date shows whether there is actually a dip in profit in 2022 and how strong it is. We use the seaborn library to visualize the data.
Plot 2 clearly shows profit dipping as the year progresses. Plot 4 shows customer churn compared with fiscal year 2021.
Continuous Variables Vs Profit
Categorical variable Vs Profit
I used a violin plot to visualize the categorical feature, Shopping Event?, against profit, and marked the divergence on the graph as well.
Here the divergence from 2021 to 2022 is not very large, though the causal effect may be larger; we calculate it later in this document.
Date Vs Categorical & Continuous Variables
- Plot 1: Average ad spend is roughly the same in both years, but it is higher in July 2022, which increases revenue for that month.
- Plot 2: Average page views are very low in 2022, which might be the main culprit behind the decrease in profit.
- Plot 3: Surprisingly, average units sold per month are higher in 2022 despite the low profit.
- Plot 5: Operational cost is high in 2022.
- Plot 6: Unit price is very low in 2022.
Correlation map
- The Pearson correlation shows a strong relationship between all the variables, but we are interested in the specific pairwise relationships, so we also tally the Kendall tau and Spearman correlations, which give the direction of the relationship as well.
- From the Kendall tau and Spearman correlations we noted the following relation: page views and profit are positively related.
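The three correlation measures mentioned above can be compared with pandas' `DataFrame.corr`. The sketch below uses synthetic data in which profit rises with page views; the numbers are invented for illustration and are not the article's results.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumed for illustration): profit grows with page views plus noise.
rng = np.random.default_rng(0)
page_views = rng.integers(1000, 5000, size=200).astype(float)
profit = 0.5 * page_views + rng.normal(0, 100, size=200)
df = pd.DataFrame({"Page Views": page_views, "Profit": profit})

# Compute the same pairwise correlation with each method.
correlations = {}
for method in ("pearson", "kendall", "spearman"):
    correlations[method] = df.corr(method=method).loc["Page Views", "Profit"]
    print(f"{method}: {correlations[method]:.3f}")
```

A positive value from all three methods indicates that the two variables move in the same direction. Kendall and Spearman work on ranks, so they are less sensitive to outliers and to non-linear but monotonic relationships.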
Conclusion for EDA
From the EDA above we can conclude that page views are clearly correlated with profit and may be affecting profit in 2022. The Spearman correlation between page views and profit was also strong.
Causal Inference mathematical approach
Causal inference is the field of data science that aims to quantify cause-and-effect relationships between variables.
I calculate the causal effect by means of the ITE, i.e. the individual treatment effect.
Ref. Article: Recent Developments in Causal Inference and Machine Learning
I use a pandas DataFrame to load the data and NumPy for the calculation. Let's look at the code snippet.
Using the seaborn library, I plot a bar graph of the causal effect of every column, as computed by the calculation above.
The causal effect for the shopping event is $1,268,282, which means that when the shopping event is true, profit increases by $1,268,282. In other words, the treatment is associated with an increase in profit with a coefficient of 1268282.
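The exact calculation is not reproduced here, so the following is only a minimal sketch under a simplifying assumption: the effect of the binary Shopping Event? column is approximated as the difference in mean profit between event days and non-event days. An individual treatment effect cannot be observed directly, because each day is either an event day or not, so the two group means stand in for the two potential outcomes. The data is synthetic, and this naive estimate ignores confounding.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset (values are illustrative, not the article's data).
df = pd.DataFrame({
    "Shopping Event?": [True, False, False, True, False, False],
    "Profit": [900.0, 400.0, 500.0, 1100.0, 450.0, 450.0],
})

# Split the outcome by treatment status.
treated = df.loc[df["Shopping Event?"], "Profit"].to_numpy()
control = df.loc[~df["Shopping Event?"], "Profit"].to_numpy()

# Naive average treatment effect: E[Y | T=1] - E[Y | T=0].
ate = np.mean(treated) - np.mean(control)
print(f"Estimated effect of a shopping event on daily profit: {ate:.2f}")
```

For continuous columns such as Ad Spend, a comparable sketch would first need a rule for what counts as "treated" (for example, above or below the median), which is a further assumption.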
Root Cause Analysis using mathematical approach
I use the Kullback-Leibler (KL) divergence to measure the divergence between the data for fiscal years 2021 and 2022.
Output
Using KL divergence, we found the top three root factors: Operational Cost, Revenue, and Page Views.
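One way to compute such a divergence is to bin each column's 2021 and 2022 values on a shared histogram grid and compare the two resulting distributions. The sketch below does this on synthetic data; the helper `kl_divergence`, the bin count, the smoothing constant, and all values are assumptions made for illustration, not the article's code or results.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=20, eps=1e-10):
    """Estimate KL(P || Q) from two samples using shared histogram bins."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_counts, _ = np.histogram(p_samples, bins=edges)
    q_counts, _ = np.histogram(q_samples, bins=edges)
    # Turn counts into probabilities; eps avoids log(0) and division by zero.
    p = (p_counts + eps) / (p_counts + eps).sum()
    q = (q_counts + eps) / (q_counts + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic yearly samples: page views shift downward in 2022, ad spend does not.
rng = np.random.default_rng(42)
columns = {
    "Page Views": (rng.normal(5000, 500, 365), rng.normal(3500, 500, 90)),
    "Ad Spend": (rng.normal(1000, 100, 365), rng.normal(1000, 100, 90)),
}

# KL(2021 || 2022) per column; larger values mean a bigger distribution shift.
scores = {name: kl_divergence(y2021, y2022) for name, (y2021, y2022) in columns.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Ranking the columns by this score surfaces the features whose distribution changed most between the two years. KL divergence is asymmetric, so swapping the two years gives a different number.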
Causal Effect and Root Cause Analysis using DoWhy
DoWhy is an open source Python library that aims to spark causal thinking and analysis. DoWhy provides a principled four-step interface for causal inference that focuses on explicitly modeling causal assumptions and validating them as much as possible.
In DoWhy, the first step is to provide the causal relationships to the model. We have to define these relationships from our own domain experience, pack them into a networkx graph object, and then pass that object to the model. Let's look at the code snippet.
We can also plot the relationships.
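A causal graph of this kind can be expressed as a networkx `DiGraph` whose edges point from cause to effect. The edges below are assumptions chosen for illustration from the column meanings; the graph actually used in this analysis may differ.

```python
import networkx as nx

# Assumed cause -> effect edges (illustrative only, not the article's exact graph).
causal_graph = nx.DiGraph([
    ("Shopping Event?", "Ad Spend"),
    ("Shopping Event?", "Page Views"),
    ("Shopping Event?", "Unit Price"),
    ("Shopping Event?", "Sold Units"),
    ("Ad Spend", "Page Views"),
    ("Ad Spend", "Operational Cost"),
    ("Page Views", "Sold Units"),
    ("Unit Price", "Sold Units"),
    ("Unit Price", "Revenue"),
    ("Sold Units", "Revenue"),
    ("Sold Units", "Operational Cost"),
    ("Revenue", "Profit"),
    ("Operational Cost", "Profit"),
])

# A causal graph must be acyclic; check before handing it to a causal model.
is_dag = nx.is_directed_acyclic_graph(causal_graph)
print("Is a DAG:", is_dag)
print("Causal order:", list(nx.topological_sort(causal_graph)))
```

Checking for cycles early is cheap and catches mistakes such as accidentally adding an edge in both directions.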
The next step is to fit the data to the model.
Causal effect using DoWhy
Root Cause Analysis using DoWhy
To calculate the root cause we use DoWhy's attribute_anomalies method. It requires a causal graph as input, which we already defined when finding the causal effect.
Conclusion
In this article, we compared root cause analysis and causal inference using a classical approach, a mathematical approach, and the DoWhy library. In the classical approach we used exploratory data analysis; in the mathematical approach we computed the causal effect and the divergence (root cause) with mathematical formulas; finally, we used the open source DoWhy library to compute the same two quantities, causal effect and root cause.
The classical and mathematical approaches give a better understanding of the data. DoWhy makes it very convenient to get a result, but its backend calculation is harder to follow. DoWhy is also somewhat more computationally expensive, whereas the mathematical approach is simple and cheaper to compute.
Potential Solution
The decline in page views led to a drop in sales, and the decline in unit pricing had a detrimental effect on revenue as well. Let's talk about some of the possible fixes that may be put into practice for increasing page views.
- Optimize your website for organic search: Search is the top way both individuals and businesses research new products and services. This means it’s critical for you to make sure search engines find your website and bring it to the attention of the right people through search engine optimization (SEO).
- Invest in paid search
- Engage in social channels
- Work with influencers: Find social influencers and bloggers who have sizable audiences in your target demographic. Posts shared by influencers can help boost awareness and your SEO value if they feature your products in an authentic way.
- Write blogs or articles: Publishing original content, either on your own blog or on industry websites, can position you as a thought leader.
- Drive awareness with public relations (PR): There are plenty of ways to do PR on a budget, either on your own or with the help of a small agency or freelancer. Local publications and websites are always looking for interesting stories and contacting their editors with a pitch can lead to wonderful visibility that will attract high-potential website traffic.
- Use retargeting display ads: Retargeting can capture the attention of customers who visit your website but leave without making a purchase.
- Make the most of email: Email is still the preferred method of communication for many buyers, and it can increase traffic from your existing audience. Emails that connect at different points in the customer journey can be automated with minimal effort, freeing you up for other important tasks. These include welcome emails that introduce new customers to your brand, abandoned cart emails to bring people back to complete a purchase, messages highlighting best-selling items, and more.