Background
Noon is a popular online marketplace where multiple sellers list their products for buyers. However, having duplicate product listings from different sellers can result in a poor buying experience for users. The goal of this project is to implement a data-driven solution to identify and remove duplicate listings on the Noon marketplace.
Problem statement
The first challenge in this project is the large volume of data that must be processed to identify duplicate listings. The second is that different sellers often describe the same product with different attributes, which makes duplicates difficult to identify accurately. The third is fairness: duplicate listings must be removed in a way that still gives all sellers an equal opportunity to sell their products.
Solution
To address these challenges, we propose a combination of pre-creation and post-creation checks using machine learning algorithms. In the pre-creation stage, we can use product attributes such as SKU, product name, brand, model, etc. to identify potential duplicates before they are listed on the marketplace. If a potential duplicate is found, we can prompt the seller to list their product under an existing listing or provide more distinguishing details.
In the post-creation stage, we can use supervised and unsupervised machine learning algorithms to analyze product attributes and seller information to identify duplicate listings that were not caught during the pre-creation stage. We can use clustering algorithms to group similar products based on product attributes and seller information. Based on the results, we can remove duplicate listings or merge them under a single listing.
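As a rough illustration of the two stages, the sketch below uses plain string similarity and a union-find grouping as a stand-in for the machine learning models proposed above; it is not a trained model. The field names (id, sku, name, brand, model), the 0.6/0.4 weights, and the 0.85 threshold are assumptions made for the example, not values taken from the Noon catalog.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so trivial variations match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def similarity(a, b):
    """Score two listings in [0, 1]. Fields and weights are illustrative assumptions."""
    if a.get("sku") and a.get("sku") == b.get("sku"):
        return 1.0  # identical SKU is treated as a certain match
    if normalize(a["brand"]) != normalize(b["brand"]):
        return 0.0  # different brands are never treated as duplicates here
    name_score = SequenceMatcher(
        None, normalize(a["name"]), normalize(b["name"])
    ).ratio()
    model_score = 1.0 if normalize(a["model"]) == normalize(b["model"]) else 0.0
    return 0.6 * name_score + 0.4 * model_score


def find_candidates(new_listing, catalog, threshold=0.85):
    """Pre-creation check: existing listings the new one may duplicate."""
    return [item for item in catalog if similarity(new_listing, item) >= threshold]


def cluster_duplicates(catalog, threshold=0.85):
    """Post-creation check: group listing IDs connected by similar pairs (union-find)."""
    parent = list(range(len(catalog)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; a real catalog would need blocking
    # (for example, comparing only listings that share a brand).
    for i in range(len(catalog)):
        for j in range(i + 1, len(catalog)):
            if similarity(catalog[i], catalog[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, item in enumerate(catalog):
        groups.setdefault(find(i), []).append(item["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

In this sketch, a non-empty result from find_candidates would trigger the prompt to the seller, and each group returned by cluster_duplicates would be a candidate for removal or merging. Seller information is left out of the score here for simplicity.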
Results
The deliverables for this project include a machine learning model to identify potential duplicate listings before creation, periodic checks to identify duplicate listings that were not caught during the pre-creation stage, and a process for removing or merging duplicate listings. The effectiveness of the solution can be evaluated by measuring the percentage of duplicate listings that were removed from the marketplace and monitoring user feedback and satisfaction scores. The solution should ensure a fair and equal opportunity for all sellers while improving the buying experience for users on the Noon marketplace.
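The removal percentage mentioned above could be computed along the following lines. This is a minimal sketch that assumes a manually audited set of known duplicate listing IDs is available, which the proposal does not specify. The second function, which measures how many removed listings were truly duplicates, is an added assumption: it is one possible way to watch for sellers being penalized by wrongful removals.

```python
def duplicate_removal_rate(known_duplicates, removed):
    """Percentage of known duplicate listing IDs that were removed."""
    known = set(known_duplicates)
    if not known:
        return 0.0  # nothing to remove, so report 0 rather than divide by zero
    return 100.0 * len(known & set(removed)) / len(known)


def removal_precision(known_duplicates, removed):
    """Percentage of removed listing IDs that were genuinely duplicates."""
    removed_set = set(removed)
    if not removed_set:
        return 0.0
    return 100.0 * len(removed_set & set(known_duplicates)) / len(removed_set)
```

A low removal rate would suggest duplicates are being missed, while a low precision would suggest legitimate listings are being removed, which bears directly on seller fairness.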