Classifying Indian addresses for the e-commerce domain
By T. Ravindra Babu
October 18, 2024
Summary
Address Complexity: Indian addresses are complex, and a single address can be written in many variant forms, which makes it difficult to deliver shipments accurately.
Acronym Processing: Handling acronyms reduces vocabulary size and improves embedding-based approaches to tasks such as clustering, non-deliverability prediction, and address classification.
Data Limitations: The speaker notes that the dataset cannot be shared publicly because doing so would not be in customers' interest, a common constraint when working with sensitive e-commerce data.
Embedding Approach: The speaker discusses using word embeddings such as CBOW (continuous bag-of-words), together with phonetic distance, to represent words and handle spelling variations in addresses, with the aim of improving shipment accuracy.
Generated using GPT-4o-mini.
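As a rough illustration of the phonetic idea in the summary, the sketch below groups spelling variants of locality names under a shared phonetic key so that they collapse to one vocabulary entry before any embedding is trained. It uses classic Soundex as a stand-in; the summary does not say which phonetic measure the speaker used. The function names (`soundex`, `build_canonical_map`) and the sample tokens are hypothetical.

```python
from collections import Counter, defaultdict

# Classic Soundex digit groups (an assumed stand-in for the talk's phonetic measure).
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key, or "" if the word has no ASCII letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so repeated codes across a vowel count
    return (key + "000")[:4]


def build_canonical_map(tokens):
    """Map each token to the most frequent token sharing its phonetic key."""
    counts = Counter(t.lower() for t in tokens)
    groups = defaultdict(list)
    for tok in counts:
        # Tokens without letters (e.g. PIN codes) keep their own group.
        groups[soundex(tok) or tok].append(tok)
    canon = {}
    for variants in groups.values():
        best = max(variants, key=lambda v: (counts[v], v))
        for v in variants:
            canon[v] = best
    return canon


# Hypothetical address tokens with common misspellings.
tokens = ["koramangala", "koramangala", "kormangala", "koramangla",
          "indiranagar", "indiranagar", "indranagar", "560034"]
canon = build_canonical_map(tokens)
normalized = [canon[t] for t in tokens]
```

Soundex is coarse and will also merge unrelated words that happen to share a key, so in practice a phonetic key like this would likely be combined with an edit-distance check or context from the embedding itself.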