Classifying Indian addresses for the e-commerce domain
By T. Ravindra Babu
October 18, 2024
Summary
Address Complexity: The speaker highlights how hard Indian addresses are to process: a single address can be written in many variant forms, which makes accurate shipment difficult.
Acronym Processing: The talk shows that handling acronyms reduces vocabulary size and improves embedding-based approaches to tasks such as clustering, non-deliverability prediction, and address classification.
Data Limitations: The speaker notes that the dataset cannot be shared publicly in order to protect customer interests, which illustrates the difficulty of working with sensitive data in the e-commerce domain.
Embedding Approach: The speaker discusses using word embeddings such as CBOW (continuous bag of words), together with phonetic distance, to represent words and handle variations in address formats, improving the accuracy of shipments.
Generated using GPT-4o-mini.
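As a rough illustration of the acronym and phonetic-distance ideas in the summary, here is a minimal Python sketch. It is not the speaker's method. The function names, the rule of joining runs of single letters into one acronym token, and the crude phonetic key (keep the first letter, drop later vowels and h/w/y, collapse repeats) are all assumptions made for this example, and the locality names are illustrative inputs rather than data from the talk.

```python
import re


def merge_spelled_acronyms(address: str) -> list[str]:
    """Lowercase, strip punctuation, and join runs of single letters.

    Assumption for this sketch: "H.S.R.", "H S R" and "HSR" should all
    become the single token "hsr", which shrinks the vocabulary.
    """
    tokens = re.sub(r"[^a-z0-9]+", " ", address.lower()).split()
    merged, run = [], []
    for tok in tokens:
        if len(tok) == 1 and tok.isalpha():
            run.append(tok)  # collect consecutive single letters
            continue
        if run:
            merged.append("".join(run))
            run = []
        merged.append(tok)
    if run:
        merged.append("".join(run))
    return merged


def phonetic_key(word: str) -> str:
    """Crude, hypothetical phonetic key (not Soundex or Metaphone)."""
    word = word.lower()
    if not word:
        return ""
    # Keep the first letter, drop vowels and h/w/y from the rest.
    key = word[0] + re.sub(r"[aeiouyhw]", "", word[1:])
    # Collapse repeated letters, e.g. "ll" -> "l".
    return re.sub(r"(.)\1+", r"\1", key)


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def phonetic_distance(a: str, b: str) -> int:
    """Edit distance between the phonetic keys of two words."""
    return edit_distance(phonetic_key(a), phonetic_key(b))
```

With these definitions, `merge_spelled_acronyms("H.S.R. Layout")` and `merge_spelled_acronyms("HSR layout")` yield the same token list, and spelling variants such as "Koramangala" and "kormangla" map to the same phonetic key. The normalized tokens could then be fed to a CBOW model to learn embeddings, which this sketch does not cover.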