Top 16% Solution to Kaggle’s Product Classification Challenge

Kaggle is a platform for predictive modelling and analytics competitions. Companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models. As of May 2016, Kaggle had over 536,000 registered users, or Kagglers, spanning 194 countries, which makes it the largest and most diverse data community in the world (Wikipedia).

One of my first Kaggle competitions was the OTTO product classification challenge. OTTO is one of the world’s biggest e-commerce companies. For this competition, OTTO provided a dataset of more than 200,000 products, each described by 93 obfuscated features. The objective was to build a predictive model that assigns each product to one of OTTO’s nine main product categories.
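To make the task concrete, here is a minimal sketch of how a prediction for this kind of problem is scored. The competition ranked submissions by multi-class logarithmic loss, so a model has to output a probability for each of the nine categories rather than a single label. The sketch is in Python for illustration only and is not the R solution linked below; the label names (`Class_1` to `Class_9`) and the ten toy labels are assumptions made up for the example, not values from the real dataset. It compares two trivial baselines: predicting the class frequencies for every product, and predicting a uniform 1/9 for every class.

```python
import math
from collections import Counter

# Hypothetical label names for the nine product categories (an assumption).
CLASSES = [f"Class_{i}" for i in range(1, 10)]


def class_priors(labels):
    """Return the relative frequency of each class in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts.get(c, 0) / total for c in CLASSES}


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Average negative log probability assigned to the true class.

    `predicted_probs` is a list of dicts mapping class name -> probability.
    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# Toy labels, invented for illustration; the real data has 200,000+ products.
toy_labels = ["Class_2"] * 4 + ["Class_6"] * 3 + ["Class_8"] * 2 + ["Class_1"]

# Baseline 1: predict the observed class frequencies for every product.
priors = class_priors(toy_labels)
prior_loss = multiclass_log_loss(toy_labels, [priors] * len(toy_labels))

# Baseline 2: predict 1/9 for every class.
uniform = {c: 1 / len(CLASSES) for c in CLASSES}
uniform_loss = multiclass_log_loss(toy_labels, [uniform] * len(toy_labels))

print(f"class-prior baseline log loss: {prior_loss:.4f}")
print(f"uniform baseline log loss:     {uniform_loss:.4f}")
```

Lower is better, and the uniform baseline always scores ln(9) regardless of the data. Any real model trained on the 93 features should beat both baselines comfortably.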

Kaggle also allows users to publicly share their code on each competition page. Reading through other people’s code before getting started helped me a lot.

You can find my R-based solution on my GitHub.