September 16, 2014

Higgs Challenge is over

This summer I spent a lot of time on an interesting competitive project, the Higgs Boson Machine Learning Challenge. The goal of the competition was to better identify H -> tau anti-tau signal events in collision data dominated by background (obtained by the ATLAS team at CERN), using machine learning methods and, ideally, physical insight. I first learned of this competition here. It is interesting to have real collision data at hand (even if the project admins have suitably modified it from the raw data) and to play with it. I had some experience with regression trees in R several years ago, but I did not know the current state of big data analysis with Python and its related packages, and it seemed reasonable to use one such package, xgboost, to be competitive in the contest. So I started from scratch to build a Python environment, which took me a few days but kept me within the top 100 most of the time. One of the main modifications I made to the xgboost demo program was to scale the weights of the data entries so that the AMS score is independent of the data size. At some point I was around the top 30, and I made a further effort to build more original models from a physical perspective, but could not improve the score. I ended up 216th among 1792 contestants. The final result is here. The top 3 guys will receive cash prizes. Congratulations!
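The weight-scaling idea mentioned above can be sketched roughly as follows. This is a minimal illustration, not the actual contest code: the helper names (`rescale_weights`, `ams_of_selection`) and the toy numbers are made up for the example, and rescaling each class so that its weights sum to the full-sample totals is only one plausible way to do it. The AMS formula, with its regularization term of 10, is the one defined by the challenge.

```python
import numpy as np

def ams(s, b, b_reg=10.0):
    """Approximate median significance as defined by the challenge.
    s, b: summed weights of selected signal / background events."""
    return np.sqrt(2.0 * ((s + b + b_reg) * np.log(1.0 + s / (b + b_reg)) - s))

def rescale_weights(weights, is_signal, target_signal_sum, target_background_sum):
    """Hypothetical helper: rescale the weights of a subsample so that each
    class sums to the same total as in the full sample."""
    weights = np.asarray(weights, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)
    scaled = weights.copy()
    scaled[is_signal] *= target_signal_sum / weights[is_signal].sum()
    scaled[~is_signal] *= target_background_sum / weights[~is_signal].sum()
    return scaled

def ams_of_selection(weights, is_signal, selected):
    """Hypothetical helper: AMS of the events flagged as selected."""
    s = weights[is_signal & selected].sum()
    b = weights[~is_signal & selected].sum()
    return ams(s, b)

# Toy data (invented for illustration): the "full" sample is two copies
# of a four-event base sample, so the base sample is an exact half.
base_w = np.array([0.5, 0.25, 4.0, 2.0])
base_sig = np.array([True, True, False, False])
base_sel = np.array([True, False, True, False])

full_w = np.tile(base_w, 2)
full_sig = np.tile(base_sig, 2)
full_sel = np.tile(base_sel, 2)

ams_full = ams_of_selection(full_w, full_sig, full_sel)

# Without rescaling, the half-size sample gives a different (smaller) AMS.
ams_half_raw = ams_of_selection(base_w, base_sig, base_sel)

# After rescaling to the full-sample class totals, the AMS agrees.
half_w = rescale_weights(base_w, base_sig,
                         full_w[full_sig].sum(), full_w[~full_sig].sum())
ams_half_scaled = ams_of_selection(half_w, base_sig, base_sel)
```

Because the AMS depends on the absolute sums of weights and not just their ratio, a score computed on a cross-validation fold or a held-out subset is only comparable to the leaderboard score once the weights are brought back to the full-sample normalization.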