KAGGLE (leaderboard results)
Winners prize #1 (first place, verified): 500 USD and a 1000 USD travel award + award certificate
AAAGV (Verified by Javier)
The code from the winning team AAAGV, which is publicly available at https://github.com/asutera/kaggle-connectomics, ran successfully on a desktop PC: it used 7 GB of RAM and took 30 hours per dataset in single-core mode on a 3 GHz i7 CPU.
The code is written in Python and uses only standard dependencies. There was an issue with a specific library version, but this has been resolved. Only one script (main.py) needs to be run for the whole computation. I obtained an AUC of 0.9426 on the valid dataset and 0.9416 on the test dataset, the same scores as reported on Kaggle.
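The AUC reported above can be checked without any special tooling: it is the probability that a randomly chosen true connection is scored higher than a randomly chosen non-connection (the Mann-Whitney U formulation). A minimal self-contained sketch of that computation (the labels and scores below are illustrative, not from the actual datasets):

```python
# Rank-based AUC: equals the probability that a random positive
# (true connection) outranks a random negative (non-connection).

def auc(labels, scores):
    """labels are 0/1 adjacency labels; scores are predicted connection strengths."""
    # Sort indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    # Mann-Whitney U statistic, normalized to [0, 1].
    return (rank_sum - pos * (pos + 1) / 2.0) / (pos * neg)

# Perfectly separated scores give AUC = 1.0:
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → 1.0
```

In practice `sklearn.metrics.roc_auc_score` computes the same quantity; the hand-rolled version above just makes the definition explicit.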
Winners prize #2 (third place, verified): 250 USD and a 750 USD travel award + award certificate
Ildefons (Verified by Javier, Mehreen, and Bisakha)
Ildefons's code, which is publicly available at https://github.com/ildefons/connectomics, consists of 6 separate scripts. The main challenges were installing the required R package gbm and running his script makeFeatures.R, which needed 128 GB of RAM. This R script starts a MATLAB server in the SGE (Sun Grid Engine) background. I had to execute makeFeatures.R separately for the normal-1, normal-2, valid, and test datasets. His code was executed on our standard compute nodes on the cluster; each node has two Intel CPUs, 16 processing cores, and 128 GB of RAM.
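Because makeFeatures.R had to be invoked once per dataset, the verification amounted to four sequential runs. A small driver loop along these lines captures that pattern (the `Rscript makeFeatures.R <dataset>` argument convention is an assumption for illustration, not taken from Ildefons's repository):

```python
# Hypothetical driver that runs the feature script once per dataset.
# The per-dataset argument convention is assumed, not from the actual repo.
import subprocess

DATASETS = ["normal-1", "normal-2", "valid", "test"]

def commands(datasets):
    """Build one Rscript invocation per dataset."""
    return [["Rscript", "makeFeatures.R", d] for d in datasets]

# Each run needs the full 128 GB node, so the runs are launched sequentially:
# for cmd in commands(DATASETS):
#     subprocess.run(cmd, check=True)
print(commands(DATASETS)[0])
```

On an SGE cluster the four runs would more likely be submitted as separate `qsub` jobs; the sequential loop is the simplest equivalent.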
The code passed verification successfully. The AUC of the Kaggle submission we generated is 0.94066, slightly better than his leaderboard score of 0.93900 (a difference of 0.00166).
Winners prize #3 (fourth place, verified): 100 USD and a 400 USD travel award + award certificate
Lukasz (Verified by Javier and Bisakha)
This team's code is available at https://github.com/lr292358/connectomics. Lukasz's code required executing the Python script run.py with different seeds; mergeNormalize.py was then used to average the outputs from the different seeds. All of his code passed verification successfully.
The bottlenecks were installing Theano (a Python module) on the GPU units and gaining access to them. We have 5 cluster nodes with GPU accelerators; each node has one NVIDIA Tesla Kepler (K20) accelerator with 2496 GPU cores. The compute nodes have two Intel CPUs, 16 processing cores, and 128 GB of RAM.
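A common way to merge predictions from runs with different seeds, as mergeNormalize.py does, is to bring each run's scores onto a common scale and then average them element-wise. A minimal sketch of that technique (min-max normalization is one plausible choice; this is not claimed to be Lukasz's exact implementation):

```python
# Sketch of seed-averaging: normalize each run's scores to [0, 1],
# then average across runs. Illustrative only; the real mergeNormalize.py
# may use a different normalization.

def normalize(scores):
    """Rescale one run's scores to [0, 1]; a constant run maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def merge(runs):
    """Element-wise mean of the normalized scores from each seeded run."""
    normed = [normalize(r) for r in runs]
    return [sum(col) / len(col) for col in zip(*normed)]

# Two runs on different scales agree after normalization:
print(merge([[1.0, 3.0, 2.0], [10.0, 30.0, 20.0]]))  # → [0.0, 1.0, 0.5]
```

Normalizing before averaging matters here because runs with different seeds can produce scores on different scales, and AUC depends only on the ranking of the merged scores.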
This work has utilized computing resources at the High Performance Computing Facility of the Center for Health Informatics and Bioinformatics at the NYU Langone Medical Center.