Setting, Prizes, Schedule

Tasks of the challenge

Tutorial, Sample code

Setting, Prizes, Schedule

What is the goal of the challenge?

The goal is to reconstruct the structure of a neural network from temporal patterns of activities of neurons. The activities are obtained from video recording of calcium fluorescence imaging. [Learn more...]

Are there prizes and travel grants?

We will award 3000 USD in prizes, donated by ChaLearn, to the winners:

First place: 500 USD and 1000 USD travel award (*) + Award certificate

Second place: 250 USD and 750 USD travel award (*) + Award certificate

Third place: 100 USD and 400 USD travel award (*) + Award certificate

(*) The travel award may be used for one of the workshops organized in conjunction with the challenge. The award money will be granted in reimbursement of expenses including airfare, ground transportation, hotel, or workshop registration. Reimbursement is conditioned on (i) attending the workshop, (ii) making an oral presentation of the methods used in the challenge, and (iii) presenting original receipts and boarding passes.

The challenge is part of the official selection of the WCCI 2014 competition program and the ECML 2014 competition program, where the participants will be encouraged to present their results.

What is the [tentative] schedule?

February 5, 2014: Start of the challenge (development period).

May 5, 2014: End of challenge.

May 20, 2014: All teams must turn in fact sheets. The three top ranking teams must turn in their code to qualify for prizes. The post-challenge verifications start.

June 1, 2014: End of the post challenge verifications. Release of the official ranking.

June 20, 2014: Paper submission deadline.

July 6-11, 2014: Result presentation at WCCI 2014, Beijing, China.

August 30, 2014: Revised papers due.

September 15 or 19, 2014: ECML workshop, Nancy, France. Discussion of the results.

To be notified of changes, please subscribe to the causalitychallenge Google group.

Are we obliged to attend the workshop(s) or publish our method(s) to participate in the challenge?

Entering the challenge is NOT conditioned on publishing code or methods, or on attending workshops; however:

(1) If you want to be part of the final ranking you must fill out a survey (fact sheet) with general information about your method.

(2) If you are among the top three ranking participants in the final test phase, to be eligible for a prize, you must also make your code publicly available under a popular OSI-approved license, including the training code.

Attending the workshop and submitting a paper for the proceedings is optional but recommended.

Can I attend the workshop if I do not participate in the challenge or if I do not qualify for prizes?

Yes. You are even encouraged to submit papers for presentation on the topics of the workshop.

Do I need to register to participate?

Yes. You can download the data and experiment with the sample code without registering, but to make submissions you must register on the Kaggle website. We also ask you to join the causalitychallenge Google group to receive challenge notifications and workshop announcements.

Can I register multiple times?

No. You must make submissions under a single Kaggle user ID. You may be part of a team; requests to merge teams may be sent to Kaggle.

Tasks of the challenge

Is this a classical causal discovery problem?

Pretty much. It is a causal discovery problem from temporal data. The only specific aspect is that all nodes of the causal graph (neurons) have the same type of connections (synapses) to other nodes: all excitatory or all inhibitory. In this first challenge, we consider only excitatory connections. Another twist is that the neurons are arranged in a two-dimensional layout simulating a neuronal culture. We provide the coordinates of the neurons so that you can, if you wish, take into account cross-talk between signals due to simulated light-scattering effects.

Why should we care about this problem?

It may seem contrived to recover the architecture of a neural network from the activity of the neurons. However, when you think about it, what are the alternatives? Each neuron is receiving signals from other neurons through a complex dendritic arborescence and sending out signals via long and winding axons. Everything is intertwined. Tracing axons to their target is difficult and tedious. [Learn more...]

Aren't there already standard methods to address this problem?

Yes, there is an abundant literature. Check out our summary and our references. We also provide sample code.

What is the difference between observational and interventional data?

Observational data are data collected by an "external agent" not intervening on the system at hand. Interventional data are obtained when an external agent forces some of the variables of the system to assume given values (hard interventions) or influences the values of certain variables (soft interventions). In this challenge, we are considering only observational data. However, interventional data are very useful to gain knowledge of causal relationships. For instance, if A causes B, but B does not cause A, forcing A to assume certain values may change the values of B, but not vice versa.

Why are you not providing interventional data?

Collecting interventional data is more difficult and expensive experimentally. Many more neurons can be visualized simultaneously than can be stimulated. Hence it is very useful to be able to recover the network structure from observational data only. We will eventually provide both observational and interventional data in an upcoming challenge.

Can this problem be solved at all?

Yes. In fact, you can relatively easily get AUCs of the order of 0.9. However, this can be deceiving. Suppose you have a network with N=100 neurons and k=12 connections per neuron on average. Excluding diagonal terms, you have 9900 possible entries in the connectivity matrix, of which only ~1200 correspond to real connections. With a sensitivity (TPR) of 0.7 at a false-positive rate (1-specificity, FPR) of 0.1 over the ~8700 non-connections, you flag 1710 links, of which 840 are correctly identified and 870 are wrong! So there is still a lot of room for improvement.
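
These counts can be checked with a quick back-of-the-envelope sketch (N, k, TPR, and FPR are the figures quoted above; here the false-positive rate is applied to the true negatives, i.e. the non-connections, which is its standard definition):

```python
# Back-of-the-envelope check of the link counts discussed above.
N = 100    # neurons
k = 12     # average connections per neuron
tpr = 0.7  # sensitivity (true-positive rate)
fpr = 0.1  # 1 - specificity (false-positive rate)

possible = N * (N - 1)            # off-diagonal entries: 9900
positives = N * k                 # real connections: ~1200
negatives = possible - positives  # non-connections: 8700

tp = tpr * positives  # correctly identified links
fp = fpr * negatives  # spurious links
flagged = tp + fp     # total links called positive
print(possible, positives, tp, fp, flagged)
```

Roughly half of the flagged links are wrong, even at an apparently good operating point on the ROC curve.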

Why is correlation giving such good results?

The networks are very sparsely connected and have some substructure that is relatively easy to identify with correlation. However, an AUC between 0.85 and 0.9 is not good. Assume you know the average number of connections per neuron and use it to set a threshold on your score: among the connections called significant, ~50% may still be wrong (see the entry "Can this problem be solved at all?").
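
A minimal correlation baseline can be sketched as follows (this is not the official sample code; the array `fluor` stands in for the real fluorescence traces):

```python
import numpy as np

# `fluor` is assumed to be a (time_steps x neurons) array of calcium
# fluorescence traces; here random noise stands in for real data.
rng = np.random.default_rng(0)
fluor = rng.standard_normal((1000, 5))

scores = np.corrcoef(fluor, rowvar=False)  # neuron-by-neuron correlation
np.fill_diagonal(scores, 0.0)              # ignore self-connections
print(scores.shape)
```

Note that correlation is symmetric, so this baseline cannot distinguish the direction i -> j from j -> i; time-lagged measures are needed for that.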

What applications do you have in mind?

Research and medical applications. For research purposes, we are interested in recovering architectures of neural networks from cultures, slices, and whole organisms. For medical purposes, we are interested in interpreting images captured during surgery. Initially, we are limiting ourselves to constant synapses, but the goal is to eventually monitor changes in connectivity.

Tutorial, Sample code

What do I need to know about brain sciences?

We formulated the problem so that no knowledge of brain sciences is needed to enter the challenge. Nevertheless, we provide a brief introduction to the basic notions needed to understand the problem being solved.

Can you give me clues on how to get started?

We provide sample code, and a brief overview of methods with references.


Are the datasets using real data?

No. All data are simulated. [Download the data...]

Why are you not providing real data?

It is very difficult to establish the ground truth of the network connectivity in real data. [Learn more...]

What simulator are you using?

We are using a simulator of spiking neurons with a model of fluorescence imaging. [Learn more...]

What specific simplifying assumptions are you making?

- All neurons are identical.

- All synaptic strengths are identical and constant: this makes it easier to assess the ground truth of connections and is semi-realistic in immature neural cultures.

- Inhibitory connections are blocked: this can be achieved in actual neural cultures by silencing inhibitory neurons with GABA blockers.

Why are there some "-1" in the network connection matrices?

Those correspond to inhibitory connections that were blocked in the simulation. After the challenge, we intend to provide data using the same networks but unblocking the inhibitory connections, to test the power of algorithms to tackle the more difficult problem of finding both excitatory and inhibitory connections. For the purpose of this challenge, "-1" is equivalent to no connection.

What do you mean by "realistic" simulations?

- We use spiking neurons from the NEST simulator.

- The parameters of the simulator are set to reproduce dynamics resembling those of real neural cultures (Stetter, 2012).

- The connectivity is similar to network connectivities in neural cultures (Ivenshitz, 2010; Soriano, 2008).

- We use a model of calcium fluorescence and of light scattering (Stetter, 2012).

What are the data license terms?

The data are licensed under a Creative Commons CC0 license (no rights reserved). If data are re-used please give credit to the authors by citing their paper (Stetter, 2012).


What is the meaning of the score on the leaderboard?

The score is the AUC (area under the ROC curve), rating the accuracy with which each potential directed neural connection is classified as present (positive class) or absent (negative class).
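
For reference, the AUC can be computed locally with the Mann-Whitney formulation (a sketch for sanity-checking your scores; the official scoring is done by Kaggle):

```python
def auc(y_true, y_score):
    """AUC via the Mann-Whitney statistic.

    y_true marks real connections (1) vs. absent ones (0);
    y_score is the submitted confidence (Strength) column.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    # Probability that a random real connection outscores a random
    # non-connection, counting ties as 1/2.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]))  # -> 1.0
```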

Will the results on the validation set count for the final ranking?

No. The final ranking will be computed only on test data provided by the organizers.


What is the submission format?

Submit your results as a CSV file. The submission file should contain 2 columns:

NET_neuronI_neuronJ: The key of the connection neuron I -> neuron J, in the format NET (network name) followed by the number of the neurons I and J.

Strength: The strength of the connection (your numerical confidence score between 0 and 1).

The file should contain the concatenation of the results on the validation and test data (2x10^6 values).
See a sample submission. The performance of the sample submission is AUC(valid)=0.8732.
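
Writing a submission in this format can be sketched with the standard library (network names and score values below are made up for illustration):

```python
import csv

# Hypothetical scores keyed by (network name, neuron I, neuron J).
scores = {("valid", 1, 2): 0.83, ("valid", 1, 3): 0.10,
          ("test", 1, 2): 0.55}

with open("submission.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["NET_neuronI_neuronJ", "Strength"])
    for (net, i, j), s in sorted(scores.items()):
        w.writerow([f"{net}_{i}_{j}", s])  # e.g. "valid_1_2,0.83"
```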

How many submissions can I make?

You may not make more than 5 submissions per day on validation data. Only one final submission can be made.

Can I make experiments with mixed methods?

We encourage you to use a single unified methodology, however, this will not be enforced.

Can I submit a method involving no learning at all?

Yes.

How do I prepare the software that I will submit?

We provide instructions for preparing your code. You must submit your software together with your selected submission(s) on validation data before the software submission deadline (see the schedule).

How do I submit final results?

Submit final results together with the validation results in a single CSV file. See a sample submission. The prediction results of your final entry must be generated with the software that you submit.

How will the organizers carry out the verifications?

The winners are responsible for providing all necessary software components to the organizers so they can reproduce their results. Standard hardware and OS must be used (laptop or desktop computers running Windows, Linux or MacOS). The interface must be similar to that of the sample code or follow the instructions. The participants are advised to provide a preliminary version of their code for testing ahead of the deadline to make sure that it runs smoothly.

The organizers will run your software on the final test data and compare the score obtained with the score of your final submission(s). You will be informed of any discrepancy and the organizers will attempt to troubleshoot the problem with you. However, if the difference alters the ranking of participants, the results obtained by the organizers using the software provided will prevail.

To ensure result reproducibility, we recommend that you eliminate any stochastic component from your software and, in particular, always initialize pseudo-random number generators in the same way, so as to produce the exact same results every time the software is run.
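
For example, in Python one way to guarantee identical runs is to seed a dedicated generator once at program start (the seed value is arbitrary; seed whichever RNGs your own code actually uses):

```python
import random

SEED = 42
rng = random.Random(SEED)        # dedicated generator, seeded once
a = [rng.random() for _ in range(3)]

rng = random.Random(SEED)        # re-seeding reproduces the same stream
b = [rng.random() for _ in range(3)]
assert a == b                    # identical runs -> identical results
```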

Do I have to submit results on the final test set even though I provide my software?

Yes. This will allow us to verify that we ran your code properly.

Do I have to provide source code?

Only the winners will be required to make their source code publicly available if they want to claim their prize.

Do I have to provide the training software or only the prediction software?

The winners will be asked to make publicly available the source code of their full software package under a popular OSI-approved license, including training software, to be able to claim their prize.


Is there a document in which the official rules are summarized?

Yes, here it is in PDF format.

The rules specify that I will have to fill out a fact sheet to be part of the final ranking, do you have information about that fact sheet?

You will have to fill out a multiple-choice questionnaire when the challenge is over. It includes high-level questions about the method you used and your software and hardware platform. Your identity, details, or proprietary information may be withheld, and you retain all intellectual property rights to your methods. The form can be edited later if you save the link provided at the end of the survey.

Will the information of the fact sheets be made public?

Yes. Do not disclose any personal information that you do not want to be made public.

Do I need to let you know what my method is?

Disclosing detailed information about your method is optional. However, you will have the opportunity to present your work at the WCCI 2014 conference or the ECML 2014 conference. See the workshop page.

To get my prize, does my paper need to be accepted for publication in the proceedings?

No. The only publishing requirement to earn your prize is to fill out the fact sheets. However, to get the travel award, you must present your work at the workshop, whether or not your paper is accepted for publication in the proceedings.

How will papers be judged?

The reviewers will use five criteria: (1) Performance in challenge, (2) Novelty/Originality, (3) Sanity, (4) Insight, and (5) Clarity of presentation. "Performance" means final challenge performance results on test data.

Will the organizers enter the competition?

The prize winners may not be challenge organizers. The challenge organizers will enter benchmark submissions from time to time to stimulate participation, but these do not count as competition entries.

Can I give an arbitrarily hard time to the organizers?

No. In case of dispute about prize attribution or possible exclusion from the competition, the participants agree not to take any legal action against the organizers, CHALEARN, KAGGLE, or data donors. Decisions can be appealed by submitting a letter to Vincent Lemaire, secretary of ChaLearn, and disputes will be resolved by the board of ChaLearn.


Help may be obtained by posting on the Kaggle website forum or by emailing the organizers.

To keep informed of all ChaLearn events on causality and connectomics, subscribe to our causalitychallenge Google group.

Last updated February 6, 2013.