Area Under the ROC curve (AUC)
We consider the problem of network connectivity as a binary classification problem:
- Presence of a connection: 1
- Absence of a connection: 0
You must return a numerical score between 0 and 1 indicating your confidence that there is a connection, with higher values indicating a more likely connection. Thresholding the prediction score yields a classification whose results may be summarized in a confusion matrix, where tp (true positives), fn (false negatives), tn (true negatives) and fp (false positives) denote the number of examples falling into each possible outcome: tp counts existing connections predicted as present, fn existing connections predicted as absent, tn absent connections predicted as absent, and fp absent connections predicted as present.
We define the true positive ratio (also called sensitivity, true positive rate or hit rate) and the false positive ratio (equal to 1 minus the specificity, or true negative rate) as:
True positive ratio = tp/pos
False positive ratio = fp/neg
where pos=tp+fn is the total number of positive examples and neg=tn+fp the total number of negative examples.
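The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the convention that a score greater than or equal to the threshold counts as a predicted connection is an assumption, since the text does not say how ties with the threshold are handled.

```python
def confusion_counts(labels, scores, threshold):
    """Count (tp, fn, tn, fp) for one threshold.

    Assumption: a score >= threshold is classified as a connection (1).
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def ratios(labels, scores, threshold):
    """Return (true positive ratio, false positive ratio)."""
    tp, fn, tn, fp = confusion_counts(labels, scores, threshold)
    pos = tp + fn  # total number of positive examples
    neg = tn + fp  # total number of negative examples
    return tp / pos, fp / neg


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.1]
print(confusion_counts(labels, scores, 0.5))  # (1, 1, 1, 1)
print(ratios(labels, scores, 0.5))            # (0.5, 0.5)
```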
The prediction results are evaluated with the Area Under the ROC Curve (AUC). It corresponds to the area under the curve obtained by plotting the "True positive ratio" against the "False positive ratio" as the threshold applied to the prediction values is varied. The AUC is calculated using the trapezoid method.
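As a rough sketch of how such an area can be computed, the following Python code sweeps the threshold over the sorted scores, collects the resulting points of the curve, and sums trapezoids between them. It is not the official scoring code: the function names are hypothetical, and treating tied scores as a single step of the curve is an assumption.

```python
def roc_points(labels, scores):
    """Return ROC points as (false positive ratio, true positive ratio)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    # Visit examples from the highest score to the lowest, which is
    # equivalent to lowering the threshold step by step.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    k = 0
    while k < len(order):
        current = scores[order[k]]
        # Assumption: examples with equal scores cross the threshold together.
        while k < len(order) and scores[order[k]] == current:
            if labels[order[k]] == 1:
                tp += 1
            else:
                fp += 1
            k += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(labels, scores):
    """Area under the ROC curve by the trapezoid rule."""
    points = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, invented for illustration only.
print(auc_trapezoid([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.1]))  # 0.75
```

In the toy example, three of the four (positive, negative) pairs are ranked correctly, which matches the area of 0.75.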
The submission file should contain 2 columns:
- NET_neuronI_neuronJ: The key of the connection neuron I -> neuron J, in the format NET (network name) followed by the numbers of neurons I and J.
- Strength: The strength of the connection (your numerical confidence score between 0 and 1).
A sample submission file is provided for download. The file should contain the concatenation of the results on the validation and test data.
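A minimal sketch of writing such a file in Python follows. Several details are assumptions rather than facts from this page: the comma-separated layout with a single header row, the network names "valid" and "test", and the neuron numbering are all hypothetical; the provided sample submission remains the reference for the exact format.

```python
import csv
import io


def write_submission(out, networks):
    """Write one header row, then the rows of each network in order.

    networks: list of (network_name, scores) pairs, where scores maps
    (i, j) neuron pairs to a confidence between 0 and 1.
    Assumption: the file is comma-separated with a header row.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["NET_neuronI_neuronJ", "Strength"])
    for network_name, scores in networks:
        for (i, j), strength in sorted(scores.items()):
            if not 0.0 <= strength <= 1.0:
                raise ValueError(f"score out of range for {(i, j)}: {strength}")
            writer.writerow([f"{network_name}_{i}_{j}", strength])


# Hypothetical scores for two tiny networks, validation first, then test.
valid_scores = {(1, 2): 0.9, (2, 1): 0.1}
test_scores = {(1, 2): 0.5}

buffer = io.StringIO()
write_submission(buffer, [("valid", valid_scores), ("test", test_scores)])
print(buffer.getvalue())
```

Passing both networks to a single call keeps the validation and test results in one file with one header.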