Abstract:
Self-supervised machine learning algorithms now achieve accuracies comparable to their supervised counterparts on benchmark classification tasks, and the representations they learn generalize to a range of downstream tasks beyond classification. We apply self-supervised learning to better understand auroral morphology. Specifically, using a recently released auroral image dataset, we demonstrate that a modified version of the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) algorithm (Chen et al. 2020) learns representations of sufficiently high quality that a simple linear classifier trained on them achieves an average precision of over 94%, an 11 percentage-point improvement over the previous benchmark. The representations learned by our model form more clusters than there are manually assigned categories. This suggests that the existing categorization is overly coarse and may obscure important connections between auroral types, near-Earth solar wind conditions, and geomagnetic disturbances at the Earth's surface. By combining our self-supervised representation learning strategy with semi-supervised automatic labelling, we aim to classify all of the auroral image data available from the Time History of Events and Macroscale Interactions during Substorms (THEMIS) all-sky imagers, producing a labelled, machine-learning-ready auroral image database many orders of magnitude larger than any currently available.
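The abstract does not specify how SimCLR was modified. For reference only, the sketch below shows the standard normalized temperature-scaled cross-entropy (NT-Xent) contrastive loss from the original SimCLR paper, written in NumPy. Each row i of `z_a` and `z_b` is assumed to be the projected embedding of two augmented views of the same auroral image. The function name, the temperature default of 0.5 and the array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def nt_xent_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> float:
    """Standard SimCLR NT-Xent loss for N positive pairs (z_a[i], z_b[i]).

    Illustrative sketch (hypothetical helper), not the modified loss used in the paper.
    z_a, z_b: arrays of shape (N, d) holding embeddings of two augmented views.
    """
    n = z_a.shape[0]
    # Stack both views into one batch of 2N embeddings.
    z = np.concatenate([z_a, z_b], axis=0)
    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    # Pairwise similarities scaled by the temperature, shape (2N, 2N).
    sim = z @ z.T / temperature
    # An embedding is never compared with itself in the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    # Row i's positive partner is the other view of the same image.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    # Average the negative log-likelihood of the positive pair over all 2N anchors.
    return float(-log_prob_pos.mean())


# Two orthogonal "images" whose two views are identical.
views = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nt_xent_loss(views, views))
```

Minimizing this loss pulls the two views of each image together and pushes them away from every other image in the batch. In the linear-evaluation protocol, a linear classifier is then trained on the frozen encoder outputs.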