Working Paper
Johnson JW. Subspace Match Probably Does Not Accurately Assess the Similarity of Learned Representations. Working Paper.
Learning informative representations of data is one of the primary goals of deep learning, but there is still little understanding as to what representations a neural network actually learns. To better understand this, subspace match was recently proposed as a method for assessing the similarity of the representations learned by neural networks. It has been shown that two networks with the same architecture trained from different initializations learn representations that at hidden layers show low similarity when assessed with subspace match, even when the output layers show high similarity and the networks largely exhibit similar performance on classification tasks. In this note, we present a simple example motivated by standard results in commutative algebra to illustrate how this can happen, and show that although the subspace match at a hidden layer may be 0, the representations learned may be isomorphic as vector spaces. This leads us to conclude that a subspace match comparison of learned representations may well be uninformative, and it points to the need for better methods of understanding learned representations.
In Press
Johnson JW, Hari S, Connor HK, Hampton D, Keesee AM. A Contrastive Learning Approach to Auroral Identification and Classification. 20th IEEE International Conference on Machine Learning and Applications [Internet]. In Press.
Unsupervised learning algorithms are beginning to achieve accuracies comparable to their supervised counterparts on benchmark computer vision tasks, but their utility for practical applications has not yet been demonstrated. In this work, we present a novel application of unsupervised learning to the task of auroral image classification. Specifically, we modify and adapt the Simple framework for Contrastive Learning of Representations (SimCLR) algorithm to learn representations of auroral images in a recently released auroral image dataset constructed using image data from Time History of Events and Macroscale Interactions during Substorms (THEMIS) all-sky imagers. We demonstrate that (a) simple linear classifiers fit to the learned representations of the images achieve state-of-the-art classification performance, improving the classification accuracy by almost 10 percentage points over the current benchmark; and (b) the learned representations naturally cluster into more clusters than there are manually assigned categories, suggesting that existing categorizations are overly coarse and may obscure important connections between auroral types, near-earth solar wind conditions, and geomagnetic disturbances at the earth's surface. Moreover, our model is much lighter than the previous benchmark on this dataset, requiring fewer than 25% of the parameters. Our approach exceeds an established threshold for operational purposes, demonstrating readiness for deployment and utilization.
Johnson JW, Jin K, Sabin M. Student Emotional Response to Oral Assessments in Computing and Mathematics. Frontiers in Education 2021. In Press.
The COVID-19 pandemic created a host of issues for institutions of higher education over the past year, including the issue of how to effectively assess student learning when courses are taught remotely. In this work-in-progress paper we present our experience using remote oral assessments in five introductory courses in two subject areas: computing and mathematics. We discuss our motivation for adopting this new assessment format and how to successfully implement remote oral assessments to replace traditional written final exams. We conducted a post-assessment student survey to understand how students responded to the oral exam format. The purpose of the survey was to gather feedback on students' emotions during and after the assessment. Our preliminary quantitative results show that overall students experienced more positive than negative emotions in all courses, though students responded differently in computing and mathematics courses. Students generally favored the oral format, and those who did not have previous experience with this type of assessment had a similar positive response to students who were more familiar with the format. We expect to shed more light on students' experience with the oral assessment as we continue our research and conduct a qualitative study of the open-ended responses in the survey.
Johnson JW. A Diophantine Equation with an Elementary Solution. The College Mathematics Journal. In Press.
Let p and q be distinct primes such that q+1 | p-1. In this paper we find all integer solutions a, b to the equation 1/a + 1/b = (q+1)/(pq) using only elementary methods.
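For readers who want to experiment with the equation, here is a minimal Python sketch that enumerates its integer solutions for a given pair of primes. It is an illustration only and is not taken from the paper: it relies on the standard rearrangement (na - m)(nb - m) = m^2 with n = q+1 and m = pq, and the function names `divisors` and `solve` are hypothetical.

```python
from fractions import Fraction


def divisors(n):
    """Return all positive divisors of a positive integer n (simple trial division)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def solve(p, q):
    """Return all integer pairs (a, b) with 1/a + 1/b = (q+1)/(p*q).

    Multiplying out and completing the product gives
    (n*a - m) * (n*b - m) = m**2, where n = q + 1 and m = p * q,
    so every solution comes from a signed divisor of m**2.
    """
    n, m = q + 1, p * q
    solutions = set()
    for d in divisors(m * m):
        for s in (d, -d):              # try both signs of each divisor
            e = (m * m) // s           # exact, because s divides m**2
            if (s + m) % n == 0 and (e + m) % n == 0:
                a, b = (s + m) // n, (e + m) // n
                if a != 0 and b != 0:  # 1/a and 1/b must be defined
                    solutions.add((a, b))
    return sorted(solutions)


def check(p, q, a, b):
    """Verify one candidate solution with exact rational arithmetic."""
    return Fraction(1, a) + Fraction(1, b) == Fraction(q + 1, p * q)
```

For example, `solve(7, 2)` (where q+1 = 3 divides p-1 = 6) can be passed pair by pair to `check` to confirm each result exactly. The trial-division `divisors` helper is chosen for clarity and is only practical for small primes.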
Coughlan M, Keesee AM, Pinto VA, Connor HK, Johnson JW. Using an LSTM and Classification Methods to Determine the Risk of dB/dt Threshold Crossings as Proxy for Geomagnetically Induced Currents. Second AI and Data Science Workshop for Earth and Space Sciences. 2021.
Halpin PA, Johnson J, Badoer E. Students from a large Australian university use Twitter to identify difficult course concepts to review during face-to-face lectorial sessions. Advances in Physiology Education [Internet]. 2021;45 (1) :10-17.
Engaging undergraduate students in large classes is a constant challenge for many lecturers, as student participation and engagement can be limited. This is a concern since there is a positive correlation between increased engagement and student success. The lack of student feedback on content delivery prevents lecturers from identifying topics that would benefit students if reviewed. Novel methods are needed to engage students in course content and to give them ways to inform the lecturer of difficult concepts, in order to increase student success. In the present study, we investigated the use of Twitter as a scalable approach to enhance engagement with course content and peer-to-peer interaction in a large course. In this pilot study, students were instructed to tweet the difficult concepts identified from content delivered by videos. A software program automatically collected and parsed the tweets to extract summary statistics on the most common difficult concepts, and the lecturer used the information to prepare face-to-face (F2F) lectorial sessions. The key findings of the study were 1) the uptake of Twitter (i.e., registration on the platform) was similar to the proportion of students who participated in F2F lectorials, 2) students reviewed content soon after delivery to tweet difficult concepts to the lecturer, 3) Twitter increased engagement with lecturers, 4) the difficult concepts were similar to previous years, yet the automated gathering of Twitter data was more efficient and time saving for the lecturer, and 5) students found the lectorial review sessions very valuable.
Johnson JW. Detecting Invasive Ductal Carcinoma with Semi-supervised Conditional GANs, in Proceedings of the Future Technologies Conference (FTC) 2020, Volume 3. Cham: Springer International Publishing ; 2021 :113–120.
Invasive ductal carcinoma (IDC) comprises nearly 80% of all breast cancers. The detection of IDC is a necessary preprocessing step in determining the aggressiveness of the cancer, determining treatment protocols, and predicting patient outcomes, and is usually performed manually by an expert pathologist. Here, we describe a novel algorithm for automatically detecting IDC using semi-supervised conditional generative adversarial networks (cGANs). The framework is simple and effective at improving scores on a range of metrics over a baseline CNN.
Johnson JW. Chapter 13 - Generative adversarial networks in medical imaging. In: El-Baz AS, Suri JS. State of the Art in Neural Networks and their Applications. Academic Press ; 2021. pp. 271-278.
Generative adversarial networks (GANs) are a recently introduced class of state-of-the-art generative models. GANs are characterized by a unique training process that, although unstable, enables them to accurately learn highly complex distributions. While much of the recent attention that GANs have received in the machine learning and computer vision communities is due to their ability to synthesize highly realistic images, this is but one of many potential uses for these models. In this chapter, we survey several recent applications of GANs in medical imaging, highlighting significant developments, and illustrating avenues for future work in this nascent area of research.
Coughlan M, Keesee AM, Pinto VA, Johnson JW, Connor HK. Training a Neural Network Using Geomagnetic Storm Data to Predict Ground Magnetic Field Fluctuations. Geospace Environment Modeling (GEM) Workshop. 2020.
Hari S, Johnson JW, Pinto VA, Coughlan M, Keesee AM, Connor HK. Predicting Ground Magnetic Field Fluctuations from Geomagnetic Storm Data Using a Novel Transformer-Based Model, in AGU Fall Meeting Abstracts. Vol 2020. ; 2020 :NG004-0033.
Pinto VA, Keesee AM, Coughlan M, Gadbois MA, Johnson JW, Connor HK. A Deep Learning Approach to the Forecasting of Ground Magnetic Field Perturbations at High and Mid-Latitudes, in AGU Fall Meeting Abstracts. Vol 2020. ; 2020 :NG006-05.
Coughlan M, Keesee AM, Pinto VA, Johnson JW, Connor HK. Using Machine Learning and Geomagnetic Storm Data to Determine the Risk of GIC Occurrence, in AGU Fall Meeting Abstracts. Vol 2020. ; 2020 :SM011-13.
Johnson JW. Benefits and Pitfalls of Jupyter Notebooks in the Classroom, in Proceedings of the 21st Annual Conference on Information Technology Education. New York, NY, USA: Association for Computing Machinery ; 2020 :32–37.
Jupyter notebooks are widely used in industry and in academic research, but have only begun to make inroads into the classroom. The design of the Jupyter notebook is in many ways well suited for teaching subjects in information technology and computer science, but it is a tool that departs significantly from a standard text editor or integrated development environment, and thus carries with it several unique advantages as well as several surprising potential pitfalls. As use of Jupyter notebooks has grown, so has criticism of the notebook, for varied reasons: notebooks can behave in unexpected ways, they can be difficult to reproduce, they open up potential security issues, and they may encourage poor coding practices. A set of best practices to guide instructors and help address these concerns when using Jupyter notebooks in the classroom is currently lacking. This paper addresses the strengths and weaknesses of the Jupyter notebook for education, drawing on existing literature as well as the author's experience teaching a range of courses with Jupyter notebooks for over five years, and recommends a set of best practices for teaching with the Jupyter notebook.
Johnson JW, Jin KH. Jupyter Notebooks in Education. J. Comput. Sci. Coll. 2020;35 (8) :268–269.
Jupyter notebooks are widely used in industry for a range of tasks. This is particularly so in areas that involve significant amounts of data analysis or machine learning; indeed, while 5% of Python developers surveyed in the 2018 JetBrains Python Developer Survey report using Jupyter notebooks for their primary development tool, when restricted to those working in data science roles, Jupyter notebooks tied with the PyCharm IDE as the most popular tool for Python development [1], and in the 2019 StackOverflow developer survey, 9.5% of developers surveyed listed Jupyter notebooks as their preferred development environment [2].
Johnson JW. Automatic Nucleus Segmentation with Mask-RCNN, in Advances in Computer Vision. Las Vegas, NV: Springer International Publishing ; 2020 :399-407.
Automatic segmentation of microscopy images is an important task in medical image processing and analysis. Nucleus detection is an important example of this task. Mask-RCNN is a recently proposed state-of-the-art algorithm for object detection, object localization, and object instance segmentation of natural images. In this paper we demonstrate that Mask-RCNN can be used to perform highly effective and efficient automatic segmentations of a wide range of microscopy images of cell nuclei, for a variety of cells acquired under a variety of conditions.
Mahmood F, Johnson JW, Yang Z, Durr NJ. Fusing attributes predicted via conditional GANs for improved skin lesion classification (Conference Presentation). Proc. SPIE 10950, Medical Imaging 2019: Computer-Aided Diagnosis, 109501T [Internet]. 2019.
Johnson JW. Towards the Algorithmic Detection of Artistic Style. International Journal of Advanced Computer Science and Applications. 2019;10 (1) :76-81.
The artistic style of a painting can be sensed by the average observer, but algorithmically detecting a painting's style is a difficult problem. We propose a novel method for detecting the artistic style of a painting that is motivated by the neural-style algorithm of Gatys et al. and is competitive with other recent algorithmic approaches to artistic style detection.
Johnson JW, Mahmood F, Xu W, Durr N, Yuille A. Structured Prediction Using cGANs with Fusion Discriminator. International Conference on Learning Representations Workshop on Deep Generative Models for Highly Structured Data [Internet]. 2019.
We propose a novel method for incorporating conditional information into a generative adversarial network (GAN) for structured prediction tasks. This method is based on fusing features from the generated and conditional information in feature space and allows the discriminator to better capture higher-order statistics from the data. This method also increases the strength of the signals passed through the network where the real or generated data and the conditional data agree. The proposed method is conceptually simpler than the joint convolutional neural network - conditional Markov random field (CNN-CRF) models and enforces higher-order consistency without being limited to a very specific class of high-order potentials. Experimental results demonstrate that this method leads to improvement on a variety of different structured prediction tasks including image synthesis, semantic segmentation, and depth estimation.
Johnson JW. Teaching Neural Networks in the Deep Learning Era. J. Comput. Sci. Coll. 2019;34 (6) :16-25.
This paper describes the design and evaluation of the first iteration of a standalone course in neural networks aimed at upper level undergraduates and first-year graduate students. The development of this course was motivated by recent state-of-the-art results on challenging tasks in computer vision and natural language processing that have been obtained using deep neural networks, and the subsequent widespread adoption of these models for various applications in industry. The course design emphasizes theoretical understanding and development of applications following existing best practices. Throughout, many unsettled aspects of the underlying mathematical theory of deep neural networks are highlighted, and students are prepared to adapt as current trends and techniques evolve.
Johnson JW. Scaling Up: Introducing Undergraduates to Data Science Early in Their College Careers. J. Comput. Sci. Coll. [Internet]. 2018;33 (6) :76–85.
It has historically been the case that most data science and analytics programs are offered at the Master of Science level. What few undergraduate offerings exist are frequently limited to either a standalone course or a small number of courses targeted to upper level undergraduates. Literature on how best to teach data science to undergraduate students is practically nonexistent. We review recent work on establishing standards and learning objectives for undergraduate data science education, and we make the case that undergraduate students should be exposed to data science early in their college career. We describe the strategy used to teach an introductory course in data science aimed not at upper-level students, but at undergraduate students in their first or second year of study. This course assumes no prerequisite knowledge in computing, mathematics, or statistics, aligns well with recently outlined objectives for undergraduate data science education, and has a track record of success for five consecutive semesters.