- Title
- Systematic Serendipity: A Study in Discovering Anomalous Astrophysics
- Creator
- Giles, Daniel K
- Date
- 2020
- Description
-
In the present era of large-scale surveys, big data presents new challenges to the discovery process for anomalous data. Advances in astronomy are often driven by serendipitous discoveries. Anomalous data can be indicative of systematic errors, extreme (or rare) forms of known phenomena, or, most interestingly, truly novel phenomena which exhibit as-of-yet unobserved behaviors. As survey astronomy continues to grow, the size and complexity of astronomical databases will increase, and the ability of astronomers to manually scour data and make such discoveries will decrease. In this work, we introduce a machine learning-based method to identify anomalies in large datasets to facilitate such discoveries, and apply this method to long cadence light curves from NASA's Kepler Mission. Our method clusters data based on density, identifying anomalies as data that lie outside of dense regions in a derived feature space. First, we present a proof-of-concept case study and test our method on four quarters of the Kepler long cadence light curves. We use Kepler's most notorious anomaly, Boyajian's Star (KIC 8462852), as a rare 'ground truth' for testing outlier identification to verify that objects of genuine scientific interest are included among the identified anomalies. Additionally, we report the full list of identified anomalies for these quarters, and present a sample subset of identified outliers that includes unusual phenomena, objects that are rare in the Kepler field, and data artifacts. By identifying <4% of each quarter as outlying data, under 6k individual targets for the dataset used, we demonstrate that this anomaly detection method can create a more targeted approach to searching for rare and novel phenomena. We further present an outlier scoring methodology to provide a framework for prioritizing the most potentially interesting anomalies.
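The density-based idea described above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so this sketch assumes a DBSCAN-style definition of noise: a point is an outlier if it is not a "core" point (at least `min_pts` points within radius `eps`, counting itself) and has no core point within `eps`. The function name `density_outliers`, the parameters `eps` and `min_pts`, and the toy feature values are all hypothetical and are not taken from the work itself.

```python
import math


def density_outliers(points, eps, min_pts):
    """Return indices of points that a DBSCAN-style labeling would call noise.

    Assumption: this is an illustrative stand-in, not the method used in the
    catalogued work. It uses brute-force O(n^2) distances, so it only suits
    small examples.
    """
    n = len(points)
    # Each neighborhood includes the point itself (the usual DBSCAN convention).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points sit in dense regions of the feature space.
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    # Noise: not core, and not within eps of any core point.
    return [
        i
        for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]


# Hypothetical two-feature data: one tight cluster plus one isolated point.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
outliers = density_outliers(features, eps=0.5, min_pts=3)
print(outliers)
```

In this toy example only the isolated point falls outside the dense region. For a real survey, the features would be derived from each light curve, and `eps` and `min_pts` would need tuning per dataset.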
We have developed a data mining method based on k-Nearest Neighbor distance in feature space to efficiently identify the most anomalous light curves. We test variations of this method, including using principal components of the feature space, removing select features, varying the choice of k, and scoring to subset samples. We evaluate the performance of our scoring on known object classes and find that it consistently scores rare (<1000) object classes higher than common classes, meaning that rarer objects are successfully prioritized over common objects. The most common class, categorized as miscellaneous stars without any major variability, and rotational variables together make up well over two-thirds of the KIC, yet are considerably underrepresented in the top outliers. We have applied scoring to all long cadence light curves of quarters 1 to 17 of Kepler's prime mission and present outlier scores for all 2.8 million light curves for the roughly 200k objects.
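A k-Nearest Neighbor distance score of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions: the score is taken to be the Euclidean distance to the k-th nearest neighbor, which is one common definition. The abstract does not specify the exact formula, and the function name `knn_scores` and the toy feature values are hypothetical.

```python
import math


def knn_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Assumption: "k-Nearest Neighbor distance" is read here as the k-th
    neighbor's Euclidean distance; a mean over the k nearest would be another
    plausible reading. Brute force, O(n^2), for small examples only.
    """
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical two-feature data: four clustered points and one far away.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_scores(features, k=2)

# Rank indices from most to least anomalous (largest score first).
ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
print(ranked[0], scores[ranked[0]])
```

A larger score means the point sits in a sparser part of feature space, so sorting by score gives a prioritized list for follow-up. At the scale of millions of light curves, a spatial index such as a k-d tree would be the more practical choice than the brute-force loop used here.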