Science is undergoing a data explosion, and astronomy is leading the way. Modern telescopes produce terabytes of data per observation, and the simulations required to model our observable Universe push supercomputers to their limits. To analyse this data, scientists need to be able to think computationally to solve problems. In this course you will investigate the challenges of working with large datasets: how to implement algorithms that work; how to use databases to manage your data; and how to learn from your data with machine learning tools. The focus is on practical skills: all the activities will be done in Python 3, a modern programming language used throughout astronomy.
Regardless of whether you’re already a scientist, studying to become one, or just interested in how modern astronomy works ‘under the bonnet’, this course will help you explore astronomy: from planets to pulsars to black holes.
Course outline:
Week 1: Thinking about data
- Principles of computational thinking
- Discovering pulsars in radio images
Week 2: Big data makes things slow
- How to work out the time complexity of algorithms
- Exploring the black holes at the centres of massive galaxies
Week 3: Querying data using SQL
- How to use databases to analyse your data
- Investigating exoplanets in other solar systems
Week 4: Managing your data
- How to set up databases to manage your data
- Exploring the lifecycle of stars in our Galaxy
Week 5: Learning from data: regression
- Using machine learning tools to investigate your data
- Calculating the redshifts of distant galaxies
Week 6: Learning from data: classification
- Using machine learning tools to classify your data
- Investigating different types of galaxies
Each week will also have an interview with a data-driven astronomy expert.
Note that some knowledge of Python is assumed, including variables, control structures, data structures, functions, and working with files.

From the lesson

Learning from data: regression

This module introduces the idea of machine learning. We look at standard methodology for running machine learning experiments, and then apply this to calculating redshifts of distant galaxies using decision trees for regression.
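
To make the workflow described above concrete, here is a minimal sketch of a machine learning experiment that uses a decision tree for regression. It is an illustration under stated assumptions, not the course's own material. The galaxy data is synthetic, the two "colour" features and the formula linking them to redshift are invented for the example, and the helper names (make_synthetic_galaxies, build_tree, predict) are hypothetical. The tree is written by hand with the standard library only so that the steps are visible; in practice a library implementation such as scikit-learn's DecisionTreeRegressor would normally be used instead.

```python
import random
import statistics


def make_synthetic_galaxies(n, seed=0):
    """Return n rows of (features, target). The features stand in for
    colour indices and the target for redshift; the relation between
    them is made up purely for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        colours = [rng.uniform(0.0, 2.0) for _ in range(2)]
        redshift = 0.5 * colours[0] + 0.2 * colours[1] ** 2 + rng.gauss(0.0, 0.02)
        rows.append((colours, redshift))
    return rows


def sse(values):
    """Sum of squared differences from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, max_depth, min_samples=5):
    """Greedily grow a regression tree. Each split is chosen to minimise
    the summed squared error of the two resulting groups."""
    targets = [t for _, t in rows]
    leaf = {"value": sum(targets) / len(targets)}
    if max_depth == 0 or len(rows) < 2 * min_samples:
        return leaf
    best = None
    for f in range(len(rows[0][0])):
        ordered = sorted(rows, key=lambda r: r[0][f])
        for i in range(min_samples, len(ordered) - min_samples + 1):
            lo, hi = ordered[i - 1][0][f], ordered[i][0][f]
            if lo == hi:
                continue  # cannot split between identical feature values
            left, right = ordered[:i], ordered[i:]
            cost = sse([t for _, t in left]) + sse([t for _, t in right])
            if best is None or cost < best[0]:
                best = (cost, f, (lo + hi) / 2, left, right)
    if best is None:
        return leaf
    _, f, threshold, left, right = best
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree(left, max_depth - 1, min_samples),
        "right": build_tree(right, max_depth - 1, min_samples),
    }


def predict(tree, features):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while "value" not in tree:
        side = "left" if features[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]


# Experiment: train on one subset, evaluate on data the tree has not seen.
rows = make_synthetic_galaxies(300)
train, test = rows[:200], rows[200:]
tree = build_tree(train, max_depth=5)

errors = [abs(predict(tree, x) - z) for x, z in test]
median_error = statistics.median(errors)

# Baseline for comparison: always predict the mean training redshift.
train_mean = sum(z for _, z in train) / len(train)
baseline_error = statistics.median(abs(train_mean - z) for _, z in test)

print(f"tree median error:     {median_error:.3f}")
print(f"baseline median error: {baseline_error:.3f}")
```

The important design choice is the split into training and held-out data: the error is measured only on rows the tree never saw, which is what makes it an honest estimate of how well the model generalises. Comparing against a trivial baseline shows whether the tree has learned anything at all.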