Synthetic data is artificially generated data that mimics some or all of the statistical properties and relationships of a real-world dataset. It can supply the large volumes of data needed to train machine-learning models, and it lets teams test software applications in a controlled environment. Gartner has predicted that by 2024, 60% of the data used to develop AI and analytics projects will be synthetically generated. By making useful data abundant, synthetic data will democratise data gathering, labelling, and analysis, and in doing so challenge what we used to call “big data”.
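To make "mimics the statistical properties and relationships of a real-world dataset" concrete, here is a minimal sketch of one of the simplest possible generators. It assumes numeric columns and a multivariate normal model: it estimates the column means and covariance matrix of a real table, then samples new rows with the same means and linear correlations. The function names (`fit_gaussian`, `cholesky`, `sample_synthetic`) and the toy dataset are hypothetical, invented for illustration. This is not how any particular product generates synthetic data, and practical generators (copulas, GANs, variational autoencoders, and others) capture far richer structure.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def fit_gaussian(rows: Sequence[Row]) -> Tuple[Row, List[Row]]:
    """Estimate column means and the sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [statistics.fmean(r[j] for r in rows) for j in range(d)]
    cov = [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return means, cov


def cholesky(a: List[Row]) -> List[Row]:
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)  # raises ValueError if a is not positive definite
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_synthetic(rows: Sequence[Row], n_samples: int, seed: Optional[int] = None) -> List[Row]:
    """Draw synthetic rows from a multivariate normal fitted to `rows` (hypothetical helper)."""
    means, cov = fit_gaussian(rows)
    L = cholesky(cov)
    rng = random.Random(seed)
    d = len(means)
    synthetic = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent standard normal draws
        # x = mu + L z has mean mu and covariance L L^T, i.e. the fitted covariance
        synthetic.append([means[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return synthetic


# Toy "real" dataset invented for illustration: two strongly correlated columns.
real_rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
fake_rows = sample_synthetic(real_rows, n_samples=1000, seed=42)
```

Because only the means and covariance are retained, the synthetic rows preserve linear relationships between columns but not non-linear ones, categorical structure, or outliers. That trade-off is the reason richer generative models are used in practice.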