Whether you believe synthetic data to be real or fake, the overall market for it is expected to grow from $351.2 million in 2023 to $2,339.8 million by 2030 (Synthetic Data Generation Market Forecast | Fortune Business Insights).  This statistic can make the most hardened cynic’s ears perk up.  But, surely, in our world of regulated medicinal products and heightened data integrity awareness, this can’t possibly be a viable option for regulated industry, or can it?

First, let’s explain what synthetic data is. This type of data is defined as “Data that have been created artificially (e.g., through statistical modeling, computer simulation) so that new values and/or data elements are generated. Generally, synthetic data are intended to represent the structure, properties and relationships seen in actual patient data, except that they do not contain any real or specific information about individuals.” This definition comes from the recently published FDA Digital Health and Artificial Intelligence Glossary – Educational Resource. So, in layperson’s terms, synthetic data isn’t randomly “made up”; it is generated through sophisticated mathematical modeling and is often derived from actual data collected from other sources.
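To make that definition concrete, below is a minimal sketch of the simplest form of the idea: fit a statistical model to real data, keep only the model’s parameters, and sample brand-new records from it. The dataset, variables, and model choice (a multivariate normal) are our own invented toy example for illustration, not a method drawn from the FDA glossary or any cited source.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy stand-in for a "real" dataset: 500 patients with correlated
# age, systolic blood pressure, and cholesterol values (invented).
n_real = 500
age = rng.normal(55, 12, n_real)
systolic_bp = 100 + 0.6 * age + rng.normal(0, 8, n_real)
cholesterol = 150 + 0.9 * age + rng.normal(0, 15, n_real)
real = np.column_stack([age, systolic_bp, cholesterol])

# Fit a simple statistical model (multivariate normal): only the mean
# vector and covariance matrix are retained, not any individual record.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Generate brand-new synthetic "patients" by sampling from the model.
synthetic = rng.multivariate_normal(mu, cov, size=n_real)

# The synthetic set should preserve structure (means, correlations)
# while containing no row that belongs to a real individual.
print("Real correlations:\n", np.corrcoef(real, rowvar=False).round(2))
print("Synthetic correlations:\n", np.corrcoef(synthetic, rowvar=False).round(2))
```

Real generators are far more sophisticated (deep generative models, differential-privacy mechanisms, and the like), but the principle is the same: the synthetic records mirror the statistical structure of the source data without reproducing any individual’s values.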

Second, why would we use synthetic data? Given the growth of AI and its thirst for data, the search for data with sufficient volume, quality, and appropriate privacy controls can feel like a unicorn chase. Additionally, if the ultimate goal of AI is speed in decision-making, waiting for large datasets to accumulate can hinder crucial, time-sensitive decisions. Below are some advantages of synthetic data:

    1. Privacy: Since the data does not describe “real” people, it is fundamentally anonymized by design.
    2. Data Collection Cycle Time: The time from data collection to analysis is likely to be significantly shorter (e.g., in instances where data would otherwise take years to collect).
    3. Mitigate Bias: Synthetic data may help mitigate data bias, as in the sketch following this list.
      (Synthetic Data | European Data Protection Supervisor)
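As a hypothetical illustration of the bias-mitigation point, one common pattern is to synthesize additional records for an underrepresented subgroup so that a downstream model trains on a balanced set. The group sizes, measurement, and per-group model below are invented for the sketch and do not come from the cited sources.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical imbalanced dataset: 900 records from group A,
# only 100 from group B (e.g., an underrepresented demographic).
group_a = rng.normal(120, 10, size=(900, 1))  # a lab measurement
group_b = rng.normal(135, 12, size=(100, 1))

# Fit a per-group model (here, just mean and standard deviation) and
# synthesize extra group-B records until the groups are balanced.
mu_b, sd_b = group_b.mean(), group_b.std(ddof=1)
n_needed = len(group_a) - len(group_b)
synthetic_b = rng.normal(mu_b, sd_b, size=(n_needed, 1))

balanced_b = np.vstack([group_b, synthetic_b])
print(len(group_a), len(balanced_b))  # 900 900: minority group rebalanced
```

Whether such rebalancing truly removes bias, or merely amplifies the patterns in the small sample, is exactly the kind of question a risk assessment would need to answer.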

This may all seem intriguing; however, are we ready to use it in real-life cases?

Several cases and pilots have been discussed publicly in recent months. One notable example is the artificial data pilot performed by NHS England; additional interesting examples can be found in Synthetic data in health care: A narrative review | PubMed Central (PMC). With 86% of the companies sampled in a recent survey planning to implement AI tools within the next two years or less (The Convergence of Life Sciences and Artificial Intelligence: Seizing Opportunities While Managing Risk | Arnold & Porter), the demand for (useful) data will only increase. As promising as these cases are, great skepticism persists in risk-averse industries such as life sciences. How can this be addressed in an open and transparent environment with so much at stake?

One potential solution, still in its formative phase, comes from the OECD (Organisation for Economic Co-operation and Development), which has initiated an AI Incident Monitor (AIM) designed to provide worldwide visibility into the risks and harms of AI systems across industries, including healthcare, drugs, and biotechnology. This is a promising format for sharing AI technology risks, though it will need time to mature through practical use.

In a mature application of Quality Risk Management, companies are well advised to use considerations like these to proactively assess their own QMS and data integrity (DI) readiness for the inevitable use of AI in GMP applications. Unfortunately, reacting later can turn a promising technology into a remediation project. So, now is the time to start assessing AI readiness in your QMS. The technology and its legal aspects may not be in your control, but understanding QMS and Data Governance readiness is within any company’s circle of control. As we noted at the beginning of this blog, by 2030 your competitors may have a competitive advantage, so contact Lachman Consultants for an analysis of AI in your QMS. Reach out to us at LCS@LachmanConsultants.com for a consultation.