Author: Aindo
Apr 6, 2023

Synthetic Data Success Stories

Synthetic data is revolutionizing how organizations leverage their data assets. By preserving the insights of real data without containing sensitive information, synthetic datasets make it possible to securely and rapidly capitalize on data opportunities. Applications include extraction and visualization of business intelligence; advanced analytics; software testing; product demonstrations; and development of AI models for prediction, personalization, profiling, and more.

For these applications, synthetic data will soon overtake real data in processed volume [1]. Organizations must anticipate this change. To help them do so, we have collected some of the benefits of our Aindo Synthetic DataOps Platform (ASDOP), along with success stories from our clients.

Benefits of Synthetic Data

Synthetic data allows organizations to extract the full value of their data assets. It enables secure and free exchange and analysis of data and removes data shortcomings through augmentation. As such, its key benefits include:

  • Lead-time reduction and cost savings: real data is subject to cumbersome processing steps, standards, and protocols. These do not apply to synthetic data, which is therefore available immediately. This significantly reduces the money, time, and human effort involved in data-intensive projects.
  • Complete and fair data: synthetic data can be used when insufficient real data is available. For instance, when studying rare diseases, synthetic data can artificially increase the number of patient records, allowing for better prognostic and diagnostic AI tools. It can also be used to reduce bias by artificially increasing records of underrepresented groups.
  • Data privacy protection: by reconciling data utility with privacy protection, synthetic data makes secure and reliable AI innovation readily available.
  • Increased data mobility and availability: synthetic data can be shared freely across departments and organizations. This enables organizations, for example in healthcare, to rely on external consultants for AI development; data analysis; software testing; gathering and visualizing business intelligence; and more. It also opens up opportunities for data trade and for acquiring missing pieces of information.
  • Flexibility and data-centricity: synthetic data can be constructed with its final application in mind. This means that it is available in the right quantity and with the desired properties for any data project.

The Aindo Synthetic DataOps Platform

Aindo offers synthetic data as part of a wider platform for automating the data value chain, the Aindo Synthetic DataOps Platform (ASDOP). This platform has five modules:

  1. Acquire, for importing and integrating data. This module allows users to import data from multiple sources and create unified, integrated datasets.

  2. Structure, to convert unstructured data into structured data suitable for analysis. Our technology automatically converts formats such as free text, PDFs, and images to tables. It also automatically labels, annotates, and curates data. This makes further data processing possible without manual effort.

  3. Synthesize, to create privacy-protecting, realistic synthetic data for enhanced data mobility and completeness. Through our proprietary innovations, our synthetic data technology is applicable to even the most advanced data types and formats. Our method for generating realistic synthetic relational data dramatically outperforms the state of the art. The quality of the synthetic data is guaranteed through various methods.

  4. Explore, to assess the privacy, fidelity, and utility of synthetic and/or anonymized datasets. This module simulates attacks on datasets, which makes it possible to quantify how successful an adversary would be in trying to re-identify an individual in an anonymized dataset. This empirical approach to privacy quality assurance is highly novel and has been met with great enthusiasm.

  5. Transact, to intuitively manage and audit data flows in an organization. The module is a graphical user interface that allows data controllers to monitor who has access to which datasets and why. It allows them to effortlessly grant, restrict, and retract data access. This functionality is highly sought after: in many organizations, data is still shared over the internet or physical devices. This leads to large (often unintentional) data breaches: over

Figure: The Aindo Synthetic DataOps Platform (ASDOP) offers five modules: Acquire, Structure, Synthesize, Explore, and Transact.

Combined, these modules remove all the obstacles to AI and data projects. ASDOP can be deployed on-site, without data ever leaving its controller’s secure IT environment.
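
To illustrate the kind of attack simulation described for the Explore module, one simple empirical check is to measure how close each synthetic record lies to its nearest real record. The sketch below is not Aindo's method: the function names, the toy records, and the zero-distance threshold are all assumptions made for illustration.

```python
import math

def nearest_distance(record, candidates):
    """Euclidean distance from one record to its closest candidate."""
    return min(math.dist(record, c) for c in candidates)

def copy_rate(synthetic, real, threshold=0.0):
    """Share of synthetic records lying within `threshold` of a real record.

    A high rate suggests the generator reproduced real individuals.
    """
    hits = sum(1 for s in synthetic if nearest_distance(s, real) <= threshold)
    return hits / len(synthetic)

# Hypothetical numeric records: (age, annual mileage in thousands of km)
real = [(34, 12.0), (51, 8.5), (27, 20.0)]
synthetic = [(35, 11.5), (51, 8.5), (40, 15.0), (29, 19.0)]

rate = copy_rate(synthetic, real)
print(f"Synthetic records that copy a real record: {rate:.0%}")
```

In this toy data, one of the four synthetic records coincides with a real one. A realistic assessment would use attribute-aware distances and a held-out reference set rather than a single threshold.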

Success Story 1: Product demonstrations in Insurance

Challenge: A car insurance provider wants to use an internet-of-things (IoT) application to collect and manage customer data. The company collects data through IoT devices in its customers’ cars. It needs a platform to manage this data and turn it into business intelligence.

Four potential vendors are offering such platforms. The insurance provider wants product demonstrations from each of them to make an informed decision. Unfortunately, such a demonstration requires the insurer’s sensitive customer data.

Solution: The car insurance provider deployed Aindo’s Synthetic DataOps Platform on its own infrastructure and connected it to a relational database containing customer information. Our platform generated a database of artificial customer records with the same format and properties as the sensitive database. This synthetic data was securely generated on-site, without the insurer’s real data ever leaving its original IT environment.

The insurer provided the synthetic dataset to the four potential vendors. These vendors used it to demonstrate their products without needing access to the insurer’s confidential information.

Synthetic data was also applied to simulate special events. For example, an additional experiment was conducted in which the data was rebalanced so that long-distance commuters made up a much larger share of customers. This showed how well the software responded to changes in customer behavior.
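
The rebalancing idea can be sketched as follows. The sketch assumes a plain weighted resampling of existing records, which is a simplification and not Aindo's generative approach. The field name `commute_km`, the 50 km cut-off, and the weight of 5 are hypothetical.

```python
import random

def rebalance(records, is_target, weight, size, seed=0):
    """Resample `records` so that each record matching `is_target` is
    `weight` times as likely to be drawn as a non-matching one."""
    rng = random.Random(seed)
    weights = [weight if is_target(r) else 1 for r in records]
    return rng.choices(records, weights=weights, k=size)

def is_long_distance(record):
    return record["commute_km"] >= 50

# Hypothetical customer records: 10% are long-distance commuters
customers = [
    {"id": i, "commute_km": 80 if i % 10 == 0 else 15} for i in range(100)
]

resampled = rebalance(customers, is_long_distance, weight=5, size=1000)

share_before = sum(map(is_long_distance, customers)) / len(customers)
share_after = sum(map(is_long_distance, resampled)) / len(resampled)
print(f"Long-distance share before: {share_before:.0%}, after: {share_after:.0%}")
```

Resampling only repeats existing records, whereas a synthetic data generator produces new ones. The sketch is meant only to show how the target share of a group can be controlled.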

Figure: Car insurance synthetic data solution

Benefits: Through Aindo’s synthetic data, the insurer could make an informed decision, substantially reducing risks. The process also showcased that synthetic data can dramatically shorten software development cycles. Risks were further reduced through data augmentation for simulating special events, showcasing the robustness of each of the products.

Success Story 2: Synthetic Data Trading for Improved Telemedicine

Challenge: A telemedicine company wants to leverage AI to improve its predictive model estimating fall risks of remote elderly patients. The company wants to combine its proprietary database with external socio-demographic data sources for a more complete understanding of its patients.

Solution: Synthetic versions were created of the external datasets the telemedicine company intended to acquire. These synthetic datasets were harmonized and integrated with the company’s proprietary data. ASDOP further integrated other data sources, including automatically structured transcriptions of phone calls. All this data was combined to create a risk estimation model, which drastically outperformed the company’s earlier models.

Figure: Improved telemedicine synthetic data solution

Benefits: The project led to the development of a superior risk prediction model. It also illustrated the ability to easily acquire and integrate missing information sources. Through the use of synthetic data, new synergies were explored and data could directly and safely be monetized.

Success Story 3: Data mobility for personalized finance

Challenge: A large investment bank wants to offer personalized guidance to small and medium-sized corporate clients. The bank has a large relational database of corporate clients and their business trajectories. Through AI, the bank wants to leverage this database to predict which clients are likely to encounter financial difficulties. It will then tailor advice to these clients’ specific needs.

However, building the required AI methods calls for external consulting. The consultant needs data access, but client data is highly confidential and contains trade secrets. Sharing it would go against the bank’s commitment to discretion.

Solution: A synthetic client database was created with the same format and properties as the real database. The generation process took place on a dedicated server at the bank. Hence, the data never left its original institution and remained subject to the bank’s customary privacy protocols and standards.

The fidelity, privacy, and utility of the synthetic dataset were assessed and deemed excellent. The synthetic data was then provided to the external consulting firm, which used it to build a predictive AI model. The bank deployed the model shortly afterwards and currently uses it to better tailor advice to clients.
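
One common way to quantify fidelity is to compare the distribution of a column in the real and the synthetic data. The sketch below uses total variation distance on a categorical column. It is an illustrative stand-in, not the assessment carried out in this project, and the "sector" column and its values are invented.

```python
from collections import Counter

def total_variation_distance(real_values, synthetic_values):
    """Half the L1 distance between two empirical category distributions.

    0.0 means identical distributions; 1.0 means no overlap at all.
    """
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    categories = set(real_counts) | set(synth_counts)
    n_real, n_synth = len(real_values), len(synthetic_values)
    return 0.5 * sum(
        abs(real_counts[c] / n_real - synth_counts[c] / n_synth)
        for c in categories
    )

# Hypothetical "sector" column of a corporate client table
real_sector = ["retail"] * 50 + ["manufacturing"] * 30 + ["services"] * 20
synthetic_sector = ["retail"] * 48 + ["manufacturing"] * 33 + ["services"] * 19

tvd = total_variation_distance(real_sector, synthetic_sector)
print(f"Total variation distance: {tvd:.2f}")
```

A full fidelity assessment would also look at joint distributions and at relationships across tables; a per-column distance like this one is only a first check.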

Figure: Personalized finance synthetic data solution

Benefits: Synthetic data made external consulting available to the bank rapidly, with minimal internal data processing steps. This resulted in a new partnership between the bank and an AI service provider. The resulting AI model improved the bank’s advice to corporate clients and allowed it to detect risks early.

Success Story 4: Data mobility for patient journey optimization in oncology

Challenge: A hospital wants to optimize the oncology patient journey. They have a large database of electronic health records (EHR) from previous patients. Through consulting, they know that by leveraging this database, they could improve the patient journey by detecting pathological signs early; improving and personalizing treatments; and offering guided support.

Unfortunately, the database is subject to substantial privacy restrictions. The EHR data is also unstructured, with information collected in text form. This makes analysis challenging at scale. Granting access to external data scientists for AI development involves time-consuming, costly processing steps.

Solution: The EHR data was automatically structured through Aindo’s Structure module. This included recognizing all relevant attributes and formatting all patient data as tables. Subsequently, a synthetic database was created to mimic these tables, without containing sensitive information about real patients. This synthetic dataset could readily be transferred to an AI team.

The team used the synthetic data to build three AI tools: a diagnostic model, helping doctors identify a collection of oncological pathologies; a prognostic model, able to predict the risk of patients developing oncological pathologies based on attributes in their EHRs; and a model that helped optimally administer treatments to oncology patients.

Figure: Oncology patient journey synthetic data solution

Benefits: Thanks to the synthetic data’s rapid availability, the project took only two months. That is 4.5 times faster than the typical nine-month duration of AI projects in healthcare, a time saving of roughly 78%. This pace was possible because ASDOP removed the need for cumbersome manual data preparation and anonymization processes and protocols. Similarly, the budget was significantly lower than for previous projects of similar scope. The resulting models significantly outperformed the hospital’s earlier diagnostic, prognostic, and treatment models.
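
The duration comparison can be checked with simple arithmetic. The snippet below uses only the two figures given in the text (two months against a typical nine); the variable names are illustrative.

```python
typical_months = 9  # typical duration of healthcare AI projects, per the text
project_months = 2  # duration of this project, per the text

# Fraction of the typical duration that was saved
time_saved = (typical_months - project_months) / typical_months

# How many times faster the project was completed
speed_up = typical_months / project_months

print(f"Time saved: {time_saved:.0%}")  # Time saved: 78%
print(f"Speed-up: {speed_up:.1f}x")     # Speed-up: 4.5x
```

Expressed as a ratio, nine months against two is a factor of 4.5, which corresponds to a time saving of about 78%.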

Footnotes

  1. Judah, S., White, A., Sicular, S., Jones, C.J., De Simoni, G., Friedman, T., Beyer, M., Heizenberg, J. and Parker, S. (2020). “Gartner Predicts 2021: Data and Analytics Strategies to Govern, Scale and Transform Digital Business.”