-->

The multi-million dollar potential of synthetic data


This article is part of a special issue of VB. Read the full series here: How data privacy is transforming marketing.

Synthetic data will be a big industry in five to 10 years. For example, Gartner predicts that by 2024, 60% of data for AI applications will be synthetic. This type of data and the tools used to create it have significant untapped investment potential. Here is why.

Synthetic data can feed data-hungry AI/ML

We are on the cusp of a revolution in how machine learning (ML) and artificial intelligence (AI) can grow and find even more applications across sectors and industries.

We live in an era of dizzying demand for machine learning algorithms in all aspects of our lives, from fun face-masking apps such as Instagram or Snapchat filters to deeply useful apps designed to enhance our work and life experiences, such as those that help diagnose diseases or recommend treatment. Key opportunities include emotion recognition and engagement, better homeland security features, and better anomaly detection in industrial contexts.

At the same time, while people and businesses are hungry for ML/AI-based products, algorithms are hungry for data to train on. All of that means we will see increasingly varied data needs, and fully synthetic data is the key to meeting them.


From Grand Theft Auto to Google

Have you heard that self-driving cars have learned traffic rules by playing games like Grand Theft Auto V to study virtual traffic? That was an early version of ML via synthetic data. Similarly, many in tech may have come across synthetic “scanned documents,” which have been used to train text recognition and data mining models.

Banking and finance is a sector that already relies heavily on synthetic data for certain processes, while tech giants like Google and Facebook are also using it, attracted by the extraordinary efficiency it can bring to the work of IT project managers and data scientists.

In fact, we expect to see the number of synthetic images and data points increase tenfold over the next year and hundreds of times over the next few years.

Real-world data restrictions

Those at the forefront of machine learning are increasingly turning to synthetic data to circumvent the many limitations of raw or real-world data. For example, the company Synthesis AI offers a cloud-based generation platform that delivers millions of diverse and perfectly labeled images of artificial people. Synthesis AI has been able to overcome many of the challenges that come with the messy reality of real-world data. For starters, the company makes data cheaper. It can be too costly for an organization to generate the amount and diversity of data it needs.

For example, could you get photos of someone from every conceivable angle, wearing every possible combination of clothing, in every possible lighting condition? It would be an unimaginable amount of work to do that in real life, but synthetic data can be designed to account for virtually infinite variations.

That also means much easier data labeling. Imagine trying to identify the light source, its brightness, and its distance from an object in photos to train a shadow development algorithm. It would be practically impossible. With synthetic data, you have that data by default, because it was generated with those parameters.
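The idea that labels come for free can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names and ranges below are hypothetical, and no real rendering engine or vendor API is called. A real pipeline would pass each parameter combination to a 3D renderer; here we only build the records to show that every sample carries its own labels because it was generated from them.

```python
import itertools
import random

# Hypothetical parameter ranges; a real generator would feed these to a 3D renderer.
CAMERA_ANGLES_DEG = [0, 45, 90, 135, 180]
OUTFITS = ["t-shirt", "jacket", "raincoat"]
LIGHT_BRIGHTNESS = [0.25, 0.5, 1.0]
LIGHT_DISTANCE_M = [1.0, 2.5, 5.0]


def generate_samples(seed=0):
    """Yield one record per parameter combination; the parameters double as labels."""
    rng = random.Random(seed)
    grid = itertools.product(
        CAMERA_ANGLES_DEG, OUTFITS, LIGHT_BRIGHTNESS, LIGHT_DISTANCE_M
    )
    for index, (angle, outfit, brightness, distance) in enumerate(grid):
        yield {
            "image_id": f"synthetic_{index:05d}",
            # Labels are the generation parameters themselves, so no annotation pass is needed.
            "camera_angle_deg": angle,
            "outfit": outfit,
            "light_brightness": brightness,
            "light_distance_m": distance,
            # Random jitter stands in for extra variation a renderer could add.
            "light_azimuth_deg": round(rng.uniform(0.0, 360.0), 1),
        }


samples = list(generate_samples())
print(len(samples), "labeled samples")
print(samples[0])
```

With these made-up ranges the grid yields 5 x 3 x 3 x 3 = 135 records, and adding one more value to any list multiplies the total, which is the kind of coverage that would be impractical to photograph and annotate by hand.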

In addition, companies must deal with strict restrictions on the use of real-world data. In the past, companies shared data without the layers of cybersecurity now expected. GDPR and other data regulations make it complex and challenging, and sometimes illegal, for companies to share real-world data with partners and vendors.

In other cases, it may not even be possible or safe to generate the data. Unigin, a producer of real-time 3D engines, counts Daedalean, which is working on urban air mobility, as a customer. Daedalean has started training its autonomous flying cars in Unigin virtual worlds. This makes a lot of sense: the company does not yet have a safe, real-world environment in which to thoroughly test its products and generate the deep data sets it needs. A similar case is the CarMaker software by IPG Automotive. Its version 10.0 featured enhanced 3D visualization powered by UNIGINE 2 Sim, with physics-based rendering and real-world camera parameters.

Tech giants have adopted synthetic people and synthetic objects much more recently. Amazon has used synthetic data to train Alexa, Facebook acquired the synthetic data generator AI.Reverie, and Nvidia introduced NVIDIA Omniverse Replicator, a powerful synthetic data generation engine that produces physically simulated synthetic data to train deep neural networks.

Combat data bias

The real-world data challenges don’t end there. In some fields, a huge historical bias contaminates the data sets. This is how we end up with global tech giants landing in hot water because their algorithms don’t recognize Black faces correctly. Even now, with ML practitioners well aware of the bias issue, it can be challenging to collect a completely bias-free real-world data set.

Even if a real-world dataset can account for all of the above challenges, which is actually hard to imagine, data models need to be constantly improved and adjusted to stay unbiased and avoid degradation over time. That means a constant need for new data.
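One simple way to reason about the bias problem described in this section is to count how many samples each group has in a real-world data set and how many synthetic samples would be needed to bring every group up to the size of the largest one. The following is a minimal sketch, not a complete debiasing method: the function name, group names, and counts are all hypothetical, and balancing counts alone does not guarantee an unbiased model.

```python
from collections import Counter


def synthetic_top_up(labels):
    """Return how many synthetic samples each group needs to match the largest group."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {group: target - count for group, count in counts.items()}


# Hypothetical, made-up group labels standing in for a skewed real-world data set.
real_labels = ["group_a"] * 700 + ["group_b"] * 250 + ["group_c"] * 50
print(synthetic_top_up(real_labels))
# prints {'group_a': 0, 'group_b': 450, 'group_c': 650}
```

Because models drift and new gaps appear over time, a check like this would need to be rerun whenever the data set changes, which is one reason the need for new data is constant.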

Understanding the opportunity

Synthetic data is in the relatively early stages of growth and is not a panacea for all use cases. It continues to face technical challenges and limitations, and common tools and standards have yet to be established.

Nonetheless, synthetic data is definitely an accelerator for ML/AI-based products as they continue to expand across all industries and sectors, and we will certainly see many startups and deals in the area. For anyone who wants to delve into the topic, a good starting point is the Synthetic Data Open Community, a hub for synthetic data sets, documents, code, and the people who are pioneering their use in machine learning.

Sergey Toporov is a partner at Leta Capital.


