To optimize data curation for AI, Lightly turns to self-supervised learning


All machine learning models are bound by a critical factor: the quality of the data on which they are trained.

The challenge of curating data to improve the quality of machine learning and AI models is well understood. A 2021 MIT research study found systemic issues in how training data was labeled, leading to inaccurate outcomes in AI systems. A study in the journal Quantitative Science Studies that analyzed 141 prior investigations into data labeling found that 41% of models were using datasets that had been labeled by humans.

Among the vendors trying to tackle the challenge of optimizing data curation for AI is a Swiss startup, Lightly. Founded in 2019, the company announced this week that it has raised $3 million in a seed round of funding. Lightly isn’t looking to be a data-labeling vendor, however. Instead, the company wants to help curate data using a self-supervised machine learning model that could one day reduce the need for data labeling operations altogether.

“I continue to be surprised at how much of the work in machine learning is manual, very tedious and not automated at all,” Matthias Heller, cofounder of Lightly, told VentureBeat. “People always believe that with machine learning everything is so advanced, but machine learning and deep learning, in particular, is such a young technology and a lot of the tooling and infrastructure is just now being made available.”

A growing market for data curation and data labeling

There’s no shortage of money or vendors in the market to help optimize data for machine learning, be it data curation or data labeling.

For example, Defined.ai, which was known as DefinedCrowd before rebranding in 2021, has raised $78 million to date to help advance its data curation vision.

And Grand View Research has forecasted that the data labeling market will reach $8...
