Breaking the data bottleneck: Salesforce’s ProVision speeds multimodal AI training with image scene graphs

January 10, 2025 10:35 AM

As enterprises around the world double down on their AI projects, the availability of high-quality training data has become a major bottleneck. The public web has largely been exhausted as a data source, and major players like OpenAI and Google are securing exclusive partnerships to expand their proprietary datasets, further limiting access for others.

To address this growing concern, Salesforce has introduced ProVision, a novel framework that programmatically generates visual instruction data. The resulting datasets are systematically synthesized to train high-performance multimodal language models (MLMs) that can answer questions about images.
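
To make the idea concrete, here is a minimal Python sketch of how question-answer pairs might be generated programmatically from an image scene graph. It is an illustration under stated assumptions, not Salesforce's implementation: the scene graph schema, the `generate_qa` function and the question templates are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical scene graph for one image. The schema (objects with
# attributes, plus subject-predicate-object relations) is an assumption
# made for illustration and is not ProVision's actual format.
scene_graph: Dict[str, Any] = {
    "objects": {
        "obj0": {"name": "dog", "attributes": ["brown"]},
        "obj1": {"name": "sofa", "attributes": ["red"]},
    },
    "relations": [("obj0", "sitting on", "obj1")],
}


def generate_qa(graph: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn a scene graph into (question, answer) pairs using fixed templates."""
    qa: List[Tuple[str, str]] = []

    # One attribute question per (object, attribute) pair.
    for obj in graph["objects"].values():
        for attr in obj["attributes"]:
            qa.append((f"What is one attribute of the {obj['name']}?", attr))

    # One relation question per (subject, predicate, object) triple.
    for subj_id, predicate, obj_id in graph["relations"]:
        subj_name = graph["objects"][subj_id]["name"]
        obj_name = graph["objects"][obj_id]["name"]
        qa.append((f"What is the {subj_name} {predicate}?", f"the {obj_name}"))

    return qa


if __name__ == "__main__":
    for question, answer in generate_qa(scene_graph):
        print(f"Q: {question}\nA: {answer}")
```

Because every answer in this sketch is read directly from the graph, the generated pairs are only as accurate as the scene graph itself, and adding a new question type means adding a new template rather than collecting new labels.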

The company has already released the ProVision-10M dataset, built with this approach, and is using it to boost the performance and accuracy of various multimodal AI models.

For data professionals, this framework represents a significant advancement. By programmatically generating high-quality visual instruction data, ProVision reduces dependence on limited or inconsistently labeled datasets, a common challenge in training multimodal systems.

Moreover, the ability to systematically synthesize datasets ensures better control, scalability and consistency, enabling faster iteration cycles and reducing the cost of acquiring domain-specifi...
