January 10, 2025 10:35 AM
As enterprises around the world double down on their AI projects, the availability of high-quality training data has become a major bottleneck. With the public web largely exhausted as a data source, major players like OpenAI and Google are securing exclusive partnerships to expand their proprietary datasets, further limiting access for everyone else.
To address this growing concern, Salesforce has made a significant move in visual training data. The company has just introduced ProVision, a novel framework that programmatically generates visual instruction data. These datasets are systematically synthesized to train high-performance multimodal language models (MLMs) that can answer questions about images.
The company has already released the ProVision-10M dataset with this approach and is employing it to boost the performance and accuracy of various multimodal AI models.
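To make "programmatically generated" concrete, here is a minimal Python sketch of the general idea: turning a structured description of an image into question-answer pairs with small generator functions. It is an illustration only, not ProVision's actual code or API. The annotation format, the `generate_pairs` function and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def generate_pairs(annotation: Dict) -> List[Dict[str, str]]:
    """Build question-answer pairs from one image annotation.

    The annotation layout (an "image" name plus a list of "objects",
    each with a "label" and an optional "color") is an assumption made
    for this sketch, not ProVision's real input schema.
    """
    objects = annotation["objects"]
    label_counts = Counter(obj["label"] for obj in objects)
    pairs = []

    # Counting questions: one per distinct label, in alphabetical order.
    for label in sorted(label_counts):
        pairs.append({
            "image": annotation["image"],
            "question": f"How many objects labeled '{label}' are in the image?",
            "answer": str(label_counts[label]),
        })

    # Attribute questions: only for objects whose label is unique in the
    # image, so that "the ball" refers to exactly one object.
    for obj in objects:
        color = obj.get("color")
        if color is not None and label_counts[obj["label"]] == 1:
            pairs.append({
                "image": annotation["image"],
                "question": f"What color is the {obj['label']}?",
                "answer": color,
            })

    return pairs


# Hypothetical sample annotation, invented for this example.
sample = {
    "image": "example.jpg",
    "objects": [
        {"label": "dog", "color": "brown"},
        {"label": "dog"},
        {"label": "ball", "color": "red"},
    ],
}

pairs = generate_pairs(sample)
for pair in pairs:
    print(pair["question"], "->", pair["answer"])
```

Because every answer is computed from the annotation rather than written by a person or sampled from a model, the same generators can be run over any number of annotated images and produce consistent labels each time.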
For data professionals, this framework represents a significant advancement. By programmatically generating high-quality visual instruction data, ProVision reduces dependence on limited or inconsistently labeled datasets, a common challenge in training multimodal systems.
Moreover, the ability to systematically synthesize datasets ensures better control, scalability and consistency, enabling faster iteration cycles and reducing the cost of acquiring domain-specifi...