Diffbot’s AI model doesn’t guess—it knows, thanks to a trillion-fact knowledge graph

2 weeks ago 50

January 9, 2025 7:00 AM

Credit: VentureBeat made with Midjourney

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More

Diffbot, a small Silicon Valley company best known for maintaining one of the world’s largest indexes of web knowledge, announced today the release of a new AI model that promises to address one of the biggest challenges in the field: factual accuracy.

The new model, a fine-tuned version of Meta’s LLama 3.3, is the first open-source implementation of a system known as Graph Retrieval-Augmented Generation, or GraphRAG.

Unlike conventional AI models, which rely solely on vast amounts of preloaded training data, Diffbot’s LLM draws on real-time information from the company’s Knowledge Graph, a constantly updated database containing more than a trillion interconnected facts.

“We have a thesis that eventually general purpose reasoning will get distilled down into about 1 billion parameters,” said Mike Tung, Diffbot’s founder and CEO, in an interview with VentureBeat. “You don’t actually want the knowledge in the model. You want the model to be good at just using tools so that it can query knowledge externally.”

How it works

Diffbot’s Knowledge Graph is a sprawling, automated database that has been crawling the public web since 2016. It categorizes web pages into entities such as people, companies, pro...

Read Entire Article