
Wikimedia Deutschland has launched a new database designed to make Wikipedia’s vast knowledge base more accessible to artificial intelligence models.

The initiative, called the Wikidata Embedding Project, introduces a vector-based semantic search system that spans more than 120 million articles across Wikipedia and its sister sites. The tool also supports the Model Context Protocol (MCP), a new standard that lets AI systems query external data sources directly.
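MCP exchanges are JSON-RPC 2.0 messages, so a client asking a server to run a tool is essentially sending a small structured request. The sketch below shows only that message shape; the tool name "semantic_search" and its arguments are hypothetical placeholders, not the project's published interface.

```python
import json

# Illustrative only: an MCP "tools/call" request is a JSON-RPC 2.0 message.
# The tool name and arguments here are hypothetical, not the Wikidata
# Embedding Project's actual MCP interface.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "semantic_search",                     # hypothetical tool name
        "arguments": {"query": "scientist", "limit": 5},
    },
}

print(json.dumps(request, indent=2))
```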

Developed in collaboration with neural search firm Jina and IBM-owned data provider DataStax, the project aims to give developers structured access to verified knowledge for retrieval-augmented generation (RAG) systems. Until now, Wikidata searches were limited to keywords or the specialised query language SPARQL.
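For a sense of what SPARQL access involves, here is a minimal sketch that queries Wikidata's public endpoint for a few people whose occupation is "scientist" (item Q901, via the occupation property P106); the endpoint and identifiers are the standard public ones, while the surrounding Python is just illustrative scaffolding.

```python
import requests

# A minimal SPARQL query against Wikidata's public query service:
# people whose occupation (P106) is "scientist" (Q901), with English labels.
query = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P106 wd:Q901 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "example-script/0.1"},
    timeout=30,
)
for row in resp.json()["results"]["bindings"]:
    print(row["personLabel"]["value"])
```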

“Powerful AI can be open and collaborative, rather than monopolised by large corporations,” said Philippe Saadé, project manager for Wikidata AI.

The new system organises information semantically. For example, a search for “scientist” will return nuclear physicists, Bell Labs alumni, multilingual translations, images, and related concepts such as “researcher” and “scholar.”
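The project's own API is not shown here, but the general technique behind such results is straightforward: embed the query and candidate terms as vectors, then rank candidates by similarity. The sketch below assumes the off-the-shelf sentence-transformers library and a generic model; it is not the Wikidata Embedding Project's implementation.

```python
from sentence_transformers import SentenceTransformer, util

# Rough illustration of semantic ranking: embed a query and candidate terms,
# then order the candidates by cosine similarity to the query.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "scientist"
candidates = ["researcher", "scholar", "nuclear physicist", "banana", "bridge"]

query_vec = model.encode(query, convert_to_tensor=True)
cand_vecs = model.encode(candidates, convert_to_tensor=True)

scores = util.cos_sim(query_vec, cand_vecs)[0].tolist()
for term, score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{term}: {score:.3f}")
```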

The database is available on Toolforge, and Wikimedia will hold a developer webinar on October 9. The launch comes amid rising demand for reliable training data in AI, as companies face legal and financial pressure over the use of copyrighted material. In September, Anthropic agreed to a $1.5 billion settlement with authors whose works had been used in its datasets.