Wikidata:SPARQL query service/WDQS backend update

One of the Wikimedia Foundation Search team's priorities for 2022 is scaling the Wikidata Query Service (WDQS), specifically by migrating away from the Blazegraph backend that WDQS currently uses and finding a suitable replacement before Blazegraph reaches its capacity limits.

As part of this process, the Search team has started developing several documents, issuing monthly updates on its work, and holding periodic meetings with the communities involved. The aim is to keep the Wikimedia community informed about the problems and what the team is currently doing to solve them, and to gather input and feedback from the community about its use cases and needs.

Blazegraph failure playbook

This document is a playbook for use in the event of a catastrophic failure of the Wikidata Query Service (WDQS) caused by the Blazegraph graph backend maxing out, which the WDQS August 2021 scaling update identified as the predominant risk.

How much time we have before a catastrophic failure is difficult to predict, but one is very likely to occur within the next five years if no action is taken. While we are working to avoid this scenario by planning a migration away from Blazegraph and by exploring other potential solutions, such as graph-splitting with federation, we feel that it is crucial to have this playbook.

The goal here is to provide transparency into the discrete steps that the Wikimedia Foundation (WMF) will take in order to maintain a minimum level of WDQS and Wikidata functionality in the case of catastrophic failure.

Current status

Where we are

WDQS Backend Alternatives, a paper published by WMF Search team that addresses technical and user requirements for the WDQS backend

As of April 12, 2022, the Search team has held several meetings and conducted surveys and detailed research to find alternative backends. The list has been narrowed down to four candidates (listed in alphabetical order):

  1. Apache Jena with the Fuseki SPARQL Server component;
  2. QLever (some aspects, such as update support, still in development);
  3. RDF4J V4 (still in development);
  4. Virtuoso Open-Source.

You can find the full evaluation process and results in our paper, “WDQS Backend Alternatives”. The paper covers the technical and user requirements for the WDQS backend, gathered over the last seven years of operation, and their implications for the system architecture. It also describes the evaluation process and gives detailed assessments of the possible alternatives.

WDQS scaling updates

Community meetings

The team

See also
