Wikidata:SPARQL query service/WDQS backend update
One of the Wikimedia Foundation Search team's priorities for 2022 is scaling the Wikidata Query Service (WDQS), specifically by moving off the Blazegraph backend that WDQS currently uses and finding a suitable replacement before Blazegraph reaches its capacity limits.
As part of this process, the Search team has started developing several documents, issuing monthly updates on its work, and holding periodic meetings with the involved communities. The aim is to keep the Wikimedia community informed about the problems and what the team is doing to solve them, and to gather input and feedback from the community about its use cases and needs.
Blazegraph failure playbook
This document is a playbook for the event of catastrophic failure of the Wikidata Query Service (WDQS) due to the Blazegraph graph backend maxing out, which the WDQS August 2021 scaling update identified as the predominant risk.
How much time we have before catastrophic failure is difficult to predict, but the probability of it occurring within the next 5 years is very high if no action is taken. While we are working to avoid this scenario by planning a migration off Blazegraph, as well as exploring other potential solutions such as graph-splitting with federation, we feel that it is crucial to have this playbook.
The goal here is to provide transparency into the discrete steps that the Wikimedia Foundation (WMF) will take in order to maintain a minimum level of WDQS and Wikidata functionality in the case of catastrophic failure.
Current status
Where we are
As of April 12, 2022, the Search team has held several meetings and conducted surveys and detailed research to find alternative backends. The list has been narrowed down to four candidates (listed in alphabetical order):
- Apache Jena with the Fuseki SPARQL Server component;
- QLever (some aspects, such as update support, still in development);
- RDF4J V4 (still in development);
- Virtuoso Open-Source.
You can find the full evaluation process and results in our paper, “WDQS Backend Alternatives”. The paper covers the technical and user requirements for the WDQS backend, gathered over the last seven years of operation, their implications for the system architecture, the evaluation process, and the resulting detailed assessments of the possible alternatives.
WDQS scaling updates
- August 2021
- December 2021
- January 2022
- February 2022
- March 2022
- April 2022
- May 2022
- August 2022
- September 2022
- March 2023
- October 2023
- February 2024
- April 2024
- June 2024
- September 2024
Community meetings
- February 2022 community meetings (Etherpads: Meeting #1, Meeting #2)
- April 2022 community meeting (Etherpad)
- June 2022 community meeting (Etherpad)
- March 2023 community meetings (Etherpads: Meeting #1, Meeting #2)
The team
- Mike Pham, Senior Product Manager, Search and Relevancy
- Guillaume Lederrey, Operations Engineer — Search Platform
- Andrea Westerinen, Graph Consultant
- Aisha Khatun, Data Analyst
- Joseph Allemandou, Data Engineer
- David Causse, Software Engineer
- Ryan Kemper, Site Reliability Engineer
- Brian King, Site Reliability Engineer
- Luca Martinelli, Community Relations Specialist