Project Name: wikiwho (to replace the current https://wikiwho.wmflabs.org which is part of the commtech project)
Wikitech Usernames of requestors: MusikAnimal, Dmaza, HMonroy, Samwilson, Daimona Eaytoy
Purpose: To bring the WikiWho article attribution service in-house. It is used by XTools, Who-Wrote-That, and the Education-Program-Dashboard, among other applications and researchers.
Brief description: This is the first step of T288840: Migrate WikiWho service to VPS. Longer-term, we might try to get WikiWho on production, but for now we'd like to mimic the current external setup so we can keep WikiWho running without interruption. T288840 goes into more detail about the stack, but in short, we need something very big!
The WikiWho team has said their current setup involves a single server with 24 CPU cores and 122 GB RAM. Additionally, there are three mounted disks, which I assume can be Cinder volumes:
- Database: 4 TB Postgres, partitioned
  - The actual space currently used is only 3.2 TB. I put 4 TB to give it room to grow.
  - This stores editor persistence information, which WikiWho maintainers said could be omitted if we don't care about it. This is not to my knowledge needed by any WMF or WikiEdu product, but other consumers of WikiWho may be relying on it.
  - I believe this db also stores credentials for API access, which we will need, but that will only require a very small amount of space.
- Python Pickle disk: 5 TB
  - This stores one pickle file for each article. English Wikipedia by itself consumes about 2.5 TB, but the other four languages currently supported aren't nearly as big: 541 GB (German), 397 GB (Spanish), 66 GB (Turkish), 25 GB (Basque).
  - If necessary we likely can shave this down to 4 TB, but 5 would give us room to add more languages and allow the existing ones to grow.
- Revision dumps: 6 TB
  - I assume this is only needed temporarily when first importing a new wiki, since the attribution data all lives in the pickle files. After the initial import, the system reads EventStreams and appends to the pickle files as new revisions are created. So we probably only need a disk the size of English Wikipedia's dump, uncompressed.
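
As a rough sanity check on the disk sizes above, the arithmetic can be sketched in Python. The figures are the ones quoted in this request. Treating 1 TB as 1000 GB, along with every function and variable name here, is an assumption made only for illustration.

```python
# Per-language pickle usage in GB, as quoted above (English given as "about 2.5 TB").
# Assumption: 1 TB is treated as 1000 GB to keep the arithmetic simple.
PICKLE_USAGE_GB = {
    "English": 2500,
    "German": 541,
    "Spanish": 397,
    "Turkish": 66,
    "Basque": 25,
}

# Requested volume sizes in TB, as listed above.
REQUESTED_TB = {"database": 4, "pickles": 5, "revision_dumps": 6}

# Database space currently in use, in TB, as quoted above.
DATABASE_USED_TB = 3.2


def total_tb(sizes_gb):
    """Sum a mapping of sizes in GB and return the total in TB."""
    return sum(sizes_gb.values()) / 1000


def headroom_tb(requested_tb, used_tb):
    """Return the free space left on a volume, in TB."""
    return requested_tb - used_tb


pickle_used_tb = total_tb(PICKLE_USAGE_GB)
pickle_headroom_5tb = headroom_tb(REQUESTED_TB["pickles"], pickle_used_tb)
pickle_headroom_4tb = headroom_tb(4, pickle_used_tb)
database_headroom = headroom_tb(REQUESTED_TB["database"], DATABASE_USED_TB)
total_requested_tb = sum(REQUESTED_TB.values())

print(f"Pickle usage today:       {pickle_used_tb:.3f} TB")
print(f"Pickle headroom at 5 TB:  {pickle_headroom_5tb:.3f} TB")
print(f"Pickle headroom at 4 TB:  {pickle_headroom_4tb:.3f} TB")
print(f"Database headroom at 4 TB: {database_headroom:.1f} TB")
print(f"Total storage requested:  {total_requested_tb} TB")
```

Under these assumptions the five wikis use roughly 3.5 TB of pickle storage, so a 4 TB disk would leave under 0.5 TB free, while 5 TB leaves about 1.5 TB for growth and new languages. The three volumes together come to 15 TB.
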
How soon you are hoping this can be fulfilled: Sometime within a month or two (October-November 2021), ideally, to give us enough breathing room before the WikiWho service (probably) shuts down in early 2022.
We realize we're requesting an exceptionally large amount of quota. It may be that we don't even have the hardware to accommodate this right now, or that VPS isn't the best home for this service, even in the short term. So I guess this task is more about getting the conversation started. In the meantime we're going to try to get a rough headcount of all the stakeholders of WikiWho, as well as talk with WMF management, after which we'll have a better idea of whether this amount of storage is really justified. For now we'd like to hear what Cloud Services can do for us, if anything. Thank you for your time!