I could not find anywhere whether we have decided which sections will go on each host, so I am creating this task to get a proposal.
This is what we have in the planning:
Ideal is 8 slices * 2 types (web + analytics) * 2 instances each == 32 database instances. Deployed as multi-instance with 4 instances per physical node == 8 nodes
Each host will have 512GB RAM. Let's use 80% of it for the buffer pool, which is roughly 410GB
Usable disk space for MySQL will be 8.7TB
Host 1 (buffer pool per section, in GB):
s1 (enwiki): 150
s3: 70
s4 (commons): 140
s5: 50
Total disk space needed (with InnoDB compression enabled): 5TB
Host 2 (buffer pool per section, in GB):
s2: 60
s6: 50
s7 (some big wikis like arwiki, eswiki, metawiki or viwiki): 100
s8 (wikidatawiki): 200
Total disk space needed (with InnoDB compression enabled): 4.2TB
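As a quick sanity check of the numbers above, here is a minimal Python sketch. It assumes the per-section figures are buffer pool sizes in GB (they add up to the 410GB budget, which suggests so); the variable and function names are made up for illustration and are not from any existing tooling.

```python
# Sanity check for the sizing figures in this task.
# Assumption: per-section numbers are buffer pool sizes in GB.

HOST_RAM_GB = 512
BUFFER_POOL_FRACTION = 0.80  # 80% of RAM reserved for buffer pools

# Instance count from the planning notes:
# 8 slices * 2 types (web + analytics) * 2 instances each.
total_instances = 8 * 2 * 2
instances_per_node = 4
nodes_needed = total_instances // instances_per_node

# Proposed per-host allocation (hypothetical structure, values from the task).
allocations_gb = {
    "host1": {"s1": 150, "s3": 70, "s4": 140, "s5": 50},
    "host2": {"s2": 60, "s6": 50, "s7": 100, "s8": 200},
}


def buffer_pool_budget_gb(ram_gb=HOST_RAM_GB, fraction=BUFFER_POOL_FRACTION):
    """Return the buffer pool budget per host, rounded to the nearest GB."""
    return round(ram_gb * fraction)


def host_totals(allocs):
    """Sum the per-section buffer pool sizes for every host."""
    return {host: sum(sections.values()) for host, sections in allocs.items()}


budget = buffer_pool_budget_gb()
totals = host_totals(allocations_gb)

print(f"instances: {total_instances}, nodes: {nodes_needed}")
for host, total in totals.items():
    status = "OK" if total <= budget else "OVER BUDGET"
    print(f"{host}: {total}GB allocated of {budget}GB budget ({status})")
```

Both hosts come out at exactly the budget, so there is no headroom left in the buffer pool if a section needs to grow.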
@Bstorm it is not yet clear to me whether you want to have full redundancy between hosts (per service). Taking the Analytics service as an example, we can have 2 models within the same service:
Model a)
host1 and host3 having identical data and serving s1, s3, s4 and s5
host2 and host4 having identical data and serving s2, s6, s7 and s8
Or whether you'd like to have more computational power for reads and have for example:
Model b)
Host1 serving s1
Host 2 serving s8
Host 3 serving s4 and s7
Host 4 serving: s2, s3, s5 and s6
Model a gives us more redundancy, as we can lose up to two hosts as long as they are not both from the same pair (like a RAID 10!), but less computational power for reads, as each host serves more sections and hence each section has less buffer pool available.
Model b gives us more power for reads, as big wikis like s1 (enwiki), s4 (commons) and s8 (wikidatawiki) have dedicated resources, but if we lose a host, we lose some sections.
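To make the trade-off between the two models concrete, here is a small Python sketch that reports which sections become unavailable when a given set of hosts fails. The layouts are copied from the two models above; the `sections_lost` helper is hypothetical and only meant as an illustration.

```python
# Which sections are lost if some hosts fail, under each model?
# Layouts are taken from Model a) and Model b) in this task.

MODEL_A = {
    "host1": {"s1", "s3", "s4", "s5"},
    "host2": {"s2", "s6", "s7", "s8"},
    "host3": {"s1", "s3", "s4", "s5"},  # identical to host1
    "host4": {"s2", "s6", "s7", "s8"},  # identical to host2
}

MODEL_B = {
    "host1": {"s1"},
    "host2": {"s8"},
    "host3": {"s4", "s7"},
    "host4": {"s2", "s3", "s5", "s6"},
}


def sections_lost(layout, failed_hosts):
    """Return the sections that no surviving host serves."""
    all_sections = set().union(*layout.values())
    surviving = set()
    for host, sections in layout.items():
        if host not in failed_hosts:
            surviving |= sections
    return all_sections - surviving


# Model a survives one failure, and two if they are in different pairs.
print(sorted(sections_lost(MODEL_A, {"host1"})))           # []
print(sorted(sections_lost(MODEL_A, {"host1", "host2"})))  # []
# Losing both hosts of the same pair takes their sections down.
print(sorted(sections_lost(MODEL_A, {"host1", "host3"})))  # ['s1', 's3', 's4', 's5']
# Model b loses sections on any single host failure.
print(sorted(sections_lost(MODEL_B, {"host3"})))           # ['s4', 's7']
```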