TL;DR: We need to place all of our existing domain names into proper policy categories, and make some decisions about the criteria for, and implementation of, those categories that make sense in light of TLS.
There are basically four technical implementation categories our domains can fall under, referred to below by number (1 through 4) in the order listed:
- Lame Delegations - Domains that are registered to us as an organization and delegated to our DNS servers, but which our DNS servers don't actually serve at all.
These probably shouldn't exist at all: resolvers that follow the delegation get no authoritative answer from our servers, so the domains simply fail to resolve. They most likely exist only because of a lack of coordination between ops and whoever registered them. We don't even have a complete list of these. I sniffed our network traffic for just a few moments to build a partial list, and it includes domains such as:
de.wiki en.wiki wikipedia.us wikiperia.org wkipedia.org
Note that these sniffed queries are not just users mistyping names in real time: queries for these names only reach our nameservers because the registry delegates the domains to us. That means we've already gone to a registrar and officially registered those typos, but nobody told ops about it. I've done longer sniff tests before, and I know the total list is more like 20+ domain names, but I don't have a fuller list at the moment.
- Un-browseable - Domains that are registered to us as an organization and configured on our DNS servers, but whose zones publish no address records (A or AAAA) for any hostname, and which are thus not usable at all from a web browser.
We don't have any of these right now, but it's a viable technical option. There's an outstanding commit to turn wikiartpedia.biz and related domains into this category, which would result in DNS data looking like this: https://gerrit.wikimedia.org/r/#/c/197361/2/templates/wikiartpedia.biz
- Dead-end Landing - Domains that are registered to us, served by our DNS servers, and publish addresses for browsers to use, but lead to nothing functionally useful.
Again, there are no such domains currently that I'm aware of, but it's an option. The dead-end page could contain some simple text stating that we own the domain, plus a manual link a user can click to go elsewhere, but not a redirect, a search portal, or anything like that.
- Functional Endpoint - A real, live domain that returns useful content when a browser tries it, where "useful content" includes even a plain redirect to one of our more prominent domains.
If it's useful or usable, people will link to it, search engines will find it, and so on. From a TLS standpoint, there really is no distinction to be made between a "major project site" and a "minor redirect domain".
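The four categories above reduce to a short chain of yes/no checks. The following Python sketch is purely illustrative: the `Category` enum, the `classify()` and `needs_tls()` helpers, and their boolean inputs are hypothetical names rather than existing tooling, and how each input would actually be gathered (registrar data, zone config, HTTP probes) is left open.

```python
from enum import IntEnum


class Category(IntEnum):
    """Technical categories, numbered in the order listed above (hypothetical names)."""
    LAME_DELEGATION = 1
    UNBROWSEABLE = 2
    DEAD_END_LANDING = 3
    FUNCTIONAL_ENDPOINT = 4


def classify(delegated_to_us: bool,
             zone_configured: bool,
             has_address_records: bool,
             serves_useful_content: bool) -> Category:
    """Map hypothetical audit facts about one domain onto a category."""
    if not delegated_to_us:
        raise ValueError("domain is not delegated to our DNS servers")
    if not zone_configured:
        # The registry points at us, but we don't serve the zone at all.
        return Category.LAME_DELEGATION
    if not has_address_records:
        # The zone exists but has no A/AAAA records, so browsers get nowhere.
        return Category.UNBROWSEABLE
    if not serves_useful_content:
        # Browsers reach only a static "we own this domain" page.
        return Category.DEAD_END_LANDING
    # Real content, or even just a redirect: treated like a major site.
    return Category.FUNCTIONAL_ENDPOINT


def needs_tls(category: Category) -> bool:
    """Only category 4 needs a certificate; 2 and 3 are non-functional.

    Category 1 serves nothing as-is, but still has to be moved into one of
    the other categories before the question is really settled.
    """
    return category == Category.FUNCTIONAL_ENDPOINT


# Example: a redirect-only typo domain counts as useful content.
print(classify(True, True, True, True).name)  # FUNCTIONAL_ENDPOINT
```

The ordering of the checks mirrors the list: each category is defined by the first property a domain lacks, which is why a redirect-only domain falls all the way through to category 4.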
The Pragmatic Issues
This is becoming an issue because everything in category 4 (Functional Endpoint) needs TLS support. When we offer even minor functionality over an endpoint, such as a redirect, it should be secured with TLS. Otherwise, anyone able to tamper with the unencrypted request can hijack legitimate users away from TLS against their will, even though the redirect was ultimately meant to take them to a domain protected by TLS (we have fewer of these now, but will likely have a whole lot more in future). The bulk of our category 4 traffic falls under our existing wildcard certificates, which are:
(keeping in mind that, in TLS certificate terms, a * wildcard covers only a single level of domain name, so *.wikipedia.org matches en.wikipedia.org but not en.m.wikipedia.org; this is why there are sub-wildcards defined for the mobile and zero subdomains):
*.zero.wikipedia.org *.m.wikipedia.org *.wikipedia.org *.m.wikimedia.org *.wikimedia.org *.m.wiktionary.org *.wiktionary.org *.m.wikiquote.org *.wikiquote.org *.m.wikibooks.org *.wikibooks.org *.m.wikisource.org *.wikisource.org *.m.wikinews.org *.wikinews.org *.m.wikiversity.org *.wikiversity.org *.m.wikidata.org *.wikidata.org *.m.wikivoyage.org *.wikivoyage.org *.m.wikimediafoundation.org *.wikimediafoundation.org *.m.mediawiki.org *.mediawiki.org
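To make the single-level rule concrete, here is a minimal Python sketch of wildcard matching, loosely following RFC 6125 (a `*` allowed only as the entire left-most label, matching exactly one non-empty label). The function names are made up for illustration, and real TLS libraries handle more edge cases than this.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if the certificate name `pattern` covers `hostname`.

    Simplified matching: '*' only as the whole left-most label, matching
    exactly one non-empty label. Partial-label wildcards are not handled.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        # A wildcard never spans more (or fewer) than one label.
        return False
    first_pattern, *rest_pattern = pattern_labels
    first_host, *rest_host = host_labels
    if rest_pattern != rest_host:
        return False
    if first_pattern == "*":
        return first_host != ""
    return first_pattern == first_host


def covered(hostname: str, cert_names: list[str]) -> bool:
    """True if any name on the certificate covers `hostname`."""
    return any(wildcard_matches(name, hostname) for name in cert_names)


CERT_NAMES = ["*.wikipedia.org", "*.m.wikipedia.org", "*.zero.wikipedia.org"]

print(covered("en.wikipedia.org", CERT_NAMES))              # True
print(covered("en.m.wikipedia.org", CERT_NAMES))            # True, via *.m.wikipedia.org
print(covered("en.m.wikipedia.org", ["*.wikipedia.org"]))   # False: two levels deep
print(covered("wikipedia.org", CERT_NAMES))                 # False: apex not matched
```

As the last line shows, a wildcard doesn't cover the bare domain either, so a certificate that should also serve the apex name has to list it explicitly.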
However, we currently also have a very long tail of "minor" functional domains that do not match these TLS wildcards, which are presumably registered to guard against typo-squatting, for trademark protection, in response to chapter requests for special sites, etc. We get new requests to register names on this list constantly. A quick audit of our DNS config (possibly imperfect) yields ~140 such domains currently:
border-wikipedia.de en-wp.com en-wp.org indiawikipedia.com mediawiki.com softwarewikipedia.com softwarewikipedia.net toolserver.org vikipedi.com.tr vikipedia.com.tr visualwikipedia.com visualwikipedia.net voyagewiki.com voyagewiki.org w.wiki webhostingwikipedia.com wekipedia.com wicipediacymraeg.org wiikipedia.com wiki-pedia.org wikiartpedia.biz wikiartpedia.co wikiartpedia.info wikiartpedia.me wikiartpedia.mobi wikiartpedia.net wikiartpedia.org wikibook.com wikibooks.com wikibooks.cz wikibooks.pt wikicitaty.cz wikidata.pt wikidisclosure.com wikidisclosure.org wikidruhy.cz wikiepdia.com wikiepdia.org wikifamily.com wikifamily.org wikiipedia.org wikijunior.com wikijunior.net wikijunior.org wikiknihy.cz wikimania.asia wikimania.com wikimania.org wikimaps.com wikimaps.net wikimaps.org wikimedia.biz wikimedia.com wikimedia.com.pt wikimedia.community wikimedia.ee wikimedia.is wikimedia.jp.net wikimedia.lt wikimedia.us wikimedia.xyz wikimediacommons.co.uk wikimediacommons.eu wikimediacommons.info wikimediacommons.jp.net wikimediacommons.mobi wikimediacommons.net wikimediacommons.org wikimediafoundation.com wikimediafoundation.info wikimediafoundation.net wikimediastories.com wikimediastories.net wikimediastories.org wikimemory.org wikinews.com wikinews.de wikinews.pt wikipaedia.net wikipedia.bg wikipedia.co.il wikipedia.co.za wikipedia.com wikipedia.cz wikipedia.ee wikipedia.id wikipedia.in wikipedia.info wikipedia.is wikipedia.lt wikipedia.net wikipedia.org.br wikipedia.org.il wikipediastories.com wikipediastories.net wikipediastories.org wikipediazero.org wikipedie.cz wikiquote.com wikiquote.cz wikiquote.net wikiquote.pt wikislovnik.cz wikisource.com wikisource.cz wikisource.pl wikisource.pt wikispecies.com wikispecies.cz wikispecies.net wikispecies.org wikiversity.com wikiversity.cz wikiversity.pt wikiverzita.cz wikivoyage-old.org wikivoyage.com wikivoyage.de wikivoyage.eu wikivoyage.net wikivoyager.de wikivoyager.org wikizdroje.cz wikizpravy.cz wikpedia.org 
wiktionary.com wiktionary.cz wiktionary.eu wiktionary.pt wmftest.com wmftest.net wmftest.org xn--80adaxaliyuf0k.xn--p1ai xn--80adgfman1aa4l.xn--p1ai xn--80adhoalbi6c.xn--p1ai xn--80adjlalc6d.xn--p1ai xn--80adsaabkez2cb8b.xn--p1ai xn--90abjlackez1d3b.xn--p1ai xn--b1aajamacm1dkmb.xn--p1ai xn--b1aarabjwib4al.xn--p1ai
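One possible way to reproduce this quick audit, assuming we already have the list of zone names from our DNS config: treat each wildcard `*.example.org` as covering the zone `example.org`, and flag every zone with no matching wildcard. The sketch also decodes the `xn--` (punycode) names so reviewers can read the internationalized names. The helper names are hypothetical; the decoding uses Python's built-in `idna` codec, which implements IDNA 2003 and is adequate for display only.

```python
def wildcard_bases(cert_names: list[str]) -> set[str]:
    """'*.wikipedia.org' -> 'wikipedia.org'; non-wildcard names are ignored."""
    return {name[2:].lower() for name in cert_names if name.startswith("*.")}


def uncovered_zones(zones: list[str], cert_names: list[str]) -> list[str]:
    """Zones that none of our wildcard certificate names is based on, sorted."""
    bases = wildcard_bases(cert_names)
    return sorted(zone for zone in zones if zone.lower().rstrip(".") not in bases)


def to_unicode(domain: str) -> str:
    """Decode xn-- (punycode) labels for human review.

    Uses Python's IDNA 2003 codec; if a name can't be decoded, the raw
    ASCII form is returned unchanged.
    """
    try:
        return domain.encode("ascii").decode("idna")
    except UnicodeError:
        return domain


CERT_NAMES = ["*.wikipedia.org", "*.m.wikipedia.org", "*.wikivoyage.org"]
ZONES = ["wikipedia.org", "wikipedia.com", "wikivoyage.de",
         "xn--80adaxaliyuf0k.xn--p1ai"]

for zone in uncovered_zones(ZONES, CERT_NAMES):
    print(zone, "->", to_unicode(zone))
```

Comparing against wildcard bases only, rather than enumerating hostnames, matches the coarse question being asked here: does this zone fall under any certificate we already have?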
The problem is that, in the long run, each of these ~140 domains, as well as the category 1 lame delegations that we don't even have a complete list of yet, either needs a matching TLS certificate or needs to be migrated into category 2 or 3 (non-functional), where TLS certificates aren't important.
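Every certificate has a real annual cost in the hundreds of dollars, which adds up very quickly. Managing hundreds of certificates is also difficult on a technical level. Clients without SNI support can't tell the server which hostname they want, so each terminating IP address can present only one certificate to them, and that certificate must cover every name served on that IP. Beyond a certain small number (close to where we already are today with our short production TLS list), we'd therefore have to start splitting subsets of these domains onto separate sets of terminating IP addresses, to limit the size and scope of the unified wildcard certificates served to non-SNI clients.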
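A rough planning sketch of the non-SNI splitting, under loudly hypothetical numbers: it assumes each domain needs two certificate names (the apex and one wildcard), that a certificate is capped at 100 names, and that each certificate costs $300 per year. None of these figures are real quotes, and `plan_non_sni_groups()` and `annual_cost()` are illustrative names only.

```python
# Hypothetical planning figures; none of these come from a real CA quote.
MAX_NAMES_PER_CERT = 100       # assumed cap on names per certificate
NAMES_PER_DOMAIN = 2           # apex + one wildcard, e.g. example.org, *.example.org
COST_PER_CERT_PER_YEAR = 300   # assumed USD price, "in the hundreds"


def plan_non_sni_groups(domains: list[str]) -> list[list[str]]:
    """Split domains into groups that each fit on one unified certificate.

    A non-SNI client receives whichever single certificate is bound to the
    IP address it connected to, so every group needs its own terminating IP.
    """
    per_cert = MAX_NAMES_PER_CERT // NAMES_PER_DOMAIN
    ordered = sorted(domains)
    return [ordered[i:i + per_cert] for i in range(0, len(ordered), per_cert)]


def annual_cost(cert_count: int) -> int:
    """Yearly spend if every certificate costs the assumed flat price."""
    return cert_count * COST_PER_CERT_PER_YEAR


long_tail = [f"domain{i:03d}.example" for i in range(140)]  # stand-in names
groups = plan_non_sni_groups(long_tail)
print(len(groups), [len(g) for g in groups])  # 3 [50, 50, 40]
print(annual_cost(len(long_tail)))            # 42000: one cert per domain
```

Under these assumptions the ~140 long-tail domains would need three unified certificates, and therefore three sets of terminating IPs for non-SNI clients; the second figure only illustrates how quickly separate per-domain certificates add up at the assumed price.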
The bottom line is that blindly adding certificates for all of them is both expensive and technically difficult/painful, though not completely impossible. The cost issue may go away eventually (later this year, perhaps early next year), but that still leaves some very real technical hurdles. It would be better if we could re-categorize the bulk of them so that they simply don't need TLS support.
Questions that Need Answers
- Do we have / can we produce a list of all domains registered to us globally, with any registrar, and get them into our DNS servers so that they're not Lame Delegations anymore?
- Which functional category does each of our domains belong to? Is there a list somewhere? I'm imagining the set of categories includes "Real Site", "Trademark Placeholder", "Typo Placeholder", etc.
- Are the category 2/3 (unbrowseable / dead-end) options acceptable, as an alternative to buying TLS certificates, for at least some of the functional categories that are currently minor redirect domains? Is there a legal reason we can't dead-end a domain or make it entirely inaccessible to browsers, for example because that would cause problems defending it as a trademarked domain?
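For the first question, once a registrar export and our list of configured zones exist, finding lame delegation candidates is a set difference. Both inputs are assumptions here (we don't have a complete registrar export today), and the function names are hypothetical. The result is only a candidate list: a registered domain intentionally delegated elsewhere would also show up, so each hit still needs its NS records checked before it is treated as category 1.

```python
from collections.abc import Iterable


def _normalize(names: Iterable[str]) -> set[str]:
    """Lower-case names and drop any trailing root dot so both sources compare equal."""
    return {name.strip().lower().rstrip(".") for name in names}


def lame_delegation_candidates(registered: Iterable[str],
                               configured: Iterable[str]) -> list[str]:
    """Domains registered to us for which our DNS servers serve no zone."""
    return sorted(_normalize(registered) - _normalize(configured))


# Hypothetical inputs: a registrar export and the zones in our DNS config.
registrar_export = ["wikipedia.org", "wkipedia.org", "wikiperia.org", "wikipedia.us"]
configured_zones = ["wikipedia.org."]
print(lame_delegation_candidates(registrar_export, configured_zones))
# ['wikipedia.us', 'wikiperia.org', 'wkipedia.org']
```

Normalizing case and trailing dots matters because zone files usually write fully qualified names with a final dot, while registrar exports typically don't.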