Policy decisions for new (and current) DNS domains registered to the WMF
Open, Medium, Public

Description

TL;DR: We need to place all of our existing domainnames into proper policy categories, and make some decisions about the criteria for and implementations of those categories which make sense in light of TLS.

There are basically four technical implementation categories our domains can fall under:

  1. Lame Delegations - Domains that are registered to us as an organization, and pointed at our DNS servers, but we don't serve the domain from our DNS servers at all.

These probably should not exist at all, as they result in DNS Lame Delegations. They probably only exist now because of a lack of coordination between ops and whoever registered them. We don't even have a complete list of these. I sniffed our network traffic for just a few moments to build a partial list, and it includes domains such as:

de.wiki
en.wiki
wikipedia.us
wikiperia.org
wkipedia.org

Note that my sniff test has nothing to do with realtime user typos. The typos above mean we've already gone to a registrar and registered those typos officially, but nobody told ops about it. I've done longer sniff tests before and I know the total list is more like 20+ domainnames, but I don't have a fuller list at the moment.

  2. Un-browseable - Domains that are registered to us as an organization and configured on our DNS servers, but serve no real hostnames or IP addresses, and are thus not usable at all from a web browser.

We don't have any of these right now, but it's a viable technical option. There's an outstanding commit to turn wikiartpedia.biz and related domains into this, and it would result in DNS data looking like this: https://gerrit.wikimedia.org/r/#/c/197361/2/templates/wikiartpedia.biz

  3. Dead-end Landing - Registered to us, served by our servers, offers addresses for browsers to use, but is not functionally useful.

Again, no such domains currently that I'm aware of, but it's an option. The dead-end page could contain some simple text indicating that we own the domain, with a manual link a user can click to go elsewhere, but not a redirect or a search portal or anything like that.

  4. Functional Endpoint - This is a real live domain which contains useful content when a browser tries it, including a redirect to a more-prominent domain of ours.

If it's useful/usable, people will link to it, search engines will find it, etc. There really is no distinction to be made between "major project site" and "minor redirect domain".

The Pragmatic Issues

This is becoming an issue because everything in category 4 (Functional Endpoint) needs TLS support. When we offer even minor functionality over an endpoint, such as a redirect, it should be secured with TLS. Otherwise legitimate users can be hijacked away from TLS against their will while trying to reach what was ultimately intended to be a domain protected by TLS (we have fewer of these now, but will likely have a whole lot more in future...). The bulk of our traffic in category 4 falls under our existing wildcard certificates, listed below. Keep in mind that in TLS certificate terms, a * wildcard only covers a single level of domainname, which is why there are subwildcards defined for the mobile subdomains:

*.zero.wikipedia.org
*.m.wikipedia.org
*.wikipedia.org
*.m.wikimedia.org
*.wikimedia.org
*.m.wiktionary.org
*.wiktionary.org
*.m.wikiquote.org
*.wikiquote.org
*.m.wikibooks.org
*.wikibooks.org
*.m.wikisource.org
*.wikisource.org
*.m.wikinews.org
*.wikinews.org
*.m.wikiversity.org
*.wikiversity.org
*.m.wikidata.org
*.wikidata.org
*.m.wikivoyage.org
*.wikivoyage.org
*.m.wikimediafoundation.org
*.wikimediafoundation.org
*.m.mediawiki.org
*.mediawiki.org
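
The single-level wildcard rule mentioned above can be sketched in a few lines of Python. This is only an illustration of the matching rule, not how any particular TLS library implements it; the function name `wildcard_covers` is made up for this example.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name `pattern` covers `hostname`.

    A leading "*." stands for exactly one DNS label: never zero labels
    and never more than one.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")

    # A name without a wildcard must match exactly.
    if not pattern.startswith("*."):
        return pattern == hostname

    suffix = pattern[1:]  # e.g. ".wikipedia.org"
    if not hostname.endswith(suffix):
        return False

    # Whatever is left of the suffix must be a single, non-empty label.
    left = hostname[: -len(suffix)]
    return left != "" and "." not in left
```

Under this rule `*.wikipedia.org` covers `en.wikipedia.org` but not `en.m.wikipedia.org`, which is only covered by `*.m.wikipedia.org`. None of the wildcards cover the long tail of domains such as `wikipedia.com`.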

However, we currently also have a very long tail of "minor" functional domains which do not match these TLS wildcards, which are presumably in place for typo-squatting, trademark protection, chapter requests for special sites, etc. We get new requests to register names on this list constantly. A quick audit of our DNS config (could be imperfect) yields ~140 such domains currently:

border-wikipedia.de
en-wp.com
en-wp.org
indiawikipedia.com
mediawiki.com
softwarewikipedia.com
softwarewikipedia.net
toolserver.org
vikipedi.com.tr
vikipedia.com.tr
visualwikipedia.com
visualwikipedia.net
voyagewiki.com
voyagewiki.org
w.wiki
webhostingwikipedia.com
wekipedia.com
wicipediacymraeg.org
wiikipedia.com
wiki-pedia.org
wikiartpedia.biz
wikiartpedia.co
wikiartpedia.info
wikiartpedia.me
wikiartpedia.mobi
wikiartpedia.net
wikiartpedia.org
wikibook.com
wikibooks.com
wikibooks.cz
wikibooks.pt
wikicitaty.cz
wikidata.pt
wikidisclosure.com
wikidisclosure.org
wikidruhy.cz
wikiepdia.com
wikiepdia.org
wikifamily.com
wikifamily.org
wikiipedia.org
wikijunior.com
wikijunior.net
wikijunior.org
wikiknihy.cz
wikimania.asia
wikimania.com
wikimania.org
wikimaps.com
wikimaps.net
wikimaps.org
wikimedia.biz
wikimedia.com
wikimedia.com.pt
wikimedia.community
wikimedia.ee
wikimedia.is
wikimedia.jp.net
wikimedia.lt
wikimedia.us
wikimedia.xyz
wikimediacommons.co.uk
wikimediacommons.eu
wikimediacommons.info
wikimediacommons.jp.net
wikimediacommons.mobi
wikimediacommons.net
wikimediacommons.org
wikimediafoundation.com
wikimediafoundation.info
wikimediafoundation.net
wikimediastories.com
wikimediastories.net
wikimediastories.org
wikimemory.org
wikinews.com
wikinews.de
wikinews.pt
wikipaedia.net
wikipedia.bg
wikipedia.co.il
wikipedia.co.za
wikipedia.com
wikipedia.cz
wikipedia.ee
wikipedia.id
wikipedia.in
wikipedia.info
wikipedia.is
wikipedia.lt
wikipedia.net
wikipedia.org.br
wikipedia.org.il
wikipediastories.com
wikipediastories.net
wikipediastories.org
wikipediazero.org
wikipedie.cz
wikiquote.com
wikiquote.cz
wikiquote.net
wikiquote.pt
wikislovnik.cz
wikisource.com
wikisource.cz
wikisource.pl
wikisource.pt
wikispecies.com
wikispecies.cz
wikispecies.net
wikispecies.org
wikiversity.com
wikiversity.cz
wikiversity.pt
wikiverzita.cz
wikivoyage-old.org
wikivoyage.com
wikivoyage.de
wikivoyage.eu
wikivoyage.net
wikivoyager.de
wikivoyager.org
wikizdroje.cz
wikizpravy.cz
wikpedia.org
wiktionary.com
wiktionary.cz
wiktionary.eu
wiktionary.pt
wmftest.com
wmftest.net
wmftest.org
xn--80adaxaliyuf0k.xn--p1ai
xn--80adgfman1aa4l.xn--p1ai
xn--80adhoalbi6c.xn--p1ai
xn--80adjlalc6d.xn--p1ai
xn--80adsaabkez2cb8b.xn--p1ai
xn--90abjlackez1d3b.xn--p1ai
xn--b1aajamacm1dkmb.xn--p1ai
xn--b1aarabjwib4al.xn--p1ai

The problem is that in the long run all of these ~140, as well as the category 1 lame delegations that we don't even have a complete list of yet, either need matching TLS certificates or they need to be migrated into category 2 or 3 (non-functional) where TLS certificates aren't important.

Every certificate has a real annual dollar cost in the hundreds, which adds up very quickly. Managing hundreds of certificates is difficult on a technical level as well. Beyond a certain small number (close to where we already are today with our short production TLS list), we'd have to start splitting up subsets of them onto separate sets of terminating IP addresses to limit the size and scope of unified wildcard certs for non-SNI clients as well.

Bottom line is that blindly adding certs for all of them is both expensive and technologically difficult/painful, but not completely impossible. The cost issue may go away in the long term (later this year, perhaps early next year), but that still leaves some very real technological hurdles. It would be better if we could re-categorize the bulk in ways that they simply don't need TLS support.

Questions that need Answers

  • Do we have / can we produce a list of all domains registered to us globally, with any registrar, and get them into our DNS servers so that they're not Lame Delegations anymore?
  • What are the functional categories that each of our domains belongs to? Is there a list somewhere? I'm imagining the set of categories includes "Real Site", "Trademark Placeholder", "Typo Placeholder", etc.
  • Are the Category 2/3 (unbrowseable / deadend) options acceptable for at least some of the functional categories of what are currently minor redirect domains, as an alternative to buying TLS certificates for them? Is there a legal reason we can't dead-end a domain or fail to have it accessible to browsers at all, because that causes problems with defending it as a trademarked domain, or something of that nature?

Event Timeline

BBlack raised the priority of this task from to Medium.
BBlack updated the task description. (Show Details)
BBlack added subscribers: BBlack, Dzahn, faidon, mark.

Do we have / can we produce a list of all domains registered to us globally, with any registrar, and get them into our DNS servers so that they're not Lame Delegations anymore?

I can get this list. What's the best format for you (just a big CSV)?

What are the functional categories that each of our domains belongs to? Is there a list somewhere? I'm imagining the set of categories includes "Real Site", "Trademark Placeholder", "Typo Placeholder", etc.

I don't believe we have this sort of list right now, but I can start creating it and share it with you.

Are the Category 2/3 (unbrowseable / deadend) options acceptable for at least some of the functional categories of what are currently minor redirect domains, as alternative to buying TLS certificates for them?

Do you have a list of domains that are currently functional?

I think it's possible we have some unnecessary Category 4 domains. I'd like to see traffic statistics before we make any decisions, just to avoid messing with any domains that people want to use.

Is there a legal reason we can't dead-end a domain or fail to have it accessible to browsers at all, because that causes problems with defending it as a trademarked domain, or something of that nature?

That's a very good point. We may need to keep some domains active for legal reasons, but that will likely be a small number of domains overall. I can research this further, and then identify those domains for you in a non-public list.

Do we have / can we produce a list of all domains registered to us globally, with any registrar, and get them into our DNS servers so that they're not Lame Delegations anymore?

I can get this list. What's the best format for you (just a big CSV)?

If we can start with this (really CSV or any format is fine, I can convert from there), I can go build us a table with the status of them all, including which are currently in what technical state at the DNS and apache redirect level, etc.

Great! Here's a list to start: https://docs.google.com/a/wikimedia.org/spreadsheets/d/1K1QJJMTF19E5xqOKv3G3gbPUzeZ_13MxSekKa1YKFgg/edit?usp=sharing

@BBlack looking at the google doc above.. are you gonna +1 if i add all of the missing ones as links to "parking" then? hmm

It's going to take me a couple days to process all of this, but the list is a good start. I'd like to be able to at least start binning them all up into categories first, and make sure that legal's ok with various remedies for various categories, and gather statistics on the ones that are existing redirects and how much they're really being used, etc. Also, some of those on the spreadsheet that aren't currently in DNS may not even be assigned to our nameservers; I'm going to have to run a bunch of automated queries and sort that out as well (for all I know at this point, they're on markmonitor's DNS servers for a static parking page hosted there).

It would probably be best if we hold off further changes for new domains for a few days so that the situation isn't evolving further while we're trying to sort it out and define this stuff :)

Ok, sounds good. I'll try to help with putting them into categories.

Yeah there's really a few different dimensions to categorize here. There's the "reasoning" categorization, as in "Is this held because it's a trademark issue, or because we won some lawsuit, or because we purchased it from a typosquatter, or because we pre-emptively registered it to avoid typosquatting, or because a chapter requested a ccTLD redirect, etc, etc..." That stuff I really don't know and don't plan to think about yet.

What I'm going to try to do, with a bunch of scripting, query automation, and looking at our configuration data, is sort them out in terms of:

  • Is this set to our DNS servers at the registrar? (if not - MM servers for parking/redirect? we're holding it but chapter DNS servers serve it?)
  • If set for our DNS servers, is it configured at all in our DNS servers?
  • If so, how is it configured? (empty parking domain, separate static zonefile, link to wm.o, link to wp.o, etc)
  • If it has some kind of address resolution, what are we doing in terms of HTTP(S) config? (apache rewrite, redirect, actual real separate site, etc)
  • Stats on traffic hitting it
  • etc...
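
A rough sketch of how the answers to the questions above could be mapped onto the four technical categories from the task description. The field names, the `http_behaviour` values, and the extra "external DNS" bucket are assumptions made for this illustration; they are not taken from any existing script or from the spreadsheet.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    EXTERNAL_DNS = "not delegated to our nameservers"  # assumed extra bucket
    LAME_DELEGATION = "1. Lame Delegation"
    UNBROWSEABLE = "2. Un-browseable"
    DEAD_END = "3. Dead-end Landing"
    FUNCTIONAL = "4. Functional Endpoint"


@dataclass
class DomainFacts:
    """Hypothetical per-domain audit record."""
    name: str
    delegated_to_us: bool       # registrar NS records point at our DNS servers
    in_our_zones: bool          # our DNS servers actually serve the zone
    has_address_records: bool   # the zone resolves to addresses a browser can use
    http_behaviour: str         # assumed values: "none", "static", "redirect", "site"


def classify(facts: DomainFacts) -> Category:
    """Map audit facts onto the technical categories, checked in order."""
    if not facts.delegated_to_us:
        return Category.EXTERNAL_DNS
    if not facts.in_our_zones:
        return Category.LAME_DELEGATION
    if not facts.has_address_records:
        return Category.UNBROWSEABLE
    if facts.http_behaviour in ("redirect", "site"):
        return Category.FUNCTIONAL
    return Category.DEAD_END


def needs_tls(facts: DomainFacts) -> bool:
    """Only functional endpoints (category 4) need a matching certificate."""
    return classify(facts) is Category.FUNCTIONAL
```

Traffic statistics and the "reasoning" dimension (trademark, typo, chapter request) are deliberately left out here; they would be extra columns next to the technical category rather than inputs to it.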

summary of progress today:

category: "WMF owns it but it wasn't in our DNS zones yet, added as parked domain without traffic":

wiki.voyage
wikiquotes.info
wikipedia.lol

category: "WMF owns it but it wasn't in our DNS zones yet, PLUS does not point to our nameservers yet":

wikipedia.es

category: "WMF owns it and we actually redirected it to projects but now we stopped that and parked it":

wikimedia.xyz
visualwikipedia.com
visualwikipedia.net
border-wikipedia.de
softwarewikipedia.com
softwarewikipedia.org
softwarewikipedia.net

  • Is this set to our DNS servers at the registrar? (if not - MM servers for parking/redirect? we're holding it but chapter DNS servers serve it?) ..

I "forked" the Google doc that @Slaporte started and shared it with you. I added columns for these things. (WHOIS, NS settings, is in WMF zones, if yes parking or linked, is in WMF Apache ) and started filling it out for stuff touched today and a couple examples.

What should be fixed is that the last columns should also be red/green based on values in other columns. For example "if it's parked in DNS then being in Apache is bad/remnant (red)" and similar things.
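
The red/green idea could look roughly like the sketch below. Only the first rule (parked in DNS but still in Apache) comes from the comment above; the second rule is a guess at what "similar things" might mean, and the `dns_state` values are assumed names, not the actual spreadsheet values.

```python
def apache_column_colour(dns_state: str, in_apache: bool) -> str:
    """Colour for the "is in WMF Apache" cell of one spreadsheet row.

    dns_state is assumed to be one of "missing", "parked" or "linked".
    """
    # Rule from the thread: a parked domain that still has Apache config
    # is a remnant and should be flagged.
    if dns_state == "parked" and in_apache:
        return "red"
    # Assumed extra rule: Apache config for a domain that is not in our
    # DNS zones at all is equally suspicious.
    if dns_state == "missing" and in_apache:
        return "red"
    return "green"
```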

Does it cost us money to keep domains such as wikipedia.lol and wikimedia.xyz registered? If we're spending donor money on these frivolous and unused domains, that's really unacceptable. Even if these domains are just being donated, we should politely reject them. Brion and others have specifically noted that gTLDs are a racket. Why are we participating?

Even within more traditional gTLDs, is there any valid reason for keeping most of these domains, such as visualwikipedia.net or webhostingwikipedia.com? We should just get rid of them.

@MZMcBride - I'm assuming for now that for at least some of the seemingly-superfluous ones, we have some legal reason we want to own them.

@BBlack it looks like we have 3 google docs, one each of us made. which one should we declare the "master" ?:)

I didn't create any that I'm aware of, I'd say keep working on the one you made with all the fields in it.

@Dzahn, @BBlack, this document is hereby declared master: https://docs.google.com/spreadsheets/d/1nmu60y1gkvc9NrvG0uPCS4wI9jfdCJIvwXVEv8UE00Y/edit#gid=1638386207

I think it would be helpful to have a quick call to discuss our domain policy, too.

Ok, cool. I'll keep using that and already making some updates, like for wikipedia.lol :p

P.S. I found wilkipedia.org, wikidpedia.org and wikipedial.org have to be added and the master list is not complete yet. So more than 675 :)

Is there any technical reason not to have the servers using 700 certificates, using SNI for most of them?

I propose that WMF get certificates through Let's Encrypt for all those domains. True, they are hundreds of domains, but LE is intended to automatically provide that. We will only need some glue where the existing tools aren't working for our use case.
It will go from limited beta to public beta next week.

PS: @MZMcBride, see T88861

It's not as easy as you're making it sound. We're already not sure we're willing to use LE for all of the domains in our beta cluster, which is a much shorter list and importantly not production. Basically the idea of giving these 700 production domainnames LE certs is a non-starter at this time.

If you want the quick version of where this argument is going (because frankly, I don't feel like going back and forth 10 times before we get there): The bottom line is that managing 700 anythings that aren't of significant value (and random redirects are not of significant value) is something we don't need to be wasting valuable time, money, or energy on, and never should have to begin with. We have enough complications in this area without introducing new ones.

Is there any technical reason not to have the servers using 700 certificates, using SNI for most of them?

I propose that WMF get certificates through Let's Encrypt for all those domains. True, they are hundreds of domains, but LE is intended to automatically provide that. We will only need some glue where the existing tools aren't working for our use case.
It will go from limited beta to public beta next week.

PS: @MZMcBride, see T88861

5 months later and the picture is somewhat different now. There's a new subtask blocker for doing exactly this now, although we still need to make time to do all the leg-work for it: T133548

4 years later, lots of things have changed for the better, and we're starting to get near the end of this.

We have scalable automation of LE cert issuance, and we have an automated non-canonical redirect service, and we're just about done sorting out issues with all of the non-canonical domains which exist on our DNS servers (with automated redirects to various canonical locations).

After we're done (Soon!) with those bits, we'll want to loop back to tackling the much-larger list of domains registered to our organization and/or nameservers, for which we've never served zone data before, and get them added to the non-canonical redirect service as well (this is the list of hundreds of domains we've gotten via CSV file before), and establish a process by which Legal can keep us updated on new domains as they're registered (and the removal of any we decide to drop over time).

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!