https://en.wikipedia.org/api/ 404 Not Found due to extract2.php RewriteRule
Closed, Resolved, Public

Description

There used to be a landing page at https://en.wikipedia.org/api/ with discovery to:

  • /w/api.php (MediaWiki Action API)
  • /api/rest_v1/ (legacy RESTBase).

I suspect this may've broken as part of T273179: Update the front-page of Wikimedia projects:

Change 857794 merged by jenkins-bot:

[operations/mediawiki-config@master] Get rid of extract2.php

https://gerrit.wikimedia.org/r/857794

And indeed, specifying the full URL to the internal index.html file makes the page show up:
https://en.wikipedia.org/api/index.html

Event Timeline

When using WikimediaDebug and selecting a server outside Kubernetes, such as mwdebug2002, it works fine on the canonical URL.

So... I guess this is due to MediaWiki-on-Kubernetes having diverged or missed a piece of the Apache configuration.

At first I thought it could be due to this change routing /api/ to /w/rest.php for T364400.

However, testing from inside the infrastructure to eliminate the ATS layer, we do indeed get a 404 on mw-on-k8s:

cgoubert@cumin1002:~$ curl --connect-to en.wikipedia.org:443:mw-api-ext.discovery.wmnet:4447 https://en.wikipedia.org/api/
File not found.

While the same file on mwdebug1002 is present:

cgoubert@cumin1002:~$ curl --connect-to en.wikipedia.org:443:mwdebug1002.eqiad.wmnet https://en.wikipedia.org/api/
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
  <meta charset="utf-8">
  <title>APIs</title>
  <meta name=viewport content="width=device-width, initial-scale=1">
  <meta name="robots" content="index, follow">
  <style>
body { background: #fff; margin: 7% auto 0; padding: 2em 1em 1em; font: 15px/1.6 sans-serif; color: #333; max-width: 640px; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645AD; text-decoration: underline; }
</style>
</head>
<body>
        <h2>APIs</h2>
        <ul>
            <li><a href="/w/api.php">Action API</a>, providing rich queries, editing and content access.</li>
            <li><a href="/api/rest_v1/?doc">REST API v1</a>, mainly focused on high-volume content access.</li>
        </ul>
    <h2>Legal</h2>
    <ul>
        <li><a href="https://foundation.wikimedia.org/wiki/Developer_app_guidelines">App Guidelines</a>, for developers on how to properly reuse Wikimedia data, API, trademarks, and other content.</li>
    </ul>
</body>
</html>
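
The two curl calls above differ only in the backend that `--connect-to` pins the connection to, while the URL (and therefore the Host header and TLS name) stays en.wikipedia.org. A minimal sketch of that pattern follows; the helper name `connect_to_command` is hypothetical and is not part of any Wikimedia tooling.

```python
def connect_to_command(public_host, backend_host, backend_port=None, path="/api/"):
    """Build a curl argument list that requests public_host from a pinned backend."""
    # curl's --connect-to takes HOST1:PORT1:HOST2:PORT2. The mwdebug1002 call in
    # the thread leaves PORT2 off, so it is optional here as well.
    target = f"{public_host}:443:{backend_host}"
    if backend_port is not None:
        target += f":{backend_port}"
    return ["curl", "--connect-to", target, f"https://{public_host}{path}"]


# The two backends compared in the thread.
k8s = connect_to_command("en.wikipedia.org", "mw-api-ext.discovery.wmnet", 4447)
debug = connect_to_command("en.wikipedia.org", "mwdebug1002.eqiad.wmnet")

print(" ".join(k8s))
print(" ".join(debug))
```

The resulting lists can be passed to `subprocess.run` on a host that has access to the internal network.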

Maybe it's an urban myth, but I always thought that if index.php/index.html exists in a directory, Apache automatically serves requests for the directory from there (maybe it's lighttpd only, but I have seen this work in many places).

Change #1064723 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mediawiki: Get rid of obsolete extract2.php redirect

https://gerrit.wikimedia.org/r/1064723

Maybe it's an urban myth, but I always thought that if index.php/index.html exists in a directory, Apache automatically serves requests for the directory from there (maybe it's lighttpd only, but I have seen this work in many places).

It does when there isn't an obsolete RewriteRule to extract2.php :D
I'm pushing a couple of patches that should fix this.
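
Why a leftover rewrite defeats the directory index can be illustrated with a toy model. This is a sketch of the ordering only, not Apache's actual request processing; the rule pattern `/api/` and the function `resolve` are assumptions, since the thread does not show the real RewriteRule.

```python
import re


def resolve(path, rewrite_rules, files):
    """Toy request resolution: rewrites run first, then the directory index lookup."""
    for pattern, target in rewrite_rules:
        if re.fullmatch(pattern, path):
            path = target  # the rewrite wins before any index file is considered
            break
    if path.endswith("/"):
        path += "index.html"  # directory index fallback
    return (200, path) if path in files else (404, path)


files = {"/api/index.html"}  # extract2.php itself was removed earlier
stale_rules = [(r"/api/", "/extract2.php")]  # hypothetical obsolete rule

print(resolve("/api/", stale_rules, files))            # stale rule present
print(resolve("/api/", [], files))                     # rule removed
print(resolve("/api/index.html", stale_rules, files))  # full file path
```

In this model the directory request fails only while the stale rule is present, and the explicit /api/index.html path is unaffected, which matches the behaviour reported in the task description.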

Change #1064724 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] httpbb: Add /api/ to appservers tests

https://gerrit.wikimedia.org/r/1064724

Change #1064724 merged by Clément Goubert:

[operations/puppet@production] httpbb: Add /api/ to appservers tests

https://gerrit.wikimedia.org/r/1064724

Clement_Goubert triaged this task as Medium priority.

As expected:

cgoubert@cumin1002:~$ httpbb /srv/deployment/httpbb-tests/appserver/test_main.yaml --host mwdebug1002.eqiad.wmnet
Sending to mwdebug1002.eqiad.wmnet...
PASS: 54 requests sent to mwdebug1002.eqiad.wmnet. All assertions passed.
cgoubert@cumin1002:~$ httpbb /srv/deployment/httpbb-tests/appserver/test_main.yaml --host mw-api-ext.discovery.wmnet --https_port 4447
Sending to mw-api-ext.discovery.wmnet...
https://en.wikipedia.org/api/ (/srv/deployment/httpbb-tests/appserver/test_main.yaml:48)
    Status code: expected 200, got 404.
    Body: expected to contain 'providing rich queries, editing and content access', got 'File not found.\n'.
===
FAIL: 54 requests sent to mw-api-ext.discovery.wmnet. 1 request with failed assertions.
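
The failing assertion above combines a status check with a body-substring check. A minimal sketch of that kind of check, written from the output shown here and not taken from httpbb's source (the function name `check_response` is hypothetical):

```python
def check_response(status, body, expected_status, expected_substring):
    """Return a list of human-readable assertion failures (empty if all pass)."""
    failures = []
    if status != expected_status:
        failures.append(f"Status code: expected {expected_status}, got {status}.")
    if expected_substring not in body:
        failures.append(
            f"Body: expected to contain {expected_substring!r}, got {body!r}."
        )
    return failures


needle = "providing rich queries, editing and content access"

# The mw-on-k8s response before the fix, as reported in the thread.
for line in check_response(404, "File not found.\n", 200, needle):
    print(line)
```

Checking the body as well as the status guards against a 200 response that serves the wrong page.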

Change #1064723 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Get rid of obsolete extract2.php redirect

https://gerrit.wikimedia.org/r/1064723

Mentioned in SAL (#wikimedia-operations) [2024-08-22T10:18:35Z] <cgoubert@deploy1003> Started scap sync-world: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048

Mentioned in SAL (#wikimedia-operations) [2024-08-22T10:19:39Z] <cgoubert@deploy1003> cgoubert: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-08-22T10:24:18Z] <cgoubert@deploy1003> Finished scap sync-world: mediawiki: Get rid of obsolete extract2.php redirect - 1064723 - T373048 (duration: 05m 43s)

All good now

cgoubert@cumin1002:~$ httpbb /srv/deployment/httpbb-tests/appserver/*.yaml --host mw-api-ext.discovery.wmnet --https_port 4447
Sending to mw-api-ext.discovery.wmnet...
PASS: 131 requests sent to mw-api-ext.discovery.wmnet. All assertions passed.
cgoubert@cumin1002:~$ curl --connect-to en.wikipedia.org:443:mw-api-ext.discovery.wmnet:4447 https://en.wikipedia.org/api/
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
  <meta charset="utf-8">
  <title>APIs</title>
  <meta name=viewport content="width=device-width, initial-scale=1">
  <meta name="robots" content="index, follow">
  <style>
body { background: #fff; margin: 7% auto 0; padding: 2em 1em 1em; font: 15px/1.6 sans-serif; color: #333; max-width: 640px; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645AD; text-decoration: underline; }
</style>
</head>
<body>
        <h2>APIs</h2>
        <ul>
            <li><a href="/w/api.php">Action API</a>, providing rich queries, editing and content access.</li>
            <li><a href="/api/rest_v1/?doc">REST API v1</a>, mainly focused on high-volume content access.</li>
        </ul>
    <h2>Legal</h2>
    <ul>
        <li><a href="https://foundation.wikimedia.org/wiki/Developer_app_guidelines">App Guidelines</a>, for developers on how to properly reuse Wikimedia data, API, trademarks, and other content.</li>
    </ul>
</body>
</html>

Krinkle reopened this task as Open. Sep 2 2024, 9:00 PM

This is broken, again, after T364400: map the /api/ prefix to /w/rest.php:

Change #1070032 merged by Clément Goubert:

[operations/puppet@production] trafficserver: Fix /w/rest.php and /api/ regex_map

https://gerrit.wikimedia.org/r/1070032

@Clement_Goubert This patch doesn't look right to me. Is it intentional that ATS is used as a novel way of rewriting the URI path itself? Afaik Apache/MW need to see this as /api/ unchanged for it to be a valid REST request. Plus, there's the factor of non-REST requests under that path, as per this task. Both are solved by preserving the URL path unchanged. If/when we do /api/ to rest.php rewrites, that'll have to happen in Apache to make sure the request stays internally valid.

The patch also appears to have an off-by-one error. Notice how https://en.wikipedia.org/api/index.html renders an error about /w/rest.phpindex.html instead of /w/rest.php/index.html. In any event, the CGI param REQUEST_URI must remain unaltered at the Apache level in order for self-identification and routing validation to work correctly, and indeed for other Apache config to be able to apply correctly.
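
The missing slash can be reproduced with an ordinary regex substitution. The patterns below are hypothetical stand-ins, since the actual ATS regex_map rule is not quoted in this thread; the sketch only shows how a capture group that starts after the trailing slash of /api/ loses the separator.

```python
import re


def remap(path, replacement):
    """Rewrite /api/<rest> using a replacement template with one capture group."""
    # The capture group begins after the trailing slash of /api/, so the
    # replacement has to supply its own separator.
    return re.sub(r"^/api/(.*)$", replacement, path)


broken = remap("/api/index.html", r"/w/rest.php\1")
fixed = remap("/api/index.html", r"/w/rest.php/\1")

print(broken)
print(fixed)
```

Even with the separator restored, such a remap still changes the path that Apache and MediaWiki see, which is the larger concern raised in the comment above.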

Change #1070274 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/puppet@production] ats: Fix issue with /api/ pointing to /w/rest.php

https://gerrit.wikimedia.org/r/1070274

This is broken, again, after T364400: map the /api/ prefix to /w/rest.php:

Change #1070032 merged by Clément Goubert:

[operations/puppet@production] trafficserver: Fix /w/rest.php and /api/ regex_map

https://gerrit.wikimedia.org/r/1070032

@Clement_Goubert This patch doesn't look right to me. Is it intentional that ATS is used as a novel way of rewriting the URI path itself?

No, it was an inadvertent mistake by yours truly. Nor, overall and in principle, should the CDN be remapping URL paths, as it makes future debugging more difficult for incident responders (they have to build a mental map of the various remappings, which isn't fun).

Afaik Apache/MW need to see this as /api/ unchanged for it to be a valid REST request. Plus, there's the factor of non-REST requests under that path, as per this task. Both are solved by preserving the URL path unchanged. If/when we do /api/ to rest.php rewrites, that'll have to happen in Apache to make sure the request stays internally valid.

Yes, indeed. It would have been preferable, of course, if we didn't need to touch Apache and MediaWiki could handle this itself.

I've uploaded https://gerrit.wikimedia.org/r/1070274, which should fix the problem reported in this task and leave the CDN ready for whenever MediaWiki is ready for the corresponding change.

Change #1070274 merged by Alexandros Kosiaris:

[operations/puppet@production] ats: Fix issue with /api/ pointing to /w/rest.php

https://gerrit.wikimedia.org/r/1070274

akosiaris renamed this task from "https://en.wikipedia.org/api/ 404 Not Found" to "https://en.wikipedia.org/api/ 404 Not Found due to extract2.php RewriteRule". Sep 4 2024, 12:14 PM

https://en.wikipedia.org/api/ is now sent properly to MW without URL remappings, and specifically to the mw-api-ext cluster as intended. Of course, as pointed out above, MediaWiki and Apache themselves know nothing about /api/; we'll have to fix that.

In hindsight, I should have split the remapping part into a different task, as it is a regression from an effort for a new API, whereas originally the task was about an obsolete RewriteRule. In the interest of keeping this task to one thing, I'll resolve this one; I've filed T373998 for follow-up.

Krinkle reopened this task as Open. Sep 4 2024, 7:27 PM

This task isn't about setting up rest.php routing. It's about serving the api/index.html static file, which is broken again. Afaik this doesn't require a rewrite rule; that's a separate issue, both in terms of cause and in terms of end result (adding that rewrite rule wouldn't fix the 404 at the api/ root, would it?).

The 404 at https://en.wikipedia.org/api/ now self-identifies as /w/api/ which means there's yet another thing incorrectly rewriting/modifying the path in a way that it didn't until recently.

Fixed in:

Change #1071229 merged by Alexandros Kosiaris:

[operations/puppet@production] ats: Revert the /api/ changes on the CDN side

https://gerrit.wikimedia.org/r/1071229