Don't use file separator for splitting path info. #2302

ioquatix · 2025-03-10T22:31:38Z

It is only by coincidence that the ::File::SEPARATOR is the same as the URI path separator, but semantically, they are unrelated. Ensure that only "/" is used to separate path segments, as per the RFC: https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

Note: This does not change the interpretation of path_info which may be unescaped prior to being an argument of this method - as such, it does not change the interpretation of %2F characters which will be a follow up discussion/PR.

ioquatix · 2025-03-10T22:35:53Z

@akostadinov this is a first step towards resolving the issues you raised in #2257. Do you mind reviewing this change?

tenderlove · 2025-03-10T23:20:15Z

I think this is good. I'm not sure why we used alt separator. Does this mean URI cleaning was OS dependent before this PR?

ioquatix · 2025-03-10T23:29:16Z

I think this is good. I'm not sure why we used alt separator. Does this mean URI cleaning was OS dependent before this PR?

In principle, yes. In practice, it's unlikely people were deploying servers on Mac System 9 or Windows - not sure if there are other major situations where this could apply.

jeremyevans · 2025-03-10T23:47:14Z

I don't think this is a good idea. I believe it may it introduce a security vulnerability on Windows (where ALT_SEPARATOR is \\).

Consider:

Rack::Utils.clean_path_info("/foo/..\\../bar")
# Before: "/bar"
# After: "/foo/..\\../bar"

If Rack::Static/Rack::Files allows paths starting with /foo, Windows will allow this file even though it is outside of the root. Windows will handle this type of path just fine:

File.read(".gem/specs/..\\../.irb_history").length
# => 1736 # for me

ioquatix · 2025-03-10T23:58:07Z

Should Rack::Utils.clean_path_info("/foo/..\\../bar") be valid or perhaps raise an error? It seems odd to me that it might "work" depending on the OS you are running the server on.

I think we should be clear that the input "path_info" is a URI path, not a file path. Perhaps if there are path segments that contain semantic characters, we can raise an error (e.g. ::File::SEPARATOR / ::File::ALT_SEPARATOR).

ioquatix · 2025-03-11T00:30:47Z

For reference, here is how Rails uses clean_path_info: https://github.com/rails/rails/blob/801959745469e3ab2ad48ba3f2191f68792e7def/actionpack/lib/action_dispatch/middleware/static.rb#L185-L190

By the time clean_path_info is called, the argument is actually not a valid path_info according to the RFC, because all the percent encoded characters are replaced, which is why stricter validation (according to the RFC) fails. Technically, the percent encoded characters should only be processed AFTER splitting by the path separator, at least according to my interpretation of the RFC.

      path          = path-abempty    ; begins with "/" or is empty
                    / path-absolute   ; begins with "/" but not "//"
                    / path-noscheme   ; begins with a non-colon segment
                    / path-rootless   ; begins with a segment
                    / path-empty      ; zero characters

      path-abempty  = *( "/" segment )
      path-absolute = "/" [ segment-nz *( "/" segment ) ]
      path-noscheme = segment-nz-nc *( "/" segment )
      path-rootless = segment-nz *( "/" segment )
      path-empty    = 0<pchar>
      segment       = *pchar
      segment-nz    = 1*pchar
      segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                    ; non-zero-length segment without any colon ":"

      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

Specifically: path is composed of segments separated by "/" characters, of which those segments consist of pchar which may contain pct-encoded sequences. There is no provision for CGI.unescape(path_info) having meaning, at least according to the above parser.

jeremyevans · 2025-03-11T01:09:46Z

Should Rack::Utils.clean_path_info("/foo/..\\../bar") be valid or perhaps raise an error?

It's more work to have it raise an error. More work generally results in more complexity, and more complexity generally makes it more likely that there will be bugs/security issues.

As long as it is handled securely, I don't see the reason to go out of the way to complain about it.

It seems odd to me that it might "work" depending on the OS you are running the server on.

There are a lot of things that may or may not work depending on the OS. Even for the same OS, some file systems may be case sensitive and some not.

I think we should be clear that the input "path_info" is a URI path, not a file path. Perhaps if there are path segments that contain semantic characters, we can raise an error (e.g. ::File::SEPARATOR / ::File::ALT_SEPARATOR).

Considering how Rack itself is using it (only for Files/Static), even if the path given is a URI path, it will be treated as a file path. While we could raise an error, I'm not sure we want to, for reasons described above.

By the time clean_path_info is called, the argument is actually not a valid path_info according to the RFC, because all the percent encoded characters are replaced, which is why stricter validation (according to the RFC) fails. Technically, the percent encoded characters should only be processed AFTER splitting by the path separator, at least according to my interpretation of the RFC.

From the sections of the RFC you pointed out to me, my interpretation the RFC states where you must use unencoded characters, which are only a few places (e.g. separating scheme/domain and domain/path). It places no restrictions on where encoded characters can be. As I mentioned earlier, Nginx will treat %2F as / when serving static files, so having Rack do so as well doesn't seem wrong.

Now, if you were returning the path split into segments, I think based on the BNF you would be correct to split it only on slashes. But that's not exactly what clean_path_info is doing, and as I showed earlier, a naive approach to that leads to security vulnerabilities.

ioquatix · 2025-03-11T01:33:02Z

I was surprised to see this:

samuel@aiko /u/s/n/html> tree
.
├── 50x.html
├── foo
│   └── bar.txt
└── index.html

2 directories, 3 files
samuel@aiko /u/s/n/html> ruby -r net/http -e "puts Net::HTTP.get(URI('http://localhost/foo%2Fbar.txt'))"
bar bar bar
samuel@aiko /u/s/n/html> ruby -r net/http -e "puts Net::HTTP.get(URI('http://localhost/foo/bar.txt'))"
bar bar bar

ioquatix · 2025-03-11T01:38:18Z

Interesting, PR to change the Nginx behaviour: nginx/nginx#569

%2F is forbidden in paths. %2F might be an attempt to bypass filtering by a reverse proxy upstream to Nginx, and I believe Apache automatically 404s any request containing %2F in a path.

akostadinov · 2025-03-11T10:54:46Z

@ioquatix , these changes make a lot of sense to me, thank you for looking into this!

The changes map what is present in the RFC and although I can't claim full understanding of intended usage, I see this as a big net improvement of making this method more predictable.

There are still a couple of important questions in #2257

wrt nginx/nginx#569, I'm in favor of making invalid a percent encoded / (%2F). But then it makes sense to also reject backslashes too. Because it might pose a similar risk on windows platforms if I understand that correctly.

It may cause issues though if somebody is using path parameters with such characters not to represent file system paths - for example having some name in the path or Base64 representation of it. I've seen this in a project I've worked on. So it's fine to leave them be.

@jeremyevans , I while technically true, that more work results in more opportunities for errors, the goal is to make Rack more helpful in voiding security issues on the app level. Especially in case Rack supports non-standard processing that can cause discrepancies between what it serves and what a reverse proxy expects or what an app expects.

Presently allowing all these different path separators is ripe for cache poisoning and path traversal attacks.

I think the point of a middleware software like Rack is to help the developers do the right thing and not fall into known traps. So it is reasonable for me to spend some more effort here.

ioquatix requested review from tenderlove and jeremyevans March 10, 2025 22:34

ioquatix force-pushed the path-separator branch from 98930bb to 2861484 Compare March 11, 2025 00:19

akostadinov approved these changes Mar 11, 2025

View reviewed changes

ioquatix added 2 commits May 19, 2025 12:46

Don't use file separator for splitting path info.

f3d6ab9

Strict validation of path info segments.

a999846

ioquatix force-pushed the path-separator branch from 2861484 to a999846 Compare May 19, 2025 03:46

ioquatix force-pushed the main branch from d5e4576 to fac9a7a Compare July 31, 2025 01:07

ioquatix added this to the v3.3.0 milestone Jul 31, 2025

ioquatix mentioned this pull request Jul 31, 2025

consistent clean_path_info separator #2257

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Don't use file separator for splitting path info. #2302

Don't use file separator for splitting path info. #2302

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Don't use file separator for splitting path info. #2302

Are you sure you want to change the base?

Don't use file separator for splitting path info. #2302

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!