E545 Don't use file separator for splitting path info. by ioquatix · Pull Request #2302 · rack/rack · GitHub
[go: up one dir, main page]

Skip to content

Conversation

ioquatix
Copy link
Member
@ioquatix ioquatix commented Mar 10, 2025

It is only by coincidence that the ::File::SEPARATOR is the same as the URI path separator, but semantically, they are unrelated. Ensure that only "/" is used to separate path segments, as per the RFC: https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

Note: This does not change the interpretation of path_info which may be unescaped prior to being an argument of this method - as such, it does not change the interpretation of %2F characters which will be a follow up discussion/PR.

@ioquatix
Copy link
Member Author

@akostadinov this is a first step towards resolving the issues you raised in #2257. Do you mind reviewing this change?

@tenderlove
Copy link
Member

I think this is good. I'm not sure why we used alt separator. Does this mean URI cleaning was OS dependent before this PR?

@ioquatix
Copy link
Member Author

I think this is good. I'm not sure why we used alt separator. Does this mean URI cleaning was OS dependent before this PR?

In principle, yes. In practice, it's unlikely people were deploying servers on Mac System 9 or Windows - not sure if there are other major situations where this could apply.

@jeremyevans
Copy link
Contributor

I don't think this is a good idea. I believe it may it introduce a security vulnerability on Windows (where ALT_SEPARATOR is \\).

Consider:

Rack::Utils.clean_path_info("/foo/..\\../bar")
# Before: "/bar"
# After: "/foo/..\\../bar"

If Rack::Static/Rack::Files allows paths starting with /foo, Windows will allow this file even though it is outside of the root. Windows will handle this type of path just fine:

File.read(".gem/specs/..\\../.irb_history").length
# => 1736 # for me

@ioquatix
Copy link
Member Author
ioquatix commented Mar 10, 2025

Should Rack::Utils.clean_path_info("/foo/..\\../bar") be valid or perhaps raise an error? It seems odd to me that it might "work" depending on the OS you are running the server on.

I think we should be clear that the input "path_info" is a URI path, not a file path. Perhaps if there are path segments that contain semantic characters, we can raise an error (e.g. ::File::SEPARATOR / ::File::ALT_SEPARATOR).

@ioquatix
Copy link
Member Author
ioquatix commented Mar 11, 2025

For reference, here is how Rails uses clean_path_info: https://github.com/rails/rails/blob/801959745469e3ab2ad48ba3f2191f68792e7def/actionpack/lib/action_dispatch/middleware/static.rb#L185-L190

By the time clean_path_info is called, the argument is actually not a valid path_info according to the RFC, because all the percent encoded characters are replaced, which is why stricter validation (according to the RFC) fails. Technically, the percent encoded characters should only be processed AFTER splitting by the path separator, at least according to my interpretation of the RFC.

      path          = path-abempty    ; begins with "/" or is empty
                    / path-absolute   ; begins with "/" but not "//"
                    / path-noscheme   ; begins with a non-colon segment
                    / path-rootless   ; begins with a segment
                    / path-empty      ; zero characters

      path-abempty  = *( "/" segment )
      path-absolute = "/" [ segment-nz *( "/" segment ) ]
      path-noscheme = segment-nz-nc *( "/" segment )
      path-rootless = segment-nz *( "/" segment )
      path-empty    = 0<pchar>
      segment       = *pchar
      segment-nz    = 1*pchar
      segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                    ; non-zero-length segment without any colon ":"

      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

Specifically: path is composed of segments separated by "/" characters, of which those segments consist of pchar which may contain pct-encoded sequences. There is no provision for CGI.unescape(path_info) having meaning, at least according to the above parser.

@jeremyevans
Copy link
Contributor

Should Rack::Utils.clean_path_info("/foo/..\\../bar") be valid or perhaps raise an error?

It's more work to have it raise an error. More work generally results in more complexity, and more complexity generally makes it more likely that there will be bugs/security issues.

As long as it is handled securely, I don't see the reason to go out of the way to complain about it.

It seems odd to me that it might "work" depending on the OS you are running the server on.

There are a lot of things that may or may not work depending on the OS. Even for the same OS, some file systems may be case sensitive and some not.

I think we should be clear that the input "path_info" is a URI path, not a file path. Perhaps if there are path segments that contain semantic characters, we can raise an error (e.g. ::File::SEPARATOR / ::File::ALT_SEPARATOR).

Considering how Rack itself is using it (only for Files/Static), even if the path given is a URI path, it will be treated as a file path. While we could raise an error, I'm not sure we want to, for reasons described above.

By the time clean_path_info is called, the argument is actually not a valid path_info according to the RFC, because all the percent encoded characters are replaced, which is why stricter validation (according to the RFC) fails. Technically, the percent encoded characters should only be processed AFTER splitting by the path separator, at least according to my interpretation of the RFC.

From the sections of the RFC you pointed out to me, my interpretation the RFC states where you must use unencoded characters, which are only a few places (e.g. separating scheme/domain and domain/path). It places no restrictions on where encoded characters can be. As I mentioned earlier, Nginx will treat %2F as / when serving static files, so having Rack do so as well doesn't seem wrong.

Now, if you were returning the path split into segments, I think based on the BNF you would be correct to split it only on slashes. But that's not exactly what clean_path_info is doing, and as I showed earlier, a naive approach to that leads to security vulnerabilities.

@ioquatix
Copy link
Member Author

I was surprised to see this:

samuel@aiko /u/s/n/html> tree
.
├── 50x.html
├── foo
│   └── bar.txt
└── index.html

2 directories, 3 files
samuel@aiko /u/s/n/html> ruby -r net/http -e "puts Net::HTTP.get(URI('http://localhost/foo%2Fbar.txt'))"
bar bar bar
samuel@aiko /u/s/n/html> ruby -r net/http -e "puts Net::HTTP.get(URI('http://localhost/foo/bar.txt'))"
bar bar bar

@ioquatix
Copy link
Member Author
ioquatix commented Mar 11, 2025

Interesting, PR to change the Nginx behaviour: nginx/nginx#569

%2F is forbidden in paths. %2F might be an attempt to bypass filtering by a reverse proxy upstream to Nginx, and I believe Apache automatically 404s any request containing %2F in a path.

@akostadinov
Copy link

@ioquatix , these changes make a lot of sense to me, thank you for looking into this!

The changes map what is present in the RFC and although I can't claim full understanding of intended usage, I see this as a big net improvement of making this method more predictable.

There are still a couple of important questions in #2257

wrt nginx/nginx#569, I'm in favor of making invalid a percent encoded / (%2F). But then it makes sense to also reject backslashes too. Because it might pose a similar risk on windows platforms if I understand that correctly.

It may cause issues though if somebody is using path parameters with such characters not to represent file system paths - for example having some name in the path or Base64 representation of it. I've seen this in a project I've worked on. So it's fine to leave them be.

@jeremyevans , I while technically true, that more work results in more opportunities for errors, the goal is to make Rack more helpful in voiding security issues on the app level. Especially in case Rack supports non-standard processing that can cause discrepancies between what it serves and what a reverse proxy expects or what an app expects.

Presently allowing all these different path separators is ripe for cache poisoning and path traversal attacks.

I think the point of a middleware software like Rack is to help the developers do the right thing and not fall into known traps. So it is reasonable for me to spend some more effort here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels 4CBD
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0