8000 Don't include default port in upstream url to proxy by ejose19 · Pull Request #498 · tinyproxy/tinyproxy · GitHub
[go: up one dir, main page]

Skip to content

Don't include default port in upstream url to proxy #498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ejose19
Copy link
@ejose19 ejose19 commented Jun 16, 2023

@rofl0r
Copy link
Contributor
rofl0r commented Jun 17, 2023

don't just throw an uncommented commit and a link at us, describe what problem you experience and what the commit is meant to fix. the commit message has unlimited space for explanations.

@ejose19
Copy link
Author
ejose19 commented Jun 17, 2023

Hello @rofl0r, since the issue is very similar to the one in curl library, I thought linking them was better.

The problem is that tinyproxy is appending/preserving either :80 or :443 to uri sent to upstream proxy even if it's the default port for the connection protocol (80 for http, 443 for https). This ends up with a url that violates RFC7230 because the Host header doesn't match the target uri and make upstream proxy fail to process due to this mismatch.

Example (tinyproxy listening on 127.0.0.1:8082 upstreaming to proxy on 127.0.0.1:8081):

$ all_proxy=127.0.0.1:8082 curl http://example.com
GET http://example.com:80/ HTTP/1.0
Host: example.com

$ all_proxy=127.0.0.1:8082 curl http://example.com:80
GET http://example.com:80/ HTTP/1.0
Host: example.com

This issue is only present on default ports, as using other port yields the same for both Host and target uri:

$ all_proxy=127.0.0.1:8082 curl http://example.com:81
GET http://example.com:81/ HTTP/1.0
Host: example.com:81

So this PR makes url sent to proxy to not include the port if it's the default for its protocol (so both Host and target uri matches)

@rofl0r
Copy link
Contributor
rofl0r commented Jun 18, 2023

make upstream proxy fail to process

are you talking about an actual upstream (see Upstream directive in config) or just the http server one connects to ?

also, what upstream server fails to process this correctly ? it seems odd nobody ever reported this as an issue before if this was an actual practical problem, not just theoretical.

apart from that, HTTPS connections are made via CONNECT method, so whatever the browser puts into Host: field is forwarded unchanged to the target.

btw do you use latest master, or some outdated version for your testing ?

@ejose19
Copy link
Author
ejose19 commented Jun 18, 2023

make upstream proxy fail to process

are you talking about an actual upstream (see Upstream directive in config) or just the http server one connects to ?

Yes, I'm talking about the "actual upstream" (connected using the Upstream directive)

also, what upstream server fails to process this correctly ? it seems odd nobody ever reported this as an issue before if this was an actual practical problem, not just theoretical.

I found out this issue because I'm chaining tinyproxy to apt-cacher-ng, and due the target uri rewritting (adding that :80) makes the target uri and host mismatch which leads to failure to process the request

apart from that, HTTPS connections are made via CONNECT method, so whatever the browser puts into Host: field is forwarded unchanged to the target.

The problem is not the Host: field, it's the target uri that tinyproxy currently sends which doesn't match the Host: field when using default ports [80, 443] (see the first 2 examples, Host: doesn't include the port but the target uri does)

btw do you use latest master, or some outdated version for your testing ?

Tested compiling latest master

@rofl0r
Copy link
Contributor
rofl0r commented Jun 21, 2023

i've taken the time to read the mentioned RFC, and when it comes to the "optional port number", it refers to https://www.rfc-editor.org/rfc/rfc3986#section-3.2.3, which reads:

3.2.3. Port

The port subcomponent of authority is designated by an optional port
number in decimal following the host and delimited from it by a
single colon (":") character.

 port        = *DIGIT

A scheme may define a default port. For example, the "http" scheme
defines a default port of "80", corresponding to its reserved TCP
port number. The type of port designated by the port number (e.g.,
TCP, UDP, SCTP) is defined by the URI scheme. URI producers and
normalizers should omit the port component and its ":" delimiter if
port is empty or if its value would be the same as that of the
scheme's default.

emphasis mine. so the port should be omitted, but it's not a must. in which case the text would use the terms must or shall.

this basically means that the culprit isn't tinyproxy (it conforms to RFC), but apt-cache-ng, and therefore apt-cache-ng should be fixed.

@ejose19
Copy link
Author
ejose19 commented Jun 22, 2023

i've taken the time to read the mentioned RFC, and when it comes to the "optional port number", it refers to https://www.rfc-editor.org/rfc/rfc3986#section-3.2.3, which reads:

3.2.3. Port
The port subcomponent of authority is designated by an optional port
number in decimal following the host and delimited from it by a
single colon (":") character.

 port        = *DIGIT

A scheme may define a default port. For example, the "http" scheme
defines a default port of "80", corresponding to its reserved TCP
port number. The type of port designated by the port number (e.g.,
TCP, UDP, SCTP) is defined by the URI scheme. URI producers and
normalizers should omit the port component and its ":" delimiter if
port is empty or if its value would be the same as that of the
scheme's default.

emphasis 8000 mine. so the port should be omitted, but it's not a must. in which case the text would use the terms must or shall.

The link you provided is for RFC3986, but the issue is not with that one but RFC7230, specifically section 5.4:

A client MUST send a Host header field in all HTTP/1.1 request
messages. If the target URI includes an authority component, then a
client MUST send a field-value for Host that is identical to that
authority component, excluding any userinfo subcomponent and its "@"
delimiter (Section 2.7.1).

As seen in the examples, the client correctly set both target URI and Host to be identical, but it's tinyproxy the one that's adding the :80 or :443 when the client didn't set it, hence modifying the request and making it not comply with the RFC (see it says MUST and not SHOULD)

this basically means that the culprit isn't tinyproxy (it conforms to RFC), but apt-cache-ng, and therefore apt-cache-ng should be fixed.

Even though upstream proxies could fix this on their end the same way curl did, it's not only applicable to apt-cacher-ng (see original issue, haproxy also failed with this thing of error) as they're expecting both the target URI and Host header to be identical. So I'd say that at least tinyproxy shouldn't be touching the target URI sent by the client at all (that'd require a different change)

@rofl0r
Copy link
Contributor
rofl0r commented Jun 22, 2023

good find. can you try whether the attached patch fixes the issue (apply on master without your PR)

diff --git a/src/reqs.c b/src/reqs.c
index 45db118..5179505 100644
--- a/src/reqs.c
+++ b/src/reqs.c
@@ -258,11 +258,7 @@ establish_http_connection (struct conn_s *connptr, struct request_s *request)
         char portbuff[7];
         char dst[sizeof(struct in6_addr)];
 
-        /* Build a port string if it's not a standard port */
-        if (request->port != HTTP_PORT && request->port != HTTP_PORT_SSL)
-                snprintf (portbuff, 7, ":%u", request->port);
-        else
-                portbuff[0] = '\0';
+        snprintf (portbuff, 7, ":%u", request->port);
 
         if (inet_pton(AF_INET6, request->host, dst) > 0) {
                 /* host is an IPv6 address literal, so surround it with

the patch makes it so the port field is unconditionally added in both uri and host field, and should therefore comply with the requirements of RFC7230.

@ejose19
Copy link
Author
ejose19 commented Jun 22, 2023

good find. can you try whether the attached patch fixes the issue (apply on master without your PR)

diff --git a/src/reqs.c b/src/reqs.c
index 45db118..5179505 100644
--- a/src/reqs.c
+++ b/src/reqs.c
@@ -258,11 +258,7 @@ establish_http_connection (struct conn_s *connptr, struct request_s *request)
         char portbuff[7];
         char dst[sizeof(struct in6_addr)];
 
-        /* Build a port string if it's not a standard port */
-        if (request->port != HTTP_PORT && request->port != HTTP_PORT_SSL)
-                snprintf (portbuff, 7, ":%u", request->port);
-        else
-                portbuff[0] = '\0';
+        snprintf (portbuff, 7, ":%u", request->port);
 
         if (inet_pton(AF_INET6, request->host, dst) > 0) {
                 /* host is an IPv6 address literal, so surround it with

the patch makes it so the port field is unconditionally added in both uri and host field, and should therefore comply with the requirements of RFC7230.

That doesn't work for https as you mentioned

apart from that, HTTPS connections are made via CONNECT method, so whatever the browser puts into Host: field is forwarded unchanged to the target.

So this would still cause issues, and there's also a concern for leaving the port unconditionally as already talked in the upstream issue, quoting the relevant parts:

issue request poster:

curl should construct a request that conforms to RFC 7230. This could be achieved by any of:

  • Retaining the port number within the Host header unconditionally, or
  • Retaining the port number within the Host header when issuing a request via a proxy, or
  • Stripping default port numbers from the request URI target using the same logic as in Curl_http_host()

curl member reply:

Let me address the three suggestions one by one for completeness:

  • Retaining the port number within the Host header unconditionally, or

This doesn't work on the Internet. Servers out there will break on port numbers for default ports in the Host: header. We learned that decades ago.

  • Retaining the port number within the Host header when issuing a request via a proxy, or

I fear that this will end up in the same bucket as the above, since the Host: header will most probably be passed along as-is to the remote server.

  • Stripping default port numbers from the request URI target using the same logic as in Curl_http_host()

I believe this alternative has the highest chance of working with the least amount of friction and problems.

Given curl popularity and being the backbone for many tools and applications, I'd say that following their approach to this problem is the best (unless you have a particular reason not to)

@rofl0r
Copy link
Contributor
rofl0r commented Jun 22, 2023

happy with this patch ? please test.

diff --git a/src/reqs.c b/src/reqs.c
index 45db118..8ce0a44 100644
--- a/src/reqs.c
+++ b/src/reqs.c
@@ -249,6 +249,19 @@ ERROR_EXIT:
         return -1;
 }
 
+/*
+ * fill into buf either :port or empty string,
+ * according to wether the port is a default port.
+ * buf must be of size >= 7
+ */
+static void port_or_not(char *buf, int port) {
+        /* Build a port string if it's not a standard port */
+        if (port != HTTP_PORT && port != HTTP_PORT_SSL)
+                snprintf (buf, 7, ":%u", port);
+        else
+                buf[0] = '\0';
+}
+
 /*
  * Create a connection for HTTP connections.
  */
@@ -258,11 +271,7 @@ establish_http_connection (struct conn_s *connptr, struct request_s *request)
         char portbuff[7];
         char dst[sizeof(struct in6_addr)];
 
-        /* Build a port string if it's not a standard port */
-        if (request->port != HTTP_PORT && request->port != HTTP_PORT_SSL)
-                snprintf (portbuff, 7, ":%u", request->port);
-        else
-                portbuff[0] = '\0';
+        port_or_not(portbuff, request->port);
 
         if (inet_pton(AF_INET6, request->host, dst) > 0) {
                 /* host is an IPv6 address literal, so surround it with
@@ -1413,6 +1422,7 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
          */
         return -1;
 #else
+        char portbuff[7];
         char *combined_string;
         int len;
 
@@ -1451,6 +1461,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                      "using file descriptor %d.",
                      cur_upstream->host, connptr->server_fd);
 
+        port_or_not(portbuff, request->port);
+
         /*
          * We need to re-write the "path" part of the request so that we
          * can reuse the establish_http_connection() function. It expects a
@@ -1464,8 +1476,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "%s:%d", request->host,
-                          request->port);
+                snprintf (combined_string, len, "%s%s", request->host,
+                          portbuff);
         } else {
                 len = strlen (request->host) + strlen (request->path) + 14;
                 combined_string = (char *) safemalloc (len);
@@ -1473,8 +1485,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "http://%s:%d%s", request->host,
-                          request->port, request->path);
+                snprintf (combined_string, len, "http://%s%s%s", request->host,
+                          portbuff, request->path);
         }
 
         if (request->path)

@ejose19
Copy link
Author
ejose19 commented Jun 23, 2023

happy with this patch ? please test.

diff --git a/src/reqs.c b/src/reqs.c
index 45db118..8ce0a44 100644
--- a/src/reqs.c
+++ b/src/reqs.c
@@ -249,6 +249,19 @@ ERROR_EXIT:
         return -1;
 }
 
+/*
+ * fill into buf either :port or empty string,
+ * according to wether the port is a default port.
+ * buf must be of size >= 7
+ */
+static void port_or_not(char *buf, int port) {
+        /* Build a port string if it's not a standard port */
+        if (port != HTTP_PORT && port != HTTP_PORT_SSL)
+                snprintf (buf, 7, ":%u", port);
+        else
+                buf[0] = '\0';
+}
+
 /*
  * Create a connection for HTTP connections.
  */
@@ -258,11 +271,7 @@ establish_http_connection (struct conn_s *connptr, struct request_s *request)
         char portbuff[7];
         char dst[sizeof(struct in6_addr)];
 
-        /* Build a port string if it's not a standard port */
-        if (request->port != HTTP_PORT && request->port != HTTP_PORT_SSL)
-                snprintf (portbuff, 7, ":%u", request->port);
-        else
-                portbuff[0] = '\0';
+        port_or_not(portbuff, request->port);
 
         if (inet_pton(AF_INET6, request->host, dst) > 0) {
                 /* host is an IPv6 address literal, so surround it with
@@ -1413,6 +1422,7 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
          */
         return -1;
 #else
+        char portbuff[7];
         char *combined_string;
         int len;
 
@@ -1451,6 +1461,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                      "using file descriptor %d.",
                      cur_upstream->host, connptr->server_fd);
 
+        port_or_not(portbuff, request->port);
+
         /*
          * We need to re-write the "path" part of the request so that we
          * can reuse the establish_http_connection() function. It expects a
@@ -1464,8 +1476,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "%s:%d", request->host,
-                          request->port);
+                snprintf (combined_string, len, "%s%s", request->host,
+                          portbuff);
         } else {
                 len = strlen (request->host) + strlen (request->path) + 14;
                 combined_string = (char *) safemalloc (len);
@@ -1473,8 +1485,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "http://%s:%d%s", request->host,
-                          request->port, request->path);
+                snprintf (combined_string, len, "http://%s%s%s", request->host,
+                          portbuff, request->path);
         }
 
         if (request->path)

I got error: corrupt patch at line 76 but manually creating the patch worked.

diff --git a/src/reqs.c b/src/reqs.c
index 45db118..8ce0a44 100644
--- a/src/reqs.c
+++ b/src/reqs.c
@@ -249,6 +249,19 @@ ERROR_EXIT:
         return -1;
 }
 
+/*
+ * fill into buf either :port or empty string,
+ * according to wether the port is a default port.
+ * buf must be of size >= 7
+ */
+static void port_or_not(char *buf, int port) {
+        /* Build a port string if it's not a standard port */
+        if (port != HTTP_PORT && port != HTTP_PORT_SSL)
+                snprintf (buf, 7, ":%u", port);
+        else
+                buf[0] = '\0';
+}
+
 /*
  * Create a connection for HTTP connections.
  */
@@ -258,11 +271,7 @@ establish_http_connection (struct conn_s *connptr, struct request_s *request)
         char portbuff[7];
         char dst[sizeof(struct in6_addr)];
 
-        /* Build a port string if it's not a standard port */
-        if (request->port != HTTP_PORT && request->port != HTTP_PORT_SSL)
-                snprintf (portbuff, 7, ":%u", request->port);
-        else
-                portbuff[0] = '\0';
+        port_or_not(portbuff, request->port);
 
         if (inet_pton(AF_INET6, request->host, dst) > 0) {
                 /* host is an IPv6 address literal, so surround it with
@@ -1413,6 +1413,7 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
          */
         return -1;
 #else
+        char portbuff[7];
         char *combined_string;
         int len;
 
@@ -1451,6 +1451,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                      "using file descriptor %d.",
                      cur_upstream->host, connptr->server_fd);
 
+        port_or_not(portbuff, request->port);
+
         /*
          * We need to re-write the "path" part of the request so that we
          * can reuse the establish_http_connection() function. It expects a
@@ -1464,8 +1464,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "%s:%d", request->host,
-                          request->port);
+                snprintf (combined_string, len, "%s%s", request->host,
+                          portbuff);
         } else {
                 len = strlen (request->host) + strlen (request->path) + 14;
                 combined_string = (char *) safemalloc (len);
@@ -1473,8 +1473,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "http://%s:%d%s", request->host,
-                          request->port, request->path);
+                snprintf (combined_string, len, "http://%s%s%s", request->host,
+                          portbuff, request->path);
         }
 
         if (request->path)

Also ran some tests and couldn't find any inconsistent target uri & host, so it's working as expected.

8000

@rofl0r
Copy link
Contributor
rofl0r commented Jun 23, 2023

i can see a potential issue though, someone running a plain http server on port 443... probably quite uncommon, yet possible

@ejose19
Copy link
Author
ejose19 commented Jun 23, 2023

i can see a potential issue though, someone running a plain http server on port 443... probably quite uncommon, yet possible

We can use the connptr->connect_method for that (similar on how's currently done on this pr). Here's a diff adjusting
port_or_not to handle that case.

diff --git a/src/reqs.c b/src/reqs.c
index 45db118..729ddf4 100644
--- a/src/reqs.c
+++ b/src/reqs.c
@@ -249,6 +249,22 @@ ERROR_EXIT:
         return -1;
 }
 
+/*
+ * fill into buf either :port or empty string,
+ * according to wether the port is a default port.
+ * buf must be of size >= 7
+ */
+static void port_or_not(char *buf, int port, unsigned int connect_method) {
+        /* Build a port string if it's not a standard port */
+        if (
+                (!connect_method && port != HTTP_PORT) 
+                    || (connect_method && port != HTTP_PORT_SSL)
+        )
+                snprintf (buf, 7, ":%u", port);
+        else
+                buf[0] = '\0';
+}
+
 /*
  * Create a connection for HTTP connections.
  */
@@ -258,11 +274,7 @@ establish_http_connection (struct conn_s *connptr, struct request_s *request)
         char portbuff[7];
         char dst[sizeof(struct in6_addr)];
 
-        /* Build a port string if it's not a standard port */
-        if (request->port != HTTP_PORT && request->port != HTTP_PORT_SSL)
-                snprintf (portbuff, 7, ":%u", request->port);
-        else
-                portbuff[0] = '\0';
+        port_or_not(portbuff, request->port, connptr->connect_method);
 
         if (inet_pton(AF_INET6, request->host, dst) > 0) {
                 /* host is an IPv6 address literal, so surround it with
@@ -1413,6 +1425,7 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
          */
         return -1;
 #else
+        char portbuff[7];
         char *combined_string;
         int len;
 
@@ -1451,6 +1464,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                      "using file descriptor %d.",
                      cur_upstream->host, connptr->server_fd);
 
+        port_or_not(portbuff, request->port, connptr->connect_method);
+
         /*
          * We need to re-write the "path" part of the request so that we
          * can reuse the establish_http_connection() function. It expects a
@@ -1464,8 +1479,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "%s:%d", request->host,
-                          request->port);
+                snprintf (combined_string, len, "%s%s", request->host,
+                          portbuff);
         } else {
                 len = strlen (request->host) + strlen (request->path) + 14;
                 combined_string = (char *) safemalloc (len);
@@ -1473,8 +1488,8 @@ connect_to_upstream (struct conn_s *connptr, struct request_s *request)
                         return -1;
                 }
 
-                snprintf (combined_string, len, "http://%s:%d%s", request->host,
-                          request->port, request->path);
+                snprintf (combined_string, len, "http://%s%s%s", request->host,
+                          portbuff, request->path);
         }
 
         if (request->path)

Should I edit the original PR?

@Gruummy
Copy link
Contributor
Gruummy commented Jul 14, 2024

@ejose19
I am also interested in this topic, and wonder why all this dance around the port handling.

I personally would have the following thoughts / opinions to make it clear:

  • tinyproxy should not directly care about which port to use for communiation.
  • It should be configured incl protocoll to be used by the one who configures tinyproxy config file
  • tinyproxy should simply take 1 to 1 what is configured under the setting of Upstream in the config file
    (in case of no dedicated port configured, use default port specified by rfc's for the protocol to be used for tcp communication)

Examples for configurations:

  • Upstream http some.remote.proxy:port --> send requests to upstream proxy http://some.remote.proxy:port
    (use dedicated configured port for tcp connection)
  • Upstream http some.remote.proxy --> send requests to upstream proxy http://some.remote.proxy
    (for the reason that no dedicated port is configured, but protocol 'http' --> use default port 80 for tcp connection)
  • Upstream https some.remote.proxy:port --> send requests to upstream proxy https://some.remote.proxy:port
    (use dedicated configured port for tcp connection)
  • Upstream https some.remote.proxy --> send requests to upstream proxy https://some.remote.proxy
    (for the reason that no dedicated port is configured, but protocol 'https' --> use default port 443 for tcp connection)

Could you provide your configuration examples you had used when you faced the issue ?
I would want to understand it fully

For me this would make the logic much more understandable, and if a port is configured, then it will be used. If not then not.

It can be, that this comment decribes what was in the last code diff proposed ... not got it fully. But it really would be good i think if we know how you had configured the tinyproxy.conf, to avoid to do assumptions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0