From life with Kubernetes: How the HTTP server did not favor the Spaniards

A representative of our client, whose application stack lives in Microsoft's cloud (Azure), came to us with a problem: some requests from certain users in Europe had recently started failing with a 400 (Bad Request) error. All applications are written in .NET and deployed in Kubernetes…

One of the applications is an API that ultimately receives all the traffic. This traffic is listened to by the Kestrel HTTP server, configured in the client's .NET application and running in a pod. We were lucky with debugging in the sense that there was a specific user who could consistently reproduce the problem. However, everything was complicated by the traffic chain:

[Diagram: the traffic chain]

The error in the Ingress logs looked like this:

{
   "number_fields":{
      "status":400,
      "request_time":0.001,
      "bytes_sent":465,
      "upstream_response_time":0,
      "upstream_retries":0,
      "bytes_received":2328
   },
   "stream":"stdout",
   "string_fields":{
      "ingress":"app",
      "protocol":"HTTP/1.1",
      "request_id":"f9ab8540407208a119463975afda90bc",
      "path":"/api/sign-in",
      "nginx_upstream_status":"400",
      "service":"app",
      "namespace":"production",
      "location":"/front",
      "scheme":"https",
      "method":"POST",
      "nginx_upstream_response_time":"0.000",
      "nginx_upstream_bytes_received":"120",
      "vhost":"api.app.example.com",
      "host":"api.app.example.com",
      "user":"",
      "address":"83.41.81.250",
      "nginx_upstream_addr":"10.240.0.110:80",
      "referrer":"https://api.app.example.com/auth/login?long_encrypted_header",
      "service_port":"http",
      "user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
      "time":"2019-03-06T18:29:16+00:00",
      "content_kind":"cache-headers-not-present",
      "request_query":""
   },
   "timestamp":"2019-03-06 18:29:16",
   "labels":{
      "app":"nginx",
      "pod-template-generation":"6",
      "controller-revision-hash":"1682636041"
   },
   "namespace":"kube-nginx-ingress",
   "nsec":6726612,
   "source":"kubernetes",
   "host":"k8s-node-55555-0",
   "pod_name":"nginx-v2hcb",
   "container_name":"nginx",
   "boolean_fields":{}
}

Kestrel, for its part, returned:

HTTP/1.1 400 Bad Request
Connection: close
Date: Wed, 06 Mar 2019 12:34:20 GMT
Server: Kestrel
Content-Length: 0

Even at maximum verbosity, the Kestrel error contained extremely little useful information:

{
   "number_fields":{"ThreadId":76},
   "stream":"stdout",
   "string_fields":{
      "EventId":"{"Id"=>17, "Name"=>"ConnectionBadRequest"}",
      "SourceContext":"Microsoft.AspNetCore.Server.Kestrel",
      "ConnectionId":"0HLL2VJSST5KV",
      "@mt":"Connection id "{ConnectionId}" bad request data: "{message}"",
      "@t":"2019-03-07T13:06:48.1449083Z",
      "@x":"Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Malformed request: invalid headers.n   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.TryParseRequest(ReadResult result, Boolean& endConnection)n   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.<ProcessRequestsAsync>d__185`1.MoveNext()",
      "message":"Malformed request: invalid headers."
   },
   "timestamp":"2019-03-07 13:06:48",
   "labels":{
      "pod-template-hash":"2368795483",
      "service":"app"
   },
   "namespace":"production",
   "nsec":145341848,
   "source":"kubernetes",
   "host":"k8s-node-55555-1",
   "pod_name":"app-67bdcf98d7-mhktx",
   "container_name":"app",
   "boolean_fields":{}
}

It would seem that only tcpdump could help solve this problem… but let me repeat the traffic chain:

[Diagram: the traffic chain]

Investigation

Obviously, it is better to capture traffic on the specific node where Kubernetes has scheduled the pod: that way the dump stays small enough to quickly find at least something.
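
A capture along these lines was enough (a sketch rather than our exact commands: the pod name and the upstream IP are taken from the logs above, while the interface and port are assumptions):

# find the node hosting the pod (pod name taken from the Kestrel log above)
kubectl -n production get pod app-67bdcf98d7-mhktx -o wide

# then, on that node, capture the pod's plain-text HTTP traffic
# (10.240.0.110 is the upstream address from the Ingress log)
tcpdump -i any -nn -s 0 -A 'tcp port 80 and host 10.240.0.110'

And indeed, while examining the dump, we noticed the following frame: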

GET /back/user HTTP/1.1
Host: api.app.example.com
X-Request-ID: 27ceb14972da8c21a8f92904b3eff1e5
X-Real-IP: 83.41.81.250
X-Forwarded-For: 83.41.81.250
X-Forwarded-Host: api.app.example.com
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Original-URI: /front/back/user
X-Scheme: https
X-Original-Forwarded-For: 83.41.81.250
X-Nginx-Geo-Client-Country: Spain
X-Nginx-Geo-Client-City: M.laga
Accept-Encoding: gzip
CF-IPCountry: ES
CF-RAY: 4b345cfd1c4ac691-MAD
CF-Visitor: {"scheme":"https"}
pragma: no-cache
cache-control: no-cache
accept: application/json, text/plain, */*
origin: https://app.example.com
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36
referer: https://app.example.com/auth/login
accept-language: en-US,en;q=0.9,en-GB;q=0.8,pl;q=0.7
cookie: many_encrypted_cookies; .AspNetCore.Identity.Application=something_encrypted; 
CF-Connecting-IP: 83.41.81.250
True-Client-IP: 83.41.81.250
CDN-Loop: cloudflare

HTTP/1.1 400 Bad Request
Connection: close
Date: Wed, 06 Mar 2019 12:34:20 GMT
Server: Kestrel
Content-Length: 0

On closer examination of the dump, the word M.laga caught our eye. It is easy to guess that there is no city called M.laga in Spain (though there is Málaga). Seizing on this idea, we looked at the Ingress configs, where we found a "harmless" snippet that had been inserted a month earlier at the client's request:

    ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header X-Nginx-Geo-Client-Country $geoip_country_name;
      proxy_set_header X-Nginx-Geo-Client-City $geoip_city;

Once forwarding of these headers was disabled, everything went back to normal! (It soon became clear that the application itself no longer needed these headers.)
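
In practice that meant dropping the two proxy_set_header lines from the annotation above. If the geo data were still needed, a possible workaround (an assumption on our part, not what was done for this client) would be to forward only ASCII-safe values, such as the two-letter country code:

    ingress.kubernetes.io/configuration-snippet: |
      # $geoip_country_code (e.g. "ES") contains only ASCII characters,
      # unlike $geoip_city, which may not
      proxy_set_header X-Nginx-Geo-Client-Country $geoip_country_code;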

Now let's look at the problem more broadly. It is easy to reproduce inside the application by connecting with telnet to localhost:80 and sending:

GET /back/user HTTP/1.1
Host: api.app.example.com
cache-control: no-cache
accept: application/json, text/plain, */*
origin: https://app.example.com
Cookie: test=Desiree

…which returns 401 Unauthorized, as expected. But what happens if we send:

GET /back/user HTTP/1.1
Host: api.app.example.com
cache-control: no-cache
accept: application/json, text/plain, */*
origin: https://app.example.com
Cookie: test=Désirée

?

It will return a 400 Bad Request, and the application log will show the error that is already familiar to us:

{
   "@t":"2019-03-31T12:59:54.3746446Z",
   "@mt":"Connection id "{ConnectionId}" bad request data: "{message}"",
   "@x":"Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Malformed request: invalid headers.n   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.TryParseRequest(ReadResult result, Boolean& endConnection)n   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.<ProcessRequestsAsync>d__185`1.MoveNext()",
   "ConnectionId":"0HLLLR1J974L9",
   "message":"Malformed request: invalid headers.",
   "EventId":{
      "Id":17,
      "Name":"ConnectionBadRequest"
   },
   "SourceContext":"Microsoft.AspNetCore.Server.Kestrel",
   "ThreadId":71
}
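
The same check can be made without telnet; here is a sketch with curl (the path and header value are taken from the examples above; any non-ASCII byte in a header value triggers the same 400):

curl -v http://localhost:80/back/user \
  -H 'Host: api.app.example.com' \
  -H 'Cookie: test=Désirée'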

Results

Specifically, Kestrel cannot correctly handle HTTP headers containing characters that are perfectly valid UTF-8 but fall outside ASCII, and such characters appear in the names of quite a few cities.

An additional complicating factor in our case is that the client currently has no plans to swap Kestrel for another implementation in the application. However, issues in AspNetCore itself (#4318, #7707) suggest that this would not help anyway…

To sum up: this note is not really about the specifics of Kestrel or UTF-8 (in 2019?!), but about the fact that attentiveness and a consistent, step-by-step study of the problem at hand will sooner or later bear fruit. Good luck!


Source: habr.com
