The Ingress Venn

This is intended to give ~~a brief~~ an overview of how to serve and consume traffic safely. Most of these components are abstracted away from you, so you’ll seldom get the full picture without a deep dive here or there.

Components of Ingress

On the public internet, there are three main problems when getting traffic to your site or service. They are a) figuring out where on the internet that service is, b) ensuring traffic to/from the service was not tampered with, and c) verifying that you’re actually talking to the correct service. Otherwise put:

Discovering and reaching a service
Establishing a secure connection to the service
Service identity verification

Discovering and reaching a service

How do we know where our service lives?

There are a few solutions to this; a pretty common one is DNS. The Domain Name System is a distributed key-value store which maps domain names to IP addresses. There are different kinds of records, but loosely speaking they all serve this same goal. dig is a command line utility for DNS lookups; here it is used to show which IP addresses we should expect to find the service dns.google.com at.

$ dig dns.google.com  
  
; <<>> DiG 9.18.28-1~deb12u2-Debian <<>> dns.google.com  
;; global options: +cmd  
;; Got answer:  
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3525  
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1  
  
;; OPT PSEUDOSECTION:  
; EDNS: version: 0, flags:; udp: 512  
;; QUESTION SECTION:  
;dns.google.com.                        IN      A  
  
;; ANSWER SECTION:  
dns.google.com.         647     IN      A       8.8.4.4  
dns.google.com.         647     IN      A       8.8.8.8  
  
;; Query time: 12 msec  
;; SERVER: 192.168.1.1#53(192.168.1.1) (UDP)  
;; WHEN: Tue Oct 29 08:39:56 MDT 2024  
;; MSG SIZE  rcvd: 75

Here you can see that dns.google.com resolves to both 8.8.8.8 and 8.8.4.4.
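If you would rather ask from a program than from dig, here is a minimal sketch using Python's standard socket module. The function name resolve_ipv4 is my own, and it asks whatever resolver your operating system is configured with, so the answers may differ from the dig output above.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses the system resolver reports."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("dns.google.com") on a machine with internet access should give back the same kind of answer dig did: a list of A-record addresses.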

DNS servers propagate records to one another. Each domain name registrar has its own way of configuring records. It is bespoke and inane, but at least consuming them meets a spec.

Establishing a secure connection to the service

Great, so we know where the service lives. How do we connect to it securely? The internet is a dangerous place. With data moving about all willy-nilly, anyone can inspect or (heaven forbid) change the data in your requests and responses (gasp!).

Side Quest: understanding HTTP and TCP

So there’s this thing called the OSI model. I’ll let you go read about it, or not. The two layers we’re primarily concerned with are Application and Transport. HTTP requests are made at the Application Layer, but moving the data happens at the Transport Layer.

Any time there’s something that feels Computer Science-y called a “model,” you can assuredly trust that it is closer to documentation than compiled code, and therefore less accurate than the Real Things You Interact With. Trust the landscape, not the map.

Requests made to http://example.com and https://example.com are both over HTTP, but are very different. That s is doing a lot of work and we need to understand where. While theoretical lectures are great, I prefer to write some programs to verify what’s happening as we explore it.

For this side quest, two commands, curl and netcat (nc), will be our friends. First, let’s use Python’s built-in http.server module to host a directory as an HTTP server (echo 'hi there' > index.html && python -m http.server). This should be available on localhost:8000.

# first curl to see it works
$ curl localhost:8000  
hi there  
# second curl with the -i flag to get headers as well
$ curl -i localhost:8000  
HTTP/1.0 200 OK  
Server: SimpleHTTP/0.6 Python/3.10.8  
Date: Tue, 29 Oct 2024 15:15:51 GMT  
Content-type: text/html  
Content-Length: 9  
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT  
  
hi there

That right there is a valid HTTP Response. Curl sent a request, and this output shows both the HTTP Response Headers and Response Body, separated by a blank line (two consecutive line breaks). Here’s a pretty picture from Wikipedia.

Let’s tuck that in our back pocket and try to make use of it later.

# third curl with the -i flag to get headers as well, we want the entire response
$ curl -i localhost:8000 > valid-http.txt
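To convince myself the saved bytes really are just "headers, blank line, body," here is a small sketch that splits a raw response apart. The function name and the embedded SAMPLE are mine (a trimmed copy of the response above, not the actual file), and it assumes the usual CRLF line endings that HTTP uses on the wire.

```python
def split_http_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into status line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Assumption: a trimmed stand-in for the contents of valid-http.txt.
SAMPLE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: SimpleHTTP/0.6 Python/3.10.8\r\n"
    b"Content-type: text/html\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"hi there\n"
)

status, headers, body = split_http_response(SAMPLE)
```

Note that Content-Length counts the trailing newline that echo added, which is why "hi there" weighs in at 9 bytes.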

So, if the Transport Layer just reads and writes bits and bytes, and we have all of the bits and bytes which satisfy a valid Application Layer response, can we just run a TCP server which sends the bytes of that file across the wire each time a request comes in? Let’s try with nc (netcat).

# terminal a
while true; do
    nc -l 8001 < valid-http.txt
done

# terminal b
$ curl localhost:8001    
hi there  
$ curl -i localhost:8001    
HTTP/1.0 200 OK  
Server: SimpleHTTP/0.6 Python/3.10.8  
Date: Tue, 29 Oct 2024 15:17:37 GMT  
Content-type: text/html  
Content-Length: 9  
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT  
  
hi there

Sure can!

Mini recap: we made an HTTP server, served a file, saved the entire response (including HTTP response headers) in a file, then made a TCP server serve that same response over TCP. And curl bought it! The old switcheroo.
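The same switcheroo can be sketched in Python, if netcat isn't your thing. This is a rough equivalent under my own assumptions, not what nc does internally: a bare TCP socket that ignores the request and replies with canned bytes, plus a real HTTP client (http.client) to play the part of curl. All names here (CANNED_RESPONSE, serve_once, fetch, demo) are made up for illustration.

```python
import http.client
import socket
import threading

# Assumption: a trimmed stand-in for the saved response file.
CANNED_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"hi there\n"
)


def serve_once(listener: socket.socket) -> None:
    """Accept one TCP connection and reply with the canned bytes, whatever was asked."""
    conn, _addr = listener.accept()
    with conn:
        conn.recv(65536)  # read (and ignore) whatever the client sent
        conn.sendall(CANNED_RESPONSE)


def fetch(port: int) -> tuple[int, bytes]:
    """Make a real HTTP GET request against the given local port."""
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        response = client.getresponse()
        return response.status, response.read()
    finally:
        client.close()


def demo() -> tuple[int, bytes]:
    """Run the dumb TCP server in a thread and query it with an HTTP client."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
        worker.start()
        result = fetch(port)
        worker.join(timeout=5)
        return result
```

The server side never parses a single HTTP header; the client is none the wiser.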

Great, so what the hell am I going on about?

This tells me that while the Application Layer can be as picky as it wants, the Transport Layer doesn’t care what data it moves, just that it is its job to move it. You can see the timestamps are from the original response; the TCP server doesn’t care, it just sends and receives data.

If it is the case that the Transport Layer is Application Layer agnostic, it would make a ton more sense to encrypt traffic at the Transport Layer. It could work for any Application Layer! At least that’s what I would do.

Whew, with our side quest over, let’s return to the main quest: actually encrypting traffic.

Main Quest: actually encrypting traffic

One last friend (openssl) is here to help. It can create both keys and certificates! Plus, there’s an nc equivalent too, which helps serve and test the encrypted traffic using the keys we just generated. Perfect!

# make a key and a self-signed certificate
$ openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.crt -days 365 -nodes -subj '/CN=localhost'
# serve encrypted traffic
$ while true; do
   openssl s_server -accept 8002 -cert server.crt -key server.key -quiet < valid-http.txt
done

And we can finally hit our local service with an https prefix instead of an http one:

$ curl --insecure https://localhost:8002
hi there

If it feels like we’re in the weeds it is because we are. But the important thing is that:

the example server encrypts at the transport layer, on top of TCP (layer 4)
the example client still expects to talk to an HTTP server

Some form of security can occur at one layer (encryption) while the other is largely unaffected.

Time to play with wireshark, specifically its terminal counterpart tshark. We’re going to capture a single request to each of the services we now have running:

http://localhost:8000 an HTTP server (via python)
http://localhost:8001 a TCP server (via nc)
https://localhost:8002 a TLS server (via openssl)

The commands I’ll run are:

curl http://localhost:8000
curl http://localhost:8001
curl --insecure https://localhost:8002

Before each, in another terminal (where X is 0,1,2, respectively):

tshark -i lo -f "port 800X" -d "tcp.port==800X,tls" -x -w capture800X.pcap

This tshark command captures all network traffic for that port and stuffs it in a file. So what’s in the file!?

Here are excerpts from each:

From capture8000.txt:

*llNgblJE<@@<@d~æL0  
**ldNlBE4@@Cd@æL߀(  
**dNlE@@Cd@æL߀v  
**GET / HTTP/1.1  
Host: localhost:8000  
User-Agent: curl/7.88.1  
Accept: */*  
  
dN1!mBE4jY@@h@d(  
**dN5EjZ@@ѯ@d  
**HTTP/1.0 200 OK  
Server: SimpleHTTP/0.6 Python/3.10.8  
Date: Tue, 29 Oct 2024 17:42:02 GMT  
Content-type: text/html  
Content-Length: 9  
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT  
  
dNcSyBE4@@Cd@æ7(  
**dlNzyKE=j[@@]@d7æ1  
**hi there  
ldNjAzBE4@@Cd@æ@(  
**ddN'zBE4j\@@e@d@æ(  
**ddN;{BE4@@Cd@æA(  
**ddN{{BE4j]@@d@dAæ(

Both the request and response!

From capture8001.txt:

UFllQKfJE<@@<AmMi0  
UFUFldQsfBE4U^@@cAimN(  
UFUFdQ7EU_@@AimNv  
UFUFGET / HTTP/1.1  
Host: localhost:8001  
User-Agent: curl/7.88.1  
Accept: */*  
  
dQgBE4@@AmNi(  
UFUFd$QE@@AmNi  
UFUFHTTP/1.0 200 OK  
Server: SimpleHTTP/0.6 Python/3.10.8  
Date: Tue, 29 Oct 2024 15:17:37 GMT  
Content-type: text/html  
Content-Length: 9  
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT  
  
hi there  
$dQgBE4U`@@aAim(  
UFUFddQ<NhBE4Ua@@`Aim(  
UFUFddQhBE4@@Ami(  
UFUFddQ?hBE4Ub@@_Aim(

The request and response! (Again, in plaintext.)

And finally, from capture8002.txt:

dVllasJE<@@<B\KˏD|0  
dVdVlda  
      tBE4=@@\BD|K̀(  
dVdVdhaTGE9=@@\BD|K̀.  
dWdV~8  
     gg  
6=[>,0̨̩̪+/$(k#'g AȈ]'R:yG6f  
9       3=<5/u  
               localhost  
  
  
*(ttp/1.11  
  
+      -3&$ L]|[96ì.,vUwhdaFBE4a@@B\K̏D(  
dWdWd  a@@JB\K̏D  
`BT$Έ85V˹m-l˟!W$bT#n8j.k&fE>iN7/_5[v-ƑDN}F  
l1g  
n"wXjQ-g}OmLBE-DwD  
}(NŁ52yp~ծw\X2aCaPf3+C!8*C*-D~'\m/ZW'p!MLFPVT!yPVaŋѲ8   s  
QKM/dJQmt K,Aoz^0xeЯS_.ӗL1P)a5z.k<^~8R7cʘfOvv50AY5VP}-pCK       [@љć#0pFi=vݨD`  
                                     )E+wEJFL To70Gyb09,>  
                                                         {fK34ZF)xG1!JA  
O6%B_3edu\Q<8ے/^Ǖ&)'N̓Tz Au5D8BZX}                                       e֭O:c/:S{{ iy2LS:{WT'[JH(j*;>x6iev~G]~Eo[sOZZC]7Z4H=SBTeK<tVYj8J<%Ԝ!/Qh  
                                YM!yc4Х+X>FrjMSxTюɎ  
Ak=9F7gjQ$IB3_엺_L?ǫ+]ӿ櫯SUg.+0\KGsܔ0|[             .\Ay7       *~l`9dvvګgN)\`0@z.QfO/ß~JO#!l!aS&܀^OY?B<?\j9>?Зrx݉c~j3&mo@Tkd=;9o;%'{zeVTZo_]{f@]dn叟lCLen  
5'Rd.NxYvG<ϞƿT$\{#$m`Kή\2]|.AHh        .*  
                              `<bb``^  
HXXfXr[_.{qay778m       daКBE4=@@\BDTy(K5]Fʋ~Zr/zw%^zEn  
d_d_daE=@@z\BDTyx  
d_d_E\ymVy+*IA  NjՂ{UB4zt/I>cЅq΅:]zC|aE=@@e\BDTy  
d_d__l\G<>;Uu1vT$^FK)H}5bjݥL[s{]7N      $RW60daAE3a@@B\TyD5'  
d_d_RSL 8G`pV%zUm  
                \/z`O,W        4r8I?%M  
                                      Z:jri/nP֦NXM` g(\~)CZ3=+_}qm"6!er?z77jՂ = \G%tUz,?GP[=\"(уa|<}+`Կ<F\i:?\  
                                                                                                             U`&dda    AE3a@@B\UxD5'  
d_d_nm`WPKE/ݪO2c\q"5bNZ0'Gxy#!oߖG<^VOr,DqC^"VNuE8)nyUδƦtm;  
H`]x    C.apJN`Ԏ1!RG*%ȹHqG  
                         =+56cgӢ03:S*Ii2FQ;Άb 5$+eanddaT       BE4=@@\BD5Vw(  
d_d_d<a|       E  
               a@@B\VwD5  
T2p$$SMf/g+R{c#0Zj d(?p  
Qz*VcY;k`xhoNWIִQS|'6j#.3@ŢhbCIpuU[>}o\pjlqOۛ̏)+)"advS5lkhaW,<da    
BE4a@@B\WND5(****

My god, what an absolutely beautiful hot pile of steaming garbage. Everything a heart could desire. What gorgeous gobbledegook. Except... hey now, it says localhost in plaintext! Indeed, it does. Everything in the request is encrypted except the hostname, which the client sends in the clear during the TLS handshake (the Server Name Indication, or SNI, extension). Traffic is encrypted because the internet is full of conniving individuals, but the messengers of your packets do need to know where to send them.
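You don't even need a network to see that hostname leak out. Here is a minimal sketch using Python's ssl module with in-memory buffers instead of a socket: it starts a TLS handshake that nobody answers, then looks at the first bytes the client wanted to send. The function name client_hello_bytes is mine, and I'm assuming a stock Python build where the hostname is sent unencrypted in the opening handshake message.

```python
import ssl


def client_hello_bytes(server_hostname: str) -> bytes:
    """Return the first bytes a TLS client would put on the wire for this hostname."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()  # bytes "from the network" (stays empty here)
    outgoing = ssl.MemoryBIO()  # bytes the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client wrote its opening message and now waits for a reply.
        pass
    return outgoing.read()


hello = client_hello_bytes("localhost")
print(b"localhost" in hello)
```

No keys have been exchanged at this point in the handshake, so there is nothing to encrypt the hostname with yet.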

Q: Golly, you sure glossed over that --insecure flag, what was that all about?

A: Sure did! And might I say what an excellent question, astute reader who puts up with my antics! That flag tells curl “while the content may be encrypted, the certificate won’t trace back to a root certificate authority, and that’s OK.” And boy, if you’re asking this kind of question you’ll probably be very interested in this next section: Service Identity Verification
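For the programmatically inclined, here is roughly what that flag amounts to in Python's ssl module. This is a sketch of the idea, not a claim about how curl implements --insecure; the two function names are mine.

```python
import ssl


def verifying_context() -> ssl.SSLContext:
    """The default: demand a certificate that chains to a trusted root and matches the hostname."""
    return ssl.create_default_context()


def insecure_context() -> ssl.SSLContext:
    """Still encrypts, but accepts any certificate from anyone. Local experiments only."""
    context = ssl.create_default_context()
    # check_hostname has to be switched off before verify_mode may be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```

Either context still gives you an encrypted connection; only the first one tells you anything about who is on the other end.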

Service Identity Verification

What was that about certs again?

Here’s another problem when it comes to safely passing traffic across the internet: If all the servers in the middle (OSI Layers 1-3, safely ignore them for now) can send traffic to whoever they like, how do we know that the server responding to your request is the correct one?

And the answer is certificates. You can use DNS to direct traffic to any IP address, and anyone can make a key to encrypt the traffic, but someone somewhere that we all agree to trust will sign a certificate saying “that there’s the guy.” Running a public site without a signed certificate is about as professional as a 5-year-old wearing a badge that says “Sheriff.” I’ll bet you are, kid.

So how does it work? There’s a wonderful protocol called ACME. This is not the organization whose dynamite and anvils reliably work for anyone besides Wile E. Coyote. ACME stands for Automatic Certificate Management Environment.

There’s a bunch of fancy words, so we’ll start with a dictionary:

challenge - verification of some claim, like “this is my website”
entity - person, e.g. you
identifier - this can be anything, but is typically a domain name, e.g. foo.example.com

There’s a lovely RFC (RFC 8555) describing the process, but at the core is a basic assumption:

[…] the entity must both:

Hold the private key of the account key pair used to respond to the challenge, and

Control the identifier in question

Which, using our fancy dictionary from before, can be translated to “you both hold the key and control the domain.”

So, if anyone can create a key and sign a certificate, how do we know which signatures to trust? Well, baked into most operating systems is a list of trusted Root Certificate Authorities. By chaining signatures together, you can verify whether any certificate traces back to one of them. That’s where the ACME protocol comes in. By somehow showing you control a domain, they (those blessed by a Root Certificate Authority) will sign your certificate.
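The chaining idea is easier to see in a toy. The sketch below is deliberately fake: ToyCert and chain_is_trusted are made-up names, and it only compares issuer and subject names, where a real verifier checks cryptographic signatures, expiry dates, and more. It just shows the shape of "follow the vouching until you hit a root you already trust."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCert:
    subject: str  # who this certificate is about
    issuer: str   # who vouched for (signed) it


def chain_is_trusted(chain: list[ToyCert], trusted_roots: set[str]) -> bool:
    """Walk a leaf-first chain and check it ends at a trusted root (names only, no crypto)."""
    if not chain:
        return False
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject:
            return False  # broken link: the next cert is not the one that vouched
    return chain[-1].issuer in trusted_roots


# Hypothetical certificates and trust store, for illustration only.
leaf = ToyCert(subject="foo.example.com", issuer="Toy Intermediate CA")
intermediate = ToyCert(subject="Toy Intermediate CA", issuer="Toy Root CA")
self_signed = ToyCert(subject="localhost", issuer="localhost")
trusted_roots = {"Toy Root CA"}
```

The self_signed entry is the situation our openssl-generated certificate was in earlier: it vouches for itself, and nobody in the trust store has heard of it.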

Certificate signing looks like this:

request a challenge from a certificate authority
somehow fulfill that challenge
ask the CA to check whether you have fulfilled the challenge
the CA gives you a ~~gold star~~ certificate

With both a key and that certificate, you can serve encrypted traffic and it can be trusted.
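Those four steps can be acted out in a few lines. To be loud about it: this is a toy, not ACME. ToyCA, its methods, and the published dictionary are all invented for illustration, there is no account key or signing, and the "certificate" is just a string. The only thing it keeps from the real thing is the order of events: get a challenge, prove control of the identifier, ask for a check, receive the goods.

```python
import secrets
from typing import Callable, Optional


class ToyCA:
    """Hypothetical stand-in for a certificate authority; not a real ACME server."""

    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        # lookup is how the CA "visits" a domain: it returns the published token, if any.
        self._lookup = lookup
        self._pending: dict[str, str] = {}

    def request_challenge(self, domain: str) -> str:
        """Step 1: hand out a random token the requester must publish under the domain."""
        token = secrets.token_urlsafe(16)
        self._pending[domain] = token
        return token

    def check_and_issue(self, domain: str) -> Optional[str]:
        """Steps 3 and 4: verify the challenge was fulfilled, then issue a toy certificate."""
        expected = self._pending.get(domain)
        if expected is None or self._lookup(domain) != expected:
            return None
        del self._pending[domain]
        return f"toy-certificate-for:{domain}"


published: dict[str, str] = {}  # stands in for "content served from my domain"
ca = ToyCA(published.get)

token = ca.request_challenge("foo.example.com")
too_early = ca.check_and_issue("foo.example.com")  # nothing published yet
published["foo.example.com"] = token               # step 2: fulfill the challenge
certificate = ca.check_and_issue("foo.example.com")
```

Only whoever controls the published dictionary (read: the domain) can make the check pass, which is the "control the identifier" half of the assumption quoted above.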

Wrapping it up

Ingress sounds fancy, but it technically just means “the in door” or “action of entering.” While I’m typically in the camp of “use simple words instead of fancy ones to sound smart,” the term Ingress in a technical setting holds very meaningful and relevant connotations. Connotations that shouldn’t be dropped by using different words.

From both a server and client perspective, you need to:

know how to find a service
securely connect to and interact with that service
know in a verifiable way that you’re talking to who you think you are

Each of these main concerns is deeply intertwined with the others. DNS alone isn’t enough. Neither is public key cryptography. Those two alongside trusted certificate authorities are what browsers have collectively settled on as good enough for today.