The Ingress Venn
This is intended to give a brief overview of how to serve and consume traffic safely. Most of these components are abstracted away from you, so you’ll seldom get the full picture without a deep dive here or there.
Components of Ingress
On the public internet, there are three main problems when getting traffic to your site or service. They are a) figuring out where on the internet that service is, b) ensuring traffic to/from the service was not tampered with, and c) verifying that you’re actually talking to the correct service. Otherwise put:
- Discovering and reaching a service
- Establishing a secure connection to the service
- Service identity verification
Discovering and reaching a service
How do we know where our service lives?
There are a few solutions to this; a pretty common one is DNS. The Domain Name System is a distributed key-value store which maps domain names to IP addresses. There are different kinds of records, but loosely speaking they all serve this same goal. dig is a command line utility for DNS lookups; here it is used to show which IP addresses we should expect to find the service dns.google.com at.
$ dig dns.google.com
; <<>> DiG 9.18.28-1~deb12u2-Debian <<>> dns.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3525
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;dns.google.com. IN A
;; ANSWER SECTION:
dns.google.com. 647 IN A 8.8.4.4
dns.google.com. 647 IN A 8.8.8.8
;; Query time: 12 msec
;; SERVER: 192.168.1.1#53(192.168.1.1) (UDP)
;; WHEN: Tue Oct 29 08:39:56 MDT 2024
;; MSG SIZE rcvd: 75
Here you can see that dns.google.com resolves to both 8.8.8.8 and 8.8.4.4.
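If you would rather do the lookup from a program, here is a minimal sketch using Python’s standard library. The helper name resolve_ipv4 is made up for this post, and note that it asks the operating system’s resolver (which also consults things like /etc/hosts) instead of querying a DNS server directly the way dig does.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the unique IPv4 addresses the system resolver reports for hostname.

    resolve_ipv4 is a hypothetical helper name, not part of any library.
    """
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({sockaddr[0] for *_rest, sockaddr in results})
```

With network access, resolve_ipv4("dns.google.com") should list the same kind of A records that dig printed above, though the exact addresses depend on when and where you ask.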
DNS servers propagate records to each other. Each domain name registrar has a unique way of configuring records. It is bespoke and inane, but at least consuming them meets a spec.
Establishing a secure connection to the service
Great, so we know where the service lives. How do we connect to it securely? The internet is a dangerous place. With data moving about all willy-nilly, anyone can inspect or (heaven forbid) change the data in your requests and responses (gasp!).
Side Quest: understanding HTTP and TCP
So there’s this thing called the OSI model. I’ll let you go read about it, or not. The two layers we’re primarily concerned with are Application and Transport. HTTP requests are made at the Application Layer, but moving the data happens at the Transport Layer.
Any time there’s something that feels Computer Science-y called a “model” then you can assuredly trust that it is closer to documentation than compiled code, and therefore less accurate than the Real Things You Interact With. Trust the landscape, not the map.
Requests made to http://example.com and https://example.com are both over HTTP, but are very different. That s is doing a lot of work and we need to understand where. While theoretical lectures are great, I prefer to write some programs to verify what’s happening as we explore it.
For this side-quest two commands, curl and netcat (nc), will be our friends. First let’s use Python’s built-in http.server module to host a directory as an HTTP server (echo 'hi there' > index.html && python -m http.server). This should be available on localhost:8000.
# first curl to see it works
$ curl localhost:8000
hi there
# second curl with the -i flag to get headers as well
$ curl -i localhost:8000
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.10.8
Date: Tue, 29 Oct 2024 15:15:51 GMT
Content-type: text/html
Content-Length: 9
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT
hi there
That right there is a valid HTTP Response. Curl sent a request, and this output shows both the HTTP Response Headers and Response Body, separated by a blank line (two consecutive line breaks). Here’s a pretty picture from wikipedia.
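To make the “headers, blank line, body” shape concrete, here is a small sketch that splits a raw response apart by hand. The function name split_http_response and the example bytes are made up for illustration (modelled on the curl output above), and this is nowhere near a complete HTTP parser.

```python
def split_http_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status_line, headers, body).

    A hypothetical helper for illustration; real parsers handle far more cases.
    """
    # The head and the body are separated by the first blank line (CRLF CRLF).
    head, _separator, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Example bytes shaped like the response above (trimmed to two headers).
example = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"hi there\n"
)
status, headers, body = split_http_response(example)
print(status)                     # HTTP/1.0 200 OK
print(headers["content-length"])  # 9
print(body)                       # b'hi there\n'
```

Notice that Content-Length counts the trailing newline too: “hi there” plus a newline is 9 bytes.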
Let’s tuck that in our back pocket and try to make use of it later.
# third curl, again with the -i flag, this time saving the entire response (headers and body) to a file
$ curl -i localhost:8000 > valid-http.txt
So, if the Transport Layer just reads and writes bits and bytes, and we have all of the bits and bytes which make up a valid Application Layer response, can we just run a TCP server which sends the bytes of that file across the wire each time a request comes in? Let’s try with nc (netcat).
# terminal a
while true; do
nc -l 8001 < valid-http.txt
done
# terminal b
$ curl localhost:8001
hi there
$ curl -i localhost:8001
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.10.8
Date: Tue, 29 Oct 2024 15:17:37 GMT
Content-type: text/html
Content-Length: 9
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT
hi there
Sure can!
Mini recap: we made an HTTP server, served a file, saved the entire response (including HTTP response headers) in a file, then made a TCP server serve that same response over TCP. And curl bought it! The old switcheroo.
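The same switcheroo can be sketched in Python, in case nc behaves differently on your machine. Everything named here (CANNED_RESPONSE, start_canned_server, fetch_raw) is invented for this example, and the server deliberately answers a single connection and then exits, unlike the while loop above.

```python
import socket
import threading

# Bytes shaped like the saved response; a stand-in for valid-http.txt.
CANNED_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"hi there\n"
)


def start_canned_server(payload: bytes, host: str = "127.0.0.1", port: int = 0):
    """Answer exactly one TCP connection with payload, byte for byte.

    port=0 asks the OS for any free port; the chosen port is returned.
    """
    listener = socket.create_server((host, port))
    bound_port = listener.getsockname()[1]

    def handle_one():
        with listener:
            conn, _addr = listener.accept()
            with conn:
                conn.recv(65536)       # read whatever the client sent, then ignore it
                conn.sendall(payload)  # no HTTP logic at all, just bytes

    thread = threading.Thread(target=handle_one, daemon=True)
    thread.start()
    return bound_port, thread


def fetch_raw(host: str, port: int) -> bytes:
    """Send a bare-bones GET over a plain TCP socket and return every byte received."""
    request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(request)
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling start_canned_server(CANNED_RESPONSE) and then pointing fetch_raw (or curl) at the returned port should hand back the canned bytes untouched, which is the whole point: the socket never looks at what it is carrying.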
Great, so what the hell am I going on about?
This tells me that while the Application Layer can be as picky as it wants, the Transport Layer doesn’t care what data it moves, just that it is its job to move it. You can see the timestamps are from the original response; the TCP server doesn’t care, it just sends and receives data.
If it is the case that the Transport Layer is Application Layer agnostic, it would make a ton more sense to encrypt traffic at the Transport Layer. It could work for any Application Layer! At least that’s what I would do.
Whew, with our side quest over, let’s return to the main quest: actually encrypting traffic.
Main Quest: actually encrypting traffic
One last friend (openssl) is here to help. It can create both keys and certificates! Plus, it has an nc equivalent too (s_server), which helps serve and test encrypted traffic using the key and certificate we generate. Perfect!
# make a key and a self-signed certificate
$ openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.crt -days 365 -nodes -subj '/CN=localhost'
# serve encrypted traffic
$ while true; do
openssl s_server -accept 8002 -cert server.crt -key server.key -quiet < valid-http.txt
done
And we can finally hit our local service with an https prefix instead of an http one:
$ curl --insecure https://localhost:8002
hi there
If it feels like we’re in the weeds it is because we are. But the important thing is that:
- the example servers encrypt at the transport layer, on top of TCP (layer 4)
- the example clients still just need a valid HTTP response
Some form of security can occur at one layer (encryption) while the other is largely unaffected.
Time to play with wireshark, specifically its terminal counterpart tshark. We’re going to capture a single request to each of the services we now have running:
- http://localhost:8000, an HTTP server (via python)
- http://localhost:8001, a TCP server (via nc)
- https://localhost:8002, a TLS server (via openssl)
The commands I’ll run are:
curl http://localhost:8000
curl http://localhost:8001
curl --insecure https://localhost:8002
Before each, in another terminal (where X is 0,1,2, respectively):
tshark -i lo -f "port 800X" -d "tcp.port==800X,tls" -x -w capture800X.pcap
This tshark command captures all network traffic for that port and stuffs it in a file. So what’s in the file!?
Here are excerpts from each:
From capture8000.txt
*llNgblJE<@@<@d~æL0
**ldNlBE4@@Cd@æL߀(
**dNlE@@Cd@æL߀v
**GET / HTTP/1.1
Host: localhost:8000
User-Agent: curl/7.88.1
Accept: */*
dN1!mBE4jY@@h@d(
**dN5EjZ@@ѯ@d
**HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.10.8
Date: Tue, 29 Oct 2024 17:42:02 GMT
Content-type: text/html
Content-Length: 9
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT
dNcSyBE4@@Cd@æ7(
**dlNzyKE=j[@@]@d7æ1
**hi there
ldNjAzBE4@@Cd@æ@(
**ddN'zBE4j\@@e@d@æ(
**ddN;{BE4@@Cd@æA(
**ddN{{BE4j]@@d@dAæ(
Both the request and response!
From capture8001.txt:
UFllQKfJE<@@<AmMi0
UFUFldQsfBE4U^@@cAimN(
UFUFdQ7EU_@@AimNv
UFUFGET / HTTP/1.1
Host: localhost:8001
User-Agent: curl/7.88.1
Accept: */*
dQgBE4@@AmNi(
UFUFd$QE@@AmNi
UFUFHTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.10.8
Date: Tue, 29 Oct 2024 15:17:37 GMT
Content-type: text/html
Content-Length: 9
Last-Modified: Tue, 29 Oct 2024 15:12:26 GMT
hi there
$dQgBE4U`@@aAim(
UFUFddQ<NhBE4Ua@@`Aim(
UFUFddQhBE4@@Ami(
UFUFddQ?hBE4Ub@@_Aim(
The request and response! (Again, in plaintext.)
And finally, from capture8002.txt:
dVllasJE<@@<B\KˏD|0
dVdVlda
tBE4=@@\BD|K̀(
dVdVdhaTGE9=@@\BD|K̀.
dWdV~8
gg
6=[>,0̨̩̪+/$(k#'g AȈ]'R:yG6f
9 3=<5/u
localhost
*(ttp/1.11
+ -3&$ L]|[96ì.,vUwhdaFBE4a@@B\K̏D(
dWdWd a@@JB\K̏D
`BT$Έ85V˹m-l˟!W$bT#n8j.k&fE>iN7/_5[v-ƑDN}F
l1g
n"wXjQ-g}OmLBE-DwD
}(NŁ52yp~ծw\X2aCaPf3+C!8*C*-D~'\m/ZW'p!MLFPVT!yPVaŋѲ8 s
QKM/dJQmt K,Aoz^0xeЯS_.ӗL1P)a5z.k<^~8R7cʘfOvv50AY5VP}-pCK [@љć#0pFi=vݨD`
)E+wEJFL To70Gyb09,>
{fK34ZF)xG1!JA
O6%B_3edu\Q<8ے/^Ǖ&)'N̓Tz Au5D8BZX} e֭O:c/:S{{ iy2LS:{WT'[JH(j*;>x6iev~G]~Eo[sOZZC]7Z4H=SBTeK<tVYj8J<%Ԝ!/Qh
YM!yc4Х+X>FrjMSxTюɎ
Ak=9F7gjQ$IB3_엺_L?ǫ+]ӿ櫯SUg.+0\KGsܔ0|[ .\Ay7 *~l`9dvvګgN)\`0@z.QfO/ß~JO#!l!aS&܀^OY?B<?\j9>?Зrx݉c~j3&mo@Tkd=;9o;%'{zeVTZo_]{f@]dn叟lCLen
5'Rd.NxYvG<ϞƿT$\{#$m`Kή\2]|.AHh .*
`<bb``^
HXXfXr[_.{qay778m daКBE4=@@\BDTy(K5]Fʋ~Zr/zw%^zEn
d_d_daE=@@z\BDTyx
d_d_E\ymVy+*IA NjՂ{UB4zt/I>cЅq΅:]zC|aE=@@e\BDTy
d_d__l\G<>;Uu1vT$^FK)H}5bjݥL[s{]7N $RW60daAE3a@@B\TyD5'
d_d_RSL 8G`pV%zUm
\/z`O,W 4r8I?%M
Z:jri/nP֦NXM` g(\~)CZ3=+_}qm"6!er?z77jՂ = \G%tUz,?GP[=\"(уa|<}+`Կ<F\i:?\
U`&dda AE3a@@B\UxD5'
d_d_nm`WPKE/ݪO2c\q"5bNZ0'Gxy#!oߖG<^VOr,DqC^"VNuE8)nyUδƦtm;
H`]x C.apJN`Ԏ1!RG*%ȹHqG
=+56cgӢ03:S*Ii2FQ;Άb 5$+eanddaT BE4=@@\BD5Vw(
d_d_d<a| E
a@@B\VwD5
T2p$$SMf/g+R{c#0Zj d(?p
Qz*VcY;k`xhoNWIִQS|'6j#.3@ŢhbCIpuU[>}o\pjlqOۛ̏)+)"advS5lkhaW,<da
BE4a@@B\WND5(****
My god, what an absolutely beautiful hot pile of steaming garbage. Everything a heart could desire. What gorgeous gobbledegook. Except... hey now, it says localhost in plaintext! Indeed, it does. Everything in the request is encrypted except the hostname, which the client sends in the clear at the start of the TLS handshake (this is called Server Name Indication). Traffic is encrypted because the internet is full of conniving individuals, but whoever receives your packets still needs to know which site you’re asking for.
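You can see that plaintext hostname without tshark, too. The sketch below asks Python’s ssl module to begin a handshake into an in-memory buffer instead of a socket, then looks at the very first bytes a client would put on the wire. The function name client_hello_bytes is made up for this post; no server or network is involved.

```python
import ssl


def client_hello_bytes(server_hostname: str) -> bytes:
    """Return the first bytes a TLS client would send when connecting to server_hostname.

    client_hello_bytes is a hypothetical helper; it never touches the network.
    """
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # bytes "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()  # bytes the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its opening message and is now
        # waiting for a reply that will never arrive.
        pass
    return outgoing.read()


hello = client_hello_bytes("localhost")
print(b"localhost" in hello)  # True: the hostname travels unencrypted
```

Nothing has been encrypted yet at this point in the handshake, since the two sides have not agreed on keys, which is why the name is readable in the capture.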
Q: Golly, you sure glossed over that --insecure flag, what was that all about?
A: Sure did! And might I say what an excellent question, astute reader who puts up with my antics! That flag tells curl “while the content may be encrypted, the certificate won’t trace back to a root certificate authority, and that’s OK.” And boy, if you’re asking this kind of question you’ll probably be very interested in this next section: Service Identity Verification
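For comparison, here is roughly what that trade-off looks like from a Python client. This is a sketch of the two client configurations, not what curl does internally; the function names are invented for this post.

```python
import ssl


def verifying_context() -> ssl.SSLContext:
    """The default: require a certificate that chains to a trusted root
    and matches the hostname being requested."""
    return ssl.create_default_context()


def insecure_context() -> ssl.SSLContext:
    """Roughly the spirit of curl --insecure: still encrypt, but skip identity checks."""
    context = ssl.create_default_context()
    # Order matters: hostname checking has to be switched off first,
    # otherwise setting CERT_NONE raises a ValueError.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


print(verifying_context().verify_mode == ssl.CERT_REQUIRED)  # True
print(insecure_context().verify_mode == ssl.CERT_NONE)       # True
```

Either context will happily encrypt traffic. Only the first one has any opinion about who is on the other end.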
Service Identity Verification
What was that about certs again?
Here’s another problem when it comes to safely passing traffic across the internet: If all the servers in the middle (OSI Layers 1-3, safely ignore them for now) can send traffic to whoever they like, how do we know that the server responding to your request is the correct one?
And the answer is certificates. You can use DNS to direct traffic to any IP address, and anyone can make a key to encrypt the traffic, but someone somewhere that we all agree to trust will sign a certificate saying “that there’s the guy.” Running a public site without a signed certificate is about as professional as a 5-year-old wearing a badge that says “Sheriff.” I’ll bet you are, kid.
So how does it work? There’s a wonderful protocol called ACME. This is not the organization which provides dynamite and anvils that reliably work for anyone besides Wile E. Coyote. ACME stands for Automatic Certificate Management Environment.
There’s a bunch of fancy words, so we’ll start with a dictionary:
- challenge - verification of some claim, like “this is my website”
- entity - person, e.g. you
- identifier - this can be anything, but is typically a domain name, e.g. foo.example.com
There’s a lovely RFC describing the process, but at the core is a basic assumption:
[…] the entity must both:
- Hold the private key of the account key pair used to respond to the challenge, and
- Control the identifier in question
Which, using our fancy dictionary from before can be translated to “you both hold the key and control the domain.”
So, if anyone can create and sign a certificate, how do we know which signatures to trust? Well, baked into most operating systems is a list of trusted Root Certificate Authorities. With a clever use of chaining together signatures, you can verify whether any certificate is trustworthy or not. The ACME protocol covers how you get a certificate worth trusting in the first place: by somehow showing you control a domain, they (those blessed by the Root Certificate Authority) will sign your certificate.
Certificate signing looks like this:
- request a challenge from a certificate authority
- somehow fulfill that challenge
- ask the CA to check whether you have fulfilled the challenge
- the CA gives you a gold star (ahem, a certificate)
With both a key and that certificate, you can serve encrypted traffic and it can be trusted.
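The four steps above can be acted out with a toy. To be loud about it: this is not ACME. ToyCA, its methods, and the published dict are all invented for this post, a plain dict stands in for “the web,” and the account key pair (the “hold the private key” half of the RFC’s assumption) is left out entirely. It only shows the shape of proving you control an identifier.

```python
import secrets


class ToyCA:
    """A pretend certificate authority. Hypothetical; real ACME is far more involved."""

    def __init__(self, fetch):
        # fetch(identifier) stands in for the CA visiting the domain and
        # reading back whatever its controller has published there.
        self._fetch = fetch
        self._pending = {}  # identifier -> token the CA expects to find

    def request_challenge(self, identifier: str) -> str:
        """Step 1: hand out a random token for this identifier."""
        token = secrets.token_urlsafe(16)
        self._pending[identifier] = token
        return token

    def check_and_issue(self, identifier: str) -> str:
        """Steps 3 and 4: look for the token, and only then 'sign' something."""
        token = self._pending.get(identifier)
        if token is None:
            raise PermissionError(f"no challenge was requested for {identifier}")
        if self._fetch(identifier) != token:
            raise PermissionError(f"challenge not fulfilled for {identifier}")
        del self._pending[identifier]
        return f"toy certificate for {identifier}"


published = {}  # identifier -> content; only the domain's controller can write here
ca = ToyCA(fetch=published.get)

token = ca.request_challenge("foo.example.com")  # 1. request a challenge
published["foo.example.com"] = token             # 2. somehow fulfill it
print(ca.check_and_issue("foo.example.com"))     # 3 + 4. CA checks, then issues
```

Someone who requests a challenge for a domain they cannot publish to never gets past check_and_issue, which is the entire trick.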
Wrapping it up
Ingress sounds fancy, but it technically just means “the in door” or “action of entering.” While I’m typically in the camp of “use simple words instead of fancy ones to sound smart,” the term Ingress in a technical setting holds very meaningful and relevant connotations. Connotations that shouldn’t be dropped by using different words.
From both a server and client perspective, you need to:
- know how to find a service
- securely connect to and interact with that service
- know in a verifiable way that you can trust you’re talking to who you think you are
Each of these main concerns is deeply intertwined with the others. DNS alone isn’t enough. Neither is public key cryptography. Those two, alongside trusted certificate authorities, are what browsers have collectively settled on as good enough for today.