How complex is the modern internet anyway?
So, this popped into my twitter feed, and it explored an interesting concept. From start to finish what really happens when the user types "google.com" into their URL bar? On the surface for tech people we can say it's fairly simple talking about TLS and DNS but let's dig shall we? Let's go through every step of the OSI model (Or TCP / IP Model).
As the tweet makes it clear, I don't think anyone actually can walk you through from start to finish what happens on a deeply technical level, but I have to admit the challenge of going from the user hitting the letter G on the keyboard all the way to hitting enter and having the web page render is a sort of interesting thing to break down.
For the sake of simplicity I'm going to lay out a few ground rules:
- The user is using a mechanical Keyboard
- The keyboard is connected to the computer via USB
- The computer is a Dell Optiplex 7010 with a USB Wireless adapter running Windows 10
- You have an Acer 24" LCD monitor connected via an HDMI Cable
- The user is using the Chrome Web browser on the latest build
- This is a home network and there is a commercial wireless router between the user and their ISP
- The user lives in Chattanooga, TN
So let's start simple: With they keyboard
So at it's core a keyboard is basically a small computer. Most modern keyboards have their own little processor and RAM to help them accomplish their goal. While the exact method for checking which key was pushed differs between mechanical and membrane keyboards the effective method is basically the same. When you press a key a circuit is completed and the processor in the keyboard correlates what circuit was completed with the key mapping it has.
Then depending on how your keyboard is connected to the computer (USB / PS-2 / IR) that data is transmitted and interpreted by a driver on the operating system (Windows, Linux, Mac). That driver then interacts with the kernel. But wait let's dive on into that shall we?
In this case, the driver in question handling the input via USB devices is HID.dll. So what is a DLL anyway? DLL's are actually just a collection of functions, in fact a .dll could in theory in some cases act as it's own executable. But in this case it is just a gathering of various functions that executables call. HID.dll is sort of a catch all for any Human Interface Device (HID). HID.dll then interacts with HIDServ.dll.
So HidServ is our last step in what's called the "user" space. It then transitions into direct native API calls to ntdll.dll which puts us firmly into what's called the "Kernel" space of the operating system. The kernel in the broadest possible terms is the direct interface between the operating system which is in memory and the actual hardware of the device. This is what converts
which in turn becomes
mov rax, 1
mov rdi, 1
mov rsi, messagemov rdx, 13syscallmov rax, 60xor rdi, rdisyscallsection .datadb "Hello World", 10
Into the appropriate series of electrical impulses on the processor to actually perform that function. What's with the assembly and shoddy python you ask? Well we're going to get to that at some point but for now it's important to understand that your "g" has been passed to the kernel. Now the kernel knows where to hand this off too because the browser had been making use of it's own series of .dll's to collect information.
OS Event Hanndlers
This is then passed back out through the graphics card and out the HDMI cable attached to it to the monitor which has it's own processor and memory. It converts the data passed along the HDMI cable into something it can make sense of and then begins to manipulate a series of nematic liquid crystals placed between a blue, green, and red light source. By twisting the liquid crystal in the correct fashion the appropriate color is displayed on the screen.
Congratulations you have the letter g.
But that certainly isn't the end of things, not even for just the letter G. You see chrome likes to keep itself busy, and even more so it prides itself on trying to predict what you're typing. In order to do this as you type a letter into that "Omnibar" as chrome calls it, the process will hook into the tcpip.sys file in order to start making TCP / IP calls.
TCP / IP Stack
Here's where we start getting into the networking portion of things. tcpip.sys is a windows native function that passes your request down through what's called a driver stack. In this case it passes your request to an NDIS miniport driver. Said driver then converts it into the appropriate series of electrical impulses to be passed along.
Inside of these frequency's there are these things called channels. You see, when they say 2.4ghz, what they actually mean is 2412mhz all the way to 2484mhz. In that range there are 12 different channels to make use of (the first channel is 2401 - 2423mhz). Various implementations of wireless determine how many of these channels (or which frequency) is used.
That letter G's path through life is exceedingly similar with a few minor differences. The first is that before it connects to 1e100.net it first ask to make what's called a TLS handshake. This is so that if anyone along that path happens to be listening in they will see nothing more then encryption. The TLS handshake is a bit of asymmetric encryption that uses "Diffie-Hellman" in order to generate the public key on either side.
To keep things exceedingly simple Diffie-Helman uses some math around prime numbers in order to come up with the private key. The Prime number is what's agreed upon in that first message and because both computers know that they can effectively decrypt the rest. As Diffie-Helman is actually symetric encryption, this is when your computer shares it's public key for the real TLS handshake.
See your computer didn't want the rest of the world to know what it's public key was, only this server. So it spins up a new private / public key pair just for this little session. The receiving web server gets this information and then provides its public key. Again, in order to really dig into the meet on this one we'd need to understand how the public / private key pair works, but for simplicity sake we just need to know that the first step of the handshake is to come up with that pre shared key via Diffie Helman and then begin the asymmetric key exchange so they can keep things private.
Now that it's all encrypted and secured the information of possible things you want to type is passed back to your computer, interpreted by chrome and presented to the user.
Alright, so NOW you select google.com from the list presented to you. We're going to do exactly the same thing of calling that tcpip.sys file which calls the NDIS wrapper which in turn passes the information to a miniport driver which wraps up the information as a frame and then transmits it via an electromagnetic wave for your router to listen for. Then your router unwraps the frame sees the destination IP for google.com which it got after making a DNS request just seconds before from 126.96.36.199). The Diffie Helman handshake occurs and then an asymmetric encryption is performed by both parties to keep the information encrypted using a public / private key pair.
THENNNNN google.com sends back this:
When google.com is called, your computer gets a WHOLE NEW list of things to look up, and so with each the very same thing is performed where a DNS request, HTML request, etc. So your computer is gathering all this information and then chrome translates that into something that's actually viewable.
And I haven't covered the server side of this at all! There's so much more to that. For another good write up that digs a lot more into specific listener events check out this