So, this popped into my Twitter feed, and it explored an interesting concept: from start to finish, what really happens when the user types "google.com" into their URL bar? On the surface, for tech people, we can say it's fairly simple and talk about TLS and DNS, but let's dig, shall we? Let's go through every step of the OSI model (or TCP / IP model).
There are a lot of concepts to cover here, from the way that browsers render web pages all the way to the way kernel-level drivers handle input from a keyboard. From the basics of NATing an IP address to the routing of packets with GSLB via BGP and all the little hops in between. We could talk at length about the various security measures, about the JavaScript, and about the PKI model behind the TLS certificate chain.
As the tweet makes it clear, I don't think anyone actually can walk you through from start to finish what happens on a deeply technical level, but I have to admit the challenge of going from the user hitting the letter G on the keyboard all the way to hitting enter and having the web page render is a sort of interesting thing to break down.
Ground Rules
For the sake of simplicity I'm going to lay out a few ground rules:
- The user is using a mechanical Keyboard
- The keyboard is connected to the computer via USB
- The computer is a Dell Optiplex 7010 with a USB Wireless adapter running Windows 10
- You have an Acer 24" LCD monitor connected via an HDMI Cable
- The user is using the Chrome Web browser on the latest build
- This is a home network and there is a commercial wireless router between the user and their ISP
- The user lives in Chattanooga, TN
How complex is this process? Well, for every part of this blog post I'm basically going to have to consult someone who has more knowledge in the area than me.
The Keyboard
So let's start simple: with the keyboard.
So at its core a keyboard is basically a small computer. Most modern keyboards have their own little processor and RAM to help them accomplish their goal. While the exact method for checking which key was pushed differs between mechanical and membrane keyboards, the effective method is basically the same. When you press a key, a circuit is completed, and the processor in the keyboard correlates which circuit was completed with the key mapping it has.
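To make that "which circuit was completed" lookup a little more concrete, here is a minimal Python sketch of a key-matrix scan. The 2x3 layout, the KEYMAP table, and the scan_matrix helper are all made up for illustration; real keyboard firmware is usually written in C and also has to deal with things like debouncing.

```python
# Hypothetical 2x3 key matrix: each (row, column) crossing is one switch.
KEYMAP = [
    ["f", "g", "h"],
    ["v", "b", "n"],
]

def scan_matrix(closed_circuits):
    """Return the keys whose (row, col) circuit is currently completed."""
    pressed = []
    for row in range(len(KEYMAP)):
        for col in range(len(KEYMAP[row])):
            if (row, col) in closed_circuits:
                pressed.append(KEYMAP[row][col])
    return pressed

# Pressing the switch at row 0, column 1 completes that circuit.
print(scan_matrix({(0, 1)}))  # ['g']
```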
Then, depending on how your keyboard is connected to the computer (USB / PS/2 / IR), that data is transmitted and interpreted by a driver on the operating system (Windows, Linux, Mac). That driver then interacts with the kernel. But wait, let's dive on into that, shall we?
In this case, the driver in question handling the input via USB devices is HID.dll. So what is a DLL anyway? DLLs are actually just a collection of functions; in fact a .dll could in theory, in some cases, act as its own executable. But in this case it is just a gathering of various functions that executables call. HID.dll is sort of a catch-all for any Human Interface Device (HID). HID.dll then interacts with HIDServ.dll.
So HidServ is our last step in what's called the "user" space. It then transitions into direct native API calls through ntdll.dll, which is what carries us over into what's called the "kernel" space of the operating system. The kernel, in the broadest possible terms, is the direct interface between the operating system, which is in memory, and the actual hardware of the device. This is what converts
print("Hello World")
which in turn becomes
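section .text
global _start
_start:
mov rax, 1          ; syscall number for write (Linux x86-64)
mov rdi, 1          ; file descriptor 1 is stdout
mov rsi, message    ; address of the string to print
mov rdx, 12         ; number of bytes: "Hello World" plus a newline
syscall
mov rax, 60         ; syscall number for exit
xor rdi, rdi        ; exit code 0
syscall
section .data
message: db "Hello World", 10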
into the appropriate series of electrical impulses on the processor to actually perform that function. What's with the assembly and shoddy Python, you ask? Well, we're going to get to that at some point, but for now it's important to understand that your "g" has been passed to the kernel. Now the kernel knows where to hand this off to because the browser has been making use of its own series of DLLs to collect information.
OS Event Handlers
Those DLLs are the system event handlers. Much like JavaScript in a web browser may listen for click events, so too does anything that needs to take any sort of input. This is handled at a system level as an interrupt through what's called an Interrupt Descriptor Table (IDT). This is a newer approach that supersedes the previous approach of simply watching various Input / Output (I/O) ports. Running processes will register a thread looking for interrupts (in our case, input).
Monitor
This is then passed back out through the graphics card and out the HDMI cable attached to it to the monitor, which has its own processor and memory. It converts the data passed along the HDMI cable into something it can make sense of and then begins to manipulate a series of nematic liquid crystals sitting between a backlight and a grid of red, green, and blue color filters. By twisting the liquid crystal in the correct fashion, the right amount of light gets through each filter and the appropriate color is displayed on the screen.
Congratulations you have the letter g.
Chrome Precaching
But that certainly isn't the end of things, not even for just the letter G. You see, Chrome likes to keep itself busy, and even more so it prides itself on trying to predict what you're typing. In order to do this, as you type a letter into that "Omnibox", as Chrome calls it, the process will hook into the tcpip.sys driver in order to start making TCP / IP calls.
TCP / IP Stack
Here's where we start getting into the networking portion of things. tcpip.sys is a native Windows driver that passes your request down through what's called a driver stack. In this case it passes your request to an NDIS miniport driver. Said driver then converts it into the appropriate series of electrical impulses to be passed along.
Wireless
As we are using wireless in this scenario, we get to talk about all that is involved there. At its core, a wireless signal is a series of electromagnetic waves that pass through the air. The ones and zeros are encoded as changes in that electromagnetic wave (its peaks and valleys, loosely speaking). To keep things from getting too crazy, that electromagnetic wave is limited to a certain set of frequencies (5 GHz and 2.4 GHz).
Inside of these frequencies there are these things called channels. You see, when they say 2.4 GHz, what they actually mean is channels centered anywhere from 2412 MHz all the way to 2484 MHz. In that range there are up to 14 different channels to make use of, depending on the country (11 in the US), and the first channel covers 2401 - 2423 MHz. Various implementations of wireless determine how many of these channels (or which frequencies) are used.
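The channel math is simple enough to sketch in a few lines of Python. This assumes the classic 22 MHz wide channels and the standard 802.11 numbering of channels 1 through 14 (with channel 14 as an oddball); the function name is just something I made up for the example.

```python
def channel_range_mhz(channel):
    """Return (low, center, high) in MHz for a 2.4 GHz Wi-Fi channel."""
    if not 1 <= channel <= 14:
        raise ValueError("2.4 GHz channels are numbered 1 to 14")
    # Channels 1-13 sit 5 MHz apart; channel 14 is a special case.
    center = 2484 if channel == 14 else 2407 + 5 * channel
    half_width = 11  # assumes a 22 MHz wide channel
    return (center - half_width, center, center + half_width)

print(channel_range_mhz(1))  # (2401, 2412, 2423)
print(channel_range_mhz(6))  # (2426, 2437, 2448)
```

Because neighboring channels are only 5 MHz apart but each one is 22 MHz wide, most of them overlap, which is why channels 1, 6, and 11 are the usual picks.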
Your wireless card will generate these electromagnetic waves, broadcast them into the air, and then await a response. The wireless router in your house is doing the exact same thing: waiting for an electromagnetic wave, and then doing the appropriate actions to send along a response. Now, to make sure it's your computer and not some impostor, when your computer and the wireless router first talked they made use of what's called a Pre-Shared Key (PSK).
WPA2
The point of this was to use that Pre-Shared Key in combination with something called Counter Mode Cipher Block Chaining Message Authentication Code Protocol (CCMP) to come up with an encryption key (in theory; at this time WPA2 has proven vulnerable to a few things and as such it is being deprecated). That encryption key is then used to encrypt the messages with AES (an encryption standard). As both the computer and the router know the Pre-Shared Key, the method being used, and the encryption protocols involved, they can thus pass the message back and forth securely (again, in theory; these days that's actually possible to break).
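One piece of this is easy to show in code. In WPA2-Personal the passphrase you type into the router isn't used directly; it gets stretched into a 256-bit Pre-Shared Key using PBKDF2 with HMAC-SHA1, the network name (SSID) as the salt, and 4096 iterations. Here's a minimal sketch using Python's standard library. The passphrase and SSID are made up, and this only covers the key derivation, not the handshake or CCMP itself.

```python
import hashlib

def wpa2_psk(passphrase, ssid):
    """Derive the 256-bit WPA2-Personal pre-shared key from a passphrase."""
    return hashlib.pbkdf2_hmac(
        "sha1",                      # WPA2 uses HMAC-SHA1 here
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),        # the network name acts as the salt
        4096,                        # iteration count
        32,                          # 32 bytes = 256 bits
    )

# Hypothetical passphrase and network name.
psk = wpa2_psk("correct horse battery", "HomeNetwork")
print(psk.hex())
```

Since the SSID is the salt, the same passphrase on two differently named networks gives two different keys.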
Frames
So your router gets this "frame", which has a few things. First, the MAC address of the device that sent it. This is an address tied to the physical network card on your computer that uniquely identifies it (there are such things as MAC collisions, but for the sake of simplicity let's ignore that). A MAC address is written as six pairs of hexadecimal digits (something like 00:1a:2b:3c:4d:5e).
The second part of the frame is the destination MAC address. This is so that if, say, your computer just wanted to send the frame to another machine on the network, the wireless router would know to just send it there instead of looking at the packet itself. However, as the router is the destination MAC here, it needs to look at the next set of things in the frame. In this case it's called a packet.
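Here's a rough sketch of what "looking at the frame" means in practice. To keep it simple this assumes the wired Ethernet II layout (destination MAC, then source MAC, then a two-byte type field); actual Wi-Fi frames carry the same addresses but in a more involved header. The sample bytes and function names are made up.

```python
import struct

def format_mac(raw):
    """Turn 6 raw bytes into the familiar colon-separated hex form."""
    return ":".join(f"{byte:02x}" for byte in raw)

def parse_frame_header(frame):
    """Split the first 14 bytes of an Ethernet II frame into its fields."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dst), format_mac(src), ethertype

# Hypothetical frame: destination MAC, source MAC, type 0x0800 (IPv4), payload.
frame = bytes.fromhex("aabbccddeeff1122334455660800") + b"payload"
print(parse_frame_header(frame))
# ('aa:bb:cc:dd:ee:ff', '11:22:33:44:55:66', 2048)
```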
Packets
Packets have more than a few key items. There's how long the packet is in terms of bytes and what the TTL (time to live) is, which tells the routers along the path how many hops the packet is willing to make before it's best to just give up (great for detecting loops, by the way). There are the source and destination IP addresses, each a series of 4 "octets" of binary data that help uniquely identify what network this packet should travel to, and then a source and destination port.
That source port is usually something random on the host machine, called an "ephemeral" port; it's chosen by the host when it establishes the connection. The destination port, however, is usually chosen based on a set of standards for the different types of message you're trying to send. As we are trying to send over HTTPS, we're going to use port 443.
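To show where those fields actually live, here is a small Python sketch that pulls them out of raw bytes. It assumes a plain IPv4 header followed by TCP (strictly speaking the ports sit in the TCP header that comes right after the IP header). The sample packet is hand-built for the example, with a made-up ephemeral port, and isn't captured traffic.

```python
import socket
import struct

def parse_ipv4_packet(packet):
    """Extract length, TTL, addresses, and ports from an IPv4 + TCP packet."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, _protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    header_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    src_port, dst_port = struct.unpack(
        "!HH", packet[header_len:header_len + 4])
    return {
        "total_length": total_length,
        "ttl": ttl,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "src_port": src_port,
        "dst_port": dst_port,
    }

# Hand-built sample: 20-byte IP header plus just the two TCP port fields.
sample = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 24, 0, 0, 64, 6, 0,
    socket.inet_aton("192.168.100.5"),
    socket.inet_aton("64.233.177.105"),
) + struct.pack("!HH", 51234, 443)

print(parse_ipv4_packet(sample))
```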
DNS
You may be asking yourself "But wait, how does it know the destination IP address?", and that is a good question. So before you even typed in that G, Chrome had it preconfigured where it wanted this data to go. In this case, something called 1e100.net.
This is where Google will send those requests to figure out and try to predetermine what you might be trying to type. But yx-in-f105.1e100.net is NOT an IP address, it's a domain name. So how in the world do we turn that into an IP address? Well, that's where this magical protocol called DNS kicks in.
You see, back when you first turned on Chrome, one of the first "DNS" requests it made was for this 1e100.net, and more specifically that yx-in-f105.1e100.net. Now, to help keep things simple, the idea behind how this DNS request is made applies in very much the same manner as how the HTTPS request is made, with a few minor exceptions near the end, but the basics of it are pretty straightforward.
Like our letter G, it's gone through the wireless card and been caught by the router; the router looks at the frame, realizes it is for it, and then digs into the packet. Inside the packet the router will see an address for either itself or a predetermined DNS server (like 8.8.8.8 or 1.1.1.1). For simplicity's sake, let's say that the request today is going to 8.8.8.8.
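If you're wondering what that DNS request actually looks like on the wire, it's surprisingly small. Below is a minimal sketch that builds the bytes of a standard query for an A record (an IPv4 address). The function name and the query ID are arbitrary choices of mine; it only builds the message and doesn't send anything.

```python
import struct

def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record (IPv4 address)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # type A, class IN
    return header + question

query = build_dns_query("yx-in-f105.1e100.net")
print(len(query), query.hex())
```

To actually ask the question, those bytes would typically go out in a single UDP datagram to port 53 on the DNS server (8.8.8.8 in our case).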
Switching vs Routing
So your router / modem / switch / all-in-one generally only has one "gateway". The gateway, in the broadest possible terms, is where your router will send every single packet unless there is an address matching something on its existing network. So if your computer's IP is 192.168.100.5 and it gets a packet saying that it needs to go to that address, it will be clever enough to send it to your computer. However, if it doesn't see the IP address on its network, then it will merrily forward the packet on to the gateway. (It keeps track of its network in something called an ARP table, a record of which MAC address goes with which IP address, which it refreshes every few minutes by sending ARP requests to what's called a broadcast address so the owner of each IP can respond.)
9 times out of 10 your gateway is your ISP. Now most of these are what we call "Tier 3" ISPs or last-mile ISPs. They probably don't have the web server that you're looking for, but using something called "routing" they can get it where it needs to go. But wait, you ask again, my friend's computer also has an IP address of 192.168.100.5, so how does my ISP tell us apart?
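That "is it on my network or not" decision can be sketched in a few lines of Python. The /24 subnet, the gateway address (taken from a range reserved for documentation), the MAC address, and the next_hop helper are all assumptions for the sake of the example; a real router does this in its forwarding path, not in Python.

```python
import ipaddress

LOCAL_NETWORK = ipaddress.ip_network("192.168.100.0/24")  # assumed home subnet
DEFAULT_GATEWAY = "203.0.113.1"  # hypothetical ISP-side gateway

def next_hop(destination, arp_table):
    """Decide whether to deliver locally or hand off to the gateway."""
    dest = ipaddress.ip_address(destination)
    if dest in LOCAL_NETWORK and destination in arp_table:
        # Known local machine: send straight to its MAC address.
        return ("deliver locally", arp_table[destination])
    return ("forward to gateway", DEFAULT_GATEWAY)

# Hypothetical ARP table: IP address -> MAC address.
arp_table = {"192.168.100.5": "00:1a:2b:3c:4d:5e"}

print(next_hop("192.168.100.5", arp_table))
print(next_hop("8.8.8.8", arp_table))
```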
NAT
Well, that's something your router does called Network Address Translation. Basically your home network is using something called a "private" network. These are a series of IP addresses which are reserved for networks that should never be seen on the wider internet. We knew a long time ago that 4 octets actually wasn't a lot of addresses, and so we started to NAT big networks behind a single address so that they could use those private addresses for everything inside.
It does this by making up its own little ephemeral port, and it keeps in memory that whatever comes back to that ephemeral port is actually meant for your computer. That way your ISP only sees your router's address and not your computer's.
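Here's a toy version of that bookkeeping, just to show the idea. The NatTable class, the public address (again from a documentation-only range), and the starting port are all invented for the example, and a real router also tracks the protocol, the remote address, and timeouts.

```python
import itertools

class NatTable:
    """Toy NAT: maps router-side ephemeral ports back to private hosts."""

    def __init__(self, public_ip, first_port=49152):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._by_port = {}

    def outbound(self, private_ip, private_port):
        """Rewrite an outgoing connection to use the router's address."""
        public_port = next(self._ports)
        self._by_port[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port)

    def inbound(self, public_port):
        """Look up which private host a reply on this port belongs to."""
        return self._by_port.get(public_port)

nat = NatTable("198.51.100.7")  # hypothetical public address
print(nat.outbound("192.168.100.5", 51234))  # ('198.51.100.7', 49152)
print(nat.inbound(49152))                    # ('192.168.100.5', 51234)
```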
If you trace the route the packet takes, you can see that it bounces around a bit before going where it needs to go. Most of these hops are using what's called "static routes". Basically, just like your router, any address they don't know just goes to a "gateway" further up the chain. But once we get to around the 6th hop, that's where things start to change. That's when we get into the land of "BGP" routing.
BGP Routing
BGP routing is where networks around the world are given something called an "ASN", or autonomous system number. Each ASN has a list of IP ranges that it can connect to (or is directly responsible for). Using "BGP advertisements", the routers for these networks exchange which networks they handle every so often, and the various routers keep a BGP table in their own memory.
And so in our case, even though these routers may have many different connections, they keep in memory which of those connections can get them to 8.8.8.8. So the packet continues to travel until it finally hits 8.8.8.8 which is a "DNS Server". The DNS server will see the request "yx-in-f105.1e100.net" and go "Oh yes I keep that in memory it's address is: 64.233.177.105".
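The "which connection gets me to 8.8.8.8" lookup boils down to picking the most specific route that matches. Here's a minimal sketch of that longest-prefix match. The prefixes and peer names in ROUTES are made up, and real BGP route selection also weighs things like AS path length and local policy before a route ever lands in the table.

```python
import ipaddress

# Hypothetical routing table: advertised prefix -> where to send the packet.
ROUTES = {
    "0.0.0.0/0": "upstream transit",
    "8.0.0.0/9": "peer A",
    "8.8.8.0/24": "peer B",
}

def pick_route(destination, routes):
    """Return the next hop for the most specific prefix that matches."""
    dest = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]

print(pick_route("8.8.8.8", ROUTES))   # peer B
print(pick_route("1.1.1.1", ROUTES))   # upstream transit
```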
TLD Domain Name Servers
Now if the DNS server doesn't actually know where this is, it will go and tap the "TLD" server for .net. This is a known address it keeps, which manages ANY domain that ends in .net. The .net TLD server will respond that ns1.google.com is the name server for 1e100.net, and so the DNS server will either know the IP address of ns1.google.com or simply ask the .com TLD server where google.com is, so it can then ask that name server where the subdomain is at.
For the sake of keeping things quick, everything we've talked about so far has a chance of being cached by any of the involved systems. That basically just means it stores the result in memory so if someone comes back later to ask it can quickly grab it.
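Caching like this is about as simple as it sounds: remember the answer along with how long it's good for. Below is a small sketch of a DNS-style cache with a time to live. The DnsCache class and the 300 second TTL are my own inventions for illustration; the clock is passed in so the expiry behavior is easy to see.

```python
import time

class DnsCache:
    """Toy cache that forgets answers once their time to live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale, so a fresh lookup is needed
            return None
        return address

cache = DnsCache()
cache.put("yx-in-f105.1e100.net", "64.233.177.105", ttl_seconds=300)
print(cache.get("yx-in-f105.1e100.net"))  # 64.233.177.105
```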
So the packet travels back, flipping the source and destination addresses around, and meanders back to your router much the same way it got there. The router sees the destination port as the same ephemeral port it opened and forwards it back to your computer.
TLS Encryption for HTTPS
That letter G's path through life is exceedingly similar, with a few minor differences. The first is that before it sends anything to 1e100.net, it first asks to make what's called a TLS handshake. This is so that if anyone along that path happens to be listening in, they will see nothing more than encrypted data. The TLS handshake uses a bit of asymmetric cryptography, typically a "Diffie-Hellman" key exchange, so that both sides can arrive at the same secret key.
To keep things exceedingly simple, Diffie-Hellman uses some math around prime numbers. Each side picks a private number of its own, combines it with a publicly agreed prime, and sends only the result across. From what they receive, both computers can work out the same shared secret, while anyone listening in can't. Diffie-Hellman isn't itself encryption; the shared secret it produces becomes the key for the symmetric encryption (such as AES) that protects the rest of the conversation.
See, your computer doesn't reuse the same key for everything. It spins up a new private / public key pair just for this little session, and the web server does the same and sends its public value back, along with a certificate that proves it really is who it claims to be. Again, in order to really dig into the meat on this one we'd need to understand how public / private key pairs work, but for simplicity's sake we just need to know that the handshake uses asymmetric math to come up with a shared key via Diffie-Hellman, and then both sides switch to symmetric encryption with that key so they can keep things private.
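The core trick of Diffie-Hellman is that both sides end up with the same secret without ever sending it across the wire. Here's a toy sketch of the math. The prime and generator below are chosen by me purely for illustration and are far too small to be secure; real TLS uses much larger groups or elliptic curves, and the helper names are hypothetical.

```python
import secrets

P = 4294967291  # a small prime (2**32 - 5), toy-sized on purpose
G = 5           # public base both sides agree on

def make_keypair():
    """Pick a random private number and compute the public value to send."""
    private = secrets.randbelow(P - 2) + 1  # somewhere in 1..P-2
    public = pow(G, private, P)
    return private, public

def shared_secret(their_public, my_private):
    """Combine the other side's public value with our own private number."""
    return pow(their_public, my_private, P)

client_private, client_public = make_keypair()
server_private, server_public = make_keypair()

# Only the public values cross the network, yet both sides agree.
client_view = shared_secret(server_public, client_private)
server_view = shared_secret(client_public, server_private)
print(client_view == server_view)  # True
```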
Now that it's all encrypted and secured the information of possible things you want to type is passed back to your computer, interpreted by chrome and presented to the user.
Alright, so NOW you select google.com from the list presented to you. We're going to do exactly the same thing: calling that tcpip.sys driver, which calls the NDIS wrapper, which in turn passes the information to a miniport driver, which wraps up the information as a frame and then transmits it via an electromagnetic wave for your router to listen for. Then your router unwraps the frame and sees the destination IP for google.com (which your computer got after making a DNS request just seconds before from 8.8.8.8). The Diffie-Hellman handshake occurs, and then both parties encrypt the information using the key they just agreed on.
THENNNNN google.com sends back this:
HTML and Javascript
Alright... that's not google.com, that's a bunch of code. Well, again, everything behind the scenes has to occur. To you and me that's nothing but a large "blah" of text, but to a browser it's a combination of HTML, JavaScript, and CSS. Each of those gets interpreted in its own way in order to bring the full website to life.
When google.com is called, your computer gets a WHOLE NEW list of things to look up, and so for each one the very same thing is performed: a DNS request, an HTTP request, etc. So your computer is gathering all this information, and then Chrome translates that into something that's actually viewable.
If you're really curious how Google Chrome turns <title>Google</title> into something like the title of the tab, you can take a peek at the Blink rendering engine (a fork of WebKit). The source code is actually open and publicly viewable as part of Chromium (only a few minor things specific to Chrome differentiate it from Chromium).
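As a very rough idea of the first step, here's how you could fish that title out of the HTML using nothing but Python's standard library. This is only a sketch of the parsing idea, nowhere near what a real rendering engine does, and the TitleParser class is something I made up for the example.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

parser = TitleParser()
parser.feed("<html><head><title>Google</title></head><body></body></html>")
print(parser.title)  # Google
```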
Then, as with the G, it calls on that vulcan.dll to render things properly and then passes that along to your LCD monitor.
Things not covered
And this is not diving into even really half of what's going on there. We could talk about how memory is allocated, the underlying Operating systems for routers [it's usually a custom derivative of BSD]. We could talk about the insanity that is involved in Global Load balancing a DNS entry or Webserver so that your computer doesn't have to make as many hops.
We could get into the switch vs router infrastructure or the way the ARP table is built and maintained, and how that is useful for all the devices on the network. We could talk about DNSSEC and how that will eventually be implemented to ensure that your DNS answers can be verified, much like the TLS handshake verified the server for the HTTPS request.
We could talk about the C++ code that the Chrome browser is written in, and how one uses a compiler to turn a .h file into just one part of the greater whole. We could spend hours getting into the registry keys that Chrome reads or changes as it's going about its day-to-day life, or the hundreds of other .dll files that are involved in making it function.
There are entire videos on just the segment about the pixels in your LCD monitor and why those work the way that they do. I have barely scratched the surface of the complication, and the more I write the more I realize how accurate that tweet is. Even DNS on its own has massive amounts of documentation and paperwork associated with it.
But I hope this gives you at least a surface-level idea of why things are so complicated, and perhaps why "the internet doesn't work" is an incredibly vague statement, because there's a lot more to it than most people realize.
And I haven't covered the server side of this at all! There's so much more to that. For another good write-up that digs a lot more into specific listener events, check out this.