What happens when you type a URL in the browser and press enter
Today I will explain a question that is commonly asked in technical job interviews, whether you are applying for an engineering, developer, marketing, or DevOps role.
It is always a good idea to understand what is going on behind the scenes when you type a URL into your web browser, and how information is transferred via the Internet to other computers.
For this article, we are going to use the following URL as our example: https://www.holbertonschool.com.
What happens when you type a URL?
The best way to understand what happens when you type “www.holbertonschool.com” is with an infographic, as shown in the following:
After you type the URL https://www.holbertonschool.com into the address bar, your browser checks its cache for a DNS record to find the IP address that corresponds to the URL's domain name. For those who are unfamiliar with it, DNS (Domain Name System) is the phone book of the internet: browsers communicate with servers using Internet Protocol (IP) addresses, and DNS translates domain names into IP addresses so the browser can load the web page's resources. Each device connected to the internet is reached through its IP address.
DNS eliminates the need for humans to memorize IP addresses such as 192.168.1.1; instead, they only need to remember names such as www.holbertonschool.com. In many cases you can still reach a website by typing its IP address directly.
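To see this translation from a program's point of view, here is a minimal Python sketch that asks the operating system's resolver for the IP addresses behind a host name, using the standard `socket.getaddrinfo` call. The helper name `resolve` is hypothetical, and the addresses you get back depend on your network and DNS configuration.

```python
import socket
from typing import List


def resolve(hostname: str, port: int = 443) -> List[str]:
    """Return the unique IP addresses the OS resolver reports for hostname."""
    results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string (IPv4 or IPv6).
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# Example (requires network access):
# print(resolve("www.holbertonschool.com"))
```

For example, `resolve("localhost")` typically returns `127.0.0.1` and/or `::1` without touching the network.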
To find the DNS record, the browser checks four caches.
● First, it checks the browser cache. The browser keeps DNS records for websites you have previously visited for a fixed duration, so it is the first place to run a DNS query.
● Second, the browser checks the OS cache. If the record is not in the browser cache, the browser makes a system call (e.g., gethostbyname or getaddrinfo on macOS) to your computer's operating system to fetch the record, since the OS also maintains a cache of DNS records.
● Third, it checks the router cache. If the record is not on your computer, the browser communicates with the router, which maintains its own cache of DNS records.
● Fourth, it checks the ISP cache. If all the previous steps fail, the browser moves on to the ISP. Your ISP maintains its own DNS server, which includes a cache of DNS records; the browser checks it as a last hope of finding your requested URL.
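The four-step lookup above can be modeled as a chain of caches that are checked in order, each holding records until their time-to-live (TTL) runs out. This is a simplified, single-process sketch: the `DnsCache` class, the `lookup` function, the 300-second TTL, and the upstream resolver passed in as a function are all hypothetical stand-ins for caches that really live in the browser, the OS, the router, and the ISP's DNS server.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DnsCache:
    """A named cache mapping hostnames to (ip, expiry timestamp)."""
    name: str
    records: dict = field(default_factory=dict)

    def get(self, hostname: str, now: float) -> Optional[str]:
        entry = self.records.get(hostname)
        if entry is None:
            return None
        ip, expires_at = entry
        if now >= expires_at:  # the record's TTL has run out
            del self.records[hostname]
            return None
        return ip

    def put(self, hostname: str, ip: str, ttl: float, now: float) -> None:
        self.records[hostname] = (ip, now + ttl)


def lookup(hostname: str, caches: list, resolve_upstream: Callable[[str], str],
           ttl: float = 300.0, now: Optional[float] = None) -> tuple:
    """Check each cache in order; fall back to a full DNS query on a miss.

    Returns (ip, name of the place the answer came from)."""
    now = time.time() if now is None else now
    for cache in caches:  # browser -> OS -> router -> ISP
        ip = cache.get(hostname, now)
        if ip is not None:
            return ip, cache.name
    ip = resolve_upstream(hostname)  # simulated full DNS query
    for cache in caches:  # remember the answer for next time
        cache.put(hostname, ip, ttl, now)
    return ip, "dns query"
```

Calling `lookup` twice for the same name within the TTL reports `"browser"` as the source the second time, which is why revisiting a site usually skips the network lookup.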
Caches are essential for regulating network traffic and improving data transfer times.
If the requested URL is not in the ISP's cache, the ISP's DNS server initiates a DNS query to find the IP address of the server that hosts https://www.holbertonschool.com. The goal of the DNS query is to search multiple DNS servers on the Internet until it finds the right IP address for the website. This type of search is called a recursive query, because the search continues from one DNS server to the next (for example, from a root server to the .com servers to the domain's own authoritative server) until it either finds the IP address or returns an error.
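The walk described above can be sketched as a loop that follows referrals from one server to the next until it gets an answer or an error. Everything here is hypothetical: the server names, the in-memory `SERVERS` table, and the IP address (taken from the 203.0.113.0/24 documentation range). Strictly speaking, the resolver answers your recursive query by making a series of referral-following queries like these on your behalf.

```python
# Hypothetical in-memory stand-ins for real DNS servers. Each server either
# answers with an address ("A") or refers the resolver to a server that
# knows more about the name ("NS").
SERVERS = {
    "root": {"com.": ("NS", "com-tld")},
    "com-tld": {"holbertonschool.com.": ("NS", "holberton-auth")},
    "holberton-auth": {"www.holbertonschool.com.": ("A", "203.0.113.10")},
}


def zone_candidates(hostname):
    """'www.holbertonschool.com' -> most specific name first, down to 'com.'."""
    labels = hostname.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels))]


def resolve_recursively(hostname, start="root", max_steps=10):
    """Follow referrals until an A record is found; return (ip, servers asked)."""
    server = start
    path = [server]
    for _ in range(max_steps):
        records = SERVERS[server]
        for name in zone_candidates(hostname):
            if name in records:
                rtype, value = records[name]
                if rtype == "A":
                    return value, path  # final answer
                server = value  # referral: ask the next server
                path.append(server)
                break
        else:
            # No answer and no referral: the name does not exist.
            raise LookupError(f"NXDOMAIN: {hostname} (asked {server})")
    raise LookupError("too many referrals")
```

`resolve_recursively("www.holbertonschool.com")` visits `root`, then `com-tld`, then `holberton-auth` before returning the address, while an unknown name such as `example.org` raises `LookupError`, the "error response" case.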
Let's continue with the process flow. Once the right IP address is found, the browser opens a connection to the server using a transport protocol. In this scenario I'm going to focus on the most commonly used one, TCP (Transmission Control Protocol). Once the browser initiates a TCP connection with the server at that IP address, it can transfer data packets between the computer (client) and the server.
To transfer data packets between the client and the server, a TCP/IP three-way handshake must be completed. This is a three-step process in which the client and the server exchange SYN (synchronize) and ACK (acknowledge) messages to establish a connection:
1) The client machine sends a packet with the SYN flag set to the server over the internet, asking to open a connection on a specific port.
2) If the server has that port open and can accept new connections, it responds with a SYN/ACK packet: an acknowledgment (ACK) of the client's SYN combined with its own SYN.
3) The client receives the SYN/ACK packet from the server and acknowledges it by sending an ACK packet back to the server.
After all this, the TCP connection is completed and established for data transmission.
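The three steps above can be traced with sequence numbers: each side picks an initial sequence number, and each ACK acknowledges the other side's number plus one. This is a simulation sketch with a hypothetical `Segment` type that keeps only the fields the handshake uses; in practice the operating system's TCP stack performs the handshake when a program calls `connect()`, and application code never builds these packets itself.

```python
import random
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    """A simplified TCP segment: only the fields the handshake uses."""
    flags: frozenset
    seq: int
    ack: int = 0


def three_way_handshake(client_isn=None, server_isn=None):
    """Simulate SYN -> SYN/ACK -> ACK and return the three segments."""
    # Each side picks an initial sequence number (ISN), normally random.
    x = random.randrange(MOD) if client_isn is None else client_isn
    y = random.randrange(MOD) if server_isn is None else server_isn

    # 1) Client -> server: SYN, "my sequence numbers start at x".
    syn = Segment(frozenset({"SYN"}), seq=x)
    # 2) Server -> client: SYN/ACK, "I got x; my numbers start at y".
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=y, ack=(syn.seq + 1) % MOD)
    # 3) Client -> server: ACK, "I got y". The connection is now established.
    ack = Segment(frozenset({"ACK"}), seq=(x + 1) % MOD, ack=(syn_ack.seq + 1) % MOD)
    return [syn, syn_ack, ack]
```

With `three_way_handshake(100, 300)`, the server's SYN/ACK carries `ack=101` and the client's final ACK carries `ack=301`.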
TCP 3-Way-Handshake-Connection:
Usually, a firewall sits between the client and the server to prevent unauthorized access to or from a private network. A firewall can be implemented in hardware, software, or a combination of both. In this scenario, because the URL points to a website, the firewall is typically configured to allow connections on port 80, the default port for HTTP (port 443 is used for HTTPS, covered later in this article), so the TCP connection can be established through that port and carry the HTTP request.
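The decision a simple packet-filtering firewall makes for incoming traffic can be sketched as a rule check on the destination port and the source address. The rule set below (allow TCP ports 80 and 443, block the 198.51.100.0/24 documentation range) and the `allow_inbound` function are hypothetical; real firewalls are configured through their own tools and rule languages.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rule set: allow inbound TCP to the web ports, deny the rest.
ALLOWED_TCP_PORTS = {80, 443}                       # HTTP and HTTPS
BLOCKED_NETWORKS = [ip_network("198.51.100.0/24")]  # example "bad" range


def allow_inbound(src_ip: str, dst_port: int, protocol: str = "tcp") -> bool:
    """Return True if this simple packet filter would let the packet through."""
    if any(ip_address(src_ip) in net for net in BLOCKED_NETWORKS):
        return False  # blocked source address, regardless of port
    return protocol == "tcp" and dst_port in ALLOWED_TCP_PORTS
```

A SYN from `203.0.113.5` to port 80 is allowed, while the same packet to port 22, or any packet from `198.51.100.7`, is dropped.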
Once the TCP connection is established, it is time to start transferring data! The browser sends a GET request asking for the page at www.holbertonschool.com. If you're entering credentials or submitting a form, this could be a POST request instead. The request also contains additional information such as browser identification (User-Agent header), the types of content the browser will accept (Accept header), and connection headers asking the server to keep the TCP connection alive for additional requests. It also sends any cookies the browser has stored for this domain.
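A GET request like the one described above is plain text sent over the TCP connection. The following sketch assembles one by hand; the `build_get_request` helper, the `ExampleBrowser/1.0` User-Agent string, and the cookie values are hypothetical, and real browsers send more headers than this.

```python
from typing import Optional


def build_get_request(host: str, path: str = "/",
                      cookies: Optional[dict] = None) -> bytes:
    """Assemble the raw bytes of an HTTP/1.1 GET request."""
    headers = {
        "Host": host,                        # required in HTTP/1.1
        "User-Agent": "ExampleBrowser/1.0",  # hypothetical browser identification
        "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
        "Connection": "keep-alive",          # ask to reuse the TCP connection
    }
    if cookies:
        # Cookies stored for this domain go back in a single Cookie header.
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line ends the headers (a GET has no body).
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

With HTTPS, these same bytes are sent, but inside the encrypted TLS channel.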
GET request (Headers are highlighted) sample:
Next, the server handles the request and sends back a response. Typically, a web server such as Apache receives the request from the browser and passes it to a request handler, which reads it and generates a response. The request handler is a program that reads the request, its headers, and cookies to determine what is being requested, and updates information on the server if needed. It then assembles a response in a particular format (for example, an HTML page).
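The request handler's job (read the request line, headers, and cookies, then assemble a response) can be sketched as a function from raw request bytes to raw response bytes. This is a hypothetical, simplified handler for illustration only, not Apache's internal API; it assumes a `user` cookie and serves a single page at `/`.

```python
from http import HTTPStatus
from http.cookies import SimpleCookie


def handle_request(raw: bytes) -> bytes:
    """Parse a raw HTTP request and assemble a raw HTTP response."""
    # The header block ends at the first empty line.
    head = raw.decode("iso-8859-1").split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Read cookies to personalize the page (hypothetical "user" cookie).
    cookie = SimpleCookie(headers.get("cookie", ""))
    user = cookie["user"].value if "user" in cookie else "guest"

    if method == "GET" and path == "/":
        status, body = HTTPStatus.OK, f"<h1>Hello, {user}!</h1>"
    else:
        status, body = HTTPStatus.NOT_FOUND, "<h1>Not Found</h1>"

    # Assemble the response: status line, headers, blank line, body.
    payload = body.encode("utf-8")
    response_head = (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return response_head.encode("ascii") + payload
```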
After that, the browser displays the HTML content received from the web server. The HTML is typically processed in phases: the browser first parses the HTML tags and then sends GET requests for the additional elements on the page, such as images, CSS stylesheets, and JavaScript files. These static files are often cached by your browser, so it doesn't have to fetch them again the next time you visit the page.
All of this happens in a matter of milliseconds, and although it seems like a long process, it usually goes unnoticed.
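The step where the browser scans the HTML for additional resources can be sketched with Python's standard `html.parser` module. The `ResourceFinder` class is hypothetical, and it only collects images, external scripts, and stylesheets; real browsers also discover fonts, iframes, preloads, and more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("img", "script"):
            url = attrs.get("src")  # inline <script> blocks have no src
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").split():
            url = attrs.get("href")
        if url:
            # Relative URLs are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, url))
```

Feeding it `<link rel="stylesheet" href="/css/main.css">` with the base URL `https://www.holbertonschool.com/` yields `https://www.holbertonschool.com/css/main.css`, the URL the browser would then request with another GET (the file path is a made-up example).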
Another protocol that is commonly used today is HTTPS (Hypertext Transfer Protocol Secure), which provides secure communication over a computer network and is widely used on the Internet. HTTPS communication is encrypted using TLS (Transport Layer Security), or formerly SSL (Secure Sockets Layer).
The main purpose of HTTPS is to authenticate the website being accessed and to protect the privacy and integrity of the data exchanged while in transit. It helps protect against man-in-the-middle attacks, eavesdropping, and tampering. By default, HTTPS communication between the client and the server uses port 443.
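In Python, the standard `ssl` module shows the two layers of an HTTPS connection: a TCP connection to port 443, wrapped in TLS. A minimal sketch, assuming network access; the `make_tls_context` and `https_get` helpers are hypothetical. The default context verifies the server's certificate and host name (the authentication part of HTTPS), and the minimum protocol version is set to TLS 1.2 so legacy SSL versions are refused.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Context that verifies certificates and host names against the system CAs."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSL and old TLS
    return context


def https_get(host: str, path: str = "/", port: int = 443) -> bytes:
    """Fetch a page over HTTPS and return the raw response bytes."""
    context = make_tls_context()
    # First the TCP three-way handshake, then the TLS handshake on top of it.
    with socket.create_connection((host, port), timeout=10) as tcp:
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                       "Connection: close\r\n\r\n")
            tls.sendall(request.encode("ascii"))  # encrypted on the wire
            chunks = []
            while True:
                chunk = tls.recv(4096)
                if not chunk:  # server closed the connection
                    break
                chunks.append(chunk)
    return b"".join(chunks)


# Example (requires network access):
# print(https_get("www.holbertonschool.com")[:200])
```

If the certificate does not match the host name or is not trusted, `wrap_socket` raises an `ssl.SSLCertVerificationError` instead of sending any data.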
HTTPS connection sample:
Load Balancer:
Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool.
Modern high‑traffic websites must serve hundreds of thousands, if not millions, of concurrent requests from users or clients and return the correct text, images, video, or application data, all in a fast and reliable manner. To cost‑effectively scale to meet these high volumes, modern computing best practice generally requires adding more servers.
A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.
In this manner, a load balancer performs the following functions:
- Distributes client requests or network load efficiently across multiple servers
- Ensures high availability and reliability by sending requests only to servers that are online
- Provides the flexibility to add or subtract servers as demand dictates
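One common way to distribute requests is round robin, where each new request goes to the next server in the list. This sketch, with a hypothetical `RoundRobinBalancer` class and made-up server names, covers the three functions listed above: spreading requests, skipping servers marked as down, and adding or removing servers. Production load balancers offer other strategies as well, such as least connections.

```python
class RoundRobinBalancer:
    """Hand out healthy backend servers in turn; skip any marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def add(self, server):
        """Scale out: a new server starts receiving requests immediately."""
        self.servers.append(server)
        self.healthy.add(server)

    def remove(self, server):
        """Scale in: stop sending requests to this server."""
        self.servers.remove(server)
        self.healthy.discard(server)

    def mark_down(self, server):
        """E.g. after a failed health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or raise if none are online."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

With `["app1", "app2", "app3"]`, four requests go to `app1`, `app2`, `app3`, `app1`; after `mark_down("app2")`, requests only reach `app1` and `app3`.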
Load Balancer Diagram:
Web Server:
A web server is software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web. The main job of a web server is to deliver website content by storing, processing, and serving webpages to users. Besides HTTP, some web servers also support protocols such as SMTP (Simple Mail Transfer Protocol) for email and FTP (File Transfer Protocol) for file transfer and storage.
Web server hardware is connected to the internet and allows data to be exchanged with other connected devices, while web server software controls how a user accesses hosted files. The web server process is an example of the client/server model. All computers that host websites must have web server software.
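Python's standard `http.server` module is enough to sketch the software side of a web server: it listens on a port, accepts HTTP requests, and returns stored content. The `PAGES` dictionary, the `StaticHandler` class, and the helper functions are hypothetical, and this server is meant for experiments on your own machine, not production use.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/": b"<h1>Welcome</h1>"}  # hypothetical hosted content


class StaticHandler(BaseHTTPRequestHandler):
    """Answer GET requests from the in-memory PAGES dictionary."""

    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def start_server(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), StaticHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, path: str):
    """Act as the client: send a GET request and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Usage: `server = start_server()`, then `fetch(server.server_address[1], "/")` returns the status code and the page body; call `server.shutdown()` when done.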
Application Server:
An application server is a server specifically designed to run applications. The “server” includes both the hardware and software that provide an environment for programs to run.
Application servers are used for many purposes. Several examples are listed below:
- running web applications
- hosting a hypervisor that manages virtual machines
- distributing and monitoring software updates
- processing data sent from another server
Why Use an Application Server?
A web server is designed — and often optimized — to serve webpages. Therefore, it may not have the resources to run demanding web applications. An application server provides the processing power and memory to run these applications in real-time. It also provides the environment to run specific applications. For example, a cloud service may need to process data on a Windows machine. A Linux-based server may provide the web interface for the cloud service, but it cannot run Windows applications. Therefore, it may send input data to a Windows-based application server. The application server can process the data, then return the result to the web server, which can output the result in a web browser.
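In the Python world, one common boundary between a web server and application code is WSGI (PEP 3333): the web server, or a gateway in front of the application, hands each request to an application callable and sends back whatever it returns. A minimal sketch; the `/api/double` endpoint and the `call_app` helper are hypothetical and stand in for "processing the data and returning the result to the web server".

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI application: the 'application server' side of the split."""
    if environ["PATH_INFO"] == "/api/double":
        params = parse_qs(environ.get("QUERY_STRING", ""))
        try:
            n = int(params.get("n", ["0"])[0])
        except ValueError:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"n must be an integer"]
        body = str(n * 2).encode("ascii")  # the "processing" step
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call_app(path: str, query: str = ""):
    """Invoke the app the way a web server would and collect the result."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

For example, `call_app("/api/double", "n=21")` returns `("200 OK", b"42")`, which the web server would then send to the browser.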
Database:
A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS). Together, the data and the DBMS, along with the applications that are associated with them, are referred to as a database system, often shortened to a database.
Data within the most common types of databases in operation today is typically modelled in rows and columns in a series of tables to make processing and data querying efficient. The data can then be easily accessed, managed, modified, updated, controlled, and organized. Most databases use structured query language (SQL) for writing and querying data.
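SQLite, which ships with Python as the `sqlite3` module, is a small DBMS that is handy for seeing tables, rows, columns, and SQL in action. The `students` table and its rows are hypothetical examples.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is a set of named columns; each inserted record becomes a row.
conn.execute("""
    CREATE TABLE students (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        cohort INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO students (name, cohort) VALUES (?, ?)",  # parameterized SQL
    [("Ada", 12), ("Linus", 12), ("Grace", 13)],
)
conn.commit()

# Query: each returned row is a tuple of column values.
rows = conn.execute(
    "SELECT name FROM students WHERE cohort = ? ORDER BY name", (12,)
).fetchall()
print(rows)  # [('Ada',), ('Linus',)]
```

The `?` placeholders let the DBMS insert values safely instead of pasting them into the SQL string.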
That is all! Now you know what happens when you type a URL and press Enter in your web browser.
If you have any questions, comments or suggestions, feel free to contact me on Twitter @MrTechi_