Behind the scenes - What happens when you open a page in a website
Many of you may already know what happens behind the scenes when a webpage is requested and served. This article explains the internals of what happens when you type http://www.clickoffline.com/index.php into your browser.
The Internet Architecture
The internet is a collection of interconnected networks that links a large number of content-providing servers with end users. In other words, it is a network made up of many smaller networks. Any network or machine connected to the internet can both provide content to it and retrieve content from it.
Like any other network, the internet supports a wide range of protocols. One of the most frequently used is HTTP (Hypertext Transfer Protocol), the main medium for content retrieval and delivery over the internet. Some of the other frequently used protocols are:
- FTP – File Transfer Protocol
- SMTP – Simple Mail Transfer Protocol
- POP – Post Office Protocol
- IMAP – Internet Message Access Protocol
- HTTPS – Secure HTTP Protocol
All these protocols rely on the internet as the backbone that provides the infrastructure needed to communicate. However, they can be used on many kinds of networks, not just the internet. For example, you may run an FTP server on an intranet (a local network), which is not part of the internet.
DNS Service
The internet also provides services that other applications rely on. DNS, the Domain Name System, is one of the most important. It works like a telephone directory: a collection of servers holds lists of domain names matched with their IP addresses. Every machine on the internet can be assigned a unique IP address, but IP addresses are hard to remember, so DNS lets us refer to machines by domain name instead.
A sample DNS table is shown below.
| Domain Name | IP Address |
|---|---|
| www.google.com | [209.85.153.104] |
| analytics.google.com | [209.85.175.147] |
| www.yahoo.com | [69.147.76.15] |
| www.clickoffline.com | [74.220.219.58] |
So when you request a page from Google.com, the browser sends a request to the DNS server to find a match for ‘www.google.com’. The DNS servers, in turn, look up the domain name in the DNS table and return the corresponding IP address. When this operation is performed in reverse, i.e. identifying the default domain name from a given IP address, it is referred to as Reverse DNS Lookup or rDNS.
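The directory idea can be sketched in a few lines of Python. This is only a toy model built from the sample table above: the names `SAMPLE_DNS_TABLE`, `forward_lookup` and `reverse_lookup` are made up for illustration, and real DNS is a distributed, hierarchical system rather than a single dictionary.

```python
# Toy stand-in for a DNS table, using the sample rows from the table above.
# Real DNS is distributed across many servers; a dict only shows the idea.
SAMPLE_DNS_TABLE = {
    "www.google.com": "209.85.153.104",
    "analytics.google.com": "209.85.175.147",
    "www.yahoo.com": "69.147.76.15",
    "www.clickoffline.com": "74.220.219.58",
}


def forward_lookup(domain):
    """Return the IP address recorded for a domain name, or None if unknown."""
    # Domain names are case-insensitive, so normalise before looking up.
    return SAMPLE_DNS_TABLE.get(domain.lower())


def reverse_lookup(ip_address):
    """Return the domain name recorded for an IP address, or None if unknown."""
    for domain, ip in SAMPLE_DNS_TABLE.items():
        if ip == ip_address:
            return domain
    return None
```

Calling `forward_lookup("www.clickoffline.com")` returns the address from the table, and `reverse_lookup` walks the same table in the opposite direction, which is the essence of rDNS.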
Steps and processes involved in obtaining an html webpage
Once you type the URL into your browser (Internet Explorer or Firefox, for example), the following steps happen in sequence before the page appears on screen. Let’s look at each of these steps in detail.
Components of the URL
The first phase is to understand the URL by identifying its components. A typical URL consists of the following components:
• Protocol
• Domain Name
• Resource Name
Other components can also appear in a URL, including usernames, passwords and port numbers. We can ignore them here because they are used less often.
The figure above explains the components of the URL. Now we're ready to send the HTTP request.
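Before moving on, here is a minimal sketch of that decomposition using Python's standard `urllib.parse` module. The function name `url_components` and the dictionary keys are chosen for this article only; a browser's own parser is more involved.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into the components discussed in this section."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,             # e.g. "http"
        "domain_name": parts.hostname,        # e.g. "www.clickoffline.com"
        "resource_name": parts.path or "/",   # e.g. "/index.php"
        # The optional components; each is None when absent from the URL.
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
    }
```

For `http://www.clickoffline.com/index.php` the optional components all come back as `None`, which matches the point that they are rarely present in everyday URLs.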
Construction of the HTTP Request
The browser composes a message called the HTTP request, which it sends to the server to retrieve the resource.
Sample
GET /index.php HTTP/1.1
Host: www.clickoffline.com
Since HTTP is a stateless protocol, all information related to the session is sent along with the request every time. This includes the resource name, the protocol version, the browser name, version and OS, the supported content types, any cookies, and so on.
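To make the shape of the request concrete, here is a rough sketch of how such a message could be assembled as text. The helper `build_get_request` is hypothetical, and the extra header values in the usage note are made-up examples, not what any particular browser sends.

```python
def build_get_request(host, resource, extra_headers=None):
    """Assemble the text of a simple HTTP/1.1 GET request."""
    # The request line names the method, the resource and the protocol version.
    lines = [f"GET {resource} HTTP/1.1", f"Host: {host}"]
    # Because HTTP is stateless, session details such as the browser name
    # and cookies travel with every request as additional headers.
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # HTTP separates lines with CRLF; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```

For example, `build_get_request("www.clickoffline.com", "/index.php", {"User-Agent": "ExampleBrowser/1.0", "Cookie": "session=abc123"})` produces the two lines from the sample above followed by the two illustrative headers.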
Identifying the server
Before sending out an HTTP request, we need to identify the remote server. This is done by using DNS to find the IP address of the domain specified in the URL. The HTTP request is then sent to the remote server over a route established between the local machine and the remote server.
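In a program, the lookup step is usually delegated to the operating system's resolver. The sketch below uses `socket.gethostbyname` from Python's standard library; the wrapper name `resolve_ipv4` is an assumption made for this article, and the result for a real domain depends on the network and on when you run it.

```python
import socket


def resolve_ipv4(hostname):
    """Ask the operating system's resolver for an IPv4 address.

    Raises socket.gaierror if the name cannot be resolved.
    """
    # If hostname is already an IPv4 address, it is returned unchanged.
    return socket.gethostbyname(hostname)
```

Calling `resolve_ipv4("www.clickoffline.com")` on a connected machine would return whatever address DNS currently holds for that domain, which may differ from the sample table earlier in this article.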
Wait For Response
The client browser waits for the response from the server.
Analysis/Rendering of HTTP Response
The HTTP response consists of an HTTP response header and the HTTP response data.
The response headers provide information about the HTTP response. The most important field is the response code; different response codes indicate different statuses:
| Response Code | Meaning |
|---|---|
| 200 | OK |
| 301 | Moved Permanently |
| 302 | Found |
| 303 | See Other |
| 403 | Forbidden |
| 404 | Not Found |
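The codes in the table fall into broad families by their first digit. As a small illustration, Python's standard `http.HTTPStatus` enum knows the reason phrases, and a hypothetical helper such as `describe_status` below can add the family name.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of a standard HTTP response code."""
    status = HTTPStatus(code)  # raises ValueError for a code it does not know
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    else:
        category = "other"
    return f"{code} {status.phrase} ({category})"
```

So 200 is a success, the 301, 302 and 303 codes are all redirections, and 403 and 404 report problems with the request itself.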
The browser analyzes the HTTP response and renders it on the screen. Any dependencies referenced by the source HTML page, such as images, CSS and JavaScript files, go through the same process to load in the client, except that the client browser requests their URLs automatically instead of a user typing them.
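How a client might discover those dependencies can be sketched with Python's standard `html.parser` module. This is a simplified illustration, not how a real browser engine works: the names `DependencyCollector` and `find_dependencies` are invented here, and only `img`, `script` and stylesheet `link` tags are considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DependencyCollector(HTMLParser):
    """Collect URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.dependencies = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script"):
            reference = attributes.get("src")
        elif tag == "link" and (attributes.get("rel") or "").lower() == "stylesheet":
            reference = attributes.get("href")
        else:
            reference = None
        if reference:
            # Relative references are resolved against the page's own URL.
            self.dependencies.append(urljoin(self.base_url, reference))


def find_dependencies(html_text, base_url):
    """Return the dependency URLs found in html_text, in document order."""
    collector = DependencyCollector(base_url)
    collector.feed(html_text)
    return collector.dependencies
```

Each URL returned would then go through the same request cycle described in this article, just without anyone typing it in. Ordinary links (`<a href=...>`) are deliberately ignored, since they are only followed when the user clicks them.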
Sample response headers
HTTP/1.1 200 OK
Date: Wed, 18 Feb 2009 16:19:57 GMT
Server: Apache/2.2.11 (Unix)
X-Powered-By: PHP/5.2.8
X-Pingback: http://www.clickoffline.com/xmlrpc.php
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
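Analysing a header block like the one above mostly means splitting it into the status line and a set of name/value pairs. The following is a minimal sketch under that assumption; `parse_response_head` is a hypothetical helper, and it ignores details a real client must handle, such as repeated headers.

```python
def parse_response_head(raw_head):
    """Split response header text into (version, code, reason, headers)."""
    lines = raw_head.strip().splitlines()
    # The first line is the status line, e.g. "HTTP/1.1 200 OK".
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # an empty line marks the end of the headers
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers
```

Fed the sample above, this returns the code `200` together with a dictionary in which, for instance, the `content-type` entry tells the browser to treat the body as UTF-8 encoded HTML.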
Advanced Resources
• Specifications for URL (RFC 1738): http://www.w3.org/Addressing/URL/url-spec.txt
• Specifications for DNS (RFC 1035): http://www.ietf.org/rfc/rfc1035.txt
• HTTP Protocol Details: http://en.wikipedia.org/wiki/HTTP
Disclaimer: The views expressed on this webpage are my personal views and do not necessarily reflect views of my employer.




