Archive: February 2009

Many of you may already know what happens behind the scenes when a web page is requested and served. This article explains the details and internals of what goes on when you type http://www.clickoffline.com/index.php into your browser.

The Internet Architecture

The internet is a collection of networks that connects a large number of content-providing servers with end users. It is essentially a network of many smaller, interconnected networks. Any network or machine that is connected to the internet can provide content to it and retrieve content from it.

Since the internet is a network, it supports a wide range of protocols just like any other network. One of the most frequently used is HTTP (Hyper Text Transfer Protocol), the main medium for content retrieval and delivery over the internet. Some of the other frequently used protocols are as follows:

  • FTP – File Transfer Protocol
  • SMTP – Simple Mail Transfer Protocol
  • POP – Post Office Protocol
  • IMAP – Internet Message Access Protocol
  • HTTPS – Secure HTTP Protocol

All these protocols depend on the internet as the backbone that provides the infrastructure needed to communicate. However, they can be used on many kinds of networks, not just the internet. For example, you may run an FTP server on an intranet (a local LAN), which is not part of the internet.

DNS Service

The internet also has services that other applications rely on. DNS, the Domain Name System, is one of the most important. DNS works like a telephone directory: it is a collection of servers that hold lists of domain names matched to their IP addresses. Every machine on the internet can be assigned a unique IP address, but IP addresses are not easy to remember, so DNS lets us refer to machines by domain name instead.

A sample DNS table is shown below.

Domain Name             IP Address
www.google.com          209.85.153.104
analytics.google.com    209.85.175.147
www.yahoo.com           69.147.76.15
www.clickoffline.com    74.220.219.58

So when you request a page from Google.com, the browser asks a DNS server to find a match for ‘www.google.com’, and the DNS server in turn returns the IP address for that domain name by looking it up in the DNS table. When this operation is performed in reverse, i.e. finding the domain name for a given IP address, it is referred to as a reverse DNS lookup or rDNS.
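The forward and reverse lookups described above can be sketched with a toy in-memory table. This is only an illustration built from the sample entries in the table; a real resolver queries DNS servers over the network, and the function names lookup and reverseLookup are made up for this example.

```javascript
// Toy DNS table built from the sample entries above (not live data).
const dnsTable = new Map([
  ["www.google.com", "209.85.153.104"],
  ["analytics.google.com", "209.85.175.147"],
  ["www.yahoo.com", "69.147.76.15"],
  ["www.clickoffline.com", "74.220.219.58"],
]);

// Forward lookup: domain name -> IP address (null when there is no entry).
function lookup(domainName) {
  return dnsTable.get(domainName.toLowerCase()) || null;
}

// Reverse lookup (rDNS): IP address -> domain name (null when there is no entry).
function reverseLookup(ipAddress) {
  for (const [name, ip] of dnsTable) {
    if (ip === ipAddress) return name;
  }
  return null;
}

console.log(lookup("www.google.com"));
console.log(reverseLookup("74.220.219.58"));
```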

Steps and processes involved in obtaining an html webpage


Request a web page

Once you type the URL into your browser, be it IE or Firefox, the steps described above happen in sequence before the page appears on screen. Let’s look at each of these steps in detail.

Components of the URL

The first phase is to understand the URL by identifying its components. A typical URL consists of the following components:

•    Protocol
•    Domain Name
•    Resource Name

There are other components that can be plugged into a URL, including usernames, passwords and port numbers. We can ignore them here, as they are not used as often.
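As a rough sketch, the components listed above can be pulled out of a URL with the standard URL class, which is available in modern browsers and in Node.js. The function name urlComponents and the second URL (with a username, password and port) are made up for illustration.

```javascript
// Split a URL into the components described above using the standard URL class.
function urlComponents(address) {
  const url = new URL(address);
  return {
    protocol: url.protocol.replace(":", ""), // "http:" -> "http"
    domainName: url.hostname,
    resourceName: url.pathname,
    // Optional parts; these are empty strings when the URL does not carry them.
    username: url.username,
    password: url.password,
    port: url.port,
  };
}

console.log(urlComponents("http://www.clickoffline.com/index.php"));
// Hypothetical URL showing the optional components:
console.log(urlComponents("http://user:secret@www.clickoffline.com:8080/index.php"));
```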

The figure above explains the components of the URL. Now we're ready to send the HTTP Request.

Construction of the HTTP Request

The browser composes a message called the HTTP Request, which is sent to the server to retrieve information from it.

Sample


GET /index.php HTTP/1.1
Host: www.clickoffline.com

Since HTTP is a stateless protocol, all information related to the session is sent along with the request every time. This includes the resource name, protocol version, browser name, version and OS, supported content types, cookies if any, and so on.
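To make the request format concrete, here is a minimal sketch that builds the text of a GET request like the sample above. The function name buildGetRequest and the User-Agent value are assumptions for illustration; a real browser sends many more headers.

```javascript
// Build the text of a minimal HTTP/1.1 GET request for a URL.
function buildGetRequest(address, extraHeaders = {}) {
  const url = new URL(address);
  const lines = [
    `GET ${url.pathname}${url.search} HTTP/1.1`,
    `Host: ${url.host}`,
  ];
  for (const [name, value] of Object.entries(extraHeaders)) {
    lines.push(`${name}: ${value}`);
  }
  // Each line ends with CRLF; an empty line marks the end of the headers.
  return lines.join("\r\n") + "\r\n\r\n";
}

const request = buildGetRequest("http://www.clickoffline.com/index.php", {
  "User-Agent": "ExampleBrowser/1.0", // hypothetical browser name
  "Accept": "text/html",
});
console.log(request);
```

Because nothing is remembered between requests, the extra headers have to be supplied again on every call.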

Identifying the server

Before sending out an HTTP request, we need to identify the remote server. This is done by using DNS to find the IP address for the domain specified in the URL. The HTTP Request is then sent to the remote server by establishing a route between the local machine and the remote server.

Wait For Response

The client browser waits for the response from the server.

Analysis/ Rendering of HTTP Response

The HTTP Response consists of an HTTP Response Header and the HTTP Response Data.
The response headers provide information about the HTTP response; the most important field is the response code. Different response codes indicate different statuses:

200 OK
301 Moved permanently
302 Found
303 See Other
403 Forbidden
404 Not Found
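As a small sketch, a client can group response codes by their first digit. The function name describeStatus is made up for this example; the 5xx (server error) class is part of HTTP as well, even though no 5xx code appears in the list above.

```javascript
// Map an HTTP response code to its general class, based on the first digit.
function describeStatus(code) {
  if (code >= 200 && code < 300) return "success";
  if (code >= 300 && code < 400) return "redirection";
  if (code >= 400 && code < 500) return "client error";
  if (code >= 500 && code < 600) return "server error";
  return "other";
}

// The codes from the list above.
for (const code of [200, 301, 302, 303, 403, 404]) {
  console.log(code, describeStatus(code));
}
```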

The browser analyzes the HTTP response and renders it on the screen. Any dependencies such as images, CSS and JavaScript files referenced by the source HTML page follow the same process to load in the client, except that the browser requests their URLs automatically instead of the user typing them.


HTTP/1.1 200 OK
Date: Wed, 18 Feb 2009 16:19:57 GMT
Server: Apache/2.2.11 (Unix)
X-Powered-By: PHP/5.2.8
X-Pingback: http://www.clickoffline.com/xmlrpc.php
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
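The sample response above can be taken apart with a few lines of string handling. This is a simplified sketch, not how any particular browser does it; the function name parseResponseHead is made up, and only a few of the sample header lines are reused here.

```javascript
// Parse the status line and header lines of an HTTP response.
function parseResponseHead(head) {
  const [statusLine, ...headerLines] = head
    .split(/\r?\n/)
    .filter((line) => line !== "");
  const [version, code, ...reason] = statusLine.split(" ");
  const headers = {};
  for (const line of headerLines) {
    const separator = line.indexOf(":"); // the first colon splits name from value
    if (separator === -1) continue;      // skip anything that is not "Name: value"
    const name = line.slice(0, separator).trim().toLowerCase();
    headers[name] = line.slice(separator + 1).trim();
  }
  return { version, statusCode: Number(code), reason: reason.join(" "), headers };
}

const sample = [
  "HTTP/1.1 200 OK",
  "Date: Wed, 18 Feb 2009 16:19:57 GMT",
  "Server: Apache/2.2.11 (Unix)",
  "Content-Type: text/html; charset=UTF-8",
].join("\r\n");

const parsed = parseResponseHead(sample);
console.log(parsed.statusCode, parsed.reason, parsed.headers["content-type"]);
```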

Advanced Resources
•    Specifications for URL (RFC 1738): http://www.w3.org/Addressing/URL/url-spec.txt
•    Specifications for DNS (RFC 1035): http://www.ietf.org/rfc/rfc1035.txt
•    HTTP Protocol Details: http://en.wikipedia.org/wiki/HTTP

Disclaimer: The views expressed on this webpage are my personal views and do not necessarily reflect views of my employer.

[ View Post ]

I had this weird problem where I had to secure a web server so that access from the local network is granted by default, but anyone outside the local network who tries to access a web page has to be validated with a username and password.

After some googling, I found a configuration that allowed me to secure the Apache directory. This is the configuration:


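<Directory /var/www/html/123/>
    # Ask for a username and password using HTTP Digest authentication
    AuthType Digest
    AuthName "Intranet"
    AuthDigestDomain /
    # Workaround for Internet Explorer's handling of query strings with Digest auth
    BrowserMatch "MSIE" AuthDigestEnableQueryStringHack=On

    AuthDigestProvider file
    AuthUserFile /physical/path/to/.digest_pw
    Require valid-user

    # Let the local network in without credentials
    Order Allow,Deny
    Allow from 192.168.0.0/255.255.255.0

    # Either condition is enough: a local address OR a valid login
    Satisfy any
</Directory>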

There are two kinds of access control here:
1) The Allow from directive
2) The AuthUserFile directive

The Allow from directive admits traffic from the specified IP range, while AuthUserFile (together with Require valid-user) enables validation of user credentials when the client supplies them. The Satisfy any directive tells Apache to grant access to the specified resources if either one of the above conditions is met.
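The decision that Satisfy any produces can be modelled in a few lines. This is only a sketch of the logic, not how Apache implements it; the function names are made up, and hasValidCredentials stands in for the outcome of the Digest login.

```javascript
// Convert dotted IPv4 text such as "192.168.0.42" to an unsigned 32-bit number.
function ipToNumber(ip) {
  return ip.split(".").reduce((acc, part) => (acc << 8) + Number(part), 0) >>> 0;
}

// True when ip falls inside network/netmask, e.g. 192.168.0.0/255.255.255.0.
function inSubnet(ip, network, netmask) {
  const mask = ipToNumber(netmask);
  return ((ipToNumber(ip) & mask) >>> 0) === ((ipToNumber(network) & mask) >>> 0);
}

// "Satisfy any": a local address OR a valid login is enough.
function isAccessGranted(clientIp, hasValidCredentials) {
  const fromLocalNetwork = inSubnet(clientIp, "192.168.0.0", "255.255.255.0");
  return fromLocalNetwork || hasValidCredentials;
}

console.log(isAccessGranted("192.168.0.42", false)); // local client, no login needed
console.log(isAccessGranted("10.0.0.1", true));      // outside client with a valid login
console.log(isAccessGranted("10.0.0.1", false));     // outside client without a login
```

With Satisfy all instead, both conditions would have to hold, which corresponds to replacing the || with && in this model.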

[ View Post ]

There are many situations where you may want to preload images in an HTML page, mainly so that they are cached and the page responds faster later on. This can be achieved with Image objects in JavaScript. Image objects allow dynamic creation and manipulation of images in JavaScript and are supported by IE 4+ and Firefox.

Preloading Image

An image can be preloaded with a small piece of JavaScript that creates an Image object and loads the image from the URL assigned to its src property. See below:


<head>
<script type="text/javascript">
<!--
// Create Image objects and point them at the files to cache
var image01 = new Image();
image01.src = "http://www.google.co.in/intl/en_com/images/logo_plain.png";
var image02 = new Image();
image02.src = "http://labs.google.com/images/labs_logo2.gif";
//-->
</script>
</head>

The best place to trigger the preload is the body's onLoad event, so that the images start loading once the page itself has finished loading.
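One way to wire this up is sketched below. The function name preloadImages and its createImage parameter are made up for this example; createImage defaults to the browser's Image constructor and exists only so the function can also be tried outside a browser.

```javascript
// Create one image object per URL; assigning src starts the download in a browser.
// createImage is a hypothetical hook that defaults to the browser's Image constructor.
function preloadImages(urls, createImage = () => new Image()) {
  return urls.map((url) => {
    const image = createImage();
    image.src = url;
    return image;
  });
}

// In a browser, start preloading once the page itself has finished loading.
if (typeof window !== "undefined") {
  window.addEventListener("load", () => {
    preloadImages([
      "http://www.google.co.in/intl/en_com/images/logo_plain.png",
      "http://labs.google.com/images/labs_logo2.gif",
    ]);
  });
}
```

Registering a load listener on window has the same effect as putting the call in the body's onLoad attribute.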

Changing images in JavaScript

The most immediate thing to do after loading and caching an image is to use it somewhere in the HTML page. This can be achieved easily using JavaScript.
See the full example below:


<html>
<head>
<title>Page title</title>
<script type="text/javascript">
// Image objects that hold the preloaded files
var objImage1 = new Image();
var objImage2 = new Image();
var objImage3 = new Image();
// Counts button clicks to decide which image to show next
var cnt = 0;

function preLoadImages() {
  // Assigning src makes the browser download and cache each file
  objImage1.src = 'http://www.google.com/logos/mlk09.gif';
  objImage2.src = 'http://www.google.com/logos/newyear09.gif';
  objImage3.src = 'http://www.google.com/logos/olympics08_opening.gif';
}

function changeImage() {
  // Show the next preloaded image on each click, wrapping around after the third
  cnt++;
  if (cnt == 1) {
    document.images['im'].src = objImage1.src;
  }
  else if (cnt == 2) {
    document.images['im'].src = objImage2.src;
  }
  else if (cnt == 3) {
    document.images['im'].src = objImage3.src;
    cnt = 0;
  }
}
</script>
</head>
<body onLoad="preLoadImages()">
<form name="myWebForm">
<img name="im" src="http://www.google.com/logos/olympics06_opening.gif"><br>
<input type="button" name="Prev" value="Switch Image " onClick="changeImage()">
</form>
</body>
</html>
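The if/else chain in the example above grows with every image added. A possible alternative, sketched here, keeps the URLs in an array and steps through them with a counter that wraps around. The function name createImageCycler is made up for this example.

```javascript
// Return a function that yields the next URL on each call, wrapping around at the end.
function createImageCycler(urls) {
  let index = -1;
  return function next() {
    index = (index + 1) % urls.length;
    return urls[index];
  };
}

const nextLogo = createImageCycler([
  "http://www.google.com/logos/mlk09.gif",
  "http://www.google.com/logos/newyear09.gif",
  "http://www.google.com/logos/olympics08_opening.gif",
]);

// In the page above, the button handler would become:
//   document.images['im'].src = nextLogo();
console.log(nextLogo());
```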
[ View Post ]