In previous articles, we talked about how the internet you 'see' may not be the internet that 'is'. Ever wondered how Internet Service Providers block websites, or how they know which websites you have visited? A lot happens in the background.

Firewall

In this article, we will try to answer the following questions:

Jargon


We will start off with the organization of the internet.

The internet is basically a network of computer networks. Why the _inter_? Because these networks, and the computers on them, are interconnected with each other.

Schematic Diagram of a Local Area Network

A node in a network is a single computer. End-users like you are connected to a hub called the Internet Service Provider (ISP), which in turn is connected to a bigger network. The ISP charges you for connecting to the internet through its infrastructure.

How do I get connected?

Typically, your phone company provides internet access over your telephone line. The reason is that telephone companies already have a huge network of telephones connected by wires, which can also be used to transmit data.

How do computers communicate?

How do you get things across on the internet? Imagine you are in a foreign country, and you want to communicate. Either:

This brings in the idea of a 'language' of communication. In computer terminology, it is commonly called a 'protocol'. Unlike natural languages, there are different protocols for different purposes. Examples of protocols are:

Essentially, it means that depending on your purpose, you need to communicate in one of the languages listed above (among many others) to tell the computer at the other end what you want.
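To make the idea of a protocol as a 'language' concrete, here is a minimal sketch of what a browser 'says' when it asks for a web page using HTTP, the protocol of the web. HTTP is picked here purely as an illustration; the function name `build_http_get` and the host `example.com` are assumptions for the example, and nothing is actually sent over the network.

```python
def build_http_get(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes (illustrative helper)."""
    lines = [
        f"GET {path} HTTP/1.1",   # what we want: the method, the path, the protocol version
        f"Host: {host}",          # which website on that computer we are talking to
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # an empty line marks the end of the request headers
        "",
    ]
    # HTTP requires lines to be separated by carriage return + line feed.
    return "\r\n".join(lines).encode("ascii")


request = build_http_get("example.com")
print(request.decode("ascii"))
```

The computer at the other end understands this text only because both sides agree on the same protocol; a mail server, for instance, would expect a different 'language' altogether.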

How do you get connected to a website?

When you want to connect to a website, you need to know the address of the computer that hosts it, called its 'IP address'. This is a unique number by which the destination computer is known on the internet. But you don't type in the address (formatted as xxx.xxx.xxx.xxx, where each xxx is a number from 0 to 255) every time you want to check your mail, do you?
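The dotted format is just a human-friendly way of writing a single 32-bit number. The following sketch uses Python's standard `ipaddress` module to show this and to check the 0 to 255 rule; the helper names `parse_ipv4` and `is_valid_ipv4` and the sample addresses are assumptions made for the example.

```python
import ipaddress


def parse_ipv4(text: str) -> int:
    """Return the 32-bit integer behind a dotted address such as 192.0.2.1."""
    return int(ipaddress.IPv4Address(text))


def is_valid_ipv4(text: str) -> bool:
    """Check that the text has four parts, each a number from 0 to 255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


print(parse_ipv4("192.0.2.1"))       # 3221225985
print(is_valid_ipv4("192.0.2.1"))    # True
print(is_valid_ipv4("192.0.2.256"))  # False, because 256 is larger than 255
```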

When you want to check your mail (say, for user@gmail.com), you type the website's name into your browser. What really happens is:
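A central step is that the name you typed is translated into an IP address by the Domain Name System (DNS). Here is a minimal sketch of that lookup, using `socket.gethostbyname` from Python's standard library as the default resolver. The function name `lookup`, the stand-in table, and the address in it are assumptions for illustration; the table lets the example run without a network connection.

```python
import socket
from typing import Callable


def lookup(hostname: str,
           resolver: Callable[[str], str] = socket.gethostbyname) -> str:
    """Translate a host name into an IP address using the given resolver."""
    return resolver(hostname)


# A stand-in for DNS so the example works offline (hypothetical entry).
fake_dns_table = {"mail.example.com": "192.0.2.10"}

print(lookup("mail.example.com", resolver=fake_dns_table.__getitem__))

# With a working internet connection, the real lookup would be:
#   lookup("gmail.com")
```

Once the browser has the number, it connects to that address and speaks the appropriate protocol to it.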

Proxy Servers

Imagine that you are studying at a university. Clearly, not everyone can be given an individual internet connection.

How is it resolved?

A network of computers is created inside the university, and all computers share the internet connection. Ultimately, only a few computers at the institute are connected to the internet directly. All the local computers form a subnetwork that connects to a central machine. This machine takes your request as its own, fetches the content from the WWW, and passes it back to you. To the outside world, your identity is that of the central computer.

A proxy server is just a computer program running on another computer much like yours, except that it is set up to handle a huge number of requests.
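The way a proxy hides your identity can be sketched with a small simulation. Nothing here touches a real network: `OriginServer` stands in for a website, `ProxyServer` for the central machine, and all class names and addresses are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OriginServer:
    """Stands in for a website; it records who it believes is asking."""
    pages: dict
    seen_clients: list = field(default_factory=list)

    def handle(self, client_ip: str, path: str) -> str:
        self.seen_clients.append(client_ip)
        return self.pages.get(path, "404 Not Found")


@dataclass
class ProxyServer:
    """Stands in for the central machine that shares one connection."""
    public_ip: str
    origin: OriginServer

    def fetch(self, local_client_ip: str, path: str) -> str:
        # The local address is known only to the proxy. The request goes
        # out under the proxy's own public address.
        return self.origin.handle(self.public_ip, path)


website = OriginServer(pages={"/": "Hello from the website"})
proxy = ProxyServer(public_ip="203.0.113.5", origin=website)

print(proxy.fetch("10.0.0.7", "/"))  # a student's machine
print(proxy.fetch("10.0.0.8", "/"))  # another student's machine
print(website.seen_clients)          # the website only ever saw the proxy
```

Both requests reach the website carrying the proxy's address, which is why the outside world cannot tell the two students apart.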

 

Firewalls


With that brief background on the architecture of the internet, we shift our focus to firewalls.

A firewall is a security device whose function is to regulate the traffic between computers or computer networks. It can be implemented in software or in hardware. It inspects the packets passing through it, looks for known signatures in them, and blocks unwanted packets.

The firewall you might have on your computer filters the packets that arrive at your machine, possibly stopping viruses and keeping unwanted requests out.
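As a rough illustration of this kind of filtering, the sketch below checks each packet against two simple rules: a list of blocked destination ports and a list of byte patterns ('signatures') to look for in the payload. The `Packet` fields, the rule values, and the function name `allow` are all assumptions for the example; real firewalls use far richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_port: int
    payload: bytes


# Hypothetical rules, chosen only for illustration.
BLOCKED_PORTS = {23}                    # e.g. refuse Telnet
BLOCKED_SIGNATURES = [b"EVIL-PAYLOAD"]  # byte patterns treated as malicious


def allow(packet: Packet) -> bool:
    """Return True if the packet may pass, False if it should be dropped."""
    if packet.dst_port in BLOCKED_PORTS:
        return False
    # Signature matching: drop the packet if any known pattern appears in it.
    return not any(sig in packet.payload for sig in BLOCKED_SIGNATURES)


incoming = [
    Packet("198.51.100.4", 80, b"GET / HTTP/1.1"),
    Packet("198.51.100.9", 23, b"login"),
    Packet("198.51.100.7", 80, b"xxEVIL-PAYLOADxx"),
]
for packet in incoming:
    print(packet.src_ip, "allowed" if allow(packet) else "blocked")
```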

How Content Filtering works

We discussed in the previous articles that content filtering is carried out by the ISPs of a region because the government has laws governing the use of the internet. Typically, when your request reaches your ISP, the ISP's software takes a look at it. If the request is for a website that is blocked, you are told that your request cannot be processed, either explicitly or implicitly.

By explicit, we mean that the ISP sends back a webpage that says: 'This website is blocked by <ISP's name>'.

With an implicit block, the request is typically cut off at the ISP itself, and your browser, which is expecting a reply, never gets one at all! So, after a 'timeout' of about a minute (the exact time depends on your browser settings), the browser simply gives up and might say: 'The remote server <address.com> is not reachable at this time. Please try again later'.
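The difference between the two kinds of block can be sketched as a small simulation. The blocklist entry, the ISP name `ExampleISP`, and the function names are assumptions for the example; a missing reply is modelled as `None`, which the simulated browser turns into its 'not reachable' message in place of a real timeout.

```python
from typing import Optional

# Hypothetical blocklist held by the ISP.
BLOCKLIST = {"blocked.example"}


def isp_filter(host: str, explicit: bool) -> Optional[str]:
    """Return what the ISP sends back; None means nothing is sent at all."""
    if host not in BLOCKLIST:
        return f"<content of {host}>"  # request passes through untouched
    if explicit:
        return "This website is blocked by ExampleISP"
    return None                        # implicit block: silently dropped


def browser_view(host: str, explicit: bool) -> str:
    """What the user ends up seeing in the browser."""
    reply = isp_filter(host, explicit)
    if reply is None:
        # A real browser would reach this point only after its timeout.
        return f"The remote server {host} is not reachable at this time."
    return reply


print(browser_view("allowed.example", explicit=True))
print(browser_view("blocked.example", explicit=True))
print(browser_view("blocked.example", explicit=False))
```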

How can you bypass it?

You are being blocked. How do you get around the system? There are many possible ways of doing it. Some of them are:

Again, the ISPs can respond harshly to this. They can block all encrypted connections altogether, and make life miserable for you!

Conclusion


We have seen how the internet is organized, and how content filtering works in real life. Ultimately, it is a piece of software (or firmware, which is software built into the hardware) looking for suspicious patterns in the huge amounts of data flowing through the internet. This software keeps evolving, and so do the methods to evade it. In the end, though, content filtering on the WWW is driven by the laws that the government enforces, in the interest of the people.
