In previous articles, we talked about how the internet you 'see' may not be the internet that 'is'. Ever wondered how Internet Service Providers block websites, or how they know which websites you have visited? There is a lot happening in the background.
In this article, we will try to cover the following topics:
- The basic organization of the internet
- How do computers communicate
- How you get connected to a website
- What are proxy servers?
- What are firewalls?
- How content filtering works, and how you can bypass it
A few terms used throughout this article:
- Server: A computer that provides a service to other computers.
- Client: A normal end-user computer.
- Packet: A formatted unit of information that is sent across a network.
We will start off with the organization of the internet.
The internet is basically a network of computers. Why is it _inter_? Because it interconnects many smaller networks, and through them the computers on each one.
Figure: Schematic diagram of a Local Area Network
A node in a network is a single computer. End users like you are connected to a hub called the Internet Service Provider (ISP), which in turn is connected to a bigger network. The ISP charges you for connecting to the internet through its infrastructure.
How do I get connected?
Typically, your phone company provides internet access over your telephone line. The reason is that the telephone companies already have a huge network of telephones connected by wires, which can also be used to transmit data.
How do computers communicate?
How do you get things across the internet? Imagine you are in a foreign country, and you want to communicate. Either:
- You should know the local language
- The locals should know your language
- Both should know a common language
This brings up the idea of a 'language' of communication. In computer terminology, it is commonly called a 'protocol'. Unlike natural languages, there are different protocols for different purposes. Examples of protocols are:
- The Hyper-Text Transfer Protocol (HTTP).
- The File Transfer Protocol (FTP).
- The Post Office Protocol (POP), used by email clients to retrieve mail.
- The Simple Mail Transfer Protocol (SMTP), used to send mail.
Essentially, it means that depending on your purpose, you need to communicate in one of the languages listed above (among many others) to tell the computer at the other end what you want.
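To make the idea of a protocol concrete, here is a minimal sketch in Python of what a browser 'says' when it speaks HTTP. The function name `build_http_get` and the host `example.com` are illustrative assumptions; the point is only that a protocol is an agreed-upon message format.

```python
def build_http_get(host: str, path: str = "/") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # which website on that server we want
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines).encode("ascii")


request = build_http_get("example.com")
print(request.decode("ascii"))
```

A web server understands these few lines as a request for a page. An FTP or SMTP server would not, because it expects messages in its own protocol.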
How do you get connected to a website?
When you want to get connected to a website, you need to know the address of the computer that hosts it, called the "IP address". This is a unique number by which the destination computer is known on the internet. But you don't type in the address (formatted as xxx.xxx.xxx.xxx, where each xxx is a number from 0 to 255) every time you want to check your mail, do you?
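As a small illustration of the xxx.xxx.xxx.xxx format, the sketch below uses Python's standard `ipaddress` module to take an address apart. The helper name `describe_ipv4` and the sample address `192.168.1.10` are assumptions chosen for the example.

```python
import ipaddress


def describe_ipv4(text: str) -> dict:
    """Parse a dotted address; raises ValueError if it is malformed."""
    addr = ipaddress.IPv4Address(text)
    return {
        "octets": list(addr.packed),    # the four numbers, each from 0 to 255
        "as_integer": int(addr),        # the same address as one 32-bit number
        "is_private": addr.is_private,  # True for home/office-only ranges
    }


info = describe_ipv4("192.168.1.10")
print(info["octets"])  # [192, 168, 1, 10]

try:
    describe_ipv4("256.1.1.1")  # 256 does not fit in the 0-255 range
except ValueError as error:
    print("rejected:", error)
```

The four dotted numbers are just a readable way of writing a single 32-bit number, which is why each part cannot exceed 255.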
When you want to check your mail (say, on gmail.com), you type the address into your browser. What really happens is:
- Your browser transmits a request to a DNS server (whose address is known) asking for the address of the website (in this case: gmail.com).
- Your DNS server gets this request, and replies with the address.
- Your browser sends a request to the address it just received, asking for the webpage.
- The remote server (another computer) accepts the request and sends back the data you requested.
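The four steps above can be sketched as a toy simulation. Everything here is made up for illustration: the name `mail.example`, the address `203.0.113.5` (taken from a range reserved for documentation), and the dictionaries standing in for a DNS server and a web server.

```python
# Toy "DNS server": maps website names to IP addresses.
DNS_TABLE = {"mail.example": "203.0.113.5"}

# Toy "web servers", keyed by IP address; each maps a path to a page.
SERVERS = {"203.0.113.5": {"/": "<html>Inbox</html>"}}


def dns_lookup(hostname):
    """Steps 1 and 2: ask the DNS server for the address of a name."""
    return DNS_TABLE.get(hostname)


def fetch(hostname, path="/"):
    """Steps 3 and 4: contact the server at that address and get the page."""
    address = dns_lookup(hostname)
    if address is None:
        return "error: name not found"
    page = SERVERS[address].get(path)
    if page is None:
        return "error: page not found"
    return page


print(fetch("mail.example"))
print(fetch("nosuch.example"))
```

On a real machine, the lookup step is performed by the operating system; in Python, `socket.gethostbyname("example.com")` asks it to resolve a name to an address.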
Imagine a situation where you are studying in a university. Clearly, not everyone can be given an individual internet connection.
How is it resolved?
A network of computers is created inside the university, and all computers share the internet connection. Ultimately, only a few computers at the institute are connected to the internet. All the local computers form a subnetwork that connects to a central machine. This machine takes your request as its own, fetches the content from the web, and hands it back to you. To the outside world, your identity is that of the central computer.
A proxy server is just a 'program' running on a computer much like yours, except that the machine is set up to handle a huge number of requests.
With that background on the architecture of the internet, we shift our focus to firewalls.
A firewall is a security device whose function is to regulate the traffic between computers or computer networks. It can be implemented in software or in hardware. It looks for signatures in the packets passing through it, tries to decode them, and blocks unwanted packets.
The firewall you might have on your own computer filters the packets that arrive at it, keeping unwanted requests out and possibly stopping some viruses.
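Here is a minimal sketch of that idea, assuming two hypothetical classes, `OriginServer` (a website on the outside) and `ProxyServer` (the central machine). The addresses are invented. It shows why the outside world only ever sees the proxy's identity.

```python
class OriginServer:
    """A website on the outside. It records who it thinks is asking."""

    def __init__(self):
        self.seen_clients = []

    def handle(self, client_ip, path):
        self.seen_clients.append(client_ip)
        return f"content of {path}"


class ProxyServer:
    """The central machine: forwards requests under its own address."""

    def __init__(self, public_ip, origin):
        self.public_ip = public_ip
        self.origin = origin

    def handle(self, internal_ip, path):
        # internal_ip is known only inside the local network;
        # the request leaves with the proxy's public address instead.
        return self.origin.handle(self.public_ip, path)


website = OriginServer()
proxy = ProxyServer("198.51.100.7", website)
page_a = proxy.handle("10.0.0.21", "/news")  # one student's computer
page_b = proxy.handle("10.0.0.35", "/mail")  # another student's computer
print(website.seen_clients)  # ['198.51.100.7', '198.51.100.7']
```

Both students get their pages, yet the website's log contains only the proxy's address.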
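A very reduced sketch of packet filtering might look like the following. The `Packet` fields, the blocked port, and the byte signature are all assumptions for illustration; real firewalls use far richer rules.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src_ip: str
    dst_port: int
    payload: bytes


BLOCKED_PORTS = {23}          # hypothetical rule: refuse Telnet connections
BAD_SIGNATURES = [b"EVIL"]    # hypothetical byte pattern of known malware


def allow(packet: Packet) -> bool:
    """Return True if the packet may pass, False if it is dropped."""
    if packet.dst_port in BLOCKED_PORTS:
        return False
    # Drop the packet if any known bad pattern appears in its contents.
    return not any(sig in packet.payload for sig in BAD_SIGNATURES)


packets = [
    Packet("203.0.113.9", 80, b"GET / HTTP/1.1"),
    Packet("203.0.113.9", 23, b"login"),
    Packet("203.0.113.9", 80, b"...EVIL..."),
]
print([allow(p) for p in packets])  # [True, False, False]
```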
How Content Filtering works
We have discussed in the previous articles that content filtering is done by the ISPs of a region because the government has laws governing the usage of the internet. Typically, when your request is sent to your ISP, the ISP's software takes a look at it. If the request is for a website that is blocked, you are told that your request cannot be processed, either explicitly or implicitly.
By explicit, we mean that the ISP sends a webpage that says: "This website is blocked by <ISP's name>".
If it is an implicit block, the request is typically cut off at the ISP itself, and your browser, which is expecting a reply, never gets one at all! So, after a "timeout" of about a minute (the exact value depends on your browser settings), the browser simply gives up and might say: "The remote server <address.com> is not reachable at this time. Please try again later".
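The two kinds of blocking can be sketched as follows. The blocklist entry `blocked.example`, the ISP name `ExampleISP`, and both function names are hypothetical; a missing reply is modelled as `None` rather than an actual one-minute wait.

```python
BLOCKLIST = {"blocked.example"}  # hypothetical list kept by the ISP


def isp_handle(host, explicit=True):
    """What the ISP sends back for a request to `host`."""
    if host not in BLOCKLIST:
        return f"<page from {host}>"  # request is passed along normally
    if explicit:
        return "This website is blocked by ExampleISP"
    return None  # implicit block: the request is silently dropped


def browser_get(host, explicit=True):
    """What the user ends up seeing in the browser."""
    reply = isp_handle(host, explicit)
    if reply is None:
        # A real browser would wait for its timeout before giving up.
        return f"The remote server {host} is not reachable at this time."
    return reply


print(browser_get("open.example"))
print(browser_get("blocked.example", explicit=True))
print(browser_get("blocked.example", explicit=False))
```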
How can you bypass it?
You are being blocked. How do you get around the system? There are many possible ways of doing it. Some of them are:
- Connecting to a proxy
When you subscribe to the internet through your ISP, you are typically given direct access to it. By direct access, we mean that there is no proxy server sitting between you and the internet, and the identity your computer gets (the IP address) is unique. So, if your ISP blocks certain content, why not contact another computer and ask it to fetch the content for you? Provided the other computer isn't using the same ISP as you, it can probably get the blocked content. Effectively, the remote computer acts as a proxy for you, and this way you can reach blocked content. But ISPs are clever. If they want to prevent proxying, they block the proxy websites too! Still, new proxies keep appearing, which makes blocking them all a hard job for the ISPs.
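The cat-and-mouse game described above can be sketched like this, under the simplifying assumption that the ISP only checks which computer you are connecting to. The host names and function names are invented for the example.

```python
BLOCKLIST = {"blocked.example"}  # hypothetical list kept by the ISP


def isp_allows(destination):
    """The ISP only looks at where the connection is going."""
    return destination not in BLOCKLIST


def direct_request(site):
    if not isp_allows(site):
        return "blocked by ISP"
    return f"<page from {site}>"


def request_via_proxy(proxy, site):
    # From the ISP's point of view, you are only talking to the proxy.
    if not isp_allows(proxy):
        return "blocked by ISP"
    # The proxy sits outside your ISP and fetches the page on your behalf.
    return f"<page from {site}>"


print(direct_request("blocked.example"))
print(request_via_proxy("proxy.example", "blocked.example"))

BLOCKLIST.add("proxy.example")  # the ISP catches on and blocks the proxy too
print(request_via_proxy("proxy.example", "blocked.example"))
```

An ISP that also inspects the contents of your requests would not be fooled this easily, which is where encryption comes in.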
- Encrypting your connection
Sometimes, it is possible to encrypt your connection to a proxy outside, so that your ISP has no clue as to what you are doing! This is usually done with SSH, the Secure Shell protocol. It is called secure because it encrypts data before sending it to the destination. Only the destination can decrypt the data.
Again, the ISPs can respond harshly to this. They can block all encrypted connections altogether, and make life miserable for you!
What we have seen is how the internet is organized, and how content filtering works in real life. Ultimately, it is a piece of software (or firmware, which is software built into the hardware) looking for patterns of suspicious activity in the huge amounts of data flowing through the internet. Such software keeps evolving, and so do the methods to evade it. But, finally, content filtering on the web follows from the laws the government enforces, in the interest of the people.
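The sketch below shows why a filter that searches traffic for keywords is blind to encrypted data. The cipher here is a toy built from SHA-256 purely for illustration; it is not how SSH works and must not be used to protect anything real. The key, the request, and the function names are all assumptions.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stretch a key into `length` pseudo-random bytes (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream. Applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


toy_decrypt = toy_encrypt  # XOR is its own inverse


def isp_sees_keyword(traffic: bytes, keyword: bytes = b"blocked.example") -> bool:
    """The ISP's pattern matcher: look for a forbidden name in the traffic."""
    return keyword in traffic


request = b"GET http://blocked.example/ HTTP/1.1"
key = b"known only to you and the proxy"
ciphertext = toy_encrypt(key, request)

print(isp_sees_keyword(request))              # plain request: the filter finds it
print(isp_sees_keyword(ciphertext))           # encrypted: nothing recognisable
print(toy_decrypt(key, ciphertext) == request)  # the proxy recovers the request
```

The filter can still see that some encrypted traffic is flowing to the proxy, which is exactly what lets an ISP block encrypted connections wholesale.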