Now that we know the basics of what every search engine essentially does, let’s look at the parts that do the main work.
The part of the search engine that “searches” the Web is the spider, or crawler. The spider visits a web page, reads it, and then follows the links on that page to other pages on the Web. The spider makes rounds of the Web: once it has finished scouring it, it starts its journey again, revisiting the same site every month or two to look for changes.
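The visit-read-follow cycle above can be sketched as a breadth-first traversal. This is a minimal illustration, not a real crawler: the URLs and page contents below are made up, and a tiny in-memory dictionary stands in for the Web so the example runs without a network connection.

```python
from collections import deque

# A tiny stand-in for the Web: each (made-up) URL maps to
# (page text, list of outgoing links).
WEB = {
    "http://a.example": ("welcome page about search", ["http://b.example"]),
    "http://b.example": ("page about spiders", ["http://a.example", "http://c.example"]),
    "http://c.example": ("page about indexing", []),
}

def crawl(start_url):
    """Visit a page, read it, then follow its links to other pages."""
    seen = set()
    queue = deque([start_url])
    pages = {}  # url -> page text, later handed off to the indexer
    while queue:
        url = queue.popleft()
        if url in seen:
            continue  # skip pages already scoured in this round
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        queue.extend(links)
    return pages

pages = crawl("http://a.example")
print(sorted(pages))  # all three pages are reachable from the start page
```

A real spider would fetch pages over HTTP, respect robots.txt, and schedule periodic revisits; the `seen` set here only prevents loops within a single round.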
As the spider travels, all the information it finds goes into the second part of the search engine, the index. The index is like a giant book, a record of all the information on the web pages the spider finds. If a web page has changed and the spider discovers this change, the record is updated with the new information.
Sometimes it can take a while for new pages found by the spider to be added to the index. In other words, a web page might have been ‘spidered’ but not yet ‘indexed’. Until the information is indexed, it is not available to users searching the search engine.
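One common way to build such a record is an inverted index, which maps each word to the pages that contain it. The sketch below assumes hypothetical crawled pages (the URLs and texts are invented for illustration) and shows only the core idea, not a production index.

```python
# Hypothetical spider output: made-up URL -> page text.
pages = {
    "http://a.example": "welcome page about search engines",
    "http://b.example": "spiders crawl the web",
    "http://c.example": "the index records every word",
}

def build_index(pages):
    """Inverted index: map each word to the set of pages containing it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

index = build_index(pages)
print(sorted(index["the"]))  # pages b and c both contain "the"
```

Updating the record when a page changes amounts to removing the page’s old entries and re-running this step on the new text.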
The final component, which performs the last main task, is the search engine software itself. This is the program that sifts through the millions of pages recorded in the index to find matches for a user’s search query, and it ranks the pages in order of what it believes to be most relevant.
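At its simplest, matching and ranking can be sketched as scoring each indexed page by how often the query terms appear in it. This is a toy term-frequency ranker under invented data; real engines combine many more signals (link structure, freshness, and so on) to estimate relevance.

```python
# Hypothetical index contents: made-up URL -> page text.
pages = {
    "http://a.example": "search engines rank pages by relevance",
    "http://b.example": "relevance relevance is what the software estimates",
    "http://c.example": "spiders crawl links",
}

def search(query, pages):
    """Score each page by how many query-term occurrences it contains,
    and return matching pages, most relevant first."""
    terms = query.lower().split()
    scores = {}
    for url, text in pages.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score:  # pages with no matching terms are not returned
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)

print(search("relevance", pages))  # page b scores 2, page a scores 1, c is omitted
```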
We will now look at how each of these three essential components (the spider, the index, and the search engine software) works.