Varun: Good afternoon, Mr. Vijay Krishnan.
Vijay: (*laughs*) Ah, hi, you can call me Vijay. It’s night anyway, but yeah, good afternoon.
Varun: Ok sir, for the next few minutes I'm gonna be asking you a few questions regarding search engines.
Vijay: Sure.
Varun: And with me are my friends Jayakumar, Danny Matthew and Satheesh.
Vijay: Alright.
Varun: Ok and shall we begin?
Vijay: Sure go ahead.
Varun: What motivated you to develop a search engine? I mean you could have done anything right?
Vijay: Ok, so first, of course, when starting a company you are likely to offer more technological differentiation if it is in an area that is your strength, and my background has been strongly in these areas of search, data mining and things like that. I have published a bunch of papers in these areas, both at IIT Bombay and at Stanford, and I guess my co-founder, Siddarth Johnanthan, has a similar background too. At Stanford we did a bunch of work together in these areas. And yeah, that apart, we were looking for interesting ideas broadly in the space, and we felt that our company addresses a strong unmet need where we could offer a lot of differentiated value.
Varun: In your opinion, how would you define a search engine?
Vijay: I mean, that’s not too ambiguous, right? Most people would just define it as something which enables you to search any kind of information quickly. It so turns out that most search engines now are primarily searching text data of some sort, but of course there is no reason why all search engines in the future should be restricted to that kind of thing. Currently, for example, even if you want to search for a video or something, ultimately what the underlying search engine is doing is searching some kind of text metadata. But with a bit more advanced algorithms, which do video to text or audio to text and things like that, there is no reason why more sophisticated kinds of search cannot be done in the future.
Varun: How do you define an ideal search engine?
Vijay: Ideally, you would firstly want to have indexed all the possible information. Over time, of course, we have seen Google indexing more and more information, both by way of a broader set of webpages and now books and things like that, and one can certainly conceive of a situation where search engines index pretty much everything out there. That apart, I guess ultimately the idea is that a search engine should give you any information you want. That’s probably the ideal thing. In practice you are constrained to simple keyword queries for starters. Let me put it this way: Have you heard of this service called "Just dial" in India? (Varun: Yeah I have heard of it.) Yeah yeah! So for example, I am not too sure whether, in the current state of the technology, even if a search engine were to index all the data, an automated system would give the same user experience as a person gets by calling them. So I can certainly conceive that, firstly, with regard to the power of queries itself there is a lot of room for improvement, and then of course there is the data and the relevance. It’s just not enough that you have access to a hundred times the data; you also need to ensure that you effectively sift through all that and throw up relevant results quickly, because in practice no one is gonna waste time looking at more than, say, three results.
Varun: What do you think would be the topmost priority for a search engine? Any search engine, generally?
Vijay: It depends in large part on whether it is general purpose. I mean, someone like Google is of course going to attempt everything. A search engine like ours, on the other hand, targets a more niche problem. For us, for example, it is much more important that we scope out our problem well. For our particular problem, which is letting users very quickly get back to any page that they have seen in the past, we want to be better than Google or any other service. So I guess it’s in large part an issue of whether you are a general purpose guy or a specific guy. If you are a general purpose guy, you probably want to extend your territory into more and more things. If you are in a niche market, in that particular niche you want to do much better than others, so that there is still some reason for people to come to you.
Varun: Ok sir. So it’s all about attracting user base?
Vijay: Yeah, of course. That is what makes you viable as a business. Google probably monetizes its users much better than anyone else. They probably manage to make roughly, maybe, $100 per US user in advertising revenues. I guess any other search engine will typically make much less money per user, firstly because it gets fewer queries and also because it might not be able to monetize each search with relevant ads as well. So yeah, in short, you certainly want millions of users, that’s for sure, if you want to be a viable business. And of course if you have 300 million or something like that, you have really, really taken off and you are probably en route to being a billion dollar company.
Varun: Speaking about that: What are the revenue strategies commonly used and which ones have you adopted for Infoaxe?
Vijay: Ok, in our case, as of now we are not making revenues. We are more focused on growth for the next few months. But I guess one of the most common things would be opting for a service like Google’s AdSense. There are a few other search services, like these comparative shopping search engines. There are other search engines targeting particular verticals like cosmetics. A lot of these guys essentially don't have their own advertising infrastructure and they opt for Google’s AdSense. With that, they manage to make reasonably good money, because search ads generally tend to be more targeted than ads on other kinds of sites. Suppose, for example, you're running a social network: for every 1000 page views by US citizens you will probably barely be able to make maybe 20 cents, while if you are into search you will certainly make money in the dollars, maybe 3 or 4 dollars for 1000 views or 1000 fetches.
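To put the per-thousand-view figures above side by side, here is a minimal arithmetic sketch. The one million page views and the function name `revenue_usd` are assumptions chosen for illustration; the 20 cents and 3 dollars per 1000 views are the rough estimates quoted in the answer, not measured data.

```python
def revenue_usd(page_views: int, usd_per_1000_views: float) -> float:
    """Ad revenue for a number of page views at a given rate per 1000 views."""
    return page_views / 1000 * usd_per_1000_views


# Hypothetical traffic volume, used only to compare the two rates.
views = 1_000_000

social_network = revenue_usd(views, 0.20)  # about 20 cents per 1000 views
search_engine = revenue_usd(views, 3.00)   # low end of 3 to 4 dollars per 1000

print(f"Social network: ${social_network:,.2f}")
print(f"Search engine:  ${search_engine:,.2f}")
print(f"Ratio: {search_engine / social_network:.0f}x")
```

At these rough rates, the same traffic earns on the order of fifteen times more on a search product than on a social network.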
Varun: Do the search engines need any authorization since they index the entire web?
Vijay: No, not really. In practice there are some broad guidelines, but I would not say they are law yet. Suppose Infoaxe indexes a page from rediff.com: Infoaxe really makes a request to rediff.com to fetch a page, and that is probably no different from a user visiting a particular page on rediff with some browser and fetching it. In practice I guess the guidelines are more about what is called politeness in search engine literature, which is that you want the algorithms to be such that you don't hit a particular server too much. Rediff, for example, has a ton of content, so if a search engine tries to hit them by fetching 10,000 pages a second or something like that, that can slow down the service or bring down the service. While I don't know if there are specific laws against this, it is reasonably considered not acceptable, and the relevant service provider will certainly block your search engine if you were to violate that. Have you guys heard of this recent search engine named Cuil, spelt C-U-I-L? (Varun: Yes sir.) Yeah, so for example there were some complaints when they initially launched that they were not following this politeness criterion too well and they were repeatedly trying to fetch too many pages from particular servers. I think a lot of people complained and blocked them and things like that.
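The politeness idea described above can be sketched as a scheduler that enforces a minimum delay between fetches to the same host. This is only an illustration: the class name `PolitenessScheduler`, the two-second delay and the example URLs are assumptions, and real crawlers use more elaborate policies.

```python
from urllib.parse import urlparse


class PolitenessScheduler:
    """Assigns fetch times so that each host is hit at most once per delay."""

    def __init__(self, min_delay_seconds: float = 1.0):
        self.min_delay = min_delay_seconds
        self.next_allowed = {}  # host -> earliest time the next fetch may start

    def schedule(self, url: str, now: float) -> float:
        """Return the time at which `url` may be fetched, given the clock `now`."""
        host = urlparse(url).netloc
        fetch_time = max(now, self.next_allowed.get(host, now))
        # Reserve the slot so the next request to this host waits its turn.
        self.next_allowed[host] = fetch_time + self.min_delay
        return fetch_time


scheduler = PolitenessScheduler(min_delay_seconds=2.0)
urls = [
    "http://www.rediff.com/page1",
    "http://www.rediff.com/page2",
    "http://www.rediff.com/page3",
    "http://www.example.com/page1",
]
times = [scheduler.schedule(u, now=0.0) for u in urls]
print(times)
```

Requests to the same host are spread out over time, while a request to a different host is not delayed by them.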
Varun: Speaking of security, what do you think are the key security issues for any search engine?
Vijay: What would security mean here? Do you mean offering users privacy, or what?
Varun: I mean look at it in a broad way: security for the search engine and for users' privacy as such.
Vijay: Ok, so it appears to me that in large part security has to do with offering privacy for some kind of user-specific sensitive data, right? If you are an e-mail provider or something, it is clear: you don't want to make it possible for someone to hack into someone's e-mail or something like that. If you are a search engine, I suppose you have some kind of data, like IPs and searches and things like that. Yeah, I guess if you want to monetize better by showing targeted ads based on a particular person's searches, you probably want to make all those policies clear upfront so that people don't get pissed off later.
Varun: What about sensitive information that search engines display so easily on the web?
Vijay: What kind of sensitive information do you have in mind other than maybe a particular IP and the past searches or something like that? Do you have anything more in mind?
Varun: There are certain government records and registers which people can access to gather information about someone else. A search engine can easily display that information, right? So a search engine's basic philosophy is that anything on the web is allowed to be indexed?
Vijay: In an ideal world, yes! And of course if you are a website with data of any kind and you don't mind a search engine indexing it, you would have made it public; otherwise it will probably be behind authentication or something like that. Currently there are these things, like, do you know of this file called "robots.txt"? (Varun: Yes sir.) So I do see your point. Yes, there might be certain data that is public, but probably people don't want it to be easily accessible, which it can be with the help of a search. I guess those site owners are free to list a particular provider like Google or any other search engine, or even put in the robots.txt that they don't want any robot to index it, and to that extent this can be taken care of.
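A small sketch of how a crawler might honor such a file, using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler names `ExampleBot` and `OtherBot`, and the `/records/` path are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out entirely,
# and every other crawler is kept out of /records/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /records/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked_bot = parser.can_fetch("ExampleBot", "http://example.com/index.html")
other_bot_home = parser.can_fetch("OtherBot", "http://example.com/index.html")
other_bot_records = parser.can_fetch("OtherBot", "http://example.com/records/1")

print(blocked_bot, other_bot_home, other_bot_records)
```

Compliance is voluntary: the file only states the site owner's wishes, and it is up to each crawler to check it before fetching.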
Varun: Have you taken any special security measures in Infoaxe?
Vijay: Ok, so in our case of course there are some very specific sensitivities. For example, since we allow users to keep track of their past web browsing data, it is very important that it be private. People can access it only with authentication. We will be allowing people to share parts of their browsing and all that, but again only with the explicit consent of the user. For people to be comfortable using a service like this, they certainly should be assured of privacy.
Varun: This question should have been asked initially. Can you tell us why you chose the name "Infoaxe"?
Vijay: (*laughs*) For starters, I am not sure if you have tried to buy any domain names yourself. These days, of course, a lot of the simple domain names are taken. You want a short domain name also. It is more likely that people will type it correctly and not go to a wrong page or something like that. You want to sort of minimize the bounce rate, if you will. For example, suppose we had a much more complicated name, say with ten or fifteen characters, and you tell your friend about it. It is very likely that he might not get it right when he types it, and you lose a potential user that way. With a simpler name, and maybe something that is in some way related to the product (we kind of justify it as "information access" or something like that), it is much easier for people to remember, and if they tell their friends, it is more likely that their friends will remember it.
Varun: Coming back to security, what is "click fraud" and could you elaborate on that?
Vijay: Ok, click fraud, I don't think I would say it is related to security directly. Take the time before Google popularized this pay per click model of advertising. Suppose you are an advertiser and you wanted to sell shoes or something like that. What you would do is buy banner ads at Yahoo or something like that. Yahoo would say "ok, to show your ad a thousand times, pay us some 1 dollar or 2 dollars" or something like that. Obviously advertisers are doing this kind of thing with the hope that it would boost sales. So if any publisher or search engine can give a much better guarantee that the user was interested in your ad or that the ad caught the user's attention, that certainly counts for much more. I wouldn't say Google really invented this, but at any rate they have been doing it on a large scale and have popularized this business of payment per click. So currently, if you are a person who wants to sell shoes or cars or anything like that and you want to use Google to advertise, what you would do is bid on different keywords, and ultimately Google will charge you only if someone clicks on the ads. That way, they are kind of taking part of the problem away from you. They are saying "ok, we will show your ad only to people who are likely to be interested so that impressions don't go to waste, and we won't take money if we show it to the wrong people". The trouble with this, of course, is that sometimes there is an incentive for what is called "click fraud". Suppose in Chennai you have a huge car dealership, and there is this other guy who has a huge car dealership as well, who is your competitor. He of course ideally wants to destroy your business and become the sole guy. He could potentially employ people to deliberately click on your ads for no good reason, without buying anything, so that you will be billed by Google and not get anything in return for it.
Or suppose you are an anti-smoking guy or something: you could potentially hire some people to go and keep clicking on the Wills cigarette ads or something like that, without doing anything for their business. So Google and others, to ensure that the purity of their model is sustained, use a bunch of algorithms to counter “click fraud”, by saying that if multiple clicks come from the same IP they should be discounted, or if there is some kind of a pattern to the clicks, or if there is a particular spike in clicks at some point which is hard to explain in any other way, they will discount all that.
Varun: Let’s speak about the environmental impact of search engines. Search engines require a lot of power since they have to maintain data centers and servers and all that. So what do you think would be the environmental impact of all that?
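The first of those rules, discounting repeated clicks from the same IP, can be sketched as follows. This is a toy illustration and not how Google or any ad network actually does it; the function name `billable_clicks`, the one-hour window and the sample IP addresses are assumptions.

```python
def billable_clicks(clicks, window_seconds=3600):
    """Keep at most one click per IP per time window.

    `clicks` is a list of (timestamp_seconds, ip) tuples sorted by time.
    Repeat clicks from an IP within `window_seconds` of its last billed
    click are treated as suspicious and are not billed.
    """
    last_billed = {}  # ip -> timestamp of the last click that was billed
    billable = []
    for timestamp, ip in clicks:
        previous = last_billed.get(ip)
        if previous is None or timestamp - previous >= window_seconds:
            billable.append((timestamp, ip))
            last_billed[ip] = timestamp
    return billable


clicks = [
    (0, "203.0.113.5"),
    (10, "203.0.113.5"),    # repeat within the window: discounted
    (20, "198.51.100.7"),
    (30, "203.0.113.5"),    # repeat within the window: discounted
    (4000, "203.0.113.5"),  # more than an hour later: billed again
]
billed = billable_clicks(clicks)
print(len(clicks), "clicks,", len(billed), "billed")
```

Detecting unusual click patterns or unexplained spikes, the other signals mentioned above, would need statistics over historical traffic and is beyond this sketch.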
Vijay: Ok. I think it is important to look at the environmental impact on a per capita basis. There are different things that affect the environment, like pollution; there are all kinds of things. Ultimately you want to look at it on a per capita basis: per user we are serving, how much energy consumption or how much global warming, or whatever it is you are looking to optimize. Obviously a bus, for example, pollutes more than a car, but if a bus takes 100 people it is certainly better, right? (Varun: Yes sir.) Yeah, there was in fact some study recently; I don't exactly remember the numbers, but I think it was 5 grams of carbon dioxide per search; they listed out something. Did you guys read those articles too? (Varun: Yes sir we are familiar with that.) Yeah, so to my knowledge, on a per capita basis it doesn't turn out that bad at all. In absolute terms, yes, there is no doubt it is significant. A search engine like us, of course, doesn't consume any big power; all our operations are on a couple of servers and we expect it to remain that way even over the next year or two. Of course when you get really big and, say, you start serving hundreds of millions of users a day, obviously your scale gets much bigger, and there is no doubt whatsoever. I have heard that these data centers in fact consume a very significant fraction of the total power consumption in the US. But I would say on a per capita basis it is still way cheaper. Look at how it used to be in the past: someone had to drive down to the library and pick up a book, or drive down to the bookstore. In fact, for a lot of things that you look up online, books would probably have been published in the past, which in turn would incur all the energy cost of publishing, of cutting down trees and all that. I think if you factor in all that, I would say it is way, way more energy efficient than its alternatives.
Varun: What is the concept of a "Web memory" search engine?
Vijay: What we target is a very niche market. Obviously, currently it is relatively difficult to take on general search in a big way and still offer some differentiated value. For example, if we were to take on general web search and we wanted users, there is no good reason a user would come unless we are clearly better than Google. Crossing that milestone is obviously quite hard, because they have probably thousands of very smart engineers and they have had a lot of experience tuning their ranking function, their crawling, their indexing and every other thing. So currently I guess it is relatively easier to offer some kind of differentiated value to users if you target a particular problem like we do. In the case of Infoaxe, the motivation is that people bookmark pages, either in their browser or using a tagging service. Have you heard of Delicious? What we wanted to do was ensure that people have to do no work whatsoever. You install a browser add-on, you remain signed in, it won't even sign you out, and you do your normal stuff. If you saw this great video on YouTube that you are unable to find quickly, or you came to know of this new Pizza Hut and wanted to find the address, or anything at all that you have seen in the past which you want to revisit quickly, our service lets you get to that quickly. One slightly unintuitive use case, which however is very common, is that there are many pages which you can probably get back to with some effort. For example, there are videos on YouTube which you can get back to by going to youtube.com and typing a particular query, but it so turns out that it is way faster for you to just type a single keyword in the Infoaxe toolbar. Suppose there is a video of Tendulkar's innings at Sharjah or something. In YouTube you probably have to write something quite detailed like "Tendulkar", "Sharjah", "Australia" or something like that.
In Infoaxe all you will do is type "Tendulkar" in the toolbar, and since it is looking at the much smaller space of what you browsed in the past, you will get it immediately. In large part, we are encouraging users to be lazy and to get to whatever they have browsed in the past with very minimal effort.
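The point about searching a much smaller space can be illustrated with a tiny keyword index over one user's browsing history. This is a minimal sketch and not Infoaxe's actual implementation; the class name `HistoryIndex`, the page titles and the URLs are hypothetical.

```python
import re
from collections import defaultdict


class HistoryIndex:
    """Keyword index over the pages one user has visited."""

    def __init__(self):
        self.pages = []                # (url, title) in visit order
        self.index = defaultdict(set)  # word -> ids of pages containing it

    def add(self, url: str, title: str) -> None:
        page_id = len(self.pages)
        self.pages.append((url, title))
        for word in re.findall(r"\w+", title.lower()):
            self.index[word].add(page_id)

    def search(self, query: str):
        """Return visited pages whose title contains every query word."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.index.get(w, set()) for w in words))
        # Most recently visited pages first.
        return [self.pages[i] for i in sorted(ids, reverse=True)]


history = HistoryIndex()
history.add("https://video.example/watch/1", "Tendulkar innings at Sharjah")
history.add("https://food.example/stores", "Pizza Hut store locations")
history.add("https://news.example/budget", "Budget highlights")

results = history.search("tendulkar")
print(results)
```

Because the index holds only pages this user has seen, a single keyword is usually enough to single out the page, whereas the same keyword on a general video site would match a very large number of results.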
Varun: Generally speaking, how important are search engines today?
Vijay: I guess fundamentally they have certainly revolutionized the way many businesses are done. I guess they have touched pretty much all aspects of our life, isn't it? It’s almost hard to think of anything that they haven't touched in a big way. In the past, if someone had to do research, they would probably go to the library and check through different volumes, and they might still not find something as relevant, while now you will just search, you get relevant papers, and you will correspond with the relevant authors. Even if you are a software engineer, for example, you get some particular bug line or something. All you will do is copy-paste that and put it in Google, and in a forum or something like that someone would have asked the question and someone would have answered it. Whereas, say, fifteen years back you would have had no such alternative. I guess it would be way more messy. You would probably have to sit through various manuals, and still you would probably not get answers to these kinds of things easily. I guess as internet usage has greatly increased, search engines clearly have a huge hand in making any sense of the information and really benefiting from the increasing number of web pages. The web pages might have increased from a few million in '95 to a hundred billion now. But for that to actually have any substantial impact on users, users should have some way to reach them as well. Pretty much most pages are reached for the first time through a search engine these days.
Varun: Finally, what is the future for search engines in general and Infoaxe in particular?
Vijay: I guess they will continue to be important, as they are targeting very hot problems. I expect a lot of investment in the space and a lot of progress, both in general search, the kind that Google is doing, and in a lot of other niche searches: videos, multimedia, different things. As for Infoaxe itself, we think that we are handling a fairly useful niche problem which certainly has a market in the millions of users, and we have been growing quite well over the last couple of months. We have been live for barely three or four months now. We see good potential for our current product itself to get there. That apart, we see it as an excellent platform. Currently what happens is that users browse a bunch of stuff, all that information kind of remains locked in your own browser, and the next time you clear history or something, all that information is lost, and a lot of valuable effort is lost without benefiting either you or someone else later. With Infoaxe there is also an excellent platform to move on to many more highly lucrative services, where, say, one can allow you to share content or bookmarks with others, or even, in an anonymous way, show things like "these are the pages that are being browsed the most these days". I mean, essentially we see it as an excellent platform for offering several highly lucrative services in addition to the functionality of letting people revisit pages that they have seen in the past.
Varun: Sir, on behalf of the team I’d like to thank you for your time. This interview will certainly be helpful to us, as we gained a lot of insights into search engines. Thank you once again, sir.
Vijay: Alright! Thanks a lot guys. Good luck! See you!