Perl Case Study - News Grabber CGI
By Lisa Hui

The libwww-perl bundle includes two modules that fetch web pages from the internet for you, which turns out to be quite handy: LWP::Simple and LWP::UserAgent. You load either one into your script with the use statement.

LWP::Simple

This module has a get() function that returns one long string containing the HTML code from the URL given within the parentheses. Just to show you how it works, try a simple test script like this:
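#!/usr/local/bin/perl
use LWP::Simple;
$_ = get("http://www.thinkquest.org");   # fetch the page into $_
print "Content-type: text/html\n\n";     # CGI header, then a blank line
print;                                   # send the fetched HTML (prints $_)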
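One detail the test script glosses over: get() returns undef when the fetch fails, so printing the result blindly can produce an empty page. A slightly more defensive variant might look like the sketch below. The helper name page_or_message and the wording of the fallback message are made up for illustration; only get() comes from LWP::Simple.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper (not part of LWP): return the fetched HTML,
# or a short fallback message when get() handed back undef.
sub page_or_message {
    my ($html) = @_;
    return defined $html
        ? $html
        : "<p>Sorry, the page could not be fetched.</p>";
}

# Fetch and print only when this file is run directly as a script.
unless (caller) {
    my $html = get("http://www.thinkquest.org");
    print "Content-type: text/html\n\n";
    print page_or_message($html);
}
```

Keeping the fallback in its own small subroutine means the failure case can be checked without touching the network.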
See the test script in action: lwpsimple-test.cgi
[Note: the script would not run on this server, possibly because the bundle files are not installed here.]

Running that short script on your server would result in what seems to be the ThinkQuest front page loading in your browser. Check the URL: it isn't a redirection, just the script copying over the HTML code found at the specified URL. Also notice that the whole page is stored in one string variable (the default one, $_).

But how do you get what you want from this string? You'll want to use substitution expressions (pattern matching) to remove the unwanted data. Since we're using the default variable, we don't have to name it in the substitution expressions below:
s/^.*Comment//s;
s/Comment.*$//s;
s/<[^>]+>//g;

They are the same as explicitly writing $_ =~ s///s; (the leading "s" stands for substitution: whatever matches the pattern between the first two slashes is replaced by the text between the second and third slashes, which is empty here, so the match is simply removed). The trailing "s" lets "." match newlines as well, so the first two expressions can cut across line breaks, and the "g" on the third makes it remove every tag rather than just the first one.

What's the difference between LWP::Simple and LWP::UserAgent then? LWP::Simple can handle only GET queries (in which the data is passed through the URL itself), whereas LWP::UserAgent can also send POST queries and retrieve the response, with the help of HTTP::Request::Common.

We're not going to go into this yet. We did cover what we set out to do: a quick run-through of how the simple module can "grab" news from a page. I'll let you know when this section gets an overhaul.

Last Updated August 16, 1999
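To make the three substitution steps concrete, here is a minimal sketch that runs them on a small hard-coded page instead of a live URL. The sample HTML, the two marker strings, and the helper name extract_between are all invented for illustration. One point to watch: ".*" is greedy, so a pattern like s/^.*Comment//s cuts through the last occurrence of the marker. The sketch uses the non-greedy ".*?" so the cut stops at the first start marker, and \Q...\E so the marker text is matched literally.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the text between two marker strings,
# then strip whatever HTML tags are left.
sub extract_between {
    my ($html, $start, $end) = @_;
    local $_ = $html;        # work on the default variable, as in the text
    s/^.*?\Q$start\E//s;     # drop everything through the first start marker
    s/\Q$end\E.*$//s;        # drop the end marker and everything after it
    s/<[^>]+>//g;            # remove the remaining tags
    s/^\s+|\s+$//g;          # trim leading and trailing whitespace
    return $_;
}

# Made-up sample page standing in for the string returned by get().
my $page = <<'HTML';
<html><body>
<h1>Site</h1>
<!-- news start -->
<b>Headline:</b> LWP makes fetching pages easy.
<!-- news end -->
<p>Footer</p>
</body></html>
HTML

print extract_between($page, '<!-- news start -->', '<!-- news end -->'), "\n";
# prints: Headline: LWP makes fetching pages easy.
```

Regular expressions like these are fine for a page whose layout you know, but they will need adjusting whenever the site changes its markup.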
©1999 Team 26297 "Ad Infinitum Web." All rights reserved. Any reproduction of this document for commercial or redistribution purposes without the permission of the author is forbidden.