Perl Case Study - News Grabber CGI
By Lisa Hui

The libwww-perl bundle includes two modules that fetch web pages from the internet for you, which ends up being quite handy: LWP::Simple and LWP::UserAgent. Load whichever one you need into your script with a use statement.

LWP::Simple

This module has a get() function that returns the HTML code at the URL given within the parentheses as one long string (or undef if the page could not be fetched). Just to show you how it works, try a simple test script like this:

#!/usr/local/bin/perl
###############
# File: lwpsimple-test.cgi

use LWP::Simple;

$_ = get("http://www.thinkquest.org");

print "Content-type: text/html\n\n";
print $_;


See this script in action: lwpsimple-test.cgi

Running those three lines of code on your server loads what seems to be the ThinkQuest front page in your browser. Check the URL: it isn't a redirection, just the script copying over the HTML code from the specified URL.
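Since get() returns undef when the page can't be fetched, a slightly more defensive version of the test script might look like the sketch below. The page_or_error() helper and the 10-second timeout are my own choices for illustration, not part of LWP::Simple; the $ua object is the underlying user agent that LWP::Simple exports when you ask for it by name.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get $ua);

# Hypothetical helper: return the fetched HTML if there is any,
# otherwise a short error message to show in the browser.
sub page_or_error {
    my ($html, $url) = @_;
    return $html if defined $html;
    return "<p>Could not retrieve $url</p>\n";
}

# Give up after 10 seconds instead of waiting for the default timeout
# (the value 10 is an arbitrary choice for this sketch).
$ua->timeout(10);

my $url  = "http://www.thinkquest.org";
my $html = get($url);    # undef if the request failed

print "Content-type: text/html\n\n";
print page_or_error($html, $url);
```

This way the visitor sees a message rather than a blank page when the remote site is down.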

Also notice that the whole page is stored in one string variable (the default one, $_). But how do you get what you want out of this string? You'll want to use substitution expressions (pattern matching) to remove the unwanted data.

[Note: the script would not run on this server - possibly because the bundle files are not installed here]

Since we're using the default variable, we can omit explicitly stating this in the substitution expressions below:

s/^.*Comment//s;
# Removes everything in $_ up to and including the last "Comment"

s/Comment.*$//s;
# Removes everything in $_ from the first "Comment" to the end

s/<[^>]+>//g;
# Removes all HTML tags in $_

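They are the same as explicitly stating $_ =~ s///s; in each case. The leading "s" stands for substitution: whatever matches the pattern between the first two slashes is removed, and whatever sits between the second and third slashes is substituted in its place (nothing, in these examples). The trailing "s" lets . match newlines as well, so .* can run across several lines of the page, and the trailing "g" applies the substitution to every match rather than only the first.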
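To see the substitutions work together, here is a small sketch you can run without a web connection. The sample page and the NewsStart / NewsEnd comment markers are made up for illustration; on a real page you would pick whatever text reliably sits just before and just after the part you want. The last line, which trims whitespace, is an extra step I've added.

```perl
use strict;
use warnings;

# Hypothetical page content standing in for what get() would return.
$_ = <<'HTML';
<html><body>
<h1>Site Header</h1>
<!-- NewsStart -->
<p><b>Headline:</b> Perl turns heads</p>
<!-- NewsEnd -->
<p>Footer links</p>
</body></html>
HTML

s/^.*<!-- NewsStart -->//s;   # drop everything up to and including the start marker
s/<!-- NewsEnd -->.*$//s;     # drop the end marker and everything after it
s/<[^>]+>//g;                 # strip the remaining HTML tags
s/^\s+|\s+$//g;               # trim leading and trailing whitespace

print "$_\n";
```

Using two different markers matters: with the same word in both expressions, the first substitution would already have removed every occurrence, leaving nothing for the second one to match.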

What's the difference between LWP::Simple and LWP::UserAgent then? LWP::Simple can handle only GET queries (in which the data is passed through the URL itself), whereas LWP::UserAgent can also send POST queries and retrieve the response, with the help of HTTP::Request::Common.
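Just to give a rough idea of the POST side, here is a minimal sketch. The URL, the form field names, and the fetch_post() helper are all hypothetical; only LWP::UserAgent and the POST function from HTTP::Request::Common come from the bundle itself.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Hypothetical form handler and field names, for illustration only.
my $url = "http://www.example.com/cgi-bin/search.cgi";

# POST() builds an HTTP::Request with the fields encoded in the body,
# not in the URL as a GET query would carry them.
my $request = POST $url, [ keywords => "perl", max => 10 ];

# Hypothetical helper: send the request and return the response body,
# or undef if the server did not answer successfully.
sub fetch_post {
    my ($req) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->timeout(10);    # arbitrary timeout for this sketch
    my $response = $ua->request($req);
    return $response->is_success ? $response->content : undef;
}

# Show what would be sent, without contacting any server.
print $request->as_string;
```

Calling fetch_post($request) would actually send the form data and hand back the resulting page as a string, which you could then trim down with substitution expressions just like the output of get().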

We're not going to go into this just yet. We did cover what we set out to do, a quick run-through of how the simple module can "grab" news from a page, but I'll let you know when this section gets an overhaul.

Last Updated August 16, 1999



©1999 Team 26297 "Ad Infinitum Web." All rights reserved. Any reproduction of this document for commercial or redistribution purposes without the permission of the author is forbidden.