Adding RSS feeds to your website with Perl
In this article, based on a similar article that can be found at Cnet.com,
I will explain how you can add RSS feeds to your website and automatically update the feeds, using Perl and cron.
RSS, an acronym for Really Simple Syndication, is an XML-based format that allows Web sites to
publish and syndicate their latest content to all interested parties.
RSS is very convenient for Webmasters, because they no longer have to update their Web sites manually with
new content from other sites. RSS feeds thus add "dynamic" content, in an automated way, to an otherwise static website.
What do you need?
Everything described in this article has been put into practice, and is still in production as we speak, on the following
combination:
- Slackware 10.1
- Perl 5.8.6
- Apache 2.0.53
- cron
Apache is configured with SSI (Server Side Includes, .shtml) enabled.
Installing the XML::RSS CPAN package
RSS parsing in Perl is usually handled by the XML::RSS CPAN package. The XML::RSS package is specifically designed to
read and parse RSS feeds. When you give XML::RSS an RSS feed, it converts the various items in the feed into array elements,
and exposes numerous methods and properties to access the data in the feed. XML::RSS currently supports versions 0.9, 0.91, and
1.0 of RSS.
XML::RSS is written entirely in Perl, but it isn't included with Perl by default; you must install it from CPAN. Detailed
installation instructions are provided in the download archive, but by far the simplest way to install it is to use the
CPAN shell, as follows:
[root@stargate]# perl -MCPAN -e shell
cpan> install XML::RSS
You must be connected to the Internet in order to access the CPAN archive. Downloading and installing takes only a few
minutes. If you use the CPAN shell, dependencies will be automatically downloaded for you (unless you told the shell not to
download dependent modules). If you manually download and install the module, you may need to download and install
the XML::Parser module before XML::RSS can be installed. The examples in this tutorial also need the LWP::Simple package,
so you should download and install that one too if you don't already have it.
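Before going further, it can be handy to confirm that the modules are actually in place. The following is a minimal sketch (a helper of my own, not part of the original listings) that tries to load each module named above and reports the result. It uses only core Perl; module_available() is a hypothetical name.

```perl
#!/usr/bin/perl
# Check whether the modules used in this article can be loaded.
use strict;
use warnings;

# Return 1 if the named module loads, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Only accept plain package names, since the name ends up in a string eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    return eval "require $module; 1" ? 1 : 0;
}

for my $module (qw(XML::RSS XML::Parser LWP::Simple)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'installed' : 'missing, install it from CPAN';
}
```

Any module reported as missing can then be installed from the CPAN shell as shown above.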
Basic usage
For our example, we'll assume that you're interested in displaying the latest news from Slashdot and Freshmeat on your site.
The URL for Slashdot's RSS feed is located here.
The following Perl script retrieves this feed, parses it, and turns it into a human-readable HTML page:
Listing A (Slashdot RSS feed)
Adjust the script to your liking and place it in your Web server's cgi-bin/ directory. Remember to make it executable,
and then browse to it using your Web browser. After a short wait for the RSS file to download, you should see something
similar to this:
Note that the script in Listing A does not produce exactly the same result as the above image. The image was generated
by an adapted version of the script (closer to Listing B). Listing A, however, provides a good all-round script to start with.
How does the script in Listing A work?
Well, the first task is to get the RSS feed from the remote system to the local one. This is accomplished with the
LWP::Simple package, which acts as a simple HTTP client: it opens a network connection to the remote site and retrieves the RSS data.
An XML::RSS object is created, and this raw data is then passed to it for processing. The various elements of the RSS feed are
converted into Perl structures, and a foreach() loop is used to iterate over the array of items.
Each item is a hash holding the item's title, URL (link) and description; these values are used to dynamically
build a readable list of news items. Each time Slashdot updates its RSS feed, the list of items displayed by the script above
will change automatically, with no manual intervention required.
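The three steps above (fetch with LWP::Simple, parse with XML::RSS, loop over the items) can be sketched as follows. This is not the original Listing A, only a minimal reconstruction under a few assumptions: the feed URL is a hypothetical placeholder, the two CPAN modules are loaded with require inside fetch_items(), and every field is HTML-escaped, including the description, which some feeds fill with markup. The subroutine names are my own.

```perl
#!/usr/bin/perl
# Minimal Listing A-style CGI sketch: fetch an RSS feed, parse it, print HTML.
use strict;
use warnings;

# Hypothetical placeholder: replace with the URL of the feed you want to show.
my $FEED_URL = 'http://www.example.com/feed.rss';

# Escape the characters that have a special meaning in HTML.
sub html_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build an HTML list from item hashrefs with title, link and description keys.
sub items_to_html {
    my (@items) = @_;
    my $html = "<ul>\n";
    foreach my $item (@items) {
        $html .= sprintf qq{<li><a href="%s">%s</a><br>%s</li>\n},
            html_escape($item->{link}),
            html_escape($item->{title}),
            html_escape($item->{description});
    }
    return $html . "</ul>\n";
}

# Fetch the feed over HTTP and return its items as a list of hashrefs.
sub fetch_items {
    my ($url) = @_;
    require LWP::Simple;
    require XML::RSS;
    my $xml = LWP::Simple::get($url)    # returns undef on failure
        or die "Could not fetch $url\n";
    my $rss = XML::RSS->new;
    $rss->parse($xml);
    return @{ $rss->{items} };
}

# Apache sets GATEWAY_INTERFACE when it runs the script from cgi-bin/.
if ($ENV{GATEWAY_INTERFACE}) {
    print "Content-Type: text/html\n\n";
    print "<html><body>\n", items_to_html(fetch_items($FEED_URL)), "</body></html>\n";
}
```

The GATEWAY_INTERFACE check is my own addition: under Apache the page is printed as usual, while running the file from the shell only defines the subroutines, which makes them easy to try out.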
The script in Listing A will work with other RSS feeds as well: simply alter the URL passed to LWP::Simple's get() function,
and watch as the list of items displayed by the script changes.
Script for the Freshmeat RSS feed
As promised, here is the script to get Freshmeat's RSS file. The URL for Freshmeat's feed is located
here. The following Perl script retrieves this feed,
parses it, and turns it into a human-readable HTML page.
The script is adapted to parse only the first 10 entries: Freshmeat has the annoying habit of putting
all the releases of one day in a single RSS file, which produces listings far longer than I like. You are
of course free to adjust the script to your liking.
Listing B (Freshmeat RSS feed)
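Limiting the output to the first 10 entries boils down to an array slice. The following is a minimal sketch, not the code of Listing B, assuming the items come from XML::RSS as an array of hashrefs; first_items() and MAX_ITEMS are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant MAX_ITEMS => 10;    # show at most 10 Freshmeat entries

# Return at most $max items from the front of @items, in their original order.
sub first_items {
    my ($max, @items) = @_;
    return @items if @items <= $max;
    return @items[0 .. $max - 1];
}
```

In a Listing B-style script, the foreach loop would then iterate over first_items(MAX_ITEMS, @{ $rss->{items} }) instead of the full @{ $rss->{items} } array.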
Optimizing performance
How not to do it
Calling the Perl script(s) directly (as CGI scripts in cgi-bin/) from a webpage works fine (mostly), but it is certainly not the smartest way to
add RSS feeds. Here is why:
- Loading your webpage is inevitably slowed down by the Perl script(s) first fetching and parsing the RSS feed(s).
- The Perl script is executed EVERY time the page is loaded, which is a huge waste of resources because
it is VERY unlikely that the RSS feed(s) change every second, every minute, or even every 5 minutes.
- Resources (bandwidth, CPU) are thus wasted on fetching and processing the same file over and over again.
- Most Webmasters who provide RSS feeds don't like it when you hammer their servers for the same file over and over
within a short time. Slashdot, for example, allows fetching its feed at most once every 30 minutes, and will block you
for up to 72 hours if you violate that policy.
Of course, you could easily circumvent this block by using another source IP address . . .
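If the fetch can be triggered more often than a feed's policy allows (a CGI script, or a cron job scheduled every few minutes), the age of the local copy is an easy guard. The following is a minimal sketch using the 30-minute interval quoted above for Slashdot; due_for_refresh() is a hypothetical helper, and it assumes the modification time of the saved RSS file marks the time of the last fetch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimum time between two fetches of the same feed (Slashdot: 30 minutes).
use constant MIN_INTERVAL => 30 * 60;    # seconds

# True if the local copy is missing or older than MIN_INTERVAL.
# $now is optional and defaults to the current time.
sub due_for_refresh {
    my ($cache_file, $now) = @_;
    $now = time unless defined $now;
    my $mtime = (stat $cache_file)[9];
    return 1 unless defined $mtime;    # no local copy yet
    return $now - $mtime >= MIN_INTERVAL ? 1 : 0;
}
```

A fetch script would call due_for_refresh() on the saved RSS file and skip the download, reusing the existing copy, whenever it returns 0.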
Use local, static copies instead
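A far better approach is to generate a static HTML snapshot from the RSS file(s) at periodic intervals,
and to send that to clients instead. This is the way I do it: cron runs the Perl script(s) at regular intervals, the scripts
write their HTML output to static files, and the web page pulls those files in with an SSI include.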
Since the generated HTML file is a static file and not a script, no server-side processing takes place before the server
transmits it to the client (except for the SSI include). Loading performance with a static file is noticeably better than with
a Perl script.
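One detail worth getting right in this approach: the snapshot should be replaced atomically, so Apache never includes a half-written file while cron is regenerating it. The following is a minimal sketch that writes to a temporary file in the same directory and then renames it into place. It assumes the HTML comes from a feed script like the one in Listing A, modified to print only the HTML fragment (no Content-Type header), and piped into this one; the script names, paths and 30-minute schedule in the comments are hypothetical.

```perl
#!/usr/bin/perl
# write-snapshot.pl: read HTML on STDIN and atomically replace the target file.
#
# Hypothetical crontab entry (every 30 minutes):
#   */30 * * * * /usr/local/bin/slashdot-rss.pl | /usr/local/bin/write-snapshot.pl /var/www/htdocs/rss/slashdot.html
# Hypothetical SSI directive in the .shtml page:
#   <!--#include virtual="/rss/slashdot.html" -->
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

sub write_snapshot {
    my ($path, $html) = @_;
    # Create the temporary file next to the target so rename() is atomic.
    my ($fh, $tmp) = tempfile('.snapshot-XXXXXX', DIR => dirname($path));
    print {$fh} $html or die "Cannot write $tmp: $!\n";
    close $fh or die "Cannot close $tmp: $!\n";
    # tempfile() creates the file with mode 0600; Apache needs to read it.
    chmod 0644, $tmp or die "Cannot chmod $tmp: $!\n";
    unless (rename $tmp, $path) {
        my $err = $!;
        unlink $tmp;
        die "Cannot rename $tmp to $path: $err\n";
    }
    return 1;
}

if (@ARGV == 1) {
    my $html = do { local $/; <STDIN> };
    # Keep the previous snapshot if the generator produced nothing.
    die "No HTML on standard input, keeping the old snapshot\n"
        unless defined $html && $html =~ /\S/;
    write_snapshot($ARGV[0], $html);
}
```

Because rename() replaces the target in one step, a visitor either gets the old snapshot or the new one, never a mixture.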
You can round it off by adding checks to ensure that fetching and parsing the RSS file(s) were successful. Instead of using the
Perl script to fetch the RSS file(s), you could also use tools such as wget or lynx. The choice is yours.
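Such checks can be as simple as refusing to regenerate the HTML when the download came back empty, the XML does not parse, or the feed contains no items. The following is a minimal sketch; parse_feed_safely() is a hypothetical helper, and the script also accepts a local file, for example one downloaded beforehand with wget -q -O /tmp/slashdot.rss followed by the feed URL.

```perl
#!/usr/bin/perl
# check-feed.pl: verify that an RSS file is usable before regenerating any HTML.
use strict;
use warnings;

# Return the parsed XML::RSS object, or nothing if the feed is unusable.
sub parse_feed_safely {
    my ($xml) = @_;
    # Failed downloads (LWP::Simple::get returns undef) or empty bodies.
    return unless defined $xml && $xml =~ /\S/;
    require XML::RSS;
    my $rss = XML::RSS->new;
    # The underlying XML parser dies on malformed XML; trap it here.
    eval { $rss->parse($xml); 1 } or return;
    # A feed without items would overwrite a good snapshot with an empty list.
    return unless $rss->{items} && @{ $rss->{items} };
    return $rss;
}

# Usage: check-feed.pl /tmp/slashdot.rss  (dies with a message if unusable)
if (@ARGV == 1) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $xml = do { local $/; <$fh> };
    close $fh;
    my $rss = parse_feed_safely($xml)
        or die "$ARGV[0] is not a usable RSS feed\n";
    printf "%d items, OK\n", scalar @{ $rss->{items} };
}
```

A cron job would run this check before regenerating the HTML, and simply keep the previous snapshot when the check fails.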
Looks easy? Well, it is. Now go ahead and add feeds to your site!