Caching using cron

Just a quick little post about caching data. I’ve seen a few people worrying about the performance of their site when generating pages from a database or pulling in data from other sites (for example RSS feeds). The more often you go to a database or a remote server, the more delay you are likely to encounter on your own site.

Often there is no need to dynamically generate the page each and every time someone visits, so this is where we can use cron to optimise response times. Cron is a simple scheduler you’ll find on most Linux/UNIX hosts that will run a given command at a specified time or time interval. By using cron, we can generate a page using PHP/ASP/etc. at a fixed interval (for example every 15 minutes, every hour, or once a day), save the result, and serve that saved copy to visitors.

So how does this work?

Let’s look at an example: imagine you had a PHP page (let’s call it “getRSS.php”) that generated content for your site by consuming some RSS feeds from other servers. At the moment that page is called every time someone visits your site, and it’s causing a delay. Here is how I’d use cron to cache a local copy and improve performance:

  1. Decide on how often you want getRSS.php to update, let’s say every 30 minutes.
  2. Set up a cron task on that schedule (how you do this will vary based on your host: it could be from the command line or through a control panel) to run the following command:


    /usr/bin/php -f /home/me/public_html/getRSS.php > /home/me/public_html/rss.html


    In plain terms, this command says: “Execute getRSS.php, and save the output into a file called rss.html”. So every 30 minutes, getRSS.php will run and save the new RSS feeds into rss.html.
  3. I would then use a standard include statement (e.g. through PHP or SSI) to include the rss.html file in my usual page.

And that is all there is to it! We now have a local, static copy of the RSS data stored in rss.html that we can call up almost instantly because it’s just a simple text file stored locally.
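
One refinement worth considering: the plain `>` redirect empties rss.html the moment the command starts, so a visitor arriving mid-run (or after a failed run) could be served a blank or half-written file. Here is a minimal sketch of a wrapper that writes to a temporary file first and only swaps it in when the generator succeeds. The function name `refresh_cache`, the script path in the comment, and the 644 permissions are my own assumptions for illustration, not part of the setup above.

```shell
#!/bin/sh
# refresh_cache OUTPUT_FILE COMMAND [ARGS...]
# Runs COMMAND and publishes its output to OUTPUT_FILE only if it succeeded.
# (Hypothetical helper; adjust names and paths to suit your host.)
refresh_cache() {
    rc_out=$1
    shift
    # Create the temp file next to the target so the final mv is a rename.
    rc_tmp=$(mktemp "${rc_out}.XXXXXX") || return 1
    # Publish only if the command exited cleanly AND produced some output.
    if "$@" > "$rc_tmp" && [ -s "$rc_tmp" ]; then
        # mktemp creates the file as owner-only; assume the web server
        # needs read access, so open it up before publishing.
        chmod 644 "$rc_tmp"
        mv -f "$rc_tmp" "$rc_out"
    else
        # Keep the previous cached copy and discard the failed attempt.
        rm -f "$rc_tmp"
        return 1
    fi
}

# Example call, using the paths from the post:
#   refresh_cache /home/me/public_html/rss.html \
#       /usr/bin/php -f /home/me/public_html/getRSS.php
#
# Example crontab line to run a script containing that call every 30 minutes
# (the script location is an assumption):
#   */30 * * * * /home/me/bin/refresh_rss.sh
```

The trade-off is that a failed run leaves the old copy in place, so visitors see slightly stale feeds rather than an empty block. Treating empty output as a failure is a design choice here; drop the `[ -s ... ]` check if an empty page is a legitimate result for your script.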

I used RSS here, but there is no reason why you couldn’t do the same for a site that generates its pages from the content of a database.

A word of warning

Obviously this approach is only useful for sites that are viewed often, but updated less frequently! It is ideal for sites where the content changes only now and then, for example a blog might not have any new posts for many hours or even days but will get a lot of visitors in that time, so a cached copy is ideal. However, if your site relies on user-interaction a lot (e.g. a forum), then using this approach to cache the output is pretty pointless!

3 Responses

  1. Bud Says:

    Thought this was exactly what I was after. Thanks.

    However, all I’m getting in the destination file is “No input file specified.”

    Any ideas why this is happening?

    /usr/bin/php -f /home//public_html/category.php?c=181 > /home//public_html/categories/c181.html

    The original file runs fine if put in the address bar.

    I’m using cPanel cron jobs (if that helps).

    Cheers
    Bud

  2. Matt Says:

    The problem you have here is you are trying to pass in the argument on the command line using the same query string syntax you’d use on the web, i.e. ….?c=181.

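    When PHP is run from the command line using -f, it is looking for the file name, which in this case it thinks is “category.php?c=181”, which obviously it cannot find because it doesn’t exist (even if category.php does).

    To pass in variables to PHP on the command line, you need to use command line arguments. When /usr/bin/php is the CGI build of PHP (common on cPanel hosts, and the build that prints “No input file specified.”), $_GET will see name=value command line arguments as the same thing as the query string arguments. The CLI build does not fill in $_GET; there the arguments arrive in $argv instead.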

    So instead of this:

    /usr/bin/php -f /home/user/public_html/page.php?a=apples&b=bananas&c=oranges

    You’d do this:

    /usr/bin/php -f /home/user/public_html/page.php a=apples b=bananas c=oranges

    Hope that helps!

  3. Bud Says:

    Matt - You’re a STAR!

    That sorted it and I’m now up and running! Each day I learn a little more….

    THANKS, really appreciated.

    Bud
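
The file-name-versus-arguments point from the replies above can be checked without PHP at all. This small sketch uses a stand-in function, `show_args` (a name made up for this illustration), that prints each argument exactly as the shell would hand it to a program, one per line.

```shell
#!/bin/sh
# show_args: print every argument received, one per line, wrapped in brackets.
# (Hypothetical stand-in for the PHP binary, for illustration only.)
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Web-style: the path and the query string arrive as ONE argument, so a
# program given -f would look for a file literally named like this.
# It is quoted here because an unquoted & would end the command early
# and run it in the background.
show_args -f 'page.php?a=apples&b=bananas'

# Command-line style: the script path and each name=value pair are
# separate arguments.
show_args -f page.php a=apples b=bananas
```

The first call prints two lines (`[-f]` and the whole `page.php?...` string), while the second prints four, with `page.php` on its own line followed by one line per name=value pair.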
