Output Caching for Beginners

High-traffic sites can often benefit from caching pages, which saves processing the same data over and over again. This tutorial runs through the basics of file-based caching in PHP.

Caching of output in PHP is made easier by the use of the output buffering functions built in to PHP 4 and above.

You'll need to use two files to set up a caching system for your site. The first, "begin_caching.php" in this case, will run before any other PHP on your site. The second, "end_caching.php" in this case, runs after normal scripts have run. The two scripts effectively wrap around your current site.

You can achieve this wrapping effect in one of two ways. The first is to use the include() function and add the two files manually to every script you run. This can take some time, but it is arguably more portable than the alternative.

The alternative relies on adding the following two lines (modified to reflect the correct path to the two PHP files needed) to your .htaccess file. Note that php_value directives only work when PHP runs as an Apache module. This is my preferred method, because it requires no modification to existing scripts and can be turned off quickly, just by commenting out the relevant lines in the .htaccess file.

php_value auto_prepend_file /full/path/to/begin_caching.php
php_value auto_append_file /full/path/to/end_caching.php

Next, we move on to the scripts that do the work. There are several stages to caching a document:

  1. Receive request for page
  2. Check for the existence of a cached version of that page
  3. Check the cached copy is still valid
    • If it is, send the cached copy
    • If not, create a new cached copy and send it
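
The stages above can be sketched as a single function. This is only an illustration of the flow, not the scripts used in this tutorial: the name serve_with_cache() and the $generate callback are hypothetical, introduced here to keep the example self-contained.

```php
<?php

// Hypothetical helper: returns the page body, using the cache when it is still valid.
// $generate is any callable that builds the page and returns it as a string.
function serve_with_cache($cachefile, $cachetime, $generate)
{
    // Stages 2 and 3: does a cached copy exist, and is it still valid?
    clearstatcache();
    if (is_file($cachefile) && (time() - filemtime($cachefile)) < $cachetime) {
        return file_get_contents($cachefile);
    }

    // Otherwise build a fresh copy, store it, and send that instead
    $body = $generate();
    file_put_contents($cachefile, $body);

    return $body;
}
```

Calling it twice within the cache time with the same file should only run $generate once; the second call is answered from the file.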

To begin with, the script below contains a few basic settings. Here, you can set the directory you want to save cached files to (I would recommend keeping that directory outside your web root, or at least protecting it from view through a normal browser). The script needs to be able to create files in this directory, so you must set the directory's permissions to allow that. The right permissions depend upon your server setup, so you may want to start by setting them to 777 while testing the script, and then reduce them to the lowest level that still works once the script is running.

You can also set how long, in seconds, a cached file should be considered valid after creation, and set the file extension for saved files. It would be wise not to name them ".php", just for safety's sake.
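
Before moving on to the caching script itself, here is a small optional check for the directory requirements described above. It is a sketch under assumptions: cache_dir_ready() is a hypothetical helper, and the 0755 mode is only a starting point that may not suit your server.

```php
<?php

// Hypothetical helper: make sure the cache directory exists and is writable.
// 0755 is an assumed starting point; your server may need different permissions.
function cache_dir_ready($cachedir)
{
    if (!is_dir($cachedir) && !@mkdir($cachedir, 0755, true)) {
        return false; // could not create the directory
    }

    return is_writable($cachedir);
}
```

If it returns false, the caching scripts would fail quietly, because their file calls are silenced with "@".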

<?php

    // Settings
    $cachedir = '../cache/'; // Directory to cache files in (keep outside web root)
    $cachetime = 600; // Seconds to cache files for
    $cacheext = 'cache'; // Extension to give cached files (usually cache, htm, txt)

    // Ignore List
    $ignore_list = array(
        'addedbytes.com/rss.php',
        'addedbytes.com/search/'
    );

    // Script
    $page = 'http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']; // Requested page
    $cachefile = $cachedir . md5($page) . '.' . $cacheext; // Cache file to either load or create

    $ignore_page = false;
    for ($i = 0; $i < count($ignore_list); $i++) {
        $ignore_page = (strpos($page, $ignore_list[$i]) !== false) ? true : $ignore_page;
    }

    $cachefile_created = ((@file_exists($cachefile)) and ($ignore_page === false)) ? @filemtime($cachefile) : 0;
    @clearstatcache();

    // Show file from cache if still valid
    if (time() - $cachetime < $cachefile_created) {

        //ob_start('ob_gzhandler');
        @readfile($cachefile);
        //ob_end_flush();
        exit();

    }

    // If we're still here, we need to generate a cache file

    ob_start();

?>

The file starts by generating an MD5 hash of the complete requested URL. The hash is a 32-character hexadecimal string that is, for practical purposes, unique to each URL, and it is used as the name of the cache file. The script then checks whether that file exists.
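
To make the naming scheme concrete, here is a minimal sketch that builds a cache file path the same way the script does. The function name cache_path_for() is hypothetical; the default directory and extension are simply the settings used above.

```php
<?php

// Hypothetical helper mirroring how the caching script names its files.
function cache_path_for($page, $cachedir = '../cache/', $cacheext = 'cache')
{
    // md5() always returns 32 hexadecimal characters, whatever the URL length
    return $cachedir . md5($page) . '.' . $cacheext;
}
```

The same URL always maps to the same file name, while a URL that differs at all (even just in its query string) maps to a different one.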

If the file exists, it checks to see when it was last updated. If the file is older than the allowed time, it acts as though no cache existed (carrying on and generating a new file). If the file is still valid, it simply displays it.

There is also, in the settings, a list of pages to ignore when caching. These can be search results, comments pages, a news page or news feed - anything that should always be up to date. Simply add anything you do not want served from the cache into here. You can add directories, or parts of URLs, because the script simply searches the requested URL for each text string. In the example above, I have left out the "http://www" portion of the URL, as some visitors omit it.
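
The ignore-list check can also be written as a small function, which may be easier to read than the loop in the script. This is a sketch only; is_ignored() is a hypothetical name and is not part of the script above.

```php
<?php

// Hypothetical helper: true if any entry in the ignore list appears in the URL.
function is_ignored($page, array $ignore_list)
{
    foreach ($ignore_list as $fragment) {
        if (strpos($page, $fragment) !== false) {
            return true; // stop at the first match
        }
    }

    return false;
}
```

Because it is a plain substring search, an entry such as 'addedbytes.com/search/' matches the URL with or without the "www" prefix.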

Finally, two lines in the script above are commented out: the ob_start('ob_gzhandler') and ob_end_flush() calls around readfile(). You can, if you like, uncomment these, and that will use output buffering to gzip your content before sending it to users, making your site even faster for them. Please note, though, that output buffering with gzip encoding is not available in versions of PHP before 4.0.5.

Which brings us to the second file, "end_caching.php". At the end of the first file, if no cache exists, we start output buffering. This means that rather than send the page to the user, we are saving it for use later. In the second script below, we take the contents of the output buffer, and write it to a file.

<?php

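    // Now the script has run, generate a new cache file
    $fp = @fopen($cachefile, 'w');

    // Save the contents of the output buffer to the file (skipped if it could not be opened)
    if ($fp !== false) {
        @fwrite($fp, ob_get_contents());
        @fclose($fp);
    }

    // Send the buffered page to the visitor
    ob_end_flush();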

?>
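
To see the output buffering functions used above in isolation, here is a small sketch that captures whatever a piece of code prints. The helper capture_output() is hypothetical. Unlike the caching script, it discards the buffer with ob_end_clean() rather than sending it with ob_end_flush(), so that the captured text is only returned.

```php
<?php

// Hypothetical helper: run a callable and return whatever it printed.
function capture_output($callback)
{
    ob_start();                    // start saving output instead of sending it
    $callback();
    $captured = ob_get_contents(); // copy the buffer without emptying it
    ob_end_clean();                // discard the buffer (the caching script flushes it instead)

    return $captured;
}
```

In the caching scripts, the same idea is simply split across two files: ob_start() is called at the end of "begin_caching.php" and ob_get_contents() in "end_caching.php".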

Important: If you do not have "register_globals" set to off in php.ini, make sure you add the following to the beginning of "end_caching.php" (straight after the "<?php" line) to aid security. This will ensure that an attacker cannot visit "end_caching.php" directly and overwrite an important file on your site (or read its contents). The "register_globals" setting was removed entirely in PHP 5.4, so this applies only to older installations.

    $cachedir = '../cache/'; // Directory to cache files in (keep outside web root)
    $cacheext = 'cache'; // Extension to give cached files (usually cache, htm, txt)
    $page = 'http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']; // Requested page
    $cachefile = $cachedir . md5($page) . '.' . $cacheext; // Cache file to either load or create

And there we have it. If a cached document exists, it is shown to the user, and if not, one is created.
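
One optional refinement, not part of the scripts above: if a visitor requests a page while its cache file is still being written, they could be sent a half-written file. A common way to reduce that risk is to write to a temporary file and then rename it into place. The sketch below assumes both files are on the same filesystem, and write_cache_atomically() is a hypothetical name.

```php
<?php

// Hypothetical helper: write the cache via a temporary file, then rename it into place.
// Assumes the temporary file and the cache file are on the same filesystem.
function write_cache_atomically($cachefile, $contents)
{
    $tmp = $cachefile . '.' . uniqid('', true) . '.tmp';

    if (file_put_contents($tmp, $contents) === false) {
        return false; // could not write; leave any existing cache file alone
    }

    if (!rename($tmp, $cachefile)) {
        @unlink($tmp); // clean up the temporary file if the rename failed
        return false;
    }

    return true;
}
```

In "end_caching.php", this could stand in for the fopen/fwrite/fclose calls, passing ob_get_contents() as the contents.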

Finally, you need to make sure the cache remains reasonably clean. Over time, out-of-date or redundant files can build up, and these should be removed regularly. For this reason, I usually set up an automated script to delete all cache files once a week (or less often, depending on the traffic of the site), but how you do this will depend greatly upon the server software you are using.

The script below is one example of a script to delete all cache files. You will need to set the cache directory at the beginning before running the script. You can either use this manually, visiting the page through your browser whenever you want to empty the cache, or run it automatically. An example of the command a cron job could use to run this script automatically is below the script (the " >/dev/null 2>&1" at the end discards the command's output, which prevents the server emailing me every time the script runs). Please note that this last script will be cached too, unless you add it to the ignore list!

<?php

    // Settings
    $cachedir = '../cache/'; // Directory to cache files in (keep outside web root)

    if ($handle = @opendir($cachedir)) {
        while (false !== ($file = @readdir($handle))) {
            if ($file != '.' and $file != '..') {
                // Only report files that were actually removed
                if (@unlink(rtrim($cachedir, '/') . '/' . $file)) {
                    echo $file . ' deleted.<br>';
                }
            }
        }
        @closedir($handle);
    }

?>

curl http://www.your_domain.com/empty_caching.php >/dev/null 2>&1
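
As a variation on the cleanup script above, you may prefer to delete only the files that have already expired, leaving valid cache files in place. The sketch below is one way that might look. The function purge_expired() is hypothetical, and it assumes every cache file uses the extension given in the settings.

```php
<?php

// Hypothetical helper: delete only cache files older than $maxage seconds.
// Returns the number of files removed.
function purge_expired($cachedir, $maxage, $cacheext = 'cache')
{
    $removed = 0;
    $files = glob(rtrim($cachedir, '/') . '/*.' . $cacheext);

    // glob() can return false on error, so fall back to an empty list
    foreach ($files ? $files : array() as $file) {
        if (is_file($file) && (time() - filemtime($file)) > $maxage && @unlink($file)) {
            $removed++;
        }
    }

    return $removed;
}
```

Passing the same number of seconds as $cachetime would remove exactly the files the caching script already treats as out of date.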
