Tagged with "html" http://www.addedbytes.com/feeds/tag-feed/ en Web Development in Brighton - Added Bytes 2006 120

"Select All" JavaScript for Forms Posting to an Array http://www.addedbytes.com/blog/code/select-all-javascript-for-forms-posting-to-an-array/

The problem that led to this snippet of code was that when posting from a form to a PHP script, you may sometimes want to have several fields with the same name and different values. For example, you might want people to be able to tick boxes to indicate which cities they have been to from a list. You would normally add "[]" to the name of the field inputs, like so:

<input type="checkbox" name="cities[]" value="London"> London
<input type="checkbox" name="cities[]" value="Paris"> Paris
<input type="checkbox" name="cities[]" value="Berlin"> Berlin
<input type="checkbox" name="cities[]" value="Madrid"> Madrid
<input type="checkbox" name="cities[]" value="Rome"> Rome

When the form is received by PHP, whichever items are ticked in the cities list above are accessible in the array $_POST['cities']. This is very handy.
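To make the "[]" convention concrete, here is a minimal sketch of the request body a browser would build if London and Paris were ticked. It uses the standard URLSearchParams class, and the ticked cities are an assumption chosen purely for illustration.

```javascript
// Hypothetical selection: the visitor ticked London and Paris.
const ticked = ['London', 'Paris'];

// Each ticked box contributes one "cities[]=value" pair to the request body.
const body = new URLSearchParams();
for (const city of ticked) {
  body.append('cities[]', city);
}

// The square brackets are percent-encoded in the serialised string.
console.log(body.toString());

// Reading the pairs back shows both values sharing one field name,
// which is what PHP gathers into the $_POST['cities'] array.
console.log(body.getAll('cities[]'));
```

Unticked boxes contribute nothing at all, which is why the array only ever contains the cities that were selected.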

Unfortunately, the addition of square brackets causes trouble with JavaScript, especially with a "Select All" function - which allows you to check all boxes at once by clicking a single one. This script works around that using regular expressions.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<title>Checkbox Fun</title>
<script type="text/javascript"><!--
var formblock;
var forminputs;

// Cache the form and its inputs once the page has loaded.
function prepare() {
  formblock = document.getElementById('form_id');
  forminputs = formblock.getElementsByTagName('input');
}

// Tick (value '1') or untick (any other value) every input whose
// name attribute matches "name".
function select_all(name, value) {
  // Build the regex once; it is tested against each name attribute.
  var regex = new RegExp(name, "i");
  for (var i = 0; i < forminputs.length; i++) {
    if (regex.test(forminputs[i].getAttribute('name'))) {
      forminputs[i].checked = (value == '1');
    }
  }
}

// Run prepare() on page load, using whichever method the browser supports.
if (window.addEventListener) {
  window.addEventListener("load", prepare, false);
} else if (window.attachEvent) {
  window.attachEvent("onload", prepare);
} else if (document.getElementById) {
  window.onload = prepare;
}
// --></script>
<form id="form_id" name="myform" method="get" action="search.php">
  <a href="#" onClick="select_all('area', '1');">Check All Fruit</a> | <a href="#" onClick="select_all('area', '0');">Uncheck All Fruit</a><br><br>
  <input type="checkbox" name="area[]" value="1" />Apples<br />
  <input type="checkbox" name="area[]" value="2" />Bananas<br />
  <input type="checkbox" name="area[]" value="3" />Chickens<br />
  <input type="checkbox" name="area[]" value="4" />Stoats
  <br><br><a href="#" onClick="select_all('location', '1');">Check All Locations</a> | <a href="#" onClick="select_all('location', '0');">Uncheck All Locations</a><br><br>
  <input type="checkbox" name="location[]" value="1" />Brighton<br />
  <input type="checkbox" name="location[]" value="2" />Hove<br />
</form>
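One thing to be aware of: the regular expression built in select_all matches any input whose name merely contains the given text, so 'area' would also match a field called 'subarea[]'. If that matters, a stricter test could look something like the sketch below. The helper names escapeRegExp and matchesGroup are my own and are not part of the script above.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True only when fieldName is exactly the group name followed by "[]".
function matchesGroup(fieldName, group) {
  const pattern = new RegExp('^' + escapeRegExp(group) + '\\[\\]$', 'i');
  return pattern.test(fieldName);
}

console.log(matchesGroup('area[]', 'area'));     // true
console.log(matchesGroup('subarea[]', 'area'));  // false
console.log(matchesGroup('location[]', 'area')); // false
```

Swapping this test in for the regex.test(...) call would be one way to keep groups with similar names apart.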

Thu, 28 Jul 2005 10:05:00 +0100 http://www.addedbytes.com/blog/code/select-all-javascript-for-forms-posting-to-an-array/ Dave Child

robots.txt File http://www.addedbytes.com/articles/online-marketing/robots-txt-file/

A robots.txt file is a simple, plain text file that you store on your website. Its purpose is to give instructions to robots (also known as "spiders", programs that retrieve content for search engines like Google and Fast) detailing what they should not index on a website. If you are unable to create or use a robots.txt file, you might find this meta tags tutorial useful.

A robots.txt file (a document detailing the robots.txt exclusion standard is available) is always stored in the root of your site, and is always named in lower case. For example, if a website at http://www.addedbytes.com/ had a robots.txt file it would be found at http://www.addedbytes.com/robots.txt - and only there. Spiders will always search for it in the root of a domain, and will never ever look for it elsewhere. You cannot specify a different name or location for a robots.txt file.

A robots.txt file should be viewed like a list of recommendations. By including one, you are asking the spiders that visit your site to ignore certain things that you would prefer not to be indexed, but they are not obliged to pay attention to that. If you really do not want things indexed, it is far better to disallow access with server-side programming than a robots.txt file.

Writing a robots.txt File

A robots.txt file is a list of instructions. Each instruction is divided into two parts. The first, "User-agent" (case-sensitive), tells robots reading the file which robots should pay attention to the instructions that follow. Usually, this will be a "*", which is a wild card meaning "all robots". The wild card character can only be used in this context, except in the case of Googlebot, which does support it in other places (see User-Agent Specific Commands).

Following this line specifying a user agent are the rules themselves. The rules that apply to a defined user agent must be defined on the lines following the "User-agent" instruction. There can be no blank lines within each set of instructions, and there must be at least one blank line separating sets of instructions. The instructions are usually of the format: "Disallow: /folder/" or "Disallow: /file.htm". There can only be one instruction per line, and you should avoid putting spaces before the instructions (the standard neither allows nor forbids this, so it is best not to take the risk).

Anything following a hash symbol "#" is considered a comment and ignored. At least, according to the standards. Rumours abound, though, that in the past some engines have ignored a line with a hash symbol on it wherever it is placed, so you may want to place each comment on a line by itself.

For example, the following robots.txt file is technically valid:

# My robots.txt file
User-agent: *
Disallow: /folder/ # My private folder
Disallow: /file.htm # My private file
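To show how those rules fit together, here is a rough sketch of how a robot might read the example file: comments are stripped, blank lines end a set of instructions, and each remaining line is split into a field and a value. The function name parseRobots and the shape of the records it returns are my own invention, and lower-casing the field names is a leniency I have assumed rather than anything the text above promises.

```javascript
// Parse robots.txt text into records of the form
// { agents: [...], disallow: [...] }. Blank lines separate records.
function parseRobots(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    if (rawLine.trim() === '') {
      current = null; // a blank line ends the current record
      continue;
    }
    const line = rawLine.split('#')[0].trim(); // strip any comment
    if (line === '') continue; // the line held nothing but a comment
    const colon = line.indexOf(':');
    if (colon === -1) continue; // not a "Field: value" line, so skip it
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === 'user-agent') {
      if (current === null) {
        current = { agents: [], disallow: [] };
        records.push(current);
      }
      current.agents.push(value);
    } else if (field === 'disallow' && current !== null) {
      current.disallow.push(value);
    }
  }
  return records;
}

const example = [
  '# My robots.txt file',
  'User-agent: *',
  'Disallow: /folder/ # My private folder',
  'Disallow: /file.htm # My private file',
].join('\n');

console.log(JSON.stringify(parseRobots(example)));
```

Real crawlers differ in how forgiving they are, so treat this as an aid to understanding the format rather than a description of any particular robot.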

If you want to prevent robots from indexing anything at all on your site, you could add the following to your robots.txt file:

User-agent: *
Disallow: /

If you want to prevent all robots, except for a particular one or two, from accessing a folder, you could write a file like this, which will allow GoogleBot to index everything on your site, but prevent all other robots from accessing the folder called, imaginatively, "folder":

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /folder/
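As an illustration of how a robot might apply that file, the sketch below picks the set of rules that names the robot (falling back to the "*" rules) and then checks a path against the Disallow values. The records are written out by hand, the function names recordFor and isAllowed are hypothetical, and simple prefix matching is an assumption about how a typical robot behaves.

```javascript
// The two sets of rules from the file above, written out by hand.
const records = [
  { agent: 'googlebot', disallow: [''] },
  { agent: '*', disallow: ['/folder/'] },
];

// Pick the record naming this robot, falling back to the "*" record.
function recordFor(robotName) {
  const name = robotName.toLowerCase();
  return (
    records.find((r) => r.agent !== '*' && name.includes(r.agent.toLowerCase())) ||
    records.find((r) => r.agent === '*') ||
    null
  );
}

// A path is blocked when it starts with any non-empty Disallow value.
function isAllowed(robotName, path) {
  const record = recordFor(robotName);
  if (record === null) return true;
  return !record.disallow.some((prefix) => prefix !== '' && path.startsWith(prefix));
}

console.log(isAllowed('Googlebot', '/folder/page.htm')); // true
console.log(isAllowed('Slurp', '/folder/page.htm'));     // false
console.log(isAllowed('Slurp', '/index.htm'));           // true
```

Because the specific record is looked up by name before the wildcard is considered, the order of the two sets of rules in the file makes no difference to this sketch.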

Please note: Many people believe that it is necessary to define the robot-specific rules before the general rules. The robots.txt exclusion standard does not require this, but there is no evidence of it causing problems, so it may be worth doing if there is even a small chance it will help things to work as you intend.

Once you have written a robots.txt file, it is often a good idea to run it through a validator to check for errors, as they may do considerable harm if they prevent your site from being indexed. SearchEngineWorld's robots.txt validator is the most proficient of those available, or if you prefer, there is a validator that understands more unusual commands like Crawl-delay available as well.

Example Files

This is the robots.txt file for AddedBytes.com. As you can see, I have disallowed the indexing of a few files, but not many. Specifically, I have asked Google not to index "404.php", which is the page a user is redirected to if a page is not found, and "friend.php", which is linked to from every page, but is there to allow users to refer friends to the site, and so should not really be indexed.

User-agent: *
Disallow: /404.php
Disallow: /friend.php

This file, from eBay, is again quite short, and simply specifies a few folders that should not be indexed:

User-agent: *
Disallow: /help/confidence/
Disallow: /help/policies/
Disallow: /disney/

As you can see, Google will still list pages excluded by robots.txt, as Google is still aware they exist. However, Google will not index the content of the page and the page will not show up in searches except where a search includes the address of the excluded page.

Blank robots.txt files

It may be that you do not want to prevent spiders from indexing anything on your site. If that is the case, you should still add a robots.txt file, but an empty one, of this format:

User-agent: *
Disallow:

This prevents spiders from generating a 404 error when the robots.txt file isn't found. It is basically just good practice to add a blank robots file, at the least, but not essential.

Be Careful

You may be thinking that adding the addresses of folders you do not wish robots to index is a good way to prevent spiders from accidentally indexing sensitive areas of your site, like an administration area. While this is true, remember that anybody at all can view your robots.txt file, and therefore find the address(es) you'd rather were not indexed. If that includes your administration area, you may have saved them the trouble of searching for it.

There have been websites with unprotected administration areas online, whose admin area was hidden in an unusually named folder for "security" reasons - who added the name of the folder to their robots.txt file, opening up their admin area to anyone who wanted to have a poke around.

You must also be careful when writing your robots.txt file. Robots will usually err on the side of caution. If they do not recognise a command, they may well assume you meant them to stay away. Syntax errors in a robots.txt file can prevent your entire site from being indexed, so check it thoroughly before uploading it!

User-Agent Specific Commands


Googlebot has no extra commands specific to it, however it is allegedly a little brighter than the average crawler. Googlebot will supposedly understand wild card characters (*) in the "Disallow" field of the robots.txt file. However, Googlebot is the only engine even rumoured to be able to do this, so you would be wise to avoid using wild cards in the disallow field wherever possible.

MSNBot and Slurp

User-Agent: msnbot
Crawl-Delay: 10

User-Agent: Slurp
Crawl-Delay: 10

The above code is specific to MSN's spider, "MSNBot", and Inktomi's spider, "Slurp", and instructs the spiders to wait the specified amount of time, in seconds (10 seconds above, default is 1 second if not specified) before requesting another page from your site. MSNBot and Slurp have been known to index some sites very heavily, and this allows webmasters to slow down their indexing speed.

You could technically use this command with a user agent of "*" as well - the robots.txt exclusion standard instructs robots to just ignore commands they do not understand. However, if a robot sees something they do not understand in a robots.txt file, they may just not index your site. If using the "Crawl-Delay" command, you would be wiser to specify the user agents it should apply to.

List of User-Agent Names

  • Google: "googlebot"
  • Google's Image Search: "Googlebot-Image"
  • MSN: "msnbot"
  • Inktomi: "Slurp"
  • AllTheWeb: "fast"
  • AskJeeves: "teomaagent1" or "directhit"
  • Lycos: "lycos"

Mon, 19 Jul 2004 12:09:17 +0100 http://www.addedbytes.com/articles/online-marketing/robots-txt-file/ Dave Child

The Box Model For Beginners http://www.addedbytes.com/articles/for-beginners/the-box-model-for-beginners/

The term "box model" is often used by people when talking about CSS-based layouts and design. Not everyone understands what is meant by this though, and not everyone understands why it is so important.

Any HTML element can be considered a box, and so the box model applies to all HTML (and XHTML) elements.

The box model is the specification that defines how a box and its attributes relate to each other. In its simplest form, the box model tells browsers that a box defined as having width 100 pixels and height 50 pixels should be drawn 100 pixels wide and 50 pixels tall.

There is more you can add to a box, though, like padding, margins, borders, etc. This image should help explain what I'm about to run through:

Outline of box model

As you can see, a box is made up of four distinct parts. The outside one, the margin, is completely invisible. It has no background color, and will not obstruct elements behind it. The margin is outside the second part, which is the border. The border outlines the visible portion of the element. Inside the border is the third part of the box, the padding, and then inside that the content area of the box. The padding defines the space between the content area of the box and the border.

(Note that in the image above, the only border of the three drawn that would actually be visible is the solid line - the dashed lines are added to help demonstrate the box model).

When you define a width and a height for your box using CSS, you are not defining the entire area taken up by the content, padding, border and margin. You are actually just defining the content area itself - the bit right in the middle. The padding, border and margin must be added to that in order to calculate the total space occupied by the box. (From this point on, we will use width for demonstrations, but the same principles apply to both width and height).

box { width: 200px; border: 10px solid #99c; padding: 20px; margin: 20px; }

The above CSS, applied to a box, would mean that that box occupied 300 pixels of space horizontally on the page. The content of the box would occupy 200 pixels of that (dashed line added to demonstrate the edge of the area occupied by the box):

Box model demonstration.

In the above image, you can see that the pale blue area is 240 pixels wide (200 pixels of content plus 20 pixels padding either side). The border is 10 pixels wide either side, making the total width including the border 260 pixels. The margin is 20 pixels either side, making the total width of the box 300 pixels.

In practice, this can cause some confusion. For example, if I have a 100 pixel wide space available, and want to fill it with a pale red box with a dark red border and a small amount of padding, it would be very easy to write the CSS like so:

box { width: 100px; border: 1px solid #900; padding: 10px; margin: 0; background: #fee; }

(A declaration of 0, as used for the margin above, does not require a unit to be added. Any value other than 0 does require a unit, e.g. px for pixels. Also, although "background" is defined as a shorthand property, it is more widely supported than the more correct "background-color".)

However, that will not give us a 100 pixel wide box, as the width declaration defines the content area of the box. The content area of the box will be 100 pixels - the total width of the box as defined above will be 122 pixels:

Box model demonstration.

In order to set the above box to only occupy 100 pixels horizontally, you would need to set the width of the content area to be 100 pixels minus the padding and minus the border, in this case 78 pixels, like so:

box { width: 78px; border: 1px solid #900; padding: 10px; margin: 0; background: #fee; }

To calculate the overall width of a box, including all padding, borders and margins, you would use the following formula:

total box width = content area width + left padding + right padding + left border + right border + left margin + right margin
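The formula is easy to check with a few lines of code. The sketch below assumes, as in the examples above, that padding, border and margin are the same on both sides; the function names totalBoxWidth and contentWidthFor are mine.

```javascript
// All measurements are in pixels and apply equally to the left and right.
function totalBoxWidth({ width, padding, border, margin }) {
  return width + 2 * padding + 2 * border + 2 * margin;
}

// Work backwards: the content width needed to fit a given total width.
function contentWidthFor(total, { padding, border, margin }) {
  return total - 2 * padding - 2 * border - 2 * margin;
}

console.log(totalBoxWidth({ width: 200, padding: 20, border: 10, margin: 20 })); // 300
console.log(totalBoxWidth({ width: 100, padding: 10, border: 1, margin: 0 }));   // 122
console.log(contentWidthFor(100, { padding: 10, border: 1, margin: 0 }));        // 78
```

The three results match the 300, 122 and 78 pixel figures worked through earlier.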


At this point, you should now have a good understanding of what the box model is, and how boxes should be treated by different browsers. However, as you will soon learn (if you did not know already), not every browser does as it is supposed to. In order to use boxes, and by extension make the most of CSS in your website, you will need to be aware of how the different browsers treat boxes in practice and how to overcome and work around the problems presented by these idiosyncrasies.

Top Notch

Opera 6 Opera 7 Mozilla Firefox Camino Safari Konqueror Netscape 6 Netscape 7 Internet Explorer 6

Most browsers released in the last few years have no problem with boxes and render them correctly. Opera 6 and 7, Mozilla 1 (and by extension other browsers based on the Gecko engine like Netscape 7, Camino and Firefox and other derivatives), Safari, Konqueror (and derivatives) and Internet Explorer 5 for the Mac are all shining examples of how a web browser should behave, all rendering boxes flawlessly. IE 6 for Windows will also render a box correctly, as long as the Document Type Definition (http://www.addedbytes.com/design/dtds-explained) for the page is correct.

Whoops, Mrs Miggins, You're Sitting On My Artichokes

Internet Explorer 5 Internet Explorer 6

Some browsers don't display a box correctly. Unlike those below here, these browsers are widely enough used on the web that it is usually worth the effort to work through the problem. There are various methods for doing this, some better than others, that follow on. Most notable among the browsers with problems are Internet Explorers 4 and 5 and Internet Explorer 6. IE 6 is easy to work around, by adding a correct DTD (which you should be doing anyway).

Internet Explorer 5 is the main reason there is a box model problem at all. It, unfortunately, does not follow the simple definition for box layout as defined by the W3C. When you define a width for a box and it is rendered in IE5, that width does not define just the content area of the box; it also includes the borders and padding. Margins are correctly added outside the declared width, but padding and borders are not. Unfortunately, this leaves us with some unpleasant choices:

  1. Use a box model hack
    Hacks like Tantek's box model hack (http://www.tantek.com/CSS/Examples/boxmodelhack.html) are unfortunately something of a necessity. While some might argue that using hacks like this is completely missing the point of using CSS for web design, commercial necessity and the prevalence of IE5 leave us with little in the way of choice. The IE5 box model hack is in use all over the web and has spawned plenty of variants.
  2. Add in extra code
    Some might consider this a slightly "better" way of working around this problem. Rather than adding a style sheet hack, you can nest elements within each other. Adding a div within another div means that rather than using padding, you can use just margins, which are handled correctly by IE5. As with the box model hack, it is far from a perfect solution, but there are few other options if you want a site to look the same in IE5 as other more capable browsers.
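To see the size of the difference both workarounds are compensating for, the sketch below applies one declared width under each interpretation. It is only an arithmetic model of the behaviour described above, with a function name (layout) and model labels ('w3c', 'ie5') that I have made up, and it ignores margins because both models treat them the same way.

```javascript
// Compare how one declared width is interpreted under the two models.
function layout(declaredWidth, padding, border, model) {
  if (model === 'ie5') {
    // IE5: the declared width already includes padding and border.
    return {
      content: declaredWidth - 2 * padding - 2 * border,
      visible: declaredWidth,
    };
  }
  // W3C: the declared width is the content area only.
  return {
    content: declaredWidth,
    visible: declaredWidth + 2 * padding + 2 * border,
  };
}

console.log(layout(100, 10, 1, 'w3c')); // { content: 100, visible: 122 }
console.log(layout(100, 10, 1, 'ie5')); // { content: 78, visible: 100 }
```

With 10 pixels of padding and a 1 pixel border, the same declaration ends up 22 pixels narrower in IE5 than in a browser that follows the specification.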

Hall of Shame

Internet Explorer 4 Netscape 4

On the one hand, the browsers that I am about to mention are appalling, all failing dismally to render a simple box correctly for one reason or another. On a more positive note, users of these browsers, mostly old versions of current browsers, make up an extremely small, and continually shrinking, portion of web users. While you could probably find a workaround for the bugs in the display of boxes in these browsers, it is almost certainly not worth the effort - you are likely to cause yourself more harm than good with workarounds for these!

Netscape 4's box model is awful, but even worse, the simple box model hacks to fix the problem for IE5 and IE6 will crash Netscape 4. Netscape 4's style sheet support is abysmal overall, and it is being supported less and less. Though it is strictly a personal choice, I don't think it is worth the time and effort to support Netscape 4 any more - it's just not used enough, and the number of users is only ever going to shrink.

Internet Explorer 4 suffers, basically, the same problem as IE5. It treats boxes in a very similar way. However, it falls over in far more ways, and many of the available hacks will crash IE4. As it is also used by few people, and that number is dropping, many designers ignore it.

What does the future hold?

CSS3 promises us the option to determine how we want the user agent to treat boxes, and to specify which box model we want to use. Browser support for CSS3 at a level where that will be possible is many years away yet. Until then, we are stuck with the CSS2 box model, and while IE5 is still used by a significant percentage of the web's population, we are going to have problems with boxes.

Fri, 09 Jul 2004 11:48:54 +0100 http://www.addedbytes.com/articles/for-beginners/the-box-model-for-beginners/ Dave Child

On-The-Fly Validation http://www.addedbytes.com/blog/code/on-the-fly-validation/

Of all coding errors in websites (usually highlighted by code validators), there are a few that crop up time and time again. These common coding bugs account for more than 90% of the mistakes in web sites. Despite being so prevalent, most designers still allow these basic errors to creep into their code.

Note: Many people may consider "validation errors" unimportant. However, if you are going to write a web page in a specific language it makes sense to actually use that language properly, rather than making up your own random dialect. After all, can you be sure that that dialect will be interpreted the same way every single time? And while many people find errors like this easy to ignore, they should remember that while they might not stop a page being usable, what validators bring up are "coding errors" - mistakes in the markup of the page.

These scripts are intended to make life a touch easier for busy developers. Of course, these scripts will slow down your site, and are no substitute for actually writing valid code in the first place. They are intended to catch the occasional bug that you may have missed, or that may be introduced through a comments system, for example.

In order to make use of the following code, you will need to be using PHP 4 or higher on Apache. The following scripts make use of output buffering and work best with a caching system in place as well.

This script will not remove all of your validation errors. It cannot remove them all without running very slowly - there is a lot to check in each document. However, it can catch a few of the more common bugs that most designers miss at least once in a website.

To begin, the scripts we use start output buffering. This means that rather than sending the page to the user as it is created, PHP holds the page on the server until it is told to output it (or the script ends). This allows us to modify the page output without needing to worry about editing the PHP behind it. To start output buffering, you need to include the following code at the top of each page. You can include it using the "include()" or "require()" functions, or using the superb auto_prepend_file setting in htaccess (which you can see in use in this caching tutorial).

ob_start();

After the script has run, we need to include another script at the end to process the page and output it to the user. You can, again, use "include()" or "require()", or auto_append_file in htaccess to include this script.

The script itself runs in three steps. The first step, below, grabs the contents of the output buffer. This will create a variable called "$output" that contains the page we were about to send to the user. $output contains the HTML after all PHP has run as normal, so the variable literally only contains what the user would normally see. The second line empties the output buffer (but does not send its contents to the user).

$output = ob_get_contents();
ob_end_clean();
$output = trim($output);

Now $output contains the page we are about to send the user, it is time to run the various checks we want, to make sure there are no validation errors in place.

if ((strpos($output, "<!DOCTYPE") > strpos($output, "<html")) or (strpos($output, "<!DOCTYPE") === false)) {
    $output = str_replace('<html', "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html", $output);
}

First, we check for a DTD. These are important, as they tell the user agent (e.g. the browser) what language a page is written in. The above checks for the presence of a DTD before the <html> tag, and if it is missing it adds in the DTD for HTML 4.01 Transitional - probably the most common one in use today.
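If you would like to experiment with the logic of that check away from the server, the same idea can be sketched in JavaScript. This is an illustration only, not part of the PHP script, and the function name ensureDoctype is my own.

```javascript
const DOCTYPE =
  '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" ' +
  '"http://www.w3.org/TR/html4/loose.dtd">';

// Add the doctype when it is missing or appears after the <html> tag.
function ensureDoctype(page) {
  const doctypeAt = page.indexOf('<!DOCTYPE');
  const htmlAt = page.indexOf('<html');
  if (doctypeAt === -1 || doctypeAt > htmlAt) {
    // Only the first "<html" is touched here, whereas the PHP
    // str_replace() call replaces every occurrence it finds.
    return page.replace('<html', DOCTYPE + '\n<html');
  }
  return page;
}

console.log(ensureDoctype('<html><head></head></html>'));
```

A page that already starts with a doctype is returned unchanged, so running the check twice does no harm.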

function encode_chars($text) {
    $text = str_replace("<", "&lt;", $text);

    $tag_list = '((\/?)(!DOCTYPE|!--|a(bbr|cronym|ddress|pplet|rea)?|b(ase(font)?|do|ig|lockquote|ody|r|utton)?|c(aption|enter|ite|(o(de|l(group)?)))|d(d|el|fn|i(r|v)|l|t)|em|f(ieldset|o(nt|rm)|rame(set)?)|h(1|2|3|4|5|6|ead|r|tml)|i(frame|mg|n(put|s)|sindex)?|kbd|l(abel|egend|i(nk)?)|m(ap|e(nu|ta))|no(frames|script)|o(bject|l|pt(group|ion))|p(aram|re)?|q|s(amp|cript|elect|mall|pan|t(r(ike|ong)|yle)|u(b|p))|t(able|body|d|extarea|foot|h|itle|r|t)|u(l)?|var)([^>]*))';

    $text = preg_replace("/(&lt;)" . $tag_list . "(>)/mi", "<$2>", $text);
    $text = preg_replace("/(>[^<]*)>/mi", "$1&gt;", $text);
    $text = str_replace("/>", ">", $text);

    return $text;
}

$output = encode_chars($output);

Next, we run a function on the script to check the tags on the page. Any tags that don't belong there are encoded so they are displayed rather than processed. We use the HTML 4.01 tag list, which means we will catch the worst of the invalid tags.

$output = preg_replace("/<img([^>]*)alt=([^>]*)>/im", "<img$1`alt=$2>", $output);
$output = preg_replace("/<img([^`|>]*)>/im", "<img alt=\" \"$1>", $output);
$output = preg_replace("/<img([^>]*)`alt=([^>]*)>/im", "<img$1alt=$2>", $output);

This small snippet of code checks for alt attributes on images. If they are missing, it adds a single space as an alt attribute. This is by no means optimal (and the regex and technique is ugly - if anyone can improve on this, please give me a shout!), however does mean that if an alt attribute is missed, screen readers will not simply give the name of the image file. You should always take the greatest care to ensure that all images have appropriate alt attributes.
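One possible tidier approach is to inspect each image tag in a callback instead of marking tags with a backtick and making three passes. The sketch below shows the idea in JavaScript; it is an assumption about how the technique could be improved rather than the code this site uses, and in PHP the equivalent tool would be preg_replace_callback().

```javascript
// Add alt=" " to any <img> tag that has no alt attribute.
function addMissingAlt(html) {
  return html.replace(/<img\b([^>]*)>/gi, (tag, attributes) => {
    // Leave the tag alone when an alt attribute is already present.
    if (/\salt\s*=/i.test(attributes)) {
      return tag;
    }
    return '<img alt=" "' + attributes + '>';
  });
}

console.log(addMissingAlt('<img src="a.gif"><img src="b.gif" alt="B">'));
// <img alt=" " src="a.gif"><img src="b.gif" alt="B">
```

Like the original, this only patches over a missing attribute; a meaningful alt text still has to come from whoever wrote the page.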

Next, we do a little language-specific work. In the above code, we removed all closing slashes (e.g. in a <br /> tag). Now, if we are using XHTML, we add them back in for the appropriate elements. We also check the case of elements if using XHTML, as tags and attributes must be lower case in XHTML. This will only affect attributes whose values are quoted.

// Note: the /e modifier used below evaluates the replacement as PHP code.
// It works in PHP 4 and 5 but was removed in PHP 7.
function process_attributes($text) {
    return preg_replace("/ ([a-z]+)=\"([^( |\")]*)\"/mie", "' ' . strtolower('$1') . '=\"' . stripslashes('$2') . '\"'", $text);
}

if (strpos($output, "//W3C//DTD XHTML") !== false) {
    $output = encode_chars($output, "XHTML");
    $output = preg_replace("/<(img|hr|meta|link|br|base|frame|input)([^>]*)>/mi", "<$1$2 />", $output);
    $output = preg_replace("/<(\/?)([a-z]+)( |>)/mie", "'<$1' . strtolower('$2') . '$3'", $output);
    $output = preg_replace("/<([^>]+)>/mie", "'<'.process_attributes(stripslashes('$1')).'>'", $output);
    $output = preg_replace("/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w{1,8});)/i", "&amp;", $output);
}

We also, at the end, encode any ampersands that should be encoded. Many thanks to Shaun Inman (http://www.shauninman.com) for the last line.
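The ampersand pattern relies on a negative lookahead: an "&" is only replaced when it is not already the start of an entity. The same expression works unchanged in JavaScript, which makes it easy to try out; the function name encodeAmpersands and the sample text are mine.

```javascript
// Encode "&" unless it already starts an entity such as &amp; or &#169;.
function encodeAmpersands(text) {
  return text.replace(/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w{1,8});)/g, '&amp;');
}

console.log(encodeAmpersands('Fish & chips &amp; peas &#169; 2004'));
// Fish &amp; chips &amp; peas &#169; 2004
```

Only the bare ampersand is changed, so the replacement can safely be run over text that has already been encoded.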

Finally, we need to send the processed output to the user.

$output = str_replace("<b>", "<strong>", $output);
$output = str_replace("<i>", "<em>", $output);
$output = str_replace("</b>", "</strong>", $output);
$output = str_replace("</i>", "</em>", $output);
echo $output;

At this stage, the code sent to the user will have a valid Document Type Definition. All tags will be correctly closed whether using HTML or XHTML. All images will have alt attributes. If we're using XHTML, all tags and attributes will be lower case (as long as the attributes are quoted). All invalid opening and closing tags will have been encoded. All ampersands should be properly encoded. And for good measure, we've replaced all bold (<b>) and italic (<i>) tags with the proper <strong> and <em> tags.

If you put it all together, you get the following code to be included at the end of each script:


$output = ob_get_contents();
ob_end_clean();
$output = trim($output);

// Note: the /e modifier used below works in PHP 4 and 5 but was removed in PHP 7.
function process_attributes($text) {
    return preg_replace("/ ([a-z]+)=\"([^( |\")]*)\"/mie", "' ' . strtolower('$1') . '=\"' . stripslashes('$2') . '\"'", $text);
}

function encode_chars($text) {
    $text = str_replace("<", "&lt;", $text);

    $tag_list = '((\/?)(!DOCTYPE|!--|a(bbr|cronym|ddress|pplet|rea)?|b(ase(font)?|do|ig|lockquote|ody|r|utton)?|c(aption|enter|ite|(o(de|l(group)?)))|d(d|el|fn|i(r|v)|l|t)|em|f(ieldset|o(nt|rm)|rame(set)?)|h(1|2|3|4|5|6|ead|r|tml)|i(frame|mg|n(put|s)|sindex)?|kbd|l(abel|egend|i(nk)?)|m(ap|e(nu|ta))|no(frames|script)|o(bject|l|pt(group|ion))|p(aram|re)?|q|s(amp|cript|elect|mall|pan|t(r(ike|ong)|yle)|u(b|p))|t(able|body|d|extarea|foot|h|itle|r|t)|u(l)?|var)([^>]*))';

    $text = preg_replace("/(&lt;)" . $tag_list . "(>)/mi", "<$2>", $text);
    $text = preg_replace("/(>[^<]*)>/mi", "$1&gt;", $text);
    $text = str_replace("/>", ">", $text);

    return $text;
}

if ((strpos($output, "<!DOCTYPE") > strpos($output, "<html")) or (strpos($output, "<!DOCTYPE") === false)) {
    $output = str_replace('<html', "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html", $output);
}

$output = encode_chars($output);
$output = preg_replace("/<img([^>]*)alt=([^>]*)>/im", "<img$1`alt=$2>", $output);
$output = preg_replace("/<img([^`|>]*)>/im", "<img alt=\" \"$1>", $output);
$output = preg_replace("/<img([^>]*)`alt=([^>]*)>/im", "<img$1alt=$2>", $output);

if (strpos($output, "//W3C//DTD XHTML") !== false) {
    $output = preg_replace("/<(img|hr|meta|link|br|base|frame|input)([^>]*)>/mi", "<$1$2 />", $output);
    $output = preg_replace("/<(\/?)([a-z]+)( |>)/mie", "'<$1' . strtolower('$2') . '$3'", $output);
    $output = preg_replace("/<([^>]+)>/mie", "'<'.process_attributes(stripslashes('$1')).'>'", $output);
    $output = preg_replace("/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w{1,8});)/i", "&amp;", $output);
}

$output = str_replace("<b>", "<strong>", $output);
$output = str_replace("<i>", "<em>", $output);
$output = str_replace("</b>", "</strong>", $output);
$output = str_replace("</i>", "</em>", $output);
echo $output;


Thu, 01 Jul 2004 13:40:00 +0100 http://www.addedbytes.com/blog/code/on-the-fly-validation/ Dave Child

Faux Columns for Liquid Layouts http://www.addedbytes.com/blog/code/faux-columns-for-liquid-layouts/

In January of 2004, Dan Cederholm (author of Web Standards Solutions) posted an article on AListApart entitled "Faux Columns". In it, he explained how designers can overcome a common problem in CSS-based designs.

The problem is one that usually rears its ugly head with two and three column designs (though for now, we'll just worry about two columns). If your two columns each have a different background colour, how do you make the colours extend to the bottom of the page using CSS? Equal height columns are difficult to achieve using height and overflow properties. Each column will be of a different height, and you do not always know which is the taller of the two. It is all too easy to end up with a site where one column just doesn't extend all the way to the bottom of the page, where it should end.

CSS does actually include a rather nifty little tool that can be used to work around this problem, the "min-height" declaration, that allows you to specify a minimum height for an element - which you can use to ensure that one specific column is always larger than the other, allowing you to avoid this problem. Unfortunately (and perhaps unsurprisingly) Internet Explorer does not support this declaration, so in practice it isn't a useful solution to the problem.

Dan, in his article (which if you haven't read yet, I suggest you do before continuing), outlines a solution he uses on his own site. This solution involves tiling a background on the page, to give the appearance that there are distinct columns that extend the full length of the page. It's a simple but clever solution, and can be seen in use on a great many sites on the web.

During the recent redesign of this site, though, I came across a small problem. Though Dan's solution is perfect for fixed-width layouts, it just wouldn't work with a percentage-based, liquid layout. The problem was simple - a graphic cannot alter itself based upon the user's screen. If you come up with a background image like the one below, and tile it on a page, the left hand column will need to be 200 pixels wide. If your column is 20% of the page, though, that could make it anything from 20 to 2000 pixels - and as a result your columns will rarely look as you intended. This makes equal height columns in liquid layouts a tricky proposition.

However, Dan's solution can be used to apply a background to a page when the layout is liquid, using background positioning.

Background positioning can allow us to give our background the appearance of being liquid. When you position a background using pixels, you position the top left corner of the image. When you position a background using percentages or keywords, you don't. Instead, you position a point within the image itself. For example, let's say we have a page and a simple background image. We use the following to set and position the background:

body { background-image: url("image.gif"); background-repeat: no-repeat; background-position: 25% 10%; }

The above will set "image.gif" as the background of the page. It will position the background 25% of the way across the page from the left, and 10% of the way down the page from the top. However, it is not the top left corner of the image that will appear at that point. It is the point 25% from the left hand side of the background image and 10% from the top that will appear at that point, like so:

Example of background positioning based upon percentages
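To put some numbers on that, here is a minimal JavaScript sketch of the arithmetic a browser does for one axis of a percentage-based position. The function name backgroundOffset and the sample sizes are my own, purely for illustration; the browser does this for you, of course.

```javascript
// Distance (in pixels) from the left edge of the container to the left edge
// of the background image, for a horizontal position given as a percentage.
// The point `percent` of the way across the image is lined up with the point
// `percent` of the way across the container.
function backgroundOffset(containerWidth, imageWidth, percent) {
  return (containerWidth - imageWidth) * (percent / 100);
}

// Hypothetical sizes: a 1000px wide page and a 200px wide image at 25%.
const imageLeft = backgroundOffset(1000, 200, 25);
console.log(imageLeft);              // 200
console.log(imageLeft + 200 * 0.25); // 250, which is 25% of the 1000px page
```

The second figure shows the point 25% into the image landing exactly 25% of the way across the page, which is the behaviour described above.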

We can use this to apply a background to a page that will give the illusion of a pair of columns, even though the columns are not of a fixed width. Let's say we have two columns (for now), of 25% and 75% width. We can create a simple image, 1 pixel tall by 2000 pixels wide (why so wide will become apparent shortly - but don't be afraid to go even larger if you wish - 4000 pixels wouldn't necessarily be a bad thing). We want the left column to be a nice shade of orange, and the right a nice shade of grey, with a black line to divide them. So the image needs to have a black line 25% of the way along, with everything to the left orange and everything to the right grey, like so (scaled for visibility):

Example of background image

Now, we position the background using the following CSS:

body { background-image: url("background.gif"); background-repeat: repeat-y; background-position: 25% 0; }

Now, if you were to draw a line down the page, 25% of the way across from the left hand side, then 25% of our background image would be to the left of that line and 75% to the right. If we use the 2000 pixel wide background mentioned earlier, and position it as above, we'll have an orange background for 25% of the page, a black line, then grey will fill the remaining 75% of the page. You can see an example of that here. If you resize your browser window, while viewing the example, you will see the columns expand and contract to maintain the same proportions of the columns. With a little imagination, and the use of partly transparent background images, you can create image borders between elements using the same technique.
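If you would like to check the arithmetic, the following JavaScript sketch works out where the dividing line ends up for a few window widths. It assumes the 2000 pixel wide image described above with the line 25% of the way along; the function name dividerPosition is made up for this example.

```javascript
// Where the dividing line in the background image lands on the page,
// measured in pixels from the left edge of the page.
function dividerPosition(pageWidth, imageWidth, percent) {
  const fraction = percent / 100;
  // Left edge of the image relative to the page (negative when the image
  // is wider than the page, so part of it hangs off the left-hand side).
  const imageLeft = (pageWidth - imageWidth) * fraction;
  // The line sits the same fraction of the way along the image itself.
  const dividerInImage = imageWidth * fraction;
  return imageLeft + dividerInImage;
}

for (const pageWidth of [800, 1024, 1600]) {
  console.log(pageWidth, dividerPosition(pageWidth, 2000, 25));
}
// 800 200
// 1024 256
// 1600 400
```

In each case the line falls a quarter of the way across the page. It also suggests why the image wants to be so wide: once the page is wider than the image, imageLeft becomes positive and the image no longer reaches the left edge of the page.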

A little markup and a little more CSS, and we can turn the above into a respectable liquid page, with columns that expand and contract with the users' windows, and always extend just as far as they are needed. A more complete example is here, and this technique is also in use at this very site [please note that this technique is in use in versions 3, 4 and 5 of this site, accessible through the footer], allowing the flexible navigation and content columns to always appear equal in length, despite the fact they almost never are.

Tue, 22 Jun 2004 17:31:13 +0100 http://www.addedbytes.com/blog/code/faux-columns-for-liquid-layouts/ Dave Child
DTDs for Beginners http://www.addedbytes.com/articles/for-beginners/dtds-for-beginners/ A DTD, Document Type Definition, is an identifying tag that belongs at the top of every web page. They perform a twofold task - on the one hand, they help you write valid code, and on the other they help browsers to render that code correctly. How, you may ask, can a tag do this? Read on.

You will doubtless be aware of the basic structure of a web page, if you are a designer. There are two main sections within a document, the <head> and the <body>, both contained within the <html> element:

<html> <head> Header Data </head> <body> Visible Content </body> </html>

The above will probably be very familiar to you. Hopefully, you will be aware of the hundreds of other tags available to you as well. But did you know that the following can be both correct and incorrect?

<P align="center">Text here</P>

In XHTML, which is case sensitive, the above is gibberish - a tag that has no meaning. In HTML, it's a centered paragraph.

How does a browser know which it should pay attention to? Should it assume that you are writing HTML, and render the above as a centered paragraph? If you were writing XHTML, then no, it shouldn't, because a mistake has been made. It should do nothing. But how does the browser know what language you are writing and how to display it?

This is where the DTD comes in handy. A DTD is a tag that is placed within a page before anything else (including white space). It must be at the very front of the very first line of every page you write. Once there, and correctly formed, your DTD can then tell the browser what language you are writing. If you have written your code correctly, and you have a proper DTD in place, the browser will then render your page according to the standards laid down by the W3C, in most cases.

If you do not include a DTD, browsers will use "Quirks Mode" to render a page. Designed to accommodate poor coding and old hacks, "Quirks Mode" gives you little control, and a page rendered in "Quirks Mode" can look noticeably different from one browser to the next. All the hard work you've put in to styling your page and creating a beautiful new site will seem to have been somewhat wasted when you realise that many people cannot see the page properly because it has rendered badly.
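If you are not sure which mode a browser has picked for a page, most browsers expose it through the document.compatMode property, which reads "CSS1Compat" in standards mode and "BackCompat" in "Quirks Mode". Here is a small sketch; the helper name renderingMode is my own, and it accepts any object with a compatMode property so it can be tried outside a browser too.

```javascript
// Report which rendering mode a document is in.
// `doc` is normally the browser's `document` object.
function renderingMode(doc) {
  return doc.compatMode === "CSS1Compat" ? "standards" : "quirks";
}

// In a browser you would call: renderingMode(document)
// Stand-in objects, used here only to show the two possible results:
console.log(renderingMode({ compatMode: "CSS1Compat" })); // standards
console.log(renderingMode({ compatMode: "BackCompat" })); // quirks
```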

So a DTD is essential to any page. It is used by validators to determine what language you are writing, so it can check your code (and if you're a web designer who doesn't validate your code, you should change your ways ... or consider a new career). It is used by browsers to render pages. And if you know what language you are writing, you can use the correct tags and markup for each part of your page, to help it become semantically correct.

DTDs are formed of two parts, and look a bit like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

The above is a DTD for XHTML 1.1. Some languages have subtle variations (for example, HTML can be Strict (for well-written pages), Transitional (for pages with deprecated tags or not-quite-perfect code), or Frameset (for pages with Frames)), and some, like XHTML 1.1, do not. Each of these variations will have its very own DTD as well. Some require upper case "HTML" and some lower case. Each of them can be found on the W3C site, and a short list of the common ones is included on the following page.
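To make the two parts easier to see, here is a rough JavaScript sketch that pulls a declaration apart. I am assuming the two parts are the quoted public identifier and the quoted URL that follows it; the function name parseDoctype and the shape of the object it returns are only for illustration, and the pattern handles nothing beyond PUBLIC declarations like the ones in this article.

```javascript
// Split a DOCTYPE declaration into its root element name, its public
// identifier and (if present) the URL of the DTD itself.
function parseDoctype(declaration) {
  const pattern = /^<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>$/i;
  const match = pattern.exec(declaration.trim());
  if (!match) {
    return null; // not a PUBLIC doctype this sketch understands
  }
  return {
    root: match[1],
    publicId: match[2],
    systemId: match[3] || null, // older DTDs may omit the URL
  };
}

console.log(parseDoctype(
  '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
));
```

For the XHTML 1.1 declaration this gives a root of "html", a public identifier of "-//W3C//DTD XHTML 1.1//EN" and the URL of the DTD as the system identifier.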

Adding a DTD

If you are creating a web page, your DTD should reflect the language you intend to use for that page. Add it before anything else - let it be the very first thing you write (before even empty lines), like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html lang="en">

It is difficult adding a DTD to a page after it has been written, especially if you are using CSS for the look and layout, as when a browser comes out of "Quirks Mode" and renders a page correctly, much of your positioning, as well as padding and margins, may look wrong.

If you are adding a DTD to a page that has already been written, you are going to have some problems. That's a fact. That doesn't make it any less worthwhile adding one. Apart from anything else, you can be very sure you'll remember to add it first on the next site you write. You may want to go for a "Transitional" DTD, which allows you some flexibility (Transitional DTDs allow deprecated tags, like "<b>", and are relatively forgiving, unlike Strict ones).

If you haven't validated a page before (and if you don't have a DTD, chances are you haven't been able to validate your code), you'll probably be looking at something like the HTML 4.01 Transitional DTD, below. Add that to the top of a page, and validate with any one of the many web page validation tools available. Fix the bugs you find, then look at your page in a browser. If things are wrong, it will be down to code - CSS or HTML - and should not take too long to correct.

Once each of your pages has a proper DTD, you can rest easy, knowing that programs fetching and displaying your pages know which language they are written in and which set of rules they conform to. If your page validates, you know the code you have written is pretty close to accurate, although of course a validator cannot check the semantics of a document. Last, you know that, as the browsers now know what language to display, there is a good chance that everyone visiting your site is seeing more or less the same thing.

Common DTDs

The following is a list of those DTDs used often on the web. The chances are that, if you do not currently have a DTD on your pages, it should be one of the below.

  • XHTML 1.1
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  • XHTML 1.0 Strict
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  • XHTML 1.0 Transitional
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  • XHTML 1.0 Frameset
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
  • HTML 4.01 Strict
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
  • HTML 4.01 Transitional
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
  • HTML 4.01 Frameset
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
  • HTML 3.2
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
  • HTML 2.0
    <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">

Thu, 29 Apr 2004 16:39:00 +0100 http://www.addedbytes.com/articles/for-beginners/dtds-for-beginners/ Dave Child
Accesskeys for Beginners http://www.addedbytes.com/articles/for-beginners/accesskeys-for-beginners/ Accesskeys (also known as "Accelerator Keys", "Shortcut Keys" or "Access Keys") can be used in most browsers, and work as shortcuts to enable people to navigate a site using a keyboard. Every browser treats these differently, some shifting focus to the link specified, and some activating the link as though it were clicked on.

Using Accesskeys

Note that in the case of form elements, the accesskeys will always move focus rather than activate the element.

  • Internet Explorer
    Press ALT-X, where X is the Accesskey letter. For example, ALT-1 on this site will shift focus to the link to the homepage.
  • Mozilla, Netscape and derivatives
    Press ALT-X, where X is the Accesskey letter. For example, ALT-1 on this site will activate the link to the homepage.
  • Opera
    Press SHIFT-ESC then the Accesskey, and this will activate the link.

Why use Accesskeys?

Accesskeys are commonly listed as an accessibility item by many people, but more and more web users are discovering and using accesskeys, as they make navigation around the sites you use the most a little quicker and easier. They should be used, where possible, for the simple reason that they make a site easier to use for a wider range of people. If you aren't convinced, try not using your mouse for a day and seeing how easy you find using the web. With accesskeys you will find it much easier.

At the very least, they should be used for the major links within your site, such as the search box and home button. Though there is no standard set yet for accesskeys, the following is a list of the common numbers used for a few of the more common links:

  • 1 - Home
  • 2 - Skip Navigation
  • 4 - Search Input
  • 9 - Contact / Feedback
  • 0 - Accessibility Statement (if there is one)

Adding Accesskeys To Your Site

There is a short list of tags that support the "accesskey" attribute: <a>, <area>, <button>, <input>, <label>, <legend>, and <textarea>. It is added simply as a normal attribute, for example:

<a href="foo.htm" accesskey="F">

It is wise to also give a visual indication of which letter is to be used as an accesskey on any one link, for example by underlining that letter within the link. Adding it to the title doesn't hurt either, for example:

<a href="index.htm" accesskey="h" title="Accesskey: H. Link to home page."><u>H</u>ome</a>

It is also wise to avoid picking accesskeys that conflict with special keys already in use in an application. Internet Explorer and Mozilla both use ALT then a letter for accesskeys and this can often create a conflict. It is wise to avoid all of the following letters, as these are all already in use within common browsers: a, b, d, e, f, g, h, t, v, w.
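A quick way to check a set of accesskeys against that list is sketched below in JavaScript. The letters come straight from the list above; the function name conflictingAccesskeys and the sample keys are invented for the example.

```javascript
// Letters already claimed by common browser menus (from the list above).
const RESERVED_LETTERS = ["a", "b", "d", "e", "f", "g", "h", "t", "v", "w"];

// Return the accesskeys that clash with a reserved letter.
// Accesskeys are compared without regard to case.
function conflictingAccesskeys(accesskeys) {
  return accesskeys.filter((key) => RESERVED_LETTERS.includes(key.toLowerCase()));
}

// A hypothetical set of accesskeys for a site:
console.log(conflictingAccesskeys(["1", "H", "s", "f", "9"])); // [ 'H', 'f' ]
```

Numbers never appear in the list, which is one reason they are a popular choice for accesskeys.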

Try and keep your accesskeys consistent, too. If a user spots an accesskey on one page, they may not check on the next page to see if they can still use it there. For that reason, it is very important that, whichever accesskeys you do use, you use the same accesskeys on every single page, without fail.

Last, do not be afraid to advertise the fact you are using accesskeys. People will want to know about it, and will use them if they know they are there, so add a list of the ones you are using on your site to your help pages or your accessibility statement!

Tue, 16 Mar 2004 09:17:17 +0000 http://www.addedbytes.com/articles/for-beginners/accesskeys-for-beginners/ Dave Child
US States Select Box http://www.addedbytes.com/blog/code/us-states-select-box/ This states select list was last updated on the 15th March 2004, and should be accurate for that date (and I should imagine, far beyond). If you spot any inaccuracies, please let me know.

This list is provided for you to use as you see fit, but please do not reproduce it elsewhere without crediting this site.

The list comes in two versions. The first is a list without the two-letter codes for each state, the second with the two letter codes.

Without State Abbreviation Codes

<select name="state"> <option>Alabama</option> <option>Alaska</option> <option>Arizona</option> <option>Arkansas</option> <option>California</option> <option>Colorado</option> <option>Connecticut</option> <option>Delaware</option> <option>Florida</option> <option>Georgia</option> <option>Hawaii</option> <option>Idaho</option> <option>Illinois</option> <option>Indiana</option> <option>Iowa</option> <option>Kansas</option> <option>Kentucky</option> <option>Louisiana</option> <option>Maine</option> <option>Maryland</option> <option>Massachusetts</option> <option>Michigan</option> <option>Minnesota</option> <option>Mississippi</option> <option>Missouri</option> <option>Montana</option> <option>Nebraska</option> <option>Nevada</option> <option>New Hampshire</option> <option>New Jersey</option> <option>New Mexico</option> <option>New York</option> <option>North Carolina</option> <option>North Dakota</option> <option>Ohio</option> <option>Oklahoma</option> <option>Oregon</option> <option>Pennsylvania</option> <option>Rhode Island</option> <option>South Carolina</option> <option>South Dakota</option> <option>Tennessee</option> <option>Texas</option> <option>Utah</option> <option>Vermont</option> <option>Virginia</option> <option>Washington</option> <option>West Virginia</option> <option>Wisconsin</option> <option>Wyoming</option> </select>

With State Abbreviation Codes

<select name="state"> <option value="AL">Alabama</option> <option value="AK">Alaska</option> <option value="AZ">Arizona</option> <option value="AR">Arkansas</option> <option value="CA">California</option> <option value="CO">Colorado</option> <option value="CT">Connecticut</option> <option value="DE">Delaware</option> <option value="FL">Florida</option> <option value="GA">Georgia</option> <option value="HI">Hawaii</option> <option value="ID">Idaho</option> <option value="IL">Illinois</option> <option value="IN">Indiana</option> <option value="IA">Iowa</option> <option value="KS">Kansas</option> <option value="KY">Kentucky</option> <option value="LA">Louisiana</option> <option value="ME">Maine</option> <option value="MD">Maryland</option> <option value="MA">Massachusetts</option> <option value="MI">Michigan</option> <option value="MN">Minnesota</option> <option value="MS">Mississippi</option> <option value="MO">Missouri</option> <option value="MT">Montana</option> <option value="NE">Nebraska</option> <option value="NV">Nevada</option> <option value="NH">New Hampshire</option> <option value="NJ">New Jersey</option> <option value="NM">New Mexico</option> <option value="NY">New York</option> <option value="NC">North Carolina</option> <option value="ND">North Dakota</option> <option value="OH">Ohio</option> <option value="OK">Oklahoma</option> <option value="OR">Oregon</option> <option value="PA">Pennsylvania</option> <option value="RI">Rhode Island</option> <option value="SC">South Carolina</option> <option value="SD">South Dakota</option> <option value="TN">Tennessee</option> <option value="TX">Texas</option> <option value="UT">Utah</option> <option value="VT">Vermont</option> <option value="VA">Virginia</option> <option value="WA">Washington</option> <option value="WV">West Virginia</option> <option value="WI">Wisconsin</option> <option value="WY">Wyoming</option> </select>

This list was compiled from a variety of sources, both online and offline, and no guarantee is made as to its accuracy. Please report any mistakes to dave@addedbytes.com.

Sat, 14 Feb 2004 14:14:00 +0000 http://www.addedbytes.com/blog/code/us-states-select-box/ Dave Child
META Tags http://www.addedbytes.com/articles/online-marketing/meta-tags/ Last Updated: June 2008.

META tags are a way to describe a page in HTML, invisibly to the user. Many search engines either do not use them at all, or give them little weight. However, they still have their uses and can provide a boost to your search engine placement.

The trick to using them well is to understand what they do, and to provide the best possible information within them. It is important to realise as well that changing or adding META tags will not turn your website into a gold mine overnight, but as part of a well formed SEO strategy, they can certainly help.

There are many people who say you should only ever add two or three META tags to your site. There are those who say you should add hundreds. The simple fact is that there are many that could be appropriate to your site, and you should judge each of them on its individual merit.

META tags all go within the HEAD section of your site. That is to say, within the <head> and </head> tags.

<html> <head> META Tags and Title go here </head> <body> Main page content goes here </body> </html>


The Title Tag

<title>Search Engine Optimization > Meta Tags - AddedBytes.com</title>

The TITLE tag is NOT a META tag. But it does contain metadata, and it is the most important tag on a page and is closely related to them, so I am including it here.

Title tags are displayed in the top of a browser window, and are often used as a link from search engine results listings, so form them well. They should be descriptive and short (ideally under 70 characters), and they are also often used as bookmark titles, so it is important that you ensure your primary keyword phrase for a page is here, and that the title makes sense all by itself.

The Description Tag

<meta name="description" content="An article about META tags and how to use them effectively to boost your search engine placement.">

This is one of the few META tags that can be considered important. The text within this is displayed by some search engines as the description of your site. A description tag should usually be kept to around 150 to 200 characters at most, and it is important to ensure that this tag reads well, and that it describes the page accurately.

There is no point in telling the user that your page contains thousands of pictures of Alicia Silverstone in lacy underwear if when they arrive on the page they see nothing but a sales pitch for tinned goulash. An extreme example, perhaps, but does demonstrate the point that it is better to have visitors who are interested in your product or content than those who aren't. Numbers are unimportant if they don't convert to sales, and this will help to qualify your visitors before they arrive.
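If you want to check a batch of pages against the length guidelines above, something like the following JavaScript sketch would do. The limits of 70 and 200 characters are the rough figures suggested in this article, not hard rules, and the function name checkLengths is made up for the example.

```javascript
// Rough guideline limits taken from the advice above (not hard rules).
const MAX_TITLE_LENGTH = 70;
const MAX_DESCRIPTION_LENGTH = 200;

// Return a list of warnings for a page's title and description text.
function checkLengths(title, description) {
  const warnings = [];
  if (title.length >= MAX_TITLE_LENGTH) {
    warnings.push("Title is " + title.length + " characters; aim for under " + MAX_TITLE_LENGTH + ".");
  }
  if (description.length > MAX_DESCRIPTION_LENGTH) {
    warnings.push("Description is " + description.length + " characters; aim for " + MAX_DESCRIPTION_LENGTH + " or fewer.");
  }
  return warnings;
}

// The title and description used as examples in this article:
console.log(checkLengths(
  "Search Engine Optimization > Meta Tags - AddedBytes.com",
  "An article about META tags and how to use them effectively to boost your search engine placement."
)); // []
```

A length check says nothing about whether the text reads well or describes the page accurately, which matters more.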

The Keywords Tag

<meta name="keywords" content="meta tags search engine optimization description keywords title">

Fairly self explanatory, this tag is used to list keywords for your page. These are words you think are relevant to your page - words that if entered into a search engine should return your site. Search engines do not pay much attention to this, if any, as it has been abused for many years, but some do still use it to some small extent, so you may consider it worth adding.

Try to limit yourself to as few keywords as possible (the fewer keywords you list, the more weight each will likely have), certainly no more than 25, and list them with nothing more than spaces between (some people use commas, however this is no longer necessary). There is also no need to repeat the words listed.

As has been widely reported on the web, this tag is not used by many engines, if at all, and you would be wise to spend your time optimising and improving your site in other ways rather than waste time on this particular tag, in my humble opinion.

The Robots Tag

<meta name="robots" content="index, follow"> <meta name="robots" content="noindex, follow"> <meta name="robots" content="index, nofollow"> <meta name="robots" content="noindex, nofollow">

The ROBOTS META tag is one that is very often used when it should not be. The four variations listed above are four of the more common variations in use, and each accomplishes a different task. Never use this tag unless you wish to prevent a search engine spider from doing something. That's what it's there for.

The first of the examples listed above is completely worthless. If you have it on your site, go and delete it. That tag does nothing more than tell a search engine spider to behave exactly as it normally does. It does not benefit a site, does not get you crawled faster or more often, and will not suddenly make your site more popular than Google.

The second of these can be useful, for example on printer-friendly pages (where the content on the page is a duplicate of the original). This tag tells a search engine spider not to list the page it is viewing, but to follow the links away from the page anyway. The third of these is the reverse of the above, and tells a spider to list a page in its results but not to follow the links on the page. Both of these have their uses, but these are very rare, so think carefully before adding either.

The last tag tells a spider not to index a page or follow the links on it. It is extremely rare that you would want to use this (why would anyone want a page on the web that people cannot find?) but is included for the sake of completeness (some people use this for login pages or other similar pages they do not want listed).

There are more instructions you can add to this tag, the most notable of which is NOARCHIVE. This simply tells a search engine spider not to serve archived copies of the page to people viewing the search engine results (for example, Google offer a cached copy of sites in search results, and this will prevent Google from doing so). The tag to add to only prevent search engines making archived copies of your site publicly available is:

<meta name="robots" content="noarchive">
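To see how the different values combine, here is a simple JavaScript sketch of how a spider might read the content of a ROBOTS tag. It only knows about the instructions discussed above (noindex, nofollow and noarchive), real spiders will differ in the details, and the function name parseRobots is my own.

```javascript
// Turn the content of a robots META tag into a set of yes/no permissions.
// Anything not explicitly forbidden is allowed, which is why
// "index, follow" changes nothing.
function parseRobots(content) {
  const directives = content
    .toLowerCase()
    .split(",")
    .map((directive) => directive.trim())
    .filter(Boolean);
  return {
    index: !directives.includes("noindex"),
    follow: !directives.includes("nofollow"),
    archive: !directives.includes("noarchive"),
  };
}

console.log(parseRobots("index, follow"));   // { index: true, follow: true, archive: true }
console.log(parseRobots("noindex, follow")); // { index: false, follow: true, archive: true }
console.log(parseRobots("noarchive"));       // { index: true, follow: true, archive: false }
```

Note that the first result is exactly what you would get with no tag at all.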

The Content-Type, Content-Style-Type and Content-Language Tags

<meta http-equiv="content-type" content="text/html; charset=UTF-8"> <meta http-equiv="content-style-type" content="text/css"> <meta http-equiv="content-language" content="en-GB">

These are again quite common on some sites, and again have their uses. It is a wise idea if you are using an unusual language or style to mention it here, but by no means essential, as with most META tags. The W3C provide a more comprehensive resource for character set information (http://www.w3.org/International/O-charset.html), so if you do wish to use this, I recommend that as a good place to start reading.

Meta tags that use the "http-equiv" attribute rather than the "name" attribute, like these, allow you to define within a document something that would usually be defined in HTTP headers (sent by your server). If you have no control over the headers sent with your web pages, but still need to define a content type or content style type (and so on), these are the tags you are looking for.

The Refresh Tag

<meta http-equiv="refresh" content="60"> <meta http-equiv="refresh" content="3; URL=http://www.addedbytes.com/">

Most useful on a chat page, or when a page has moved, this instructs a browser to refresh the page after a certain interval of seconds. If the second half of the content attribute is a URL, the refresh will take the user to the URL specified rather than simply refresh the current page. This can be, and sometimes is, used mischievously to prevent a user from clicking their back button to leave a page, something likely to annoy visitors enough that they may never return.
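The two halves of the content attribute can be pulled apart as in the JavaScript sketch below. It is a simplified reading that only handles the two forms shown above, and the function name parseRefresh is invented for the example; browsers are more forgiving about what they accept.

```javascript
// Split the content of a refresh META tag into a delay in seconds and an
// optional URL. Returns null if the content is not in one of the two
// forms shown above.
function parseRefresh(content) {
  const match = /^\s*(\d+)\s*(?:;\s*url\s*=\s*(\S+))?\s*$/i.exec(content);
  if (!match) {
    return null;
  }
  return {
    seconds: Number(match[1]),
    url: match[2] || null, // null means "reload the current page"
  };
}

console.log(parseRefresh("60"));
// { seconds: 60, url: null }
console.log(parseRefresh("3; URL=http://www.addedbytes.com/"));
// { seconds: 3, url: 'http://www.addedbytes.com/' }
```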

The Pragma Tag

<meta http-equiv="pragma" content="no-cache">

Not very widely used, this tag asks a browser not to cache a page. Though this can be useful if a page on your site is frequently updated (for example a news site or a forum), it will often just increase your bandwidth bills and slow down your users' browsing experience. There is also no guarantee that a browser will pay attention to it.

Interestingly enough though, Microsoft recommend that if you do want to use this, you add the tag in a second HEAD at the end of the document, like so:

<html> <head> <title>Document</title> </head> <body> Content </body> <head> <meta http-equiv="pragma" content="no-cache"> </head> </html>

The Revisit-After and Expires Tags

<meta name="Revisit-After" content="30 days"> <meta http-equiv="expires" content="Mon, 03 Nov 2003 01:23:45 GMT">

There are a huge number of sites that say you should add the first of these to your site, because it tells search engine spiders how often to index your page. This is a common misconception. The tag was created by SearchBC, who have said they no longer use it. Originally, it was created as a tool to suggest to the spider how often a page should be indexed. Few have ever been able to agree on the format of the tag. At the end of the day, remember that the search engines do not care how often you want them to index your pages - they will index as and when they feel like it. Some are clever enough to have a rough idea of how often you update your site, and will make use of that. Some are not that bright, and will come around when the mood takes them.

Assuming you are happy for the spiders to index your site as often as possible, as most people are, you would do well to leave this out. The spiders will return to your site as often as they deem fit, and the only way to influence the frequency this occurs at is to just keep adding new content on a regular basis.

The "Expires" tag tells browsers and search engine spiders when the document should be considered expired. This is worth using, of course, if there is a date on which the relevant document will be no longer valid. However, at this time, the search engines will often drop the page from their index - you should use the "Expires" tag only if this is what you want.
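The date in an "Expires" tag uses the same format as HTTP date headers, so it is easy to test. The JavaScript sketch below shows the sort of comparison a browser or spider might make; the function name isExpired is my own, and how any particular browser or spider treats an unreadable date is not something I would rely on, so the sketch simply returns null in that case.

```javascript
// Decide whether a document should be considered expired at a given moment.
// Returns true or false, or null if the date cannot be understood.
function isExpired(expiresValue, now = new Date()) {
  const expiry = Date.parse(expiresValue);
  if (Number.isNaN(expiry)) {
    return null; // unreadable date: leave the decision to the caller
  }
  return now.getTime() >= expiry;
}

// Using the date from the example tag above:
const expires = "Mon, 03 Nov 2003 01:23:45 GMT";
console.log(isExpired(expires, new Date(Date.UTC(2003, 0, 1)))); // false
console.log(isExpired(expires, new Date(Date.UTC(2004, 0, 1)))); // true
```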

Useless Tags

<meta name="generator" content="EditPlus2"> <meta name="copyright" content="AddedBytes.com"> <meta name="author" content="Dave Child">

A select few engines sometimes make small use of a select few of these, but most of these (and the others to be found on this list of useless META tags: http://www.bauser.com/websnob/meta/useless.html) are better placed on a page, or not used at all. Most of these are added automatically by HTML editors, and some are added by over-zealous META tag addicts. In my opinion, these are best avoided, as they do little more than clutter up your code.

ICRA Label

<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" comment "ICRAonline EN v2.0" l gen true for "http://www.addedbytes.com" r (nz 1 vz 1 lc 1 oz 1 cz 1) "http://www.rsac.org/ratingsv01.html" l gen true for "http://www.addedbytes.com" r (n 0 s 0 v 0 l 1))'>

Last but not least, something a little more unusual. The ICRA (Internet Content Rating Association) is an organisation I am happy to support, as they provide a means for helping webmasters to identify their content as suitable (or not) for certain age groups.

Simply put, you can visit their label generator (http://www.icra.org/_en/label/extended/) and tell the generator what your site contains. That data can then be used to help keep any content not appropriate for young eyes away from them. The data is used by some search engines and some browsers can be set to avoid pages without labels.

Mon, 03 Nov 2003 13:06:00 +0000 http://www.addedbytes.com/articles/online-marketing/meta-tags/ Dave Child
HTML for Beginners http://www.addedbytes.com/articles/for-beginners/html-for-beginners/ Once upon a time, a long time ago, there were a few computers connected together in a small distributed network. This was the 60s, a time of paranoia and space exploration, rock and roll and flower power, and the network was called ARPANET, and was an effort to provide a means of communication that would survive a nuclear attack. Over time, as the Cold War came to a conclusion, ARPANET evolved and grew, and plans began to emerge for its development.

(Rather than bore you with all the specifics of the emergence of the internet, if you are interested you can have a look at this detailed internet timeline at http://www.zakon.org/robert/internet/timeline/, although rumours abound that the internet in fact began long, long before that: see http://www.c2000.com/fun/html.htm.)

Within a decade, email had emerged (Elizabeth II sending her first email in 1976), and in 1984 DNS (the system that controls domain names) was created. In 1990, as ARPANET finally shut down, the world's first commercial Internet Service Provider came online. The internet as we know it was born.

About this time, a great deal of data was already being thrown around the internet. Unfortunately, with a great number of different systems being used to view it, much of that data was in simple text form, sent by email. Out of the need for a better system for formatting data to show on all these different systems in the same way, HTML was finally born.

It was in 1989 that Tim Berners-Lee (http://www.w3.org/People/Berners-Lee/) invented the World Wide Web. In 1994, the same year the first version of Netscape's Navigator browser was released, he founded the W3C, the organisation now responsible for the standards used on the web, including HTML. The next year saw competition in the browser market, with the release of Internet Explorer, and the browser wars of the 90s had begun.

HTML was created out of the need to ensure that data was usable no matter what system you were running, and only by having a standard definition of the language could that be achieved. Tim Berners-Lee's logical markup provides an excellent, portable way to do that, using nothing more than any simple text editor.

HTML stands for HyperText Markup Language, and is designed to do just that - provide a way of laying out data so that no matter the system used to view that data, be it a PC, PDA or a screen reader, the data is still usable. It is a language invented to define the structure of web pages, to allow you to create headings to your web documents, emphasize those points that require it, allow you to show tables of data and offer readers related resources.

HTML uses a system of logical tags to indicate the purpose of each area of a document. Each tag, and there are many, can give an entirely different meaning to any different part of a document. For example, the text "logical tags" above is enclosed in a pair of tags to indicate emphasis, like so: <em>logical tags</em> (the first tag in HTML indicating the start of the emphasized text, the second the end).

The chances are that, if you are using a current browser, the above text looked italic to you (if you were using a screen reader, that text may have been spoken in a slightly louder tone, or at a different pitch). HTML has been designed to only indicate that there is emphasis on those words, nothing more, and the form that emphasis takes can depend on the page designer or the tool used to view the page.
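
As a rough sketch of how this looks in practice, here is a minimal page using a few logical tags: a heading, a paragraph containing emphasis, and a link to a related resource. The page title and wording are made up for illustration. Note that nothing in it says how the page should look, only what each part is for.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>My First Page</title>
</head>
<body>
  <!-- h1 marks the main heading of the document -->
  <h1>HTML for Beginners</h1>

  <!-- p marks a paragraph; em marks emphasis within it -->
  <p>HTML uses a system of <em>logical tags</em> to describe a document.</p>

  <!-- a marks a link to a related resource -->
  <p>Related: <a href="http://www.w3.org/People/Berners-Lee/">Tim Berners-Lee</a></p>
</body>
</html>
```

Saved from any simple text editor with a .html extension, a file like this should open in a browser, which then decides for itself how a heading or emphasised text is displayed.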

This is where HTML becomes slightly tricky. HTML is not a presentation language. That is to say, HTML is not written to provide a method to lay out a web page, or to specify colours, or spacing, or fonts. Neither, though, is it a programming language, being unable to process data or do any calculations.

However, in the mid to late 1990s, there was no other realistic way to achieve the effects that were required of the web at the time than to use HTML. Lacking a well-implemented method for achieving the look of pages that they required, the web design community started to rely on workarounds and hacks.

Before long, people were using nested tables to lay out pages, and the look of text was being specified using hundreds of font tags spread throughout sites. Updating a web site became a nightmare, and designs had to be tested in a selection of browsers to ensure that all of the little quirks that those browsers had were catered to. Cross-browser compatibility was a phrase heard far too often in the 90s.

Fortunately, this phase is coming to an end. CSS (Cascading Style Sheets) is being more widely used, thanks to better implementation of web standards by the browser manufacturers, allowing designers to use HTML for what it was intended - indicating purpose rather than defining style.
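
To illustrate the difference, here is a sketch of the same piece of text marked up both ways. The colour, the font and the wording are arbitrary choices for the example.

```html
<!-- The old workaround: the look is repeated in the markup wherever it is needed -->
<p><font color="red" face="Arial">An important point</font></p>

<!-- With CSS: one rule, placed in the head of the page (or in a separate
     style sheet), decides how all emphasised text looks -->
<style type="text/css">
  em { color: red; font-family: Arial, sans-serif; }
</style>

<!-- ...and the markup in the body says only that this text is emphasised -->
<p><em>An important point</em></p>
```

Changing the look of every emphasised phrase on a site then means editing one rule rather than hunting down each font tag.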

Designers are now able to separate the structure of their web pages from the presentation, making for a faster, more useful, more accessible internet. If you are considering learning HTML, now is a good time to start.

Sat, 01 Nov 2003 09:03:00 +0000 http://www.addedbytes.com/articles/for-beginners/html-for-beginners/ Dave Child ,