<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
			<title>Tagged with "code"</title>
			<link>http://www.addedbytes.com/feeds/tag-feed/</link>
			<description></description>
			<language>en</language>
			<copyright>Web Development in Brighton - Added Bytes 2006</copyright>
			<ttl>120</ttl>
			<item>
				<title>Writing Secure PHP, Part 4</title>
				<link>http://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-4/</link>
				<description><![CDATA[ The fourth part of the <a href="http://www.addedbytes.com/articles/writing-secure-php/">Writing Secure PHP</a> series, covering cross-site scripting, cross-site request forgery and character encoding security issues. <p>In <a href="http://www.addedbytes.com/php/writing-secure-php/">Writing Secure PHP</a>, <a href="http://www.addedbytes.com/security/writing-secure-php-2/">Writing Secure PHP, Part 2</a> and <a href="http://www.addedbytes.com/security/writing-secure-php-3/">Writing Secure PHP, Part 3</a> I covered many of the common mistakes PHP developers make, and how to avoid some potential security problems. This article covers some of the more advanced security problems common to PHP on the web.</p>

<h3>Cross-Site Scripting (XSS)</h3>

<p>Cross-site scripting (often abbreviated to XSS) is a form of injection, where an attacker finds a way to have the target site display code they control. In its most basic form, this can be as simple as a site that allows HTML characters in usernames, where someone can specify a username like:</p>

<pre class="php">DaveChild&lt;script type="text/javascript" src="http://www.example.com/my_script.js"&gt;&lt;/script&gt;</pre>

<p>Now, whenever someone views my username on the target site, the script I've added to it will run in their browser. I could potentially use this to steal their login information, log their keystrokes, or carry out any number of other nefarious activities.</p>

<p>As a developer, you can combat this type of attack by encoding or removing HTML characters (watch out for character encoding issues, as outlined below). Better than stripping out unwanted characters is to accept only a whitelist of safe characters in usernames and other fields. Be especially careful with e-commerce sites where you list orders in a CMS: an XSS vulnerability there may allow an attacker to gain administrative access to your CMS. It is also important to disable the TRACE and TRACK HTTP methods on the server, because if there is a vulnerability (and you should always assume that, despite your best efforts, there will be), these methods can allow an attacker to steal a user's cookie.</p>
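<p>As a minimal sketch of both defences, the two hypothetical helpers below check a username against a whitelist and escape any value before it is written into a page. The allowed pattern (letters, digits, underscore and hyphen, 3 to 20 characters) is an assumption; adjust it to your own rules.</p>

```php
<?php
// Whitelist check: accept only letters, digits, underscore and hyphen,
// 3 to 20 characters long (an assumed rule, not a standard).
// \z anchors at the true end of the string, so a trailing newline is rejected.
function is_valid_username(string $username): bool
{
    return preg_match('/^[A-Za-z0-9_-]{3,20}\z/', $username) === 1;
}

// Output encoding: convert &, <, >, " and ' to HTML entities, stating the
// character set explicitly rather than relying on a default.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}
```

<p>Validate on input with is_valid_username() and still escape on output with escape_html(), so that data which reached the database by some other route cannot run as script in a visitor's browser.</p>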

<p>As a user you are also vulnerable to this sort of attack, and it is very difficult, at the moment, to make yourself safe against it. Vigilance is key, and to that end I have released a <a href="http://www.addedbytes.com/tools/xss-alarm-userscript/">userscript that warns you about third party scripts</a> (for users of Greasemonkey, Opera or Chrome).</p>

<h3>Cross-Site Request Forgery (CSRF)</h3>

<p>Despite the similar name, CSRF is unrelated to XSS. CSRF is a form of attack in which an authenticated user's browser is made to perform an action on a site without the user knowing it.</p>

<p>Let's assume that Jack is logged in to his bank, and has a cookie stored on his computer. Each time he sends an HTTP request to the bank (i.e., views a page or an image on a page) his browser sends the cookie along with the request so that the bank knows that it's him making the request.</p>

<p>Jill, meanwhile, runs a different website and has managed to get Jack to visit it. One of the items on the page is in fact loaded from the bank, for example in an iframe. The URL of the iframe or request contains instructions to the bank to transfer money from Jack's account to Jill's. Because the request is coming from Jack's computer, and includes his cookie, the bank assumes it is a legitimate request and the money is transferred.</p>

<p>This type of attack is extremely dangerous and virtually untraceable. As a developer, your job is to protect against it, and the best way to do that is to remember <a href="http://www.addedbytes.com/php/writing-secure-php/">Rule Number One: Never, Ever Trust Your Users</a>. However thoroughly a user has been authenticated, do not assume that every request made with their credentials was intended.</p>

<p>In practical PHP terms, you can combat CSRF with several relatively simple coding habits. Never let a GET request change anything on the server; use POST for every action. Confirm actions before performing them with a confirmation step on a separate page, and make sure that <em>both</em> the original action button or link <em>and</em> the confirmation were clicked. Even better, have the user enter information such as letters from their password on the confirmation page.</p>
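<p>The first two habits can be enforced with a small guard at the top of any script that changes data. This is a sketch: the function name, the field name "confirmed" and its value "yes" are assumptions standing in for whatever your confirmation page actually submits.</p>

```php
<?php
// Allow a state-changing action only when the request arrived by POST and
// carries the (hypothetical) "confirmed" field set by the confirmation page.
// The superglobals are passed in as arrays so the check is easy to test.
function may_perform_action(array $server, array $post): bool
{
    if (($server['REQUEST_METHOD'] ?? '') !== 'POST') {
        return false;
    }
    return ($post['confirmed'] ?? '') === 'yes';
}
```

<p>In a real script you would call it as may_perform_action($_SERVER, $_POST) and stop with an error if it returns false. On its own this does not stop a forged POST submitted from another site; a per-form token is what ties the request to your own form.</p>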

<p>Add a randomly generated token to forms and verify its presence when a request is made. Use <a href="http://javascript.internet.com/page-details/break-frames.html">frame-breaking JavaScript</a>. Time out sessions after a short period (think minutes, not hours). Encourage the user to log out when they've finished. Check the HTTP_REFERER header: it can be hidden, but it is still worth checking, because if it names a different domain from the one you expect, the request was definitely forged from another site.</p>
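<p>A minimal sketch of the token and Referer checks follows, written for PHP 7.1 or later (random_bytes() and nullable types) and assuming sessions have already been started with session_start(). The session key and form field name "csrf_token" and all three function names are assumptions for illustration.</p>

```php
<?php
// Return the per-session CSRF token, creating it on first use.
// Pass $_SESSION (by reference) after calling session_start().
function csrf_token(array &$session): string
{
    if (empty($session['csrf_token'])) {
        // 32 random bytes from a cryptographically secure source, hex-encoded.
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

// Check a submitted token against the session copy.
// hash_equals() compares in constant time to avoid leaking timing information.
function csrf_token_is_valid(array $session, ?string $submitted): bool
{
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}

// Referer check: a Referer naming another host is rejected. A missing Referer
// is allowed, because browsers, proxies and privacy tools may strip it.
function referer_is_acceptable(?string $referer, string $expectedHost): bool
{
    if ($referer === null || $referer === '') {
        return true;
    }
    return parse_url($referer, PHP_URL_HOST) === $expectedHost;
}
```

<p>When rendering a form, output csrf_token($_SESSION) in a hidden field named csrf_token. When handling the submission, refuse the request unless csrf_token_is_valid($_SESSION, $_POST['csrf_token'] ?? null) and referer_is_acceptable($_SERVER['HTTP_REFERER'] ?? null, 'www.example.com') both return true, with your own host name in place of www.example.com.</p>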

<h3>Character Encoding</h3>

<p>Character encoding in PHP and associated database systems is worthy of its own series. In any one request, there may be more different character encodings in use than you might think.</p>

<p>For example, a single request and response (uploading a file to a server and writing information to a database) may involve all of the following, each of which may use a different character encoding: the HTTP request headers, the POST data, PHP's default encoding, the PHP MySQL module, MySQL's default character set, the character set of each table being used, a file being opened and read, a new file being created and written, the response headers and the response body.</p>

<p>English-speaking developers generally have little cause to get embroiled in character encoding issues, and as a result many developers have a serious lack of understanding of how character encodings work and fit together. Even those who do have a reason to look at character encodings usually stop once they have set the response's character set.</p>

<p>However, character sets are a fundamental part of all web development. English text alone can be stored in any one of a wide variety of character sets, and developers are usually familiar with the two most common: ISO-8859-1 and UTF-8. Fewer are familiar with UCS-2, UTF-16 or windows-1252, and fewer still with commonly used sets for other languages (e.g., GB2312 for Chinese).</p>

<p>Which, in a very roundabout way, brings me to the security pitfalls of character encodings. Where data is processed by PHP using one character set but the database server uses a different one, a character (or series of characters) that PHP deems safe may in fact allow SQL injection against the database.</p>

<p>PHP security expert Chris Shiflett <a href="http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string">has written about this issue</a> and included an example of how it can be exploited to allow SQL injection even where input is sanitized using addslashes().</p>
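<p>The core of that example can be reproduced with nothing more than addslashes(). The snippet below assumes, as in Shiflett's demonstration, that the database connection uses the GBK character set; it only shows the bytes PHP produces and does not talk to a database.</p>

```php
<?php
// Input: byte 0xBF followed by a single quote (0x27).
$input = "\xbf'";

// addslashes() works byte by byte and knows nothing about GBK,
// so it simply inserts a backslash (0x5C) before the quote.
$escaped = addslashes($input);

// Prints "bf5c27".
echo bin2hex($escaped), "\n";
```

<p>To PHP the quote now looks escaped, but in GBK the two bytes 0xBF 0x5C form a single valid multibyte character. A GBK-aware server therefore reads that character followed by a bare single quote, which ends the string literal and lets the rest of the input run as SQL.</p>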

<p>The solution is to always, <em>always</em> use mysql_real_escape_string() rather than addslashes() (or use prepared statements / stored procedures), to set the connection's character set with mysql_set_charset() rather than with a SET NAMES query so that the escaping function knows which character set is in use, and to explicitly state character sets at all stages of interaction. Ideally, use the same character set throughout your system (UTF-8 is recommended), and where PHP allows you to specify a character encoding for a function (e.g., htmlspecialchars() or htmlentities()), make use of it.</p>
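<p>As an illustration of passing the character set to htmlspecialchars(), the sketch below defines a hypothetical shorthand h(). With the encoding stated, PHP checks the input against it: with no extra flag, invalid UTF-8 input produces an empty string, and with ENT_SUBSTITUTE the bad bytes are replaced with U+FFFD instead of being passed through to the page.</p>

```php
<?php
// Escape for HTML, stating the character set explicitly.
// ENT_SUBSTITUTE replaces invalid UTF-8 sequences with U+FFFD,
// so malformed bytes never reach the page unchanged.
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The HTTP response should declare the same character set
// (header() must be called before any output is sent).
header('Content-Type: text/html; charset=UTF-8');
```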

<p>It's not just SQL that's vulnerable as a result of character encoding bugs. Cross-site scripting is possible even where HTML characters are escaped if character sets are not handled properly. Fortunately, once again that is simple to avoid by properly setting character encodings at all stages of the process and specifying character encoding for functions where possible.</p> <br><br>]]></description>
				<pubDate>Thu, 11 Sep 2008 13:11:14 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-4/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=coding&amp;start=0" class="ditto_tag" rel="tag">coding</a>,<a href="/feeds/tag-feed/?tags=development&amp;start=0" class="ditto_tag" rel="tag">development</a>,<a href="/feeds/tag-feed/?tags=mysql&amp;start=0" class="ditto_tag" rel="tag">mysql</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=programming&amp;start=0" class="ditto_tag" rel="tag">programming</a>,<a href="/feeds/tag-feed/?tags=security&amp;start=0" class="ditto_tag" rel="tag">security</a>,<a href="/feeds/tag-feed/?tags=tips&amp;start=0" class="ditto_tag" rel="tag">tips</a>,<a href="/feeds/tag-feed/?tags=tutorial&amp;start=0" class="ditto_tag" rel="tag">tutorial</a>,<a href="/feeds/tag-feed/?tags=web&amp;start=0" class="ditto_tag" rel="tag">web</a>,<a href="/feeds/tag-feed/?tags=webdesign&amp;start=0" class="ditto_tag" rel="tag">webdesign</a>,<a href="/feeds/tag-feed/?tags=webdev&amp;start=0" class="ditto_tag" rel="tag">webdev</a>
			</item>

			<item>
				<title>PHP Querystring Functions</title>
				<link>http://www.addedbytes.com/blog/code/php-querystring-functions/</link>
				<description><![CDATA[ <p>Adding and removing variables to and from URLs using PHP can be a relatively simple process admittedly, but I have a couple of functions I use often to make the process even less time-consuming.</p>

<h3>Add Querystring Variable</h3>

<p>A PHP function that will add the querystring variable $key with a value $value to $url. If $key is already specified within $url, it will replace it.</p>

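<pre class="php">function add_querystring_var($url, $key, $value) {
    // Remove any existing "$key=value" pair. A trailing '&amp;' is appended first so
    // the pattern also matches when $key is the last variable in the querystring.
    $url = preg_replace('/(.*)(\?|&amp;)' . preg_quote($key, '/') . '=[^&amp;]*(&amp;)(.*)/i', '$1$2$4', $url . '&amp;');
    // Drop the final character: the appended '&amp;', or a '?' left with nothing after it.
    $url = substr($url, 0, -1);
    // Append the new pair, starting a querystring if the URL has none.
    // Pass $value unencoded; urlencode() makes it safe to put in a URL.
    if (strpos($url, '?') === false) {
        return ($url . '?' . $key . '=' . urlencode($value));
    } else {
        return ($url . '&amp;' . $key . '=' . urlencode($value));
    }
}</pre>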
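<p>If you would rather avoid regular expressions, the sketch below is an alternative approach (not the function above) that rebuilds the querystring with PHP's parse_str() and http_build_query(). The function name is hypothetical. Note that parse_str() turns dots and spaces in variable names into underscores, and that the value is URL-encoded for you, so pass it unencoded.</p>

```php
<?php
// Add or replace $key=$value in $url by decoding the querystring into an
// array, updating it and encoding it again. Any #fragment is preserved.
function add_querystring_var_alt(string $url, string $key, string $value): string
{
    // Separate any #fragment so it can be re-attached at the end.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    // Split the path from the existing querystring (if any) and decode it.
    $params = [];
    $qPos = strpos($url, '?');
    if ($qPos !== false) {
        parse_str(substr($url, $qPos + 1), $params);
        $url = substr($url, 0, $qPos);
    }

    $params[$key] = $value; // overwrites any existing value for $key
    return $url . '?' . http_build_query($params) . $fragment;
}
```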

<h3>Remove Querystring Variable</h3>

<p>A PHP function that will remove the variable $key and its value from the given $url.</p>

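<pre class="php">function remove_querystring_var($url, $key) {
    // Remove "$key=value" and the separator after it. A trailing '&amp;' is
    // appended first so that the last variable in the querystring also matches.
    $url = preg_replace('/(.*)(\?|&amp;)' . preg_quote($key, '/') . '=[^&amp;]*(&amp;)(.*)/i', '$1$2$4', $url . '&amp;');
    // Drop the final character: the appended '&amp;', or a '?' left with nothing after it.
    $url = substr($url, 0, -1);
    return ($url);
}</pre> <br><br>]]></description>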
				<pubDate>Tue, 05 Dec 2006 15:41:30 +0000</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/code/php-querystring-functions/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=development&amp;start=0" class="ditto_tag" rel="tag">development</a>,<a href="/feeds/tag-feed/?tags=functions&amp;start=0" class="ditto_tag" rel="tag">functions</a>,<a href="/feeds/tag-feed/?tags=links&amp;start=0" class="ditto_tag" rel="tag">links</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=programming&amp;start=0" class="ditto_tag" rel="tag">programming</a>,<a href="/feeds/tag-feed/?tags=querystring&amp;start=0" class="ditto_tag" rel="tag">querystring</a>,<a href="/feeds/tag-feed/?tags=reference&amp;start=0" class="ditto_tag" rel="tag">reference</a>,<a href="/feeds/tag-feed/?tags=tips&amp;start=0" class="ditto_tag" rel="tag">tips</a>,<a href="/feeds/tag-feed/?tags=url&amp;start=0" class="ditto_tag" rel="tag">url</a>,<a href="/feeds/tag-feed/?tags=variable&amp;start=0" class="ditto_tag" rel="tag">variable</a>
			</item>

			<item>
				<title>RSS to iCal</title>
				<link>http://www.addedbytes.com/blog/rss-to-ical/</link>
				<description><![CDATA[ <p>I have been looking for a way to convert the BBC weather feed for my area to iCal, so I can subscribe to it. It's date-based, after all, and RSS never seemed to me to be an appropriate format for subscribing to weather information. iCal always struck me as being "better" for that purpose. Of course, the BBC only have an RSS feed for local weather. What I needed was a converter.</p>

<p>After some hunting, I discovered that Dean Sanvitale had written a PHP script to convert RSS feeds to iCal format. However, his site (codent.com) appears to be long since abandoned and the script is no longer available from there. Fortunately, the Wayback Machine did have a copy. Dean originally released the script under a <a href="http://creativecommons.org/licenses/by-sa/1.0/">Creative Commons License</a>, which allows me to make the script available to download from this site (note: the script is available from this site under the same license).</p>

<p>So, if you're looking for a way to convert an RSS feed to iCal, this PHP script will do the job. Thanks Dean!</p>
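<p>For anyone curious about what such a conversion involves, here is a much smaller sketch, not Dean's script: it reads an RSS 2.0 feed with SimpleXML and turns each item into an all-day iCalendar VEVENT dated from the item's pubDate. The function names and the PRODID and UID domain values are placeholders, and long lines are not folded at 75 octets as RFC 5545 recommends, so treat it as a starting point only.</p>

```php
<?php
// Escape text for an iCalendar TEXT value: backslash first, then ; , and newlines.
function ical_escape(string $text): string
{
    return str_replace(
        ['\\', ';', ',', "\r\n", "\n"],
        ['\\\\', '\;', '\,', '\n', '\n'],
        $text
    );
}

// Convert an RSS 2.0 document into a minimal iCalendar file, one all-day
// event per item. Items without a parseable pubDate are skipped.
function rss_to_ical(string $rssXml): string
{
    $rss = new SimpleXMLElement($rssXml);
    $lines = ['BEGIN:VCALENDAR', 'VERSION:2.0', 'PRODID:-//example//rss2ical sketch//EN'];

    foreach ($rss->channel->item as $item) {
        $pubDate = trim((string) $item->pubDate);
        // date_create('') would mean "now", so skip empty dates explicitly.
        $date = $pubDate === '' ? false : date_create($pubDate);
        if ($date === false) {
            continue;
        }
        $lines[] = 'BEGIN:VEVENT';
        // Stable identifier derived from the item's link and title.
        $lines[] = 'UID:' . md5((string) $item->link . (string) $item->title) . '@example.invalid';
        $lines[] = 'DTSTAMP:' . gmdate('Ymd\THis\Z');
        $lines[] = 'DTSTART;VALUE=DATE:' . $date->format('Ymd');
        $lines[] = 'SUMMARY:' . ical_escape((string) $item->title);
        $lines[] = 'END:VEVENT';
    }

    $lines[] = 'END:VCALENDAR';
    // iCalendar lines are terminated with CRLF.
    return implode("\r\n", $lines) . "\r\n";
}
```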

<p>Source: <a href="http://www.addedbytes.com/rss2ical.txt">rss2ical.txt</a></p> <br><br>]]></description>
				<pubDate>Thu, 19 Oct 2006 12:14:16 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/rss-to-ical/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=bbc&amp;start=0" class="ditto_tag" rel="tag">bbc</a>,<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=convert&amp;start=0" class="ditto_tag" rel="tag">convert</a>,<a href="/feeds/tag-feed/?tags=ical&amp;start=0" class="ditto_tag" rel="tag">ical</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=rss&amp;start=0" class="ditto_tag" rel="tag">rss</a>,<a href="/feeds/tag-feed/?tags=rss2ical&amp;start=0" class="ditto_tag" rel="tag">rss2ical</a>,<a href="/feeds/tag-feed/?tags=tools&amp;start=0" class="ditto_tag" rel="tag">tools</a>,<a href="/feeds/tag-feed/?tags=weather&amp;start=0" class="ditto_tag" rel="tag">weather</a>,<a href="/feeds/tag-feed/?tags=web&amp;start=0" class="ditto_tag" rel="tag">web</a>,<a href="/feeds/tag-feed/?tags=webdev&amp;start=0" class="ditto_tag" rel="tag">webdev</a>
			</item>

			<item>
				<title>Preload Images with CSS</title>
				<link>http://www.addedbytes.com/blog/code/preloading-images-with-css/</link>
				<description><![CDATA[ How to preload images using CSS and so avoid delays with rollover effects. <p>As support for CSS improves, pseudo-classes like :hover, :active and :focus will become more widely used. Already :hover is in use on many sites to provide rollover states for buttons, as on this site (the menu bar). The other pseudo-classes will, in time, give far more opportunities for the use of rollover images.</p>

<p>One potential problem with image rollovers, though, is that in order for an image to be displayed, it must be downloaded. Consequently, for rollovers to work smoothly and quickly, all the necessary images must already be available on the user's computer. Otherwise, the rollovers will behave badly, as in this <a href="http://www.addedbytes.com/css_preload/">example using large images</a>.</p>

<p>Until recently, rollover effects were achieved using JavaScript, and as a result a plethora of JavaScript solutions to the image preloading problem are available. However, using JavaScript to preload images, though a reasonable idea when JavaScript controls the rollovers, makes much less sense when CSS is controlling them. A user could very easily (and this is becoming more common) have a CSS-capable browser without JavaScript support or with JavaScript turned off.</p>

<p>So there is a clear need either to preload images with CSS or to find another way to avoid the problem. There are two relatively simple solutions.</p>

<p>The first solution is to create a single background image for your element that contains both the rollover and non-rollover images, and then position it using the background-position CSS property. Instead of changing the image when the mouse moves over the element, you can simply change the background-position to reveal the previously hidden rollover image. There's a more detailed <a href="http://wellstyled.com/css-nopreload-rollovers.html">explanation of this technique</a> over at WellStyled.com.</p>

<p>The other option available to you is to trick the browser into downloading the image before it is required for the rollover. This can be done by applying the image as a background to an element, and then hiding it using the background-position property. The image will then be downloaded but will not be displayed. Then, when the rollover is activated, it will operate smoothly and instantly.</p>

<p>First, you need to select an element that doesn't currently have a background image. If you select an element that already has a background image, you will either end up not preloading the image you are after, or you will prevent the element's normal background from displaying. Neither is ideal.</p>

<p>Once you have picked an element to use for this purpose, you need to add the background image. The following CSS can be applied to the element and will place the background image outside the viewable area of the element:</p>

<pre class="css">background-image: url("rollover_image.png");
background-repeat: no-repeat;
background-position: -1000px -1000px;</pre>

<p>Your rollover image will then be loaded when the page itself is initially loaded, along with the other images. When a rollover is then activated, the image will already be available to the browser and the effect will be instant.</p> <br><br>]]></description>
				<pubDate>Thu, 23 Dec 2004 10:13:43 +0000</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/code/preloading-images-with-css/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=css&amp;start=0" class="ditto_tag" rel="tag">css</a>,<a href="/feeds/tag-feed/?tags=design&amp;start=0" class="ditto_tag" rel="tag">design</a>,<a href="/feeds/tag-feed/?tags=howto&amp;start=0" class="ditto_tag" rel="tag">howto</a>,<a href="/feeds/tag-feed/?tags=image&amp;start=0" class="ditto_tag" rel="tag">image</a>,<a href="/feeds/tag-feed/?tags=images&amp;start=0" class="ditto_tag" rel="tag">images</a>,<a href="/feeds/tag-feed/?tags=preload&amp;start=0" class="ditto_tag" rel="tag">preload</a>,<a href="/feeds/tag-feed/?tags=rollover&amp;start=0" class="ditto_tag" rel="tag">rollover</a>,<a href="/feeds/tag-feed/?tags=web&amp;start=0" class="ditto_tag" rel="tag">web</a>,<a href="/feeds/tag-feed/?tags=webdesign&amp;start=0" class="ditto_tag" rel="tag">webdesign</a>,<a href="/feeds/tag-feed/?tags=webdev&amp;start=0" class="ditto_tag" rel="tag">webdev</a>
			</item>

			<item>
				<title>View Page Structure</title>
				<link>http://www.addedbytes.com/blog/code/view-page-structure/</link>
				<description><![CDATA[ A tool that outputs the structure of a page. Makes working with CSS (especially resolving inheritance issues) much easier. <p>A couple of days ago, I was having a little CSS trouble. In the end, it turned out that I had set a property of an element "above" in the document tree, and the problematic element was inheriting that property.</p>

<p>It struck me that it would be easier to work through this kind of CSS problem with some kind of simple tool to show how a page was put together. If I could see all the tags on the page in a nested format, with parent and child relationships obvious, and without all the text getting in the way, my life would be easier.</p>

<p>So, I put together this tool. In simple terms, it will fetch a page from a web server and output the tags within the page in a nested list. The JavaScript side of it will also highlight children of an element when you hover over it.</p>
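<p>For those curious how such a tool can work, here is a rough sketch in PHP, not the code behind view_structure.php: it parses an HTML string with the DOM extension and returns an indented outline of tag names, with ids shown after "#" and classes after ".", skipping text nodes. The function names are hypothetical, and fetching the page from a URL is left out.</p>

```php
<?php
// Return an indented outline of the element tree in $html: one line per
// element, showing the tag name, then "#id" and ".class" parts if present.
// Text, comments and closing tags are not shown. Requires ext-dom.
function page_structure(string $html): string
{
    if (trim($html) === '') {
        return '';
    }
    $doc = new DOMDocument();
    // libxml repairs imperfect markup; collect its warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->documentElement === null ? '' : outline_element($doc->documentElement, 0);
}

function outline_element(DOMElement $el, int $depth): string
{
    $line = str_repeat('  ', $depth) . $el->tagName;
    if ($el->getAttribute('id') !== '') {
        $line .= '#' . $el->getAttribute('id');
    }
    $classes = preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    if ($classes) {
        $line .= '.' . implode('.', $classes);
    }
    $out = $line . "\n";
    // Recurse into child elements only, skipping text and comment nodes.
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $out .= outline_element($child, $depth + 1);
        }
    }
    return $out;
}
```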

<p>Class and id attributes are highlighted, as are tag names. Class and id names, though, must be enclosed in quotation marks to be highlighted. Text, closing tags and line breaks are not shown. Though I can understand that some people may find it useful to see text, I found it made the tree too large to be usable.</p>

<p>I've used it a few times, and am quickly finding it saves quite a lot of time solving simple CSS problems or conflicts. Which is exactly what it was supposed to do. Enjoy!</p>

<h3>Highlighting Issues</h3>

<p>When writing the tool, I came across a fairly unusual problem. I wanted to highlight an element's children whenever the mouse was over that element. However, this cannot be done with CSS (at least, I couldn't think of a way to make it work).</p>

<p>The problem with the CSS was that whenever you hover over an element, you are also hovering over its parents. So they, and their children, are highlighted - meaning everything is highlighted. For this reason, the highlighting of elements uses JavaScript.</p>

<h3>How to Use</h3>

<p>The page structure tool is written to accept a URL either by GET or POST. You can therefore use it in one of two ways.</p>

<p>First, you can use the tool by visiting the URL below, replacing "##url##" with the address of the page you want to view:</p>

<p>http://www.addedbytes.com/view_structure.php?url=##url##</p>

<p>Alternatively, you can use the following form to submit an address to the page:</p>

<form action="http://www.addedbytes.com/view_structure.php" method="post"><label for="viewstrucinput">Enter URL</label> <input id="viewstrucinput" name="url" type="text" /> <input type="submit" value="View" /></form>

<h3>Bookmarklet</h3>

<p>To make life a little easier, I've written a quick JavaScript bookmarklet that, when activated, automatically submits the URL of the page you are viewing to the tool. Simply copy or drag the link below to your links bar, your favourites folder or anywhere else you wish:</p>

<ul><li><a href="javascript:void(location.href='http://www.addedbytes.com/view_structure.php?url='+encodeURIComponent(location.href));">View Page Structure</a></li></ul>

<h3>Notes</h3>

<ul><li>This tool works best with valid code, especially XHTML.</li><li>A certain amount of basic code improvement is done before processing (for example all empty tags are automatically closed).</li><li>Sites with non-empty tags that aren't closed properly may not show up correctly.</li><li>Sites with large amounts of nested code should still show up, but it may be difficult to view the output.</li></ul>

<h3>Example</h3>

<p>If you want to see an example of the output of this tool, you can view the <a href="http://www.addedbytes.com/view_structure.php?url=http://www.addedbytes.com">structure for AddedBytes.com</a>.</p> <br><br>]]></description>
				<pubDate>Tue, 12 Oct 2004 16:24:00 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/code/view-page-structure/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=cheatsheet&amp;start=0" class="ditto_tag" rel="tag">cheatsheet</a>,<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=css&amp;start=0" class="ditto_tag" rel="tag">css</a>,<a href="/feeds/tag-feed/?tags=design&amp;start=0" class="ditto_tag" rel="tag">design</a>,<a href="/feeds/tag-feed/?tags=imported&amp;start=0" class="ditto_tag" rel="tag">imported</a>,<a href="/feeds/tag-feed/?tags=resources&amp;start=0" class="ditto_tag" rel="tag">resources</a>,<a href="/feeds/tag-feed/?tags=tool&amp;start=0" class="ditto_tag" rel="tag">tool</a>,<a href="/feeds/tag-feed/?tags=tools&amp;start=0" class="ditto_tag" rel="tag">tools</a>,<a href="/feeds/tag-feed/?tags=useful&amp;start=0" class="ditto_tag" rel="tag">useful</a>,<a href="/feeds/tag-feed/?tags=webdesign&amp;start=0" class="ditto_tag" rel="tag">webdesign</a>,<a href="/feeds/tag-feed/?tags=xhtml&amp;start=0" class="ditto_tag" rel="tag">xhtml</a>
			</item>

			<item>
				<title>Writing Secure PHP, Part 1</title>
				<link>http://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-1/</link>
				<description><![CDATA[ Learn how to avoid some of the most common mistakes in PHP, and so make your sites more secure. <p><a href="http://www.php.net">PHP</a> is a very easy language to learn, and many people without any sort of background in programming learn it as a way to add interactivity to their web sites. Unfortunately, that often means PHP programmers, especially those newer to web development, are unaware of the potential security risks their web applications can contain. Here are a few of the more common security problems and how to avoid them.</p>

<p>[Writing Secure PHP is a series. <a href="http://www.addedbytes.com/php/writing-secure-php-2/">Part 2</a>, <a href="http://www.addedbytes.com/php/writing-secure-php-3/">Part 3</a> and <a href="http://www.addedbytes.com/php/writing-secure-php-4/">Part 4</a> are currently also available.]</p>

<h3>Rule Number One: Never, Ever, Trust Your Users</h3>

<p>It cannot be said enough times: you should never, ever, ever trust your users to send you the data you expect. I have heard many people respond to that with something like "Oh, nobody malicious would be interested in my site". Leaving aside that this could not be more wrong, it is not always a malicious user who can exploit a security hole; problems can just as easily arise because a user unintentionally does something wrong.</p>

<p>So the cardinal rule of all web development, and I can't stress it enough, is: <strong>Never, Ever, Trust Your Users</strong>. Assume every single piece of data your site collects from a user contains malicious code. Always. That includes data you think you have checked with client-side validation, for example using JavaScript. If you can manage that, you'll be off to a good start. If PHP security is important to you, this single point is the most important to learn. Personally, I have a "PHP Security" sheet next to my desk with the major points on it, and this one is in large bold text, right at the top.</p>

<h3>Global Variables</h3>

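<p>In many languages you must explicitly create a variable in order to use it. In PHP, there is an option, "register_globals", that you can set in php.ini. When it is on, request data such as GET and POST variables and cookies is automatically turned into global variables, so variables can exist in your script without you ever explicitly creating them. (register_globals was deprecated in PHP 5.3 and removed in PHP 5.4.)</p>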

<p>Consider the following code:</p>

<pre class="php">if ($password == "my_password") {
    $authorized = 1;
}

if ($authorized == 1) {
    echo "Lots of important stuff.";
}</pre>

<p>To many that may look fine, and in fact this exact type of code is in use all over the web. However, if a server has "register_globals" set to on, then simply adding "?authorized=1" to the URL will give anyone free access to exactly what you do not want everyone to see. This is one of the most common PHP security problems.</p>

<p>Fortunately, there are a couple of simple solutions. The first, and perhaps the best, is to set "register_globals" to off. The second is to ensure that you only use variables that you have explicitly set yourself. In the above example, that would mean adding "$authorized = 0;" at the beginning of the script:</p>

<pre class="php">$authorized = 0;
if ($password == "my_password") {
    $authorized = 1;
}

if ($authorized == 1) {
    echo "Lots of important stuff.";
}</pre>

<h3>Error Messages</h3>

<p>Errors are a very useful tool for both programmer and hacker. A developer needs them in order to fix bugs. A hacker can use them to find out all sorts of information about a site, from the directory structure of the server to database login information. If possible, it is best to stop errors from being shown to visitors of a live application. PHP can be told to do this through .htaccess or php.ini, by setting "display_errors" to "0" (setting "error_reporting" to "0" also works, but hides errors from you as well); with "log_errors" turned on, errors are still written to the server's error log for you to review. If you have a development environment, you can use different error settings there.</p>
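<p>A sketch of keeping errors away from visitors at runtime, assuming a hypothetical $isProduction flag set from your own configuration. Settings made with ini_set() only take effect once the script is running, so errors that stop a file from compiling at all (such as parse errors) are governed by php.ini or .htaccess, which remain the more reliable place for production values.</p>

```php
<?php
// Hypothetical switch: set this from your own deployment configuration.
$isProduction = true;

// Report every error, so nothing is silently ignored.
error_reporting(E_ALL);

if ($isProduction) {
    // Live site: never show errors to visitors, but record them in the server log.
    ini_set('display_errors', '0');
    ini_set('log_errors', '1');
} else {
    // Development machine: show errors on screen while you work.
    ini_set('display_errors', '1');
}
```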

<h3>SQL Injection</h3>

<p>One of PHP's greatest strengths is the ease with which it can communicate with databases, most notably <a href="http://www.mysql.com">MySQL</a>. Many people make extensive use of this, and a great many sites, including this one, rely on databases to function.</p>

<p>However, as you would expect, with that much power there are potentially huge security problems you can face. Fortunately, there are plenty of solutions. The most common security hazard faced when interacting with a database is SQL injection, where a user crafts input that changes the meaning of your query so that their own SQL runs on your database.</p>

<p>Let's use a common example. Many login systems feature a line that looks a lot like this when checking the username and password entered into a form by a user against a database of valid username and password combinations, for example to control access to an administration area:</p>

<pre class="php">$check = mysql_query("SELECT Username, Password, UserLevel FROM Users WHERE Username = '".$_POST['username']."' and Password = '".$_POST['password']."'");</pre>

<p>Look familiar? It may well do. And on the face of it, the above does not look like it could do much damage. But let's say for a moment that I enter the following into the "username" input box in the form and submit it:</p>

<pre class="php">' OR 1=1 #</pre>

<p>The query that is going to be executed will now look like this:</p>

<pre class="sql">SELECT Username, Password, UserLevel FROM Users WHERE Username = '' OR 1=1 #' and Password = ''</pre>

<p>The hash symbol (#) tells MySQL that everything following it is a comment and to ignore it. So it will actually only execute the SQL up to that point. As 1 always equals 1, the SQL will return all of the usernames and passwords from the database. And as the first username and password combination in most user login databases is the admin user, the person who simply entered a few symbols in a username box is now logged in as your website administrator, with the same powers they would have if they actually knew the username and password.</p>

<p>With a little creativity, the above can be exploited further, allowing a user to create their own login account, read credit card numbers or even wipe a database clean.</p>

<p>Fortunately, this type of vulnerability is easy enough to work around. By checking for apostrophes in the items we enter into the database, and removing or neutralising them, we can prevent anyone from running their own SQL code on our database. The function below would do the trick:</p>

<pre class="php">function make_safe($variable) {
    $variable = mysql_real_escape_string(trim($variable));
    return $variable;
}</pre>

<p>Now, to modify our query. Instead of using _POST variables as in the query above, we now run all user data through the make_safe function, resulting in the following code:</p>

<pre class="php">$username = make_safe($_POST['username']);
$password = make_safe($_POST['password']);
$check = mysql_query("SELECT Username, Password, UserLevel FROM Users WHERE Username = '".$username."' and Password = '".$password."'");</pre>

<p>Now, if a user entered the malicious data above, the query would look like the following, which is perfectly harmless. The backslash stops the apostrophe from ending the string, so this query looks for a user whose username is literally "' OR 1=1 #".</p>

<pre class="sql">SELECT Username, Password, UserLevel FROM Users WHERE Username = '\' OR 1=1 #' and Password = ''</pre>

<p>Now, unless you happen to have a user with a very unusual username and a blank password, your malicious attacker will not be able to do any damage at all. It is important to escape all data passed to your database like this, however safe you think it is. HTTP headers sent by the user can be faked. Their referral address can be faked. Their browser's User-Agent string can be faked. Treat every single piece of data sent by the user as untrusted, and you will be fine.</p>
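<p>The effect of escaping can be checked without a database too. mysql_real_escape_string() needs a live MySQL connection (and the mysql_* functions no longer exist in PHP 7 and later, where mysqli or PDO prepared statements are the usual replacement), so the sketch below uses a hypothetical stand-in, demo_escape(), which backslash-escapes the same characters MySQL's escaping function handles. It ignores the connection's character set, so treat it purely as an illustration, not as something to use in real code.</p>

```php
<?php
// Hypothetical stand-in for mysql_real_escape_string(), which needs a live
// MySQL connection. It backslash-escapes the same characters MySQL's escaper
// handles (NUL, \n, \r, \, ', " and Ctrl-Z). Illustration only: it ignores
// the connection character set, so do not use it in real code.
function demo_escape($value) {
    return strtr($value, array(
        "\x00" => '\\0',
        "\n"   => '\\n',
        "\r"   => '\\r',
        '\\'   => '\\\\',
        "'"    => "\\'",
        '"'    => '\\"',
        "\x1a" => '\\Z',
    ));
}

$username = demo_escape("' OR 1=1 #");
$password = demo_escape('');
$sql = "SELECT Username, Password, UserLevel FROM Users WHERE Username = '"
    . $username . "' and Password = '" . $password . "'";
echo $sql, "\n";
```

<p>The printed query matches the harmless version shown above: the escaped apostrophe is now just part of the username.</p>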

<h3>File Manipulation</h3>

<p>Some sites on the web today have URLs that look like this:</p>

<pre class="php">index.php?page=contactus.html</pre>

<p>The "index.php" file then simply includes the "contactus.html" file, and the site appears to work. However, the user can very easily change the "contactus.html" part to anything they like. For example, if you are using <a href="http://www.apache.org/">Apache</a>'s mod_auth to protect files and have saved your passwords in a file named ".htpasswd" (the conventional name), a user who visited the following address could make the script output your username and password hash:</p>

<pre class="php">index.php?page=.htpasswd</pre>

<p>On some systems, by changing the URL to reference a file on another server, an attacker could even run PHP that they have written on your site. Scared? You should be. Fortunately, again, this is reasonably easy to protect against. First, make sure you have correctly set "open_basedir" in your php.ini file, and have set "allow_url_fopen" to "off" (from PHP 5.2, the separate "allow_url_include" setting, which should also be off, controls whether include() can load remote files). That will prevent most of these kinds of attacks by blocking the inclusion of remote files and of system files outside the allowed directories. Next, if you can, check the file requested against a list of valid files. If you limit the files that can be accessed using this script, you will save yourself a lot of aggravation later.</p>
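<p>A minimal sketch of the whitelist approach might look like the following. The function resolve_page(), the page names and the default page are all hypothetical; in a real index.php you would include the value it returns rather than echo it.</p>

```php
<?php
// Hypothetical whitelist check: only pages in $allowed may be included.
// Anything else requested via ?page= falls back to a safe default.
function resolve_page($requested, array $allowed, $default = 'home.html') {
    // Strict, exact match only, so values such as ".htpasswd",
    // "../config.php" or "http://example.com/x" are all rejected.
    return in_array($requested, $allowed, true) ? $requested : $default;
}

$allowed_pages = array('home.html', 'contactus.html', 'about.html');
$page = isset($_GET['page']) ? (string) $_GET['page'] : 'home.html';

// In index.php this would be: include resolve_page($page, $allowed_pages);
echo resolve_page($page, $allowed_pages), "\n";
```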

<h3>Using Defaults</h3>

<p>When MySQL is installed, it uses a default username of "root" and a blank password. SQL Server uses "sa" as the default user with a blank password. If someone finds the address of your database server and wants to try to log in, these are the first combinations they will try. If you have not changed the default password (and ideally the username as well), then you may well wake up one morning to find your database has been wiped and all your customers' credit card numbers stolen. The same applies to all software you use: if software comes with a default username or password, change them.</p>

<h3>Leaving Installation Files Online</h3>

<p>Many PHP programs come with installation files. Many of these are self-deleting once run, and many applications will refuse to run until you delete the installation files. Many, however, will not pay the blindest bit of attention if the install files are still online. If they are still online, they may still be usable, and someone may be able to use them to overwrite your entire site.</p>

<h3>Predictability</h3>

<p>Let us imagine for a second that your site has attracted the attention of a Bad Person. This Bad Person wants to break into your administration area, and change all of your product descriptions to "This Product Sucks". I would hazard a guess that their first step will be to go to http://www.yoursite.com/admin/ - just in case it exists. Placing your sensitive files and folders somewhere predictable like that makes life for potential hackers that little bit easier.</p>

<p>With this in mind, make sure you name your sensitive files and folders so that they are tough to guess. Placing your admin area at http://www.yoursite.com/jsfh8sfsifuhsi8392/ might make it harder to just type in quickly, but it adds an extra layer of security to your site. Pick something memorable by all means if you need an address you can remember quickly, but don't pick "admin" or "administration" (or your username or password). Pick something unusual.</p>
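<p>If you do not need to remember the address, PHP can generate a suitably unguessable name for you. This sketch assumes PHP 7 or later, where random_bytes() is available; www.yoursite.com is the same placeholder domain used above.</p>

```php
<?php
// Generate a hard-to-guess directory name for an admin area.
// random_bytes() (PHP 7+) returns cryptographically secure bytes;
// 8 bytes become 16 hexadecimal characters after bin2hex().
$admin_dir = bin2hex(random_bytes(8));
echo 'http://www.yoursite.com/' . $admin_dir . '/', "\n";
```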

<p>The same applies to usernames and passwords. If you have an admin area, do not use "admin" as the username and "password" as the password. Pick something unusual, ideally with both letters and numbers. Some hackers use a "dictionary attack", trying every word in a dictionary as a password until they find one that works; adding a couple of digits to the end of a password defeats a plain dictionary attack, although many attack tools also try common variations like this, so a longer, less predictable password is safer still. It is also wise to change your password fairly regularly (every month or two).</p>

<p>Finally, make sure that your error messages give nothing away. If your admin area gives an error message saying "Unknown Username" when a bad username is entered and "Wrong Password" when the wrong password is entered, a malicious user will know when they have managed to guess a valid username. Using a generic "Login Error" message for both cases means that a malicious user will have no idea whether it is the username or the password they have entered that is wrong.</p>
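<p>The sketch below shows the idea. The login_message() function and the $users array are hypothetical; it assumes PHP 5.5 or later for password_hash() and password_verify(), with passwords stored as hashes rather than plain text.</p>

```php
<?php
// Hypothetical login check that gives the same message whether the
// username or the password is wrong. $users maps usernames to hashes
// created with password_hash().
function login_message($username, $password, array $users) {
    if (isset($users[$username]) && password_verify($password, $users[$username])) {
        return 'Welcome back, ' . $username;
    }
    // One generic message for both failure cases
    return 'Login Error';
}

$users = array('dave' => password_hash('s3cret42', PASSWORD_DEFAULT));
echo login_message('dave', 'wrong', $users), "\n";      // Login Error
echo login_message('nobody', 's3cret42', $users), "\n"; // Login Error
```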

<h3>Finally, Be Completely and Utterly Paranoid</h3>

<p>If you assume your site will never come under attack, or face any problems of any sort, then when something eventually does go wrong, you will be in massive amounts of trouble. If, on the other hand, you assume every single visitor to your site is out to get you and you are permanently at war, you will help yourself to keep your site secure, and be prepared in case things should go wrong.</p>

<p><em>Ready for more? Try <a href="http://www.addedbytes.com/security/writing-secure-php-2/">Writing Secure PHP, Part 2</a>.</em></p> <br><br>]]></description>
				<pubDate>Fri, 16 Jul 2004 10:07:15 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/articles/writing-secure-php/writing-secure-php-1/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=coding&amp;start=0" class="ditto_tag" rel="tag">coding</a>,<a href="/feeds/tag-feed/?tags=development&amp;start=0" class="ditto_tag" rel="tag">development</a>,<a href="/feeds/tag-feed/?tags=mysql&amp;start=0" class="ditto_tag" rel="tag">mysql</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=programming&amp;start=0" class="ditto_tag" rel="tag">programming</a>,<a href="/feeds/tag-feed/?tags=security&amp;start=0" class="ditto_tag" rel="tag">security</a>,<a href="/feeds/tag-feed/?tags=tips&amp;start=0" class="ditto_tag" rel="tag">tips</a>,<a href="/feeds/tag-feed/?tags=tutorial&amp;start=0" class="ditto_tag" rel="tag">tutorial</a>,<a href="/feeds/tag-feed/?tags=web&amp;start=0" class="ditto_tag" rel="tag">web</a>,<a href="/feeds/tag-feed/?tags=webdesign&amp;start=0" class="ditto_tag" rel="tag">webdesign</a>,<a href="/feeds/tag-feed/?tags=webdev&amp;start=0" class="ditto_tag" rel="tag">webdev</a>
			</item>

			<item>
				<title>Gunning-Fog Index</title>
				<link>http://www.addedbytes.com/blog/code/gunning-fog-function/</link>
				<description><![CDATA[ The Gunning-Fog Index is a measure of text readability based upon sentence length and difficult words in a passage. <p><strong><span style="color: #f00;">PLEASE NOTE:</span> This code is now considered out of date. An updated version has been released under an open source license as a Google Code project: <a href="http://code.google.com/p/php-text-statistics/">php-text-statistics</a>. There is more about this change in the post <a href="http://www.addedbytes.com/blog/readability-code-open-sourced/">Readability Code Open Sourced</a>.</strong></p>

<p>A tool for <a href="http://www.readability-score.com/">checking the readability scores of text</a> is available - this article covers the functions behind that tool.</p>

<p>The Gunning-Fog index is a measure of text readability. It estimates the number of years of formal education a reader needs in order to understand the text on a first reading.</p>

<p>The following is the algorithm to determine the Gunning-Fog index:</p>

<code>(average_words_sentence + percentage_of_words_with_three_or_more_syllables) * 0.4</code>

<p>The above produces a number which roughly corresponds to the years of schooling a reader needs to understand the content. The lower the number, the more easily your visitors will understand the content. Web sites should aim for content that falls roughly in the 11-15 range for this test.</p>

<p>Any number returned over 22 can be treated as 22, which is roughly equivalent to post-graduate level.</p>
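<p>As a worked example of the formula, the hypothetical helper below takes totals you have already counted. A passage of 100 words in 5 sentences, 10 of which have three or more syllables, averages 20 words per sentence and has 10% complex words, giving (20 + 10) * 0.4 = 12.</p>

```php
<?php
// Gunning-Fog index from pre-counted totals (hypothetical helper).
// $complex_words is the number of words with three or more syllables.
function gunning_fog_from_counts($words, $sentences, $complex_words) {
    $average_words_sentence = $words / $sentences;
    $percentage_complex = ($complex_words / $words) * 100;
    return ($average_words_sentence + $percentage_complex) * 0.4;
}

// 100 words, 5 sentences, 10 complex words: (20 + 10) * 0.4
echo gunning_fog_from_counts(100, 5, 10), "\n";
```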

<p>Below is a selection of functions you can use to determine the Gunning-Fog index of text. To calculate it, all you need to do is call the function as follows, where $text is the text whose readability you wish to measure.</p>

<code>$gunning_fog_score = gunning_fog_score($text);</code>

<code>function gunning_fog_score($text) {
    return ((average_words_sentence($text) + percentage_number_words_three_syllables($text)) * 0.4);
}</code>

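<code>function average_words_sentence($text) {
    // Count runs of sentence-ending punctuation; treat text with none as one sentence
    $sentences = max(1, preg_match_all('/[.!?]+/', $text, $matches));
    // Count words as runs of non-whitespace characters
    $words = count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
    return ($words / $sentences);
}</code>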

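<code>function percentage_number_words_three_syllables($text) {
    $long_words = 0;
    // Split on any run of whitespace so repeated spaces do not create empty "words"
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) == 0) {
        return 0;
    }
    foreach ($words as $word) {
        // Words of three or more syllables count as "complex"
        if (count_syllables($word) &gt; 2) {
            $long_words++;
        }
    }

    // Return a plain number: number_format() would return a rounded string
    return (($long_words / count($words)) * 100);
}</code>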

<code>function count_syllables($word) {

    $subsyl = Array(
        'cial'
        ,'tia'
        ,'cius'
        ,'cious'
        ,'giu'
        ,'ion'
        ,'iou'
        ,'sia$'
        ,'.ely$'
    );

    $addsyl = Array(
        'ia'
        ,'riet'
        ,'dien'
        ,'iu'
        ,'io'
        ,'ii'
        ,'[aeiouym]bl$'
        ,'[aeiou]{3}'
        ,'^mc'
        ,'ism$'
        ,'([^aeiouy])\1l$'
        ,'[^l]lien'
        ,'^coa[dglx].'
        ,'[^gq]ua[^auieo]'
        ,'dnt$'
    );

    // Based on Greg Fast's Perl module Lingua::EN::Syllables
    $word = preg_replace('/[^a-z]/is', '', strtolower($word));
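    $word_parts = preg_split('/[^aeiouy]+/', $word);
    // Initialise the array so that count() below also works when the word has no vowel groups
    $valid_word_parts = array();
    foreach ($word_parts as $value) {
        if ($value &lt;&gt; '') {
            $valid_word_parts[] = $value;
        }
    }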

    $syllables = 0;
    // Thanks to Joe Kovar for correcting a bug in the following lines
    foreach ($subsyl as $syl) { 
        $syllables -= preg_match('~'.$syl.'~', $word); 
    } 
    foreach ($addsyl as $syl) { 
        $syllables += preg_match('~'.$syl.'~', $word); 
    }
    if (strlen($word) == 1) {
        $syllables++;
    }
    $syllables += count($valid_word_parts);
    $syllables = ($syllables == 0) ? 1 : $syllables;
    return $syllables;
}</code> <br><br>]]></description>
				<pubDate>Tue, 06 Jul 2004 11:41:35 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/code/gunning-fog-function/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=language&amp;start=0" class="ditto_tag" rel="tag">language</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=programming&amp;start=0" class="ditto_tag" rel="tag">programming</a>,<a href="/feeds/tag-feed/?tags=readability&amp;start=0" class="ditto_tag" rel="tag">readability</a>,<a href="/feeds/tag-feed/?tags=tools&amp;start=0" class="ditto_tag" rel="tag">tools</a>
			</item>

			<item>
				<title>Output Caching for Beginners</title>
				<link>http://www.addedbytes.com/articles/for-beginners/output-caching-for-beginners/</link>
				<description><![CDATA[ High-traffic sites can often benefit from caching of pages, to avoid processing the same data over and over again. This caching tutorial runs through the basics of file caching in PHP. <p>Caching of output in PHP is made easier by the output buffering functions built into PHP 4 and later.</p>

<p>You'll need to use two files to set up a caching system for your site. The first, "begin_caching.php" in this case, will run before any other PHP on your site. The second, "end_caching.php" in this case, runs after normal scripts have run. The two scripts effectively wrap around your current site.</p>

<p>You can achieve this wrapping effect one of two ways. The first way is to simply use the include() function and add them manually to every script you run. Unfortunately, this method can take some time, but is arguably more portable than the alternative.</p>

<p>The alternative relies on adding the following two lines of code (modified to reflect the correct path to the two PHP files needed) to your .htaccess file. This is my preferred method, because it requires no modification to existing scripts and can very easily and quickly be turned off (just by commenting out the relevant lines in the .htaccess file). Note that php_value directives in .htaccess only work when PHP runs as an Apache module (mod_php).</p>

<pre class="php">php_value auto_prepend_file /full/path/to/begin_caching.php
php_value auto_append_file /full/path/to/end_caching.php</pre>

<p>Next, we move on to the scripts that do the work. There are several stages to caching a document:</p>

<ol><li>Receive request for page</li><li>Check for the existence of a cached version of that page</li><li>Check the cached copy is still valid<ul><li>If it is, send the cached copy</li><li>If not, create a new cached copy and send it</li></ul></li></ol>
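<p>Before looking at the real scripts, here is a compact sketch of those steps as a single function. The name cached_or_generate() is hypothetical, and the $generate callback stands in for whatever script would normally build the page; it assumes PHP 5.4 or later for the callable type hint.</p>

```php
<?php
// Sketch of the caching steps listed above (hypothetical helper).
// $generate is a callable that builds the page when no valid cache exists.
function cached_or_generate($cachefile, $cachetime, callable $generate) {
    clearstatcache();
    // Does a cached copy exist, and is it still valid?
    if (is_file($cachefile) && time() - filemtime($cachefile) < $cachetime) {
        return file_get_contents($cachefile);     // send the cached copy
    }
    $html = call_user_func($generate);             // create a new copy
    file_put_contents($cachefile, $html, LOCK_EX); // save it for next time
    return $html;
}

$file = sys_get_temp_dir() . '/demo_' . md5('http://www.example.com/') . '.cache';
@unlink($file);
$first  = cached_or_generate($file, 600, function () { return 'built at ' . microtime(true); });
$second = cached_or_generate($file, 600, function () { return 'rebuilt'; });
```

<p>The second call returns the saved copy rather than running its callback, because the file is less than 600 seconds old.</p>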

<p>To begin with, the script below contains a few basic settings. Here, you can set the directory you want to save cached files to (I would recommend keeping that directory outside your web root directory or at least protecting it from view through a normal browser). This script will need to be able to create files in this directory, and you need to allow this by setting the permissions of the directory. The permissions depend upon your server setup, so you may want to start by setting them to 777 while testing the script, and then reduce them to the lowest levels possible once the script is working.</p>

<p>You can also set the time, in seconds, for which a cached file should be considered valid after creation, and set the file extension for saved files. It would be wise not to give them a ".php" extension, just for safety's sake.</p>

<pre class="php">&lt;?php

    // Settings
    $cachedir = '../cache/'; // Directory to cache files in (keep outside web root)
    $cachetime = 600; // Seconds to cache files for
    $cacheext = 'cache'; // Extension to give cached files (usually cache, htm, txt)

    // Ignore List
    $ignore_list = array(
        'addedbytes.com/rss.php',
        'addedbytes.com/search/'
    );

    // Script
    $page = 'http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']; // Requested page
    $cachefile = $cachedir . md5($page) . '.' . $cacheext; // Cache file to either load or create

    $ignore_page = false;
    for ($i = 0; $i &lt; count($ignore_list); $i++) {
        $ignore_page = (strpos($page, $ignore_list[$i]) !== false) ? true : $ignore_page;
    }

    $cachefile_created = ((@file_exists($cachefile)) and ($ignore_page === false)) ? @filemtime($cachefile) : 0;
    @clearstatcache();

    // Show file from cache if still valid
    if (time() - $cachetime &lt; $cachefile_created) {

        <em>//ob_start('ob_gzhandler');</em>
        @readfile($cachefile);
        <em>//ob_end_flush();</em>
        exit();

    }

    // If we're still here, we need to generate a cache file

    ob_start();

?&gt;</pre>

<p>The file starts by generating an MD5 hash of the page that has been requested. It uses the complete requested URL, and the MD5 hash is a 32-character hexadecimal string that, for practical purposes, is unique to each URL. It then checks whether a cache file with that name exists.</p>

<p>If the file exists, it checks to see when it was last updated. If the file is older than the allowed time, it acts as though no cache existed (carrying on and generating a new file). If the file is still valid, it simply displays it.</p>
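<p>The naming scheme can be tried on its own. The helper cache_file_for() below is hypothetical, but it builds file names the same way as the script above, using the same default directory and extension.</p>

```php
<?php
// Hypothetical helper mirroring how begin_caching.php names cache files:
// the full requested URL is hashed, so each URL maps to its own file.
function cache_file_for($url, $cachedir = '../cache/', $cacheext = 'cache') {
    return $cachedir . md5($url) . '.' . $cacheext;
}

$a = cache_file_for('http://www.example.com/about/');
$b = cache_file_for('http://www.example.com/about/?page=2');
echo $a, "\n", $b, "\n";
```

<p>Because the query string is part of the URL, the same page reached with different query strings is cached as separate files.</p>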

<p>There is also, in the settings, a list of pages to ignore when caching. This can be search results, comments pages, a news page or news feed: anything that should always be up to date. Simply add anything you do not want cached into here, and it will always be generated fresh rather than served from the cache. You can add directories, or parts of URLs, as the script simply searches the requested URL for each text string. In the example above, I have left out the "http://www" portion of the URL, as some visitors will leave it out.</p>
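<p>The ignore-list loop in the script can also be written as a small function that stops at the first match. is_ignored() is a hypothetical name used only for this sketch.</p>

```php
<?php
// Returns true if any entry in the ignore list appears anywhere in the URL,
// using the same plain substring test as begin_caching.php.
function is_ignored($page, array $ignore_list) {
    foreach ($ignore_list as $fragment) {
        if (strpos($page, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

$ignore_list = array('addedbytes.com/rss.php', 'addedbytes.com/search/');
var_dump(is_ignored('http://www.addedbytes.com/search/?q=php', $ignore_list));
var_dump(is_ignored('http://addedbytes.com/articles/', $ignore_list));
```

<p>Because the test is a substring match, one entry without "http://www" covers both forms of the address.</p>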

<p>Finally, the two lines in italics above are both commented out. You can, if you like, uncomment these, and that will use output buffering to gzip your content before sending it to users, making your site even faster for them. Please note, though, that output buffering with gzip encoding is not available in versions of PHP earlier than 4.0.5.</p>

<p>Which brings us to the second file, "end_caching.php". At the end of the first file, if no cache exists, we start output buffering. This means that rather than send the page to the user, we are saving it for use later. In the second script below, we take the contents of the output buffer, and write it to a file.</p>

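<pre class="php">&lt;?php

    // Now the script has run, generate a new cache file
    $fp = @fopen($cachefile, 'w');

    // Save the contents of the output buffer to the file,
    // but only if the file could actually be opened for writing
    if ($fp) {
        fwrite($fp, ob_get_contents());
        fclose($fp);
    }

    // Send the buffered page on to the visitor
    ob_end_flush();

?&gt;</pre>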
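<p>Stripped of the file-checking logic, the buffering the two scripts rely on looks like this. The cache file location here is a hypothetical temporary path chosen so the sketch runs anywhere; file_put_contents() with LOCK_EX (PHP 5.1 or later) replaces the fopen()/fwrite() pair and locks the file while writing.</p>

```php
<?php
// Minimal demonstration of the buffering used by the two caching scripts:
// output is captured instead of being sent, saved to a file, then released.
$cachefile = sys_get_temp_dir() . '/ob_demo.cache'; // hypothetical location

ob_start();                        // begin_caching.php: start capturing output
echo '<p>Hello from the page</p>'; // the normal page script runs here
$html = ob_get_contents();         // end_caching.php: read the buffer
file_put_contents($cachefile, $html, LOCK_EX);
ob_end_flush();                    // send the captured page to the visitor
```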

<p><strong>Important:</strong> If "register_globals" is switched on in php.ini, make sure you add the following to the beginning of "end_caching.php" (straight after the "&lt;?php" line) to aid security. This ensures that an attacker cannot visit "end_caching.php" directly and supply their own $cachefile value to overwrite an important file on your site (or read its contents).</p>

<pre class="php">    $cachedir = '../cache/'; // Directory to cache files in (keep outside web root)
    $cacheext = 'cache'; // Extension to give cached files (usually cache, htm, txt)
    $page = 'http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']; // Requested page
    $cachefile = $cachedir . md5($page) . '.' . $cacheext; // Cache file to either load or create</pre>

<p>And there we have it. If a cached document exists, it is shown to the user, and if not, one is created.</p>

<p>Finally, you need to make sure the cache remains reasonably clean. Over time, out of date or redundant files could build up, and these should be removed regularly. For this reason, I usually set up an automated script to delete all cache files once a week (or less often, depending on the traffic of the site), but this will depend greatly upon the server software you are using.</p>

<p>The script below, "empty_caching.php", is one example of a script to delete all cache files. You will need to set the cache directory at the beginning before running it. You can either use it manually, visiting the page through your browser whenever you want to empty the cache, or run it automatically. Below the script is an example of the command a cron job could use to run it automatically (the " &gt;/dev/null 2&gt;&amp;1" at the end discards the output, which prevents the server emailing me every time the job runs). Please note that this last script will be cached too, unless you add it to the ignore list!</p>

<pre class="php">&lt;?php

    // Settings
    $cachedir = '../cache/'; // Directory to cache files in (keep outside web root)

    if ($handle = @opendir($cachedir)) {
        while (false !== ($file = @readdir($handle))) {
            if ($file != '.' and $file != '..') {
                echo $file . ' deleted.&lt;br&gt;';
                @unlink($cachedir . '/' . $file);
            }
        }
        @closedir($handle);
    }

?&gt;</pre>

<pre class="php">curl http://www.your_domain.com/empty_caching.php &gt;/dev/null 2&gt;&amp;1</pre> <br><br>]]></description>
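<p>If you would rather not empty the whole cache at once, a variation is to delete only cache files older than a given age, and only files with the cache extension, so nothing else in the directory is touched. clean_cache() below is a hypothetical sketch of that approach, not part of the scripts above.</p>

```php
<?php
// Hypothetical clean-up helper: deletes files ending in .$cacheext that are
// at least $max_age seconds old, and returns how many were deleted.
function clean_cache($cachedir, $cacheext = 'cache', $max_age = 0) {
    clearstatcache();
    $deleted = 0;
    // glob() can return false on error, so fall back to an empty array
    $files = glob(rtrim($cachedir, '/') . '/*.' . $cacheext) ?: array();
    foreach ($files as $file) {
        if (is_file($file) && time() - filemtime($file) >= $max_age && unlink($file)) {
            $deleted++;
        }
    }
    return $deleted;
}

// Demonstration in a throwaway directory (hypothetical path)
$dir = sys_get_temp_dir() . '/cache_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
touch($dir . '/old.cache', time() - 3600);
touch($dir . '/new.cache');
touch($dir . '/notes.txt', time() - 3600);
$count = clean_cache($dir, 'cache', 600);
echo $count, " file(s) deleted\n";
```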
				<pubDate>Wed, 09 Jun 2004 16:13:00 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/articles/for-beginners/output-caching-for-beginners/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=article&amp;start=0" class="ditto_tag" rel="tag">article</a>,<a href="/feeds/tag-feed/?tags=cache&amp;start=0" class="ditto_tag" rel="tag">cache</a>,<a href="/feeds/tag-feed/?tags=caching&amp;start=0" class="ditto_tag" rel="tag">caching</a>,<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=development&amp;start=0" class="ditto_tag" rel="tag">development</a>,<a href="/feeds/tag-feed/?tags=for+beginners&amp;start=0" class="ditto_tag" rel="tag">for beginners</a>,<a href="/feeds/tag-feed/?tags=performance&amp;start=0" class="ditto_tag" rel="tag">performance</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=programming&amp;start=0" class="ditto_tag" rel="tag">programming</a>,<a href="/feeds/tag-feed/?tags=tutorial&amp;start=0" class="ditto_tag" rel="tag">tutorial</a>,<a href="/feeds/tag-feed/?tags=web&amp;start=0" class="ditto_tag" rel="tag">web</a>,<a href="/feeds/tag-feed/?tags=webdev&amp;start=0" class="ditto_tag" rel="tag">webdev</a>
			</item>

			<item>
				<title>Email Address Validation</title>
				<link>http://www.addedbytes.com/blog/code/email-address-validation/</link>
				<description><![CDATA[ How to validate email addresses according to ISO standards with PHP. <p><strong><span style="color: #f00;">PLEASE NOTE:</span> This function is now considered out of date. An updated version incorporating many of the comments below has been released under an open source license as a Google Code project: <a href="http://code.google.com/p/php-email-address-validation/">php-email-address-validation</a>. There is more about this change in the post <a href="http://www.addedbytes.com/blog/email-address-validation-v2/">Email Address Validation Updated</a>.</strong></p>

<p>Many email address validators will actually throw up errors when faced with a valid, but unusual, email address. Many, for example, assume that an email address with a domain name extension of more than three letters is invalid. However, new TLDs such as ".info", ".name" and ".aero" are perfectly valid but longer than three characters. Many email address validators fail to take into account that you do not necessarily need a domain name in an email address - an IP address is fine.</p>

<p>The first step to creating a PHP script for validating email addresses is to work out <em>exactly</em> what is and is not valid. RFC 2822, which specifies what is and is not allowed in an email address, states that an email address takes the form "local-part@domain".</p>

<p>The "local-part" of an email address must be between 1 and 64 characters in length and may be made up in any one of three ways. It can be made up of a selection of characters (and only these characters) from the following list, though the period cannot be the first character:</p>

<ul><li>A to Z and a to z</li><li>0 to 9</li><li>!</li><li>#</li><li>$</li><li>%</li><li>&amp;</li><li>'</li><li>*</li><li>+</li><li>-</li><li>/</li><li>=</li><li>?</li><li>^</li><li>_</li><li>`</li><li>{</li><li>|</li><li>}</li><li>~</li><li>.</li></ul>

<p>Or, it can be made up of a quoted string containing any characters except "\". Older email addresses may be made up differently, and may contain a combination of the above. The following are all valid as the local part of an email address:</p>

<ul><li>dave</li><li>+1~1+</li><li>{_dave_}</li><li>""</li><li>dave."dave" (Note that this is considered an obsolete form of address - new addresses created should not be of this form, but it is still considered valid.)</li></ul>

<p>The following, though similar, are all invalid:</p>

<ul><li>-- dave -- (spaces are invalid unless enclosed in quotation marks)</li><li>[dave] (square brackets are invalid, unless contained within quotation marks)</li><li>.dave (the local part of an email address cannot start with a period)</li></ul>
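<p>To check these rules against the examples, the hypothetical helper below splits the local part on periods and requires every piece to be either a run of the permitted characters or a quoted string. It uses preg_match(); quoted strings that themselves contain a period are not handled by this simple split.</p>

```php
<?php
// Hypothetical local-part check: split on periods, then every piece must be
// a run of permitted characters or a quoted string without \ or ".
// An empty piece (leading, trailing or doubled period) fails both tests.
function is_valid_local_part($local) {
    if (strlen($local) < 1 || strlen($local) > 64) {
        return false;
    }
    $atom   = '[A-Za-z0-9!#$%&\'*+\/=?^_`{|}~-]+';
    $quoted = '"[^\\\\"]*"';
    foreach (explode('.', $local) as $piece) {
        if (!preg_match('/^(' . $atom . '|' . $quoted . ')$/D', $piece)) {
            return false;
        }
    }
    return true;
}

foreach (array('dave', '+1~1+', '{_dave_}', '""', 'dave."dave"', '-- dave --', '[dave]', '.dave') as $local) {
    echo $local, ': ', is_valid_local_part($local) ? 'valid' : 'invalid', "\n";
}
```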

<p>The "domain" portion of the email address can also be made up in different ways. The most common form is a domain name, which is made up of a number of "labels", each separated by a period and between 1 and 63 characters in length. Labels may contain letters, digits and hyphens, but must not begin or end with a hyphen. (The original DNS specification required a label to begin with a letter, not a digit; however, many domain names have been registered beginning with digits, so for the purposes of validation we will assume that digits are allowed at the start of a label.) A domain name, technically, need be only one label. However, in practice domain names are made up of at least two labels, so for the purposes of validation we will check for two. A domain name may not be over 255 characters in total. The domain portion of an email address may also be an IP address, which can in turn be enclosed in square brackets.</p>

<p>In order to check that email addresses conform to these guidelines, we'll need to use regular expressions. First, we need to match the three possible forms of the local part of an email address, using the two patterns below (we'll add in escape characters later, when we put the function together): </p>
 
<code>^[A-Za-z0-9!#$%&amp;'*+/=?^_`{|}~-][A-Za-z0-9!#$%&amp;'*+/=?^_`{|}~\.-]{0,63}$</code>

<code>^"[^(\|")]{0,62}"$</code>

<p>We can use the two patterns we've defined here to check for obsolete local parts of email addresses too, saving ourselves from needing a third pattern.</p>

<p>Next, we need to check the domain portion of the email address. It can either be an IP address or a domain name, so we can use the two patterns here to validate it:</p>

<code>^\[?[0-9\.]+\]?$</code>

<code>^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)+$</code>

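<p>The second pattern matches any valid domain name, and it also matches a plain dotted IP address such as 192.168.10.20 (though not one enclosed in square brackets). The function below nevertheless tests for the IP form first, and only checks the domain-name labels when that test fails.</p>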
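<p>The domain rules can be sketched the same way. is_valid_domain_part() is hypothetical; like the function below, it accepts anything made only of digits and periods (optionally in square brackets) as an IP address without checking each number, and otherwise requires at least two valid labels.</p>

```php
<?php
// Hypothetical domain check: an IP-style literal, or at least two labels of
// 1-63 letters, digits and hyphens that neither start nor end with a hyphen.
function is_valid_domain_part($domain) {
    if (strlen($domain) < 1 || strlen($domain) > 255) {
        return false;
    }
    if (preg_match('/^\[?[0-9.]+\]?$/D', $domain)) {
        return true; // IP-style address; individual numbers are not range-checked
    }
    $label = '[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?';
    return preg_match('/^' . $label . '(\.' . $label . ')+$/D', $domain) === 1;
}

foreach (array('example.com', '[192.168.0.1]', 'localhost', '-example.com') as $domain) {
    echo $domain, ': ', is_valid_domain_part($domain) ? 'valid' : 'invalid', "\n";
}
```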

<p>Putting it all together gives us the following function. Call it like any normal function, and you will get back a value of "true" if the string entered is a valid email address, or "false" if the input was an invalid email address.</p>

<code>function check_email_address($email) {
    // preg_match() replaces ereg(), which was removed in PHP 7. The D modifier
    // stops $ from matching just before a trailing newline.
    // First, we check that there's one @ symbol, and that the lengths are right
    if (!preg_match('/^[^@]{1,64}@[^@]{1,255}$/D', $email)) {
        // Email invalid because wrong number of characters in one section, or wrong number of @ symbols.
        return false;
    }
    // Split it into sections to make life easier
    $email_array = explode('@', $email);
    $local_array = explode('.', $email_array[0]);
    for ($i = 0; $i &lt; sizeof($local_array); $i++) {
        if (!preg_match('/^(([A-Za-z0-9!#$%&amp;\'*+\/=?^_`{|}~-][A-Za-z0-9!#$%&amp;\'*+\/=?^_`{|}~\.-]{0,63})|("[^(\\\\|")]{0,62}"))$/D', $local_array[$i])) {
            return false;
        }
    }
    if (!preg_match('/^\[?[0-9\.]+\]?$/D', $email_array[1])) { // Check if domain is IP. If not, it should be valid domain name
        $domain_array = explode('.', $email_array[1]);
        if (sizeof($domain_array) &lt; 2) {
            return false; // Not enough parts to domain
        }
        for ($i = 0; $i &lt; sizeof($domain_array); $i++) {
            if (!preg_match('/^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$/D', $domain_array[$i])) {
                return false;
            }
        }
    }
    return true;
}</code>

<p>Using the function above is relatively simple, as you can see:</p>

<code>if (check_email_address($email)) {
    echo $email . ' is a valid email address.';
} else {
    echo $email . ' is not a valid email address.';
}</code>

<p>You can now validate email addresses entered into your site against the specifications that define email addresses (more or less - domain names that start with a number are supposed to be invalid, but do exist).</p>

<p>Finally, please do remember that just because an email address <em>looks</em> valid does not mean it is in use. A validation script is a good start, but though it can tell you that an email address is technically valid, it cannot tell you whether it is in use. You might benefit from checking in more depth, for example seeing if the domain name is registered. Even better, send an email to the address given by the user and get them to click a link to confirm it is real: the only way to be 100% sure.</p> <br><br>]]></description>
				<pubDate>Tue, 01 Jun 2004 13:16:31 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/code/email-address-validation/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=development&amp;start=0" class="ditto_tag" rel="tag">development</a>,<a href="/feeds/tag-feed/?tags=email&amp;start=0" class="ditto_tag" rel="tag">email</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=programming&amp;start=0" class="ditto_tag" rel="tag">programming</a>,<a href="/feeds/tag-feed/?tags=regex&amp;start=0" class="ditto_tag" rel="tag">regex</a>,<a href="/feeds/tag-feed/?tags=regexp&amp;start=0" class="ditto_tag" rel="tag">regexp</a>,<a href="/feeds/tag-feed/?tags=security&amp;start=0" class="ditto_tag" rel="tag">security</a>,<a href="/feeds/tag-feed/?tags=tutorial&amp;start=0" class="ditto_tag" rel="tag">tutorial</a>,<a href="/feeds/tag-feed/?tags=validation&amp;start=0" class="ditto_tag" rel="tag">validation</a>,<a href="/feeds/tag-feed/?tags=webdesign&amp;start=0" class="ditto_tag" rel="tag">webdesign</a>
			</item>

			<item>
				<title>VBScript Regular Expressions</title>
				<link>http://www.addedbytes.com/blog/code/vbscript-regular-expressions/</link>
				<description><![CDATA[ Regular expression reference and examples for VBScript. <p>Regular expressions in VBScript can bring many developers to their knees, weeping, but they are not as scary as some would have you believe. With their roots in Perl, regular expressions in VBScript use similar syntax, and the chances are that you are already familiar with the concepts here if you have played with regular expression matching before.</p>

<p>Below, you will find three sections. The first section, <a href="http://www.addedbytes.com/asp/vbscript-regular-expressions/#reference">Reference</a>, is a simple reference listing the most-used of the various symbols and characters used in regular expressions. The second section, <a href="http://www.addedbytes.com/asp/vbscript-regular-expressions/#functions">Functions</a>, has two functions in it that may make life easier for you. The third section, <a href="http://www.addedbytes.com/asp/vbscript-regular-expressions/#examples">Examples</a>, is where the fun begins - examples of regular expressions in action.</p>

<h3 id="reference">Reference</h3>

<p><strong>Character Sets and Grouping</strong></p>

<ul><li class="reference"><span class="listpad">.</span> - Any single character (except new line character, "\n")</li><li class="reference"><span class="listpad">[]</span> - Encloses a set of characters, and matches any one of them</li><li class="reference"><span class="listpad">^</span> - As the first character inside [], matches any character not in the set, e.g. [^0-9]</li><li class="reference"><span class="listpad">[A-Z]</span> - Any upper case letter between A and Z</li><li class="reference"><span class="listpad">[a-z]</span> - Any lower case letter between a and z</li><li class="reference"><span class="listpad">[0-9]</span> - Any digit from 0 to 9</li><li class="reference"><span class="listpad">()</span> - Group section. Also can then be back-referenced with $1 to $n, where n is the number of groups</li><li class="reference"><span class="listpad">|</span> - Or. (ab)|(bc) will match "ab" or "bc"</li></ul>

<p><strong>Repetition</strong></p>

<ul><li class="reference"><span class="listpad">+</span> - One or more</li><li class="reference"><span class="listpad">*</span> - Zero or more</li><li class="reference"><span class="listpad">?</span> - Zero or one</li><li class="reference"><span class="listpad">{5}</span> - Five</li><li class="reference"><span class="listpad">{1,3}</span> - One to three</li><li class="reference"><span class="listpad">{2,}</span> - Two or more</li></ul>

<p><strong>Positioning</strong></p>

<ul><li class="reference"><span class="listpad">^</span> - Start of string</li><li class="reference"><span class="listpad">$</span> - End of string</li><li class="reference"><span class="listpad">\b</span> - End of word</li><li class="reference"><span class="listpad">\n</span> - New line</li><li class="reference"><span class="listpad">\r</span> - Carriage return</li></ul>

<p><strong>Miscellaneous</strong></p>

<ul><li class="reference"><span class="listpad">\</span> - Escape character</li><li class="reference"><span class="listpad">\t</span> - Tab</li><li class="reference"><span class="listpad">\s</span> - White space</li><li class="reference"><span class="listpad">\w</span> - Matches word (equivalent of [A-Za-z0-9_])</li></ul>

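<p>Please note that the backslash escape character mentioned above only has this meaning inside regular expression patterns, whose syntax is based upon Perl regular expression syntax. In an ordinary VBScript string the backslash is just a normal character. The only character you need to escape in a VBScript string literal is the double quote, and you do that by doubling it. For example, the following will print out 'This is a "quoted" piece of text.'</p>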

<code>response.write("This is a ""quoted"" piece of text.")</code>
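<p>The two kinds of escaping often meet in the same line. In the hypothetical example below, the VBScript string literal doubles its quote marks, while the regular expression inside it uses \. for a literal period. The pattern the RegExp object actually receives is a double quote, one or more digits, a literal period, one or more digits and another double quote.</p>

<code>function HasQuotedDecimal(strText)
    ' Hypothetical example: True if strText contains a decimal number in double quotes
    dim objRegExp : set objRegExp = new RegExp
    ' VBScript doubles the quote marks; the regex escapes the period with a backslash
    objRegExp.Pattern = """[0-9]+\.[0-9]+"""
    HasQuotedDecimal = objRegExp.test(strText)
    set objRegExp = nothing
end function</code>

<p>HasQuotedDecimal("Total: ""3.50""") returns True, while HasQuotedDecimal("Total: ""3x50""") returns False, because the escaped period only matches an actual period.</p>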

<h3 id="functions">Functions</h3>

<p>The first of the functions below, ereg (named after the PHP function to keep me from going quite, quite mad), is the one you will probably use most. Feed it a string, a pattern, and a choice of whether or not to ignore the case of letters, and it will return True if the pattern matches anywhere in the string, or False if not.</p>

<code>function ereg(strOriginalString, strPattern, varIgnoreCase)
    ' Function matches pattern, returns true or false
    ' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
    dim objRegExp : set objRegExp = new RegExp
    with objRegExp
        .Pattern = strPattern
        .IgnoreCase = varIgnoreCase
        .Global = True
    end with
    ereg = objRegExp.test(strOriginalString)
    set objRegExp = nothing
end function</code>
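<p>ereg only tells you whether there is a match. If you also need the matched text, a third function along the same lines can use the RegExp object's execute method instead of test. The sketch below, hypothetically named ereg_matches, returns an array containing every match, or an empty array if there are none.</p>

<code>function ereg_matches(strOriginalString, strPattern, varIgnoreCase)
    ' Hypothetical helper: returns an array of every matched substring
    ' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
    dim objRegExp, colMatches, arrResults(), i
    set objRegExp = new RegExp
    with objRegExp
        .Pattern = strPattern
        .IgnoreCase = varIgnoreCase
        .Global = True
    end with
    set colMatches = objRegExp.execute(strOriginalString)
    if colMatches.Count = 0 then
        ereg_matches = array()   ' empty array: UBound is -1
    else
        redim arrResults(colMatches.Count - 1)
        for i = 0 to colMatches.Count - 1
            arrResults(i) = colMatches(i).Value
        next
        ereg_matches = arrResults
    end if
    set objRegExp = nothing
end function</code>

<p>For example, join(ereg_matches("a1 b22 c333", "[0-9]+", False), ",") returns "1,22,333".</p>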

<p>Next up we have ereg_replace. Like its shorter cousin, it takes a string, a pattern and a choice of case sensitivity, but this time you must also supply a replacement string. The function replaces every match of the pattern in the string with the replacement (if you change ".Global = True" to ".Global = False", it will replace only the first match).</p>

<code>function ereg_replace(strOriginalString, strPattern, strReplacement, varIgnoreCase)
    ' Function replaces pattern with replacement
    ' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
    dim objRegExp : set objRegExp = new RegExp
    with objRegExp
        .Pattern = strPattern
        .IgnoreCase = varIgnoreCase
        .Global = True
    end with
    ereg_replace = objRegExp.replace(strOriginalString, strReplacement)
    set objRegExp = nothing
end function</code>
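<p>If you need both behaviours, one option is to pass the .Global setting in rather than editing the function each time. The sketch below is otherwise identical to ereg_replace; the name ereg_replace_opt and its fifth parameter are hypothetical.</p>

<code>function ereg_replace_opt(strOriginalString, strPattern, strReplacement, varIgnoreCase, varGlobal)
    ' Hypothetical variant of ereg_replace with the .Global setting as a parameter
    ' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
    ' varGlobal must be TRUE (replace every match) or FALSE (replace only the first match)
    dim objRegExp : set objRegExp = new RegExp
    with objRegExp
        .Pattern = strPattern
        .IgnoreCase = varIgnoreCase
        .Global = varGlobal
    end with
    ereg_replace_opt = objRegExp.replace(strOriginalString, strReplacement)
    set objRegExp = nothing
end function</code>

<p>For example, ereg_replace_opt("a-b-c", "-", "+", False, False) returns "a+b-c", while passing True as the last argument returns "a+b+c".</p>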

<h3 id="examples">Examples</h3>

<p><strong>Example 1: Checking hexadecimal string</strong></p>

<p>A hexadecimal number can be made up of any digit, and any letter, upper or lower case, between a and f, inclusive. So to check if a string is actually hexadecimal, the following will do quite nicely (strOriginalString is the original string to be tested):</p>

<code>&lt;%
if ereg(strOriginalString, "[^a-f0-9\s]", True) = True then
    response.write "String is not hexadecimal."
else
    response.write "String is hexadecimal."
end if
%&gt;</code>

<p>The pattern, "[^a-f0-9\s]" matches anything that is <strong>not</strong> in the set of characters specified (so if there is anything in the string that is not in that set, the function will return True). The characters specified are all letters between a and f inclusive, and we've specified a case insensitive match, so upper case letters will be treated the same way. We are also allowing whitespace (new lines, spaces, carriage returns and tabs), which is what the "\s" represents in regular expressions.</p>

<p>Example string for which ereg returns False (and which is therefore hexadecimal):</p>

<code>AAcc99</code>
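<p>Because the test above only looks for characters outside the allowed set, an empty string passes. An alternative sketch (IsHexString is a hypothetical name) anchors the pattern with ^ and $ so that the whole string must consist of hexadecimal characters, and uses + so that at least one is required. Unlike the original, it does not allow whitespace.</p>

<code>function IsHexString(strText)
    ' Hypothetical alternative: ^ and $ force the whole string to match,
    ' and + requires at least one character, so "" is rejected
    dim objRegExp : set objRegExp = new RegExp
    with objRegExp
        .Pattern = "^[a-f0-9]+$"
        .IgnoreCase = True
    end with
    IsHexString = objRegExp.test(strText)
    set objRegExp = nothing
end function</code>

<p>IsHexString("AAcc99") returns True; IsHexString("") and IsHexString("AAcg99") both return False.</p>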

<p><strong>Example 2: Masking the last section of an IP address</strong></p>

<p>An IP address is made up of four sets of numbers separated by periods. If you are going to display visitors' (or anyone's) IP addresses on your site, it is common practice to mask the last (fourth) set of numbers. Here's a way to use ereg_replace to do just that:</p>

<code>&lt;%
strOriginalString = ereg_replace(strOriginalString, "([^0-9])([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.[0-9]{1,3}([^0-9])", "$1$2.$3.$4.***$5", True)
%&gt;</code>

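<p>This is a little more tricky, as you'd hopefully expect from a second example, but it looks harder than it is, so let's take it one step at a time. There are actually only a few entities in the pattern; they are just repeated. The most important is "([0-9]{1,3})". It matches one section of an IP address, and it is enclosed in brackets so that the section can be reused in the replacement (otherwise we would not be able to keep the first three parts of the IP address to display). You can see these sections referenced as "$2", "$3" and "$4" in the replacement. The pattern within the brackets simply says "between one and three digits from 0 to 9". The "([^0-9])" groups at either end match the single non-digit character immediately before and after the address, so that longer runs of digits such as 4444.1.1.1 are not mistaken for IP addresses; because those two characters are part of the match, they are put back with "$1" and "$5". Note that the pattern only checks the shape of an address, not that each number is 255 or less.</p>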

<p>The second repeated section is "\.". The backslash before the period tells the regular expression engine to treat that period as a literal period. This is called an <em>escaped character</em>, and escaping is a very common practice. Unescaped (without the backslash), the period is a symbol meaning "any character except the new line character".</p>

<p>Example input text:</p>

<code>My IP address is 123.456.78.9 but 4444.1.1.1 is just a bunch of random numbers, and so is 12.34.56, and 1.1.1.1 is another valid IP.</code>

<p>Example output text:</p>

<code>My IP address is 123.456.78.*** but 4444.1.1.1 is just a bunch of random numbers, and so is 12.34.56, and 1.1.1.*** is another valid IP.</code>
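<p>Because the pattern above has to consume a non-digit character on each side of the address, an IP address at the very start or very end of the string is left unmasked. One alternative, sketched below, uses the \b word boundary from the reference section instead. \b matches a position rather than a character, so nothing needs to be captured and put back. MaskLastOctet is a hypothetical name, and like the original it does not check that each number is 255 or less.</p>

<code>function MaskLastOctet(strText)
    ' Hypothetical alternative: \b matches a word boundary without consuming
    ' a character, so an address at the start or end of strText is masked too
    dim objRegExp : set objRegExp = new RegExp
    with objRegExp
        .Pattern = "\b([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.[0-9]{1,3}\b"
        .Global = True
    end with
    MaskLastOctet = objRegExp.replace(strText, "$1.$2.$3.***")
    set objRegExp = nothing
end function</code>

<p>With the example input above, MaskLastOctet produces the same output as before, but MaskLastOctet("10.0.0.25") also returns "10.0.0.***".</p>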

<p><strong>Example 3: Making the second word of every sentence in a string bold, as long as the first word contains only upper case letters and the second word does not contain an even digit</strong></p>

<p>Things get more interesting now. This example is not in the least bit useful in practice, but it should prove to be a good demonstration of the power of regular expressions. It sounds tough, but with regular expressions it's a walk in the park.</p>

<code>&lt;%
strOriginalString = ereg_replace(". " &amp; strOriginalString, "(\.|!|\?)\s([A-Z]+)\s([^02468\s]+)\s", "$1 $2 &lt;strong&gt;$3&lt;/strong&gt; ", False)
strOriginalString = mid(strOriginalString, 2)
%&gt;</code>

<p>We start by adding an artificial period and space to the beginning of the string, to make sure we catch the first sentence, and add a line afterwards to strip those extra characters out again. We only want sentences that are separated by punctuation <em>and</em> a space, or we'll end up with bold decimals and it will be very messy indeed. So we look for punctuation, followed by a space, followed by a word made entirely of capitals, followed by another space, followed by a second word that contains no even digits and no whitespace, followed by a space. If we find that, we replace it with the same items we captured in brackets, only with a &lt;strong&gt;&lt;/strong&gt; tag pair around the second word.</p>
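<p>If a pattern like this doesn't behave as you expect, it can help to look at what each bracketed group actually captured. The sketch below (DescribeFirstMatch is a hypothetical name) runs the Example 3 pattern through the RegExp object's execute method and joins the first match's SubMatches, which hold the text that $1, $2 and $3 refer to, with "|" characters. SubMatches assumes VBScript 5.5 or later.</p>

<code>function DescribeFirstMatch(strText)
    ' Hypothetical debugging helper for the Example 3 pattern (case sensitive)
    dim objRegExp, colMatches, objMatch
    set objRegExp = new RegExp
    objRegExp.Pattern = "(\.|!|\?)\s([A-Z]+)\s([^02468\s]+)\s"
    set colMatches = objRegExp.execute(strText)
    if colMatches.Count = 0 then
        DescribeFirstMatch = "no match"
    else
        set objMatch = colMatches(0)
        ' SubMatches(0) to SubMatches(2) hold what $1 to $3 refer to
        DescribeFirstMatch = join(array(objMatch.SubMatches(0), objMatch.SubMatches(1), objMatch.SubMatches(2)), "|")
    end if
    set objRegExp = nothing
end function</code>

<p>For example, DescribeFirstMatch("! EVEN num2ber sentence. ODD num3ber sentence.") returns ".|ODD|num3ber", showing that the "EVEN num2ber" sentence was skipped because of the 2.</p>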

<p>Example input text:</p>

<code>THE quick brown fox jumped over the lazy dog? Many red balloons blew up! EVEN num2ber sentence. ODD num3ber sentence.</code>

<p>Example output text:</p>

<code>THE <strong>quick</strong> brown fox jumped over the lazy dog? Many red balloons blew up! EVEN num2ber sentence. ODD <strong>num3ber</strong> sentence.</code> <br><br>]]></description>
				<pubDate>Fri, 07 Nov 2003 09:29:40 +0000</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/code/vbscript-regular-expressions/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=asp&amp;start=0" class="ditto_tag" rel="tag">asp</a>,<a href="/feeds/tag-feed/?tags=code&amp;start=0" class="ditto_tag" rel="tag">code</a>,<a href="/feeds/tag-feed/?tags=expressions&amp;start=0" class="ditto_tag" rel="tag">expressions</a>,<a href="/feeds/tag-feed/?tags=programming&amp;start=0" class="ditto_tag" rel="tag">programming</a>,<a href="/feeds/tag-feed/?tags=reference&amp;start=0" class="ditto_tag" rel="tag">reference</a>,<a href="/feeds/tag-feed/?tags=regex&amp;start=0" class="ditto_tag" rel="tag">regex</a>,<a href="/feeds/tag-feed/?tags=regexp&amp;start=0" class="ditto_tag" rel="tag">regexp</a>,<a href="/feeds/tag-feed/?tags=regular&amp;start=0" class="ditto_tag" rel="tag">regular</a>,<a href="/feeds/tag-feed/?tags=regular-expressions&amp;start=0" class="ditto_tag" rel="tag">regular-expressions</a>,<a href="/feeds/tag-feed/?tags=scripting&amp;start=0" class="ditto_tag" rel="tag">scripting</a>,<a href="/feeds/tag-feed/?tags=vb&amp;start=0" class="ditto_tag" rel="tag">vb</a>,<a href="/feeds/tag-feed/?tags=vbscript&amp;start=0" class="ditto_tag" rel="tag">vbscript</a>
			</item>
	</channel>
</rss>