The fourth part of the Writing Secure PHP series, covering cross-site scripting, cross-site request forgery and character encoding security issues.
In Writing Secure PHP, Writing Secure PHP, Part 2 and Writing Secure PHP, Part 3 I covered many of the common mistakes PHP developers make, and how to avoid some potential security problems. This article covers some of the more advanced security problems common to PHP on the web.
Cross-site scripting (often abbreviated to XSS) is a form of injection, where an attacker finds a way to have the target site display code they control. In its most basic form, this can be as simple as a site that allows HTML characters in usernames, where someone can specify a username like:
Now, whenever someone sees my username on the target site, the script I've added to my username will run. I could potentially use this to grab the person's login information, log their keystrokes - any number of nefarious activities.
As a developer, you can combat this type of attack by encoding or removing HTML characters (watch out for character encoding issues, as outlined next). Even better than stripping out unwanted characters is to allow a whitelist of safe characters in usernames and other fields. Be especially careful with e-commerce sites where you are listing orders in a CMS - an XSS vulnerability may allow an attacker to gain administrative access to your CMS. It is also important to turn off TRACE and TRACK support on the server, as if there is a vulnerability (and always assume that despite your best efforts there will be) these potentially allow an attacker to steal a user's cookie.
As a user you are also vulnerable to this sort of attack, and it is very difficult, at the moment, to make yourself safe against it. Vigilance is key, and to that end I have released a userscript that warns you about third party scripts (for users of GreaseMonkey, Opera or Chrome).
Despite the similar name, CSRF is unconnected to XSS. CSRF is a form of attack where an authenticated user performs an action on a site without knowing it.
Let's assume that Jack is logged in to his bank, and has a cookie stored on his computer. Each time he sends an HTTP request to the bank (i.e., views a page or an image on a page) his browser sends the cookie along with the request so that the bank knows that it's him making the request.
Jill, meanwhile, runs a different website and has managed to get Jack to visit it. One of the items on the page is in fact loaded from the bank, for example in an iframe. The URL of the iframe or request contains instructions to the bank to transfer money from Jack's account to Jill's. Because the request is coming from Jack's computer, and includes his cookie, the bank assumes it is a legitimate request and the money is transferred.
This type of attack is extremely dangerous and virtually untracable. As a developer, your job is to protect against it, and the best way to do that is to remember Rule Number One: Never, Ever Trust Your Users. No matter how authenticated they are, do not assume every request was intended.
In practical PHP terms, you can combat CSRF with several relatively simple coding habits. Never let the user do anything with a GET request - always use POST. Confirm actions before performing them with a confirmation dialog on a separate page - and make sure both the original action button or link and the confirmation were clicked. Even better, have the user enter information like letters from their password on the confirmation page.
Character encoding in PHP and associated database systems is worthy of its own series. In any one request, there may be more different character encodings in use than you might think.
For example, a single request and response (uploading a file to a server and writing information to a database) may involve all of the following differently items with different character encodings: the HTTP request headers, post data, PHP's default encoding, the PHP MySQL module, MySQL's default set, the set of each table being used, a file being opened and read, a new file being created and written, the response headers and the response body.
English-speaking developers generally don't have much cause to get embroiled in character encoding issues, and that results in a lot of developers with a serious lack of understanding of how character encodings work and fit together. For those that do have a reason to look at character encodings, usually that interest ends with the setting of the response's character set.
However, character sets are a fundamental part of all web development. English alone can exist in any one of a wide variety of sets, and developers are usually familiar with the most common two: ISO-8859-1 and UTF-8. Fewer are familiar with UCS-2, UTF-16 or windows-1252. Still fewer are familiar with commonly used alternative language sets (e.g, GB2312 for Chinese).
Which, in a very roundabout way, brings me on to the security pitfalls of character encodings. Where data is processed by PHP using one character set, but a database server uses a different character set, a character (or series of characters) deemed safe by PHP may in fact allow SQL injection against the database.
PHP security expert Chris Shiflett has written about this issue and included an example of how it can be exploited to allow SQL injection even where input is sanitized using addslashes().
The solution is to always always use mysql_real_escape_string() rather than addslashes() (or use prepared statements / stored procedures), and to explicitly state character sets at all stages of interaction. Ideally, use the same character set throughout your system (UTF-8 is recommended) and where PHP allows you to specify a character encoding for a function (e.g., htmlspecialchars() or htmlentities()), make use of it.
It's not just SQL that's vulnerable as a result of character encoding bugs. Cross-site scripting is possible even where HTML characters are escaped if character sets are not handled properly. Fortunately, once again that is simple to avoid by properly setting character encodings at all stages of the process and specifying character encoding for functions where possible.