The Problem
The recent RockYou.com password problems have spawned plenty of debate online about the best way to store passwords and build a site securely.
Part of being a good, security-conscious web developer is paranoia, and it's apparent that the RockYou.com developers could have used a little more of it. They made two mistakes in their work, not one. Their first, and most obvious one, is that they had a SQL injection hole somewhere. Their second was their assumption that their measures to protect their data were enough to do so.
A healthy dose of paranoia would have led their developers to make the opposite assumption - that whatever they did to protect the data, sooner or later someone would be able to access it.
The result of this second mistake is that, rather than simply announcing a security hole has been found and closed, they have had to deal with the fact that the passwords of more than 32 million people have been exposed, in plain text, to an unknown number of people. As most people use the same password for multiple places, and most will be unaware that this has happened, we can safely assume that the access details of millions of email accounts are in the open and unchanged. That's a bad day in code-land by anyone's standards.
Hashing
The solution to the problem is to first assume that all data will be exposed at some point to an intruder of some sort. Once you assume that, it becomes important to ensure that the damage resulting from that exposure is minimal.
Which brings me on to hashes. Hashes are one-way functions that generate a representation, usually a number, of the data put into them. They always generate the same hash from the same data, and there is no simple way to reverse the process.
This makes them incredibly useful for password storage. Instead of storing a user's password, you can store the hash of the password. When a user logs in again, instead of checking the password they type in against the one you have stored, you calculate the hash of the password they type in and compare that to the stored hash.
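As a rough sketch of that flow, assuming Python's standard hashlib module and SHA1 (one of the algorithms this article mentions). The function names and the in-memory `users` dictionary are hypothetical stand-ins for a real user table, not part of any particular site's code:

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> stored hash
users = {}

def hash_password(password: str) -> str:
    """Return the SHA1 hash of the password as a hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    # Store only the hash, never the password itself
    users[username] = hash_password(password)

def check_login(username: str, password: str) -> bool:
    # Hash what the user typed and compare it to the stored hash
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(stored, hash_password(password))

register("alice", "correct horse")
print(check_login("alice", "correct horse"))  # True
print(check_login("alice", "wrong guess"))    # False
```

The password itself is never written to storage; only the 40-character hex hash is kept.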
There are lots of different hashing algorithms, the most commonly used being MD5 and SHA1.
Are Hashes Secure?
Unfortunately, ensuring passwords are stored securely isn't as simple as storing a plain hash of each password. Two of the strengths of hashes are also their largest potential weaknesses: they are small to store and quick to generate.
To generate SHA1 and MD5 hashes of every word in English, for example, takes moments. To store that amount of data is also trivial. To generate hashes of all combinations of letters and numbers, plus a few commonly used punctuation marks, up to say 8 characters, is much slower but still doable without any special setup or equipment.
Tables of precalculated hashes of data like this are easily found online or easily generated. If you have a hash of some data (like a password) and you want to see what that data originally was, you can compare the hash to the entries in your precalculated table. If you find a match, you have discovered the data that was originally used to generate the hash - the password you were trying to find out.
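To make the lookup concrete, here is a small illustration, assuming Python's hashlib and SHA1. The five-word list is a made-up stand-in for a real dictionary, and `dehash` is a hypothetical helper name:

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA1 hash of the text as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# A tiny stand-in for a dictionary word list
wordlist = ["password", "letmein", "dragon", "monkey", "rockyou"]

# Precalculate once: hash -> original word
lookup_table = {sha1_hex(word): word for word in wordlist}

def dehash(stolen_hash: str):
    """Return the original word if its hash is in the table, else None."""
    return lookup_table.get(stolen_hash)

stolen = sha1_hex("letmein")       # what an intruder would find in the database
print(dehash(stolen))              # letmein
print(dehash(sha1_hex("x9$Lq")))   # None, because it is not in the word list
```

The table is built once and can then be reused against every unsalted hash in a stolen database.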
So basic password hashing is, essentially, useless for the majority of users. It is a simple process to compare hashes of basic passwords to a table of precalculated hashes and thereby "dehash" passwords en masse.
Some people recommend nesting hashes as a way to add complexity and therefore more security. Unfortunately, generating tables of nested hashes is almost as easy as generating tables of plain hashes, so nesting is no more secure.
Add Salt!
The solution is to hash more than just the user's password, and this process is called "salting". For example, instead of storing a hash of a user's password, you could store the hash of their email address and their password together.
This is effective because tables of hashes of generated data of more than about 10 characters start to become problematic to generate and store. At around that point, tables must be generated based upon dictionaries and known words, rather than on programmatically generated lists of all possible passwords in a range.
The average length of "email plus password" is easily in the region of 25 characters. Not only that, but if someone worked out that you were using hashes of "email plus password", they would still need to generate a new table for every password they wanted to dehash.
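A minimal sketch of the "email plus password" idea, assuming Python's hashlib and SHA1. The function name is hypothetical, and normalising the email address (trimming and lower-casing) is an added assumption rather than something this article specifies:

```python
import hashlib

def salted_hash(email: str, password: str) -> str:
    """Hash the email address and password together."""
    # Assumption: normalise the email so "A@x.com" and "a@x.com" give the same hash
    combined = email.strip().lower() + password
    return hashlib.sha1(combined.encode("utf-8")).hexdigest()

a = salted_hash("alice@example.com", "letmein")
b = salted_hash("bob@example.com", "letmein")
print(a == b)  # False: same password, different stored hashes
```

Because the two users share a password but not a stored hash, a table precalculated for one of them is no use against the other. One consequence of this scheme is that the hash has to be regenerated whenever a user changes their email address.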
This level of complexity, added to a reasonably strong password policy, ensures that if (or when) your user data is exposed, the work involved in extracting usable passwords from it is going to stop all but the most determined attackers. Not only that, but even they will find extraction of data in bulk prohibitively difficult.
25 Comments
Nice article. Unfortunately it seems like a practice many developers neglect until disaster strikes.
#1, Chris Wiliams, 16 December 2009.
A very comprehensive explanation, thank you!
#2, aex, 16 December 2009.
I've always used seemingly long salts (around 40 characters or so) that were randomly generated. My favorite place to get such a salt is from WordPress: https://api.wordpress.org/secret-key/1.1/
Is there any benefit for using something bigger than SHA1?
#3, Ryan Rampersad, USA, 16 December 2009.
Well-written, though it's sad it has to be written at all. I mean c'mon, hashing passwords is web security 101.
One thing to add to improve security on hashes is a "pepper" in addition to salt. The salt is the non-static stored item (like the user's email) that's different on each entry. But a determined attacker who knew that was the salt could still crack it.
Using a static salt (pepper, if you will) that is stored as its own variable in a script (not in the web directory) would then require the attacker not only to access the MySQL database but also to get into the filesystem.
BTW addedbytes... Tab doesn't work for these comment boxes, I think because tabindex on them is set to 9-digit numbers, it must confuse the browser into thinking they're not tabable.
- Adam
#4, Adam Wolf, United States, 16 December 2009.
Thanks for sharing - I found the bit about hashing useful.
Eoin
#5, Eoin Redmond, Ireland, 22 December 2009.
HMAC is also a good approach and component of a solution to this problem.
#6, Ryan T, 23 December 2009.
Yes, this is a good procedure for providing good security.
Have you looked at Google's login page?
Once you enter your username and move the focus to the password field, some process starts in the background and you can see the data transfer progress bar showing some activity.
Is it some security process? I don't know.
#7, Nanjangud, India, 27 December 2009.
Adam Wolf suggests the same solution that I use as standard on my apps:
app-specific salt ( or pepper, as Adam says )
+ random salt generated whenever pass is reset
+ password
...and SHA-512 that all together.
so even if your pass is "bob" it'll be hashing something like "randomsalt-Staticpepper-bob"
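For illustration only, the combination described above might look something like this in Python, assuming the standard hashlib and secrets modules. The `PEPPER` value, the dash-separated layout and the function names are hypothetical and simply mirror the "randomsalt-Staticpepper-bob" example:

```python
import hashlib
import secrets

# Hypothetical app-wide pepper; in practice it would live in a script
# outside the database, as suggested in comment #4
PEPPER = "Staticpepper"

def new_salt() -> str:
    """Generate a random per-user salt (32 hex characters)."""
    return secrets.token_hex(16)

def hash_password(salt: str, password: str) -> str:
    """SHA-512 over the random salt, the static pepper and the password."""
    combined = f"{salt}-{PEPPER}-{password}"
    return hashlib.sha512(combined.encode("utf-8")).hexdigest()

salt = new_salt()                    # regenerate whenever the password is reset
stored = hash_password(salt, "bob")  # store this alongside the salt
print(len(stored))                          # 128
print(hash_password(salt, "bob") == stored) # True
```

The salt has to be stored next to the hash so the same value can be recomputed at login; only the pepper stays out of the database.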
PHP's hash function with algo comparisons:
http://us.php.net/manual/en/function.hash.php
Per the comments on that page: "The well known hash functions MD5 and SHA1 should be avoided in new applications. Collission attacks against MD5 are well documented in the cryptographics literature and have already been demonstrated in practice. Therefore, MD5 is no longer secure for certain applications."
#8, James, USA, 27 December 2009.
Good article. Security is always on our minds
#9, avanzaweb, Spain, 31 December 2009.
Thanks for this article. I'm a fairly new developer and did not know what salting was; I was curious after setting salts for my WordPress sites without actually knowing what they did.
Your explanation cleared everything up for me, it seems like a straightforward enough idea.
#10, ralcus, 16 January 2010.
Great post. I knew vaguely what salting was, but now I know exactly what it is. This post has cleared up any questions I had.
Thanks.
#11, Shane Heaters, UK, 19 February 2010.
Will, that is vendicated your opinion, dude. =)
#12, term paper services, 26 February 2010.
1. add salt
2. add user-agent and other fingerprinting techniques
3. use ssl if possible
4. make sure your server is secure and properly configured.
5. don't use a write enabled db user if you're just reading from the db.
6. store your backups in a very secure place.
7. and never ever send user details via email.
I'm often surprised that there is so little hacking, considering how few people follow the basics.
Replies: #21.
#13, murray, south africa, 4 March 2010.
Everyone kept on talking about salting, now I know that it really did make sense. Thank you for enlightening me.
#14, Hamsterkäfig, Germany, 19 March 2010.
Awesome. Every bit of it was useful.
#15, Kiran Ruth R, Indai, 20 March 2010.
This issue is usually dealt with once the horse has bolted. I tend to always hash passwords but never thought of combining both email and password together. Definitely going to explore that one!
Cheers!
Replies: #22.
#16, SiteOne Web Design, UK, 5 June 2010.
Nice article. I have been using simple sha1 hashes for my passwords, and would like to implement a salting method. Is there a way to retroactively salt the passwords that have already been created? Or do I have to tell those users to create a new password?
Replies: #20.
#17, Scott Lieberman, 14 April 2011.
This is probably obvious, but one way to retroactively salt existing unsalted passwords would be to crack them, possibly using one of the techniques mentioned in this excellent article, and then rehash them, with salt. However, it might be too hard to crack the stronger passwords.
A better idea would be to change your hashing code to do a two-step hash: sha1( concat( sha1(password), salt) ). To convert to that system, you'd run a one-time batch process to apply the two-step hash to all existing passwords.
But this would mean that you'd be stuck with the more expensive double-hash forever. A final improvement, then, would be to do the two-step hash mentioned above, marking the existing passwords as double-hashed, but use a one-step hash for new passwords. The database would then migrate over time to the one-step hash as users changed their passwords, and you could eventually retire the double hash code.
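A rough sketch of that migration in Python, assuming hashlib and SHA1 as in the comment above. The function names and the `double_hashed` flag are hypothetical, and the one-step form sha1(password + salt) is an assumption, since the exact layout for new passwords isn't spelled out:

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA1 hash of the text as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def migrate_legacy_hash(old_hash: str, salt: str) -> str:
    """One-time batch step: wrap an existing unsalted sha1(password) with a salt."""
    return sha1_hex(old_hash + salt)

def one_step_hash(password: str, salt: str) -> str:
    """Hash used for new or changed passwords (assumed layout: password + salt)."""
    return sha1_hex(password + salt)

def verify(password: str, salt: str, stored: str, double_hashed: bool) -> bool:
    """Check a login attempt against either kind of stored hash."""
    if double_hashed:
        candidate = sha1_hex(sha1_hex(password) + salt)
    else:
        candidate = one_step_hash(password, salt)
    return candidate == stored

# Existing account: only the old unsalted hash is available to the batch job
legacy = sha1_hex("hunter2")
migrated = migrate_legacy_hash(legacy, "salt-for-user-1")
print(verify("hunter2", "salt-for-user-1", migrated, double_hashed=True))   # True

# New account: hashed in one step from the start
fresh = one_step_hash("hunter2", "salt-for-user-2")
print(verify("hunter2", "salt-for-user-2", fresh, double_hashed=False))     # True
```

The batch job never needs the original passwords, because it only wraps the hashes that are already stored.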
#18, Pearitybit, USA, 17 January 2012.
Very nice explanation, and very helpful in light of the LinkedIn break-in. I've taken the liberty of creating a permanent citation at http://www.webcitation.org/68FhbGnz4
#19, Elliot Wilen, USA, 7 June 2012.
#17 It's smarter to just add a "migrated" boolean to your table, start everyone at false, and the next time a user logs in, check whether they have been migrated. If not, treat the login like a password change, maybe even asking them to log in again, as Gmail sometimes does. Then produce the newly salted hash, store it, and mark the account as migrated. This way you don't have to wait until all users change their passwords to retire the original hash; you just need to wait until they have all logged in, which is much faster.
#20, Javier Matusevich, 11 June 2012.
#13 If user details should not be sent through email, what is the better way to notify the user about the new password?
#21, Shyju, India, 12 June 2012.
#16 I've read in lots of places that you shouldn't use the username or user email as salt, because that is an easily known or guessed salt and makes cracking easy.
#22, Manoah F. Adams, United States, 14 June 2012.
I have yet to find any explanation as to why salts are any use in light of the fact that any hacker who gets hold of a dump of your table of password hashes will also likely get to the table of salts. It seems to me that this only adds a slight layer of difficulty, but hardly an additional layer of security. I think that Google's optional security measure of sending a verification code to the user's cell phone is, though awkward, an example of a truly robust additional layer of security.
Replies: #26.
#23, Manoah F. Adams, United States, 14 June 2012.
Manoah: The reason is that without salting somebody can just compare all passwords in a database against a simple rainbow table and extract a large number of passwords in one go, very quickly. With a salt that is unique to each hash, they would effectively require one rainbow table for each hash to find the password. Not impossible, but a lot of work, and a lot of resources would be required to achieve it.
Salting isn't the perfect solution (using bcrypt or a similar hashing algo that is slow to run by design is better), but even the two factor system isn't infallible (as Cloudflare recently posted: http://blog.cloudflare.com/post-mortem-todays-attack-apparent-google-app ).
#24, DaveChild, United Kingdom, 14 June 2012.
Thanks for that explanation. I am working on a project that uses hashing and salt to protect card data and compare card data. This article was a good foundation course for me.
#25, Meena Santhanagopalan, India, 26 September 2012.