How to Secure a Website Against Weak User Passwords

My Own Dubious Password Advice

Recently I read an article in New Zealand's Dominion Post, which gave advice on choosing strong passwords. I decided to write a letter to the editor countering what I saw as some errors and omissions in that advice.

The first point I made was that very strong passwords are only required if either:

An attacker has gained read access to the website's internal files (in particular the file containing password hashes, but there wasn't room for that much detail in my letter).
The website allows unlimited failed login attempts to a user account.

I suggested that a word randomly chosen from a dictionary would supply sufficient protection, implying that the above two scenarios shouldn't happen on a well-implemented website.

I realise now (especially after seeing my letter in print) that this is perhaps not the best advice, for two reasons:

In many cases an attacker will gain access to a site's internal files, and the simplest and least risky way the attacker can exploit their success is to take a copy of the password hash file, and then search off-line (i.e. on their own system) for passwords that match the check values in the file.
Many websites probably do allow unlimited failed login attempts.

My advice might have made sense in an ideal world, but the real world isn't ideal, and the average user has no easy way to judge how good the security of a website really is. It may be that the requirement for a user to choose a strong password compensates for the shortcomings of a website's security systems, but if that is the case, then so be it.

(The other point I made in my letter was that you shouldn't enter a critical password into any system booted off a hard disk, an issue which I discuss in more detail here.)

Bears and Running Shoes

Another reason for choosing the strongest possible passwords is the bear/running-shoe principle. If you haven't already heard the joke, it goes like this:

Two campers in the forest have just woken up and are walking around their campsite barefoot. Suddenly they see a bear running towards them. One of the campers starts putting his running shoes on. The other camper says "There's no point putting your shoes on, because you can't run faster than a bear." The first camper replies "I don't have to run faster than the bear; I just have to run faster than you."

Something similar applies to password strength. An attack against passwords for a website might succeed because those responsible for the website's security allowed an attacker to read the password hash file, or they allowed unlimited failed login attempts. If one of these things happens, and your password is weaker than everyone else's, then your password will probably be stolen before anyone else's. In the worst case, your account might be the only one to be broken into, and there will be no hard proof that a general breakdown in security has happened on the website. You will be left with the responsibility, even though the website was partly at fault.

On the other hand, if you choose a strong password, and it is recovered anyway, you will be one of many, and it will be more obvious that the website has suffered a general security failure.

Advice to Developers

My original advice was not very good advice for users, but it can be transmuted into good advice for developers. This new good advice will be based on attempts to answer the following question:

How can a website reduce the required strength (AKA "guess factor") of a user's password to an absolute minimum?

The "Guess Factor"

What I call the guess factor of a password is the number of guesses an attacker requires to discover the password, on the assumption that there is some means for the attacker to determine whether or not a possible value of the password is the correct password.

This factor is not really a property of the password itself; it's a property of how the password is chosen. To get a precise value, we need to assume that the password is randomly chosen from some precisely defined set of possible values. In practice this doesn't happen, because users just "think" of some value for a new password, and it is hard to say what set of possible passwords the password was chosen from.

To estimate the guess factor of your own passwords, you have to create an accurate mathematical model of the thought processes you follow when creating new passwords, and you also have to model what information a potential attacker might have about any of the information you used in order to think up a password. If that's too hard and too complicated, and you still need to calculate an accurate guess factor, then you have to alter your password choosing process so that it can be mathematically modelled more easily (which in practice means that you must accept password choices made by a random password generator).

For the sake of this article, I will ignore these complications, and just assume that the guess factor can be calculated. For example, if you choose a word completely randomly from a 10000 word dictionary, then your guess factor is 10000. If you add two completely randomly chosen digits to the end of your word, your guess factor has gone up to 1 million. And so on.
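The arithmetic in these examples can be checked with a few lines of Python. This is only a sketch: the function name guess_factor is my own illustration, and it assumes that every part of the password is chosen independently and uniformly at random, as in the examples above.

```python
import math

def guess_factor(*choice_counts: int) -> int:
    """Number of equally likely passwords when each part is an independent random choice."""
    return math.prod(choice_counts)

# One word chosen at random from a 10000-word dictionary.
word_only = guess_factor(10_000)

# The same word followed by two random digits (10 choices each).
word_plus_digits = guess_factor(10_000, 10, 10)

print(word_only)                    # 10000
print(word_plus_digits)             # 1000000
print(math.log2(word_plus_digits))  # just under 20, i.e. about 20 bits
```

Expressing the result in bits (the base-2 logarithm) makes it easier to compare schemes that differ by many orders of magnitude.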

Preventing The Password File Attack

The weakest way for any system to check a user password is to have a copy of the password to check against. This is very weak because that copy of the password might be read by an attacker, who can then use it to log in to the application. To avoid this possibility, we can store a hash of the password instead of the password itself, where the hash has been calculated in a manner which is not easily reversible. The login application checks a user-supplied password by applying the same hash function to the supplied password and comparing the result to the stored hash value. Even if an attacker could read the hash value, they cannot use it to derive the password. (Well actually this is not quite true, as will be explained soon.)
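As a rough illustration of the store-a-hash idea, here is a minimal Python sketch using only the standard library. The function names, the per-user random salt and the iteration count are my own assumptions and are not discussed in this article. The salt and the deliberately slow PBKDF2 function are common practice, intended to make each guess against a stolen check value more expensive.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware

def make_check_value(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the password itself."""
    salt = os.urandom(16)  # assumption: a fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the supplied password and compare it with the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = make_check_value("walnut47")
print(verify_password("walnut47", salt, stored))  # True
print(verify_password("walnut48", salt, stored))  # False
```

Note that the login code never needs the original password again after registration; it only needs the salt and the digest.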

For some reason, those who implemented password security decided that hashing was such a good solution that there was no need to protect the password file from prying eyes. On Unix systems, user names and password hash values were traditionally stored in a file called /etc/passwd, and this file could be read by any user on the system. Unfortunately, if you can guess a password value and compare its hash against the stored hash value, then you can determine whether your guess is correct. Computers keep getting faster, and these days it is entirely feasible to try 1,000,000,000 possible passwords against a given hash value. If a user has chosen a password from a set of fewer than 1,000,000,000 possible values, then their password can be found by anyone with access to the file of password hash values.
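To make the guessing procedure concrete, here is a toy Python sketch of an off-line search against a single hash value. Everything in it is an assumption made for illustration: the five-word dictionary, the function names, and the use of plain unsalted SHA-256 as the hash function.

```python
import hashlib

def sha256_hex(password: str) -> str:
    """Stand-in for whatever hash function the password file uses."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def offline_search(stolen_hash: str, candidates):
    """Return (password, guesses_used), or (None, guesses_used) if nothing matches."""
    guesses = 0
    for candidate in candidates:
        guesses += 1
        if sha256_hex(candidate) == stolen_hash:
            return candidate, guesses
    return None, guesses

toy_dictionary = ["apple", "bridge", "candle", "dolphin", "ember"]
stolen = sha256_hex("dolphin")  # pretend this value was read from the password file

found, used = offline_search(stolen, toy_dictionary)
print(found, used)  # dolphin 4
```

The search never contacts the system being attacked, so nothing on that system can count the guesses or limit them.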

Taking this difficulty into account, it was eventually decided by experts on password security that it was not such a good idea to put password check values where anyone could read them, and now they are put into another file called /etc/shadow, which can only be read by processes that really need to read it.

Setting file permissions to control access to the password file is a start, but it is probably not enough.

If, for example, the password file is only readable by the "root" account, then if a criminal attacker is able to read this file, they probably did so by gaining access to the root account. If an attacker has gained access to root, then they are in a position to do all sorts of mischief, and it might seem pointless to worry about the fact that they can read the password file. But the advantage of reading a copy of the password file is that mischief can be done without causing any visible changes to the system being attacked. The attacker takes a copy of the password file, searches for user passwords that can be guessed, logs in to user accounts using successfully guessed passwords, steals stuff, and it all looks like the users themselves are doing it.

So security-wise, there is still something to be gained by making the password file really hard to read, for example so hard to read that even the root account isn't allowed to read it.

A Separate Password Management Service

A good way to stop even root from reading the password file is to put the password maintenance function on a completely separate computer or hardware device, which we might call the password module. The password module provides applications on other computers on your internal network with the following services:

Allow local applications to "register" and "log in" as users of the service.
Accept requests from local applications to create new users with passwords.
Accept requests from local applications to change user passwords.
Verify the correctness of supplied passwords.

The password module could be implemented as a dedicated device whose only function is to perform these services. Keeping the interface and implementation of this device simple will reduce the number of ways it could be broken into. Alternatively, the password module could be on an ordinary computer, but one which is connected to the network by a tightly controlled "password service firewall", which only allows requests specific to the password service to pass through to the password module. The advantage of having the password module on an ordinary computer is that software upgrades, hardware upgrades, replication and backup (you really don't want to lose all of your users' password check values) can all be done the same way that you normally do them for an ordinary computer.
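The essential property of such a module is that its interface only ever answers "yes" or "no", and never hands out a check value. The following Python class is a hypothetical, in-process sketch of that interface. The class and method names are my own, the registration of client applications is left out, and a real module would sit behind a network boundary rather than in the same process.

```python
import hashlib
import hmac
import os

class PasswordModule:
    """Hypothetical stand-in for a separate password device or service."""

    def __init__(self, iterations: int = 200_000):
        self._iterations = iterations
        self._records = {}  # user name -> (salt, digest); never exposed to callers

    def _digest(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self._iterations)

    def create_user(self, user: str, password: str) -> bool:
        if user in self._records:
            return False
        salt = os.urandom(16)
        self._records[user] = (salt, self._digest(password, salt))
        return True

    def change_password(self, user: str, old_password: str, new_password: str) -> bool:
        if not self.verify(user, old_password):
            return False
        salt = os.urandom(16)
        self._records[user] = (salt, self._digest(new_password, salt))
        return True

    def verify(self, user: str, password: str) -> bool:
        record = self._records.get(user)
        if record is None:
            return False
        salt, stored = record
        return hmac.compare_digest(self._digest(password, salt), stored)
```

There is deliberately no method that returns a salt or a digest, so an application that has been compromised can ask questions of the module but cannot copy the password file.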

Detecting that Password Information has been Read

In addition to the above approach, or perhaps instead of it, it is possible to protect a password file by detecting whether or not it has been read.

Of course you cannot tell if a file has been read just by looking at it – information that has been read by an unauthorised reader is indistinguishable from information that has not been read by an unauthorised reader.

But there is one easy way to know if someone has read a file, which is if they supply information to you which could only be gained by someone who has read the file. In the case of a password file, what this means is that you need to fill it with fake user names and passwords. For example, if half the user names and passwords in a password file are fake, then it is very likely that an attacker who has read the password file will attempt to log in to a fake user account, and you will immediately know that the file has been compromised.

Information about which users are fake users should not be stored in the password file itself, otherwise the attacker will know which values not to use. Ideally it should not be stored anywhere on your local network, which means that an access to a fake user account should trigger a signal sent to a completely external system, indistinguishable from a legitimate information delivery to normal users. (Another tricky point is that the fake users' passwords have to be equivalent in weakness to the real users' passwords, so that (1) the attacker can break them, and (2) they are not obviously fake. This implies you have to know as much about the weaknesses of real users' passwords as the attacker does.)
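One way to arrange this, sketched below in Python, is for the website to report every login attempt to the external system in exactly the same way, so that the website itself never needs to know which accounts are fake. The class and function names are hypothetical, and the "external" monitor is just an object in the same process here; a real one would run outside your local network.

```python
class ExternalMonitor:
    """Hypothetical stand-in for a system outside the local network."""

    def __init__(self, decoy_users):
        self._decoys = frozenset(decoy_users)  # known only to the monitor
        self.alerts = []

    def report_login_attempt(self, user: str, source_ip: str) -> None:
        # Any attempt on a decoy account means the password file has probably been read.
        if user in self._decoys:
            self.alerts.append((user, source_ip))

def handle_login_attempt(monitor: ExternalMonitor, user: str, source_ip: str) -> None:
    """Called by the website for every attempt; real and fake users look identical here."""
    monitor.report_login_attempt(user, source_ip)

monitor = ExternalMonitor(decoy_users={"m.tanner"})
handle_login_attempt(monitor, "real.user", "198.51.100.7")
handle_login_attempt(monitor, "m.tanner", "203.0.113.9")
print(monitor.alerts)  # [('m.tanner', '203.0.113.9')]
```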

Preventing Unlimited Logins

The other vulnerability of weak passwords is that an attacker can make repeated login attempts, trying different passwords until they find one that is accepted. For example, if a user changes their password for a system once a year, and the system can process 100 login attempts per second, then an attacker has something like 3,000,000,000 opportunities to discover the user's password.

Of course common sense says that if 3,000,000,000 failed login attempts have occurred for a given user, then almost all of those attempts were probably not from the legitimate owner of the account.
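The figure of roughly 3,000,000,000 comes from a simple calculation, shown here as a Python sketch. The second calculation reuses the guess factor of 1 million from the earlier word-plus-two-digits example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
ATTEMPTS_PER_SECOND = 100

attempts_per_year = ATTEMPTS_PER_SECOND * SECONDS_PER_YEAR
print(attempts_per_year)  # 3153600000

# Time needed to try every value of a password with a guess factor of 1 million.
seconds_to_exhaust = 1_000_000 / ATTEMPTS_PER_SECOND
print(seconds_to_exhaust / 3600)  # about 2.8 hours
```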

This vulnerability can be reduced by disallowing unlimited login attempts. Preventing repeated login attempts is called account lockout. The problem with account lockout is that the real user can get locked out when they don't want to be locked out. Badly implemented account lockout can be used by an attacker to perform a denial of service attack against the website user. For example:

If account login attempts are throttled to 1 per minute, then an attacker can lock out a legitimate user by continuously performing 2 login attempts per minute.
If an account is completely locked out after N failed login attempts, then an attacker can lock out a legitimate user by performing N logins with invalid passwords.

Lockout Should Depend on IP Address

A straightforward way to avoid denial of service problems is to make throttling and full lockout a function of IP address. An attacker has one IP address, and the legitimate user has another IP address. If lockout criteria are applied separately to each IP address, then it will not be possible for an attacker to lock out the real user.
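A minimal Python sketch of per-IP lockout might look like the following. The class name, the default limits and the sliding-window approach are my own assumptions, chosen only to illustrate the idea that failures are counted separately for each combination of user and IP address.

```python
import time
from collections import defaultdict, deque

class PerIpLoginLimiter:
    """Counts failed logins separately for each (user, IP address) pair."""

    def __init__(self, max_failures: int = 100, window_seconds: float = 30 * 24 * 3600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # (user, ip) -> timestamps of recent failures

    def _recent(self, user: str, ip: str, now: float) -> deque:
        failures = self._failures[(user, ip)]
        # Discard failures that have aged out of the window.
        while failures and now - failures[0] >= self.window_seconds:
            failures.popleft()
        return failures

    def is_allowed(self, user: str, ip: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        return len(self._recent(user, ip, now)) < self.max_failures

    def record_failure(self, user: str, ip: str, now: float = None) -> None:
        now = time.time() if now is None else now
        self._recent(user, ip, now).append(now)

limiter = PerIpLoginLimiter(max_failures=3, window_seconds=60)
for second in range(3):
    limiter.record_failure("alice", "203.0.113.9", now=second)

print(limiter.is_allowed("alice", "203.0.113.9", now=10))   # False: this address is locked out
print(limiter.is_allowed("alice", "198.51.100.7", now=10))  # True: the real user is unaffected
```

The optional now argument exists only so that the behaviour can be demonstrated without waiting for real time to pass.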

The attacker may of course have more than one IP address at their disposal, and they may even have thousands to play with, if they control a botnet. This can be a problem if lockout is made IP-dependent. For example, if only 100 failed login attempts are allowed per month per IP address, and the attacker has 1000 IP addresses under their control, the attacker can perform 100,000 attempts on a user's password each month.

One solution to this problem is part of a solution to the general problem of botnets – if thousands of Internet computers are being used for some illegitimate purpose, the owners of those computers should be notified that their computers are very probably being used illegitimately (unless they just happened to be a real user of the system at that time), and their Internet providers should also be notified. One hopes that most owners of "owned" computers will be sufficiently disconcerted to do something about the problem (like reinstalling, and learning how to secure their computers better).

If the actions of a botnet are traced in this way, then the owner of the botnet has to weigh the benefits and costs of how they use it. It seems pointless to use a botnet of 1000 computers that have already been broken into in order to break into one website account belonging to one user of one computer that the botnet owner hasn't broken into. By using the botnet the owner exposes its existence and risks losing all of it. Criminals using owned computers to hack into other computers will prefer to use techniques with better odds than those of guessing random passwords (for instance, looking for unpatched security holes).

Automatic Password Extension

Another way to mitigate password guessing is to directly contact the user whose account is under attack. Your website could send the user an email warning them about the attempts to break in to their account, and the email could advise them to change their password and make it stronger. Better still, you could change the password for them, and then send an email telling the user what the new password is. To avoid losing the security of the existing password, the password change could consist of adding a new secondary password to the existing (primary) password, where the secondary password is specified in the email, and will be asked for in addition to the primary password the next time the user tries to log in to their account.
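The secondary-password idea could be sketched in Python as follows. The names are hypothetical, sending the email is left to the caller, and the passwords are held as plain text purely to keep the sketch short; a real system would store check values for both passwords rather than the passwords themselves.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

@dataclass
class Account:
    primary_password: str                     # plain text only to keep the sketch short
    secondary_password: Optional[str] = None  # set once an attack has been detected

def extend_password(account: Account) -> str:
    """Attach a random secondary password and return it so it can be emailed to the user."""
    account.secondary_password = secrets.token_urlsafe(9)
    return account.secondary_password

def login_ok(account: Account, primary: str, secondary: str = "") -> bool:
    primary_ok = hmac.compare_digest(
        primary.encode("utf-8"), account.primary_password.encode("utf-8"))
    if account.secondary_password is None:
        return primary_ok
    secondary_ok = hmac.compare_digest(
        secondary.encode("utf-8"), account.secondary_password.encode("utf-8"))
    # Both passwords are now required, so the original password keeps its value.
    return primary_ok and secondary_ok

account = Account(primary_password="walnut47")
emailed = extend_password(account)
print(login_ok(account, "walnut47"))           # False: the secondary password is now required
print(login_ok(account, "walnut47", emailed))  # True
```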

Whether or not this approach is acceptable depends on whether a user wants access to their account to potentially depend on them having immediate access to the relevant email account, which is why I would suggest making it an option configurable by the user.

IP Address Restrictions

I've already mentioned that account lockout could be made IP dependent, i.e. applying login restrictions on a per-IP basis. Even better would be to distinguish IP addresses according to the probability that the IP address belongs to the legitimate owner of the account.

The most extreme approach would be to identify the user's specific IP address, and only allow logins from that address. For some users and some accounts, that may be an ideal solution, and it would create peace of mind to know that some hacker from some foreign country cannot log in to your account even if they do know the password. However, in many cases, it is likely to be too restrictive:

If a user logs in to a system with dynamic IP address assignment, they will have to specify a range of IP addresses that they might be using. (To know what this is, they will need reliable information from their ISP.)
Details relevant to their Internet connection could change without notice, for example an ISP might start using or stop using a particular block of addresses.
It might be more practical to specify a domain name for their account.
Sometimes users want to access a website account regularly from more than one location, for example from work and from home.

To be useful, the IP or domain name restrictions for an account have to be configurable by the user. They may need to be combined with options for account lockout, so that, for example, a user can log in from any IP address, but account lockout options are stricter for IP addresses not identified as belonging to the user's regular ISP.
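As a sketch of the last suggestion, the following Python function picks a failed-login limit according to whether the source address falls inside one of the address ranges the user has listed for their account. The function name, the limits and the example address ranges (taken from the blocks reserved for documentation) are all assumptions made for illustration.

```python
import ipaddress

def max_failures_for(source_ip: str, trusted_networks: list[str],
                     trusted_limit: int = 20, untrusted_limit: int = 3) -> int:
    """Allow more failed attempts from address ranges the user has listed as their own."""
    address = ipaddress.ip_address(source_ip)
    for cidr in trusted_networks:
        if address in ipaddress.ip_network(cidr):
            return trusted_limit
    return untrusted_limit

# Ranges the user has configured for their account, e.g. home and work ISPs.
user_networks = ["203.0.113.0/24", "2001:db8::/32"]

print(max_failures_for("203.0.113.42", user_networks))  # 20
print(max_failures_for("198.51.100.7", user_networks))  # 3
```

The same check could be used to refuse logins outright from unlisted addresses, for users who prefer the stricter policy.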

Conclusion: The Allure of Simplicity

Passwords have a certain simplicity. The developer of a login service assumes that a user is who they are meant to be if and only if they know the password. Your application performs a simple check to see if the password is valid, and that is it.

This keeps things simple for the application developer, because everything depends on the password. The system refuses to take anything into account other than the fact that a login request for a user is accompanied by a correct password.

But life is harder for the user if everything depends on the password. The user has to create unique unguessable passwords. The user has to create a different unguessable password for each service that they use. The user has to remember all those passwords. If the user writes the passwords down, they have to worry about someone else stealing the list. If the user doesn't write the passwords down, they have to worry about forgetting them.

Life is more complicated for the developer if the password list has to be maintained on a physically separate system, and if user-configurable options are provided for account lockout, notifications of attempted break-ins (with optional password extension) and IP address or domain name restrictions. But these options can make life a little easier for the user (even though the user has to think about what they mean), because they can make the security of the user's account not quite so dependent on the unguessability of a single random and hard-to-remember item of information.
