Security in software: seriously, leave it to the experts
A common failure in software engineering (and indeed in many aspects of life) is people writing their own security code by hand and getting it subtly wrong.
This article explores that failure and explains why most security work is best done with existing frameworks and functions. It is written to be understandable even if you are not familiar with software development, with the exception of the at-a-glance section below.
Table of Contents
· At a glance: what we should and shouldn’t do ourselves
∘ Things we should do
∘ Things we shouldn’t do
· A basic overview of password security
∘ Encryption (hashing) algorithms
∘ Rainbow tables
∘ More computationally expensive algorithms
· The problem with doing it yourself
· Leave it to the experts
At a glance: what we should and shouldn’t do ourselves
As developers, when we build software, there are certain security practices we should and should not engage in. This is a non-exhaustive list.
Things we should do
- Validate ALL user input, and escape it wherever it is output, to prevent cross-site scripting attacks and other malicious code insertion
- Use parameterised database queries to prevent SQL injection
- Ensure all data that must remain private at all times, like API keys and secrets and user access tokens, are stored securely
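As a rough illustration of the first two items, here is a minimal Python sketch using only the standard library. The `users` table, the `find_user` and `render_greeting` helpers, and the sample data are all hypothetical, made up for this example; a real application would normally lean on its framework's own query and templating layers.

```python
import html
import sqlite3

# In-memory database with a hypothetical "users" table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))


def find_user(name: str):
    # The "?" placeholder binds the value separately from the SQL text,
    # so the input is always treated as data, never as part of the query.
    cur = conn.execute("SELECT name, email FROM users WHERE name = ?", (name,))
    return cur.fetchall()


def render_greeting(name: str) -> str:
    # Escape user-supplied text before placing it into HTML output.
    return "<p>Hello, " + html.escape(name) + "</p>"


malicious = "alice' OR '1'='1"
print(find_user("alice"))      # the real row
print(find_user(malicious))    # no rows: the whole string is treated as a name
print(render_greeting("<script>alert(1)</script>"))
```

The point is not the specific helpers but the habit: user input never gets pasted directly into a query or a page.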
Things we shouldn’t do
- Attempt to write our own encryption or hashing algorithms, or random number generators for cryptographic use, or generate our own salt values by hand
- Hardcode a specific encryption or hashing algorithm into our application, unless there is an extremely good reason (e.g. a complete lack of any standardised solution in the language in question)
- Configure any security-related part of our software unless we know exactly what we are doing — leave it to someone better qualified (webserver configuration, etc)
A basic overview of password security
Encryption (hashing) algorithms
When you register for a website or service, you are asked to create a password. The website will use a one-way algorithm, known as a hashing algorithm (often loosely called encryption, although true encryption can be reversed with a key), to create a “hash” of your password. A hash is effectively a ‘string’ of letters and numbers (in computer science, a ‘string’ is like a word: a sequence of letters or other characters). The website then stores this hash, alongside your username, in its database.
The hash, and the one-way algorithm used to make it, fulfil a few requirements:
- The hash is always the same length, no matter what password is given
- The hash bears no visible relation to the password: if you change the password by even one letter or number, the hash changes completely, and never in a predictable manner
- You cannot feasibly turn the hash back into the password — i.e. the algorithm only goes one way
This hash means that the website doesn’t need to store your password: when you next log in, it can check whether the hash of the password you give is the same as the one it has in its database. This is a lot more secure than storing the actual password, because if someone hacks the website, your password is not directly exposed.
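To make these properties concrete, here is a small Python sketch. It uses SHA-256 from the standard library purely because it is easy to demonstrate; that choice, the `sha256_hex` and `check_login` helpers, and the sample passwords are assumptions for illustration, and a single fast hash like this is not a recommendation for real password storage.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    # Hash the text and return the result as a string of hex characters.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Property 1: the hash is always the same length.
short_hash = sha256_hex("cat")
long_hash = sha256_hex("a much longer password than the first one")
print(len(short_hash), len(long_hash))

# Property 2: a one-character change gives an unrelated-looking hash.
a = sha256_hex("password1")
b = sha256_hex("password2")
differing = sum(1 for x, y in zip(a, b) if x != y)
print(a)
print(b)
print(differing, "of", len(a), "hex characters differ")

# Logging in: hash the attempt and compare it with the stored hash.
stored = sha256_hex("correct horse")


def check_login(attempt: str) -> bool:
    # compare_digest compares in constant time to avoid leaking information.
    return hmac.compare_digest(sha256_hex(attempt), stored)


print(check_login("correct horse"), check_login("wrong guess"))
```

Notice that the website never needs the original password after registration: only the stored hash is used for the comparison.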
You might wonder how such an algorithm can always be secure: the simple answer is that it can’t, at least not forever. Over time, vulnerabilities in such algorithms are gradually found, or computing power grows to the point where it becomes plausible to compute the hash of every likely password in advance. The latter attack has worked in the past, but it is now defeated by a technique called salting, which is described later.
Rainbow tables
If you think about it, you could compromise any hashing algorithm by just computing the hash of every possible password up to a certain length, e.g. every password up to 20 characters long, and storing the results in a lookup table. A table full of pre-computed hash values like this is known as a “rainbow table”.
In theory that sounds easy, but it’s actually extremely difficult in practice, because of how many possible permutations there are. Imagine, for example, that your password can contain A-Z, a-z, and 0–9 as the only possible characters.
That is 62 different possible characters (upper and lower case letters are different characters in computer terms). If your password is one character long, there are 62 possible combinations. What if it’s two characters long? 62², which is 3844 combinations. Three characters long? 62³, 238328 combinations.
As you can see, this gets big very quickly. Even with this restrictive character choice, a rainbow table of just the passwords that are exactly 10 characters long needs 62¹⁰ entries, which is about 8.39e+17 (approx 839299370000000000); covering every shorter password as well adds only around 1.6% more. You won’t finish that many hashes anytime soon with your home PC.
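The arithmetic is easy to check for yourself. The sketch below counts the possibilities in Python; the figure of one billion hashes per second is purely an assumed round number for illustration, not a measurement of any real machine.

```python
ALPHABET_SIZE = 26 + 26 + 10  # A-Z, a-z and 0-9 gives 62 characters


def exactly(length: int) -> int:
    # Number of passwords that are exactly `length` characters long.
    return ALPHABET_SIZE ** length


def up_to(max_length: int) -> int:
    # Number of passwords from 1 up to `max_length` characters long.
    return sum(exactly(n) for n in range(1, max_length + 1))


for n in (1, 2, 3, 10):
    print(n, "characters:", exactly(n))

print("up to 10 characters:", up_to(10))

# Assumed speed, for illustration only: one billion hashes per second.
HASHES_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years = exactly(10) / HASHES_PER_SECOND / SECONDS_PER_YEAR
print(f"about {years:.1f} years to hash every 10-character password")
```

Adding just one more character to the password multiplies the work by 62 again, which is why length matters so much.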
In times past, MD5 was a common algorithm used to secure passwords; however, as computer processor power increased exponentially, MD5 was compromised as rainbow tables of it became available. So how do we fix this?
The answer is a process known as salting.
Salting is a common practice used for passwords today. Instead of hashing the password alone, you add a “salt” to it (another random string of letters and numbers), and hash the resulting string. You can then store the salt alongside the hash in your database.
This is useful because simply by adding a salt, the rainbow table’s values are no longer valid, as the resulting hash of each password will now be different. If you are randomly generating a different salt for each user in your database, an attacker would have to compute one rainbow table *per user*, on top of the already huge outlay of doing a single rainbow table anyway.
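Here is a minimal sketch of the idea in Python. SHA-256 stands in for the hashing algorithm only to keep the example short, and the `register` and `verify` helpers are hypothetical names; the point is simply that two users with the same password end up with different stored hashes.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: bytes) -> str:
    # Hash the salt and the password together rather than the password alone.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def register(password: str) -> tuple[bytes, str]:
    # A fresh random salt per user, from a cryptographically secure source.
    salt = secrets.token_bytes(16)
    return salt, salted_hash(password, salt)


def verify(password: str, salt: bytes, stored_hash: str) -> bool:
    # Re-hash the attempt with the stored salt and compare in constant time.
    return secrets.compare_digest(salted_hash(password, salt), stored_hash)


# Two users who happen to choose the same password.
salt_a, hash_a = register("hunter2")
salt_b, hash_b = register("hunter2")

print(hash_a)
print(hash_b)
print(verify("hunter2", salt_a, hash_a), verify("hunter3", salt_a, hash_a))
```

Because the salt is stored next to the hash, it is not a secret. Its job is only to make every user's hash unique, so one pre-computed table cannot be reused against everyone.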
More computationally expensive algorithms
MD5, and other fast general-purpose algorithms like SHA1 and SHA256, are no longer used on their own for password security in any place that takes security seriously.
There are many reasons for this, but one of them is that computing the MD5 hash of a given password is now extremely fast. Algorithms that demand more work from the computer are more secure for password use: computing a rainbow table of them would be prohibitively expensive, and every guess in a brute-force attack costs the attacker more time too. Of course, it still has to be reasonably fast (nobody wants to log into a website and wait 5 minutes for it to check the password), but at the same time, it shouldn’t be possible to check a trillion password hashes in 0.00000000000001 seconds either.
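You can get a feel for this trade-off with a quick experiment. The sketch below uses PBKDF2, chosen only because it ships with Python's standard library; it is a stand-in rather than a specific recommendation, and the iteration counts, password and fixed salt are assumptions for the demo. The actual timings will depend entirely on your machine.

```python
import hashlib
import time

PASSWORD = b"correct horse battery staple"  # example password for the demo
SALT = b"fixed-salt-for-demo-only"          # a real salt must be random per user


def timed_pbkdf2(iterations: int) -> tuple[bytes, float]:
    # Return the derived hash and how long it took to compute.
    start = time.perf_counter()
    digest = hashlib.pbkdf2_hmac("sha256", PASSWORD, SALT, iterations)
    return digest, time.perf_counter() - start


# More iterations means more work per guess, for you and for an attacker.
for iterations in (1, 1_000, 100_000):
    digest, seconds = timed_pbkdf2(iterations)
    print(f"{iterations:>7} iterations: {seconds:.6f} s")
```

A fraction of a second is unnoticeable for one person logging in, but it adds up very quickly for someone trying billions of guesses.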
Other reasons, like hash collision vulnerabilities, are beyond the scope of this article.
The problem with doing it yourself
Doing this kind of security work yourself frequently leaves your applications vulnerable to all sorts of attacks: whether by failing to use a proper cryptographically secure random number generator, or by generating your own salts that aren’t up to scratch, you can easily end up introducing a serious security vulnerability into your code.
In particular, hard-coding an algorithm or a particular implementation of it into your code is unquestionably a terrible idea. There are still companies today using salted MD5 for passwords, which is the result of both a lack of standardisation at the time and a belief that MD5 was unlikely to ever be compromised.
Today, most standard password-hashing functions will automatically pick a sensible default algorithm. Taking PHP as an example, it uses bcrypt by default, but allows that default to change in future versions (the resulting hash may get longer when it does, which is why the PHP documentation recommends storing it in a database column that can grow beyond the current 60 characters). It also takes care of generating a cryptographically secure salt, allowing you as the developer to simply make one call in your code and never have to worry about specific algorithms or implementations again.
Doing it yourself, e.g. insisting on a particular length, cost or other parameters, puts an application at risk: it is easy to forget to update these values over time, and the application gradually becomes vulnerable to attack.
Leave it to the experts
When you’re, say, making a website, you might feel the temptation to implement your own password validation. Few people are insane or foolish enough to try to implement their own hashing algorithm, but quite a few will hardcode a specific existing one into their code. Don’t do this.
For many programming languages and frameworks, libraries exist that are specifically written and implemented by security experts so that you don’t have to delve into this part yourself. Not only are they more likely to be secure than your own implementation, but on top of that, not hardcoding a specific algorithm or method into your code means it will not become insecure later when that algorithm or method does.
For example, PHP has password_hash() and password_verify() functions. These not only use highly secure password hashing algorithms, but also generate a cryptographically secure random salt for you. They’re easy to use, and because the default algorithm can be upgraded underneath you, they are far more likely to remain secure than anything you hardcode.
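To show roughly what a pair of functions like this does for you, here is a Python sketch of the same shape of interface. It is not PHP's implementation: the `hash_password`, `verify_password` and `needs_rehash` names, the `$`-separated storage format, the use of PBKDF2 and the iteration count are all assumptions made for illustration. In a real project, use a maintained password-hashing library rather than a sketch like this.

```python
import hashlib
import hmac
import secrets

DEFAULT_ITERATIONS = 200_000  # assumed cost for this demo, not a recommendation


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    # The salt is generated here, so the caller never has to think about it.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the algorithm, cost and salt alongside the hash in one string.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    try:
        algorithm, iterations, salt_hex, digest_hex = stored.split("$")
        salt, expected = bytes.fromhex(salt_hex), bytes.fromhex(digest_hex)
        rounds = int(iterations)
    except ValueError:
        return False  # malformed stored value
    if algorithm != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    return hmac.compare_digest(digest, expected)


def needs_rehash(stored: str, iterations: int = DEFAULT_ITERATIONS) -> bool:
    # True when the stored hash was made with a weaker cost than today's default.
    return int(stored.split("$")[1]) < iterations


old = hash_password("s3cret", iterations=1_000)  # pretend this is an old record
print(old)
print(verify_password("s3cret", old), needs_rehash(old))
```

Because everything needed to verify the password lives inside the stored string, the default can be strengthened later without breaking existing accounts: old hashes still verify, and can be replaced with stronger ones the next time each user logs in.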
Other languages have their own libraries for generating and verifying hashes, or for generating cryptographically secure random numbers (e.g. Java’s SecureRandom).
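In Python, for instance, the standard library's `secrets` module plays this role. The short sketch below contrasts it with the general-purpose `random` module, which is fine for games and simulations but predictable by design; the seed value and token sizes are arbitrary choices for the demo.

```python
import random
import secrets

# The general-purpose generator is reproducible: same seed, same "random" numbers.
random.seed(1234)
first = [random.randint(0, 9) for _ in range(5)]
random.seed(1234)
second = [random.randint(0, 9) for _ in range(5)]
print(first == second)  # True: anyone who learns the seed can predict the output

# The secrets module draws from the operating system's secure random source.
salt = secrets.token_bytes(16)            # 16 random bytes, e.g. for a salt
reset_token = secrets.token_urlsafe(32)   # text token, e.g. for a password reset link
print(salt.hex())
print(reset_token)
```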
This isn’t just for passwords, either. Unless you’re an expert, leave every security-related matter to someone who knows that area better than you do. I recently began hosting a website with some of my own apps on it, but I bought a managed VPS solution: I am not an expert in ensuring server firewalls and so forth are configured correctly, as that is a field that requires up-to-date specialist knowledge. I’d rather leave it to an expert in that field.