The Problem With Special Characters
This may seem a bit obscure if you aren't a tech junkie, but bear with me; it's important. In the world of information technology, we need to be able to interact with software using more than just natural language and numbers. We need symbols that have what is referred to as "special meaning." Special meaning means that a single symbol is interpreted in a specific way by the software we're interacting with, much like a shortcut. Here are two examples:
Example 1
ls -l *
This is a Unix command we might type at the command line to display information about the files in the current directory. It contains two special characters: the dash '-' and the asterisk '*'. The dash tells the ls command to treat the character that follows as an option rather than as a file name. In this case, -l means to display more information in the output than the ls command otherwise would. The asterisk is special to the shell (where we type the command): the shell replaces it with the names of the files and directories in the current directory, hidden ones excepted, so ls reports on each of them.
Example 2
http://www.example.com/store?item=123456&quantity=2
This example is not unlike what we might find in the address bar of a web browser during a purchase. It contains lots of characters with special meaning: the colon, forward slash, question mark, ampersand, and equal sign each tell the browser and the web server how to split the address into its parts.
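As a rough illustration of how those symbols carve up the address, here is a minimal Python sketch that splits the URL from Example 2 using the standard library. It only shows where the delimiters fall; how a real web server handles the request is outside its scope.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.example.com/store?item=123456&quantity=2"

# The colon and slashes separate the scheme and host, the question mark
# starts the query, the ampersand separates parameters, and the equal
# sign separates each parameter name from its value.
parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.scheme)   # the part before "://"
print(parts.netloc)   # the host name
print(parts.path)     # the part between the host and "?"
print(params)         # parameter names mapped to lists of values
```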
The problem here is that if we need to pass a password to a command, and that password uses characters with special meaning, it can confuse the software that reads and tries to interpret the entire command. In short, we get errors.* There are ways around this in some cases, and we could certainly write the interpreting software to handle these characters more gracefully. But that increases complexity (and with it the potential for bugs), and it adds code that slows down the process of figuring out what you want the software to do. So in short: more bugs and more time.
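To make the failure concrete, here is a small Python sketch. The tool name `mytool`, its `--password` flag, and the password itself are all made up for illustration. It uses `shlex.split` as a stand-in for the way a POSIX shell breaks a command into words, and `parse_qs` as a stand-in for the way a web server reads a query string. In both cases the unprotected password is cut into pieces, and the workaround (quoting or percent-encoding) is exactly the kind of extra handling described above.

```python
import shlex
from urllib.parse import parse_qs, urlencode

password = "pa ss&w=rd"  # hypothetical password with a space, '&' and '='

# Command line: pasting the password in unquoted splits it at the space.
naive_args = shlex.split(f"mytool --password {password}")
# Quoting keeps it together as a single argument.
quoted_args = shlex.split(f"mytool --password {shlex.quote(password)}")

# Query string: '&' and '=' are read as parameter delimiters.
naive_params = parse_qs("password=" + password)
# Percent-encoding protects them, at the cost of an extra step.
encoded_params = parse_qs(urlencode({"password": password}))

print(naive_args)
print(quoted_args)
print(naive_params)
print(encoded_params)
```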
A Mathematical Solution to Eliminating Special Characters
Consider the purpose of a password. It's basically a key. The arrangement of characters in a password is not unlike the peaks and valleys of the physical key you use to lock your house. The more complicated the key, the harder it is to reproduce. Hackers use software that systematically generates arrangements of characters in order to 'guess' your password. This is the essence of brute-force hacking. It works like this:
First try a, then b, then c followed by the remaining letters. Next, try aa, then ab, then ac, and so on.
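The ordering described above can be sketched in a few lines of Python. This is only an illustration of the enumeration order, assuming a lowercase-only alphabet; the function names are invented here, and real cracking tools are considerably more sophisticated.

```python
from itertools import islice, product
from string import ascii_lowercase


def brute_force_candidates(alphabet, max_length):
    """Yield every string over `alphabet`, shortest first, in alphabet order."""
    for length in range(1, max_length + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def attempts_to_guess(target, alphabet, max_length):
    """Count the guesses made before (and including) the one that matches."""
    for attempt, candidate in enumerate(
        brute_force_candidates(alphabet, max_length), start=1
    ):
        if candidate == target:
            return attempt
    return None  # target is longer than max_length or uses other characters


first_30 = list(islice(brute_force_candidates(ascii_lowercase, 2), 30))
print(first_30)
print(attempts_to_guess("ac", ascii_lowercase, 2))
```

The first 26 guesses are the single letters; the two-letter strings only start after that, which is why every extra character of length costs the attacker so much more.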
We can leverage one very interesting constraint on brute-force hacking that can render the process ineffective: computing power. Computing systems can only do so much in a given period of time. The method described above requires a very large number of guesses, each of which takes a certain amount of the computer's time to complete. There is a way we can construct a password such that it requires more time to hack, effectively thwarting the brute-force hacking attempt. In short, we use the limitations of the hacker's computing systems against them.
This is where we circle back to the character set. We can make it take longer to guess a password by increasing the password's complexity. We do this by adding additional characters to the character set we use. In the following table we show the number of combinations of 8 characters that can be drawn from increasingly complex character sets. (These figures count each selection of 8 distinct characters once, regardless of order, so they understate the number of guesses a hacker would actually have to make; it is the trend that matters here.)
Character Set | Combinations |
a-z | 1,562,275 |
a-z, A-Z | 752,538,150 |
a-z, A-Z, 0-9 | 3,381,098,545 |
a-z, A-Z, 0-9, . , ! $ & # @ etc. (a total of 23 special characters from the US keyboard) | 48,124,511,370 |
Table 1: Combinations of characters by increasing the character set
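The figures in Table 1 match the binomial coefficient "n choose 8", which the following Python sketch reproduces. It assumes the last row uses 62 + 23 = 85 symbols. For comparison it also prints n to the power 8, the number of ordered 8-character strings when characters may repeat; that second column is not in the table and is added here only as a point of reference.

```python
from math import comb

LENGTH = 8

# Character-set sizes for the four rows of Table 1.
char_sets = {
    "a-z": 26,
    "a-z, A-Z": 52,
    "a-z, A-Z, 0-9": 62,
    "a-z, A-Z, 0-9, 23 specials": 85,  # assumed total: 62 + 23
}

# "n choose 8": selections of 8 distinct characters, order ignored.
table1 = {name: comb(n, LENGTH) for name, n in char_sets.items()}

# n ** 8: ordered strings of length 8, repeats allowed (not in the table).
ordered = {name: n ** LENGTH for name, n in char_sets.items()}

for name in char_sets:
    print(f"{name:28} {table1[name]:>18,} {ordered[name]:>24,}")
```

Either way of counting grows with the size of the character set, and the ordered count is always the larger of the two.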
Just by adding characters with special meaning, we've forced the hacker's computer to do roughly 14 times more work in order to crack our password than if we used only letters and digits. But we still have the issue of software complexity described above. That brings us to the silver bullet of strong passwords. Instead of adding characters to the character set we use, we could increase the length of the password. But would that really do much? Wouldn't we have to add many characters in order to increase the number of combinations enough to make the password prohibitive to crack? Let's see:
Character Set | Password Length | Combinations |
a-z, A-Z, 0-9 | 8 | 3,381,098,545 |
a-z, A-Z, 0-9 | 9 | 20,286,591,270 |
a-z, A-Z, 0-9 | 10 | 107,518,933,731 |
a-z, A-Z, 0-9 | 11 | 508,271,323,092 |
Table 2: Combinations of characters by increasing length
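Table 2 can be checked the same way. This Python sketch assumes the table uses the same "62 choose k" counting as Table 1, and it also compares the length-10 row against the 85-symbol, length-8 row from Table 1.

```python
from math import comb

ALPHABET_SIZE = 62  # a-z, A-Z, 0-9

# "62 choose k" for each password length in Table 2.
table2 = {length: comb(ALPHABET_SIZE, length) for length in range(8, 12)}

for length, count in table2.items():
    print(f"length {length:>2}: {count:>18,}")

# Longer password with letters and digits only, versus a shorter one
# that also draws on 23 special characters (85 symbols in total).
ratio = table2[10] / comb(85, 8)
print(f"length 10 over 62 symbols vs length 8 over 85 symbols: {ratio:.2f}x")
```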
Those are some pretty big numbers. Let's break this down into English, comparing the last line of Table 1 with row 3 of Table 2 (length 10). By adding just 2 more characters onto the length of the password, we more than doubled the number of guesses, reduced the required characters in the character set by 23, and eliminated any problems that special characters might cause on the command line. Adding just one more character to the password length means our hacker friend's computer will need to guess up to another 400 billion combinations of letters and numbers.
That is the power of the mathematics of combinations, and that's why you are much better off increasing the length of the password than just adding special characters. In fact, you can even string together simple words from your own language, so long as the resulting password is long.**
Just for fun, how long do you think a brute-force attack takes? In 2018, using the most powerful computers available (not what's sitting on your desk), a 12-character password would take about a year. A 13-character password would take 64 years! As time passes, advances in computing power will shorten these times. The elegance of the approach is that we can simply add one more character to keep the time required for the attack long.
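As a back-of-the-envelope check on that scaling, the Python sketch below takes the "12 characters is about one year" figure from the text and works out what it implies. Two things are assumptions made here, not claims from the text: the attacker searches every ordered string over 62 letters and digits, and the guess rate stays constant. Under those assumptions each extra character multiplies the time by 62, which is in the same range as the jump from 1 year to 64 years quoted above.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
ALPHABET_SIZE = 62  # assumed: a-z, A-Z, 0-9


def implied_guess_rate(length, years):
    """Guesses per second needed to exhaust all strings of `length` in `years`."""
    return ALPHABET_SIZE ** length / (years * SECONDS_PER_YEAR)


def years_to_exhaust(length, guesses_per_second):
    """Worst-case years to try every string of `length` at a constant rate."""
    return ALPHABET_SIZE ** length / guesses_per_second / SECONDS_PER_YEAR


# Calibrate on the figure from the text: 12 characters in about 1 year.
rate = implied_guess_rate(12, 1.0)

for length in range(12, 16):
    print(f"{length} characters: about {years_to_exhaust(length, rate):,.0f} years")
```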
Conclusion
Passwords are necessary today but complexity is not. Increasing the length of a password increases its protection from being guessed much more than just increasing the character set used.
* Yes, I know, we should not be passing passwords in at the shell or in the address line of a web browser. In a perfect world, we would not. In the real world, we often have to deal with software that we have no control over, and it may be a necessity.
** I would recommend against using individual words separated by spaces. There is a variant of the brute-force attack that uses words from the dictionary in a similar fashion. If using full words, you'll want to remove the spaces, or insert them in the middle of words, to prevent each string from matching a word from the dictionary.