What is encryption? There are many definitions of the term - some simple, and some more detailed and complex. To put it simply, encryption is a method used to modify a readable text in such a way so that it cannot be read by third parties but can be made readable again by the recipient, assuming s/he is authorized to read it.
Encryption and cryptography existed long before the computer era. Although the increasing power of computers makes it possible to create better and more encryption routines, or ciphers, and cryptography has undoubtedly become part of information technology, it has been in existence for thousands of years. Before I discuss modern cryptography, I'd first like to explain a few older encryption methods.
The Caesar cipher
This is one of the oldest known ciphers and, as its name suggests, it was used by the Roman leader Julius Caesar as a means of securing important information sent via messenger. The cipher was used to ensure that the army could not be located if the information was intercepted by the enemy. The cipher consisted of replacing each letter with the letter 3 places further on in the alphabet. Ciphers of this type, in which one letter is replaced by another, are called substitution ciphers. Monoalphabetic ciphers (the Caesar cipher is one of them) are a type of substitution cipher in which each letter of the original (unencrypted) text has a fixed counterpart among the letters of the encoded text.
This is a cipher which consists of shifting letters 13 places further on in the alphabet, beginning with the letter being encoded. This is a very simple cipher which does not provide any real security. Nowadays, this kind of encryption is used more as a curiosity, or to encrypt information which is not particularly important. The cipher does not distinguish between lower-case and upper-case letters. The example below shows that the mechanism used in this type of encryption is very similar to that used in the Caesar cipher, the only difference being the value of the shift.
This cipher was used by the German army to encrypt commands and instructions during World War I. It is an improved version of the earlier ADFGX cipher and consists of assigning a pair of letters (using only A, D, F, G, V or X) to each letter of the text being encoded. A key word, or password, is then created to make cryptanalysis more difficult. We also have a table filled with letters and numbers. The person to whom the message is sent has to know both the password and the way the letters are arranged in the table.
Suppose we want to encrypt the word "Kaspersky". To do that, we must first combine the letters into pairs:
K - FV
A - GX
S - AG
P - AV
E - VA
R - VD
S - AG
K - FV
Y - XF
The next step is choosing the password - let's use the word SZYFR (CIPHER). Next, we enter pairs of letters from the ADFGVX table into a new table in a linear sequence. Each column must have the same number of letters, so if there are not enough characters, more must be added. In this case, the two last letters in the table represent the number 0.
The last step is to arrange the columns containing the letters of the password in alphabetical order.
The word "Kaspersky" can then be encrypted using the last table. To do this, we rewrite the letters from left to right. To make the encrypted content more difficult to analyse, a space is inserted after every 6th character (because there are 6 letters in ADFGVX) in the sequence.
This is a very secure cipher, as there is no method for breaking it (including brute force, i.e. substituting every possible character to establish the correct order). It was effectively created in 1917 by Gilbert Vernam. There are two versions of this cipher - a binary one and an ordinary, character-based one. They differ in the method used to encrypt the text - one uses the XOR algorithm, the other the Vigenere algorithm. A password is used and it is extremely secure as it is the same length as the message, it is random, and it is used only once. The recipient generates any reply using a new key. Although both the XOR and Vigenere algorithms are substitution ciphers (and such ciphers can be broken) as long as the three conditions mentioned above are fulfilled, this encryption method is completely secure. The example below uses the binary method, i.e. using the logical operator XOR.
XOR is also called "exclusive disjunction" or "exclusive or". If either p or q is true (has the value 1), then the whole expression is also true.
Thus, in order to encrypt the word "Kaspersky", it first has to be written in binary. In practice, of course, an encrypted message would be longer, and therefore more secure, bearing in mind that the key will be the same length as the message.
Once the message has been written in binary, the password is generated. It must, however, be created in a totally random way, which is not easy. Although there are pseudo-random number generators, using them does not - as their name suggests - guarantee total randomness and certain regularities (periodicity) can be detected in the result. The best solution to this problem is to use a number generator which is based, for example, on randomness and on the changeability of the temperature of computer components (such as a processor). In order to illustrate the one-time pad cipher, the following numbers have been generated:
The next step is to perform the XOR operation:
After encryption, the message ("Kaspersky") looks like this:
Because the password which has been generated is used only once, it is called a one-time password. Attempts to crack this password will be unsuccessful, because it is impossible to reconstruct the original message without knowing the key. This is due to the fact that the cryptogram is as random as the key. Knowing the result of the XOR operation, we can only substitute characters one by one, and calculate the password based on them. If, however, we do not know whether or not the character substituted is correct, we will not be able to discover the password - all the more so as it will be used only once. It's therefore not possible to perform cryptanalysis based on the frequency of occurrence of certain symbols (as in the substitution ciphers), as these will change with every message.
Modern encryption algorithms
This section looks at the encryption algorithms used today and covers the difference between symmetric and asymmetric encryption. It also provides a brief description of the AES and RSA algorithms, the second one being covered in more detail and using an example.
This owes its name to the fact that most ciphers based on symmetric cryptography use the same key to encrypt and decrypt a message. Ciphers which use symmetric cryptography can be split into two groups: block and stream ciphers. In block ciphers, a message is divided into blocks of data before being encrypted, while stream ciphers encrypt individual data units. As the recipient is only able to decrypt the message if provided with the key, if the key is intercepted, this can compromise security. The strength of symmetric cryptography lies mainly in the key.
AES is a symmetric cipher, chosen as the winner of a contest designed to find a replacement for the obsolete standard DES, which did not provide adequate security. AES uses 128-, 196- and 256-bit keys. It is an algorithm which operates on blocks of varying length, and, as the keys themselves are of different lengths, ensures a very high level of security.
This type of cryptography is very common today, with digital signatures being one example of its use. When encrypting a message, a pair of keys is created: a private and a public one. The private key is designed only for its creator. S/he can use it to sign a message to verify that it is genuine. The public key is openly available and each recipient can check, among other things, whether the message s/he received has been modified between being sent and being received. The public key can be also used to encrypt messages, but these will be decrypted using the private key.
Here is an example which uses RSA to demonstrate how modern encryption algorithms work. RSA is sometimes called the "Rivest, Shamir and Adleman algorithm", with RSA being the first letters of the names of its creators. It was the first algorithm based on the above-mentioned asymmetric cryptography method and is widely used for digital signatures. The algorithm was created as a way of implementing the Diffie-Hellman key agreement, and eventually resulted in the idea of making one key available to all users and performing the encryption with another, individual one, finally being put into practice.
Before presenting the example which illustrates encryption using the RSA algorithm, here is an explanation of the symbols and the calculations used:
p - 1st large prime number
q - 2nd large prime number
(prime numbers can only be divided by 1 and themselves)
n - product of large prime numbers
(in 256-bit encryption we get the number of digits for n >300)
m - text written as a number
e - encryption key which is the relatively prime number for the product (p-1)(q-1), and e < n (relatively prime numbers have the number 1 as the common divisor).
Private key (decryption key) - made up of the numbers d and n, where d is calculated according to the following formula:
ed = 1 (mod (p-1)(q-1))
Public key - numbers n and e
c = me (mod n)
m = cd (mod n)
This is how the process of encryption takes place. We already know that large prime numbers are needed. In this example, however, smaller prime numbers are used; while they are too small to make the algorithm secure, they make it easier to perform the sample calculations.
Assume we want to encrypt the letter Y (a single letter rather than a word or sentence has been chosen for the sake of simplicity). In the decimal system, the letter Y is the number 89. The message is now in the form of a number. This is m. Now, we must find p and q, which are prime numbers. 19 and 29 can be divided only by themselves and 1, so they meet the criteria for being prime numbers (again it should be stressed that in "real" encryption these numbers would be much larger). Now let's start the calculation:
n = p * q
n = 551
(p-1)*(q-1) = 504
e = 5
This is a relatively prime number, because 5 and 504 have a common divider: 1 (the assumption e<n has been fulfilled). It should be noted that the value of e is chosen, rather than calculated.
We have all the data we need to start the encryption. During the decryption process, we will also have to calculate d.
m = 89
c = me (mod n)
c=895 (mod 551)
c = 5584059449 (mod 551)
c = 90
After encryption, the original message "Y" has a value of 90. In order to decrypt this, it is necessary to use the private key. First, however, we must calculate d, as mentioned earlier (the formulas below have been simplified for ease of reading):
ed = 1 (mod (p-1)*(q-1))
5d = 1 (mod 504)
5d = 505
d = 505/5
d = 101
The message is decrypted using the following calculations:
c = 90
m = cd (mod n)
m = 90101 (mod 551)
m = 89
89 is the number which represents Y (the original message), demonstrating that the message has been encrypted and decrypted correctly. As operations of this type are performed on much bigger values, using this algorithm ensures a greater sense of security. "Sense of security", rather than "security" because security can never be 100%. All algorithms have weaknesses and they will all be cracked sooner or later; because the power of computers is constantly increasing, this is only a matter of time. The weakness in RSA is factorisation (fortunately, due to the algorithm's complexity, factorisation takes too long to achieve an effective result).
Factorisation is the breaking down of a large integer into factors which, when multiplied together give the original integer. Here is an example: X is the large integer, and after breaking it down, we get the factors y1, y2, y3, yn. So, x=y1 y2 y3 yn.
In order to obtain a private key, the number used to create the public key must be factorised, giving the prime numbers p and q. If this can be done, it would be extremely easy to decrypt the message. In terms of RSA security, it is fortunate that the factorisation of very large numbers, such as those used nowadays, is very difficult and time-consuming. Although there are algorithms designed to speed up factorisation, the process still takes a significant amount of time. Security is increased by using longer keys (nowadays, keys longer than 1024 bits are used) and the longest RSA key cracked so far was 663 bits.
It is quantum computers that pose a real threat to the security and even the ultimate usefulness of RSA. As the name suggests, quantum computers operate on the basis of quantum mechanics, and their computing power will significantly exceed that of traditional computers, making it possible to completely factorise a large number far more quickly than today. The solution to this problem will, therefore, be quantum cryptography... It must be remembered, however, that encryption algorithms not only provide protection, but also present a challenge: although each new algorithm provides a certain level of protection, no-one can be sure how long it will last. Sooner or later, somebody will always find a way to crack it.
To encrypt or not to encrypt...
Many people have no idea how valuable the data stored on their computers or removable disks is. Although music files or photos could be used by cybercriminals for blackmail purposes) the real treasure lies in the data saved in Internet browsers, emails stored on the hard drive, instant messaging applications with saved passwords, and a wealth of personal and proprietary information in other forms. If you're selling your computer or hard drive, or simply passing it on to a friend, just formatting it is not enough: it will still be possible to recover all the information from it, and this can be done using free utilities. There are plenty of examples where personal or corporate data has been used for fraudulent purposes, including credit card theft and identity theft. There are also cases where blackmailers have bought used hard drives in Internet auctions solely for the purpose of recovering the deleted data and using it to blackmail the previous owners. Unfortunately, some victims give in to blackmailers' demands, preferring to pay the ransom rather than risk their personal or company information being revealed.
One method which can be used to combat such threats is disk encryption. Individual files, partitions and whole hard drives can all be encrypted, and some programs enable the creation of encrypted partitions, which can be accessed only once they have been connected to. This makes transferring important or confidential documents between computers safer: transfer files to the partition on computer A, then place the whole partition on the removable disk and connect it to computer B. Even if the removable disk is lost or stolen, the losses should (depending on the encryption algorithm used) be relatively insignificant, and encryption will provide a larger window for incident reaction and subsequent securing of data.
To prove that encrypting your data is not difficult and that anyone can do it, here are three short movies. They show the process of data encryption using the free TrueCrypt program:
- Creating an encrypted file warehouse
- Securing the volume with an additional key
- Using removable disks as encrypted disks
Such solutions would be of particular use to institutions and organizations which store personal data. Unfortunately, there are an ever increasing number of reports of the theft or loss of CDs and disks containing personal data. These cases wouldn't have been so worrying if the media on which the data was held had been secured using strong encryption algorithms.
In 2007, there were two high profile cases involving the loss of the personal data of a combined 28 million UK citizens. In reaction to the first of these cases, where 25 million people's data was lost, Jeremy Clarkson, the presenter of the popular British TV program "Top Gear" announced publicly that such stories were a fuss about nothing, and that his confidential data would be of no use to anyone. He then made his bank account and sort code available in the press. £500 quickly disappeared from his account. After the incident, he said:
"The bank cannot find out who did this ... and they cannot stop it from happening again. I was wrong and I have been punished for my mistake".
Meanwhile, the US based organization privacyrights.org estimates that 340,102,273 records containing personal data have been breached so far in 2009 in the US alone. These numbers show just how valuable personal data can be, and how important it is to protect it. Although individuals cannot influence how data is stored by commercial institutions and government bodies, they can safeguard the data stored on their own computers.
Encryption and threats
Sometimes, however, data encryption can be used for purposes other than those described above. The methods designed to protect data from unauthorised access can also be used to prevent the owners of data and files from accessing them. In 2006, the first variant of a virus called Gpcode appeared on the Web. This virus encrypted over 80 types of files on victim computers and then demanded a ransom for their decryption.
The virus spread via email. Victims received emails which looked like this:
We are writing to you regarding the resume you have posted on the job.ru website. I have a vacancy that is suitable for you. ADC Marketing LTD (UK) is opening an office in Moscow and I am searching for appropriate candidates. I will soon be asking you to come in for an interview at a mutually convenient time. If you are interested in my offer, please fill out the attached form related to compensation issues and email the results to me.
Unsurprisingly, the attachment contained a Trojan which then downloaded the virus itself. Gpcode encrypted many types of file, including PDF, DOC, HTML, and RAR format files. The malicious programs which had been downloaded and installed when the message was opened were then deleted, leaving only the following message:
Some files are coded by RSA method.
To buy decoder mail: email@example.com
with subject: REPLY
Initially, the encryption algorithms used were very weak (56 bits long) and were far from being perfect. Over time, however, the author of the virus improved the malicious code, ultimately managing to implement an RSA key which was 660 bits long. Although this provides extremely strong protection, and the longest RSA key ever cracked is 663 bits long, Kaspersky Lab experts were able to identify errors made by the author of Gpcode, and developed a decryption routine the same day this variant of the virus was identified.
Encryption is a double-edged sword: as the examples above show, it can be used to provide protection or to cause a great deal of damage. Although encryption ensures freedom of communication due to the fact that encrypted messages cannot be read without appropriate authorization (i.e. the correct password) it can also be used to weaken a country's security. After all, if anyone can use encryption, criminal organisations can also apply cryptography to conceal information.
As mentioned above, we do not have a say in the way our data is stored by others. However, we can all take responsibility for protecting the data we store on our computers and systems, and encryption is one of the tools we can use to do this.