Encryption of Text

1. Introduction

Encryption is the process of taking text in documents and encoding it in a systematic fashion. This is typically done for reasons of security, so that no one can read your file. Today, when transmitting sensitive files across the Internet, it is especially critical to have some form of protection.

Fortunately, there are extremely good algorithms available today to do encryption for us (see the book PGP, authored by Simson Garfinkel, and distributed by O'Reilly and Associates, Inc. of Sebastopol, CA). PGP (Pretty Good Privacy), a program that performs encryption and decryption, is available for free from many sites on the Internet. It runs on Unix, Mac and DOS platforms, although it is a bit cumbersome to use.

Here, we shall embark upon some projects which will illustrate various methods of encryption. This presentation is intended to serve only as an introduction to the principles and techniques of encryption, and is rather elementary. However, we will get to the point of being able to perform quite robust encryption, which is very difficult to "break," or decipher.

First, we provide some definitions. Text consists of keyboard characters, which can be printed to the screen and to output files. A line of text is composed of individual characters, including all those found on a computer's keyboard (plus some more that we won't discuss). This encompasses all the lowercase and uppercase letters a-z and A-Z, the numbers 0-9, and all the special keyboard characters such as @#$%^&*(){}[], as well as all punctuation including ,.:;"'?!`. All in all there are 256 characters which can be stored internally, although only 96 of the 256 can be printed.

The original message to be encrypted exists as lines of text, which we call "plain text." After encryption, we refer to the message as "encrypted text." The translator from plain text to encrypted text we call the "encryption algorithm," and the translator from encrypted text back to plain text we call the "decryption algorithm." We use an "encryption key" to encrypt and subsequently a "decryption key" to "unlock" the encrypted text.

2. Top-Down Approach

Our purpose here will be to describe ways of encrypting files, and to outline a Fortran code to accomplish this task. The steps required to accomplish this for a file containing lines of text are as follows:

Step 1: Open files: 1) the input file with the plain text in it, 2) the output file to contain the encrypted text (here, we may choose simply to write to standard output, i.e. to the screen), and possibly a third file containing an encryption key (although the "key" may be entered from the terminal or be included as part of the code).
Step 2: Read in from the input file the next line of plain text, testing for end of file.
Step 3: Encrypt the single line of plain text, character by character.
Step 4: Write the encrypted line of text to the output file. Repeat steps 2 through 4 until the end-of-file record of the input file is encountered.

3. Character Variables, Substrings and Arrays

Assemblages of single characters are termed strings. Strings may contain whole words, parts of words, or more than one word. Strings can be data, such as the literal, 'AiS is fun,' or variables which must be defind to contain a specific length of characters (actually, in Fortran 90 characters can be stored in allocatable arrays, which size can be adjusted exactly to fit the data, but we will not cover this here). We briefly review strings and characters.

Recall that string variables can be used to store many characters, i.e.

   character (len=23) :: string1, string2, string3
   character (len=46) :: string4
Here, three string variables are defined to contain storage space for up to 23 characters each. If fewer than 23 characters are stored in these string variables, they will be stored left-justified (i.e. existing at the left or beginning of the string). If more than 23 characters are attempted to be stored, the characters are truncated such that only the first 23 are stored. Normally, it is a good idea to code so as to define a large storage space for a string, and to keep track of the length and location of the characters stored in the string. It is also a good idea to check whether the storage allocated for a string is ever exceeded. If it is, then the code should be designed to "trap" the error and print out a warning and maybe even stop.

Note that string variableas are just like one-dimensional arrays in that they have beginning and ending locations in storage, and contain data in between. Substrings are defined as portions of strings, and are accesed in Fortran much like arrays, with a starting value and an ending value, except that there is no stride - all characters are accessed. Thus, a substring is referenced as string(beginning: ending), where beginning is the beginning index of the substring and ending is the ending index of the substring. If beginning is omitted, a beginning of 1 is used, while if ending is omitted, a value of the length of the string is used. Some examples follow.

It is possible to combine strings in two ways. The best way to combine complete substrings is to use the concatenation or "joining" operator // . This is shown as

string4 = string1 // string2
Note that there should be sufficient storage in the resultant string to hold all of the contents of the two strings being combined. Another way to combine strings is through the use of a do loop, providing much more flexibility at the expense of more complex coding. For example, to achieve the same string4 as above

!
!  combine strings in order
!
do i = 1, 23
   j = i + 23
   string4(i:i) = string1(i:i)
   string4(j:j) = string2(i:i)
end do
Of course, a much easier way to accomplish the above, since the strings are in sequence, is with explicit references:

string4(1:46) = string1(1:23) // string2(1:23)
The flexibility of do loops is illustrated to combine string1 and string2 in an interleaved fashion as follows:

j = 0
do i = 1, 23
   j = j + 1
   string4(j:j) = string1(i:i)
   j = j + 1
   string4(j:j) = string2(i:i)
end do
Normally, strings are encrypted character by character, requiring individual characters to be referenced via substrings. However, another way to reference individual characters is to declare arrays to consist of character variables of length one, and reference them as individual array elements. An example of this follows.

!
! declare variables
!
character(len=1) :: char(80)
integer :: i
!
   .
   .
   .
!
!  print out charcater by character
!
do i = 1, 80
  print *,'character no. ',i,'  = ',char(i)
end do


Armed with substrings and do loops, one can combine strings in any fashion.

4. Internal Character Storage - ASCII Codes

It is useful to understand how individual characters of text are stored internally in computers. This understanding will allow us to deal with characters arithmetically as numbers instead of as characters. This makes the operations to be performed much more efficient, as computers are designed and built to deal with numbers, and they do so very efficiently.

Thus, the encryption process can be thought of as reading in text, translating the text to a number which reprsents the character, encrypting the number, associating the new, encrypted character with the encrypted number. We now turn our attention to the details of how characters are associated with numbers internally in computers.

Recall that information is stored in digital computers in binary form. This includes all information, encompassing real numbers, integers, and text. The letters of the alphabet, including all keys that exist on the keyboard, some non-printing characters, and some "extended" characters which do not exist on the keyboard, are encoded in binary form using the American Standard Code for Information Interchange, abbreviated ASCII. The ASCII codes are published widely, and may be found in Appendix D of Ellis, Philips and Lahey. Specifically, note the following codes:

On standard computers, only ASCII characters 32 through 127 print. However, some computers will print ASCII characters 128 through 255, but some of these may look a bit strange (these characters are used for different purposes on different computers, sometimes to represent different fonts, sometimes to represent special characters in other languages such as the English symbol for the pound or the inverted question mark in Spanish, etc.). ASCII characters 0 through 31 are special, as some are like characters 128 through 255 which are special purpose and do not print on standard terminals, but others do have an observable effect. The ASCII characters 0-31 are given below, as they are not listd in appendix A of Ellis, Philips and Lahey.

ASCII character  0  -  null character
ASCII character  1  -  start of heading
ASCII character  2  -  start of text
ASCII character  3  -  breaks, end of text
ASCII character  4  -  end of transmission
ASCII character  5  -  enquiry
ASCII character  6  -  acknowledge
ASCII character  7  -  bell (causes terminal to beep)
ASCII character  8  -  backspace
ASCII character  9  -  horizontal tab
ASCII character 10  -  line feed
ASCII character 11  -  vertical tab
ASCII character 12  -  form feed
ASCII character 13  -  carriage return
ASCII character 14  -  shift out
ASCII character 15  -  shift in
ASCII character 16  -  data link escape
ASCII character 17  -  device control 1
ASCII character 18  -  device control 2
ASCII character 19  -  device control 3
ASCII character 20  -  device control 4
ASCII character 21  -  negative acknowledge
ASCII character 22  -  synchronous idle
ASCII character 23  -  end of transmission block
ASCII character 24  -  cancel
ASCII character 25  -  end of medium
ASCII character 26  -  end-of-file
ASCII character 27  -  escape
ASCII character 28  -  file separator
ASCII character 29  -  group separator
ASCII character 30  -  record separator
ASCII character 31  -  unit separator
Note that, although none of these ASCII codes result in an individually printed character, some of them have an observable effect. For example, ASCII character 8 is a backspace, character 9 is a tab, character 10 is a line feed, character 13 is a carriage return - all of which have a visible effect. Character 7 has an audible effect. Also, some characters (4-6, and 21-24) are used between modems for coordinating transmissions. Character 26 is the end-of-file or EOF character - this is what we actually test for when we use the Fortran read (1, format, end=sn). You will be able to observe the effects of printing these ASCII characters by working exercise 1.

Indeed, the ASCII codes represent one way of encrypting characters. The internal storage as numbers is actually encryption.

4.1 Fortran Intrinsic Functions for Characters

Some Fortran intrinsic functions apply specifically to characters. These can be found in Appendix A of Ellis, Philips and Lahey on pages 712-715. Some of these are as follows:

character(len = 1) :: single_character
integer :: i
!
i = 89
print *,' the character corresponding to ASCII code ',i, &
   'is = ',achar(i)
!
single_character = '3'
print *,' the ASCII code of character ',single_character, &
   'is = ',ichar(single_character)
!
print *,' the length of ASCII string ',single_character, &
   'is = ',len(single_character)
See Appendix A.7 of Ellis, Philips and Lahey on pages 712-715, and especially see len_trim(), trim(), lgt(), and lle() for more character intrinsic functions.

Character and String Exercises

Exercise 1. ASCII Codes and Characters

Write a Fortran program to print out all ASCII characters, from 0 to 255. Print the ASCII codes and the characters to the screen. Use the Fortran 90 intinsic function achar() (in FORTRAN 77, this function is char()).

Exercise 2. Parsing

Write a Fortran program to contain the ASCII string,"AiS is fun." Parse the string into individual characters, and print out the individual characters together with their ASCII codes. Remember to include blanks, and the period. Do not include the quotation marks.

Exercise 3. Frequency Counts

Obtain a large file of text (make sure the file is at least 100,000 Bytes long).

  1. Use the Unix utility wc to determine: 1) the numbers of lines, words and characters in the file, 2) the average number of words per line, and 3) the average number of letters per word. You may need a hand-held calculator to perform the calculations.
  2. Write a program to read in a text file, and determine the same information as in step (1). What are reasons for slight differences, if any?
  3. Modify the program you wrote in step (2) to determine frequency counts for all 256 ASCII characters. A frequency count is the fraction of time each character occurs, compared to the total number of characters in the document. Thus, the sum of frequency counts for all 256 ASCII characters must equal 1. This technique is sometimes used to decrypt encoded files based on knowing the average frequency of occurrance of each letter in the various languages.

4.2 Morse Code

To those of us who are Scouts, the storage of letters in ASCII code should be somewhat familiar to us as similar to Morse code. Morse code represents the 26 letters of the alphabet and the numbers 0 to 9 as encoded in dots and dashes. This actually is binary storage, as the dots and dashes can be thought of as 0's and 1's. Morse code is well designed, to be transmitted efficiently. Letters which appear most frequently, e and t are represented by a single Morse character - a dot for the "e" and a dash for the "t." Letters which appear the next most frequently are represented as a series of two dots and dashes, and so on, while those which appear the least requently appear as a series of four dots and dashes. This supports our hypothesis of being able to "break" codes easily, where individual characters are representad by the same code all the time, by obtaining a large message, and associating the encrypted characters using frequency counts.

5. Methods of Encryption

We will cover several methods of encryption, but first we must cover random numbers as necessary background for what is to follow.

5.1 Random Numbers

Many of the techniques for encrypton involve "random" numbers. Actually, all techniques for generating ranom numbers on computers involve some form of mathematical algorithm, which produces a sequence of numbers in a definite, repeatable pattern. As such, the numbers are only "pseudorandom," but we often use just the term "random" to describe them. There are many ways to generate random numbers; many involve sophisticated mathematics, beyond the scope of this work. Our focus will be on the use rather then the generation of random numbers.

Fortran 90 has a very good random number generator. Actually, Fortran 90 is the first language to include a random number generator as an intrinsic subroutine which is part of the language - this is extremely useful, as it makes programs using random numbers portable across all computers which have Fortran 90 compilers. We now turn our attention to the details of Fortran 90's random number generator (see page 711 of Ellis, Philips and Lahey).

To obtain a sequence of random numbers in Fortran 90, one calls the subroutine random_number (real_variable). If real_variable is a scalar (that is, not an array), a single random number is returned to real_number. If real_variable is an array, the entire array is filled with random numbers. The random numbers returned are uniformly distributed between 0 and 1. This means that, if the interval (0,1) is partitioned into uniform bins, then the random numbers returned are equally likely to appear in any bin. Specifically, if we partition (0,1) into 100 bins, then each random number has the same chance of being in any of the bins - 1%. The effect of this is that, after many random numbers are returned, nearly the same number will be in each bin. This is actually how nature behaves in a perfectly random world. For example, when we draw a card from a randomly shuffled deck, it is equally likely that we will get any of the 52 cards. It is useful to familiarize ourselves with random numbers through some exercises.

Random Number Exercises

Exercise 4 Random Numbers

Write a Fortran 90 code to: 1) prompt the user to input the number of random numbers to be generated and printed, 2) read the number of random numbers to be generated as input from the keyboard, 3) fill an array (make sure there is sufficient storage) with random numbers uniformly distributed between 0 and 1, and 4) print out the number of the random number and its value. Run the code twice, generating 10 random numbers, and compare results between the two runs. What do you observe? What can be concluded from this? Run the code and generate 100 random numbers. Do you observe any patterns in the random numbers? Comment on whether there should be or should not be any patterns in the "random" numbers.

Exercise 5 Random Number Averages

Write a Fortran 90 code to fill arrays with random numbers between 0 and 1, and compute their averages. Fill an array of length 100,000 with random numbers. Compute and print out averages for the first: 1) 10, 2) 100, 3) 1,000, 4) 10,000 and 5) 100,000 of these random numbers. What do you observe about the averages as the number of samples is increased? Run the code twice, and compare results between the two runs. What do you observe? What can be concluded from this? (Hint: Use the array intrinsic function sum() on page 720 of Ellis, Philips and Lahey.)

5.2 Shuffling

Many ways exist to encrypt text. Perhaps the simplest way to encrypt text is to shuffle the characters. In this method, used in Cryptograms in the puzzle sections of newspapers, each plain text character is represented by another "encrypted" character. For example, if we shuffle so that each character is represened by the succeeding one, then the plain text a, b, c, ..., y, z is represented by the characters b, c, d, ..., z, a. If we let capitals remain capitals, and smalls remain smalls, then the plain text, "AiS is fun." is encrypted as: "BjT jt gvo." This is a very particular example, and would be trivially easy to "break" or decipher. A more typical example is when the encrypted characters are randomly associated with the plain text chcracters. This is often accomplished using a random number generator.

As a specific example, let us consider the capital letters A-Z, represented by ASCII codes 65 to 90. If we let our plain text be represented by indices 1 to 26 (we just need to subtract the offset of 64 from the ASCII codes), then we just need to generate an array of shuffled indices to perform the encryption as follows:

program shuffle
!
!  set storage
!
integer :: indices_shuffled(26), index_random, index_save, index_ascii
integer :: offset, nleft
real :: rn
!
! fill the array of indices
!
do i = 1, 26
  indices_shuffled(i) = i
end do
!
!  now "shuffle" the indices randomly
!
do i = 1, 25
!
!    -  randomly pick an index from i to 26
!
  call random_number (rn)
  nleft = 27 - i
  index_random = int(rn*nleft) + 1
!
!    - interchange the indices
!
  index_save = indices_shuffled(i)
  indices_shuffled(i) = indices_shufled(index_random)
  indices_shuffled(index_save) = index_save
end do
!
! print out randomly shuffled indices
!
print *,' shuffled indices follow'
!
do i = 1, 26
  print *, i,'  - index = ',indices_shuffled(i)
end do
!
!  skip two lines

! print * print * ! ! print out encryption table for characters ! offset = 64 print *,' E n c r y p t i o n T a b l e F o l l o w s' print *,' index *** plain text character *** encrypted character' do i = 1, 26 print *, ' ',i,' ',achar(i+offset),' -> ', & achar(offset+indices_shuffled(i)) end do ! ! stop execution ! stop ! ! end compilation ! end program shuffle

Shuffling Exercises

Exercise 6 Full Encryption Table

Extend the Fortran 90 code above to generate a full ASCII encryption table for all ASCII characters which print (these are ASCII codes 32 through 127). As above, print out the shuffled indices (here, there will be 96 indices), and the ASCII conversion table.

Exercise 7 Full Decryption Table

Extend the Fortran 90 code of Exercise 6 to generate a full ASCII decryption table for ASCII characters 32 through 127. A decryption table is a translation (or association) between the encrypted character and the original ASCII character. To do this, the encryption table must be "inverted," or made to associate "backwards" rather than "forwards." Print out both tables of indices, and both the encryption and decryption tables. Make sure that any plain text character, when encrypted and decrypted, results in the original plain text character.

Exercise 8 Cryptograms

Using the above techniques, write a Fortran 90 code to take as input a full line (string) of characters, generate a table of shuffled indices, and print out the encrypted line. Here, as it is best not to encrypt punctuation and special keyboard characters, you need only encrypt capital letters (ASCII codes 65 through 90), small letters (ASCII codes 97 through 122) and the numbers 0-9 (ASCII codes 48 through 57). See if your school newspaper wants to include your cryptograms as a regular feature.

5.3 Better Methods of Encryption

The "shuffling" method of encrypting messages is very easy to "break" or decipher, because there are simply not that many possibilities for the decryption table because each character of plain text encrypts to the same character of encrypted text. Better methods of encryption do not suffer from this limitation, as the same character of plain text encrypts to different characters of encrypted text. We shall explore a method using a specific string of text, called an "encryption key," to be used to encrypt plain text.

As an example, we shall use the encryption key, "AiS is fun." We shall encrypt the plain text, "Every good boy deserves favor." The key here is only 11 characters long, and the plain text is only 30 characters long. Normally, a key may be several thousand characters long - the longer the key, the more dificult it is to decipher the code. If the key is shorter than the plain text message to be encrypted, it is used over and over again, as many times as necessary to encrypt the entire message. When the end of the key is encountered, we jump to the beginning of the key and continue. This is called a "cyclical permutation" of the key.

The key can be used in a variety of ways to encrypt the plain text. One of the simplest ways is to add the ASCII codes of the key and the plain text. In this case, we must have some way to ensure that the resulting ASCII code is within acceptable bounds. For example, suppose we start encrypting the plain text above. If we select a random starting location in the key of the eighth character, "f," then the first encrypted character is computed as explained in the following paragraph.

Recall that the ASCII code of the encrypted character is the sum of the ASCII codes of the crurent characters of the key and the plain text. Here, the ASCII codes of the key character, "f" and the plain text character, "A" are 102 and 65, respectively. The sum is 167, which exceeds 127, the maximum ASCII code of the printable characters. An easy method to adjust for thus is to subtract 128 from the resulting ASCII code, if it is greater than 127. Thus, the first encrypted character will be ASCII code 167 - 128 = 39, or the ASCII character ampersand, "&". The above plain text message encrypts, beginning with key character, f, as follows:

ichar(f) + ichar(E) = achar(102 +  69 - 128) = achar( 43) = +
ichar(u) + ichar(v) = achar(117 + 118 - 128) = achar(107) = k
ichar(n) + ichar(e) = achar(110 + 101 - 128) = achar( 83) = S
ichar(.) + ichar(r) = achar( 46 + 114 - 128) = achar( 32) = neg. ack.
ichar(A) + ichar(y) = achar( 65 + 121 - 128) = achar( 58) = :
ichar(i) + ichar( ) = achar(105 +  32 - 128) = achar(  9) = hor. tab
ichar(S) + ichar(g) = achar( 83 + 103 - 128) = achar( 58) = :
ichar( ) + ichar(o) = achar( 32 + 111 - 128) = achar( 15) = shift in
ichar(i) + ichar(o) = achar(105 + 111 - 128) = achar( 88) = X
ichar(s) + ichar(d) = achar(115 + 100 - 128) = achar( 87) = W
ichar( ) + ichar( ) = achar( 32 +  32 -   0) = achar( 64) = @
ichar(f) + ichar(b) = achar(102 +  98 - 128) = achar( 72) = H
ichar(u) + ichar(o) = achar(117 + 111 - 128) = achar(100) = d
ichar(n) + ichar(y) = achar(110 + 121 - 128) = achar(103) = g
ichar(.) + ichar( ) = achar( 46 +  32 -   0) = achar( 78) = N
ichar(A) + ichar(d) = achar( 65 + 100 - 128) = achar( 37) = %
ichar(i) + ichar(e) = achar(105 + 101 - 128) = achar( 78) = N
ichar(S) + ichar(s) = achar( 83 + 115 - 128) = achar( 70) = F
ichar( ) + ichar(e) = achar( 32 + 101 - 128) = achar(  5) = enquiry
ichar(i) + ichar(r) = achar(105 + 114 - 128) = achar( 91) = [
ichar(s) + ichar(v) = achar(115 + 118 - 128) = achar(105) = i
ichar( ) + ichar(e) = achar( 32 + 101 - 128) = achar(  5) = enquiry
ichar(f) + ichar(s) = achar(102 + 115 - 128) = achar( 89) = Y
ichar(u) + ichar( ) = achar(117 +  32 - 128) = achar( 21) = neg. ack.
ichar(n) + ichar(f) = achar(110 + 102 - 128) = achar( 84) = T
ichar(.) + ichar(a) = achar( 46 +  97 - 128) = achar( 15) = shift in
ichar(A) + ichar(v) = achar( 65 + 118 - 128) = achar( 55) = 7
ichar(i) + ichar(o) = achar(105 + 111 - 128) = achar( 88) = X
ichar(S) + ichar(r) = achar( 83 + 114 - 128) = achar( 69) = E
ichar( ) + ichar(.) = achar( 32 +  46 -   0) = achar( 78) = N
Thus, if we use ASCII character 126 or ~ to denote non-printing characters, the plain text encrypts as follows:

Every good boy deserves favor.
+kS~:~:~XW@HdgN%NF~[i~Y~T~7XEN
Note that the eighth character is a horizontal tab, with the result that your output will have a series of spaces after the seventh character when the tab is displayed. Also note that: 1) the same plain text characters encrypt to different encrypted text characters, 2) the encryption key may be selected to begin at a "random" location - here location 8, 3) the encryption key is cyclically permuted to "wrap back upon itself," and 4) where the resulting ASCII codes exceed the maximum allowable value of 127, they are brought back into the acceptable range by subtracting 128. This latter point is not necessary, as all 256 ASCII codes can be used. However, it is convenient for us to be able to display most of the results. Finally, note that the present method of encryption allows non-printing ASCII characters 0-31 as resultants.

Encryption Exercises

Exercise 9 Encryption Code

Write a Fortran code to perform the above encryption. Make sure the answers obtained are exactly those shown above.

Exercise 10 Full Encryption Code

Write a Fortran code to perform encryption as above, by adding the ASCII code of the key to that of the plain text. Verify the code by performing the example above. Reserve storage for up to 2,000 characters of key text. Use the key:

"The tree of liberty must ever be replenished by the blood of patriots. Patrick Henry"

Include the quotation marks in the key. Construct the code so that it opens an input file of plain text, reads in the plain text line by line, encrypts the line of text and writes the encrypted text out to an output file. Choose a random location in the key to begin encryption. Use only ASCII codes 0-127, as above.

Exercise 11 Including Decryption

Modify the code of Exercise 10 to generate a decryption table too. Make sure that the plain text, encrypted and decrypted, results in the original text.

Exercise 12 Deciphering Code

Modify the code of Exercise 11 such that, given the key in Exercise 10 but not the starting location of the key, you determine the starting location of the key. Comment on what you could do to "break" the encryption if you do not know the key (or even the length of the key).