3. Syntax

In this part of the course, we will finally begin to learn how to cast our algorithms into actual computer programs. To do this, one needs a set of grammar rules for converting statements in a natural language into a computer language. These grammar rules are collectively referred to as the syntax of a computer language. Although each computer language is different, the elements that make up all languages are similar. Starting with this lecture, we will be learning about these elements.

3.1 Variables

While discussing the algorithms above, we highlighted certain elements of each algorithm, namely process, decision, input, output etc. Of central importance among these elements are variables, which enable the abstraction of the main objects, such as N and X. A variable in a computer program is in a way similar to a variable in mathematics. It is basically a symbol whose content may change during the course of the program. The difference between variables in mathematics and computing is that mathematical variables are an abstract concept whose values may never be explicitly assigned, whereas in most computer codes (except those which perform analytical computations) each variable has a well-defined value at any given time. A variable is to be contrasted with a constant, which is assigned a number, string or character once and for all and never changes afterwards.

Let us see examples of how variables are handled in different computer languages. We will start with the simpler Octave and shell and continue with the higher-level languages Perl and Python.

Octave and Shell

octave:1> a=pi
a = 3.1416
octave:2> b=a^e
b = 22.459
octave:3> a=(sin(a/6))^2
a = 0.25000 

Here a and b are variables. At the beginning of the Octave session, a is assigned the built-in constant pi. On the next line, the second variable b is calculated by raising the first variable a to the power of another built-in constant, e. Notice that the computer remembers the value of a when it calculates b and substitutes the number pi in its place. In the final line, we observe a usage that is common to many computer languages: the reassignment of a variable in terms of itself. On the right-hand side of the equals sign, we see an operation involving the old value of the variable a, namely pi. After this operation is completed, the result is assigned back to the same variable a, whose value thereby becomes 0.25.

Variables are necessary in computer programming for the flexibility of our programs. It would be silly to write a program that knows only how to add 2 and 4 but not 2 and 3, just as algebraic addition would be absurd if it were defined only for 2 and 4 and no other numbers.

In practice, variables are accessed in different ways in different computing languages. In some languages, such as C and Octave, they are called simply by their name, whereas in others you need a special additional character to access the value assigned to a particular variable. The shell programming language is an example of a language in which the value of a variable is extracted using the special character $.

user@machine:~$ a="Phys200 is a great class."
user@machine:~$ echo $a
Phys200 is a great class.
user@machine:~$ b=4.3
user@machine:~$ echo The number that you entered is $b.
The number that you entered is 4.3.

Notice the difference in the way we assign and extract variables in the shell. In assigning a number, a string or a character to a shell variable, we simply use the name of the variable with the = sign. In accessing it, however, we need to use the $ sign. Compare this with Octave, where variables are accessed directly without the need for a special character, as in the example above.
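For comparison, the assign-then-reuse pattern of the Octave session can be sketched in Python, which, like Octave and C, accesses variables by name alone. This is only an illustrative sketch assuming Python 3; the variable names follow the Octave session, and the math module it relies on is discussed later in this section.

```python
import math

a = math.pi                 # a holds the constant pi
b = a ** math.e             # b is computed from the current value of a
a = math.sin(a / 6) ** 2    # a is reassigned using its own old value

print("b = %.3f" % b)
print("a = %.2f" % a)
```

No special character is needed on either side of the = sign, in contrast to the shell.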

Many computer languages support several types of variables. Let us look at the examples available in Octave.

octave:1> a=3
a = 3
octave:2> b=3*pi
b = 9.4248
octave:3> c="a"
c = a
octave:4> d="Monday"
d = Monday
octave:5> f=[1 2 3 4]
f = 
1 2 3 4
octave:6> m=rand(2)
m =
0.0825605 0.0788895
0.0026895 0.9710686
octave:7> g=["a" "b" "c"]
g = abc 

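In the snippet shown above, the following variables and their respective types have been used:

  1. a : integer (strictly speaking, Octave stores such a number in double precision unless an integer type is requested explicitly)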
  2. b : real (usually called double or float in computing jargon)
  3. c : character (a letter, punctuation mark, special character or number that is not interpreted as integer or real number)
  4. d : string (collection of characters)
  5. f : array of integers
  6. m : a matrix
  7. g : array of characters (equivalent to a string)

Notice the following interesting usage:

octave:1> var1=3
var1 = 3
octave:2> var1+3
ans = 6
octave:3> var2="3"
var2 = 3
octave:4> var2+3
ans = 54

What exactly happened here? In the first line, var1 is set to 3, the integer. When one adds 3 to var1 in the second line, it is interpreted as a simple addition and the answer is just 6, as expected. In line 3, var2 is defined to be "3", the string. The double quotes tell the computer that var2 is to be interpreted not as a number but as a string. When 3 is added to var2 in line 4, the character "3" is replaced by its numerical code in the character encoding system called ASCII (which is 51), and the number 3 is added to it to give 54.
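The same character-code arithmetic can be made explicit in Python. This is a sketch assuming Python 3, where adding a number to a string raises an error instead of converting silently, so the built-in ord function is used to look up the code by hand. The name code is illustrative.

```python
var1 = 3
var2 = "3"

print(var1 + 3)        # ordinary integer addition
code = ord(var2)       # numerical (ASCII) code of the character "3"
print(code)
print(code + 3)        # the quantity Octave computed for var2 + 3

try:
    var2 + 3           # Python refuses to mix a string and a number
except TypeError as err:
    print("TypeError:", err)
```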

In Octave, there is no need to tell the computer which variable is of which type. It figures it out by itself as soon as you assign a value to a variable. In addition, the type of the variable may be changed throughout the Octave session. For instance, the above code snippet can be modified as follows:

octave:1> var1=3
var1 = 3
octave:2> var1+3
ans = 6
octave:3> var1="3"
var1 = 3
octave:4> var1+3
ans = 54

As you see here, the variable var1 was initially set to an integer but was later set to a string. In other computer languages such as C, the type of a variable must be explicitly declared before the variable is used and cannot be changed afterwards. Here is a fragment of C code.

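#include <stdio.h>
#include <math.h>

int main(void) {
    int a;          /* integer variable */
    double b, c;    /* double-precision real variables */
    a = 3;
    b = 3.4234;
    c = a * b;      /* a is converted to double before the multiplication */
    printf("The result is %f.\n", c);
    return 0;
}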
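The contrast with the C fragment can also be sketched in Python, which, like Octave, works out the type from the assigned value and lets the same name be rebound to a value of another type. The sketch assumes Python 3 and reuses the name var1 from the Octave session; the list type_names is purely illustrative.

```python
type_names = []

var1 = 3
type_names.append(type(var1).__name__)   # var1 currently holds an integer

var1 = "3"
type_names.append(type(var1).__name__)   # the same name now holds a string

var1 = 3 * 3.4234
type_names.append(type(var1).__name__)   # and now a real (float) number

print(type_names)
```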

Perl

Another computer language that is similar to the shell is Perl. Perl has no interactive prompt of its own, but invoking its debugger on a trivial program, as in perl -d -e 1, takes you to a prompt where you can type commands.

user@machine:~$ perl -d -e 1
main::(-e:1): 1
DB<1> $a=3
DB<2> $b=pi()
Undefined subroutine &main::pi called at (eval 7)[/usr/share/perl/5.18/perl5db.pl:732] line 2.
DB<3> use Math::Trig
DB<4> $b=pi
DB<5> $c=$a*$b
DB<6> print 'The result is ',$c
The result is 9.42477796076938
DB<7> print 'The result is ',$c,'.'
The result is 9.42477796076938.
DB<8> exit()
Debugged program terminated. Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info.
DB<9> q

The greatest difference that we see in this case is in the way variables are referenced. We need the special character $ even at the assignment stage. Contrast this with the shell, where this sign is used only when accessing the variable. Another point worth noticing is that the number π is not immediately available as it is in Octave. Instead, we need to load a module, namely Math::Trig, in order to access it.

Python

Python is a language that has the flexibility to deal with mathematical as well as file- and text-based operations. Let us do a similar variable manipulation in Python. Python, much like Octave, can be called from the terminal. If you go to your terminal and simply type python, you should see a new prompt like this: >>>. You can then start typing Python commands. Let us see a session:

user@machine:~$ python
>>> a=3
>>> b=pi
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'pi' is not defined
>>> import math
>>> pi=math.pi
>>> b=pi
>>> c=a*b
>>> print 'The result is',c
The result is 9.42477796077
>>> print 'The result is',c,'.'
The result is 9.42477796077 .
>>> exit()
user@machine:~$

In this session, we first assign the value 3 to a variable a. Then, just as we did in Octave, we try to use the special number π but are hit with an error. This is because, unlike Octave, Python does not make special constants like π and e or special functions like sin and arctan available by default. Instead, in a similar fashion to Perl, in order to get Python to carry out a particular task, we need to import, or bring in, the module related to that task. For operations related to mathematics, the module in question is math, and once it is loaded through the import command, the related special functions and constants are accessible through the module name, as in math.pi.

3.2 Scripts

Although it is entirely possible to always work on the command line, i.e. the terminal, this becomes harder as the size of the code grows. A much more flexible method of writing code is via scripts, which are pieces of text written in an editor.

A script is a series of commands written in a file instead of the terminal. In Figure 1, three examples of scripts are displayed, which perform the same action (multiply π by three and display the result). Observe the differences. Each file that carries a script written in a particular computer language should carry a suffix (extension) that reflects that language. In the case of Octave, Perl and Python, these extensions are .m, .pl and .py. Other than the extension, the Octave script is nothing more than the sequence of command-line instructions collected in a file. In the Perl and Python scripts, on the other hand, there is one more difference, namely the declaration of the location of the interpreter at the top of the file. Without this declaration, the computer does not know where to find the software to process your script when the script is run directly as a command. In the case of Octave, it is more common to run the script from within the Octave interface, and therefore this declaration is not necessary.

Figure 1a. A simple script in Octave.

Figure 1b. A simple script in Perl.

Figure 1c. A simple script in Python.
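As a rough illustration of the description above, a Python script that multiplies π by three might look like the following sketch. The file name multiply_pi.py, the variable names and the interpreter line (which assumes that a python3 executable can be found through /usr/bin/env) are assumptions made here and need not match Figure 1c.

```python
#!/usr/bin/env python3
# multiply_pi.py (hypothetical file name): multiply pi by three and display it.
import math

a = 3
b = math.pi
c = a * b
print("The result is", c)
```

Such a file could be run with python3 multiply_pi.py, or directly as ./multiply_pi.py once it has been made executable with chmod +x multiply_pi.py.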

3.3 Control Structures

The way a computer interprets the instructions given by an algorithm is linear, in the sense that each instruction is executed after the previous one. However, it is sometimes necessary to break the linear flow of the program in one of the following ways:

  • Repeat a particular instruction or set of instructions until a given condition is satisfied.
  • Repeat a particular instruction or set of instructions for a predetermined number of times before moving onto the next instruction.
  • Execute a given instruction only if a given condition is satisfied.
  • Skip a number of instructions and go to a different location (or to the end) of the program.

Such parts of a computer language are collectively called control structures, since they can be used to control and change the flow of the program. These structures exist in practically all languages and obey similar syntax rules. We will illustrate them using Python.

"while" control structure

First, let us see a repeated block of instructions, a so-called while loop. We have discussed loops before, while we were discussing algorithms and computer logic. Now, we will see a complete example. The particular task we want to execute is as follows: we start with a number n = 1. We increment n by one. We keep adding until n reaches the number ten. The algorithm flowchart is displayed in Fig. 2.

Figure 2. Flowchart of a simple loop

The steps of the algorithm can instead be summarized in pseudocode as:

1. Initialize n to 1.
2. While n is smaller than 10
       Add 1 to n.
   Endwhile
3. Exit.

The instructions that are given to Python can be typed directly into the terminal. However, since this way of working is prone to errors, it is a better idea to type them into a file, called a script, using an editor (you are free to choose whichever editor you like) and then run the script from the command line. Now, please open an editor with a file called while_example.py and type the following in there:

n = 1
while n < 10:
    print("n = %d" % n)
    n = n + 1
print("The while loop has ended. n is now: %d" % n)

Now when you go to your terminal and type python while_example.py, you should get the following output:

n = 1
n = 2
n = 3
n = 4
n = 5
n = 6
n = 7
n = 8
n = 9
The while loop has ended. n is now: 10

In this example, n is a variable and is initially set to 1. At each step of the while loop, the current value of n is checked and, if it is smaller than 10, it is displayed on the screen and the number 1 is added to it. Once the value reaches 10, the loop ends.

"for" control structure

Next, let us see an example of a for loop. This is the kind of control structure that allows the repetition of a given action a predetermined number of times. For this, we will go back to the factorial example we studied in the previous lecture. Let us recall the pseudocode:

Initialize N to the number whose factorial we are trying to find.
Initialize f to 1. (This variable will contain the factorial).
for n=1 to N
      f = f × n
endfor
Display f.
Exit.

In order to cast this pseudocode into real code, let us once again open up a file, this time called for_example.py, and type the following:

N = 5
f = 1
for n in range(1, N + 1):
    f = f * n
print("%d! = %d" % (N, f))

When you run the script using Python, you should see the correct result, 5! = 120, on your screen. Note here that we have had to use a range that goes from 1 to N + 1 since the upper limit is not inclusive. If you only go from 1 to N, you will find that you get the wrong result.
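The effect of the non-inclusive upper limit can be checked directly. The following sketch, assuming Python 3, wraps the loop in a helper function (the name product_over_range is hypothetical) and compares both choices of range with math.factorial from the standard library.

```python
import math

def product_over_range(start, stop):
    """Multiply together the integers start, ..., stop - 1."""
    f = 1
    for n in range(start, stop):
        f = f * n
    return f

N = 5
print(list(range(1, N + 1)))                # the values of n that the loop visits
correct = product_over_range(1, N + 1)      # upper limit N + 1, so n reaches N
too_short = product_over_range(1, N)        # upper limit N, so n stops at N - 1
print(correct, too_short, math.factorial(N))
```

With the shorter range the loop never multiplies by N itself, so it returns (N - 1)! instead of N!.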

"if" control structure

Finally, we will look at the so-called if statements, which allow for the execution of commands depending on whether certain conditions are satisfied. For example, let us say a student has basketball practice on Mondays, study group on Tuesdays and tutoring on Wednesdays. All other days of the week are free for him to do whatever he likes. Let us write a code that reminds him of the day’s activity.

1. Enter day.
2. If day=Monday print "Basketball practice."
   else if day=Tuesday print "Study group."
   else if day=Wednesday print "Tutoring."
   else print "Free day."
3. Exit.

The python code, named if_example.py looks like:

day=input('Enter day:')
if day == "Monday":
    print('Basketball practice.')
elif day == "Tuesday":
    print('Study group.')
elif day == "Wednesday":
    print('Tutoring.')
else:
    print('Free day.')

In this example, in contrast to the ones that came before, the user is prompted for an input, a day. The function used for this is input. Once the user enters the name of the day, it is assigned to the variable day. Then day is tested for whether it is Monday. If it is, the program prints Basketball practice and exits. If it is not, the program goes on to the next test where, among the remaining choices (i.e. Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday), it tests for Tuesday. By the next test, the choices have narrowed further (i.e. Wednesday, Thursday, Friday, Saturday and Sunday), and the program tests for Wednesday. Finally, if day is none of these three days, the program prints Free day. Note that this program has a shortcoming: it will still run and give a wrong answer even if the user input is not a valid day. How can one fix this?

3.4 ASCII Data vs Binary Data

In most programs that we write, we often find ourselves manipulating some amount of data. We therefore here mention briefly the two most common ways data is stored in a computer, namely in ASCII and binary formats.

The abbreviation ASCII stands for American Standard Code for Information Interchange. When you open your editor and type in numbers and characters, they are not stored as the characters that you see on the screen; instead, each character is saved as its ASCII code. For example, the letter "a" corresponds to the code 97 while the character ' (apostrophe) has the ASCII code 39. There are 128 ASCII codes, which correspond to letters, digits, punctuation marks, the space character and control characters that you may or may not see on your keyboard. Characters outside the unaccented English alphabet, such as Turkish characters or the symbol Å, are not among these 128 codes and require extended encodings.

Each number between 0 and 127 is encoded in the binary system, and the binary equivalent of the ASCII code for each character is stored in a file. For example, the ASCII code for "A" is 65. (Note that it is different from that of the lowercase a.) The number 65 corresponds to the binary code 1000001, and this 7-digit number is what gets stored in memory. The maximum number needed is 127, and seven binary digits are sufficient to store this number. A byte, as you will remember from the first lecture notes, is a unit of data that is equal to 8 bits, or binary digits. So if ASCII code is used to store letters and characters, only the lower 7 bits of a byte are used for each character and the bit with the largest place value is wasted. For example, the letter "A" is stored as 01000001.
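The codes quoted above can be checked with Python's built-in ord and format functions. This is a small sketch assuming Python 3; the variable names are illustrative.

```python
for ch in ["a", "'", "A"]:
    code = ord(ch)                 # numerical code of the character
    bits7 = format(code, "07b")    # the code written with seven binary digits
    bits8 = format(code, "08b")    # the same code padded to a full byte
    print(ch, code, bits7, bits8)

stored_A = format(ord("A"), "08b")  # how "A" looks when it occupies one byte
print(stored_A)
```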

However, if we were to use all eight bits of each byte, as in binary coding, we could accommodate all numbers from 0 to 255 inclusive, i.e. 256 values in total. This is why, in order to store more data in a compact way, many file formats such as executables and images are stored in binary format. The interpretation of the binary format is not as straightforward as that of the ASCII format. The same binary data can be interpreted differently by different applications. It is therefore important to know exactly how it was formatted during its creation. We will get back to data types and data structures later in the course.
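Both points, the compactness of binary storage and the need to know how the data was laid out, can be sketched with Python's standard struct module. This is an illustrative sketch assuming Python 3; the number 255 and the two-byte example are chosen here only for demonstration.

```python
import struct

as_text = str(255).encode("ascii")    # the characters "2", "5", "5"
as_binary = struct.pack("B", 255)     # one unsigned byte holding 255
print(len(as_text), len(as_binary))   # number of bytes in each form

# The same two bytes mean different things under different assumed layouts.
raw = bytes([65, 66])
as_chars = raw.decode("ascii")            # read as two ASCII characters
as_number = struct.unpack("<H", raw)[0]   # read as one little-endian 16-bit integer
print(as_chars, as_number)
```

The text form needs one byte per digit, whereas the binary form fits the same value into a single byte, at the price that a reader must know the layout in advance.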