1.1 Computers in Everyday Life & in Science

In today’s world, from the moment we wake up until we go to bed, we find ourselves in constant interaction with interfaces. The touch screen of your cell phone or tablet, your computer monitor, your printer, your refrigerator and many other pieces of electronics that surround you are examples of such interfaces: they interact with you (the user), accept your instructions (check messages on a cell phone, write a message on WhatsApp, create a Word document, make an automated phone call to a service center) and execute them. Behind all of these easy-to-use interfaces, a whole ensemble of electronic parts, machine code, wires and sometimes electromagnetic signals works together to convert your instructions into actions.

In general terms, you may think of the instruction that you give the interface as a command. As an analogy for this working principle, consider the major (a commander in the army) of a company (80-250 soldiers) who needs to have 1000 holes dug in a large field in a regular lattice pattern. As it would be unreasonable for him to dig all the holes himself, he calls his second-in-command and describes to him what he wants. At this moment, he is giving a command (dig holes) by means of an interface (his second-in-command). The second-in-command then goes off, divides the company into groups of ten, tells each group where it is positioned and shows it the desired locations of the holes. The groups then start working on the task and by the end of the day all holes are dug. In this example, the company and its smaller parts (the groups of ten) can be thought of as the large machinery behind the simple interface (the second-in-command, who is just a single person). Just as the major never comes into direct contact with the soldiers digging the holes in the field, a computer user is usually unaware of what is happening beyond the visible interface. Although this kind of knowledge is usually not necessary for everyday tasks, a good understanding of how computers work will help you in your career as a scientist in today’s scientific world.

A computer can, in general, be defined as any device that takes commands from a user and converts them into specific actions, independent of the interface. In a scientific context, we often think of computers in terms of the traditional ones that we use in the lab. This course deals with applications of computers to scientific problems, or in other words, scientific computing. Scientific computing allows you to perform tasks that you would not be able to perform by other means. Using a computer in a scientific context, you may tackle problems that are

  • too complex, e.g. intractable integrals;
  • too expensive, e.g. complex laboratory experiments, astrophysical systems;
  • too time-consuming, e.g. analysis of large data.

Recently there have been many complex and large-scale problems whose solutions have been made possible thanks to computers. Some spectacular examples are mapping the human genome, data analysis at CERN, and climate simulations.

In addition to solving problems, a good knowledge of computer software (and, on many occasions, hardware) allows you to organize and present the results of your work efficiently. This includes data plotting, writing reports and preparing presentations. You may have already been introduced to some of these tasks; you will learn the others during the course of this class.

1.2 Information Systems

Information systems are composed of the following components:

a) Hardware

Physical parts of the computer, such as the circuit board, silicon chips, input devices (keyboard or mouse), output devices (screen or printer). The internal parts of the computer covered by the plastic case can be summarized as follows:

  • The motherboard is the circuit board inside the computer that holds all the functional units.
  • The processor (central processing unit — CPU) is the component that does the actual computations.
  • The memory (random access memory — RAM) is a short-term storage unit that can be accessed very rapidly.
  • The buses are the electrical wiring that moves data from one location to another.
  • The chipset is the component that directs this traffic.

In addition to these basic components, several peripherals such as USB keys, mice and keyboards can be attached to the computer by means of hardware ports. Hardware ports are the connections on the sides or at the back of your laptop, or at various locations on your PC case. Once the peripheral is connected to the computer, signals can travel between the two. According to the amount of data carried at a time, hardware ports fall into two categories: serial ports send and receive one bit of data at a time, while parallel ports send multiple bits at once using a set of wires.

b) Software

Collection of programs that tell the computer what to do. There are several layers of software:

  • The operating system (OS) schedules and executes all of the computer’s tasks. It provides the user interface that users interact with, runs applications, manages the file storage system and communicates with the hardware. Examples of operating systems are Windows, iOS, Android and Ubuntu. The computers in this lab have both Windows and Linux (Ubuntu in particular) installed side by side. During lectures, we will use Linux quite a lot but you are free to do your assignments using any operating system you like. Advanced OSs like Linux run on a kernel, which is a core set of programs that manage all the commands, processes, applications and data that make up the OS.
  • Applications are the programs that users most often utilize to perform their everyday tasks or leisure activities. For instance, if you are preparing a report, you may start the application Microsoft Office, then type, save and print your work. Later, you may schedule a meeting with your friends through a social media application and after that you may play a game. All three programs used in this example are applications.

c) Data

Most application programs operate on the data that they receive. For example, if you are creating an Excel sheet of your lab data, the numbers you enter are the data that the application Excel works on to produce plots and tables.

A computer system (computer + software + user) in most scientific and technical applications works in a cycle: the user inputs data, an application processes the data and obtains results. The results are then stored and either discarded or reused as input. Let us look at an example to illustrate some of the ideas above. This example deals with two applications (three including the terminal in which we type our commands) that are available on the Ubuntu/Linux OS. Taking a series of discrete x values between -4π and 4π, the first application, octave, creates an array that is composed of the values of the function f(x) = sin(x)/x at those discrete values of x. It then saves the data in the file data.txt, which becomes the input to another application, gnuplot, which uses this data to plot the function f(x).

user@machine:~$ octave -q
octave:1> x=-4*pi:0.01:4*pi;
octave:2> f=sin(x)./x;
octave:3> data=[x' f'];
octave:4> save -text data.txt data
octave:5> exit
user@machine:~$ gnuplot
    G N U P L O T
    Version 4.4 patchlevel 3
    last modified March 2011
    System: Linux 3.2.0-85-generic
    Copyright (C) 1986-1993, 1998, 2004, 2007-2010
    Thomas Williams, Colin Kelley and many others
    gnuplot home:     http://www.gnuplot.info
    faq, bugs, etc:   type "help seeking-assistance"
    immediate help:   type "help"
    plot window:      hit 'h'
Terminal type set to 'wxt'
gnuplot> plot "data.txt" u 1:2 w lines lw 3 lc rgb "red" t "Sinc function" 

At the end of this procedure, you should see a plot on your screen that looks like Fig. 1.

Fig. 1. The sinc function drawn with gnuplot

The smallest unit of data that can be stored on or processed by a computer is a bit (binary digit). A bit is a single binary digit that can only take two values: 0 and 1. Eight bits combine to make a byte, which is an 8-digit binary number such as 00110101. Each byte can be used in many ways by the computer depending on the intent. For example, the byte 01100011 may be used to represent the number 99, a location in memory or the 99th dot on a digital picture. Since the byte is a very small unit of memory, the memory on modern computers is designed to handle large multiples of the byte. A kilobyte (KB) is simply 2^10 = 1024 bytes and a megabyte (MB) is 2^20 or approximately 1 million bytes. A gigabyte (GB) is about one billion bytes, a terabyte (TB) about one trillion and a petabyte (PB) about one quadrillion.
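
The arithmetic above can be checked directly on the terminal. The following is only a small sketch: the function name bin_to_dec is made up for this illustration and is not a standard Linux command. It reads a string of binary digits from left to right, doubling the running value and adding each new bit, and then prints the number of bytes in a kilobyte and a megabyte.

```shell
#!/bin/sh
# bin_to_dec is a hypothetical helper written for this example.
# It converts a string of binary digits to its decimal value,
# processing one bit at a time from left to right.
bin_to_dec() {
    bits=$1
    value=0
    while [ -n "$bits" ]; do
        rest=${bits#?}            # everything after the first character
        first=${bits%"$rest"}     # the first character (0 or 1)
        value=$(( value * 2 + first ))
        bits=$rest
    done
    echo "$value"
}

bin_to_dec 01100011     # the byte used in the text
echo $(( 1 << 10 ))     # number of bytes in a kilobyte (2^10)
echo $(( 1 << 20 ))     # number of bytes in a megabyte (2^20)
```

The shift operator << multiplies by two for every position shifted, so 1 << 10 is the same as 2^10.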

1.3 Computer Architecture in a Nutshell

The term computer architecture refers to the particular way in which the operation of the computer is organized. The CPU is composed of a control unit, an arithmetic logic unit (ALU) and registers. The control unit fetches the data delivered by the buses and places them in the data registers, which are designed to hold only data. At the same time, it also takes the instructions (operations to be performed on the data) and places them in the instruction registers. After the logic unit performs the instruction, the results are placed in the result register and sent to main memory. As main memory and the buses are typically slower than the CPU, the CPU would often sit idle during this cycle while waiting for data. In order to reduce this downtime, CPUs have very fast memories located on or near them, which are called cache. Among many other factors, the speed of the CPU depends on the amount of data it can accept simultaneously as input. This is referred to as the word size. The standard word size used to be 32 bits, while most modern computers are built on a 64-bit architecture.

The files created by the user and the operating system are stored in the memory of the computer. The memory is subdivided into dynamic (volatile) and static (non-volatile) storage. The data stored in dynamic memory are lost when the power is cut off, while static storage, such as a hard disk, is permanent. The main dynamic memory is the RAM (random access memory), which temporarily holds open applications and the files currently being modified until the applications are closed or the files are committed to permanent storage. Each piece of data stored in RAM has an address, and "random access" means that any address can be reached directly, in roughly the same time, regardless of where it is located. In addition to these main parts, computers contain drivers for peripherals and network components.

1.4 Linux, General Usage Tips, File Structures and Editors

First let us begin by learning what Linux is:

  • Linux is just another operating system much like Windows.
  • Many versions may be obtained free of charge and programs are developed by volunteers.
  • It is a very stable operating system and is therefore preferred in large computer systems and for long-running applications.
  • It is modeled on Unix, which was originally proprietary software. The Linux kernel was written from scratch by Linus Torvalds and offered free of charge.
  • There are several distributions and which one you choose depends on what you expect from an operating system.

In order to interact with any operating system, you need a human interface. What is typically meant by user-computer interaction is a set of commands given by the user to the computer. In Windows, most interaction is through icons, so that each time you click on one, you are really giving a command to the computer and telling it to execute an action. In Linux, commands can also be given through a terminal. On an Ubuntu system, if you press the keys Ctrl-Alt-T, you will see a small, rectangular window appear on your screen. This is where you type your commands and launch your applications. If you are not comfortable with this yet, you can still do a lot through icon-clicking.

Next, let us look at how Linux operates. Although the look and usage are slightly different from Windows, the idea of storing files and folders is similar.

a) The Filesystem

A filesystem, as the name implies, is a scheme for organizing the various files that are needed for the operation of your system. You can think of the filesystem as a bookcase where you keep your books, papers, files, CDs, DVDs and magazines. If you are an organized person, you allocate a space in your bookcase for your books and put your books in that space, another separate space for your folders, yet another for your CDs. Going a level deeper, you can also arrange your folders according to subject, alphabetically or using another suitable classification scheme.

Similarly, the disk space on your computer is divided into directories, which may branch into several subdirectories. The smallest indivisible storage unit is a file, which is a concept you should already be familiar with from Windows. A file may contain, for example, an image, plain text or computer code. In a Linux system, the topmost directory is simply called the root directory and is represented by the symbol /. This directory is divided up into subdirectories, each containing crucial content for the operation of the system. Most users find it easy to visualize the filesystem as an upside-down tree where the root is the origin of the filesystem. Directories in root and their subdirectories keep branching out until you hit the actual files that the user creates and/or uses, which can be thought of as the "leaves" of the tree as illustrated in Fig. 2. You can view the content of the / directory by typing

user@machine:~$ ls /
bin    etc    media  proc  srv       tmp  windows
boot   home   mnt    root  subdomain usr
dev    lib    opt    sbin  sys       var

Fig. 2. Schematic view of the filesystem.

The most common of these subdirectories are

  • /bin : Contains utilities that are essential to system operation, such as the shells and file manipulation commands like cp, ls and chmod.
  • /etc : dedicated to system configuration. Contains configuration files for the system daemons, startup scripts, system parameters, and more.
  • /lib : Core applications and utilities installed with Unix require the libraries in /lib to run.
  • /usr : End-user applications, such as editors, games, and interfaces to system features are here, as is the library of man pages, and more. Chances are that if the file is useful, but not mandatory for system operation, it’ll be found in /usr.
  • /var : Repository for files that typically grow in size over time. Mailboxes, log files, printer queues and databases can be found in /var. It’s commonplace for Web sites to be kept in /var as well, since a Web site tends to amass data preternaturally over time.
  • /home : This is the most important. Contains all of the users’ home directories. A home directory is where a regular user keeps his/her non-application files and modifies them. In most Linux systems, it is the default directory that you find yourself in when you log in to the system.
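
Not every system has exactly the same set of top-level directories, so it can be instructive to check which of the ones listed above exist on your machine. The following is a minimal sketch; check_dir is a name invented for this example, not a standard command.

```shell
#!/bin/sh
# check_dir is a hypothetical helper written for this example.
# It reports whether the directory given as its argument exists.
check_dir() {
    if [ -d "$1" ]; then
        echo "$1 exists"
    else
        echo "$1 is missing"
    fi
}

# Go through the directories discussed in the list above.
for dir in /bin /etc /lib /usr /var /home; do
    check_dir "$dir"
done

# The shell keeps the path of your home directory in the variable HOME.
echo "Your home directory is ${HOME:-not set}"
```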

b) Linux Commands

As mentioned above, in contrast to the Windows operating system, the fastest way to utilize the functionalities of the Linux operating system is through commands that you type in a terminal. The very large number of Linux commands not only enables us to perform many different tasks but also provides flexibility within each command through options.

A Linux command has the following general structure:

<command> [OPTIONS] [ARGUMENTS]

where

  • <command> is a special instruction that causes the system to perform an action (such as ls, mkdir, grep)
  • [OPTIONS] are special expressions for the particular command which extend its capabilities. Options are usually supplied by means of a single dash and a letter, e.g. -l, -k, -a. You might also encounter options given as two dashes and a full expression, e.g. --escape, --ignore-backups (for the command ls).
  • [ARGUMENTS] refer to entities which we would like the command to act upon. These can be files, directories, sentences, words or even numbers.
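
To make this structure concrete, here is a short sketch using ls with different combinations of options and one argument. The temporary directory (created with mktemp -d) and the two file names are choices made for this illustration only.

```shell
#!/bin/sh
# Work in a throw-away directory so that nothing in your home is touched.
workdir=$(mktemp -d)
touch "$workdir/visible.txt" "$workdir/.hidden.txt"

ls "$workdir"          # command + argument: names starting with a dot are not listed
ls -a "$workdir"       # one option: -a also lists names starting with a dot
ls -l -a "$workdir"    # two options given separately
ls -la "$workdir"      # the same two options combined after a single dash
```

The last two lines should produce the same listing, which shows that single-letter options can be written one by one or grouped together.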

Because it is impossible to know off the top of your head all the options to all the commands, a very handy tool has been designed by Linux developers, which is called the man pages. man stands for manual and if you would like to list all the options for a particular command, you invoke the related man pages by typing

man <command>

The man command, being a command itself, also has options and a particularly useful one is -k, which lists the available commands for a keyword supplied as an argument. So suppose you would like to know which commands are available for manipulating, displaying and converting pdf files. You can just type

user@machine:~$ man -k pdf
pdfeinitex (1)     - PDF output from TeX
texi2dvi4a2ps (1)  - Compile Texinfo and LaTeX files to DVI or PDF
.....

and it will list all the available commands. You can then man the particular command of interest and learn more about it. I find man -k to be very useful because you find yourself learning other commands while you are looking for a particular one, much like looking up a word in a dictionary and learning other words in the process.

c) The shell

Every time you type a command, it instructs the system to perform an action, such as invoke an editor or list the files in a directory. The software that communicates your command to the kernel is called the shell. The name "shell" was chosen in order to emphasize the fact that the user is normally not meant to interact directly with the kernel and the shell is like a protective, external cover that wraps around the kernel. The commands that the user issues are conveyed directly to the shell, which then communicates them to the underlying kernel.

From the user’s perspective, the shell can actually be considered a computer language that has commands and syntax. Shell commands are a very versatile set, which means that they allow you to do almost anything you can think of but it might take a while to find out how to conduct a given task through arguments and options. This can only be improved through lots of practice. Now, let us see a few of the most commonly used commands.

1. Creating and accessing directories

One of the most commonly used features of the shell is its file manipulation functions. Linux has a very rich collection of commands that allow you to create, move, delete, search patterns in and process files. We will now demonstrate some of these commands through several examples.

  • mkdir, cp, mv, cd, ls : In order to form a filesystem tree like the one discussed above, you need to be able to create folders and files. Let us practice this. On your terminal, type the following:
user@machine:~$ mkdir my-dir 
user@machine:~$ cd my-dir 
user@machine:~/my-dir$ ls 
user@machine:~/my-dir$ touch my-file.txt 
user@machine:~/my-dir$ ls 
my-file.txt 
user@machine:~/my-dir$ ls -lasrth 
total  24K 
 20K  drwxr-xr-x    288 user   user   20K  Oct   9  13:12  .. 
   0  -rw-rw-r--      1 user   user      0 Oct   9  13:12  my-file.txt 
4.0K  drwxrwxr-x      2 user   user  4.0K  Oct   9  13:12  . 
user@machine:~/my-dir$ cd 
user@machine:~$ mkdir my-dir2 
user@machine:~$ cd my-dir2/ 
user@machine:~/my-dir2$ ls 
user@machine:~/my-dir2$ mv ../my-dir/my-file.txt .
user@machine:~/my-dir2$ ls
my-file.txt 
user@machine:~/my-dir2$ cd ../my-dir 
user@machine:~/my-dir$ ls
user@machine:~/my-dir$ cd ../my-dir2
user@machine:~/my-dir2$ cp my-file.txt my-file2.txt
user@machine:~/my-dir2$ ls 
my-file2.txt   my-file.txt

Let us go through each command line and see what each command means :

  • mkdir (make directory): Creates a directory with the same name as the argument. To create a subdirectory of a given directory, either go into the parent first or give the path to the new directory (e.g. mkdir my-dir/sub-dir). If the parent directories do not exist yet, the option -p creates the whole chain at once. See the man pages for more information.
  • cd (change directory): Go into the directory with the same name as the argument. There are many variations :

cd : Go into the home directory of the user.

cd .. : Go into the parent directory directly above in the tree.

cd - : Go into the previously used directory.

cd / : Go into the root directory.

  • ls (list): List all the files and subdirectories in the current directory. If used with certain options such as -l, -a, or -t, it displays more information about the files, such as permissions or time of last modification. Note that you can combine several options after a single dash, as in ls -lasrth above.
  • mv (move): Move a file from one location to another. The file no longer exists at its original location. The same command is used to rename a file.
  • cp (copy): Make an identical copy of a file with a different name and/or at a different location.
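
The following sketch puts a few of these variations together, in particular mkdir -p and cd -, which do not appear in the session above. The directory and file names (project, data, raw, notes.txt) are invented for this example, and everything happens inside a temporary directory.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p project/data/raw      # creates project, project/data and project/data/raw in one go
touch project/notes.txt

cp project/notes.txt project/data/notes-copy.txt   # the original stays where it is
mv project/notes.txt project/data/raw/             # the original location is now empty

cd project/data/raw || exit 1
pwd                  # show where we are
cd ..                # up to project/data
cd - > /dev/null     # back to project/data/raw (cd - prints the directory; silenced here)
ls                   # lists the file that was moved here
```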

2. File Manipulation

  • cat, wc, grep : First let us create a file using the command cat
user@machine:~$ cat > file1.txt
I have one dog.
You have one dog.
I have two cats.
You have no cats.
I have two sisters.
Ali has three sisters.
I have one brother.
Ayse has no brothers.
Ctrl-d

Next let us find all the lines where there is an occurrence of the word dog.

user@machine:~$ grep dog file1.txt
I have one dog.
You have one dog.

Now let us count the number of lines of the file file1.txt using the command wc, which stands for word count.

user@machine:~$ wc file1.txt
8 32 154 file1.txt

In the output of this command the first number is the number of lines, the second is the number of words and the third is the number of bytes.
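
Both commands accept options that narrow down the output. The sketch below recreates file1.txt non-interactively with printf (a choice made here so that the example can be run as a script; the lecture uses cat) inside a temporary directory, and then shows a few options.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the same eight lines as in the cat example, one argument per line.
printf '%s\n' \
    "I have one dog." \
    "You have one dog." \
    "I have two cats." \
    "You have no cats." \
    "I have two sisters." \
    "Ali has three sisters." \
    "I have one brother." \
    "Ayse has no brothers." > file1.txt

grep -c dog file1.txt     # -c: print only the number of matching lines
grep -n cats file1.txt    # -n: prefix each matching line with its line number
wc -l file1.txt           # -l: print only the number of lines
wc -w file1.txt           # -w: print only the number of words
```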

  • head, tail: Suppose you are given the following text file, which you might imagine to be the output file of a scientific calculation. (Notice the alternative use of cat for displaying the contents of a file.)
user@machine:~$ cat file2.txt
The energy at step 1 is 3.4 eV.
The energy at step 2 is 2.1 eV.
The energy at step 3 is 1.2 eV.
The energy at step 4 is 1.1 eV.
The energy at step 5 is 1.1 eV.
End of calculation.

Let us say that you would like to display the first three lines or the last two lines of this file. The commands head and tail take the number of lines you would like together with the name of the file and display the lines from the beginning or end respectively.

user@machine:~$ head -3 file2.txt
The energy at step 1 is 3.4 eV.
The energy at step 2 is 2.1 eV.
The energy at step 3 is 1.2 eV.
user@machine:~$ tail -2 file2.txt
The energy at step 5 is 1.1 eV.
End of calculation.
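
The two commands can also be chained through an intermediate file, for instance to pick out the final energy. This is only a sketch: the file name last-two.txt is made up for the example, and file2.txt is recreated with printf in a temporary directory. The -n form of the line count is the one specified by POSIX; the shorter head -3 and tail -2 used above are accepted by the GNU versions as well.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
    "The energy at step 1 is 3.4 eV." \
    "The energy at step 2 is 2.1 eV." \
    "The energy at step 3 is 1.2 eV." \
    "The energy at step 4 is 1.1 eV." \
    "The energy at step 5 is 1.1 eV." \
    "End of calculation." > file2.txt

head -n 3 file2.txt     # first three lines
tail -n 2 file2.txt     # last two lines

# Keep only the last energy line: save the last two lines,
# then display the first line of that smaller file.
tail -n 2 file2.txt > last-two.txt
head -n 1 last-two.txt
```
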
  • diff : If you have two files that are very similar except for minor differences and you would like to compare them, you can use the command diff.
user@machine:~$ cat file3.txt
a
b
c
This file is the first file.
1
2
3
user@machine:~$ cat file4.txt
a
b
c
This file is the second file.
1
2
3
user@machine:~$ diff file3.txt file4.txt
4c4
< This file is the first file.
---
> This file is the second file.
user@machine:~$ diff -u file3.txt file4.txt
--- file3.txt        2009-10-01 01:00:47.000000000 +0300
+++ file4.txt        2009-10-01 01:00:57.000000000 +0300
@@ -1,7 +1,7 @@
 a
 b
 c
-This file is the first file.
+This file is the second file.
 1
 2
 3

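The option -u used with diff displays the differences between the files together with a few lines of surrounding context. In this output, lines starting with - come from the first file and lines starting with + from the second. A related command, paste, simply displays the two files side by side.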
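
As a small sketch of paste, and of how diff can be used to make a decision, the following recreates the two files in a temporary directory (the use of printf and of a temporary directory are choices made for this example). diff finishes successfully only when the files are identical, which is what the if statement tests.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' a b c "This file is the first file." 1 2 3 > file3.txt
printf '%s\n' a b c "This file is the second file." 1 2 3 > file4.txt

# paste joins corresponding lines of the two files, separated by a tab.
paste file3.txt file4.txt

# The output of diff is discarded here; only its success or failure is used.
if diff file3.txt file4.txt > /dev/null; then
    echo "files are identical"
else
    echo "files differ"
fi
```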

  • file : file is a command that determines the type of a given file. For example, let us see the information about the file that we have just created, file1.txt, and then let us apply file to a png image file and a pdf document respectively:
user@machine:~$ file file1.txt
file1.txt: ASCII text
user@machine:~$ file sinc.png
sinc.png: PNG image data, 642 x 468, 8-bit/color RGB, non-interlaced
user@machine:~$ file lecture-01.pdf
lecture-01.pdf: PDF document, version 1.4

For the png and pdf files, in addition to the type, you get more information regarding resolution, version, etc.

Although it is possible to create small and simple files from the command line as described above, when it comes to producing large files or editing an already existing file, you need a more sophisticated application, namely an editor. An editor is a user interface that allows you to create, edit and save files. Note that an editor is not the equivalent of Microsoft Office, because it creates simple text files rather than .doc or .xls files. A valid analogy in Windows would be Notepad. A large number of editors are available for Linux. In this lecture, we will concentrate on emacs but feel free to explore other possibilities and find the one that is right for you. Some choices are vim (or vi), nano, pico and gedit. Most students prefer gedit since it is relatively simple.