Chapter 5

File Operations (Input/Output)

5.1 File descriptors, Input and Output

The input that we have supplied to our computer programs so far has been a single number or a string. Very often, however, a program is intended to process large amounts of data. An example would be going through the list of students in a very large class and computing the average of their grades. In this case, it is easier to save the grades in a file and have the computer read the relevant data from that file. Alternatively, you may need the computer to write data into a file for further processing. In both cases, you need to be able to read data from a file, write data into a file, or append data to it (that is, add data to its end). In this chapter, we will learn how to do this.
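As a preview of where this chapter is heading, the grade-averaging task described above could be sketched in Python 3 as follows. The file name grades.txt, its one-grade-per-line layout and the grades themselves are assumptions made for illustration; the sketch simply exercises the three operations just mentioned: writing, appending and reading.

```python
import os
import tempfile

def average_from_file(path):
    """Return the mean of the numbers stored one per line in a text file."""
    grades = []
    with open(path, "r") as fid:      # open for reading; closed automatically
        for line in fid:              # iterate over the lines of the file
            line = line.strip()
            if line:                  # skip blank lines
                grades.append(float(line))
    return sum(grades) / len(grades)

# Hypothetical example file: the name grades.txt and the grades are made up.
path = os.path.join(tempfile.mkdtemp(), "grades.txt")
with open(path, "w") as fid:          # "w": create the file (or empty it)
    fid.write("85\n92\n78\n")
with open(path, "a") as fid:          # "a": add one more grade at the end
    fid.write("65\n")

print(average_from_file(path))
```

With the four grades above, the function averages 85, 92, 78 and 65.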

When you instruct the computer to create a new file or to access the contents of an existing one, you are said to open the file. The system assigns each open file a file descriptor, which is a nonnegative integer, and refers to the file by means of this descriptor until you close it. Closing a file does not destroy the file you have created or read; it simply means that you no longer need access to it for the moment. Let us see a few examples of just opening and closing files.

The next bit of code opens and closes files with different modes.

user@machine:~$ ls -larth
total 24K
drwxr-xr-x 296 user user  20K Nov 22 19:55 ..
drwxrwxr-x   2 user user 4.0K Nov 22 19:55 .
user@machine:~$ octave -q
octave:1> fid1=fopen("myfile.txt","w")
fid1 = 3
octave:2> fid2=fopen("myfile.txt","w")
fid2 = 4
octave:3> fid3=fopen("myfile.txt","w")
fid3 = 5
octave:4> fclose(fid1)
ans = 0
octave:5> fclose(fid2)
ans = 0
octave:6> fclose(fid3)
ans = 0
octave:7> fopen("myfile1.txt","r")
ans = -1
octave:8> fopen("myfile.txt","r")
ans = 3
octave:9> fclose(3)
ans = 0
octave:10> exit
user@machine:~$ ls -larth
total 24K
drwxr-xr-x 296 user user  20K Nov 22 19:55 ..
drwxrwxr-x   2 user user 4.0K Nov 22 19:56 .
-rw-rw-r--   1 user user    0 Nov 22 19:56 myfile.txt
user@machine:~$ cat myfile.txt

In the above session, we first list the contents of the current directory with the command ls and see that it is empty (apart from the entries . and .., which stand for the directory itself and its parent). We then start Octave and open a file with the function fopen. Here fopen takes two arguments: the first is the name of the file (i.e. what you will see when you type ls), and the second is the mode in which you want to open it. The three basic modes are read ("r"), write ("w") and append ("a"). Opening a file in write mode creates it if it does not exist and erases its contents if it does. As the example shows, file descriptors and the file names they refer to are not in one-to-one correspondence: several descriptors may refer to the same file. We then close each of them with fclose, which returns 0 on success, and their descriptors become free to be reused. If you try to open a nonexistent file for reading, fopen returns -1, which signals an error. Opening a file that already exists, on the other hand, causes no problem.

Notice also that once a file has been closed, a newly opened file may be assigned one of the freed descriptors. In the example, descriptor 3 was released by fclose and then handed out again when myfile.txt was reopened for reading. As long as a file is open, however, its descriptor is reserved for it and will not be assigned to any other file. Once you have created the file and exited Octave, you can see it in your directory with ls; cat prints nothing because the file is still empty.
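The same experiment can be sketched in Python 3 (the Python sessions later in this section use the older Python 2 syntax). Here the fileno() method exposes the integer descriptor behind each file object, and a missing file raises FileNotFoundError instead of returning -1. The exact numbers depend on what else the interpreter has open, so they need not start at 3; the helper name open_close_demo is hypothetical.

```python
import os
import tempfile

def open_close_demo(directory):
    """Mirror the Octave session above using Python 3 file objects."""
    path = os.path.join(directory, "myfile.txt")

    # Open the same file three times in write mode ("w" creates it if needed).
    handles = [open(path, "w") for _ in range(3)]
    # fileno() returns the integer descriptor behind each file object.
    descriptors = [h.fileno() for h in handles]
    for h in handles:
        h.close()

    # Python raises an exception where Octave's fopen returns -1.
    try:
        with open(os.path.join(directory, "myfile1.txt"), "r"):
            pass
        missing_opened = True
    except FileNotFoundError:
        missing_opened = False

    # Reopening for reading may reuse one of the descriptors freed above.
    with open(path, "r") as fid:
        reopened = fid.fileno()
    return descriptors, missing_opened, reopened

with tempfile.TemporaryDirectory() as directory:
    descriptors, missing_opened, reopened = open_close_demo(directory)
    print("descriptors:", descriptors)
    print("myfile1.txt opened:", missing_opened)
    print("descriptor after reopening:", reopened)
```

On Linux and other POSIX systems, opening a file returns the lowest-numbered unused descriptor, which is why the reopened file picks up one of the numbers that were just freed.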

Whenever a program runs on Linux (be it in the terminal or in any other application), three files are open by default: standard input, standard output and standard error. By default, standard input receives what you type on the keyboard, standard output holds the data to be printed on the screen, and standard error carries the error messages produced while the program runs, which are also shown on the screen. In fact, we can see them directly by using fopen in a different way:

octave:1> fid=fopen("myfile.txt","w")
fid = 3
octave:2> fopen(3)
ans = myfile.txt
octave:3> [filename,mode]=fopen(3)
filename = myfile.txt
mode = wb
octave:4> [filename,mode]=fopen(0)
filename = stdin
mode = r
octave:5> [filename,mode]=fopen(1)
filename = stdout
mode = w
octave:6> [filename,mode]=fopen(2)
filename = stderr
mode = w

As you can see, once a file is open you can obtain information about it with the very same function fopen, this time called with the file descriptor as its only argument. If you ask for more than one output, as in [filename,mode]=fopen(3), it returns both the name of the file and the mode in which it was opened. Here the mode is wb: w stands for write and b stands for binary. Although you do not have to write binary data into files, Octave opens files in binary mode by default. You may also have noticed by now that the file descriptors handed out by fopen start from 3. This is because 0, 1 and 2 are reserved for standard input, standard output and standard error, respectively.
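Python exposes the same three standard streams through its sys module. The following Python 3 sketch assumes the interpreter was started with all three streams attached, as it is in an ordinary terminal. It prints their descriptor numbers and checks that a newly opened file receives a number of at least 3; the function name standard_descriptors is hypothetical.

```python
import os
import sys
import tempfile

def standard_descriptors():
    """Return the descriptor numbers of the three standard streams."""
    # sys.__stdin__ and friends are the streams Python started with, even if
    # sys.stdin or sys.stdout have since been replaced by other objects.
    return {
        "stdin": sys.__stdin__.fileno(),
        "stdout": sys.__stdout__.fileno(),
        "stderr": sys.__stderr__.fileno(),
    }

print(standard_descriptors())

# Because 0, 1 and 2 are already taken, a newly opened file gets 3 or more.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as fid:
    new_descriptor = fid.fileno()
print("new file descriptor:", new_descriptor)
```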

Let us now write some data into the file that we open.

user@machine:~$ octave -q
octave:1> fid=fopen("myfile.txt","w")
fid = 3
octave:2> printf("This is a test line.\n");
This is a test line.
octave:3> for n=1:5
> fprintf(fid,"This is line number %d.\n",n)
> endfor
octave:4> fclose(fid);
octave:5> exit
user@machine:~$ cat myfile.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 5.

In this example, once the file is open, we call fprintf with the file descriptor fid to print into the file. To illustrate what this does, we first use the related function printf, which simply writes a sentence on the screen; that is, its output is sent to stdout, descriptor 1. The backslash-n (\n) is the newline character. To write into a file instead, we call fprintf with the file descriptor as its first argument. The %d in the format string is a conversion specifier: it is replaced by an integer, here the current value of n. Since stdout is descriptor 1, the following also works and prints on the screen:

octave:1> fprintf(1,"Test.\n")
Test.
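In Python 3 the role of fprintf is played by the write method of a file object combined with the % formatting operator, which understands the same %d specifier. The sketch below reproduces the Octave session; the function name write_numbered_lines is hypothetical.

```python
import os
import sys
import tempfile

def write_numbered_lines(fid, count):
    # Python analogue of the Octave loop calling fprintf(fid, ..., n).
    for n in range(1, count + 1):
        # %d is replaced by the integer n, as in Octave's format string.
        fid.write("This is line number %d.\n" % n)

# Like printf: the text goes to standard output.
print("This is a test line.")
# Like fprintf(1, "Test.\n"): name the standard output stream explicitly.
sys.stdout.write("Test.\n")

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as fid:   # the file is closed when the block ends
    write_numbered_lines(fid, 5)

with open(path, "r") as fid:
    contents = fid.read()
print(contents, end="")
```

Because write_numbered_lines only calls write, passing sys.stdout instead of a file object prints the same lines on the screen, much as fprintf(1, ...) does.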

Let us now see similar operations in Python (the sessions below use Python 2 syntax, in which print is a statement rather than a function):

>>> fid=open("myfile.txt","r")
>>> print "Name of the file:", fid.name
Name of the file: myfile.txt
>>> print "Closed:", fid.closed
Closed: False
>>> print "Opening mode:", fid.mode
Opening mode: r
>>> fid.close()

As in Octave, once the file is open you can find out its name, whether it is closed, and the mode in which it was opened; finally, you close it. Notice that we access this information with the notation fid.XXX. This is because Python is an object-oriented language: the value returned by open is a file object, and name, closed and mode are its attributes. Object-oriented programming is a somewhat advanced concept that we will not cover at present. Next, let us perform some read and write operations.

>>> fid=open("myfile.txt","r")
>>> a=fid.read(100)
>>> print "Read string is", a
Read string is This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is
>>> fid.tell()
100L
>>> fid.close()
>>> fid=open("myfile.txt","a")
>>> fid.write("This is line number 6.\n")
>>> print fid.read()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
IOError: File not open for reading
>>> fid.close()
>>> fid=open("myfile.txt","r")
>>> print fid.read()
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
>>> fid.close()
>>> fid=open("myfile.txt","w")
>>> fid.write("This is line number 6.\n")
>>> print fid.read()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
IOError: File not open for reading
>>> fid.close()
>>> fid=open("myfile.txt","r")
>>> print fid.read()
This is line number 6.
>>> fid.close()
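For comparison, the same sequence of operations can be sketched in Python 3. There print is a function, tell() returns a plain int rather than a long such as 100L, and reading from a write-only file raises io.UnsupportedOperation instead of IOError. The sketch also checks that the closed attribute changes once the file is closed. It assumes a POSIX system such as Linux, where "\n" is stored as a single byte; on Windows, text mode writes "\r\n" and the reported position would differ.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as fid:
    for n in range(1, 6):
        fid.write("This is line number %d.\n" % n)

# Attributes of an open file object, and the closed flag after closing.
fid = open(path, "r")
name, mode, closed_while_open = fid.name, fid.mode, fid.closed
# read(100) returns at most 100 characters; tell() gives the current position.
first_chunk = fid.read(100)
position = fid.tell()
fid.close()
closed_after_close = fid.closed

# "a" adds to the end of the file; such a handle cannot be read from.
with open(path, "a") as fid:
    fid.write("This is line number 6.\n")
    try:
        fid.read()
        append_readable = True
    except io.UnsupportedOperation:   # raised where Python 2 gave IOError
        append_readable = False

with open(path, "r") as fid:
    after_append = fid.read()

# "w" erases the old contents before writing.
with open(path, "w") as fid:
    fid.write("This is line number 6.\n")
with open(path, "r") as fid:
    after_overwrite = fid.read()

print(position, append_readable)
print(after_overwrite, end="")
```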