File Processing

Once upon a time, most computer science departments had a “file processing” course. Starting in the Fall 1994 semester, the department replaced CSCI 330 Introduction to File Processing with this course (which was previously numbered CSCI 443 because databases were considered too complicated for juniors).

CSCI 330 had been added in 1987 with the following description: “Data file processing, External sorting, Merging, Direct access methods, Hashed and Indexed files. Introduction to file and data base management systems.” This week we are going to look at a few of those topics. They are still relevant. In fact, Oracle is creating new Java classes aimed at improving file processing.

The mainframe era

This is the sort of thing I did as an undergraduate student. There are still mainframes out there running VSAM today.

FORTRAN

FORTRAN was one of the first higher-level (non-assembly) languages to provide serious support for I/O. Because physicists wanted fast access to their kilobytes of data, FORTRAN even provided for binary I/O and random access within open files.

By the way, the FORTRAN compiler on the department’s workstation is gfortran, the GNU Fortran compiler, which is part of GCC and supports Fortran 95.

Streams

The Unix operating system abandoned record-oriented file systems in favor of character-oriented streams. Most modern operating systems support a streams-based interface similar to that of Unix.

Streams in Unix

A set of four system calls provided the main interface for I/O operations. In these routines files are identified by file descriptors, which are small integers like FORTRAN’s logical unit numbers.
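The four calls are not named above; the sketch below assumes they are open, read, write, and close, which is the usual core set. The function name copy_file and the 4096-byte buffer are illustrative choices, not part of the original notes.

```c
#include <assert.h>
#include <fcntl.h>      /* open and the O_* flags */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write, close */

/* Copy src to dst using only open/read/write/close.
   Returns 0 on success and -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    char buf[4096];
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {          /* write may accept fewer bytes than asked */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            done += w;
        }
    }

    close(in);
    close(out);
    return n < 0 ? -1 : 0;          /* n < 0 means the last read failed */
}
```

Note that both descriptors are plain int values, and that read returning 0 is how end of file is reported.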

Streams in C

The C programming language provided a C API with routines similar to those of the I/O system calls. These use a pointer to a special C structure, the FILE *, to identify open files.
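As a rough illustration of the FILE * interface, here is a minimal sketch that counts the lines of a text file with fopen, getc, and fclose. The function name count_lines is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Count the newline characters in a text file.
   Returns -1 if the file cannot be opened. */
long count_lines(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long lines = 0;
    int c;                      /* int, not char, so EOF stays distinguishable */
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;

    fclose(fp);
    return lines;
}
```

The shape mirrors the system-call version (open, loop, close), but the library buffers the data, so reading one character at a time is not as expensive as it looks.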

Additional functions are often used for binary I/O.
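The additional functions are not named above; fread and fwrite are the likely candidates. The sketch below assumes a hypothetical struct record and writes an array of them to disk in their in-memory form, then reads them back.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record layout, used only for this illustration. */
struct record {
    int    id;
    double balance;
};

/* Write n records to path. Returns the number written (0 on open failure). */
size_t save_records(const char *path, const struct record *recs, size_t n)
{
    assert(path != NULL && recs != NULL);

    FILE *fp = fopen(path, "wb");       /* "b" requests binary mode */
    if (fp == NULL)
        return 0;

    size_t written = fwrite(recs, sizeof recs[0], n, fp);
    if (fclose(fp) != 0)                /* buffered data may fail to flush */
        return 0;
    return written;
}

/* Read up to max records from path. Returns the number actually read. */
size_t load_records(const char *path, struct record *recs, size_t max)
{
    assert(path != NULL && recs != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    size_t got = fread(recs, sizeof recs[0], max, fp);
    fclose(fp);
    return got;
}
```

A file written this way is only safe to read back with the same struct layout, compiler, and machine architecture, since padding and byte order are stored as-is.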

Direct access in Unix (and C)

Most Unix C programs used the simple operating system calls for direct access I/O with one additional system call.
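The additional call is presumably lseek, which moves the file offset without transferring data. A minimal sketch, assuming fixed-length records of a hypothetical RECORD_SIZE bytes, so that record i starts at byte i * RECORD_SIZE:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek, read, write */

#define RECORD_SIZE 16  /* hypothetical fixed record length in bytes */

/* Overwrite (or create) record number index. Returns 0 on success, -1 on error. */
int write_record(int fd, long index, const char rec[RECORD_SIZE])
{
    assert(fd >= 0 && index >= 0);

    if (lseek(fd, (off_t)index * RECORD_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, rec, RECORD_SIZE) == RECORD_SIZE ? 0 : -1;
}

/* Fetch record number index. Returns 0 on success, -1 on error
   or if the record lies past the end of the file. */
int read_record(int fd, long index, char rec[RECORD_SIZE])
{
    assert(fd >= 0 && index >= 0);

    if (lseek(fd, (off_t)index * RECORD_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, rec, RECORD_SIZE) == RECORD_SIZE ? 0 : -1;
}
```

Records can be written in any order. Seeking past the end of the file succeeds, and it is the following read that reports that nothing is there.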

There is a related C routine for direct access I/O.
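The related routine is most likely fseek, usually paired with ftell. The sketch below assumes a hypothetical struct entry stored back to back in a binary file, and it relies on seeking to the end of a binary stream, which works on Unix-like systems but is not guaranteed by the C standard everywhere.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixed-size entry, used only for this illustration. */
struct entry {
    int key;
    int value;
};

/* Number of whole entries in the file, or -1 on error. */
long entry_count(FILE *fp)
{
    assert(fp != NULL);

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long bytes = ftell(fp);             /* offset at end == file size */
    if (bytes < 0)
        return -1;
    return bytes / (long)sizeof(struct entry);
}

/* Read entry number index into *out. Returns 0 on success, -1 otherwise. */
int fetch_entry(FILE *fp, long index, struct entry *out)
{
    assert(fp != NULL && out != NULL);

    if (index < 0 || fseek(fp, index * (long)sizeof *out, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

On a stream opened for update, an fseek between a write and a following read is required anyway, so the positioning call does double duty here.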

Alex Allain has written a pretty good, and brief, tutorial on C File I/O and Binary file I/O.

File access in C++

The C++ interfaces in std, the C++ Standard Library, are similar to those of C.

You might check out Alex Allain’s tutorial on C++ File I/O.

Way back in Spring 1999, students in CSCI 343 had taken a few C++ courses, and it was possible to do the first assignment using the C++ fstream classes.

File access in Java

Java has its own stream classes.

It also has classes for formatted I/O.

JDK 7 introduced a new I/O package for Java to significantly improve file system performance. This new set of packages is a major change. Take a look at the File I/O tutorial for Java for much more information.