Using the FILENAME statement to access file-based datasets

Ian Balanzá-Davis
Altair Employee
edited September 2022 in Altair RapidMiner

Files that do not use the WPD format are regarded as external files. Files in the WPD format have the extension .wpd. Examples of external files are flat text files and comma-separated values (CSV) files.

External files can be specified using operating system pathnames, or using a filename reference. A filename reference is created using the FILENAME statement. You can also use the FILENAME statement to access data sources and destinations that are not files on disk, such as FTP servers and HTTP servers, and to write text into emails, among other things. The full range of devices and media that can be accessed using the FILENAME statement is described in the Altair SLC Reference for Language Elements.

Note: File-based datasets are accessed using library engines, while database tables and Excel workbooks are accessed using data engines. Library engines and data engines are specified using the LIBNAME statement.

FILENAME is a global statement and is used to define an alias (the filename reference) which can be specified on SAS language statements to reference a file in any program in the session. For example, the statement:

FILENAME exref 'c:\temp\exfile.txt';

creates a filename reference exref. This filename reference refers to the file exfile.txt in the folder c:\temp. This filename reference can then be used wherever access to that external file is required by a SAS language statement. For example, in this program:

FILENAME exref 'c:\temp\exfile.txt';
DATA out;
  INFILE exref;
  INPUT a b;
  OUTPUT;
RUN;

the FILENAME statement is used to create a filename reference exref. This filename reference is then used with the INFILE statement to specify the file to be read by subsequent INPUT statements. The INPUT statement reads data from each record in the input file into the variables it names, here a and b. The DATA step iterates through each record in the input file until the end of the file is reached. The OUTPUT statement writes the variables in the DATA step to the output dataset out. Note that the OUTPUT statement can, in programs such as this one, be omitted; if it is omitted, an observation is written automatically at the end of each iteration of the DATA step.

Remember, however, that external files are not datasets. Datasets are accessed through a library, which is defined using the LIBNAME statement. Some statements require you to specify a dataset (using a library name reference), while others require you to specify a file. For example, the SET statement requires a dataset, while the INFILE statement requires a file. An input file can be specified to the INFILE statement as a full pathname:

INFILE 'c:\temp\exfile.txt';

or as a filename reference:

INFILE exref;

where exref is the filename reference.

A filename reference, however, enables the reference to be used in many steps in the same program, or to be easily changed from session to session by including the required pathname when a program is run.
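
As a minimal sketch of that reuse, the following program defines exref once and reads the same file in two separate DATA steps. The dataset names all_rows and positive_rows and the filter on a are illustrative assumptions, and the file is assumed to contain two numeric values per record, as in the earlier example.

```sas
/* Define the filename reference once */
FILENAME exref 'c:\temp\exfile.txt';

/* First step: read every record from the file */
DATA all_rows;
  INFILE exref;
  INPUT a b;
RUN;

/* Second step: read the same file again, keeping only records where a > 0 */
DATA positive_rows;
  INFILE exref;
  INPUT a b;
  IF a > 0;
RUN;
```

To point both steps at a different file, only the FILENAME statement needs to change.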

As noted earlier, the FILENAME statement can enable access to a disk, a web page, an FTP server, and so on. It does this using an access method that you specify which then enables access to each type of medium or device. To create an email, for example, you use the EMAIL access method; to read and write disk files, you use the DISK access method; while to access an FTP server, you use the FTP access method.

An access method is defined by specifying an option to the FILENAME statement. For example, to read a file from a disk you can specify:

FILENAME exfile DISK 'c:\temp\exfile.txt';

The DISK access method is the default, so you can omit it from the statement, as in the earlier examples. Similarly, to return the data that makes up a web page, you could specify:

FILENAME getpg HTTP 'https://www.mysite.co.uk';
DATA out;
  INFILE getpg length=ll;
  INPUT line $varying500. ll;
RUN;

This reads the HTML content of the specified web page and writes it to the dataset out.

When accessing files on disk, the FILENAME statement can be used to specify a directory, enabling you to access more than one file in a location. When the filename reference has been specified, you can access a file in the directory using member name syntax on a FILE or INFILE statement. For example, to access the directory c:\temp, you specify:

FILENAME mydir 'c:\temp';

To read the file myfile.txt in that directory, you can specify it as a member name of the specified reference:

DATA _NULL_;
  INFILE mydir (myfile.txt);
  INPUT a b;
RUN;
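
Member name syntax can be used in the same way on a FILE statement to write a file into the directory. The following is a sketch only: the output file name report.txt and the text written by the PUT statements are hypothetical.

```sas
/* Directory reference, as defined above */
FILENAME mydir 'c:\temp';

/* Write two lines of text to report.txt in that directory */
DATA _NULL_;
  FILE mydir(report.txt);
  PUT 'Report generated by a DATA step';
  PUT 'Second line of the report';
RUN;
```

Because the step uses DATA _NULL_, no output dataset is created; the only result is the text file.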

You can specify more than one input or output file in a DATA step. If you do this, you have to be careful with the order of statements in the step. Multiple input and output files are not described in this article.