Reading from and writing to external files in a DATA step

IanBD
Altair Employee
edited September 2022 in Altair RapidMiner

The article "Introduction to the DATA step: Reading from and writing to databases" described how to access databases from the DATA step. This article discusses how you can read data from and write data to other types of files.

To read and write external files, such as text files, you specify the name and location of the file, and how each record in the file is split into fields. The SAS language provides various DATA step statements that you can use to read and write files.

To read an external file, you need to use both of the following statements:

  • The INFILE statement, which specifies the location of the file you want to read.
  • The INPUT statement, which defines how the records in the file are split into fields. The data from these fields is read into variables in the DATA step.

For example:

DATA out;
  INFILE 'c:\temp\tfile.txt';
  INPUT age name $;
  OUTPUT;
RUN;

In this DATA step, the file tfile.txt in the folder c:\temp is the input file. The input file has data in the following format:

33 Smith
32 Jones
56 Brown

Each record in the input file contains two fields: an age and a surname. The INPUT statement defines the variables into which these values are read; the $ after name marks name as a character variable, while age is numeric. The values are separated by a space, which the INPUT statement recognises as the default field separator.

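The DATA step reads each record of the input file in turn and stores its values in the variables age and name.

The OUTPUT statement then writes these values to the output dataset, out, as one observation per record. The variables in the dataset have the names defined in the DATA step. If you omit the OUTPUT statement, the DATA step writes an observation automatically at the end of each iteration.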
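List input like this gives a character variable a default length of 8, so a longer surname would be truncated. The following sketch assumes you want to keep surnames of up to 20 characters and declares that length with a LENGTH statement before the INPUT statement. To keep the sketch self-contained, INFILE DATALINES reads in-stream data in place of c:\temp\tfile.txt; the surname Richardson-Smith is made-up sample data.

```sas
/* Sketch: read age and surname, keeping surnames of up to 20 characters. */
/* In-stream data stands in for c:\temp\tfile.txt (sample values only).   */
DATA out;
  LENGTH name $ 20;        /* without this, list input gives name length 8 */
  INFILE DATALINES;        /* read the lines that follow DATALINES */
  INPUT age name $;        /* list input: fields separated by spaces */
  DATALINES;
33 Smith
32 Jones
56 Richardson-Smith
;
RUN;
```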

Writing data to an external file requires a different pair of DATA step statements, FILE and PUT:

  • FILE specifies the file to which data is written.
  • PUT specifies which variables from the DATA step are written to the external file.

For example:

DATA _NULL_;
  SET ds_in;
  FILE 'c:\temp\tfile.txt';
  fn = UPCASE(name);
  PUT age fn;
RUN;

In this example, the DATA step reads the dataset ds_in, which contains the variables age and name. The FILE statement specifies the pathname of the output file, and the assignment statement creates a new variable, fn, that holds the value of name in uppercase. The PUT statement defines the fields to write in each record of the output file. Only the variables listed in the PUT statement are written: age and fn are written to the output file, but name is not. The placement of the PUT statement in the DATA step is important: if the PUT statement in the example had been placed before the FILE statement, it would have written the output to the log rather than to the file.
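The sketch below is one way to check what this example produces. Because ds_in is not defined in this article, the sketch first creates a hypothetical ds_in from in-stream data. It then writes to a scratch file assigned with FILENAME scratch TEMP rather than to c:\temp\tfile.txt (this assumes the TEMP device is available in your environment; scratch is an arbitrary fileref name), and finally reads the file back with INFILE and INPUT. Each record in the file should hold the age followed by the uppercase surname.

```sas
/* Sketch: round trip through an external file.                  */
/* Assumptions: the TEMP device is available, scratch is an      */
/* arbitrary fileref, and ds_in holds hypothetical sample data.  */
FILENAME scratch TEMP;

DATA ds_in;
  INPUT age name $;         /* no INFILE: INPUT reads the DATALINES */
  DATALINES;
33 Smith
32 Jones
56 Brown
;
RUN;

/* The example from the article, writing to the scratch file. */
DATA _NULL_;
  SET ds_in;
  FILE scratch;             /* route PUT output to the scratch file */
  fn = UPCASE(name);        /* uppercase copy of name */
  PUT age fn;               /* list output: age, a blank, then fn */
RUN;

/* Read the records back to inspect what PUT wrote. */
DATA check;
  INFILE scratch;
  INPUT age fn $;
RUN;
```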

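You can read from or write to several files in one DATA step by specifying an INFILE or FILE statement for each input or output file. Each INPUT statement reads from the file named in the most recently executed INFILE statement, and each PUT statement writes to the file named in the most recently executed FILE statement.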
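As a minimal sketch, the following DATA step reads in-stream records and writes each one to one of two files, depending on the age. The file names older.txt and younger.txt, the cut-off age of 40, and the existence of the c:\temp folder are assumptions for illustration. Because FILE is an executable statement, the IF-THEN/ELSE selects which file the following PUT statement writes to.

```sas
/* Sketch: route each record to one of two output files.            */
/* Hypothetical: the file names, the c:\temp folder and the cut-off. */
DATA _NULL_;
  INFILE DATALINES;
  INPUT age name $;
  IF age >= 40 THEN FILE 'c:\temp\older.txt';   /* FILE is executable */
  ELSE FILE 'c:\temp\younger.txt';
  PUT age name;             /* written to whichever file was selected */
  DATALINES;
33 Smith
32 Jones
56 Brown
;
RUN;
```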

You can specify many options on the FILE, INFILE, PUT and INPUT statements to control how data is read and written. For example, you can specify:

  • That the fields in records are delimited by something other than spaces.
  • What happens if input records contain missing items.
  • The length of a record.
  • Whether the fields in a record are defined by length, position, or delimiters.
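The following sketch illustrates some of these options; the sample data and the dataset names people and fixed are hypothetical. In the first step, DLM=',' sets a comma as the delimiter, DSD treats two consecutive commas as a missing value, and TRUNCOVER stops INPUT from moving on to the next record when a record has fewer fields than the INPUT statement expects. The second step uses column input, where each field is defined by its position in the record. The record length, which you can set with the LRECL= option on INFILE or FILE, is not needed for this short in-stream data.

```sas
/* Sketch 1: comma-delimited records, some with missing items. */
DATA people;
  INFILE DATALINES DLM=',' DSD TRUNCOVER;
  INPUT age name $ city $;
  DATALINES;
33,Smith,Leeds
32,,York
56,Brown
;
RUN;
/* Record 2: name is missing (two adjacent commas, DSD).      */
/* Record 3: city is missing; TRUNCOVER keeps it on one line. */

/* Sketch 2: fields defined by position (column input). */
DATA fixed;
  INFILE DATALINES;
  INPUT age 1-2 name $ 4-13;   /* age in columns 1-2, name in 4-13 */
  DATALINES;
33 Smith
32 Jones
56 Brown
;
RUN;
```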

These DATA step statements and options are described in the Altair SLC Reference for Language Elements.

Note: You can also use the FILENAME global statement to associate an alias (a fileref) with a file, and then use that alias in the INFILE or FILE statement. For example, you could specify:

FILENAME out1 'c:\temp\outtext.txt';

and then use the alias out1 in the FILE statement to specify the output file:

FILE out1;
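A minimal sketch using filerefs for both input and output follows. The fileref in1 is a hypothetical name, and the sketch assumes the c:\temp folder exists; out1 and outtext.txt match the example above. The first step creates tfile.txt from in-stream data so that the sketch runs on its own; the second reads that file through in1 and writes the fields in reverse order through out1.

```sas
/* Sketch: filerefs for input and output.              */
/* Hypothetical: the fileref in1; assumes c:\temp exists. */
FILENAME in1  'c:\temp\tfile.txt';
FILENAME out1 'c:\temp\outtext.txt';

/* Create the input file so the sketch is self-contained. */
DATA _NULL_;
  INPUT age name $;         /* no INFILE: INPUT reads the DATALINES */
  FILE in1;
  PUT age name;
  DATALINES;
33 Smith
32 Jones
56 Brown
;
RUN;

/* Read through one fileref and write through the other. */
DATA _NULL_;
  INFILE in1;               /* fileref instead of a quoted pathname */
  INPUT age name $;
  FILE out1;
  PUT name age;             /* surname first, then age */
RUN;
```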