Reading from and writing to databases in a DATA step

IanBD
IanBD
Altair Employee
edited September 2022 in Altair RapidMiner

The articles Introduction to the DATA step and Introduction to the DATA step statements introduced the DATA step and describe using it with datasets. This article discusses how you can input data from, and output data to, other types of files. The DATA step can be used to read and write data using many types of file, including text files, database tables and Excel files.

Some types of external file, such as database tables, can be treated as datasets. A piece of software, called a data engine, is used to provide a connection from Altair SLC to a specified database server. The tables in that database server can then be accessed as if they are datasets, and the DATA step can read from and write to tables in the same way as it writes to and from datasets.

There are data engines that provide access to many kinds of databases and structured file types. The association between a library and the data engine is specified using the LIBNAME statement; you provide an engine name that identifies the required engine. For example, if you want to read data from a MySQL database, you specify the MYSQL data engine:

LIBNAME Mylib MySQL server=mydb user=username password=userpwd qualifier=myname;

The LIBNAME statement provides many options that define the connection. In the example above, the connection authentication parameters have been specified by options.

You can then read data from a table using the library reference (Mylib, in the example above) and the table name. For example, if the database specified in the example above contains a table mytable, a basic DATA step might be:

DATA out;   SET Mylib.mytable; RUN;

This writes the data in the table mytable to the dataset out in the work library. The names of the columns in the table are used as the variable names in the dataset.

Similarly, you can write to a database table by specifying the library reference and table name to the DATA statement; for example:

DATA Mylib.mytable;   SET ds_in; RUN;

The variable names in the dataset are used as column names in the database table.

The LIBNAME statement for data engines provides options that affect how data is input and output; these options enable you to specify which data is read and written to and from a data source, how you specify authentication details, how file locking is handled, and so on.