Organizing and Importing Data in AI Studio

Joshua_Philip
Altair Employee
edited January 23 in Altair RapidMiner

Hello Altair Community,

I recently received a question from a professor regarding how to organize and import data in AI Studio. I thought this would be valuable to share with the entire community, so here’s the question and my response.

The Question:

How do I migrate the datasets and processes from the book we are using? The files are provided, and they are on my local machine. Do I need to upload the datasets one by one, or is there an easy way to upload the datasets and process files into RapidMiner?

The Solution:

Managing data effectively in AI Studio starts with a well-organized repository. I recommend starting by creating a repository of your choice and organizing it with subfolders for datasets and process files. This approach will help keep things structured and accessible.

Creating a Repository

To create a repository:

  1. Click on the button with three parallel lines (next to the Import Data option).
  2. Select Create Repository from the dropdown menu.
  3. Name your repository and configure its settings as needed.
  4. Once created, you can organize your repository by adding subfolders for datasets, process files, or other categories.

Importing Data into the Repository

  1. Click the Import Data button.
  2. Format your data according to your needs using the guided configuration wizard.
  3. Choose the repository and subfolder where you want to store your data files.

This keeps your data files structured and easily accessible for your workflows.

Importing Processes into the Repository

  1. Go to the File menu and select Import Process.
  2. Navigate to the Repository Pane on the left side, right-click on the Process subfolder you’ve created, and select Store Process Here.

This ensures your processes are properly saved and organized for future use.
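If the book's files arrive in a single flat folder, it can help to sort them locally first so they mirror the datasets/processes layout described above. The following is a minimal Python sketch of that idea, not part of AI Studio itself. The function name `sort_book_files`, the subfolder names, and the extension mapping are assumptions for illustration; `.rmp` is the extension used for RapidMiner process files.

```python
from pathlib import Path
import shutil

# Assumed mapping from file extension to subfolder name (hypothetical;
# adjust it to match the subfolders you created in your repository).
SUBFOLDER_BY_EXTENSION = {
    ".csv": "datasets",
    ".xlsx": "datasets",
    ".xls": "datasets",
    ".rmp": "processes",  # RapidMiner process files
}


def sort_book_files(source_dir, target_dir):
    """Copy files from source_dir into target_dir/<subfolder> by extension.

    Returns a dict mapping each subfolder name to the list of file names
    copied into it. Files with other extensions are skipped.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    copied = {}
    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue
        subfolder = SUBFOLDER_BY_EXTENSION.get(path.suffix.lower())
        if subfolder is None:
            continue  # not a dataset or process file we know about
        destination = target / subfolder
        destination.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination / path.name)
        copied.setdefault(subfolder, []).append(path.name)
    return copied
```

Calling `sort_book_files("book_files", "staged")` (both folder names are placeholders) copies the files and leaves the originals untouched, so the staged folders can then be imported subfolder by subfolder.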

Alternative Approach for Loading Datasets:

If you prefer not to add files to the repository, you can load datasets directly from your local machine. Here's how:

  1. Use operators such as Read CSV, Read Excel, or other similar operators in your process.
  2. Point the file path directly to the folder where your datasets are stored.
  3. Use the guided import configuration wizard in the parameters panel to ensure the setup is accurate and efficient.

Another Alternative (Using Loop Files):

You can also use the Loop Files operator to process multiple files from a directory. Here's how:

  1. Use Loop Files to iterate through the directory where your datasets are stored.
  2. Set the filter type and provide an appropriate glob or regex filter (for a glob filter, e.g., *.xlsx for Excel files or *.csv for CSV files).
  3. Inside the loop, use the corresponding operators (Read Excel, Read CSV, etc.) to load the data.
  4. Pass the %{file_path} macro (generated by Loop Files) to the File parameter of the corresponding operator (e.g., Read Excel or Read CSV).

This configuration ensures that only the files matching your filter (like .xlsx or .csv) are processed during the loop.
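To get a feel for how a glob filter differs from a regex filter, the matching can be tried outside of AI Studio first. The sketch below uses Python's standard `fnmatch` and `re` modules on a made-up list of file names. It illustrates the two pattern styles only; it assumes the whole file name must match, and the Loop Files operator may behave differently in edge cases. The helper names `match_glob` and `match_regex` are hypothetical.

```python
import fnmatch
import re

# Made-up file names standing in for the contents of a dataset folder.
file_names = [
    "sales.xlsx",
    "sales_backup.xlsx.bak",
    "customers.csv",
    "notes.txt",
    "Q1.csv",
]


def match_glob(names, pattern):
    """Return the names that match a glob pattern such as '*.xlsx'."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def match_regex(names, pattern):
    """Return the names that match a regular expression in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


print(match_glob(file_names, "*.xlsx"))              # ['sales.xlsx']
print(match_glob(file_names, "*.csv"))               # ['customers.csv', 'Q1.csv']
print(match_regex(file_names, r".*\.(csv|xlsx)"))    # ['sales.xlsx', 'customers.csv', 'Q1.csv']
```

A glob pattern is usually enough for a single file type, while a regex lets one filter cover several extensions at once.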

After Loading Data using Loop Files:

Once the loop has run, the example sets appear in the results as an IOObjectCollection, and you can store them in your repository. Right-click the Example Set of your choice and select Store Data in Repository. You can then choose the repository where you want to store the data and give it a name of your choice.

Some additional info that could be helpful:

  • To simplify data retrieval for students, you can add the data to a project in AI Hub.
  • To add many files at once, zip them and locate the repository on your system (right-click on the repo in Studio). Navigate to the repo folder on your drive and unzip the files directly into it. After that, refresh the repo folder in AI Studio, and all the files will appear there.
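The unzip step in the last tip can also be scripted. Here is a minimal Python sketch using the standard `zipfile` module. The function name `unzip_into_repository` and the example paths are assumptions, and the repository folder is whatever location Studio shows for your repo. The script only extracts files, so you still need to refresh the repo folder in AI Studio afterwards.

```python
import zipfile
from pathlib import Path


def unzip_into_repository(zip_path, repository_dir):
    """Extract a zip archive into an existing repository folder.

    Returns the sorted list of extracted file names (relative paths).
    """
    repo = Path(repository_dir)
    if not repo.is_dir():
        # Refuse to create the folder: it should be the one Studio points to.
        raise FileNotFoundError(f"Repository folder not found: {repo}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(repo)
        # Directory entries end with '/', so keep only actual files.
        return sorted(
            name for name in archive.namelist() if not name.endswith("/")
        )
```

For example, `unzip_into_repository("book_files.zip", "/path/to/your/repository")` (both paths are placeholders) extracts the archive and returns the list of files that were written.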
