⚠️Please Note

Technical discussions have been migrated to the Siemens Support Center as Knowledge Base (KB) articles. Please note that this content is no longer maintained and may be outdated. For the latest information, log in to the Siemens Support Center, search online, or contact our support team.


Memory usage of Read CSV operator seems excessive

tennenrishin

It seems that the Read CSV operator uses far more memory than necessary, which drastically reduces the maximum size of file it can read on a given machine.

This suspicion is based on the following observation: a CSV file that is too large to read with Read CSV on a given machine can easily be read on that same machine by splitting the file column-wise (keeping a common ID column in each part), reading each part with Read CSV and storing it in the repository, and finally reading the parts from the repository and joining them on the ID column.
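The column-wise split and re-join described above can be sketched in plain Python. This is only an illustration of the mechanics, assuming a comma-separated file with a header row and unique IDs; `split_columns` and `join_on_id` are hypothetical helper names, not RapidMiner operators. In practice each part would go to its own file (and, in RapidMiner, be read with Read CSV and stored in the repository); `io.StringIO` stands in for those files here.

```python
import csv
import io


def split_columns(src, dsts, id_col, groups):
    """Split CSV text from `src` column-wise into one part per column group.

    Every part repeats the ID column so the parts can be joined back later.
    Rows are read one at a time, so only the current row is held in memory.
    """
    reader = csv.reader(src)
    header = next(reader)
    id_idx = header.index(id_col)
    idx_groups = [[header.index(c) for c in group] for group in groups]
    writers = [csv.writer(d, lineterminator="\n") for d in dsts]

    def project(row, idxs):
        # ID value first, then the values of this part's columns
        return [row[id_idx]] + [row[i] for i in idxs]

    for w, idxs in zip(writers, idx_groups):
        w.writerow(project(header, idxs))
    for row in reader:
        for w, idxs in zip(writers, idx_groups):
            w.writerow(project(row, idxs))


def join_on_id(parts, id_col):
    """Join CSV parts on `id_col`, keeping the ID order of the first part.

    IDs missing from a later part get empty strings for that part's columns;
    if an ID repeats within a part, only its first occurrence is used.
    """
    header = [id_col]
    rows = {}  # id -> collected values (dicts preserve insertion order)
    for n, part in enumerate(parts):
        reader = csv.reader(part)
        part_header = next(reader)
        id_idx = part_header.index(id_col)
        value_idxs = [i for i in range(len(part_header)) if i != id_idx]
        header += [part_header[i] for i in value_idxs]
        seen = set()
        for row in reader:
            key = row[id_idx]
            if n == 0 and key not in rows:
                rows[key] = []
            if key in rows and key not in seen:
                rows[key].extend(row[i] for i in value_idxs)
                seen.add(key)
        for key, values in rows.items():
            if key not in seen:
                values.extend([""] * len(value_idxs))
    return header, [[key] + values for key, values in rows.items()]


# Usage: split into two parts, then join them back together.
source = io.StringIO("id,a,b,c\n1,x,2.5,p\n2,y,3.0,q\n")
parts = [io.StringIO(), io.StringIO()]
split_columns(source, parts, "id", [["a"], ["b", "c"]])
for p in parts:
    p.seek(0)  # rewind each part before reading it back
header, rows = join_on_id(parts, "id")
```

Note that `join_on_id` keeps the whole joined result in memory; it only shows how the shared ID column lets the narrower parts be recombined.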

If Read CSV reused memory internally, RapidMiner would be able to import much larger data sets directly from CSV files on a given machine, without the workaround above. This seems like an important limitation for an application such as RapidMiner.
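As a general illustration of bounded-memory reading (not a description of how Read CSV is actually implemented), the sketch below streams a CSV one row at a time and stores parsed numeric values in compact typed arrays, so the raw text of earlier rows does not need to be kept. The function name `read_numeric_columns` is hypothetical, and it assumes the selected columns contain only numbers.

```python
import csv
import io
from array import array


def read_numeric_columns(src, columns):
    """Stream a CSV and collect the named numeric columns into typed arrays.

    csv.reader yields one row at a time, so earlier row strings can be
    garbage-collected; each kept value is stored as an 8-byte C double.
    """
    reader = csv.reader(src)
    header = next(reader)
    idxs = [header.index(c) for c in columns]
    data = {c: array("d") for c in columns}
    for row in reader:
        for c, i in zip(columns, idxs):
            data[c].append(float(row[i]))
    return data


# Usage: only the parsed values of columns "b" and "d" are retained.
source = io.StringIO("id,b,d\n1,2.5,10\n2,3.0,20\n")
cols = read_numeric_columns(source, ["b", "d"])
```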

Regards,
Isak

