Limiting number of rows

kavuch New Altair Community Member
edited November 2024 in Community Q&A
I'm working with multiple CSVs, each with over 100,000 entries.
RapidMiner uses up to 6 GB of RAM (out of 8 GB) and my system becomes very slow.
Is it possible to limit the number of rows that are loaded? For example, I could load only 1,000 rows and experiment with them while the system stays fast, just like the LIMIT clause in SQL (http://www.w3schools.com/sql/sql_top.asp).

Answers

  • earmijo
    earmijo New Altair Community Member
    You can use the Filter Example Range operator (examples 1 to n, where n is the desired size). The following process keeps the first 10 examples of the Sonar sample data:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.5.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.5.002" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="120">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="6.5.002" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="75">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="10"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
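
For comparison, here is a minimal sketch of the same "first n rows" idea applied while reading a CSV file, outside RapidMiner. It uses only the Python standard library; the function name `read_first_rows` and its parameters are hypothetical, chosen for illustration, and the sketch assumes the file has a header row. Because it stops reading after n data rows, the rest of the file is never held in memory.

```python
import csv
from itertools import islice


def read_first_rows(path, n):
    """Return the header and at most the first n data rows of a CSV file.

    Hypothetical helper for illustration; assumes the first line is a header.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)      # first line, or None for an empty file
        rows = list(islice(reader, n))   # stop after n data rows
    return header, rows
```

Calling `read_first_rows("data.csv", 1000)` (with `data.csv` standing in for one of your files) would give a 1,000-row sample to experiment with.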