
Nominal to binominal in large datasets

User: "dehghan-v"
New Altair Community Member
Updated by Jocelyn
Hello, I have a problem preparing my dataset.
I am working with a Tehran traffic transaction database.
The data includes these attributes:
    iD, HighWayCode, day, AirCondition, TrafficType, Time

The dataset has over 530,000 records.

I decided to apply association rule mining to this dataset, for example with FP-Growth.
To work with this algorithm, the attributes must be converted to binominal.
day, AirCondition, and TrafficType were converted to binominal successfully in RapidMiner,
but RapidMiner crashed when converting HighWayCode to binominal.

My process is: read data from the database, Select Attributes, Nominal to Binominal, write to the database.

Can anyone help me solve this problem? Please.

I should mention that the HighWayCode attribute has about 1000 distinct values.

    Hi,

    I'm afraid more information is needed to provide any help here.
    Please post your process XML (to get it, select the XML tab above your RapidMiner process and copy and paste the contents) and the error message from the log. If possible, a sample line of data that leads to the crash would also be very useful.

    Regards,
    Marco
    User: "dehghan-v"
    New Altair Community Member
    OP
    Hi,
    When I run this process, memory usage gets very high and RapidMiner hangs.

    Example row: 1, Sunday, 12:00-1:00, Fluent, cloudy


    This is the process XML:


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <process expanded="true" height="363" width="827">
         <operator activated="true" class="read_database" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
           <parameter key="connection" value="1"/>
           <parameter key="query" value="SELECT  *&#10;FROM dbo.BOZ&#10;where id&gt;=250000 and id&lt;550000"/>
         </operator>
         <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="182" y="18"/>
         <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="372" y="22">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="CodeBozorgRah"/>
         </operator>
         <operator activated="true" class="write_database" expanded="true" height="60" name="Write Database" width="90" x="645" y="91">
           <parameter key="connection" value="1"/>
           <parameter key="table_name" value="BOZ2"/>
           <parameter key="overwrite_mode" value="overwrite first, append then"/>
         </operator>
         <connect from_op="Read Database" from_port="output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
         <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Write Database" to_port="input"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
       </process>
     </operator>
    </process>


    Note: CodeBozorgRah is the HighWayCode attribute.
    User: "Marco_Boeck"
    Altair Employee
    Hi,

    When you convert a polynominal attribute to binominal, a new attribute is created for each distinct string ("attribute = 1", "attribute = 2", etc.). For large data sets with thousands of distinct entries, the result will be really large. You can therefore either increase the memory available to RapidMiner (see this) or use a different learning scheme (see the example processes in the samples repository).

    Regards,
    Marco
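
To make the size of the problem concrete, here is a minimal Python sketch of the expansion described above. It is not RapidMiner code and does not reproduce how RapidMiner stores data internally. The sample values ("H1", "H2", "H3"), the helper names, the figure of 1,000 distinct values, and the 8 bytes per cell are all assumptions chosen for illustration; only the row count of 530,000 and the attribute name HighWayCode come from the thread.

```python
def binominal_columns(values, attribute="HighWayCode"):
    """Return one generated column name per distinct nominal value."""
    return [f"{attribute} = {v}" for v in sorted(set(values))]


def to_binominal(values):
    """Expand a nominal column into one true/false column per distinct value."""
    distinct = sorted(set(values))
    return [[v == d for d in distinct] for v in values]


def dense_cells(n_rows, n_distinct):
    """Number of cells in the fully expanded (dense) table."""
    return n_rows * n_distinct


# Tiny hypothetical sample: three distinct values become three new columns.
sample = ["H1", "H2", "H1", "H3"]
columns = binominal_columns(sample)
rows = to_binominal(sample)

# Rough estimate for the data in this thread.
# Assumptions: 1,000 distinct HighWayCode values, 8 bytes per stored cell.
cells = dense_cells(530_000, 1_000)
approx_gib = cells * 8 / 2**30

print(columns)
print(rows[0])
print(f"{cells:,} cells, roughly {approx_gib:.1f} GiB under the stated assumptions")
```

Under these assumptions a single attribute expands to 530 million cells, which helps explain why the process can exhaust the memory available to RapidMiner.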