haddock wrote:

Hi there Dimas,

I'm not clear on exactly what you want to know, but you should understand that RM can handle much larger datasets than the ones you are talking about. For example, if I run the following to get a 10k * 10k matrix ...

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="391" width="915">
      <operator activated="true" class="generate_massive_data" expanded="true" height="60" name="Generate Massive Data" width="90" x="135" y="90">
        <parameter key="sparse_representation" value="false"/>
      </operator>
      <connect from_op="Generate Massive Data" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Mar 24, 2010 6:16:23 PM INFO: Decoupling process from location //R5 Forum/data. Process is now associated with file //R5 Forum/data.
Mar 24, 2010 6:17:15 PM INFO: No filename given for result file, using stdout for logging results!
Mar 24, 2010 6:17:15 PM INFO: Loading initial data.
Mar 24, 2010 6:17:15 PM INFO: Process starts
Mar 24, 2010 6:17:21 PM INFO: Saving results.
Mar 24, 2010 6:17:21 PM INFO: Process finished successfully after 5 s

It doesn't take too long. Just so you can compare, I'm on XP64, double quad with 16G, and for Windows boxes it is that 64 that matters, as 32-bit boxes can only address around 3 GB (you'll have to Google for the exact number).

So the bottom line is that the main strategy is to have lots of memory, if I remember correctly.
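For a rough sense of the numbers behind "lots of memory", here is a minimal back-of-the-envelope sketch in Java. It assumes the 10k * 10k matrix above is held as dense 8-byte doubles and ignores RapidMiner's per-row and per-attribute overhead; the class and message names are purely illustrative, not RapidMiner API.

public class MatrixMemoryEstimate {
    public static void main(String[] args) {
        // Assumption: every cell is a dense double (8 bytes); a real
        // RapidMiner example set adds extra bookkeeping on top of this.
        long rows = 10000L;
        long cols = 10000L;
        long bytes = rows * cols * 8L;
        System.out.printf("10k x 10k dense doubles: roughly %.1f GB%n", bytes / 1e9);
    }
}

That is only about 0.8 GB of raw data, which is why it fits easily on a 16 GB 64-bit machine but already crowds the roughly 3 GB a 32-bit process can use; however you launch RapidMiner, the practical lever is giving the JVM a larger heap (the standard -Xmx option).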
Sebastian Land wrote:

Hi Dimas,

of course the amount of memory makes a difference. If the data doesn't fit into memory, the process either fails or you will need to stream the data from a database, which might slow it down a lot.

Coming back to the strategy question: RapidMiner offers several methods for selecting attributes. You might use the Forward Selection or Backward Elimination operator as a simple start. If those do not suit your needs or take too long, you might take another operator from the group Data Transformation / Attribute Set Reduction and Transformation / Selection and its sub groups.

Greetings,
Sebastian
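To make the wrapper approach Sebastian describes concrete, below is a minimal sketch of greedy forward selection in plain Java. It is illustrative only: the evaluate function stands in for whatever cross-validated performance estimator you would wire into RapidMiner's Forward Selection operator, and none of the class or method names here are RapidMiner API.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class ForwardSelectionSketch {

    /**
     * Greedy forward selection: repeatedly add the single attribute that
     * improves the performance estimate the most, and stop when no attribute
     * helps any more (or when maxAttributes is reached).
     */
    public static List<String> select(List<String> allAttributes,
                                      Function<Set<String>, Double> evaluate,
                                      int maxAttributes) {
        List<String> selected = new ArrayList<String>();
        double bestScore = Double.NEGATIVE_INFINITY;

        while (selected.size() < maxAttributes) {
            String bestCandidate = null;
            double bestCandidateScore = bestScore;

            // Try each attribute that is not selected yet.
            for (String candidate : allAttributes) {
                if (selected.contains(candidate)) {
                    continue;
                }
                Set<String> trial = new HashSet<String>(selected);
                trial.add(candidate);
                double score = evaluate.apply(trial);
                if (score > bestCandidateScore) {
                    bestCandidateScore = score;
                    bestCandidate = candidate;
                }
            }

            // No single attribute improved the estimate: stop.
            if (bestCandidate == null) {
                break;
            }
            selected.add(bestCandidate);
            bestScore = bestCandidateScore;
        }
        return selected;
    }
}

Backward Elimination is the mirror image: start with all attributes and greedily drop the one whose removal hurts the estimate least. Both need on the order of n^2 model evaluations for n attributes, which is why they can take too long on wide data and why the cheaper operators under Data Transformation / Attribute Set Reduction and Transformation / Selection are worth a look.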