Hello,
I am trying to do collaborative filtering, but I am having difficulty reading the data in.
Originally I formatted the data as a map, where each line contained ID, ID, Boolean. This was read in within a few seconds.
What I need is a matrix with the two ID fields as the coordinates and the Boolean as the entry, but I could not figure out how to build this.
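To make the target layout concrete, here is a rough sketch in Python rather than RapidMiner. It only illustrates the structure I am after: the sample triples and the function name `triples_to_sparse` are made up, and the matrix is stored sparsely as a mapping from row ID to the set of column IDs whose entry is true.

```python
from collections import defaultdict


def triples_to_sparse(triples):
    """Turn (row_id, col_id, flag) triples into a sparse Boolean matrix.

    The matrix is represented as {row_id: {col_id, ...}}; only true
    entries are stored, everything else is implicitly false.
    """
    matrix = defaultdict(set)
    for row_id, col_id, flag in triples:
        if flag:
            matrix[row_id].add(col_id)
    return dict(matrix)


# Hypothetical sample data in the "ID, ID, Boolean" layout described above.
sample = [(1, 10, True), (1, 12, False), (2, 10, True), (3, 11, True)]
sparse = triples_to_sparse(sample)


def entry(matrix, row_id, col_id):
    """Look up one cell of the matrix by its two ID coordinates."""
    return col_id in matrix.get(row_id, set())
```

With this layout, `entry(sparse, 1, 10)` looks up a single cell by its two IDs, and memory use grows with the number of true entries rather than with the full matrix size.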
I then moved on to the Read Sparse operator, but it now takes 1 minute to read in the data. This seems odd and probably won't scale.
I am new to RapidMiner, so any suggestions on resources would be great. My current process is below.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="-20" width="-50">
      <operator activated="true" class="read_sparse" compatibility="5.2.008" expanded="true" height="60" name="Read Sparse" width="90" x="28" y="230">
        <parameter key="format" value="yx"/>
        <parameter key="data_file" value="*****************t"/>
        <parameter key="dimension" value="216370"/>
        <parameter key="datamanagement" value="boolean_sparse_array"/>
        <list key="prefix_map"/>
      </operator>
      <connect from_op="Read Sparse" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Thanks in advance.