One Problem

kinkounio · March 2009

I have a file with many rows of data and I want to compare it to a file containing a single row. The result should be the one row of the first file whose data are closest to the data of the second file.

How can I do this?

IngoRM · March 2009

Hi,

this question has been asked a few times during the last few days. Here are the answers:

You have two options.

1. Load the data sets and merge them. Calculate a similarity measure for the merged data set. Filter out the combinations that do not include your single example. Sort the rest and use the one with the highest similarity. All the necessary operators are part of RapidMiner.

2. If the amount of data is rather large, calculating the full similarity matrix is probably not feasible. In that case, you have to iterate over the examples, take only the current example, calculate its similarity to your single example of interest and store the result via ProcessLog. Afterwards you can convert the process log back into a data set, sort it, etc.
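Outside RapidMiner, option 2 boils down to a simple loop. This is a minimal Python sketch, not RapidMiner code: it assumes each example is a plain sequence of numbers and uses negative Euclidean distance as the similarity measure (an assumption; other measures plug in the same way). The names `similarity`, `rank_by_similarity` and the `log` list are hypothetical, with `log` standing in for ProcessLog.

```python
import math
from typing import List, Sequence, Tuple


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Negative Euclidean distance (assumed measure): higher means more similar."""
    if len(a) != len(b):
        raise ValueError("examples must have the same number of attributes")
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(examples: List[Sequence[float]],
                       query: Sequence[float]) -> List[Tuple[int, float]]:
    """Visit the examples one at a time (as in option 2), log one
    (row index, similarity) pair per example, then sort the log."""
    log: List[Tuple[int, float]] = []  # plays the role of the ProcessLog
    for index, example in enumerate(examples):
        log.append((index, similarity(example, query)))
    # Convert the log into a ranking: most similar example first.
    log.sort(key=lambda entry: entry[1], reverse=True)
    return log
```

`rank_by_similarity(rows, query)[0][0]` is then the index of the best match. Only one number per example is kept, instead of the full similarity matrix of option 1, which is why this variant scales to larger data sets.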

Cheers,
Ingo

kinkounio · March 2009

Good morning.

Where are the similar posts?

Thanks.

kinkounio · March 2009

Hi.

I want to compare two files.

historik.txt

1 73 15 16 13 14 15
2 123 25 26 23 24 25
3 173 35 36 33 34 35
4 224 45 46 43 44 46
5 274 55 56 53 54 56

dades.txt

25 26 23 24 25

The correct result would be the second row of the first file. Value: 123
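To see why 123 is the expected answer, here is the comparison done by hand in a small Python sketch (not RapidMiner). It compares columns 3 to 7 of each historik.txt row with the single dades.txt row using squared Euclidean distance (an assumed choice; since row 2 matches exactly, any distance gives the same winner) and returns column 2 of the closest row. The function names are hypothetical.

```python
# Rows of historik.txt: id, value, then the five compared columns (3 to 7).
HISTORIK = [
    [1, 73, 15, 16, 13, 14, 15],
    [2, 123, 25, 26, 23, 24, 25],
    [3, 173, 35, 36, 33, 34, 35],
    [4, 224, 45, 46, 43, 44, 46],
    [5, 274, 55, 56, 53, 54, 56],
]
# The single row of dades.txt.
DADES = [25, 26, 23, 24, 25]


def squared_distance(a, b):
    """Sum of squared differences between two equally long rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_value(rows, query):
    """Return column 2 of the row whose columns 3-7 are closest to query."""
    best = min(rows, key=lambda row: squared_distance(row[2:], query))
    return best[1]


for row in HISTORIK:
    print(row[0], row[1], squared_distance(row[2:], DADES))
print("closest value:", closest_value(HISTORIK, DADES))
```

Running it prints the distances 500, 0, 500, 2041 and 4561 for rows 1 to 5, followed by `closest value: 123`.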

With this process the result is not correct: it returns 73. What am I doing wrong?

<operator name="Root" class="Process" expanded="yes">
    <parameter key="resultfile" value="/home/rm_workspace/p2/resultat.res"/>
    <operator name="InputHistorik" class="ExampleSource">
        <parameter key="attributes" value="/home/rm_workspace/p2/historik.aml"/>
    </operator>
    <operator name="FeatureRangeRemoval" class="FeatureRangeRemoval">
        <parameter key="first_attribute" value="1"/>
        <parameter key="last_attribute" value="1"/>
    </operator>
    <operator name="NearestNeighbors" class="NearestNeighbors">
    </operator>
    <operator name="Diari" class="ExampleSource">
        <parameter key="attributes" value="/home/rm_workspace/p2/dades.aml"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>

The AML files:

dades.aml
<?xml version="1.0" encoding="UTF-8"?>
<attributeset default_source="dades.dat">
<attribute
name = "dades.txt (1)"
sourcecol = "1"
valuetype = "integer"/>

<attribute
name = "dades.txt (2)"
sourcecol = "2"
valuetype = "integer"/>

<attribute
name = "dades.txt (3)"
sourcecol = "3"
valuetype = "integer"/>

<attribute
name = "dades.txt (4)"
sourcecol = "4"
valuetype = "integer"/>

<attribute
name = "dades.txt (5)"
sourcecol = "5"
valuetype = "integer"/>

</attributeset>

historik.aml

<?xml version="1.0" encoding="UTF-8"?>
<attributeset default_source="historik.dat">
<attribute
name = "historik.txt (1)"
sourcecol = "1"
valuetype = "integer"/>

<label
name = "historik.txt (2)"
sourcecol = "2"
valuetype = "integer"/>

<cluster
name = "historik.txt (3)"
sourcecol = "3"
valuetype = "integer"/>

<attribute
name = "historik.txt (4)"
sourcecol = "4"
valuetype = "integer"/>

<attribute
name = "historik.txt (5)"
sourcecol = "5"
valuetype = "integer"/>

<attribute
name = "historik.txt (6)"
sourcecol = "6"
valuetype = "integer"/>

<attribute
name = "historik.txt (7)"
sourcecol = "7"
valuetype = "integer"/>

</attributeset>

How can I do it?

Thanks.

haddock · March 2009

Hi,

The answer to your problem is that, for some reason known only to yourself, you have declared column three as a cluster!

<cluster
name = "historik.txt (3)"
sourcecol = "3"
valuetype = "integer"/>

I've laid out the data in one file like this...

1 73 15 16 13 14 15
2 123 25 26 23 24 25
3 173 35 36 33 34 35
4 224 45 46 43 44 46
5 274 55 56 53 54 56
6 ? 25 26 23 24 25

and made the necessary code changes to this...

<operator name="Root" class="Process" expanded="yes">
    <parameter key="resultfile"	value="/home/rm_workspace/p2/resultat.res"/>
    <operator name="InputHistorik" class="ExampleSource">
        <parameter key="attributes"	value="C:\Program Files (x86)\Rapid-I\RapidMiner-4.3\historik"/>
    </operator>
    <operator name="NearestNeighbors" class="NearestNeighbors">
    </operator>
    <operator name="InputHistorik (2)" class="ExampleSource">
        <parameter key="attributes"	value="C:\Program Files (x86)\Rapid-I\RapidMiner-4.3\historik"/>
    </operator>
    <operator name="ExampleFilter" class="ExampleFilter">
        <parameter key="condition_class"	value="missing_labels"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>

and rather unsurprisingly the correct answer emerges.
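The trick here is that the query is just another row whose label is missing: a nearest-neighbour model is built from the labelled rows and then applied to the unlabelled one. A minimal Python sketch of that idea, not of RapidMiner's internals: it assumes `None` marks the missing label (the `?` in the file), columns 3 to 7 are the features, column 1 is only an identifier, and a 1-nearest-neighbour rule with Euclidean distance is used. The model is built only from the labelled rows (an explicit choice here, so the query row cannot match itself). `predict_missing_labels` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (id, label, features); a label of None stands for the "?" in the file.
Row = Tuple[int, Optional[int], Sequence[float]]

ROWS: List[Row] = [
    (1, 73, [15, 16, 13, 14, 15]),
    (2, 123, [25, 26, 23, 24, 25]),
    (3, 173, [35, 36, 33, 34, 35]),
    (4, 224, [45, 46, 43, 44, 46]),
    (5, 274, [55, 56, 53, 54, 56]),
    (6, None, [25, 26, 23, 24, 25]),
]


def predict_missing_labels(rows: List[Row]) -> Dict[int, Optional[int]]:
    """1-nearest-neighbour: give each unlabelled row the label of the
    closest labelled row (Euclidean distance on the features)."""
    training = [r for r in rows if r[1] is not None]  # labelled rows only
    queries = [r for r in rows if r[1] is None]       # rows to predict
    predictions: Dict[int, Optional[int]] = {}
    for row_id, _, features in queries:
        nearest = min(training, key=lambda t: math.dist(t[2], features))
        predictions[row_id] = nearest[1]
    return predictions


print(predict_missing_labels(ROWS))  # {6: 123}
```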

So the answer to

How can I do it?

is

With more care!

kinkounio · March 2009

Hi, haddock.

Your code is not the solution. I want to compare attributes 3 to 7 of file 1 with the attributes of file 2, and the result should be attribute 2 of file 1.

Declaring that column as "cluster" was a mistake on my part.

I want to obtain one value from the second column of file 1: the value from the row of file 1 whose values are the same as those of file 2.

In my example, comparing the two files should give the value in the second column of the second row of file 1.

Thanks.

haddock · March 2009

The correct result would be the second row of the first file. Value: 123

To make it even easier for you to comprehend, I've put the data into CSV form, so we don't need AML files at all. Here is the data...

1, 73, 15, 16, 13, 14, 15
2, 123, 25, 26, 23, 24, 25
3, 173, 35, 36, 33, 34, 35
4, 224, 45, 46, 43, 44, 46
5, 274, 55, 56, 53, 54, 56
6, , 25, 26, 23, 24, 25

For the same reason I've taken out the second data read and replaced it with a copy of the data (IOMultiplier), like this...

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource" breakpoints="after">
        <parameter key="filename"	value="C:\Users\CJFP\Documents\rm_workspace\historik.txt"/>
        <parameter key="read_attribute_names"	value="false"/>
        <parameter key="label_column"	value="2"/>
        <parameter key="id_column"	value="1"/>
    </operator>
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object"	value="ExampleSet"/>
    </operator>
    <operator name="ExampleFilter" class="ExampleFilter">
        <parameter key="condition_class"	value="missing_labels"/>
        <parameter key="invert_filter"	value="true"/>
    </operator>
    <operator name="NearestNeighbors" class="NearestNeighbors">
    </operator>
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object"	value="ExampleSet"/>
    </operator>
    <operator name="ExampleFilter (2)" class="ExampleFilter">
        <parameter key="condition_class"	value="missing_labels"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>
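For anyone who wants to check the CSV step outside RapidMiner, here is a Python sketch that reads the file above with the same column roles as the `id_column` and `label_column` parameters (column 1 = id, column 2 = label, the rest = features) and splits it into labelled and unlabelled rows, the split the two ExampleFilter operators appear to perform. It uses only the standard `csv` module: `skipinitialspace=True` handles the spaces after the commas, and an empty label field is treated as missing. The data is embedded as a string so the sketch is self-contained; `read_examples` is a hypothetical name.

```python
import csv
import io
from typing import List, Optional, Tuple

HISTORIK_CSV = """1, 73, 15, 16, 13, 14, 15
2, 123, 25, 26, 23, 24, 25
3, 173, 35, 36, 33, 34, 35
4, 224, 45, 46, 43, 44, 46
5, 274, 55, 56, 53, 54, 56
6, , 25, 26, 23, 24, 25
"""

# (id, label or None, features)
Example = Tuple[int, Optional[int], List[int]]


def read_examples(text: str) -> List[Example]:
    """Column 1 -> id, column 2 -> label (empty = missing), rest -> features."""
    examples: List[Example] = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:
            continue  # ignore blank lines
        row_id = int(fields[0])
        label = int(fields[1]) if fields[1].strip() else None
        features = [int(value) for value in fields[2:]]
        examples.append((row_id, label, features))
    return examples


examples = read_examples(HISTORIK_CSV)
labelled = [e for e in examples if e[1] is not None]  # used to build the model
unlabelled = [e for e in examples if e[1] is None]    # the model is applied here
print(len(labelled), "labelled,", len(unlabelled), "unlabelled")
```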

If I run this I get "123" as the answer, just like before, so I'm puzzled as to what you mean by the following:

Your code is not the solution. I want to compare attributes 3 to 7 of file 1 with the attributes of file 2, and the result should be attribute 2 of file 1.

Perhaps you could enlighten us?

kinkounio · April 2009

Hi haddock,
thanks.

I will try it.
