How can RM identify sequences in dataset?
olandesino
New Altair Community Member
Hi all,
I've got a problem that I can't seem to solve by myself.
I'm trying to use RM for sequence analysis, since I have a dataset containing many logs of test results.
My CSV dataset contains 3 columns (3 attributes) and thousands of rows holding all the values.
The problem is that each test case is 50 rows, so how can I tell RM that
each block of 50 rows represents an independent group, so that I can find interesting patterns "inside" each test case?
Note: there are 8800 test cases in my dataset, so creating 8800 files is not practical.
I hope this is clear.
Thanks in advance.
A.Florio
Answers
-
Good evening!
I think the MultivariateSeries2WindowExamples operator may be what you need. Here's an example of this bad boy at work on a mock-up of your problem: 8800 entries, representing 176 rows of 50 values per attribute.
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_examples" value="8800"/>
<parameter key="number_of_attributes" value="3"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="window_size" value="50"/>
<parameter key="step_size" value="50"/>
</operator>
</operator>
-
Thanks for the advice, but after the preprocessing the rows still don't seem to be grouped into "sequences".
Besides, there is also a problem where RM says that an attribute must have the same type of value... which is not the case for my data :-(
Any other suggestions?
Thank you anyway!
A.Florio
-
Quote: "so how can I tell RM that each 50 rows represent an independent group?"
What exactly did you mean by "group"?
-
A group like:
Series1:
50 elements of attr 1
50 elements of attr 2
50 elements of attr 3
Series2:
|
|
SeriesN
So that RM can apply its algorithm not to ALL values at once, but to each single series.
Example: find patterns with Apriori inside each series, and afterwards maybe compare them.
I know my problem is not easy to understand, but I'm trying to explain it as best I can.
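The grouping described above can be sketched in plain Python. This is only an illustration of the intended data layout, not a RapidMiner operator; the function name `split_into_series`, the sample data, and the assumption that every series is exactly 50 consecutive rows are all hypothetical.

```python
# Sketch only: plain Python, not a RapidMiner operator.
# Assumption: rows arrive in order and every series is exactly `series_length` rows.

def split_into_series(rows, series_length=50):
    """Group consecutive rows into independent series.

    `rows` is a list of tuples, one value per attribute.
    Returns a list of series; each series holds one list of values per attribute.
    """
    if len(rows) % series_length != 0:
        raise ValueError("row count is not a multiple of the series length")
    series = []
    for start in range(0, len(rows), series_length):
        block = rows[start:start + series_length]
        # zip(*block) turns 50 rows of 3 values into 3 lists of 50 values
        series.append([list(column) for column in zip(*block)])
    return series


# Hypothetical data: 150 rows with 3 attributes give 3 series
rows = [(i, i * 10, i * 100) for i in range(150)]
series = split_into_series(rows)
print(len(series), len(series[0]), len(series[0][0]))  # prints: 3 3 50
```

Each entry of `series` can then be analysed on its own, which is the "apply the algorithm per series" idea from the post.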
-
Hmm, the previous example produces 176 rows, each containing 50 values for each of the 3 attributes, based on the notion that each 50-row clump is distinct, just like your series. If you meant that each example is made up of the last 50 values of each attribute, then change the step size to one, like this, where we just look for sequence patterns in att3.
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_examples" value="8800"/>
<parameter key="number_of_attributes" value="3"/>
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="skip_features_with_name" value="att1|att2"/>
</operator>
<operator name="BinDiscretization" class="BinDiscretization">
<parameter key="range_name_type" value="short"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="window_size" value="50"/>
<parameter key="step_size" value="1"/>
</operator>
<operator name="W-Apriori" class="W-Apriori">
</operator>
</operator>
-
I know that I'm close (thanks to your help), but it is still not enough.
Let me put it simply: I have 1 attribute with 150 elements (rows),
and in the results 'data view' I want to see Series1, Series2, Series3,
each with its 50 attribute values underneath.
If I do this:
<operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
<operator name="ExampleSource" class="ExampleSource">
<parameter key="attributes" value="~/minim.aml"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples" breakpoints="after">
<parameter key="horizon" value="1"/>
<parameter key="window_size" value="50"/>
<parameter key="step_size" value="50"/>
<parameter key="add_incomplete_windows" value="true"/>
</operator>
</operator>
then the output (in result mode -> data view) is 3 examples with 50 attributes (wrong! I have 1 attribute and 3x50 values).
I tried other series preprocessing operators like "index series" or "Single2series", but the result is still not what I want.
In the meantime, I want to say that I really appreciate your help.
A.Florio
-
Quote: "Let me put it simply: I have 1 attribute with 150 elements (rows), and in the results 'data view' I want to see Series1, Series2, Series3, each with its 50 attribute values underneath."
Does the following do it?
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_examples" value="150"/>
<parameter key="number_of_attributes" value="1"/>
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter" breakpoints="after">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="label"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="window_size" value="3"/>
<parameter key="step_size" value="3"/>
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="att.*-"/>
<parameter key="replace_by" value="Series_"/>
</operator>
</operator>
Hope so! Good weekend.
-
This way I get 3 columns (OK), but the first one doesn't contain
the first 50 values of my dataset. The values are spread out like
a matrix index (1st row, 2nd row, ...). How can I tell it to take the first 50 values
and put them in the 1st column (1st series), then the second 50 values in the 2nd column (2nd series), and so on?
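The two layouts can be compared with a small Python sketch. It is plain Python standing in for a windowing step followed by a transpose, not RapidMiner itself; the helper names `window_rows` and `transpose` and the sample values 0..149 are assumptions made for illustration.

```python
# Sketch only: plain Python standing in for windowing followed by a transpose.
# Assumption: one attribute, values in order, series length divides the row count.

def window_rows(values, window_size, step_size):
    """Return one row per window: values[start:start + window_size]."""
    return [values[start:start + window_size]
            for start in range(0, len(values) - window_size + 1, step_size)]


def transpose(table):
    """Swap rows and columns of a rectangular table."""
    return [list(column) for column in zip(*table)]


values = list(range(150))  # hypothetical data: 0, 1, ..., 149

# Window of 3: 50 rows x 3 columns, so the first column holds 0, 3, 6, ...
spread = window_rows(values, 3, 3)
first_column_spread = [row[0] for row in spread]

# Window of 50, then transpose: 50 rows x 3 columns, first column holds 0..49
by_series = transpose(window_rows(values, 50, 50))
first_column_series = [row[0] for row in by_series]

print(first_column_spread[:4])  # [0, 3, 6, 9]
print(first_column_series[:4])  # [0, 1, 2, 3]
```

Both results have 3 columns, but only the second keeps each block of 50 consecutive values together in one column.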
Thank you very much for your help.
A.Florio
-
OK, now I see what you mean, at least I hope so! What about this?
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_examples" value="150"/>
<parameter key="number_of_attributes" value="1"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="window_size" value="50"/>
<parameter key="step_size" value="50"/>
</operator>
<operator name="ExampleSetTranspose" class="ExampleSetTranspose">
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="att"/>
<parameter key="replace_by" value="Series"/>
<parameter key="apply_on_special" value="false"/>
</operator>
</operator>
-
I get this error message when I use my simple dataset with just 1 column (only 1 attribute):
AttributeTypeException
Process failed Message: Cannot map index of nominal attribute to nominal value: index 0 is out of bounds!
Even after a few changes to my dataset, I always get the same error, and it doesn't tell me where exactly in the operator tree it occurs.
What does it mean?
-
Without seeing the data there is not much I can say.
-
This is just a piece of the single attribute of my dataset.
To make things easier, I've ignored the other attributes (for now).
It is a series of operations, numerical and nominal, nothing special.
[attachment deleted by admin]
-
Hi,
I noticed a blank line at the end of your file, so I took that out and then copied and pasted 6 times to end up with 140 rows. In the following example I'm saying a series is 20 rows, so we should have 7 identical columns as series, and we do ;D
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSource" class="ExampleSource" activated="no">
<parameter key="attributes" value="C:\Program Files (x86)\Rapid-I\RapidMiner-4.3\simple2"/>
</operator>
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="C:\Users\CJFP\Documents\rm_workspace\simple-2.txt"/>
<parameter key="read_attribute_names" value="false"/>
</operator>
<operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
<parameter key="window_size" value="20"/>
<parameter key="step_size" value="20"/>
</operator>
<operator name="ExampleSetTranspose" class="ExampleSetTranspose">
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="att"/>
<parameter key="replace_by" value="Series"/>
</operator>
</operator>
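The arithmetic behind "7 identical columns" can be checked with a short Python sketch. It is not RapidMiner, and the 20 placeholder values are hypothetical stand-ins for the contents of the attached file.

```python
# Sketch only: plain Python, not RapidMiner. The 20 values are hypothetical.
base = [f"op{i}" for i in range(20)]  # stands in for the 20-row file
values = base * 7                     # original plus 6 pasted copies = 140 rows

series_length = 20
# Non-overlapping windows: one row per series of 20 values
windows = [values[start:start + series_length]
           for start in range(0, len(values), series_length)]
# Transpose so that each series becomes a column: 20 rows x 7 columns
table = [list(row) for row in zip(*windows)]

print(len(windows))                               # 7
print(all(len(set(row)) == 1 for row in table))   # True: the 7 columns match
```

Because the same 20 rows are repeated, every row of the transposed table holds a single repeated value, which is what identical series columns look like.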
[attachment deleted by admin]