Mining Time Series Data Using RapidMiner

uday
uday New Altair Community Member
edited November 5 in Community Q&A
Dear All,

This is regarding the mining of time series data.

I have time series data as follows:

Date                Feature            Time
1/1/2013          Add              1:00:00 PM
1/1/2013          Sub              1:01:00 PM
1/1/2013          Add              1:02:00 PM
1/1/2013          Equals          1:03:00 PM
1/1/2013          Add              1:04:00 PM
1/1/2013          Equals          1:05:00 PM
1/1/2013          Add              1:04:00 PM
1/1/2013          Equals          1:05:00 PM
1/1/2013          Add              1:04:00 PM
1/1/2013          Equals          1:05:00 PM
1/1/2013          Add              1:06:00 PM
1/1/2013          Equals          1:07:00 PM
1/1/2013          Add              1:08:00 PM
1/1/2013          Equals          1:09:00 PM
1/1/2013          Add              1:10:00 PM
1/1/2013          Equals          1:11:00 PM
1/1/2013          Add              1:12:00 PM
1/1/2013          Equals          1:13:00 PM
1/1/2013          Add              1:14:00 PM
1/1/2013          Equals          1:15:00 PM
1/1/2013          Add              1:16:00 PM
1/1/2013          Equals          1:17:00 PM
1/1/2013          Add              1:18:00 PM
1/1/2013          Equals          1:19:00 PM
1/1/2013          Add              1:20:00 PM
1/1/2013          Equals          1:21:00 PM
1/1/2013          Add              1:22:00 PM
1/1/2013          Equals          1:23:00 PM

From the data it is clear that most of the time "Add" is followed by "Equals".

Please help me identify an appropriate mining technique to arrive at this kind of result, and the procedure for applying it.

Thanks in Advance :)

Regards,
Uday.


Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hi Uday,

    You should install the Series extension from the Marketplace and use the Windowing operator to bring the data into the correct format. Define Feature as the label attribute, and remove the Date and Time columns if they are not important for the prediction.
    If you need any help, please let us know.
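
    To make the idea of windowing concrete, here is a minimal Python sketch of what a sliding window does to a sequence of nominal values. It is not RapidMiner's implementation and it ignores the horizon and label settings; the function name `window` is hypothetical, and the column naming (Feature-1 for the previous value, Feature-0 for the current one) is assumed from the output discussed later in this thread.

```python
def window(series, window_size=2):
    """Turn a sequence into overlapping windows of `window_size` values.

    Columns are named so that the highest index is the oldest value:
    for window_size=2, Feature-1 is the previous event and Feature-0
    the current one (naming assumed from the thread, not from the tool).
    """
    names = [f"Feature-{window_size - 1 - i}" for i in range(window_size)]
    rows = []
    for end in range(window_size, len(series) + 1):
        values = series[end - window_size:end]
        rows.append(dict(zip(names, values)))
    return rows


# A short excerpt of the event sequence from the question.
events = ["Add", "Sub", "Add", "Equals", "Add", "Equals"]
for row in window(events):
    print(row)
```

    Each printed row holds one (previous, current) pair, which is the shape of data that a learner or FP-Growth would then receive.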

    Best regards,
    Marius
  • uday
    uday New Altair Community Member
    Dear Marius,

    Could you please help me with the modeling? When I applied the model to another data set, it did not give accurate results.

    Please help me in this regard.

    Thanks in Advance :)

    Regards,
    Uday.
  • uday
    uday New Altair Community Member
    Dear Marius,

    Please help me interpret the following data table:
    Row  Items                                        Size  Freq   Support              Score
    1    Feature-1 = Equals                           1.0   158.0  0.48466257668711654  1.0
    2    Feature-0 = Equals                           1.0   158.0  0.48466257668711654  1.0
    3    Feature-1 = Addition                         1.0   108.0  0.3312883435582822   1.0
    4    Feature-0 = Addition                         1.0   108.0  0.3312883435582822   1.0
    5    Feature-1 = Subtraction                      1.0   56.0   0.17177914110429449  1.0
    6    Feature-0 = Subtraction                      1.0   56.0   0.17177914110429449  1.0
    7    Feature-1 = Equals, Feature-0 = Addition     2.0   107.0  0.3282208588957055   2.044186591654946
    8    Feature-1 = Equals, Feature-0 = Subtraction  2.0   50.0   0.15337423312883436  1.842224231464738
    9    Feature-0 = Equals, Feature-1 = Addition     2.0   102.0  0.3128834355828221   1.9486638537271452
    10   Feature-0 = Equals, Feature-1 = Subtraction  2.0   52.0   0.15950920245398773  1.9159132007233273
  • MariusHelf
    MariusHelf New Altair Community Member
    Please post the process that created this data table. Then it will be way easier for me to interpret the data.

    Best regards,
    Marius
  • uday
    uday New Altair Community Member
    Dear Marius,

    The process is as follows :

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="528" width="1016">
          <operator activated="true" breakpoints="after" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="83" y="136">
            <parameter key="excel_file" value="C:\Users\IC014052\Documents\PatternDetect.xlsx"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Date.true.date.attribute"/>
              <parameter key="1" value="Feature.true.nominal.attribute"/>
              <parameter key="2" value="Time.true.time.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="246" y="120">
            <parameter key="name" value="Date"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" breakpoints="after" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="380" y="75">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="2"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="Feature"/>
          </operator>
          <operator activated="true" breakpoints="after" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="210">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Feature-0|Feature-1||Feature-2"/>
          </operator>
          <operator activated="true" breakpoints="after" class="nominal_to_binominal" compatibility="5.2.008" expanded="true" height="94" name="Nominal to Binominal" width="90" x="581" y="300">
            <parameter key="attributes" value="|Feature-2|Feature-1|Feature-0"/>
            <parameter key="transform_binominal" value="true"/>
          </operator>
          <operator activated="true" breakpoints="after" class="fp_growth" compatibility="5.2.008" expanded="true" height="76" name="FP-Growth" width="90" x="648" y="165">
            <parameter key="min_support" value="0.2"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="715" y="300"/>
          <operator activated="true" breakpoints="after" class="item_sets_to_data" compatibility="5.2.008" expanded="true" height="76" name="Item Sets to Data" width="90" x="849" y="300"/>
          <operator activated="true" breakpoints="after" class="create_association_rules" compatibility="5.2.008" expanded="true" height="76" name="Create Association Rules" width="90" x="782" y="75">
            <parameter key="min_confidence" value="0.4"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Item Sets to Data" to_port="frequent item sets"/>
          <connect from_op="Item Sets to Data" from_port="example set" to_port="result 2"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    Thanks & Regards,
    Uday
  • uday
    uday New Altair Community Member
    Dear Marius,

    Please do let me know if any further information is required.

    Thanks & Regards,
    Uday.
  • MariusHelf
    MariusHelf New Altair Community Member
    You have searched for frequent item sets, and the table you posted is exactly that, i.e. sets of items that appear frequently in the data passed into FP-Growth. The first rows are item sets consisting of only one item each, whereas from row 7 onwards you have item sets containing 2 items. The Support column gives the fraction of all examples in which that item or combination of items occurs.
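
    As a worked check of the Support column, the short Python sketch below recomputes two values from the posted table. The total of 326 examples is an assumption inferred from the table (158 / 0.48466...), not something stated in the thread. The Score column also appears to match the lift of a two-item set, i.e. its support divided by the product of the supports of its two items; that reading is an inference from the numbers, not from RapidMiner's documentation.

```python
# Frequencies copied from the table posted in the thread.
freq = {
    ("Feature-1=Equals",): 158,
    ("Feature-0=Addition",): 108,
    ("Feature-1=Equals", "Feature-0=Addition"): 107,
}

# Assumption: total number of examples, inferred from 158 / 0.48466...
n_examples = 326


def support(items):
    """Fraction of all examples that contain every item in `items`."""
    return freq[items] / n_examples


def lift(a, b):
    """Support of the combined set relative to independent occurrence."""
    return support(a + b) / (support(a) * support(b))


print(round(support(("Feature-1=Equals",)), 4))
print(round(lift(("Feature-1=Equals",), ("Feature-0=Addition",)), 4))
```

    A lift well above 1, as in row 7, means the two items occur together more often than they would if they were independent.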

    If you want to predict the next value based on the current and/or previous values, frequent item sets and association rules are probably not the best choice. Try a classification algorithm instead.

    Best regards,
    Marius
  • uday
    uday New Altair Community Member
    Dear Marius,

    Thanks for the reply :) I just need one clarification: can we represent the output of the process in a graphical format, such as a tree view?

    Kindly help me in this regard.

    Thanks & Regards,
    Uday.
  • MariusHelf
    MariusHelf New Altair Community Member
    Uday,

    if you create a tree model, just connect the corresponding model output to the process output and you will get a visualization of the tree.

    Best regards,
    Marius
  • uday
    uday New Altair Community Member
    Dear Marius,

    Sorry for the delay in the response.

    Thanks for the Reply :)

    I just need one more clarification regarding the filtering of input data.

    Consider, for example, input data in the following format:

    Date                Feature            Time
    1/1/2013          Add              1:00:00 PM
    1/1/2013          Add              1:00:01 PM
    1/1/2013          Add              1:00:02 PM
    1/1/2013          Add              1:00:03 PM
    1/1/2013          Add              1:00:04 PM
    1/1/2013          Add              1:00:05 PM
    1/1/2013          Add              1:00:06 PM
    1/1/2013          Add              1:00:07 PM
    1/1/2013          Add              1:00:08 PM
    1/1/2013          Sub              1:01:00 PM
    1/1/2013          Add              1:02:00 PM
    1/1/2013          Equals          1:03:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Equals          1:05:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Equals          1:05:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Add              1:16:00 PM
    1/1/2013          Equals          1:17:00 PM
    1/1/2013          Add              1:18:00 PM
    1/1/2013          Equals          1:19:00 PM
    1/1/2013          Add              1:20:00 PM
    1/1/2013          Equals          1:21:00 PM
    1/1/2013          Add              1:22:00 PM
    1/1/2013          Equals          1:23:00 PM
    1/1/2013          Sub              1:23:01 PM
    1/1/2013          Sub              1:23:01 PM
    1/1/2013          Sub              1:23:02 PM
    1/1/2013          Sub              1:23:03 PM
    1/1/2013          Sub              1:23:04 PM
    1/1/2013          Sub              1:23:05 PM
    1/1/2013          Sub              1:23:06 PM


    After applying the filtering or transformations, the data should look as follows:
    Date                Feature            Time
    1/1/2013          Add              1:00:08 PM
    1/1/2013          Sub              1:01:00 PM
    1/1/2013          Add              1:02:00 PM
    1/1/2013          Equals          1:03:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Equals          1:05:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Equals          1:05:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Add              1:16:00 PM
    1/1/2013          Equals          1:17:00 PM
    1/1/2013          Add              1:18:00 PM
    1/1/2013          Equals          1:19:00 PM
    1/1/2013          Add              1:20:00 PM
    1/1/2013          Equals          1:21:00 PM
    1/1/2013          Add              1:22:00 PM
    1/1/2013          Equals          1:23:00 PM
    1/1/2013          Sub              1:23:06 PM

    If the same feature appears repeatedly within seconds on the same date, I need to keep only its last occurrence.

    Please help me in this regard.

    Kindly let me know which transformations or filters are available in RapidMiner for this.
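
    The requested filter can be stated precisely outside RapidMiner. The Python sketch below collapses each run of consecutive rows with the same feature into its last row, as long as neighbouring timestamps in the run are close together. The function name `keep_last_of_bursts` and the 59-second threshold `max_gap` are assumptions chosen for illustration; the post only says "within seconds".

```python
from datetime import datetime, timedelta


def keep_last_of_bursts(rows, max_gap=timedelta(seconds=59)):
    """Collapse runs of the same feature into their last row.

    `rows` is a list of (feature, timestamp) tuples in file order.
    Two consecutive rows belong to the same run when the feature is
    equal and the later timestamp is at most `max_gap` (an assumed
    threshold) after the earlier one.
    """
    result = []
    for feature, ts in rows:
        if result:
            prev_feature, prev_ts = result[-1]
            same_run = (feature == prev_feature
                        and timedelta(0) <= ts - prev_ts <= max_gap)
            if same_run:
                result[-1] = (feature, ts)  # keep the later occurrence
                continue
        result.append((feature, ts))
    return result


def parse(feature, clock):
    """Build a (feature, datetime) row for the date used in the post."""
    stamp = datetime.strptime("1/1/2013 " + clock, "%m/%d/%Y %I:%M:%S %p")
    return feature, stamp


# A shortened excerpt of the input posted above.
sample = [
    parse("Add", "1:00:00 PM"), parse("Add", "1:00:01 PM"),
    parse("Add", "1:00:08 PM"), parse("Sub", "1:01:00 PM"),
    parse("Add", "1:04:00 PM"), parse("Add", "1:16:00 PM"),
    parse("Sub", "1:23:01 PM"), parse("Sub", "1:23:06 PM"),
]
filtered = keep_last_of_bursts(sample)
for feature, ts in filtered:
    print(feature, ts.strftime("%I:%M:%S %p"))
```

    On this excerpt the two Add rows at 1:04:00 PM and 1:16:00 PM both survive because they are minutes apart, which matches the expected output in the post.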
  • uday
    uday New Altair Community Member
    Dear Marius,

    Could you please help me create a process that takes data in the following format and identifies the frequently occurring patterns in the forward direction only?

    This is regarding the mining of time series data.

    I have time series data as follows:

    Date                Feature            Time
    1/1/2013          Add              1:00:00 PM
    1/1/2013          Sub              1:01:00 PM
    1/1/2013          Add              1:02:00 PM
    1/1/2013          Equals          1:03:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Equals          1:05:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Equals          1:05:00 PM
    1/1/2013          Add              1:04:00 PM
    1/1/2013          Equals          1:05:00 PM
    1/1/2013          Add              1:06:00 PM
    1/1/2013          Equals          1:07:00 PM
    1/1/2013          Add              1:08:00 PM
    1/1/2013          Equals          1:09:00 PM
    1/1/2013          Add              1:10:00 PM
    1/1/2013          Equals          1:11:00 PM
    1/1/2013          Add              1:12:00 PM
    1/1/2013          Equals          1:13:00 PM
    1/1/2013          Add              1:14:00 PM
    1/1/2013          Equals          1:15:00 PM
    1/1/2013          Add              1:16:00 PM
    1/1/2013          Equals          1:17:00 PM
    1/1/2013          Add              1:18:00 PM
    1/1/2013          Equals          1:19:00 PM
    1/1/2013          Add              1:20:00 PM
    1/1/2013          Equals          1:21:00 PM
    1/1/2013          Add              1:22:00 PM
    1/1/2013          Equals          1:23:00 PM

    From the data it is clear that most of the time "Add" is followed by "Equals".

    To arrive at this conclusion, as you suggested, I set Date as the ID, applied the Windowing operator and then the Nominal to Binominal operator, followed by the FP-Growth operator to identify the frequent item sets.

    However, I just want a result of the form Add -> Equals 10 (count).

    If I set the window size to 2, I get:

    FeatureName-1 = Add -> FeatureName-0 = Equals (13)

    FeatureName-0 = Add -> FeatureName-1 = Equals (10)

    Which one should I consider? I want only the forward rules.

    Thanks in Advance :)

    Regards,
    Uday
  • uday
    uday New Altair Community Member
    Dear Marius,

    Please do reply; this is very urgent.

    Sorry if I sound demanding.

    Thanks in Advance :)

    Regards,
    Uday
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi Uday,

    I have been on holidays. For very urgent questions we offer commercial support :)

    Anyway, your output is already what you requested, and even a bit more:

    FeatureName-1 = Add -> FeatureName-0 = Equals (13)
    FeatureName-0 = Add -> FeatureName-1 = Equals (10)

    This tells you that if FeatureName-1 (the previous action) is "Add", then FeatureName-0 (the current action) is likely to be "Equals". Of course there is also the other direction, represented by the second rule, i.e. if the current action is "Add", then it is likely that the previous action was "Equals".
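
    If only the forward direction is of interest, the transitions can also be counted directly from the raw sequence. The following Python sketch is an illustration outside RapidMiner, not a description of what FP-Growth does; it uses the 28 rows posted in the question, and the helper name `confidence` is hypothetical.

```python
from collections import Counter

# The 28 events posted in the question: Add, Sub, then 13 Add/Equals pairs.
events = ["Add", "Sub"] + ["Add", "Equals"] * 13

# Count each (previous, next) pair of consecutive events.
transitions = Counter(zip(events, events[1:]))

# How often each event appears as the "previous" element of a pair.
starts = Counter(events[:-1])


def confidence(prev, nxt):
    """Share of occurrences of `prev` that are directly followed by `nxt`."""
    return transitions[(prev, nxt)] / starts[prev]


print(transitions[("Add", "Equals")])          # 13
print(round(confidence("Add", "Equals"), 4))   # 0.9286
```

    Because every pair is read in time order, only forward rules such as Add -> Equals are produced, each with its count.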

    Best regards,
    Marius