"PCA for 101010101 series prediction"

wessel
wessel New Altair Community Member
edited November 2024 in Community Q&A
Dear All,

I have a process which predicts the next Boolean value given a Boolean series.

The process first applies windowing.
Then a sliding window validation is run.
Inside the sliding window validation, PCA is applied.
After this, the integer {1, 0} values are converted to Boolean.
Then a J48 learner is applied.
Before Apply Model, the {1, 0} values are converted to Boolean yet again.

This constant conversion between Boolean and integer makes the process really slow!
Is there a way to overcome this problem?
Can we apply PCA to a Boolean data set?

Best regards,

Wessel



<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <parameter key="parallelize_main_process" value="true"/>
    <process expanded="true" height="445" width="435">
      <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="DNA"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing" width="90" x="180" y="30">
        <parameter key="horizon" value="1"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="x"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="315" y="30">
        <parameter key="training_window_step_size" value="100"/>
        <parameter key="test_window_width" value="1"/>
        <parameter key="average_performances_only" value="false"/>
        <parameter key="parallelize_training" value="true"/>
        <parameter key="parallelize_testing" value="true"/>
        <process expanded="true" height="445" width="435">
          <operator activated="true" class="principal_component_analysis" compatibility="5.1.006" expanded="true" height="94" name="PCA" width="90" x="45" y="30"/>
          <operator activated="true" class="numerical_to_binominal" compatibility="5.1.006" expanded="true" height="76" name="Numerical to Binominal" width="90" x="180" y="30">
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="weka:W-J48" compatibility="5.1.000" expanded="true" height="76" name="W-J48" width="90" x="315" y="30"/>
          <connect from_port="training" to_op="PCA" to_port="example set input"/>
          <connect from_op="PCA" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="W-J48" to_port="training set"/>
          <connect from_op="W-J48" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="445" width="435">
          <operator activated="true" class="numerical_to_binominal" compatibility="5.1.006" expanded="true" height="76" name="Numerical to Binominal (2)" width="90" x="45" y="75">
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="180" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.1.006" expanded="true" height="76" name="Performance (2)" width="90" x="315" y="30">
            <parameter key="accuracy" value="false"/>
            <parameter key="kappa" value="true"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Numerical to Binominal (2)" to_port="example set input"/>
          <connect from_op="Numerical to Binominal (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="36"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • land
    land New Altair Community Member
    Hi,
    Actually, you can simply leave the integers as integers. J48 will convert them to bins itself. Or am I overlooking something?

    Greetings,
    Sebastian
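
A rough sketch of what this suggestion could look like in the process XML, not a tested process. It assumes that W-J48 still needs a binominal label, so the label is converted once between Windowing and Validation rather than inside every window. The operator name "Label to Binominal" is made up, and the attribute name "label" and the `attribute_filter_type`/`attribute` parameters are assumptions to check against your RapidMiner version. Positions, sizes and port spacing are left out.

```xml
<!-- Assumption: convert only the label, once, before the validation. -->
<!-- "Label to Binominal" and the attribute name "label" are hypothetical. -->
<operator activated="true" class="numerical_to_binominal" compatibility="5.1.006" expanded="true" name="Label to Binominal">
  <parameter key="attribute_filter_type" value="single"/>
  <parameter key="attribute" value="label"/>
  <parameter key="include_special_attributes" value="true"/>
</operator>

<!-- Training subprocess of the validation: PCA feeds W-J48 directly, -->
<!-- so the regular attributes stay numerical and nothing is converted per window. -->
<process expanded="true">
  <operator activated="true" class="principal_component_analysis" compatibility="5.1.006" expanded="true" name="PCA"/>
  <operator activated="true" class="weka:W-J48" compatibility="5.1.000" expanded="true" name="W-J48"/>
  <connect from_port="training" to_op="PCA" to_port="example set input"/>
  <connect from_op="PCA" from_port="example set output" to_op="W-J48" to_port="training set"/>
  <connect from_op="W-J48" from_port="model" to_port="model"/>
</process>
```

Under the same assumption, "Numerical to Binominal (2)" in the testing subprocess could be dropped as well, with the test set wired straight into Apply Model.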
