"Write filters to disk"

chris_ml New Altair Community Member
edited November 2024 in Community Q&A
The ModelGrouper operator is a convenient solution when preprocessing models and prediction models must be
written to disk together. A data mining process often also contains filters, such as the
"FeatureNameFilter" operator, which are not written to disk when the ModelWriter is used.

In the following code, is there a way to also dump the "FeatureNameFilter" into a file so that the complete
process can later be read in and applied to unseen data?

<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.4">

  <operator name="Root" class="Process" expanded="yes">
      <parameter key="logverbosity" value="init"/>
      <parameter key="random_seed" value="2001"/>
      <parameter key="encoding" value="SYSTEM"/>
      <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
          <parameter key="target_function" value="polynomial classification"/>
          <parameter key="number_examples" value="100"/>
          <parameter key="number_of_attributes" value="5"/>
          <parameter key="attributes_lower_bound" value="-10.0"/>
          <parameter key="attributes_upper_bound" value="10.0"/>
          <parameter key="local_random_seed" value="-1"/>
          <parameter key="datamanagement" value="double_array"/>
      </operator>
      <operator name="NoiseGenerator" class="NoiseGenerator">
          <parameter key="random_attributes" value="3"/>
          <parameter key="label_noise" value="0.05"/>
          <parameter key="default_attribute_noise" value="0.0"/>
          <list key="noise">
          </list>
          <parameter key="offset" value="0.0"/>
          <parameter key="linear_factor" value="1.0"/>
          <parameter key="local_random_seed" value="-1"/>
      </operator>
      <operator name="Normalization" class="Normalization">
          <parameter key="return_preprocessing_model" value="true"/>
          <parameter key="create_view" value="false"/>
          <parameter key="method" value="Z-Transformation"/>
          <parameter key="min" value="0.0"/>
          <parameter key="max" value="1.0"/>
      </operator>
      <operator name="FeatureNameFilter" class="FeatureNameFilter">
          <parameter key="filter_special_features" value="false"/>
          <parameter key="skip_features_with_name" value="result"/>
      </operator>
      <operator name="NearestNeighbors" class="NearestNeighbors">
          <parameter key="keep_example_set" value="false"/>
          <parameter key="k" value="3"/>
          <parameter key="weighted_vote" value="false"/>
          <parameter key="measure_types" value="MixedMeasures"/>
          <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
          <parameter key="nominal_measure" value="NominalDistance"/>
          <parameter key="numerical_measure" value="EuclideanDistance"/>
          <parameter key="divergence" value="GeneralizedIDivergence"/>
          <parameter key="kernel_type" value="radial"/>
          <parameter key="kernel_gamma" value="1.0"/>
          <parameter key="kernel_sigma1" value="1.0"/>
          <parameter key="kernel_sigma2" value="0.0"/>
          <parameter key="kernel_sigma3" value="2.0"/>
          <parameter key="kernel_degree" value="3.0"/>
          <parameter key="kernel_shift" value="1.0"/>
          <parameter key="kernel_a" value="1.0"/>
          <parameter key="kernel_b" value="0.0"/>
      </operator>
      <operator name="ModelGrouper" class="ModelGrouper">
      </operator>
      <operator name="ModelWriter" class="ModelWriter">
          <parameter key="model_file" value="combined_model_bin.mod"/>
          <parameter key="overwrite_existing_file" value="true"/>
          <parameter key="output_type" value="XML"/>
      </operator>
  </operator>

</process>

Answers

  • land New Altair Community Member
    Hi Chris,
    Unfortunately, this is not possible. You still have to design a separate process for application. But you could use a trick to simplify this:
    If you store all the preprocessing steps in a single process, you can load and apply it in both the training process and the application process using the ProcessEmbedder operator. This process then behaves like a model itself.

    Greetings,
      Sebastian
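
The trick described in the answer might look roughly like the sketch below. It is not taken from the thread: the file names preprocessing.xml and unseen_data.aml are hypothetical, and the ProcessEmbedder parameters process_file and use_input, as well as the ExampleSource, ModelLoader and ModelApplier settings, are assumptions based on the RapidMiner 4.x operator set that should be checked against the installed version. First, the model-less filter is stored in its own process file:

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.4">
  <!-- preprocessing.xml (hypothetical name): shared by training and application -->
  <operator name="Root" class="Process" expanded="yes">
      <operator name="FeatureNameFilter" class="FeatureNameFilter">
          <parameter key="filter_special_features" value="false"/>
          <parameter key="skip_features_with_name" value="result"/>
      </operator>
  </operator>
</process>
```

An application process could then embed that file and apply the grouped model written by the ModelWriter:

```xml
<?xml version="1.0" encoding="US-ASCII"?>
<process version="4.4">
  <operator name="Root" class="Process" expanded="yes">
      <!-- Load the unseen data (hypothetical attribute description file) -->
      <operator name="ExampleSource" class="ExampleSource">
          <parameter key="attributes" value="unseen_data.aml"/>
      </operator>
      <!-- Run the shared filter process on the incoming example set
           (parameter names are assumptions) -->
      <operator name="ProcessEmbedder" class="ProcessEmbedder">
          <parameter key="process_file" value="preprocessing.xml"/>
          <parameter key="use_input" value="true"/>
      </operator>
      <!-- Load and apply the grouped normalization and nearest-neighbor model -->
      <operator name="ModelLoader" class="ModelLoader">
          <parameter key="model_file" value="combined_model_bin.mod"/>
      </operator>
      <operator name="ModelApplier" class="ModelApplier">
      </operator>
  </operator>
</process>
```

In the training process, the inline FeatureNameFilter would be replaced by the same ProcessEmbedder block. Note that in this sketch the filter runs before the grouped model is applied, so the training process would probably need to embed it before Normalization as well to keep the order of steps consistent.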
