How to combine Logistic Regression with SOM as a hybrid model?

Hi,
I need to combine Logistic Regression with SOM or DBSCAN as a hybrid model. This would be a hybrid "Classification + Clustering" model in which the classifier is trained first and its output is used as the input to the clustering step, in order to improve the clustering results.
Thanks,
Thanks for your response ...
The problem is that when I hybridize the algorithms, the performance measures (accuracy, precision, recall) don't change, even if I disable the X-Validation operator that contains the logistic regression. I don't understand why the logistic regression has no effect on the overall performance...
Please see the attached file.
Thanks
Hi,
In this example, I first applied a decision tree (DT) to the Titanic data. The resulting accuracy is 80.29%.
When the DT is hybridized with Fuzzy C-Means (FCM), the accuracy is still 80.29%. This means the system does not take the FCM into account. Is there another way to integrate the classification and clustering models? Can you help me with this issue?
DT process:
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.2.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator class="process" name="Process" expanded="true" compatibility="7.2.002" activated="true">
<parameter value="init" key="logverbosity"/>
<parameter value="2001" key="random_seed"/>
<parameter value="never" key="send_mail"/>
<parameter value="" key="notification_email"/>
<parameter value="30" key="process_duration_for_mail"/>
<parameter value="SYSTEM" key="encoding"/>
<process expanded="true">
<operator class="retrieve" name="Retrieve Titanic" expanded="true" compatibility="7.2.002" activated="true" y="34" x="45" width="90" height="68">
<parameter value="//Samples/data/Titanic" key="repository_entry"/>
</operator>
<operator class="replace_missing_values" name="Replace Missing Values" expanded="true" compatibility="7.2.002" activated="true" y="136" x="45" width="90" height="103">
<parameter value="false" key="return_preprocessing_model"/>
<parameter value="false" key="create_view"/>
<parameter value="all" key="attribute_filter_type"/>
<parameter value="" key="attribute"/>
<parameter value="" key="attributes"/>
<parameter value="false" key="use_except_expression"/>
<parameter value="attribute_value" key="value_type"/>
<parameter value="false" key="use_value_type_exception"/>
<parameter value="time" key="except_value_type"/>
<parameter value="attribute_block" key="block_type"/>
<parameter value="false" key="use_block_type_exception"/>
<parameter value="value_matrix_row_start" key="except_block_type"/>
<parameter value="false" key="invert_selection"/>
<parameter value="false" key="include_special_attributes"/>
<parameter value="average" key="default"/>
<list key="columns"/>
</operator>
<operator class="set_role" name="Set Role" expanded="true" compatibility="7.2.002" activated="true" y="289" x="45" width="90" height="82">
<parameter value="Survived" key="attribute_name"/>
<parameter value="label" key="target_role"/>
<list key="set_additional_roles"/>
</operator>
<operator class="x_validation" name="Validation" expanded="true" compatibility="7.2.002" activated="true" y="34" x="246" width="90" height="124">
<parameter value="false" key="create_complete_model"/>
<parameter value="true" key="average_performances_only"/>
<parameter value="false" key="leave_one_out"/>
<parameter value="10" key="number_of_validations"/>
<parameter value="automatic" key="sampling_type"/>
<parameter value="false" key="use_local_random_seed"/>
<parameter value="1992" key="local_random_seed"/>
<process expanded="true">
<operator class="parallel_decision_tree" name="Decision Tree" expanded="true" compatibility="7.2.002" activated="true" y="34" x="162" width="90" height="82">
<parameter value="gain_ratio" key="criterion"/>
<parameter value="20" key="maximal_depth"/>
<parameter value="true" key="apply_pruning"/>
<parameter value="0.25" key="confidence"/>
<parameter value="true" key="apply_prepruning"/>
<parameter value="0.1" key="minimal_gain"/>
<parameter value="2" key="minimal_leaf_size"/>
<parameter value="4" key="minimal_size_for_split"/>
<parameter value="3" key="number_of_prepruning_alternatives"/>
</operator>
<connect to_port="training set" to_op="Decision Tree" from_port="training"/>
<connect to_port="model" from_port="model" from_op="Decision Tree"/>
<connect to_port="through 1" from_port="exampleSet" from_op="Decision Tree"/>
<portSpacing spacing="0" port="source_training"/>
<portSpacing spacing="0" port="sink_model"/>
<portSpacing spacing="0" port="sink_through 1"/>
<portSpacing spacing="0" port="sink_through 2"/>
</process>
<process expanded="true">
<operator class="apply_model" name="Apply Model" expanded="true" compatibility="7.2.002" activated="true" y="34" x="112" width="90" height="82">
<list key="application_parameters"/>
<parameter value="false" key="create_view"/>
</operator>
<operator class="performance" name="Performance" expanded="true" compatibility="7.2.002" activated="true" y="136" x="246" width="90" height="82">
<parameter value="true" key="use_example_weights"/>
</operator>
<connect to_port="model" to_op="Apply Model" from_port="model"/>
<connect to_port="unlabelled data" to_op="Apply Model" from_port="test set"/>
<connect to_port="labelled data" to_op="Performance" from_port="labelled data" from_op="Apply Model"/>
<connect to_port="averagable 1" from_port="performance" from_op="Performance"/>
<portSpacing spacing="0" port="source_model"/>
<portSpacing spacing="0" port="source_test set"/>
<portSpacing spacing="0" port="source_through 1"/>
<portSpacing spacing="0" port="source_through 2"/>
<portSpacing spacing="0" port="sink_averagable 1"/>
<portSpacing spacing="0" port="sink_averagable 2"/>
</process>
</operator>
<operator class="apply_model" name="Apply Model (3)" expanded="true" compatibility="7.2.002" activated="true" y="34" x="447" width="90" height="82">
<list key="application_parameters"/>
<parameter value="false" key="create_view"/>
</operator>
<operator class="performance" name="Performance (2)" expanded="true" compatibility="7.2.002" activated="true" y="85" x="648" width="90" height="82">
<parameter value="true" key="use_example_weights"/>
</operator>
<connect to_port="example set input" to_op="Replace Missing Values" from_port="output" from_op="Retrieve Titanic"/>
<connect to_port="example set input" to_op="Set Role" from_port="example set output" from_op="Replace Missing Values"/>
<connect to_port="training" to_op="Validation" from_port="example set output" from_op="Set Role"/>
<connect to_port="model" to_op="Apply Model (3)" from_port="model" from_op="Validation"/>
<connect to_port="unlabelled data" to_op="Apply Model (3)" from_port="training" from_op="Validation"/>
<connect to_port="labelled data" to_op="Performance (2)" from_port="labelled data" from_op="Apply Model (3)"/>
<connect to_port="result 1" from_port="performance" from_op="Performance (2)"/>
<portSpacing spacing="0" port="source_input 1"/>
<portSpacing spacing="0" port="sink_result 1"/>
<portSpacing spacing="0" port="sink_result 2"/>
</process>
</operator>
</process>
DT-FCM process:
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.2.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator class="process" name="Process" expanded="true" compatibility="7.2.002" activated="true">
<parameter value="init" key="logverbosity"/>
<parameter value="2001" key="random_seed"/>
<parameter value="never" key="send_mail"/>
<parameter value="" key="notification_email"/>
<parameter value="30" key="process_duration_for_mail"/>
<parameter value="SYSTEM" key="encoding"/>
<process expanded="true">
<operator class="retrieve" name="Retrieve Titanic" expanded="true" compatibility="7.2.002" activated="true" y="34" x="45" width="90" height="68">
<parameter value="//Samples/data/Titanic" key="repository_entry"/>
</operator>
<operator class="replace_missing_values" name="Replace Missing Values" expanded="true" compatibility="7.2.002" activated="true" y="136" x="45" width="90" height="103">
<parameter value="false" key="return_preprocessing_model"/>
<parameter value="false" key="create_view"/>
<parameter value="all" key="attribute_filter_type"/>
<parameter value="" key="attribute"/>
<parameter value="" key="attributes"/>
<parameter value="false" key="use_except_expression"/>
<parameter value="attribute_value" key="value_type"/>
<parameter value="false" key="use_value_type_exception"/>
<parameter value="time" key="except_value_type"/>
<parameter value="attribute_block" key="block_type"/>
<parameter value="false" key="use_block_type_exception"/>
<parameter value="value_matrix_row_start" key="except_block_type"/>
<parameter value="false" key="invert_selection"/>
<parameter value="false" key="include_special_attributes"/>
<parameter value="average" key="default"/>
<list key="columns"/>
</operator>
<operator class="set_role" name="Set Role" expanded="true" compatibility="7.2.002" activated="true" y="289" x="45" width="90" height="82">
<parameter value="Survived" key="attribute_name"/>
<parameter value="label" key="target_role"/>
<list key="set_additional_roles"/>
</operator>
<operator class="x_validation" name="Validation" expanded="true" compatibility="7.2.002" activated="true" y="34" x="246" width="90" height="124">
<parameter value="false" key="create_complete_model"/>
<parameter value="true" key="average_performances_only"/>
<parameter value="false" key="leave_one_out"/>
<parameter value="10" key="number_of_validations"/>
<parameter value="automatic" key="sampling_type"/>
<parameter value="false" key="use_local_random_seed"/>
<parameter value="1992" key="local_random_seed"/>
<process expanded="true">
<operator class="parallel_decision_tree" name="Decision Tree" expanded="true" compatibility="7.2.002" activated="true" y="34" x="162" width="90" height="82">
<parameter value="gain_ratio" key="criterion"/>
<parameter value="20" key="maximal_depth"/>
<parameter value="true" key="apply_pruning"/>
<parameter value="0.25" key="confidence"/>
<parameter value="true" key="apply_prepruning"/>
<parameter value="0.1" key="minimal_gain"/>
<parameter value="2" key="minimal_leaf_size"/>
<parameter value="4" key="minimal_size_for_split"/>
<parameter value="3" key="number_of_prepruning_alternatives"/>
</operator>
<connect to_port="training set" to_op="Decision Tree" from_port="training"/>
<connect to_port="model" from_port="model" from_op="Decision Tree"/>
<connect to_port="through 1" from_port="exampleSet" from_op="Decision Tree"/>
<portSpacing spacing="0" port="source_training"/>
<portSpacing spacing="0" port="sink_model"/>
<portSpacing spacing="0" port="sink_through 1"/>
<portSpacing spacing="0" port="sink_through 2"/>
</process>
<process expanded="true">
<operator class="apply_model" name="Apply Model" expanded="true" compatibility="7.2.002" activated="true" y="34" x="112" width="90" height="82">
<list key="application_parameters"/>
<parameter value="false" key="create_view"/>
</operator>
<operator class="performance" name="Performance" expanded="true" compatibility="7.2.002" activated="true" y="136" x="246" width="90" height="82">
<parameter value="true" key="use_example_weights"/>
</operator>
<connect to_port="model" to_op="Apply Model" from_port="model"/>
<connect to_port="unlabelled data" to_op="Apply Model" from_port="test set"/>
<connect to_port="labelled data" to_op="Performance" from_port="labelled data" from_op="Apply Model"/>
<connect to_port="averagable 1" from_port="performance" from_op="Performance"/>
<portSpacing spacing="0" port="source_model"/>
<portSpacing spacing="0" port="source_test set"/>
<portSpacing spacing="0" port="source_through 1"/>
<portSpacing spacing="0" port="source_through 2"/>
<portSpacing spacing="0" port="sink_averagable 1"/>
<portSpacing spacing="0" port="sink_averagable 2"/>
</process>
</operator>
<operator class="apply_model" name="Apply Model (3)" expanded="true" compatibility="7.2.002" activated="true" y="34" x="447" width="90" height="82">
<list key="application_parameters"/>
<parameter value="false" key="create_view"/>
</operator>
<operator class="nominal_to_numerical" name="Nominal to Numerical" expanded="true" compatibility="7.2.002" activated="true" y="187" x="447" width="90" height="103">
<parameter value="false" key="return_preprocessing_model"/>
<parameter value="false" key="create_view"/>
<parameter value="all" key="attribute_filter_type"/>
<parameter value="" key="attribute"/>
<parameter value="" key="attributes"/>
<parameter value="false" key="use_except_expression"/>
<parameter value="nominal" key="value_type"/>
<parameter value="false" key="use_value_type_exception"/>
<parameter value="file_path" key="except_value_type"/>
<parameter value="single_value" key="block_type"/>
<parameter value="false" key="use_block_type_exception"/>
<parameter value="single_value" key="except_block_type"/>
<parameter value="false" key="invert_selection"/>
<parameter value="false" key="include_special_attributes"/>
<parameter value="dummy coding" key="coding_type"/>
<parameter value="false" key="use_comparison_groups"/>
<list key="comparison_groups"/>
<parameter value="all 0 and warning" key="unexpected_value_handling"/>
<parameter value="false" key="use_underscore_in_name"/>
</operator>
<operator class="prules:FCM" name="Fuzzy C-Means" expanded="true" compatibility="7.0.000" activated="true" y="391" x="447" width="90" height="103">
<parameter value="true" key="add_cluster_attribute"/>
<parameter value="false" key="add_as_label"/>
<parameter value="false" key="Add partition matrix"/>
<parameter value="3" key="Clusters"/>
<parameter value="50" key="Iterations"/>
<parameter value="2.0" key="Fuzzynes"/>
<parameter value="1.0E-4" key="MinGain"/>
<parameter value="MixedMeasures" key="measure_types"/>
<parameter value="MixedEuclideanDistance" key="mixed_measure"/>
<parameter value="NominalDistance" key="nominal_measure"/>
<parameter value="EuclideanDistance" key="numerical_measure"/>
<parameter value="GeneralizedIDivergence" key="divergence"/>
<parameter value="radial" key="kernel_type"/>
<parameter value="1.0" key="kernel_gamma"/>
<parameter value="1.0" key="kernel_sigma1"/>
<parameter value="0.0" key="kernel_sigma2"/>
<parameter value="2.0" key="kernel_sigma3"/>
<parameter value="3.0" key="kernel_degree"/>
<parameter value="1.0" key="kernel_shift"/>
<parameter value="1.0" key="kernel_a"/>
<parameter value="0.0" key="kernel_b"/>
<parameter value="false" key="use_local_random_seed"/>
<parameter value="1992" key="local_random_seed"/>
</operator>
<operator class="performance" name="Performance (2)" expanded="true" compatibility="7.2.002" activated="true" y="391" x="648" width="90" height="82">
<parameter value="true" key="use_example_weights"/>
</operator>
<connect to_port="example set input" to_op="Replace Missing Values" from_port="output" from_op="Retrieve Titanic"/>
<connect to_port="example set input" to_op="Set Role" from_port="example set output" from_op="Replace Missing Values"/>
<connect to_port="training" to_op="Validation" from_port="example set output" from_op="Set Role"/>
<connect to_port="model" to_op="Apply Model (3)" from_port="model" from_op="Validation"/>
<connect to_port="unlabelled data" to_op="Apply Model (3)" from_port="training" from_op="Validation"/>
<connect to_port="example set input" to_op="Nominal to Numerical" from_port="labelled data" from_op="Apply Model (3)"/>
<connect to_port="exampleSet" to_op="Fuzzy C-Means" from_port="example set output" from_op="Nominal to Numerical"/>
<connect to_port="labelled data" to_op="Performance (2)" from_port="exampleSet" from_op="Fuzzy C-Means"/>
<connect to_port="result 1" from_port="performance" from_op="Performance (2)"/>
<portSpacing spacing="0" port="source_input 1"/>
<portSpacing spacing="0" port="sink_result 1"/>
<portSpacing spacing="0" port="sink_result 2"/>
</process>
</operator>
</process>
Many thanks,
Komeil
I'm a bit confused as to why you want to first classify the data and then segment it. These are two different methods of learning (supervised and unsupervised). In the supervised method you start by knowing the truth: you know who did and did not die in the Titanic disaster. In the unsupervised method you typically don't have a class label, and you look for statistical characteristics that 'segment' like groups together. What you are trying to do here is build a model on the Titanic data set with a label, then throw out that label and segment on the regular attributes. You will certainly get different performance measures, one for a classification problem and the other for a segmentation problem. (Incidentally, that is why your DT-FCM process reports exactly 80.29% again: Performance (2) still compares the Survived label against the decision tree's prediction, and Fuzzy C-Means only appends a cluster attribute, so it cannot move the accuracy.)
If you're looking to combine multiple algorithms, have you tried our stacking (ensembling) operator?
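To illustrate the stacking idea outside of RapidMiner, here is a minimal scikit-learn sketch; the base learners, parameters, and synthetic data are placeholders rather than anything taken from your process:

# Stacking sketch: base learners feed a logistic-regression meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; replace with your own feature matrix and label.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=20, min_samples_leaf=2)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=10,  # inner folds used to build the meta-features
)
print("stacked CV accuracy:", cross_val_score(stack, X, y, cv=10).mean())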
Just take your pre-processed (ETL'd) data, feed it into an X-Validation with your Logistic Regression, then use an Apply Model on the outside to score your training set and put the result into the clustering algorithm. Of course I'm simplifying, but it should be quite easy to do.
Update: Something like this?
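In plain Python/scikit-learn terms, the same flow would look roughly like the sketch below. This is only an illustration: it assumes a pre-processed, all-numeric Titanic table (the file name is made up), and DBSCAN stands in for SOM/FCM, which scikit-learn does not ship:

# Classify first, then cluster on the scored data.
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Hypothetical pre-processed (ETL'd) Titanic data: numeric features plus a Survived label.
df = pd.read_csv("titanic_clean.csv")
X = df.drop(columns=["Survived"])
y = df["Survived"]

# Step 1: the X-Validation part - estimate the classifier's performance.
clf = LogisticRegression(max_iter=1000)
print("CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())

# Step 2: the Apply Model part - refit on all data and score the training set.
clf.fit(X, y)
scored = X.copy()
scored["confidence_survived"] = clf.predict_proba(X)[:, 1]

# Step 3: cluster the scored data. Because the classifier's confidence is now
# an extra feature, the classification step can actually influence the clusters.
clusters = DBSCAN(eps=0.5, min_samples=5).fit_predict(
    StandardScaler().fit_transform(scored)
)
df["cluster"] = clusters
print(df["cluster"].value_counts())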