I have a data set and I am trying KNN on it, but I am not getting it. Can you please help me?

HIMANI
HIMANI New Altair Community Member
edited November 2024 in Community Q&A
Please find the attachment below.

Best Answers

  • varunm1
    varunm1 New Altair Community Member
    edited July 2019 Answer ✓
    Hello @HIMANI

    It is not easy to create a process without knowing what kind of problem you are trying to solve (classification or regression), what your target label (output) is, and what kind of validation you need (split or cross-validation). There are many decisions to make while developing a model.

    I created a k-NN model that does classification with the output label "Sex". Your data has missing values, so I used the Impute Missing Values operator to replace them with estimated values. Finally, I selected 5-fold cross-validation to train and test the model.

    Below is the XML code for the process. To use it, open a new process and then open the XML panel (View --> Show Panel --> XML). Copy the code from here, paste it into the XML panel, and click the green check mark in the panel to see the process. To run the process, import your dataset into RapidMiner and then replace the dataset in the process with the imported data. You can also change the output attribute in the Set Role operator (attribute_name, currently "sex"). Sample process image is provided below.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Nebula" width="90" x="45" y="85">
    <parameter key="repository_entry" value="//Local Repository/data/Nebula"/>
    </operator>
    <operator activated="true" class="impute_missing_values" compatibility="9.3.001" expanded="true" height="68" name="Impute Missing Values" width="90" x="246" y="85">
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="iterate" value="true"/>
    <parameter key="learn_on_complete_cases" value="true"/>
    <parameter key="order" value="chronological"/>
    <parameter key="sort" value="ascending"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <process expanded="true">
    <operator activated="true" class="k_nn" compatibility="9.3.001" expanded="true" height="82" name="k-NN" width="90" x="179" y="85">
    <parameter key="k" value="5"/>
    <parameter key="weighted_vote" value="true"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    </operator>
    <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
    <connect from_op="k-NN" from_port="model" to_port="model sink"/>
    <portSpacing port="source_example set source" spacing="0"/>
    <portSpacing port="sink_model sink" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="set_role" compatibility="9.3.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="85">
    <parameter key="attribute_name" value="sex"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="9.3.001" expanded="true" height="145" name="Cross Validation" width="90" x="514" y="85">
    <parameter key="split_on_batch_attribute" value="false"/>
    <parameter key="leave_one_out" value="false"/>
    <parameter key="number_of_folds" value="5"/>
    <parameter key="sampling_type" value="automatic"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <parameter key="enable_parallel_execution" value="true"/>
    <process expanded="true">
    <operator activated="true" class="k_nn" compatibility="9.3.001" expanded="true" height="82" name="k-NN (2)" width="90" x="112" y="85">
    <parameter key="k" value="5"/>
    <parameter key="weighted_vote" value="true"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    </operator>
    <connect from_port="training set" to_op="k-NN (2)" to_port="training set"/>
    <connect from_op="k-NN (2)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    <parameter key="create_view" value="false"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="9.3.001" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <parameter key="main_criterion" value="first"/>
    <parameter key="accuracy" value="true"/>
    <parameter key="classification_error" value="false"/>
    <parameter key="kappa" value="true"/>
    <parameter key="weighted_mean_recall" value="false"/>
    <parameter key="weighted_mean_precision" value="false"/>
    <parameter key="spearman_rho" value="false"/>
    <parameter key="kendall_tau" value="false"/>
    <parameter key="absolute_error" value="false"/>
    <parameter key="relative_error" value="false"/>
    <parameter key="relative_error_lenient" value="false"/>
    <parameter key="relative_error_strict" value="false"/>
    <parameter key="normalized_absolute_error" value="false"/>
    <parameter key="root_mean_squared_error" value="false"/>
    <parameter key="root_relative_squared_error" value="false"/>
    <parameter key="squared_error" value="false"/>
    <parameter key="correlation" value="false"/>
    <parameter key="squared_correlation" value="false"/>
    <parameter key="cross-entropy" value="false"/>
    <parameter key="margin" value="false"/>
    <parameter key="soft_margin_loss" value="false"/>
    <parameter key="logistic_loss" value="false"/>
    <parameter key="skip_undefined_labels" value="true"/>
    <parameter key="use_example_weights" value="true"/>
    <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Nebula" from_port="output" to_op="Impute Missing Values" to_port="example set in"/>
    <connect from_op="Impute Missing Values" from_port="example set out" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
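
    To change the label as described above, only the Set Role operator needs to be edited. The fragment below is a minimal sketch of that one operator, copied from the process above with a different label. The attribute name "my_target" is a placeholder and an assumption, not a column known to exist in your data; replace it with the exact name of the attribute you want to predict.

    ```xml
    <operator activated="true" class="set_role" compatibility="9.3.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="85">
    <!-- "my_target" is a hypothetical attribute name; use the name of your own column -->
    <parameter key="attribute_name" value="my_target"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    ```

    The same change can be made by selecting the Set Role operator and editing its parameters in the Parameters panel, so editing the XML by hand is optional.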



    Hope this helps.

Answers

  • HIMANI
    HIMANI New Altair Community Member
    I am new to RapidMiner and I am doing an assignment in which I have to analyse the dataset given to us using linear regression, logistic regression, and k-NN, but I don't know any of them. Can you please help me? It's my first quarter, this is a totally new subject for me, and I really need to pass.
  • HIMANI
    HIMANI New Altair Community Member
    I understand, and I am not asking you to do the assignment, but I am really stuck. We are doing a team project and my partner has ditched me, so now I have to do it all on my own, and the problem is that I have so many things to do just by watching tutorials. The dataset given to me seems to offer very few possibilities for any result, so I am confused. It should probably be classification. Validation should be of the yes/no type; that's all I know.
  • HIMANI
    HIMANI New Altair Community Member
    I am trying to build a decision tree. Can you tell me whether it is right or wrong? I also don't know if this is the right file format to share.
  • HIMANI
    HIMANI New Altair Community Member
    Is that all? Can you please answer me?
  • HIMANI
    HIMANI New Altair Community Member
    At least help me with the decision tree. I'm just asking if it is right or wrong; I am not asking you to do the assignment. I built this decision tree after spending hours watching tutorials, but I don't fully understand them. Please at least say yes or no.
  • HIMANI
    HIMANI New Altair Community Member
    hello?