Features affecting Bottom Line (Revenue)

msacs09

Experts,

Could you please help me work out feature weights, i.e. the contributing factors affecting revenue? We would like to understand why some instances of revenue are low and some are high, and what the differentiator is. Please see the sample data. I want to see which features are affecting the revenue percentages.

Could you please advise me on how to approach this? I have a lot of nominal attributes; should I convert everything to numerical? Could you point me to a sample process, please?

As always, thank you for your valuable advice and time.

Telcontar120

If you want to understand the univariate relationships between Revenue and the other attributes one at a time, you should look at the Weighting operators. Weight by Correlation is good for numerical attributes, and Weight by Information Gain or Weight by Chi Squared Statistic is good for nominal attributes.

These will only show you individual relationships. Your question may actually be about what combinations of factors are most associated with Revenue. If that is the case and you are interested in exploring multivariate relationships, then that is basically a supervised machine learning problem. In that case, you probably want to build a simple predictive model to start, using a highly interpretable algorithm. I suggest a simple Decision Tree model so you can get a sense of what combinations of factors are associated with different levels of Revenue.

In both cases, the tutorial processes bundled with RapidMiner are useful for understanding the basic setup and use of these operators.
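For readers who want to see what the univariate weighting idea above amounts to outside RapidMiner, here is a minimal Python sketch. It is not the implementation of the RapidMiner operators. The toy data, the column names `units` and `segment`, and the choice to split Revenue into "high"/"low" at the median before computing chi-square are all assumptions made for illustration.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def chi_square(categories, classes):
    """Chi-square statistic of the contingency table of two nominal lists."""
    n = len(categories)
    row_totals = Counter(categories)
    col_totals = Counter(classes)
    observed = Counter(zip(categories, classes))
    stat = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data, not taken from the thread's dataset.
revenue = [0.010, 0.012, 0.018, 0.020, 0.021, 0.011]
units = [10, 12, 30, 41, 39, 11]                             # numerical attribute
segment = ["local", "local", "global", "med", "med", "local"]  # nominal attribute

# Assumption: turn Revenue into a nominal high/low class at the median,
# so that a chi-square test against a nominal attribute is possible.
median = sorted(revenue)[len(revenue) // 2]
revenue_class = ["high" if r >= median else "low" for r in revenue]

weights = {
    "units (|correlation|)": abs(pearson(units, revenue)),
    "segment (chi-square)": chi_square(segment, revenue_class),
}
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

The two numbers are on different scales, so they should only be compared with weights produced by the same measure.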

msacs09

@Telcontar120 Thank you, sir. Your understanding is exactly right: I need to "explore multivariate relationships affecting Revenue". Could you share a sample or similar process that I can learn from, please?

Telcontar120

Here's a simple cross-validation with a DT for a numerical dataset. You'll need to substitute your own dataset of course and make sure Revenue is set as the label.

<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000-BETA2">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000-BETA2" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="120"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.1.000-BETA2" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="112" y="85">
        <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation" width="90" x="380" y="34">
        <parameter key="split_on_batch_attribute" value="false"/>
        <parameter key="leave_one_out" value="false"/>
        <parameter key="number_of_folds" value="10"/>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.1.000-BETA2" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
            <parameter key="criterion" value="least_square"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="33" y="148">Builds a decision tree model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure your own target attribute (e.g. Revenue) is set as the label.</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">A cross validation including a decision tree.</description>
      </operator>
      <connect from_op="Retrieve Polynomial" from_port="output" to_op="Validation" to_port="example set"/>
      <connect from_op="Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Validation" from_port="test result set" to_port="result 2"/>
      <connect from_op="Validation" from_port="performance 1" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
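The Decision Tree operator in the process above uses `criterion = least_square`. As a rough illustration of what a least-squares split means, the Python sketch below scores each nominal attribute by how much it reduces the sum of squared errors of the label when the data is grouped by that attribute's values. This is a simplified sketch, not RapidMiner's actual algorithm, and the data and the attribute names `segment` and `sector` are hypothetical.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gain(attribute, label):
    """Reduction in squared error when the label is grouped by attribute value."""
    groups = defaultdict(list)
    for a, y in zip(attribute, label):
        groups[a].append(y)
    return sse(label) - sum(sse(g) for g in groups.values())


# Hypothetical toy data; replace with your own attributes and Revenue label.
revenue = [0.010, 0.012, 0.018, 0.020, 0.021, 0.011]
attributes = {
    "segment": ["local", "local", "global", "med", "med", "local"],
    "sector": ["AD", "ES", "AD", "ES", "AD", "ES"],
}

gains = {name: split_gain(values, revenue) for name, values in attributes.items()}
best = max(gains, key=gains.get)
print("best first split:", best)
```

A tree learner repeats this choice inside each resulting group, which is how combinations of attributes end up in the final model.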

msacs09

@Telcontar120 Thank you very much, sir. Can you suggest the best way to represent this in a chart? Which charts in RapidMiner would help us interpret the output below for the business folks? Does the sample below say that med has the highest margin because its count is 10? Basically, I want to extract the decision tree model and present it in a meaningful way.

RegressionTree

segment = global: 0.018 {count=4}
segment = local
|   Sector = AD: 0.016 {count=3}
|   Sector = ES: 0.011 {count=2}
segment = med: 0.020 {count=10}
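One possible way to turn a text tree like the one above into a table for a business audience is sketched below in Python. It assumes, as is usual for regression trees, that the number after the colon is the predicted Revenue value for that leaf and that `count` is the number of training examples that reached it. The `flatten_tree` helper and the regular expression are hypothetical and only cover the output format shown in this post.

```python
import re

TREE_TEXT = """\
segment = global: 0.018 {count=4}
segment = local
|   Sector = AD: 0.016 {count=3}
|   Sector = ES: 0.011 {count=2}
segment = med: 0.020 {count=10}
"""

# A leaf line looks like "<condition>: <value> {count=<n>}".
LEAF = re.compile(
    r"^(?P<cond>.+?): (?P<value>-?\d+(?:\.\d+)?) \{count=(?P<count>\d+)\}$"
)


def flatten_tree(text):
    """Return one (rule, predicted_value, count) tuple per leaf."""
    path = []    # conditions of the inner nodes above the current line
    leaves = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = raw.count("|")           # one "|" per nesting level
        line = raw.replace("|", "").strip()
        del path[depth:]                 # drop conditions from deeper branches
        match = LEAF.match(line)
        if match:
            rule = " AND ".join(path + [match["cond"]])
            leaves.append((rule, float(match["value"]), int(match["count"])))
        else:
            path.append(line)            # inner node: remember its condition
    return leaves


# Sort by predicted value so the highest-revenue rule comes first.
for rule, value, count in sorted(flatten_tree(TREE_TEXT), key=lambda leaf: -leaf[1]):
    print(f"{rule:35s} predicted={value:.3f}  examples={count}")
```

Under that reading, a leaf ranks highest because of its predicted value, not because of its count; the count only indicates how many examples support the rule.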

msacs09

@Telcontar120 Thank you, sir. Is there a sample process for exploring multivariate relationships, please?

Telcontar120

The sample process I provided earlier in this thread is suitable for exploring and showing multivariate relationships via a decision tree. You could also swap the learner and do something similar with a linear regression or GLM.