GPU slower than CPU

varunm1 New Altair Community Member
edited November 5 in Community Q&A
Hi,

I switched the Deep Learning operator to use the GPU instead of the CPU (1 core), but it runs slower. I see that GPU utilization is very low (2 to 3%) while the process is running, whereas when I use the CPU, utilization is roughly 70%. I am using a batch size of 32. Could the small batch size be the cause?

Thanks,
Varun

Answers

  • MartinLiebig
    Altair Employee
    Hi @varunm1,
    On how many examples are you learning? Keep in mind that the cost of moving data onto the GPU is fairly high for small data sets. GPUs become useful once your data gets a bit larger.

    BR,
    Martin
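    Martin's transfer-cost point can be made concrete with a minimal back-of-the-envelope sketch. All the constants below (PCIe bandwidth, GPU/CPU throughput, per-batch launch overhead) are illustrative assumptions, not measurements of any particular hardware, and the helper name is hypothetical:

    ```python
    def gpu_faster(batch_bytes, flops, pcie_bps=12e9, gpu_flops=5e12,
                   cpu_flops=2e11, launch_s=10e-6):
        """Rough cost model: the GPU pays a fixed launch overhead plus a
        PCIe transfer per batch; it wins only when the compute saved
        outweighs that overhead."""
        gpu_s = launch_s + flops / gpu_flops + batch_bytes / pcie_bps
        cpu_s = flops / cpu_flops
        return gpu_s < cpu_s

    # A 32-row batch of 102 float64 attributes is ~26 KB; with a 256-unit
    # dense layer the per-batch compute is tiny, so overhead dominates.
    small = gpu_faster(32 * 102 * 8, 32 * 102 * 256 * 2)      # False
    # A much larger batch amortizes the fixed per-batch costs.
    large = gpu_faster(8192 * 102 * 8, 8192 * 102 * 256 * 2)  # True
    ```

    Under these (assumed) numbers the GPU loses at batch size 32 and wins at 8192, which matches the intuition that small batches keep the GPU starved.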
  • hughesfleming68
    New Altair Community Member
    edited January 2019
    I have seen this as well, but it does not seem to be specific to any particular DL software. The last time I tested this with TensorFlow, my CPU with 28 threads was 2x faster than the GPU. For my data sets, I have not found the GPU to help much, so I guess it really depends on what you are trying to do. I have also noticed the low GPU utilization; at the time I was under the impression that Windows was not reporting those stats very well.
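    One contributing factor to low utilization is the sheer number of tiny batches: at batch size 32, an epoch over data sets of this size is split into tens of thousands of batches, and fixed per-batch overhead dominates. A quick sketch (the helper name is hypothetical):

    ```python
    import math

    def batches_per_epoch(n_samples, batch_size):
        """Number of mini-batches (and hence GPU kernel-launch rounds)
        needed to cover one epoch."""
        return math.ceil(n_samples / batch_size)

    # 1M samples at batch 32 -> 31,250 batches per epoch;
    # 400k samples at batch 32 -> 12,500 batches per epoch.
    launches_1m = batches_per_epoch(1_000_000, 32)
    launches_400k = batches_per_epoch(400_000, 32)
    ```

    Raising the batch size (memory permitting) cuts the batch count proportionally, which is usually the first thing to try when GPU utilization sits in the low single digits.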
  • varunm1
    New Altair Community Member
    Hi @mschmitz @hughesfleming68

    What you said is true, but the datasets are 400k and 1 million samples with 102 attributes. That's why I felt something was wrong after comparing the CPU and GPU utilization rates. One interesting observation: earlier, for a similar data set, GPU utilization was around 30 to 40 percent.

    One more thing: the dataset is sparse.

    Thanks
    Varun
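    On the sparsity point: a dense GPU kernel still multiplies every zero, so a mostly-zero example set gets little benefit from raw GPU throughput unless sparse-aware kernels are used. A quick way to gauge how sparse the data actually is (hypothetical helper, plain Python):

    ```python
    def density(rows):
        """Fraction of non-zero values in a list-of-lists example set."""
        nonzero = sum(1 for row in rows for v in row if v != 0)
        total = sum(len(row) for row in rows)
        return nonzero / total

    # Example: 2 non-zero values out of 6 -> density 1/3.
    d = density([[0, 1, 0], [0, 0, 2]])
    ```

    If the density is very low, much of the dense matrix multiply is wasted work on both devices, but the GPU's advantage shrinks the most.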


  • David_A
    New Altair Community Member
    Hi @varunm1,

    could you perhaps share your network setup with us? It would be interesting to see if there is room for improvement.

    Best,
    David
  • varunm1
    New Altair Community Member
    Hi @David_A

    Do you mean the XML code of the neural network process?

    Regards,
    Varun
  • David_A
    New Altair Community Member
    Yes,

    with that it's easier to compare CPU vs. GPU performance.
  • varunm1
    New Altair Community Member
    edited January 2019
    @David_A

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Subject_Assistment_Concentration_Clean_100" width="90" x="45" y="187">
            <parameter key="repository_entry" value="../../data/AIED_2019_100/Subject_Assistment_Concentration_Clean_100"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="166" name="Cross Validation" width="90" x="514" y="493">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="5"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="deeplearning:dl4j_sequential_neural_network" compatibility="0.9.000" expanded="true" height="103" name="Deep Learning" width="90" x="179" y="34">
                <parameter key="loss_function" value="Cross Entropy (Binary Classification)"/>
                <parameter key="epochs" value="20"/>
                <parameter key="use_miniBatch" value="true"/>
                <parameter key="batch_size" value="32"/>
                <parameter key="updater" value="Adam"/>
                <parameter key="learning_rate" value="0.01"/>
                <parameter key="momentum" value="0.9"/>
                <parameter key="rho" value="0.95"/>
                <parameter key="epsilon" value="1.0E-6"/>
                <parameter key="beta1" value="0.9"/>
                <parameter key="beta2" value="0.999"/>
                <parameter key="RMSdecay" value="0.95"/>
                <parameter key="weight_initialization" value="ReLU"/>
                <parameter key="bias_initialization" value="0.0"/>
                <parameter key="use_regularization" value="false"/>
                <parameter key="l1_strength" value="0.1"/>
                <parameter key="l2_strength" value="0.1"/>
                <parameter key="optimization_method" value="Stochastic Gradient Descent"/>
                <parameter key="backpropagation" value="Standard"/>
                <parameter key="backpropagation_length" value="50"/>
                <parameter key="infer_input_shape" value="true"/>
                <parameter key="network_type" value="Simple Neural Network"/>
                <parameter key="log_each_epoch" value="true"/>
                <parameter key="epochs_per_log" value="10"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" class="deeplearning:dl4j_convolutional_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Convolutional Layer" width="90" x="45" y="340">
                    <parameter key="number_of_activation_maps" value="32"/>
                    <parameter key="kernel_size" value="102.5"/>
                    <parameter key="stride_size" value="1.1"/>
                    <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/>
                    <parameter key="use_dropout" value="true"/>
                    <parameter key="dropout_rate" value="0.5"/>
                    <parameter key="overwrite_networks_weight_initialization" value="false"/>
                    <parameter key="weight_initialization" value="Normal"/>
                    <parameter key="overwrite_networks_bias_initialization" value="false"/>
                    <parameter key="bias_initialization" value="0.0"/>
                  </operator>
                  <operator activated="true" class="deeplearning:dl4j_pooling_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Pooling Layer" width="90" x="179" y="340">
                    <parameter key="Pooling Method" value="max"/>
                    <parameter key="PNorm Value" value="1.0"/>
                    <parameter key="Kernel Size" value="2.2"/>
                    <parameter key="Stride Size" value="1.1"/>
                  </operator>
                  <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer" width="90" x="112" y="85">
                    <parameter key="number_of_neurons" value="256"/>
                    <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/>
                    <parameter key="use_dropout" value="true"/>
                    <parameter key="dropout_rate" value="0.5"/>
                    <parameter key="overwrite_networks_weight_initialization" value="false"/>
                    <parameter key="weight_initialization" value="Normal"/>
                    <parameter key="overwrite_networks_bias_initialization" value="false"/>
                    <parameter key="bias_initialization" value="0.0"/>
                    <description align="center" color="transparent" colored="false" width="126">You can choose a number of neurons to decide how many internal attributes are created.</description>
                  </operator>
                  <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer (2)" width="90" x="514" y="85">
                    <parameter key="number_of_neurons" value="2"/>
                    <parameter key="activation_function" value="Softmax"/>
                    <parameter key="use_dropout" value="false"/>
                    <parameter key="dropout_rate" value="0.25"/>
                    <parameter key="overwrite_networks_weight_initialization" value="false"/>
                    <parameter key="weight_initialization" value="Normal"/>
                    <parameter key="overwrite_networks_bias_initialization" value="false"/>
                    <parameter key="bias_initialization" value="0.0"/>
                    <description align="center" color="transparent" colored="false" width="126">The last layer needs to be setup with an activation function, that fits the problem type.</description>
                  </operator>
                  <connect from_port="layerArchitecture" to_op="Add Convolutional Layer" to_port="layerArchitecture"/>
                  <connect from_op="Add Convolutional Layer" from_port="layerArchitecture" to_op="Add Pooling Layer" to_port="layerArchitecture"/>
                  <connect from_op="Add Pooling Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer" to_port="layerArchitecture"/>
                  <connect from_op="Add Fully-Connected Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer (2)" to_port="layerArchitecture"/>
                  <connect from_op="Add Fully-Connected Layer (2)" from_port="layerArchitecture" to_port="layerArchitecture"/>
                  <portSpacing port="source_layerArchitecture" spacing="0"/>
                  <portSpacing port="sink_layerArchitecture" spacing="0"/>
                  <description align="center" color="yellow" colored="true" height="254" resized="false" width="189" x="60" y="45">First Hidden Layer</description>
                  <description align="center" color="yellow" colored="false" height="254" resized="false" width="189" x="470" y="49">Output Layer</description>
                </process>
                <description align="center" color="transparent" colored="true" width="126">Open the Deep Learning operator by double-clicking it to discover the layer setup.</description>
              </operator>
              <connect from_port="training set" to_op="Deep Learning" to_port="training set"/>
              <connect from_op="Deep Learning" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="187">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="112" y="289"/>
              <operator activated="true" class="performance" compatibility="9.1.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="340">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="9.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="true"/>
                <parameter key="classification_error" value="false"/>
                <parameter key="kappa" value="true"/>
                <parameter key="weighted_mean_recall" value="false"/>
                <parameter key="weighted_mean_precision" value="false"/>
                <parameter key="spearman_rho" value="false"/>
                <parameter key="kendall_tau" value="false"/>
                <parameter key="absolute_error" value="false"/>
                <parameter key="relative_error" value="false"/>
                <parameter key="relative_error_lenient" value="false"/>
                <parameter key="relative_error_strict" value="false"/>
                <parameter key="normalized_absolute_error" value="false"/>
                <parameter key="root_mean_squared_error" value="true"/>
                <parameter key="root_relative_squared_error" value="false"/>
                <parameter key="squared_error" value="false"/>
                <parameter key="correlation" value="false"/>
                <parameter key="squared_correlation" value="false"/>
                <parameter key="cross-entropy" value="false"/>
                <parameter key="margin" value="false"/>
                <parameter key="soft_margin_loss" value="false"/>
                <parameter key="logistic_loss" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
                <list key="class_weights"/>
                <description align="center" color="transparent" colored="false" width="126">Calculate model performance</description>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="performance 2"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
              <portSpacing port="sink_performance 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Subject_Assistment_Concentration_Clean_100" from_port="output" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="model" to_port="result 3"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
          <connect from_op="Cross Validation" from_port="performance 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="45" y="40">Creating a simple neural network with one hidden layer and an output layer.</description>
          <description align="center" color="green" colored="true" height="331" resized="true" width="275" x="285" y="79">Iris is a multi-class classification problem, therefore the network loss is set to &amp;quot;multiclass cross entropy&amp;quot;.</description>
        </process>
      </operator>
    </process>
    


  • David_A
    New Altair Community Member
    Thanks a lot.

    I'll investigate it, but I can't promise anything in the short term.
    As @hughesfleming68 already mentioned, this is nothing RapidMiner-specific and happens with a lot of Deep Learning frameworks.


  • varunm1
    New Altair Community Member
    @David_A

    Sure, no problem. I just wanted to bring it to your attention.

    Thanks,
    Varun