GPU slower than CPU
varunm1
New Altair Community Member
Hi,
I switched Deep Learning to use the GPU instead of the CPU (1 core), but it runs slower. GPU utilization is very low (2 to 3%) while the process is running, whereas CPU utilization is roughly 70% when I run on the CPU. I am using a batch size of 32. Could the small batch size be the cause?
Thanks,
Varun
Answers
Hi @varunm1,
how many examples are you training on? Keep in mind that the cost of moving the data onto the GPU is fairly high for small data sets. GPUs only start to pay off once your data gets a bit larger.
BR,
Martin
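To make the per-batch overhead argument concrete, here is a toy back-of-the-envelope model in Python. All numbers (microseconds of compute per sample, fixed transfer/launch overhead per batch) are made-up placeholders for illustration, not measurements, but they show why a small batch size leaves the GPU mostly idle:

```python
def effective_utilization(batch_size, compute_us_per_sample=2.0,
                          overhead_us_per_batch=500.0):
    """Fraction of wall time spent computing, vs. waiting on the fixed
    per-batch transfer/launch overhead. Placeholder numbers only."""
    compute = batch_size * compute_us_per_sample
    return compute / (compute + overhead_us_per_batch)

for bs in (32, 256, 2048):
    print(bs, round(effective_utilization(bs), 2))
# With these placeholder costs, utilization rises from ~0.11 at
# batch size 32 to ~0.89 at batch size 2048.
```

Under this model, a batch of 32 spends most of its wall time on fixed overhead, which matches the single-digit GPU utilization reported above.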
I have seen this as well, and it does not seem to be specific to any particular deep learning software. The last time I tested this with TensorFlow, my CPU with 28 threads was about 2x faster than the GPU. For my data sets, the GPU has not helped much, so I guess it really depends on what you are trying to do. I have also noticed the low GPU utilization; at the time I was under the impression that Windows does not report those stats very accurately.
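For fair CPU-vs-GPU comparisons like the one above, it helps to exclude one-time costs (JIT compilation, initial data transfer) from the measurement. A minimal timing harness sketch, assuming you wrap one training epoch in a `train_fn` callable of your own:

```python
import time

def time_epochs(train_fn, n_epochs=3):
    """Average seconds per epoch for an arbitrary training callable.
    Runs one warm-up epoch first so one-time setup costs (JIT,
    initial host-to-device transfer) don't skew the comparison."""
    train_fn()                       # warm-up epoch, not timed
    start = time.perf_counter()
    for _ in range(n_epochs):
        train_fn()
    return (time.perf_counter() - start) / n_epochs
```

The same harness run once with a CPU-backed `train_fn` and once with a GPU-backed one gives comparable per-epoch numbers.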
Hi @mschmitz @hughesfleming68,
What you said is true, but these data sets have 400k and 1 million samples with 102 attributes, which is why I suspected something was wrong after comparing the CPU and GPU utilization rates. One interesting observation: for a similar data set earlier, GPU utilization was around 30 to 40 percent.
One more thing: the data set is sparse.
Thanks
Varun
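The sparsity mentioned above can matter for GPU throughput: if a sparse matrix is densified before being shipped to the GPU, most of the transferred bytes are zeros. A tiny illustration of that ratio (the row count and non-zero count below are hypothetical, not taken from the actual data set):

```python
def density(rows, cols, n_nonzero):
    """Fraction of matrix cells that carry a value. When sparse input
    is densified before GPU transfer, 1/density is roughly how much
    the transfer volume is inflated relative to the useful payload."""
    return n_nonzero / (rows * cols)

# hypothetical: 1,000,000 rows x 102 attributes, ~5 non-zeros per row
d = density(1_000_000, 102, 5_000_000)
print(round(d, 3))   # ~5% of transferred values are non-zero
```

At a density this low, the GPU spends bandwidth and compute mostly on zeros, which can further depress its utilization relative to the CPU path.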
Yes,
with that it's easier to compare CPU vs. GPU performance.
@David_A
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Subject_Assistment_Concentration_Clean_100" width="90" x="45" y="187"> <parameter key="repository_entry" value="../../data/AIED_2019_100/Subject_Assistment_Concentration_Clean_100"/> </operator> <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="166" name="Cross Validation" width="90" x="514" y="493"> <parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="5"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="deeplearning:dl4j_sequential_neural_network" compatibility="0.9.000" expanded="true" height="103" name="Deep Learning" width="90" x="179" y="34"> <parameter key="loss_function" value="Cross Entropy (Binary Classification)"/> <parameter key="epochs" value="20"/> <parameter key="use_miniBatch" value="true"/> <parameter key="batch_size" value="32"/> <parameter key="updater" value="Adam"/> <parameter key="learning_rate" value="0.01"/> <parameter key="momentum" value="0.9"/> <parameter key="rho" value="0.95"/> <parameter key="epsilon" value="1.0E-6"/> <parameter key="beta1" 
value="0.9"/> <parameter key="beta2" value="0.999"/> <parameter key="RMSdecay" value="0.95"/> <parameter key="weight_initialization" value="ReLU"/> <parameter key="bias_initialization" value="0.0"/> <parameter key="use_regularization" value="false"/> <parameter key="l1_strength" value="0.1"/> <parameter key="l2_strength" value="0.1"/> <parameter key="optimization_method" value="Stochastic Gradient Descent"/> <parameter key="backpropagation" value="Standard"/> <parameter key="backpropagation_length" value="50"/> <parameter key="infer_input_shape" value="true"/> <parameter key="network_type" value="Simple Neural Network"/> <parameter key="log_each_epoch" value="true"/> <parameter key="epochs_per_log" value="10"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <process expanded="true"> <operator activated="true" class="deeplearning:dl4j_convolutional_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Convolutional Layer" width="90" x="45" y="340"> <parameter key="number_of_activation_maps" value="32"/> <parameter key="kernel_size" value="102.5"/> <parameter key="stride_size" value="1.1"/> <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/> <parameter key="use_dropout" value="true"/> <parameter key="dropout_rate" value="0.5"/> <parameter key="overwrite_networks_weight_initialization" value="false"/> <parameter key="weight_initialization" value="Normal"/> <parameter key="overwrite_networks_bias_initialization" value="false"/> <parameter key="bias_initialization" value="0.0"/> </operator> <operator activated="true" class="deeplearning:dl4j_pooling_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Pooling Layer" width="90" x="179" y="340"> <parameter key="Pooling Method" value="max"/> <parameter key="PNorm Value" value="1.0"/> <parameter key="Kernel Size" value="2.2"/> <parameter key="Stride Size" value="1.1"/> </operator> <operator activated="true" 
class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer" width="90" x="112" y="85"> <parameter key="number_of_neurons" value="256"/> <parameter key="activation_function" value="ReLU (Rectified Linear Unit)"/> <parameter key="use_dropout" value="true"/> <parameter key="dropout_rate" value="0.5"/> <parameter key="overwrite_networks_weight_initialization" value="false"/> <parameter key="weight_initialization" value="Normal"/> <parameter key="overwrite_networks_bias_initialization" value="false"/> <parameter key="bias_initialization" value="0.0"/> <description align="center" color="transparent" colored="false" width="126">You can choose a number of neurons to decide how many internal attributes are created.</description> </operator> <operator activated="true" class="deeplearning:dl4j_dense_layer" compatibility="0.9.000" expanded="true" height="68" name="Add Fully-Connected Layer (2)" width="90" x="514" y="85"> <parameter key="number_of_neurons" value="2"/> <parameter key="activation_function" value="Softmax"/> <parameter key="use_dropout" value="false"/> <parameter key="dropout_rate" value="0.25"/> <parameter key="overwrite_networks_weight_initialization" value="false"/> <parameter key="weight_initialization" value="Normal"/> <parameter key="overwrite_networks_bias_initialization" value="false"/> <parameter key="bias_initialization" value="0.0"/> <description align="center" color="transparent" colored="false" width="126">The last layer needs to be setup with an activation function, that fits the problem type.</description> </operator> <connect from_port="layerArchitecture" to_op="Add Convolutional Layer" to_port="layerArchitecture"/> <connect from_op="Add Convolutional Layer" from_port="layerArchitecture" to_op="Add Pooling Layer" to_port="layerArchitecture"/> <connect from_op="Add Pooling Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer" to_port="layerArchitecture"/> <connect 
from_op="Add Fully-Connected Layer" from_port="layerArchitecture" to_op="Add Fully-Connected Layer (2)" to_port="layerArchitecture"/> <connect from_op="Add Fully-Connected Layer (2)" from_port="layerArchitecture" to_port="layerArchitecture"/> <portSpacing port="source_layerArchitecture" spacing="0"/> <portSpacing port="sink_layerArchitecture" spacing="0"/> <description align="center" color="yellow" colored="true" height="254" resized="false" width="189" x="60" y="45">First Hidden Layer</description> <description align="center" color="yellow" colored="false" height="254" resized="false" width="189" x="470" y="49">Output Layer</description> </process> <description align="center" color="transparent" colored="true" width="126">Open the Deep Learning operator by double-clicking on it, to discovere the layer setup.</description> </operator> <connect from_port="training set" to_op="Deep Learning" to_port="training set"/> <connect from_op="Deep Learning" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="187"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="112" y="289"/> <operator activated="true" class="performance" compatibility="9.1.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="340"> <parameter key="use_example_weights" value="true"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34"> <parameter key="main_criterion" value="first"/> <parameter 
key="accuracy" value="true"/> <parameter key="classification_error" value="false"/> <parameter key="kappa" value="true"/> <parameter key="weighted_mean_recall" value="false"/> <parameter key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="true"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> <description align="center" color="transparent" colored="false" width="126">Calculate model performance</description> </operator> <connect from_port="model" to_op="Apply Model" to_port="model"/> <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Performance" to_port="labelled data"/> <connect from_op="Multiply" from_port="output 2" to_op="Performance (2)" to_port="labelled data"/> <connect from_op="Performance (2)" from_port="performance" to_port="performance 2"/> <connect from_op="Performance" from_port="performance" to_port="performance 1"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> 
<portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> <portSpacing port="sink_performance 3" spacing="0"/> </process> </operator> <connect from_op="Retrieve Subject_Assistment_Concentration_Clean_100" from_port="output" to_op="Cross Validation" to_port="example set"/> <connect from_op="Cross Validation" from_port="model" to_port="result 3"/> <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/> <connect from_op="Cross Validation" from_port="performance 2" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> <portSpacing port="sink_result 4" spacing="0"/> <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="45" y="40">Creating a simple neural network with one hidden layer and an output layer.</description> <description align="center" color="green" colored="true" height="331" resized="true" width="275" x="285" y="79">Iris is a multi-class classification problem, therefore the network loss is set to &quot;multiclass cross entropy&quot;.</description> </process> </operator> </process>
Thanks a lot.
I'll investigate it, but I can't promise anything in the short term.
As @hughesfleming68 already mentioned, this is nothing RapidMiner-specific and happens with a lot of deep learning frameworks.