Running Deep Learning extension with CUDA 10.2?

jacobcybulski
jacobcybulski New Altair Community Member
edited November 5 in Community Q&A
I can see that the new version of the Deep Learning extension requires CUDA 10.0. However, the new TensorFlow, which I also use on this system, requires CUDA 10.1+ and runs fine with the newest release, CUDA 10.2. The release notes for the extension suggest contacting RM for assistance. As it stands, the GPU/CPU switch in the preferences complains about my CUDA version. Do I need to set up a multi-CUDA system on my Ubuntu 18.04, or is there an easy tweak to run the extension with the newer CUDA?
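For reference, a common way to keep two toolkits side by side on Ubuntu is a versioned-directory layout with a single symlink selecting the active one. The sketch below only simulates that layout in a scratch directory; on a real system the versioned directories live under /usr/local, the `ln` commands need sudo, and all paths are illustrative assumptions:

```shell
# Simulated multi-CUDA layout in a scratch directory; a stand-in
# for /usr/local on a real system.
PREFIX="$(mktemp -d)"
mkdir -p "$PREFIX/cuda-10.0/bin" "$PREFIX/cuda-10.2/bin"

# Point the 'cuda' symlink at the toolkit the extension needs:
ln -sfn "$PREFIX/cuda-10.0" "$PREFIX/cuda"
echo "active: $(readlink "$PREFIX/cuda")"

# ...and switch it back when TensorFlow needs 10.2:
ln -sfn "$PREFIX/cuda-10.2" "$PREFIX/cuda"
echo "active: $(readlink "$PREFIX/cuda")"

# Tools then resolve through the symlink, e.g.:
# export PATH="/usr/local/cuda/bin:$PATH"
# export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
```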


Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
I think the build we have on the Marketplace requires a specific CUDA version. We may be able to provide a custom build, right @pschlunder ?

    Cheers,
    Martin
  • jacobcybulski
    jacobcybulski New Altair Community Member
    Thanks @jczogalla and @mschmitz , I think I may need to reorganise my libraries to use a multiple CUDA setup :( Jacob
  • pschlunder
    pschlunder New Altair Community Member
    edited April 2020 Answer ✓
Find a version built against CUDA 10.2 and cuDNN 7.6 here:

    (link is only valid until May 14th; if you need the extension and the link has expired, please point it out and we'll update it).

    You can place the downloaded jar under your .RapidMiner/extensions folder. Once we release 0.9.4, it should be picked up automatically since it's a newer version.

    Another option would be to install CUDA 10.0 alongside and point the CUDA environment variable to the 10.0 installation in the environment you run RapidMiner in.

    Hope this helps,
    Philipp
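Philipp's second option could look roughly like the wrapper below; the directory names and the Studio script name are assumptions about a typical Linux install:

```shell
# Hypothetical launch wrapper: put CUDA 10.0 first on the search
# paths for this process only, leaving the system default (10.2)
# untouched. Paths and script name are assumptions.
CUDA_HOME="/usr/local/cuda-10.0"
PATH="$CUDA_HOME/bin:$PATH"
LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
export CUDA_HOME PATH LD_LIBRARY_PATH
echo "using CUDA from: $CUDA_HOME"
# exec ./RapidMiner-Studio.sh     # then start Studio from here
```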


  • jacobcybulski
    jacobcybulski New Altair Community Member
    Hi @pschlunder , this would be fantastic! However, the link to rapidminer-my.sharepoint.com is not public so I cannot download it. If you could change its access to anyone this would be great. Thanks. Jacob
  • pschlunder
    pschlunder New Altair Community Member
    Oh, sorry! Updated.
  • jacobcybulski
    jacobcybulski New Altair Community Member
    edited April 2020
@pschlunder , thanks a lot - I have downloaded the JAR file and will be playing with it. I dropped it into .RapidMiner/extensions and it seems to be recognised. However, I am still having issues with the GPU. When I peeked into .RapidMiner/extensions/workspace/rmx_deeplearning I could see the 9.4 SNAPSHOTs for the cpu-backend and the libs, but the GPU backend is still version 0.9.0 (and the .javacpp cache holds only a CPU backend). Perhaps the GPU backend only gets compiled once the GPU option is happily accepted? Or is this a CPU-only compiled snapshot? On switching to the GPU backend, RM still reports that it is looking for CUDA 10.0.
    Jacob
  • jczogalla
    jczogalla New Altair Community Member
    Hi @jacobcybulski

    I think this might now be a problem with how your path is set up. You are correct to assume that the GPU backend is only extracted/installed when it finds the correct CUDA version. Make sure that your path contains the CUDA 10.2 location and that it is before any other CUDA references in the path. I'm not sure about other environment variables in Linux that Java might pick up about libraries...

    Cheers
    Jan
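An illustrative way to check the ordering Jan describes (the sample PATH value here is made up; in a real session you would inspect `$PATH` itself):

```shell
# List CUDA entries in PATH in order; the 10.2 directory should
# appear before any other CUDA reference. PATH_SAMPLE is a made-up
# stand-in for "$PATH".
PATH_SAMPLE="/usr/local/cuda-10.2/bin:/usr/bin:/usr/local/cuda-10.0/bin"
echo "$PATH_SAMPLE" | tr ':' '\n' | grep -n cuda
# prints:
# 1:/usr/local/cuda-10.2/bin
# 3:/usr/local/cuda-10.0/bin
```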
  • jacobcybulski
    jacobcybulski New Altair Community Member
Thanks @jczogalla , I'll have to play with this. Interestingly, nvidia-smi finds everything just fine.
  • jczogalla
    jczogalla New Altair Community Member
    Yeah, I guess they have some better heuristics for that :)
  • jacobcybulski
    jacobcybulski New Altair Community Member
    edited April 2020
Hi @jczogalla , it seems there is nothing I can do to make the Deep Learning extension switch to GPU, in either version of the extension. I removed all my NVIDIA drivers, CUDA and cuDNN libraries, cleaned the system, and installed only CUDA 10.0 with cuDNN 7.4 as required. Switching to GPU always fails, which leaves me with the only conclusion that the RM Educational License is considered free for the purpose of running with GPU?
    In case my conclusion is incorrect, here is an observation. The CUDA toolkit can report conflicting information compared with the NVIDIA driver, which ships its own CUDA libraries: driver 415 comes with CUDA 10.0, 418 with CUDA 10.1, and 440 with CUDA 10.2. These driver-side versions are what nvidia-smi reports, irrespective of which CUDA toolkit is currently active in /usr/local/cuda and pointed to by $PATH and $LD_LIBRARY_PATH (that one is reported by nvcc). So I made sure all sources of system information on my Ubuntu 18.04 tell the same story; here it is:
    jacob@goblin-galore:~$ nvidia-smi
    Thu Apr 23 16:12:41 2020      
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 108...  Off  | 00000000:17:00.0 Off |                  N/A |
    |  0%   33C    P8    10W / 280W |      2MiB / 11178MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 108...  Off  | 00000000:65:00.0  On |                  N/A |
    |  0%   61C    P0    66W / 280W |    248MiB / 11175MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                              
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    1      1777      G   /usr/lib/xorg/Xorg                           167MiB |
    |    1      3121      G   /usr/bin/gnome-shell                          79MiB |
    +-----------------------------------------------------------------------------+

    jacob@goblin-galore:~$ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130

    jacob@goblin-galore:~$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    #define CUDNN_MAJOR 7
    #define CUDNN_MINOR 4
    #define CUDNN_PATCHLEVEL 2
    --
    #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

    #include "driver_types.h"

    I am using RM 9.6 with Deep Learning 0.9.3 which gives me the same error as in the new 0.9.4 snapshot:
    Error while switching to GPU backend. Either CUDA 10.0 is not installed or you have a free license. Check the log for more information.
    Any ideas?
    Jacob
  • jczogalla
    jczogalla New Altair Community Member
    Hm. I am not sure about the license. Afaik, an educational license does not count as a free license.

    Regarding the error message, in the 0.9.4 snapshot version, we simply forgot to adjust the message to show 10.2 instead of 10.0...
Can you provide your Studio log file? You can share it via PM if you like. I'm not sure how detailed the logging is, but maybe we can see something there.

    Other than that I am not sure why it would not work, since this version did work for other people before, but that might have been on Windows machines.
  • jacobcybulski
    jacobcybulski New Altair Community Member
    Unfortunately, I cannot test it on my Windows machine with GPUs as it is locked in my office at work, to be opened only after the COVID-19 goes away. I'll dig out the logs though as I am very keen on getting it right!
  • jacobcybulski
    jacobcybulski New Altair Community Member
    Answer ✓
@jczogalla I have got a workaround! When you export the settings for LD_LIBRARY_PATH and a PATH pointing to /usr/local/cuda within RapidMiner.sh, miraculously it becomes possible to switch from CPU to GPU, and the Deep Learning operators actually execute on a GPU!
    I tried setting these environment variables in /etc/profile and /etc/environment, but it made no difference. Perhaps there is some global setting for the JVM?
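For anyone landing here later, the workaround might look like the following lines added near the top of RapidMiner.sh; the install paths are assumptions about a default Ubuntu CUDA layout:

```shell
# Hypothetical excerpt added near the top of RapidMiner.sh so the
# JVM it starts inherits the CUDA locations (paths are assumptions).
# /etc/profile is typically only read by login shells, which may be
# why setting the variables there had no effect on a GUI-launched app.
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}"
# ...the rest of the original RapidMiner.sh follows unchanged...
```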
  • jczogalla
    jczogalla New Altair Community Member
    Hi @jacobcybulski
    That's great to hear! I think the problem here is the special handling on Linux systems with the LD library path. There might be global JVM settings, but that might hurt other java programs. And yes, you would have to touch the RapidMiner.sh file because there is no other way to put that in there.
    We'll make a note and think about a possibility to provide the cuda path as a setting, similar to what we do with Python.