PhysicsAI Error

Daehee Park_21836 Altair Community Member
edited October 15 in Community Q&A

Hi, I'm using HyperWorks 2024.

 

I get the error below when I try to train a PhysicsAI model on the GPU. (Training on the CPU works without any error.)

[07:49:51] (INFO): ************************************************************************
[07:49:51] (INFO): **                                                                    **
[07:49:51] (INFO): **                                                                    **
[07:49:51] (INFO): **                     Altair PhysicsAI 2024.0                        **
[07:49:51] (INFO): **                                                                    **
[07:49:51] (INFO): **                 Advanced Machine Learning Software                 **
[07:49:51] (INFO): **                    from Altair Engineering, Inc.                   **
[07:49:51] (INFO): **                                                                    **
[07:49:51] (INFO): ** Build: 27.1ac51437e                                                **
[07:49:51] (INFO): ************************************************************************
[07:49:51] (INFO): **  COPYRIGHT (C) 2023-2023                 Altair Engineering, Inc.  **
[07:49:51] (INFO): ** All Rights Reserved.  Copyright notice does not imply publication. **
[07:49:51] (INFO): **         Contains trade secrets of Altair Engineering, Inc.         **
[07:49:51] (INFO): ** Decompilation or disassembly of this software strictly prohibited. **
[07:49:51] (INFO): ************************************************************************
[07:49:51] (INFO):
[07:49:52] (INFO): Matched subcases: 1
[07:49:52] (INFO):  - subcase 1: Subcase 1 (loadstep1)
[07:50:26] (INFO): ------------------------------------------------------------------------
[07:50:26] (INFO):   1. Building features and labels
[07:50:26] (INFO): ------------------------------------------------------------------------
[08:10:34] (INFO): Node features:
[08:10:34] (INFO):  name: cae.coord
[08:10:34] (INFO):  type: CONTINOUS
[08:10:34] (INFO):  length: 3
[08:10:34] (INFO):
[08:10:34] (INFO):  name: cae.part_label
[08:10:34] (INFO):  type: CATEGORICAL
[08:10:34] (INFO):  length: 1
[08:10:34] (INFO):
[08:10:34] (INFO): Edge features:
[08:10:34] (INFO):  name: cae.direction
[08:10:34] (INFO):  type: CONTINOUS
[08:10:34] (INFO):  length: 4
[08:10:34] (INFO):
[08:10:34] (INFO): Node labels:
[08:10:34] (INFO):  name: cae.results
[08:10:34] (INFO):  subcase: Subcase 1 (loadstep1)
[08:10:34] (INFO):  field: Displacement
[08:10:34] (INFO):  type: CONTINOUS
[08:10:34] (INFO):  length: 3
[08:10:34] (INFO):  Masks:
[08:10:34] (INFO):  - cae.nonshape_node_mask - active
[08:10:34] (INFO):
[08:10:34] (INFO): Vector features:
[08:10:34] (INFO): Vector labels:
[08:11:58] (INFO): ------------------------------------------------------------------------
[08:11:58] (INFO):   2. Training novelty detector
[08:11:58] (INFO): ------------------------------------------------------------------------
[08:11:58] (INFO): ------------------------------------------------------------------------
[08:11:58] (INFO):   3. Training/Validation split
[08:11:58] (INFO): ------------------------------------------------------------------------
[08:11:58] (INFO): Fraction     : 0.85
[08:11:58] (INFO): # training   : 37
[08:11:58] (INFO): # validation : 7
[08:11:58] (INFO):
[08:11:58] (INFO): ------------------------------------------------------------------------
[08:11:58] (INFO):   4. Initializing model
[08:11:58] (INFO): ------------------------------------------------------------------------
[08:12:06] (INFO): Width: 128
[08:12:06] (INFO): Depth: 8
[08:12:06] (INFO): Batch size: 2
[08:12:06] (INFO): Learning rate: 0.001
[08:12:06] (INFO): Early stopping enabled with a patience of: 400.0
[08:12:06] (INFO):
[08:12:06] (INFO): Total params: 787,267
[08:12:06] (INFO): Trainable params: 787,267
[08:12:06] (INFO): Non-trainable params: 0
[08:12:06] (INFO):
[08:12:06] (INFO): ------------------------------------------------------------------------
[08:12:06] (INFO):   5. Training
[08:12:06] (INFO): ------------------------------------------------------------------------
[08:12:29] (ERROR): *** UNEXPECTED ERROR ***
Module: execute
Line:   58
Type:   InternalError

 

CUDA v11.8 and cuDNN 8.7 are installed correctly, and I have checked the GPU configuration:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Python GPU configuration

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7197899504812016186
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6176714752
locality {
  bus_id: 1
  links {
  }
}
incarnation: 7711125656020260411
physical_device_desc: "device: 0, name: Quadro RTX 4000, pci bus id: 0000:21:00.0, compute capability: 7.5"
xla_global_id: 416903419
]

Answers

  • PaolaAG
    Altair Employee
    edited October 1

    Hello,

    Could you please set the environment variable EDS_DEBUG=1 and share any console output?

    Kind Regards,

    Paola.
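
One way to follow this request without changing system-wide settings is to set the variable only for the process that is launched. The following is a minimal sketch, not an Altair-documented procedure: only the variable name `EDS_DEBUG` comes from the post above, while the helper names `debug_env` and `run_with_debug` and the launcher path in the usage comment are hypothetical placeholders.

```python
import os
import subprocess


def debug_env(base=None):
    """Return a copy of the environment with EDS_DEBUG=1 added."""
    env = dict(os.environ if base is None else base)
    env["EDS_DEBUG"] = "1"  # environment values must be strings
    return env


def run_with_debug(cmd):
    """Run `cmd` with EDS_DEBUG=1 and capture its console output as text."""
    return subprocess.run(cmd, env=debug_env(), capture_output=True, text=True)


# Usage (hypothetical launcher path, replace with your own):
# result = run_with_debug([r"C:\path\to\your\launcher.bat"])
# print(result.stdout, result.stderr)
```

Because `debug_env` copies the environment, the calling shell and other applications are left unchanged.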

  • Daehee Park_21836
    Altair Community Member
    edited October 15

    I have just resolved this problem.

    I added the following path as an environment variable:

    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\extras\CUPTI\lib64
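
The fix above can be checked with a short script. This is a sketch under the assumption that the directory was added to `PATH` (the post does not say which variable was edited). `is_on_path` is a hypothetical helper written for illustration; it is not part of PhysicsAI, CUDA, or TensorFlow.

```python
import ntpath
import os

# Directory quoted in the post above.
CUPTI_DIR = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\extras\CUPTI\lib64"


def _norm(path):
    # Compare Windows-style: case-insensitive, ignoring quotes and trailing separators.
    return ntpath.normcase(ntpath.normpath(path.strip().strip('"')))


def is_on_path(directory, path_value=None, sep=";"):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = _norm(directory)
    entries = (e for e in path_value.split(sep) if e.strip())
    return any(_norm(e) == target for e in entries)


if __name__ == "__main__":
    print("CUPTI directory on PATH:", is_on_path(CUPTI_DIR))
    print("CUPTI directory exists: ", os.path.isdir(CUPTI_DIR))
```

If the first check prints `False`, the directory is not visible to processes started from the current session; a new terminal or application restart is usually needed after editing environment variables on Windows.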