Data, Data Everywhere, Not a Byte to Use

Fatma Kocer-Poyraz
Altair Employee

Originally published in December 2018.

Using Machine Learning (ML) for Computer-Aided Engineering (CAE) is an entirely different beast from using ML in the areas where we are used to seeing it most, such as retail, recommendation engines, and spam filtering.

Unlike those areas, CAE does not have as much of an issue with confidentiality (knowing design parameter values does not help to reconstruct a design in most cases) or liability (engineers know the application physics and can verify the accuracy of the answer). However, CAE suffers from a big challenge: not enough data, or even no data at all! Unlike industries where data flows in large volumes every second, in CAE we are trained to survive on very minimal data. Here, data is expensive to obtain and in some cases needs to be created from scratch.

You may find this hard to believe, but there are also many cases where the data is either not retained or not organized. Before rolling your eyes at such oversights, note that CAE data requires more storage than most other types of data; gigabyte-sized results files for physics-based simulations are not uncommon. In addition, the disciplines under the CAE umbrella, such as FEA, CFD, and MBD, are all performed by individual experts. In the absence of an organized central data management system, the data ends up residing on many different hard drives, making it harder to use in machine learning workflows.

Let me give you two examples to explain the issues in a bit more detail:

Two of my colleagues recently did a mesh-quality prediction project using TensorFlow. Like any other ML project, this requires training and testing data. The datasets should be of similar size and resolution, they need to be parametrized, and of course there should be enough of them. Who would have this type of data sitting around? Companies have a lot of meshed parts, but not in the shape or form needed to conduct ML. Hence my "Where's Waldo" analogy: there may be a lot of data, but not the data you need. Let's assume that somehow a company has the data in the format needed. The next question is why they would share it. After all, "data is the new oil" and therefore considered valuable. For this mesh-quality application, my colleagues created one thousand 100 mm by 100 mm coupons, parametrized by density and bias on each side. They meshed them and, one by one, classified each as acceptable quality or not. Note the amount of domain expertise and skill required to come up with a practical idea for dataset creation and to automate the process. Once the data was created, building a transfer learning model and examining Grad-CAM results to see which areas are most indicative of mesh quality was a piece of cake!

[Image credit: Copyright chriscom / Flickr, https://www.flickr.com/photos/chrigu/4279149885/in/photostream/]
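To make the mesh-quality example more concrete, here is a minimal sketch of a transfer-learning classifier of the kind described above, assuming the labeled coupons have been exported as images into one folder per class. The directory names, image size, and MobileNetV2 backbone are illustrative assumptions, not the actual project code; Grad-CAM inspection would be an additional step applied to the trained model.

```python
# Minimal transfer-learning sketch for classifying mesh coupons as
# acceptable / unacceptable. Paths and parameters are hypothetical.
import tensorflow as tf

IMG_SIZE = (224, 224)

# Hypothetical folder layout: mesh_coupons/train/acceptable, .../unacceptable
train_ds = tf.keras.utils.image_dataset_from_directory(
    "mesh_coupons/train", image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "mesh_coupons/val", image_size=IMG_SIZE, batch_size=32)

# ImageNet-pretrained backbone with frozen weights; only a small head is trained.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(acceptable mesh)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=5)
```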

Another example is one most simulation experts are familiar with. As a CAE analyst, you would like to give the engineer or designer a reduced order model that they can use for quick trade-off studies without the need for expensive simulations. Here you can run several simulations up front using Design of Experiments (DOE) and build a response surface from the resulting dataset, which can then be handed to the designer or engineer. However, not all DOE methods are created equal. Some are suitable for screening and others for space filling, which is what you need to create a dataset for ML. Furthermore, if you have done this before, you may know that efficiency (the number of simulations required up front) and accuracy (of the predictive model) conflict with each other, so you need a tool that finds the Pareto front of this multi-objective optimization problem. You may also know that this takes a few iterations, particularly in highly nonlinear cases, where you most likely need to keep adding points until you are satisfied with the accuracy. Where you put these additional points is critical, of course. You cannot waste limited resources, whether time or licenses, by adding a design point close to a previously simulated one. So you need a sampling DOE method that is extensible, achieving the highest accuracy with the lowest number of simulations, as sketched below.

Another important consideration is the usability of the reduced order model. Engineers and designers should have easy access to it, for example through an embedded worksheet, or they should be able to embed it in their own systems, for example as a Python script or a dynamically linked library (DLL). Both challenges are nicely solved in Altair HyperStudy, Altair's design exploration and optimization product. Among its many practical DOE methods, HyperStudy's default method, Modified Extensible Lattice Sequences (MELS), is an extensible sampling method. HyperStudy can then export the response surface to Python or spreadsheets for easy use by the engineer or designer.

[Image: Altair HyperStudy Trade-off Tab]
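As a rough, generic illustration of the loop described above (space-filling sampling, fitting a response surface, and extending the sample only when accuracy falls short), here is a short Python sketch. It uses open-source stand-ins, a scrambled Sobol sequence in place of MELS and a Gaussian-process surrogate as the response surface, and `run_simulation` is a hypothetical placeholder for the real solver call, not HyperStudy's API.

```python
# Extensible space-filling DOE + surrogate sketch (illustrative only).
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor

def run_simulation(x):
    # Placeholder for an expensive physics-based simulation.
    return np.sin(3.0 * x[0]) + x[1] ** 2

sampler = qmc.Sobol(d=2, scramble=True)     # extensible space-filling sequence
X = sampler.random(32)                      # initial DOE in the unit square
y = np.array([run_simulation(x) for x in X])

surrogate = GaussianProcessRegressor().fit(X, y)

# If the fit is not accurate enough, draw more points from the *same* sequence:
# new samples fill gaps instead of landing next to already simulated designs.
X_extra = sampler.random(16)
y_extra = np.array([run_simulation(x) for x in X_extra])
surrogate.fit(np.vstack([X, X_extra]), np.concatenate([y, y_extra]))

# The fitted surrogate becomes the reduced order model that a designer can
# call directly for quick trade-off studies, without running the solver.
def reduced_order_model(x):
    return float(surrogate.predict(np.asarray(x).reshape(1, -1))[0])

print(reduced_order_model([0.5, 0.5]))
```

The point of the sketch is extensibility: drawing additional points from the same sequence keeps filling gaps in the design space rather than duplicating earlier runs, which is exactly why an extensible DOE method pays off when the first pass does not meet the accuracy target.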

The recent launch of our SmartWorks product line has been very exciting at Altair. Altair SmartWorks comprises Altair's Internet of Things (IoT) and Business Intelligence (BI) platforms, which make the operational data of engineering applications accessible. With operational data, we can gain insight into the real working conditions of the same applications we helped design and improve the assumptions we use during the design process, resulting in even more performance gains and overall cost reduction. This merging of simulation and IoT data is what paves the way for the Digital Twin, a subject that will be covered in future posts.

In summary, the challenge in leveraging ML for CAE is not the actual modelling, but getting useful data in the right format. Luckily, at Altair we have talented engineers who can write automation scripts and intuitive products that can generate data efficiently for the highest accuracy. Nevertheless, if you see the "data", let it know that we are desperately looking for more.