An Engineering Data Scientist’s Hierarchy of Needs
Data variety, accurate models, automated machine learning, generalizable predictions and design optimization are some of the Engineering Data Scientist's needs for survival, success and growth.
Maslow created the very popular Hierarchy of Needs in the 1940s to explain human needs and growth. You may have seen some version of it in your classes or in articles. It starts with physiological and safety needs as the most essential ones. These are also known as survival needs, meaning that if they are not met, a person cannot survive. It then builds on these with the psychological needs of belongingness and esteem. At the top are the needs for reaching one's full potential. They are known as growth needs and include creativity.
You may wonder what Maslow’s hierarchy of needs has to do with data science. In the engineering data science team at Altair, we often find ourselves reflecting on our experiences or having philosophical discussions about data science. It was in one of those discussions that we started stacking up our needs from data science, and that is when I realized that an engineering data scientist’s needs are no different from a human’s needs: there are basic needs required for survival, built on them are needs for wider success, and finally there are the needs for creativity and innovation. So, let me explain.
As we all are aware, the quality of a predictive machine learning model depends heavily on the data. The more data there is, the more variation it is likely to contain, and the better the model can generalize. So that is one of the data needs. The next data need relates to the data format. For engineering applications, data usually comes as 3D shapes in CAD or CAE files. Machine learning algorithms cannot ingest this data as is, so it has to be prepared before it is useful for model training. There are many ways to prepare the data, such as voxelization, multi-view images, and point clouds.
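To make the idea of data preparation concrete, here is a minimal sketch of voxelization in plain Python. It assumes the 3D shape has already been sampled into a list of (x, y, z) points; the function names `voxelize` and `to_dense_grid` and the sample points are hypothetical, chosen only for illustration, and a real pipeline would typically read the geometry from a CAD or CAE file first.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Return the integer indices of all voxels that contain at least one point."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    pts = list(points)
    if not pts:
        return set()
    # Shift by the minimum corner of the bounding box so indices start at 0.
    mins = [min(p[i] for p in pts) for i in range(3)]
    return {
        tuple(int((p[i] - mins[i]) // voxel_size) for i in range(3))
        for p in pts
    }


def to_dense_grid(occupied: Set[Voxel]) -> List[List[List[int]]]:
    """Turn the occupied voxel set into a nested 0/1 grid a model could ingest."""
    if not occupied:
        return []
    dims = [max(v[i] for v in occupied) + 1 for i in range(3)]
    grid = [[[0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for x, y, z in occupied:
        grid[x][y][z] = 1
    return grid


# Hypothetical sample points standing in for a sampled 3D shape.
sample_points = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.4), (1.1, 0.0, 0.0), (2.5, 2.5, 2.5)]
occupied = voxelize(sample_points, voxel_size=1.0)
grid = to_dense_grid(occupied)
```

The voxel size is the main design choice here: a smaller size preserves more geometric detail but makes the grid, and therefore the training cost, grow quickly.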
Moving on to performance needs, model accuracy is a must; otherwise it is junk in, junk out. But how accurate it needs to be depends on your objectives. If you are in the concept design phase, your predictive model needs to be good enough to give you the right A-to-B comparison. Another thing that we tend to ignore until it becomes an issue is the training time required. If we are using neural net models that require powerful GPU machines to train but we do not have access to such machines, then, like human beings whose basic needs are not met, the machine learning models will not survive in the production environment.
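One way to picture what "the right A-to-B comparison" means is to check whether a model orders pairs of designs the same way as the reference results, even if its absolute values are off. The sketch below is only an illustration: the function name `pairwise_ranking_agreement` and the numbers are made up, and pairwise agreement is just one of several possible ranking measures.

```python
from itertools import combinations
from typing import Sequence


def pairwise_ranking_agreement(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of design pairs that the model orders the same way as the reference.

    Pairs that are tied in either sequence are counted as disagreements.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(combinations(range(len(actual)), 2))
    if not pairs:
        raise ValueError("at least two designs are needed for a comparison")
    agree = sum(
        1
        for i, j in pairs
        if (actual[i] - actual[j]) * (predicted[i] - predicted[j]) > 0
    )
    return agree / len(pairs)


# Made-up values: reference results for four designs, and predictions
# from a model that is consistently about 3 units too high.
actual = [10.0, 12.0, 15.0, 11.0]
predicted = [13.0, 15.5, 18.0, 14.2]

agreement = pairwise_ranking_agreement(actual, predicted)
mean_abs_error = sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)
```

In this made-up case the absolute error is large, yet every pair of designs is ranked correctly, which may be all that is needed when comparing concepts early in the design process.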
When it comes to usability needs, we need to look at the training and deployment processes. The fact that engineers are good at math does not mean they want to tweak hyperparameters of neural net models. They just want to make use of these models to make better design decisions. So, for success, we need to consider automated machine learning. We also cannot assume engineers would blindly use these models; they would want to know how the ML model makes decisions. This means that the ML models would have to be explainable.
Automated Machine Learning (AutoML) in Altair KnowledgeStudio
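The core idea behind automating hyperparameter selection can be sketched in a few lines of plain Python: try each candidate setting, score it on held-out data, and keep the best one. This is not how Altair KnowledgeStudio implements AutoML; the k-nearest-neighbor model, the function names, and the toy data below are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def knn_predict(train: Sequence[Sample], x: float, k: int) -> float:
    """Average the targets of the k training samples nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_error(train: Sequence[Sample], valid: Sequence[Sample], k: int) -> float:
    """Mean squared error of the k-NN model on the validation samples."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)


def auto_select_k(
    train: Sequence[Sample], valid: Sequence[Sample], candidates: Sequence[int]
) -> Tuple[int, Dict[int, float]]:
    """Try every candidate k and return the one with the lowest validation error."""
    scores = {k: validation_error(train, valid, k) for k in candidates}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Toy data: y = x^2, with even x used for training and odd x for validation.
train: List[Sample] = [(float(x), float(x * x)) for x in range(0, 20, 2)]
valid: List[Sample] = [(float(x), float(x * x)) for x in range(1, 19, 2)]

best_k, scores = auto_select_k(train, valid, candidates=[1, 2, 3, 5])
```

The engineer only supplies data and a list of candidates; the search itself needs no manual tuning. Real AutoML tools typically search over many model types and settings, but the select-by-validation-score loop is the same basic pattern.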
Once these needs are met, we move to the next level, the growth level: prediction and prescription needs. If you can use the same process you developed for one application in multiple applications, you are in the growth zone, as this will allow you to quickly scale the use of machine learning models. One of the most important reasons why we want to be able to predict in real time is that we want to do more design exploration, we want to optimize, and through these we want to innovate. And that is the uppermost need of an engineering data scientist!