Starting with Data Science as an Engineer

Joseph Pajot
Altair Employee
edited December 2021 in Other Discussion & Knowledge

Data science concepts can sound intimidating, yet engineers are uniquely equipped to pick them up quickly. Many of the tools you need to get started are already on your computer.

As data science has grown into a hot field, it has drawn many people into a new career trajectory and provided a welcome new home for former biologists, marketers, and linguists. The internet is full of articles that describe how easy it is to get into data science without a technical background. I’ve read plenty of these articles and can attest that they are fun to read, but they made me think about people like myself who are coming at data science from a different perspective. As an engineer, I’d like to believe I am technically competent, but I recall being both perplexed and intimidated the first time I came across terms like neural networks and predictive modeling. Over time I realized that, even though I didn’t know it, I already understood many of the fundamental concepts. In this article I’ll draw a line from familiar curve fitting techniques to modern machine learning.

In university, we are taught the basics of creating functions from data using techniques like cubic splines or least squares curve fitting.  Generally, this requires solving a system of linear equations.  At some level we probably recall the matrix-vector algebra, and many can probably even recognize the least squares formula for the coefficients of a linear polynomial.

image
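For reference, the standard matrix form of the least squares solution can be written out as follows. This is a sketch under the assumption that the model has the form f(x,z) = A + Bx + Cz and that there are n data points; the symbols X and the vector f are notation introduced here for illustration.

```latex
\begin{bmatrix} A \\ B \\ C \end{bmatrix}
= \left( X^{\mathsf{T}} X \right)^{-1} X^{\mathsf{T}} \mathbf{f},
\qquad
X = \begin{bmatrix}
  1 & x_1 & z_1 \\
  \vdots & \vdots & \vdots \\
  1 & x_n & z_n
\end{bmatrix},
\quad
\mathbf{f} = \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix}
```

Each row of X holds one data point, with a leading 1 that carries the constant term A.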

In a matrix-based mathematical programming language such as OML, used in Altair Compose, the following code shows a basic implementation of the above formula for a given data set and a two-variable linear function f(x,z) = A + Bx + Cz.

image

The code is broken into three logical blocks. The first block (lines 3-5) defines the input data points. The second block (lines 8-9) performs the matrix algebra. The final block (lines 12-13) predicts the output of the fitted function at a new input point: f_new = 10.835.
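For readers without Compose at hand, the same three blocks can be sketched in Python with NumPy. This is not the code from the article: the data below is hypothetical, sampled from the plane f = 1 + 2x + 3z so that the recovered coefficients are easy to check, and the variable names are chosen here for illustration.

```python
import numpy as np

# Block 1: input data points (hypothetical, sampled from f = 1 + 2x + 3z)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
z = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
f = 1.0 + 2.0 * x + 3.0 * z

# Block 2: matrix algebra. Build the design matrix with a column of ones
# for the constant term, then solve the normal equations (X^T X) beta = X^T f.
X = np.column_stack([np.ones_like(x), x, z])
beta = np.linalg.solve(X.T @ X, X.T @ f)
A, B, C = beta

# Block 3: predict the fitted function at a new input point
x_new, z_new = 2.5, 1.5
f_new = A + B * x_new + C * z_new
print(A, B, C, f_new)  # coefficients close to 1, 2, 3 and f_new close to 10.5
```

Solving the linear system with np.linalg.solve avoids forming the matrix inverse explicitly, which is usually the more numerically robust choice.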

As machine learning has evolved, many open-source implementations of its algorithms have become available. Python in particular has become a darling language, with many excellent free machine learning packages readily available. The code segment shown below solves the exact same problem as above using the scikit-learn module.

image

This code contains the same three logical blocks. Although they perform the exact same tasks, the second and third blocks have a simpler syntax, without the explicit linear algebra seen in the OML equivalent. Despite the syntax changes, the prediction at the new point is the same. That is because the underlying math is identical!
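A minimal sketch of what such a scikit-learn version might look like is given here. The data is hypothetical (points sampled from the plane f = 1 + 2x + 3z), not the data set used in the article, so the numbers differ from those quoted in the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Block 1: input data points (hypothetical, sampled from f = 1 + 2x + 3z)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
z = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
f = 1.0 + 2.0 * x + 3.0 * z

# Block 2: fit the model. No column of ones is needed, because
# LinearRegression fits the intercept (the constant A) by default.
X = np.column_stack([x, z])
model = LinearRegression()
model.fit(X, f)

# Block 3: predict at a new input point (predict expects a 2D array)
X_new = np.array([[2.5, 1.5]])
f_new = model.predict(X_new)[0]
print(model.intercept_, model.coef_, f_new)
```

The fitted intercept_ plays the role of A and coef_ holds B and C, so the result can be compared directly with an explicit least squares solution on the same data.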

Taking this example one step further, it is a minor change to switch the predictive model from a linear regression to a multilayer perceptron neural network.  The highlighted line contains the only alteration from the previous example.

image

With the change in the predictive model, the resulting prediction at the new point has also changed to f_new = 10.725.  If you are wondering which model is more accurate, then congratulations, you are now thinking like a data scientist.
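One way to start answering that question is to fit both models to data whose true function is known and compare the predictions at a new point. The sketch below does this with hypothetical data sampled from the plane f = 1 + 2x + 3z. The random_state and max_iter arguments are choices made here for repeatability and convergence, not settings taken from the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Hypothetical data sampled from a known plane, f = 1 + 2x + 3z
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
z = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
f = 1.0 + 2.0 * x + 3.0 * z
X = np.column_stack([x, z])

# New point and its true value, known only because the data is synthetic
X_new = np.array([[2.5, 1.5]])
f_true = 1.0 + 2.0 * 2.5 + 3.0 * 1.5

# Swapping the predictive model is a one-line change
models = {
    "linear regression": LinearRegression(),
    "multilayer perceptron": MLPRegressor(random_state=0, max_iter=5000),
}

errors = {}
for name, model in models.items():
    model.fit(X, f)
    f_new = model.predict(X_new)[0]
    errors[name] = abs(f_new - f_true)
    print(f"{name}: f_new = {f_new:.3f}, abs error = {errors[name]:.3g}")
```

With real data the true value is not available, so the usual practice is to hold back part of the data set and measure the error on those unseen points instead.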

This introduction has simplified the overall process for the sake of clarity. For example, most predictive models have optional parameters that tune their behavior, but here the defaults are used. Choosing methods and “tuning the hyperparameters” is part of the challenge for a data scientist. Simple as they are, these examples illustrate how easy it is for engineers to begin working in data science. Open Compose and try some of the code above for yourself. It may well be the first step to becoming your company’s chief data scientist! Share your experience, thoughts, or questions in the comments section below.