Some Thoughts on Identifying CAD Parts with shapeAI
Teaching an AI to identify CAD parts is possible. Success depends on a clear data strategy.
“You’re gonna need a bigger boat” is a memorable line from the movie Jaws, delivered by police chief Brody upon seeing the eponymous shark for the first time. This scene has come to mind during several recent projects, although I have playfully changed the words to “You’re gonna need a bigger dataset”. These projects all had the same goal: to use Altair’s shapeAI technology to teach a machine learning model to classify geometric shapes.
My desire for bigger datasets arose from two concerns. The first was simply the volume of data. The second was data diversity, and specifically the difficulties associated with imbalanced classification. I’ve found that most people consider the challenges of too little data to be self-evident, but I’ve also observed that the challenges of class imbalance are less intuitive. General numerical techniques to handle imbalance are a topic for another day, but the specific application of geometric part classification allows some mitigation of the problem by addressing the dataset itself.
To explain more fully, imagine a hypothetical task to identify all the bolts within a CAD assembly. To begin your training, all you have are 99 examples of a CAD bolt. If you did try to train with this data, all the model could ever learn is that every geometry is a bolt. Clearly, it is not sufficient to have only examples of the target class. The model must also see examples that are not bolts in order to learn to discriminate. The above example is deliberately extreme, but it does raise the question “How much non-bolt data is required?” Adding one non-bolt datum is unlikely to be enough, as the model can predict “bolt” every time and still maintain 99% accuracy on the training dataset (99 correct predictions out of 100). Conversely, adding too many non-bolt examples will turn the bolts themselves into an extreme minority, swinging the imbalance pendulum the other way.
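The arithmetic behind that 99% figure can be checked with a few lines of Python. This is only an illustrative sketch: the label strings and the helper names `majority_baseline_accuracy` and `recall` are made up for this example and have nothing to do with shapeAI's own interface.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most common label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


def recall(labels, predictions, target):
    """Fraction of the true `target` examples that were predicted as `target`."""
    relevant = [p for y, p in zip(labels, predictions) if y == target]
    return sum(p == target for p in relevant) / len(relevant)


# Hypothetical training set from the text: 99 bolts plus one non-bolt.
labels = ["bolt"] * 99 + ["non-bolt"]

# A model that has learned nothing useful and answers "bolt" every time.
predictions = ["bolt"] * len(labels)

print(majority_baseline_accuracy(labels))       # 0.99
print(recall(labels, predictions, "non-bolt"))  # 0.0
```

The accuracy looks excellent, yet the recall on the non-bolt class is zero: the model never recognizes the one thing it was supposed to learn to tell apart. That is why accuracy alone is a poor guide on imbalanced data.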
Realistically, the appropriate solution is a combination of numerical methods and careful problem scoping. Cutting-edge technologies like shapeAI will need to automatically detect and treat class imbalance, but additionally, datasets should be maintained for specific tasks. For example, altering the scope of the previous task from “finding all the bolts within a CAD assembly” to “finding all the bolts within an automotive CAD assembly” makes the task much easier to solve. In this newly defined scope, the dataset only needs to be augmented with geometric shapes found in cars, which is much more manageable.
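To make the two ideas concrete, here is a minimal sketch of scoping a dataset and then measuring what imbalance remains. Everything in it is an assumption made for illustration: the record fields `domain` and `label`, the part counts, and both helper functions are hypothetical, and the inverse-frequency weighting shown is one common general-purpose heuristic, not a description of how shapeAI treats imbalance.

```python
from collections import Counter


def scope_dataset(records, domain):
    """Keep only the records that belong to the task's domain."""
    return [r for r in records if r["domain"] == domain]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical part library: a few automotive bolts, more automotive
# non-bolts, and a large pile of parts from an unrelated domain.
records = (
    [{"domain": "automotive", "label": "bolt"}] * 20
    + [{"domain": "automotive", "label": "non-bolt"}] * 60
    + [{"domain": "aerospace", "label": "non-bolt"}] * 920
)

automotive = scope_dataset(records, "automotive")
labels = [r["label"] for r in automotive]

bolt_share_before = sum(r["label"] == "bolt" for r in records) / len(records)
bolt_share_after = labels.count("bolt") / len(labels)
weights = balanced_class_weights(labels)

print(bolt_share_before, bolt_share_after)
print(weights)
```

In this made-up library, scoping raises the share of bolts from 2% to 25% before any numerical treatment is applied, and the remaining imbalance is small enough that a simple reweighting (bolts count more than non-bolts) can plausibly handle it.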
Altair continues to develop shapeAI to meet the challenges of tomorrow’s CAE world, saving time and money through improved automation. It’s never too early to align your organizational data to solve your biggest CAE modeling bottlenecks.