Into the weeds: How we represent geometry in machine learning

Eamon Whalen
Altair Employee

We talk a lot about shape-based machine learning, from automatically identifying parts to predicting their engineering performance to detecting geometric anomalies. What we don't often discuss is how these ML models are able to "see" the geometry in the first place. Today I'd like to dive into the weeds of the most important decision in any shape-based ML project: geometry representation.

 

Applying machine learning to 3D shapes requires a geometry representation that’s compatible with machine learning methods. Luckily, both the range of available geometry representations and the ML methods that can consume them have grown substantially over the past decade. This article will briefly review the most popular options to date.

 

Below are examples of some of the most common geometry representations in machine learning, applied to a bracket from the SimJEB dataset. This diagram was inspired by this excellent survey from Ahmed et al.

 

[Diagram: common geometry representations for machine learning, illustrated on a SimJEB bracket]

First off, the various geometry representations available for machine learning can broadly be characterized by their structure:

Euclidean representations have a regular grid-like structure: think cells in a table or pixels in an image.

Non-Euclidean representations lack this regular structure and include point clouds, graphs, and meshes. Deep learning algorithms have come a long way in the past five years in supporting non-Euclidean data.

 

With that in mind, here's a description of each representation along with its pros and cons:

 

Shape Descriptors

Shape descriptors are hand-crafted mathematical functions that convert geometry into a fixed-length vector, after which classical machine learning methods can be applied. An advantage of shape descriptors is that they do not require deep learning, but a disadvantage is that they are fixed, meaning feature extraction can’t be tuned to different applications.
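
For a concrete (if simplified) example, here's a quick NumPy sketch of the classic D2 shape distribution descriptor: a histogram of distances between randomly sampled surface points, which yields a fixed-length vector you could feed to any classical model.

```python
import numpy as np

def d2_descriptor(points, n_pairs=10000, n_bins=64, seed=None):
    """D2 shape distribution: a histogram of distances between random point pairs.

    `points` is an (N, 3) array sampled from the shape's surface. The result is a
    fixed-length vector that classical ML models (SVMs, random forests, ...) can consume.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    dists = np.linalg.norm(points[i] - points[j], axis=1)
    # Normalize by the largest sampled distance so the descriptor is scale-invariant.
    hist, _ = np.histogram(dists / dists.max(), bins=n_bins, range=(0.0, 1.0), density=True)
    return hist

# Toy usage: descriptor of points scattered on a unit sphere.
pts = np.random.default_rng(0).normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
vec = d2_descriptor(pts)
print(vec.shape)  # (64,) -- a fixed-length feature vector
```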

 

Multiview Images

There are a variety of methods for describing geometry as one or more images, including rendering the shape from multiple viewpoints or slicing it into multiple cross sections, like a CT scan. An advantage of multiview images is that deep learning methods for images are well understood, but a disadvantage is that the required rasterization leads to jagged edges and partial occlusion of the underlying shape. It can also be tricky to map predictions back onto the original shape.
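
As a rough sketch of the idea (using axis-aligned projections rather than a proper renderer), the snippet below rasterizes a point cloud into three occupancy images that a standard 2D CNN could consume.

```python
import numpy as np

def axis_aligned_views(points, res=64):
    """Rasterize a point cloud into three binary 'views' by dropping one axis at a time.

    A crude stand-in for rendered multiview images: each view is a res x res
    occupancy image.
    """
    # Scale points into the unit cube.
    p = points - points.min(axis=0)
    p = p / p.max()
    views = []
    for drop_axis in range(3):
        uv = np.delete(p, drop_axis, axis=1)               # project by discarding one coordinate
        ij = np.clip((uv * (res - 1)).astype(int), 0, res - 1)
        img = np.zeros((res, res), dtype=np.uint8)
        img[ij[:, 0], ij[:, 1]] = 1                        # mark occupied pixels
        views.append(img)
    return np.stack(views)                                 # shape: (3, res, res)

pts = np.random.default_rng(0).uniform(size=(5000, 3))
print(axis_aligned_views(pts).shape)  # (3, 64, 64)
```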

 

Voxels

Voxels are a 3D grid of binary values indicating whether or not the shape occupies each cell. Voxels solve the occlusion problem of multiview images but still suffer from rasterization error. They also have large memory requirements because they encode the space around the shape in addition to the shape itself.
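
A minimal voxelization sketch, assuming the shape is given as a point cloud, looks something like this; note how the grid grows as res³ even though most cells stay empty.

```python
import numpy as np

def voxelize(points, res=32):
    """Convert a point cloud into a binary occupancy grid of shape (res, res, res)."""
    # Scale points into the unit cube, then snap each point to a grid cell.
    p = points - points.min(axis=0)
    p = p / p.max()
    idx = np.clip((p * (res - 1)).astype(int), 0, res - 1)
    grid = np.zeros((res, res, res), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

pts = np.random.default_rng(0).uniform(size=(10000, 3))
grid = voxelize(pts)
print(grid.shape, grid.sum())  # res**3 cells in memory, only a fraction occupied
```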

 

Signed Distance Functions

Signed distance functions (SDFs) represent a shape implicitly as a continuous scalar field that encodes the distance to the closest point on the shape's surface. Points outside the shape have positive values, points inside have negative values, and points with a value of zero lie on the surface. Advantages of SDFs include infinite resolution and the ability to express complex topologies, but a disadvantage is that learning them can require a large number of sampled points.
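
To make this concrete, here's a toy NumPy example using the analytic SDF of a sphere; the sampled (point, distance) pairs are the kind of training data an implicit neural network (e.g. a DeepSDF-style MLP) would regress.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=0.5):
    """Analytic signed distance to a sphere: negative inside, zero on the surface,
    positive outside."""
    return np.linalg.norm(points - np.asarray(center), axis=1) - radius

# Sample (query point, signed distance) pairs anywhere in space -- no grid required.
rng = np.random.default_rng(0)
query_points = rng.uniform(-1.0, 1.0, size=(100000, 3))
sdf_values = sphere_sdf(query_points)
inside = sdf_values < 0
print(inside.mean())  # fraction of samples inside the sphere (~ sphere volume / cube volume)
```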

 

Point Clouds

Point clouds are collections of x, y, z coordinates sampled on the surface of the shape. Point clouds are memory efficient because they only represent the shape’s boundary, and point-cloud methods can be applied directly to scans from sensors like lidar. A disadvantage of point clouds is that they don’t contain any information about the shape's topology and can therefore struggle to capture small geometric features.
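
As a small illustration, here's a common way to turn a triangle mesh into a point cloud: pick faces with probability proportional to their area, then sample random barycentric coordinates inside each face.

```python
import numpy as np

def sample_surface(vertices, faces, n_points=2048, seed=None):
    """Sample a point cloud uniformly (by area) from a triangle mesh.

    `vertices` is (V, 3) and `faces` is (F, 3) integer indices into `vertices`.
    """
    rng = np.random.default_rng(seed)
    tris = vertices[faces]                                  # (F, 3, 3)
    # Triangle areas via the cross product, used to weight the face sampling.
    areas = 0.5 * np.linalg.norm(
        np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0]), axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Random barycentric coordinates, folded back to stay inside each triangle.
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    t = tris[face_idx]
    return t[:, 0] + u[:, None] * (t[:, 1] - t[:, 0]) + v[:, None] * (t[:, 2] - t[:, 0])

# Toy usage: sample points from a single unit right triangle.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2]])
print(sample_surface(verts, faces, 5))
```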

 

Graphs/Meshes

Two of the most common geometry representations in engineering are polygonal meshes and parametric CAD. Both can be abstracted as graphs and can therefore leverage graph-based machine learning algorithms. Graphs solve the topological ambiguity problem of point clouds since they also encode connectivity, but learning on graphs is a relatively new technology compared to domains like images and comes with its own set of challenges.
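
To illustrate, the sketch below builds an edge list from a triangle mesh and runs one round of neighbor averaging, a bare-bones stand-in for what a graph convolution layer does.

```python
import numpy as np

def mesh_to_edges(faces):
    """Turn triangle faces (F, 3) into an undirected edge list (E, 2), one row per edge."""
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    e = np.sort(e, axis=1)          # make edges direction-independent
    return np.unique(e, axis=0)     # drop edges shared by two faces

def average_neighbors(features, edges):
    """One message-passing step: replace each vertex feature with the mean of its
    neighbors' features (a minimal stand-in for a graph convolution)."""
    summed = np.zeros_like(features)
    counts = np.zeros((len(features), 1))
    for a, b in edges:              # accumulate messages in both directions
        summed[a] += features[b]
        summed[b] += features[a]
        counts[a] += 1
        counts[b] += 1
    return summed / np.maximum(counts, 1)

# Toy mesh: two triangles sharing an edge, with vertex coordinates as features.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [1, 3, 2]])
edges = mesh_to_edges(faces)
print(average_neighbors(verts, edges))
```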

 

Final thoughts

This is by no means a comprehensive list. It is an exciting time to work with geometric machine learning because new ideas are constantly being proposed. Which representation do you think will win the race? Is there just one that is superior, or do you think it will be application-dependent? I’d like to hear your thoughts.