Querying using GDI - Part 2

This document builds on the basics of GDI queries covered in Part 1. It focuses on optimizing query performance, leveraging ontology awareness for smarter filtering, and providing practical examples that demonstrate advanced GDI properties in action.

Leveraging the Ontology for Query Optimization
Further Examples
- Example 1: Using s:partitionBy for Large Datasets
- Example 2: Using s:concurrency for Parallel Data Loads
- Example 3: Locale-Specific Parsing with s:locale
- Example 4: Model and Key for Entity Mapping (s:model, s:key)
- Example 5: Data Normalization with s:normalize
- Example 6: Using s:formats for Data Type Formatting
- Example 7: Reference and Foreign Key Joins
References

_____________________________________________________________________________________

Leveraging the Ontology for Query Optimization

Ontology awareness helps us leverage the GDI’s ability to push down filters to the source.

Here are the steps to explore the ontology structure:

1. From the main menu, navigate to the Model section.
2. In the Model section, click the filter icon.
3. In the filter options, select the checkbox to show only system data.
4. In the filtered list of ontologies, find the DataToolkit ontology.
5. Click the DataToolkit ontology to inspect its classes, properties, and inferred relationships.

Expanding the relevant options on the left helps us understand the structure and the supported parameters in the ontology.

________________________________________________________________________________________

Further Examples

Example 1. Using `s:partitionBy` for Large Datasets

Suppose you want to extract the public.Movies table, but it is very large and you want to avoid timeouts and memory issues.

PREFIX movies: <http://altair.com/DataSource/your-datasource-id/Top200Movies/public/>
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>

INSERT {
  GRAPH ${targetGraph} {
    ?movie a movies:Movie ;
      movies:title ?Title ;
      movies:year ?Year .
  }
}
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?generator a s:RdfGenerator, s:OntologyGenerator ;
      s:as (?subject ?predicate ?object) ;
      s:base ${targetGraph} .

    ?movie a s:DbSource ;
      s:url "{{db.movies.url}}" ;
      s:username "{{db.movies.user}}" ;
      s:password "{{db.movies.password}}" ;
      s:database "{{db.movies.database}}" ;
      s:table "public.Top200Movies" ;
      s:model "Top200Movies" ;
      s:partitionBy ("Year") ; # Partition extraction by Year
      s:query '''
        SELECT MovieID, Title, Year
        FROM public.Movies
        ORDER BY IMDB_Rating DESC
      ''' .
  }
}

Why use this?

s:partitionBy splits the extraction into separate partitions based on the values in the specified column (e.g., Year), so each partition is read as a smaller unit instead of pulling the whole table at once.
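To make the idea of partitioning concrete, here is a small Python sketch that groups rows by the value of one column. It only illustrates the concept and is not how GDI implements `s:partitionBy`; the helper name `partition_rows` and the sample rows are made up for this example.

```python
from collections import defaultdict

def partition_rows(rows, column):
    """Group rows (dicts) into partitions keyed by the value in `column`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)

# Hypothetical sample rows standing in for the movies table
rows = [
    {"MovieID": 1, "Title": "Movie A", "Year": 1994},
    {"MovieID": 2, "Title": "Movie B", "Year": 1972},
    {"MovieID": 3, "Title": "Movie C", "Year": 1994},
]

partitions = partition_rows(rows, "Year")
for year, chunk in sorted(partitions.items()):
    print(year, [r["Title"] for r in chunk])
```

Each distinct Year value ends up in its own, smaller group, which is the property that makes a large extraction easier to handle piece by piece.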

Example 2. Using `s:concurrency` for Parallel Data Loads

To speed up ingestion by running multiple queries in parallel:

...

    s:concurrency 4 ;  

...

Running queries in parallel is especially useful for very large tables when the database can handle multiple concurrent connections. For this to work effectively, always pair s:concurrency with s:partitionBy, so that there are separate partitions to process in parallel.

s:concurrency: This property sets the maximum number of parallel cores (also referred to as slices in the documentation) that GDI can use to execute the query.
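The interplay between partitions and a concurrency limit can be pictured with a plain Python thread pool. This is only an analogy under stated assumptions: `load_partition` is a hypothetical stand-in for one partition query, and `MAX_WORKERS` plays the role of the `s:concurrency` value; GDI's own scheduling is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

def load_partition(year):
    """Hypothetical stand-in for one partition query; returns a label, not rows."""
    return f"loaded Year={year}"

years = [1972, 1994, 2008, 2010, 2014]  # one entry per partition
MAX_WORKERS = 4  # plays the role of the concurrency limit

with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    # At most MAX_WORKERS partitions run at once; map keeps the input order
    results = list(pool.map(load_partition, years))

print(results)
```

With a single partition there would be nothing to distribute across the workers, which is why the concurrency setting is paired with partitioning.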

Example 3. Locale-Specific Parsing with `s:locale`

If your data contains locale-specific formats (e.g., dates, numbers):

...

    s:locale "en_US" ;  # Use US English locale for parsing

...

This ensures that numbers and dates are interpreted correctly.
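Why the locale matters is easiest to see with numbers: the same digits mean different values depending on which character is the decimal separator. The Python sketch below uses a small hand-written separator table (an assumption for illustration, including the `de_DE` entry, which is not part of the example above) rather than any GDI or system locale machinery.

```python
# Hypothetical separator table: locale -> (grouping separator, decimal separator)
SEPARATORS = {
    "en_US": (",", "."),
    "de_DE": (".", ","),
}

def parse_number(text, locale_name):
    """Parse a formatted number using the separators of the given locale."""
    group, decimal = SEPARATORS[locale_name]
    cleaned = text.replace(group, "").replace(decimal, ".")
    return float(cleaned)

print(parse_number("1,234.56", "en_US"))  # 1234.56
print(parse_number("1.234,56", "de_DE"))  # 1234.56
```

Parsing "1.234,56" with the wrong separators would fail or yield a different value, which is the kind of error a locale setting is meant to prevent.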

Example 4. Model and Key for Entity Mapping

If you want to specify a model and a unique key for entity resolution:

...

    s:model "Movies" ;

    s:key ("MovieID") ;

...

This helps GDI generate consistent URIs for each movie entity.

These two properties are crucial for building a structured knowledge graph from a flat data source. They allow us to define how our relational data maps to a rich, semantic model.

The `s:model` Property: Defining Our Data's Class

The s:model property is used to define the class for the data we're ingesting.

What it does: It maps a database table or a query to a specific RDF class in our ontology. For example, if we're ingesting the Top200Movies table, setting s:model "Movie" tells GDI to create a Movie class and generate triples that declare each movie as an instance of that class.
Why it's important: This provides semantic meaning and structure to our data. Without a model, our data would be just a collection of triples. With a model, we have a formal schema that allows us to reason about and query our data more effectively.

The `s:key` Property: Creating Unique Identities

The s:key property is a universal property that defines a primary key for our source table. It plays a critical role in creating a connected knowledge graph.

What it does: It tells GDI to use the value in the specified column to generate a unique URI for each row of data. This ensures that every row becomes a distinct entity in our graph that can be linked to other resources.
Why it's important: Without an s:key, GDI cannot guarantee unique identities for our data, making it difficult to link resources together and build a coherent graph. We're essentially giving GDI the primary key to create a unique URI for each movie.

For more information, see: https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-key.htm?Highlight=%22s%3Akey%22
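The combined effect of a model and a key can be sketched in Python: the model supplies the class, and the key column supplies the identifier that makes each row's URI unique and repeatable. The base URI, the URI layout, and the helper `row_to_triples` are assumptions made for this illustration; the URIs GDI actually generates may look different.

```python
from urllib.parse import quote

BASE = "http://example.org/data/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def row_to_triples(row, model, key):
    """Turn one row into triples: a type triple plus one triple per other column."""
    subject = f"{BASE}{model}/{quote(str(row[key]), safe='')}"
    triples = [(subject, RDF_TYPE, f"{BASE}{model}")]
    for column, value in row.items():
        if column != key:
            triples.append((subject, f"{BASE}{model}_{column}", value))
    return triples

row = {"MovieID": 42, "Title": "Movie A", "Year": 1994}
triples = row_to_triples(row, "Movie", "MovieID")
for triple in triples:
    print(triple)
```

Because the subject URI depends only on the model name and the key value, loading the same row again yields the same URI, so other resources can link to it reliably.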

Example 5. Data Normalization with `s:normalize`

s:normalize lets us control how names (labels and URIs) are created for classes and properties when we bring data into Graph Studio from different sources.

This means we can set rules to make sure the names in our data model are consistent, clean, and follow a preferred style or standard.

...

    s:normalize true ;

...

Or specify rules:

...

    s:normalize [ s:trim true ; s:lowercase true ] ;

...

For more information, see: https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-normalize.htm
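As a rough picture of what name normalization achieves, the Python sketch below trims and lowercases a raw column name and then replaces other characters with underscores. The underscore rule and the function `normalize_name` are assumptions for illustration only; the rules GDI applies are described in the documentation linked above.

```python
import re

def normalize_name(raw):
    """Trim, lowercase, and replace runs of non-alphanumerics with one underscore."""
    name = raw.strip().lower()
    return re.sub(r"[^0-9a-z]+", "_", name)

print(normalize_name("  IMDB Rating "))  # imdb_rating
print(normalize_name("Release-Year"))    # release_year
```

Names from different sources that differ only in spacing, case, or punctuation end up identical after such a step, which keeps the resulting model consistent.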

Example 6. Using `s:formats` for Data Type Formatting

If you want to control how certain datatypes are formatted:

...

    s:formats [

        s:dateFormat "yyyy-MM-dd" ;

        s:numberFormat "#,##0.##" ;

        s:delimiter "," ;

        s:headerRow true

    ] ;

...

The s:formats block is especially important when your data uses non-default formats or when you want to avoid parsing errors during integration.

For more information, see: https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-formats.htm
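The effect of declaring a format can be illustrated in Python: a value is accepted only when it matches the expected pattern. The sketch assumes that `%Y-%m-%d` is the Python counterpart of the "yyyy-MM-dd" pattern and treats "#,##0.##" as comma-grouped thousands with a dot for decimals; the helper names are hypothetical and this is not GDI's parser.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed Python counterpart of "yyyy-MM-dd"

def parse_date(text):
    """Parse a date string; raises ValueError if it does not match DATE_FORMAT."""
    return datetime.strptime(text, DATE_FORMAT).date()

def parse_grouped_number(text):
    """Parse a "#,##0.##"-style number: commas group thousands, dot is decimal."""
    return float(text.replace(",", ""))

print(parse_date("1994-09-23"))         # 1994-09-23
print(parse_grouped_number("1,234.5"))  # 1234.5

try:
    parse_date("23/09/1994")  # does not match the declared pattern
except ValueError as err:
    print("rejected:", err)
```

A value in an unexpected layout is rejected instead of being silently misread, which is the class of parsing error that an explicit format declaration helps surface.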

Example 7. Reference and Foreign Key Joins

Suppose you want to join the Movies table with a related Directors table to enrich each movie with director details:

...
    s:reference [
      s:model "Directors" ;
      s:using ("Director")
    ] ;
...

This tells GDI how to resolve relationships between the Movies table and the Directors table, using the Director field in Movies as the join key.
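Conceptually, this is a lookup from each movie's Director value to the matching row in the Directors data. The Python sketch below shows that lookup with made-up rows; the `Country` column, the `director_details` field, and the helper `join_on` are hypothetical and only illustrate the join-key idea, not GDI's behavior.

```python
# Hypothetical sample data standing in for the two tables
movies = [
    {"MovieID": 1, "Title": "Movie A", "Director": "Director X"},
    {"MovieID": 2, "Title": "Movie B", "Director": "Director Y"},
]
directors = [
    {"Director": "Director X", "Country": "Country 1"},
]

def join_on(left, right, column):
    """Attach the matching right-hand row to each left-hand row via `column`."""
    index = {row[column]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[column])  # None when there is no related row
        joined.append({**row, "director_details": match})
    return joined

enriched = join_on(movies, directors, "Director")
for row in enriched:
    print(row["Title"], "->", row["director_details"])
```

The second movie has no matching director row, so its details stay empty, which shows why the join column needs consistent values on both sides.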

_____________________________________________________________________________________
