This discussion introduces the basics of Graph Data Interface (GDI) queries: how to connect to a database, define generators, and follow best practices for secure and efficient data onboarding.
Contents
- Developer Checklist
- Understanding the Dataset
- GDI Query Structure
- Best Practices for Developers
- Troubleshooting Tips
- References
________________________________________________________________________________________
Developer Checklist
- Connect to the database: Define s:DbSource with s:url, s:username, and s:password.
- Select data: Use s:table for full ingestion or s:query for targeted extraction.
- Generate triples + ontology: Always include both s:RdfGenerator and s:OntologyGenerator.
- Scale for performance: Use s:partitionBy (and optionally s:concurrency) for large tables.
- Secure credentials: Store sensitive info in Query Contexts, not in queries.
______________________________________________________________________________________
Understanding the Dataset
Suppose our database contains a table called public.Movies with the fields MovieID, Title, Year, Genre, Director, and IMDB_Rating.
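For illustration, a single row (hypothetical values) might look like:

MovieID = 1, Title = 'The Shawshank Redemption', Year = 1994, Genre = 'Drama', Director = 'Frank Darabont', IMDB_Rating = 9.3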
Our goal is to extract this data and transform it into RDF triples for use in the knowledge graph.
______________________________________________________________________________________
GDI Query Structure
1. Define the Data Source and Generator
A GDI query starts with s:DbSource, which specifies the database connection (JDBC URL, user, password, table/query, etc.).
To transform relational data into a knowledge graph, you must include a generator line with both s:RdfGenerator and s:OntologyGenerator:
- s:RdfGenerator – Converts each row and column from the source into RDF triples (subject–predicate–object). This is how raw data becomes graph entities.
- s:OntologyGenerator – Builds the semantic model for the data by creating classes for tables (e.g., Movie) and properties for fields (e.g., title, year).
Together, these generators automate both the data ingestion and the schema creation, eliminating the need to manually code mappings for every table.
2. Example GDI Transformation Query
Below is a sample GDI transformation query for onboarding data from the public.Movies table.
PREFIX public: <http://altair.com/DataSource/your-datasource-id/Top200Movies/public/>
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>

INSERT {
  GRAPH ${targetGraph} {
    ?subject ?predicate ?object .
  }
}
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    # Invoke the GDI RDF and Ontology Generators
    ?generator a s:RdfGenerator, s:OntologyGenerator ;
      s:as (?subject ?predicate ?object) ;
      s:base ${targetGraph} .

    # Database connection and extraction details (credentials via Query Context)
    public:Movies a s:DbSource ;
      s:url "{{db.your-datasource-id.url}}" ;
      s:username "{{db.your-datasource-id.user}}" ;
      s:password "{{db.your-datasource-id.password}}" ;
      s:database "{{db.your-datasource-id.database}}" ;
      s:table "public.Movies" ;
      s:model "Movies" .
  }
}
Key Elements
- ?generator a s:RdfGenerator, s:OntologyGenerator ; s:as (?subject ?predicate ?object) ; s:base ${targetGraph} . – This line is required to invoke the GDI RDF and Ontology Generators, which automatically generate the graph and ontology for your data source.
- s:DbSource – Defines the database connection and extraction details, using context variables for sensitive information.
- s:query – A custom SQL query for targeted data extraction, used in place of s:table (see the sketch below).
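For targeted extraction, s:query replaces s:table inside the s:DbSource. A minimal sketch, reusing the connection settings from the example above:

public:Movies a s:DbSource ;
  s:url "{{db.your-datasource-id.url}}" ;
  s:username "{{db.your-datasource-id.user}}" ;
  s:password "{{db.your-datasource-id.password}}" ;
  s:database "{{db.your-datasource-id.database}}" ;
  s:query '''
    SELECT MovieID, Title, Year
    FROM public.Movies
  ''' ;
  s:model "Movies" .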
_______________________________________________________________________________________
Best Practices for Developers
a. Start Simple, Then Expand
Begin with a basic query to ensure connectivity and correct data extraction. For example, start by selecting just a few columns or limiting the number of rows.
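For instance, a first pass might reuse the s:DbSource from the example above with s:limit added to cap the row count (s:limit is covered under performance tuning below):

public:Movies a s:DbSource ;
  s:url "{{db.your-datasource-id.url}}" ;
  s:username "{{db.your-datasource-id.user}}" ;
  s:password "{{db.your-datasource-id.password}}" ;
  s:database "{{db.your-datasource-id.database}}" ;
  s:table "public.Movies" ;
  # Keep the first run small while verifying connectivity and mappings
  s:limit 10 ;
  s:model "Movies" .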
b. Use Query Contexts
Never hardcode credentials. Always use context variables for URLs, usernames, and passwords for security and reusability.
Query contexts are a core feature for managing sensitive information like credentials and connection details. They store these as reusable key-value pairs, which we can reference in our GDI queries to avoid hardcoding secrets. This practice makes our queries more secure, portable, and easier to maintain.
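Within the s:DbSource definition, the contrast looks like this:

# Avoid: a credential hardcoded in the query text
#   s:password "s3cret!" ;
# Prefer: a Query Context reference, resolved at run time
s:password "{{db.your-datasource-id.password}}" ;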
c. Optimize for Performance
To effectively handle large-scale data ingestion, especially with tables that have millions of rows, we should optimize our queries to leverage GDI's parallel processing capabilities.
- Primary Optimization: Use s:partitionBy. This is the most effective method for speeding up ingestion from large tables (see the sketch after this list).
- When we use this property, GDI automatically splits a single large query into multiple smaller queries ("buckets") based on a specified column (like a primary key).
- This prevents the source database from being overwhelmed and allows Graph Studio to process data in parallel, which is much more efficient.
- Secondary Tuning (s:limit, s:offset): For development, testing, and pagination, we can still use s:limit to restrict the number of rows.
- The s:offset property is used in combination with s:limit to page through results.
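Here is a sketch of these tuning properties applied to the s:DbSource from the example above. It assumes MovieID is the numeric primary key; the exact value forms accepted by s:partitionBy and s:concurrency can vary by release, so check the GDI reference for your version:

public:Movies a s:DbSource ;
  s:url "{{db.your-datasource-id.url}}" ;
  s:username "{{db.your-datasource-id.user}}" ;
  s:password "{{db.your-datasource-id.password}}" ;
  s:database "{{db.your-datasource-id.database}}" ;
  s:table "public.Movies" ;
  # Split the extraction into parallel "buckets" keyed on the primary key
  s:partitionBy ?MovieID ;
  # Cap how many partition queries run against the source at once (assumed integer form)
  s:concurrency 4 ;
  # For development/testing: page through a bounded slice of rows
  s:limit 1000 ;
  s:offset 0 ;
  s:model "Movies" .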
d. Advanced: Filtering and Aggregation
Suppose you want only movies with an IMDB rating above 8.0; pass the SQL to s:query in place of s:table inside the s:DbSource:
s:query '''
SELECT MovieID, Title, Year, Genre, Director, IMDB_Rating
FROM public.Movies
WHERE IMDB_Rating > 8.0
'''
Or, to get the highest-rated movie per year:
s:query '''
SELECT Year, MAX(IMDB_Rating) as MaxRating
FROM public.Movies
GROUP BY Year
'''
_____________________________________________________________________________________
Troubleshooting Tips
- Connection errors: Check JDBC URL, Query Context variables, and firewall/DNS access.
- Query errors: Validate SQL directly in the source database first.
- Performance issues: Apply s:partitionBy, s:limit, and s:concurrency (see the tuning sketch under Best Practices).
- Mapping issues: Confirm RDF mapping aligns with column names and types.
________________________________________________________________________________________
Next Steps
Please check Part 2 for extended examples, advanced optimization techniques, and guidance on leveraging ontology for query optimization.