This document builds on the basics of GDI queries covered in Part 1. It focuses on optimizing query performance, leveraging ontology awareness for smarter filtering, and providing practical examples that demonstrate advanced GDI properties in action.
Contents
- Leveraging the Ontology for Query Optimization
- Further Examples
  - Example 1: Using s:partitionBy for Large Datasets
  - Example 2: Using s:concurrency for Parallel Data Loads
  - Example 3: Locale-Specific Parsing with s:locale
  - Example 4: Model and Key for Entity Mapping (s:model, s:key)
  - Example 5: Data Normalization with s:normalize
  - Example 6: Using s:formats for Data Type Formatting
  - Example 7: Reference and Foreign Key Joins
- References
_____________________________________________________________________________________
Leveraging the Ontology for Query Optimization
Ontology awareness helps us leverage the GDI’s ability to push down filters to the source.
Here are the steps to explore the ontology structure:
- From the main menu, navigate to the Model section.
- In the Model section, click the filter icon.
- In the filter options, select the checkbox to show only system data.
- In the filtered list of ontologies, we will see the DataToolkit ontology.
- Click the DataToolkit ontology to inspect the classes, properties, and inferred relationships.
- Expanding the relevant options on the left helps us understand the structure and supported parameters in our ontology.
________________________________________________________________________________________
Further Examples
Example 1. Using s:partitionBy for Large Datasets
Suppose you want to extract the public.Movies table, but the table is very large and you want to avoid timeouts and memory issues.
PREFIX movies: <http://altair.com/DataSource/your-datasource-id/Top200Movies/public/>
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
INSERT {
  GRAPH ${targetGraph} {
    ?movie a movies:Movie ;
      movies:title ?Title ;
      movies:year ?Year .
  }
}
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?generator a s:RdfGenerator, s:OntologyGenerator ;
      s:as (?subject ?predicate ?object) ;
      s:base ${targetGraph} .
    ?movie a s:DbSource ;
      s:url "{{db.movies.url}}" ;
      s:username "{{db.movies.user}}" ;
      s:password "{{db.movies.password}}" ;
      s:database "{{db.movies.database}}" ;
      s:table "public.Top200Movies" ;
      s:model "Top200Movies" ;
      s:partitionBy ("Year") ; # Partition extraction by Year
      s:query '''
        SELECT MovieID, Title, Year
        FROM public.Movies
        ORDER BY IMDB_Rating DESC
      ''' .
  }
}
Why use this?
s:partitionBy splits the extraction into separate partitions based on the values in the specified column (e.g., Year).
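To make the idea of partitioning concrete, here is a minimal Python sketch of what "one partition per column value" means. It only illustrates the concept described above and is not how GDI is implemented; the sample rows and the partition_by helper are hypothetical.

```python
from collections import defaultdict


def partition_by(rows, column):
    """Group rows into partitions keyed by the value found in `column`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


# Hypothetical sample rows standing in for the movies table.
rows = [
    {"MovieID": 1, "Title": "Movie A", "Year": 1994},
    {"MovieID": 2, "Title": "Movie B", "Year": 1994},
    {"MovieID": 3, "Title": "Movie C", "Year": 2008},
]

partitions = partition_by(rows, "Year")
for year, chunk in sorted(partitions.items()):
    # Each partition can be extracted on its own, so no single
    # request has to carry the whole table.
    print(year, [r["Title"] for r in chunk])
```

Each partition is a smaller unit of work, which is what helps avoid timeouts and memory pressure on a very large table.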
Example 2. Using s:concurrency for Parallel Data Loads
To speed up ingestion by running multiple queries in parallel:
...
s:concurrency 4 ;
...
Running queries in parallel is especially useful for very large tables, provided our database can handle multiple concurrent connections. For this to work effectively, we should always pair s:concurrency with s:partitionBy.
s:concurrency: This property sets the maximum number of parallel cores (also referred to as slices in the documentation) that GDI can use to execute our query.
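The relationship between partitions and a concurrency limit can be sketched in Python as follows. This is an analogy under stated assumptions, not GDI's scheduler: load_partition is a hypothetical stand-in for one partition query, and the list of years is made up.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(year):
    """Hypothetical stand-in for loading one partition (one Year value)."""
    return f"loaded partition Year={year}"


years = [1994, 1999, 2008, 2010, 2014]  # hypothetical partition values

# max_workers plays the role of `s:concurrency 4`: at most four
# partition loads are in flight at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the inputs.
    results = list(pool.map(load_partition, years))

for line in results:
    print(line)
```

The sketch also shows why the two properties belong together: without partitions there is only one unit of work, so a concurrency limit of 4 would have nothing to run in parallel.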
Example 3. Locale-Specific Parsing with s:locale
If your data contains locale-specific formats (e.g., dates, numbers):
...
s:locale "en_US" ; # Use US English locale for parsing
...
This ensures that numbers and dates are interpreted correctly.
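A small Python sketch shows why the locale matters: the same text can represent different numbers depending on the convention. The SEPARATORS table and parse_number helper are hypothetical and cover only two locales; they are not part of GDI.

```python
# Hypothetical separator table; a real locale database covers far more.
SEPARATORS = {
    "en_US": {"group": ",", "decimal": "."},
    "de_DE": {"group": ".", "decimal": ","},
}


def parse_number(text, locale):
    """Parse a formatted number using the separators of the given locale."""
    seps = SEPARATORS[locale]
    # Drop grouping separators first, then switch the decimal mark to ".".
    normalized = text.replace(seps["group"], "").replace(seps["decimal"], ".")
    return float(normalized)


print(parse_number("1,234.56", "en_US"))  # 1234.56
print(parse_number("1.234,56", "de_DE"))  # 1234.56
print(parse_number("1.234", "en_US"))     # 1.234
print(parse_number("1.234", "de_DE"))     # 1234.0
```

The last two lines are the important ones: the text "1.234" is a little over one under US conventions and over a thousand under German conventions.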
Example 4. Model and Key for Entity Mapping
If you want to specify a model and a unique key for entity resolution:
...
s:model "Movies" ;
s:key ("MovieID") ;
...
This helps GDI generate consistent URIs for each movie entity.
These two properties are crucial for building a structured knowledge graph from a flat data source. They allow us to define how our relational data maps to a rich, semantic model.
The s:model Property: Defining Our Data's Class
The s:model property is used to define the class for the data we're ingesting.
- What it does: It maps a database table or a query to a specific RDF class in our ontology. For example, if we're ingesting the Top200Movies table, setting s:model "Movie" tells GDI to create a Movie class and generate triples that declare each movie as an instance of that class.
- Why it's important: This provides semantic meaning and structure to our data. Without a model, our data would be just a collection of triples. With a model, we have a formal schema that allows us to reason about and query our data more effectively.
The s:key Property: Creating Unique Identities
The s:key property is a universal property that defines a primary key for our source table. It plays a critical role in creating a connected knowledge graph.
- What it does: It tells GDI to use the value in the specified column to generate a unique URI for each row of data. This ensures that every row becomes a distinct entity in our graph that can be linked to other resources.
- Why it's important: Without an s:key, GDI cannot guarantee unique identities for our data, making it difficult to link resources together and build a coherent graph. We're essentially giving GDI the primary key to create a unique URI for each movie.
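The effect of a key on entity identity can be illustrated with a short Python sketch. The URI layout below (base, then model, then key value) is an assumption made for illustration only; the actual URI template GDI uses is not described here, and BASE and mint_uri are hypothetical names.

```python
from urllib.parse import quote

BASE = "http://example.org/data/"  # hypothetical base URI


def mint_uri(model, key_value):
    """Build a deterministic URI from a model name and a key value."""
    return f"{BASE}{quote(model)}/{quote(str(key_value), safe='')}"


# The same key always yields the same URI, so repeated loads of the
# same row resolve to one entity instead of creating duplicates.
first = mint_uri("Movies", 42)
second = mint_uri("Movies", 42)
other = mint_uri("Movies", 43)

print(first)
print(first == second)
print(first == other)
```

Because the URI depends only on the model and the key value, other sources that know a movie's MovieID can point at the same entity.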
Example 5. Data Normalization with s:normalize
The s:normalize property lets us control how names (labels and URIs) are created for classes and properties when we bring data into Graph Studio from different sources.
This means we can set rules to make sure the names in our data model are consistent, clean, and follow a preferred style or standard.
...
s:normalize true ;
...
Or specify rules:
...
s:normalize [ s:trim true ; s:lowercase true ] ;
...
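As a rough picture of what trim and lowercase rules do to names, consider the Python sketch below. It mirrors only the two rules shown in the fragment above; normalize_name is a hypothetical helper and does not reflect how GDI applies normalization internally.

```python
def normalize_name(name, trim=True, lowercase=True):
    """Apply simple naming rules so equivalent names compare equal."""
    if trim:
        name = name.strip()
    if lowercase:
        name = name.lower()
    return name


# Hypothetical column names as they might arrive from different sources.
raw_names = ["  Title", "TITLE ", "title"]
normalized = {normalize_name(n) for n in raw_names}

print(normalized)
```

All three spellings collapse to a single name, which is the kind of consistency the normalization rules are meant to provide.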
Example 6. Using s:formats for Data Type Formatting
If you want to control how certain datatypes are formatted:
...
s:formats [
s:dateFormat "yyyy-MM-dd" ;
s:numberFormat "#,##0.##" ;
s:delimiter "," ;
s:headerRow true
] ;
...
The s:formats block is especially important when your data uses non-default formats or when you want to avoid parsing errors during integration.
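The kind of parsing error an explicit date format prevents can be shown with a short Python sketch. It uses Python's own pattern syntax, where "%Y-%m-%d" corresponds to the "yyyy-MM-dd" pattern above; parse_date is a hypothetical helper, not a GDI function.

```python
from datetime import datetime


def parse_date(text, pattern="%Y-%m-%d"):
    """Return a date, or None when the text does not match the pattern."""
    try:
        return datetime.strptime(text, pattern).date()
    except ValueError:
        return None


print(parse_date("2020-03-15"))              # matches the declared format
print(parse_date("03/15/2020"))              # does not match: None
print(parse_date("03/15/2020", "%m/%d/%Y"))  # matches once the format is declared
```

Declaring the format the source actually uses turns a value that would otherwise fail to parse into a proper date.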
Example 7. Reference and Foreign Key Joins
Suppose you want to join the Movies table with a related Directors table to enrich each movie with director details:
...
s:reference [
s:model "Directors" ;
s:using ("Director")
] ;
...
This tells GDI how to resolve relationships between the Movies table and the Directors table, using the Director field in Movies as the join key.
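Conceptually, a reference turns a foreign-key value into a link between two entities rather than a plain literal. The Python sketch below illustrates that idea under stated assumptions: the rows, the base URI, the "director" predicate name, and the mint_uri helper are all hypothetical and do not describe GDI's actual output.

```python
BASE = "http://example.org/data/"  # hypothetical base URI


def mint_uri(model, key_value):
    """Build a deterministic URI from a model name and a key value."""
    return f"{BASE}{model}/{key_value}"


# Hypothetical rows; the Director column holds the key of a Directors row.
movies = [
    {"MovieID": 1, "Title": "Movie A", "Director": "D1"},
    {"MovieID": 2, "Title": "Movie B", "Director": "D1"},
]

triples = []
for movie in movies:
    subject = mint_uri("Movies", movie["MovieID"])
    # The join key is resolved to the URI of the referenced director,
    # so both movies end up pointing at the same Directors entity.
    target = mint_uri("Directors", movie["Director"])
    triples.append((subject, "director", target))

for triple in triples:
    print(triple)
```

Both movies link to the same director URI, which is what allows a graph query to traverse from a movie to its director's details.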
_____________________________________________________________________________________
Further Reading