This document walks through the steps to configure a relational data source in Graph Studio using the Graph Data Interface (GDI).
It covers creating the source, setting connection parameters, validating connectivity, defining schemas, and preparing data for ingestion into the knowledge graph.
Contents
- Developer Checklist
- Navigating to Create a New Data Source
- Configuring Essential Connection Parameters
- Testing the Connection
- Defining the Schema
- Next Steps
- Further Reading
_______________________________________________________________________________________
Developer Checklist
- Create a New Data Source
- Configure Essential Connection Parameters
- Test the Connection
- Define the Schema
________________________________________________________________________________________
Navigating to Create a New Data Source
The first step is to access the data source management area within Graph Studio. This is the starting point for connecting to any new data source.
- In the Graph Studio application, expand the Onboard menu and click Structured Data. The Data Sources screen appears.
- Click the Add Data Source button.
- Select Database as the data source type, then choose a specific database type (e.g., Databricks, PostgreSQL, Oracle, Snowflake).
The GDI supports connecting to a wide range of databases via JDBC drivers.
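For reference, typical JDBC URL forms for the database types named above are sketched below. These are assumptions based on common driver defaults; exact ports, parameters, and URL schemes vary by driver version, so consult your driver's documentation:

```text
PostgreSQL: jdbc:postgresql://<host>:5432/<database>
Oracle:     jdbc:oracle:thin:@//<host>:1521/<service-name>
Snowflake:  jdbc:snowflake://<account>.snowflakecomputing.com/?db=<database>
Databricks: jdbc:databricks://<host>:443;httpPath=<http-path>
```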
________________________________________________________________________________________
Configuring Essential Connection Parameters
This is where we provide the specific details for Graph Studio to establish a connection. The fields presented will be similar across database types, although settings may vary slightly based on the JDBC driver.
| Field | Example | Notes / Best Practices |
|---|---|---|
| Title | Postgres_SalesDB | Display name in Graph Studio. |
| Description | Sales database for analytics | Optional. |
| User | analytics_user | Avoid using admin/root accounts. |
| Password | ******** | Store in a Query Context where possible. |
| Server | salesdb.company.com | Hostname or IP address. |
| Database | sales_db | Optional for some JDBC drivers. |
It is a security best practice to reference connection information such as the URL, username, and password from a Query Context to abstract these sensitive details from the queries themselves.
Cluster Types: The prerequisites and driver installation steps described here apply to static Lakehouse clusters.
For Kubernetes-based dynamic deployments, JDBC jars added manually do not persist across pod restarts; in such environments, additional automation or engineering support is required for persistent driver deployment.
________________________________________________________________________________________
Testing the Connection
After configuring the parameters, it is crucial to test the connection. This verifies that the credentials and network configuration are correct.
On the data source's Overview tab, click the Test Connection button.
Note that Test Connection validates only the connection between Graph Studio and the database. It does not validate connectivity from the Lakehouse (AGS/AnzoGraph engine) to the database via JDBC.
Recommended Approach: Always perform a manual connectivity test from the Lakehouse (AGL) node using a SQL client such as DBVisualizer, psql, sqlcmd, or isql.
This ensures the machine where GDI runs can directly reach the upstream database with the provided JDBC URL and credentials.
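As a minimal sketch of such a test, the queries below can be run from any of those clients. `SELECT 1` works on most databases (Oracle requires `FROM dual`), and the catalog check assumes a source that exposes the ANSI `information_schema` (e.g., PostgreSQL or SQL Server):

```sql
-- Run from a SQL client on the Lakehouse node, connected with the same
-- JDBC URL and credentials configured in Graph Studio.
SELECT 1;  -- verifies authentication and network reachability
           -- (on Oracle: SELECT 1 FROM dual)

-- Optional: verify catalog access on sources with an information_schema.
SELECT COUNT(*) AS table_count
FROM   information_schema.tables;
```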
________________________________________________________________________________________
Defining the Schema
Once the connection is successful, we can define the schema. The schema specifies what source data will be onboarded.
We have a few options for defining a database schema.
- Import Predefined Schema: This is the most common option. We can import tables and/or schemas that already exist in the database.
- Create a Schema from an SQL Query: We can write a custom SQL query to define the data for a new schema. This query can include any functionality the source database supports (see the example after this list).
- Best Practice: When writing SQL queries, use single quotes (') around values. Using double quotes (") can result in an error.
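As an example, a schema-defining query might look like the sketch below. The table and column names (orders, region, order_date) are hypothetical; note the single quotes around literal values:

```sql
-- Hypothetical sales table; substitute your own tables and columns.
SELECT o.order_id,
       o.customer_id,
       o.order_total
FROM   orders o
WHERE  o.region = 'EMEA'              -- single quotes around string literals
  AND  o.order_date >= '2024-01-01';  -- double-quoting these values could error
```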
Each schema we define creates one or more tables for ingestion. We can import or create up to 5 schemas per database data source. To include more, we must create another data source.
______________________________________________________________________________________
Next Steps
After successfully defining our schema, we can proceed with a few next steps to prepare the data for ingestion and analysis:
- Assign Primary and Foreign Keys: We can edit a schema to assign primary and foreign keys if they are not already defined in the source. This is crucial for creating relationships in the knowledge graph (see the helper query after this list).
- Use the Automated Workflow: We can use the automated direct data load workflow to create a Graphmart from our new data source, which automatically generates data layers and steps.
- Validate Connectivity End-to-End: Before building data layers, confirm that the Lakehouse node can run a GDI query successfully against the source. This query-based test ensures full path validation (driver + credentials + network).
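When assigning keys manually, it can help to first see which keys the source already declares. The sketch below queries the ANSI information_schema (available on PostgreSQL and SQL Server, among others; Oracle exposes the same metadata through views such as ALL_CONSTRAINTS):

```sql
-- List columns participating in declared primary keys.
SELECT tc.table_name,
       kcu.column_name
FROM   information_schema.table_constraints tc
JOIN   information_schema.key_column_usage kcu
       ON  kcu.constraint_name   = tc.constraint_name
       AND kcu.constraint_schema = tc.constraint_schema
WHERE  tc.constraint_type = 'PRIMARY KEY'
ORDER  BY tc.table_name, kcu.ordinal_position;
```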
________________________________________________________________________________________
Further Reading
https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-reqs.htm
https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-intro.htm