This document walks through the steps to configure a relational data source in Graph Studio using the Graph Data Interface (GDI).
It covers creating the source, setting connection parameters, validating connectivity, defining schemas, and preparing data for ingestion into the knowledge graph.
Contents
- Developer Checklist
- Navigating to Create a New Data Source
- Configuring Essential Connection Parameters
- Testing the Connection
- Defining the Schema
- Next Steps
- Further Reading
_______________________________________________________________________________________
Developer Checklist
- Create a New Data Source
- Configure Essential Connection Parameters
- Test the Connection
- Define the Schema
________________________________________________________________________________________
Navigating to Create a New Data Source
The first step is to access the data source management area within Graph Studio. This is the starting point for connecting to any new data source.
- In the Graph Studio application, expand the Onboard menu and click Structured Data. The Data Sources screen appears.
- Click the Add Data Source button.
- Select Database as the data source type, then choose a specific database type (e.g., Databricks, PostgreSQL, Oracle, Snowflake).
The GDI supports connecting to a wide range of databases via JDBC drivers.
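For reference, typical JDBC URL forms for the database types named above are sketched below. These are assumptions based on common driver defaults; exact ports, parameters, and URL schemes vary by driver version, so consult your driver's documentation:

```text
PostgreSQL: jdbc:postgresql://<host>:5432/<database>
Oracle:     jdbc:oracle:thin:@//<host>:1521/<service-name>
Snowflake:  jdbc:snowflake://<account>.snowflakecomputing.com/?db=<database>
Databricks: jdbc:databricks://<host>:443;httpPath=<http-path>
```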
________________________________________________________________________________________
Configuring Essential Connection Parameters
This is where we provide the specific details for Graph Studio to establish a connection. The fields presented will be similar across database types, although settings may vary slightly based on the JDBC driver.
| Field | Example | Notes / Best Practices |
|---|---|---|
| Title | Postgres_SalesDB | Display name in Graph Studio. |
| Description | Sales database for analytics | Optional. |
| User | analytics_user | Avoid using admin/root accounts. |
| Password | ******** | Store in a Query Context where possible. |
| Server | salesdb.company.com | Hostname or IP address. |
| Database | sales_db | Optional for some JDBC drivers. |
It is a security best practice to reference connection information such as the URL, username, and password from a Query Context to abstract these sensitive details from the queries themselves.
Cluster Types: The prerequisites and driver installation steps described here apply to static Lakehouse clusters.
For Kubernetes-based dynamic deployments, JDBC jars added manually do not persist across pod restarts; in such environments, additional automation or engineering support is required for persistent driver deployment.
________________________________________________________________________________________
Testing the Connection
After configuring the parameters, it is crucial to test the connection. This verifies that the credentials and network configuration are correct.
On the data source's Overview tab, click the Test Connection button.
Note that Test Connection validates only the connection between Graph Studio and the database. It does not validate connectivity from the Lakehouse (AGS/AnzoGraph engine) to the database via JDBC.
Recommended Approach: Always perform a manual connectivity test from the Lakehouse (AGL) node using a SQL client such as DBVisualizer, psql, sqlcmd, or isql.
This ensures the machine where GDI runs can directly reach the upstream database with the provided JDBC URL and credentials.
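As a minimal sketch of such a test, the queries below can be run from any of those clients. `SELECT 1` works on most databases (Oracle requires `FROM dual`), and the catalog check assumes a source that exposes the ANSI `information_schema` (e.g., PostgreSQL or SQL Server):

```sql
-- Run from a SQL client on the Lakehouse node, connected with the same
-- JDBC URL and credentials configured in Graph Studio.
SELECT 1;  -- verifies authentication and network reachability
           -- (on Oracle: SELECT 1 FROM dual)

-- Optional: verify catalog access on sources with an information_schema.
SELECT COUNT(*) AS table_count
FROM   information_schema.tables;
```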
________________________________________________________________________________________
Defining the Schema
Once the connection is successful, we can define the schema. The schema specifies what source data will be onboarded.
We have a few options for defining a database schema.
- Import Predefined Schema: This is the most common option. We can import tables and/or schemas that already exist in the database.
- Create a Schema from an SQL Query: We can write a custom SQL query to define the data for a new schema. This query can include any functionality the source database supports (see the example after this list).
- Best Practice: When writing SQL queries, use single quotes (') around values. Using double quotes (") can result in an error.
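As an example, a schema-defining query might look like the sketch below. The table and column names (orders, region, order_date) are hypothetical; note the single quotes around literal values:

```sql
-- Hypothetical sales table; substitute your own tables and columns.
SELECT o.order_id,
       o.customer_id,
       o.order_total
FROM   orders o
WHERE  o.region = 'EMEA'              -- single quotes around string literals
  AND  o.order_date >= '2024-01-01';  -- double-quoting these values could error
```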
Each schema we define creates one or more tables for ingestion. We can import or create up to 5 schemas per database data source. To include more, we must create another data source.
______________________________________________________________________________________
Next Steps
After successfully defining our schema, we can proceed with a few next steps to prepare the data for ingestion and analysis:
- Assign Primary and Foreign Keys: We can edit a schema to assign primary and foreign keys if they are not already defined in the source. This is crucial for creating relationships in the knowledge graph (see the helper query after this list).
- Use the Automated Workflow: We can use the automated direct data load workflow to create a Graphmart from our new data source, which automatically generates data layers and steps.
- Validate Connectivity End-to-End: Before building data layers, confirm that the Lakehouse node can run a GDI query successfully against the source. This query-based test ensures full path validation (driver + credentials + network).
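When assigning keys manually, it can help to first see which keys the source already declares. The sketch below queries the ANSI information_schema (available on PostgreSQL and SQL Server, among others; Oracle exposes the same metadata through views such as ALL_CONSTRAINTS):

```sql
-- List columns participating in declared primary keys.
SELECT tc.table_name,
       kcu.column_name
FROM   information_schema.table_constraints tc
JOIN   information_schema.key_column_usage kcu
       ON  kcu.constraint_name   = tc.constraint_name
       AND kcu.constraint_schema = tc.constraint_schema
WHERE  tc.constraint_type = 'PRIMARY KEY'
ORDER  BY tc.table_name, kcu.ordinal_position;
```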
________________________________________________________________________________________
Further Reading
https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-reqs.htm
https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-intro.htm