This document provides a practical checklist and best practices for configuring Graph Data Interface (GDI) connections in Graph Studio. It covers driver setup, network access, credentials, and validation steps to ensure reliable connectivity between Graph Lakehouse and external databases.
Terminology Alignment
Please note that product naming has evolved:
Version / Timeline | Graph Engine (Database Layer) | User Interface (UI Layer) |
--- | --- | --- |
Before v2025 | AnzoGraph (MPP graph engine) | Anzo (Graph Studio interface) |
From v2025 onwards | Graph Lakehouse (new name for AnzoGraph) | Graph Studio (continues as UI, replaces “Anzo”) |
_____________________________________________________________________________________
Contents
- Developer Checklist
- Supported JDBC Drivers and Encoding Considerations
- Network Accessibility
- Database Credentials and Permissions
______________________________________________________________________________________
Developer Checklist
- Install the correct JDBC driver on the Lakehouse node and set permissions.
- Note: These steps apply to static clusters. On Kubernetes (K8s) deployments, installed driver JARs do not persist.
- Verify firewall/DNS/network access to the database.
- Test connectivity from the Lakehouse node with a SQL client.
____________________________________________________________________________________
Introduction
The Graph Data Interface (GDI) is a tool for bringing data from external sources into our knowledge graph.
It connects to many types of data, including databases, HTTP/REST endpoints, and various file formats like CSV, JSON, XML, and Parquet.
GDI also connects with file systems like Amazon S3 and applications such as Elasticsearch and Kafka.
This article offers best practices and troubleshooting tips to help you successfully connect Graph Studio to your external relational databases.
________________________________________________________________________________________
Supported JDBC Drivers and Encoding Considerations
Before attempting to establish a connection, ensure the following requirements are met. Addressing these proactively can prevent common initial setup errors.
Supported JDBC Drivers:
- Driver Compatibility: Always use the JDBC driver version recommended by your database vendor, and ensure it is compatible with your Graph Studio/Graph Lakehouse (formerly AnzoGraph) version. Incompatible drivers are a frequent cause of connection failures.
- For platforms like Databricks, Snowflake, Postgres, and Oracle, the correct JDBC driver and a valid connection string are essential for GDI to establish a connection.
- While GDI does not use OData connectors, it can be configured to read from API or OData endpoints if they are available as HTTP/HTTPS sources. This is done using GDI's federated SPARQL queries.
- Using APIs for large-scale data ingestion and transformation may introduce performance challenges and connection timeouts.
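A malformed connection string is as common a cause of failure as a wrong driver. The sketch below, written in Python purely for illustration, assembles JDBC URLs in the formats used by the PostgreSQL driver and the Oracle thin driver (service-name form). The helper name, host names, and database names are hypothetical placeholders. Databricks and Snowflake use their own vendor-specific formats, which should be taken from the vendor's documentation.

```python
def build_jdbc_url(db_type: str, host: str, port: int, database: str) -> str:
    """Return a JDBC URL for the given database type (hypothetical helper)."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    templates = {
        # PostgreSQL JDBC driver: jdbc:postgresql://host:port/database
        "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
        # Oracle thin driver, service-name form: jdbc:oracle:thin:@//host:port/service
        "oracle": "jdbc:oracle:thin:@//{host}:{port}/{database}",
    }
    try:
        template = templates[db_type]
    except KeyError:
        raise ValueError(f"unsupported database type: {db_type}") from None
    return template.format(host=host, port=port, database=database)


if __name__ == "__main__":
    # Placeholder hosts and database names, for illustration only
    print(build_jdbc_url("postgresql", "db.example.com", 5432, "sales"))
    print(build_jdbc_url("oracle", "ora.example.com", 1521, "ORCLPDB1"))
```

Building the URL from validated parts makes typos in the host, port, or database name easier to spot before the string is pasted into a GDI query.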
Altair Graph Studio ships with some popular database drivers. However, it is the customer's responsibility to obtain and install any other necessary JDBC drivers. This is a crucial prerequisite for establishing a successful connection.
Character Encoding
- The GDI is designed to automatically pick up character encoding from metadata. This works as long as the data itself is not malformed. When issues arise, such as "????" appearing in fields, it's a sign that the automatic detection has failed.
- Best Practice: If you encounter encoding issues, first check the source data for integrity. Then ensure that character encoding settings are consistent across your database and the GDI connection string parameters. This can be a critical step when automatic detection is unsuccessful.
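To see how "????" can arise, the following Python sketch reproduces the effect outside of GDI: text is encoded to a character set that cannot represent it, and each unrepresentable character is replaced with a question mark. The sample strings and both helper functions are illustrative assumptions and are not part of GDI.

```python
def simulate_lossy_encoding(text: str, target_charset: str) -> str:
    """Encode text to target_charset, replacing unrepresentable characters with '?'."""
    return text.encode(target_charset, errors="replace").decode(target_charset)


def looks_mangled(value: str, min_run: int = 2) -> bool:
    """Heuristic: flag a run of '?' or any U+FFFD replacement character."""
    return "?" * min_run in value or "\ufffd" in value


if __name__ == "__main__":
    original = "東京"  # two characters that ASCII cannot represent
    mangled = simulate_lossy_encoding(original, "ascii")
    print(mangled, looks_mangled(mangled))
    # Latin-1 can represent 'é', so the round trip is lossless
    print(simulate_lossy_encoding("café", "latin-1"))
```

A heuristic like looks_mangled can be run over a sample of ingested values to confirm whether the damage is already present in the source data or appears only after the data passes through the connection.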
________________________________________________________________________________________
Network Accessibility
Verify that your Altair Graph Studio server has unrestricted network access (including IP routing, firewall rules, and security group configurations) to the target database server's hostname/IP and port.
Common Symptoms of Network Issues:
- Connection attempts fail with messages like "Connection refused" or "No route to host".
- The data integration environment or its services become "not accessible".
- Production URLs or endpoints are "not working".
- Intermittent disconnections occur between the data integration platform and the database.
Initial Diagnostic Steps (from the data integration server):
- Use ping <database_hostname_or_IP> to check basic network reachability. This is a common and effective initial check.
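Because ping only tests ICMP reachability, and many firewalls block ICMP while allowing database traffic (or the reverse), it can help to test the database port itself. The Python sketch below attempts a TCP connection and maps the outcome to the symptoms listed above. The function name and the returned labels are assumptions made for illustration.

```python
import errno
import socket


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and return a short diagnosis label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # Host name could not be resolved (DNS problem)
        return "dns_failure"
    except socket.timeout:
        # No answer within the timeout, often a firewall silently dropping packets
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on the port ("Connection refused")
        return "refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # Matches the "No route to host" symptom
            return "no_route"
        return f"error: {exc}"
```

For example, calling check_tcp_port("db.example.com", 5432) from the Graph Lakehouse node (the host and port here are placeholders) distinguishes a DNS problem from a blocked port or a stopped database service.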
Proxy Configuration:
- Proxies can be configured at the operating system level. This is an administrative task that affects network traffic for the entire system.
- For a more targeted approach, especially with HTTP sources, you can configure a proxy directly within your GDI queries. GDI supports an s:proxy property to specify proxy information. The value can be a string, such as s:proxy "host_url:port_number", or an RDF list.
Best Practices:
- Firewall Configuration: Collaborate closely with your network administrator to confirm that all necessary ports are open bi-directionally between the Graph Studio server and your database server. This is a very common oversight.
- Security Group Rules: In cloud environments (e.g., AWS, Azure), ensure that security group rules allow traffic on the specific database port.
- Route Verification: Confirm that there is a valid network route from your data integration platform's server to the database host.
________________________________________________________________________________________
Database Credentials and Permissions
- Ensure you have a valid database username and password with sufficient privileges to connect to the database and read data from the required schemas/tables.
- Authentication Methods: For databases that support it, you might use username-only authentication in conjunction with private key files, where the private key is managed securely on the server.
- Verify that the data integration platform handles such authentication configurations correctly, because a connection defined without a password may be handled inconsistently.
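One concrete reading of "managed securely" for a private key file is that only its owner can access it. The Python sketch below checks this on a POSIX system. The function name is hypothetical, and the owner-only rule is an assumption based on common practice rather than a documented Graph Studio requirement.

```python
import os
import stat


def key_file_is_private(path: str) -> bool:
    """Return True if path is a regular file with no group or other permissions (POSIX)."""
    info = os.stat(path)
    if not stat.S_ISREG(info.st_mode):
        return False
    # Any group or other permission bit means other users could access the key
    exposed_bits = stat.S_IRWXG | stat.S_IRWXO
    return (info.st_mode & exposed_bits) == 0
```

A key file with mode 600 passes this check, while one with mode 644 does not.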
Best Practices:
- Adhere to the principle of least privilege; provide only the necessary read permissions for the GDI user.
- If your organization has specific database connectivity guidelines, ensure your connection configuration follows them. Doing so often resolves issues before deeper troubleshooting is needed.
- Database Service Status: Confirm that the target database service is running and actively listening on the specified port.
________________________________________________________________________________________
Further Reading
https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-reqs.htm
https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-query.htm
https://docs.cambridgesemantics.com/anzo/v5.4/userdoc/gdi-generator.htm