Custom jdbc connection with file uri uses Rapidminer program directory

MaartenK
MaartenK New Altair Community Member
edited November 2024 in Community Q&A
I am trying to make a connection to a DuckDB using the DuckDB jdbc driver. In rapidminer i choose the driver manually (duckdb_jdbc-0.1.7.jar. 
I then configure a manual URI: jdbc:duckdb:/duckdb/test_de.duckdb. This is a file URI.
Rapidminer does not take this as an absolute path but prefixes the RM program directory. When I click "test connection" I get this error:

IO Error: Cannot open file "C:\Program Files\RapidMiner\RapidMiner Studio\/duckdb/test_de.duckdb": Het systeem kan het opgegeven pad niet vinden.

How can I use this driver and pass an absolute path?

Welcome!

It looks like you're new here. Sign in or register to get started.

Best Answer

  • jwpfau
    jwpfau
    Altair Employee
    edited March 2023 Answer ✓
    The DuckDB JDBC Driver isn't in great shape, here is a hack to at least read some data from a DuckDB that gives you a "GetTypeInfo" error.

    It's not suitable for production, but might help to extract some data once.

    <?xml version="1.0" encoding="UTF-8"?><process version="10.1.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="10.1.002" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="10.1.002" expanded="true" height="68" name="Retrieve DuckDB" width="90" x="45" y="34">
            <parameter key="repository_entry" value="/Connections/DuckDB"/>
          </operator>
          <operator activated="true" class="execute_script" compatibility="10.1.002" expanded="true" height="82" name="Execute Script" width="90" x="246" y="34">
            <parameter key="script" value="import com.rapidminer.connection.ConnectionInformationContainerIOObject;&#10;import com.rapidminer.extension.jdbc.tools.jdbc.StatementCreator;&#10;import com.rapidminer.extension.jdbc.tools.jdbc.PostgreSQLUtils;&#10;&#10;ConnectionInformationContainerIOObject inputData = input[0];&#10;String jdbcURL = inputData.getConnectionInformation().getConfiguration().getValue(&quot;db_config.manual_url&quot;);&#10;&#10;//String jdbcURL = &quot;jdbc:duckdb:C:\\duckdb\\test_de.duckdb&quot;;&#10;// Reuse the jdbc data types of postgres, since they mostly fit&#10;StatementCreator.CACHE.put(jdbcURL, PostgreSQLUtils.POSTGRES_DATA_TYPES);&#10;&#10;&#10;// This line returns the first input as the first output&#10;return inputData;"/>
            <parameter key="standard_imports" value="true"/>
          </operator>
          <operator activated="true" class="jdbc_connectors:read_database" compatibility="10.1.002" expanded="true" height="82" name="Read Database" width="90" x="380" y="34">
            <parameter key="define_connection" value="repository"/>
            <parameter key="connection_entry" value=""/>
            <parameter key="database_system" value="MySQL"/>
            <parameter key="define_query" value="table name"/>
            <parameter key="query" value="SELECT * FROM titanic;"/>
            <parameter key="use_default_schema" value="true"/>
            <parameter key="table_name" value="titanic"/>
            <parameter key="prepare_statement" value="false"/>
            <enumeration key="parameters"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <connect from_op="Retrieve DuckDB" from_port="output" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_op="Read Database" to_port="connection"/>
          <connect from_op="Read Database" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    

    If you need to write, make sure to set the batch size to 0, since batch is also not implemented in this driver.

    Greetings,
    Jonas

Answers

  • jwpfau
    jwpfau
    Altair Employee
    edited March 2023
    Hi Maarten,

    Since you are using Windows you would have to use a Windows style file path i.e.
    jdbc:duckdb:C:\duckdb\test_de.duckdb

    Edit: I just tried the DuckDB JDBC Driver and it's missing some required functionality:

    https://github.com/duckdb/duckdb/issues/6759

    Greetings,
    Jonas
  • MaartenK
    MaartenK New Altair Community Member
    I tried. It will give another error: GetTypeInfo with an exclamation mark. However, it also gives this error if I point to an non existing file name. But maybe the path issue is solverd and I ran into an other error.

  • jwpfau
    jwpfau
    Altair Employee
    edited March 2023 Answer ✓
    The DuckDB JDBC Driver isn't in great shape, here is a hack to at least read some data from a DuckDB that gives you a "GetTypeInfo" error.

    It's not suitable for production, but might help to extract some data once.

    <?xml version="1.0" encoding="UTF-8"?><process version="10.1.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="10.1.002" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="10.1.002" expanded="true" height="68" name="Retrieve DuckDB" width="90" x="45" y="34">
            <parameter key="repository_entry" value="/Connections/DuckDB"/>
          </operator>
          <operator activated="true" class="execute_script" compatibility="10.1.002" expanded="true" height="82" name="Execute Script" width="90" x="246" y="34">
            <parameter key="script" value="import com.rapidminer.connection.ConnectionInformationContainerIOObject;&#10;import com.rapidminer.extension.jdbc.tools.jdbc.StatementCreator;&#10;import com.rapidminer.extension.jdbc.tools.jdbc.PostgreSQLUtils;&#10;&#10;ConnectionInformationContainerIOObject inputData = input[0];&#10;String jdbcURL = inputData.getConnectionInformation().getConfiguration().getValue(&quot;db_config.manual_url&quot;);&#10;&#10;//String jdbcURL = &quot;jdbc:duckdb:C:\\duckdb\\test_de.duckdb&quot;;&#10;// Reuse the jdbc data types of postgres, since they mostly fit&#10;StatementCreator.CACHE.put(jdbcURL, PostgreSQLUtils.POSTGRES_DATA_TYPES);&#10;&#10;&#10;// This line returns the first input as the first output&#10;return inputData;"/>
            <parameter key="standard_imports" value="true"/>
          </operator>
          <operator activated="true" class="jdbc_connectors:read_database" compatibility="10.1.002" expanded="true" height="82" name="Read Database" width="90" x="380" y="34">
            <parameter key="define_connection" value="repository"/>
            <parameter key="connection_entry" value=""/>
            <parameter key="database_system" value="MySQL"/>
            <parameter key="define_query" value="table name"/>
            <parameter key="query" value="SELECT * FROM titanic;"/>
            <parameter key="use_default_schema" value="true"/>
            <parameter key="table_name" value="titanic"/>
            <parameter key="prepare_statement" value="false"/>
            <enumeration key="parameters"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <connect from_op="Retrieve DuckDB" from_port="output" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_op="Read Database" to_port="connection"/>
          <connect from_op="Read Database" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    

    If you need to write, make sure to set the batch size to 0, since batch is also not implemented in this driver.

    Greetings,
    Jonas

Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.