Google big query can't connect
Bautasten
New Altair Community Member
Hi,
I'm trying out RapidMiner 9.5 and need to connect to google BigQuery but having problems or it's because I don't know how, so I'm using the Google cloud services connection and have the in-database extension installed. I create the connection and test it and everything works fine but I don't see have big query tables, I guess they should be below the connection in the repository or need I do something else with the connection?
any ideas?
I'm trying out RapidMiner 9.5 and need to connect to google BigQuery but having problems or it's because I don't know how, so I'm using the Google cloud services connection and have the in-database extension installed. I create the connection and test it and everything works fine but I don't see have big query tables, I guess they should be below the connection in the repository or need I do something else with the connection?
any ideas?
1
Best Answers
-
Hi @Bautasten,
You will need an In Database Nest meta-operator with that Google Cloud Services connection, and then a Retrieve (In Database) inside it where you should be able to see and select your dataset and the tables in the dropdown parameters.
Let us know if you have any problem.
Peter
P.S: Yes, the connection list cannot display tables like the JDBC database connections on the Repository panel.1 -
@btibert,
the Access Scope(s) is a setting in the connection itself.
Now that you installed the In-Database Processing extension, you can choose “Google BigQuery Select ...” scope as well for this connection. A single connection can serve acces both to Google Storage and to BigQuery.
I hope this helps.
Peter0
Answers
-
Hi,
You need to use the connection, please see the documentation here: https://docs.rapidminer.com/latest/studio/connect/database/jdbc/
Regards,
Marco1 -
I don't find any operator that can use the Google cloud services connection to get the data from bigquery (only data storage), is it possible the use that connection or do you need to use a JDBC-connection, sry for the newbie questions...1
-
Hi,
My apologies, I misread the question. I have pinged someone with experience with those connections.
Regards,
Marco1 -
Hi @Bautasten,
You will need an In Database Nest meta-operator with that Google Cloud Services connection, and then a Retrieve (In Database) inside it where you should be able to see and select your dataset and the tables in the dropdown parameters.
Let us know if you have any problem.
Peter
P.S: Yes, the connection list cannot display tables like the JDBC database connections on the Repository panel.1 -
thx a lot that did the trick!
Just a follow up question, how do I supply my own sql-query to get biqquery-data into rapid miner?1 -
You can use the other In Database operators to filter, aggregate, etc. For custom query, you currently need to create a view in BigQuery that you will see listed in Retrieve after a refresh.
Best,
Peter1 -
OK, thx!1
-
I had this exact question, but I am still not certain that I follow the solution. Would it be possible for someone to show worked example in RM 9.9? Some of my python courses use Google Big Query, and I would love to include similar tasks in a course that I teach with RM.0
-
@btibert,
There is an extension that can be installed from the Marketplace, called In-Database Processing. (Not to be confused with InDatabase Extension, a different product.)
If you install this extension, you should see many new operators after a Studio restart.
The In Database Nest meta-operator allows you to use a Google Cloud Services connection (either connect it to the input port or select it using the parameters). You can put a Retrieve (In Database) operator inside the nest. This operator's schema name and table name parameters should list your datasets and tables, respectively (after the validation thread completes, see the progress bar in the bottom right corner).
Is there a point in this setup that fails in your case? E.g. if your datasets or tables don't show up, does the process (after connecting the outputs) still run if you manually enter valid, existing dataset / table combinations?
Please let me know.
Best,
Peter
1 -
Thanks for this. I am certain my Google Connection works because I am able to list my buckets and get files as expected. I didn't have the extension loaded, so I installed, but I am not able to list my datasets and tables. I attempted this via a direct connection using the Google source, as well as selecting it as a parameter without the connection.
When I click refresh to get the datasets and tables, the lower right basically blips, as if it completed very quickly, but there isn't any info in the drop down options.
The error that I am getting is:
Process failed: Insufficient Access Scope(s). Operator expects 'Google BigQuery Select and Create Tables', but the actual is 'Google Storage Read and Write'.
But I have the operator Retrieve (In Database) within the subprocess of the Nest.
0 -
@btibert,
the Access Scope(s) is a setting in the connection itself.
Now that you installed the In-Database Processing extension, you can choose “Google BigQuery Select ...” scope as well for this connection. A single connection can serve acces both to Google Storage and to BigQuery.
I hope this helps.
Peter0 -
Thanks, I am on a different machine, and it worked as expected. What is interesting is that the existing Google Auth connection on this machine required that I re-auth; that didn't happen on my other Macbook. Not implying anything, but I will report back tomorrow on the differences and/or if it's as simple as creating a new connection. I suspect that is the case given flow and two inputs as you noted above.
Will log a new ticket/case/query if there is something about the other machine. Thanks for the guidance.1 -
Hello Experts,
I am new to Rapidminer and facing similar kind of issue and looking for some help to connect to Bigquery tables.
I created new connection in RM following steps suggested in 'Using Google Cloud Service connector' article , and test connection was also successful, however, I am not able to find either dataset or any tables or views in Rapidminer. Please help with your valuable suggestions.
Update:: This is resolved under different thread. Thank you for your support.0