Creating database connections for RM Studio in AWS Cloud

MichaelWall
MichaelWall New Altair Community Member
edited November 5 in Community Q&A

Hi, we've just started running RM Studio in AWS cloud (EC2 Windows instance) but are having problems creating Orcale databse connections to both our on-premise datawarehouse and datawarehouse running in same virtual private clound (RDS Unix instance). Anybody come across this before? Any ideas on how to get around it are much appreciated? Thanks Mike

Best Answer

  • MichaelWall
    MichaelWall New Altair Community Member
    Answer ✓

    Just to close this off, I finally resolved this with some help from the Rapidminer Tech guys. There were two issues:

     

    1) The default JDBC URL string for Oracle needed some tweaking to add extra // after the @ and also to change the schema separator to /. I now have a custom Oracle database connection type set up.

    2) By default Rapidminer tries to route the connection through a proxy, but all my connections are inside a a corporate network so no proxy required. The various database instances are open to each other, so by being helpful and looking for a proxy Rapidminer was actually stopping the connections from being made.  In preferences the proxy needs to be set to Direct (no proxy).

     

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    This becomes tricky for sure if you're running AWS and want to connect back to a DB on your local network. I know with Radoop you have to create all these SOCKS proxies and tunnels to connect to AWS and I wonder if that's the same way. Maybe @phellinger can shed some light on this. 

  • MichaelWall
    MichaelWall New Altair Community Member

    Thanks, I suspected it might not be straight forward. We're on a bit of a learning curve with both AWS and RM!

  • ZSzatmari
    ZSzatmari New Altair Community Member

    Connecting back to the on-premise Oracle datawarehouse requires network access and access rights for the database user. Probably it is not allowed to connect to the datawarehouse from outside, there can be some firewall or such protection. If the IP and the port is open, then you can check if the database user has privileges to connect from outside.

     

    Connecting to an RDS instance requires proper security group settings on AWS. In the most simple cases you should add the EC2 security group of the RM instance to the RDS security group in order to grant access.

  • MichaelWall
    MichaelWall New Altair Community Member

    Thanks for the replies. I did manage to sort out the security group issues, so now the EC2 intance does have connectivity to both our Oracle database environments. I installed SQL Developer and can run queries against both environments.

     

    Unfortunately Rapidminer Studio still cannot establish oracle database connections. It gets the error 'IO Error: The Network Adapter could not establish the connection'. The issue has to be with Rapdminer Studio, as SQL Developer is able to make JDBC connections from same box using same database credentials? Is there any way of trouble shooting the network adapter on Rapidminer Studio?

     

    Thanks

     

    Mike

     

  • ZSzatmari
    ZSzatmari New Altair Community Member

    First of all I propose to check the JDBC URL diplayed by the RapidMiner Studio's "Manage Database Connection" dialog . There are a lot of fine tune settings for this that change the URL. Please compare the URL with the URL used by the SQL Developer and try to find the differences. This connection is a plain JDBC connection so there couldn't be too many architectural differences.

     

    Zoltán Sz.

  • sgenzer
    sgenzer
    Altair Employee
    Hi...I have created a MySQL database on RDS and connected it to an EC2 RM Server instance on the same VPC like you are doing. It took some troubleshooting with IAM, as well as the database settings. Can you share how you're trying to set it up (black out the credentials of course)? As Zoltan said, it should work... :)

    Scott
  • MichaelWall
    MichaelWall New Altair Community Member

    On the same Windows EC2 instance:

     

    In SQL Developer I have a basic Oracle connection (role is default) : Name: DCA_TKDAP01, Username: XXX, Password: xxx, hostname: nn.nnn.nn.nnn, port: 1521, service name: TKDAP01. The connection details appear as XXX@//nn.nnn.nn.nnn/TKDAP01. This works just fine. The same connection details even work if I switch to using SID: TKDAP01 instead of service name: TKDAP01.

     

    In Rapidminer Studio I have the default Oracle connection set as: Name: DCA_TKDAP01, Database System: Oracle, hostname: nn.nnn.nn.nnn, port: 1521, database scheme: TKDAP01, Username: XXX Password: xxx. The URL details appear as jdbc:oracle:thin:@nn.nnn.nn.nnn:1521:TKDAP01. The credentials I am entering into RM are the same as those that work in SQL Developer, same hostname, same username and password. RM produces an error 'IO Error: The Network Adapter could not establish the connection'.

     

    I've tried creating alternate Oracle database drivers in RM with tweaked URL prefix and or schema separator, but none have worked.

     

    Thanks

     

    Mike

     

  • MichaelWall
    MichaelWall New Altair Community Member
    Answer ✓

    Just to close this off, I finally resolved this with some help from the Rapidminer Tech guys. There were two issues:

     

    1) The default JDBC URL string for Oracle needed some tweaking to add extra // after the @ and also to change the schema separator to /. I now have a custom Oracle database connection type set up.

    2) By default Rapidminer tries to route the connection through a proxy, but all my connections are inside a a corporate network so no proxy required. The various database instances are open to each other, so by being helpful and looking for a proxy Rapidminer was actually stopping the connections from being made.  In preferences the proxy needs to be set to Direct (no proxy).

     

  • DocMusher
    DocMusher New Altair Community Member
    Hi RM friends,
    Using command line terminal, I could get in contact with a database on ec2 aws. 
    Next, 

    ubuntu@ip-XXX-XX-XX-XXX:~$ psql -U ubuntu -d postgres

    psql (10.1)

    Type "help" for help.

    postgres=> help

    You are using psql, the command-line interface to PostgreSQL.

    Type:  \copyright for distribution terms

           \h for help with SQL commands

           \? for help with psql commands

           \g or terminate with semicolon to execute query

           \q to quit

    postgres=> \?

    My question, how can I connect this postgres db to my local RM studio considering the fact that I need to login for AWS with key (pem), followed by ssh -i etc. 

    To make it short, a db on ec2 aws; how can I access the database to be able to use it on my local RM studio?

    Cheers

    Sven

  • sgenzer
    sgenzer
    Altair Employee
    this is a question for @BalazsBarany!
  • BalazsBarany
    BalazsBarany New Altair Community Member
    Hi,

    basically what you need is a permanent or semi-permanent connection between your PC and your AWS machine. You can do it with SSH port forwarding, maybe OpenVPN or by exposing the PostgreSQL service on the public IP of your cloud server (not recommended).

    If you go the SSH route, you need a port forwarding of port 5432 to localhost:5432.
    ssh -L 5432:localhost:5432 yourawsmachine

    Then you can configure your RapidMiner Studio to access the database on localhost and port 5432. The SSH "tunnel" needs to be running when you use Studio.

    If you get errors about the user not being in pg_hba.conf or something similar, then you need to edit /etc/postgresql/10/main/pg_hba.conf (or similar) to allow connections from localhost with the authentication you are using (probably password, so the entry would be md5 or scram-sha256). But the default configuration should let you in.

    Regards,

    Balázs
  • DocMusher
    DocMusher New Altair Community Member
    Thank you Balazs, I give my feedback how I progress.
    cheers
    Sven
  • DocMusher
    DocMusher New Altair Community Member
    edited December 2018
    Dear Balázs,
    I have to admit, it is too complex for me. As scientist, the context is not abnormal and might provide a solution for the future in other domains. I notice a huge explosion of databases integrated in daily clinical practice as eg genomics (http://www.cbioportal.org/). As such my request seems not that extraterrestrial. 

    The setting

    The database (MIMIC-III) is available on AWS.
    My amazon ID has been added to the AWS EC2 AMI.
    The current zone of this AMI is US-WEST-2 (Oregon).
    There is a DB user: mimicuser
    Password: xxxxxxx
    The ubuntu password is yyyyyyy

    The ubuntu user has a password because there are a set of tools that can be used with it. The user by default is “locked”, meaning that the password cannot be sued to authenticate via ssh.
    The available tools that use the ubuntu password are:
    https://<IP>/rstudio/
    https://<IP>/jupyter/
    There is also the Querybuilder tool without any limitations running in this server.
    https://<IP>/

    To access these websites you have to add the port 443 to the network security group rules.

    Is there any manual with the steps needed to arrange the connection in order my local RM studio could be used study the db. 
    Hoping to progress I send you my greetings
    Sven