RapidMiner: Many-to-many

Hi everyone,
I am new here and I just started using RapidMiner.
I want to use it for my research in chemistry, but I already have a few problems setting up my data correctly for RapidMiner, since my dataset does not really look like the examples I find out there.

To make things easier, let's just say I am searching for matching substance pairs:
-> I have around 60 substances.
-> Theoretically, every substance can match every other substance.
-> Practically, a few are known to match, others are known not to match, and some are a "maybe".
-> Every substance has a lot of different properties (> 10), all of which I know (for example color, smell, molecular weight, ...).

When I now want to create a dataset, I have an "n:m" problem, which in a classical database would be solved with tables where the individual matches are linked by IDs.

Is there a way in RapidMiner to link my matches from two identical tables? Or should I think about a way to express the many-to-many relationship in one table? If so, any ideas how to do that?

Thanks for your help in advance!
Cheers,
Dennis

Accepted answers

MartinLiebig

Hi,

I am not sure what you ultimately want to do.

Don't start with too much database thinking. While in databases you want to go for your star schema, in data mining you need to generate a "one line per recipe/customer/..." representation to start mining.
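
As a rough illustration of that "one line per ..." shape for the substance-pair case, here is a minimal sketch in plain Python rather than a RapidMiner process. The property values are invented, and the names `substances`, `known` and `pair_rows` are hypothetical. It assumes that a match is symmetric, so each pair appears once.

```python
from itertools import combinations

# Hypothetical substance table: id -> properties (values invented for illustration).
substances = {
    1: {"color": "red", "molar_weight": 120.0},
    2: {"color": "blue", "molar_weight": 85.5},
    3: {"color": "red", "molar_weight": 240.1},
    4: {"color": "green", "molar_weight": 60.2},
}

# Hypothetical link table: known outcome per pair, keyed by (smaller id, larger id).
known = {(1, 2): "match", (1, 3): "no match", (2, 4): "maybe"}


def pair_rows(substances, known):
    """Build one flat row per unordered pair of substances."""
    rows = []
    for a, b in combinations(sorted(substances), 2):
        row = {"id_a": a, "id_b": b}
        # Copy the properties of both substances, suffixed so they do not collide.
        for name, value in substances[a].items():
            row[f"{name}_a"] = value
        for name, value in substances[b].items():
            row[f"{name}_b"] = value
        # None means the outcome for this pair is not known yet.
        row["label"] = known.get((a, b))
        rows.append(row)
    return rows


rows = pair_rows(substances, known)
labelled = [r for r in rows if r["label"] is not None]    # usable for training
unlabelled = [r for r in rows if r["label"] is None]      # pairs to predict
```

The rows in `labelled` would serve as training examples, and the rows in `unlabelled` (for instance the pair 3 and 4) are the ones a model would be asked to predict.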

Best,

Martin

All comments

MartinLiebig

Hi,

It feels like you just want to use a Join operator?

BR,
Martin

dennis_enkelman

Hm,
I'm not sure if I am right, but I understood the Join operator as "merging" two tables into one.

What I want to do is to bring the information about couples into RapidMiner. After that, the software should learn to predict whether two different substances fit together or not.

I would understand how to use "join" in a one-to-many relationship, but not in my case.

Do you still think "join" is the right operator here? I am unsure whether I just don't understand it correctly or whether it does not work. Sorry for my inexperience.

Attached you will find an example dataset of two data tables and one table combining the primary keys of the couples. Let's say there is a missing couple between 4 and 3. If RapidMiner learned the (hypothetical) connections between color and molar weight, it could predict the missing couple. (Don't think about the content here, I am just trying to keep it simple.)

Thanks!
Dennis

Präsentation1.jpg

dennis_enkelman

I mean, I could join every combination into one single row, but this would create huge amounts of data. With 60 substances I would have 3,600 rows, with 100 already 10,000. Later it might be necessary to also include groups of three substances that "fit", which would give up to 1,000,000 rows.
In databases it's so easy to link such relationships via a many-to-many relation. I was hoping there is something similar in RM.
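
For reference, a small sketch of that arithmetic in plain Python. The figures above count ordered combinations including a substance paired with itself (n^k). If one assumes that a match is symmetric and that a substance is never paired with itself (an assumption, not something given in the data), the count drops to "n choose k". The function names are only illustrative.

```python
from math import comb


def ordered_tuples(n, k):
    """Rows if every ordered combination counts, including repeats: n ** k."""
    return n ** k


def unordered_groups(n, k):
    """Rows if order does not matter and no substance repeats: n choose k."""
    return comb(n, k)


for n, k in [(60, 2), (100, 2), (100, 3)]:
    print(n, k, ordered_tuples(n, k), unordered_groups(n, k))
# 60 2 3600 1770
# 100 2 10000 4950
# 100 3 1000000 161700
```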

Cheers,
Dennis

MartinLiebig

Hi,

I am not sure what you ultimately want to do.

Best,

Martin

dennis_enkelman

Yeah, that's it. I can picture my problem in the star schema and was hoping to import it into RapidMiner as it is. Transferring it into this "one line" representation increases the amount of data immensely. But you are probably right: it might also be a good thing to keep it simple by increasing the number of rows and not thinking in database dimensions.

I will try that! Thanks for your advice.

Cheers,
Dennis