"For each XLS row, calculate similarity among the 3 text cells in that row"

dfischer (New Altair Community Member), edited November 5 in Community Q&A
Hi everyone,

I would appreciate any thoughts on how I could solve the problem below:

INPUT: An Excel file with multiple rows and 3 columns (say, columns A, B and C). All cells contain text.

PROBLEM: For each row, calculate the pairwise similarity among the 3 text cells in that row. Then save the calculated similarities.

Example:

If Sim(x,y) is the text similarity between any two cells x and y in the Excel file, an ideal output would be another Excel file in the following format:

Sim(A1,B1) Sim(A1,C1) Sim(B1,C1)
Sim(A2,B2) Sim(A2,C2) Sim(B2,C2)
Sim(A3,B3) Sim(A3,C3) Sim(B3,C3)
Sim(A4,B4) Sim(A4,C4) Sim(B4,C4)
Sim(A5,B5) Sim(A5,C5) Sim(B5,C5)
...
Sim(An,Bn) Sim(An,Cn) Sim(Bn,Cn)

I've watched a number of RapidMiner videos to learn how to do this, but haven't succeeded yet.

Any ideas? Since I am still learning the basics, I would appreciate it if you could describe what the entire process looks like.

Thank you in advance.

Answers

  • MartinLiebig (Altair Employee)
    Hi,

    The operator you might need is Cross Distances. It calculates similarity, but usually between documents that are given as examples (rows). So I think you need to use a Loop and a Transpose (or De-Pivot?) operator to get a vertical example set for each iteration.
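    To illustrate the idea outside RapidMiner, here is a rough Python analogue, not the RapidMiner operators themselves: each row is "transposed" into three small documents, an all-pairs similarity matrix is computed over them (the role Cross Distances plays), and the three upper-triangle entries are kept. Cosine similarity over raw word counts is an assumption (in RapidMiner the word vectors would typically come from the Text Processing extension), and `cosine`, `cross_similarity` and `process` are hypothetical names.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two word-count vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(n * n for n in u.values())) * math.sqrt(sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0  # an empty cell gets similarity 0


def cross_similarity(docs: list[str]) -> list[list[float]]:
    """All-pairs similarity matrix over a small set of documents (the 'cross' step)."""
    vectors = [Counter(d.lower().split()) for d in docs]
    return [[cosine(u, v) for v in vectors] for u in vectors]


def process(rows: list[list[str]]) -> list[list[float]]:
    """Loop over rows; each row is 'transposed' into 3 documents and compared pairwise."""
    out = []
    for row in rows:
        docs = (list(row) + ["", "", ""])[:3]  # the 3 cells become 3 examples (the vertical set)
        m = cross_similarity(docs)
        out.append([m[0][1], m[0][2], m[1][2]])  # keep (A,B), (A,C), (B,C)
    return out


if __name__ == "__main__":
    print(process([["the cat sat", "the cat ran", "a dog barked"]]))
```

    Building the full 3x3 matrix is wasteful (only 3 of its 9 entries are used) but mirrors the loop-then-cross-compare structure described above. Weighting words with TF-IDF over the whole file instead of raw counts may give more meaningful scores.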

    If you could post an example set, I or another helper might find time to build an example process.

    cheers,
    Martin