Identify similar strings of only one attribute

aandreal New Altair Community Member
edited November 5 in Community Q&A
Hello,
I would like to measure the degree of similarity between strings that all belong to a single attribute of type text. The strings list the tests performed in the hospital, in the form exam_a;exam_b;exam_c. I would also like to identify strings that contain the same elements in a different order, e.g. exam_c;exam_b;exam_a.

Please help me.
Thanks
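Outside RapidMiner, the set-based idea behind this question can be sketched in plain Python. This is a minimal sketch, assuming each value is a ";"-separated string and that a missing value is None or an empty string. The function names (exam_set, same_exams, jaccard, cosine) are hypothetical. Treating each string as a set makes exam_a;exam_b;exam_c and exam_c;exam_b;exam_a identical. It also lets strings of different lengths be compared directly, without padding, and it drops duplicate exams. The cosine variant is the cosine similarity of one-hot (0/1) vectors, computed directly from the sets.

```python
import math
from collections import defaultdict
from typing import Optional


def exam_set(value: Optional[str]) -> frozenset:
    """Split 'exam_a;exam_b;exam_c' into a set of exams; missing/empty -> empty set."""
    if not value:
        return frozenset()
    return frozenset(p.strip() for p in value.split(";") if p.strip())


def same_exams(values: list[Optional[str]]) -> list[list[int]]:
    """Group row indices whose strings contain exactly the same exams, in any order."""
    groups: dict[frozenset, list[int]] = defaultdict(list)
    for i, value in enumerate(values):
        groups[exam_set(value)].append(i)
    return [rows for rows in groups.values() if len(rows) > 1]


def jaccard(a: Optional[str], b: Optional[str]) -> float:
    """Jaccard index |A & B| / |A | B| on the exam sets."""
    sa, sb = exam_set(a), exam_set(b)
    if not sa or not sb:
        # Assumption: two missing values count as identical, one missing value as 0.
        return 1.0 if sa == sb else 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: Optional[str], b: Optional[str]) -> float:
    """Cosine similarity of the one-hot (0/1) vectors, computed from the sets:
    the dot product is |A & B| and each vector norm is sqrt(|A|)."""
    sa, sb = exam_set(a), exam_set(b)
    if not sa or not sb:
        # Same assumption for missing values as in jaccard().
        return 1.0 if sa == sb else 0.0
    return len(sa & sb) / math.sqrt(len(sa) * len(sb))


values = ["exam_a;exam_b;exam_c", "exam_c;exam_b;exam_a", "exam_a", None]
print(same_exams(values))             # [[0, 1]]
print(jaccard(values[0], values[1]))  # 1.0
print(jaccard(values[2], values[0]))  # 1/3 ≈ 0.333
print(cosine(values[2], values[0]))   # 1/sqrt(3) ≈ 0.577
```

For a one-exam string compared with a four-exam string that contains it, the Jaccard index is 0.25 while the cosine similarity is 0.5, so Jaccard penalizes differences in length more strongly.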

Answers

  • aandreal
    aandreal New Altair Community Member
    Hi @mschmitz,
    thanks for the reply.

    It helps, but it is not quite what I want to do. Some of my strings have length 1 and others length 20 (depending on the number of exams), and some values are missing. I considered the Jaccard index, working on the values separated by ";", but what I get is a word-by-word (position-by-position) comparison: after splitting, the shorter strings are padded with missing values to match the length of the longer one. I would like to think in terms of sets instead, comparing the words of one string with the words of a second string. What do you think? How could I do it?
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,
    Yes, the Jaccard index or the cosine similarity of one-hot encoded values may also be viable candidates for a solution.

    Best,
    Martin
  • aandreal
    aandreal New Altair Community Member
    Thanks @mschmitz
    I think I found the solution with Jaccard. However, before applying it, I would like to sort the data. To do this, I transpose the table and then sort the columns. I have a problem with the transpose: after applying the operator, only the column of type ID is shown, and I cannot find any of the other columns I need. Why?
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,
    You can still type in the names; they do exist.

    From the header (metadata) information alone, we cannot know which columns will be created after transposing the table. That's why, sadly, we cannot display them.

    Best,
    Martin
  • aandreal
    aandreal New Altair Community Member
    edited November 2020
    OMG, @mschmitz.

    So if I have 1158 attributes, do I have to do 1158 sorts? My idea was to use a Loop.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Of course, you use loops and macros for it. Nobody wants to do things 1158 times.
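The transpose, sort-each-column, transpose-back round trip discussed here can be sketched in plain Python (this is not a RapidMiner process). The sort runs in a single loop over all columns, so 1158 attributes need no more code than 3. The row layout is an assumption based on the description above: each exam string is split on ";" and padded with missing values (None). The helper names are hypothetical. The round trip turns out to be equivalent to sorting the elements within each original row, so the transposes can be skipped. A set-based Jaccard index does not depend on element order at all, so sorting only matters for position-by-position comparisons.

```python
from typing import Optional

Cell = Optional[str]


def transpose(table: list[list[Cell]]) -> list[list[Cell]]:
    """Swap rows and columns (all rows must have the same length)."""
    return [list(column) for column in zip(*table)]


def missing_last(value: Cell) -> tuple[bool, str]:
    """Sort key: exam names alphabetically, missing values (None) at the end."""
    return (value is None, value or "")


def sort_every_column(table: list[list[Cell]]) -> list[list[Cell]]:
    """Loop over all columns once, however many there are, and sort each one."""
    sorted_columns = [sorted(column, key=missing_last) for column in transpose(table)]
    return transpose(sorted_columns)


# Hypothetical data: each row is one exam string, split on ";" and padded with None.
rows = [
    ["exam_c", "exam_b", "exam_a"],
    ["exam_b", None, "exam_a"],
]

# Transpose, sort every column, transpose back...
round_trip = transpose(sort_every_column(transpose(rows)))
# ...gives the same result as sorting the elements within each original row.
assert round_trip == [sorted(row, key=missing_last) for row in rows]
print(round_trip)  # [['exam_a', 'exam_b', 'exam_c'], ['exam_a', 'exam_b', None]]
```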