Identify similar strings of only one attribute
aandreal
New Altair Community Member
Hello,
I would like to identify a degree of similarity between strings all belonging to a single attribute of type text. The reason is that I have strings that present tests performed in the hospital in the form: exam_a;exam_b;exam_c. I would also like to identify when they occur in different order but always with the same elements: exam_c;exam_b;exam_a.
Please help me.
Thanks
Best Answer
-
Hi,
Have a look at the Fuzzy Matching and Generate Levenshtein Distance operators in the Operator Toolbox extension. I think what you want to do is replace the ; with a space and then run a fuzzy matching with TOKEN_SET_RATIO or a similar measure.
Cheers,
Martin
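To illustrate the idea outside RapidMiner, here is a minimal Python sketch of an order-insensitive string comparison. It is not the implementation behind the Fuzzy Matching operator or TOKEN_SET_RATIO; it only assumes that tokens are separated by ; and uses the standard library's difflib as a stand-in similarity measure. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def tokens(value, sep=";"):
    """Split an exam string into a set of cleaned, non-empty tokens."""
    return {t.strip() for t in value.split(sep) if t.strip()}


def token_set_similarity(a, b, sep=";"):
    """Order-insensitive similarity in [0, 1].

    Tokens are de-duplicated and sorted, joined with spaces, and the
    two normalized strings are compared character by character.
    """
    left = " ".join(sorted(tokens(a, sep)))
    right = " ".join(sorted(tokens(b, sep)))
    return SequenceMatcher(None, left, right).ratio()


# Same exams in a different order compare as identical.
print(token_set_similarity("exam_a;exam_b;exam_c", "exam_c;exam_b;exam_a"))  # 1.0
```

Because the tokens are sorted before the comparison, the order in which the exams were recorded no longer affects the score.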
Answers
-
Hi @mschmitz, thanks for the reply.
It helps, but it is not quite what I want to do. My strings contain anywhere from 1 to 20 exams, depending on how many were performed, and some values are missing. I considered the Jaccard index on the values split at ; but what I get is a position-by-position comparison of words (after splitting, the shorter strings are padded with missing values to match the longest one). I would rather think in terms of sets and compare the words of one string with the words of a second string. What do you think? How could I do it?
-
Hi,
The Jaccard index or the cosine similarity of one-hot encoded values may also be viable candidates for a solution, yes.
Best,
Martin
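As a rough sketch of the set-based view discussed here, the following Python computes the Jaccard index and the cosine similarity of one-hot encoded exam sets. It assumes ; as the separator and represents a missing value as None; returning None when a similarity is undefined is a design choice of this sketch, not something prescribed in the thread.

```python
import math


def exam_set(value, sep=";"):
    """Turn 'exam_a;exam_b' into {'exam_a', 'exam_b'}; missing -> empty set."""
    if value is None:
        return set()
    return {t.strip() for t in value.split(sep) if t.strip()}


def jaccard(a, b):
    """|A & B| / |A | B|, or None when both values are missing."""
    sa, sb = exam_set(a), exam_set(b)
    union = sa | sb
    if not union:
        return None
    return len(sa & sb) / len(union)


def cosine_one_hot(a, b):
    """Cosine similarity of the one-hot vectors of two exam sets.

    For binary vectors this reduces to |A & B| / sqrt(|A| * |B|).
    Returns None when either value is missing.
    """
    sa, sb = exam_set(a), exam_set(b)
    if not sa or not sb:
        return None
    return len(sa & sb) / math.sqrt(len(sa) * len(sb))


print(jaccard("exam_a;exam_b;exam_c", "exam_c;exam_b;exam_a"))  # 1.0
print(jaccard("exam_a", "exam_a;exam_b"))                       # 0.5
```

Since the comparison works on sets, strings with 1 exam and strings with 20 exams can be compared directly, with no padding and no dependence on order.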
-
Thanks @mschmitz
I think I found the solution with Jaccard. However, before applying it, I would like to sort the data. To do this I am transposing and then sorting the columns. I have a problem with the transpose: applying the operator I am shown only the column of type ID and I cannot find all the other necessary columns. Why?
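For comparison, the sorting step can also be done per row without transposing. The Python sketch below is only an illustration of that idea, not a RapidMiner process: it sorts the exams inside each string to build a canonical key and groups the rows that share it. The row IDs and function names are hypothetical.

```python
from collections import defaultdict


def canonical(value, sep=";"):
    """Sort and de-duplicate the exams so that order no longer matters."""
    return sep.join(sorted({t.strip() for t in value.split(sep) if t.strip()}))


def group_by_exam_set(rows):
    """Map each canonical exam string to the IDs of the rows that contain it.

    rows is an iterable of (row_id, exam_string) pairs.
    """
    groups = defaultdict(list)
    for row_id, value in rows:
        groups[canonical(value)].append(row_id)
    return dict(groups)


rows = [
    (1, "exam_a;exam_b;exam_c"),
    (2, "exam_c;exam_b;exam_a"),
    (3, "exam_a"),
]
print(group_by_exam_set(rows))
# {'exam_a;exam_b;exam_c': [1, 2], 'exam_a': [3]}
```

Rows 1 and 2 end up in the same group because they contain the same exams in a different order.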
-
Hi,
You can still type in the names; they do exist. We cannot tell from the header information which columns will be created by transposing the table. That's why we sadly cannot display them.
Best,
Martin
-
Of course, you use loops and macros for it. Nobody wants to do things 1158 times.