Grouping profile strings that contain the same words in a different order (Python)

Robertd New Altair Community Member
edited November 5 in Community Q&A

I have a data frame containing a column of profile types, which looks like this:

0  Android Java
1  Software Development Developer
2  Full-stack Developer
3  JavaScript Frontend Design
4  Android iOS JavaScript
5  Ruby JavaScript PHP

I've used NLP to fuzzy match similar profiles, which returned the following similarity dataframe:

    left_side                    right_side                   similarity
7   JavaScript Frontend Design   Design JavaScript Frontend   0.849943
8   JavaScript Frontend Design   Frontend Design JavaScript   0.814599
9   JavaScript Frontend Design   JavaScript Frontend          0.808010
10  JavaScript Frontend Design   Frontend JavaScript Design   0.802881
12  Android iOS JavaScript       Android iOS Java             0.925126
15  Machine Learning Engineer    Machine Learning Developer   0.839165
21  Android Developer Developer  Android Developer            0.872646
25  Design Marketing Testing     Design Marketing             0.817195
28  Quality Assurance            Quality Assurance Developer  0.948010

While this has helped, taking me from 478 unique profiles down to 461, what I want to focus on are profiles like this:

Frontend Design JavaScript  Design Frontend JavaScript

The only tool I've seen that seems to address this problem is difflib. My question is: what other techniques are available to go through these profiles and standardize the ones that consist of the same words in a different order into one standard string? For example, the desired output would take any string containing "Design", "Frontend" and "JavaScript" and replace it with "Design Frontend JavaScript".
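One common technique, sketched here under the assumption that each profile is a run of whitespace-separated words, is to build a canonical key by sorting each profile's words and then treat profiles with equal keys as the same profile. Sorting alphabetically (case-insensitively) happens to produce the desired "Design Frontend JavaScript". `canonical_profile` and `group_by_words` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict


def canonical_profile(profile: str) -> str:
    """Return the profile's words in a fixed order so word order no longer matters."""
    # str.split() with no argument splits on any run of whitespace.
    # Sort case-insensitively ("Android iOS JavaScript" stays as-is);
    # the second tuple element breaks ties between words differing only in case.
    words = sorted(profile.split(), key=lambda w: (w.casefold(), w))
    return " ".join(words)


def group_by_words(profiles):
    """Map each canonical form to the original spellings that share it."""
    groups = defaultdict(list)
    for p in profiles:
        groups[canonical_profile(p)].append(p)
    return dict(groups)


profiles = [
    "JavaScript Frontend Design",
    "Design JavaScript Frontend",
    "Frontend Design JavaScript",
    "Java Python Data Science",
    "JavaScript Python Data Science",
]

# Every permutation of the same words collapses to one standard string,
# while "Java ..." and "JavaScript ..." stay separate because the words differ.
standardized = [canonical_profile(p) for p in profiles]
groups = group_by_words(profiles)
```

With pandas, the helper could be applied to a column via `Series.map`, e.g. `df["profile"].map(canonical_profile)`, where `"profile"` stands in for whatever the real column is called. Sorting keeps repeated words, so "Android Developer Developer" stays distinct from "Android Developer"; sorting `set(profile.split())` instead would collapse duplicates if that is preferred.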

Right now, I'm merging my original dataframe with the similarity dataframe to replace every occurrence of a right_side profile string with its left_side counterpart. The problem is that this also replaces profiles that differ by a word, such as the row below, where the right_side "Java Python Data Science" gets replaced with the left_side "JavaScript Python Data Science":

53  JavaScript Python Data Science  Java Python Data Science

Any help would be greatly appreciated!
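If the fuzzy-match pairs are kept, a minimal guard for the merge described above (assuming whitespace-separated words) is to accept a pair for replacement only when both sides contain exactly the same words. Comparing `collections.Counter` word counts rejects "JavaScript Python Data Science" vs "Java Python Data Science", because "Java" and "JavaScript" are different words even though they are close at the character level. `same_words`, `pairs`, and `replacements` are hypothetical names for illustration.

```python
from collections import Counter


def same_words(left: str, right: str) -> bool:
    """True when both profiles contain exactly the same words, in any order."""
    # Counter compares word multisets, so repeated words must match too:
    # "Android Developer Developer" is not the same as "Android Developer".
    return Counter(left.split()) == Counter(right.split())


# (left_side, right_side) pairs taken from the question; similarity scores omitted.
pairs = [
    ("JavaScript Frontend Design", "Design JavaScript Frontend"),
    ("JavaScript Frontend Design", "JavaScript Frontend"),
    ("Android iOS JavaScript", "Android iOS Java"),
    ("JavaScript Python Data Science", "Java Python Data Science"),
]

# Keep only permutation pairs; everything else is left untouched.
safe_pairs = [(left, right) for left, right in pairs if same_words(left, right)]

# Map each right_side spelling to its left_side counterpart.
replacements = {right: left for left, right in safe_pairs}

profiles = ["Design JavaScript Frontend", "Java Python Data Science"]
merged = [replacements.get(p, p) for p in profiles]
```

On the similarity dataframe itself, the same test could serve as a boolean mask, e.g. `sim_df[[same_words(l, r) for l, r in zip(sim_df["left_side"], sim_df["right_side"])]]`, where `sim_df` is a placeholder name for that dataframe.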
______________________

Answers