"[SOLVED] Two near words"

zahrahnnx New Altair Community Member
edited November 5 in Community Q&A
Hi everyone

I have an Excel file with 20 rows. Each row contains a description related to business analysis.
The words "problem" and "solving" are among the most common words, but they may appear in a different order in each document, e.g. "solving the problems", "problem solving skills", "solving technical problems", etc.

I want to capture all of these combinations of "problem" and "solving" in one attribute. For example, I'll add an attribute called "problem-solving". If a document includes the words "problem" and "solving" next to each other or with 1~4 words in between, the value of the attribute "problem-solving" is set to 1, otherwise 0.

I did a similar thing for "Database"-related words: if a document contains sql or mysql, the value of "Database" is 1. That works, but I don't know how to do it when there are two words.

Please let me know if you have any ideas. Thanks
Zahrahnnx

Answers

  • MartinLiebig
    Altair Employee
    Hi,

    My first idea would be to generate n-grams and then use Select Attributes for "problem" and "solving". Maybe use Generate Aggregation afterwards.

    Cheers,
    Martin
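
For anyone who wants to see the n-gram idea outside RapidMiner, here is a minimal Python sketch. It is not the RapidMiner process itself: the function names `ngrams` and `has_near_pair`, the prefix matching (so that "problems" counts as "problem"), and the window size of 6 tokens (the two target words plus up to 4 words in between) are all assumptions made for illustration.

```python
import re


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def has_near_pair(text, first="problem", second="solving", max_gap=4):
    """Return 1 if both words occur within one n-gram, else 0.

    Assumption: a token counts as a hit when it starts with the target
    word, so "problems" matches "problem" (and so would "problematic").
    """
    tokens = re.findall(r"\w+", text.lower())
    n = max_gap + 2  # two target words plus the allowed words in between
    # Documents shorter than n tokens are treated as a single window.
    windows = ngrams(tokens, n) if len(tokens) >= n else [tuple(tokens)]
    for window in windows:
        has_first = any(tok.startswith(first) for tok in window)
        has_second = any(tok.startswith(second) for tok in window)
        if has_first and has_second:
            return 1
    return 0


if __name__ == "__main__":
    for doc in ["solving the problems", "problem solving skills", "good SQL skills"]:
        print(doc, "->", has_near_pair(doc))
```

The order of the two words does not matter here, because the check only asks whether both appear in the same window.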
  • zahrahnnx
    zahrahnnx New Altair Community Member
    Thanks for the response. Yes, n-grams work :)
    I also came up with another solution. I'll share it in case someone faces the same problem.

    Use the "Extract Information" operator inside "Process Documents from Data", with the following regular expression in "Extract Information":
    (problem\W+(?:\w+\W+){0,5}?solving)|(solving\W+(?:\w+\W+){0,5}?problem)

    It adds a new attribute, which I called "Problem_Solving". Then, in the main process, I used the "Select Attributes" operator to check "Problem_Solving".

    Both ways work ;)
    Thanks again
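
The regular expression posted above can also be checked quickly in Python. This is only a sketch: the function name `problem_solving_flag` and the case-insensitive flag are assumptions, and RapidMiner uses Java regular expressions, although for plain ASCII text this particular pattern should behave the same way as far as I can tell.

```python
import re

# Pattern copied from the post above; re.IGNORECASE is an added assumption.
PATTERN = re.compile(
    r"(problem\W+(?:\w+\W+){0,5}?solving)|(solving\W+(?:\w+\W+){0,5}?problem)",
    re.IGNORECASE,
)


def problem_solving_flag(text):
    """Return 1 if the two words occur near each other in either order, else 0."""
    return 1 if PATTERN.search(text) else 0


if __name__ == "__main__":
    docs = [
        "solving the problems",
        "problem solving skills",
        "solving technical problems",
        "good SQL skills",
    ]
    for doc in docs:
        print(doc, "->", problem_solving_flag(doc))
```

Note that `{0,5}` allows up to five words in between. The original question asked for at most four, which would be `{0,4}`.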
  • MartinLiebig
    Altair Employee
    Hi,

    I think I like your idea a bit more. It seems to be a bit faster :)

    Thanks for the message!

    Martin