"[SOLVED] Two near words"
zahrahnnx
New Altair Community Member
Hi everyone,
I have an Excel file with 20 rows. Each row contains a description related to business analysis.
The words "problem" and "solving" are among the most common words, but they can appear in a different order in each document, e.g. "solving the problems", "problem solving skills", "solving technical problems", etc.
I want to capture all of these combinations of "problem" and "solving" in one attribute. For example, I'll add an attribute called "problem-solving". If a document includes the words "problem" and "solving" next to each other or with 1~4 words in between, the value of the attribute "problem-solving" is set to 1, otherwise 0.
I did a similar thing for "Database"-related words, e.g. if a document contains sql or mysql, the value of "Database" will be 1. That works, but I don't know how to do it when there are two words.
Please let me know if you have any ideas. Thanks
Zahrahnnx
Answers
Hi,
my first idea would be to generate n-grams and then use Select Attributes for "problem" and "solving". Maybe use Generate Aggregation afterwards.
Cheers,
Martin
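For readers who want to try the n-gram idea outside RapidMiner, here is a minimal Python sketch. It is not the RapidMiner process itself: the helper names (`tokens`, `ngrams`, `has_problem_solving_ngram`), the `max_n` parameter, and the choice to accept any word starting with "problem" (so that "problems" counts) are all assumptions made for illustration.

```python
import re

def tokens(text):
    # Lowercase the text and keep alphabetic words only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())

def ngrams(toks, n):
    # All runs of n consecutive tokens, as tuples.
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def has_problem_solving_ngram(text, max_n=3):
    # Return 1 if some n-gram (2 <= n <= max_n) starts with one of the two
    # words and ends with the other, in either order; otherwise 0.
    toks = tokens(text)
    for n in range(2, max_n + 1):
        for gram in ngrams(toks, n):
            first, last = gram[0], gram[-1]
            if first.startswith("problem") and last == "solving":
                return 1
            if first == "solving" and last.startswith("problem"):
                return 1
    return 0

examples = [
    "solving the problems",
    "problem solving skills",
    "good sql and reporting skills",
]
flags = [has_problem_solving_ngram(doc) for doc in examples]
```

With `max_n=3` at most one word may sit between the two terms; a larger `max_n` widens the allowed gap at the cost of generating more n-grams.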
Thanks for the response. Yes, the n-gram approach works.
Martin Schmitz wrote:
Hi,
my first idea would be to generate n-grams and then use Select Attributes for "problem" and "solving". Maybe use Generate Aggregation afterwards.
Cheers,
Martin
I also came up with another solution. I'll share it in case someone else faces the same problem.
Use the "Extract Information" operator inside "Process Documents from Data", and then use the regular expression below in "Extract Information":
(problem\W+(?:\w+\W+){0,5}?solving)|(solving\W+(?:\w+\W+){0,5}?problem)
It adds a new attribute, which I called "Problem_Solving". Then, in the main process, I used the "Select Attributes" operator to check "Problem_Solving".
Both ways work.
Thanks again
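The regular expression above can also be checked outside RapidMiner. The following is a small Python sketch, not the RapidMiner process: the function name `problem_solving_flag` and the sample documents are made up for illustration, and the case-insensitive flag is an added assumption that the original post does not mention.

```python
import re

# Pattern copied from the post: "problem" and "solving" in either order,
# with at most five other words in between.
PATTERN = re.compile(
    r"(problem\W+(?:\w+\W+){0,5}?solving)|(solving\W+(?:\w+\W+){0,5}?problem)",
    re.IGNORECASE,  # assumption: ignore case; not stated in the post
)

def problem_solving_flag(text):
    # 1 if the pattern occurs anywhere in the text, otherwise 0.
    return 1 if PATTERN.search(text) else 0

docs = [
    "solving the problems",
    "problem solving skills",
    "solving technical problems",
    "experience with sql and mysql",
]
flags = [problem_solving_flag(doc) for doc in docs]
```

Two details of the pattern are worth knowing. The quantifier `{0,5}` allows zero to five words in between, so `{0,4}` would match the "1~4 words in between" limit from the question more closely. Also, the first alternative expects a non-word character directly after "problem", so the plural in "problems solving" is not matched; writing `problems?` would cover it.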
Hi,
I think I like your idea a bit more. It seems to be a bit faster.
Thanks for the message!
Martin