Hi everyone,
I need some serious help. I have been working on an Excel file with just one column. It contains data coming from physicians' offices: a string of free text that the doctor writes down when examining a patient. This column holds the diagnosis information. I need to create a model to give this data some structure.
I am specifically trying to identify all conditions that are related to migraine. The way I am doing it in Microsoft Excel is by using the "if, error, search" functions to sniff out the keywords from the table. I need two kinds of keywords:
Includes: i.e., all keywords that can indicate "Migraines"
Excludes: i.e., all keywords that, if present, mean the entry can never be a migraine.
Sometimes I have to combine "includes" and "excludes" to find the actual migraine cases. For example:
Includes = Migraine
Excludes = family History of
In this case I am trying to look for a patient with migraine, not someone who has a family history of migraine. So I need to exclude the text "family History of". It's like saying "this string should include this keyword and exclude that keyword".
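
To make the include/exclude rule concrete, here is a minimal sketch of the logic in Python, outside of Excel or RapidMiner. The function name `matches_condition` and the three sample notes are made up for illustration and are not taken from the attached file. Matching is a plain case-insensitive substring check, which I am assuming is roughly what the Excel search formulas do.

```python
def matches_condition(text, includes, excludes):
    """Return True if text has at least one include keyword and no exclude keyword."""
    lowered = text.lower()  # case-insensitive comparison
    has_include = any(kw.lower() in lowered for kw in includes)
    has_exclude = any(kw.lower() in lowered for kw in excludes)
    return has_include and not has_exclude


includes = ["migraine"]
excludes = ["family history of"]

# Hypothetical diagnosis strings, invented for this example.
notes = [
    "Migraine with aura",
    "Family history of migraine",
    "Tension headache",
]

flags = [matches_condition(note, includes, excludes) for note in notes]
print(flags)
```

One caveat with this kind of rule: an exclude keyword anywhere in the string rejects the whole row, so a note mentioning both a current migraine and a family history of something unrelated would also be dropped.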
I think this should be fairly simple in RapidMiner. It is taking me hours and hours of formulas in Excel and driving me crazy, since I have about half a million rows to analyse and too many formulas. The objective is to create a model that I can scale up to other diseases as well.
Can anyone help?
I am attaching the Excel file with some data as well as some examples of the includes and excludes I am using. I created a zip file with the Excel file inside.
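
As a rough idea of what "scaling up to other diseases" could look like, the keyword lists can be kept in one table keyed by condition, so that adding a disease means adding an entry rather than new formulas. This is only a sketch in Python: the names `RULES` and `label_note` are hypothetical, and the "asthma" entry is a placeholder I invented, not something from the data.

```python
# Keyword rules per condition. Keywords are stored in lowercase
# because the note text is lowercased before matching.
RULES = {
    "migraine": {"includes": ["migraine"], "excludes": ["family history of"]},
    # Hypothetical placeholder entry, only to show how a second disease fits in.
    "asthma": {"includes": ["asthma"], "excludes": ["family history of"]},
}


def label_note(text, rules=RULES):
    """Return the list of conditions whose rule matches the free-text note."""
    lowered = text.lower()
    labels = []
    for condition, rule in rules.items():
        included = any(kw in lowered for kw in rule["includes"])
        excluded = any(kw in lowered for kw in rule["excludes"])
        if included and not excluded:
            labels.append(condition)
    return labels


print(label_note("Chronic MIGRAINE, asthma"))
print(label_note("Family history of migraine"))
```

The same function could then be applied row by row to the single diagnosis column, whatever tool ends up reading the file.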
Thanks
Arsalan (MD)