Text clustering/learning with description text data and pre-defined "buckets"

Question

I am new to RapidMiner and am looking for general, high-level advice here! I have some text description data, and I have marked each record with a "tag" that assigns the description to one of X available buckets. For example, if the description text is "Site performance is slow", I would tag it as "performance". I have a large set of data containing each description and the tag I manually assigned to it.

I would like RapidMiner to analyze my past combinations of description and tag as a "training" set. Then, when I get new description records (that do not yet have a tag populated), I want the tool to use the historical data to guess what the tag should be. For example, if another description comes in saying "site performance is slow", or something with similar keywords, it would know from the trained data that this typically gets marked with the "performance" tag.

I would like to get this set up so that I don't have to populate the tag each time; the software would make a guess first. From there I can confirm whether the guess is accurate and make changes manually, thus improving the "trained" data over time. Any high-level suggestions here?
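For illustration only: the workflow described above (learn from tagged descriptions, guess the tag for new ones, then feed confirmed tags back in) is a supervised text classification task. Below is a minimal sketch in plain Python rather than RapidMiner, assuming a multinomial Naive Bayes model over word counts. The class name `NaiveBayesTagger`, the helper `tokenize`, and the sample records other than "Site performance is slow" are hypothetical and not taken from the question.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical tagger: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.tag_counts = Counter()              # number of records per tag
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.vocab = set()                       # every word seen in training

    def train(self, description, tag):
        """Add one confirmed (description, tag) pair to the training data."""
        self.tag_counts[tag] += 1
        for word in tokenize(description):
            self.word_counts[tag][word] += 1
            self.vocab.add(word)

    def predict(self, description):
        """Return the most likely tag for an untagged description."""
        if not self.tag_counts:
            raise ValueError("train on at least one tagged record first")
        words = tokenize(description)
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            score = math.log(count / total)  # log prior of the tag
            # Add-one smoothing; the extra +1 reserves a slot for unseen words.
            denom = sum(self.word_counts[tag].values()) + len(self.vocab) + 1
            for word in words:
                score += math.log((self.word_counts[tag][word] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Historical, manually tagged records (made-up sample data).
history = [
    ("Site performance is slow", "performance"),
    ("Pages take a long time to load", "performance"),
    ("Cannot log in with my password", "login"),
    ("Password reset email never arrives", "login"),
]

tagger = NaiveBayesTagger()
for description, tag in history:
    tagger.train(description, tag)

# Guess the tag for a new, untagged description.
new_description = "The site is slow again"
guess = tagger.predict(new_description)
print(guess)

# After manual review, feed the confirmed tag back in to grow the training set.
tagger.train(new_description, guess)
```

The last line mirrors the review loop in the question: each confirmed or corrected tag becomes another training record, so the guesses should improve as the history grows. A real setup would also hold back some tagged records to measure how often the guesses are right.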

sgenzer · Answer

Hi @JoshL, welcome, and sorry no one has chimed in here. Is this still an issue?

Scott