I'm new to data mining and RapidMiner, so any help is appreciated. I have a database with a column {Company Name}. I need a total company count, but this attribute contains spelling errors and inconsistent spellings, so simply removing duplicates doesn't work. I have around 15K results, but I'm guessing there are really only about 800 actual companies in my database. I'm trying to avoid cleaning them up manually in the CSV.
Example:
ABC Company
ABC Co.
ABC Company Inc.
ABC Company Inc
ABC Company, Inc
ABC
I'd want the above to be grouped into one group, since they are all the same company. I've only spent a few hours in RapidMiner, but figured I'd ask whether this is even possible before spending more time. Can I build a process that is smart enough to automatically aggregate or group the values of this attribute, so I get an idea of the total number of companies? It doesn't have to be 100% accurate.
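
To make clearer what kind of grouping I mean, here is a rough sketch in plain Python rather than RapidMiner. It is only an illustration under my own assumptions: the SUFFIXES list, the 0.85 similarity threshold, the function names and the "XYZ Widgets" rows are all made up for the example and are not taken from my real data. It strips common legal suffixes, then greedily groups names whose cleaned-up form is similar enough according to difflib from the standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to strip; a real dataset would need more.
SUFFIXES = {"inc", "co", "company", "corp", "corporation", "llc", "ltd"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    # Keep at least one token so a name like "Company" does not become empty.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_names(names, threshold=0.85):
    """Greedy grouping: add each name to the first group whose key is similar enough."""
    groups = {}  # normalized key of the group's first member -> original names
    for name in names:
        key = normalize(name)
        for existing in groups:
            if SequenceMatcher(None, key, existing).ratio() >= threshold:
                groups[existing].append(name)
                break
        else:
            # No sufficiently similar group was found, so start a new one.
            groups[key] = [name]
    return groups


names = [
    "ABC Company",
    "ABC Co.",
    "ABC Company Inc.",
    "ABC Company Inc",
    "ABC Company, Inc",
    "ABC",
    "XYZ Widgets LLC",  # made-up extra rows to show a second group
    "XYZ Widgts",       # made-up misspelling
]

groups = group_names(names)
for key, members in groups.items():
    print(key, len(members), members)
print("estimated companies:", len(groups))
```

The result depends on the order of the rows and on the threshold, and comparing every name against every existing group could be slow on 15K rows, so this is only meant to show the idea of "normalize, then fuzzy match", not a finished solution.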