Classify webpages into 4 groups using a set of keywords
Hello,
I have used the following operators so far: Read Excel -> Get Pages -> Data to Documents -> Process Documents -> Select Attributes
What I want is to classify around 450 webpages into 4 categories according to the words they use.
So, for example, if a website frequently uses words from the following group (not necessarily all of them): "a", "b", "c", "d", "e", etc., it will be classified as "Category ABC"; if it more often uses the words "z", "x", "v", "u", etc., it will be classified as "Category ZXV", and so on. I want this to cover 4 categories. For each category I have a set of 14 to 16 related words.
Now, I would like to associate each word with a category and have RM analyse the words of all documents (in this case, websites) and decide which category each website belongs to, based on the occurrence and frequency of the words it uses.
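To make the idea concrete, here is a rough Python sketch of the logic I have in mind, outside of RM. It is only an illustration under my own assumptions: the category names and keyword sets are placeholders (my real lists have 14 to 16 words each), the function name `classify` is made up, and the scoring rule (sum of keyword counts, highest total wins) is just one possible choice.

```python
import re
from collections import Counter

# Placeholder keyword sets; the real ones would hold 14 to 16 words per category.
CATEGORY_KEYWORDS = {
    "Category ABC": {"a", "b", "c", "d", "e"},
    "Category ZXV": {"z", "x", "v", "u"},
}

def classify(text, category_keywords=CATEGORY_KEYWORDS):
    """Score a page by how often it uses each category's keywords.

    Returns (best_category, scores). best_category is None when the page
    contains none of the keywords. Ties go to the category listed first.
    """
    # Lowercase the text and split it into simple alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # A category's score is the total number of occurrences of its keywords.
    scores = {
        category: sum(counts[word] for word in words)
        for category, words in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, scores
    return best, scores

label, scores = classify("a b c z a")
print(label, scores)
```

Dividing each score by the total number of tokens on the page would be one way to keep long pages from dominating, but I have not decided whether that is needed.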
Is this possible to do with RM? And (assuming I did everything correctly in the process above) how can I proceed from here?
Many many thanks for your help.
Best,
Katia