-
Boilerplate text analysis - text mining
Dear community, I'm new to text mining with RM and would like to know whether it's even possible to build a process in RM that suits my research question. I would like to create a process that searches for boilerplate language in documents. In detail, I'd like to input management reports from different companies (PDF files)…
-
How do I create balanced clusters?
Hi guys, I'm pretty new to the community, so sorry if my question seems quite elementary, but how do I create balanced clusters (k-means), meaning that each cluster has the same number of items in it? Or is there a way to force a minimum cluster size to something other than 1? (What I am trying to do is to create…
-
How to perform similarity operations on strings (multiple words) across multiple rows of a text attribute
Hi guys, I am very new to RM. I have an Excel file with 150 rows. Each row is represented by multiple columns. Right now I am interested in one column: the column containing the text. Using RM, I want to see whether any of the texts are similar to the texts of other rows. Any advice? A precise description would help a…
-
Similarity
How are you? How can I use Data to Similarity to calculate the similarity of a document to all the lines in a database and choose the most similar one? Thank you.
-
How to compare the similarity of one column's data from different Excel files
I have three Excel files. They all have a column named "word" with 300 rows. For example: file A (word column) file B (word column) file C (word column) book pen desk plate book dictionary studio studio book. I want to compare the similarity of these three columns. Which operator is suitable, the Data to Similarity operator, and…
-
Similarity Calculation
Hello everyone, I need a way to calculate the similarity between a certain phrase and a set of phrases stored in a database, choose the phrase that is most similar, and use it in another operation. I am using PostgreSQL. Thank you.
-
How to remove duplicate data
Hello community, I have a problem with duplicate data. Here is an example with columns Col 1 and Col 2: one row is (39-2021, 49-2021) and another row is (49-2021, 39-2021), so I want to remove one of those rows. Help please, you are my only hope.
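One way to think about the mirrored rows described above is to treat each row as an unordered pair and keep only the first occurrence. The following is a minimal sketch in plain Python, not a RapidMiner process; the function name `drop_mirrored_rows` and the in-memory list of tuples are assumptions made for illustration.

```python
def drop_mirrored_rows(rows):
    """Keep the first row of each unordered (Col 1, Col 2) pair."""
    seen = set()
    kept = []
    for col1, col2 in rows:
        # Sorting the two values gives the same key for (a, b) and (b, a).
        key = tuple(sorted((col1, col2)))
        if key not in seen:
            seen.add(key)
            kept.append((col1, col2))
    return kept


# Example rows taken from the post above.
rows = [("39-2021", "49-2021"), ("49-2021", "39-2021")]
deduplicated = drop_mirrored_rows(rows)
```

The same idea should carry over to other tools: build a helper column that holds the two values in sorted order, then remove duplicates on that helper column.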
-
Identify similar strings of only one attribute
Hello, I would like to identify the degree of similarity between strings that all belong to a single attribute of type text. The reason is that I have strings that represent tests performed in the hospital in the form exam_a;exam_b;exam_c. I would also like to identify when they occur in a different order but always with the same…
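For strings of the form exam_a;exam_b;exam_c, an order-insensitive comparison can be sketched by splitting each string into a set and computing the Jaccard similarity of the sets. This is only an illustration in plain Python, assuming the semicolon-separated format from the post; the helper names `exam_set` and `jaccard` are hypothetical and not part of RapidMiner.

```python
def exam_set(text):
    """Split 'exam_a;exam_b' into a set of exam names, ignoring order."""
    return frozenset(part.strip() for part in text.split(";") if part.strip())


def jaccard(a, b):
    """Jaccard similarity: shared exams divided by all distinct exams."""
    set_a, set_b = exam_set(a), exam_set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


same_exams = jaccard("exam_a;exam_b;exam_c", "exam_c;exam_a;exam_b")
partial_overlap = jaccard("exam_a;exam_b", "exam_b;exam_c")
```

Two strings that list the same exams in a different order get the maximum score, while strings that share only some exams get a score between 0 and 1.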
-
University Project: Compare similarity of two data sets
Hello everyone, for a university project in the 1st semester we want to match data from lecture notes with appropriate Udemy courses. We have already crawled the lecture contents and the Udemy courses. Now the question is which procedure would be best for us. How can the "best" or most suitable…
-
Find Similarities in documents and group them into clusters
Hi all, I am new to RapidMiner and data mining in general. I run the support team in my organisation, and we have so much data from previously resolved cases that could be useful for finding similar issues and presenting the solution to people encountering the same issues. What we have is a free-text field for the engineer to write…
-
Similarity between multiple tables
Hi, I am currently working on thesis research for my university to solve an entity resolution problem. Today I tried to integrate two tables with each other by measuring the similarity between them. If the similarity is above the threshold of 0.9, it is considered useful and will be used in the second evaluation.…
-
How to compare which similarity measurement gives better results?
For my text document data sets, I have run 'Data to Similarity' using cosine, Jaccard, Dice, and other similarity measures. My goal is to determine which similarity measure gives better results for my input data set. How do I do the comparative check?
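One possible comparative check, sketched here in plain Python rather than in RapidMiner, is to score a small set of document pairs that have been labelled by hand as similar or not, and see which measure separates the two groups most clearly. Everything below is an assumption for illustration: the whitespace tokenizer, the function names, the `separation` score, and the two example pairs are not from the original post.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity on term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity on token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient on token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def separation(measure, labelled_pairs):
    """Mean score of pairs labelled similar minus mean score of the rest.

    Assumes at least one pair of each label is present.
    """
    pos = [measure(a, b) for a, b, same in labelled_pairs if same]
    neg = [measure(a, b) for a, b, same in labelled_pairs if not same]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


# Hypothetical hand-labelled pairs: (text, text, judged similar?)
labelled_pairs = [
    ("reset the user password", "password reset for user", True),
    ("reset the user password", "printer is out of paper", False),
]
scores = {m.__name__: separation(m, labelled_pairs) for m in (cosine, jaccard, dice)}
```

A larger gap suggests the measure agrees better with the manual labels on this sample. With real data, the labelled sample would need to be much larger, and a ranking metric could replace the simple difference of means.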