Dear community, I'm new to text mining with RM and would like to know whether it's even possible to build a process in RM that suits my research question. I would like to create a process that searches for boilerplate language in documents. In detail, I'd like to input management reports from different companies (PDF files)…
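A minimal Python sketch of one way to approach this outside RapidMiner, assuming the PDF text has already been extracted into plain strings: treat a sentence as boilerplate when it appears, after light normalization, in at least a given share of the companies' reports. The `min_share` cutoff, the naive regex sentence splitter and all function names are illustrative assumptions. Exact matching will also miss boilerplate that is slightly reworded from one report to the next.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive split after ., ! or ? followed by whitespace (an assumption; real
    # management reports would need a proper sentence tokenizer).
    return [p for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]


def normalize(sentence):
    # Lowercase, collapse whitespace and drop punctuation so that trivial
    # formatting differences do not hide reused sentences.
    collapsed = re.sub(r"\s+", " ", sentence.lower())
    return re.sub(r"[^a-z0-9 ]+", "", collapsed).strip()


def find_boilerplate(reports, min_share=0.5):
    """Map each normalized sentence found in >= min_share of the reports to those reports."""
    doc_freq = defaultdict(set)
    for name, text in reports.items():
        for sentence in split_sentences(text):
            key = normalize(sentence)
            if key:
                doc_freq[key].add(name)
    threshold = min_share * len(reports)
    return {s: sorted(names) for s, names in doc_freq.items() if len(names) >= threshold}


reports = {
    "A": "Revenue grew 5%. Past performance is no guarantee of future results.",
    "B": "Past performance is no guarantee of future results. We opened two plants.",
    "C": "Margins fell. Past performance is no guarantee of future results!",
}
print(find_boilerplate(reports))
# {'past performance is no guarantee of future results': ['A', 'B', 'C']}
```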
Hi guys, I'm pretty new to the community, so sorry if my question seems quite elementary, but how do I create balanced clusters (k-means), meaning that each cluster contains the same number of items? Or is there a way to force a minimum cluster size other than 1? (What I am trying to do is to create…
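One common workaround is to replace the usual "nearest centroid" assignment with a size-constrained one. The Python sketch below is a greedy heuristic, not an exact balanced-clustering algorithm and not a RapidMiner operator: in each iteration it ranks all point-to-centroid distances and gives every point its nearest centroid that still has room, with room capped at ceil(n / k). When n is a multiple of k, every cluster therefore ends up with exactly n / k points. The function names and parameters are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    # Component-wise mean of a non-empty list of tuples.
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def balanced_kmeans(points, k, iterations=20, seed=0):
    """Greedy size-constrained k-means heuristic: no cluster exceeds ceil(n / k) points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    capacity = math.ceil(len(points) / k)
    labels = [None] * len(points)
    for _ in range(iterations):
        # Rank every (point, centroid) pair by distance, nearest first.
        pairs = sorted(
            (sq_dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        # Give each point the nearest centroid that still has room.
        labels = [None] * len(points)
        sizes = [0] * k
        for _, i, j in pairs:
            if labels[i] is None and sizes[j] < capacity:
                labels[i] = j
                sizes[j] += 1
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            mean([p for p, lab in zip(points, labels) if lab == j]) if sizes[j] else centroids[j]
            for j in range(k)
        ]
    return labels


points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels = balanced_kmeans(points, k=2)
print([labels.count(j) for j in range(2)])  # each cluster holds 3 points
```

Exact formulations usually treat the assignment step as a minimum-cost flow problem; the greedy step here is simpler but may leave some points in a farther cluster than an exact solver would.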
Hi guys, I am very new to RM. I have an Excel file with 150 rows. Each row is represented by multiple columns. Right now I am interested in one column: the column containing the text. Using RM, I want to see whether any of the texts show similarities to the texts of other rows. Any advice? A precise description would help a…
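As a rough illustration of the underlying idea (not of any specific RapidMiner operator), this Python sketch turns each text into a bag-of-words vector and lists every pair of rows whose cosine similarity reaches a threshold. The tokenizer, the 0.5 threshold and the function names are assumptions. With 150 rows there are 150 * 149 / 2 = 11,175 pairs, which is small enough to compare exhaustively.

```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(text):
    # Bag-of-words term counts over lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_pairs(texts, threshold=0.5):
    """Return (row_i, row_j, score) for every row pair scoring >= threshold, best first."""
    vectors = [vectorize(t) for t in texts]
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


texts = ["the printer is not working", "printer not working at all", "invoice was paid late"]
print(similar_pairs(texts))  # [(0, 1, 0.6)]
```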
How are you? How can I use Data to Similarity to calculate the similarity of a document with all the rows in a database and choose the most similar one? Thank you
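A hedged Python sketch of the one-against-all case: weight terms with TF-IDF so that words shared by many rows count less, score each stored row against the query with cosine similarity, and return the best row. The smoothed IDF formula `log(N / df) + 1` is one common variant chosen here as an assumption, the function names are hypothetical, and query terms that never occur in the rows are simply dropped.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """TF-IDF weights per document, using the smoothed idf = log(N / df) + 1 (an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log(len(docs) / c) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse weight dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query, rows):
    """Return (row_index, score) for the stored row most similar to the query."""
    vectors, idf = tfidf_vectors(rows)
    # Terms that never occur in the rows cannot match anything, so they are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(rows)), key=lambda i: scores[i])
    return best, scores[best]


rows = [
    "reset your password from the login page",
    "the invoice is sent at the end of each month",
    "printer drivers must be updated by IT",
]
print(most_similar("how do I reset my password", rows))
```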
I have three Excel files. They all have a column named "word" with 300 rows, for example: file A has book, plate, studio; file B has pen, book, studio; file C has desk, dictionary, book. I want to compare the similarity of these three columns. Which operator is suitable, the Data to Similarity operator, and…
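Treating each column as a set of words, pairwise Jaccard similarity (shared words divided by all distinct words) is one simple way to compare them. This Python sketch assumes the example words from the post read row by row, giving file A = {book, plate, studio}, file B = {pen, book, studio} and file C = {desk, dictionary, book}; the function names are hypothetical. It ignores word order and repeated words, which may or may not match what the question intends.

```python
from itertools import combinations


def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_columns(columns):
    """Pairwise Jaccard similarity between word columns, plus the words all columns share."""
    sets = {name: {w.strip().lower() for w in words if w.strip()} for name, words in columns.items()}
    pairwise = {(x, y): jaccard(sets[x], sets[y]) for x, y in combinations(sorted(sets), 2)}
    common = set.intersection(*sets.values())
    return pairwise, common


columns = {
    "A": ["book", "plate", "studio"],
    "B": ["pen", "book", "studio"],
    "C": ["desk", "dictionary", "book"],
}
pairwise, common = compare_columns(columns)
print(pairwise)  # {('A', 'B'): 0.5, ('A', 'C'): 0.2, ('B', 'C'): 0.2}
print(common)    # {'book'}
```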
Hello everyone, I need a way to calculate the similarity between a certain phrase and a set of phrases stored in a database, choose the most similar phrase, and use it in another operation. I am using PostgreSQL. Thank you
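For short phrases, a character-based ratio is often enough. This Python sketch uses the standard library's `difflib.get_close_matches`, which returns candidates whose `SequenceMatcher` ratio is at or above `cutoff`, best first; 0.6 is difflib's default cutoff. It assumes the phrases have already been fetched from the database into a list, and `best_match` is a hypothetical helper. PostgreSQL's `pg_trgm` extension, which provides a trigram `similarity()` function, may be an alternative if the comparison should run inside the database.

```python
import difflib


def best_match(phrase, candidates, cutoff=0.6):
    """Return the candidate most similar to phrase, or None if none reaches cutoff."""
    # Compare case-insensitively, but return the candidate's original spelling.
    by_lower = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(phrase.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


phrases = ["Customer address update", "Invoice payment reminder", "Password reset request"]
print(best_match("pasword reset requst", phrases))  # Password reset request
```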
Hello community: I have a problem with duplicate data. Here is an example: row 1 has Col 1 = 39-2021 and Col 2 = 49-2021, and row 2 has Col 1 = 49-2021 and Col 2 = 39-2021. I want to remove one of those rows. Help please, you are my only hope.
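If the goal is to treat (39-2021, 49-2021) and (49-2021, 39-2021) as the same pair, one approach is to build an order-independent key for each row and keep only its first occurrence. A minimal Python sketch, assuming the rows are available as (Col 1, Col 2) tuples; `drop_swapped_duplicates` is a hypothetical name.

```python
def drop_swapped_duplicates(rows):
    """Keep the first occurrence of each unordered pair, so (a, b) and (b, a) count once."""
    seen = set()
    kept = []
    for col1, col2 in rows:
        key = frozenset((col1, col2))  # order-independent key for the pair
        if key not in seen:
            seen.add(key)
            kept.append((col1, col2))
    return kept


rows = [("39-2021", "49-2021"), ("49-2021", "39-2021"), ("12-2021", "39-2021")]
print(drop_swapped_duplicates(rows))  # [('39-2021', '49-2021'), ('12-2021', '39-2021')]
```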
Hello, I would like to measure the degree of similarity between strings that all belong to a single attribute of type text. The reason is that I have strings listing tests performed in the hospital in the form exam_a;exam_b;exam_c. I would also like to identify when they occur in a different order but always with the same…
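Splitting each value on `;` and comparing the resulting sets makes the order irrelevant: identical sets mean the same exams in a different order, and Jaccard similarity gives a degree of overlap otherwise. A Python sketch under those assumptions; lowercasing, trimming and the function names are also assumptions.

```python
from collections import defaultdict


def exam_set(value, sep=";"):
    # "exam_a;exam_b;exam_c" -> frozenset of exams, ignoring order, blanks and case.
    return frozenset(e.strip().lower() for e in value.split(sep) if e.strip())


def exam_similarity(a, b):
    """Jaccard similarity of two exam lists; 1.0 means the same exams in any order."""
    sa, sb = exam_set(a), exam_set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def group_same_exams(values):
    """Group values that contain exactly the same exams, regardless of order."""
    groups = defaultdict(list)
    for v in values:
        groups[exam_set(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


print(exam_similarity("exam_a;exam_b;exam_c", "exam_c;exam_a;exam_b"))  # 1.0
print(exam_similarity("exam_a;exam_b", "exam_b;exam_c"))  # 0.3333333333333333
```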
Hello everyone, for a university project in the 1st semester we want to match data from lecture notes with appropriate Udemy courses. We have already crawled the lecture contents and the Udemy courses. Now the question is which procedure would be best for us. How can the "best" or most suitable…
Hi all, I am new to RapidMiner and data mining in general. I run the support team in my organisation, and we have so much data from previously resolved cases that could be useful for finding similar issues and presenting the solution to people encountering the same issues. What we have is a free-text field for the engineer to write…
Hi, I am currently working on thesis research for my university to solve an entity resolution problem. Today I tried to integrate two tables by measuring the similarity between them. If the similarity is above 0.9, the match is considered useful and will be used in the second evaluation.…
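A similarity threshold only means something relative to the measure it is applied to, so this Python sketch makes one explicit choice: Jaccard similarity over character trigrams of lowercased, whitespace-normalized strings, keeping each left record's best right-hand match when it scores at least 0.9. The measure, the padding scheme and the function names are illustrative assumptions. The sketch does not enforce one-to-one matches, and the right-hand table must not be empty.

```python
def trigrams(text):
    """Character trigrams of a lowercased, whitespace-normalized, space-padded string."""
    s = "  " + " ".join(text.lower().split()) + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(a, b):
    # Jaccard similarity of the two trigram sets.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def match_records(left, right, threshold=0.9):
    """Map each left record to its best right record when the score reaches threshold."""
    matches = {}
    for record in left:
        best = max(right, key=lambda r: trigram_similarity(record, r))
        score = trigram_similarity(record, best)
        if score >= threshold:
            matches[record] = (best, round(score, 3))
    return matches


left = ["Siemens AG", "Acme Corp"]
right = ["SIEMENS  AG", "Acme Corporation Ltd"]
print(match_records(left, right))  # {'Siemens AG': ('SIEMENS  AG', 1.0)}
```

Lowering the threshold trades precision for recall; for example, "Acme Corp" only matches "Acme Corporation Ltd" at a threshold of about 0.4 with this particular measure.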
For my text document data sets, I have applied 'Data to Similarity' using cosine, Jaccard, Dice and other similarity measures. My goal is to determine which similarity measure gives better results for my input data set. How do I do the comparative check?
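A comparative check needs some ground truth, for example document pairs that a person has labeled as similar or not. This Python sketch computes the three measures on sets of terms (binary weights, an assumption) and scores each by precision@k against such labels; the labeled pairs and function names are hypothetical. One consequence worth checking on real data: on sets, Dice = 2J / (1 + J), which increases with Jaccard J, so the two always rank pairs identically and no ranking-based comparison can separate them. Binary cosine, |A ∩ B| / sqrt(|A| |B|), can rank pairs differently.

```python
import math


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def cosine_binary(a, b):
    # Cosine on 0/1 term vectors reduces to |A ∩ B| / sqrt(|A| * |B|).
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def precision_at_k(measure, labeled_pairs, k):
    """Share of the k highest-scoring pairs that were labeled as truly similar."""
    ranked = sorted(labeled_pairs, key=lambda p: measure(p[0], p[1]), reverse=True)
    return sum(1 for _, _, is_similar in ranked[:k] if is_similar) / k


a, b = {"x", "y", "z"}, {"y", "z", "v", "w"}
j = jaccard(a, b)
print(math.isclose(dice(a, b), 2 * j / (1 + j)))  # True

labeled = [
    ({"price", "rise"}, {"price", "rise"}, True),
    ({"price"}, {"weather"}, False),
    ({"price", "rise", "oil"}, {"price", "rise"}, True),
]
for measure in (cosine_binary, jaccard, dice):
    print(measure.__name__, precision_at_k(measure, labeled, k=2))
```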