The most recent content from our members.
Hello, I would like to ask: how do I change my tokenized words attribute back into my own text attribute in an Excel file? I tokenized the text to correct its many mistakes, using Stem (Dictionary) and many other operators within Process Documents from Data. The thing is that I can't find any operator that…
Hi, I'm trying to read PDF files in RapidMiner through the "Read Document" operator and then use the "Replace Token Operator" to delete all line breaks. I replace "\n" with " ", but when I then copy the text, all line breaks are still in place. Weirdly, when I use the "Create Document" operator and manually copy the text…
I have to process some documents in which a double exclamation mark (!!) that follows a word should form a single token together with that word (e.g., sentence!! as one token, not 'sentence' and '!!' separately). Similarly, the smiley : ) is expected to be a separate token. When I use the non-letters mode in Tokenize, the words get…
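Outside RapidMiner, the tokenization rule described in this question can be sketched with a regular expression. This is only an illustration in plain Python, not a RapidMiner operator configuration; the names `TOKEN_PATTERN` and `tokenize` are made up for the example, and the pattern assumes the smiley is written as ":)" or ": )".

```python
import re

# Alternatives are tried left to right, so the special cases come first:
#   1. the smiley, written ":)" or ": )"
#   2. a word immediately followed by "!!" (kept together as one token)
#   3. a plain word
#   4. any other single non-space, non-word character
TOKEN_PATTERN = re.compile(r":\s?\)|\w+!!|\w+|[^\w\s]")

def tokenize(text):
    """Split text into tokens, keeping 'word!!' and the smiley intact."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("What a sentence!! : ) ok"))
# ['What', 'a', 'sentence!!', ': )', 'ok']
```

Putting the special cases ahead of the generic word pattern is what keeps them from being split. A similar expression might work in a regular-expression tokenization mode, but that has not been checked here.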
Hi there, I'm very new to RapidMiner. I'm reading German PDF files and tokenizing them, which is working fine... However, the PDF files contain hyphens that separate a fair number of words into two parts, as in the following example: "die Bedeutung der finan- ziellen Interessen der Union". I'm trying to dehyphenate the…
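As a rough illustration of the dehyphenation described here, the following plain-Python sketch joins a fragment ending in a hyphen with a following lowercase fragment. The function name `dehyphenate` is hypothetical. The rule is deliberately naive: it would also merge legitimate German constructions such as "Ein- und Ausgang", so a real solution would need a word list or similar check.

```python
import re

def dehyphenate(text):
    """Join 'finan- ziellen' into 'finanziellen'.

    Matches a word character, a hyphen, whitespace (including line
    breaks), and a lowercase letter, then drops the hyphen and the
    whitespace. A capitalized continuation is left untouched.
    """
    return re.sub(r"(\w)-\s+([a-zäöüß])", r"\1\2", text)

print(dehyphenate("die Bedeutung der finan- ziellen Interessen der Union"))
# die Bedeutung der finanziellen Interessen der Union
```

The same pattern could probably be applied to the document text before tokenizing, so the tokenizer never sees the split word.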
Hi RapidMiner community, I can't find a way to replace whole words after a "Read Excel" operator. If I use a "Replace (Dictionary)" operator linked to an Excel file, words are partially substituted, because they are not tokenized, and sometimes part of a word is substituted and then joined with the rest of the word.…
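The partial-substitution problem described here is usually avoided by anchoring each dictionary entry at word boundaries. The sketch below shows the idea in plain Python; `replace_whole_words` and the sample mapping are hypothetical, and whether the RapidMiner operator accepts word-boundary patterns in its dictionary is not confirmed here.

```python
import re

def replace_whole_words(text, mapping):
    """Replace only complete words found in `mapping` (old -> new)."""
    if not mapping:
        return text
    # Longest keys first, so a longer entry wins over a shorter prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # \b requires a word boundary on both sides of the match.
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

print(replace_whole_words("the cat sat in a category", {"cat": "dog"}))
# the dog sat in a category
```

Without the `\b` anchors, "category" would become "dogegory", which is the kind of result the question describes.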
Hi everybody! I did the data preparation shown in the picture, but looking at the TF-IDF weighting results I noticed some strange characters (for example “optionâ€). How can I eliminate them? Thank you.
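Sequences such as "â€" are typical of UTF-8 text that was decoded as Windows-1252, so the cleaner fix is often to read the source with the right encoding instead of deleting characters afterwards. The following plain-Python sketch reverses that specific mistake; it assumes this is the cause here, which the post does not confirm, and `repair_mojibake` is a hypothetical helper name.

```python
def repair_mojibake(text):
    """Undo 'UTF-8 bytes read as Windows-1252' for a string.

    Re-encodes the text as cp1252 to recover the original bytes, then
    decodes them as UTF-8. If either step fails, the text was probably
    not damaged this way (or is only partly recoverable), so it is
    returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_mojibake("cafÃ©"))      # café
print(repair_mojibake("â€œoption"))  # “option
```

Text that is already correct fails the round trip or passes through it unchanged, so the function does not alter it.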
I have a spreadsheet with a text column and a label column. I would like to represent text values with some token metadata. I'm using "process documents". In "process documents" I'm tokenizing the text value. I would like to achieve the following: 1. Add an attribute to the exampleset which contains a count of the number…
Hi all, I am new to RapidMiner and data mining in general. I run the support team in my organisation, and we have so much data from previously resolved cases that could be useful for finding similar issues and presenting the solution to people encountering the same issues. What we have is a free-text field for the engineer to write…
We have the token service API: $RMServerHost/api/rest/tokenservice. As per the documentation, "The expirationDate indicates how long the JWT is valid, the default is 5 minutes." Can we increase the expiration date for the generated token?
I have an example set of 451 comments (rows). I want to generate multi-word tokens (n-grams) from this example set. What is the sequence of operators?
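To make clear what multi-word tokens look like, here is a small plain-Python sketch of term n-gram generation over an already tokenized comment. It does not answer the question about operator order; the function name `term_ngrams` and the underscore separator are assumptions chosen for the example.

```python
def term_ngrams(tokens, max_length=2, sep="_"):
    """Return all n-grams of 1..max_length consecutive tokens."""
    result = []
    for n in range(1, max_length + 1):
        # Slide a window of n tokens across the list.
        for start in range(len(tokens) - n + 1):
            result.append(sep.join(tokens[start:start + n]))
    return result

print(term_ngrams(["very", "good", "service"], max_length=2))
# ['very', 'good', 'service', 'very_good', 'good_service']
```

The comment has to be tokenized first, because the n-grams are built from the token sequence and not from the raw text.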