-
How to extract/filter text elements using regex?
Dear community, I've been trying to use regular expressions in my RapidMiner model to filter extracts from a text. The text is extracted from Excel under one content attribute. Each text is extracted from one of these cells in Excel. I would like to extract specific sentences using regular expressions…
-
Add new column if then in existing column
I have a dataset something like this: I want to add a new column that returns 'Complete' if the data in column_1 starts with a number and ends with a letter, and 'Incomplete' otherwise. Expected output: Thank you.
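A minimal Python sketch of the check described above, assuming "starts with a number" means the first character is a digit and "ends with a letter" means the last character is an ASCII letter. The function name `label` and the sample values are hypothetical. For a pattern this simple, the syntax is the same in Java regex, which RapidMiner is built on.

```python
import re

# Assumed rule: first character is a digit, last character is an ASCII letter.
PATTERN = re.compile(r"\d.*[A-Za-z]")

def label(value: str) -> str:
    """Return 'Complete' if the whole value fits the assumed rule."""
    return "Complete" if PATTERN.fullmatch(value) else "Incomplete"

for sample in ["12ab", "ab12", "1a", "1"]:  # hypothetical sample values
    print(sample, "->", label(sample))
```

The whole value has to match, so a value that only contains a digit-to-letter run somewhere in the middle is still labelled 'Incomplete'.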
-
Rename by Replacing - REGEX
Hi All - struggling to utilize regex to rename columns after pivoting. Ideally trying to remove the "count(IDT)_" and leave the year on the header. Any help would be appreciated. XML and sample data attached. <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002"> <context> <input/> <output/> <macros/> </context>…
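A small Python sketch of the renaming idea, assuming the pivoted headers look like `count(IDT)_2018` (the year shown is a made-up example). The parentheses have to be escaped because they are regex metacharacters; `strip_prefix` is a hypothetical helper name.

```python
import re

# Literal prefix "count(IDT)_" anchored at the start of the column name.
PREFIX = re.compile(r"^count\(IDT\)_")

def strip_prefix(name: str) -> str:
    """Remove the pivot prefix and keep whatever follows it (e.g. the year)."""
    return PREFIX.sub("", name)

print(strip_prefix("count(IDT)_2018"))  # hypothetical header
print(strip_prefix("Region"))           # names without the prefix are unchanged
```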
-
\n command doesn't work in Replace Token Operator
Hi, I'm trying to read pdf-files in RapidMiner through the "Read Document" operator and then use the "Replace Token Operator" to delete all line-breaks. I replace "\n" with " ", but when I then copy the text, all line breaks are still in place. Weirdly, when I use the "Create Document" operator and manually copy the text…
-
Tokenize operator issue - help request
I have to process some documents where the double exclamation !! when followed by a word should be an individual token by itself (e.g., sentence!! as a token, not 'sentence' and '!!' separate). Similarly, the smiley character : ) is expected to be a separate token. When I use the non-letters mode in Tokenize, the words get…
-
Rename by Replacing - Unexpected RegEx output
Hi, Maybe I'm doing something wrong but I don't understand why this replacement regex adds "MyString" after $1. Thanks in advance <?xml version="1.0" encoding="UTF-8"?><process version="9.9.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.9.000"…
-
How to apply regex in nominal values of attributes for processing
Hi, I am new to RapidMiner. I am trying to apply a regex to an attribute value using the Cut stage, but it looks to me like the regex condition is being applied to the attribute name, not to its value. Please help.
-
how to exclude words from returns of a regex in a replace with dictionary operator
Hi everybody, after many hours on the internet, I must acknowledge I can't find any solution for excluding words from the results of a regex. I use regular expressions in a spreadsheet connected to a "replace with dictionary" operator. Some regexes capture too many words. One example: - the…
-
Filter Stopwords with Regular Expression
Hi guys, I'm currently doing a sentiment analysis in RapidMiner with k-NN. I want to count the number of words that are left in the document when removing stopwords. Using the "Filter stopwords" operator inside the "process documents from data operator" only works if I tokenize the data and use the "Nominal to Text"…
-
RegEx query returns only one word instead of a complete sentence
Hey, I am new to RapidMiner and am trying to analyze text for my Bachelor thesis. I have already pre-processed (e.g. tokenized) the documents and would like to use "extract information" and regular expressions to get all sentences containing the word "Kenntnisse". I have already tested some expressions on regex101.com and…
-
Extract all Uppercase words from document into new attributes
Hello RM forum, I'd like to extract all matches from an example set to a new attribute or even to multiple new attributes. Example text: This is my RegEx which finds all occurrences (operator "generate extract"): The problem is that it will only generate the first match as a new attribute: I want to get a new attribute…
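The example text and the regex are missing from the excerpt, so the following Python sketch only illustrates the general idea of collecting every match rather than just the first. It assumes "uppercase word" means a run of two or more ASCII capitals; the sample sentence and the `match_N` attribute names are invented.

```python
import re

# Assumption: an uppercase word is two or more ASCII capital letters.
UPPER_WORD = re.compile(r"\b[A-Z]{2,}\b")

text = "This is my NASA and ESA text"  # hypothetical example text

# findall returns every non-overlapping match, not only the first one.
matches = UPPER_WORD.findall(text)

# One new "attribute" per match, numbered in order of appearance.
attributes = {f"match_{i}": word for i, word in enumerate(matches, start=1)}
print(attributes)
```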
-
Substitute Search result in RegEx with new line doesn't work
Hello support team, I might have found a bug in the RegEx implementation of RapidMiner. My goal is to replace a "blank" with a "new line" character. Searching in the following text: Sri Lanka Using the Replace operator: Search for ( ) Replace with: \n leads to: SrinLanka I created a workaround using the "split" operator which you…
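For context: in Java's regex replacement syntax, a backslash in the replacement string escapes the next character, so `\n` typed as two characters yields a literal `n`. That would be consistent with the `SrinLanka` result if the Replace operator hands the replacement straight to Java (an assumption, not confirmed by the excerpt). The Python sketch below shows the intended outcome by supplying a real newline character as the replacement.

```python
import re

text = "Sri Lanka"

# Supply an actual newline character as the replacement. Using a function
# sidesteps any escape processing of the replacement string itself.
result = re.sub(r" ", lambda m: "\n", text)

print(repr(result))
```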
-
I want to consider columns through regex in generate aggregation operator
Hello, I have 20 columns in total in my data, three of which are named like scenario_1, scenario_2, scenario_3. I want to concatenate these three through the Generate Aggregation operator. How can I do this? I wrote a regex like "\w+_ " but it is not working for me.
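A Python sketch of selecting the three columns by pattern and concatenating their values. Note that `\w+_ ` ends with a space and does not cover the trailing digit, so it would not match a whole name such as `scenario_1` if the operator requires a full-name match (an assumption). The column names other than `scenario_N` and the row values are hypothetical.

```python
import re

# Hypothetical column list; only the scenario_N names come from the question.
columns = ["id", "scenario_1", "scenario_2", "scenario_3", "other_col"]

# Match the whole column name: "scenario_" followed by one or more digits.
SCENARIO = re.compile(r"scenario_\d+")
selected = [c for c in columns if SCENARIO.fullmatch(c)]

# Hypothetical row; concatenate the selected columns in order.
row = {"id": "7", "scenario_1": "a", "scenario_2": "b", "scenario_3": "c", "other_col": "x"}
concatenated = "".join(row[c] for c in selected)

print(selected, concatenated)
```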
-
Filter: 1) extract numeric information from text column 2) select attributes subset based on a table
Dear all, I'm fairly new to RapidMiner, and hope some of you in the community are able to help me with my problem. I already have experience with other ETL and data management tools but did not find a way within RapidMiner to tackle it correctly. I have two questions: 1 is more important, 2 is nice to know. 1) extract…
-
Replace values by conditional match
Hi all, I searched the community and couldn't find the answer. Maybe it was solved before but I need it urgently, so I'd appreciate it if you answer again. I am working on a very messy dataset. As you can see above, models L200, L 200, MITSUBISHI L200 etc. (I mean all of them) are the same model of a brand. I would like to replace…
-
replace string with single quote string
I have a column attribute with the following values: SENDER BANK, RECEIVING BANK, SENDER, RECEIVER. I want the result as a single string as follows: 'SENDER BANK','RECEIVING BANK','SENDER','RECEIVER'
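The target string can be sketched in Python as a quote-and-join over the four values, assuming each value sits in its own row of the column:

```python
# The four values from the question, one per row (assumed layout).
values = ["SENDER BANK", "RECEIVING BANK", "SENDER", "RECEIVER"]

# Wrap each value in single quotes, then join with commas (no spaces).
joined = ",".join(f"'{v}'" for v in values)

print(joined)
```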
-
How to remove characters from end of string using regular expression within Replace operator?
Hi, I'm trying to remove certain characters from the end of a string in a text field using the Replace operator, for example, Current value: abcd(xxx) Desired value: abcd In the Replace operator, I use the following regular expression to locate strings that end with (xxx), but what should I put in as the replacement? (.*)(?:abc)(.*)…
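One way to think about this, sketched in Python: match only the part to be removed and replace it with an empty string. The pattern below assumes the suffix is any parenthesised group at the very end of the value; `strip_suffix` is a hypothetical helper name.

```python
import re

# A parenthesised group (without nested parentheses) at the end of the string.
TRAILING_PARENS = re.compile(r"\([^()]*\)$")

def strip_suffix(value: str) -> str:
    """Drop a trailing '(...)' if present; leave other values untouched."""
    return TRAILING_PARENS.sub("", value)

print(strip_suffix("abcd(xxx)"))
print(strip_suffix("abcd"))
```

With a match-and-delete pattern like this, the replacement is simply empty, so no capture groups are needed.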
-
How to extract a specific part (section) from a large text (txt format)?
Dear RM Friends, I have 500 txt files containing large reports and I need to extract only one section of these reports. As the reports are each slightly different, the only common pattern I can recognise is that the section headlines all start with the same 3 words, but at the end of each something different is written…
-
Using Regex and Macros in Loop Attributes
Hi, I'm creating a reusable template for looping through attributes. So, I specify the name of the attribute in a separate macro. How can I use this macro in the regex field of Loop Attributes if I'm using attribute filter type = regular expression?
-
Using regex for generating generic column names by
Hi everyone, I am trying to make a join on two tables that share not only Key attributes but column names, e.g. 1960, 1961..., 2010 (although column values differ in both tables). After doing the join I got a table with duplicate column names, but different data. I want to name one set of these columns as p1960,…
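A Python sketch of the renaming, assuming the columns to rename are exactly the ones whose whole name is a four-digit year. The helper name `add_prefix` is hypothetical. In Java-style replacement syntax the captured group would be written `$1` rather than via a function.

```python
import re

# A column name that consists of exactly four digits, captured as group 1.
YEAR_NAME = re.compile(r"^(\d{4})$")

def add_prefix(name: str, prefix: str = "p") -> str:
    """Prefix year-like column names; leave all other names unchanged."""
    return YEAR_NAME.sub(lambda m: prefix + m.group(1), name)

print([add_prefix(n) for n in ["Key", "1960", "1961", "2010"]])
```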
-
Why am I getting "inadmissible input" for regex in "finds" expression?
I have a Generate Attributes operator with an expression that uses the "finds" function. It takes the existing nominal attribute Link_prefix, which contains URL strings, and I want to check for the existence of an IP address in the URL. My expression looks like this: finds(Link_prefix, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")…
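The pattern itself can be checked outside RapidMiner. The Python sketch below applies the same regex to two made-up URLs. One thing that may be worth checking (an assumption, since the excerpt does not show the cause of the error) is whether backslashes inside the quoted expression string need to be doubled, e.g. `\\d`.

```python
import re

# Same pattern as in the question: four dot-separated groups of 1-3 digits.
IP_LIKE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def contains_ip(url: str) -> bool:
    """True if an IP-like substring occurs anywhere in the URL."""
    return IP_LIKE.search(url) is not None

print(contains_ip("http://192.168.0.1/path"))  # hypothetical URL
print(contains_ip("http://example.com/path"))  # hypothetical URL
```

The pattern only checks the shape, so it also accepts out-of-range groups such as 999.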
-
"Generate Attributes - function expression OR regex for
Hello all, I have a nominal attribute title which contains a text description and, within the text description, the year (4 digits). Sometimes there are also some other digits in the text. So I have to search for "4 digits within the text" and generate a new attribute for year. Example: title = "that is the 1st test…
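A Python sketch of "exactly four digits within the text", using lookarounds so that longer digit runs are not cut into a false year. The example title is truncated in the post, so the titles below are invented, and `extract_year` is a hypothetical helper name.

```python
import re
from typing import Optional

# Exactly four digits, not preceded or followed by another digit.
YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")

def extract_year(title: str) -> Optional[str]:
    """Return the first standalone 4-digit run, or None if there is none."""
    match = YEAR.search(title)
    return match.group(0) if match else None

print(extract_year("that is the 1st test from 2015 with 3 items"))  # invented title
print(extract_year("order number 12345"))                            # invented title
```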
-
"Filter text on regex."
I want to find all text snippets containing one or several words via regex. If I select Filter Examples, set it to "Expression" and provide it with: finds(Text, "(?i)\blootbox|micro\b") it doesn't work, although it is syntactically correct. If I remove |micro, it only returns all snippets that contain lootbox - why…
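A point about the pattern itself, independent of RapidMiner: alternation binds more loosely than `\b`, so `\blootbox|micro\b` reads as "`\blootbox` OR `micro\b`". The Python sketch below contrasts that with a grouped version; the sample sentences are invented, and whether this explains the behaviour in the post is not certain.

```python
import re

# As written in the question: (\blootbox) OR (micro\b).
ungrouped = re.compile(r"(?i)\blootbox|micro\b")

# Grouped: a word boundary on both sides of either alternative.
grouped = re.compile(r"(?i)\b(?:lootbox|micro)\b")

samples = ["lootboxes are bad", "nanomicro text", "Micro payments"]  # invented
for s in samples:
    print(s, bool(ungrouped.search(s)), bool(grouped.search(s)))
```

The ungrouped form accepts "lootboxes" and "nanomicro" because each alternative has a boundary on one side only, while the grouped form matches only the standalone words.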
-
Save regex capturing group for use later
Hi, I'm reading in multiple sheets from an Excel file and the first element contains an identifier for the data rather than an attribute name. I'm using "Rename by Replacing" to change the attribute name to something more appropriate but I'd like to keep the regex capture response for later use when I create a new…
-
"Stem (Dictionary) Indonesia Language with regex"
Hello, I have a problem when trying to use regex with Stem (Dictionary) for the Indonesian language. Here is an example in Indonesian: saya sangat senang dengan kalian-kalian, tampilannya dan suaranya sangat bagus and I want to turn it into the following: saya sangat senang dengan kalian, tampil dan suara sangat bagus That is working…