Create Operator or Output with Groovy
darktemptation
New Altair Community Member
My Problem is quite simple:
I want to load a web page with Groovy into a Text operator (i.e. a Document) and then extract certain attributes (e.g. all <li> texts).
Now I can fetch the HTML from a page with
"http://rapid-i.com".toURL().text
But the Script operator does not return anything to the output, even when I use Groovy's "return".
Can somebody give me a hint?
Answers
Hm ... maybe I didn't grasp the problem, but RapidMiner cannot deal with arbitrary Groovy types. You have to convert the output into an IO type that RapidMiner does understand.
... and "no", I do not know which one or how :-\
Hi there,
I think you were getting nothing back because you were being redirected. The following (with the '/' on the end of the URL) produces words of infinite beauty, wisdom, etc.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location>//R5 Forum/groovout</location>
<location/>
<location/>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="478" width="915">
<operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="176" y="9">
<parameter key="script" value="operator.getProcess().getLog().log(&quot;http://rapid-i.com/&quot;.toURL().text)"/>
</operator>
<connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Thanks for the hints so far.
But @haddock:
Even if I try your setting, the log tells me that there is nothing delivered to the Result 1 Port.
And if you like you can also take another page where you aren't redirected.
"http://www.aboutgroovy.com".toURL().text
So the problem is still how to get the fetched result to the output port, with a data type that is known to RM.
Maybe the question then has to be: how can I create an IOObject with a String/Text attribute in Groovy (as Steffen suggests)?
Ooops, I thought you weren't getting anything back. If you want to manipulate the contents, you can do that with macros, and from there go on to logs and examples, like this...
There's some Groovy stuff on the Wiki which may be helpful.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros>
<macro>
<key>HTML</key>
<value/>
</macro>
</macros>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="258" width="915">
<operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="45" y="30">
<parameter key="script" value=" operator.getProcess().getMacroHandler().addMacro(&quot;HTML&quot;, &quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10)); //def html=&quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10)"/>
</operator>
<operator activated="true" class="provide_macro_as_log_value" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="230" y="74">
<parameter key="macro_name" value="HTML"/>
</operator>
<operator activated="true" class="log" expanded="true" height="76" name="Log" width="90" x="359" y="73">
<list key="log">
<parameter key="HTML?" value="operator.Provide Macro as Log Value.value.macro_value"/>
</list>
</operator>
<operator activated="true" class="log_to_data" expanded="true" height="94" name="Log to Data" width="90" x="511" y="71"/>
<connect from_op="Execute Script" from_port="output 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
<connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
<connect from_op="Log to Data" from_port="exampleSet" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Thanks haddock
This goes in the right direction for what I was looking for. After a few modifications I found a way to do it.
I just used the Script operator to define the macro and then used the macro as the parameter for the "Create Document" operator. This is a nice solution, I think.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros>
<macro>
<key>HTML</key>
<value/>
</macro>
</macros>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="500" width="655">
<operator activated="true" class="execute_script" expanded="true" height="60" name="Execute Script" width="90" x="45" y="30">
<parameter key="script" value=" operator.getProcess().getMacroHandler().addMacro(&quot;HTML&quot;, &quot;http://rapid-i.com/&quot;.toURL().text); //def html=&quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10)"/>
</operator>
<operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="246" y="300">
<parameter key="text" value="%{HTML}"/>
<parameter key="add label" value="true"/>
<parameter key="label_type" value="text"/>
<parameter key="label_value" value="raw html"/>
</operator>
<connect from_op="Create Document" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
The next step will be to find all the "<li>" elements and write them all into one attribute.
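For that next step, here is a minimal Groovy sketch of one possible approach. It assumes the HTML is already available as a String and uses an inline sample instead of a fetched page. The helper name extractListItems and the ' | ' separator are made up for illustration, and the regular expression is only a rough approximation: it does not handle nested lists or malformed markup the way a real HTML parser would.

```groovy
// Hypothetical helper (not part of RapidMiner or Groovy):
// collects the text content of every <li> element in an HTML string.
List<String> extractListItems(String html) {
    // (?is): ignore case, and let '.' match line breaks as well.
    // <li\b ensures that tags such as <link> are not matched.
    def matcher = html =~ /(?is)<li\b[^>]*>(.*?)<\/li>/
    matcher.collect { match ->
        // match[0] is the whole match, match[1] the content between the tags
        def text = match[1].replaceAll(/<[^>]+>/, '')   // drop nested tags such as <a>
        text.replaceAll(/\s+/, ' ').trim()              // collapse whitespace runs
    }
}

// Inline sample standing in for the fetched page (an assumption for this sketch)
def html = '''<ul>
  <li>First item</li>
  <li class="x"><a href="#">Second</a> item</li>
  <li>Third
      item</li>
</ul>'''

def items = extractListItems(html)

// Join all texts into a single string, e.g. to store them in one macro or attribute
def joined = items.join(' | ')
println joined
```

Inside the Script operator, the joined string could presumably be handed to addMacro in the same way as the raw HTML in the process above, so that "Create Document" receives only the list texts.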
Hi,
a detailed description of how to use the Groovy operator and how to return things from it is given in the "How to Extend RapidMiner" tutorial.
Greetings,
Sebastian
Hi Sebastian
After a short search I found the tutorial in the Shop
http://rapid-i.com/component/page,shop.product_details/flypage,flypage.tpl/product_id,52/category_id,5/option,com_virtuemart/Itemid,180/
I guess you mean this.
But in the detailed description there it says:
"Together with the white paper you receive two projects for Eclipse. The one is an extension containing all examples covered in the book and the other is a template for building own Extensions with Eclipse."
What book are you talking about there? There is only a white paper to purchase/download.
Would it also be possible to get a short preview of the content? After all, this white paper costs €40 (or CHF 60.-), so I would like to know exactly what it covers and maybe see an example.
Best regards
Darktemptation
Hi,
this "book" is the white paper. The whitepaper has 45 pages in DIN A4, so other people would format it differently and call it a book.
Sorry, but what examples do you mean? It covers everything you need to write your own extensions. In fact, we are using it internally to teach new colleagues. It contains everything I know about extensions, and I wrote the published ones...
Greetings,
Sebastian