Script task issue with RapidMiner Studio 9.2
olli_aro
New Altair Community Member
Hi all,
Has anyone else noticed any issues with using Script Tasks with the latest version of RapidMiner Studio (RapidMiner Studio 9.2.000 (rev:461351, platform: WIN64))?
If I add a script task to my process, all follow-on tasks seem to lose the ExampleSet, causing e.g. the available attributes list in the "Reorder Attributes" task to be empty. The funny thing is that the follow-on tasks still seem to execute OK if I run the process. For example, if I remove the script task from the process, reorder the attributes with "Reorder Attributes", then put the script task back in and run, the output is reordered as per my configuration.
The above used to work for me with no issue in the previous version of RapidMiner Studio.
Regards,
Olli
Best Answer
-
Hi,
Unfortunately, that is to be expected. Execute Script can be used to do anything: you can create and return new data, remove or add attributes on existing data, return something else entirely (a model instead of a data set), or return nothing at all.
So to know what really happens in there, we would have to execute the script. And because you are completely free in what you do, the script may even fail when it is not run on the entire data. Right now we have no way of doing that while generating meta data, so the output of Execute Script has no meta data.
But there is a solution:
- Click on "Process" -> "Synchronize Meta Data with Real Data" in the top menu bar and make sure it's checked
- Right-click the Execute Script operator in your process, and select "Breakpoint After"
- Run the process. It will now pause after the script has run, and the meta data will be created based on the actual data that is now there.
- Select Attributes now has the attributes available. You can select them now.
- Resume the process by clicking the run button again. It will resume where it was paused and you will finish the process with the attributes selected in the 2nd Select Attributes operator.
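For reference, a breakpoint set this way is normally saved together with the process. Below is a minimal sketch of how the Execute Script operator might then appear in the process XML. It assumes that Studio records the breakpoint as a `breakpoints="after"` attribute on the operator element, which nothing in this thread confirms. The other attributes are taken from the process posted in this thread, and the script is shortened for illustration.

```xml
<!-- Sketch only: breakpoints="after" is an assumption about how Studio saves
     "Breakpoint After"; the other attributes come from the process posted in this thread. -->
<operator activated="true" breakpoints="after" class="execute_script" compatibility="9.2.000" expanded="true" height="82" name="Execute Script" width="90" x="246" y="238">
  <!-- Script shortened for illustration: pass the first input through unchanged. -->
  <parameter key="script" value="return input[0];"/>
  <parameter key="standard_imports" value="true"/>
</operator>
```

If the attribute is stored like this, removing the breakpoint in the UI once the attributes are selected should drop it from the XML again.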
Regards,
Marco
Answers
-
I had a similar issue with Select Attributes yesterday, and this does seem to be new to 9.2. Some of my attributes were missing. I had to select the ones I didn't want and then invert. Even though I could not see the attributes in the operator, the process worked with the new attributes. I will try to reproduce this later today.
-
Thanks for the post back. Maybe a bug then?
-
Hi,
When you edit your process in the UI and select attributes in the parameters of an operator, the list of attributes you are offered comes from what we call "metadata". It's a best-effort solution to help you create a process. When the process actually runs on the real data, this metadata is irrelevant and only the actual data is looked at.
We changed the way this metadata is generated in Studio 9.2, because previously it could freeze your entire Studio UI if you had an operator with large or slow metadata. This was fixed, but as a result some of the metadata may now take a while to appear, or may be missing outright because we overlooked something. Please let us know about these instances and, if possible, share the process with us so we can have a look!
Regards,
Marco
-
Hi Marco,
Thanks for the message back.
The metadata just simply goes missing for any data steps following the script step.
It is really easy to demonstrate. Please see the process below. The Select Attributes prior to the script task can see both Town and District, however the second Select Attributes cannot. If I exclude the script task from the process flow everything works as expected.
Regards,
Olli
<pre class="CodeBlock"><code><?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.2.000" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
        <parameter key="excel_file" value="sample data.xlsx"/>
        <parameter key="sheet_selection" value="sheet number"/>
        <parameter key="sheet_number" value="1"/>
        <parameter key="imported_cell_range" value="A1"/>
        <parameter key="encoding" value="SYSTEM"/>
        <parameter key="first_row_as_names" value="true"/>
        <list key="annotations"/>
        <parameter key="date_format" value=""/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="locale" value="English (United States)"/>
        <parameter key="read_all_values_as_polynominal" value="false"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Town.true.polynominal.attribute"/>
          <parameter key="1" value="District.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="85">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="District|Town"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="execute_script" compatibility="9.2.000" expanded="true" height="82" name="Execute Script" width="90" x="246" y="238">
        <parameter key="script" value="/*&#10; * You can use both Java and Groovy syntax in this script.&#10; *&#10; * Note that you have access to the following two predefined variables:&#10; * 1) input (an array of all input data)&#10; * 2) operator (the operator instance which is running this script)&#10; */&#10;&#10;// Take first input data and treat it as generic IOObject&#10;// Alternatively, you could treat it as an ExampleSet if it is one:&#10;// ExampleSet inputData = input[0];&#10;IOObject inputData = input[0];&#10;&#10;// You can add any code here&#10;&#10;// This line returns the first input as the first output&#10;return inputData;"/>
        <parameter key="standard_imports" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="136">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <connect from_port="input 1" to_op="Read Excel" to_port="file"/>
      <connect from_op="Read Excel" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
      <connect from_op="Execute Script" from_port="output 1" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="source_input 2" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process></code></pre>
-
Hi Marco. This has fixed the issue for me. Thank you so much for your help on this one. Best regards, Olli