"Possible Bug: Missing Results"
tanto
New Altair Community Member
I'm a bit new to RapidMiner, so I don't want to file an official bug report until I get some community feedback. Using the Text extension, I've been using the Process Documents from Files operator successfully. However, when I combine it with the Data to Similarity operator, the Results perspective stops updating: the log still reports that the process finished successfully, but no new results appear.
The problem persists even after I remove the similarity operator. The only way to get things working again is to close RapidMiner and delete the perspective XML files.
Am I doing something wrong, or is this a bug?
Answers
Hi,
Can you provide a process (and, if it depends on the data, the data as well) so we can reproduce it?
As a general rule of thumb: if you need to delete some file afterwards to get everything working again, something is not working as intended.
Regards,
Marco
Hi,
Just jumping in: another possible cause that comes to mind is a closed result history:
http://rapid-i.com/rapidforum/index.php/topic,3598.msg13402.html
Maybe it's simply this...
Cheers,
Ingo
Here's the process that's giving me trouble:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.017">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
<process expanded="true" height="206" width="279">
<operator activated="true" class="text:process_document_from_file" compatibility="5.1.004" expanded="true" height="76" name="Process Documents from Files" width="90" x="112" y="75">
<list key="text_directories">
<parameter key="bills" value="D:\Bills"/>
</list>
<parameter key="file_pattern" value="112~h1*"/>
<parameter key="add_meta_information" value="false"/>
<parameter key="keep_text" value="true"/>
<parameter key="prune_method" value="absolute"/>
<parameter key="prune_below_absolute" value="2"/>
<parameter key="prune_above_absolute" value="9999"/>
<process expanded="true" height="596" width="970">
<operator activated="true" class="text:tokenize" compatibility="5.1.004" expanded="true" height="60" name="Tokenize" width="90" x="112" y="120"/>
<operator activated="true" class="text:transform_cases" compatibility="5.1.004" expanded="true" height="60" name="Transform Cases" width="90" x="541" y="75"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="36"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="data_to_similarity" compatibility="5.1.017" expanded="true" height="76" name="Data to Similarity" width="90" x="263" y="212">
<parameter key="measure_types" value="NumericalMeasures"/>
<parameter key="numerical_measure" value="DiceSimilarity"/>
</operator>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Data to Similarity" to_port="example set"/>
<connect from_op="Data to Similarity" from_port="similarity" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
It appears to be the Design perspective XML file that needs to be deleted. Here it is before (working):
<?xml version="1.0"?>
<VLDocking version="2.1">
<DockingDesktop name="default">
<DockingPanel>
<Split orientation="1" location="0.7996044825313118">
<Split orientation="1" location="0.24979321753515302">
<Split orientation="0" location="0.19928400954653938">
<Dockable>
<Key dockName="overview"/>
</Dockable>
<TabbedDockable>
<Dockable>
<Key dockName="new_operator"/>
</Dockable>
<Dockable>
<Key dockName="repository_browser"/>
</Dockable>
</TabbedDockable>
</Split>
<Split orientation="0" location="0.7995226730310262">
<TabbedDockable>
<Dockable>
<Key dockName="process_panel"/>
</Dockable>
<Dockable>
<Key dockName="xml_editor"/>
</Dockable>
</TabbedDockable>
<TabbedDockable>
<Dockable>
<Key dockName="error_table"/>
</Dockable>
<Dockable>
<Key dockName="log_viewer"/>
</Dockable>
</TabbedDockable>
</Split>
</Split>
<Split orientation="0" location="0.6599045346062052">
<Dockable>
<Key dockName="property_editor"/>
</Dockable>
<TabbedDockable>
<Dockable>
<Key dockName="operator_help"/>
</Dockable>
<Dockable>
<Key dockName="comment_editor"/>
</Dockable>
</TabbedDockable>
</Split>
</Split>
</DockingPanel>
<TabGroups>
<TabGroup>
<Dockable>
<Key dockName="new_operator"/>
</Dockable>
<Dockable>
<Key dockName="repository_browser"/>
</Dockable>
<Dockable>
<Key dockName="repository_browser"/>
</Dockable>
</TabGroup>
<TabGroup>
<Dockable>
<Key dockName="operator_help"/>
</Dockable>
<Dockable>
<Key dockName="comment_editor"/>
</Dockable>
<Dockable>
<Key dockName="comment_editor"/>
</Dockable>
</TabGroup>
<TabGroup>
<Dockable>
<Key dockName="error_table"/>
</Dockable>
<Dockable>
<Key dockName="log_viewer"/>
</Dockable>
<Dockable>
<Key dockName="log_viewer"/>
</Dockable>
</TabGroup>
<TabGroup>
<Dockable>
<Key dockName="process_panel"/>
</Dockable>
<Dockable>
<Key dockName="xml_editor"/>
</Dockable>
<Dockable>
<Key dockName="xml_editor"/>
</Dockable>
</TabGroup>
</TabGroups>
</DockingDesktop>
</VLDocking>
Here it is after (not working):
<?xml version="1.0"?>
<VLDocking version="2.1">
<DockingDesktop name="default">
<DockingPanel>
<Split orientation="1" location="0.7996044825313118">
<Split orientation="1" location="0.24979321753515302">
<Split orientation="0" location="0.19928400954653938">
<Dockable>
<Key dockName="overview"/>
</Dockable>
<TabbedDockable>
<Dockable>
<Key dockName="new_operator"/>
</Dockable>
<Dockable>
<Key dockName="repository_browser"/>
</Dockable>
</TabbedDockable>
</Split>
<Split orientation="0" location="0.7995226730310262">
<TabbedDockable>
<Dockable>
<Key dockName="process_panel"/>
</Dockable>
<Dockable>
<Key dockName="xml_editor"/>
</Dockable>
</TabbedDockable>
<TabbedDockable>
<Dockable>
<Key dockName="error_table"/>
</Dockable>
<Dockable>
<Key dockName="log_viewer"/>
</Dockable>
</TabbedDockable>
</Split>
</Split>
<Split orientation="0" location="0.6599045346062052">
<Dockable>
<Key dockName="property_editor"/>
</Dockable>
<TabbedDockable>
<Dockable>
<Key dockName="operator_help"/>
</Dockable>
<Dockable>
<Key dockName="comment_editor"/>
</Dockable>
</TabbedDockable>
</Split>
</Split>
</DockingPanel>
<TabGroups>
<TabGroup>
<Dockable>
<Key dockName="new_operator"/>
</Dockable>
<Dockable>
<Key dockName="repository_browser"/>
</Dockable>
<Dockable>
<Key dockName="repository_browser"/>
</Dockable>
<Dockable>
<Key dockName="repository_browser"/>
</Dockable>
</TabGroup>
<TabGroup>
<Dockable>
<Key dockName="operator_help"/>
</Dockable>
<Dockable>
<Key dockName="comment_editor"/>
</Dockable>
<Dockable>
<Key dockName="comment_editor"/>
</Dockable>
<Dockable>
<Key dockName="comment_editor"/>
</Dockable>
</TabGroup>
<TabGroup>
<Dockable>
<Key dockName="error_table"/>
</Dockable>
<Dockable>
<Key dockName="log_viewer"/>
</Dockable>
<Dockable>
<Key dockName="log_viewer"/>
</Dockable>
<Dockable>
<Key dockName="log_viewer"/>
</Dockable>
</TabGroup>
<TabGroup>
<Dockable>
<Key dockName="process_panel"/>
</Dockable>
<Dockable>
<Key dockName="xml_editor"/>
</Dockable>
<Dockable>
<Key dockName="xml_editor"/>
</Dockable>
<Dockable>
<Key dockName="xml_editor"/>
</Dockable>
</TabGroup>
</TabGroups>
</DockingDesktop>
</VLDocking>
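The two perspective files differ only in their TabGroups section: in the "before" file the last entry of every tab group appears twice, in the "after" file three times. The thread does not establish that this duplication causes the missing results (the diagnosis further down points to display runtime instead), but if you want to compare such files yourself, a minimal sketch using Python's standard `xml.etree.ElementTree` could look like this. The function name `duplicate_dockables` is hypothetical, and the sample is a trimmed excerpt of the "after" file above.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_dockables(xml_text):
    """Return {tab_group_index: {dock_name: count}} for dock names that
    occur more than once inside the same <TabGroup>."""
    root = ET.fromstring(xml_text)
    report = {}
    for index, group in enumerate(root.iter("TabGroup")):
        # Each <Dockable> wraps a <Key dockName="..."/>; count the names per group.
        counts = Counter(key.get("dockName") for key in group.iter("Key"))
        duplicates = {name: n for name, n in counts.items() if n > 1}
        if duplicates:
            report[index] = duplicates
    return report


# Trimmed excerpt of the first <TabGroup> from the "after" file above.
AFTER_EXCERPT = """<?xml version="1.0"?>
<VLDocking version="2.1">
<DockingDesktop name="default">
<TabGroups>
<TabGroup>
<Dockable><Key dockName="new_operator"/></Dockable>
<Dockable><Key dockName="repository_browser"/></Dockable>
<Dockable><Key dockName="repository_browser"/></Dockable>
<Dockable><Key dockName="repository_browser"/></Dockable>
</TabGroup>
</TabGroups>
</DockingDesktop>
</VLDocking>"""

print(duplicate_dockables(AFTER_EXCERPT))  # {0: {'repository_browser': 3}}
```

To check a real file, pass its contents (for example `open(path).read()`) to the function; `<Key>` elements in the DockingPanel section are ignored because only `<TabGroup>` children are counted.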
Here's a link to a tarball of the input text that I've been using for testing.
http://www.mediafire.com/?6t86rwieaw5b12d
Also, I was able to reproduce this problem on another computer (an Amazon EC2 instance).
Hi tanto,
Thanks for the process and data; this really helps in finding the problem.
I can, at least partially, reproduce your problem. However, the cause is not a broken result display due to the similarity operator, but simply the very long runtime needed to create the display for the similarity object. After about 40 minutes on my computer, the tab for the similarity object was finally created, and it took another 50 minutes until the message "Please standby while the display is created..." disappeared and the results were finally shown.
You can easily try this yourself:
- Use your process and text data, but change the parameter "prune_below_absolute" to 200 and "prune_above_absolute" to 250. It takes about 10 seconds until the tab is created and another 10 seconds until the display creation has finished. About 100 terms are created.
- Now change the parameter "prune_above_absolute" to 500. It now takes about 25 seconds until the tab is created and another 40 seconds until the display creation has finished. About 250 terms are created with these pruning settings.
- You can repeat this by gradually increasing the setting and checking the number of created terms and the increase in runtime. With your original pruning settings, you ended up with more than 13,000 terms, which cause the long display creation times mentioned above.
Interesting observation: the number of examples (about 1000) was less of a problem than the number of attributes. I actually did not expect this, since the number of attributes should contribute only linearly to the runtime of most similarity / distance measures. I will think about this and discuss it with the others.
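The "only linearly" argument can be made concrete with a back-of-the-envelope cost model. This is an illustrative assumption, not RapidMiner's actual algorithm: for n examples and m attributes, a dense all-pairs measure such as Dice similarity touches every attribute once for each of the n(n-1)/2 unordered pairs, so the work is quadratic in the number of examples but linear in the number of terms. The function name `pairwise_work` is hypothetical; the figures are the ones reported in this thread.

```python
def pairwise_work(n_examples, n_attributes):
    """Rough operation count for an all-pairs similarity over dense vectors:
    one pass over every attribute for each unordered pair of examples."""
    pairs = n_examples * (n_examples - 1) // 2
    return pairs * n_attributes


# Figures from this thread: about 1000 examples, and roughly 100, 250
# and 13,000 terms (attributes) depending on the pruning settings.
for terms in (100, 250, 13_000):
    print(f"{terms:>6} terms: {pairwise_work(1_000, terms):,} operations")
```

Under this model, going from about 250 to more than 13,000 terms multiplies the work by roughly 50. The timings reported above (about 20 seconds, 65 seconds and 90 minutes) grew somewhat faster than that, which fits the observation that the extra time was spent creating the display rather than computing the measure.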
So this is indeed not really a bug, but perhaps an opportunity for a performance improvement in creating the similarity viewer (if you like, you can still file a report in our bug tracker at http://bugs.rapid-i.com as a feature request and add a link to this conversation). For now, you have several options, such as stronger pruning, filtering, stemming, and other approaches that help reduce the number of features. If you do not want to look at the similarities themselves but simply use them in the rest of the process, I would recommend filtering down the number of attributes during process design, as in the small test above, and removing the filter once the full process has been designed.
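Since the recommended fix is to shrink the term list, here is a minimal Python sketch of what the Tokenize, Transform Cases and absolute-pruning steps in the process above do conceptually. It is not RapidMiner's implementation: the function name `build_vocabulary` is hypothetical, and it assumes that "prune_below_absolute" and "prune_above_absolute" count the number of documents a term occurs in, with inclusive bounds. Every term that survives pruning becomes one attribute, so raising the lower bound or lowering the upper bound directly reduces the attribute count.

```python
import re
from collections import Counter


def build_vocabulary(docs, prune_below, prune_above):
    """Tokenize on non-letters, lowercase, and keep only terms whose
    document frequency lies between prune_below and prune_above
    (inclusive bounds are an assumption of this sketch)."""
    doc_freq = Counter()
    for doc in docs:
        # [^\W\d_]+ matches runs of letters; the set counts each term once per document.
        terms = {token.lower() for token in re.findall(r"[^\W\d_]+", doc)}
        doc_freq.update(terms)
    return sorted(t for t, df in doc_freq.items() if prune_below <= df <= prune_above)


docs = ["The bill amends the act", "This bill repeals the act", "A new bill"]
print(build_vocabulary(docs, 2, 9999))  # ['act', 'bill', 'the']
print(build_vocabulary(docs, 2, 2))     # ['act', 'the']
```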
Cheers,
Ingo
Thank you very much! That's solved a lot of my confusion and headaches.
On a related note, is there a maximum size for an ExampleSet? Using a larger data input via the Similarity Data operator, I'm getting a negative number of examples (-637040551) in the result set.
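The thread ends without a reply to this question. A large negative count is the classic symptom of a signed 32-bit integer overflow, where a value above 2,147,483,647 wraps around to a negative number. Whether that is what happens inside RapidMiner here is an assumption the thread does not confirm. The sketch below emulates Java's 32-bit `int` arithmetic in Python; the function names and the idea that the result size is computed as a product of two example counts are hypothetical.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Wrap an arbitrary Python int the way Java wraps a 32-bit int."""
    return (value + 2**31) % 2**32 - 2**31


def pair_count_int32(n_left, n_right):
    """Hypothetical: size of an n_left x n_right result stored in a 32-bit int."""
    return to_int32(n_left * n_right)


print(pair_count_int32(40_000, 40_000))  # 1600000000, still fits
print(pair_count_int32(50_000, 50_000))  # -1794967296, wrapped around
```

If a count like this is the culprit, the practical workaround is the same as above: reduce the input size (fewer examples or a filtered subset) so that the product stays below `INT32_MAX`.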