Cross Distances of two identical vectors is not 0

tiramisusann New Altair Community Member
edited November 5 in Altair RapidMiner
Hi,

I'm comparing two identical texts, both processed the same way, and calculating the distance between them. The resulting vectors should be exactly identical, but the Cross Distances operator (measure type: NumericalMeasures, numerical measure: CosineSimilarity) returns 0.045 instead of a distance of 0.
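
To make the expectation concrete, here is a minimal sketch in plain Python (not RapidMiner). The vector values and the function name `cosine_similarity` are made up for illustration, and turning the similarity into a distance as 1 - similarity is an assumption, not necessarily what Cross Distances does internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical term weights standing in for one processed document.
doc = [0.12, 0.0, 0.35, 0.08]

# Comparing the vector with itself: similarity should be 1 (up to rounding).
sim = cosine_similarity(doc, doc)

# Assumed conversion from similarity to distance.
dist = 1.0 - sim

print(sim, dist)
```

With truly identical vectors the similarity should come out as 1 and the derived distance as 0, apart from floating-point rounding. A value like 0.045 would then suggest that the two vectors being compared are not actually identical.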

Why is that? Does anyone have an idea?

All the best!
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Repository_BI." width="90" x="45" y="30">
       <parameter key="repository_entry" value="//Test/Repository_BI."/>
     </operator>
     <operator activated="true" class="replace_missing_values" compatibility="5.3.015" expanded="true" height="94" name="Replace Missing Values" width="90" x="315" y="30">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="TAKE_TEXT"/>
       <parameter key="default" value="value"/>
       <list key="columns"/>
       <parameter key="replenishment_value" value="No"/>
     </operator>
     <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="450" y="30">
       <parameter key="condition_class" value="attribute_value_filter"/>
       <parameter key="parameter_string" value="TAKE_TEXT=No"/>
       <parameter key="invert_filter" value="true"/>
     </operator>
     <operator activated="true" class="sample" compatibility="5.3.015" expanded="true" height="76" name="Sample" width="90" x="585" y="30">
       <parameter key="balance_data" value="true"/>
       <list key="sample_size_per_class">
         <parameter key="UP" value="400"/>
         <parameter key="DOWN" value="400"/>
       </list>
       <list key="sample_ratio_per_class"/>
       <list key="sample_probability_per_class"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
       <parameter key="attribute_filter_type" value="subset"/>
       <parameter key="attributes" value="|TAKE_TEXT|HEADLINE"/>
     </operator>
     <operator activated="true" class="text:create_document" compatibility="5.3.002" expanded="true" height="60" name="Create Document" width="90" x="45" y="300">
       <parameter key="text" value="The recently launched corporate foundation of VWR International, LLC, celebrates&#10;its most charitable quarter yet&#10;PHILADELPHIA--(Business Wire)--&#10;VWR International, LLC announced today that its recently launched corporate&#10;foundation, the VWR Foundation, has granted over $60,000 since late December&#10;2009, making it the most charitable quarter yet for the one-year-old Foundation.&#10;The Foundation, which seeks to support research, science education, health and&#10;well-being initiatives across the globe, fulfilled grants to a diverse group of&#10;organizations from The Scripps Research Institute to Doctors Without Borders. &#10;&#10;&quot;Corporate responsibility has been a longstanding element that differentiates&#10;good organizations from great organizations; for VWR, this inspired our&#10;associates to establish the VWR Foundation,&quot; stated John M. Ballbach, Chairman,&#10;President and CEO of VWR and President of the VWR Foundation. &quot;Because enhancing&#10;the environments in which we work and live remains the Foundation`s paramount&#10;objective, our associates designed the Foundation to support areas of research,&#10;science education, health and well-being. These priorities are consistent with&#10;the synergies generated as a distributor of scientific supplies.&quot; &#10;&#10;Holding true to the Foundation`s mission to support innovative research&#10;initiatives, a contribution was made to The Scripps Research Institute, a&#10;research organization that is internationally recognized for its basic research&#10;in immunology, molecular and cellular biology, chemistry, neurosciences,&#10;autoimmune diseases, cardiovascular diseases, virology and synthetic vaccine&#10;development. &#10;&#10;Two grants were awarded this quarter in the area of Science Education. 
The first&#10;was awarded to Schmahl Science Workshop, an organization that networks with&#10;teachers and scientists throughout the country to provide hands-on science&#10;activities for kids in a free-form environment. The second grant was awarded to&#10;the Science Museum of Minnesota, a large regional science museum located in&#10;downtown St. Paul that provides science education to an audience of more than&#10;one million students and science enthusiasts per year. &#10;&#10;The VWR Foundation also made health and well-being a primary focus of its&#10;giving. In the wake of the devastating earthquake in Port-au-Prince earlier this&#10;year, the Foundation made a contribution to Doctors Without Borders to support&#10;the volunteer doctors and nurses providing urgent medical care to Haitian&#10;victims. In addition, the Foundation donated to Professionals Analyzing Pap&#10;Smears, Inc., a healthcare team composed of volunteer physicians, nurse&#10;practitioners, nurses and cyto-technologists that establish cervical cancer&#10;screening clinics in developing countries. &#10;&#10;Most notably, VWR International, LLC and the VWR Foundation joined together to&#10;host a silent auction at the company`s North American Sales Meeting earlier this&#10;year. All proceeds from this event were donated to the Center for Cancer and&#10;Blood Disorders at the Children`s Medical Center Dallas.&#10;&#10;About VWR Foundation&#10;&#10;The VWR Foundation was started by five associates of VWR International, LLC who&#10;wanted to make a difference in the areas in which they worked and lived. The&#10;Foundation was officially established in January 2009 and focuses on research,&#10;health and well-being and science education. For more information about the VWR&#10;Foundation, visit www.VWRfoundation.org. 
&#10;&#10;About VWR International, LLC&#10;&#10;VWR International, LLC, headquartered in West Chester, Pennsylvania, is a global&#10;laboratory supply and distribution company with worldwide sales in excess of&#10;$3.5 billion in 2009. VWR enables the advancement of the world`s most critical&#10;research through the distribution of a highly diversified product line to most&#10;of the world`s top pharmaceutical and biotech companies, as well as industrial,&#10;educational, and governmental organizations. With 150 years of industry&#10;experience, VWR offers a well-established distribution network that reaches&#10;thousands of specialized labs and facilities spanning the globe. VWR has over&#10;6,500 associates around the world working to streamline the way researchers&#10;across North America, Europe, and Asia stock and maintain their labs. In&#10;addition, VWR further supports its customers by providing onsite services,&#10;storeroom management, product procurement, supply chain systems integration, and&#10;technical services. &#10;&#10;For more information on VWR International, phone 1-800-932-5000, visit&#10;www.vwr.com, or write, VWR International, LLC, 1310 Goshen Parkway, P.O. Box&#10;2656, West Chester, PA 19380-0906. &#10;&#10;VWR and design are registered trademarks of VWR International, LLC. &#10; &#10;VWR International, LLC&#10;Valerie Collado, 610-429-2796&#10;valerie_collado@vwr.com&#10;or&#10;Brownstein Group&#10;Laura Van De Pette, 267-238-4118&#10;lvandepette@brownsteingroup.com&#10;&#10;&#10;&#10;Copyright Business Wire 2010 &#10; &#10;"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes (2)" width="90" x="246" y="120">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="TREND"/>
       <parameter key="attributes" value="|TAKE_TEXT|HEADLINE"/>
       <parameter key="invert_selection" value="true"/>
       <parameter key="include_special_attributes" value="true"/>
     </operator>
     <operator activated="true" class="text:process_document_from_data" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Data" width="90" x="380" y="120">
       <parameter key="keep_text" value="true"/>
       <parameter key="prune_method" value="absolute"/>
       <parameter key="prune_below_absolute" value="3"/>
       <parameter key="prune_above_absolute" value="9999"/>
       <parameter key="select_attributes_and_weights" value="true"/>
       <list key="specify_weights">
         <parameter key="TAKE_TEXT" value="1.0"/>
         <parameter key="HEADLINE" value="1.0"/>
       </list>
       <parameter key="parallelize_vector_creation" value="true"/>
       <process expanded="true">
         <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
         <operator activated="true" class="text:transform_cases" compatibility="5.3.002" expanded="true" height="60" name="Transform Cases" width="90" x="180" y="30"/>
         <operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="30"/>
         <operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (Porter)" width="90" x="450" y="30"/>
         <operator activated="true" class="text:generate_n_grams_terms" compatibility="5.3.002" expanded="true" height="60" name="Generate n-Grams (Terms)" width="90" x="606" y="30">
           <parameter key="max_length" value="3"/>
         </operator>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
         <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
         <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
         <connect from_op="Stem (Porter)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
         <connect from_op="Generate n-Grams (Terms)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="k_means" compatibility="5.3.015" expanded="true" height="76" name="Clustering (2)" width="90" x="514" y="120">
       <parameter key="k" value="60"/>
       <parameter key="measure_types" value="NumericalMeasures"/>
       <parameter key="max_optimization_steps" value="200"/>
     </operator>
     <operator activated="true" class="text:process_documents" compatibility="5.3.002" expanded="true" height="94" name="Process Documents" width="90" x="246" y="255">
       <parameter key="keep_text" value="true"/>
       <process expanded="true">
         <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="313" y="210"/>
         <operator activated="true" class="text:transform_cases" compatibility="5.3.002" expanded="true" height="60" name="Transform Cases (2)" width="90" x="314" y="120"/>
         <operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.002" expanded="true" height="60" name="Filter Stopwords (2)" width="90" x="315" y="30"/>
         <operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (2)" width="90" x="450" y="30"/>
         <operator activated="true" class="text:generate_n_grams_terms" compatibility="5.3.002" expanded="true" height="60" name="Generate n-Grams (2)" width="90" x="585" y="30">
           <parameter key="max_length" value="3"/>
         </operator>
         <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
         <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
         <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
         <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Stem (2)" to_port="document"/>
         <connect from_op="Stem (2)" from_port="document" to_op="Generate n-Grams (2)" to_port="document"/>
         <connect from_op="Generate n-Grams (2)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="94" name="Multiply" width="90" x="447" y="255"/>
     <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="648" y="165">
       <list key="application_parameters"/>
     </operator>
     <operator activated="true" class="join" compatibility="5.3.015" expanded="true" height="76" name="Join" width="90" x="179" y="435">
       <parameter key="use_id_attribute_as_key" value="false"/>
       <list key="key_attributes">
         <parameter key="cluster" value="cluster"/>
       </list>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes (3)" width="90" x="313" y="435">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="id"/>
       <parameter key="invert_selection" value="true"/>
       <parameter key="include_special_attributes" value="true"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes (4)" width="90" x="447" y="435">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="cluster"/>
       <parameter key="invert_selection" value="true"/>
       <parameter key="include_special_attributes" value="true"/>
     </operator>
     <operator activated="true" class="cross_distances" compatibility="5.3.015" expanded="true" height="94" name="Cross Distances" width="90" x="581" y="435">
       <parameter key="measure_types" value="NumericalMeasures"/>
       <parameter key="nominal_measure" value="DiceSimilarity"/>
       <parameter key="numerical_measure" value="CosineSimilarity"/>
       <parameter key="only_top_k" value="true"/>
       <parameter key="k" value="3"/>
     </operator>
     <connect from_op="Retrieve Repository_BI." from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
     <connect from_op="Replace Missing Values" from_port="example set output" to_op="Filter Examples (2)" to_port="example set input"/>
     <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Sample" to_port="example set input"/>
     <connect from_op="Sample" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
     <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
     <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
     <connect from_op="Process Documents from Data" from_port="example set" to_op="Clustering (2)" to_port="example set"/>
     <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents" to_port="word list"/>
     <connect from_op="Clustering (2)" from_port="cluster model" to_op="Apply Model" to_port="model"/>
     <connect from_op="Clustering (2)" from_port="clustered set" to_op="Join" to_port="left"/>
     <connect from_op="Process Documents" from_port="example set" to_op="Multiply" to_port="input"/>
     <connect from_op="Multiply" from_port="output 1" to_op="Cross Distances" to_port="request set"/>
     <connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
     <connect from_op="Apply Model" from_port="labelled data" to_op="Join" to_port="right"/>
     <connect from_op="Join" from_port="join" to_op="Select Attributes (3)" to_port="example set input"/>
     <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
     <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Cross Distances" to_port="reference set"/>
     <connect from_op="Cross Distances" from_port="result set" to_port="result 1"/>
     <connect from_op="Cross Distances" from_port="request set" to_port="result 2"/>
     <connect from_op="Cross Distances" from_port="reference set" to_port="result 3"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
     <portSpacing port="sink_result 4" spacing="0"/>
   </process>
 </operator>
</process>