🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

AMOUNT OF EXAMPLES DOES NOT CORRELATES WITH INPUT DATA LOADED FROM PDFs

User: "antonio_heredia"
New Altair Community Member
Updated by Jocelyn
on="1.0" encoding="UTF-8"?><process version="8.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
<parameter key="logverbosity" value="all"/>
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="45" y="136">
<list key="text_directories">
<parameter key="Forging vs AM" value="C:\Users\xwb15193\Desktop\L.R AM vs F\ScienceDirect\ScienceDirect_articles_04Jul2018_11-57-34.507"/>
</list>
<parameter key="add_meta_information" value="false"/>
<parameter key="keep_text" value="true"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34">
<parameter key="mode" value="linguistic tokens"/>
</operator>
<operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="246" y="34"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="77" y="85">Type your comment</description>
<description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="99" y="169">Type your comment</description>
</process>
</operator>
<operator activated="false" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="246" y="289">
<list key="filters_list">
<parameter key="filters_entry_key" value="label.contains.and"/>
</list>
</operator>
<connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

I tried to tokenize pdf articles, resulting in only 21 examples. Why does it happen? It should outcome many more. To do so, I used: "Process data from files" and inside I included "Tokenize" and "filter stopwords", Which again works but not throughout all the documents. What should I do to fix it?

 

Cheers,

 

Antonio

Find more posts tagged with

Sort by:
1 - 1 of 11
    User: "lionelderkrikor"
    New Altair Community Member

    Hi @antonio_heredia,

     

    Do you have a lot of files ?

    Can your share these files in order we can reproduce what you observe ?

     

    Regards,

     

    Lionel

     

    NB : The first line of your XML process is broken, however I was able to repair it.