Encoding problem
This is a follow-up to this conversation, which addressed a slightly different problem and had become quite messy.
A few problems were pinpointed there, but the one causing the most trouble is the encoding problem. So here it is:
I loop through a number of text files (a sample is attached below) and convert them to a dataset, which looks fine. However, because of the encoding problem, everything gets messed up when I write the dataset to a CSV file and read it back. I have done some trial and error but cannot find a fix, and this is as far upstream as I can go. Here is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.5.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="concurrency:loop_files" compatibility="7.5.003" expanded="true" height="82" name="Loop Files" width="90" x="112" y="187">
        <parameter key="directory" value="C:\Users\alan.jeffares\Desktop\data2"/>
        <process expanded="true">
          <operator activated="true" class="text:read_document" compatibility="7.5.000" expanded="true" height="68" name="Read Document" width="90" x="313" y="34">
            <parameter key="encoding" value="UTF-8"/>
          </operator>
          <connect from_port="file object" to_op="Read Document" to_port="file"/>
          <connect from_op="Read Document" from_port="output" to_port="output 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:documents_to_data" compatibility="7.5.000" expanded="true" height="82" name="Documents to Data" width="90" x="246" y="187">
        <parameter key="text_attribute" value="textspecial"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="7.5.003" expanded="true" height="82" name="Write CSV" width="90" x="447" y="187">
        <parameter key="csv_file" value="C:\Users\alan.jeffares\Documents\test3.csv"/>
      </operator>
      <operator activated="true" class="read_csv" compatibility="7.5.003" expanded="true" height="68" name="Read CSV" width="90" x="581" y="187">
        <parameter key="csv_file" value="C:\Users\alan.jeffares\Documents\test3.csv"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <connect from_op="Loop Files" from_port="output 1" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="Write CSV" to_port="input"/>
      <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
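
For reference, only Read Document sets an encoding in the process above; Write CSV and Read CSV carry no encoding parameter, so they presumably fall back to their defaults. A minimal sketch of the variant that may be worth testing is shown here. It assumes both CSV operators accept the same `encoding` key that Read Document uses, and the `UTF-8` value on them is an assumption, not a confirmed fix:

```xml
<!-- Sketch only: the two added encoding parameters are an assumption, not part of the original process. -->
<operator activated="true" class="write_csv" compatibility="7.5.003" expanded="true" height="82" name="Write CSV" width="90" x="447" y="187">
  <parameter key="csv_file" value="C:\Users\alan.jeffares\Documents\test3.csv"/>
  <!-- assumed: write the file with the same encoding the documents were read with -->
  <parameter key="encoding" value="UTF-8"/>
</operator>
<operator activated="true" class="read_csv" compatibility="7.5.003" expanded="true" height="68" name="Read CSV" width="90" x="581" y="187">
  <parameter key="csv_file" value="C:\Users\alan.jeffares\Documents\test3.csv"/>
  <!-- assumed: read the file back with the encoding it was written in -->
  <parameter key="encoding" value="UTF-8"/>
  <list key="annotations"/>
  <list key="data_set_meta_data_information"/>
</operator>
```

The idea is simply that the writer and the reader agree on one encoding. If the source text files are not actually UTF-8, the value on Read Document would need to change as well.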