I'm reading a series of comma-separated CSV data files that use quotes around fields. Within a data line, a literal quote inside a field is written as two consecutive quote characters, i.e., the quote character itself acts as the escape character for quotes. An example line could be:
"News Alert","Mon, 13 May 2019 08:29:58","""NEWS OFFICE"" <newsoffice@spamdude.com>"
which it SHOULD interpret as 3 fields as follows:
(1) News Alert (2) Mon, 13 May 2019 08:29:58 (3) "NEWS OFFICE" <newsoffice@spamdude.com>
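To make the expected interpretation concrete, here is a minimal Python sketch (not RapidMiner code) that parses the example line the way I'd expect. It relies on the standard `csv` module with `doublequote=True`; the helper name `parse_line` is just something I made up for illustration.

```python
import csv
import io

# The example line from above, exactly as it appears in the data file.
line = '"News Alert","Mon, 13 May 2019 08:29:58","""NEWS OFFICE"" <newsoffice@spamdude.com>"\n'


def parse_line(text):
    """Parse one CSV line in which a doubled quote inside a quoted field
    stands for a single literal quote character."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=",",
        quotechar='"',
        doublequote=True,  # treat "" inside a quoted field as one literal "
    )
    return next(reader)


fields = parse_line(line)
for i, field in enumerate(fields, start=1):
    print(f"({i}) {field}")
```

This yields three fields, with the commas inside the date kept as part of the second field and the doubled quotes collapsed to single quotes in the third.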
I'm using the Read CSV operator with "use quotes" checked and the double-quote character set as both the quotes character and the escape character. The result is that it not only fails to read the line correctly, it completely skips every line that contains the doubled quotes. My operator XML is as follows:
<operator activated="true" class="read_csv" compatibility="9.0.003" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
<parameter key="column_separators" value=","/>
<parameter key="escape_character" value="&quot;"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="false"/>
</operator>
Is there a way to do this so it reads and interprets my example line properly, or do I have to preprocess all my data files with a Python script or something similar to replace the doubled quotes with some other escape character (like the default backslash) before ingesting them into RapidMiner? Thanks for the help!
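For reference, the preprocessing fallback I have in mind would look roughly like the sketch below. It assumes the files are plain text that Python's `csv` module can read, and it rewrites each row so that embedded quotes are escaped with a backslash instead of being doubled. The function name `convert_doubled_quotes` is my own invention, and whether RapidMiner then reads the backslash-escaped output correctly is something I haven't verified.

```python
import csv
import io


def convert_doubled_quotes(src, dst, escape="\\"):
    """Copy CSV rows from src to dst, replacing doubled-quote escaping
    with escape-character escaping (backslash by default)."""
    reader = csv.reader(src, quotechar='"', doublequote=True)
    writer = csv.writer(
        dst,
        quotechar='"',
        quoting=csv.QUOTE_ALL,  # keep every field quoted, as in the input
        doublequote=False,      # do not write "" for an embedded quote
        escapechar=escape,      # write \" instead
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)


# Small in-memory demonstration using the example line.
source = io.StringIO(
    '"News Alert","Mon, 13 May 2019 08:29:58",'
    '"""NEWS OFFICE"" <newsoffice@spamdude.com>"\n'
)
target = io.StringIO()
convert_doubled_quotes(source, target)
converted = target.getvalue()
print(converted, end="")
```

For real files, the same function could be called with two file objects opened with `newline=""`, one for reading and one for writing.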