Problem with the Get Pages operator in combination with Read CSV (duplicate attribute)

informatist (New Altair Community Member)
edited November 2024 in Community Q&A

I just started using RapidMiner and I have a problem with the "Get Pages" operator. When I start my process, the tool says "Process failed. Duplicate attribute name: URL". I start with a CSV file that has names in the first column. The second column, called "URL" and classified as a "file path" attribute in the first operator (Read CSV), contains the links I want to open with the "Get Pages" operator. In the "Get Pages" operator, I selected URL as the link attribute. I hope you can help me. The whole error message is:

Exception: java.lang.IllegalArgumentException
Message: Duplicate attribute name: URL
Stack trace:
  com.rapidminer.example.SimpleAttributes.register(SimpleAttributes.java:124)
  com.rapidminer.example.SimpleAttributes.add(SimpleAttributes.java:203)
  com.rapidminer.example.AbstractAttributes.addRegular(AbstractAttributes.java:94)
  com.rapidminer.operator.web.features.construction.RetrievePagesOperator.doWork(RetrievePagesOperator.java:124)
  com.rapidminer.operator.Operator.execute(Operator.java:1002)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:76)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:811)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:806)
  java.security.AccessController.doPrivileged(Native Method)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:806)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:392)
  com.rapidminer.operator.Operator.execute(Operator.java:1002)
  com.rapidminer.Process.run(Process.java:1195)
  com.rapidminer.Process.run(Process.java:1091)
  com.rapidminer.Process.run(Process.java:1044)
  com.rapidminer.Process.run(Process.java:1039)
  com.rapidminer.Process.run(Process.java:1029)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)

 

My XML process:

<?xml version="1.0" encoding="UTF-8"?>
<process version="7.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="7.2.003" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
        <parameter key="csv_file" value="/Users/test/agrar Kopie.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Link.true.polynominal.attribute"/>
          <parameter key="1" value="URL.true.file_path.base_value"/>
        </list>
        <parameter key="datamanagement" value="float_array"/>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="7.2.003" expanded="true" height="82" name="Filter Example Range" width="90" x="246" y="34">
        <parameter key="first_example" value="-2"/>
        <parameter key="last_example" value="-1"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="7.2.001" expanded="true" height="68" name="Get Pages" width="90" x="514" y="34">
        <parameter key="link_attribute" value="URL"/>
        <parameter key="random_user_agent" value="true"/>
        <parameter key="accept_cookies" value="all"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

 

I hope someone can help me with this problem.

Thank you in advance!


Answers

  • Telcontar120 (New Altair Community Member)

    What happens if you change the name of the second column in the raw CSV to something else, like "link"? Does the error still occur?
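    A minimal sketch of that suggestion, applied inside the process instead of in the raw file. It assumes, as the stack trace suggests (the exception is thrown from RetrievePagesOperator.doWork while adding a regular attribute), that Get Pages itself adds an attribute named "URL", so the input column needs a different name. The name "source_link" is hypothetical; it was chosen to differ from "Link", which the first column already uses. Only the two changed operators are shown, and every parameter key is taken from the original process.

```xml
<operator activated="true" class="read_csv" compatibility="7.2.003" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
  <!-- csv_file, column_separators, first_row_as_names, annotations, encoding, datamanagement: unchanged -->
  <list key="data_set_meta_data_information">
    <parameter key="0" value="Link.true.polynominal.attribute"/>
    <!-- second column renamed from "URL" to the hypothetical name "source_link" -->
    <parameter key="1" value="source_link.true.file_path.base_value"/>
  </list>
</operator>
<operator activated="true" class="web:retrieve_webpages" compatibility="7.2.001" expanded="true" height="68" name="Get Pages" width="90" x="514" y="34">
  <!-- link_attribute now points at the renamed column instead of "URL" -->
  <parameter key="link_attribute" value="source_link"/>
  <parameter key="random_user_agent" value="true"/>
  <parameter key="accept_cookies" value="all"/>
</operator>
```

    If you would rather leave the import settings alone, a Rename operator placed between Filter Example Range and Get Pages should have the same effect, as long as link_attribute is updated to the new name.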