basic xpath problem

Kintaro
New Altair Community Member
Hello,
I'm trying to extract data with XPath from an HTML page.
My process is:
Create Document => Extract Information
Create Document contains:
<html>
<head>
<title>TITLE</title>
</head>
<body>BODY</body>
</html>
Extract Information is configured with:
query type: xpath
attribute type: nominal
xpath queries: //title
namespace: n/a
ignore CDATA: true
assume html: true
Result:
attribute name: ?
What am I doing wrong? >:(
Answers
I'm asking because the same query works without any problem in an online XPath tester, so I don't know why it fails in RapidMiner.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="6.1.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="255">
<parameter key="text" value="&lt;html&gt; &lt;head&gt; &lt;title&gt;TITLE&lt;/title&gt; &lt;/head&gt; &lt;body&gt;BODY&lt;/body&gt; &lt;/html&gt;"/>
</operator>
<operator activated="true" class="text:extract_information" compatibility="6.1.000" expanded="true" height="60" name="Extract Information" width="90" x="313" y="30">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries">
<parameter key="nome" value="&lt;title&gt;.&lt;/title&gt;"/>
</list>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="nome" value="//title"/>
</list>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Extract Information" to_port="document"/>
<connect from_op="Extract Information" from_port="document" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Solved
I can't use a path like //title here; I have to use, for example:
//h:title/text()
text() extracts only the text from the title tag,
and I have to use the h: prefix because the document is HTML, right?
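For anyone landing here later, this is roughly how the fix would look in the process XML posted above. It is only a sketch: it changes nothing but the xpath_queries list of the Extract Information operator, and it assumes the h: prefix is bound to the XHTML namespace whenever "assume html" is true, which is what the working query suggests but which I have not checked against the documentation.

```xml
<!-- Sketch: only the xpath_queries list of Extract Information changes. -->
<!-- Assumption: with "assume html" set to true, the "h" prefix resolves -->
<!-- to the XHTML namespace, so no entry in the namespaces list is needed. -->
<list key="xpath_queries">
<parameter key="nome" value="//h:title/text()"/>
</list>
```

With this query the attribute "nome" should hold the title text instead of a missing value. The likely reason the plain //title works in online testers is that they treat the elements as having no namespace, while here the elements appear to sit in the XHTML namespace, so an unprefixed name matches nothing.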