[SOLVED] Help with xml, xpath, namespaces.

New Altair Community Member

May 6, 2012

Updated Nov 5, 2024 by Jocelyn

Below is sample XML from the Google Custom Search (CSE) API:

<?xml version="1.0" encoding="UTF-8"?>
<feed gd:kind="customsearch#search" xmlns="http://www.w3.org/2005/Atom" xmlns:cse="http://schemas.google.com/cseapi/2010" xmlns:gd="http://schemas.google.com/g/2005" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<title>Google Custom Search - Albertus Magnus College. library Albertus Magnus College Library intitle:newsletter albertus.edu</title>
<id>tag:www.googleapis.com,2010-09-29:/customsearch/v1?q= Albertus Magnus College. library Albertus Magnus College Library intitle:newsletter albertus.edu&cx=008033228147187897025:-ua_scxr1uc&num=7&start=1&safe=off</id>
<author>
<name>Library Website Search Engine - Google Custom Search</name>
</author>
<updated>1970-01-16T11:10:30.455Z</updated>
<opensearch:Url type="application/atom+xml" template="https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={cse:safe?}&cx={cse:cx?}&cref={cse:cref?}&sort={cse:sort?}&filter={cse:filter?}&gl={cse:gl?}&cr={cse:cr?}}&googlehost={cse:googleHost?}&c2coff={?cse:disableCnTwTranslation}&hq={cse:hq?}&hl={cse:hl?}&siteSearch={cse:siteSearch?}&siteSearchFilter={cse:siteSearchFilter?}&exactTerms={cse:exactTerms?}&excludeTerms={cse:excludeTerms?}&linkSite={cse:linkSite?}&orTerms={cse:orTerms?}&relatedSite={cse:relatedSite?}&dateRestrict={cse:dateRestrict?}&lowRange={cse:lowRange?}&highRange={cse:highRange?}&searchType={cse:searchType?}&fileType={cse:fileType?}&rights={cse:rights?}&imgsz={cse:imgsz?}&imgtype={cse:imgtype?}&imgc={cse:imgc?}&imgcolor={cse:imgcolor?}&alt=atom"/>
<opensearch:Query role="request" title="Google Custom Search - Albertus Magnus College. library Albertus Magnus College Library intitle:newsletter albertus.edu" totalResults="7" searchTerms=" Albertus Magnus College. library Albertus Magnus College Library intitle:newsletter albertus.edu" count="7" startIndex="1" inputEncoding="utf8" outputEncoding="utf8" cse:safe="off" cse:cx="008033228147187897025:-ua_scxr1uc"/>
<opensearch:totalResults>7</opensearch:totalResults>
<opensearch:startIndex>1</opensearch:startIndex>
<cse:context title="Library Website Search Engine"/>
<cse:searchInformation>
<cse:searchTime>0.073074</cse:searchTime>
<cse:formattedSearchTime>0.07</cse:formattedSearchTime>
<cse:totalResults>7</cse:totalResults>
<cse:formattedTotalResults>7</cse:formattedTotalResults>
</cse:searchInformation>
<cse:spelling>
<cse:correctedQuery type="html"/>
</cse:spelling>
<entry gd:kind="customsearch#result">
<id>http://www.albertus.edu/policy-reports/advancement-publications/documents/albertus-archive-october-2011-special-edition.pdf</id>
<updated>1970-01-16T11:10:30.455Z</updated>
<title type="html">Special Edition Athletics @lbertus Newsletter</title>
<link href="http://www.albertus.edu/policy-reports/advancement-publications/documents/albertus-archive-october-2011-special-edition.pdf" title="www.albertus.edu"/>
<summary type="html">This weekend marks a busy and historic time on campus for the Albertus. Magnus College Athletics Department as both the men&#39;s and women&#39;s soccer ...</summary>
<cse:cacheId>AJGUZgC9CVMJ</cse:cacheId>
<cse:mime>application/pdf</cse:mime>
<cse:fileFormat>PDF/Adobe Acrobat</cse:fileFormat>
<cse:formattedUrl type="html">www.albertus.edu/.../albertus-archive-october-2011-special-edition.pdf</cse:formattedUrl>
<cse:PageMap>
<cse:DataObject type="metatags">
<cse:Attribute name="creationdate" value="D:20111118135759-05'00'"/>
<cse:Attribute name="producer" value="Acrobat Web Capture 8.0"/>
<cse:Attribute name="moddate" value="D:20111118140743-05'00'"/>
<cse:Attribute name="title" value="Special Edition Athletics @lbertus Newsletter"/>
</cse:DataObject>
</cse:PageMap>
</entry>
...

</feed>

I'm using the Generate Extract operator. I've specified the namespaces as:
<list key="namespaces">
<parameter key="x" value="http://www.kbcafe.com/rss/atom.xsd.xml"/>
<parameter key="xmlns:cse" value="http://schemas.google.com/cseapi/2010"/>
<parameter key="xmlns:gd" value="http://schemas.google.com/g/2005"/>
<parameter key="xmlns:opensearch" value="http://a9.com/-/spec/opensearch/1.1/"/>
<parameter key="xx" value="xml"/>
</list>

I've tried XPath expressions such as
//x:feed
//feed
and more specific ones, but I can't seem to match anything in this feed. I'm sure the problem is in my namespaces, but I don't know where to look for the answer.

The targets I want to extract are
//x:feed/x:entry/x:title
and //x:feed/x:entry/x:link/@href.

Any help would be appreciated.
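
For reference, here is a minimal Python sketch (standard-library ElementTree, not RapidMiner) of how these two targets behave outside the operator. It assumes the likely cause is the prefix binding: XPath matches elements by namespace URI, and the feed's default namespace is http://www.w3.org/2005/Atom, whereas x above is bound to a different URI. The SAMPLE string is a trimmed, hypothetical stand-in for the feed, and the link URL in it is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the feed above (same default namespace).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:cse="http://schemas.google.com/cseapi/2010">
  <title>Google Custom Search</title>
  <entry>
    <title type="html">Special Edition Athletics @lbertus Newsletter</title>
    <link href="http://www.albertus.edu/example.pdf" title="www.albertus.edu"/>
    <cse:mime>application/pdf</cse:mime>
  </entry>
</feed>"""

root = ET.fromstring(SAMPLE)  # root is the <feed> element

# Prefix bound to the URI actually declared on <feed>.
ATOM_NS = {"x": "http://www.w3.org/2005/Atom"}
# Prefix bound to the URI from the operator config in the post.
POSTED_NS = {"x": "http://www.kbcafe.com/rss/atom.xsd.xml"}

# An unprefixed name means "no namespace", so nothing matches.
unprefixed = root.findall("entry")

# A prefix bound to a different URI also matches nothing.
wrong_uri = root.findall("x:entry", POSTED_NS)

# With the Atom URI, both targets resolve.
titles = [t.text for t in root.findall("x:entry/x:title", ATOM_NS)]

# ElementTree's XPath subset cannot end a path in @href,
# so the attribute is read from each matched <link> element instead.
hrefs = [link.get("href") for link in root.findall("x:entry/x:link", ATOM_NS)]

print(unprefixed, wrong_uri)
print(titles)
print(hrefs)
```

The prefix name itself is arbitrary; only the URI it maps to matters. Whether the Generate Extract operator expects the key as x or in some other form is not something this sketch can confirm.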
