Website-Content into one cell

ds139
edited November 2024 in Community Q&A

Hello everyone,

I want to use text mining methods on song lyrics taken from a website.

What I have now is:

                                                                               

 Artist        Song             Lyrics
 The Killers   Mr. Brightside   http://lyrics.html

 

What I want is:

 Artist        Song             Lyrics
 The Killers   Mr. Brightside   Coming out of my cage and I'm doing just fine...

 

Do you see what I mean? The lyrics are inside a <p></p> element on the page, and I want the whole string in one single cell.

I know that I need "Retrieve", "Get Pages" and "Process Documents to Data" (with "Extract Content" inside it), but beyond that I don't know how to continue.

 

Which operator puts the content of the <p> element into one cell?

I hope someone can help me, because I need the lyrics for further processing.

Thank you

 


Answers

  • Telcontar120

    I think you want "Cut Document" rather than (or in addition to) "Extract Content" in this case. After you have retrieved the pages using "Get Pages" and created your text documents using "Data to Documents", you can use "Cut Document" and specify the region of the HTML that you want to extract, using either XPath (if the lyrics are in a named element) or some kind of regex query.
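
    To make the idea concrete outside RapidMiner, here is a minimal Python sketch of the same step: take the HTML of one page, keep only the text inside <p> elements, and collapse it into a single string for the "Lyrics" cell. It is not the RapidMiner process itself. The names `ParagraphText`, `lyrics_to_cell` and `sample_html` are made up for illustration, and the sample page (including its placeholder second paragraph) is an assumption about what the lyrics page might look like.

```python
import re
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text that appears inside <p>...</p> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside a <p> element
        self.chunks = []    # pieces of text found inside <p> elements

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
        elif tag == "br" and self.depth:
            # Treat a line break inside the lyrics as a space.
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            # Separate consecutive paragraphs with a space.
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def lyrics_to_cell(html):
    """Return all <p> text of a page as one whitespace-normalised string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


# Hypothetical page content; a real page would come from the fetched URL.
sample_html = (
    "<html><body><h1>Mr. Brightside</h1>"
    "<p>Coming out of my cage<br>and I'm doing just fine</p>"
    "<p>placeholder line</p>"
    "</body></html>"
)

row = {
    "Artist": "The Killers",
    "Song": "Mr. Brightside",
    "Lyrics": lyrics_to_cell(sample_html),
}
print(row["Lyrics"])
```

    The heading is ignored because it is outside any <p>, and line breaks and paragraph boundaries become single spaces, so each song ends up as one row with one text cell.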