Read data from HTML tables on web pages

User: "Flixport"
New Altair Community Member
Updated by Jocelyn
Hey all,

Has the HTML Reader operator been removed from the new version, or is there another reason I cannot find it?
I would appreciate an answer, thanks.

    User: "varunm1"
    New Altair Community Member
    Accepted Answer
    Hello @Flixport

    Read HTML table is in the "Web Table Extraction" extension. You need to download the extension from the RapidMiner Marketplace.

    Thanks
    User: "Flixport"
    New Altair Community Member
    OP
    Hello @varunm1

    As I understand it, the Web Table Extraction extension extracts data from HTML tables. But the data we are interested in is often not in a table. Is there a solution for this?

    thanks

    User: "varunm1"
    New Altair Community Member
    Hi @Flixport

    I'm not sure about this. @Telcontar120 or @mschmitz may be able to suggest something.

    Thanks
    User: "Telcontar120"
    New Altair Community Member
    There are definitely ways to get data from web pages into RapidMiner, but depending on the page structure it is not necessarily simple or straightforward (that's why there's a whole expert training class just on web mining!). It's also complicated by the fact that some of the web mining operators have not been updated in some time, so there are some "quirks" you need to be aware of. If you are interested in this topic, download the free Web Mining extension from the Marketplace and start with the Get Page operator. It lets you pull in any HTML page, and you can then try to extract the information you need from the underlying HTML with some of the other text mining operators.
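
For readers who want to see the "get the page, then extract from the underlying HTML" idea outside RapidMiner, here is a minimal Python sketch using only the standard library. It is not the Get Page operator or any RapidMiner API. The sample markup, the class names `name` and `price`, and the helper `extract_by_class` are all made up for illustration, and a real page would first have to be downloaded.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while we are inside a matching element
        self.buffer = []    # text pieces of the current match
        self.results = []   # finished matches

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1             # nested element inside a match
        elif self.target_class in classes:
            self.depth = 1              # a new match starts here
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:             # the matching element just closed
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_by_class(html, target_class):
    """Hypothetical helper: return the text of all elements with the class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up, non-tabular markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="product"><h2 class="name">Widget</h2>
  <span class="price">9.99 EUR</span></div>
<div class="product"><h2 class="name">Gadget</h2>
  <span class="price sale">4.50 EUR</span></div>
"""

names = extract_by_class(SAMPLE_HTML, "name")
prices = extract_by_class(SAMPLE_HTML, "price")
print(list(zip(names, prices)))
```

The sketch assumes reasonably well-nested markup; messy real-world pages usually need a more tolerant parser.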
    User: "sgenzer"
    Altair Employee
    Yes, so just to be clear, there are actually two extensions we're talking about here: the Web Mining extension and the Web Table Extraction extension.

    The Web Mining extension is a rather dated one and the advice from @Telcontar120 should help you there.

    The Web Table Extraction extension was developed out of RapidMiner Research in Dortmund; my colleague @ey wrote the extension and an accompanying Knowledge Base article about a year ago that may help.

    Scott
    User: "Flixport"
    New Altair Community Member
    OP
    Updated by Flixport
    Hey all,

    Thanks for the answers. I think another solution would be to convert the HTML document into an XML document, or is that not possible?

    thanks
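
On the HTML-to-XML idea: in general the conversion is possible, because HTML can be re-serialized as well-formed XML once unclosed tags are closed. Below is a minimal Python sketch with the standard library only. It does not show a RapidMiner operator, and the class `HtmlToXml`, the function `html_to_xml`, and the sample markup are hypothetical names chosen for illustration. It assumes a single root element and attribute names that are valid in XML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Feed HTML in, build an ElementTree with every element closed."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []   # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. "disabled") become empty strings.
        attrib = {name: (value or "") for name, value in attrs}
        self.builder.start(tag, attrib)
        if tag in VOID_TAGS:
            self.builder.end(tag)       # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.open_tags:
            return                      # ignore stray end tags
        # Close any inner elements the page left open, then the tag itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        if self.open_tags:              # drop text outside the root element
            self.builder.data(data)

    def finish(self):
        while self.open_tags:           # close whatever is still open
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def html_to_xml(html):
    """Hypothetical helper: return the HTML re-serialized as XML text."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.finish(), encoding="unicode")


# Made-up markup with an unclosed <br> and <img>, which plain XML forbids.
SAMPLE_HTML = ('<div id="a"><p>Price: <b>9.99</b><br>in stock</p>'
               '<img src="x.png"></div>')

xml_text = html_to_xml(SAMPLE_HTML)
print(xml_text)
```

Once the page is well-formed XML, values can be addressed by path (for example `p/b` in the sample), which is what makes this route attractive for data that is not laid out in a table.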