Read data from HTML tables on web pages
Flixport
New Altair Community Member
Answers
-
There are definitely ways to get data from web pages into RapidMiner, but depending on the page structure it is not necessarily simple or straightforward (that's why there's a whole expert training class just on web mining!). It's also complicated by the fact that some of the web mining operators have not been updated in some time, so there are some "quirks" you need to be aware of. If you are interested in this topic, download the free Web Mining extension from the marketplace and take a look at the Get Page operator to start. This will let you pull in any HTML page, and you can then try to extract the information you need from the underlying HTML with some of the other text mining operators.
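To make the "pull in the page, then extract from the underlying HTML" idea a bit more concrete, here is a minimal sketch in plain Python (not RapidMiner, and not how the Get Page operator works internally). It uses only the standard library's `html.parser`; the names `TableExtractor`, `extract_tables`, and `SAMPLE_HTML` are made up for illustration, and the sample markup is a hypothetical stand-in for a page you would have fetched yourself. It assumes well-formed tables with explicit closing tags and no nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in an HTML string as nested lists of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>name</th><th>value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
</body></html>
"""

tables = extract_tables(SAMPLE_HTML)
for row in tables[0]:
    print(row)
```

Real pages are usually messier than this (omitted closing tags, nested tables, content filled in by JavaScript), which is where the "quirks" come in.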
-
Yes, and just to be clear, there are actually two extensions we're talking about here: the Web Mining extension and the Web Table Extraction extension.
The Web Mining extension is rather dated, and the advice from @Telcontar120 should help you there.
The Web Table Extraction extension was developed out of RapidMiner Research in Dortmund; my colleague @ey wrote the extension and an accompanying Knowledge Base article about a year ago that may help.
Scott
-
Hey all, thanks for the answers. As another solution, could you also convert the HTML document into an XML document, or is that not possible? Thanks