Text Mining - Industry 4

Question

Hey,

I want to extract all the text from this page: http://www.plattform-i40.de/I40/Navigation/Karte/SiteGlobals/Forms/Formulare/EN/map-use-cases-formular.html and build a table of factors extracted from it, where each row is a case and each column is one piece of data extracted from the text. I think I'll use 6 columns: Value Creation, Product Examples, Region, ...

Then I want to compare those data to find out which cases best fit an external given case. For instance: given case X matches the company on line 35 at 80%, the company on line 118 at 60%, etc.

Do you know how I can do all of that?

It's for my master's thesis.

Thanks a lot,

Charles

charlesmrt · Answer

Thanks,

I found another way to do it: I downloaded the HTML pages to my computer with "Download them all", then used text processing and Extract Information with a regular expression. I obtained a table containing all the information.

But I still have a question. With the regular expression, I can extract only one match per column of my table: the query expression is unique, but sometimes there are several matches for one attribute name. How can I get multiple matches into one column? I used "|", but that gives an alternation (either one element or the other), not an accumulation.
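For comparison, here is a minimal Python sketch of the "accumulation" behaviour, outside the tool used above. It collects every match of one pattern with `re.findall` and joins them into a single cell. The sample text, the pattern, and the "; " separator are assumptions for illustration; the real pages will need a pattern fitted to their markup.

```python
import re

# Hypothetical page text; the real attribute layout may differ.
text = (
    "Product example: software solution. "
    "Product example: sensor. "
    "Region: somewhere."
)

# One capture group: findall returns the captured text of every match,
# not just the first one.
PATTERN = re.compile(r"Product example:\s*([^.]+)\.")

matches = PATTERN.findall(text)

# Join all matches into one value that fits a single table column.
cell = "; ".join(m.strip() for m in matches)
print(cell)
```

The "|" operator only chooses between alternative patterns within one match; repeating the search over the whole text is what gathers all occurrences.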

Thanks,

Charles

Capture3.JPG

sgenzer · Answer

Oh, that seems very complicated. I would use RegEx. Scott

charlesmrt · Answer

Hey,

Thanks for answering. In the attached file you can see the HTML. I just want to extract "software solution". I tried "//*[contains(.,'Product example')]/../span[last()]" and "//*[contains(.,'Product example')]/../span[1]", but neither works. How could I do this?
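One thing worth checking: in XPath, contains(., '...') tests an element's full string value, including the text of all its descendants, so the outer elements that wrap the label (up to the document root) match as well. If the label and the value are sibling span elements (an assumption, since the real structure is only visible in the attachment), an expression such as //span[normalize-space(.)='Product example']/following-sibling::span[1] may be closer to what is needed. The Python sketch below shows the same "find the label, take the next sibling" idea with the standard library on hypothetical, well-formed markup; real HTML pages usually need a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the actual page structure may differ.
SNIPPET = """
<div>
  <p><span>Product example</span><span>software solution</span></p>
  <p><span>Region</span><span>example region</span></p>
</div>
"""


def value_after_label(root, label):
    """Return the text of the element that directly follows the label element."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children[:-1]):
            # Compare the element's own text only, not its descendants' text.
            if (child.text or "").strip() == label:
                return (children[i + 1].text or "").strip()
    return None


root = ET.fromstring(SNIPPET)
product = value_after_label(root, "Product example")
print(product)
```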

The link: http://www.plattform-i40.de/I40/Redaktion/EN/Use-Cases/150-smart-engineering-and-production-4-0-en/article-smart-engineering-and-production-4-0-en.html

Thanks,

Charles

Path.JPG