How Powerful is RapidMiner
tomdom
I have a very important question.
I am working on a project that involves relationship extraction. I am looking for a product that supports it, and someone suggested that I consider RapidMiner.
I have played around with it, but I need a special form of relationship extraction.
For example, given information in the following form:
Graduated PhD students
As advisor
Dr. Dustin Lange, Hasso Plattner Institut, University of Potsdam 2013
Dr. Mohammed AbuJarour, Hasso Plattner Institut, University of Potsdam 2011
Now: SAP Innovation Center Potsdam
Dr. Armin Roth, Humboldt Universität zu Berlin 2010
Now: IBM Böblingen
Dr. Falk Brauer, Hasso Plattner Institut, University of Potsdam 2010
Now: SAP Asia
Dr. Jens Bleiholder, Hasso Plattner Institut, University of Potsdam 2010
Now: OPITZ Consulting, Berlin
Prof. Dr. Melanie Herschel (formerly Weis), Humboldt Universität zu Berlin 2007
Now: Asssistant Professor at Université Paris Sud
Dr. Alexander Bilke, Technische Universität Berlin, 2006
Now: msg Systems AG
As reviewer
Daniel Warneke, Technische Universität Berlin, 2011
Thomas Kabisch, Humboldt-Universität zu Berlin, 2011
Katja Hose, Technische Universität Ilmenau, 2009
Paolo Carreira, Universidade de Lisboa, 2008
Andreas Thor, Universität Leipzig, 2008
Chokri ben Necib, Humboldt-Universität zu Berlin, 2007
Cinzia Cappiello, Politecnico di Milano, 2005
The tool must extract the fact that Dr. Dustin Lange through Dr. Alexander Bilke are PhD students. Can RapidMiner find that relationship from the text? Alternatively, looking at the HTML code of that website, can RapidMiner analyze the HTML tags and extract the information I need?
<div id="c3041" class="csc-default">
<!-- Header: [begin] -->
<div class="csc-header csc-header-n8"><h1>Graduated PhD students</h1></div>
<!-- Header: [end] -->
<!-- [begin] -->
<ul><li><b>As advisor</b> <ul> <li><i>Dr. Dustin Lange</i>, Hasso Plattner Institut, University of Potsdam 2013</li><li><i>Dr. Mohammed AbuJarour</i>, Hasso Plattner Institut, University of Potsdam 2011<br>Now: SAP Innovation Center Potsdam</li> <li><i>Dr. Armin Roth</i>, Humboldt Universität zu Berlin 2010<br>Now: IBM Böblingen</li> <li><i>Dr. Falk Brauer</i>, Hasso Plattner Institut, University of Potsdam 2010<br>Now: SAP Asia</li> <li><i>Dr. Jens Bleiholder</i>, Hasso Plattner Institut, University of Potsdam 2010<br>Now: OPITZ Consulting, Berlin</li> <li><i>Prof. Dr. Melanie Herschel</i> (formerly Weis), Humboldt Universität zu Berlin 2007<br>Now: Asssistant Professor at Université Paris Sud</li> <li><i>Dr. Alexander Bilke</i>, Technische Universität Berlin, 2006<br>Now: msg Systems AG</li> </ul></li><li><b>As reviewer</b><ul> <li>Daniel Warneke, Technische Universität Berlin, 2011</li> <li>Thomas Kabisch, Humboldt-Universität zu Berlin, 2011</li> <li>Katja Hose, Technische Universität Ilmenau, 2009</li> <li>Paolo Carreira, Universidade de Lisboa, 2008</li> <li>Andreas Thor, Universität Leipzig, 2008</li> <li>Chokri ben Necib, Humboldt-Universität zu Berlin, 2007</li> <li>Cinzia Cappiello, Politecnico di Milano, 2005</li></ul></li><li>For a list of graduated Master students, see <a href="naumann/teaching/master_theses/completed_theses.html" title="Öffnet internen Link im aktuellen Fenster" class="internal-link">here</a>.<ul> </ul></li></ul>
<!-- [end] -->
</div>
Is this possible?
Best,
Thomas
Answers
Hey Thomas,
Yes, this is possible, but you will need extensions for it. Via the Marketplace you can install the web and text mining extensions, which help you crawl or fetch websites and process the resulting text documents. This includes information extraction, but you need to know regular expressions and/or XPath to achieve it.
I would suggest that you install the extensions, take a look at the operators, watch a few videos on YouTube or on the site, and come back to the forum if you have specific questions.
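To give an idea of what the regular-expression route looks like, here is a minimal sketch in plain Python (not a RapidMiner process). It assumes the page keeps the structure shown in the question: a `<b>` heading such as "As advisor" followed by a `<ul>` whose `<li>` items each start with a person's name. The function name `extract_relations` and the shortened `HTML` sample are made up for illustration, and regular expressions on HTML are fragile, so treat this as a starting point only.

```python
import re

# Shortened copy of the HTML from the question (assumption: same structure).
HTML = """
<ul><li><b>As advisor</b> <ul> <li><i>Dr. Dustin Lange</i>, Hasso Plattner Institut, University of Potsdam 2013</li><li><i>Dr. Mohammed AbuJarour</i>, Hasso Plattner Institut, University of Potsdam 2011<br>Now: SAP Innovation Center Potsdam</li> </ul></li><li><b>As reviewer</b><ul> <li>Daniel Warneke, Technische Universität Berlin, 2011</li> <li>Thomas Kabisch, Humboldt-Universität zu Berlin, 2011</li></ul></li></ul>
"""


def extract_relations(html):
    """Return (role, name) pairs, using each <b> heading as the role
    for the <li> items of the <ul> that directly follows it."""
    relations = []
    # One match per group: the bold heading and the body of its nested list.
    groups = re.findall(r"<b>(.*?)</b>\s*<ul>(.*?)</ul>", html, flags=re.S)
    for role, items in groups:
        for item in re.findall(r"<li>(.*?)</li>", items, flags=re.S):
            # Ignore the "Now: ..." part after <br>, then strip remaining tags.
            first_line = item.split("<br>")[0]
            text = re.sub(r"<[^>]+>", "", first_line)
            # Assumption: the name is everything before the first comma.
            name = text.split(",")[0].strip()
            relations.append((role.strip(), name))
    return relations


if __name__ == "__main__":
    for role, name in extract_relations(HTML):
        print(role, "->", name)
```

For the shortened sample this yields pairs such as ("As advisor", "Dr. Dustin Lange") and ("As reviewer", "Daniel Warneke"). An XPath along the lines of `//li[b='As advisor']/ul/li` expresses the same grouping, though the exact form depends on how the tool parses the page.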
Best
Marcin