How to convert image data to structured data

User: "alsaqer002"
New Altair Community Member
Updated by Jocelyn
Hello all,

I am working on an image and text mining project, and I want to know how to convert image data to structured data.
I have already downloaded the Image Processing extension, and I found some useful information on this website:


The power of machine learning for image mining and analytics

http://www.simafore.com/blog/the-power-of-machine-learning-for-image-mining-and-analytics?success=true


and


New case study: Image mining and unstructured data science

http://www.simafore.com/blog/new-case-study-image-mining-and-unstructured-data-science?success=true


Can anyone please help me figure out what is inside the Loop Files operator? I need to know how they converted the image data to structured data. I have spent more than 4 months working on my final project, but I couldn't finish it because I'm stuck on that point.

Thanks for any help,

    User: "sgenzer"
    Altair Employee
    hi...I have not used that image processing extension in a while and I don't think it's compatible with RM 7+ (it no longer appears in the marketplace).  However I would strongly recommend trying IBM Watson Bluemix APIs from within RapidMiner using the "Enrich Data by Webservice" operator to do your GET/POST requests.  There is a "Visual Recognition" API in Watson that is probably very good.  I will warn you that the Watson documentation, however, is not!  I have "Tone Analyzer" and "Language Translation" working in my RM and it is really quite amazing.

    Good luck.

    Scott
    User: "JEdward"
    New Altair Community Member
    @sgenzer it is compatible with RM 7. 
    If you want to play with it you can get it here: http://www.burgsys.com/

    I'll try out the Watson API. For much of my work, sending data to cloud services isn't an option, but perhaps it solves the original poster's problem.
    User: "alsaqer002"
    New Altair Community Member
    OP

    Thank you  @sgenzer and @JEdward

    Yes, the Image Processing extension no longer appears in the marketplace, but it is compatible with RM 7.

    Thanks a lot @sgenzer for your suggestions. I am interested in trying them, but I think they work best with data on the web, while I need to work with images on my computer.

    Thanks @JEdward for this useful website.
    Actually, I found the B-Designer extension, which includes all the features I need. However, I can't get it until they send it to me, so I contacted them and am still waiting for their response.

    I am still looking for a way to do OCR on images to extract the text.

    Thank you again,
    User: "sgenzer"
    Altair Employee
    I have not had much need for OCR but again I would suggest using the RapidMiner "Enrich Data by Webservice" operator (under the Web Mining extension) to call an external API.  There are very good sources out there - a quick search found that Google has a free OCR API: https://cloud.google.com/vision/
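
For images stored on a local computer, one option is to send the file contents inside the request body rather than pointing the API at a web URL. The following is a minimal Python sketch of that idea, assuming the Cloud Vision v1 REST endpoint `images:annotate` with a `TEXT_DETECTION` feature and an API key passed as a query parameter. The helper names (`build_ocr_request`, `extract_text`, `ocr_image`) are hypothetical, and it is not the process used in the blog posts above. Check the current API documentation and pricing before relying on it.

```python
import base64
import json
import urllib.request

# Assumed endpoint of the Cloud Vision REST API (v1).
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for one TEXT_DETECTION request on a local image."""
    # The image is embedded in the request as base64 text, so no public URL is needed.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the detected text of the first response, or '' if there is none."""
    first = (response.get("responses") or [{}])[0]
    annotations = first.get("textAnnotations") or []
    # The first annotation is expected to hold the whole recognized text.
    return annotations[0].get("description", "") if annotations else ""


def ocr_image(path: str, api_key: str) -> str:
    """Send a local image file to the API and return the recognized text."""
    with open(path, "rb") as handle:
        body = json.dumps(build_ocr_request(handle.read())).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_text(json.load(reply))
```

Calling `ocr_image("scan.png", "YOUR_API_KEY")` for each file in a folder (both arguments are placeholders) would give one text string per image, which can then be written to a CSV and loaded into RapidMiner as an ordinary structured table.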

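    Here is an example of an Enrich Data by Webservice operator that connects to the Google Maps API. I have deleted my API key, which you would need to replace with your own to see this working. The `url` and `service_method` values in this copy also look blanked out, so you would need to set those as well. But you should get the idea.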

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.0.000" expanded="true" height="68" name="Google Maps Distance Lookup" width="90" x="313" y="34">
            <parameter key="query_type" value="XPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries">
              <parameter key="Distance" value="//distance/text/text()"/>
            </list>
            <list key="namespaces"/>
            <parameter key="assume_html" value="false"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
            <parameter key="service_method" value="fgfgfgf"/>
            <parameter key="body" value="text=&lt;%title%&gt;"/>
            <parameter key="url" value=";"/>
            <parameter key="delay" value="150"/>
            <list key="request_properties">
              <parameter key="key" value="mykey"/>
            </list>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    Scott