How to loop through pictures for text recognition

tngo New Altair Community Member
edited November 5 in Community Q&A
Hi everyone,

I am new to RapidMiner and would appreciate any help you can provide. I have a database with a field of URLs, all of which point to pictures. I need a process that extracts the text from the image at each URL, for every row in my dataset, without my having to click on the URLs manually. My dataset has hundreds of thousands of rows.

Answers

  • kayman
    kayman New Altair Community Member
    As RapidMiner has no out-of-the-box 'image to text' operators, you will need to use the Python extension here.

    One possible workflow: use RapidMiner to loop over all of your database records -> use the Web Mining extension to download each image and store it locally -> in Python, read the image using, for instance, OpenCV -> use pytesseract to do the OCR and get the text -> return the text to RapidMiner and continue with the next image.
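As a rough sketch of the Python step in this workflow (not kayman's actual code), the script below reads each downloaded image with OpenCV and runs it through pytesseract. It assumes the Tesseract binary and the opencv-python and pytesseract packages are installed, that each image's local file path sits in a column named image_path (a hypothetical name), and that it runs inside the Python Scripting extension's Execute Python operator, which calls a function named rm_main with the incoming example set as a pandas DataFrame.

```python
# Sketch only. Assumptions: Tesseract, opencv-python and pytesseract are installed,
# and the column "image_path" (hypothetical name) holds each downloaded file's path.


def ocr_image(path):
    """Read one image with OpenCV and return the text Tesseract finds in it."""
    import cv2  # imported lazily so the looping helper works without OpenCV
    import pytesseract

    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None for missing or unreadable files
        return ""
    # OpenCV loads images as BGR; a grayscale copy is often easier to OCR
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray).strip()


def add_ocr_text(rows, path_key="image_path", ocr=ocr_image):
    """Return a copy of each row dict with an extra "ocr_text" field."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not modified
        new_row["ocr_text"] = ocr(new_row[path_key])
        result.append(new_row)
    return result


def rm_main(data):
    """Entry point assumed for Execute Python: DataFrame in, DataFrame out."""
    import pandas as pd

    records = data.to_dict("records")
    return pd.DataFrame(add_ocr_text(records))
```

The OCR function is passed in as a parameter so the row loop can be checked without Tesseract installed.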


  • rdesai
    rdesai New Altair Community Member
    With the new functionality in the Deep Learning extension, you can do this easily using the "Extract Text from Image" operator, which uses the Tesseract OCR library. If you have multiple images, you can loop over them by adding another operator, "Read Image Meta-Data", inside the process.
  • tngo
    tngo New Altair Community Member
    @kayman
    Hi Kayman, thank you for your help! Can you be more specific about how to download the images? I used the Get Pages operator, and I don't see any option to download the images from the URLs.
  • tngo
    tngo New Altair Community Member
    @rdesai, thank you so much! I tried your process and it worked. However, I need either a way to automatically download all the images from the URLs in the database to my own folder, or an alternative way to run this without downloading the images to a folder. Do you have any thoughts?
  • kayman
    kayman New Altair Community Member
    You could use the [Open File] operator, which allows you to select a file based on a URL. If you combine this with the [Write File] operator, you can save it to your disk. You will probably need to do some tweaking with macros to define the filename and folder, but in essence this should work fine.
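If you would rather do the download step in Python than chain [Open File] and [Write File], a minimal sketch using only the standard library could look like this. Naming each file after the row's ID stands in for the filename macro; the function names, the (row_id, url) input format and the .jpg fallback extension are assumptions for illustration, not part of RapidMiner.

```python
# Sketch of "download each image and store it locally" in plain Python.
# Hypothetical helpers; the row ID plays the role of the filename macro.
import os
import urllib.request
from urllib.parse import urlparse


def local_filename(row_id, url, default_ext=".jpg"):
    """Build a name like '42.png' from the row ID and the URL path's extension."""
    ext = os.path.splitext(urlparse(url).path)[1] or default_ext
    return f"{row_id}{ext}"


def download_image(row_id, url, folder, timeout=30):
    """Fetch one URL, write its bytes to folder/<row_id><ext>, return the path."""
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, local_filename(row_id, url))
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(target, "wb") as fh:
        fh.write(data)
    return target


def download_all(rows, folder):
    """Download every (row_id, url) pair, recording failures instead of stopping."""
    saved, failed = {}, []
    for row_id, url in rows:
        try:
            saved[row_id] = download_image(row_id, url, folder)
        except (OSError, ValueError):  # URLError/HTTPError subclass OSError
            failed.append(row_id)
    return saved, failed
```

Collecting failed row IDs rather than aborting means one broken link does not stop a run over hundreds of thousands of rows; the failures can be retried afterwards.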
  • kayman
    kayman New Altair Community Member
    @rdesai, oh wow, I didn't know about that one yet.