How to loop through pictures for text recognition
Hi everyone,
I am new to RapidMiner and would appreciate any help you can provide. I have a database with a field of URLs, and every URL points to a picture. I need a process that extracts the text from the image behind each URL for every row in my dataset, without my having to click on each URL manually. My dataset has hundreds of thousands of rows.
With the new functionality in the Deep Learning extension, you can do this easily with the "extract text from image" operator, which uses the Tesseract OCR library. If you have multiple images, you can loop over them by adding another operator, "Read Image Meta-Data", to the process.

@kayman
Hi Kayman, thank you for your help! Can you be more specific about how to download the images? I used the Get Pages operator, and I don't see any option to download the images from the URLs.
@rdesai, Thank you so much! I tried your process and it worked. However, I either need to be able to automatically download all images from the URLs in the database to my own folder, or I need an alternative way to run this without needing to download images to a folder. Do you have any thoughts?
@rdesai, oh wow, didn't know that one yet
One possible workflow would be to use RapidMiner to loop over all of your database records -> the web mining extension to download each image and store it locally -> Python, using for instance OpenCV, to read the image -> pytesseract to run the OCR and get the text -> return the text to RapidMiner and continue with the next image.
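The Python part of that workflow could be sketched roughly as below. This is only a minimal sketch under some assumptions, not a tested process: it assumes pytesseract, Pillow and the Tesseract binary are installed, it uses Pillow in place of OpenCV to decode the image, and it keeps each image in memory instead of storing it in a folder. The names `fetch_bytes`, `ocr_bytes` and `extract_text` are made up for this example.

```python
import io
from urllib.request import urlopen


def fetch_bytes(url, timeout=10.0):
    """Download the raw image bytes for one URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def ocr_bytes(data):
    """Run Tesseract on in-memory image bytes (nothing is written to disk)."""
    # Imported lazily so the module still loads where the OCR stack is missing.
    from PIL import Image
    import pytesseract

    with Image.open(io.BytesIO(data)) as image:
        return pytesseract.image_to_string(image)


def extract_text(urls, fetch=fetch_bytes, ocr=ocr_bytes):
    """Yield (url, text, error) per URL; one bad image does not stop the loop."""
    for url in urls:
        try:
            yield url, ocr(fetch(url)).strip(), None
        except Exception as exc:
            # Keep the failure next to the row so it can be retried later.
            yield url, None, str(exc)
```

If this runs inside the Execute Python operator, the URL column of the incoming data frame could be passed to `extract_text` and the resulting text written back as a new column. With hundreds of thousands of rows it is probably worth processing the data in batches and saving the results as you go, so that a failed run does not have to start from scratch.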