Question to get the date out of a document

BadBoy20
New Altair Community Member
So I have PDF files (articles), and each of them has a date near the top of the page, though not at the very top. The date format is like 19 April 2012. I want to get the first date that shows up and set it as an attribute called "Mydate". Is that even possible in RapidMiner, and how would I go about doing that? Thank you.
Answers
Hi,
You probably need to use Read Document, Process Documents and Keep Document Part, together with a clever regex. It is hard to say which regex without seeing the document itself.
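As a rough idea of what such a regex could look like, here is a minimal sketch in Python rather than as a RapidMiner process. It assumes the date is always written as a one or two digit day, a full English month name and a four digit year, like 19 April 2012. The names `first_date`, `DATE_PATTERN` and `sample` are made up for illustration, and the sample text is invented. The pattern itself uses only common regex syntax, so it should carry over to a regex field in RapidMiner with little or no change, but that is untested here.

```python
import re

# Month names spelled out in full, as in "19 April 2012" (assumption: English names).
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# One or two digit day, a full month name, and a four digit year.
DATE_PATTERN = re.compile(r"\b(\d{1,2})\s+(" + MONTHS + r")\s+(\d{4})\b")


def first_date(text):
    """Return the first 'day month year' date found in text, or None if there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


# Invented sample standing in for the text extracted from one PDF.
sample = "Daily News\nPublished 19 April 2012\nUpdated 3 May 2012"
mydate = first_date(sample)
print(mydate)
```

Because `search` stops at the first match, later dates in the article are ignored, which matches the "first date that shows up" requirement.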
Cheers,
Martin
This sounds pretty similar to this post from a few days back. Could you rework the process from that thread?
rapid-i.com/rapidforum/index.php/topic,8874.msg29914.html