Extract Information (Dates) from XML - transform that into a date_time attribute
andk
New Altair Community Member
Hi Guys,
I have another problem with my text mining project. I have roughly 30,000 XML documents on my hard disk from which I want to read out certain information. I use the documents from file operator to load the files. Then I use a Multiply operator to split the flow into two streams:
1) one for the usual text operations: filtering, stopword removal, tokenizing, etc.
2) one with the Extract Information operator, where I use an XPath expression to read out the segment of each document that contains the date and time.
Finally, I combine the two streams with the Combine Documents operator. The results are pretty much what I want: the extracted information (the dates) now appears in the metadata of the results, and I can see it without any problem in the resulting table. However, I am not satisfied with the format of the extracted time_stamp column, so I would like to use the "Nominal to Date" operator to convert it into a date_time attribute. Unfortunately, the time_stamp attribute does not appear in the attribute list of the "Nominal to Date" operator. What is going wrong here?
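For anyone who wants to check stream 2 outside RapidMiner, here is a minimal Python sketch of the same idea: pull the date/time text out of an XML document with a path expression and keep it, still as a plain string (like a nominal attribute), under a `time_stamp` key next to the document text. The sample XML and the element names `meta`, `date` and `body` are hypothetical, since the post does not show the real file structure. Note that `xml.etree.ElementTree` supports only a subset of XPath, so a full XPath expression copied from the Extract Information operator may need a library such as `lxml` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real files' structure is not shown in the post.
SAMPLE_XML = """<article>
  <meta><date>2011.06.24 14:30</date></meta>
  <body>Some text to be tokenized later.</body>
</article>"""

def extract_time_stamp(xml_text: str, path: str = "./meta/date") -> dict:
    """Return the document text plus the extracted time stamp as metadata.

    `path` is an ElementTree path expression (a limited XPath subset);
    the default './meta/date' is an assumption matching the sample above.
    """
    root = ET.fromstring(xml_text)
    node = root.find(path)
    body = root.find("./body")
    return {
        "text": body.text if body is not None else "",
        # Kept as a plain string, i.e. still "nominal" at this point.
        "time_stamp": node.text.strip() if node is not None and node.text else None,
    }

if __name__ == "__main__":
    print(extract_time_stamp(SAMPLE_XML))
```

Documents without a matching element simply get `None` as their time stamp, which roughly corresponds to a missing value in the resulting table.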
I also experimented with the Documents to Data conversion operators and, following from that, with the Set Role operator inside the Process Documents operator after the Extract Information operator, but this does not do the job either.
I would really appreciate it if someone could give me a hint. Attaching time stamps to my findings is crucial for my project.
Best regards,
André
Answers
-solved-
OK, I finally got it working. There was not actually a real problem. I was misled, first, by the fact that the newly extracted attribute did not appear in the attribute list in the settings of the Extract Information operator, and second, by the error message I therefore kept receiving. But when you type in your own attribute name and give the right parsing pattern (yyyy.MM.dd etc.), it works; the warning message can simply be ignored. Maybe you guys could remove this warning message, or point out in the description that it works when you enter your own attribute name.
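The key to the fix is telling the converter the exact layout of the string. As a rough Python analogue (not RapidMiner's own code), the sketch below converts a nominal time stamp with an explicit format. RapidMiner's Nominal to Date uses Java-style pattern letters such as `yyyy.MM.dd HH:mm`; Python's `datetime.strptime` expresses the same layout as `%Y.%m.%d %H:%M`. The full pattern with hours and minutes and the sample values are assumptions, since the post only gives `yyyy.MM.dd etc.`

```python
from datetime import datetime
from typing import Optional

# Python equivalent of the Java-style pattern "yyyy.MM.dd HH:mm" (assumed layout).
DATE_FORMAT = "%Y.%m.%d %H:%M"

def nominal_to_date(value: Optional[str], fmt: str = DATE_FORMAT) -> Optional[datetime]:
    """Convert a nominal (string) time stamp to a datetime.

    Returns None for missing or non-matching values instead of raising,
    so one malformed document does not stop the whole batch.
    """
    if not value:
        return None
    try:
        return datetime.strptime(value.strip(), fmt)
    except ValueError:
        # The string does not match the pattern, e.g. "24/06/2011".
        return None

if __name__ == "__main__":
    for raw in ["2011.06.24 14:30", "24/06/2011", None]:
        print(repr(raw), "->", nominal_to_date(raw))
```

Returning `None` on a mismatch is a design choice for this sketch; raising instead would surface pattern mistakes earlier, at the cost of aborting on the first bad document.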
cheers and bon week-end,
andré