How to read SGML (e.g., Reuters21578) by TextInput?
gfyang
Hi,
I'd like to test a text classification algorithm on Reuters21578. However, the TextInput operator in RM only accepts directories, so it cannot directly handle the SGML format of Reuters21578.
Of course, I could write my own program to parse it, but is there an easier way to do this in RM?
Thank you.
Sincerely yours,
gfyang
land
Hi,
There are two possibilities if you just want to extract the text from the data: you could discard all tags so that only the plain text remains, or you could build an XPath query that extracts what you need. The second solution will work with XML, but I don't know whether your document contains any non-XML elements.
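For the first option (discarding the tags), a minimal sketch in Python could look like the following. It is not an RM feature, just a small preprocessing step done outside RM. It assumes that each article is wrapped in a <REUTERS> element and that its text sits in a <BODY> element, as in the Reuters21578 .sgm files. The function names, the output file naming, and the latin-1 encoding are assumptions made for illustration.

```python
import html
import re
from pathlib import Path

# Assumption: articles look like <REUTERS ...> ... <BODY>text</BODY> ... </REUTERS>.
ARTICLE_RE = re.compile(r"<REUTERS\b[^>]*>(.*?)</REUTERS>", re.DOTALL | re.IGNORECASE)
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def extract_bodies(sgml_text):
    """Return the plain text of every article body found in sgml_text."""
    bodies = []
    for article in ARTICLE_RE.finditer(sgml_text):
        match = BODY_RE.search(article.group(1))
        if match is None:
            continue  # skip articles that have no <BODY> element
        text = TAG_RE.sub(" ", match.group(1))  # discard any remaining tags
        text = html.unescape(text)  # turn entities such as &lt; back into characters
        bodies.append(" ".join(text.split()))  # collapse whitespace and line breaks
    return bodies


def write_as_directory(sgml_path, out_dir):
    """Write one .txt file per article so a directory-based reader can load them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumption: latin-1 decodes every byte, which avoids decode errors.
    raw = Path(sgml_path).read_text(encoding="latin-1")
    bodies = extract_bodies(raw)
    for i, body in enumerate(bodies):
        (out / f"doc_{i:05d}.txt").write_text(body, encoding="utf-8")
    return len(bodies)
```

Calling write_as_directory on one .sgm file produces a directory of plain text files that TextInput should be able to read. Note that this sketch ignores the category labels stored in the <TOPICS> elements, so for classification you would still need to extract those and sort the files into one directory per class.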
Greetings,
Sebastian
gfyang
Hi,
Thank you for the help.
Sincerely yours,
gfyang