

"Text Classification using Text Plugin - StringTextInput"

pser

This post refers to http://rapid-i.com/rapidforum/index.php/topic,368.0.html. It addresses the problems I experienced with the StringTextInput operator.

First I'm loading the texts and their labels from a MySQL database using DatabaseExampleSource. I'd like to save the obtained example set before continuing.

Problem 1: The examples have a string attribute (the text to classify) that usually contains newlines. When I write the example set to disk using ExampleSetWriter, a single example is split across several lines, so ExampleSource cannot read it back (it expects one example per line). What can I do?
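One workaround for Problem 1 is to escape the newlines before the text reaches a line-oriented file. The sketch below is plain Python outside RapidMiner, not an ExampleSetWriter setting; the function names and the tab-separated layout are assumptions made for illustration, and it assumes the label itself contains no tab or newline.

```python
import json

def encode_example(label, text):
    """Serialize one (label, text) pair onto a single physical line."""
    # json.dumps escapes newlines, tabs and quotes, so no raw line break remains.
    return f"{label}\t{json.dumps(text)}"

def decode_example(line):
    """Inverse of encode_example: recover the label and the original text."""
    label, encoded = line.rstrip("\n").split("\t", 1)
    return label, json.loads(encoded)

# Hypothetical sample data standing in for rows loaded from the database.
examples = [
    ("sports", "First line\nSecond line"),
    ("politics", "Tabs\tand \"quotes\""),
]

lines = [encode_example(label, text) for label, text in examples]
restored = [decode_example(line) for line in lines]
```

Each entry in `lines` is exactly one line, and decoding restores the original text including its newlines, so a reader that expects one example per line would no longer see split records.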

After loading the data from the database I use StringTextInput.

Problem 2: StringTextInput throws a warning. For every example in the example set, it prints the content of the string attribute (i.e. the text to classify) followed by "not found. Assuming the text is directly encoded as document source..." I think this means the string attribute is first interpreted as a filename and only then used directly as text. Because I was flooded with warnings, I had to suppress warning output completely. Is there a better solution?

Question 3: What does the parameter "prune above" of the operator StringTextInput do when I enter a percentage value? I didn't understand the explanation in the operator description.
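One common reading of a percentage-based "prune above" setting is that terms occurring in more than the given percentage of documents are dropped. That reading is an assumption here, not a quote from the operator description, and the sketch below only illustrates the idea; `prune_above` and the sample documents are hypothetical.

```python
def prune_above(documents, max_percent):
    """Drop terms that occur in more than max_percent of the documents."""
    doc_freq = {}
    for tokens in documents:
        # Count each term once per document (document frequency).
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    # Largest document frequency a term may have and still be kept.
    limit = len(documents) * max_percent / 100.0
    kept = {term for term, df in doc_freq.items() if df <= limit}
    return [[t for t in tokens if t in kept] for tokens in documents]

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]
pruned = prune_above(docs, 50)
```

With a threshold of 50 percent, "the" appears in all four documents and is removed, while "cat" and "ran" appear in exactly half and are kept.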

Next I need to create a wordlist. Since the database contains a lot of articles I do not want to load them all at once into memory.

Problem 4: How can I modify StringTextInput so that I can load an existing wordlist and update it with new words? I looked for the relevant part of the source code and noticed that wordlist creation is handled by the WVTool Java library, not by the Text Plugin itself. However, the Text Plugin seems to use a newer version of WVTool (shipped as a .jar) than the one I can get from SourceForge. Where can I get the source code of the newest version of WVTool?
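The idea behind Problem 4, updating a stored wordlist batch by batch instead of loading every article at once, can be sketched independently of WVTool. The code below does not use the WVTool or Text Plugin API; `update_wordlist`, `batches`, the whitespace tokenizer, and the sample texts are all assumptions made for illustration.

```python
from collections import Counter

def update_wordlist(wordlist, batch):
    """Add the document frequencies of one batch of texts to an existing wordlist."""
    for text in batch:
        # Count each word once per document; the tokenizer is a naive whitespace split.
        wordlist.update(set(text.lower().split()))
    return wordlist

def batches(texts, size):
    """Yield successive slices so only `size` texts are processed at a time."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]

# Stands in for a wordlist saved by an earlier run.
wordlist = Counter({"market": 3})

# Stands in for rows fetched from the database in chunks.
new_texts = ["Market falls", "New market data", "Data arrives"]

for batch in batches(new_texts, 2):
    update_wordlist(wordlist, batch)
```

In a real setup the batches would come from a database cursor rather than an in-memory list, and the updated wordlist would be written back to disk after each run.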
