Hello,
I pasted the submitted process into the application, but the process document operator is red and disabled. Can someone send me the rmp file for this process?
Thank you to anyone who helps. I need it very much.
If you are serious about POS and tagging, I would recommend using the Python NLTK package. It is much more robust than the built-in POS options, and a whole lot faster as well (developers, take this as a hint ;-))
The attached example is not exactly what you need, but there are plenty of examples on the internet showing how to work with NLTK.
The sample is something I use a lot myself to separate nouns from verbs, or to look for combined strings (noun or verb phrases, for instance), and it's pretty modular.
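To give an idea of what the chunk rules in the attached process do, here is a minimal, dependency-free sketch of the same tag-pattern matching. It is not NLTK's implementation: the `chunk` helper and the hand-written tags in `TAGGED` are assumptions for illustration only. In the real process, `nltk.pos_tag` supplies the tags and `RegexpChunkParser` does the matching.

```python
import re

# Hand-tagged tokens standing in for the output of nltk.pos_tag.
# (Assumption for this sketch; the attached process gets its tags from NLTK.)
TAGGED = [
    ("I", "PRP"), ("love", "VBP"), ("this", "DT"), ("product", "NN"),
    (",", ","), ("the", "DT"), ("price", "NN"), ("was", "VBD"),
    ("really", "RB"), ("cheap", "JJ"),
]


def chunk(tagged, tag_pattern):
    """Return the lowercased word groups whose tag sequence matches tag_pattern.

    tag_pattern uses the <TAG> notation of the chunk rules, e.g. "<JJ.*>*<NN.*>+".
    """
    # Wrap every <...> in a group so quantifiers apply to the whole tag, and
    # stop '.' from matching across a tag boundary.
    regex = re.sub(
        r"<([^>]*)>",
        lambda m: "(?:<(?:" + m.group(1).replace(".", "[^<>]") + ")>)",
        tag_pattern,
    )
    # Encode the tag sequence as one string: "<PRP><VBP><DT>..."
    tag_string = "".join("<" + tag + ">" for _, tag in tagged)

    chunks = []
    for match in re.finditer(regex, tag_string):
        if match.end() == match.start():
            continue  # skip empty matches
        # Counting '<' converts character offsets back to token positions.
        first = tag_string.count("<", 0, match.start())
        last = tag_string.count("<", 0, match.end())
        chunks.append(" ".join(word for word, _ in tagged[first:last]).lower())
    return chunks


if __name__ == "__main__":
    print(chunk(TAGGED, "<NN.*>+"))       # nouns
    print(chunk(TAGGED, "<VB.*>+"))       # verbs
    print(chunk(TAGGED, "<RB.*><JJ.*>"))  # adverb followed by adjective
```

Each rule is just a regular expression over the sequence of tags, so adding a new kind of phrase comes down to writing one more pattern.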
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data_user_specification" compatibility="8.2.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="85">
        <list key="attribute_values">
          <parameter key="content" value="&quot;I love this product, the price was really cheap for these types of headphones, and they don't hurt my ears too much after listening to music for hours on end! I ordered with Amazon prime, and it came the next day, I was very pleased.&quot;"/>
        </list>
        <list key="set_additional_roles"/>
        <description align="center" color="transparent" colored="false" width="126">Simple string</description>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.2.000" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="content"/>
        <description align="center" color="transparent" colored="false" width="126">ensure the string is text before we start conversion</description>
      </operator>
      <operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="82" name="chuncker (2)" width="90" x="313" y="85">
        <process expanded="true">
          <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="get POS phrases" width="90" x="112" y="34">
            <parameter key="script" value="import nltk, re&#10;from nltk.tokenize import sent_tokenize, word_tokenize, regexp_tokenize, wordpunct_tokenize&#10;from nltk.chunk import *&#10;from nltk.chunk.util import *&#10;from nltk.chunk.regexp import *&#10;from nltk import untag&#10;from nltk.stem import PorterStemmer, WordNetLemmatizer&#10;from nltk.stem.lancaster import LancasterStemmer&#10;from nltk.stem.snowball import SnowballStemmer&#10;&#10;&quot;&quot;&quot;&#10;The GetPOS class contains any type of POS combination you might be interested in, and allows for relatively easy&#10;addition of different types based on the Part Of Speech attributes.&#10;&#10;Note that the examples below are for demo purposes only, and may need to be modified to get better results,&#10;depending on the given datasets.&#10;&quot;&quot;&quot;&#10;&#10;class GetPOS:&#10;&#10;    def __init__(self,txt):&#10;        self.txt = txt&#10;&#10;    def get_noun_phrases(self):&#10;        self.chunk_rule = ChunkRule(&quot;&lt;JJ.*&gt;&lt;NN.*&gt;+|&lt;JJ.*&gt;*&lt;NN.*&gt;&lt;CC&gt;*&lt;NN.*&gt;+|&lt;CD&gt;&lt;NN.*&gt;&quot;, &quot;Simple noun phrase&quot;)&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join(set(self.tags))&#10;&#10;    def get_adverb_phrases(self):&#10;        self.chunk_rule = ChunkRule(&quot;&lt;JJ.*&gt;&lt;CC&gt;&lt;JJ.*&gt;|&lt;JJ.*&gt;&lt;TO&gt;*&lt;VB.*&gt;&lt;TO&gt;*&lt;NN.*&gt;+&quot;, &quot;adjective phrase&quot;)&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join(set(self.tags))&#10;&#10;    def get_adverbs_adjectives(self):&#10;        self.chunk_rule = ChunkRule(&quot;&lt;RB.*&gt;&lt;JJ.*&gt;|&lt;VB.*&gt;+&lt;RB.*&gt;&quot;, &quot;Adverb - Adjectives&quot;)&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join(set(self.tags))&#10;&#10;    def get_verbs_adjectives(self):&#10;        self.chunk_rule = ChunkRule(&quot;&lt;VB.*&gt;(&lt;JJ.*&gt;|&lt;NN.*&gt;)+&quot;, &quot;verbs - Adjectives&quot;)&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join(set(self.tags))&#10;&#10;    def get_nouns(self):&#10;        self.chunk_rule = ChunkRule(&quot;(&lt;WRB&gt;&lt;.*&gt;+)?&lt;NN.*&gt;+&quot;, &quot;Nouns&quot;)&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join(set(self.tags))&#10;&#10;    def get_verbs(self):&#10;        self.chunk_rule = ChunkRule(&quot;&lt;VB.*&gt;+&quot;, &quot;Verbs&quot;)&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join(set(self.tags))&#10;&#10;    def get_verbs_lemma(self):&#10;        stopwords=(['be', 'do', 'have', 'am'])&#10;        lm=nltk.WordNetLemmatizer()&#10;        self.chunk_rule = ChunkRule(&quot;&lt;VB.*&gt;+&quot;, &quot;Verbs&quot;)&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join([word for word in nltk.word_tokenize(' '.join(set(lm.lemmatize(w, 'v') for w in self.tags))) if word.lower() not in stopwords])&#10;&#10;    def return_tags(self):&#10;        self.tags = chunckMe(self.txt,[self.chunk_rule])&#10;        return ', '.join(set(self.tags))&#10;&#10;&quot;&quot;&quot;&#10;chunckMe will chunk the provided string, and return only the tokens (words) that apply to the given rule&#10;&quot;&quot;&quot;&#10;&#10;def chunckMe(txt,rule):&#10;    np=[]&#10;    chunk_parser = RegexpChunkParser(rule, chunk_label='LBL')&#10;    sentences= sent_tokenize(txt)&#10;    for sent in sentences:&#10;        d_words=nltk.word_tokenize(sent)&#10;        d_tagged=nltk.pos_tag(d_words)&#10;        chunked_text = chunk_parser.parse(d_tagged)&#10;        tree = chunked_text&#10;        for subtree in tree.subtrees():&#10;            if subtree.label() == 'LBL':&#10;                np.append(&quot; &quot;.join(untag(subtree)).lower())&#10;    return np&#10;&#10;&quot;&quot;&quot;&#10;The rm_main def is the base as used by RapidMiner.&#10;The dataframe (called data by default, but this can be changed to whatever) will be defined by the incoming port,&#10;the output will be what is returned to the process.&#10;&#10;It is perfectly possible to run a Python module without retrieving or returning anything,&#10;in that case leave the attributes blank.&#10;&#10;In this example we use some lambda functions to call whatever type of POS we want to add to the dataframe / recordset.&#10;So we will have our original dataframe, and add n new series to return for further use within the workflow.&#10;&quot;&quot;&quot;&#10;&#10;def rm_main(data):&#10;&#10;    body = data['content']&#10;    data['noun_phrases'] = body.apply(lambda x: GetPOS(x).get_noun_phrases())&#10;    data['adverb_phrases'] = body.apply(lambda x: GetPOS(x).get_adverb_phrases())&#10;    data['nouns'] = body.apply(lambda x: GetPOS(x).get_nouns())&#10;    data['verbs_lemma'] = body.apply(lambda x: GetPOS(x).get_verbs_lemma())&#10;&#10;    return data"/>
            <description align="center" color="transparent" colored="false" width="126">Apply python (NLTK) to get POS tags and some other magic</description>
          </operator>
          <connect from_port="in 1" to_op="get POS phrases" to_port="input 1"/>
          <connect from_op="get POS phrases" from_port="output 1" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="50" resized="false" width="570" x="87" y="243">https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">use python to set some POS logic for key phrases</description>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="chuncker (2)" to_port="in 1"/>
      <connect from_op="chuncker (2)" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Hello, thank you very much.
I ran your code. The output looks like this.
Now, if I want to separate and display the attributes and constraints, how should I write it? I could not get that to run at all! Could you explain this too?
I also want to combine the extraction and selection of POS tags with sentiment analysis using WordNet. Am I able to connect to the WordNet operator after extracting the nouns, verbs, adverbs, and adjectives? Is that possible in Python code?
Sorry, I am a beginner.
Thank you, and have a nice day.