Extract Word

fantoon
fantoon New Altair Community Member
edited November 5 in Community Q&A

Hi All, 

I am currently working on a project to analyze Federal Open Market Committee (FOMC) minutes from the past 20 years in order to determine how the stock market reacts to the FOMC's decisions to raise or lower interest rates.

 

I have converted all the Fed minutes to text documents. My preprocessing uses the following operators: Transform Cases, Tokenize, Filter Stopwords, Stem (Porter), Filter Tokens (by Length), and Generate n-Grams.

 

 

I am struggling to come up with a process that extracts only the phrase “Interest Rate” from each set of meeting minutes, along with the interest rate percentage (numeric).

 

For example, Meeting December 2017:

Interest Rate

1%

 

I have attached a sample of the Fed meeting minutes.

I really appreciate your help in advance. Thanks!

 

Answers

  • JEdward
    JEdward New Altair Community Member

    I didn't see any attachment for the example, so I used your table. Possibly the Extract Information operator is the one you want?

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="179" y="136">
    <parameter key="text" value="Meeting December 2017:&#10;&#10;Interest Rate&#10;&#9;&#10;&#10;1%"/>
    </operator>
    <operator activated="true" class="text:extract_information" compatibility="7.5.000" expanded="true" height="68" name="Extract Information" width="90" x="313" y="136">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <parameter key="attribute_type" value="Numerical"/>
    <list key="regular_expression_queries">
    <parameter key="Interest Rate" value="Interest Rate[\s\r\n]+([0-9]+)%"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    <description align="center" color="transparent" colored="false" width="126">Bring out the interest rate value using RegEx.</description>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="7.5.000" expanded="true" height="82" name="Documents to Data" width="90" x="514" y="136">
    <parameter key="text_attribute" value="text"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Extract Information" to_port="document"/>
    <connect from_op="Extract Information" from_port="document" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • fantoon
    fantoon New Altair Community Member

     

    I have attached the Fed meeting minutes for you to view. I would like to extract “federal funds rate” into one column and the numerical value “1/4 to 1/2 percent” into another column.

     

    Can you please send a screenshot of the process to extract one word and a numerical value?

     

    Thank you for your help.
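
The regular expression in the process above captures a whole-number percentage such as "1%", so it would not match a range written as "1/4 to 1/2 percent". Below is a minimal Python sketch of one way such a range could be matched and converted to numbers. It is not a RapidMiner process; the names `NUM`, `RANGE_PATTERN`, `to_number`, and `extract_funds_rate_range` are hypothetical, the sample sentence in the usage note is made up in the style of the minutes, and the sketch assumes the pattern is run on the raw text rather than on tokens that have already been lowercased, stemmed, or filtered.

```python
import re
from fractions import Fraction

# A number as it might be written in the minutes (assumption):
# a mixed number "1-1/4", a fraction "1/4", an integer "2", or a decimal "0.25".
NUM = r"(?:\d+-)?\d+/\d+|\d+(?:\.\d+)?"

RANGE_PATTERN = re.compile(
    r"federal\s+funds\s+rate"   # the phrase; \s+ tolerates line breaks inside it
    r"\D{0,40}?"                # a few non-digit characters, e.g. " at "
    r"(" + NUM + r")\s+to\s+(" + NUM + r")\s+percent",
    re.IGNORECASE,
)


def to_number(token):
    """Convert '1-1/4', '1/4', '2', or '0.25' to a float."""
    if "-" in token:
        whole, frac = token.split("-", 1)
        return float(int(whole) + Fraction(frac))
    return float(Fraction(token))


def extract_funds_rate_range(text):
    """Return (low, high) in percent for the first matching range, else None."""
    match = RANGE_PATTERN.search(text)
    if match is None:
        return None
    return to_number(match.group(1)), to_number(match.group(2))
```

For a made-up sentence such as "maintain the target range for the federal funds rate at 1/4 to 1/2 percent", `extract_funds_rate_range` returns the pair `(0.25, 0.5)`, which could then be stored in two numeric columns next to a "federal funds rate" label. The same pattern may carry over to the `regular_expression_queries` list of the Extract Information operator, though that is untested here, and the fraction would likely need to be extracted as text and converted to a number in a later step.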