Letter count in sequence

BDk
New Altair Community Member
Updated by Jocelyn
Hi, I'm quite new to the software. I would like to count how many times each letter occurs in a random sequence (e.g. GGGAATCGTCA), for example how many 'A's it contains, and put that count into a new column. Is there an operator that can do this? Thank you in advance!
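Outside of RapidMiner, the underlying operation is just character counting. A minimal Python sketch for a single sequence (plain Python, not a RapidMiner operator; the function name `count_letter` is hypothetical):

```python
def count_letter(sequence: str, letter: str) -> int:
    """Return how many times `letter` occurs in `sequence` (case-sensitive)."""
    # str.count counts non-overlapping occurrences of a substring; for a
    # single character this is simply the number of times it appears.
    return sequence.count(letter)


sequence = "GGGAATCGTCA"
print(count_letter(sequence, "A"))  # 3
```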

    Hi @BDk

    You can use Process Documents: split the text into single-character tokens with Tokenize and set the vector creation to Term Occurrences so that each letter is counted.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.10.011">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.10.011" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="9.4.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
            <parameter key="text" value="GGGAATCGTCA"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="9.4.000" expanded="true" height="103" name="Process Documents" width="90" x="246" y="34">
            <parameter key="create_word_vector" value="true"/>
            <parameter key="vector_creation" value="Term Occurrences"/>
            <parameter key="add_meta_information" value="true"/>
            <parameter key="keep_text" value="false"/>
            <parameter key="prune_method" value="none"/>
            <parameter key="prune_below_percent" value="3.0"/>
            <parameter key="prune_above_percent" value="30.0"/>
            <parameter key="prune_below_rank" value="0.05"/>
            <parameter key="prune_above_rank" value="0.95"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="9.4.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34">
                <parameter key="mode" value="regular expression"/>
                <parameter key="characters" value=".:"/>
                <parameter key="expression" value="|"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
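In the process above, Tokenize uses the regular expression `|`, which matches the empty string, so the text appears to be split between every pair of characters and each letter becomes its own token; Term Occurrences then counts how often each token occurs. The Python sketch below imitates that pipeline for illustration only. It assumes that empty tokens are discarded, and it is not how RapidMiner implements the operators (the name `term_occurrences` is hypothetical).

```python
import re
from collections import Counter


def term_occurrences(text: str) -> Counter:
    """Split text on the empty-matching regex '|' and count each token."""
    # In Python 3.7+, re.split with a pattern that matches the empty string
    # splits at every position, leaving '' at both ends; drop those empties.
    tokens = [token for token in re.split("|", text) if token]
    # Counting identical tokens mirrors the "Term Occurrences" vector creation.
    return Counter(tokens)


counts = term_occurrences("GGGAATCGTCA")
print(sorted(counts.items()))  # [('A', 3), ('C', 2), ('G', 4), ('T', 2)]
```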


    BDk
    New Altair Community Member
    OP
    Do I need an extension for the Process Documents operator? I have the education version of the software and could not find this operator.
    BDk
    New Altair Community Member
    OP
    OK, I found the extension, sorry. It works fine for one row, thanks Marco. Can it be scaled up to many rows?
    I have a table with 1000+ rows, and each row contains a letter sequence like the one posted above. I would like to count the letters in each row separately. With the posted solution it only works for one row, or, if I enter all 1000+ sequences via Create Document, it counts the letters of all rows together...
    Hi @BDk

    You'll need to use Process Documents, Process Documents from Data, or Process Documents from Files; which one depends on how your data was originally collected.
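The key difference with Process Documents from Data is that each example (row) is treated as a separate document. Every row therefore gets its own counts, and a letter missing from a row shows up as 0 in that row's word vector. A nominal sequence column typically has to be converted with Nominal to Text first. The plain-Python sketch below illustrates that per-row behaviour. It is not RapidMiner code, and the function name `letter_count_table` is hypothetical.

```python
from collections import Counter


def letter_count_table(sequences: list[str]) -> list[dict[str, int]]:
    """Return one row of letter counts per input sequence (0 if absent)."""
    # Count each sequence separately, like one document per example;
    # only alphabetic characters are counted.
    per_row = [Counter(ch for ch in seq if ch.isalpha()) for seq in sequences]
    # The columns are the union of all letters seen in any row, sorted.
    letters = sorted(set().union(*per_row))
    # Counter returns 0 for missing keys, so absent letters become 0.
    return [{letter: counts[letter] for letter in letters} for counts in per_row]


rows = ["GGGAATCGTCA", "ACGT", "TTTT"]
for row in letter_count_table(rows):
    print(row)
# {'A': 3, 'C': 2, 'G': 4, 'T': 2}
# {'A': 1, 'C': 1, 'G': 1, 'T': 1}
# {'A': 0, 'C': 0, 'G': 0, 'T': 4}
```

Unlike combining all sequences into a single Create Document, this keeps the counts separate for each row.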