Letter count in sequence

BDk
BDk New Altair Community Member
edited November 2024 in Community Q&A
Hi, I'm quite new with the software. I would like to count the number of letter in a random sentence (e.g.:GGGAATCGTCA), e.g. how many 'A' occurred in it and put it into a new column. Is there some operator that could be used for it?  Thank you in advance!
Tagged:

Welcome!

It looks like you're new here. Sign in or register to get started.

Answers

  • Marco_Barradas
    Marco_Barradas
    Altair Employee
    Hi @BDk

    You can use a Process Documents and split the tokens and specify count occurrences 

    Spoiler
    <?xml version="1.0" encoding="UTF-8"?><process version="9.10.011">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.10.011" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="9.4.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
            <parameter key="text" value="GGGAATCGTCA"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="9.4.000" expanded="true" height="103" name="Process Documents" width="90" x="246" y="34">
            <parameter key="create_word_vector" value="true"/>
            <parameter key="vector_creation" value="Term Occurrences"/>
            <parameter key="add_meta_information" value="true"/>
            <parameter key="keep_text" value="false"/>
            <parameter key="prune_method" value="none"/>
            <parameter key="prune_below_percent" value="3.0"/>
            <parameter key="prune_above_percent" value="30.0"/>
            <parameter key="prune_below_rank" value="0.05"/>
            <parameter key="prune_above_rank" value="0.95"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="9.4.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34">
                <parameter key="mode" value="regular expression"/>
                <parameter key="characters" value=".:"/>
                <parameter key="expression" value="|"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


  • BDk
    BDk New Altair Community Member
    Do I need some extension for this 'process documents' operator? I've an education version of the software and I could not find this operator.
  • BDk
    BDk New Altair Community Member
    OK, found the extension, sorry. It works for 1 row fine, thanks Marco. Could it be multiplied?
    I've a table that has 1000+ rows and all contains a letter sequence like the one that posted above. I would like to count the letters in each one by one. But with the posted solution it only works for 1 row or if I enter all the 1000+ via 'create document' it only counts the letters together in all rows...
  • Marco_Barradas
    Marco_Barradas
    Altair Employee
    HI @BDk

    You'll need to use a Process Documents or Process Documents from Data or Files it depends on how your data was originally collected.

Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.