Splitting attributes & Getting specific records

pjm
pjm New Altair Community Member
edited November 2024 in Community Q&A

Hi

I'm trying to break an attribute down into 2 pieces: main and remainder

e.g. Fruit | 2KG | £2.00

So I want to break off "Fruit" from the rest and keep the remainder in another attribute.

 

I'm also working on a 50k-row dataset and want to pull out 1k specific ID numbers I have in mind.

 

Thanks for the help. First-time user.

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee

    Hi pjm,

     

    welcome to the community!

     

    The Split operator should do the job. I have attached a demo process for it.

     

    ~Martin

  • pjm
    pjm New Altair Community Member
    thx, it's looking for something in the XPath variables for Read XML.
    So what I'm looking to do is have something like: Adam| Benji | Colin
    Then set Adam as the main after the split and put the other 2 in a separate variable, sub or something. I tried this pattern for the Split operator: .*| but it results in: vara: A, varb: d, varc: a, vard: m
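A likely cause of the character-by-character result above, assuming the Split operator treats its pattern as a regular expression (the split_pattern parameter later in this thread suggests it does): an unescaped | means "or", so it can match the empty string between every pair of characters. The sketch below uses Python's re module rather than RapidMiner itself, so details may differ, but the escaping rule is the same in most regex dialects.

```python
import re

text = "Adam| Benji | Colin"

# An unescaped "|" is an alternation of two empty patterns, so it matches the
# empty string at every position (Python 3.7+ splits on such empty matches).
pieces = [p for p in re.split(r"|", "Adam") if p]
print(pieces)  # ['A', 'd', 'a', 'm']

# Escaping the pipe makes it a literal delimiter; \s* also absorbs the padding.
# maxsplit=1 stops after the first pipe, giving "main" plus "everything else".
main, rest = re.split(r"\s*\|\s*", text, maxsplit=1)
print(main)  # Adam
print(rest)  # Benji | Colin
```

In the Split operator the equivalent idea would be a pattern such as \| instead of .*|, though that has not been verified against a running process here.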
  • sgenzer
    sgenzer
    Altair Employee
    Answer ✓
    Another approach is to use Generate Attributes, where you define a new attribute (att2 here) as the prefix of att1 up to the first space:

    att2 prefix(att1,index(att1," "))

    If you want the remainder in another attribute (att3 here):

    att3 suffix(att1,length(att1)-length(att2))

    Sometimes you can be off by one character, so just add or subtract 1 as needed. I use this more than Split because it allows a lot more customization.

    Scott
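The prefix/suffix arithmetic above can be traced outside RapidMiner with a small Python sketch. It assumes index() returns a 0-based position and that prefix() and suffix() take a character count; the helper name split_at_first is made up for illustration.

```python
def split_at_first(text: str, delimiter: str = " ") -> tuple[str, str]:
    """Split text into (main, remainder) at the first delimiter."""
    cut = text.find(delimiter)      # plays the role of index(att1, " ")
    if cut == -1:                   # no delimiter: everything is "main"
        return text, ""
    main = text[:cut]               # plays the role of prefix(att1, cut)
    remainder = text[len(main):]    # suffix of length len(text) - len(main)
    return main, remainder


main, remainder = split_at_first("Fruit | 2KG | £2.00")
print(repr(main))       # 'Fruit'
print(repr(remainder))  # ' | 2KG | £2.00'
```

The remainder still starts with the space and pipe, which is the "off by one" effect mentioned above; shifting the cut position or trimming the result removes it.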
  • sgenzer
    sgenzer
    Altair Employee
    Not sure what you mean by the ID numbers - are the 1k IDs randomized among the 50k examples? I use the Generate ID operator sometimes but not sure this is what you're looking for.

    Scott
  • pjm
    pjm New Altair Community Member

    thx for the help on Generate Attributes, I think this could help me a lot with that problem

    with the IDs, the 1000 are from 94,000 to just over 149,000

    but none of the other IDs fall in that range

    so I'm looking for a subset of the CSV file that only takes records in that range

    thx

  • sgenzer
    sgenzer
    Altair Employee
    Answer ✓
    Oh, that's much easier. Just use "Filter Examples", select "single" and your ID attribute, select the "include special attributes" checkbox, and under custom filter make two entries: one that is ID ≥ 94000 and another that caps the upper end. Since your IDs run to just over 149,000, set that cap at or above your highest ID rather than at exactly 149000, or the last few records get dropped. Make sure the "and" button at the bottom is selected so both conditions must hold.

    You can also use the Filter Example Range operator, which is slightly easier but only filters by example number, which may or may not be the same as your IDs.

    Scott
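For checking the result outside RapidMiner, the same two-condition filter can be sketched in Python. Everything specific here is an assumption for illustration: the column names id and item, the sample rows, and the 150,000 upper bound (chosen only because the IDs are said to run to just over 149,000).

```python
import csv
import io

# Hypothetical stand-in for the 50k-row CSV file.
SAMPLE = """id,item
120,apple
94000,pear
120500,plum
149250,fig
200000,kiwi
"""


def filter_by_id(rows, low, high):
    """Keep rows whose id lies in [low, high], both ends inclusive."""
    return [row for row in rows if low <= int(row["id"]) <= high]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
subset = filter_by_id(rows, 94_000, 150_000)
print([row["id"] for row in subset])  # ['94000', '120500', '149250']
```

Both comparisons must hold for a row to be kept, which corresponds to selecting "and" for the two filter entries.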
  • jason_xie
    jason_xie New Altair Community Member

    Scott, 

     

    Your answer was really helpful. But what would you do if you want to split by 3rd Space?

     

    For example I have a column that has content like Nov 14 2016 12:50 AM, I want to split the date and time into 2 columns. 

     

    Thanks!

  • sgenzer
    sgenzer
    Altair Employee

    Hi @jason_xie - for that I would use a nice RegEx in the Split operator:

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.0.000-BETA">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.0.000-BETA" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="8.0.000-BETA" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="313" y="85">
            <list key="attribute_values">
              <parameter key="text" value="&quot;Nov 14 2016 12:50 AM&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="split" compatibility="8.0.000-BETA" expanded="true" height="82" name="Split" width="90" x="581" y="85">
            <parameter key="split_pattern" value="(?&lt;=20[0-9][0-9])\s"/>
          </operator>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Scott
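The split pattern in the process above, (?<=20[0-9][0-9])\s, uses a lookbehind: it matches a whitespace character only when the four characters before it look like a year 20xx, so the year itself is not consumed. A quick way to see what it does is to run the same pattern through Python's re module (a sketch; RapidMiner's own regex engine may differ in other details).

```python
import re

# Whitespace that is immediately preceded by a year of the form 20xx.
PATTERN = re.compile(r"(?<=20[0-9][0-9])\s")

date_part, time_part = PATTERN.split("Nov 14 2016 12:50 AM")
print(date_part)  # Nov 14 2016
print(time_part)  # 12:50 AM
```

The other spaces are left alone because none of them follows a 20xx year, so the value splits into exactly two pieces.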

  • jason_xie
    jason_xie New Altair Community Member

    Thanks! I ended up adding values to the index() output in the prefix() expression to adjust the space cutoffs.
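For comparison, splitting at the third space can also be written without hand-tuned offsets. This is a Python sketch of that alternative, not the expression used above; the helper name split_at_nth is made up for illustration.

```python
def split_at_nth(text: str, delimiter: str = " ", n: int = 3) -> tuple[str, str]:
    """Split text into two parts at the n-th occurrence of delimiter."""
    parts = text.split(delimiter, n)   # at most n splits, so n + 1 pieces
    if len(parts) <= n:                # fewer than n delimiters present
        return text, ""
    return delimiter.join(parts[:n]), parts[n]


date_part, time_part = split_at_nth("Nov 14 2016 12:50 AM")
print(date_part)  # Nov 14 2016
print(time_part)  # 12:50 AM
```

Counting delimiters keeps working when the day or month has a different width, which a fixed offset added to the first space position does not.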