New guy - learning how to clean up and parse a CSV file
WesCo2019
New Altair Community Member
I am new to data analysis and preparing a data set. I have a CSV file with the time stamp in the following format: "2018/05/14:12:00:00 PM."
What is a recommended way to "parse" the time stamp into Year, date, and time components to make it easier to sort and filter?
Thanks!
Best Answer
-
hi @WesCo2019 - you're going to want to first put it into a date-time data type, for example by setting the date format in the CSV import wizard.
As for parsing, I would use "Generate Attributes" to create separate attributes for each piece you want:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.0.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.0.003" expanded="true" height="68" name="Retrieve Lake Huron" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/Time Series/data sets/Lake Huron"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.0.003" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
        <list key="function_descriptions">
          <parameter key="Year" value="date_get(Date,DATE_UNIT_YEAR)"/>
          <parameter key="Month" value="date_get(Date,DATE_UNIT_MONTH)"/>
          <parameter key="Week" value="date_get(Date,DATE_UNIT_WEEK)"/>
          <parameter key="Day" value="date_get(Date,DATE_UNIT_DAY)"/>
          <parameter key="Hour" value="date_get(Date,DATE_UNIT_HOUR)"/>
        </list>
      </operator>
      <connect from_op="Retrieve Lake Huron" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Hope that helps.
Scott
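For anyone who wants to check the same idea outside RapidMiner, here is a rough Python sketch that splits each timestamp into the same five pieces as the Generate Attributes expressions above (Year, Month, Week, Day, Hour) and then sorts and filters on them. The sample rows and the column names `timestamp` and `value` are made up for illustration, and using the ISO week number for "Week" is an assumption; RapidMiner's DATE_UNIT_WEEK may number weeks differently.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; the real file's column names will differ.
SAMPLE_CSV = """timestamp,value
2018/05/14:12:00:00 PM,3.2
2018/05/13:09:30:00 AM,2.7
2019/01/02:11:15:00 PM,4.1
"""

# Assumed pattern for strings such as "2018/05/14:12:00:00 PM".
TIMESTAMP_FORMAT = "%Y/%m/%d:%I:%M:%S %p"


def split_timestamp(text):
    """Return the date-time pieces of one timestamp as a dict."""
    moment = datetime.strptime(text, TIMESTAMP_FORMAT)
    return {
        "Year": moment.year,
        "Month": moment.month,
        "Week": moment.isocalendar()[1],  # ISO week number (an assumption)
        "Day": moment.day,
        "Hour": moment.hour,  # 24-hour clock, so 11 PM becomes 23
    }


# Read the CSV and attach the new columns to every row.
rows = []
for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    record.update(split_timestamp(record["timestamp"]))
    rows.append(record)

# With separate numeric columns, sorting and filtering become simple.
rows.sort(key=lambda r: (r["Year"], r["Month"], r["Day"], r["Hour"]))
rows_2018 = [r for r in rows if r["Year"] == 2018]

for row in rows_2018:
    print(row["timestamp"], row["Year"], row["Month"], row["Day"], row["Hour"])
```

Sorting on the numeric pieces avoids the trap of sorting the raw text, where "12:00:00 PM" and "12:00:00 AM" would not order correctly.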
5
Answers
-
You can also define the date-time format in the CSV import wizard and save one step, which I see Scott has already pointed out. I didn't see the first image.
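To get a feel for what a date format pattern for this timestamp has to cover, here is a small Python sketch that parses the example value from the question. The pattern string is my reading of "2018/05/14:12:00:00 PM" (12-hour clock with an AM/PM marker), and I am assuming the trailing period in the question is just sentence punctuation. If the import wizard takes Java-style date patterns, the equivalent would probably be something like `yyyy/MM/dd:hh:mm:ss a`, but treat that as an assumption and check it against a preview of your data.

```python
from datetime import datetime

# Assumed pattern: year/month/day, a colon, then a 12-hour time with AM/PM.
TIMESTAMP_FORMAT = "%Y/%m/%d:%I:%M:%S %p"


def parse_timestamp(text: str) -> datetime:
    """Parse one timestamp, ignoring surrounding spaces and a trailing period."""
    cleaned = text.strip().rstrip(".")
    return datetime.strptime(cleaned, TIMESTAMP_FORMAT)


parsed = parse_timestamp("2018/05/14:12:00:00 PM")
print(parsed.isoformat())  # 2018-05-14T12:00:00
```

Note that 12:00:00 PM is noon and 12:00:00 AM is midnight, which is why the pattern needs the 12-hour field together with the AM/PM marker.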
0 -
Or you can use "Date to Numerical" and then specify the date component you want to extract.
-
Thanks to all for the helpful insight.