Inserting a unique header row into already parsed data in DPS, to be exported as a pipe-delimited CSV for SQL
Hello,
I would like to know the best approach for inserting a header and trailer into parsed data that will be imported into SQL.
I need to add the following items to the header (to be exported in a pipe-delimited CSV file):
1) the captured file name of the input file (metadata information?)
2) the total number of rows of parsed data
3) the total amount in $ from a summed column
I currently have 12 columns of parsed data, so I understand that the header will not be aligned with the data columns.
The trailer should just contain an 'End of File' row, so that SQL can recognize that no further rows of data are in the pipe-delimited CSV file.
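To make the layout concrete, here is a rough sketch (Python, purely as an illustration; the file names, the delimiter handling and the position of the $ column are all made-up assumptions) of the kind of file I am trying to produce:

```python
import csv
import os

INPUT_PATH = r"C:\reports\daily_report.txt"   # hypothetical source report
PARSED_CSV = "parsed_rows.csv"                # hypothetical pipe-delimited export of the 12 parsed columns
OUTPUT_CSV = "for_sql_load.csv"               # file that would be handed to the SQL load
AMOUNT_COL = 7                                # hypothetical index of the $ column to be summed

with open(PARSED_CSV, newline="") as f:
    rows = list(csv.reader(f, delimiter="|"))

row_count = len(rows)
total_amount = sum(float(r[AMOUNT_COL]) for r in rows)

with open(OUTPUT_CSV, "w", newline="") as f:
    w = csv.writer(f, delimiter="|")
    # Header: input file name, row count, summed $ amount (3 fields, so not aligned with the 12 data columns)
    w.writerow([os.path.basename(INPUT_PATH), row_count, f"{total_amount:.2f}"])
    w.writerows(rows)              # the parsed data rows
    w.writerow(["End of File"])    # trailer so SQL knows nothing further follows
```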
I checked out the following previous discussion thread for a similar procedure, but made no headway in getting it to work properly, unless it can truly be accomplished through the Monarch Classic application using the Summary tab. Not sure how....
https://community.datawatch.com/thread/3695
Thank you for your feedback.
Henry
Answers
Hi Henry,
Let's consider the basics of this requirement to start with.
It sounds like you have a SQL load system that seeks to interpret the first line of an input file as process information and subsequent lines as data to be loaded, until it hits an EOF marker.
Is that about right?
I can't see how (or why), in that scenario, there would be a need or requirement to map the fields to columns, but I would guess that the first line will need some identification of what it is so that the process can interpret it correctly.
In the older thread you referenced, the concept of generating 3 outputs and concatenating them was addressed with a view to full automation of the process.
If we simplify things a little in order to do a proof of concept, what we have is a regular table data export, to which is appended a one-line summary output (file name, rows, amount), presumably for the process to check that everything is OK, and a footer (EOF) to prevent the process hanging in limbo at the end of the file.
If running this "manually", one would run the process having defined a model with a "summary" export to provide the file name, row count and amount(s), and a table export for the core data, then add an EOF marker as the last line.
So you could export the summary, export the table, concatenate them in Windows at the DOS level and add an EOF. Save the result and test it for your SQL load.
Using Monarch Classic you can specify that your output file (for the data table) needs an EOF, and therefore add that automatically.
If you use a CSV file export in all cases, it is basically a text file and so it adds rows of text. In theory it would not matter to the export process whether all of the rows present the same format. So, having run the model, with the summary and the table calculated, running the summary export to create a file and then adding the table export to append to it (complete with the EOF marker set) could be enough. (Subject to what your SQL ingest program is looking for.)
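If you want to mock that concatenation up outside Monarch first, something along these lines shows the idea (a quick sketch only, with made-up file names; the real step could just as easily be a couple of copy/echo commands in a batch file):

```python
# Proof-of-concept concatenation: summary export + table export + EOF marker.
# All file names here are hypothetical placeholders.
SUMMARY_EXPORT = "summary_export.csv"   # one line: file name | row count | amount
TABLE_EXPORT = "table_export.csv"       # the detail rows
COMBINED = "combined_for_sql.csv"       # single file to push to the SQL load

with open(COMBINED, "w") as out:
    for part in (SUMMARY_EXPORT, TABLE_EXPORT):
        with open(part) as f:
            out.write(f.read())
    out.write("End of File\n")           # footer so the load knows nothing further follows
```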
DPS would seem to require a slightly different approach. It is not something I have experimented with and I don't have time to do so today. But you could try it to see how it works out.
DPS has an Append feature for tables that should allow you to combine tables but it requires the table structures to have commonality.
If you can make the structures common it may be possible to use this as a "concatenation" process. However there may be some assumed internal processing (a sort by the first field or something for example) that might mean attempts to do what you need to do would fail without some more specific work on the process.
I guess that if DPS is the preferred tool for the core data analysis you require for the table, but Classic has the export options you need to simplify that part of the process, you could do what you need to do in DPS and then use the "Open in Classic" mode (or one of the similar options for moving the process on to Classic), and then have the Summary functionality and automatic EOF on CSV file export available for your processed data.
One further question for now.
You mention the header needs a value for the "total number of rows of parsed data". Is that actually the number of rows in the exported data table (after filtering, for example), or the number of records in the original source data file (pre-filtering), or (very unlikely) the total number of lines in the report (or similar)?
I assume the former, and that the values are to be used as part of the ingestion process validation ... but assumption is often a very dangerous workmate.
Grant
Thank you, Grant, for the thorough explanation.
That is exactly what I am after. I have gone as far as inserting the header information, saved as a new model under the Classic application, with the exception of the input file name.
Unfortunately, under the Summary tab I have no way, that I know of, to insert the file name as a value in the summary, since a summary, by definition, measures something using some type of aggregate function (Max, Count, Min, Avg, Sum, etc.). I have attached my results below. I will have to try the trailer procedure that you explained in a short while, as I did not get that far.
As you mentioned, I do need to concatenate the data table and the header/trailer table, perhaps in DPS, as I will need to automate both sections (data and header/footer) in Monarch Automator and output the results as one pipe-delimited CSV file, to then be worked on further in SQL.
If you would elaborate on inserting the input file name in the header, that would be awesome! The only way I was able to add the file name was to create a new column using the FILE() function in the Classic application (or using Metadata Information in DPS), which inserts the file name in each data row. Not useful for me in that sense.
Henry
Henry,
I'm busy for the rest of the day but will respond in more detail over the weekend.
Try adding the File() field from the table as a Key field in the summary and see where that gets you as a starting point.
It's a while since I have made use of that part of Monarch in anger, so I'm working from memory; the recommendations may not be entirely error-free!
Grant
Hi Henry,
I should have said "try to respond" over the weekend.
Things came along to eat up the time.
I'll see if I can come up with something tomorrow.
Grant
Hi Henry,
I decided to have a quick look straight away.
If you add File() in Classic (or the equivalent metadata field in DPS) you should be able to use that as part of a formula to get just the name (not the entire path string) that you need. (It may also be available in the report header, and there may be some situations where multiple "reports" are delivered in a single "file", so there are some interesting considerations for that requirement.)
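Outside the tools the transformation itself is trivial; as a sketch of what the calculated field needs to achieve (the path below is a made-up example):

```python
import os

# File() / the DPS metadata column typically returns the full path; the header only wants the name.
full_path = r"C:\data\inbound\daily_report_20240131.txt"   # hypothetical value
file_name = os.path.basename(full_path)                     # -> "daily_report_20240131.txt"
print(file_name)
```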
In Classic you can then create a Summary with the report name as the Key, and Measures for the Sum of the field(s) you wish to sum and the Count of the number of records used for the summary (presumably the number of values used for the Sum makes sense here).
That should work IF what you need is based on the extracted and processed records - i.e. the Count is the number of table records.
If you actually need the number of total records in the report before, say, filtering, then we need something different.
If you need the total number of lines in the report (or something like that, rather than the number of records extracted), we need something different again. That seems unlikely.
Using DPS, as far as I can tell, you can do something similar, but not in one step.
Get the report name, as you suggested, via the metadata.
Then use a Transform to "Group By".
First, group by the file name field, and then in Column use your value column with "Sum" as the operation.
Duplicate the table and repeat the Group activity, but this time use the file name as the Group field and the value column (or the file name column if you like) with "Count" as the operation.
Then Join those two results to get the file name, total and record count.
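If it helps to see those three steps in one place, the same logic expressed in pandas looks roughly like this (a sketch only; the column names and values are invented):

```python
import pandas as pd

# Hypothetical parsed table: one row per detail record, plus the metadata file name column.
detail = pd.DataFrame({
    "FileName": ["report.txt"] * 4,
    "Amount": [10.00, 20.50, 5.25, 14.25],
})

sums = detail.groupby("FileName", as_index=False)["Amount"].sum()       # Group By ... Sum
counts = detail.groupby("FileName", as_index=False)["Amount"].count()   # Group By ... Count
counts = counts.rename(columns={"Amount": "RowCount"})

header = sums.merge(counts, on="FileName")   # Join the two results
print(header)                                # FileName | Amount (total) | RowCount
```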
Hopefully that is so far so good. We have the values you need; the next thing is to create the input for your SQL.
Not knowing what your SQL ingestion can make use of, and on the understanding that you have Automator to make things happen, I'll just make some generic suggestions for now.
Firstly, I would have thought that Automator and the SQL could be paired in some way to take the header info (exported from either Classic or DPS with a suitable "I'm a header record" identifier that the SQL ingestion could recognise) and then process that, followed by the detail lines, with the SQL assessing whether it saw the correct number of lines and the correct total and returning an OK or Error to Automator.
OR one could use Automator, and perhaps Windows, to concatenate the summary line and the main export into a single file to push to the SQL ingestion.
OR one could export the summary followed by the details to the same export file using Classic and "Append if file exists", then pass that to the SQL via Automator. (Via a Project in that case, of course.)
Or, using DPS, something quite similar would be to select the "Summary" Join and then the Detail table, and use the "Append" feature in "Prep Data" to create a single table (with or without matching names, and with a potentially interesting option to include the source table name in a column - some smart naming might work nicely there). That might give you the single-file option, with the potential to put in Header and Body identifiers for the SQL if that is what it needs.
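As a sketch of the sort of check the ingestion side might do against such a file (pure illustration; the layout assumptions - pipe delimiter, header fields, amount column position - are mine):

```python
import csv

def validate_feed(path, amount_col=7):
    """Read 'header | detail rows | End of File' and check the count and total against the header."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter="|"))

    # Header: file name | expected row count | expected $ total
    file_name, expected_rows, expected_total = rows[0][0], int(rows[0][1]), float(rows[0][2])

    detail = []
    for row in rows[1:]:
        if row and row[0] == "End of File":      # trailer reached: stop reading
            break
        detail.append(row)
    else:
        return "Error: no End of File marker"

    if len(detail) != expected_rows:
        return f"Error: expected {expected_rows} rows, got {len(detail)}"
    if abs(sum(float(r[amount_col]) for r in detail) - expected_total) > 0.005:
        return "Error: summed amount does not match header"
    return f"OK: {file_name} passed validation"
```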
In general my feeling is that if this entire analysis can be run "hands off, lights out", then on balance it would likely be logical to run it in Classic/Automator.
On the other hand, if there is some user interaction required before the data is safe to export, then the DPS approach is pretty good, having only a couple of small extra steps, and should allow users to pick up much of the heavy lifting as well as providing some fast and powerful discovery facilities.
And you can still feed Automator with the results.
Does this give you a way forward or are there still further complexities to resolve? It would not surprise me if there are!
Grant
Absolutely, Grant.
Thank you,
Henry
Thanks, Grant.
Since I already have the counts, all I needed was to display the file name in the header. I accomplished that by creating a new column in Table View (Classic) containing the file name (using the File() function), then went to Summary View and entered the newly created column as the Key data value. I came up with the following screenshot, but will have to filter out the first row somehow, and then remove the column with the file name prior to saving the model.
I will also have to integrate the new header with the parsed data created in DPS. As you mentioned in your instructions, there will be several hops for me to take in the application before I have the final output as I need it to be.
Henry,
If you are going to use the Classic summary route, why not also use Classic to generate the main table and then just export from both to the same file?
Remember the export will only export the columns that are visible at the point of exporting. If the name is the issue but a field can be managed, then there are ways of changing the Key name and changing the value in the field whilst retaining the Sum.
I don't see your screenshot, but I would think there may be a possibility to have the SQL ingestion routine ignore specific line types and lines with (or perhaps without) certain characteristics.
It might take a little bit of working out how to get the best for your needs but once done the concept should be re-usable if required.
HTH.
Grant
Okay, I see, Grant.
It seems that the screenshot did not go through the first time. Here it is again.
Anyway, I agree with your point about exporting the output. Here is what I am thinking:
1) Process the data in DPS.
2) Save the model in DPS. This model will be used in Monarch Automator.
3) Save the parsed data output in DPS and open it from Classic to run the header/footer model on the parsed data (will this work?).
4) Save the header/footer model in Classic.
5) Save the entire output (parsed data plus header/footer) in Classic.
6) Configure Monarch Automator (using the graphic workflow) to process the input data and add the header/footer automatically without human intervention.
Will this process work or am I oversimplifying it?
A few things to consider:
1) I will have to capture the original input file name early on in the process; otherwise, when I export (save) the parsed data and rename the file, I will no longer have the original file name. This is simple to do: just create a new column in DPS with the captured file name (metadata columns section).
2) I will have to incorporate the header on top of the parsed data, which is something I have not tried yet. I have only created a model to count the data rows and dollar amounts.
Thank you,
Henry
Hi Henry,
I'm going to start out by admitting that I am an "old school" Monarch user and so the Classic functionality is where my familiarity is greatest. Putting DPS derived output directly through to Automator is not something I have had an opportunity to work with yet.
Is there anything you are doing in DPS with the main body of the report and that produces your 12 columns which you think might not be possible or would seem to be very difficult in Classic?
If not, my temptation would be, given the output is going into SQL via Automator, to look to Classic for the whole process. The Classic Project connects report and model and holds the Exports, and so becomes the basis for the Automator process.
It sounds like this is intended to be a lights-out procedure once it is working - no tinkering in DPS in the early stages of each run. Is that correct?
Does the report file start out with 12 columns, or are these columns the result of some work in DPS, perhaps using the Pivot feature?
I suspect your step 3, which you have queried above, might be "interesting" but will depend on capturing all the data early as you have identified. Metadata in DPS or the FILE() function in Classic (presumably stripping out the file name from the path using a calculated field) will both do that job quite simply from the original file.
Do you have a sample file for development purposes that you can share? De-personalised data of course.
It's usually much quicker to give an opinion based on a few minutes working with something close to a real example than to try to dig into details via forums or emails.
Communication by model is a lot more direct to answering the questions!
Grant
I see, Grant.
Yes to one of your questions: once the models are completed and tested fully in either DPS or Classic, those models (workspaces) will be incorporated into Automator for complete automation (I would prefer to work fully with DPS if I could, only because I have already become accustomed to it and like it a lot). Going forward, the source file layout will not change, only the values. It is tough to add custom rows with the limited functionality available in either application (DPS or Classic).
I have attached an Excel file to show you what the output file for export to SQL would look like. Please let me know if the attachment is not visible. I have highlighted the header and footer information, along with the column headers that should not be in the export file (I only include them for your reference). Actually, the export file will contain 22 columns of parsed-out data. The original source file only contained 12 columns, but I had to split things down based on the business requirements.
Thanks so much.
Henry