Best practices around managing the use of KS
Apologies if I've simply missed materials already posted here, but even if they exist I'd love to hear how this works for people in practice. Let's set aside the data science questions (tweaking this config here, how to analyze this AOC over there). From a functional standpoint, using KS involves the following:
- Working with other KS users in your org (assuming you're not the only one). Is it considered impolite to go through someone else's project without asking? Is there an expectation that documentation on the analytical effort will exist somewhere so that it can be audited/validated?
- KS generating its own custom datatype files for the projects/trees/data, etc.
- If using some sort of query to bring in data, maintaining a repository of queries somewhere (admittedly outside of KS, but I'd argue the organization and maintenance of these queries is critical to your success in KS).
- Assuming you come up with an action item as a result of your analysis in KS (regardless of the method you use), where/how do you record this change for change-log purposes (which may be a separate place from where the change is implemented)?
- When implementing a change, how long do you maintain the existing data files? If you apply a data retention policy, there should be a point where this data gets discarded. If you discard the main data, do you keep a summary of the dataset, a screenshot of the tree (if using decision trees), that sort of thing?
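For what it's worth, the "keep a summary before discarding" idea in that last bullet can be handled with a small script outside of KS. This is only an illustrative sketch, not anything KS provides: the project name, file name, and summary fields are all my own assumptions.

```python
# Hypothetical retention step: before discarding a KS working dataset,
# archive a lightweight summary so the analysis can still be audited later.
import json
from datetime import date


def summarize_rows(rows, project, archive_path):
    """Write row count and per-column distinct-value counts to a JSON manifest."""
    columns = rows[0].keys() if rows else []
    summary = {
        "project": project,
        "archived_on": date.today().isoformat(),
        "row_count": len(rows),
        "distinct_values": {c: len({r[c] for r in rows}) for c in columns},
    }
    with open(archive_path, "w") as fh:
        json.dump(summary, fh, indent=2)
    return summary
```

A manifest like this, kept alongside a tree screenshot, would let you discard the bulky working files while still answering "what did that analysis look at?" during an audit.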
@datawatch team: on a similar note, I'd pose this question to you: you provide what is arguably a best-in-class tool that empowers users of various skill levels, but how does the tool itself help users accomplish the points above? Some may seem like extra frills/bells/whistles, but at least some of the points I raise need to be handled somewhere. If you don't believe your software is the place, that's fine, but I'd encourage you to imagine an answer to the question "well, where is this managed?" If you can't come up with an answer, it's quite possible you've found a new need of your customers.
------------------------------
Michael Graff
Senior Manager Fraud Strategy & Analytics
Staples, Inc.
Framingham MA
(267) 240-6402
------------------------------
Answers
Hi Michael,
Some good points there. This might be a better conversation to have with your account manager so that they can introduce you to the Product Team, but I'll do my best to respond without promising too much!
1. Most clients I have worked with have some form of audit/validation prior to promotion to production - assuming, of course, they are using the tool for production model creation rather than ad hoc analysis (which a lot do). All of them need to document the steps, decisions, and results. There is currently the capability to output reports from many of the nodes in PDF or HTML format, and there is also a 'Self Document' node that can take screenshots of each page of a workflow. We are also working with one of our clients to explore customized, repeatable reporting (e.g. templated reports).
2. With regard to data retention, currently each node stores a copy of the data (generally compressed). It is possible to retain the first imported file, delete the other KDD files, and then use the 'run-to-here' option, but that is not something I would recommend. Saving a version of the source data in a data repository outside the tool - with well-documented steps for the current workflow and model - is probably the most common approach. In the Knowledge Studio for Apache Spark version, because the datasets are held in memory, only the metadata is retained. You can also take snapshots of models to save different versions of them in either version.
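The "source data outside the tool" approach above can be made auditable with very little tooling. A minimal sketch, assuming you archive extracts to a shared repository folder: record a checksum and the workflow each extract fed, so a later reviewer can confirm the file hasn't drifted. The paths, function name, and manifest fields here are my own assumptions, not KS features.

```python
# Copy a source extract into an archive repository and write a manifest
# next to it containing a SHA-256 checksum and the owning workflow name.
import hashlib
import json
import shutil
from pathlib import Path


def archive_source(extract: Path, workflow_name: str, repo: Path) -> dict:
    """Archive the extract and return the manifest that was written."""
    repo.mkdir(parents=True, exist_ok=True)
    dest = repo / extract.name
    shutil.copy2(extract, dest)
    manifest = {
        "workflow": workflow_name,
        "file": dest.name,
        "sha256": hashlib.sha256(dest.read_bytes()).hexdigest(),
    }
    (repo / (dest.stem + ".manifest.json")).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Re-hashing the archived file during an audit and comparing it to the manifest confirms the documented workflow still matches the data it was built on.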
3. The Product Team is currently working on a new version of the product designed with collaboration capabilities similar to what is currently available in the Knowledge Hub tool. This will include things such as permission settings, auditing, and additional places for notes and documentation. Model management/monitoring capability is also being explored.
If you would like more detail on any of this, please feel free to reach out to either me or your rep and we can set up some time with the Product Management team.
------------------------------
Alex Gobolos
Sales Engineer
Datawatch Corporation
Toronto ON
------------------------------
-------------------------------------------
Original Message:
Sent: 12-09-2019 10:06 AM
From: Michael Graff
Subject: Best practices around managing the use of KS
"0 -
Thanks Alex. It's good to know that the product as it stands has some capabilities in this area. I do want to clarify though that these details, documenting best practices, maintaining metadata, handling data retention at scale in terms of the KS files, these are not "true" expectations of the product right now. That said, they are realities users have to implement to some degree (if they want to have any sense of organization with how the tool interacts with other systems, teams and processes). I would not mind talking to the product team at all on these topics if there's interest, but I'm really interested in hearing from the rest of the user community to see how they approach these issues. Maybe people use SharePoint, Shared Drives, or have some vendor that helps them manage their business processes and where data lives/ends up, that's what I'm hoping someone is willing to share (and the interplay between those things and how KS is used).Mahshid said:Hi Michael,
"
------------------------------
Michael Graff
Senior Manager Fraud Strategy & Analytics
Staples, Inc.
Framingham MA
(267) 240-6402
------------------------------
-------------------------------------------
Original Message:
Sent: 12-10-2019 12:09 PM
From: Alex Gobolos
Subject: Best practices around managing the use of KS
"0