How to use Barriers in FlowTracer


Instead of this flow with two jobs, where the output 'A' of the first job is the input to the second:

T vw something

O A

 

T vw JobA

I A

 

define the flow with a barrier, by adding -md5 to the output:

T vw something

O -md5 A

 

T vw JobA

I A

 

Initially, both jobs, 'something' and 'JobA', will run. 'something' creates output 'A' and 'JobA' consumes it.

In the first version, retracing 'something' invalidates 'JobA' right away. With the barrier (-md5 on the output), however, 'JobA' is invalidated only after 'something' has completed, and only if the md5sum of file 'A' has changed.
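
The comparison behind such a barrier can be sketched in Python. This is only an illustration of the idea, not FlowTracer's implementation: the function names md5_of and downstream_needs_rerun, and the temporary file standing in for output 'A', are assumptions made for this example.

```python
import hashlib
import os
import tempfile


def md5_of(path):
    """Return the hex MD5 digest of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def downstream_needs_rerun(path, previous_digest):
    """Barrier check (hypothetical helper): consumers of the file need to
    rerun only if its content digest differs from the recorded one."""
    return md5_of(path) != previous_digest


# A temporary file stands in for output 'A' of the job 'something'.
with tempfile.TemporaryDirectory() as workdir:
    output_a = os.path.join(workdir, "A")

    # First run of 'something': write 'A' and record its digest.
    with open(output_a, "w") as fh:
        fh.write("result v1\n")
    recorded_digest = md5_of(output_a)

    # 'something' runs again and rewrites 'A' with identical content.
    with open(output_a, "w") as fh:
        fh.write("result v1\n")
    unchanged_rerun = downstream_needs_rerun(output_a, recorded_digest)

    # 'something' runs again and this time the content really differs.
    with open(output_a, "w") as fh:
        fh.write("result v2\n")
    changed_rerun = downstream_needs_rerun(output_a, recorded_digest)

print("rerun JobA after identical rewrite:", unchanged_rerun)
print("rerun JobA after real change:", changed_rerun)
```

In this sketch the decision depends on the content of 'A' rather than on the fact that it was rewritten, which is why 'JobA' can stay valid when 'something' reproduces the same file.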

Implementing smart barriers can prevent large parts of a flow from being invalidated when data has changed, but not in a meaningful way.