First, read the introduction to pipelines article, which explains the basics of pipelines and origins and walks you through the steps to create a new pipeline.
This article focuses on the specifics of writing a flow pipeline as opposed to a normal pipeline.
A flow pipeline orchestrates a set of RunPipeline and CallWebService commands, running them either in parallel or in sequence.
A flow can create many batches but appears as a single batch on the load page and in the batch history. The flow has a single status that reflects the statuses of all child batches (“Child” batches refers to the sub-batches created by the “parent” flow batch).
Opening a flow batch from the load page displays a preview page which lists the batches in the flow. A data flow itself is considered a batch, has a batch ID, and is shown with the “flow” icon. This flow has 4 batches:
As of version 4.29, the xMart deployment feature uses a flow pipeline. A complex deployment now appears as a single item in the batch history, rather than as many separate batches mixed in with non-deployment batches.
Flow Pipeline Xml
Here is a simple flow pipeline which runs 2 pipelines and then calls a web service. The 3 commands are launched in parallel.
<XmartPipeline>
  <Flow Mode="PARALLEL">
    <RunPipeline OriginCode="ABC" />
    <RunPipeline OriginCode="DEF" />
    <CallWebService Url="http://webservice/get" />
  </Flow>
</XmartPipeline>
The main section is the Flow section. You can have multiple Flow sections.
Inside the Flow section you can put either RunPipeline or CallWebService commands. These commands will run either in sequence or in parallel depending upon how you set the Mode property of Flow.
Ctrl-click the Flow command to display the UI for the Flow command.
Parallel mode is the default and is processed more quickly than Sequential mode, especially when there are many commands. You don’t have to worry about overloading the system in Parallel mode, even with a large number of commands, because the system regulates how many commands run at once.
Sequential mode is useful when subsequent commands depend upon completion of previous commands, for example when a second RunPipeline command should materialize a custom view only after the first pipeline has completed.
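As a minimal sketch of that case, the flow below runs two pipelines in sequence. The origin codes LOAD_SALES_DATA and MATERIALIZE_SALES_VIEW are hypothetical placeholders, not origins defined in this article.
<XmartPipeline>
  <Flow Mode="SEQUENTIAL">
    <!-- Hypothetical origin: loads the data -->
    <RunPipeline OriginCode="LOAD_SALES_DATA" />
    <!-- Hypothetical origin: materializes a custom view after the load has completed -->
    <RunPipeline OriginCode="MATERIALIZE_SALES_VIEW" />
  </Flow>
</XmartPipeline>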
Properties of all Flow commands
Some properties are common to both RunPipeline and CallWebService commands:
ExecuteOnBatchStatuses
Relevant to sequential flows and PostRun sections of normal pipelines. In a flow pipeline in parallel mode, ExecuteOnBatchStatuses is irrelevant.
In order for this command to run, the previous command’s completion status must match one of these values: SUCCESS (default), CANCELED, SYSTEM_ERROR, REJECTED, INVALID, TIMEOUT_CANCELED. For CallWebService commands, only SUCCESS, SYSTEM_ERROR and TIMEOUT_CANCELED apply.
In a PostRun of a normal pipeline, if the mode of the PostRun is parallel, each command of the PostRun depends upon the status of the main batch. If the mode is sequential, the first command of the PostRun depends upon the status of the main batch but subsequent commands depend upon the previous command in the PostRun.
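For illustration, a sequential flow could use ExecuteOnBatchStatuses to react to a failure. This is a sketch under assumptions: the origin code ABC and the URL http://webservice/notify-failure are placeholders, and a single status value is used because that is the only form shown in this article. Whether RunEvenIfNoChanges also needs to be set for a failed batch is not stated here, so treat that as something to verify.
<XmartPipeline>
  <Flow Mode="SEQUENTIAL">
    <RunPipeline OriginCode="ABC" />
    <!-- Intended to run only if the previous command ended with SYSTEM_ERROR -->
    <CallWebService Url="http://webservice/notify-failure" ExecuteOnBatchStatuses="SYSTEM_ERROR" />
  </Flow>
</XmartPipeline>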
RunEvenIfNoChanges
Relevant to sequential flows and PostRun sections of normal pipelines. In a flow pipeline in parallel mode, RunEvenIfNoChanges is irrelevant.
By default, for sequential flows and PostRun sections in normal pipelines, command processing stops when a pipeline produces no data changes. Set RunEvenIfNoChanges to true to run the command even if the previous command (or the main batch containing a PostRun) resulted in no data changes.
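A sketch of how this could look in a sequential flow, assuming RunEvenIfNoChanges is written as an attribute on the command in the same way as ExecuteOnBatchStatuses. The origin codes are placeholders.
<XmartPipeline>
  <Flow Mode="SEQUENTIAL">
    <RunPipeline OriginCode="ABC" />
    <!-- Runs even if ABC produced no data changes -->
    <RunPipeline OriginCode="DEF" RunEvenIfNoChanges="true" />
  </Flow>
</XmartPipeline>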
Origins
Comma-separated list of origin codes for which this command should run (default is all origins). Putting a “!” character in front of an origin code excludes that origin from the command.
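As a sketch, assuming Origins is written as an attribute on the command, the first command below would run only for origins ABC and DEF, and the second would run for every origin except GHI. The origin codes and URL are placeholders.
<!-- Runs only for origins ABC and DEF -->
<CallWebService Url="http://webservice/get" Origins="ABC,DEF" />
<!-- Runs for all origins except GHI -->
<CallWebService Url="http://webservice/get" Origins="!GHI" />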
RunPipeline command
The RunPipeline command launches a pipeline. These are the properties specific to the RunPipeline command:
OriginCode
Required: Code of the pipeline origin to run (see what is an origin?).
BatchComment
Optional: Comment to be added to the batch when the pipeline is triggered. Pipeline variables like ${BATCH_ID} can be used to identify the parent batch. Default: “PostRun Batch ID: ${BATCH_ID}”
ResultStatus
Read-only: Populated by the system after the flow step has run; indicates the result of the triggered batch or web service call. It is available in the generated pipeline xml attached to the batch on the load page.
BatchIDs
Read-only: Populated by the system with the list of generated batch IDs started by the command. Only batches that contain other batches would have this value populated.
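For example, a RunPipeline command with a custom batch comment might look like the following sketch. It assumes BatchComment is written as an attribute alongside OriginCode; the origin code ABC and the comment text are placeholders.
<!-- ${BATCH_ID} identifies the parent batch, as in the default comment -->
<RunPipeline OriginCode="ABC" BatchComment="Started by flow batch ${BATCH_ID}" />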
CallWebService command
The CallWebService command calls a remote web service API, which can be useful to trigger synchronisation right after data updates. The pipeline script guide provides a detailed description of CallWebService.
The simplest CallWebService command just requires a URL. This example web service is publicly available and supports the HTTP GET method:
<CallWebService Url="http://webservice/get" />
Most web services require authentication based on special header values. This example defines an HTTP header named “Authorization-Token” and passes the value stored in the SECURE_TOKEN mart variable (what is a mart variable?):
<CallWebService Url="https://secure-webservice/get">
  <Headers>
    <add Name="Authorization-Token" Value="${mart.SECURE_TOKEN}" />
  </Headers>
</CallWebService>
Some web services require HTTP POST rather than GET, so a <Body> element needs to be used. In addition, the content type of the body often needs to be defined:
<CallWebService Url="https://webservice/post">
  <Headers>
    <add Name="Content-Type" Value="application/json" />
  </Headers>
  <Body Type="raw">
    { "payload": "here" }
  </Body>
</CallWebService>
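The two patterns above can be combined. The following sketch sends a POST request that carries both an authentication header and a JSON body. It assumes that several <add> elements may appear inside <Headers>; the URL is a placeholder, and the header name and mart variable are reused from the earlier example rather than required by any particular service.
<CallWebService Url="https://secure-webservice/post">
  <Headers>
    <!-- Authentication header taken from a mart variable -->
    <add Name="Authorization-Token" Value="${mart.SECURE_TOKEN}" />
    <!-- Declares the body as JSON -->
    <add Name="Content-Type" Value="application/json" />
  </Headers>
  <Body Type="raw">
    { "payload": "here" }
  </Body>
</CallWebService>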
File uploads
If any pipeline inside a flow requires a file upload, the user is prompted to upload a file when the flow is launched. If multiple pipelines inside the flow require a file upload, the user is prompted only once and the pipelines share that file.
Preview and commit
If a flow is launched in Preview mode, you must manually commit each batch in the flow. If the flow or a section of the flow runs in sequence rather than parallel, the first batch of the sequence must be committed before the second batch of the sequence begins (likewise for subsequent batches).
Running a flow in Commit mode means no interaction is required. The system will commit all batches of the flow so you can leave your desk.
Debugging
Debugging a flow is not supported. Only single normal batches can be debugged.
Passing variables
Flow pipelines can have input variables like normal pipelines, including system variables, pipeline variables and mart variables.
A pipeline variable may be passed from a flow pipeline to a child pipeline. By default, the value of the variable as defined in the flow is passed to the child pipeline; alternatively, the value can be redefined before it is passed.
In the following example, the user will be prompted to provide a value for the TableName variable.
When the RunPipeline for origin CREATE_LONG_HEAP is run, the TableName variable will be passed and the value of it will be what the user entered. Alternatively, a new value for TableName could be defined in the <Add> element.
When the RunPipeline for origin CREATE_WIDE_HEAP is run, 2 new variables are defined and passed, SourceTableName and TargetTableName. These 2 variables are based on the value of TableName entered by the user.
<XmartPipeline>
  <Context>
    <Inputs>
      <Add Key="TableName" Type="text" Placeholder="Table Name" />
    </Inputs>
  </Context>
  <Flow Mode="SEQUENTIAL">
    <RunPipeline OriginCode="CREATE_LONG_HEAP">
      <Values>
        <Add Key="TableName" />
      </Values>
    </RunPipeline>
    <RunPipeline OriginCode="CREATE_WIDE_HEAP">
      <Values>
        <Add Key="SourceTableName" Value="${TableName}_HEAP" />
        <Add Key="TargetTableName" Value="${TableName}_WIDE" />
      </Values>
    </RunPipeline>
  </Flow>
</XmartPipeline>
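As a sketch of the alternative mentioned for CREATE_LONG_HEAP, the value of TableName could be redefined in the <Add> element before it is passed. The suffix _LONG is an arbitrary illustration, and the sketch assumes that ${TableName} inside the Value still resolves to the value entered by the user, as it does for the other keys in the example.
<RunPipeline OriginCode="CREATE_LONG_HEAP">
  <Values>
    <!-- The child pipeline receives TableName with the suffix appended -->
    <Add Key="TableName" Value="${TableName}_LONG" />
  </Values>
</RunPipeline>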
Examples
A nested flow. All batches are created as soon as possible and are committed in no particular order.
<XmartPipeline>
  <Flow Mode="PARALLEL">
    <RunPipeline OriginCode="LOAD_FLEET_MASTER_FILE_STORE" />
    <RunPipeline OriginCode="LOAD_DAILY_USAGE_BY_QUARTERS_STORE" />
    <Flow Mode="PARALLEL">
      <RunPipeline OriginCode="LOAD_SNOWFLAKE_FLEET_MAINTENANCE_RAW" />
      <RunPipeline OriginCode="LOAD_SNOWFLAKE_FLEET_DAILY_USAGE_RAW" />
    </Flow>
  </Flow>
</XmartPipeline>
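Nested flows can also mix modes. In the following sketch, two independent sequential chains run in parallel with each other. The origin codes are hypothetical placeholders.
<XmartPipeline>
  <Flow Mode="PARALLEL">
    <!-- First chain: TRANSFORM_A waits for LOAD_A -->
    <Flow Mode="SEQUENTIAL">
      <RunPipeline OriginCode="LOAD_A" />
      <RunPipeline OriginCode="TRANSFORM_A" />
    </Flow>
    <!-- Second chain: runs alongside the first -->
    <Flow Mode="SEQUENTIAL">
      <RunPipeline OriginCode="LOAD_B" />
      <RunPipeline OriginCode="TRANSFORM_B" />
    </Flow>
  </Flow>
</XmartPipeline>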
Run a set of normal pipelines in parallel and when all are finished, call a web service if they all succeeded.
<XmartPipeline>
  <Flow Mode="SEQUENTIAL">
    <Flow Mode="PARALLEL">
      <RunPipeline OriginCode="ABC" />
      <RunPipeline OriginCode="DEF" />
      <RunPipeline OriginCode="GHI" />
      <RunPipeline OriginCode="JKL" />
    </Flow>
    <CallWebService Url="http://webservice/get" ExecuteOnBatchStatuses="SUCCESS" />
  </Flow>
</XmartPipeline>