One very useful Control Flow feature in SSIS 2008 is the ‘Conditional Split’. Sometimes new SSIS users can have difficulties in finding out detail about the expressions used in the Conditional Split. The purpose of this blog is to provide some advice about the Conditional Split: its use and how it works, when to use it, and some examples of the expression language. For people who are new (and not-so-new!) to SSIS it can sometimes be difficult to decide which component to use, and it isn’t always immediately clear why the Conditional Split component is useful.
How does the Conditional Split transform work? Briefly, as a Data Flow component, the Conditional Split is a decision maker. The component tests each row, and then decides what its destination should be. In other words, the ‘Conditional Split’ tests each individual inbound row of data against a series of different conditions, which have been defined by the package creator. Then, the Conditional Split component funnels each row of the data up into different outputs, dependent on the result of the test against each condition.
In order to understand its purpose a bit better, here are some examples where it might be useful:
- Directing ‘bad’ rows towards a separate place for further examination. A recent example is where timesheet data has been imported from one source to a different destination. The source system permitted timesheets to be submitted without customer name, project name, or submission date; in contrast, the target system regards these timesheets as incorrect. In order to ensure that the import from source to target went as smoothly as possible, the records were filtered for NULLS in the client, project or submission date fields and directed to a separate location for review. I could then take these output files, and ask the customer for default values for these fields, for inserting at a separate time.
- Differentiating records between ‘new’ and ‘update’ records. ‘New’ records can be inserted into the appropriate place; ‘update’ records can have fields updated in the existing data, rather than inserted as new information. To divide data rows into ‘new’ rows and ‘update’ rows for differentiating between ‘new’ information and information that might need to be updated. For example, if you have a ‘new’ customer, then you may want to treat that differently from situations where an existing customer simply needs an updated profile. This is called ‘Slowly Changing Dimensions’ (SCD).
When should the Conditional Split component be used, instead of the Slowly Changing Dimension component? This is an important point, and it depends on the quality of the data. Note: the SCD component does not handle NULLs. It is possible to use the SSIS 2008 SCD to handle data with NULLs in it, so, in this case, the Conditional Split component should be used.
Prior to creating the package, it is possible to test the data for NULLs by using the Data Profiling Feature in SSIS, and it is recommended here that this investigation is done prior to writing the package. If the data does contain NULLS, then it is recommended to use the Conditional Split component. Choosing the correct and most efficient component for a particular purpose is half the battle; in the words of Mary Poppins, ‘well begun is half done’!
In order to set up the conditions, it is necessary to use the SSIS Expressions language, which applies to the various transformations within SSIS. In the Conditional Split transformation, it is only possible to use expressions that produce a TRUE or FALSE result. So, for example, the following expressions would be legal:
Implicitly, this means that the Conditional Split transform can have more than one output destination. Due to this, it is important to consider what the ‘default’ situation will be, and where the ‘default’ rows should be directed. Here is an example of the Conditional Split transformation, using the ISNULL expression as an example:-
To conclude, here are some valuable tips in using the Conditional Split transformation:
If the Conditional Split component isn’t working properly, then it is possible to ‘daisy chain’ them in order to clarify the output. The ‘daisy chain’ method can help since it is possible to use Data Viewers in order to actually see what data is going through each flow. This mechanism can illuminate the decision behind the decisions made by the Conditional Split.
If the expressions do not result in TRUE or FALSE outcomes, then the Conditional Split component will not work properly. Examples of expressions in this category include increment statements, e.g. @Counter = 1, and other valid expressions which do not return TRUE or FALSE, such as GETDATE().