I put this together for an SSIS performance presentation I did a while ago, and thought it warranted its own post.
In addition to looking for appropriate Design Patterns, I have a checklist of things to look for when trying to optimize an SSIS Data Flow. This list is by no means extensive, but it contains some of the more common causes of performance issues in the packages I’ve worked with.
- Can you remove the OLE DB Command transform?
OLE DB command operates on a row by row basis. Stage the data and perform the operation in a batch mode using the Execute SQL Task instead.
- Can you replace Upsert logic with a MERGE statement?
Learn MERGE, and love it.
- Do you need those Sort transforms in there?
Sometimes you do. Most of the time you don’t. If you do need them (i.e. you are loading from a flat file, and need to do a merge), considering maxing out the DefaultBufferMaxRows setting so you can sort more data in memory (sort will keep at most 5 active buffers in memory before it starts swapping to disk).
- Lookup Transforms
- Can the lookup be avoided?
- Are you using the right cache mode?
- Should you use a Cache Connection Manager?
Lots of related lookup posts can be found here.
Have fun optimizing!
I have a Turbo Dataflow solution for loading a large .csv file (>= 10 GB).
FlatFile Source -> Derived Column Transform -> Script Component Transform -> multiple OLE DB Destinations (will provide detail in another post)