The CDC Control Task is used for a number of different operations. This post lists each of those operations, and briefly describes when to use them.
Mark initial load start
This operation is used when executing an initial load from an active database without a snapshot. It is invoked at the beginning of an initial-load package to record the current LSN in the source database before the initial-load package starts reading the source tables. A walkthrough of how this process works can be found in my CDC in SSIS for SQL Server 2012 post.
Mark initial load end
This operation is used when executing an initial load from an active database without a snapshot. It is invoked at the end of an initial-load package to record the current LSN in the source database after the initial-load package has finished reading the source tables. This LSN is determined by recording the time at which the operation ran and then querying the cdc.lsn_time_mapping table in the CDC database for a change that occurred after that time.
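The time-to-LSN lookup described above can be sketched in a few lines of Python. This is only a toy model, not what the task actually executes: the in-memory `LSN_TIME_MAPPING` list stands in for cdc.lsn_time_mapping, and the function name `first_lsn_after` and the sample values are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical stand-in for cdc.lsn_time_mapping:
# (commit time, LSN) pairs, sorted by commit time.
LSN_TIME_MAPPING = [
    (datetime(2012, 1, 1, 10, 0, 0), 0x0A),
    (datetime(2012, 1, 1, 10, 5, 0), 0x0B),
    (datetime(2012, 1, 1, 10, 9, 0), 0x0C),
]


def first_lsn_after(mapping, marked_at):
    """Return the LSN of the first change committed strictly after
    marked_at, or None if no later change has been recorded yet."""
    times = [commit_time for commit_time, _ in mapping]
    index = bisect_right(times, marked_at)  # first entry with time > marked_at
    return mapping[index][1] if index < len(mapping) else None


# The initial load finished reading at 10:06, so the end LSN is the
# one attached to the next recorded change (10:09).
end_lsn = first_lsn_after(LSN_TIME_MAPPING, datetime(2012, 1, 1, 10, 6, 0))
```

Note that in this model the lookup returns nothing when no change has been recorded after the marked time. How the real task behaves in that situation is not covered here.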
Mark CDC start
This operation is used when the initial load is made from a snapshot database or from a quiescent database. It can be invoked at any point within the initial-load package. The operation accepts a parameter that can be a snapshot LSN or the name of a snapshot database (from which the snapshot LSN is derived automatically), or it can be left empty, in which case the current database LSN is used as the start LSN for the change processing package. This operation is used as an alternative to the Mark initial load start/end operations.
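The three ways of supplying the parameter amount to a small decision procedure, sketched below in Python. The function `resolve_start_lsn` and the `snapshot_lsns` lookup are hypothetical; the real task derives a snapshot's LSN from SQL Server itself, which this sketch does not attempt to reproduce.

```python
def resolve_start_lsn(param, snapshot_lsns, current_lsn):
    """Pick the start LSN for the change processing package.

    param         -- an int LSN, a snapshot database name, or None/"" (empty)
    snapshot_lsns -- hypothetical lookup: snapshot database name -> LSN
    current_lsn   -- the source database's current LSN
    """
    if param is None or param == "":
        # Parameter left empty: start from the current database LSN.
        return current_lsn
    if isinstance(param, int):
        # An explicit snapshot LSN was supplied.
        return param
    if param in snapshot_lsns:
        # A snapshot database name: derive its LSN.
        return snapshot_lsns[param]
    raise ValueError(f"unknown snapshot database: {param!r}")


# Example with made-up values.
start = resolve_start_lsn("Sales_snap", {"Sales_snap": 300}, current_lsn=500)
```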
Get processing range
This operation is used in a change processing package before invoking the data flow that contains the CDC Source. It establishes the range of LSNs that the CDC Source reads when invoked. The range is stored in an SSIS package variable (set through the StateVariable property) that the CDC Source uses during data flow processing.
Mark processed range
This operation is used in a change processing package at the end of a CDC run (after the CDC data flow is completed successfully) to record the last LSN that was fully processed in the CDC run. The next time Get processing range is used, this position determines the start of the next processing range.
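The hand-off between Get processing range and Mark processed range can be pictured with a toy model. The `CdcState` class and both function names are illustrative assumptions; the real state is an opaque string held in the package variable, and its format is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CdcState:
    """Toy stand-in for the persisted CDC state (not the real format)."""
    last_processed_lsn: int
    current_range: Optional[Tuple[int, int]] = None


def get_processing_range(state: CdcState, max_lsn: int) -> Tuple[int, int]:
    """Start the range at the last processed LSN and end it at max_lsn."""
    state.current_range = (state.last_processed_lsn, max_lsn)
    return state.current_range


def mark_processed_range(state: CdcState) -> None:
    """Record the end of the current range as fully processed."""
    if state.current_range is None:
        raise RuntimeError("Get processing range must run first")
    state.last_processed_lsn = state.current_range[1]
    state.current_range = None


state = CdcState(last_processed_lsn=100)

first = get_processing_range(state, max_lsn=150)   # run 1 reads 100..150
mark_processed_range(state)                        # data flow succeeded

second = get_processing_range(state, max_lsn=180)  # run 2 picks up at 150
```

If the data flow fails and Mark processed range never runs, the model leaves `last_processed_lsn` untouched, so the next run starts from the same position again.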
Reset CDC state
This operation is used to reset the persistent CDC state associated with the current CDC context. After this operation is run, the current maximum LSN (the value returned by the sys.fn_cdc_get_max_lsn function, which reads the LSN-timestamp mapping table) becomes the start of the next processing range. For example, use this operation when you want to process only newly created change records and ignore all old change records.
Notes
- The following operations open a connection to the source system:
  - Mark initial load start
  - Mark initial load end
  - Mark CDC start
Hi Matt, thanks for explaining this, your blog is becoming BOL for CDC in SSIS.
Question:
Do we receive errors when no inserts, updates or deletes occur against the source database between Mark initial load start and Mark initial load end? Or does it not hurt to call these two operations on a near-read-only database? I'm asking because I got weird index-out-of-bound errors while playing with CDC in SSIS, and this might explain the errors, though I should rerun the test to be sure.
I don't think so – I believe the operations are just setting LSNs, so I'm not sure why you'd end up with index-out-of-bound errors. Can you post the exact error message(s) you're seeing?
Hi! First, thanks for your blog and explanations.
I tested SSIS 2012 and I have a question about CDC:
Is it necessary to make two packages? One with Mark initial load start and Mark initial load end, in which I don't use the CDC Source, to load and initialize CDC_state, and a second with Get processing range and Mark processed range, in which I use the CDC Source and CDC Splitter.