Data Explorer is a great new add-on for Excel. It provides a simple interface over a power data transformation engine (powered by M), and it has become my tool of choice anytime I need to get data into Excel. In this post I show how you can use Data Explorer to do some initial data shaping, and then automate the data loading process through SSIS.
In a previous post, I showed how I used Data Explorer to grab the list of upcoming SQL Saturday events. When the Data Explorer query is executed, the data is downloaded, transformed, and stored in the Excel sheet.
We can read this data in SSIS using a Data Flow Task with an Excel Source component.
After we add the Excel Source, double click to bring up the component UI. We can create a new connection manager, and point it at the saved Excel file (.xlsx).
Note that even if you use Office 2010 or 2013 to create the file, you should select the “Microsoft Excel 2007” version (which was when the .xlsx format was introduced, replacing the .xls binary format).
On the Excel Source, we can now select “Table or view” as the data access mode. The drop down list will contain all of the sheets within our workbook. We can either select the name of the Sheet (in this case, Data), or the data query (Data$ExternalData_1). Since the result of the query is the only thing on this sheet, it will come out to the same thing in this instance.
On the Mapping tab, we can see the four columns from the workbook have been read.
We can click OK to save the settings, and then we can setup the rest of the data flow. Running the package should show us that all the rows have been read.
(Yes, that is a custom made Trash destination. I am pretty fancy.)
As we see, we can grab the data from the workbook using SSIS. The problem is that this data is a static copy – reading the data doesn’t automatically re-run the Data Explorer query that we used in the workbook. Thankfully, we can workaround that using a Script Task – see my previous post about dynamically refreshing a workbook. Go ahead and ready that post real quick – I’ll wait.
…
The refresh code in my previous post updates the excel file in place. If we watch the Task Manager, we can see the EXCEL.EXE process runs in the background.
Important – if you run the package with the refresh code added and receive a COMException – {“Class not registered”} error, you’re probably missing the Data Explorer add-in. Data Explorer uses a custom OLEDB provider internally, which requires you to install the add-in on the machine that is running your SSIS package.
If we open the file after the package runs, we should see the updated values from the refreshed query.
Closing Thoughts
Anyone familiar with SSIS will see some overlap between the transformations you can do in Data Explorer, and what you can do in the SSIS Data Flow. At this point Data Explorer is tool for getting data into Excel, and not what most people would consider a full fledged ETL platform. However, underneath the fancy, easy to use interface, is a powerful transformation engine (driven by the M language), and I can see a lot of potential applications for it. While the approach I describe above does get the products working together, I can imagine that closer integration between the products would be coming at some point in the future – but that’s pure speculation on my part.
What do you think? How would you like to see the products integrated?
Great post. Your earlier post got me thinking about using that approach refreshing Data explorer sheets.
That's actually how I ran into it. When I started putting together this blog post, I figured I needed some way to do the automated refresh. I decided to split it up into separate posts since it's useful on its own.
Great post. I can see great potential in data explorer it's a powerful addition to the toolset. Maybe a data explorer source adapter will appear in a future release of ssis.
I've been playing with Data Explorer a lot lately and I am very excited by what the "M" language could possibly bring to SSIS. There are some transformations that are made so easy with M that are still a bit of a pain in SSIS. I would very much like to see a convergence of SSIS and some of the Data Explorer features
For me the main advantage is Data Explorer can easily address some sources, such as AD, Facebook, Hive, … which are hard to get in SSIS.