On some of the files, the header row started on row 2, 3, 4, or 5, and on some others, the first two or three columns were entirely blank. What a pain, right?
One saving advantage that I identified in this proprietary set of excel files: each of the datasets begins with a column headed "Identifier", so that is the cell that I want to end up in cell [A1]. That was definitely key to making this work and easily identifying when my header row began.
I automated cleaning up these spreadsheets with the following PowerShell script
The two while loops are where the customization was for my particular set of problematic Excel spreadsheets, but the general structure of the code is where you can change it up for yourself.Param([string]$wbname) Try { $err1 = 0; #testing only #$wbname = "E:\foo.xls" $xl = New-Object -comobject Excel.Application $xl.DisplayAlerts = $False $wb = $xl.Workbooks.open($wbname) $sheet = $wb.Sheets.Item(1) #delete columns while( ($sheet.Cells.Item(1,1).Text -eq "") -and ($sheet.Cells.Item(2,1).Text -eq "") -and ($sheet.Cells.Item(3,1).Text -eq "")){[void]$sheet.Cells.Item(1,1).EntireColumn.Delete()} #delete rows while( ($sheet.Cells.Item(1,1).Value() -NotLike "Identifier") -and ($sheet.Cells.Item(1,2).Value() -NotLike "Identifier")){[void]$sheet.Cells.Item(1,1).EntireRow.Delete()} #cleanup $wb.Close($true) $xl.Quit() [System.Runtime.Interopservices.Marshal]::ReleaseComObject($xl) Remove-Variable xl } Catch { $err1 = 1; } Finally { if($err1 -eq 1) { #Write-Output "99"; #Write-Output $_.Exception.Message; $stream = [System.IO.StreamWriter] "e:\error.txt" $stream.WriteLine($Error[0].InvocationInfo.PositionMessage) $stream.WriteLine($wbname) $stream.close() [Environment]::Exit("99"); } Else { #Write-Output "0"; [Environment]::Exit("0"); } }
The last four lines are cleanup - while the last might not be very important, the second-to-last line is pretty important to make sure the stick COM object goes away. Explained much better here: http://technet.microsoft.com/en-us/library/ff730962.aspx
In order to put this in sequence in my SSIS package, I used the Execute Process Task. Screenshots below:
Unfortunately the client server I was working for here only had PowerShell 1.0 installed, but from what I understand, this should apply to PowerShell 2.0 and 3.0. I'm open to anyone who has any insight there.
(click to enlarge the below images)
The argument passed to the PowerShell file includes the full path to the Excel file. (I removed the paths from this actual client implementation.)
I'm open to feedback on this, wonder if anyone else has encountered a similar issue. Obviously, the best solution would have been to modify the process that generates these maddeningly inconsistent excel files.
No comments:
Post a Comment