Anyone who has ever had to import from Excel files programatically has probably experienced some headaches with Excel thinking it knows how to determine data types better than you do. This problem is especially prevelent when it comes to mixed types. Excel will ALWAYS prefer numeric types over any other data type regardless of what you try to do about it.
Here are a couple of examples of where this problem can be seen and what Excel does in these cases.
- You have a column in Excel that contains numeric and alphanumeric values, but the majority of the first 8 rows (by default) are numeric.
- Excel determines the column is numeric. It imports the numeric values, but imports nulls for non-numeric values.
- Microsoft talks about this in KB194124.
- You have a column with an equal number of numeric and alphanumeric values in it.
- Excel always favors number data types, so it chooses numeric and non-numeric values will import as nulls.
- You have numeric values, but they’re large numbers.
- Excel determines the column is numeric, but decides that the values are too long, so they must be displaid in scientific format (exponential numbers).
The problem is that Microsoft Excel tries to determine what data types are in each column by analyzing the data instead of using the data type you specify from the Number Category drop down menu. Once it determines this data type, anything that falls outside of it is imported as a NULL value. This whole issue actually comes from the Indexed Sequential Access Method (ISAM) driver that Excel uses to import and export data.
The ISAM driver analyzes the first 8 rows (by default) of each column being imported. It them determines, based on some of the rules I showed above, what data type the data is. The number of rows it analyzes is based on the value in this registry key.
By modifying this value to a higher number, you have a better chance of encompasing more non-numeric values and thus better hopes of a non-numeric data type being chosen.
Use IMEX Switch
In your normal connection information used to connect with the Excel file, there are some values that denote what you’re connecting to and various options. This would look something like this.
Set Db = OpenDatabase(“C:\File.xls”, False, True, “Excel 8.0; HDR=NO;”)
Adding the IMEX=1 option to these tells the ISAM driver to ignore data type and simply import all values as strings (text). You would do modify the previous connection information to look like this.
Set Db = OpenDatabase(“C:\File.xls”, False, True, “Excel 8.0; HDR=NO; IMEX=1;”)
However, there is a down side. As noted in MS KB Article 194124, numeric data types are ALL POWERFUL, and will completely ignore this option. Sorry about that. I just have to ask myself though, what were they thinking by creating an option that would ignore data types and import all values as text and then say that there are data types that will ignore this attempt to ignore data types? Anyway, it is what it is.
Using Tick Marks
So far this is the only method I have found to truely make Excel ignore the numeric values inside of cells and simply import what is there. It takes some coding after import, but I would much prefer that over importing half my data as nulls.
To use this method, you simply take the column which contains mixed data types (i.e. numeric & alphanumeric) and pre-pend a tick mark (‘) to the beginning of each value. This can be done in bulk by creating another column to the right of the one you want to insert the tick marks for, then paste the following formula.
=”‘” & CELLWITHVALUETOPREPENDTO
Then paste the contents of the cell you just put the formula into down to the end of the column. After that copy that entire column and paste values back to the original column. Then delete the column you just used as a concatenator.
After all of this, what it comes down to is that neither I nor anyone I’ve run across online has figured out a way around this incredibly painful problem. There are countless posts online about it and no solutions. So, the best you can hope for is to use one of these work arounds to limit the problems and code out the rest.
I have the users pre-pend a tick mark to all mixed data type columns, then upload the Excel file and then code out the tick marks, convert data types, etc. inside of the SQL stored procedure that imports this information into the database.