I often find myself copying and pasting text from various sources into Excel in order to use its incredible text manipulation abilities to create hundreds of lines of code in mere seconds. However, the text I copy generally has a lot of unneeded information.
A great example of this is copying and pasting a list of countries and their ISO-3166-1 country codes from Wikipedia. The table contains images for the sort buttons, images for the country flags, links to the individual pages for each of the countries listed as well as links to each country code page. This is a lot of excess information that I would really like to get rid of so I can have a clean list of data.
This is what we want to accomplish with this script.
- Remove the images for the sort buttons.
- Remove the images for the country flags.
- Remove the links for each country name.
- Remove the links for each country code.
- Clear all formatting.
To prepare for this, I am going to simply copy the entire table listed in the Current Codes section of the ISO-3166-1 country codes page listed above into Excel. This creates a spreadsheet with country name, alpha-2, alpha-3, numeric and ISO-3166-2 codes. However, as you can tell, there are images, links and different formats all over the place.
You will need to enter the script in the Visual Basic Editor that is built into Excel. You can access this by pressing ALT+F11. This will open a window that at least some have never seen. Don’t worry, this is actually very simple. The tree view on the left shows the entire project (“VBAProject Book1”), then a folder called “Microsoft Excel Objects” that will list the objects (things) that are in this project. Each worksheet in this workbook are listed under that and as well as an object called ThisWorkbook, which represents the entire workbooks.
What we want to do is make this available to any page in the workbook, so we want to create a new kind of object called a Module. To do this, right-click on any item in the tree view and select Insert and then Module. This will create a new module called Module1 and open the code window for you. For what we want to do, just click in the white page on the right (under where it says “(General)” and “(Declarations)” and ignore everything else in this window.
Copy and paste the following code into the code window you have open for Module1.
' Delete all pictures.
' Delete all hyperlinks.
' Clear all cell's formats.
' Loop through each text cell and trim its contents.
For Each Cell In ActiveSheet.UsedRange
If IsNumeric(Cell.Value) = False And IsDate(Cell.Value) = False Then
Cell.Value = Trim(Cell.Value)
I have commented on each task that takes place in the code above, but basically the code removes all images, removes all hyperlinks, then selects all cells and clears their formats and then loops through each cell in the worksheet and trims the spaces from the front and end of the cell. Of course, if you don’t want a particular task to take place, you can comment it out with an apostrophe character (‘) at the beginning of the line.
To run this new script, switch back to the Excel window and select the worksheet that you want to clean up. Then press ALT+F8 to bring up the macro window. Select the new CleanContents macro from the list and click the Run button. Depending on the amount of data you have and the power of your system, the script should be done in a few seconds or so. Bingo, clean data!
You will want to keep this script handy to paste in Excel files when you need to get them cleaned up without having to hand select and remove each images, hyperlink, etc. and clear all the contents. I’ve done it before and it takes forever! This script takes only a few seconds to paste into the Excel file and run. I hope this saves you as much time as it has saved me!
If you enjoy this, please take the time to comment on this post and let me know how it has helped you!