Much to my surprise, Mac’s are horribly deficient at editing a CSV file that has UTF-8 characters in it.
Someone sent me a CSV file, and I needed to make some edits to one of the columns that included adding a bunch of characters with diacritics (ä or ú). I assumed this would be easy, since back in the Windows world I had edited lots of CSV files using Windows Excel.
I naively assumed I would be able to do the same on Microsoft Excel for Mac. As it turns out, you can’t even get off the ground this way. At least in version 2011, Microsoft Excel for Mac cannot open a CSV file with UTF-8 characters. The file will open, but any non-ascii characters will have been totally butchered. To make matters worse, if you put UTF-8 characters into an excel file and try to save it as CSV, it will generate garbage. It just won’t write the proper UTF-8 encoding into the file.
I was initially convinced that there had to be an option somewhere, but there isn’t. UTF-8 in a CSV file is just not supported in Excel on a Mac.
I found some sources that indicated that Numbers for Mac would solve this, so I spent the $20 to download it from the Mac App Store. Well, I found good news and bad news. The good news is that Numbers handles the UTF-8 encoding just fine. The bad news is that their implementation of reading CSV files in Numbers is buggy.
CSV files use commas to separate the columns. This works great until you have a cell that needs to have a comma inside the cell value. In order to prevent the program processing the file from interpreting the comma as a delimiter, you wrap the value in double quotes (“) so that the comma is treated as part of the value. Okay, so what do you do then if you need to have a double-quote in the text? You place two double-quotes (“”) one after the other, and this will be treated as a single double-quote in the text.
Unfortunately, Numbers seems to have some bugs in this when you mix the two together. You take a very simple CSV file with two rows and two columns, like this one:
value1,”I say that “”numbers””, however, sees many cells”
And Numbers will interpret it as:
Not very helpful. The escaped double-quotes are interfering with the processing, causing the internal commas to be mis-interpreted as delimiters.
I started looking around for other editors. I briefly looked at a CSV editor called XTabulator, but found that its editing powers were really quite limited. Most critically, it had no ability to do a find and replace, and it could not copy or paste columns from other programs like Excel.
Eventually, I took a look at LibreOffice. The spreadsheet tool can correctly read files in UTF-8, and it has all of the search and replace functions I need. It appears to be modeled on the Windows version of Microsoft Excel.
I did run into a small hitch when trying to save the file, however. It handled the UTF-8 encoding without a problem at all, but for some reason it saved it in tab-delimited format instead of CSV. However, when I checked the “Edit Filter Settings” button at the bottom of the “Save As…” dialog, it gave me options for the encoding, field delimiter, and text delimiter:
I left the options on their default settings, and that did the trick.
Who knew editing a UTF-8 CSV file on a Mac would be such a pain? Thanks, LibreOffice!