Solved: Editing a CSV with UTF-8 encoding on a Mac

Much to my surprise, Mac’s are horribly deficient at editing a CSV file that has UTF-8 characters in it.

Someone sent me a CSV file, and I needed to make some edits to one of the columns that included adding a bunch of characters with diacritics (ä or ú). I assumed this would be easy, since back in the Windows world I had edited lots of CSV files using Windows Excel.

I naively assumed I would be able to do the same on Microsoft Excel for Mac.  As it turns out, you can’t even get off the ground this way.  At least in version 2011, Microsoft Excel for Mac cannot open a CSV file with UTF-8 characters. The file will open, but any non-ascii characters will have been totally butchered.  To make matters worse, if you put UTF-8 characters into an excel file and try to save it as CSV, it will generate garbage.  It just won’t write the proper UTF-8 encoding into the file.

I was initially convinced that there had to be an option somewhere, but there isn’t. UTF-8 in a CSV file is just not supported in Excel on a Mac.

I found some sources that indicated that Numbers for Mac would solve this, so I spent the $20 to download it from the Mac App Store. Well, I found good news and bad news. The good news is that Numbers handles the UTF-8 encoding just fine. The bad news is that their implementation of reading CSV files in Numbers is buggy.

CSV files use commas to separate the columns. This works great until you have a cell that needs to have a comma inside the cell value. In order to prevent the program processing the file from interpreting the comma as a delimiter, you wrap the value in double quotes (“) so that the comma is treated as part of the value.  Okay, so what do you do then if you need to have a double-quote in the text? You place two double-quotes (“”) one after the other, and this will be treated as a single double-quote in the text.

Unfortunately, Numbers seems to have some bugs in this when you mix the two together.  You take a very simple CSV file with two rows and two columns, like this one:

Column1,Column2
value1,”I say that “”numbers””, however, sees many cells”

And Numbers will interpret it as:

The CSV processor on Numbers is very buggy

The CSV processor on Numbers is very buggy

Not very helpful. The escaped double-quotes are interfering with the processing, causing the internal commas to be mis-interpreted as delimiters.

I started looking around for other editors.  I briefly looked at a CSV editor called XTabulator, but found that its editing powers were really quite limited. Most critically, it had no ability to do a find and replace, and it could not copy or paste columns from other programs like Excel.

Eventually, I took a look at LibreOffice.  The spreadsheet tool can correctly read files in UTF-8, and it has all of the search and replace functions I need.  It appears to be modeled on the Windows version of Microsoft Excel.

I did run into a small hitch when trying to save the file, however. It handled the UTF-8 encoding without a problem at all, but for some reason it saved it in tab-delimited format instead of CSV. However, when I checked the “Edit Filter Settings” button at the bottom of the “Save As…” dialog, it gave me options for the encoding, field delimiter, and text delimiter:

Using the "Edit filter settings" option brings up an extra dialog to specify the proper CSV formatting in LibreOffice

Using the “Edit filter settings” option brings up an extra dialog to specify the proper CSV formatting in LibreOffice

I left the options on their default settings, and that did the trick.

Who knew editing a UTF-8 CSV file on a Mac would be such a pain? Thanks, LibreOffice!

This entry was posted in Uncategorized and tagged , , , , . Bookmark the permalink.

20 Responses to Solved: Editing a CSV with UTF-8 encoding on a Mac

  1. Hans says:

    Thanks! Very useful

  2. Alessandro says:

    Hi Jerem, I use a combination of Libre(/open)Office and the megapowerful OpenRefine: you can use only the second one but you have to learn a bit of the GREL language (Google Refine Epression Language – This was a project supported by Google). Once you understand it you will not go back anymore!
    As everything it has a Con: it’s a platform created for elaborating huge quantities of data, but you will not have the “eye-support”, cause he can preview at least 50 row a time.

  3. rupertdodge says:

    I have a similar situation, found this and it worked a charm
    iconv -f UTF-8 -t WINDOWS-1252 /file/path/filename.csv > /file/path/newfilename.csv

  4. Hakan says:

    I had a similar problem and solved my problem using Sublime Text. I opened the file and saved it with “Save with encoding / UTF-8″…and it all worked fine for me…
    I hope it helps others with similar issue

  5. NeoTeo says:

    All you have to do is load the file up from the Terminal in vim (standard on all OS X), set the mode to utf8 like so:

    :set fileencoding=utf8

    and then write out the file, like so:

    :w myfilename

    • All that does is allow manual edits to the file – lots of text editors can do that. What I needed here was the ability to really manipulate the data as a spreadsheet and save changes (e.g. insert or remove columns, sort data, replace values within a column etc). A text editor cannot do these things to a CSV file.

  6. poonsak says:

    thank you sir, it work properly for my users 🙂

  7. cynthia says:

    i had a simillar problem and take me a few days to search information about csv and i found this.
    this information really help me a lot.

    thanks

  8. Gazzer says:

    The issue isn’t really the Mac. BBEdit, SublimeText, TextWrangler has no problem with these files. Basically, the issue is Microsoft not getting its act together with Excel.

  9. Vilgaile says:

    This is great – thank you very much! I was struggling with this for quite some time for Czech Republic special characters, and using the LibreOffice solved it!

  10. Yord Lertsak says:

    Hi Jeremy . What do you mean by “I left the options, and that did the trick” ?

  11. Markus says:

    Running into similar issues with an xls file encoded in UTF-8.
    Even Libre office is giving me the cold shoulder there.

  12. Just used LibreOffice and worked perfectly! Thanks!

  13. Bobbie Chan says:

    jrothmanshore , you have no idea how many people’s live you have saved here. I am grateful that I found you!!! Two thumbs up!

  14. Pavla says:

    THANK YOU!!! Have been trying to solve this for ages, this finally helped!

  15. Craig says:

    In Mac Excel 2011, did you try using the Text Import Wizard available from the File => Import menu? Choose CSV file as the file type, choose your CSV file in the file browser, and on Step 1 of the Text Import Wizard which follows, about midway down the dialog, it says “File origin”. Choose “Unicode 7.0 UTF-8” in the dropdown and continue with the wizard.

  16. Oleg says:

    Thank you ! This is very helpful indeed.

  17. Ricky says:

    Google Sheets also seems to be able to cope fine with UTF-8, without even having to specify the encoding.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s