Answered: What is this Word non-printable character that looks like a tilde (~)?

We came across a strange bug today when exporting some data from a database into Excel. When we viewed the text through our web interface, it showed as the equivalent of “The quick brown fox jumped over the lazy dog.”  However, when we exported the data to excel, it showed up as “The quick brown foxÂjumped over the lazy dog.”

What was that funny character?  It wasn’t visible in the text field in our web interface.

We traced the source of the data back to that most dreaded text formatting tool – Microsoft Word.  Word always causes me headaches with its constant need to fix everything, turning my normal quotes (“) into curly quotes (“”) or hypens (-) into m-dashes (—), so I immediately suspected it was the source of the problem.

I turned on “Show all printing characters”, and sure enough, there was a strange mark at that spot:

Show all nonprinting characters shows spaces, tabs, carriage returns, etc.

Show all nonprinting characters shows spaces, tabs, carriage returns, etc.

But what was it?  I am used to the small dot to indicate a space, but this looked more like a tilde character:

A space is normally represented as just a dot, but this is something different

A space is normally represented as just a dot, but this is something different

I had never seen it before.  Without the non-printing characters on the screen, it looked just like a space.  But clearly Word understood it to be something different.

I checked a variety of lists of non-printable characters Word uses, but it never seemed to appear. I then googled the “” and came across a Stack Overflow discussion amongst some engineers about what it was.

It turns out that is a non-breaking space.  Web developers are very familiar with it as the “ ” string, which is used to force the web browser to render a space. In HTML, all spaces are considering non-essential, so if you have five in a row, the browser will just render a single space.  You use a non-breaking space when you want several spaces to really be rendered.

Microsoft Word supports them, but for their original purpose – non-breaking spaces. The idea of a non-breaking space is that it separates two words but does not allow the text layout engine to choose it as a place for a line break.  For example, if you wanted to make sure that “New York” was treated as a single phrase and did not wrap text in the middle of it (with one line ending on “New” and the next one starting on “York”), you could put a non-breaking space in there to stop it.

It seems that somehow a non-breaking space was introduced into our document.  It doesn’t appear to have been Word making changes automatically like it does with quotation marks , since it is only at this particular place.  How did they get in there?

Then, it dawned on me.  Perhaps the problem was before Word.  Where had I seen non-breaking spaces before? They are very common in HTML. As an experiment, I created a simple web page that had a single line of text with an “ ” at one spot instead of a space.  I then opened it in a web browser, copied the text, and pasted it into Word.  I then turned on “Show all nonprinting characters”, and there was the squiggle.  The non-breaking space had been copied into Word.

So, this bad character had been passed through generations of systems from an HTML page to Word to our web application to our database to the generated report to excel. It was like a genetic defect, passed down hidden through generations until it finally manifested.

So, the moral of the story: be very careful when copying from HTML pages and pasting them into Word.

This entry was posted in Uncategorized and tagged , , , , , . Bookmark the permalink.

7 Responses to Answered: What is this Word non-printable character that looks like a tilde (~)?

  1. Robert Rothman says:

    Liked the Genetics analogy!!!

    8080B

    On Sep 4, 2013, at 4:29 PM, Jeremy Rothman-Shore Blog wrote:

    >

  2. Amy Brown says:

    This is such great timing! I’m working on a Markdown text file where for some reason the person who created the file put two spaces after each period, the second one (pointlessly) being a non-breaking space. I couldn’t figure out how to get my text editor (vi) to tell me what kind of character it was. (I knew it wasn’t a second space because searching for two spaces didn’t find anything.) Finally I copied and pasted into Word, which showed the same character you found. A quick Google on “space tilde word” brought me here. I love the internet!

    Now to find the guy who put “space non-breaking-space” after each period and confiscate his keyboard. “No more typing for you!”

  3. blindy says:

    Huge help. Ripped large table of data from webpage and pasted into MS Excel. There was an &nbsp after each number, and Excel would only understand them as text. Used Replace, entered the character for non-breaking space U+00A0, replaced with ‘nothing’ and wiped out >30000 &nbsp’s, now Excel understands them as numbers.

    Thank you!

  4. KC says:

    Just ran into this problem today…..Unbelievably, in another example of Microsoft’s sometimes incredible stupidity: The CLEAN text function in Excel will remove non-printable characters….EXCEPT for the non-breaking space / &nbsp . Microsoft’s and others “tutorials” for this problem suggest similar ways of combining of the SUBSTITUTE, CHAR and TRIM functions to accomplish this.

    Don’t even try – it’s ludicrously cumbersome. As blindy helpfully noted, a find/replace is so much easier. NOTE: If you forget the U+00A0 code, you can Select and Copy one of the &nbsp’s, then Paste it into the find area. It will be as invisible as the space you type in to replace it, but it works beautifully.

    However, the problem shouldn’t be happening in the first place, and wouldn’t if Excel’s import was intelligent enough to include some actually USEFUL options, such as being able to substitute standard for non-breaking or non-printing characters during import…or if the damned CLEAN function did what it SHOULD do, and removed ALL non-printing characters. After all, it’s a not a page layout program….why the hell does it allow that crap in a spreadsheet in the first place?

  5. KC says:

    PS, I forgot to say THANK YOU!!! I had no idea what the “tildedot” was until I read this. It’s what enabled me so quickly to find out WHY nothing was working right. (I.e., even though Excel’s SUBSTITUTE function recognizes the entire Unicode character set, Microsoft is incapable of updating TRIM or CLEAN to do the same, or for that matter, to recognize anything other than the first 32 characters of the 7-bit ASCII set they were “designed” to use…literally, last century.)

    Sorry… all that sounds a bit as if I’m thanking you for helping to raise my blood pressure and lengthen my ever-growing list of, “Why I Hate Microsoft.”

    The thank you is because your post stopped me from flinging my laptop across the room out of sheer frustration. For that, and for the fact I will now finish my project and actually get paid, I am very, very grateful 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s