Unicode blues in C++ and similar languages (after copying from web pages, Skype chat, etc.): “error: stray \342”

The short version: Compiler errors of the type “error: stray \342” are not mysterious at all. They can be analysed directly, without any guesswork. The errors come in doublets or triplets, starting with 342 (octal) or 302 (also octal). The numbers are converted to hexadecimal and looked up in Unicode code point tables, and a regular expression is then written for searching for (and replacing) the offending characters directly in any modern text editor, including invisible ones like ZERO WIDTH SPACE. Thus neither retyping the code nor guessing by visual inspection is necessary (some characters cannot be distinguished visually, and some are literally invisible). It also scales: no matter how large the file is, the culprits are easily found en masse (the compiler also provides the information, but the output may be overwhelming, and it is much less direct, as the doublets or triplets must be interpreted manually).

Also, in this blog post, as a shortcut, the most common ones encountered in the wild have been mapped, so it is not necessary to analyse the error numbers and/or use hex dumps (a straightforward, but tedious process). They can be detected by using the following regular expression in any modern text editor or IDE:

\x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00CD}|\x{00E4}|\x{037E}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD}

That is for NO-BREAK SPACE, BROKEN BAR, LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, REGISTERED SIGN, RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, LATIN CAPITAL LETTER I WITH ACUTE, LATIN SMALL LETTER A WITH DIAERESIS, GREEK QUESTION MARK, EM SPACE, THIN SPACE, ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, EN DASH, EM DASH, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING, WORD JOINER, DOWNWARDS ARROW WITH CORNER LEFTWARDS, NON-BREAKING HYPHEN, MINUS SIGN, ASTERISK OPERATOR, NOT EQUAL TO, ZERO WIDTH NO-BREAK SPACE, FULLWIDTH COLON, OBJECT REPLACEMENT CHARACTER, REPLACEMENT CHARACTER, respectively (33).

Note: The regular expression notation is different in Visual Studio Code (and probably others):

\u00A0|\u00A6|\u00AB|\u00AE|\u00BB|\u00CD|\u00E4|\u037E|\u2003|\u2009|\u200B|\u200C|\u2013|\u2014|\u2018|\u2019|\u201C|\u201D|\u2028|\u2029|\u202A|\u202B|\u202C|\u2060|\u21B5|\u2011|\u2212|\u2217|\u2260|\uFEFF|\uFF1A|\uFFFC|\uFFFD

This also works when the stray error numbers are not available or incomplete (though the full source should be available).
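The two notations differ only in how each code point is escaped, so both expressions can be generated from one list of code points. A minimal Python sketch (the names CODE_POINTS, pcre_style, and js_style are made up for this illustration, and the list is deliberately abbreviated):

```python
import re

# A few of the code points from the expressions above (abbreviated on purpose).
CODE_POINTS = [0x00A0, 0x200B, 0x2013, 0x2018, 0x2019, 0x201C, 0x201D, 0xFEFF]


def pcre_style(code_points):
    """Build the \\x{XXXX} form."""
    return "|".join(f"\\x{{{cp:04X}}}" for cp in code_points)


def js_style(code_points):
    """Build the \\uXXXX form (as used in Visual Studio Code)."""
    return "|".join(f"\\u{cp:04X}" for cp in code_points)


print(pcre_style(CODE_POINTS))
print(js_style(CODE_POINTS))

# Python's own "re" module accepts the \uXXXX form as well,
# so the generated expression can be tried out directly.
pattern = re.compile(js_style(CODE_POINTS))
print(pattern.findall("int\u00a0x = 1;\u200b"))
```

The four-digit \uXXXX form only covers code points up to U+FFFF, which is sufficient for every character listed in this post.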

Introduction

Compilation of innocent-looking C++ (or C) source code may result in errors like:

someFile.c:42: error: stray ‘\302’ in program
someFile.c:42: error: stray ‘\244’ in program
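The numbers in such messages are the byte values of one UTF-8 sequence, so they can be decoded mechanically. A small Python sketch of the conversion (octal to hexadecimal to code point); the function name identify is hypothetical, and it assumes the numbers are octal and together form exactly one character:

```python
import unicodedata


def identify(octal_values):
    """Turn stray numbers such as ["302", "244"] into hex bytes, code point, name."""
    raw = bytes(int(value, 8) for value in octal_values)  # octal -> byte values
    hex_form = " ".join(f"0x{byte:02X}" for byte in raw)
    char = raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    # ord() requires that the bytes decoded to exactly one character.
    return hex_form, f"U+{ord(char):04X}", unicodedata.name(char, "<unnamed>")


print(identify(["302", "244"]))
print(identify(["342", "200", "213"]))
```

For the two error lines above, this reports 0xC2 0xA4, U+00A4 (CURRENCY SIGN).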

Analysis

Hex dump

Here, the Linux command-line tool ‘hexdump’ is used, but any hex dump tool will do.

“80” (option -n) is the number of bytes to dump.

0x60, decimal 96 (option -s), is the offset into the file (note that if the offset is past the end of the file, the output will be empty). Set it to 0x0 for the beginning of the file.

clear ; hexdump -s 0x60 -n 80 -e '"%08.8_ax  " 8/1 "%02X " "  " 8/1 "%02X " "  |"'    -e '16/1 "%_p""|\n"'  '/home/mortensen/temp2/2023-04-18/Strange.txt'
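Where hexdump is not available, roughly the same view can be produced with a few lines of Python. This is only a sketch: the function name hex_dump and the exact column layout are choices made here, not a reproduction of the hexdump format string above. The defaults mirror the offset (0x60) and length (80) used in the command.

```python
import sys


def hex_dump(path, offset=0x60, length=80):
    """Return lines of 'offset  up to 16 hex bytes  |printable|' for part of a file."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read(length)  # empty if the offset is past the end
    lines = []
    for start in range(0, len(data), 16):
        chunk = data[start:start + 16]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset + start:08x}  {hex_part:<47}  |{text}|")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    for line in hex_dump(sys.argv[1]):
        print(line)
```

Non-ASCII bytes show up as dots in the right-hand column, which makes the position of a sequence like C2 A0 easy to spot.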

Positively identifying the strange characters in a text editor or IDE

This can be done by using regular expressions in any modern text editor or IDE (but not, for example, in the Arduino IDE).

This is particularly important for longer documents and source code.

Note: in Visual Studio Code (and probably others) the notation is different: \u00A0 (instead of \x{00A0})

Combined regular expression

\x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00E4}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD}

NO-BREAK SPACE

\x{00A0}

Table of common Unicode characters causing this problem (actually encountered in the wild)

This table can be used to quickly identify the offending Unicode character from the “error: stray” compiler errors. The third number in a triplet is the most specific. For example, “230” for U+2018 (LEFT SINGLE QUOTATION MARK).

Note that to search for hexadecimal UTF-8 sequences, each number should be preceded by “0x” to directly search in the content of the table. Example: A hexadecimal dump may have “E2 80 9C”. Use “0xE2 0x80 0x9C” to search in the table (with a single space separating the numbers).

Oct  Dec  Hex   Start of    Start of sequence,                   Comment fragment
                seq, type   Unicode code point
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
240  160  0xA0  CE/CP-1250  U+00A0  NO-BREAK SPACE

302  194  0xC2  UTF-8       U+00A0  NO-BREAK SPACE               194 160 (decimal) → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=156&number=128)).
240  160  0xA0                                                   302 240 (octal)   → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=156&number=128)).

                                               Alternative web site:
                                                                 194 160 (decimal) → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.charset.org/utf-8)).
                                                                 302 240 (octal)   → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.charset.org/utf-8)).

302  194  0xC2  UTF-8       U+00A6  BROKEN BAR                   194 166 (decimal) → 0xC2 0xA6 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A6 ([BROKEN BAR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
246  166  0xA6                                                   302 246 (octal)   → 0xC2 0xA6 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A6 ([BROKEN BAR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).

302  194  0xC2  UTF-8       U+00AB  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK  194 171 (decimal) → 0xC2 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+00AB ([LEFT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
253  171  0xAB                                                                 302 253 (octal)   → 0xC2 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+00AB ([LEFT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).

302  194  0xC2  UTF-8       U+00BB  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK  194 187 (decimal) → 0xC2 0xBB (hexadecimal) → UTF-8 sequence for Unicode code point U+00BB ([RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
273  187  0xBB                                                                  302 273 (octal)   → 0xC2 0xBB (hexadecimal) → UTF-8 sequence for Unicode code point U+00BB ([RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).

302  194  0xC2  UTF-8       U+00AE  REGISTERED SIGN                            194 174 (decimal) → 0xC2 0xAE (hexadecimal) → UTF-8 sequence for Unicode code point U+00AE ([REGISTERED SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
256  174  0xAE                                                                 302 256 (octal)   → 0xC2 0xAE (hexadecimal) → UTF-8 sequence for Unicode code point U+00AE ([REGISTERED SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).

========================================================================================================

Start of \342 \200 series:

342  226  0xE2  UTF-8       U+2003  EM SPACE                     226 128 131 (decimal) → 0xE2 0x80 0x83 (hexadecimal) → UTF-8 sequence for Unicode code point U+2003 ([EM SPACE](https://www.charset.org/utf-8/9)).
200  128  0x80                                                   342 200 203 (octal)   → 0xE2 0x80 0x83 (hexadecimal) → UTF-8 sequence for Unicode code point U+2003 ([EM SPACE](https://www.charset.org/utf-8/9)).
203  131  0x83

342  226  0xE2  UTF-8       U+2009  THIN SPACE                   226 128 137 (decimal) → 0xE2 0x80 0x89 (hexadecimal) → UTF-8 sequence for Unicode code point U+2009 ([THIN SPACE](https://www.charset.org/utf-8/9)).
200  128  0x80                                                   342 200 211 (octal)   → 0xE2 0x80 0x89 (hexadecimal) → UTF-8 sequence for Unicode code point U+2009 ([THIN SPACE](https://www.charset.org/utf-8/9)).
211  137  0x89

342  226  0xE2  UTF-8       U+200B  ZERO WIDTH SPACE             226 128 139 (decimal) → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200  128  0x80                                                   342 200 213 (octal)   → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
213  139  0x8B
                                               Alternative web site:
                                                                 226 128 139 (decimal) → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.charset.org/utf-8/9)).
                                                                 342 200 213 (octal)   → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.charset.org/utf-8/9)).

342  226  0xE2  UTF-8       U+200C  ZERO WIDTH NON-JOINER        226 128 140 (decimal) → 0xE2 0x80 0x8C (hexadecimal) → UTF-8 sequence for Unicode code point U+200C ([ZERO WIDTH NON-JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
200  128  0x80                                                   342 200 214 (octal)   → 0xE2 0x80 0x8C (hexadecimal) → UTF-8 sequence for Unicode code point U+200C ([ZERO WIDTH NON-JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
214  140  0x8C

226  150  0x96  CE/CP-1250  U+2013  EN DASH

342  226  0xE2  UTF-8       U+2013  EN DASH                      226 128 147 (decimal) → 0xE2 0x80 0x93 (hexadecimal) → UTF-8 sequence for Unicode code point U+2013 ([EN DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200  128  0x80                                                   342 200 223 (octal)   → 0xE2 0x80 0x93 (hexadecimal) → UTF-8 sequence for Unicode code point U+2013 ([EN DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
223  147  0x93

  Note: 223 can also be a stand-alone CE/CP-1250 byte, corresponding to U+201C (LEFT DOUBLE QUOTATION MARK).

342  226  0xE2  UTF-8       U+2014  EM DASH                      226 128 148 (decimal) → 0xE2 0x80 0x94 (hexadecimal) → UTF-8 sequence for Unicode code point U+2014 ([EM DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200  128  0x80                                                   342 200 224 (octal)   → 0xE2 0x80 0x94 (hexadecimal) → UTF-8 sequence for Unicode code point U+2014 ([EM DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
224  148  0x94

  Note: 224 can also be a stand-alone CE/CP-1250 byte, corresponding to U+201D (RIGHT DOUBLE QUOTATION MARK).

221  145  0x91  CE/CP-1250  U+2018  LEFT SINGLE QUOTATION MARK   [CE/CP-1250 0x91](https://en.wikipedia.org/wiki/Windows-1250#Character_set) (145 (decimal), 221 (octal)), corresponding to Unicode code point U+2018 ([LEFT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).

342  226  0xE2  UTF-8       U+2018  LEFT SINGLE QUOTATION MARK   226 128 152 (decimal) → 0xE2 0x80 0x98 (hexadecimal) → UTF-8 sequence for Unicode code point U+2018 ([LEFT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200  128  0x80                                                   342 200 230 (octal)   → 0xE2 0x80 0x98 (hexadecimal) → UTF-8 sequence for Unicode code point U+2018 ([LEFT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
230  152  0x98

222  146  0x92  CE/CP-1250  U+2019  RIGHT SINGLE QUOTATION MARK  [CE/CP-1250 0x92](https://en.wikipedia.org/wiki/Windows-1250#Character_set) (146 (decimal), 222 (octal)), corresponding to Unicode code point U+2019 ([RIGHT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).

342  226  0xE2  UTF-8       U+2019  RIGHT SINGLE QUOTATION MARK  226 128 153 (decimal) → 0xE2 0x80 0x99 (hexadecimal) → UTF-8 sequence for Unicode code point U+2019 ([RIGHT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).
200  128  0x80                                                   342 200 231 (octal)   → 0xE2 0x80 0x99 (hexadecimal) → UTF-8 sequence for Unicode code point U+2019 ([RIGHT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).
231  153  0x99

223  147  0x93  CE/CP-1250  U+201C  LEFT DOUBLE QUOTATION MARK

342  226  0xE2  UTF-8       U+201C  LEFT DOUBLE QUOTATION MARK   226 128 156 (decimal) → 0xE2 0x80 0x9C (hexadecimal) → UTF-8 sequence for Unicode code point U+201C ([LEFT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200  128  0x80                                                   342 200 234 (octal)   → 0xE2 0x80 0x9C (hexadecimal) → UTF-8 sequence for Unicode code point U+201C ([LEFT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
234  156  0x9C
                                               Alternative web site:
                                                                 342 200 234 (octal)   → 0xE2 0x80 0x9C (hexadecimal) → UTF-8 sequence for Unicode code point U+201C ([LEFT DOUBLE QUOTATION MARK](https://www.charset.org/utf-8/9)).

224  148  0x94  CE/CP-1250  U+201D  RIGHT DOUBLE QUOTATION MARK

342  226  0xE2  UTF-8       U+201D  RIGHT DOUBLE QUOTATION MARK  226 128 157 (decimal) → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200  128  0x80                                                   342 200 235 (octal)   → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
235  157  0x9D
                                               Alternative web site:
                                                                 226 128 157 (decimal) → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.charset.org/utf-8/9)).
                                                                 342 200 235 (octal)   → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.charset.org/utf-8/9)).

342  226  0xE2  UTF-8       U+2028  LINE SEPARATOR               226 128 168 (decimal) → 0xE2 0x80 0xA8 (hexadecimal) → UTF-8 sequence for Unicode code point U+2028 ([LINE SEPARATOR](https://www.charset.org/utf-8/9)).
200  128  0x80                                                   342 200 250 (octal)   → 0xE2 0x80 0xA8 (hexadecimal) → UTF-8 sequence for Unicode code point U+2028 ([LINE SEPARATOR](https://www.charset.org/utf-8/9)).
250  168  0xA8

342  226  0xE2  UTF-8       U+2029  PARAGRAPH SEPARATOR          226 128 169 (decimal) → 0xE2 0x80 0xA9 (hexadecimal) → UTF-8 sequence for Unicode code point U+2029 ([PARAGRAPH SEPARATOR](https://www.charset.org/utf-8/9)).
200  128  0x80                                                   342 200 251 (octal)   → 0xE2 0x80 0xA9 (hexadecimal) → UTF-8 sequence for Unicode code point U+2029 ([PARAGRAPH SEPARATOR](https://www.charset.org/utf-8/9)).
251  169  0xA9

342  226  0xE2  UTF-8       U+202A  LEFT-TO-RIGHT EMBEDDING      226 128 170 (decimal) → 0xE2 0x80 0xAA (hexadecimal) → UTF-8 sequence for Unicode code point U+202A ([LEFT-TO-RIGHT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
200  128  0x80                                                   342 200 252 (octal)   → 0xE2 0x80 0xAA (hexadecimal) → UTF-8 sequence for Unicode code point U+202A ([LEFT-TO-RIGHT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
252  170  0xAA

342  226  0xE2  UTF-8       U+202B  RIGHT-TO-LEFT EMBEDDING      226 128 171 (decimal) → 0xE2 0x80 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+202B ([RIGHT-TO-LEFT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
200  128  0x80                                                   342 200 253 (octal)   → 0xE2 0x80 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+202B ([RIGHT-TO-LEFT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
253  171  0xAB

342  226  0xE2  UTF-8       U+202C  POP DIRECTIONAL FORMATTING   226 128 172 (decimal) → 0xE2 0x80 0xAC (hexadecimal) → UTF-8 sequence for Unicode code point U+202C ([POP DIRECTIONAL FORMATTING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
200  128  0x80                                                   342 200 254 (octal)   → 0xE2 0x80 0xAC (hexadecimal) → UTF-8 sequence for Unicode code point U+202C ([POP DIRECTIONAL FORMATTING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
254  172  0xAC

========================================================================================================

Start of \342 \20x series:

342  226  0xE2  UTF-8       U+2060  WORD JOINER                  226 129 160 (decimal) → 0xE2 0x81 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2060 ([WORD JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8064)).
201  129  0x81                                                   342 201 240 (octal)   → 0xE2 0x81 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2060 ([WORD JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8064)).
240  160  0xA0

342  226  0xE2  UTF-8       U+21B5  DOWNWARDS ARROW WITH CORNER LEFTWARDS  226 134 181 (decimal) → 0xE2 0x86 0xB5 (hexadecimal) → UTF-8 sequence for Unicode code point U+21B5 ([DOWNWARDS ARROW WITH CORNER LEFTWARDS](https://www.charset.org/utf-8/9)).
206  134  0x86                                                             342 206 265 (octal)   → 0xE2 0x86 0xB5 (hexadecimal) → UTF-8 sequence for Unicode code point U+21B5 ([DOWNWARDS ARROW WITH CORNER LEFTWARDS](https://www.charset.org/utf-8/9)).
265  181  0xB5

342  226  0xE2  UTF-8       U+2011  NON-BREAKING HYPHEN          226 128 145 (decimal) → 0xE2 0x80 0x91 (hexadecimal) → UTF-8 sequence for Unicode code point U+2011 ([NON-BREAKING HYPHEN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8205&number=128)).
200  128  0x80                                                   342 200 221 (octal)   → 0xE2 0x80 0x91 (hexadecimal) → UTF-8 sequence for Unicode code point U+2011 ([NON-BREAKING HYPHEN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8205&number=128)).
221  145  0x91

342  226  0xE2  UTF-8       U+2212  MINUS SIGN                   226 136 146 (decimal) → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
210  136  0x88                                                   342 210 222 (octal)   → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
222  146  0x92

  Note: 222 can also be a stand-alone CE/CP-1250 byte, corresponding to U+2019 (RIGHT SINGLE QUOTATION MARK).

                                               Alternative web site:
                                                                 226 136 146 (decimal) → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.charset.org/utf-8/9)).
                                                                 342 210 222 (octal)   → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.charset.org/utf-8/9)).

342  226  0xE2  UTF-8       U+2217  ASTERISK OPERATOR            226 136 151 (decimal) → 0xE2 0x88 0x97 (hexadecimal) → UTF-8 sequence for Unicode code point U+2217 ([ASTERISK OPERATOR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
210  136  0x88                                                   342 210 227 (octal)   → 0xE2 0x88 0x97 (hexadecimal) → UTF-8 sequence for Unicode code point U+2217 ([ASTERISK OPERATOR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
227  151  0x97

342  226  0xE2  UTF-8       U+2260  NOT EQUAL TO                 226 137 160 (decimal) → 0xE2 0x89 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2260 ([NOT EQUAL TO](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
211  137  0x89                                                   342 211 240 (octal)   → 0xE2 0x89 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2260 ([NOT EQUAL TO](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
240  160  0xA0

========================================================================================================

357  239  0xEF  UTF-8       U+FEFF  ZERO WIDTH NO-BREAK SPACE    239 187 191 (decimal) → 0xEF 0xBB 0xBF (hexadecimal) → UTF-8 sequence for Unicode code point U+FEFF ([ZERO WIDTH NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65272&number=128)).
273  187  0xBB                                                   357 273 277 (octal)   → 0xEF 0xBB 0xBF (hexadecimal) → UTF-8 sequence for Unicode code point U+FEFF ([ZERO WIDTH NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65272&number=128)).
277  191  0xBF
                                               Alternative web site:
                                                                 239 187 191 (decimal) → 0xEF 0xBB 0xBF (hexadecimal) → UTF-8 sequence for Unicode code point U+FEFF ([ZERO WIDTH NO-BREAK SPACE](https://www.charset.org/utf-8/66)).

344  228  0xE4  CE/CP-1250  U+00E4  LATIN SMALL LETTER A WITH DIAERESIS     https://www.utf8-chartable.de/unicode-utf8-table.pl?start=224

303  195  0xC3  UTF-8       U+00E4  LATIN SMALL LETTER A WITH DIAERESIS     195 164 (decimal) → 0xC3 0xA4 (hexadecimal) → UTF-8 sequence for Unicode code point U+00E4 ([LATIN SMALL LETTER A WITH DIAERESIS](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=224)).
244  164  0xA4                                                              303 244 (octal)   → 0xC3 0xA4 (hexadecimal) → UTF-8 sequence for Unicode code point U+00E4 ([LATIN SMALL LETTER A WITH DIAERESIS](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=224)).

357  239  0xEF  UTF-8       U+FF1A  FULLWIDTH COLON              239 188 154 (decimal) → 0xEF 0xBC 0x9A (hexadecimal) → UTF-8 sequence for Unicode code point U+FF1A ([FULLWIDTH COLON](https://www.charset.org/utf-8/66)).
274  188  0xBC                                                   357 274 232 (octal)   → 0xEF 0xBC 0x9A (hexadecimal) → UTF-8 sequence for Unicode code point U+FF1A ([FULLWIDTH COLON](https://www.charset.org/utf-8/66)).
232  154  0x9A

357  239  0xEF  UTF-8       U+FFFC  OBJECT REPLACEMENT CHARACTER  239 191 188 (decimal) → 0xEF 0xBF 0xBC (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFC ([OBJECT REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
277  191  0xBF                                                    357 277 274 (octal)   → 0xEF 0xBF 0xBC (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFC ([OBJECT REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
274  188  0xBC

357  239  0xEF  UTF-8       U+FFFD  REPLACEMENT CHARACTER         239 191 189 (decimal) → 0xEF 0xBF 0xBD (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFD ([REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
277  191  0xBF                                                    357 277 275 (octal)   → 0xEF 0xBF 0xBD (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFD ([REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
275  189  0xBD
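The octal, decimal, and hexadecimal columns of the UTF-8 rows in the table can be recomputed for any code point, which is handy for checking an entry or for adding a new one. A minimal Python sketch (the function name table_rows is made up here):

```python
def table_rows(code_point):
    """Return (octal, decimal, hex) for each UTF-8 byte of one code point."""
    encoded = chr(code_point).encode("utf-8")
    return [(f"{byte:o}", str(byte), f"0x{byte:02X}") for byte in encoded]


# U+2018 (LEFT SINGLE QUOTATION MARK), the example used above the table.
for octal, decimal, hexadecimal in table_rows(0x2018):
    print(f"{octal}  {decimal}  {hexadecimal}")
```

For U+2018 this prints the three rows 342/226/0xE2, 200/128/0x80, and 230/152/0x98.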

Signatures

The UTF-8 sequences often start with:

0xC2 (octal 302). Corresponding part of an error message: “error: stray ‘\302’ in program”. Stack Overflow search. This covers the canonical question (mentioned above).

0xE2 (octal 342). Corresponding part of an error message: “error: stray ‘\342’ in program”. Stack Overflow search.

A less specific search (for “error stray in program”).

Other languages than C or C++

PowerShell

Real-world example: copying code through Skype chat may introduce U+00A0 (NO-BREAK SPACE).

This will result in a confusing error message. Something like:

“Â : The term ‘Â’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At C:\UserData\PowerShell\BuildScripts\TempTest.ps1:14”
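A plausible explanation for the “Â” is that the two UTF-8 bytes of NO-BREAK SPACE (0xC2 0xA0) are read with a legacy single-byte encoding. This is an assumption: the post does not say which encoding PowerShell used, and Windows-1252 is chosen here only as a typical candidate. The effect can be reproduced in Python:

```python
# UTF-8 encoding of U+00A0 (NO-BREAK SPACE): the two bytes 0xC2 0xA0.
nbsp_bytes = "\u00a0".encode("utf-8")

# Assumption: the script file is misread as Windows-1252 (one byte = one character).
misread = nbsp_bytes.decode("cp1252")

print([f"U+{ord(ch):04X}" for ch in misread])
```

The result is two characters: U+00C2 (“Â”) followed by U+00A0, which matches the stray “Â” in the error message.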

Non UTF-8 sequences

CE/CP-1250

From the table, these are the character byte values in octal (and can thus be used as signatures in “stray” compiler error output), with their corresponding Unicode characters:

221 (LEFT SINGLE QUOTATION MARK), 222 (RIGHT SINGLE QUOTATION MARK), 223 (LEFT DOUBLE QUOTATION MARK), 224 (RIGHT DOUBLE QUOTATION MARK), 226 (EN DASH), 240 (NO-BREAK SPACE), and 344 (LATIN SMALL LETTER A WITH DIAERESIS)
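These single-byte values can be checked against Python's built-in cp1250 codec. A short sketch (the names STRAY_OCTAL and cp1250_name are made up for this example):

```python
import unicodedata

# The octal signatures listed above.
STRAY_OCTAL = ["221", "222", "223", "224", "226", "240", "344"]


def cp1250_name(octal_value):
    """Decode one CE/CP-1250 byte, given in octal, to its Unicode code point and name."""
    char = bytes([int(octal_value, 8)]).decode("cp1250")
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for value in STRAY_OCTAL:
    print(value, cp1250_name(value))
```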

Binary

Common: \177. It is not known if this is specific to a certain type of system or environment, e.g., binary files on Linux (ELF).

Indeed, the very first byte of an ELF file is 0x7F (octal 177, decimal 127), followed by the three (in ASCII) characters E, L, and F.
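If “stray \177” shows up, one quick check is whether the file handed to the compiler is in fact an ELF binary. A minimal Python sketch (the helper name looks_like_elf is hypothetical):

```python
ELF_MAGIC = b"\x7fELF"  # 0x7F (octal 177) followed by "E", "L", "F"


def looks_like_elf(first_bytes):
    """True if the given bytes start with the ELF magic number."""
    return first_bytes[:4] == ELF_MAGIC


print(f"\\{ELF_MAGIC[0]:o}")  # the octal value as a "stray" error would show it
print(looks_like_elf(b"\x7fELF\x02\x01\x01"))
```

In practice, the first four bytes would be read from the suspect file opened in binary mode.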

Error messages

They vary depending on the programming language.

C/C++

“someFile.c:42: error: stray ‘\302’ in program”

Note that the numbers are usually in octal, but they have also been observed in decimal (this may vary depending on the compiler or its configuration).
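When the compiler output is long, the stray numbers can be collected per source line by a script instead of by eye. The following Python sketch makes several assumptions: the message layout is the one shown above, the quotes around the number may be typographic or plain ASCII, file names contain no colon, and the base (octal or decimal) is known. The names STRAY and stray_bytes are made up here.

```python
import re
from collections import defaultdict

# Accepts both typographic and ASCII quotes around the escaped number.
STRAY = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+):.*stray [‘'`]\\(?P<num>\d+)[’']")


def stray_bytes(compiler_output, base=8):
    """Collect the stray byte values per (file, line), in order of appearance."""
    found = defaultdict(bytearray)
    for text in compiler_output.splitlines():
        match = STRAY.search(text)
        if match:
            key = (match["file"], int(match["line"]))
            found[key].append(int(match["num"], base))
    return {key: bytes(value) for key, value in found.items()}


log = (
    "someFile.c:42: error: stray ‘\\302’ in program\n"
    "someFile.c:42: error: stray ‘\\244’ in program\n"
)
print(stray_bytes(log))
```

The collected bytes for each line can then be decoded as UTF-8 to get the code points.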

PowerShell

“Â : The term ‘Â’ is not recognized as the name of a cmdlet, function, script file, or operable program.”

Automation

In order to demystify it (and save time), these checks can be added to build scripts and IDEs/projects. A much more understandable error message can then be issued.

For example, it might be included in Edit Overflow’s build script.
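As an illustration of what such a check could look like (this is not Edit Overflow's actual build script, and the names SUSPECTS and check_text are hypothetical), here is a Python sketch that reports each offending character in a compiler-like “file:line:column” format and exits with a non-zero status if anything is found. The character class is abbreviated; a real check would use the full list from the combined regular expression.

```python
import re
import sys
import unicodedata

# Abbreviated list of suspects; extend it with the other code points as needed.
SUSPECTS = re.compile("[\u00a0\u200b\u2013\u2014\u2018\u2019\u201c\u201d\ufeff]")


def check_text(text, filename="<input>"):
    """Return compiler-style messages, one per suspicious character."""
    messages = []
    # Split on "\n" only, so that exotic line separators do not shift line numbers.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in SUSPECTS.finditer(line):
            char = match.group()
            name = unicodedata.name(char, "<unnamed>")
            messages.append(
                f"{filename}:{line_no}:{match.start() + 1}: "
                f"U+{ord(char):04X} {name}"
            )
    return messages


if __name__ == "__main__" and len(sys.argv) > 1:
    problems = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            problems += check_text(handle.read(), path)
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

Run before the compiler, a message like “demo.c:1:4: U+00A0 NO-BREAK SPACE” replaces a pair of cryptic “stray” errors.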

Stack Overflow

Canonical question

For how the offending characters end up in the code (copying from web pages, PDF documents, chat (e.g. Skype Chat or Facebook Messenger), etc.):

Compilation error: stray ‘\302’ in program, etc.

Canned comments:

Related: *[Compilation error: stray ‘\302’ in program, etc.](https://stackoverflow.com/questions/19198332/)*
This is a ***very*** common error when copying code from web pages, [PDF](https://en.wikipedia.org/wiki/Portable_Document_Format) documents, through chat (e.g. [Skype Chat](https://en.wikipedia.org/wiki/Features_of_Skype#Skype_chat) or [Facebook Messenger](https://en.wikipedia.org/wiki/Facebook_Messenger)), etc. The canonical question is *[Compilation error: stray ‘\302’ in program, etc.](https://stackoverflow.com/questions/19198332)*.
The most common ones can ***positively*** (guesswork isn't required) ***be searched*** for (and replaced) using the regular expression \x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00E4}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD} (NO-BREAK SPACE, BROKEN BAR, LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, REGISTERED SIGN, RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, LATIN SMALL LETTER A WITH DIAERESIS, EM SPACE, THIN SPACE, ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, EN DASH, EM DASH, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING, WORD JOINER, DOWNWARDS ARROW WITH CORNER LEFTWARDS, MINUS SIGN, ASTERISK OPERATOR, NOT EQUAL TO, ZERO WIDTH NO-BREAK SPACE, FULLWIDTH COLON, OBJECT REPLACEMENT CHARACTER, REPLACEMENT CHARACTER, etc.).

Rendered:

  • Related: Compilation error: stray ‘\302’ in program, etc.
  • This is a very common error when copying code from web pages, PDF documents, through chat (e.g. Skype Chat or Facebook Messenger), etc. The canonical question is Compilation error: stray ‘\302’ in program, etc..
  • The most common ones can positively (guesswork isn’t required) be searched for (and replaced) using the regular expression \x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00E4}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD} (NO-BREAK SPACE, BROKEN BAR, LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, REGISTERED SIGN, RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, LATIN SMALL LETTER A WITH DIAERESIS, EM SPACE, THIN SPACE, ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, EN DASH, EM DASH, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING, WORD JOINER, DOWNWARDS ARROW WITH CORNER LEFTWARDS, MINUS SIGN, ASTERISK OPERATOR, NOT EQUAL TO, ZERO WIDTH NO-BREAK SPACE, FULLWIDTH COLON, OBJECT REPLACEMENT CHARACTER, REPLACEMENT CHARACTER, etc.).

Detection of new Stack Overflow questions

A relatively efficient method is on-site search with “error stray in program”:

https://stackoverflow.com/search?tab=newest&q=error%20stray%20in%20program&searchOn=3

Web site blues

On 2023-04-25, https://www.utf8-chartable.de/ timed out. But it came back.

Resources

Unicode lookup sites

https://www.utf8-chartable.de/. Example: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128

https://codepoints.net/. Example: https://codepoints.net/U+3F38

https://www.charset.org/. Example: https://www.charset.org/utf-8/66

https://www.fileformat.info/. Example: https://www.fileformat.info/info/unicode/char/2217/index.htm. U+FEFF: https://www.fileformat.info/info/charset/UTF-16/list.htm?start=44205. U+FEFF may result in the signature doublet “stray \377 … stray \376”, possible only if UTF-16 is used (FF for BOM, UTF-16LE (little-endian), octal 377, decimal 255, hexadecimal 0xFF). For UTF-8, the signature is octal 357 (followed by 273 and 277), hexadecimal 0xEF 0xBB 0xBF, decimal 239 187 191.
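The two BOM signatures can be reproduced by encoding U+FEFF in both encodings and printing the bytes in octal. A minimal Python sketch (the helper name octal_signature is made up here):

```python
def octal_signature(data):
    """Format bytes the way they appear in "stray" errors, e.g. \\357 \\273 \\277."""
    return " ".join(f"\\{byte:o}" for byte in data)


bom = "\ufeff"
print(octal_signature(bom.encode("utf-16-le")))  # bytes FF FE
print(octal_signature(bom.encode("utf-8")))      # bytes EF BB BF
```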
