Unicode blues in C++ and similar languages (after copying from web pages, Skype chat, etc.): “error: stray \342”
The short version: Compiler errors of the type “error: stray \342” are not mysterious at all. They can be analysed directly, without any guesswork. The triplets or doublets of stray byte values, starting with 342 (octal) or 302 (also octal), are converted to hexadecimal and looked up in Unicode code point tables. From that, a regular expression is built to search for (and replace) the offending characters directly in any modern text editor, including invisible ones like ZERO WIDTH SPACE. Thus, neither retyping the code nor guessing by visual inspection is necessary (some of these characters cannot be visually distinguished from ordinary ones, and some are literally invisible). It also scales: no matter how large the file is, the culprits are easily found en masse. The compiler provides the same information, but it may be overwhelming, and it is much less direct, as the doublets or triplets have to be interpreted manually.
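As an illustration of that conversion chain (octal, then hexadecimal, then the Unicode code point), here is a minimal Python sketch. The function name decode_stray is hypothetical, and it assumes the numbers belong to exactly one complete UTF-8 sequence, in the order the compiler reported them.

```python
import unicodedata


def decode_stray(octal_numbers):
    """Convert the octal numbers from consecutive "stray" errors, e.g.
    ["342", "200", "234"], to the character they encode in UTF-8.
    Assumes the numbers form exactly one complete UTF-8 sequence."""
    raw = bytes(int(n, 8) for n in octal_numbers)      # octal -> byte values
    hex_form = " ".join(f"0x{b:02X}" for b in raw)      # e.g. "0xE2 0x80 0x9C"
    char = raw.decode("utf-8")                          # raises if not valid UTF-8
    return hex_form, f"U+{ord(char):04X}", unicodedata.name(char, "<no name>")


if __name__ == "__main__":
    print(decode_stray(["342", "200", "234"]))
    # ('0xE2 0x80 0x9C', 'U+201C', 'LEFT DOUBLE QUOTATION MARK')
```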
Also, in this blog post, as a shortcut, the most common ones encountered in the wild have been mapped, so it is not necessary to analyse the error numbers and/or use hex dumps (a straightforward but tedious process). They can be detected by using the following regular expression in any modern text editor or IDE:
\x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00CD}|\x{00E4}|\x{037E}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD}
That is for NO-BREAK SPACE, BROKEN BAR, LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, REGISTERED SIGN, RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, LATIN CAPITAL LETTER I WITH ACUTE, LATIN SMALL LETTER A WITH DIAERESIS, GREEK QUESTION MARK, EM SPACE, THIN SPACE, ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, EN DASH, EM DASH, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING, WORD JOINER, DOWNWARDS ARROW WITH CORNER LEFTWARDS, NON-BREAKING HYPHEN, MINUS SIGN, ASTERISK OPERATOR, NOT EQUAL TO, ZERO WIDTH NO-BREAK SPACE, FULLWIDTH COLON, OBJECT REPLACEMENT CHARACTER, REPLACEMENT CHARACTER, respectively (33).
Note: The regular expression notation is different in Visual Studio Code (and probably others):
\u00A0|\u00A6|\u00AB|\u00AE|\u00BB|\u00CD|\u00E4|\u037E|\u2003|\u2009|\u200B|\u200C|\u2013|\u2014|\u2018|\u2019|\u201C|\u201D|\u2028|\u2029|\u202A|\u202B|\u202C|\u2060|\u21B5|\u2011|\u2212|\u2217|\u2260|\uFEFF|\uFF1A|\uFFFC|\uFFFD
This also works when the stray error numbers are unavailable or incomplete (as long as the full source is available).
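The same search can be scripted, for example to get line and column numbers for every hit. A minimal Python sketch, assuming the file content has already been decoded as UTF-8 (find_suspects is a hypothetical name). Python's re module uses the \uXXXX notation, like Visual Studio Code, not \x{XXXX}.

```python
import re
import unicodedata

# The 33 code points from the regular expression above.
SUSPECTS = [
    0x00A0, 0x00A6, 0x00AB, 0x00AE, 0x00BB, 0x00CD, 0x00E4, 0x037E,
    0x2003, 0x2009, 0x200B, 0x200C, 0x2013, 0x2014, 0x2018, 0x2019,
    0x201C, 0x201D, 0x2028, 0x2029, 0x202A, 0x202B, 0x202C, 0x2060,
    0x21B5, 0x2011, 0x2212, 0x2217, 0x2260, 0xFEFF, 0xFF1A, 0xFFFC,
    0xFFFD,
]

# Build "\u00A0|\u00A6|..." (Python's re notation for code points).
PATTERN = re.compile("|".join(f"\\u{cp:04X}" for cp in SUSPECTS))


def find_suspects(text):
    """Yield (line, column, code point, name) for each suspect character.
    Columns count characters (not bytes) and start at 1."""
    # split("\n") rather than splitlines(): splitlines() also splits on
    # U+2028 and U+2029, which would hide two of the characters we look for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for m in PATTERN.finditer(line):
            ch = m.group()
            yield line_no, m.start() + 1, f"U+{ord(ch):04X}", unicodedata.name(ch, "?")


if __name__ == "__main__":
    sample = "int x\u00A0= 1;\nprintf(\u201Chi\u201D);"
    for hit in find_suspects(sample):
        print(hit)
```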
Introduction
Compilation of innocent-looking C++ (or C) source code may result in errors like:
someFile.c:42: error: stray ‘\302’ in program
someFile.c:42: error: stray ‘\244’ in program
Analysis
Hex dump
Here, the Linux command-line tool ‘hexdump’ is used, but any hex dump tool will do.
“80” (option -n) is the number of bytes to dump.
0x60, decimal 96 (option -s) is the offset into the file (for example, if the offset is past the end of the file, the output will be empty…). Set it to 0x0 for the beginning of the file.
clear ; hexdump -s 0x60 -n 80 -e '"%08.8_ax " 8/1 "%02X " " " 8/1 "%02X " " |"' -e '16/1 "%_p""|\n"' '/home/mortensen/temp2/2023-04-18/Strange.txt'
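Where hexdump is not available, a rough Python equivalent may help. This is a sketch: the function names are hypothetical, and the layout only approximates the format string above (offset, two groups of 8 hex bytes, then the printable characters).

```python
def hex_dump_bytes(data, base_offset=0):
    """Format bytes as rows of: offset, two groups of 8 hex bytes, and the
    printable ASCII characters ('.' for everything else)."""
    rows = []
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        left = " ".join(f"{b:02X}" for b in chunk[:8])
        right = " ".join(f"{b:02X}" for b in chunk[8:])
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        rows.append(f"{base_offset + i:08x}  {left:<23}  {right:<23}  |{text}|")
    return rows


def hex_dump_file(path, offset=0x60, length=80):
    """Counterpart of the -s (offset) and -n (length) options above."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hex_dump_bytes(f.read(length), offset)
```

For example, print("\n".join(hex_dump_file("Strange.txt", 0x60, 80))), where Strange.txt stands in for the actual file. A UTF-8 sequence like E2 80 9C then shows up as three dots in the character column.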
Positively identifying the strange characters in a text editor or IDE
This can be done by using regular expressions in any modern text editor or IDE (but not, for example, in the Arduino IDE).
This is particularly important for longer documents and source code.
Note: in Visual Studio Code (and probably others) the notation is different: \u00A0 (instead of \x{00A0})
Combined regular expression
\x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00CD}|\x{00E4}|\x{037E}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD}
NO-BREAK SPACE
\x{00A0}
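Finding the characters is half of it; the other half is replacing them. A minimal Python sketch of the replace step follows. The replacement table is an assumption (the ASCII character each one most likely stood for before copying), not something the compiler can tell you. For example, an EN DASH or EM DASH may originally have been "--" (a decrement operator), so the result should be reviewed. to_ascii and ASCII_FIXES are hypothetical names.

```python
# Hypothetical replacement table: the ASCII character each one most likely
# stood for before copying. Characters with no safe automatic replacement
# (e.g. U+00AE, U+00E4, U+21B5, U+FFFD) are deliberately left out.
ASCII_FIXES = {
    0x00A0: " ", 0x2003: " ", 0x2009: " ",                # spaces
    0x200B: "", 0x200C: "", 0x2060: "", 0xFEFF: "",       # invisible characters
    0x202A: "", 0x202B: "", 0x202C: "",                   # directional formatting
    0x2011: "-", 0x2013: "-", 0x2014: "-", 0x2212: "-",   # hyphens, dashes, minus
    0x2018: "'", 0x2019: "'",                             # single quotes
    0x201C: '"', 0x201D: '"', 0x00AB: '"', 0x00BB: '"',   # double quotes
    0x2217: "*", 0x2260: "!=", 0xFF1A: ":", 0x037E: ";",  # operators, punctuation
    0x2028: "\n", 0x2029: "\n",                           # line/paragraph separators
}


def to_ascii(text):
    """Apply ASCII_FIXES; return the new text and the non-ASCII code points
    still left in it (for manual review)."""
    fixed = text.translate(ASCII_FIXES)
    leftovers = sorted({f"U+{ord(c):04X}" for c in fixed if ord(c) > 0x7F})
    return fixed, leftovers
```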
Table of common Unicode characters causing this problem (actually encountered in the wild)
This table can be used to quickly identify the offending Unicode character from the “error: stray” compiler errors. The third number in a triplet is the most specific. For example, “230” for U+2018 (LEFT SINGLE QUOTATION MARK).
Note that to search for hexadecimal UTF-8 sequences, each number should be preceded by “0x” to directly search in the content of the table. Example: A hexadecimal dump may have “E2 80 9C”. Use “0xE2 0x80 0x9C” to search in the table (with a single space separating the numbers).
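The octal, decimal, and hexadecimal forms used in the table can also be generated, e.g., to check a row or to build the search string from a hex dump. A minimal Python sketch (table_keys and search_string are hypothetical names):

```python
def table_keys(code_point):
    """Octal, decimal, and hexadecimal forms of the UTF-8 sequence for a
    code point, in the same notation as the table."""
    raw = chr(code_point).encode("utf-8")
    octal = " ".join(f"{b:o}" for b in raw)
    decimal = " ".join(str(b) for b in raw)
    hexadecimal = " ".join(f"0x{b:02X}" for b in raw)
    return octal, decimal, hexadecimal


def search_string(hex_dump_fragment):
    """Turn a hex dump fragment like "E2 80 9C" into "0xE2 0x80 0x9C"."""
    return " ".join("0x" + h.upper() for h in hex_dump_fragment.split())
```

For example, table_keys(0x2019) gives ("342 200 231", "226 128 153", "0xE2 0x80 0x99").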
Oct Dec Hex Start of seq, type Start of sequence, Unicode code point Comment fragment
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
240 160 0xA0 CE/CP-1250 U+00A0 NO-BREAK SPACE
302 194 0xC2 UTF-8 U+00A0 NO-BREAK SPACE 194 160 (decimal) → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=156&number=128)).
240 160 0xA0 302 240 (octal) → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=156&number=128)).
Alternative web site:
194 160 (decimal) → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.charset.org/utf-8)).
302 240 (octal) → 0xC2 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A0 ([NO-BREAK SPACE](https://www.charset.org/utf-8)).
302 194 0xC2 UTF-8 U+00A6 BROKEN BAR 194 166 (decimal) → 0xC2 0xA6 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A6 ([BROKEN BAR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
246 166 0xA6 302 246 (octal) → 0xC2 0xA6 (hexadecimal) → UTF-8 sequence for Unicode code point U+00A6 ([BROKEN BAR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
302 194 0xC2 UTF-8 U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 194 171 (decimal) → 0xC2 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+00AB ([LEFT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
253 171 0xAB 302 253 (octal) → 0xC2 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+00AB ([LEFT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
302 194 0xC2 UTF-8 U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 194 187 (decimal) → 0xC2 0xBB (hexadecimal) → UTF-8 sequence for Unicode code point U+00BB ([RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
273 187 0xBB 302 273 (octal) → 0xC2 0xBB (hexadecimal) → UTF-8 sequence for Unicode code point U+00BB ([RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
302 194 0xC2 UTF-8 U+00AE REGISTERED SIGN 194 174 (decimal) → 0xC2 0xAE (hexadecimal) → UTF-8 sequence for Unicode code point U+00AE ([REGISTERED SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
256 174 0xAE 302 256 (octal) → 0xC2 0xAE (hexadecimal) → UTF-8 sequence for Unicode code point U+00AE ([REGISTERED SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=128)).
========================================================================================================
Start of \342 \200 series:
342 226 0xE2 UTF-8 U+2003 EM SPACE 226 128 131 (decimal) → 0xE2 0x80 0x83 (hexadecimal) → UTF-8 sequence for Unicode code point U+2003 ([EM SPACE](https://www.charset.org/utf-8/9)).
200 128 0x80 342 200 203 (octal) → 0xE2 0x80 0x83 (hexadecimal) → UTF-8 sequence for Unicode code point U+2003 ([EM SPACE](https://www.charset.org/utf-8/9)).
203 131 0x83
342 226 0xE2 UTF-8 U+2009 THIN SPACE 226 128 137 (decimal) → 0xE2 0x80 0x89 (hexadecimal) → UTF-8 sequence for Unicode code point U+2009 ([THIN SPACE](https://www.charset.org/utf-8/9)).
200 128 0x80 342 200 211 (octal) → 0xE2 0x80 0x89 (hexadecimal) → UTF-8 sequence for Unicode code point U+2009 ([THIN SPACE](https://www.charset.org/utf-8/9)).
211 137 0x89
342 226 0xE2 UTF-8 U+200B ZERO WIDTH SPACE 226 128 139 (decimal) → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200 128 0x80 342 200 213 (octal) → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
213 139 0x8B
Alternative web site:
226 128 139 (decimal) → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.charset.org/utf-8/9)).
342 200 213 (octal) → 0xE2 0x80 0x8B (hexadecimal) → UTF-8 sequence for Unicode code point U+200B ([ZERO WIDTH SPACE](https://www.charset.org/utf-8/9)).
342 226 0xE2 UTF-8 U+200C ZERO WIDTH NON-JOINER 226 128 140 (decimal) → 0xE2 0x80 0x8C (hexadecimal) → UTF-8 sequence for Unicode code point U+200C ([ZERO WIDTH NON-JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200 128 0x80 342 200 214 (octal) → 0xE2 0x80 0x8C (hexadecimal) → UTF-8 sequence for Unicode code point U+200C ([ZERO WIDTH NON-JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
214 140 0x8C
226 150 0x96 CE/CP-1250 U+2013 EN DASH
342 226 0xE2 UTF-8 U+2013 EN DASH 226 128 147 (decimal) → 0xE2 0x80 0x93 (hexadecimal) → UTF-8 sequence for Unicode code point U+2013 ([EN DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200 128 0x80 342 200 223 (octal) → 0xE2 0x80 0x93 (hexadecimal) → UTF-8 sequence for Unicode code point U+2013 ([EN DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
223 147 0x93
Note: 223 can also be the stand-alone CE/CP-1250, corresponding to U+201C (LEFT DOUBLE QUOTATION MARK).
342 226 0xE2 UTF-8 U+2014 EM DASH 226 128 148 (decimal) → 0xE2 0x80 0x94 (hexadecimal) → UTF-8 sequence for Unicode code point U+2014 ([EM DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200 128 0x80 342 200 224 (octal) → 0xE2 0x80 0x94 (hexadecimal) → UTF-8 sequence for Unicode code point U+2014 ([EM DASH](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
224 148 0x94
Note: 224 can also be the stand-alone CE/CP-1250, corresponding to U+201D (RIGHT DOUBLE QUOTATION MARK).
221 145 0x91 CE/CP-1250 U+2018 LEFT SINGLE QUOTATION MARK [CE/CP-1250 0x91](https://en.wikipedia.org/wiki/Windows-1250#Character_set) (145 (decimal), 221 (octal)), corresponding to Unicode code point U+2018 ([LEFT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).
342 226 0xE2 UTF-8 U+2018 LEFT SINGLE QUOTATION MARK 226 128 152 (decimal) → 0xE2 0x80 0x98 (hexadecimal) → UTF-8 sequence for Unicode code point U+2018 ([LEFT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200 128 0x80 342 200 230 (octal) → 0xE2 0x80 0x98 (hexadecimal) → UTF-8 sequence for Unicode code point U+2018 ([LEFT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
230 152 0x98
222 146 0x92 CE/CP-1250 U+2019 RIGHT SINGLE QUOTATION MARK [CE/CP-1250 0x92](https://en.wikipedia.org/wiki/Windows-1250#Character_set) (146 (decimal), 222 (octal)), corresponding to Unicode code point U+2019 ([RIGHT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).
342 226 0xE2 UTF-8 U+2019 RIGHT SINGLE QUOTATION MARK 226 128 153 (decimal) → 0xE2 0x80 0x99 (hexadecimal) → UTF-8 sequence for Unicode code point U+2019 ([RIGHT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).
200 128 0x80 342 200 231 (octal) → 0xE2 0x80 0x99 (hexadecimal) → UTF-8 sequence for Unicode code point U+2019 ([RIGHT SINGLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8212&number=128)).
231 153 0x99
223 147 0x93 CE/CP-1250 U+201C LEFT DOUBLE QUOTATION MARK
342 226 0xE2 UTF-8 U+201C LEFT DOUBLE QUOTATION MARK 226 128 156 (decimal) → 0xE2 0x80 0x9C (hexadecimal) → UTF-8 sequence for Unicode code point U+201C ([LEFT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200 128 0x80 342 200 234 (octal) → 0xE2 0x80 0x9C (hexadecimal) → UTF-8 sequence for Unicode code point U+201C ([LEFT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
234 156 0x9C
Alternative web site:
342 200 234 (octal) → 0xE2 0x80 0x9C (hexadecimal) → UTF-8 sequence for Unicode code point U+201C ([LEFT DOUBLE QUOTATION MARK](https://www.charset.org/utf-8/9)).
224 148 0x94 CE/CP-1250 U+201D RIGHT DOUBLE QUOTATION MARK
342 226 0xE2 UTF-8 U+201D RIGHT DOUBLE QUOTATION MARK 226 128 157 (decimal) → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
200 128 0x80 342 200 235 (octal) → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)).
235 157 0x9D
Alternative web site:
226 128 157 (decimal) → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.charset.org/utf-8/9)).
342 200 235 (octal) → 0xE2 0x80 0x9D (hexadecimal) → UTF-8 sequence for Unicode code point U+201D ([RIGHT DOUBLE QUOTATION MARK](https://www.charset.org/utf-8/9)).
342 226 0xE2 UTF-8 U+2028 LINE SEPARATOR 226 128 168 (decimal) → 0xE2 0x80 0xA8 (hexadecimal) → UTF-8 sequence for Unicode code point U+2028 ([LINE SEPARATOR](https://www.charset.org/utf-8/9)).
200 128 0x80 342 200 250 (octal) → 0xE2 0x80 0xA8 (hexadecimal) → UTF-8 sequence for Unicode code point U+2028 ([LINE SEPARATOR](https://www.charset.org/utf-8/9)).
250 168 0xA8
342 226 0xE2 UTF-8 U+2029 PARAGRAPH SEPARATOR 226 128 169 (decimal) → 0xE2 0x80 0xA9 (hexadecimal) → UTF-8 sequence for Unicode code point U+2029 ([PARAGRAPH SEPARATOR](https://www.charset.org/utf-8/9)).
200 128 0x80 342 200 251 (octal) → 0xE2 0x80 0xA9 (hexadecimal) → UTF-8 sequence for Unicode code point U+2029 ([PARAGRAPH SEPARATOR](https://www.charset.org/utf-8/9)).
251 169 0xA9
342 226 0xE2 UTF-8 U+202A LEFT-TO-RIGHT EMBEDDING 226 128 170 (decimal) → 0xE2 0x80 0xAA (hexadecimal) → UTF-8 sequence for Unicode code point U+202A ([LEFT-TO-RIGHT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
200 128 0x80 342 200 252 (octal) → 0xE2 0x80 0xAA (hexadecimal) → UTF-8 sequence for Unicode code point U+202A ([LEFT-TO-RIGHT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
252 170 0xAA
342 226 0xE2 UTF-8 U+202B RIGHT-TO-LEFT EMBEDDING 226 128 171 (decimal) → 0xE2 0x80 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+202B ([RIGHT-TO-LEFT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
200 128 0x80 342 200 253 (octal) → 0xE2 0x80 0xAB (hexadecimal) → UTF-8 sequence for Unicode code point U+202B ([RIGHT-TO-LEFT EMBEDDING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
253 171 0xAB
342 226 0xE2 UTF-8 U+202C POP DIRECTIONAL FORMATTING 226 128 172 (decimal) → 0xE2 0x80 0xAC (hexadecimal) → UTF-8 sequence for Unicode code point U+202C ([POP DIRECTIONAL FORMATTING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
200 128 0x80 342 200 254 (octal) → 0xE2 0x80 0xAC (hexadecimal) → UTF-8 sequence for Unicode code point U+202C ([POP DIRECTIONAL FORMATTING](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8230&number=128)).
254 172 0xAC
========================================================================================================
Start of \342 \20x series:
342 226 0xE2 UTF-8 U+2060 WORD JOINER 226 129 160 (decimal) → 0xE2 0x81 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2060 ([WORD JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8064)).
201 129 0x81 342 201 240 (octal) → 0xE2 0x81 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2060 ([WORD JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8064)).
240 160 0xA0
342 226 0xE2 UTF-8 U+21B5 DOWNWARDS ARROW WITH CORNER LEFTWARDS 226 134 181 (decimal) → 0xE2 0x86 0xB5 (hexadecimal) → UTF-8 sequence for Unicode code point U+21B5 ([DOWNWARDS ARROW WITH CORNER LEFTWARDS](https://www.charset.org/utf-8/9)).
206 134 0x86 342 206 265 (octal) → 0xE2 0x86 0xB5 (hexadecimal) → UTF-8 sequence for Unicode code point U+21B5 ([DOWNWARDS ARROW WITH CORNER LEFTWARDS](https://www.charset.org/utf-8/9)).
265 181 0xB5
342 226 0xE2 UTF-8 U+2011 NON-BREAKING HYPHEN 226 128 145 (decimal) → 0xE2 0x80 0x91 (hexadecimal) → UTF-8 sequence for Unicode code point U+2011 ([NON-BREAKING HYPHEN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8205&number=128)).
200 128 0x80 342 200 221 (octal) → 0xE2 0x80 0x91 (hexadecimal) → UTF-8 sequence for Unicode code point U+2011 ([NON-BREAKING HYPHEN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8205&number=128)).
221 145 0x91
342 226 0xE2 UTF-8 U+2212 MINUS SIGN 226 136 146 (decimal) → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
210 136 0x88 342 210 222 (octal) → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
222 146 0x92
Note: 222 can also be the stand-alone CE/CP-1250, corresponding to U+2019 (RIGHT SINGLE QUOTATION MARK).
Alternative web site:
226 136 146 (decimal) → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.charset.org/utf-8/9)).
342 210 222 (octal) → 0xE2 0x88 0x92 (hexadecimal) → UTF-8 sequence for Unicode code point U+2212 ([MINUS SIGN](https://www.charset.org/utf-8/9)).
342 226 0xE2 UTF-8 U+2217 ASTERISK OPERATOR 226 136 151 (decimal) → 0xE2 0x88 0x97 (hexadecimal) → UTF-8 sequence for Unicode code point U+2217 ([ASTERISK OPERATOR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
210 136 0x88 342 210 227 (octal) → 0xE2 0x88 0x97 (hexadecimal) → UTF-8 sequence for Unicode code point U+2217 ([ASTERISK OPERATOR](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
227 151 0x97
342 226 0xE2 UTF-8 U+2260 NOT EQUAL TO 226 137 160 (decimal) → 0xE2 0x89 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2260 ([NOT EQUAL TO](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
211 137 0x89 342 211 240 (octal) → 0xE2 0x89 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2260 ([NOT EQUAL TO](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8704&number=128)).
240 160 0xA0
========================================================================================================
357 239 0xEF UTF-8 U+FEFF ZERO WIDTH NO-BREAK SPACE 239 187 191 (decimal) → 0xEF 0xBB 0xBF (hexadecimal) → UTF-8 sequence for Unicode code point U+FEFF ([ZERO WIDTH NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65272&number=128)).
273 187 0xBB 357 273 277 (octal) → 0xEF 0xBB 0xBF (hexadecimal) → UTF-8 sequence for Unicode code point U+FEFF ([ZERO WIDTH NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65272&number=128)).
277 191 0xBF
Alternative web site:
239 187 191 (decimal) → 0xEF 0xBB 0xBF (hexadecimal) → UTF-8 sequence for Unicode code point U+FEFF ([ZERO WIDTH NO-BREAK SPACE](https://www.charset.org/utf-8/66)).
344 228 0xE4 CE/CP-1250 U+00E4 LATIN SMALL LETTER A WITH DIAERESIS https://www.utf8-chartable.de/unicode-utf8-table.pl?start=224
303 195 0xC3 UTF-8 U+00E4 LATIN SMALL LETTER A WITH DIAERESIS 195 164 (decimal) → 0xC3 0xA4 (hexadecimal) → UTF-8 sequence for Unicode code point U+00E4 ([LATIN SMALL LETTER A WITH DIAERESIS](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=224)).
244 164 0xA4 303 244 (octal) → 0xC3 0xA4 (hexadecimal) → UTF-8 sequence for Unicode code point U+00E4 ([LATIN SMALL LETTER A WITH DIAERESIS](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=224)).
357 239 0xEF UTF-8 U+FF1A FULLWIDTH COLON 239 188 154 (decimal) → 0xEF 0xBC 0x9A (hexadecimal) → UTF-8 sequence for Unicode code point U+FF1A ([FULLWIDTH COLON](https://www.charset.org/utf-8/66)).
274 188 0xBC 357 274 232 (octal) → 0xEF 0xBC 0x9A (hexadecimal) → UTF-8 sequence for Unicode code point U+FF1A ([FULLWIDTH COLON](https://www.charset.org/utf-8/66)).
232 154 0x9A
357 239 0xEF UTF-8 U+FFFC OBJECT REPLACEMENT CHARACTER 239 191 188 (decimal) → 0xEF 0xBF 0xBC (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFC ([OBJECT REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
277 191 0xBF 357 277 274 (octal) → 0xEF 0xBF 0xBC (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFC ([OBJECT REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
274 188 0xBC
357 239 0xEF UTF-8 U+FFFD REPLACEMENT CHARACTER 239 191 189 (decimal) → 0xEF 0xBF 0xBD (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFD ([REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
277 191 0xBF 357 277 275 (octal) → 0xEF 0xBF 0xBD (hexadecimal) → UTF-8 sequence for Unicode code point U+FFFD ([REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65526)).
275 189 0xBD
Signatures
The UTF-8 sequences often start with:
0xC2 (octal 302). Corresponding part of an error message: “error: stray ‘\302’ in program”. Stack Overflow search. This covers the canonical question (see the Stack Overflow section below).
0xE2 (octal 342). Corresponding part of an error message: “error: stray ‘\342’ in program”. Stack Overflow search.
A less specific search (for “error stray in program”).
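These signatures follow from the UTF-8 lead-byte rules: a lead byte from 0xC2 to 0xDF starts a 2-byte sequence, 0xE0 to 0xEF a 3-byte sequence, and 0xF0 to 0xF4 a 4-byte sequence, while 0x80 to 0xBF are continuation bytes. A minimal Python sketch that tells how many stray errors to expect from the first number (sequence_length is a hypothetical name):

```python
def sequence_length(first_stray_octal):
    """Given the first number of a run of "stray" errors (octal), return how
    many bytes the UTF-8 sequence should have, or None if it cannot be a
    UTF-8 lead byte."""
    b = int(first_stray_octal, 8)
    if 0xC2 <= b <= 0xDF:
        return 2          # e.g. \302, \303
    if 0xE0 <= b <= 0xEF:
        return 3          # e.g. \342, \357
    if 0xF0 <= b <= 0xF4:
        return 4
    return None           # ASCII, continuation byte, or not UTF-8 (e.g. CP-1250)
```

A None for a byte like \223 is a hint that the file may be in CE/CP-1250 rather than UTF-8.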
Languages other than C or C++
PowerShell
Real-world example: copying code through Skype chat may introduce U+00A0 (NO-BREAK SPACE).
This will result in a confusing error message. Something like:
“Â : The term ‘Â’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At C:\UserData\PowerShell\BuildScripts\TempTest.ps1:14”
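The “Â” is itself a clue. It is consistent with the UTF-8 bytes of U+00A0 (0xC2 0xA0) being read with a legacy single-byte code page such as Windows-1252, where 0xC2 is “Â” and 0xA0 is a NO-BREAK SPACE. That the script was read this way is an inference, not something the error message states. A small Python sketch reproduces the effect (misread_as_cp1252 is a hypothetical helper):

```python
def misread_as_cp1252(text):
    """Return what UTF-8 encoded text looks like when its bytes are decoded
    as Windows-1252 instead (bytes undefined in Windows-1252 become U+FFFD)."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(repr(misread_as_cp1252("\u00A0")))  # 'Â' followed by a NO-BREAK SPACE
```

The same mechanism turns U+201C (LEFT DOUBLE QUOTATION MARK, 0xE2 0x80 0x9C) into the familiar “â€œ”.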
Non-UTF-8 sequences
CE/CP-1250
From the table, these are the character byte values in octal (and can thus be used as signatures in “stray” compiler error output), with their corresponding Unicode characters:
221 (LEFT SINGLE QUOTATION MARK), 222 (RIGHT SINGLE QUOTATION MARK), 223 (LEFT DOUBLE QUOTATION MARK), 224 (RIGHT DOUBLE QUOTATION MARK), 226 (EN DASH), 240 (NO-BREAK SPACE), and 344 (LATIN SMALL LETTER A WITH DIAERESIS)
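A single stray byte can be checked against CE/CP-1250 directly. A minimal Python sketch using the standard "cp1250" codec (cp1250_signature is a hypothetical name):

```python
import unicodedata


def cp1250_signature(octal_number):
    """Interpret one stray byte (octal, as printed by the compiler) as
    Windows-1250 (CE/CP-1250). Raises UnicodeDecodeError for the few byte
    values that are undefined in that code page."""
    ch = bytes([int(octal_number, 8)]).decode("cp1250")
    return f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>")
```

For example, cp1250_signature("223") gives ("U+201C", "LEFT DOUBLE QUOTATION MARK").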
Binary
Common: \177. It is not known if this is specific to a certain type of system or environment, e.g., binary files on Linux (ELF).
Indeed, the very first byte of an ELF file is 0x7F (octal 177, decimal 127), followed by the three ASCII characters E, L, and F.
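If a stray \177 shows up, one plausible explanation (an assumption, not confirmed above) is that a binary file was fed to the compiler as source. A quick Python check for the ELF magic number (the helper names are hypothetical):

```python
ELF_MAGIC = b"\x7fELF"  # 0x7F (octal 177), then the ASCII characters E, L, F


def is_elf_header(first_bytes):
    """True if the given leading bytes of a file are the ELF magic number."""
    return first_bytes[:4] == ELF_MAGIC


def is_elf_file(path):
    """Read the first four bytes of a file and check them."""
    with open(path, "rb") as f:
        return is_elf_header(f.read(4))
```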
Error messages
They vary depending on the programming language.
C/C++
“someFile.c:42: error: stray ‘\302’ in program”
Note that the numbers are usually in octal, but they have also been observed in decimal (this may vary depending on the compiler or its configuration).
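The compiler output can also be decoded automatically by grouping the stray numbers per source line. A minimal Python sketch, assuming GCC-style messages like the one above (the regular expression and the name summarize are assumptions, not a documented output format). For the introduction's example, \302 \244 decodes to U+00A4 CURRENCY SIGN.

```python
import re
import unicodedata

# GCC-style lines such as:  someFile.c:42: error: stray ‘\302’ in program
# (the quote characters around the number are not checked).
STRAY = re.compile(r"^(?P<file>.+?):(?P<line>\d+):.*stray\s+\S\\(?P<num>\d+)\S")


def summarize(compiler_output, base=8):
    """Group stray byte values by (file, line) and decode each group as
    UTF-8. Use base=10 for compilers that print decimal numbers."""
    groups = {}
    for message in compiler_output.splitlines():
        m = STRAY.search(message)
        if m:
            key = (m["file"], int(m["line"]))
            groups.setdefault(key, []).append(int(m["num"], base))
    result = {}
    for key, values in groups.items():
        # U+FFFD in the result means "not valid UTF-8" (e.g. CP-1250 bytes),
        # not that the source itself contains U+FFFD.
        text = bytes(values).decode("utf-8", errors="replace")
        result[key] = [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in text]
    return result
```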
PowerShell
“Â : The term ‘Â’ is not recognized as the name of a cmdlet, function, script file, or operable program.”
Automation
To demystify the error (and save time), these checks can be added to build scripts and IDEs/projects. An understandable and much better error message can then be issued.
For example, it might be included in Edit Overflow’s build script.
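A minimal sketch of such a build-script check in Python. It is deliberately stricter than the regular expression: it flags every character outside 7-bit ASCII (and DEL, octal 177), which assumes a project policy of ASCII-only source files. Legitimate non-ASCII in comments or string literals would also be reported. The names check_text and main are hypothetical.

```python
import sys
import unicodedata


def check_text(name, text):
    """Return compiler-like messages for every character that is not plain
    7-bit ASCII (DEL, octal 177, is included)."""
    messages = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) >= 0x7F:
                messages.append(
                    f"{name}:{line_no}:{col}: U+{ord(ch):04X} "
                    f"({unicodedata.name(ch, 'no name')}) is not plain ASCII"
                )
    return messages


def main(paths):
    """Check the given files; return a non-zero exit status on any hit.
    Invalid UTF-8 (e.g. CP-1250 bytes) is read as U+FFFD and thus flagged."""
    messages = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            messages += check_text(path, f.read())
    for message in messages:
        print(message, file=sys.stderr)
    return 1 if messages else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```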
Stack Overflow
Canonical question
For where the offending characters come from (copying code from web pages, PDF documents, chat such as Skype Chat or Facebook Messenger, etc.):
Compilation error: stray ‘\302’ in program, etc.
Canned comments:
Related: *[Compilation error: stray ‘\302’ in program, etc.](https://stackoverflow.com/questions/19198332/)*
This is a ***very*** common error when copying code from web pages, [PDF](https://en.wikipedia.org/wiki/Portable_Document_Format) documents, through chat (e.g. [Skype Chat](https://en.wikipedia.org/wiki/Features_of_Skype#Skype_chat) or [Facebook Messenger](https://en.wikipedia.org/wiki/Facebook_Messenger)), etc. The canonical question is *[Compilation error: stray ‘\302’ in program, etc.](https://stackoverflow.com/questions/19198332)*.
The most common ones can ***positively*** (guesswork isn't required) ***be searched*** for (and replaced) using the regular expression \x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00E4}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD} (NO-BREAK SPACE, BROKEN BAR, LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, REGISTERED SIGN, RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, LATIN SMALL LETTER A WITH DIAERESIS, EM SPACE, THIN SPACE, ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, EN DASH, EM DASH, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING, WORD JOINER, DOWNWARDS ARROW WITH CORNER LEFTWARDS, NON-BREAKING HYPHEN, MINUS SIGN, ASTERISK OPERATOR, NOT EQUAL TO, ZERO WIDTH NO-BREAK SPACE, FULLWIDTH COLON, OBJECT REPLACEMENT CHARACTER, REPLACEMENT CHARACTER, etc.).
Rendered:
- Related: Compilation error: stray ‘\302’ in program, etc.
- This is a very common error when copying code from web pages, PDF documents, through chat (e.g. Skype Chat or Facebook Messenger), etc. The canonical question is Compilation error: stray ‘\302’ in program, etc..
- The most common ones can positively (guesswork isn’t required) be searched for (and replaced) using the regular expression \x{00A0}|\x{00A6}|\x{00AB}|\x{00AE}|\x{00BB}|\x{00E4}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{2013}|\x{2014}|\x{2018}|\x{2019}|\x{201C}|\x{201D}|\x{2028}|\x{2029}|\x{202A}|\x{202B}|\x{202C}|\x{2060}|\x{21B5}|\x{2011}|\x{2212}|\x{2217}|\x{2260}|\x{FEFF}|\x{FF1A}|\x{FFFC}|\x{FFFD} (NO-BREAK SPACE, BROKEN BAR, LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, REGISTERED SIGN, RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, LATIN SMALL LETTER A WITH DIAERESIS, EM SPACE, THIN SPACE, ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, EN DASH, EM DASH, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LINE SEPARATOR, PARAGRAPH SEPARATOR, LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING, WORD JOINER, DOWNWARDS ARROW WITH CORNER LEFTWARDS, NON-BREAKING HYPHEN, MINUS SIGN, ASTERISK OPERATOR, NOT EQUAL TO, ZERO WIDTH NO-BREAK SPACE, FULLWIDTH COLON, OBJECT REPLACEMENT CHARACTER, REPLACEMENT CHARACTER, etc.).
Detection of new Stack Overflow questions
A relatively efficient method is on-site search for “error stray in program”:
https://stackoverflow.com/search?tab=newest&q=error%20stray%20in%20program&searchOn=3
Web site blues
On 2023-04-25, https://www.utf8-chartable.de/ timed out, but it came back.
Resources
Unicode lookup sites
https://www.utf8-chartable.de/. Example: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128
https://codepoints.net/. Example: https://codepoints.net/U+3F38
https://www.charset.org/. Example: https://www.charset.org/utf-8/66
https://www.fileformat.info/. Example: https://www.fileformat.info/info/unicode/char/2217/index.htm. U+FEFF: https://www.fileformat.info/info/charset/UTF-16/list.htm?start=44205. U+FEFF may result in the signature doublet “stray \377 … stray \376”, but only if UTF-16 is used: as a UTF-16LE (little-endian) byte order mark (BOM), U+FEFF is stored as 0xFF 0xFE (octal 377 376, decimal 255 254). For UTF-8, the signature is octal 357 (followed by 273 and 277), hexadecimal 0xEF 0xBB 0xBF, decimal 239 187 191.
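Both BOM cases can be checked directly in the first bytes of a file. A minimal Python sketch using the BOM constants from the standard codecs module (detect_bom is a hypothetical name); the octal form is returned for comparison with the stray error numbers.

```python
import codecs

# Byte order marks, longest first: the UTF-32LE BOM (FF FE 00 00) begins
# with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(first_bytes):
    """Return (encoding, BOM bytes in octal), or None if there is no BOM."""
    for bom, name in BOMS:
        if first_bytes.startswith(bom):
            return name, " ".join(f"{b:o}" for b in bom)
    return None
```

For example, detect_bom(b"\xef\xbb\xbfint main") gives ("UTF-8", "357 273 277").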