Contour Line

July 10, 2007

Regex nightmare in Java: s!\\!\\\\!g becomes .replaceAll("\\\\","\\\\\\\\");

Filed under: Uncategorized — jmarca @ 10:44 pm

If you like Java and love java.util.regex even more, stop reading now. I have a small complaint that was a pain to figure out, so I am writing it down here in the hope that it may prove useful to someone else.

The title says it all. If you are looking to escape a single backslash in a string, a lowly ‘\’ character, you have to hit it with 4 and then 8 backslashes in the replaceAll call.

“Why,” you may well ask, when Perl, which also needs to escape the backslash in a regex, needs just the parsimonious 2 and then 4 backslashes to replace a single backslash with a double backslash? Well, the answer (as near as I can tell) is that every string literal in Java has its backslash escapes processed by the compiler before the regex engine ever sees it, and the regex engine then processes backslash escapes a second time. There is no notion of the uninterpolated string, at least as far as escaping characters goes. So you get the worst of both worlds: there is no way to write carp "The $variable" in a single string (variable interpolation), and there is no way to write carp q{Ceci \n n'est pas une linebreak}, with no escaping happening at all.

Instead, you get the following. Suppose you have a code snippet such as


this.encodedPointsLiteral=encodedPoints.replaceAll("\\","\\\\");

Suppose encodedPoints holds the output of the excellent PolylineEncoder class you’ve converted into a Java class to run on your server. If you expect the replaceAll call to work, you will instead get hit with a runtime error (a java.util.regex.PatternSyntaxException). The string literal “\\” properly produces a single “\”, which I would expect to be used to find all single backslashes. Instead, that single backslash is passed to the regex compiler deep in the guts of the regex implementation and processed for escapes again, so the lone backslash is escaping nothing at all, which is of course an error.
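To make the two rounds of processing concrete, here is a minimal sketch. The class name SingleBackslashPattern and the helper isRejected are made up for illustration; the only library calls are Pattern.compile and the PatternSyntaxException it throws. It checks nothing more than whether the regex engine accepts a given pattern string.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of PolylineEncoder or any library.
final class SingleBackslashPattern {

    // Returns true if the regex engine refuses to compile the given pattern text.
    static boolean isRejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Round one: the compiler turns the literal "\\" into a String holding one backslash.
        String oneBackslash = "\\";
        System.out.println(oneBackslash.length());     // 1

        // Round two: the regex engine sees a lone escape character with nothing after it.
        System.out.println(isRejected(oneBackslash));  // true

        // Four backslashes in the source become two in the String,
        // which the regex engine reads as one literal backslash.
        System.out.println(isRejected("\\\\"));        // false
    }
}
```

The same PatternSyntaxException is what surfaces from replaceAll, since replaceAll compiles its first argument as a pattern.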

If you instead use the code snippet


this.encodedPointsLiteral=encodedPoints.replaceAll("\\\\","\\\\\\\\");

then the first quadruple backslash is reduced to a double backslash on the first interpolation (when the compiler reads the double-quoted string literal), and is then interpolated again to yield a single backslash when the pattern is compiled. Similarly, the eight backslashes are reduced to four on the first interpolation, and then reduced to two when the replacement string is evaluated.
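For anyone who would rather not count backslashes, here is a small sketch of the working call next to two alternatives from the standard library: String.replace(CharSequence, CharSequence), which treats both arguments literally, and Pattern.quote paired with Matcher.quoteReplacement, which do the second round of escaping for you. The class and method names (BackslashDoubler, withReplaceAll and so on) are hypothetical, chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the java.lang and java.util.regex calls are real APIs.
final class BackslashDoubler {

    // Regex version: both arguments go through a second round of escape processing.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Literal version: String.replace does no regex processing on either argument.
    static String withReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    // Regex version where the library escapes the pattern and the replacement.
    static String withQuoting(String s) {
        return s.replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        String input = "a\\b"; // three characters: a, one backslash, b

        // Each call prints the four characters a, backslash, backslash, b.
        System.out.println(withReplaceAll(input));
        System.out.println(withReplace(input));
        System.out.println(withQuoting(input));
    }
}
```

The literal String.replace variant is probably the closest match to the intent here, since nothing in the task actually needs a regular expression.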

It is little things like this that give me hope that my Perl skills, such as they are, will never become as useless as my 68000 assembly language talents.

And if you study Dr. McClure’s JavaScript code, you will see that JavaScript is just as sane as Perl, with the idiom


encodedPointsLiteral: encodedPoints.replace(/\\/g,"\\\\")

doing the job properly.
