How to make a single-quoted string act like a double-quoted string in Ruby?

Issue

I have a file that contains HTML code whose tags are encoded like this:

\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e

The decoded HTML should be:

<div data-name="region-name" class="main-id">UK</div>

In Ruby, I used the cgi library's unescapeHTML, but it does not work: when it reads the content, it does not recognize the encoded tags. Here is an example:

require 'cgi'

single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
double_quoted_string = "\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e"


puts 'unescape single_quoted_string ' + CGI.unescapeHTML(single_quoted_string)
puts 'unescape double_quoted_string ' + CGI.unescapeHTML(double_quoted_string)

The output of the previous code is:

unescape single_quoted_string \x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
unescape double_quoted_string <div data-name="region-name" class="main-id">UK</div>

My question is, how can I make the single_quoted_string act as if its content is double-quoted to make the function understand the encoded tags?

Thanks

Solution

Ruby’s parser processes escape sequences in string literals, but which ones it processes depends on the quote style.

The double-quoted string literal "\x3c" is recognized as containing a hexadecimal escape of the form \xnn, which represents the single character < (0x3C in ASCII).

The single-quoted string literal '\x3c' however is treated literally, i.e. it represents four characters: \, x, 3, and c.
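
To make the difference concrete, here is a small check of what each literal actually contains. The variable names are only illustrative:

```ruby
# The same source text, written with two different quote styles.
double_quoted = "\x3c"   # parser converts the escape to one character
single_quoted = '\x3c'   # parser keeps all four characters as written

puts double_quoted.length   # 1
puts single_quoted.length   # 4
p single_quoted.chars       # ["\\", "x", "3", "c"]
```

The same applies to text read from a file: it arrives as raw characters, just like the single-quoted literal, because no literal parsing takes place at all.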

how can I make the single_quoted_string act as if its content is double-quoted

You can’t. In order to turn these four characters into < you have to parse the string yourself:

str = '\x3c'

str[2, 2]         #=> "3c"  take hex part
str[2, 2].hex     #=> 60    convert to number
str[2, 2].hex.chr #=> "<"   convert to character

You can apply this to gsub:

str = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'

str.gsub(/\\x\h{2}/) { |m| m[2, 2].hex.chr }
#=> "<div data-name=\"region-name\" class=\"main-id\">UK</div>"

/\\x\h{2}/ matches a literal backslash (\\) followed by x and two ({2}) hex characters (\h).
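
If you need this in more than one place, the gsub call can be wrapped in a small method. This is only a sketch: the method name decode_hex_escapes is made up for illustration, it uses a capture group instead of slicing the match, and it assumes every escape stands for an ASCII character (00 to 7f), as in the question's data:

```ruby
# Hypothetical helper: replaces each literal \xNN sequence with the
# character whose code is the hex number NN.
# Assumption: all escapes are in the ASCII range.
def decode_hex_escapes(text)
  text.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
end

encoded = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
decoded = decode_hex_escapes(encoded)

puts decoded
# <div data-name="region-name" class="main-id">UK</div>
```

Text without any \xNN sequences passes through unchanged, so it should be safe to apply to content that is only partly encoded.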


Just for reference, a CGI encoded string would look like this:

require 'cgi'

str = "<div data-name=\"region-name\" class=\"main-id\">UK</div>"

CGI.escapeHTML(str)
#=> "&lt;div data-name=&quot;region-name&quot; class=&quot;main-id&quot;&gt;UK&lt;/div&gt;"

It uses &...; style character references.
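
This is also why CGI.unescapeHTML leaves the single-quoted string untouched: it only looks for character references, not for \xNN sequences. A short comparison, with illustrative variable names and shortened sample strings:

```ruby
require 'cgi'

entity_encoded = '&lt;div class=&quot;main-id&quot;&gt;UK&lt;/div&gt;'
hex_escaped    = '\x3cdiv\x3e'

# Character references are decoded...
puts CGI.unescapeHTML(entity_encoded)  # <div class="main-id">UK</div>

# ...but literal backslash-x sequences are returned as they are.
puts CGI.unescapeHTML(hex_escaped)     # \x3cdiv\x3e
```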

Answered By – Stefan

Answer Checked By – Cary Denson (AngularFixing Admin)
