to_json hack for Safari

Posted by: Shugo Maeda Sat, 15 Jul 2006 08:30:00 GMT

I found out why to_json escapes multibyte characters as \uXXXX: Safari can't correctly handle UTF-8 strings in text/javascript responses :(

Here is the new code:

class String
  JSON_ESCAPED = {
     "\010" =>  '\b',
     "\f" =>    '\f',
     "\n" =>    '\n',
     "\r" =>    '\r',
     "\t" =>    '\t',
     '"' =>     '\"',
     '\\' =>    '\\\\'
  }

  def to_json
    return '"' + gsub(/[\010\f\n\r\t"\\]/) { |s|
      JSON_ESCAPED[s]
    }.gsub(/([\xC0-\xDF][\x80-\xBF]|
             [\xE0-\xEF][\x80-\xBF]{2}|
             [\xF0-\xF7][\x80-\xBF]{3})+/ux) { |s|
      s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/, '\\\\u\&')
    } + '"'
  end
end
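The second gsub is the interesting part. Here is a step-by-step trace of its unpack/pack chain on a two-character Japanese string. This is a sketch for a modern Ruby (1.9 or later, where source files default to UTF-8) rather than the Ruby 1.8 this hack targets, and the sample string is only an illustration:

```ruby
s = "日本"                           # two 3-byte UTF-8 characters
codepoints = s.unpack("U*")          # decode UTF-8 into integer code points
p codepoints                         # => [26085, 26412]  (U+65E5, U+672C)

bytes = codepoints.pack("n*")        # each code point as a 16-bit big-endian value
hex = bytes.unpack("H*")[0]          # hex digits, high nibble first
p hex                                # => "65e5672c"

# Prefix every group of 4 hex digits with a literal backslash-u.
# In the replacement, \\ is a literal backslash and \& is the whole match.
escaped = hex.gsub(/.{4}/, '\\\\u\&')
puts escaped                         # prints \u65e5\u672c
```

Because a whole run of multibyte characters is matched at once, the pack/unpack round trip runs once per run rather than once per character.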

I created a patch for ActiveSupport, but the Rails Trac is down, so I can't attach it to the ticket.

Posted in Rails

to_json too slow

Posted by: Shugo Maeda Tue, 11 Jul 2006 03:51:00 GMT

Yesterday my application felt slow, so I profiled it. RJS is very useful, but it turns out to be slow.

Thanks to Charlie's ruby-prof (it no longer belongs to me, and it has fascinating call graph support), I found the bottleneck: Object#to_json.

The original code is here:

define_encoder String do |string|
  returning value = '"' do
    string.each_char do |char|
      value << case
      when char == "\010":  '\b'
      when char == "\f":    '\f'
      when char == "\n":    '\n'
      when char == "\r":    '\r'
      when char == "\t":    '\t'
      when char == '"':     '\"'
      when char == '\\':    '\\\\'
      when char.length > 1: "\\u#{'%04x' % char.unpack('U').first}"
      else;                 char
      end
    end
    value << '"'
  end
end

Ruby can't process a string character by character quickly, so this code is very slow. You should use built-in methods such as String#gsub, which handle many characters in a single call.

I put the following code into config/environment.rb, and it seems quite fast :)
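The difference is easy to measure. Below is a minimal sketch using Ruby's standard Benchmark module on a modern Ruby (not the 1.8 of this post). The method names escape_per_char and escape_with_gsub and the sample text are hypothetical, and only the ASCII escapes are compared (no \u handling). Timings depend on the machine and Ruby version, so no numbers are claimed here.

```ruby
require "benchmark"

ESCAPES = {
  "\010" => '\b', "\f" => '\f', "\n" => '\n', "\r" => '\r',
  "\t" => '\t', '"' => '\"', "\\" => '\\\\'
}

# Hypothetical helper: one block call, hash lookup and append per character,
# like the original ActiveSupport encoder.
def escape_per_char(str)
  out = +'"'
  str.each_char { |c| out << (ESCAPES[c] || c) }
  out << '"'
end

# Hypothetical helper: a single gsub call; the regexp engine scans ordinary
# characters in C and only yields to Ruby for the special ones.
def escape_with_gsub(str)
  '"' + str.gsub(/[\010\f\n\r\t"\\]/) { |c| ESCAPES[c] } + '"'
end

text = ("plain text " * 20 + "with \"quotes\"\n") * 500

Benchmark.bm(10) do |x|
  x.report("per char") { 10.times { escape_per_char(text) } }
  x.report("gsub")     { 10.times { escape_with_gsub(text) } }
end
```

Both helpers produce the same string; on mostly plain text the gsub version should come out well ahead, since it does almost no work in Ruby for ordinary characters.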

class String
  JSON_ESCAPED = {
     "\010" =>  '\b',
     "\f" =>    '\f',
     "\n" =>    '\n',
     "\r" =>    '\r',
     "\t" =>    '\t',
     '"' =>     '\"',
     '\\' =>    '\\\\'
  }

  def to_json
    return '"' + gsub(/[\010\f\n\r\t"\\]/) { |s|
      JSON_ESCAPED[s]
    } + '"'
  end
end
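One way to check a hand-written encoder like this is to feed its output back through a JSON parser and confirm the original string comes back. The sketch below uses the json library that ships with Ruby 1.9 and later (not something Rails had in 2006). It repeats the same escaping as a standalone helper, json_string (a hypothetical name), instead of reopening String, because requiring json defines its own String#to_json.

```ruby
require "json"

JSON_ESCAPED = {
  "\010" => '\b', "\f" => '\f', "\n" => '\n', "\r" => '\r',
  "\t" => '\t', '"' => '\"', "\\" => '\\\\'
}

# Same escaping as the String#to_json hack above, as a plain helper.
def json_string(str)
  '"' + str.gsub(/[\010\f\n\r\t"\\]/) { |s| JSON_ESCAPED[s] } + '"'
end

samples = ["plain", "tab\there", "say \"hi\"", "C:\\path", "line1\nline2\r\n", "bell\b"]
samples.each do |s|
  encoded = json_string(s)
  # Wrap in an array so even older json versions accept it as a document.
  decoded = JSON.parse("[#{encoded}]").first
  puts "#{encoded} -> #{decoded == s ? 'round-trips' : 'MISMATCH'}"
end
```

Note that JSON requires every control character from U+0000 to U+001F to be escaped, so input containing something like "\x01" would need one more rule (for example \u0001) beyond the five short escapes handled here.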

Then I posted a patch to a related ticket.

If there is a reason to escape multibyte characters, more hacks are needed. I think the best way is to escape whole runs of multibyte characters at once, using String#gsub together with String#unpack and Array#pack.
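A minimal sketch of that idea for a modern Ruby, assuming UTF-8 strings; escape_non_ascii is a hypothetical name. Each gsub match is a whole run of non-ASCII characters, decoded with unpack("U*") and turned into hex with pack("n*")/unpack("H*"). Because "n" keeps only 16 bits, code points above U+FFFF are first split into the UTF-16 surrogate pair that JSON's \u escapes require.

```ruby
# Escape every non-ASCII character as \uXXXX, one gsub match per run.
def escape_non_ascii(str)
  str.gsub(/[^\x00-\x7F]+/) do |run|
    units = run.unpack("U*").flat_map do |cp|
      if cp > 0xFFFF
        v = cp - 0x10000
        [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]   # surrogate pair
      else
        [cp]
      end
    end
    # 16-bit big-endian units -> hex digits -> \u prefix on each group of 4.
    units.pack("n*").unpack("H*")[0].gsub(/.{4}/) { |h| "\\u#{h}" }
  end
end

puts escape_non_ascii("Ruby 日本 \u{1F600}")   # prints Ruby \u65e5\u672c \ud83d\ude00
```

ASCII characters pass through untouched, so this can run after the gsub that handles quotes, backslashes and control characters.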
