We recently solved a problem we hit while pulling data from Microsoft’s SQL Server into Ruby for some processing. The root of it was the server’s configuration: the SQL Server we were connecting to used a case-insensitive Latin1 collation. You can find a database’s collation by executing:

SELECT DATABASEPROPERTYEX('DBName', 'Collation') SQLCollation;

In our case it returned SQL_Latin1_General_CP1_CI_AS. We were using the fine Ruby ODBC library, and we tried to have it convert to UTF-8 for us by setting the ODBC::UTF8 constant to true before proceeding. However, we didn’t get what we expected: our code ran fine until we tried to convert one of the VARCHAR columns to UTF-8, at which point we ran into the following error:

Encoding::UndefinedConversionError: "\x96" from ASCII-8BIT to UTF-8
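The error can be reproduced without a database at all. The sketch below assumes the driver hands back the VARCHAR value as a raw byte string tagged ASCII-8BIT, as the error message suggests; the sample text "2010\x962011" is hypothetical, and only the 0x96 byte matters.

```ruby
# Hypothetical sample: a VARCHAR value as raw bytes, the way the driver
# handed it back (tagged ASCII-8BIT, i.e. "binary, encoding unknown").
raw = "2010\x962011".dup.force_encoding(Encoding::ASCII_8BIT)

begin
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  # Out of a binary string Ruby only knows how to map the 7-bit ASCII
  # range, so byte 0x96 has no UTF-8 equivalent and Ruby refuses to guess.
  puts "#{e.class}: #{e.message}"
end
```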

Inspecting the value returned by the ODBC library, we could see a single problem character, "\x96". When we queried the database through SQL Server Management Studio, that character looked like a hyphen. However, looking up 0x96 (150 in decimal) in the extended ASCII table for this collation’s code page (Windows-1252) shows that it is actually a dash, specifically an en dash. The hyphen on our keyboards is a different character: 0x2D (45 in decimal). This can be verified in the interactive Ruby interpreter (irb):

ruby-1.9.2-p290 :001 > "-".ord
 => 45
ruby-1.9.2-p290 :002 > "\x96".ord
 => 150
ruby-1.9.2-p290 :003 > 45.chr
 => "-"
ruby-1.9.2-p290 :004 > 150.chr
 => "\x96"
ruby-1.9.2-p290 :005 > "\x96".encode('UTF-8')
Encoding::UndefinedConversionError: "\x96" from ASCII-8BIT to UTF-8
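The SQL_Latin1_General_CP1_CI_AS collation stores VARCHAR data in code page 1252 (the CP1 in its name), which Ruby calls Windows-1252. A minimal sketch, assuming the column’s bytes really are Windows-1252, shows what 0x96 decodes to once Ruby is told the right code page:

```ruby
# 150.chr is the single byte 0x96 with no meaningful encoding attached.
byte = 150.chr

# Declare the code page the bytes actually came from, then convert.
# dup keeps the original binary string untouched.
dash = byte.dup.force_encoding('Windows-1252').encode('UTF-8')

puts dash.ord.to_s(16)   # 2013, i.e. U+2013 EN DASH
puts '-'.ord.to_s(16)    # 2d, the ASCII hyphen-minus
```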

No dash for you!
The easiest way to get rid of this problem was to replace the dash with a legitimate hyphen:

# 150.chr is the single byte 0x96; replace it with an ASCII hyphen (0x2D)
bad_string.gsub(150.chr, '-')
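Wrapped up as a small helper, the fix might look like the sketch below. The method names hyphenate_and_encode and decode_cp1252 are hypothetical, and the sketch assumes the value arrives as an ASCII-8BIT string. The second method is a swapped-in alternative, not what we used: it keeps the en dash by declaring the bytes as Windows-1252 before converting, which would also cope with other code page 1252 characters (curly quotes, for instance) that the hyphen substitution leaves behind.

```ruby
# Hypothetical helper: swap the 0x96 dash for an ASCII hyphen, then convert.
# Assumes 0x96 is the only non-ASCII byte; any other high byte still raises.
def hyphenate_and_encode(bad_string)
  bad_string.gsub(150.chr, '-').encode('UTF-8')
end

# Hypothetical alternative: keep the original character by telling Ruby the
# bytes are Windows-1252 (dup so the caller's string is not relabelled).
def decode_cp1252(bad_string)
  bad_string.dup.force_encoding('Windows-1252').encode('UTF-8')
end

raw = "2010\x962011".dup.force_encoding(Encoding::ASCII_8BIT)
puts hyphenate_and_encode(raw)  # 2010-2011
puts decode_cp1252(raw)         # 2010–2011 (with an en dash)
```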