Created December 14, 2012 01:06
Hacky band-aid to convert windows-1252 CSV uploads to UTF-8
def ensure_utf8
  # TODO Proper solution using https://github.com/brianmario/charlock_holmes & detection
  # processing copied from http://trackingrails.com/posts/video-encoding-processor-for-carrierwave
  cache_stored_file! unless cached?

  # Read the raw bytes and tag them as UTF-8, so the check below does not
  # depend on Encoding.default_external.
  file_data = File.binread(current_path).force_encoding('UTF-8')

  # String#valid_encoding? (Ruby 1.9+) detects bad UTF-8 bytes without having
  # to match on an ArgumentError message. Background:
  # http://bibwild.wordpress.com/2012/04/17/checkingfixing-bad-bytes-in-ruby-1-9-char-encoding/
  return if file_data.valid_encoding?

  # Not valid UTF-8: assume it's windows-1252 and transcode it, replacing
  # any invalid or undefined bytes with '?'.
  utf8_data = file_data.encode('UTF-8', 'windows-1252',
                               :invalid => :replace, :undef => :replace, :replace => '?')
  File.binwrite(current_path, utf8_data)
end
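To try the same detect-then-transcode logic outside CarrierWave, here is a minimal sketch on a plain String, assuming Ruby 2.0+ (UTF-8 source files by default). The helper name `windows1252_to_utf8` is hypothetical and not part of the original gist; like `ensure_utf8`, it simply assumes that anything that is not valid UTF-8 is windows-1252.

```ruby
# Hypothetical helper (not from the original gist): same logic as
# ensure_utf8, but on an in-memory String instead of an uploaded file.
def windows1252_to_utf8(raw)
  # Work on a copy tagged as UTF-8; the bytes themselves are unchanged.
  data = raw.dup.force_encoding('UTF-8')

  # Already valid UTF-8: return it as-is.
  return data if data.valid_encoding?

  # Otherwise assume windows-1252 and transcode, replacing unmappable bytes.
  data.encode('UTF-8', 'Windows-1252',
              invalid: :replace, undef: :replace, replace: '?')
end

# "café" plus curly quotes as windows-1252 bytes (0xE9 = é, 0x93/0x94 = “ ”).
legacy = "caf\xE9 \x93ok\x94".b
puts windows1252_to_utf8(legacy)           # café “ok”
puts windows1252_to_utf8("café") == "café" # true
```

The validity check is only a heuristic: a windows-1252 file that happens to contain nothing but ASCII (or byte sequences that also form valid UTF-8) passes through untouched, which is why the gist's TODO points at real charset detection.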