@fabiolimace
Created February 2, 2025 16:02
Transform UTF-16 escape sequences generated by WGET into HTML entities
#!/usr/bin/gawk -f
# Parse UTF-16 escape sequences (\uXXXX) generated by WGET
# and transform them all into HTML entities: `&#xXXXX;`
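function parse_unicode(    hex) {
    # Each pass handles one distinct \uXXXX escape; loop until none remain.
    while (match($0, /\\u[0-9a-f][0-9a-f][0-9a-f][0-9a-f]/)) {
        # Skip the leading "\u" and keep the four hex digits.
        hex = substr($0, RSTART+2, RLENGTH-2);
        # Rewrite every occurrence of this escape; "\\&" yields a literal "&".
        gsub("\\\\u" hex, "\\&#x" hex ";", $0);
    }
}

# Only lines containing at least one escape are converted and printed.
/\\u[0-9a-f][0-9a-f][0-9a-f][0-9a-f]/ {
    parse_unicode();
    print;
}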
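
Because the escapes are UTF-16 code units, a character outside the Basic Multilingual Plane arrives as a surrogate pair such as `\ud83d\ude00`, and converting each half separately gives two character references that HTML does not accept. The following is a minimal sketch of one way to merge such pairs, not part of the original script. The names `u16_to_entities` and `hex2num` are hypothetical, and the shell wrapper is only there so the example can be pasted and run. Unlike the script above, it also accepts upper-case hex digits and prints every input line, including lines without escapes.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin, turn \uXXXX escapes into HTML
# entities, and merge UTF-16 surrogate pairs into a single code point.
u16_to_entities() {
    awk '
    # Convert a hex string to a number (portable; gawk also offers strtonum()).
    function hex2num(s,    i, n) {
        s = tolower(s)
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    {
        out = ""
        while (match($0, /\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]/)) {
            out = out substr($0, 1, RSTART - 1)        # text before the escape
            hi = hex2num(substr($0, RSTART + 2, 4))    # the four hex digits
            rest = substr($0, RSTART + RLENGTH)        # text after the escape
            # High surrogate (D800..DBFF) followed directly by a low one (DC00..DFFF)?
            if (hi >= 55296 && hi <= 56319 &&
                match(rest, /^\\u[dD][c-fC-F][0-9a-fA-F][0-9a-fA-F]/)) {
                lo = hex2num(substr(rest, 3, 4))
                cp = 65536 + (hi - 55296) * 1024 + (lo - 56320)
                rest = substr(rest, 7)                 # drop the low surrogate
            } else {
                cp = hi                                # BMP code unit, or a lone surrogate left as is
            }
            out = out sprintf("&#x%04x;", cp)
            $0 = rest
        }
        print out $0
    }'
}

# Usage:
printf '%s\n' 'caf\u00e9 \ud83d\ude00!' | u16_to_entities
# prints: caf&#x00e9; &#x1f600;!
```

The sketch scans left to right and consumes the line as it goes, rather than calling `gsub` once per distinct escape, so that it can look at the escape immediately after a high surrogate before deciding what to emit.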