Convert string to utf8 javascript
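Use the utf8 module from npm to encode and decode the string.

Installation (via npm):
npm install utf8

In a browser:
<script src="utf8.js"></script>

In Node.js:
var utf8 = require('utf8');

API

Encode: utf8.encode(string)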
Encodes any given JavaScript string (string) as UTF-8, and returns the UTF-8-encoded version of the string. It throws an error if the input string contains a non-scalar value, i.e. a lone surrogate. (If you need to be able to encode non-scalar values as well, use WTF-8 instead.)
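To show what "the UTF-8-encoded version of the string" means here, the sketch below reproduces the documented behavior with the built-in TextEncoder instead of the utf8 module. It is an illustration under assumptions, not the module's implementation: the helper name encodeUtf8ByteString is hypothetical, and it returns a "byte string" in which each character's char code (0x00 to 0xFF) is one UTF-8 byte. TextEncoder silently replaces lone surrogates with U+FFFD, so the sketch checks for them first and throws, matching the error behavior described for utf8.encode().

```javascript
// Hypothetical helper (NOT part of the utf8 module): approximates the
// documented behavior of utf8.encode() using the built-in TextEncoder.

// Matches a high surrogate not followed by a low surrogate, or a low
// surrogate not preceded by a high surrogate, i.e. a lone surrogate.
const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function encodeUtf8ByteString(string) {
  if (LONE_SURROGATE.test(string)) {
    // TextEncoder would replace it with U+FFFD; throw instead.
    throw new Error('Lone surrogate: not a scalar value');
  }
  const bytes = new TextEncoder().encode(string);
  // Build a string with one character per byte (char codes 0x00-0xFF).
  let result = '';
  for (const byte of bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

// U+00A9 COPYRIGHT SIGN is encoded as the two bytes C2 A9.
console.log(encodeUtf8ByteString('\xA9') === '\xC2\xA9'); // true
```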
Decode: utf8.decode(byteString)
Decodes any given UTF-8-encoded string (byteString) as UTF-8, and returns the UTF-8-decoded version of the string. It throws an error when malformed UTF-8 is detected. (If you need to be able to decode encoded non-scalar values as well, use WTF-8 instead.)
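A matching sketch for decoding, again built on a standard API (TextDecoder) rather than on the utf8 module; decodeUtf8ByteString is a hypothetical name. Each character of the byte string is treated as one byte, and the fatal option makes TextDecoder throw on malformed UTF-8 instead of inserting U+FFFD, which approximates the error behavior described for utf8.decode(). Encoded surrogates (for example the bytes ED A0 80) are also rejected, because a strict UTF-8 decoder treats them as malformed.

```javascript
// Hypothetical helper (NOT part of the utf8 module): approximates the
// documented behavior of utf8.decode() using the built-in TextDecoder.
function decodeUtf8ByteString(byteString) {
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    const code = byteString.charCodeAt(i);
    if (code > 0xff) {
      throw new Error('Not a byte string: char code above 0xFF at index ' + i);
    }
    bytes[i] = code;
  }
  // fatal: true -> decode() throws a TypeError on malformed UTF-8.
  // ignoreBOM: true -> a leading U+FEFF is kept instead of being stripped.
  const decoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });
  return decoder.decode(bytes);
}

console.log(decodeUtf8ByteString('\xC2\xA9') === '\xA9'); // true
```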
// Note: JavaScript strings are sequences of UTF-16 code units.
var toUtf8 = function(text) {
    // encodeURIComponent() encodes the text as UTF-8 and percent-escapes every
    // byte that is not an unreserved URI character (e.g. '❤' becomes '%E2%9D%A4').
    // It throws a URIError if the text contains a lone surrogate.
    var encoded = encodeURIComponent(text);
    var result = '';
    for (var i = 0; i < encoded.length;) {
        var character = encoded[i];
        i += 1;
        if (character == '%') {
            // Turn the two hex digits of a "%XX" escape back into one byte.
            var hex = encoded.substring(i, i += 2);
            if (hex) {
                result += String.fromCharCode(parseInt(hex, 16));
            }
        } else {
            // Unreserved characters are ASCII, so each is already a single UTF-8 byte.
            result += character;
        }
    }
    return result;
};
// Usage example:
const utf16Text = 'I ❤️ JS'; // I ❤️ JS
const utf8Text = toUtf8(utf16Text); // I â¤ï¸ JS
console.log(utf8Text); // I â¤ï¸ JS
// See also:
// https://dirask.com/questions/What-encoding-uses-JavaScript-to-stores-string-jM73zD

// Note: JavaScript strings are sequences of UTF-16 code units.
const utf16Text = 'I ❤️ JS'; // I ❤️ JS
const utf8Text = unescape(encodeURIComponent(utf16Text)); // I â¤ï¸ JS
console.log(utf8Text); // I â¤ï¸ JS
// Warning:
//
// Although unescape() is not strictly deprecated (as in "removed from the Web standards"),
// it is defined in Annex B of the ECMA-262 standard, whose introduction states:
//
// … All of the language features and behaviors specified in this annex have one or more
// undesirable characteristics and in the absence of legacy usage would be removed from
// this specification. … … Programmers should not use or assume the existence of these
// features and behaviors when writing new ECMAScript code. …
//
// Source:
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/unescape
//
// See also:
// https://dirask.com/questions/What-encoding-uses-JavaScript-to-stores-string-jM73zD

The TextEncoder.encodeInto() method encodes a string as UTF-8 directly into an existing Uint8Array.

Parameters
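string: A string containing the text to encode.
uint8Array: A Uint8Array object instance to place the resulting UTF-8 encoded text into.

Return value

An object, which contains two members:
read: The number of UTF-16 code units from the source that have been converted to UTF-8. This may be less than string.length if uint8Array did not have enough space.
written: The number of bytes modified in the destination Uint8Array.

Encode into a specific position

encodeInto() writes its result starting at the first element of the array it receives. To encode into a different position, pass a subarray() view of the destination that starts at that position; the view shares memory with the original array.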
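A small sketch of the read/written result and of encoding at an offset. The helper name encodeIntoAtPosition is hypothetical; encodeInto() and subarray() are the standard TextEncoder and typed-array methods. The second call shows encodeInto() stopping early when the destination is too small: it does not write a partial character, so read ends up less than the string's length.

```javascript
const encoder = new TextEncoder();

// Hypothetical helper: writes the UTF-8 encoding of `string` into `u8array`
// starting at index `position`. subarray() returns a view over the same
// memory, so the bytes land in `u8array` itself.
function encodeIntoAtPosition(string, u8array, position) {
  return encoder.encodeInto(string, u8array.subarray(position));
}

// 'a' -> 1 byte, U+00A9 -> 2 bytes, U+20AC -> 3 bytes: 6 bytes in total.
const text = 'a\u00A9\u20AC';

const big = new Uint8Array(8);
const atPos = encodeIntoAtPosition(text, big, 2);
console.log(atPos.read, atPos.written); // 3 6

// Only 4 bytes are available: the euro sign (3 bytes) no longer fits,
// so encoding stops after 2 code units and 3 bytes.
const small = new Uint8Array(4);
const partial = encoder.encodeInto(text, small);
console.log(partial.read, partial.written); // 2 3
```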
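Buffer sizing

To convert a JavaScript string s, the output needs at least s.length bytes and never more than s.length * 3 bytes, because one UTF-16 code unit becomes at most three UTF-8 bytes (a surrogate pair, two code units, becomes four bytes). If the output is expected to be long-lived, it makes sense to compute a minimum allocation of s.length bytes, rounded up to your allocator's bucket size, and convert into it first. If read is less than s.length, reallocate the buffer to written + (s.length - read) * 3 bytes and convert the rest of the string, starting at index read in the string and at index written in the buffer. If the behavior of your allocator is unknown, you might want to allow up to two reallocation steps and make the first reallocation step multiply the remaining unconverted length by two instead of three. However, in that case, it makes sense not to implement the usual doubling of the already written buffer length, because if a second reallocation happened, it would always overallocate compared to the original length times three.

The above advice assumes that you don't need to allocate space for a zero terminator. That is, on the Wasm side you are working with Rust strings or a C++ class that is not zero-terminated. If you are working with zero-terminated C++ strings, reserve one extra byte for the terminator.

No zero termination

If the input string contains the character U+0000, encodeInto() writes a 0x00 byte for it in the output, and it never writes a C-style 0x00 sentinel after the last byte. If your Wasm program uses C strings, it's your responsibility to write the 0x00 sentinel, and you can't prevent your Wasm program from seeing a logically truncated string if the JavaScript string contained U+0000. Observe: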
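A sketch of the sentinel handling, assuming the caller wants a zero-terminated buffer (for example to hand to C code in a Wasm module). The helper encodeCString is hypothetical. It allocates the worst case, s.length * 3 bytes, plus one byte for the terminator, so a single encodeInto() call always converts the whole string; the embedded U+0000 in the example shows how a C reader would see a truncated string.

```javascript
const encoder = new TextEncoder();

// Hypothetical helper: encodes `string` as a zero-terminated C-style string.
// s.length * 3 bytes always suffice for the UTF-8 output; +1 for the 0x00
// sentinel, which encodeInto() does not write itself.
function encodeCString(string) {
  const buffer = new Uint8Array(string.length * 3 + 1);
  const { read, written } = encoder.encodeInto(string, buffer);
  // With the worst-case allocation the whole string is always converted.
  if (read !== string.length) throw new Error('unreachable: buffer too small');
  // Uint8Array is zero-filled already; this makes the sentinel explicit.
  buffer[written] = 0;
  return buffer.subarray(0, written + 1);
}

const bytes = encodeCString('ab\u0000cd');
console.log(Array.from(bytes)); // [ 97, 98, 0, 99, 100, 0 ]

// Simulate a C reader: it stops at the first 0x00 byte.
const seenByC = new TextDecoder().decode(bytes.subarray(0, bytes.indexOf(0)));
console.log(seenByC); // ab
```

Allocating the worst case trades memory for simplicity; for long-lived buffers, the grow-on-demand strategy from the buffer-sizing discussion avoids the overallocation.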