Text and Internationalization support in Opera Presto 2.7
Unicode character set support in Opera
Opera can work with all the characters in the Unicode specification.
- All text communicated to Opera from the network is converted into Unicode.
- To render a Unicode character, Opera needs a font on your system that contains the corresponding glyph. Older Windows
systems may lack such fonts. For information on available fonts, see
Unicode fonts for Windows computers.
- Updated Unicode character data tables from Unicode v5.0.0 to v5.1.0.
- The Uniblocks table now supports ranges outside Unicode plane 0, which is needed for proper font-switching of
characters outside that plane.
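As a rough illustration of why a block table must cover ranges beyond plane 0, the sketch below maps a character to its Unicode plane and looks it up in a small block table. The table is a hypothetical, heavily abbreviated stand-in for a real block list (the ranges shown match Unicode's published blocks), and `plane_of` and `block_of` are illustrative names, not Opera's internal API.

```python
from bisect import bisect_right

# Hypothetical, abbreviated block table: (first code point, last code point, name).
# A real table covers every block; the last two entries lie outside plane 0.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def plane_of(ch):
    """Return the Unicode plane of a single character (0 is the BMP)."""
    return ord(ch) >> 16


def block_of(ch):
    """Return the block name for ch, or None if the sample table lacks it."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None
```

A font-switching routine could use the block name as a key for choosing a fallback font. A table limited to plane 0 would return nothing for a character such as U+20000, so no block-based fallback could be chosen for it.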
Opera implements the following writing-system-related functionality and improvements:
- font-switching: needed in order to display characters that the current font does not include
- line-breaking: needed in order to break scripts written without spaces, such as Chinese and Japanese
- CJK: improved line height and underlining in Chinese, Japanese, and Korean
- KDDI emojis: improved support for KDDI emojis and special characters
- Multistyle: improved default fonts for non-western Web pages
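To make the line-breaking item concrete, here is a minimal sketch of finding break opportunities in text that mixes space-separated words with CJK characters. It is a deliberate simplification, not Opera's algorithm: it assumes a break is allowed after a space and on either side of a kana or CJK ideograph, and it ignores the punctuation rules that a full implementation (for example one following Unicode Standard Annex #14) would apply. The function names are hypothetical.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """True for Hiragana, Katakana, and CJK Unified Ideographs (simplified)."""
    cp = ord(ch)
    return 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF


def break_opportunities(text: str) -> List[int]:
    """Return indices i such that a line may break before text[i]."""
    points = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue  # assumption: never start a new line with a space
        if prev == " " or is_cjk(prev) or is_cjk(cur):
            points.append(i)
    return points
```

For `"ab cd"` the only opportunity is before `c`, while `"日本語"` can break between any two characters even though it contains no spaces.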
Opera relies on the operating system to perform:
- character shaping: contextual glyph selection, ligature forming, character stacking, combining character
Opera Presto includes support for Unicode 5.2 character properties (class, casing, bidirectionality, mirroring,
normalization), updated from 5.0.
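The kinds of properties listed above can be inspected with Python's standard `unicodedata` module, as in the sketch below. Two assumptions apply: the module ships whatever Unicode version the Python build was compiled with (see `unicodedata.unidata_version`), which is not necessarily 5.2, and the general category is used here as a stand-in for "class". The helper name `describe` is illustrative.

```python
import unicodedata


def describe(ch):
    """Collect a few Unicode character properties for a single character."""
    return {
        "category": unicodedata.category(ch),         # general category, e.g. "Ll"
        "bidi_class": unicodedata.bidirectional(ch),  # e.g. "L", "R", "AL", "ON"
        "mirrored": bool(unicodedata.mirrored(ch)),   # mirrored in right-to-left text
        "upper": ch.upper(),                          # casing
        "nfd": [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
    }
```

For example, `describe("é")` reports a lowercase letter whose canonical decomposition is U+0065 followed by U+0301, and `describe("(")` reports a mirrored character.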
Legacy encoding support
Although Opera works with the Unicode character set and its character encodings of UTF-16 and UTF-8, most text on the Internet
is encoded in legacy encodings, for instance:
- ISO 8859-1
- Shift_JIS (MIME name)
Opera handles this by detecting the character encoding used, and converting it to UTF-16. The user has three options for
how to handle these pages.
- Auto-detect: in this mode Opera will attempt to detect the encoding used by the page.
  - If the transport protocol provides an encoding name, that is used.
  - If not, Opera will look at the page for a charset declaration.
  - If this is missing, Opera will attempt to auto-detect the encoding, using the domain name to see if the script is a
    CJK script, and if so which one.
  - Opera can also auto-detect UTF-8.
- Writing script auto-detect: in this mode the user tells Opera that the page is Japanese or Chinese, but that the
  encoding is unknown. Opera will then analyze the text in the page to determine which encoding is used.
- Encoding override: in this mode the user selects an encoding. Opera will use this encoding regardless of what
  the page and the transport protocol claim the encoding to be.
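The precedence described in the options above can be sketched as a small cascade. This is an illustration under stated assumptions rather than Opera's implementation: the domain-to-encoding table `TLD_GUESS`, the ISO 8859-1 fallback, the position of the UTF-8 check before the domain hint, and the names `pick_encoding` and `to_utf16` are all hypothetical. Writing-script auto-detection, which analyzes the page text, is not modeled.

```python
import codecs
import re

# Hypothetical top-level-domain hints; the real heuristics are not documented here.
TLD_GUESS = {"jp": "shift_jis", "kr": "euc_kr", "cn": "gb2312", "tw": "big5"}

# Finds charset=... inside a <meta> tag near the start of the page.
META_RE = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.I)


def _known(name):
    """True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def pick_encoding(body, http_charset=None, hostname="", override=None):
    """Choose an encoding name for a page body given as bytes."""
    if override:                                  # encoding override wins outright
        return override
    if http_charset and _known(http_charset):     # 1. transport protocol
        return http_charset
    match = META_RE.search(body[:1024])           # 2. charset declaration in the page
    if match and _known(match.group(1).decode("ascii")):
        return match.group(1).decode("ascii")
    try:                                          # 3a. non-ASCII bytes that are valid UTF-8
        body.decode("utf-8")
        if any(b >= 0x80 for b in body):
            return "utf-8"
    except UnicodeDecodeError:
        pass
    tld = hostname.rsplit(".", 1)[-1].lower()     # 3b. domain-name hint for CJK
    return TLD_GUESS.get(tld, "iso-8859-1")       # assumed fallback


def to_utf16(body, **kwargs):
    """Decode with the chosen encoding and re-encode as UTF-16 (little-endian)."""
    text = body.decode(pick_encoding(body, **kwargs), errors="replace")
    return text.encode("utf-16-le")
```

For instance, `pick_encoding(page_bytes, http_charset="Shift_JIS")` returns the transport-level name without inspecting the page, while passing `override="big5"` ignores both the header and any declaration in the page.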
Big5-HKSCS support has been updated to the HKSCS-2008 encoding standard.
Support for bidirectional text
Opera supports bidirectional text as described in Unicode,
HTML, and CSS.
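One building block of bidirectional text handling is deciding a paragraph's base direction. The sketch below uses a first-strong-character heuristic loosely modeled on rules P2 and P3 of the Unicode Bidirectional Algorithm (UAX #9). It is a simplification that ignores directional isolates and embeddings, and it covers neither the full reordering algorithm nor the HTML `dir` attribute or the CSS `direction` property. The function name is hypothetical.

```python
import unicodedata


def base_direction(text, default="ltr"):
    """Return "ltr" or "rtl" based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":            # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if bidi in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic letters)
            return "rtl"
    return default                 # no strong character found
```

Digits and spaces are not strongly directional, so `base_direction("123 abc")` is decided by the letter `a`, and a string of digits alone falls back to the default.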