Text and Internationalization support in Opera Presto 2.7
Unicode character set support in Opera
Opera can work with all the characters in the Unicode specification.
- All text communicated to Opera from the network is converted into Unicode.
- To render Unicode characters, Opera needs the required glyphs to be available in the fonts installed on your system.
This can be a problem on older Windows systems. For information on available fonts, see
Unicode fonts for Windows computers.
- The Unicode character data tables have been updated from Unicode v5.0.0 to v5.1.0.
- The Uniblocks table now supports ranges outside Unicode plane 0. This is needed for correct font-switching of
characters that lie outside plane 0.
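Why a block table needs ranges beyond plane 0 can be illustrated with a minimal sketch. The table layout and the names `BLOCKS`, `plane`, and `block_name` are assumptions made for this example and do not describe Opera's internal Uniblocks table; only the block ranges themselves come from the Unicode block definitions.

```python
from bisect import bisect_right

# Hypothetical block table: (first code point, last code point, block name).
# The last two entries lie outside plane 0 and need more than 16 bits.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x10000, 0x1007F, "Linear B Syllabary"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def plane(code_point):
    """Return the Unicode plane number (0 is the Basic Multilingual Plane)."""
    return code_point >> 16


def block_name(code_point):
    """Return the name of the block containing code_point, or None."""
    i = bisect_right(_STARTS, code_point) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= code_point <= last:
            return name
    return None
```

A font-switching routine could use a lookup like `block_name(0x20021)` to pick a font that covers the block when the current font lacks the glyph. With a table restricted to plane 0, the same lookup would find nothing for that code point.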
Opera implements the following improvements to writing-system support:
- font-switching: needed in order to display characters that the current font does not include
- line-breaking: needed in order to break scripts written without spaces, such as Chinese, Japanese, and
Korean
- CJK: improved line height and underlining in Chinese, Japanese, and Korean
- KDDI emojis: improved support for KDDI emojis and special characters
- Multistyle: improved default fonts for non-western Web pages
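The line-breaking item can be illustrated with a deliberately simplified sketch. It is not Opera's algorithm and ignores most of the rules a real implementation needs (for example, no break before closing punctuation). The function names and the choice of code point ranges are assumptions made for this example.

```python
def is_cjk(ch):
    """Rough test for characters from scripts usually written without spaces."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def break_opportunities(text):
    """Return the indexes i at which a line may break before text[i]."""
    points = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue                 # never break directly before a space
        if prev == " ":
            points.append(i)         # ordinary break after a space
        elif is_cjk(prev) or is_cjk(cur):
            points.append(i)         # break allowed next to a CJK character
    return points
```

For space-separated text the only break points follow the spaces, while a run of ideographs can break between any two characters, which is why a space-based breaker alone is not enough for these scripts.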
Opera relies on the operating system to perform:
- character shaping: contextual glyph selection, ligature forming, character stacking, combining character
support, etc.
Opera Presto supports the Unicode 5.2 character properties (class, casing, bidirectionality, mirroring, normalization),
updated from Unicode 5.0.
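To make these properties concrete, the following sketch queries them with Python's standard `unicodedata` module. It uses whatever Unicode version the Python build ships (see `unicodedata.unidata_version`), not Opera's tables. The helper name `describe` is hypothetical, and reading "class" as the general category is an assumption.

```python
import unicodedata


def describe(ch):
    """Collect the properties named above for a single character."""
    return {
        "class": unicodedata.category(ch),         # general category (assumed meaning of "class")
        "upper": ch.upper(),                       # casing
        "lower": ch.lower(),
        "bidi": unicodedata.bidirectional(ch),     # bidirectional class
        "mirrored": bool(unicodedata.mirrored(ch)),
        "nfc": unicodedata.normalize("NFC", ch),   # normalization
    }
```

Normalization matters mostly for sequences: `unicodedata.normalize("NFC", "e\u0301")` composes the letter and the combining acute accent into the single character U+00E9.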
Legacy encoding support
Although Opera works with the Unicode character set and its character encodings of UTF-16 and UTF-8, most text on the Internet
is encoded in legacy encodings, for instance:
- ISO 8859-1
- Windows-1251
- Shift_JIS (MIME name)
- EUC-KR
Opera handles this by detecting the character encoding used and converting the text to UTF-16. The user has three options
for how Opera handles these pages.
- Auto-detect: in this mode Opera will attempt to detect the encoding used by the page:
  - If the transport protocol provides an encoding name, that is used
  - If not, Opera will look at the page for a charset declaration
  - If this is missing, Opera will attempt to auto-detect the encoding, using the domain name to see if the script is a
    CJK script, and if so which one
  - Opera can also auto-detect UTF-8
- Writing script auto-detect: In this mode the user tells Opera that the page is, for example, Japanese or Chinese, but
that the encoding is unknown. Opera will then analyze the text in the page to determine which encoding is used.
- Encoding override: In this mode the user selects an encoding. Opera will use this encoding regardless of what
the page and the transport protocol claim the encoding to be.
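The precedence described above (user override, then transport protocol, then charset declaration, then guessing) can be sketched as follows. This is an illustration under stated assumptions, not Opera's implementation: all function names, the `TLD_GUESSES` mapping, the order of the UTF-8 check relative to the domain-name guess, and the final fallback are hypothetical. The writing script auto-detect mode is not covered.

```python
import re

# Hypothetical mapping from top-level domain to a likely legacy encoding.
TLD_GUESSES = {"jp": "shift_jis", "kr": "euc-kr", "cn": "gb2312", "tw": "big5"}


def charset_from_content_type(header):
    """Extract the charset parameter from a Content-Type header value."""
    if not header:
        return None
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', header, re.IGNORECASE)
    return m.group(1) if m else None


def charset_from_markup(raw):
    """Look for a charset declaration near the start of the page."""
    head = raw[:1024].decode("ascii", errors="replace")
    m = re.search(r'charset\s*=\s*["\']?([\w.:-]+)', head, re.IGNORECASE)
    return m.group(1) if m else None


def looks_like_utf8(raw):
    """True if raw contains non-ASCII bytes and decodes cleanly as UTF-8."""
    if not any(b >= 0x80 for b in raw):
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def choose_encoding(raw, content_type=None, host="", override=None,
                    fallback="iso-8859-1"):
    """Pick an encoding name for raw page bytes."""
    if override:                                   # encoding override mode
        return override
    declared = charset_from_content_type(content_type) or charset_from_markup(raw)
    if declared:                                   # transport first, then page
        return declared
    if looks_like_utf8(raw):                       # UTF-8 auto-detection
        return "utf-8"
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_GUESSES.get(tld, fallback)          # domain-name guess (assumed)
```

The chosen name can then be passed to `raw.decode(...)` to obtain Unicode text, for example `raw.decode(choose_encoding(raw, host="example.jp"))`.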
Big5-HKSCS support has been updated to the HKSCS-2008 encoding standard.
Support for bidirectional text
Opera supports bidirectional text as described in the Unicode, HTML, and CSS specifications.