Domains are an integral part of the internet. Just as people write different languages using different characters or scripts, domain names can be composed wholly or partly of characters from various scripts; such names are called Internationalized Domain Names (IDNs). By combining characters from different scripts, it is possible to create labels that look like other labels. For example, Cyrillic “а” (U+0430) resembles ASCII “a” (U+0061). Malicious actors could abuse this to spoof domain names and trick users. For this reason, browsers have been very careful in deciding when to show the Unicode form of an IDN and when to show an alternative form consisting only of ASCII characters, called Punycode.
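To make the two display forms concrete, here is a minimal Python sketch (our own illustration, not browser code) that converts each label to its ASCII form using Python's built-in "punycode" codec (RFC 3492) plus the "xn--" prefix that marks an encoded label. It deliberately skips the mapping and validation steps of full IDNA processing, and the function names are hypothetical.

```python
# Minimal sketch: derive the ASCII ("Punycode") form of a domain.
# Simplification: real IDNA processing also maps and validates each label;
# this only applies the RFC 3492 Punycode algorithm to non-ASCII labels.

def to_ascii_label(label: str) -> str:
    """Return the ASCII form of a single domain label."""
    if label.isascii():
        return label.lower()
    # Python's "punycode" codec implements RFC 3492; IDNA adds "xn--".
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain: str) -> str:
    return ".".join(to_ascii_label(label) for label in domain.split("."))


def to_unicode_label(label: str) -> str:
    """Reverse the encoding for labels that carry the xn-- prefix."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


latin = "apple.com"
# Cyrillic а (U+0430), р (U+0440), р, ӏ (U+04CF), е (U+0435)
cyrillic = "\u0430\u0440\u0440\u04cf\u0435.com"

print(to_ascii_domain(latin))     # apple.com
print(to_ascii_domain(cyrillic))  # xn--80ak6aa92e.com
```

In many fonts the Cyrillic label renders like "apple", but its ASCII form cannot be mistaken for apple.com, which is why falling back to Punycode defeats this kind of spoof.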

Recently, the work of Xudong_Zheng revealed such an IDN homograph phishing possibility. It exploited a case that was not covered by the checks that determine how an IDN is displayed in the address field and other UI. Chromium fixed this quite quickly. Because Opera has been participating in the Chromium project since 2013, we have now merged this fix into all channels, and a stable channel update of the Opera Desktop browser will be released in the coming week.

Although a few demonstration websites have been set up to show how this attack might work, real attacks are much less likely. To use this for phishing, an attacker would need to get past a domain registrar’s checks in order to register a domain that appears identical to the domain name they are trying to imitate. Domain registrars are now aware of this case and should improve their checks accordingly.

We hope this information is helpful for you. Please feel free to leave your feedback in the comment box below if you have any concerns. Browse safe.

25 April 2017 edit: The update has been released.

  • ratamies

    This post doesn’t contain details about the fix, so here is what I found.
    Previously, Chromium hid the Unicode form when a domain contained characters from different scripts. That still allowed attacks in which every character is replaced with a similar-looking character from a single other script.
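
    As an illustration of why the old rule missed this case, here is a rough Python sketch of a mixed-script check (not Chromium's code, which relies on ICU). Python's standard library exposes no Unicode Script property, so the sketch approximates a character's script from the first word of its Unicode name; that heuristic and the function names are assumptions.

```python
import unicodedata


def rough_script(ch: str) -> str:
    # Approximation: no Script property in the stdlib, so use the first word
    # of the character name ("LATIN", "CYRILLIC", ...).
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_mixed_script(label: str) -> bool:
    # Only letters count; digits and hyphens are shared across scripts.
    scripts = {rough_script(ch) for ch in label if ch.isalpha()}
    return len(scripts) > 1


mixed = "\u0430pple"                       # Cyrillic а followed by Latin "pple"
whole = "\u0430\u0440\u0440\u04cf\u0435"  # а р р ӏ е, all Cyrillic

print(is_mixed_script(mixed))  # True: a mixed-script rule would show Punycode
print(is_mixed_script(whole))  # False: passes the rule, yet looks like "apple"
```

    A label written entirely in one script sails through a mixed-script test, however closely it imitates a Latin name.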

    This was partly fixed now ( https://codereview.chromium.org/2683793010 ):
    “Block domain labels made of Cyrillic letters that look alike Latin
    Block a label made entirely of Latin-look-alike Cyrillic letters when the TLD is not an IDN (i.e. this check is ON only for TLDs like ‘com’, ‘net’, ‘uk’, but not applied for IDN TLDs like рф.”
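
    The rule quoted above can be sketched roughly as follows. This is a simplified Python illustration, not Chromium's implementation: the set of look-alike letters is a hypothetical hand-picked subset, and labels are assumed to be lowercased already, as IDNA processing does.

```python
# Hypothetical subset of Cyrillic letters resembling Latin ones;
# NOT the set Chromium actually uses.
LATIN_LOOKALIKE_CYRILLIC = set(
    "\u0430\u0441\u0435\u0456\u0458\u043e\u0440\u0455\u0445\u0443\u04bb\u0501\u04cf"
)  # а с е і ј о р ѕ х у һ ԁ ӏ


def tld_is_idn(domain: str) -> bool:
    # An IDN TLD such as рф contains non-ASCII characters.
    return not domain.rstrip(".").rsplit(".", 1)[-1].isascii()


def blocked_as_cyrillic_lookalike(domain: str) -> bool:
    """True if a non-TLD label consists only of Latin-look-alike Cyrillic
    letters and the TLD is not an IDN (e.g. com, net, uk; not рф)."""
    if tld_is_idn(domain):
        return False
    labels = domain.rstrip(".").split(".")[:-1]
    return any(label and set(label) <= LATIN_LOOKALIKE_CYRILLIC for label in labels)


spoof = "\u0430\u0440\u0440\u04cf\u0435"  # аррӏе
print(blocked_as_cyrillic_lookalike(spoof + ".com"))           # True: show Punycode
print(blocked_as_cyrillic_lookalike(spoof + ".\u0440\u0444"))  # False: рф is an IDN TLD
print(blocked_as_cyrillic_lookalike("\u043f\u0440\u0438\u043c\u0435\u0440.com"))  # False
```

    The last call uses пример ("example"), which contains letters such as п and и that have no Latin twin, so it is still shown in Unicode; only labels built entirely from look-alikes fall back to Punycode.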

    Why partly? Because it is hard to detect single-script homographs programmatically. Firefox even decided not to fix this bug and to rely on domain registrars instead ( https://bugzilla.mozilla.org/show_bug.cgi?id=1332714#c78 )

    And here are some more comments on why Firefox didn’t apply the Chromium fix ( https://wiki.mozilla.org/Gerv's_IDN_Display_Algorithm_FAQ#Why_doesn.27t_Firefox_implement_Chrome.27s_fix.3F ):
    “Chrome’s “fix” is very specific to the issue of Cyrillic/Latin spoofing (whereas there are many other troublesome combinations) and, while they have attempted to reduce the scope, still treats that script as second-class. Their changes makes 2,800 legitimately-registered domains in .com alone stop displaying properly. If one of them was the name of your company, you would be justifiably upset.”

    • tarquinwj

      IDN homograph attacks have been known for quite some time, and the approach to IDN support is continuously improved as new possibilities for confusion are found. Some parts of the solution will inevitably target specific languages more than others, in the same way that some attacks will do the same. For details of how this issue has been fixed, and of how IDN support is implemented in general, you can refer to https://www.chromium.org/developers/design-documents/idn-in-google-chrome; we will implement the fix in the same way as Chrome.

    • tarquinwj

      A separate note about the legitimate domain names that can no longer be displayed as IDN. Support for IDN will always require some compromise between allowing websites to have domain names in their natural language and preventing users from becoming confused when two domain names look the same because of IDN homographs. When two scripts contain characters that look the same, the browser has to decide which one to display as an IDN and which one to display in some other format, such as Punycode.

      Browsers that support IDN try as far as possible to display the domain name as IDN. However, there will be cases where the browser cannot decide whether a domain name is a valid domain name in one script or a phishing attempt that uses homograph characters from another script. The demo used “apple”, but it could just as easily have used a valid word written in Cyrillic that could be spoofed using Latin characters; spoofing is possible in both directions. An example would be “гораагора.com” in Cyrillic script, which could be spoofed using the Latin characters “ropaaropa.com”.
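
      One way to see that the spoof works in both directions is to map each label to a "skeleton" of look-alike characters and compare. The Python sketch below borrows the skeleton idea from Unicode Technical Standard #39, but its mapping table is a tiny hypothetical subset chosen for this example, not the real confusables data.

```python
# Tiny hypothetical subset of Cyrillic -> Latin look-alikes. UTS #39 defines
# the real "skeleton" using the full confusables.txt data.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a",  # а
    "\u0441": "c",  # с
    "\u0435": "e",  # е
    "\u043e": "o",  # о
    "\u0440": "p",  # р
    "\u0433": "r",  # г
    "\u0445": "x",  # х
    "\u0443": "y",  # у
})


def skeleton(label: str) -> str:
    # Latin letters map to themselves; mapped Cyrillic letters become Latin.
    return label.lower().translate(CYRILLIC_TO_LATIN)


def confusable(a: str, b: str) -> bool:
    # Two different labels are confusable if their skeletons coincide.
    return a != b and skeleton(a) == skeleton(b)


cyrillic_word = "\u0433\u043e\u0440\u0430\u0430\u0433\u043e\u0440\u0430"  # гораагора
latin_word = "ropaaropa"

print(skeleton(cyrillic_word))                # ropaaropa
print(confusable(cyrillic_word, latin_word))  # True
print(confusable(latin_word, cyrillic_word))  # True: symmetric
```

      The comparison is symmetric, so it cannot say which of the two names is the legitimate one; the browser still needs a policy for which form to display as Unicode.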

      The solution is to ensure that when a spoof is possible, one of them must be displayed in some other format. From a security perspective, it does not really matter which one ends up being displayed which way, as long as a given domain name is always displayed in that way, so the user knows what to expect for the domain they are visiting, and so that it does not look the same as the spoofed version. For legacy reasons (domain names were traditionally only allowed to use Latin characters, so far more registered domain names use Latin), Latin characters are usually the ones chosen to be displayed as their own characters, while characters from other scripts are displayed in punycode instead, when a spoof is possible.

      It is unfortunate that some valid domain names cannot be displayed as IDN (they are displayed as Punycode instead), but this also prevents those domain names from being confused with their Latin equivalents, which could otherwise have been used to spoof the legitimate IDN domain name.