VigilDNS

Homoglyph attacks: when the letters themselves lie

A homoglyph attack builds a fake domain out of characters that look like the real ones: a zero for an o, the pair rn for an m, or a Cyrillic letter that renders pixel-identical to its Latin twin. The result can be a domain that is byte-for-byte different from yours yet visually indistinguishable, even to careful readers.

Homoglyph, homograph: what the terms mean

A homoglyph is a character that looks like another character. A homograph attack (often called an IDN homograph attack when Unicode is involved) is the use of homoglyphs to construct a deceptive string, most importantly a domain name. The two terms are used almost interchangeably in practice. Homoglyph substitution is one of the core techniques in the broader family covered in "What is typosquatting", but it deserves its own treatment because it defeats the usual advice of "read the URL carefully."

ASCII homoglyphs: no Unicode required

The oldest tricks stay inside plain ASCII, so they work in every system and bypass every Unicode defense: a zero for an o, a digit one or capital I for a lowercase l, rn for m, and vv for w.

How convincing these are depends heavily on font and rendering. In a proportional font at email-client sizes, rn versus m is genuinely hard to catch.
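
As a rough illustration of how quickly these ASCII swaps multiply, the sketch below enumerates every spelling of a label with at least one lookalike substituted. The mapping is limited to the pairs mentioned in this article, and the names `ASCII_LOOKALIKES` and `ascii_variants` are hypothetical, chosen for this example only.

```python
from itertools import product

# Illustrative ASCII-only substitutions, limited to the pairs named in this article.
ASCII_LOOKALIKES = {
    "o": ["0"],
    "l": ["1", "I"],
    "m": ["rn"],
    "w": ["vv"],
}

def ascii_variants(label: str) -> set[str]:
    """Return every spelling of `label` with at least one lookalike swapped in."""
    # For each character, the choices are the original plus any lookalikes.
    choices = [[ch] + ASCII_LOOKALIKES.get(ch, []) for ch in label]
    variants = {"".join(combo) for combo in product(*choices)}
    variants.discard(label)  # the untouched spelling is not a lookalike
    return variants

if __name__ == "__main__":
    for variant in sorted(ascii_variants("moon")):
        print(variant)
```

Even a four-letter label with three substitutable characters yields several variants, and the count grows multiplicatively with each additional substitutable letter.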

IDN homographs and punycode

Internationalized Domain Names (IDNs) let domains contain non-ASCII characters, which is essential for most of the world's languages. The side effect: many Unicode characters are visually identical to Latin letters. Cyrillic а (U+0430) and Latin a (U+0061) typically render exactly the same, yet they are different characters, so a domain spelled with a Cyrillic first letter is a completely different domain from the real one.
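
One way to see the difference that the eye cannot is to print code points. The sketch below uses Python's standard `unicodedata` module; the helper names `describe` and `differing_positions`, and the label `example`, are assumptions made for illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def differing_positions(real: str, suspect: str) -> list[tuple[int, str, str]]:
    """List positions where two equal-length strings differ, with descriptions."""
    return [
        (i, describe(a), describe(b))
        for i, (a, b) in enumerate(zip(real, suspect))
        if a != b
    ]

if __name__ == "__main__":
    genuine = "example"
    lookalike = "ex\u0430mple"  # third character is Cyrillic U+0430, not Latin a
    print(genuine == lookalike)  # the strings are not equal
    for position, real_char, fake_char in differing_positions(genuine, lookalike):
        print(position, real_char, "vs", fake_char)
```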

Under the hood, DNS only carries ASCII, so IDNs are encoded as punycode with the prefix xn--. The Cyrillic-a version of a domain looks like xn--cmebank-... in raw DNS. That encoded form is your friend: it is unambiguous, and seeing an unexpected xn-- where a familiar brand should be is a reliable red flag.
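
A minimal sketch of that conversion, assuming Python's built-in `idna` codec is acceptable for a quick check (it implements the older IDNA 2003 rules, so production tooling may prefer a dedicated library). The domain `example.com` and the helper names are hypothetical.

```python
def to_punycode(domain: str) -> str:
    """Encode a domain to its ASCII form using Python's built-in IDNA codec."""
    return domain.encode("idna").decode("ascii")

def has_encoded_label(domain: str) -> bool:
    """True if any label of the encoded domain carries the xn-- prefix."""
    encoded = to_punycode(domain).lower()
    return any(label.startswith("xn--") for label in encoded.split("."))

if __name__ == "__main__":
    for name in ("example.com", "ex\u0430mple.com"):  # second uses Cyrillic U+0430
        print(to_punycode(name), has_encoded_label(name))
```

A pure-ASCII domain passes through unchanged, while the lookalike comes back with an xn-- label that no font can disguise.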

A small table of confusables

Genuine   | Lookalike           | What it is
o         | 0                   | ASCII digit zero
l         | 1 or I              | ASCII digit one, capital I
m         | rn                  | Two ASCII characters
w         | vv                  | Two ASCII characters
a (Latin) | а (Cyrillic U+0430) | IDN homograph
e (Latin) | е (Cyrillic U+0435) | IDN homograph
o (Latin) | о (Cyrillic U+043E) | IDN homograph

Unicode's confusables data lists thousands of such pairs across scripts; this table is only a taste of the space an attacker can draw from.

How browsers defend, and where that fails

Modern browsers apply IDN display policies: if a domain mixes scripts suspiciously (say, one Cyrillic letter inside an otherwise Latin label), the address bar shows the raw punycode (xn--...) instead of the deceptive glyphs. These rules, tightened after high-profile proof-of-concept domains, make pure IDN homographs much less effective in the URL bar itself.
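
The mixed-script idea can be sketched in a few lines. This is a deliberate simplification and not any browser's actual policy: Python's standard library does not expose the Unicode Script property, so the example approximates a character's script from the first word of its Unicode name. Real policies also allow certain legitimate script combinations that this heuristic would flag. The function names are hypothetical.

```python
import unicodedata

def scripts_in(label: str) -> set[str]:
    """Approximate the scripts in a label from the first word of each Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared across scripts
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(label: str) -> bool:
    """True if the label appears to draw letters from more than one script."""
    return len(scripts_in(label)) > 1

if __name__ == "__main__":
    for label in ("example", "ex\u0430mple"):  # second has one Cyrillic letter
        print(sorted(scripts_in(label)), is_mixed_script(label))
```

Note that a label written entirely in one non-Latin script is not mixed, so this check alone would not flag it.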

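The defenses fail where the link is actually read, which is usually not the address bar. Those display rules do not extend to email clients, chat apps, and link previews, so a reader there may see the deceptive glyphs rather than the punycode.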

How detection works

Defenders cannot rely on eyes, so detection is computational. Monitoring systems generate the homoglyph permutations of a brand domain (using confusables mappings), then check which ones are registered, resolve in DNS, hold TLS certificates, or serve content. Normalization techniques map lookalike characters back to a canonical skeleton so that the Cyrillic and Latin spellings collide and get flagged. Certificate Transparency monitoring is especially useful here, since a homograph domain getting a certificate is a strong pre-attack signal, and the CT entry contains the unambiguous punycode form. VigilDNS includes homoglyph generation among its 11 permutation techniques and matches CT log entries against them continuously.
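
The normalization step can be illustrated with a toy skeleton function. This is a sketch under stated assumptions, not VigilDNS's implementation and not the full Unicode confusables algorithm: the mapping is a small hand-picked subset, and `CONFUSABLES`, `MULTI_CHAR`, `skeleton`, and `flag_lookalikes` are hypothetical names.

```python
# Hand-picked subset for illustration; a real system would load the full
# Unicode confusables data rather than this short table.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "0": "o",
    "1": "l",
}
MULTI_CHAR = {"rn": "m", "vv": "w"}

def skeleton(label: str) -> str:
    """Map lookalike characters and sequences to one canonical spelling."""
    out = "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())
    for sequence, target in MULTI_CHAR.items():
        out = out.replace(sequence, target)
    return out

def flag_lookalikes(brand: str, observed: list[str]) -> list[str]:
    """Return observed labels that differ from the brand but share its skeleton."""
    target = skeleton(brand)
    return [name for name in observed if name != brand and skeleton(name) == target]

if __name__ == "__main__":
    seen = ["example", "ex\u0430mple", "examp1e", "exarnple", "sample"]
    for hit in flag_lookalikes("example", seen):
        print(hit.encode("idna").decode("ascii"))  # print the unambiguous form
```

Because the brand and each observed label go through the same mapping, the Cyrillic, digit, and rn spellings all collapse to the same skeleton and are flagged, while an unrelated label is not.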

Defending your brand

There is no single fix, so layer three things. Monitor the permutation space continuously, because registration is the attacker's first observable move. Train users to treat links in email and chat as untrusted regardless of how the text looks, since looking carefully is exactly what homoglyphs defeat. Register key variants defensively, but understand that this is partial at best: with thousands of confusable combinations across scripts and TLDs, the space is far too large to buy out. If a lookalike does appear against your domain, follow the steps in "Someone registered a lookalike of my domain".

Frequently asked questions

Are IDN homograph attacks still possible in modern browsers?

Yes, though mostly outside the address bar. Pure IDN homograph tricks in the address bar are largely blocked by punycode display rules, but those defenses do not extend to email clients, chat apps, and link previews, and ASCII homoglyphs like rn for m are unaffected entirely. The attack surface moved; it did not disappear.

How can I check whether a domain uses Unicode lookalikes?

Look at its punycode form. If converting the domain yields an xn-- prefix you did not expect, it contains non-ASCII characters. CT log records and raw DNS always show the punycode form, which is unambiguous.

Does registering my domain in other scripts protect me?

Only slightly. You can register a handful of obvious variants, but confusable characters across Cyrillic, Greek, and other scripts multiply with every letter in your name. Monitoring the space is the scalable defense.

Curious which homoglyph and typo variants of your domain are already registered? Run the free typosquat checker and see in seconds.