Back in 2011, I read an interesting paper by Artem Dinaburg on the concept of bitsquatting: registering domain names that differ from well-known, high-traffic domains by only a single bit in their in-memory representation. Another researcher, Luke Young, performed similar research in 2015 under the name Project Bitfl1p, which also culminated in a DEF CON talk. While a new DEF CON presentation isn't really the sort of thing I'm interested in, the idea behind monitoring bitsquatted domains is rather fascinating, and much has changed in the 11 years since the last deep dive into the topic.
In 2015, DDR4 RAM had only just been released a year earlier and was beginning to make its way into consumer-grade devices, while DDR3 was still alive and well. Samsung had begun mass-producing 20nm DRAM the year prior, and that process stuck around through the transition from DDR3 to DDR4. Many cheaper and slower DRAM chips were still being manufactured on 30nm and 25nm as well. In the realm of CPUs and GPUs, AMD was still firmly using a mixture of 32nm for their Piledriver μarch and 28nm processes for their Excavator cores, while Intel's previous two generations, Ivy Bridge and Haswell, were on 22nm. The eternal phoenix known as Skylake debuted later that year, bringing Intel to 14nm, where they would remain for far longer than most folks would have assumed at the time.
Neither DDR3 nor DDR4 included ECC functionality by default. ECC UDIMMs were a tiny niche in the consumer and workstation markets, while RDIMMs filled the servers of the era. Most consumer equipment had no error correction, and thus single-bit errors often went undetected. At that point in time, billions of smartphones were being sold along with hundreds of millions of laptops and desktops, all ripe candidates for single-event upsets (SEUs). In the roughly six months that Luke Young ran his experiment, he collected more than 2 million HTTP requests and dozens of gigabytes of DNS logs.
In our present era, technology has changed in many ways (and in some ways hasn't changed at all, but that's a different topic :) AMD debuted their Zen architecture in 2017 on a 14nm process, and the most recent revision, Zen 5, is manufactured on TSMC's N4P (4nm-class) process. Even Intel finally broke their 14nm curse (and also renamed all of their processes to sound much smaller) and is currently shipping what they call 18A (1.8nm). The current leading-edge mass-market DRAM process is down from 20nm to 12nm, though manufacturing DRAM is a very different problem from manufacturing CPUs. It's worth noting that these process names are effectively meaningless marketing speak. Fabs have achieved considerably reduced feature sizes over the last 10 years, but there is no direct relationship between the marketing monikers (3nm, "18A", etc.) and the actual physical gate length or pitch of these processes. Arguably this was also true, to an extent, 10 years ago, but now we're truly deep in the "call it whatever makes it sell better" era.
The painful truth about our new world of smaller silicon, massive caches, and far more transistors is that reliability is shrinking too. As 10nm and below have become mainstream, there's been a massive uptick in problems with leakage currents and quantum tunnelling, both of which contribute to increased rates of single-event upsets. Error correction has always been a vital part of making the digital world appear consistent and reliable, but the analog world that underpins it has been encroaching ever closer, requiring even consumer-grade equipment to feature ECC in places where it traditionally has been ignored. CPU caches (at least L2 and lower) now include ECC, and DDR5 DRAMs also have on-die ECC. The DDR5 ECC requirement is notable because it's distinct from traditional ECC at the DIMM level: bitflips at the die level aren't reported. On-die ECC doesn't replace traditional ECC DIMMs, but instead is necessary to guarantee acceptable error rates comparable to DDR4 and older process nodes.
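The core idea behind all of these ECC schemes is the same: store a few extra parity bits alongside the data so that any single flipped bit can be located and corrected. As a toy illustration (real on-die and DIMM-level ECC uses wider SECDED codes over 64- or 128-bit words, not this textbook code), here's a Hamming(7,4) encoder/decoder that corrects any single-bit upset:

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout (1-indexed positions): p1 p2 d1 p3 d2 d3 d4,
    with each parity bit chosen for even parity over its group.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Return (data bits, syndrome). A nonzero syndrome is the
    1-indexed position of the flipped bit, which gets corrected."""
    p1, p2, d1, p3, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    syndrome = s1 + 2 * s2 + 4 * s3
    c = list(c)
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the offending bit back
    return [c[2], c[4], c[5], c[6]], syndrome

if __name__ == "__main__":
    code = hamming74_encode([1, 0, 1, 1])
    code[4] ^= 1  # simulate a single-event upset
    data, pos = hamming74_decode(code)
    print(data, "corrected bit at position", pos)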
These technological evolutions raise the question: are single-event upsets and bitflips more commonplace on modern hardware, or have designers done enough work to hide the beastly analog world from us for a little while longer?
For this experiment, I've chosen googleusercontent.com as my bitsquatting target. Google has specifically stated that they have no interest in "[playing] an unbounded game of whack-a-mole" and, to my benefit, of the 80 possible variants of googleusercontent.com, all but 3 were still available for registration as of 2026-03-05.
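A quick sketch of where that variant count comes from: enumerate every single-bit flip of each byte in the googleusercontent label and keep only the results that are still valid hostname characters, discarding case-only flips since DNS is case-insensitive. (This counts flips within the second-level label only; write-ups differ on whether flips in the dot or the TLD should be included, so totals vary slightly between them.)

```python
import string

# Characters legal in a hostname label; note this sketch skips the
# additional rule that a label can't begin or end with a hyphen.
VALID = set(string.ascii_lowercase + string.digits + "-")

def bitflip_variants(label):
    """Return all distinct labels that differ from `label` by exactly
    one bit and remain valid, case-normalized hostname labels."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            # Skip case-only flips (DNS is case-insensitive) and
            # anything outside the legal hostname alphabet.
            if flipped == ch or flipped not in VALID:
                continue
            variants.add(label[:i] + flipped + label[i + 1:])
    return sorted(variants)

if __name__ == "__main__":
    for v in bitflip_variants("googleusercontent"):
        print(v + ".com")
```

Many of the results (gooeleusercontent, googleuse2content, google5sercontent, and friends) look like ordinary typosquats at first glance, but each is exactly one bit away from the real thing in memory.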
"Our domains team let us know they won't be trying to grab these, so you can just let them expire... The sheer number of bit-flipping possibilities makes this an unbounded game of whack-a-mole."
- Google's response to Luke Young attempting to hand over the bitsquatted domains
Another thing that's changed since 2015: most mainstream domain registrars now have much stricter phishing and scam filters, and registering domains containing the word google at a low cost proved surprisingly difficult. Every registrar that offered discounted .com registrations either cancelled and refunded my orders or outright blocked me from registering (I'm looking at you, Namecheap - turning the "View Cart" button into "BANNED" is rather amusing). After several false starts, I finally managed to register the first batch of domains with Porkbun. Porkbun has a very pleasant and classic user interface, but unfortunately charges a hefty $11.08 per .com registration, bringing the grand total for this project to at least $853.16 (at least, once I catch 'em all). If you find this work interesting, I'm always open to some research donations; just shoot me an email ;)
The first batch of 13 domains went live on 2026-03-05. Now, in 2026, the internet's "background radiation" problem is incredibly bad. Ever since my first exposure to the world of internet-facing hosts on a meager shared shellbox running FreeBSD 4, I've watched this background radiation worsen year by year. It was especially bad back when OpenVZ containers were still popular; just imagine what it was like for hundreds of OpenSSH daemons filling up the better part of a /24, all running on a single machine and getting bruteforced simultaneously. In recent years, though, it's reached entirely new levels. One major factor is an uptick in aggressively unethical scraping for LLM training purposes, but there's also a vast swarm of shotgun-style exploit attempts ranging from Mirai-derived botnets attempting to spread themselves to numerous "port scan database" sources like Shodan and Qualys. Literally within minutes of bringing an httpd up on the bitflip project VM, I was seeing several requests per second hitting the machine. (In the Seattle PoP alone, with just a single IPv4 /24 announced, we see upwards of 300-400 connection attempts per second at any given time.) I also issued Let's Encrypt wildcard certificates for each of the domains, and again, within minutes of issuing new certificates, scanner traffic picked up steam. I suspect that many scrapers and vuln scanners are monitoring certificate transparency logs for fresh targets, since SNI vhosts are the norm these days, and blindly hitting hosts by IP is far less likely to get a positive hit on a vulnerable target.
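To see roughly what those scanners see, you can query a public CT search frontend such as crt.sh for certificates covering a domain. A minimal sketch follows; the JSON endpoint and its common_name field are assumptions based on crt.sh's public interface, and a serious monitor would consume the CT logs directly rather than hammering a shared frontend:

```python
import json
import urllib.parse
import urllib.request

def crtsh_query_url(domain):
    """Build a crt.sh search URL; %.domain matches the domain
    itself and all of its subdomains."""
    qs = urllib.parse.urlencode({"q": "%." + domain, "output": "json"})
    return "https://crt.sh/?" + qs

def recent_names(domain):
    """Fetch certificate entries and return the distinct common names
    seen for this domain (network access required)."""
    with urllib.request.urlopen(crtsh_query_url(domain)) as resp:
        entries = json.load(resp)
    return sorted({e["common_name"] for e in entries})

if __name__ == "__main__":
    print(recent_names("example.com"))
```

A scraper doing the same thing against the live CT log stream learns about a freshly issued wildcard certificate, and therefore a freshly provisioned vhost, within minutes, which matches the traffic pattern I observed.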
Due to the sheer volume of bot traffic, I wrote some quick tools to filter out the undesired access logs using a combination of UA, URI, and IP range filtering. In this particular case, since googleusercontent.com delivers most of its content from a handful of subdomains and generally uses fairly distinctive paths, filtering signal from all this noise is relatively easy.
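A minimal sketch of that kind of triage, with illustrative placeholder filter lists rather than the real ones used in the project:

```python
import ipaddress

# Placeholder filter lists for illustration only.
BAD_UAS = ("zgrab", "censys", "masscan", "python-requests")
BAD_PATH_PREFIXES = ("/.env", "/wp-", "/cgi-bin", "/boaform")
BAD_NETS = [ipaddress.ip_network(n)
            for n in ("192.0.2.0/24", "198.51.100.0/24")]

def is_bot(addr, path, ua):
    """Classify a request as bot traffic if it matches a known scanner
    user agent, a shotgun-exploit path, or a filtered IP range."""
    ua_l = ua.lower()
    if any(s in ua_l for s in BAD_UAS):
        return True
    if any(path.startswith(p) for p in BAD_PATH_PREFIXES):
        return True
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BAD_NETS)

def filter_log(entries):
    """entries: iterable of (addr, path, user_agent) tuples;
    returns only the requests that survive all three filters."""
    return [e for e in entries if not is_bot(*e)]
```

Since legitimate googleusercontent.com traffic targets distinctive paths under a handful of subdomains, even this blunt three-way filter removes the overwhelming majority of the noise.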
At the time of writing (2026-03-31), I've just purchased the second set of 22 domains, provisioned their DNS zones, and issued certificates as needed. The existing access logs are only ~254,000 entries raw, but with a 99.97% filter rate that leaves a mere 75 legitimate hits:
```
statistics:
  hits            254221
  after filter    75
  bot rate        99.97%
  addrs filtered  767
  UAs filtered    15
  paths filtered  247
```
I've also been building a small custom DNS server for this project, which logs every query it receives for a bitflipped googleusercontent.com hostname.
I'd especially like to deploy this DNS server on the zv.is anycast network, perhaps to see whether there are any geographic patterns linked to the highest concentration of bitflips. In the preliminary tiny dataset, the #1 most frequent country of origin has been Thailand, closely followed by India. India I can understand, given the density of mobile devices with no ECC, but Thailand is a bit of a surprising outlier. Only time will tell whether it's just statistical noise.