When DNS goes bad

This year, someone in China misconfigured something, which effectively exported China’s main method of implementing blocks (man-in-the-middle DNS spoofing) semi-globally over the Global Crossing backbone for the last few weeks.

Effectively, China’s blocking went global (for certain providers).

This is a little technical, so bear with me while I try to put it into layman’s terms!

When you ask for www.somesite.com, a query is sent to your ISP’s DNS servers asking for the IP address. If those DNS servers don’t know, they in turn ask their upstream DNS servers (if they exist) and so on, until one of them asks the root servers who is responsible for that domain.
These root level servers are distributed geographically, and are the arbiters of whether a domain is resolvable or not.
If they don’t know about a domain, then essentially that domain doesn’t exist, as they are the servers that other servers rely on.
If, for instance, a root level server suddenly decided it didn’t know where it should send .cn names, then that entire section of the net would be unresolvable for anyone who used that root server.
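
To make that chain a bit more concrete, here is a minimal sketch in Python. It is a toy model, not a real resolver: the server names (`tld-com`, `ns-somesite`) and the address (taken from a range reserved for documentation) are made up for illustration, and real DNS has more moving parts than this.

```python
# Toy model of DNS delegation: each "server" only knows who to ask next.
# All names and addresses below are made up for illustration.
ROOT = {"com.": "tld-com", "cn.": "tld-cn"}  # root zone: TLD -> TLD server
SERVERS = {
    "tld-com": {"somesite.com.": "ns-somesite"},          # domain -> its name server
    "ns-somesite": {"www.somesite.com.": "192.0.2.10"},   # final answer
}


def resolve(name, root=ROOT):
    """Walk root -> TLD -> authoritative server, returning (answer, path taken)."""
    labels = name.rstrip(".").split(".")
    tld = labels[-1] + "."
    domain = ".".join(labels[-2:]) + "."
    fqdn = ".".join(labels) + "."
    path = ["root"]

    tld_server = root.get(tld)
    if tld_server is None:
        # The root does not know this TLD, so nothing under it can resolve.
        return None, path
    path.append(tld_server)

    auth_server = SERVERS.get(tld_server, {}).get(domain)
    if auth_server is None:
        return None, path
    path.append(auth_server)

    return SERVERS.get(auth_server, {}).get(fqdn), path


print(resolve("www.somesite.com"))

# A root that has "forgotten" .cn makes every .cn name unresolvable.
broken_root = {k: v for k, v in ROOT.items() if k != "cn."}
print(resolve("www.example.cn", root=broken_root))
```

The second call never gets past the root, which is the "entire section of the net would be unresolvable" case described above.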

This has actually happened at least once already; Swedish .se domains dropped off the internet completely for a few hours to a day (dependent on caching) due to a misplaced full stop in October 2009.

This is not what happened in this instance, but hey, it’s the _same_ company (different division) again with another DNS issue.

I’ll start with the infrastructure -

Swedish company NetNod (aka Autonomica) has a DNS root server* here in China – I.ROOT-SERVERS.NET / 192.36.148.17

*Server in this case actually refers to many servers providing a DNS service as i.root-servers.net
i.root-servers.net servers are geographically located all over Asia (and other places).

(See below for a map)

A root server, as stated, is almost the final arbiter of any DNS lookup. It knows which servers service top level domains (TLDs). So it’s the one that gets to tell your DNS server where .com, .net, .cn and .hk queries should be sent.
The DNS servers that query the root also cache what they learn, so if (as is probable) someone else asks for that domain again, they already know how to answer.
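
Caching matters a lot later in this story, so here is a rough sketch of how a resolver-side cache with expiry might behave. The `TTLCache` class and the fake clock are my own illustration, not code from any particular DNS server.

```python
import time


class TTLCache:
    """Minimal resolver-style cache: answers are kept until their TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the example can fake time
        self._store = {}      # name -> (answer, expiry timestamp)

    def put(self, name, answer, ttl):
        self._store[name] = (answer, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        answer, expires = entry
        if self._clock() >= expires:
            del self._store[name]   # stale: forget it and force a fresh lookup
            return None
        return answer


# Fake clock so the behaviour is deterministic.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.somesite.com.", "192.0.2.10", ttl=300)

now[0] = 299
print(cache.get("www.somesite.com."))   # still cached
now[0] = 300
print(cache.get("www.somesite.com."))   # expired, so None
```

Note that the cache has no idea whether an answer is good or bad. Whatever gets stored is served until the TTL runs out.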

Netnod, like other companies that provide root level servers, uses a mechanism called anycast to deliver users to the best destination server for the DNS query.

[ From Wikipedia - In anycast, there is a one-to-many association between network addresses and network endpoints: each destination address identifies a set of receiver endpoints, but only one of them is chosen at any given time to receive information from any given sender. ]

Anycast operates over BGP, which picks the best route to a destination based on AS (autonomous system) rules.
Anycast is, by design, inherently insecure, as anyone at the right stage of the chain can intercept packets for the anycast address.  In practice this can only be done by routers at the BGP level of routing, so AS owners rely on each other not to mess around.
Essentially, if you are trusted enough to have an AS, you are trusted enough not to screw up.

[ From Wikipedia - On the Internet, anycast is usually implemented by using BGP to simultaneously announce the same destination IP address range from many different places on the Internet. This results in packets addressed to destination addresses in this range being routed to the "nearest" point on the net announcing the given destination IP address.

AS - An autonomous system (AS) is a collection of connected IP routing prefixes under the control of one or more network operators that presents a common, clearly defined routing policy to the Internet (cf. RFC 1930, Section 3). ]
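
As a rough illustration of the "nearest" idea, here is a sketch of one router choosing between several announcements of the same anycast prefix. It is heavily simplified: real BGP looks at a lot more than AS path length, the site names are invented, and the AS numbers come from the range reserved for documentation.

```python
# Each tuple is one BGP announcement for the same anycast prefix, as seen by
# a single router: (site announcing it, AS path to reach that site).
# AS numbers come from the range reserved for documentation (64496-64511).
ANNOUNCEMENTS = [
    ("stockholm", [64500, 64501, 64496]),
    ("hong-kong", [64502, 64496]),
    ("london", [64503, 64504, 64505, 64496]),
]


def pick_site(announcements):
    """Return the site with the shortest AS path (ties: first one listed)."""
    return min(announcements, key=lambda a: len(a[1]))[0]


print(pick_site(ANNOUNCEMENTS))   # hong-kong: only two ASes away
```

"Nearest" here means fewest ASes to cross, which has nothing to do with kilometres.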

Ok, so now that you have a glossed-over idea of BGP, AS, and anycast, I can continue :)

Computers in other countries (mostly on Global Crossing networks, as noted above) started getting spurious DNS results.

If you remember from above, the NetNod root server based in China uses anycast via BGP to talk to things asking about DNS.  If we look at the BGP routing for I.ROOT-SERVERS.NET, we can get an idea of how things are laid out from a network perspective.

I.ROOT-SERVERS.NET sits in AS29216
Robtex (which is unfortunately blocked in China) shows the connectivity for that block here -
http://www.robtex.com/as/as29216.html#graph

AS29216 apparently only links to AS8674 (NETNOD-IX).
That AS block talks to quite a few others, including one named AS24151.
AS24151 is controlled by CNNIC.

CNNIC is a Chinese government-run .cn domain management organization*
(*In practice. They may or may not be government owned in what passes for “reality” here).

What happened (allegedly, as I haven’t read up completely about this on the dns-operations list) is that another DNS server upstream of AS8674 (most probably on AS24151) came along and said: hey! I’m a root level server.

This “rogue” root server sat in the anycast block in use by I.ROOT-SERVERS.NET and advertised itself as a root node, intercepting a share of the traffic (which, by design, is how anycast is supposed to work).  This shouldn’t happen, but as AS24151 is trusted by the other ASes, they accepted its announcement about having a root node server, and other DNS servers started caching its answers.

This started causing all sorts of sporadic mischief, as other servers started caching those “bad” (China firewalled) results from the I.ROOT-SERVERS.NET rogue server.

Normally this would be a regional issue, but as BGP picks routes by AS path rather than by physical distance, other ASes that are close (from a network perspective) and reach the anycast address via BGP would take answers from that node too.
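
Here is a small sketch of why "close from a network perspective" can reach a long way. The topology is entirely hypothetical (documentation AS numbers, not the real ASes in this story), and picking purely on hop count is a simplification of what BGP really does.

```python
from collections import deque

# Hypothetical AS-level topology (undirected links). Numbers are from the
# documentation range; none of these are the real ASes in the story.
LINKS = {
    64496: [64497],                 # legitimate root node's AS
    64497: [64496, 64498, 64499],   # its exchange / upstream
    64498: [64497, 64510],          # AS that also hosts the rogue node
    64499: [64497, 64500],
    64500: [64499, 64501],
    64501: [64500, 64510],
    64510: [64498, 64501],          # a far-away backbone customer
}


def hops_from(origin, links):
    """Breadth-first search: number of AS hops from origin to every AS."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        asn = queue.popleft()
        for peer in links[asn]:
            if peer not in dist:
                dist[peer] = dist[asn] + 1
                queue.append(peer)
    return dist


def chosen_origin(links, origins):
    """For each AS, pick the origin with the fewest AS hops (ties: first origin)."""
    tables = {o: hops_from(o, links) for o in origins}
    return {asn: min(origins, key=lambda o: tables[o][asn]) for asn in links}


choice = chosen_origin(LINKS, origins=[64496, 64498])
for asn in sorted(choice):
    print(asn, "->", choice[asn])
```

In this made-up graph, 64510 and 64501 end up sending their queries to the rogue origin (64498), simply because it is fewer AS hops away than the legitimate one.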

This “rogue” root node was configured to do its DNS in standard China Firewall style, and null route / block servers, e.g. YouTube, Facebook, Twitter…

As it was responding to DNS queries via anycast in the root level server AS, secondary DNS servers and others further along the chain were querying it, caching the bad responses, and then null routing those major US based internet services within their own regions.

This started happening intermittently over Global Crossing nodes until the problem was spotted and resolved.

Users in locations as far away as California, Chile, and China (although admittedly here it’s broken by design) were getting DNS results “China style”.

Lots of finger pointing went on until the people running NetNod/Autonomica eventually twigged that this was happening, and stopped accepting BGP / anycast routes from AS24151 at AS29216, which meant that things would get back to normal after caching expired.
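
The fix amounts to a route filter. I don’t know how NetNod actually configured theirs, so treat this as a sketch of the idea only, again with documentation AS numbers standing in for the real ones.

```python
def accept_route(as_path, blocked_asns):
    """Import filter: reject any announcement whose AS path includes a blocked AS."""
    return not any(asn in blocked_asns for asn in as_path)


# Hypothetical announcements for the anycast prefix (documentation AS numbers).
routes = [
    [64497, 64496],          # via the exchange to the legitimate node
    [64497, 64498],          # originates in the AS we no longer trust
    [64501, 64510, 64498],   # same origin, learned over a longer path
]

kept = [path for path in routes if accept_route(path, blocked_asns={64498})]
print(kept)   # [[64497, 64496]]
```

Filtering stops new bad answers from arriving, but anything already cached downstream stays there until its TTL expires.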

It’s quite possible this was just a screwup on someone’s part here within CNNIC, but the tin foil hat wearing brigade may think otherwise.  Personally I put this down to either testing or user error.  Understanding the intricacies of implementation and its implications is harder than it first appears, and it’s easy to screw up.  That said, it did take a lot of coincidence for this to happen like that, and acting like a root server would put a noticeable amount of additional load on the server(s) doing the replies, so it would be noticed at least on that level.

This isn’t the first time this *exact* issue has happened on a global scale either. Network operators in Pakistan did a similar thing in 2008 which affected YouTube globally, with users getting similar bad routing as far away as the UK.

What does this mean for the future?

Trust is a delicate issue, and it looks like people will eventually no longer implicitly trust upstream or downstream providers on BGP to do the right thing.

Ironically Autonomica / NetNod are some of the people involved with making sure this kind of thing *doesn’t* happen again!

Autonomica are involved quite heavily in something called DNSSEC.

DNS queries don’t mandate security, so a query can be answered by a server in the right place, at the wrong time (as seen above). With DNSSEC, the records in an answer are signed with the zone’s private key, and your resolver checks that signature against the zone’s published public key. A rogue server should not be able to pass off its own answers, as it doesn’t have the correct key. It also means that a rogue server would be more easily spotted, as its answers fail validation.
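
The sign-and-verify idea can be sketched with a toy example. This is not real DNSSEC (no RRSIG or DNSKEY records, no chain of trust back to the root), and the textbook RSA below is deliberately tiny and unpadded, so please treat it as an illustration only. The record strings and addresses are made up.

```python
import hashlib

# Toy textbook-RSA key built from two known Mersenne primes. Far too small and
# too simplistic (no padding) for real use; it only illustrates the idea.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(record):
    """Hash a 'name type value' record string down to an integer below N."""
    return int.from_bytes(hashlib.sha256(record.encode()).digest(), "big") % N


def sign(record):
    """Zone owner: sign the record with the private key."""
    return pow(digest(record), D, N)


def verify(record, signature):
    """Validating resolver: check the signature using only the public key."""
    return pow(signature, E, N) == digest(record)


genuine = "www.somesite.com. A 192.0.2.10"
sig = sign(genuine)
print(verify(genuine, sig))    # the genuine answer checks out

forged = "www.somesite.com. A 203.0.113.66"   # what a rogue server might return
print(verify(forged, sig))     # the old signature does not cover the new address
```

The rogue server can copy the signature, but without the private key it cannot produce a new one that matches its own answer.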

DNSSEC is in the process of being rolled out, and it looks like things like this will only push the rollout to go faster.

Unsurprisingly, China is not quite convinced this is a solution, mostly, I suspect, because it will break their DNS firewalling methods.

DNSSEC rollout map, and a rather excellent talk about this and other DNS issues by Paul Wouters, are here:

http://www.xelerance.com/dnssec/

http://www.xelerance.com/talks/sector/Sector2007DNSSEC.pdf

China is also not without its own DNS issues (other than the deliberately implemented ones) as anyone who lives here has experienced.

In May last year, most of China’s DNS completely collapsed for a day as provider DNSPod was subject to an inadvertent DoS attack via queries against Baofeng.com. Good PDF on that here by China Telecom Guangzhou staff member Ziqian Liu – https://www.dns-oarc.net/files/workshop-200911/Ziqian_Liu.pdf


Further reading and research materials below:

http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VNT-4S807WG-G&_user=10&_coverDate=04%2F30%2F2008&_rdoc=1&_fmt=high&_orig=browse&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=ccc0471388f3fb33fcecdd3409f4f9cc – Pakistan DNS security weakness

http://en.wikipedia.org/wiki/DNSSEC – DNS Security

http://royal.pingdom.com/2009/10/13/sweden%25E2%2580%2599s-internet-broken-by-dns-mistake/ – Sweden disappears from the net

http://www.netnod.se/dns_root_nameserver.shtml – NetNod’s website

http://www.isoc.org/briefings/020/ – DNS Root server FAQ’s

http://blogs.csoonline.com/1179/chile_nic_explains_great_firewall_incident

https://lists.dns-oarc.net/pipermail/dns-operations/2010-March/005267.html – DNS issue list where this was noted.

3 Comments to “When DNS goes bad – China’s Firewall goes global.. crossing.”

  • Craig says:

    Interesting stuff! Obviously trust is a problem with untrustworthy servers, i.e. trusting Chinese servers is a bad idea, as they are intentionally breaking the data. I suppose the system was built for an open internet, not a govt. trying to censor it.

    Not really comfortable putting my accurate email on this, but you know who I am.

  • If you read the notes, especially the posts in the dns-operations list, there are some really choice quotes.

    These are not in any particular order, but are interesting for a background.

    Bill Manning – “this is not particularly odd or even unknown. although many might have forgotten. This particular topic (China running copies of root DNS service inside China and
    the occasional leaking occuring) was discussed at some length at the Paris IETF
    years ago and confirmed by the Chinese.”

    Lee Xiaodong – “As the local host of the mirror site of I root server which was agreed
    by I root server administrator, and also as the “.CN” registry which is
    one of the members of DNS community, we wanna clarify that CNNIC never
    did any interceptions or other things for the mirror site of I root
    serer, CNNIC only provides the stable Internet connection, power and
    necessary hand support.”

    Stephane Bortzmeyer – “Nobody said it was you. It could be the ISP’s IGP which was hacked to direct queries to a rogue and unofficial copy of I-root.”

    Bert Hubert – “Thanks! The exact same issue has been seen from Shanghai for all
    root-servers *except* d.root-servers.net. Including answering with an A
    record for a +norecurs SOA query for facebook.com.

    The IP addresses being returned look like they are random, they do not work
    in any case.

    Interestingly, TCP/IP based DNS is not messed with!”
    ========

    This probably also explains the issues I was having last year with DNSBL lists acting all flaky. Hmmm…
    It also means that if you run servers in China, you just can’t trust the network to give you reliable data.

    This is something I keep having to explain to clients here!

    On the other hand, it keeps me busy, as we can upsell non-tainted internet service over our VPN based links :)
