Support

Blog

Browsing all articles tagged with troubleshooting

Flattr this!

As our clients probably know, we run our own email servers.

Email is one of those things that looks simple on the surface, but isn’t necessarily so simple when it comes down to troubleshooting.

As is typical, the day started with a trouble ticket from a client letting us know that they were having difficulties receiving mail from one of their clients.

I was also given enough information to start troubleshooting without having to ask for additional information, which was nice, as this sometimes takes time to get.

We do have an email issues page with some notes and instructions for what is needed linked off of our webmail page; the issues page is here.

The bounce message supplied included headers, so it was fairly straightforward to work out what server they were using.

The sender (junglefish.com) was seeing the below as an error, when emailing us:

Failure notice from mxttb2.hichina.com:

I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

As we have multiple mail servers in different countries, I first needed to identify which server they were connecting too, so I could check logs.

I usually do this by searching through our logs for occurrences of the senders domain.

If that doesn’t find any results, then I lookup what their mail servers should be and check for entries in our logs from those servers.

This doesn’t always guarantee a result, but in this case I had enough information to get a result.

Our sender was coming from junglefish.net, so I checked to see what mail servers they might use.

To check what mail servers are used, we check the MX (Mail Exchanger) records for their domain. In this instance I used dig – a dns lookup tool.

dig mx junglefish.net

junglefish.net. 3600 IN MX 10 mail.junglefish.net.
junglefish.net. 3600 IN MX 0 mail.junglefish.net.

We can see that mail is served by mail.junglefish.net for the junglefish.net domain. We can also see that the MX records are incorrect, as their are 2 entries pointing at the same server with different preferences. This won’t cause mail to be rejected though, so its not the issue we’re looking for.

Next we check the ip address for mail.junglefish.net

Some mail servers are on shared servers, so the domain name does not always match up with the ip address. We need the ip address as we need to search the logs for that, or addresses close to that range.

A simple ping should return their ip address.


ping mail.junglefish.net

Reply from 218.244.133.134...

In this instance, mail.junglefish.net points at 218.244.133.134

As mail servers also require valid reverse DNS entries, I also checked if that ip address has a valid reverse dns entry with nslookup.

nslookup 218.244.133.134
Server: ::1
Address: ::1#53

Non-authoritative answer:
134.133.244.218.in-addr.arpa name = out133-134.mxttb2.hichina.com.

In this case, their reverse DNS is out133-134.mxttb2.hichina.com
Just to double check thats valid, I pinged the name, and got a valid response.

ping out133-134.mxttb2.hichina.com
PING out133-134.mxttb2.hichina.com (218.244.133.134) 56(84) bytes of data.
64 bytes from out133-134.mxttb2.hichina.com (218.244.133.134): icmp_req=1 ttl=51 time=29.0 ms

So, we’ve done some basic checks –

We’ve checked that the sender domain has MX records.
We’ve checked that their mail server exists, and it has both forward and reverse dns.

Next up, I need to search the logs for connections from their server.

If I search for connections to our mail servers from 218.244.133 I saw the following:

USA – no results.
CN – no results.
CN2 – rejections.

Looking at the logs, I immediately saw that our server was rejecting their mail for failing MFCHECK

@400000004fb3951e1f2ee324 tcpserver: ok 8328 mail.smartshanghai.com:211.144.68.52:25 out133-134.mxttb2.hichina.com:218.244.133.134::51396
@400000004fb3951e28966374 jgreylist[8328]: 218.244.133.134: OK known
@400000004fb3952504477f44 qmail-smtpd[8328]: MFCHECK fail [218.244.133.134] junglefish.net

The MFCHECK code checks the domain portion of the envelope sender address (in the MAIL FROM command) to make sure it’s a real domain which has at least one MX record. This prevents spammers from being able to use phony domain names in their forged sender addresses.

In this case, our server was rejecting mail from them due to this failure.

As we didn’t have sufficient logs to see *why* this was happening, I turned on full connection logging for their ip address, and asked them to send us another mail.

They did so, and I checked the logs for what was happening.
(see below)

@400000004fb48fad0004630c tcpserver: ok 17315 mail.smartshanghai.com:211.144.68.52:25 out133-134.mxttb2.hichina.com:218.244.133.134::50611
400000004fb48fae061430c4 17315 > 220 mail.smartshanghai.com NO UCE ESMTP
400000004fb48fae090ae354 17315 < EHLO mxttb2.hichina.com 400000004fb48fae090decac 17315 > 250-mail.smartshanghai.com NO UCE
400000004fb48fae090df094 17315 > 250-SIZE 0
400000004fb48fae090df47c 17315 > 250-PIPELINING
400000004fb48fae090df864 17315 > 250 8BITMIME
400000004fb48fae0c26ec7c 17315 < MAIL FROM:
400000004fb48fc0396b0344 17315 > 451 DNS temporary failure (#4.3.0)
400000004fb48fc10a1b68f4 17315 < QUIT

In this case, their server started off the conversation with
EHLO mxttb2.hichina.com

As mail servers need to provide a valid domain that they claim to be sending from, I checked if mxttb2.hichina.com existed.
It didn't.

So, that was the reason.

Having identified that, I whitelisted their server ip address for the MFCHECK portion of our checks, and asked them to resend mail again.

Unfortunately this did not let them send mail as they failed another test!

@400000004fb495f92bd72ec4 qmail-smtpd[22644]: Received-SPF: error (mail.smartshanghai.com: error in processing during lookup of junglefish.net: DNS problem)

As SPF relies on TXT records, I tried to manually lookup their DNS

dig TXT junglefish.net

; <<>> DiG 9.7.3 <<>> TXT junglefish.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 11512 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

That failed.

Hmmmm.

I then tried using an oversea's server to lookup their domain.

dig ns junglefish.net

; <<>> DiG 9.7.3 <<>> ns junglefish.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41739 ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;junglefish.net. IN NS ;; ANSWER SECTION: junglefish.net. 2931 IN NS ns54.domaincontrol.com. junglefish.net. 2931 IN NS ns53.domaincontrol.com.

That lookup worked.

So, we're getting somewhere.

We've identified a number of issues, and its starting to look like their DNS servers are blocked in China from a cursory test.

To confirm this I checked if using their DNS servers worked from within China.

Check from ns53 fails -
nslookup
> server ns53.domaincontrol.com
Default server: ns53.domaincontrol.com
Address: 216.69.185.27#53
Default server: ns53.domaincontrol.com
Address: 2607:f208:206::1b#53
> junglefish.net
;; connection timed out; no servers could be reached

Check from ns54 fails -
> server ns54.domaincontrol.com
Default server: ns54.domaincontrol.com
Address: 208.109.255.27#53
Default server: ns54.domaincontrol.com
Address: 2607:f208:302::1b#53
> junglefish.net
;; connection timed out; no servers could be reached
>

As those failed, I tried pinging their server.
ping ns54.domaincontrol.com
PING ns54.domaincontrol.com (208.109.255.27) 56(84) bytes of data.
^C
--- ns54.domaincontrol.com ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7056ms

Ping test fails

Next, I tried to see where it was failing with traceroute.

root@edm:/home/shanghaiguide# traceroute ns54.domaincontrol.com
traceroute to ns54.domaincontrol.com (208.109.255.27), 30 hops max, 60 byte packets
1 reserve.cableplus.com.cn (211.144.79.254) 5.723 ms 5.899 ms 6.078 ms
2 211.154.92.89 (211.154.92.89) 3.860 ms 3.878 ms 3.936 ms
3 211.154.64.94 (211.154.64.94) 3.545 ms 3.608 ms 3.686 ms
4 211.154.72.181 (211.154.72.181) 4.154 ms 4.421 ms 4.419 ms
5 202.96.222.73 (202.96.222.73) 4.657 ms 4.632 ms 4.635 ms
6 61.152.81.189 (61.152.81.189) 4.671 ms 5.401 ms 5.367 ms
7 61.152.86.50 (61.152.86.50) 6.387 ms 6.330 ms 6.301 ms
8 202.97.33.86 (202.97.33.86) 8.003 ms 7.941 ms 7.743 ms
9 202.97.33.254 (202.97.33.254) 7.829 ms 7.656 ms 7.647 ms
10 202.97.50.114 (202.97.50.114) 175.491 ms 175.982 ms 171.405 ms
11 202.97.52.218 (202.97.52.218) 155.507 ms 155.755 ms 155.750 ms
12 218.30.54.150 (218.30.54.150) 191.389 ms 191.306 ms 191.544 ms
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *

Traceroute fails (at 218.30.54.150)

whois 218.30.54.150
% [whois.apnic.net node-5]
% Whois data copyright terms http://www.apnic.net/db/dbcopyright.html

inetnum: 218.30.32.0 - 218.30.55.255
netname: CHINANET-US-POP
descr: Chinanet POP in American
descr: 201 S. Lake Ave. Suite 604, Pasadena, CA 91101
country: CN
admin-c: CH93-AP
tech-c: CH93-AP
mnt-by: MAINT-CHINANET
changed: hostmaster@ns.chinanet.cn.net 20020221
status: ALLOCATED NON-PORTABLE
source: APNIC

So, it looks like there are 3 issues.

1) Invalid MX setup (although this is not going to stop mail from failing).
2) Their Mail server claims to be a non-existent domain.
(Note that this seems to have been subsequently rectified, as that domain now resolves).
3) Their DNS servers are blocked within China.

To solve that, I added an external DNS lookup from outside the country.

In summation, simple things may involve multiple issues, as we can see from above!

In this case, we resolved the issue by whitelisting the senders ip to bypass the MFCHECK failure. We then added external DNS resolvers to our Mail Servers to bypass the block.
Lastly informed the sender, and the client of all of the issues noted so that they could be resolved.