[Dnsmasq-discuss] intermittent connection refused errors
Guido Pepper
2017-05-18 22:45:57 UTC
Hello.
We are running the following version of dnsmasq:

/usr/sbin/dnsmasq --version
Dnsmasq version 2.76 Copyright (c) 2000-2016 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP
DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect
inotify

We run dnsmasq in our Kubernetes (https://kubernetes.io/) clusters to
perform DNS resolution for the container-based services running in the
cluster. I wrote up a bigger-picture overview of our situation here:
http://stackoverflow.com/q/44030167/6067470

The key points are that the applications running in our clusters
experience intermittent name resolution errors. Whenever one or more
applications hit a name resolution error, we also get connection
refused errors from an application that queries dnsmasq for its
metrics (e.g. dig +short chaos txt cachesize.bind). I suspect that
the DNS failures we are seeing are caused by dnsmasq refusing the
connection. I'm hoping someone can point me in a direction to get to
the root of these issues. The only thought I have is to run dnsmasq
in debug mode, in the hope that something gets logged when connections
are not being accepted that would give a clue as to why this is
happening. Is that a sound approach, or does anyone have alternative
ideas for moving this situation forward?

Thanks for listening!
Simon Kelley
2017-05-20 20:28:53 UTC
On what basis do you think that your clients are getting "connection
refused"? That's a specific ICMP error which typically originates in the
kernel, because there's nothing actually listening on a port.

Also, are you using TCP connections? If not, there's no connection as
such: the client sends a single UDP packet with the query and, some
time later, probably gets a single UDP packet in reply. The reply may
never come because UDP is unreliable, both to and from dnsmasq and to
and from the upstream server it forwards to. Under heavy load, UDP
packets may be dropped by the kernel too.
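
To illustrate the distinction drawn above, here is a minimal Python sketch
(not part of the original thread). It probes a localhost port with nothing
listening on it, once over TCP and once over UDP. The helper names
free_port, probe_tcp and probe_udp are made up for this illustration. On
Linux the TCP probe typically fails at once with "connection refused",
while the unconnected UDP probe usually just times out, which is also what
a lost query or reply looks like to the client.

```python
import socket


def free_port(kind):
    """Return a localhost port that was free a moment ago (nothing listening)."""
    with socket.socket(socket.AF_INET, kind) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the kernel pick an unused port
        return s.getsockname()[1]


def probe_tcp(port, timeout=1.0):
    """Try a TCP connection and report how the attempt ended."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"  # the kernel rejected the connection attempt
    except socket.timeout:
        return "timeout"


def probe_udp(port, timeout=0.5):
    """Send one UDP datagram and report whether anything came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(b"ping", ("127.0.0.1", port))
        try:
            s.recvfrom(512)
            return "reply"
        except socket.timeout:
            return "timeout"  # no reply; the sender cannot tell why
        except ConnectionError:
            return "refused"  # some platforms surface the ICMP error here
```

Calling probe_tcp(free_port(socket.SOCK_STREAM)) and
probe_udp(free_port(socket.SOCK_DGRAM)) side by side shows the two failure
modes; the exact UDP behaviour is platform dependent.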


Your error message

ERROR: logging before flag.Parse: W0517 03:19:50.139060 1 server.go:53]
Error getting metrics from dnsmasq: read udp
127.0.0.1:36181->127.0.0.1:53: i/o timeout

implies that the client is timing out awaiting the UDP reply. Does it
retry the query under those circumstances? If it doesn't, that's your
problem: you're assuming UDP is reliable, when it ain't.
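
A retry of the kind asked about above could look roughly like the following
Python sketch. It is an illustration only, not the metrics client from the
error message: the function names, the retry count and the timeout are all
assumptions. It hand-builds a CHAOS-class TXT query (the same kind of query
as dig +short chaos txt cachesize.bind) and resends it whenever the reply
does not arrive in time.

```python
import socket
import struct


def build_chaos_txt_query(name, query_id=0x1234):
    """Build a minimal DNS query for a TXT record in the CHAOS class."""
    # Header: ID, flags (only RD set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 16 is TXT, QCLASS 3 is CHAOS.
    return header + qname + struct.pack("!HH", 16, 3)


def udp_query_with_retry(payload, server=("127.0.0.1", 53), attempts=3, timeout=1.0):
    """Send payload over UDP, resending after each timeout.

    Returns the raw reply bytes, or None if every attempt failed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            sock.sendto(payload, server)
            try:
                reply, _addr = sock.recvfrom(4096)
            except socket.timeout:
                continue  # the query or the reply was lost; send it again
            # Accept only a reply whose ID matches the query we sent;
            # otherwise fall through and resend.
            if reply[:2] == payload[:2]:
                return reply
    return None
```

For example, udp_query_with_retry(build_chaos_txt_query("cachesize.bind"))
would resend the metrics query up to three times before giving up. A client
that returns an error after the first timeout instead would report a
failure every time a single packet is dropped.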


Cheers,

Simon.