Discussion:
[Dnsmasq-discuss] refused responses for simple hostnames, domain-needed, and no upstream servers
Legacy, Allain
2016-01-17 23:18:40 UTC
Hi,
We have noticed an inconsistency in how dnsmasq responds to queries for simple hostnames (no dots) depending on whether there are any configured upstream servers or not. I am unsure if this is because we have misconfigured something, whether we are trying to do something that is not supported (or shouldn't be attempted), or if there is a bug in dnsmasq.

The scenario we are trying to implement is as follows.

+ We have a system with several nodes on the same private network. Most of the nodes have addresses assigned by dnsmasq via DHCP while a select few of those nodes have addresses in /etc/hosts on the node running dnsmasq.

+ The hostnames of the nodes are simple hostnames with no domain (e.g., "server1", "server2", etc.).

+ Some of the nodes have an IPv4 or IPv6 address while others have both IPv4 and IPv6.

+ Clients running on each node will attempt to resolve their peer node names with commands such as "curl http://server1/foobar.txt", "ping6 server10", "dig server2 any", and so on.

+ Clients have a simple /etc/resolv.conf file with only the IP address of the server running dnsmasq. The resolv.conf has no default search domain.

+ We allow the dnsmasq server to be configured with additional upstream servers if the situation requires accessing DNS over the system's public network interface.

+ The dnsmasq server is configured with the "domain-needed" option so that requests for nodes that have not been configured yet do not get forwarded to upstream servers (if configured).


Here is the issue.

When we test with only IPv4 addresses throughout the system, everything works as expected and we do not see any obvious issues or errors.

When we test with a mixture of IPv4-only, IPv6-only, and dual-stack nodes, we see failures to resolve our simple hostnames. The failures manifest themselves as typical "cannot resolve hostname... " errors from whatever client is being run at the time. The failures don't happen on all nodes, but we have been able to correlate them with those nodes that have an IPv6 address but no IPv4 address. This only happens when we have no upstream servers configured; if we configure some upstream servers then there are no failures.

Running tcpdump and strace on commands such as "curl http://server1/foobar.txt", we noticed that the client DNS resolver sends out both an A query and an AAAA query. This is expected: we do not want to force a "-4" or "-6" option on any client, because we want either IPv4 or IPv6 addresses to be returned without needing to know ahead of time which to ask for. The tcpdump traces show that a response is returned for both the A and the AAAA query. The A response has a status of REFUSED, while the AAAA response is valid and carries the expected IPv6 address. Looking at the client DNS resolver code (glibc getaddrinfo()), we noted that if the first response returned is REFUSED then the operation is aborted without considering the AAAA response.

Running this same test while we have upstream servers configured in dnsmasq we have noted that the A query returns successfully with no data (instead of REFUSED as in the first test), and the AAAA returns successfully with an IPv6 address as it did before. Under these circumstances the client DNS resolver returns with the IPv6 address instead of an error since it didn't get a REFUSED on the first response received.
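
For anyone wanting to reproduce the client side without curl, the lookup described above can be sketched in a few lines of C. This is only a minimal illustration of a dual-stack getaddrinfo() call with AF_UNSPEC; the helper name resolve_first() is made up for this example and is not taken from curl, glibc, or dnsmasq.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `host` with a single getaddrinfo() call
 * using AF_UNSPEC, which allows the resolver to issue both A and AAAA
 * queries. On success the first returned address is written to `out` in
 * numeric text form and 0 is returned; otherwise the non-zero error code
 * from getaddrinfo() or getnameinfo() is returned. */
int resolve_first(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;                    /* e.g. EAI_NONAME, EAI_AGAIN */

    /* Convert the first result to text without any reverse lookup. */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     out, (socklen_t)outlen, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}
```

Calling resolve_first("server1", buf, sizeof buf) on one of the affected clients should, if our analysis is right, return a non-zero error while no upstream servers are configured, and fill buf with the IPv6 address once at least one upstream server is configured. gai_strerror() turns the getaddrinfo() error codes into readable messages.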

Looking through the dnsmasq code we think we have identified a bug but are looking for an opinion about whether we are doing something wrong or whether this is a legitimate issue.

What we think is a bug is that the OPT_NODOTS_LOCAL (domain-needed) is only checked where there is at least 1 upstream server (forward.c::search_servers()). When there are servers and OPT_NODOTS_LOCAL is set then an empty response is returned for an A query that does not resolve to an IPv4 address. Unfortunately, when there are no servers configured this code is not reached and instead a REFUSED is returned for an A query that has no IPv4 address. It is this REFUSED response that is causing grief at the client resolver.

It is my opinion that the check for OPT_NODOTS_LOCAL should be performed in forward.c::receive_query() when an answer is not found by forward.c::answer_query(), instead of calling forward_query(). I have attached a patch file which adds an additional if statement at the top of forward_query() to illustrate what I mean. Note: as I said, I believe the proper place to fix this is in receive_query(), before forward_query() is called at all, but it was easier to prototype this directly inside forward_query() since the reply code already existed there.
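
To make the ordering problem concrete, here is a simplified model of the decision for a query that could not be answered from local data. It is not the dnsmasq source and not the attached patch: the struct, enum, and function names are invented for illustration, and the real code paths in forward.c are considerably more involved.

```c
#include <stdbool.h>
#include <string.h>

/* Possible outcomes for a query with no locally known answer. */
enum reply { REPLY_REFUSED, REPLY_NODATA, REPLY_FORWARD };

/* Hypothetical, reduced view of the relevant configuration. */
struct cfg {
    bool domain_needed;  /* models OPT_NODOTS_LOCAL */
    int nservers;        /* number of configured upstream servers */
};

/* A "simple" hostname is one without any dots. */
static bool is_simple_name(const char *name)
{
    return strchr(name, '.') == NULL;
}

/* Behaviour as reported: with no upstream servers the query is refused
 * before the domain-needed option is ever consulted. */
enum reply reply_reported(const struct cfg *c, const char *name)
{
    if (c->nservers == 0)
        return REPLY_REFUSED;
    if (c->domain_needed && is_simple_name(name))
        return REPLY_NODATA;
    return REPLY_FORWARD;
}

/* Behaviour proposed: consult domain-needed first, so simple names get
 * the same empty answer whether or not upstream servers exist. */
enum reply reply_proposed(const struct cfg *c, const char *name)
{
    if (c->domain_needed && is_simple_name(name))
        return REPLY_NODATA;
    if (c->nservers == 0)
        return REPLY_REFUSED;
    return REPLY_FORWARD;
}
```

In this model, an A query for "server1" with domain-needed set and no servers yields REPLY_REFUSED under reply_reported() and REPLY_NODATA under reply_proposed(). Names that contain a dot are treated identically by both functions.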

Can you comment on whether this is a configuration/use-case issue or whether the behavior described requires a code change?

Regards,
Allain


Allain Legacy, Software Developer, Wind River an Intel company
direct 613.270.2279  fax 613.492.7870 skype allain.legacy
 
Simon Kelley
2016-01-19 21:30:07 UTC
Well done for coming to terms with the most gnarly, old and horrible
code in dnsmasq. I just bottled-out of totally rewriting this. It needs
to be done, but just capturing all the existing behaviour is a nightmare.

I can't disagree with the bug report or diagnosis at all. My fix is a
bit simpler: it just moves the test for daemon->servers being NULL to
after the call to search_servers. Whilst looking at the code, I noticed
that the response when out of memory is wrong too, so the commit also
fixes that.

Code in the git repo now. Please could you check that it behaves as you
expect?


Cheers,

Simon.
Legacy, Allain
2016-01-19 21:51:27 UTC
[AL] Thanks. I'll take a look in the next couple of days and get back to you.

FWIW, I found that using "local=//" in my /etc/dnsmasq.conf file also improved the behavior without including my code change at all.
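
For reference, the workaround amounts to a fragment like the following. It is only a sketch of the two relevant lines, not my complete configuration. Both options are documented in the dnsmasq man page, but the improvement with no upstream servers is something I have observed only on our own setup.

```
# /etc/dnsmasq.conf (fragment)

# Never forward queries for plain names (no dots) to upstream servers.
domain-needed

# Treat unqualified names as local: answer them only from /etc/hosts
# or DHCP leases.
local=//
```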

Regards,
Allain
