Discussion:
[Dnsmasq-discuss] Ignore TTL if "upstream" DNS server is not available
Akram Ben Aissi
2017-11-23 14:02:59 UTC
Hi all,

I would be interested in the following feature:

In case we have a DNS forward for a given domain and the upstream DNS server
is not available for this domain (connection refused on UDP port 53), I want
the TTL to be ignored (or the countdown restarted from the old TTL value or
from *min-cache-ttl*) and the old record still to be returned.


I am interested in this feature for our OpenShift infrastructure, in which
we use dnsmasq to forward queries to our internal skydns.

If skydns is unavailable, for example after a major crash, we still want
dnsmasq to return the old values until skydns is back.


Any thoughts?

Akram
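
For illustration only, the requested behaviour could be sketched roughly as below. This is not dnsmasq code: `StaleCache`, `Entry`, `UpstreamDown` and the `forward` callable are hypothetical names, and the sketch arbitrarily picks the "countdown restarts from the old TTL value" variant of the request.

```python
import time
from dataclasses import dataclass


class UpstreamDown(Exception):
    """Raised by the (hypothetical) forwarder when upstream cannot be reached."""


@dataclass
class Entry:
    value: str
    ttl: int        # TTL originally supplied by upstream, in seconds
    expires: float  # absolute expiry time on the cache's clock


class StaleCache:
    def __init__(self, forward, clock=time.monotonic):
        # forward: callable name -> (value, ttl); may raise UpstreamDown
        self.forward = forward
        self.clock = clock
        self.entries = {}

    def resolve(self, name):
        now = self.clock()
        entry = self.entries.get(name)
        if entry is not None and entry.expires > now:
            return entry.value  # fresh hit, no upstream query needed
        try:
            value, ttl = self.forward(name)
        except UpstreamDown:
            if entry is None:
                raise  # nothing cached for this name, so nothing to serve
            # Upstream is down: restart the countdown from the old TTL
            # and keep returning the old record.
            entry.expires = now + entry.ttl
            return entry.value
        self.entries[name] = Entry(value, ttl, now + ttl)
        return value
```

Note that this sketch never evicts anything, so it sidesteps the question of whether the stale record is still in the cache when it is needed.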
Simon Kelley
2017-11-24 16:56:18 UTC
A couple of possible problems with this are as follows.

1) It may not be possible to determine that the upstream server is not
answering in a timely manner. "Connection refused" replies work, but
don't arrive in many otherwise reasonable network configs, and without
those, you're relying on timeouts.

2) Once you've determined that the upstream server is not answering,
there's no guarantee that the record you need will be in the cache, even
with a stale TTL. Cache entries can easily be evicted even before the
end of the TTL by newer entries, as the system uses LRU cache
replacement. I get the feeling that you want some guarantees, and this
doesn't give that, just a lower probability of failure.

Cheers,

Simon.
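
To make point 1 concrete, here is a minimal sketch of how a client might tell the two failure modes apart. It assumes Linux-style behaviour, where an ICMP "port unreachable" is reported on a connected UDP socket as a connection-refused error; `probe_upstream` is a hypothetical helper, not part of dnsmasq, and `payload` is assumed to be a well-formed DNS query built elsewhere.

```python
import socket


def probe_upstream(host, port, payload, timeout=2.0):
    """Send one UDP datagram and classify what comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Connecting the UDP socket lets the kernel report ICMP errors to us.
        sock.connect((host, port))
        try:
            sock.send(payload)
            sock.recv(512)
            return "answered"
        except ConnectionRefusedError:
            # An ICMP port-unreachable made it back: fast, explicit failure.
            return "refused"
        except socket.timeout:
            # No reply and no ICMP error: host down, network down, or
            # ICMP filtered. Only the timeout tells us anything.
            return "timeout"
```

The "refused" branch only fires when the host is up, nothing listens on the port, and the ICMP error is not filtered on the way back. In every other failure case the caller has to wait out the full timeout.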
_______________________________________________
Dnsmasq-discuss mailing list
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Akram B.A
2017-11-24 19:46:28 UTC
Hi Simon
Some answers inline; sorry for the netiquette.

Sent from my iPhone
Post by Simon Kelley
A couple of possible problems with this are as follows.
1) It may not be possible to determine that the upstream server is not
answering in a timely manner. "Connection refused" replies work, but
don't arrive in many otherwise reasonable network configs, and without
those, you're relying on timeouts.

I think it is OK to fall back on timeouts and to consider connection refused or NXDOMAIN as the primary type of failure.

Post by Simon Kelley
2) Once you've determined that the upstream server is not answering,
there's no guarantee that the record you need will be in the cache, even
with a stale TTL. Cache entries can easily be evicted even before the
end of the TTL by newer entries, as the system uses LRU cache
replacement. I get the feeling that you want some guarantees, and this
doesn't give that, just a lower probability of failure.

You probably mean that cache eviction happens even when no query is made, presumably on a timer. If so, this feature would certainly require handling cache eviction differently, probably at query time or at a maximum TTL value, 86400 for example.
Simon Kelley
2017-12-01 21:20:13 UTC
Post by Akram B.A
I think it is OK to fall back on timeouts and to consider connection refused or NXDOMAIN as the primary type of failure.

Certainly NOT on NXDOMAIN. That's a valid answer to the query, and
shouldn't be changed. The problem with connection refused is that it's
fragile: if the server is down, you won't see it. If the network is down
you won't see it.
Post by Akram B.A
You probably mean that cache eviction happens even when no query is made, presumably on a timer. If so, this feature would certainly require handling cache eviction differently, probably at query time or at a maximum TTL value, 86400 for example.

The cache is of fixed size. When a new name is cached, space is made by
evicting an existing cache entry. First, entries that have expired TTLs
are evicted. If there are none, then an entry is chosen to evict by
finding the entry which was used (or installed) the longest time ago.

Making something which could guarantee an answer, rather than just improving
"best effort", would be a large change from the existing implementation.

Simon.
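
The eviction order described above can be modelled in a few lines. This is a toy sketch, not the dnsmasq implementation: `FixedCache` and its methods are hypothetical names, and dropping every expired entry in one pass (rather than only as many as needed) is an assumption made for brevity.

```python
import time


class FixedCache:
    def __init__(self, size, clock=time.monotonic):
        self.size = size      # maximum number of entries, assumed >= 1
        self.clock = clock
        self.entries = {}     # name -> [value, expires, last_used]

    def lookup(self, name):
        now = self.clock()
        entry = self.entries.get(name)
        if entry is None or entry[1] <= now:
            return None       # missing or expired
        entry[2] = now        # record the use for LRU purposes
        return entry[0]

    def insert(self, name, value, ttl):
        now = self.clock()
        if name not in self.entries and len(self.entries) >= self.size:
            self._evict(now)
        self.entries[name] = [value, now + ttl, now]

    def _evict(self, now):
        # First choice: entries whose TTL has already expired.
        expired = [n for n, e in self.entries.items() if e[1] <= now]
        if expired:
            for n in expired:
                del self.entries[n]
            return
        # Otherwise: the entry used (or installed) the longest time ago,
        # even if its TTL has not run out yet.
        victim = min(self.entries, key=lambda n: self.entries[n][2])
        del self.entries[victim]
```

In this model, a busy cache pushes out a rarely queried record long before its TTL ends, so a "serve stale on upstream failure" mode would only improve the odds that the old record is still there.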