Discussion:
[Dnsmasq-discuss] Move 'dnssec time check enable' from SIGHUP to SIGUSR2
Kevin Darbyshire-Bryant
2018-01-03 11:07:22 UTC
Permalink
Hi Simon,

Happy New Year!

I suspect this patch is going to get quite a push back in the name of backwards compatibility, however the problem is real and getting worse on some platforms - from the patch submitted to the LEDE/Openwrt platform:

"Move 'check dnssec timestamp enable' from SIGHUP handler to SIGUSR2.

Dnsmasq uses SIGHUP to do too many things: 1) set dnssec time validation
enabled, 2) bump SOA zone serial, 3) clear dns cache, 4) reload hosts
files, 5) reload resolvers/servers files. SIGUSR2 is used to
re-open/re-start the logfile. Default LEDE does not use logfile
functionality.

Many subsystems within LEDE can send SIGHUP to dnsmasq: 1) ntpd hotplug
(to indicate time is valid for dnssec) 2) odhcpd (to indicate a
new/removed host - typically DHCPv6 leases) 3) procd on interface state
changes 4) procd on system config state changes, 5) service reload.

If dnssec time validation is enabled before the system clock has been
set to a sensible time, name resolution will fail. Because name
resolution fails, ntpd is unable to resolve time server names to
addresses, so is unable to set time. Classic chicken/egg.

Since commits 23bba9cb330cd298739a16e350b0029ed9429eef (service reload) &
4f02285d8b4a66359a8fa46f22a3efde391b5419 (system config) make it more
likely a SIGHUP will be sent for events other than 'ntpd has set time'
it is more likely that an errant 'name resolution is failing for
everything' situation will be encountered.

Ideally dnsmasq would have some other IPC mechanism for indicating 'time
is valid, go check dnssec timestamps', but until that time
(implementation is left as an exercise for the interested/competent
reader/bikeshedder) the next best thing is to move functionality from
the overloaded SIGHUP signal to the under-utilised SIGUSR2.”

I do think that SIGHUP is overloaded, doing something sensible about it is challenging. Thoughts?
Simon Kelley
2018-01-03 12:34:42 UTC
Permalink
Happy new year all.


"Ideally dnsmasq would have some other IPC mechanism for indicating
'time is valid, go check dnssec timestamps'"


I suspect I know that answer to this, but dnsmasq _does_ have another
IPC mechanism, DBus. Could this be solved by providing a DBus method?


Failing that, what's the problem with using the timestamp file
mechanism? I would have thought that was ideal for LEDE, which has a
writable persistent filesystem available.

If we move to SIGUSR2, the backwards compatibility objection could
addressed by making the signal to be used an argument to
--dnssec-no-timecheck

--dnssec-no-timecheck=sigusr2

Cheers,

Simon.
Post by Kevin Darbyshire-Bryant
Hi Simon,
Happy New Year!
"Move 'check dnssec timestamp enable' from SIGHUP handler to SIGUSR2.
Dnsmasq uses SIGHUP to do too many things: 1) set dnssec time validation
enabled, 2) bump SOA zone serial, 3) clear dns cache, 4) reload hosts
files, 5) reload resolvers/servers files. SIGUSR2 is used to
re-open/re-start the logfile. Default LEDE does not use logfile
functionality.
Many subsystems within LEDE can send SIGHUP to dnsmasq: 1) ntpd hotplug
(to indicate time is valid for dnssec) 2) odhcpd (to indicate a
new/removed host - typically DHCPv6 leases) 3) procd on interface state
changes 4) procd on system config state changes, 5) service reload.
If dnssec time validation is enabled before the system clock has been
set to a sensible time, name resolution will fail. Because name
resolution fails, ntpd is unable to resolve time server names to
addresses, so is unable to set time. Classic chicken/egg.
Since commits 23bba9cb330cd298739a16e350b0029ed9429eef (service reload) &
4f02285d8b4a66359a8fa46f22a3efde391b5419 (system config) make it more
likely a SIGHUP will be sent for events other than 'ntpd has set time'
it is more likely that an errant 'name resolution is failing for
everything' situation will be encountered.
Ideally dnsmasq would have some other IPC mechanism for indicating 'time
is valid, go check dnssec timestamps', but until that time
(implementation is left as an exercise for the interested/competent
reader/bikeshedder) the next best thing is to move functionality from
the overloaded SIGHUP signal to the under-utilised SIGUSR2.”
I do think that SIGHUP is overloaded, doing something sensible about it is challenging. Thoughts?
Cheers,
Kevin D-B
012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
_______________________________________________
Dnsmasq-discuss mailing list
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Kevin Darbyshire-Bryant
2018-01-03 14:34:07 UTC
Permalink
Post by Simon Kelley
Happy new year all.
"Ideally dnsmasq would have some other IPC mechanism for indicating
'time is valid, go check dnssec timestamps'"
I suspect I know that answer to this, but dnsmasq _does_ have another
IPC mechanism, DBus. Could this be solved by providing a DBus method?
I don’t know the implications of dbus on lede - a dbus method sounds like a useful idea though if nothing else to avoid the overloading of SIGHUP
 but not a priority for lede.
Post by Simon Kelley
Failing that, what's the problem with using the timestamp file
mechanism? I would have thought that was ideal for LEDE, which has a
writable persistent filesystem available.
Ahh, oh boy, long story. Openwrt/LEDE did use that mechanism a while back but there were several niggles: writing to flash, handling conditional copying of the timestamo file across system updates, lede being too clever and updating clock to ‘latest timestamp in /etc’ temporarily before using ntp to set to real time. In the end a mechanism whereby ‘ntpd’ pokes ‘dnsmasq’ when it has set time was easier, simpler, more reliable
.in most circumstances, but openwrt/lede it appears is getting more persistent in using SIGHUP for other things and conflicting with dnssec timestamps.
Post by Simon Kelley
If we move to SIGUSR2, the backwards compatibility objection could
addressed by making the signal to be used an argument to
--dnssec-no-timecheck
--dnssec-no-timecheck=sigusr2
Now that I like :-)

Cheers,

Kevin
Kevin Darbyshire-Bryant
2018-01-05 09:08:58 UTC
Permalink
Post by Kevin Darbyshire-Bryant
Post by Simon Kelley
If we move to SIGUSR2, the backwards compatibility objection could
addressed by making the signal to be used an argument to
--dnssec-no-timecheck
--dnssec-no-timecheck=sigusr2
Now that I like :-)
Am I waiting on you or are you waiting on me (to produce some laughably awful code that you’ll fix up anyway) :-)

Kevin
Kevin Darbyshire-Bryant
2018-01-08 22:15:18 UTC
Permalink
Post by Kevin Darbyshire-Bryant
Am I waiting on you or are you waiting on me (to produce some laughably awful code that you’ll fix up anyway) :-)
And for the purposes of a jolly good laugh
.. :-)
Simon Kelley
2018-01-14 22:12:53 UTC
Permalink
Right, I thought about this again, and concluded that whilst sharing the
"now use the time" function with something other than "reload loads of
stuff" is an improvement, it doesn't really get us that much farther to
share with something else, since conflicts could still arise.

For instance sharing with USR2, which restarts the log file, could bite
if the logs get rotated at startup.

The obvious thing to do is to use a different signal, and poking around
the documentation, it's clear that there is a candidate, SIGINT. This is
not going to get sent for spurious reasons - the only time it gets send
is from the shell when you hit ctrl-c - and it can be reassigned as
required.

The only odd thing I've done is to preserve the default action of SIGINT
when in debug mode - that makes life easier when keeping dnsmasq in the
foreground to run it in gdb or to test stuff. This is not a big thing:
quite a lot of stuff runs subtly differently in debug mode to make it
easier to debug whilst not being a full-fledged daemon. There's a
seperate "run in foreground" flag for use with daemon-tools etc that
doesn't make these adjustments.

Given that this is a meant to be a definitive solution, I judge it's
worth taking the one-time backward compatibility hit, and have left out
the ability to select which signal to use.

http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=3c973ad92d317df736d5a8fde67baba6b102d91e


Comments?


Cheers,

Simon.
Kevin Darbyshire-Bryant
2018-01-15 13:00:25 UTC
Permalink
Post by Simon Kelley
Right, I thought about this again, and concluded that whilst sharing the
"now use the time" function with something other than "reload loads of
stuff" is an improvement, it doesn't really get us that much farther to
share with something else, since conflicts could still arise.
<snip>
Post by Simon Kelley
Given that this is a meant to be a definitive solution, I judge it's
worth taking the one-time backward compatibility hit, and have left out
the ability to select which signal to use.
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=3c973ad92d317df736d5a8fde67baba6b102d91e
Comments?
Beyond “gaaahhhhhh why didn’t I think of SIGINT”
.. excellent. Understand the reasoning, agree, running chez Kevin and backport for LEDE master submitted.

Thank you muchly.

Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
Simon Kelley
2018-01-15 23:27:24 UTC
Permalink
Post by Kevin Darbyshire-Bryant
Beyond “gaaahhhhhh why didn’t I think of SIGINT”
.. excellent. Understand the reasoning, agree, running chez Kevin and backport for LEDE master submitted.
and there's still SIGQUIT available!

Out of interest, how does the LEDE plumbing deal with a restart of
dnsmasq _after_ ntp has established lock?

I've dogfooded the latest code into the boat's router which is running
Chaos Calmer, and added a hotplug script which works great at startup
when ntp comes alive after dnsmasq, but if I restart dnsmasq, there's
nothing to tell dnsmasq that the time is already valid.

Cheers,

Simon.
Kevin Darbyshire-Bryant
2018-01-15 23:42:12 UTC
Permalink
Post by Simon Kelley
Post by Kevin Darbyshire-Bryant
Beyond “gaaahhhhhh why didn’t I think of SIGINT”
.. excellent. Understand the reasoning, agree, running chez Kevin and backport for LEDE master submitted.
and there's still SIGQUIT available!
Out of interest, how does the LEDE plumbing deal with a restart of
dnsmasq _after_ ntp has established lock?
There’s an ntp hotplug script that creates a ‘time is valid’ flag file (/var/state/dnsmasqsec - being in /var means actually in tmp hence ram). If the file doesn’t exist already on stratum change then a) it gets created and b) SIGINTs dnsmasq. dnsmasq startup changes too
 if the file doesn’t exist then it gets started with ‘—no-dnssec-timestamp’ expecting to be SIGINT’d by the hotplug script. If dnsmasq gets (re)-started and ntpd hotplug has created our ‘time valid’ file, then dnsmasq is started *without* —no-dnssec-timestamp’. There’s a whole raft of logic related to whether or not we’re using dnssec and quite what ntp client is being used.

The SIGINT support was committed to LEDE/openwrt master (rather than CC, 1701 or whatever) around 60 minutes ago.


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A

Loading...