Discussion:
[Dnsmasq-discuss] segv using 2.75 with dnssec
Andre Heider
2015-12-07 18:31:47 UTC
Permalink
Hi,

I just noticed a crash on my mips based openwrt router.
It looks like it's because of the initial time change on boot, this
device doesn't have a RTC and the time is set using ntp.
I dunno how reproducible this is, maybe it happend before and I just
didn't notice.

Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: started, version
2.75 cachesize 150
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: compile time
options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua
TFTP no-conntrack ipset auth DNSSEC loop-detect inotify
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: DNS service
limited to local subnets
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: DNSSEC validation enabled
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq-dhcp[1981]: DHCP, IP
range 192.168.0.200 -- 192.168.0.254, lease time 1h
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: using nameserver xxx#53
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: using nameserver yyy#53
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: using local
addresses only for domain internal
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: read
/tmp/hosts/dhcp - 1 addresses
Mon Dec 7 17:59:25 2015 daemon.info pppd[1833]: System time change detected.
Mon Dec 7 18:00:10 2015 kern.info kernel: [ 122.524730]
Mon Dec 7 18:00:10 2015 kern.info kernel: [ 122.524730]
do_page_fault(): sending SIGSEGV to dnsmasq for invalid write access
to 7f282ffc
Mon Dec 7 18:00:10 2015 kern.info kernel: [ 122.531825] epc =
00410911 in dnsmasq[400000+2c000]
Mon Dec 7 18:00:10 2015 kern.info kernel: [ 122.536700] ra =
00410959 in dnsmasq[400000+2c000]
Mon Dec 7 18:00:10 2015 kern.info kernel: [ 122.541460]

./staging_dir/toolchain-mips_34kc+dsp_gcc-5.2.0_musl-1.1.11/bin/mips-openwrt-linux-gdb
-n ./staging_dir/target-mips_34kc+dsp_musl-1.1.11/root-lantiq/usr/sbin/dnsmasq
(gdb) info symbol 0x00410911
free_frec in section .text
(gdb) info symbol 0x00410959
free_frec + 72 in section .text

./staging_dir/toolchain-mips_34kc+dsp_gcc-5.2.0_musl-1.1.11/bin/mips-openwrt-linux-objdump
-d ./staging_dir/target-mips_34kc+dsp_musl-1.1.11/root-lantiq/usr/sbin/dnsmasq
00410910 <free_frec>:
410910: 64f5 save 40,ra,s0-s1
410912: 6704 move s0,a0
410914: 9c8c lw a0,48(a0)
410916: 1a00 423d jal 4108f4 <free_rfd>
41091a: 6900 li s1,0
41091c: 988d lw a0,52(s0)
41091e: d82c sw s1,48(s0)
410920: d82b sw s1,44(s0)
410922: 1a00 423d jal 4108f4 <free_rfd>
410926: d833 sw s1,76(s0)
410928: f0a0 988c lw a0,172(s0)
41092c: d82d sw s1,52(s0)
41092e: 2405 beqz a0,41093a <free_frec+0x2a>
410930: 1a00 8f6f jal 423dbc <blockdata_free>
410934: 6500 nop
410936: f0a0 d82c sw s1,172(s0)
41093a: f0a0 9890 lw a0,176(s0)
41093e: 2406 beqz a0,41094c <free_frec+0x3c>
410940: 1a00 8f6f jal 423dbc <blockdata_free>
410944: 6500 nop
410946: 6a00 li v0,0
410948: f0a0 d850 sw v0,176(s0)
41094c: f0c0 9884 lw a0,196(s0)
410950: 2403 beqz a0,410958 <free_frec+0x48>
410952: 1a00 4244 jal 410910 <free_frec>
410956: 6500 nop
410958: 6a00 li v0,0
41095a: f0c0 d844 sw v0,196(s0)
41095e: f0c0 d840 sw v0,192(s0)
410962: 6475 restore 40,ra,s0-s1
410964: e8a0 jrc ra
410966: 6500 nop

I don't do asm, but that looks like it might be due to the recursive
'free_frec(f->blocking_query);'?

The config:
dhcp-authoritative
domain-needed
no-hosts
no-resolv
localise-queries
bogus-priv
expand-hosts
local-service
quiet-dhcp
domain=internal
server=/internal/
server=xxx
server=yyy
dhcp-leasefile=/tmp/dhcp.leases
stop-dns-rebind
rebind-localhost-ok
conf-file=/usr/share/dnsmasq/trust-anchors.conf
dnssec
dnssec-timestamp=/etc/dnsmasq.time
dnssec-check-unsigned
dhcp-broadcast=tag:needs-broadcast
addn-hosts=/tmp/hosts
conf-dir=/tmp/dnsmasq.d
dhcp-range=lan,192.168.0.200,192.168.0.254,255.255.255.0,1h
dhcp-option=lan,option:ntp-server,192.168.0.2
dhcp-option=lan,vendor:MSFT,2,1i
no-dhcp-interface=pppoe-wan

Regards,
Andre
Simon Kelley
2015-12-07 23:08:01 UTC
Permalink
I think the problem may be the code-path in get_new_frec() which frees
old frecs. Warping the time will make all the records suddenly old,
and they'll get freed by the loop. The problem is that some will be
freed by the recursive calls to free_frec within the loop and then
free'd again by the loop.

I think the solution is to add

if (f->sentto == NULL)
return;

at the very start of free_frec()

Are you in a position to try that?

Cheers,

Simon.
Hi,
I just noticed a crash on my mips based openwrt router. It looks
like it's because of the initial time change on boot, this device
doesn't have a RTC and the time is set using ntp. I dunno how
reproducible this is, maybe it happend before and I just didn't
notice.
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: started,
version 2.75 cachesize 150 Mon Dec 7 17:05:27 2015 daemon.info
dnsmasq[1981]: compile time options: IPv6 GNU-getopt no-DBus
no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth
DNSSEC loop-detect inotify Mon Dec 7 17:05:27 2015 daemon.info
dnsmasq[1981]: DNS service limited to local subnets Mon Dec 7
17:05:27 2015 daemon.info dnsmasq[1981]: DNSSEC validation enabled
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq-dhcp[1981]: DHCP, IP
range 192.168.0.200 -- 192.168.0.254, lease time 1h Mon Dec 7
17:05:27 2015 daemon.info dnsmasq[1981]: using nameserver xxx#53
Mon Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: using
nameserver yyy#53 Mon Dec 7 17:05:27 2015 daemon.info
dnsmasq[1981]: using local addresses only for domain internal Mon
Dec 7 17:05:27 2015 daemon.info dnsmasq[1981]: read
/tmp/hosts/dhcp - 1 addresses Mon Dec 7 17:59:25 2015 daemon.info
pppd[1833]: System time change detected. Mon Dec 7 18:00:10 2015
kern.info kernel: [ 122.524730] Mon Dec 7 18:00:10 2015 kern.info
kernel: [ 122.524730] do_page_fault(): sending SIGSEGV to dnsmasq
for invalid write access to 7f282ffc Mon Dec 7 18:00:10 2015
kern.info kernel: [ 122.531825] epc = 00410911 in
dnsmasq[400000+2c000] Mon Dec 7 18:00:10 2015 kern.info kernel: [
122.536700] ra = 00410959 in dnsmasq[400000+2c000] Mon Dec 7
18:00:10 2015 kern.info kernel: [ 122.541460]
./staging_dir/toolchain-mips_34kc+dsp_gcc-5.2.0_musl-1.1.11/bin/mips-o
penwrt-linux-gdb
- -n
./staging_dir/target-mips_34kc+dsp_musl-1.1.11/root-lantiq/usr/sbin/dnsm
asq
(gdb) info symbol 0x00410911 free_frec in section .text (gdb) info
symbol 0x00410959 free_frec + 72 in section .text
./staging_dir/toolchain-mips_34kc+dsp_gcc-5.2.0_musl-1.1.11/bin/mips-o
penwrt-linux-objdump
- -d
./staging_dir/target-mips_34kc+dsp_musl-1.1.11/root-lantiq/usr/sbin/dnsm
asq
00410910 <free_frec>: 410910: 64f5 save 40,ra,s0-s1
410912: 6704 move s0,a0 410914: 9c8c lw
d82c sw s1,48(s0) 410920: d82b sw s1,44(s0)
410922: 1a00 423d jal 4108f4 <free_rfd> 410926: d833
sw s1,76(s0) 410928: f0a0 988c lw a0,172(s0) 41092c: d82d
sw s1,52(s0) 41092e: 2405 beqz a0,41093a
<free_frec+0x2a> 410930: 1a00 8f6f jal 423dbc <blockdata_free>
410934: 6500 nop 410936: f0a0 d82c sw s1,172(s0)
41093a: f0a0 9890 lw a0,176(s0) 41093e: 2406 beqz
a0,41094c <free_frec+0x3c> 410940: 1a00 8f6f jal 423dbc
<blockdata_free> 410944: 6500 nop 410946: 6a00 li
v0,0 410948: f0a0 d850 sw v0,176(s0) 41094c: f0c0 9884 lw
a0,196(s0) 410950: 2403 beqz a0,410958
<free_frec+0x48> 410952: 1a00 4244 jal 410910 <free_frec>
f0c0 d844 sw v0,196(s0) 41095e: f0c0 d840 sw v0,192(s0)
410962: 6475 restore 40,ra,s0-s1 410964: e8a0 jrc
ra 410966: 6500 nop
I don't do asm, but that looks like it might be due to the
recursive 'free_frec(f->blocking_query);'?
The config: dhcp-authoritative domain-needed no-hosts no-resolv
localise-queries bogus-priv expand-hosts local-service quiet-dhcp
domain=internal server=/internal/ server=xxx server=yyy
dhcp-leasefile=/tmp/dhcp.leases stop-dns-rebind
rebind-localhost-ok
conf-file=/usr/share/dnsmasq/trust-anchors.conf dnssec
dnssec-timestamp=/etc/dnsmasq.time dnssec-check-unsigned
dhcp-broadcast=tag:needs-broadcast addn-hosts=/tmp/hosts
conf-dir=/tmp/dnsmasq.d
dhcp-range=lan,192.168.0.200,192.168.0.254,255.255.255.0,1h
dhcp-option=lan,option:ntp-server,192.168.0.2
dhcp-option=lan,vendor:MSFT,2,1i no-dhcp-interface=pppoe-wan
Regards, Andre
_______________________________________________ Dnsmasq-discuss
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Andre Heider
2015-12-08 07:48:10 UTC
Permalink
Hi Simon,
Post by Simon Kelley
I think the problem may be the code-path in get_new_frec() which frees
old frecs. Warping the time will make all the records suddenly old,
and they'll get freed by the loop. The problem is that some will be
freed by the recursive calls to free_frec within the loop and then
free'd again by the loop.
I think the solution is to add
if (f->sentto == NULL)
return;
at the very start of free_frec()
Are you in a position to try that?
I am, and I'll try just that, thanks.

Regards,
Andre
Simon Kelley
2015-12-08 16:33:36 UTC
Permalink
Staring at the code some more, I'm less sure I've found the problem.

If it breaks again, running a backtrace (bt command) would be useful.


Cheers,

Simon.
Post by Andre Heider
Hi Simon,
On Tue, Dec 8, 2015 at 12:08 AM, Simon Kelley
Post by Simon Kelley
I think the problem may be the code-path in get_new_frec() which
frees old frecs. Warping the time will make all the records
suddenly old, and they'll get freed by the loop. The problem is
that some will be freed by the recursive calls to free_frec
within the loop and then free'd again by the loop.
I think the solution is to add
if (f->sentto == NULL) return;
at the very start of free_frec()
Are you in a position to try that?
I am, and I'll try just that, thanks.
Regards, Andre
Andre Heider
2015-12-08 18:43:39 UTC
Permalink
Hi,
Post by Simon Kelley
Staring at the code some more, I'm less sure I've found the problem.
free_frec looks like this now:
static void free_frec(struct frec *f)
{
if (f->sentto == NULL) {
my_syslog(LOG_WARNING, "free_frec break");
return;
}
...

In my humble testing I haven't encountered the message so far, but I
suspect the issue is timing dependent.
Post by Simon Kelley
If it breaks again, running a backtrace (bt command) would be useful.
There's no gdb nor space for it on this device itself. But I'll check
if openwrt exposes some option to save coredumps.

Regards,
Andre

Loading...