Discussion:
[Dnsmasq-discuss] Intermittent SIGSEGV crash of dnsmasq-full
Kevin Darbyshire-Bryant
2017-05-08 12:30:53 UTC
Hi Simon,

Got a report in LEDE land about a SIGSEGV issue; I'm able to replicate
it easily as described.

Thoughts?

Cheers,

Kevin


-------- Forwarded Message --------
Subject: [FS#766] Intermittent SIGSEGV crash of dnsmasq-full
Date: Mon, 08 May 2017 05:57:18 +0000
From: LEDE Bugs <lede-***@lists.infradead.org>
Reply-To: lede-***@lists.infradead.org
To: lede-***@lists.infradead.org

The following task has a new comment added:

FS#766 - Intermittent SIGSEGV crash of dnsmasq-full
User who did this - guidosarducci (guidosarducci)

----------
After a little more investigation, this is definitely a bug that also
exists in the latest lede/master, which uses dnsmasq-2.77test5. It is
easily triggered via a common Mozilla DNS query and appears related to
using split DNS and DNSSEC.

A minimal, standalone dnsmasq.conf that is vulnerable:
listen-address=192.168.1.1
port=55553
bind-interfaces
no-daemon
no-hosts
no-resolv
log-queries=extra
server=8.8.8.8
server=/cloudfront.net/50.22.147.234
dnssec
dnssec-check-unsigned
trust-anchor=.,19036,8,2,49AAC11D7B6F6446702E54A1607371607A1A41855200FD2CE1CDDE32F24E8FB5
trust-anchor=.,20326,8,2,E06D44B80B8F1D39A95C0B0D7C65D08458E880409BBC683457104237C7F8EC8D


Removing either of these config lines results in no SIGSEGV:
server=/cloudfront.net/50.22.147.234
dnssec-check-unsigned

The bug can be triggered simply from a DNS client (e.g. a blank Firefox
page!):
ubuntu$ nslookup -port=55553 tiles-cloudfront.cdn.mozilla.net 192.168.1.1
;; Question section mismatch: got cloudfront.net/DS/IN
;; connection timed out; no servers could be reached


I also captured a dnsmasq core file from my router and ran it through gdb:
ubuntu$ ./staging_dir/toolchain-mips_24kc_gcc-5.4.0_musl-1.1.16/bin/mips-openwrt-linux-gdb \
    -d ./build_dir/target-mips_24kc_musl-1.1.16/dnsmasq-full/dnsmasq-2.77test5/src/ \
    -n ./staging_dir/target-mips_24kc_musl-1.1.16/root-ar71xx/usr/sbin/dnsmasq \
    dnsmasq.757.11.1494218146.core
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later ...
Reading symbols from
./staging_dir/target-mips_24kc_musl-1.1.16/root-ar71xx/usr/sbin/dnsmasq...done.
[New LWP 757]
...
Core was generated by `dnsmasq -C crash-dnsmasq.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 forward_query (udpfd=, udpaddr=udpaddr@entry=0x7fc1d930,
dst_addr=, dst_iface=dst_iface@entry=0,
header=header@entry=0x7c8010, plen=43, plen@entry=50,
now=now@entry=1494218146, forward=0x77cabd90, ad_reqd=ad_reqd@entry=0,
do_bit=do_bit@entry=0) at forward.c:281
281 if (forward->sentto->addr.sa.sa_family == AF_INET)
(gdb) bt
#0 forward_query (udpfd=, udpaddr=udpaddr@entry=0x7fc1d930,
dst_addr=, dst_iface=dst_iface@entry=0,
header=header@entry=0x7c8010, plen=43, plen@entry=50,
now=now@entry=1494218146, forward=0x77cabd90, ad_reqd=ad_reqd@entry=0,
do_bit=do_bit@entry=0) at forward.c:281
#1 0x00410275 in receive_query (listen=listen@entry=0x77cbffe0,
now=now@entry=1494218146) at forward.c:1443
#2 0x00412825 in check_dns_listeners (now=now@entry=1494218146)
at dnsmasq.c:1565
#3 0x004047db in main (argc=, argv=)
at dnsmasq.c:1044
(gdb)
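
For readers following the backtrace: frame #0 stops on a line that
dereferences forward->sentto, so one plausible reading is that sentto
was NULL (or otherwise invalid) when forward_query ran. The core dump
does not print that pointer, so this is an assumption. The C sketch
below only illustrates the general guard pattern for such a
dereference. The names fwd_record, upstream, sockaddr_u and
sent_to_ipv4 are hypothetical stand-ins, not dnsmasq's real
definitions, and this is not the fix that was committed upstream.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical, simplified stand-ins for the structures in the trace. */
union sockaddr_u {
    struct sockaddr sa;
    struct sockaddr_in in;
};

struct upstream {
    union sockaddr_u addr;
};

struct fwd_record {
    struct upstream *sentto;   /* may be NULL if no upstream was recorded */
};

/* Returns 1 for an IPv4 upstream, 0 for any other address family,
   and -1 when there is no upstream to inspect (instead of crashing). */
int sent_to_ipv4(const struct fwd_record *forward)
{
    if (forward == NULL || forward->sentto == NULL)
        return -1;
    return forward->sentto->addr.sa.sa_family == AF_INET ? 1 : 0;
}

/* Small self-check covering the guarded and unguarded cases. */
void sent_to_ipv4_demo(void)
{
    struct upstream up;
    struct fwd_record with_upstream = { &up };
    struct fwd_record without_upstream = { NULL };

    up.addr.sa.sa_family = AF_INET;
    assert(sent_to_ipv4(&with_upstream) == 1);
    assert(sent_to_ipv4(&without_upstream) == -1);
}
```

Whether a missing upstream should be skipped, logged, or treated as an
error is a design decision for the maintainer; the sketch only shows
the check itself.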


The dnsmasq config file, log file, and client log are attached. I'm not
sure I can go any further, so would appreciate the dnsmasq package
maintainer taking a look and advising.

Thanks!
----------
Simon Kelley
2017-05-09 00:39:46 UTC
That was a horrible one.

Fix committed, and an optimistic 2.77rc1 tag added.

I really hope to get out a 2.77 release soon.


Cheers,

Simon.
Kevin Darbyshire-Bryant
2017-05-09 08:33:19 UTC
Post by Simon Kelley
That was a horrible one.
Fix committed, and an optimistic 2.77rc1 tag added.
Sadly a tad optimistic. The report below is from the original reporter,
and I can confirm that 'domain-needed' is the crash-enabling option.

Sorry!

Looking forward to the final release following the rc2 :-)

Cheers,

Kevin
Comment from the original reporter (FS#766):
I saw the update from Simon Kelley (thank you!) on the Dnsmasq-discuss mailing list and built an updated LEDE dnsmasq-2.77rc1 package to test. (see required patch attached)
The prior minimal test-case passed, but the original production config
file now creates a horrible SIGSEGV crash-loop (log attached):
Mon May 8 22:59:46 2017 kern.info kernel: [1738736.539480] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
Mon May 8 22:59:46 2017 kern.info kernel: [1738736.548375] epc = 0040e79b in dnsmasq[400000+2d000]
Mon May 8 22:59:46 2017 kern.info kernel: [1738736.553564] ra = 0040e773 in dnsmasq[400000+2d000]


Stack trace indicates something to do with logging:
(gdb) core-file dnsmasq.18906.11.1494309586.core
[New LWP 18906]
...
Core was generated by `dnsmasq -C /var/etc/dnsmasq.conf.cfg02411c
--no-daemon'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0040e79b in search_servers (now=now@entry=1494309586,
addrpp=addrpp@entry=0x0, qtype=qtype@entry=32768, qdomain=,
type=type@entry=0x7fd02c74, domain=domain@entry=0x7fd02c78,
norebind=norebind@entry=0x0) at forward.c:222
222 log_query(logflags | flags | F_CONFIG | F_FORWARD,
qdomain, *addrpp, NULL);
(gdb) bt
#0 0x0040e79b in search_servers (now=now@entry=1494309586,
addrpp=addrpp@entry=0x0, qtype=qtype@entry=32768, qdomain=,
type=type@entry=0x7fd02c74, domain=domain@entry=0x7fd02c78,
norebind=norebind@entry=0x0) at forward.c:222
#1 0x00410759 in reply_query (fd=, family=,
now=now@entry=1494309586) at forward.c:938
#2 0x004127dd in check_dns_listeners (now=now@entry=1494309586)
at dnsmasq.c:1560
#3 0x004047db in main (argc=, argv=)
at dnsmasq.c:1044
(gdb) print logflags
$1 = 32800
(gdb) print flags
$2 =
(gdb) print *qdomain
value has been optimized out
(gdb) print addrpp
$3 = (struct all_addr **) 0x0
(gdb)
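
The gdb output shows addrpp is 0x0 while the source line at
forward.c:222 passes *addrpp to log_query, and the kernel log reports
an invalid read from address 00000000. Together these are consistent
with a NULL pointer dereference. The C sketch below shows the general
guard pattern for an optional pointer-to-pointer argument. The names
addr and addr_for_log are hypothetical stand-ins (not dnsmasq's
all_addr or any real dnsmasq function), and this is not the committed
fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an address record that a logger might print. */
struct addr {
    unsigned char bytes[16];
};

/* Returns the address to hand to a logger, or NULL when the caller
   passed no address slot at all. Dereferencing addrpp without this
   check is what faults when addrpp is 0x0. */
const struct addr *addr_for_log(struct addr **addrpp)
{
    return addrpp != NULL ? *addrpp : NULL;
}

/* Small self-check: a NULL slot is tolerated, a real slot is passed through. */
void addr_for_log_demo(void)
{
    struct addr a = { { 192, 168, 1, 1 } };
    struct addr *slot = &a;

    assert(addr_for_log(NULL) == NULL);
    assert(addr_for_log(&slot) == &a);
}
```

A caller would then pass the returned pointer (possibly NULL) to its
logging routine, provided that routine accepts a NULL address.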

This turns out to be easy to reproduce. Simply add domain-needed to the
prior standalone config file.
Then trigger the crash from a client with:
$ nslookup -port=55553 google.com 192.168.1.1
;; connection timed out; no servers could be reached

I attached all the relevant logs, configs and patches.


----------

One or more files have been attached.

More information can be found at the following URL:
https://bugs.lede-project.org/index.php?do=details&task_id=766#comment2589
Simon Kelley
2017-05-09 21:42:24 UTC
Never trust a git commit which happened in the early hours :)

Thanks for a second excellent bug report. This was much easier to find.

I've committed the fix to git.

I'll deal with Petr's patch tomorrow and then tag 2.77rc2

Cheers,

Simon.
Kevin Darbyshire-Bryant
2017-05-10 08:03:59 UTC
Post by Simon Kelley
Never trust a git commit which happened in the early hours :)
Thanks for a second excellent bug report. This was much easier to find.
Sorry for keeping you up till the wee small hours with your bug hunting
outfit on :-)

Guido does all the hard work with gdb, I just wave a flag, jump about
and say 'lookie here!' :-)
Post by Simon Kelley
I've committed the fix to git.
I'll deal with Petr's patch tomorrow and then tag 2.77rc2
Good stuff. An rc2 I can get into LEDE for more bug hunting :-)