Discussion:
[Dnsmasq-discuss] reproducible segmentation fault
Christian Kujau
2017-08-21 09:46:25 UTC
Permalink
Hi,

while playing around with the "dnseval" tool from the dnsdiag package[0],
I accidently crashed my dnsmasq instance that was running on my router.
This router is running Dnsmasq version 2.77 on a current LEDE operating
sending SIGSEGV to dnsmasq for invalid read access from 00000000
https://bugs.lede-project.org/index.php?do=details&task_id=251
Intermittent SIGSEGV crash of dnsmasq-full
https://bugs.lede-project.org/index.php?do=details&task_id=766
However, both bugs were closed because they either were related to some
busybox machinery errors or were pointing to the upstream project to look
at this.

In the dnsmasq-discuss archives I found a thread from last year[1] with
a crash message that looked very much like the message I receive when
dnsmasq crashes on the LEDE router:

===================
kernel: [ 2860.890789] do_page_fault(): sending SIGSEGV to dnsmasq for invalid write access to 00552000
kernel: [ 2860.899402] epc = 77cd488c in libc.so[77c62000+92000]
kernel: [ 2860.904552] ra = 00406c41 in dnsmasq[400000+21000]
===================

So, I tried to reproduce this scenario on a Debian/amd64 VM and
compiled today's git checkout with -Og -g and used a fairly simple
configuration file to start dnsmasq:

===================
$ cat ~/test/dnsmasq.conf.bug
listen-address=192.168.56.130
bind-interfaces
no-daemon
no-hosts
no-resolv
log-queries=extra
server=8.8.8.8

$ sudo -H src/dnsmasq -C ~/test/dnsmasq.conf.bug
dnsmasq: started, version 2.78test2-6-g69a815a cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
dnsmasq: using nameserver 8.8.8.8#53
dnsmasq: cleared cache

dnsmasq: 1 192.168.56.1/59405 query[A] www.aol.com from 192.168.56.1
Segmentation fault
===================

The segfault happened after I started "dnseval" against this newly spawned
dnseval: - bulk ping utility that sends an arbitrary DNS query to
a list of DNS servers
Per default, "bulk" means it sends 10 requests to the DNS server, but
dnsmasq segfaults pretty quickly:

GDB output (still carries optimized out values, hm...)
https://paste.fedoraproject.org/paste/awbvnGEvj57ru1TtAuA3ag

tcpdump for this run:
https://paste.fedoraproject.org/paste/X-9Qa67oKT-jlmpKb4IU7A

Ideas welcome :-)

Thanks,
Christian.

[0] https://github.com/farrokhi/dnsdiag
[1] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2016q3/010830.html
--
BOFH excuse #37:

heavy gravity fluctuation, move computer to floor rapidly
AW
2017-08-21 15:09:42 UTC
Permalink
i found something similar:http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q3/011691.html
what happens, if u compile dnsmasq with -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 ?
-Arne

Christian Kujau <***@nerdbynature.de> schrieb am 10:34 Montag, 21.August 2017:


Hi,

while playing around with the "dnseval" tool from the dnsdiag package[0],
I accidently crashed my dnsmasq instance that was running on my router.
This router is running Dnsmasq version 2.77 on a current LEDE operating
sending SIGSEGV to dnsmasq for invalid read access from 00000000
https://bugs.lede-project.org/index.php?do=details&task_id=251
Intermittent SIGSEGV crash of dnsmasq-full
https://bugs.lede-project.org/index.php?do=details&task_id=766
However, both bugs were closed because they either were related to some
busybox machinery errors or were pointing to the upstream project to look
at this.

In the dnsmasq-discuss archives I found a thread from last year[1] with
a crash message that looked very much like the message I receive when
dnsmasq crashes on the LEDE router:

===================
kernel: [ 2860.890789] do_page_fault(): sending SIGSEGV to dnsmasq for invalid write access to 00552000
kernel: [ 2860.899402] epc = 77cd488c in libc.so[77c62000+92000]
kernel: [ 2860.904552] ra  = 00406c41 in dnsmasq[400000+21000]
===================

So, I tried to reproduce this scenario on a Debian/amd64 VM and
compiled today's git checkout with -Og -g and used a fairly simple
configuration file to start dnsmasq:

===================
$ cat ~/test/dnsmasq.conf.bug
listen-address=192.168.56.130
bind-interfaces
no-daemon
no-hosts
no-resolv
log-queries=extra
server=8.8.8.8

$ sudo -H src/dnsmasq -C ~/test/dnsmasq.conf.bug
dnsmasq: started, version 2.78test2-6-g69a815a cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
dnsmasq: using nameserver 8.8.8.8#53
dnsmasq: cleared cache

dnsmasq: 1 192.168.56.1/59405 query[A] www.aol.com from 192.168.56.1
Segmentation fault
===================

The segfault happened after I started "dnseval" against this newly spawned
dnsmasq instance. This "dnseval" thingy is described as:

  > dnseval: -  bulk ping utility that sends an arbitrary DNS query to
  > a list of DNS servers

Per default, "bulk" means it sends 10 requests to the DNS server, but
dnsmasq segfaults pretty quickly:

  GDB output (still carries optimized out values, hm...)
  https://paste.fedoraproject.org/paste/awbvnGEvj57ru1TtAuA3ag

  tcpdump for this run:
  https://paste.fedoraproject.org/paste/X-9Qa67oKT-jlmpKb4IU7A

Ideas welcome :-)

Thanks,
Christian.

[0] https://github.com/farrokhi/dnsdiag
[1] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2016q3/010830.html
--
BOFH excuse #37:

heavy gravity fluctuation, move computer to floor rapidly
Christian Kujau
2017-08-25 11:10:06 UTC
Permalink
Post by AW
i found something similar:http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q3/011691.html
what happens, if u compile dnsmasq with -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 ?
Sorry for the late reply. Unfortunately, these adding these options
doesn't help, dnsmasq is still crashing :-\

More ideas welcome :-)

Christian.
--
BOFH excuse #95:

Pentium FDIV bug
AW
2017-08-26 06:03:20 UTC
Permalink
1. what about this patch?
http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q3/011697.html


2. what if u compile dnsmasq with -g and then run it in gdb when the crash happens?
-arne
AW
2017-08-27 05:28:48 UTC
Permalink
ohoh...I just found that u already used gdb... :)

when it calls
m = answer_auth(header, ((char *) header) + udp_size, (size_t)n, ...

it seems like udp_size is 0, which causes memset to be called with weird parameters, which causes the segmentation violation...

so we should find out, what sets udp_size to 0...

can u say what gdb says when u type
# frame 2
# print daemon->edns_pktsz
?

-arne
Post by AW
i found something similar:http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q3/011691.html
what happens, if u compile dnsmasq with -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 ?
Sorry for the late reply. Unfortunately, these adding these options
doesn't help, dnsmasq is still crashing :-\

More ideas welcome :-)

Christian.
--
BOFH excuse #95:

Pentium FDIV bug
Christian Kujau
2017-08-27 07:40:42 UTC
Permalink
Post by AW
m = answer_auth(header, ((char *) header) + udp_size, (size_t)n, ...
it seems like udp_size is 0, which causes memset to be called with weird parameters, which causes the segmentation violation...
so we should find out, what sets udp_size to 0...
See my other mail[0] I sent a few minutes ago, it seems to be related to
EDNS requests. These can be sent via TCP, so maybe that's why udp_size is
set to 0.
Post by AW
can u say what gdb says when u type
# frame 2
# print daemon->edns_pktsz
Hm, this doesn't work:


(gdb) frame 2
#2 0x000055555556cb86 in receive_query (listen=0x55555579eeb0, now=1503819304) at forward.c:1433
1433 m = answer_request(header, ((char *) header) + udp_size, (size_t)n,
(gdb) print daemon->edns_pktsz
Attempt to extract a component of a value that is not a structure pointer.


And m and udp_size are <optimized out>, n=40 - even though it's compiled
with -Og -g.


Thanks for responding,
Christian.

[0] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q3/011704.html
--
BOFH excuse #90:

Budget cuts
Christian Kujau
2017-08-27 08:33:46 UTC
Permalink
Post by AW
can u say what gdb says when u type
# frame 2
# print daemon->edns_pktsz
Hah, in another attempt this worked:

(gdb) frame 2
#2 0x0040d047 in receive_query (listen=***@entry=0x8202c0, now=***@entry=1503822478) at forward.c:1433
1433 m = answer_request(header, ((char *) header) + udp_size, (size_t)n,
(gdb) print daemon->edns_pktsz
$1 = 4096

This is from dnsmasq v2.77, cross-compiled for MIPS 24K and linked against
musl, so it would run on my ar71xx type router (where I noticed the crash
in the first place).

Christian.
--
BOFH excuse #185:

system consumed all the paper for paging
Christian Kujau
2017-08-27 07:18:39 UTC
Permalink
OK, so I should have done this in the first place and used git bisect to
find out which commit in Dnsmasq introduced this behaviour:

fa78573778cb23337f67f5d0c9de723169919047 is the first bad commit
commit fa78573778cb23337f67f5d0c9de723169919047
Author: Simon Kelley <***@thekelleys.org.uk>
Date: Fri Jul 22 20:56:01 2016 +0100

Zero packet buffers before building output, to reduce risk
of information leakage.

The whole commit cannot be reverted cleanly now, but in my case reverting
only the change to src/rfc1035.c did the trick (as it appears to have have
a problem there, see the GDB dump[0]). I've attached a patch as a
temporary (!) workaround to this email.

However, commenting out this section is clearly not the correct solution,
maybe somebody can have another look on what this routine was supposed to
do here and try again.

For completeness' sake, I was curious to see what exactly dnseval[1] was
sending to Dnsmasq and why it would crash the dnsmasq process in the
first place. So, this dnseval thingy is a Python script and in commit
efeccef[2] ("Fix text alignment") they not only changed the "text anlignment"
but switched to sending EDNS queries too. Their ENDS routine was later
modified again and its current version (v1.6.3) doesn't make dnsmasq crash
- but their v1.4.0 does and that's the version that made it to the Debian
distribution :-\


Thanks for listening,
Christian.

[0] https://paste.fedoraproject.org/paste/awbvnGEvj57ru1TtAuA3ag
[1] https://github.com/farrokhi/dnsdiag/blob/master/dnseval.py
[2] https://github.com/farrokhi/dnsdiag/commit/efeccef
--
BOFH excuse #72:

Satan did it
Kevin Darbyshire-Bryant
2017-08-28 14:43:30 UTC
Permalink
Post by Christian Kujau
OK, so I should have done this in the first place and used git bisect to
fa78573778cb23337f67f5d0c9de723169919047 is the first bad commit
commit fa78573778cb23337f67f5d0c9de723169919047
Date: Fri Jul 22 20:56:01 2016 +0100
Zero packet buffers before building output, to reduce risk
of information leakage.
<snip>

Hi Christian,

Thanks for all your investigation and info so far. I too can now crash
dnsmasq at will :-) So putting my novice C and even more novice gdb to
work I've come up with what I feel is a slightly less invasive
mitigation to the problem....which in essence is 'we've been sent a
query but not yet allocated any buffer to it/updated the header limit
offset but we pass a non zero query length. The result is we try to
clear the memory before our buffer.

My workaround is to only call memset if the difference between buffer
begin and buffer limit is bigger than the query length, thus it retains
Simon's intent of clearing memory most of the time but avoids the
SIGSEGV trampling.

This is to be regarded as a sticking plaster rather than real fix but
that needs far greater minds than I to understand the code & intent :-)

Hope this helps someone.

Kevin
Christian Kujau
2017-08-28 16:27:22 UTC
Permalink
My workaround is to only call memset if the difference between buffer begin
and buffer limit is bigger than the query length, thus it retains Simon's
intent of clearing memory most of the time but avoids the SIGSEGV trampling.
Thanks, with your patch dnsmasq doesn't crash anymore when receiving odd
EDNS packets from dnseval.
And thanks for requesting the CVE - I thought about this too, as the bug
constitutes some kind of DoS issue, but since nobody else complained, I
suspected it to be some variation of PEBKAC on my part :)
Oh, I believe it was Juan Manuel requesting the CVE - thanks!

C.
--
BOFH excuse #247:

Due to Federal Budget problems we have been forced to cut back on the number of users able to access the system at one time. (namely none allowed....)
Kevin Darbyshire-Bryant
2017-08-29 11:45:07 UTC
Permalink
I've a *much* better fix for this. Will submit once I've collected
someone from the station!

Mad busy life,

Kevin
Post by Christian Kujau
My workaround is to only call memset if the difference between buffer begin
and buffer limit is bigger than the query length, thus it retains Simon's
intent of clearing memory most of the time but avoids the SIGSEGV trampling.
Thanks, with your patch dnsmasq doesn't crash anymore when receiving odd
EDNS packets from dnseval.
And thanks for requesting the CVE - I thought about this too, as the bug
constitutes some kind of DoS issue, but since nobody else complained, I
suspected it to be some variation of PEBKAC on my part :)
Oh, I believe it was Juan Manuel requesting the CVE - thanks!
C.
Kevin Darbyshire-Bryant
2017-08-29 13:15:56 UTC
Permalink
My workaround is to only call memset if the difference between buffer begin
and buffer limit is bigger than the query length, thus it retains Simon's
intent of clearing memory most of the time but avoids the SIGSEGV trampling.
Thanks, with your patch dnsmasq doesn't crash anymore when receiving odd
EDNS packets from dnseval.
Here is a fix rather than my sticking plaster workaround. My workaround
patch would actually allow dnsmasq to generate invalid replies, this
actually *fixes* the problem!

Cheers,

Kevin
Simon Kelley
2017-09-06 21:43:40 UTC
Permalink
Thanks everyone who's been working on this, and apologies for going MIA
until now.

Looking through the code, I think I can seen what's happening:


memset(((char *)header) + qlen, 0,
(limit - ((char *)header)) - qlen);

Concentrate on the calculation of the length of the memset

(limit - ((char *)header)) - qlen)

limit is a pointer to (one more than) the last valid byte of the buffer,
it's calculated in the call to answer_request as header + buffer-length,
so the expression

limit - ((char *)header)

is actually equivalent to the length of the buffer.

qlen is the length of the received question, which resides at the start
of the buffer, and which we're going to append the answer too, so the
memset zeros the buffer from the end of the question, to the end of the
buffer. Simple. The question is smaller than the buffer (otherwise we
couldn't have received it in the fist place, so the size parameter to
memset must always be positive. There is no problem.

EXCEPT in forward.c where the limit is calculated (in receive_query()),
it actually does something different to what's described above, it
doesn't calculate

header + buffer-length

it calculates header + 512, because 512 is the default maximum size of a
DNS reply, unless overridden by an EDNS0 field in the request. If the
EDNS0 is present in the request, it calculates

header + (EDNS0 maximum packet size)

So, if the request (qlen) is either larger than 512, _or_ it includes an
EDNS0 packet size field, and the request is larger than whatever that
specifies, then the size parameter to memset will go negative, and chaos
ensues. The buffer we use to receive the query is certainly larger then
512 bytes, so there's nothing to stop this being the case, and it's
quite possible that dnseval is sending a EDNS0 packet size of zero, as
Arne noted.

The solution is to calculate the memset size using the actual buffer
size, and not the limit on the size of the returned answer. Since the
question has been successfully received into this buffer, then is MUST
be larger than qlen, and the memset size can never go negative.

Doing that is not totally straightforward, since answer_request is
called from two places, which have very different buffer sizes. When
answering a TCP request, the buffer is 65536 bytes long. This answer is
to remove the memset from answer_request() and answer_auth() and do it
instead just after the reception of the packet, this can be done in the
UDP and TCP code paths and which know the true length of the buffer.



http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=63437ffbb58837b214b4b92cb1c54bc5f3279928

Is my attempt. Please check it out. I've not attempted to reproduce all
the triggers you've found, so it would be good if you can try them
against this code.



Cheers,

Simon.
Post by Kevin Darbyshire-Bryant
My workaround is to only call memset if the difference between buffer begin
and buffer limit is bigger than the query length, thus it retains Simon's
intent of clearing memory most of the time but avoids the SIGSEGV trampling.
Thanks, with your patch dnsmasq doesn't crash anymore when receiving odd
EDNS packets from dnseval.
Here is a fix rather than my sticking plaster workaround. My workaround
patch would actually allow dnsmasq to generate invalid replies, this
actually *fixes* the problem!
Cheers,
Kevin
_______________________________________________
Dnsmasq-discuss mailing list
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
AW
2017-08-27 10:29:39 UTC
Permalink
Post by Christian Kujau
(gdb) print daemon->edns_pktsz
$1 = 4096
that is weird...
why is <header> and <limit> the same then in the stack trace in answer_request(...)?


i mean: what sets <udp_size> to zero?

what about compiling with "-O0 -g"?

-arne
Juan Manuel Fernandez
2017-08-28 08:27:19 UTC
Permalink
Hi,

Last weeks we were fuzzing dnsmasq and found this crash (
https://www.mail-archive.com/dnsmasq-***@lists.thekelle
ys.org.uk/msg11597.html) . We tried to reach Simon on Friday but we have
not had any response from him. We asked mitre for a CVE id and were
assigned CVE-2017-13704.


In our original mail to Simon we attached two packets as examples: one that
crash the application, and another where the memset is set to a lenght of 0
(making it useless).

Regards,
Juan Manuel Fernandez
[image: Tarlogic]
Juan Manuel Fernández Torres
Security Engineer
***@tarlogic.com
(0034) 912 919 319
[image: follow on twiter]
<https://twitter.com/intent/follow?original_referer=http%3A%2F%2Fwww.tarlogic.com%2Fportal%2Finicio%2F&screen_name=Tarlogic&source=followbutton&variant=1.1>
[image:
contact on linked] <https://www.linkedin.com/company/tarlogic?trk=fc_badge>
www.tarlogic.com
POLÍTICA DE PRIVACIDAD Este mensaje es solamente para la persona a la que
va dirigido. Puede contener información confidencial o legalmente
protegida. No hay renuncia a la confidencialidad o privilegio por cualquier
transmisión mala / errónea. Si usted ha recibido este mensaje por error, le
rogamos que borre de su sistema inmediatamente el mensaje así como todas
sus copias, destruya todas las copias del mismo de su disco duro y
notifique al remitente. No debe, directa o indirectamente, usar, revelar,
distribuir, imprimir o copiar ninguna de las partes de este mensaje si no
es usted el destinatario. Cualquier opinión expresada en este mensaje
proviene del remitente, excepto cuando el mensaje establezca lo contrario y
el remitente esta autorizado para establecer que dichas opiniones provienen
de Tarlogic Security S.L.. Nótese que el correo electrónico vía Internet no
permite asegurar ni la confidencialidad de los mensajes que se transmiten
ni la correcta recepción de los mismos. En el caso de que el destinatario
de este mensaje no consintiera la utilización del correo electrónico vía
Internet, rogamos lo ponga en nuestro conocimiento de manera inmediata.

DISCLAIMER This message contains confidential information and is intended
only for the individual named. If you are not the named addressee you
should not disseminate, distribute or copy this e-mail. Please notify the
sender immediately by e-mail if you have received this e-mail by mistake
and delete this e-mail from your system. E-mail transmission cannot be
guaranteed to be secure or error-free as information could be intercepted,
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
The sender therefore does not accept liability for any errors or omissions
in the contents of this message, which arise as a result of e-mail
transmission.
Kevin Darbyshire-Bryant
2017-08-28 09:02:11 UTC
Permalink
Post by Juan Manuel Fernandez
Hi,
Last weeks we were fuzzing dnsmasq and found this crash
We tried to reach Simon on Friday but we have not had any response from
him. We asked mitre for a CVE id and were assigned CVE-2017-13704.
Be aware that it's a bank holiday Monday here in the UK which means it's
popular time to go away for a week or so with family/friends. This may
explain the lack of response so far.

Good that a CVE is assigned. Even better you've got some example
packets that induce the issue :-)
Post by Juan Manuel Fernandez
In our original mail to Simon we attached two packets as examples: one
that crash the application, and another where the memset is set to a
lenght of 0 (making it useless).
Regards,
Juan Manuel Fernandez
Tarlogic
Cheers,

Kevin
Loading...