[Dnsmasq-discuss] Problem with VM and dnsmasq
mario
2015-10-07 13:41:15 UTC
Hello, my first post.

I use as a gateway a Debian Jessie pc with dnsmasq providing both DHCP
and DNS.

Normally, it works flawlessly, but there may be one bug. On one of my
pcs (Ubuntu 14.04)
I run a VirtualBox instance with an Arch Linux VM for which I have
chosen a Bridged
Adapter mode. When the host is on Ethernet the VM gets an IP address
from dnsmasq
unfailingly. But when the host is on wifi, and the VM still in bridged
adapter mode, the VM cannot seem to obtain an IP address.

The funny thing is, I see the DHCP request enter the gateway (via
tcpdump), and in the file `/var/log/daemon.log` I see a DHCP offer being
logged. But the packet with the DHCP offer never leaves the gateway:
dnsmasq says it has made an offer, yet the offer never leaves the
gateway, never mind reaching the client.

Please notice: dnsmasq works perfectly for all other clients, wired or
wifi. It fails only with the
VM, and then only when the host is on wifi.

In the attached files I provide: 1) the dnsmasq.conf file; 2) tcpdump
capture and daemon.log on the
gateway when the host is connected via ethernet; 3) tcpdump capture and
daemon.log when the
host is connected via wifi. The third file shows that dnsmasq IS reached
by the BOOTP/DHCP request
just like in the ethernet case, but tcpdump does NOT show an outgoing
Reply. Firewall configuration
and sysctl.conf have not changed in the meantime.

I also provide, in the following files called packet_eth and
packet_wlan, the whole BOOTP/DHCP packet,
for the case of a connection via ethernet, and then via wifi. The reason
is that I am not sure the
fault in this behavior lies entirely (or exclusively?) with dnsmasq: it
might lie with Arch Linux dhcpcd
package, or perhaps with VirtualBox.

I would be glad of any feedback, I hope I have provided enough info to
allow diagnosing the problem.

Cheers,

Mario
Albert ARIBAUD
2015-10-07 14:24:44 UTC
Hi mario,
Post by mario
Hello, my first post.
I use as a gateway a Debian Jessie pc with dnsmasq providing both DHCP
and DNS.
Normally, it works flawlessly, but there may be one bug. On one of my
pcs (Ubuntu 14.04)
I run a VirtualBox instance with an Arch Linux VM for which I have
chosen a Bridged
Adapter mode. When the host is on Ethernet the VM gets an IP address
from dnsmasq
unfailingly. But when the host is on wifi, and the VM still in bridged
adapter mode, the VM cannot seem to obtain an IP address.
[...]
I would be glad of any feedback, I hope I have provided enough info to
allow diagnosing the problem.
In a VirtualBox VM, you have to specify which host interface your VM's
interface is bridged to; there is no setting such as "the currently up
and running interface", it has to be one existing interface, and it is
a static setting.

Therefore, if it is bridged to eth0, and eth0 is on the network, then
all is fine on the VM; but if you switch the host to wlan0, then the VM
still tries to use eth0 (and fails).

One solution may be to define two VM interfaces and bridge one to eth0
and one to wlan0. At least one of them will be... bound... to get DHCP
right (but the other one may still believe it has an IP address when
its host counterpart is down; I haven't tried this setup).
Post by mario
Cheers,
Mario
Amicalement,
--
Albert.
mario
2015-10-07 14:36:20 UTC
Bonjour, Albert.

Unfortunately, I wish it were that simple: I know that, and I have always
changed the interface the VM is linked to in an appropriate way. I
thought it was tacitly understood.

Besides, like I said, I always see the BOOTP/DHCP request reach the gateway:
if I had forgotten to switch the interface in VBox, that could not possibly
happen.

Cheers,

Mario
Albert ARIBAUD
2015-10-07 17:06:15 UTC
Bonjour mario,
Post by mario
Bonjour, Albert.
Unfortunately, I wish it were that simple: I know that, and I have always
changed the interface the VM is linked to in an appropriate way. I
thought it was tacitly understood.
Well, you know what they say about assumption. :)
Post by mario
if I had forgotten to switch the interface in VBox, that could not possibly
happen.
Sorry, I'd missed that. Assumption works either way, apparently. O:-)

So -- dnsmasq may ignore a request if it is not authoritative.

Your dnsmasq conf file does not say it is authoritative, and I'll...
assume... that no command line option changes that.

So, maybe your dnsmasq thinks it does not have authority to answer the
discovery request when on WLAN.

Stupid question BTW: how does your host get its eth and wlan IPs? Does
it ask another DHCP server on the segment, or are they fixed? In the
end, does it get the same or different addresses on both I/Fs?

Anyway: the only difference I see between the requests is the broadcast
flag present for the WLAN packet and absent in the ETH packet, but I
don't see how this could affect dnsmasq.
Post by mario
Cheers,
Mario
Amicalement,
--
Albert.
Albert ARIBAUD
2015-10-07 19:09:45 UTC
Hi again mario,

Le Wed, 7 Oct 2015 19:06:15 +0200, Albert ARIBAUD
Post by Albert ARIBAUD
Stupid question BTW: how does your host get its eth and wlan IPs? Does
it ask another DHCP server on the segment, or are they fixed? In the
end, does it get the same or different addresses on both I/Fs?
Also: when your VM switches between host eth and host wlan for its
bridge, does it do it while off or is it still on? What if both the
host and VM boot on wlan and never switch to eth?
Post by Albert ARIBAUD
Anyway: the only difference I see between the requests is the broadcast
flag present for the WLAN packet and absent in the ETH packet, but I
don't see how this could affect dnsmasq.
Post by mario
Cheers,
Mario
Amicalement,
--
Albert.
mario
2015-10-07 20:48:15 UTC
Hello, Albert.
Post by Albert ARIBAUD
Hi again mario,
Le Wed, 7 Oct 2015 19:06:15 +0200, Albert ARIBAUD
Post by Albert ARIBAUD
Stupid question BTW: how does your host get its eth and wlan IPs? Does
it ask another DHCP server on the segment, or are they fixed? In the
end, does it get the same or different addresses on both I/Fs?
The whole LAN is served by the single dnsmasq we are discussing, both
DHCP and DNS.
The different NICs are dished distinct IP addresses, except for the few
with reserved addresses.
Post by Albert ARIBAUD
Also: when your VM switches between host eth and host wlan for its
bridge, does it do it while off or is it still on? What if both the
host and VM boot on wlan and never switch to eth?
The VM is brought down completely, powered off. I then
disconnect the host, count to 5 ;-) re-connect the host thru the other
NIC,
change guest setup as regards to network configuration, then I
bring up the VM.

It is as fresh a start as I can concoct, short of bringing the whole
host down.

I do not know what you mean by

.... What if VM boot[s] on wlan and never switch[es] to eth?

The VM does not have access to a wifi connection. It has a single
ethernet NIC;
as per VirtualBox (or any Hypervisor) it is connected either to wired or
to the
wifi NIC of the host. I never switch the guest while it is up and running.
When I want to switch the host's connection, I bring the VM down (=powered
off), reconnect the host, change the VM network configuration so that it
hooks up
now with the new alive NIC, then bring it up.

When the host is connected via wifi, the VM never receives a reply to its
BOOTP/DHCP request: I can see, in the tcpdump records, hundreds of
requests, dnsmasq makes hundreds of replies, not a single one of them
leaves the gateway: the VM never gets a DHCP reply. If the host is
connected via ethernet, the same occurs except the reply occurs in a
couple of seconds, and that's the end of the DHCP process: the VM is
served a proper IP address in a matter of seconds.
Post by Albert ARIBAUD
Post by Albert ARIBAUD
Anyway: the only difference I see between the requests is the broadcast
flag present for the WLAN packet and absent in the ETH packet, but I
don't see how this could affect dnsmasq.
Post by mario
Cheers,
Mario
Amicalement,
Cheers, and thanks for your help,

Mario
Albert ARIBAUD
2015-10-08 06:38:19 UTC
Bonjour mario,
Post by mario
Hello, Albert.
Post by Albert ARIBAUD
Hi again mario,
Le Wed, 7 Oct 2015 19:06:15 +0200, Albert ARIBAUD
Post by Albert ARIBAUD
Stupid question BTW: how does your host get its eth and wlan IPs? Does
it ask another DHCP server on the segment, or are they fixed? In the
end, does it get the same or different addresses on both I/Fs?
The whole LAN is served by the single dnsmasq we are discussing, both
DHCP and DNS.
The different NICs are dished distinct IP addresses, except for the few
with reserved addresses.
OK, so the host has different IPs depending on whether it uses eth0 or
wlan0, correct?
Post by mario
Post by Albert ARIBAUD
Also: when your VM switches between host eth and host wlan for its
bridge, does it do it while off or is it still on? What if both the
host and VM boot on wlan and never switch to eth?
The VM is brought down completely, powered off. I then
disconnect the host, count to 5 ;-) re-connect the host thru the other
NIC,
change guest setup as regards to network configuration, then I
bring up the VM.
It is as fresh a start as I can concoct, short of bringing the whole
host down.
I do not know what you mean by
.... What if VM boot[s] on wlan and never switch[es] to eth?
The VM does not have access to a wifi connection. It has a single
ethernet NIC;
as per VirtualBox (or any Hypervisor) it is connected either to wired or
to the
wifi NIC of the host. I never switch the guest while it is up and running.
When I want to switch the host's connection, I bring the VM down (=powered
off), reconnect the host, change the VM network configuration so that it
hooks up now with the new alive NIC, then bring it up.
I meant to ask if you had tried a scenario where everything starts up
cold with wlan. AIUI, while the host does not boot cold with only
wlan0 up, you *do* boot the VM cold with its interface bridged to the
host's wlan0, so I have my answer: the VM does boot cold on wlan0.

(to be complete, a real cold boot scenario would also include erasing
the client lease files on the host and VM before powering them back on,
to make sure we're in a clean state, but let's put this aside for now)
Post by mario
When the host is connected via wifi, the VM never receives a reply to its
BOOTP/DHCP request: I can see, in the tcpdump records, hundreds of
requests, dnsmasq makes hundreds of replies, not a single one of them
leaves the gateway: the VM never gets a DHCP reply. If the host is
connected via ethernet, the same occurs except the reply occurs in a
couple of seconds, and that's the end of the DHCP process: the VM is
served a proper IP address in a matter of seconds.
Hartmut suggested that you run tcpdump / wireshark on all three
machines. IIUC, the current dumps are of the DHCP server. With a dump
on the host at the same time, we'd see if the DHCP answers reach the
host, and if they do, then with a dump on the VM we could see if they
have left the host.

One thing I'm thinking of all of a sudden: did you check if your Wifi
setting allows packets to a MAC which is not that of the destination
Wifi interface? Most ethernet interfaces, and probably all of them
nowadays, allow this in promiscuous mode, but wireless interfaces might
not always.

IOW, what happens if you configure the VM to use a fixed IP address,
then cold boot it to bridge over wlan0 and try to ping the server from
the VM, and the VM from the server?
Post by mario
Cheers, and thanks for your help,
NP. :)
Post by mario
Mario
Amicalement,
--
Albert.
mario
2015-10-08 10:20:54 UTC
First of all, I will say it just once, but I have to say it: thanks for
your help.


@Hartmut Krafft

Done, the output of tcpdump on the three machines (simultaneously) is
included in the
file tcpdumps. Not very informative: the Reply never leaves the gateway.

@Simon Kelley

I have included the strace outputs in two included files, logeth0 (when
it works) and
logwlan0 (when it does not work). I have snipped all lines pertaining to
the BAD FILE
DESCRIPTOR, except for the first and last

close(6) = -1 EBADF (Bad file descriptor)
close(65535) = -1 EBADF (Bad file descriptor)

in both files.

As for a method to reproduce the problem:

Host1= Linux rasal 3.16.0-49-generic #65~14.04.1-Ubuntu SMP Wed Sep 9
10:03:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
VirtualBox1 = virtualbox-4.3 4.3.30-101610~Ubuntu~raring amd64

Host2 = Linux debS 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt17-1
(2015-09-26) x86_64 GNU/Linux
VirtualBox2= virtualbox 4.3.30-dfsg-1+deb8u1 amd64

VM's: a variety: OpenBSD, ArchLinux, Debian

ALL of these display the same, identical problem, irrespective of Host
or VM. The logs above have been produced with Host1 (VirtualBox1, of
course).

Gateway: Linux router73 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt17-1
(2015-09-26) x86_64 GNU/Linux

# dnsmasq -v
Dnsmasq version 2.72 Copyright (c) 2000-2014 Simon Kelley
Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
TFTP conntrack ipset auth DNSSEC loop-detect


Sequence of operations:

Cycle 1
1. Boot host1 ; eth0 is on dhcp, served by dnsmasq on gateway; host1
obtains IP address correctly, everything OK;
2. BEFORE booting VM1, in configuration, Bridge Network Adapter is
chosen, set to attached to host's eth0;
3. Boot VM1; eth0 in VM is set to dhcp; IP address obtained correctly
from gateway's dnsmasq
4. Poweroff VM1;
5. Disconnect Host1 from eth0;

Cycle 2
6. Connect Host1 thru wlan0; IP address obtained correctly, everything OK
7. Configure VM1 to Bridge Network Adapter, but attached to host's wlan NIC;
8. Boot VM1; since eth0 inside VM1 is still set to dhcp, eth0 tries to
obtain IP address from gateway's dnsmasq, but NO success;
9. Poweroff VM1
10. Disconnect Host1 from wifi

Always reproducible: Cycle 1 always yields a properly connected VM1,
Cycle 2 always yields a disconnected VM1.

Tried this with both Host1 and Host2, and with OpenBSD/ArchLinux/Debian
VMs on both hosts: always reproducible.

As for the bug: difficult to say at this point. VirtualBox does not use
a bridge when the host is connected via wifi, for obvious reasons, so in
my case we are really comparing two different ways of putting the VM
onto the LAN. Also, VirtualBox does not give you any access to its
connection innards, so there is no control over the bridge, if any.

@Albert Aribaud:

I think the above also answers your questions.
Albert ARIBAUD
2015-10-13 09:49:42 UTC
Hi mario,

Sorry for not replying sooner.
Post by mario
First of all, I will say it just once, but I have to say it: thanks for
your help.
@Hartmut Krafft
Done, the output of tcpdump on the three machines (simultaneously) is
included in the
file tcpdumps. Not very informative: the Reply never leaves the gateway.
Actually, the reply is lost somewhere between the gateway (where the
reply is seen to be sent) and the host (where it is not seen to be
received). It is unclear whether it left the gateway or not. It
could have left the gateway and been lost by the AP (which, IIUC, is
a different machine from the gateway).
Post by mario
[...]
I think the above also answers your questions.
Actually not all of them: I believe you have not tested my suggestion
that you set a fixed IP in the VM instead of using DHCP, connect the
host through wlan, boot up the VM and then test connectivity between the
VM and gateway ([ar]ping, netcat...) so that you can determine whether
your issue is DHCP-related or "just" network-related.

Amicalement,
--
Albert.
mario
2015-10-13 10:29:57 UTC
Dear Albert,
Post by Albert ARIBAUD
Actually not all of them: I believe you have not tested my suggestion
that you set a fixed IP in the VM instead of using DHCP, connect the
host through wlan, boot up the VM and then test connectivity between
the VM and gateway ([ar]ping, netcat...) so that you can determine
whether your issue is DHCP-related or "just" network-related.
Amicalement,
it works perfectly, see the attached file. Sequence of ops:

1. Disconnect host from ethernet; put it on wifi;
2. configure VM to connect thru the host wlan0 interface, bridge adapter.
3. start VM; in VM... (see attached file!)
4. stop network-manager;
5. check that VM's ethernet interface has not received an IP address
from gateway;
6. Configure VM's eth0 manually;
7. Ping remote site.
Post by Albert ARIBAUD
Actually, the reply is lost somewhere between the gateway (where the
reply is seen to be sent) and the host (where it is not seen to be
received). It is unclear whether it left the gateway or not. It
could have left the gateway and been lost by the AP (which, IIUC, is
a different machine from the gateway).
Nope, there is a slight confusion: the tcpdump capture on the gateway
includes two distinct conversations, one with a normal Intel pc with MAC
address c4:85:08:7d:79:40 which DOES receive
two replies, and another connection with the typical (for VirtualBox
VMs) CADMUS pc with MAC address
08:00:27:03:6a:3e, which is the VM I am talking about, which never
receives a reply, even though the
daemon.log displays the fact that dnsmasq has offered it a perfectly
good IP address.

In other words, dnsmasq's reply is lost somewhere between dnsmasq itself
and tcpdump on the same gateway.

I might add, to add insult to injury, that during the week-end I tried
this in my home, and my dd-wrt dnsmasq worked flawlessly where my Debian
gateway fails.

Cheers,

Mario
Albert ARIBAUD
2015-10-13 11:28:27 UTC
Hi mario,
Post by mario
Dear Albert,
Post by Albert ARIBAUD
Actually not all of them: I believe you have not tested my suggestion
that you set a fixed IP in the VM instead of using DHCP, connect the
host through wlan, boot up the VM and then test connectivity between
the VM and gateway ([ar]ping, netcat...) so that you can determine
whether your issue is DHCP-related or "just" network-related.
Amicalement,
1. Disconnect host from ethernet; put it on wifi;
2. configure VM to connect thru the host wlan0 interface, bridge adapter.
3. start VM; in VM... (see attached file!)
4. stop network-manager;
5. check that VM's ethernet interface has not received an IP address
from gateway;
6. Configure VM's eth0 manually;
7. Ping remote site.
Ok, so ICMP (with ping) and UDP (through name resolution) do work,
which leaves the
Post by mario
Post by Albert ARIBAUD
Actually, the reply is lost somewhere between the gateway (where the
reply is seen to be sent) and the host (where it is not seen to be
received). It is unclear whether it left the gateway or not. It
could have left the gateway and been lost by the AP (which, IIUC, is
a different machine from the gateway).
Nope, there is a slight confusion: the tcpdump capture on the gateway
includes two distinct conversations, one with a normal Intel pc with MAC
address c4:85:08:7d:79:40 which DOES receive
two replies, and another connection with the typical (for VirtualBox
VMs) CADMUS pc with MAC address
08:00:27:03:6a:3e, which is the VM I am talking about, which never
receives a reply, even though the
daemon.log displays the fact that dnsmasq has offered it a perfectly
good IP address.
In other words, dnsmasq's reply is lost somewhere between dnsmasq itself
and tcpdump on the same gateway.
Ok.

I'm not a strace guru, so I did not look into it at first, but I
finally decided to try and I think I got something from the strace.
Look at this part:

write(2, "DHCPOFFER(brlan) 192.168.73.95 0"...,
49DHCPOFFER(brlan) 192.168.73.95 08:00:27:03:6a:3e ) = 49
write(2, "\n", 1 ) = 1
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
write(12, "<134>Oct 8 10:42:41 dnsmasq-dhc"..., 91) = 91
alarm(7646) = 7646
sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(68),
sin_addr=inet_addr("255.255.255.255")},
msg_iov(1)=[{"\2\1\6\0\334\330J\374\0\5\200\0\0\0\0\0\300\250I_\300\250I\1\0\0\0\0\10\0'\3"...,
320}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_IP,
cmsg_type=, ...}, msg_flags=0}, 0) = -1 EPERM (Operation not
permitted)

This EPERM seems to indicate that your gateway, for some reason, does
not allow dnsmasq to send *this* packet (previous sendmsg calls for the
same MAC did return ok), which explains why it does not show in the
dump.
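
For readers following along, the first bytes of the msg_iov string in
that sendmsg() call can be decoded by hand. The following is a minimal
Python sketch, not part of the original exchange. It assumes the standard
fixed BOOTP header layout from RFC 2131 and uses only the 32 bytes that
strace printed; the names RAW and decode_bootp_header are made up for
illustration. It suggests the packet is a BOOTREPLY whose broadcast flag
(0x8000) is set, which is consistent with the 255.255.255.255 destination
shown in msg_name.

```python
import socket
import struct

# First 32 bytes of the reply, copied from the strace msg_iov string
# (strace prints non-printable bytes as octal escapes and truncates the rest).
RAW = b"\2\1\6\0\334\330J\374\0\5\200\0\0\0\0\0\300\250I_\300\250I\1\0\0\0\0\10\0'\3"


def decode_bootp_header(data):
    """Decode the fixed BOOTP fields (RFC 2131 layout) up to chaddr."""
    op, htype, hlen, hops, xid, secs, flags = struct.unpack("!BBBBIHH", data[:12])
    ciaddr, yiaddr, siaddr, giaddr = (
        socket.inet_ntoa(data[i:i + 4]) for i in (12, 16, 20, 24)
    )
    # Only the first bytes of chaddr survive strace's truncation.
    chaddr = ":".join(f"{b:02x}" for b in data[28:28 + hlen])
    return {
        "op": op,                            # 1 = BOOTREQUEST, 2 = BOOTREPLY
        "xid": xid,
        "secs": secs,
        "broadcast": bool(flags & 0x8000),   # top bit of the flags field
        "ciaddr": ciaddr,
        "yiaddr": yiaddr,                    # address offered to the client
        "siaddr": siaddr,
        "giaddr": giaddr,
        "chaddr_prefix": chaddr,
    }


fields = decode_bootp_header(RAW)
for name, value in fields.items():
    print(f"{name:14} {value}")
```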

Adding Simon as cc: just to draw his attention to this EPERM -- it
might ring a bell for him.

Now, EPERM is usually due to some security or networking policy, or to
the gateway's network configuration, but in the same strace some DHCP
replies to the VM's MAC return ok, so I'm not sure what happens here.
Post by mario
Cheers,
Mario
Amicalement,
--
Albert.
mario
2015-10-13 13:00:28 UTC
Yes, thank you Albert, you were perfectly right. This thread is now
solved, shall I somehow
signal this?
Post by Albert ARIBAUD
[...]
Now, EPERM is usually due to some security or networking policy, or to
the gateway's network configuration, but in the same strace some DHCP
replies to the VM's MAC return ok, so I'm not sure what happens here.
Amicalement,
I had the following rule in my iptables:

iptables -A OUTPUT -d 255.255.255.255 -j DROP

and, for some reason which I do not understand, while the replies to
BOOTP/DHCP requests from pcs and VMs connected thru ethernet are not
sent to the limited broadcast address 255.255.255.255, those for
VMs connected thru wifi are. I introduced the above rule when I
still thought that being a good netizen meant blocking packets
addressed to non-routable subnets, malformed packets, and so on.

After removal of said rule, VMs connected thru host's wifi get their IP
address from the gateway's dnsmasq without any problem. This thread is
solved.
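
A likely explanation for the difference, given the broadcast flag noted
earlier in the thread for the WLAN request: RFC 2131 (section 4.1) tells
a DHCP server where to send an offer depending on the giaddr and ciaddr
fields and on the broadcast bit of the request. The Python sketch below
restates those rules. It is an illustration of the RFC text only, not
dnsmasq's actual code, and offer_destination is a hypothetical name.

```python
LIMITED_BROADCAST = "255.255.255.255"
ZERO = "0.0.0.0"
SERVER_PORT, CLIENT_PORT = 67, 68


def offer_destination(giaddr, ciaddr, broadcast_flag, yiaddr):
    """Return (ip, port) for a DHCPOFFER/DHCPACK, following RFC 2131 section 4.1."""
    if giaddr != ZERO:
        # Request came through a relay agent: answer the relay.
        return (giaddr, SERVER_PORT)
    if ciaddr != ZERO:
        # Client already has a working address: unicast to it.
        return (ciaddr, CLIENT_PORT)
    if broadcast_flag:
        # Client asked for a broadcast reply: this is the packet that an
        # OUTPUT rule dropping 255.255.255.255 would block.
        return (LIMITED_BROADCAST, CLIENT_PORT)
    # Otherwise unicast to the offered address (at the client's MAC).
    return (yiaddr, CLIENT_PORT)


# Request without the broadcast bit, as reported for the ethernet case.
print(offer_destination(ZERO, ZERO, False, "192.168.73.95"))
# Request with the broadcast bit, as reported for the wifi case.
print(offer_destination(ZERO, ZERO, True, "192.168.73.95"))
```

Under these rules only the second kind of reply is addressed to
255.255.255.255, which would match the observation that the rule hurt
the wifi-bridged VMs and nothing else.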

I will have to start dabbling in strace, because it obviously can find
you the solution when nothing
else can. When they told me, as a grad student, that being an
astrophysicist meant knowing a little of
everything, I did not think this would turn out to be so true. ;-)

Cheers, and thanks once again.

Mario
Albert ARIBAUD
2015-10-13 13:54:09 UTC
Hi mario,
Post by mario
Yes, thank you Albert, you were perfectly right. This thread is now
solved, shall I somehow
signal this?
So you found the reason why the EPERM happened and determined that it
was not related to dnsmasq?

If you did, then 1) great! :) and 2) it might help someone with the
same kind of problem and who may stumble upon this thread if you would
even briefly describe what was the actual cause of the problem and how
you solved it.

Apart from this, since this is just a mailing list, not a defect
tracking list, you don't need to "close" the topic.

Amicalement,
--
Albert.
Simon Kelley
2015-10-13 16:21:51 UTC
mario explained that the problem was an iptables rule.

iptables -A OUTPUT -d 255.255.255.255 -j DROP

I'm not surprised that such a rule breaks things: it's mentioned twice
in the dnsmasq FAQ :-)

I am surprised that it results in an error return from sendmsg(); I
assumed that the packet would just be silently dropped. Looks like it
would be worthwhile for dnsmasq to log that error.
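
A minimal sketch of what such logging could look like, written in Python
for brevity rather than in dnsmasq's C. It is an assumption about the
general shape only; send_reply and its parameters are hypothetical names.
In Python, an EPERM from the kernel is raised as PermissionError, a
subclass of OSError.

```python
import errno
import logging


def send_reply(sock, payload, dest, log=logging.getLogger("dhcp-sketch").error):
    """Send one datagram and report a failure instead of ignoring it."""
    try:
        sock.sendto(payload, dest)
        return True
    except OSError as exc:
        # Translate the numeric errno (e.g. 1) into its symbolic name (EPERM).
        name = errno.errorcode.get(exc.errno, "unknown")
        log(f"send to {dest[0]}:{dest[1]} failed: {name} ({exc.strerror})")
        return False
```

With a check like this, the failed offer would have appeared in the log
next to the DHCPOFFER line instead of only in the strace output.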

Thanks, both of you, for your work on this, and getting to the bottom
of the problem.

Cheers,

Simon.
Hartmut Krafft
2015-10-07 18:46:08 UTC
Hi,
to get closer, you should run tcpdump on all 3 machines involved, to see
where the packets get lost.
Best,
Hartmut
Simon Kelley
2015-10-07 20:39:52 UTC
Post by mario
Hello, my first post.
I use as a gateway a Debian Jessie pc with dnsmasq providing both
DHCP and DNS.
I wonder if this is the same bug as


https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=798981


If so, we have some more information from your work, and a method of
reproducing the problem. Both are valuable.

If you have time, it would be useful to run dnsmasq under strace and
with the -d flag.

strace /path/to/dnsmasq -d 2>&1 | tee /tmp/log-file-to-post

in both the working and non-working cases and post the resulting logs
here.
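
Such logs get long, so a small filter can help when reading them. The
Python sketch below is only an illustration: it assumes each syscall sits
on one line of the log file (mail clients may re-wrap lines when a log is
pasted), and failed_calls is a hypothetical helper name. It lists the
syscalls that returned -1 and by default skips the EBADF results from
close(), which the thread describes as plentiful in both logs.

```python
import re

# Matches e.g.:  sendmsg(4, {...}, 0) = -1 EPERM (Operation not permitted)
FAILED = re.compile(r"^(\w+)\(.*\)\s*=\s*-1\s+(E[A-Z0-9]+)\b")


def failed_calls(lines, ignore=("EBADF",)):
    """Return (syscall, errno_name) pairs for calls that returned -1."""
    found = []
    for line in lines:
        match = FAILED.match(line.strip())
        if match and match.group(2) not in ignore:
            found.append((match.group(1), match.group(2)))
    return found


sample = [
    "close(6) = -1 EBADF (Bad file descriptor)",
    "alarm(7646) = 7646",
    "sendmsg(4, {msg_name(16)={sa_family=AF_INET}}, 0) = -1 EPERM (Operation not permitted)",
]
print(failed_calls(sample))
```

To run it on a real log, pass an open file instead of the sample list,
for instance failed_calls(open("/tmp/log-file-to-post")).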

Cheers,

Simon.