[Dnsmasq-discuss] Duplicate IP detection with fixed IP
Bernard CLABOTS
2018-09-18 15:59:01 UTC
Hi all,

I have been trying to replicate an IP conflict issue on Open-WRT. The issue is seen randomly, and I expect that in real life it is caused by the lease database getting out of sync with the actual situation (e.g. when a switch sits between the client and the server and the server is rebooted, so that the client acts as though it had a fixed IP). It has also been reported when moving a client from one setup to another where the IP it used to receive is already in use on the LAN.

I tested with two different versions of dnsmasq (2.78 and 2.79).
I use Scapy to forge DHCP Requests (see further).

Setup: I have a laptop with a fixed IP inside the DHCP range (192.168.1.0/26). I then forge a Request for that IP using Scapy, and I cannot explain the behavior:
1. I see no ARP whatsoever to the requested IP when DNSMasq handles the request.
2. When I request the fixed IP for a client with a random MAC, I instantly receive an ACK, then (*after*) I see some unanswered ARP requests of the form "who has [IP just assigned]? Tell 192.168.1.1", where 192.168.1.1 is the DHCP server IP.

I end up in a situation where dhcp.leases contains the fake MAC associated with the lease, while the ARP table contains the MAC of the fixed-IP laptop (probably because I'm not sending any IP packet where the IP is associated with the fake MAC, so the switch cannot learn it).

I have observed that Windows 10 has a mechanism to prevent conflicts: whenever a fixed IP is configured, an ARP probe for its own IP is sent after the link comes up. If the probe is answered, the client keeps silent and starts using a link-local IPv4 address (169....). Yet I have tested with a very old laptop running Windows 3.1 and I can replicate the issue.
But basically, it is puzzling that the device is ARPing *after* the DHCP server distributed the IP.
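For readers who want to see what such a probe looks like on the wire, here is a minimal sketch. It assumes the Windows behaviour described above is the address conflict detection probe of RFC 5227 (sender IP all zeros, target IP set to the address being checked); that mapping is an assumption, not something confirmed in this thread. The function name and the MAC address are hypothetical, and only the 28-byte ARP payload is built, not the Ethernet header.

```python
import socket
import struct


def build_arp_probe(sender_mac: bytes, probed_ip: str) -> bytes:
    """Build the 28-byte ARP payload of an RFC 5227 style probe."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                              # hardware type: Ethernet
        0x0800,                         # protocol type: IPv4
        6,                              # hardware address length
        4,                              # protocol address length
        1,                              # opcode: ARP request
        sender_mac,                     # sender hardware address
        socket.inet_aton("0.0.0.0"),    # sender IP is all zeros in a probe
        b"\x00" * 6,                    # target hardware address is unknown
        socket.inet_aton(probed_ip),    # target IP: the address being checked
    )


# Hypothetical locally administered MAC, probing the address used in this thread.
probe = build_arp_probe(bytes.fromhex("020000000001"), "192.168.1.34")
```

Any ARP reply for the probed address means another host already uses it, which is the condition under which the client is described as backing off.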

*The whole issue seems to boil down to:* why does DNSMasq not check whether the IP is free before assigning it?
I thought that unless option "-5" or "--no-ping" was set, DNSMasq would always ping the assigned IP once *before* assignment (I checked the code and saw that there is in fact a mechanism to store the positive identification as well as to blacklist IPs in case a client keeps coming back).
The only ARP I see in this case is *after* the IP is assigned. How come DNSMasq is not trying to ping before assignment? Is there an option to force this behavior (from the code I guess not)? Does DNSMasq also somehow rely on the ARP table and the reachability flags set there, or solely on the _non_-answer to ping?
Thanks a lot for your assistance.
Regards,
Bernard

Scapy forged packet (I know the source MAC does not match the client MAC, but I deem this good enough for testing; AFAIK it is a legal packet):
    from scapy.all import Ether, IP, UDP, BOOTP, DHCP, RandInt, srp1

    # Broadcast DHCPREQUEST asking server 192.168.1.1 for 192.168.1.34
    dhcp_request = (
        Ether(dst='ff:ff:ff:ff:ff:ff')
        / IP(src='0.0.0.0', dst='255.255.255.255')
        / UDP(dport=67, sport=68)
        / BOOTP(xid=RandInt())
        / DHCP(options=[('message-type', 'request'),
                        ('server_id', '192.168.1.1'),
                        ('requested_addr', '192.168.1.34'),
                        ('hostname', 'Scapy'),
                        'end'])
    )
    # Send on the given interface and wait for the first answer
    dhcp_ack = srp1(dhcp_request, iface='enp9s0')
Simon Kelley
2018-09-18 22:41:03 UTC
There are two reasons why you are not seeing ARP requests from dnsmasq.

1) DHCP servers (as opposed to clients) use ICMP echo-request AKA ping,
and not ARP to check for address-in-use.

2) Dnsmasq does the ping-check during the DHCPDISCOVER/DHCPOFFER phase
of the protocol, not the DHCPREQUEST/DHCPACK phase.

It's possible for a DHCPREQUEST to create a new entry in the leases
file, but only if the dhcp-authoritative flag is set. In that case when
dnsmasq sees what looks like a lease renewal for a lease it doesn't know
about, it assumes the lease database was lost, and does the renewal
anyway, recreating the lease entry as it does. Without
dhcp-authoritative, it follows the RFC-approved route of replying with
DHCPNAK, which forces the client to go through the whole
DHCPDISCOVER/DHCPOFFER phase, and that does the ping-check.

I don't think it would be possible to make this hack any more safe by
doing the ping-check before re-creating the lease. The whole premise is
that the client attempting to renew is already configured, but the
server lost track of it. So doing a ping-check would expect to get a
reply from the already-configured client.
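
To make point 2 concrete, here is a rough sketch of a ping-check applied when picking an address to offer. It is not dnsmasq's actual code: `choose_offer` and `replies_to_ping` are hypothetical names, and the probe is passed in as a function so the logic can be read without sending real ICMP packets.

```python
from typing import Callable, Iterable, Optional


def choose_offer(
    candidates: Iterable[str],
    replies_to_ping: Callable[[str], bool],
    no_ping: bool = False,
) -> Optional[str]:
    """Return the first candidate address that looks free, or None."""
    for ip in candidates:
        # With the check disabled, the first candidate is offered unchecked.
        if no_ping or not replies_to_ping(ip):
            return ip
    return None


# The fixed-IP laptop from this thread answers ping on 192.168.1.34.
in_use = {"192.168.1.34"}
offer = choose_offer(["192.168.1.34", "192.168.1.35"], lambda ip: ip in in_use)
```

A bare DHCPREQUEST never goes through a step like this, which is consistent with no check being observed for the forged packet.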


Cheers,

Simon.
Bernard CLABOTS
2018-09-19 10:20:48 UTC
Thanks a lot for this answer.
Indeed, it is a special case, as we have a simple two-way Request/ACK exchange. This is also what is seen with some implementations when quickly unplugging and re-plugging the cable; it is legal AFAIK.
I also agree on the need to be efficient when the lease database is lost.
Yet, reading RFC 2131, I saw:
      If the client's request is invalid (e.g., the client has moved
      to a new subnet), servers SHOULD respond with a DHCPNAK message to
      the client. Servers SHOULD NOT respond if their information is not
      guaranteed to be accurate.  For example, a server that identifies a
      request for an expired binding that is owned by another server SHOULD
      NOT respond with a DHCPNAK unless the servers are using an explicit
      mechanism to maintain coherency among the servers.

Regarding the first sentence, I agree it is only a SHOULD. However, according to your explanation, the next sentence is also relevant here: DNSMasq should not respond if its information is not guaranteed to be accurate. This also means that if we change the authoritative flag, we risk ending up in the case given as an example, where DNSMasq cannot guarantee whether the requested IP belongs to another DHCP server, so it should not NAK, and we are going in circles...
We can of course discuss whether the Request is invalid simply because the IP is currently used by another device without having been assigned through DHCP. I would argue that the DNSMasq code explicitly accepts that requesting the server's own IP fulfills this condition, which IMHO is a similar case.
Anyhow, moving forward to resolve the issue I face, is there any way to force the RFC behavior of NAK-ing and forcing the 4-way exchange?
Thanks a lot!
Regards,
Bernard
Simon Kelley
2018-09-19 11:52:37 UTC
If you don't set dhcp-authoritative, then the client will eventually
move to the four-way exchange, but it may take some time, as it involves
time-outs. The reason for this is that the dnsmasq server has to assume
there are other DHCP servers on the network which may hold a lease for
the client.

The differences in behaviour are these.

Without dhcp-authoritative:

1) A client sending DHCPREQUEST in init-reboot state which doesn't have
a lease in the database will be ignored.

2) A client sending a DHCPREQUEST in rebind mode which doesn't have a
lease in the database will be ignored. In renew mode (ie unicast
request) it will get a DHCPNAK.

3) A client sending a request with the wrong server-id will be ignored.

With dhcp-authoritative:

1) A client sending DHCPREQUEST in init-reboot state which doesn't have
a lease will have the lease created.

2) A client sending a DHCPREQUEST in renew or rebind mode which doesn't
have a lease in the database will have a lease created.

3) A client sending a request in INIT_REBOOT or SELECTING state with
the wrong server-id will get a DHCPNAK.
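
The two lists above can be restated as a small decision function. This is only a sketch of the rules as written in this message, not dnsmasq's implementation; the `Request` type, the state strings and the "not covered" result (for combinations the lists do not mention) are assumptions made for illustration.

```python
from typing import NamedTuple


class Request(NamedTuple):
    state: str                 # "init-reboot", "selecting", "renew" or "rebind"
    lease_known: bool          # is there a lease for this client in the database?
    server_id_ok: bool = True  # False if the request names another server


def respond(req: Request, authoritative: bool) -> str:
    """Map a DHCPREQUEST to the outcome given in the lists above."""
    if not req.server_id_ok:
        if not authoritative:
            return "ignore"                      # rule 3, non-authoritative
        if req.state in ("init-reboot", "selecting"):
            return "NAK"                         # rule 3, authoritative
        return "not covered"
    if req.lease_known:
        return "not covered"                     # the lists only cover unknown leases
    if authoritative:
        if req.state in ("init-reboot", "renew", "rebind"):
            return "ACK (lease created)"         # rules 1 and 2, authoritative
        return "not covered"
    if req.state in ("init-reboot", "rebind"):
        return "ignore"                          # rules 1 and 2, non-authoritative
    if req.state == "renew":
        return "NAK"                             # unicast renewal
    return "not covered"


forged = Request(state="init-reboot", lease_known=False)
```

With this, `respond(forged, authoritative=True)` gives the instant ACK reported at the start of the thread, while `respond(forged, authoritative=False)` gives "ignore", so the client only reaches the four-way exchange after its time-outs.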


Cheers,

Simon.
Bernard CLABOTS
2018-10-05 11:01:12 UTC
Hi Simon,
Sorry to come back to this question again. You wrote:
"I don't think it would be possible to make this hack any more safe by
doing the ping-check before re-creating the lease. The whole premise is
that the client attempting to renew is already configured, but the
server lost track of it. So doing a ping-check would expect to get a
reply from the already-configured client."

Well, actually, I see at least two ways to improve that behavior, especially since assuming that the lease database was corrupted is odd and not safe:
1. ARP the IP and see whether anything other than the requesting client answers (more efficient, but fails if proxy ARP is in use).
2. Ping the IP and expect an answer from that same MAC (might fail if the device is set not to answer ping).
If the check fails, reject the request and force the 4-way exchange. Indeed, if you receive no answer, you can no longer assume that this is the legitimate owner of the IP. Anyway, if the client is legitimate, the algorithm will most probably give it the same IP again.
The main drawback is an increased handling time, but that only occurs if the lease is unknown.
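To illustrate the proposal, here is a minimal sketch of the check I have in mind for a Request whose lease is unknown. The names are hypothetical and nothing here exists in dnsmasq; `who_answers` stands for whichever probe is used (ARP or ping) and returns the MAC that answered for the IP, or None if nothing answered.

```python
from typing import Callable, Optional


def check_unknown_lease(
    client_mac: str,
    requested_ip: str,
    who_answers: Callable[[str], Optional[str]],
) -> str:
    """Decide how to answer a Request for an IP with no known lease."""
    owner = who_answers(requested_ip)
    # Only a reply coming from the requesting client itself confirms ownership.
    if owner is not None and owner.lower() == client_mac.lower():
        return "ACK"
    # No answer, or an answer from another MAC: force the 4-way exchange.
    return "NAK"


# Hypothetical MACs: the fixed-IP laptop answers for the IP the fake client requests.
laptop = "02:00:00:00:00:01"
fake_client = "02:00:00:00:00:02"
verdict = check_unknown_lease(fake_client, "192.168.1.34", lambda ip: laptop)
```

In the forged-packet test this would return "NAK", since the answer comes from the laptop and not from the fake MAC.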
Bernard CLABOTS
2018-10-08 10:39:27 UTC
Hi Simon,
I am definitely not a DHCP expert, and I don't want to become a pain either. Yet...
As I read it, the behavior is not only out of line with the RFC, it even contradicts it:
"3.2 Client-server interaction - reusing a previously allocated network address
If a client remembers and wishes to reuse a previously allocated
network address, a client may choose to omit some of the steps
described in the previous section. The timeline diagram in figure 4
shows the timing relationships in a typical client-server interaction
for a client reusing a previously allocated network address."

=> So My iPhone is legit.

"Servers with knowledge of the client's configuration parameters
respond with a DHCPACK message to the client. Servers SHOULD NOT
check that the client's network address is already in use; the
client may respond to ICMP Echo Request messages at this point."
=> Invalidates the fix you did in 2017:
"
commit 5ce3e76fbf89e942e8c54ef3e3389facf0d9067a
Author: Simon Kelley <***@thekelleys.org.uk>
Date:   Fri Apr 28 22:14:20 2017 +0100

    DHCPv4: do ICMP-ping check in all cases other that current lease.
"
=> This is a real bug affecting the current behavior. I would really appreciate it if you decoupled the authoritative flag from always breaking this statement:
"Servers SHOULD NOT respond if their information is not _guaranteed_ to be accurate. For example, a server that identifies a
request for an expired binding that is owned by another server SHOULD
NOT respond with a DHCPNAK unless the servers are using an explicit
mechanism to maintain coherency among the servers."

I agree that the example reflects a philosophy similar to what you implemented, in the sense that being an authoritative server somehow fulfils the second part of the example, but the example is just an example. IMHO, having no trace of the lease while being authoritative should/could be interpreted as a potential rogue client attempting a man-in-the-middle attack, which is actually accurate information that can be used to NAK the request, or arguably not to answer at all. Could you at least make this behavior configurable? Thanks a lot!!!

I am not sure I understand what the implications might be that you fear in NAK-ing the request.

Thanks a lot!

Regards,
Bernard

   
Simon Kelley
2018-10-15 23:26:28 UTC
Post by Bernard CLABOTS
=> So My iPhone is legit.
"Servers with knowledge of the client's configuration parameters
respond with a DHCPACK message to the client. Servers SHOULD NOT
check that the client's network address is already in use; the
client may respond to ICMP Echo Request messages at this point."
commit 5ce3e76fbf89e942e8c54ef3e3389facf0d9067a
Date:   Fri Apr 28 22:14:20 2017 +0100

    DHCPv4: do ICMP-ping check in all cases other that current lease.
This was partially reverted in 1d224949cced9e82440d00b3dbaf32c262bac2ff