Discussion:
[Dnsmasq-discuss] A (possibly bad) idea: failover in dnsmasq
Jan-Piet Mens
2012-05-25 11:17:57 UTC
Permalink
A few days before the machine running dnsmasq in my SOHO died, I had
started giving some thought to how I'd go about ensuring a backup copy
of dnsmasq could take over if my only running instance died. Needless to
say, the death of the machine left my small network in shambles, because
I couldn't connect to anything to fix things without first configuring
temporary static addresses; sans DHCP, stuff fails... :)

I'm anything but a DHCP specialist, but I want to bounce this idea off
you anyway, even if you mind. ;-)

The trick, as I understand it, in setting up more than a single dnsmasq
instance in a network is to ensure that it uses --dhcp-script to STORE
the leases and --leasefile-ro to force the script to produce a list of
current leases ("init") from which a launching dnsmasq obtains its data
before going about its usual business.
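To make that store/restore contract concrete, here is a minimal,
purely illustrative dhcp-script sketch in Python. The JSON store path,
the LEASE_STORE environment variable and the record layout are my own
inventions for the example; only the calling convention comes from the
dnsmasq documentation (the script is invoked with "add"/"old"/"del"
plus the lease details, and with "init" it must print the current
leases in leasefile format on stdout when --leasefile-ro is set):

```python
#!/usr/bin/env python3
# Hypothetical dhcp-script sketch: persist leases in a small JSON store
# so a second dnsmasq instance could replay them at startup ("init").
# Store path and record layout are assumptions for illustration only.
import json
import os
import sys

def store_path():
    return os.environ.get("LEASE_STORE", "/var/lib/misc/lease-store.json")

def load():
    try:
        with open(store_path()) as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return {}

def save(db):
    with open(store_path(), "w") as f:
        json.dump(db, f)

def main(argv, env, out=sys.stdout):
    action = argv[1]
    db = load()
    if action == "init":
        # --leasefile-ro: dnsmasq reads its lease database from our
        # stdout, one lease per line:
        #   <expiry> <mac> <ip> <hostname> <client-id>
        for mac, (expires, ip, host) in sorted(db.items()):
            print(expires, mac, ip, host, "*", file=out)
    elif action in ("add", "old"):
        mac, ip = argv[2], argv[3]
        host = argv[4] if len(argv) > 4 else "*"
        db[mac] = [int(env.get("DNSMASQ_LEASE_EXPIRES", "0")), ip, host]
        save(db)
    elif action == "del":
        db.pop(argv[2], None)
        save(db)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv, os.environ)
```

Replicating that store (or replacing the JSON file with a networked
database) is then a separate problem, which is what the rest of this
post is about.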

If we were able to ensure the "data store" (i.e. lease database) were
available on two machines A and B (and up to date on both, of course),
the solution would be easy, except for the fact that dnsmasq does not
LOOK UP (i.e. query) a lease in the data store except upon startup.

I'm thinking along the lines of having a function lease_query() in
lease.c which dnsmasq invokes to determine whether a lease exists before
issuing a new lease for a device.
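As an illustration of that flow (a Python sketch, not lease.c code;
the record layout and function names are invented for the example):
lease_query() consults the shared store first, and the server only
allocates a fresh address when no live lease exists there.

```python
import time

def lease_query(store, mac, now=None):
    """Return a still-valid lease for this MAC from the shared store, or None."""
    now = time.time() if now is None else now
    rec = store.get(mac)  # `store` is any dict-like (replicated) KV store
    if rec is not None and rec["expires"] > now:
        return rec
    return None

def issue_lease(store, mac, free_pool, lease_secs=86400, now=None):
    """Re-issue an existing lease if a peer instance already granted one."""
    now = time.time() if now is None else now
    existing = lease_query(store, mac, now)
    if existing is not None:
        return existing["ip"]  # honour the lease a peer handed out
    ip = free_pool.pop()
    store[mac] = {"ip": ip, "expires": now + lease_secs}
    return ip
```

The point is only the ordering: query the shared store before touching
the free pool, so two instances don't hand the same client two addresses.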

Being very lightweight, dnsmasq must not be bloated by having a huge
MySQL or other database attached to it. I've been searching the
Internets and finally landed upon Tokyo Tyrant [1], which I discussed a
long time ago [2].

What I'm basically getting at is providing dnsmasq with an optional very
lightweight replicating server which it (optionally) uses to ensure the
lease database can be propagated to a second (or third or fourth)
dnsmasq instance. The reason I'm suggesting Tyrant is that it, too, is
lightweight and offers multi-master setups.

+-------------+                    +-------------+
|   dnsmasq   |                    |   dnsmasq   |
|      A      |                    |      B      |
+------+------+                    +------+------+
       |                                  |
       |                                  |
       v                                  v
+-------------+                    +-------------+
|   Tyrant    |------------------->|   Tyrant    |
|      A      |<-------------------|      B      |
+-------------+                    +-------------+

+-------------+                    +-------------+
|   leases    |                    |   leases    |
+-------------+                    +-------------+

In other words, dnsmasq (A) reads/writes leases from Tyrant (A) and
dnsmasq (B) reads/writes from/to Tyrant (B). If Tyrant (A) and (B) can
speak to each other, the database is replicated, irrespective of which
of dnsmasq (A) or (B) last wrote a lease.

I'll stop here before boring you even more, but I'll gladly send you
snippets of code and a short "howto" on setting up a multi-master
system. Most important, IMO, is to keep things very lightweight, in the
spirit of dnsmasq.

Best regards,

-JP

[1] http://fallabs.com/tokyocabinet/tokyoproducts.pdf
[2] http://jpmens.net/2009/09/06/tokyocabinet-a-wow-replacement-for-dbm/
Jan-Piet Mens
2012-05-25 12:03:16 UTC
Permalink
1,$s/Tryant/Tyrant/g

-JP
Simon Kelley
2012-05-25 15:26:25 UTC
Permalink
Post by Jan-Piet Mens
[...]
What I'm basically getting at is providing dnsmasq with an optional very
lightweight replicating server which it (optionally) uses to ensure the
lease database can be propagated to a second (or third or fourth)
dnsmasq instance. The reason I'm suggesting Tyrant is that it, too, is
lightweight and offers multi-master setups.

+-------------+                    +-------------+
|   dnsmasq   |                    |   dnsmasq   |
|      A      |                    |      B      |
+------+------+                    +------+------+
       |                                  |
       v                                  v
+-------------+                    +-------------+
|   Tyrant    |------------------->|   Tyrant    |
|      A      |<-------------------|      B      |
+-------------+                    +-------------+

+-------------+                    +-------------+
|   leases    |                    |   leases    |
+-------------+                    +-------------+
[...]
It's necessary to decide what you're trying to achieve for failover. If
you want a system which just transparently keeps working when a DHCP
server fails, then the ISC server is the best bet, without a doubt.
Let's assume you don't want that, but don't want to be dead in the water
when a machine running dnsmasq fails.

The first thing to note is that DHCP sort of keeps working anyway. Even
if the server goes down and the lease database is lost, the clients will
continue to work until the leases expire. What's more, if they get
towards the end of the lease period without contacting the DHCP server
that gave them a lease, they'll broadcast and accept a renewal from any
server. This works now. If you set the lease time to 2 days, and then
take down the dnsmasq server, you have a day to bring up dnsmasq on
another machine before any client loses network connectivity, and once
that second server is up, its lease database will gradually populate
with all the clients that were in the old database,
_at_the_same_IP_addresses_.

The problem with this is that until a client talks to the new server
and appears in the new lease database, it effectively disappears from
the DNS. That's what will break things and why preserving a copy of the
lease database is useful.

The above applies to active-passive. Active-active, as you suggest, is
more complex, because either server can talk to a client, so things like
lease times have to be co-ordinated. This is what the ISC failover
protocol does, I believe.

For dnsmasq, I can see that active-passive is easy to do. Take your
diagram above and delete dnsmasq B. dnsmasq A keeps the Tyrant instance
A up to date with the lease database, and that gets replicated to Tyrant
B. If dnsmasq A fails, then dnsmasq B is started, initialises its lease
database from Tyrant B, and is there for clients as they fail to talk
to dnsmasq A and start to broadcast. More importantly, dnsmasq B can
provide a DNS service with all the clients in it straight away.

This active-passive scheme shouldn't need any dnsmasq changes, and
arranging to monitor server instances and start a new one when an
existing one goes down is a solved problem: it's exactly what heartbeat
does.

Building a heartbeat harness to run dnsmasq active-passive and a
replicated Tyrant (or another database) sure looks like a useful thing
to try, IMHO.
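As a toy model of that harness (illustrative Python only; real
heartbeat does far more than this), the standby's one policy decision
is when to declare the primary dead and start its own dnsmasq,
initialised from the replicated lease store:

```python
def failover_decision(heartbeats, max_missed=3):
    """Decide whether a standby should take over.

    heartbeats: sequence of booleans, one per probe interval, True if
    the primary answered that probe. After max_missed consecutive
    failures the standby would start dnsmasq (initialised from the
    replicated lease database) and begin serving clients itself.
    """
    missed = 0
    for alive in heartbeats:
        missed = 0 if alive else missed + 1
        if missed >= max_missed:
            return "take over"
    return "stay passive"
```

Requiring several consecutive misses before acting is the usual guard
against a single dropped probe triggering a spurious failover.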


Simon.
Jan-Piet Mens
2012-05-26 08:17:29 UTC
Permalink
Post by Simon Kelley
For dnsmasq, I can see that active-passive is easy to do. Take your
diagram above and delete dnsmasq B. dnsmasq A keeps the Tyrant instance
A up to date with the lease database, and that gets replicated to Tyrant
B. If dnsmasq A fails, then dnsmasq B is started, initialises its lease
database from Tyrant B, and is there for clients as they fail to talk
to dnsmasq A and start to broadcast. More importantly, dnsmasq B can
provide a DNS service with all the clients in it straight away.
Understood.
Post by Simon Kelley
This active-passive scheme shouldn't need any dnsmasq changes, and
arranging to monitor server instances and start a new one when an
existing one goes down is a solved problem: it's exactly what heartbeat
does.
Building a heartbeat harness to run dnsmasq active-passive and a
replicated Tyrant (or another database) sure looks like a useful thing
to try, IMHO.
I'll give that a bit of thought. (/dev/rob0's suggestion of using SQLite
is suddenly more appealing in this light, as it involves fewer moving
parts...)

-JP
/dev/rob0
2012-05-25 18:23:08 UTC
Permalink
Post by Jan-Piet Mens
Being very lightweight, dnsmasq must not be bloated by having
a huge MySQL or other database attached to it.
I'd suggest SQLite as a possibility. Easy to include, and as they
say: "Small. Fast. Reliable. Choose any three."

http://sqlite.org/

I'm not sure how/if this would help with the goal of failover, but
I think it might be worth considering if there is to be external
database/storage for dnsmasq.
--
http://rob0.nodns4.us/ -- system administration and consulting
Offlist GMX mail is seen only if "/dev/rob0" is in the Subject:
Jan-Piet Mens
2012-05-25 20:08:19 UTC
Permalink
Post by /dev/rob0
I'd suggest SQLite as a possibility. Easy to include, and as they
say: "Small. Fast. Reliable. Choose any three."
SQLite was my first option, but it doesn't replicate "automatically".
It's easy to set up with rsync or something like it, of course, but that
wouldn't enable two dnsmasq servers to consult the same live data.

-JP
Vincent Cadet
2012-05-26 09:24:51 UTC
Permalink
Post by Simon Kelley
This active-passive scheme shouldn't need any dnsmasq changes, and
arranging to monitor server instances and start a new one when an
existing one goes down is a solved problem: it's exactly what heartbeat
does.
Building a heartbeat harness to run dnsmasq active-passive and a
replicated Tyrant (or another database) sure looks like a useful thing
to try, IMHO.
What if there were a heartbeat link in dnsmasq through which the active
dnsmasq would stream changes (or the whole block of data) to the passive
instance, along with keep-alive probes? Something similar to Postgres
streaming replication, in fact. An interruption in the stream for more
than a programmed delay would then be interpreted as a fail-over
request. The link could be a socket, a serial link, whatever.

Vincent
Simon Kelley
2012-05-26 10:35:00 UTC
Permalink
Post by Vincent Cadet
[...]
What if there were a heartbeat link in dnsmasq through which the active
dnsmasq would stream changes (or the whole block of data) to the
passive instance, along with keep-alive probes?
That has attractions: both dnsmasq instances could provide DNS service
at all times, and whichever was "master" could provide DHCP, whilst the
"slave" just keeps its database up to date. The main problem with this
is the "split brain" scenario, where both instances are up but can't
talk to each other because the network between them is partitioned. In
that case both acting as masters for their half of the network is fine;
the problem comes when connectivity returns and the lease databases have
to be reconciled....
Post by Vincent Cadet
Something similar to Postgres streaming replication, in fact. An
interruption in the stream for more than a programmed delay would then
be interpreted as a fail-over request. The link could be a socket, a
serial link, whatever.
Worth thinking about....


Simon.
Vincent Cadet
2012-05-26 11:26:24 UTC
Permalink
--- On Sat 26.5.12, Simon Kelley wrote :
...
Post by Vincent Cadet
What if there were a heartbeat link in dnsmasq through which the active
dnsmasq would stream changes (or the whole block of data) to the
passive instance, along with keep-alive probes?
That has attractions: both dnsmasq instances could provide DNS service
at all times, and whichever was "master" could provide DHCP, whilst the
"slave" just keeps its database up to date. The main problem with this
is the "split brain" scenario, where both instances are up but can't
talk to each other because the network between them is partitioned. In
that case both acting as masters for their half of the network is fine;
the problem comes when connectivity returns and the lease databases have
to be reconciled....
Hmmm... a failed dnsmasq could request from its peer(s) all the changes that occurred since it failed. Newer records overwrite older ones. Expired leases and records are to be removed [or overwritten according to the received data block that was requested].

Since machines with a lease send their requests to only one dnsmasq instance, lease and record reconciliation should be rather straightforward IMHO, and all records from all dnsmasq peers can be merged in decreasing order of expiry date.
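A minimal sketch of that reconciliation rule (illustrative Python; the
per-client record layout is an assumption): merge the records from all
peers, let the later expiry win, and drop anything already expired.

```python
def merge_leases(databases, now):
    """Merge lease tables from several peers.

    databases: iterable of dicts mapping MAC -> {'ip': ..., 'expires': ...}.
    Returns one table where, per client, the newest record wins and
    expired records are removed.
    """
    merged = {}
    for db in databases:
        for mac, rec in db.items():
            if rec["expires"] <= now:
                continue  # expired lease: remove rather than merge
            cur = merged.get(mac)
            if cur is None or rec["expires"] > cur["expires"]:
                merged[mac] = rec  # the newer record overwrites the older
    return merged
```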

That would also suggest each dnsmasq instance maintains a "dirty" state flag until its database is completely in sync with others.

What needs to be done, I guess, is that the "dirty" dnsmasq instance that recovers its connection to the other peers must immediately switch to non-authoritative mode and return to passive mode, handing over (or forwarding) its [live] DNS requests to the "master" instance. No DHCP requests should be answered.

If network connectivity is restored before the failed dnsmasq instance runs again, then the latter switches to "dirty" state and non-authoritative mode, syncing its database with its other peers.

This implies that a non-master dnsmasq should still be able to receive DNS requests. There's a choice here: either reply directly or forward them to the new dnsmasq master. It could be a mix of both: directly answering requests which the slave knows aren't yet replicated with the master.

The complete handshake protocol would require that a dnsmasq instance notifies the requesting peer that the sync is complete so that it can switch to "non-dirty and passive" state.

I haven't thought it through thoroughly; it's just a rough idea for the moment.

Vincent
Simon Kelley
2012-05-26 12:01:39 UTC
Permalink
--- On Sat 26.5.12, Simon Kelley wrote : ...
Post by Vincent Cadet
[...]
The complete handshake protocol would require that a dnsmasq instance
notifies the requesting peer that the sync is complete so that it can
switch to "non-dirty and passive" state.
I haven't thought it through thoroughly; it's just a rough idea for the
moment.
OK, here's my back-of-envelope suggestion, with minimal reference to yours.


Dnsmasq instances can be configured as either primary or secondary.

Primary behaviour:

Work pretty much as usual except that we accept connections from
secondaries. When a secondary connects, it sends its current idea of
the lease database to the primary. The primary merges that with its own
lease database and sends the result back to the secondary. It then
serves DHCP requests as normal and sends incremental changes to the
lease database to any connected secondary.


Secondary behaviour:

At start up, load the lease database from local disk as usual, then
attempt to connect to our configured primary. If this succeeds, do the
lease database swap described above, then enter secondary-passive mode,
where DNS queries are answered but not DHCP requests. If the primary
connection cannot be established or fails, enter secondary-active mode,
where DHCP requests are answered. Try to contact the primary at regular
intervals. When the link to the primary comes back, do the
lease-database exchange, and then go back to secondary-passive mode.

The secondary-primary connections will be over TCP, or possibly SCTP.
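That exchange could be modelled roughly like this (illustrative Python;
the later-expiry-wins merge rule and the message shape are my
assumptions, not a proposed wire format):

```python
def merge_into_primary(primary_db, secondary_db):
    """On connect: fold the secondary's lease table into the primary's
    and return the merged result, which goes back down to the secondary."""
    for mac, rec in secondary_db.items():
        cur = primary_db.get(mac)
        if cur is None or rec["expires"] > cur["expires"]:
            primary_db[mac] = rec
    return dict(primary_db)

def broadcast_change(connected_secondaries, mac, rec):
    """Thereafter: stream each incremental lease change to every
    connected secondary (each `send` stands in for a TCP connection)."""
    for send in connected_secondaries:
        send(("update", mac, rec))
```

After this handshake both ends hold the same table, and the secondary
only has to apply the incremental updates it receives.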

Configuration on a primary looks like

--failover-listen=<port no>

Configuration on a secondary looks like

--failover-master=<IP of primary>,<port on primary>


Need to wonder about security, since connections to the primary can mess
with things.

This only works with one primary and one secondary: if there are
multiple secondaries they'll all become active when the primary dies,
which is wrong.

Cheers,

Simon.
r***@gmail.com
2012-05-27 03:18:21 UTC
Permalink
Post by Simon Kelley
Configuration on a primary looks like
--failover-listen=<port no>
Configuration on a secondary looks like
--failover-master=<IP of primary>,<port on primary>
I think more consideration should go into the configuration option
names, since putting a "failover-master" option on a secondary is
counter-intuitive. After all, one doesn't put a "dhcp-authoritative"
option on non-authoritative servers to tell them where to find the
authoritative server. Also, shouldn't the standby/failover behavior
be linked to authoritative?
Don Muller
2012-05-27 12:58:05 UTC
Permalink
I could be way off base here, but here is my 2 cents.

Maybe a better idea is to have all dnsmasq instances talking to each
other, listing each one with something like

partner=<ip or dns name>
partner=<ip or dns name>

Also add two more statements. One for the primary and one for the secondaries.

primary=yes

secondary=1 or 2 or 3 etc

Each secondary has a different number, and when the primary fails, the secondary with the lowest number takes over until the primary comes back online. You could say that the primary is master=0.

Maybe add a heartbeat statement that specifies how often the primary will send keepalive messages out, so everyone else knows it is still alive and well.
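In sketch form (illustrative Python), the selection rule is just:
the primary (number 0) serves while its keepalives arrive; otherwise
the lowest-numbered secondary that is still reachable takes over:

```python
def who_serves_dhcp(primary_alive, reachable_secondaries):
    """reachable_secondaries: the secondary numbers (1, 2, 3, ...) still
    seen on the network. Returns the number of the instance that should
    be serving DHCP right now."""
    if primary_alive:
        return 0  # the primary -- "master=0" in the scheme above
    return min(reachable_secondaries)
```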

Don
Post by r***@gmail.com
Post by Simon Kelley
Configuration on a primary looks like
--failover-listen=<port no>
Configuration on a secondary looks like
--failover-master=<IP of primary>,<port on primary>
I think more consideration should go into the configuration option
names, since putting a "failover-master" option on a secondary is
counter-intuitive. After all, one doesn't put a "dhcp-authoritative"
option on non-authoritative servers to tell them where to find the
authoritative server. Also, shouldn't the standby/failover behavior
be linked to authoritative?
_______________________________________________
Dnsmasq-discuss mailing list
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Simon Kelley
2012-05-28 09:14:14 UTC
Permalink
r***@gmail.com
2012-05-28 18:51:45 UTC
Permalink
Post by Simon Kelley
Post by r***@gmail.com
Post by Simon Kelley
Configuration on a primary looks like
--failover-listen=<port no>
Configuration on a secondary looks like
--failover-master=<IP of primary>,<port on primary>
I think more consideration should go into the configuration option
names, since putting a "failover-master" option on a secondary is
counter-intuitive. After all, one doesn't put a "dhcp-authoritative"
option on non-authoritative servers to tell them where to find the
authoritative server.
That's a valid argument. How about --failover-from=<address>
That sounds much better, less chance of confusion.
Post by Simon Kelley
Post by r***@gmail.com
Also, shouldn't the standby/failover behavior
be linked to authoritative?
I _think_ authoritative should not be used with failover, but I need to
trace through all the paths to be sure.
Probably a candidate for logging a warning if both options are used.
Vincent Cadet
2012-05-26 13:37:31 UTC
Permalink
--- On Sat 26.5.12, Simon Kelley wrote :

Oops, I had overlooked that there is already such a configuration :D Sorry for the noise.

...
Post by Simon Kelley
Need to wonder about security, since connections to the primary can
mess with things.
This only works with one primary and one secondary: if there are
multiple secondaries they'll all become active when the primary dies,
which is wrong.
As soon as a slave detects the master is down, it waits for a random delay (which must always be greater than the biggest latency) before probing the other living slaves, notifying them that it wants to become the master. Slaves which receive such a notification cancel their own process of becoming the master. The one slave that has received no notification (within a programmed duration that is common to all slaves and that should be greater than the highest latency) becomes the master; immediately after, it notifies the other slaves that it is the master.
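A rough model of that election (illustrative Python; the notification
channel is reduced to comparing delays, and `max_latency` stands for
the worst-case-latency bound described above): every slave draws a
random delay above the bound, and the slave whose delay elapses first
claims mastership, since all the others will have been notified before
their own timers expire.

```python
import random

def elect_master(slaves, max_latency, rng=None):
    """Each slave waits max_latency plus a random amount; the slave
    whose delay elapses first notifies the others and becomes master."""
    rng = random.Random() if rng is None else rng
    delays = {s: max_latency + rng.random() for s in slaves}
    return min(delays, key=delays.get)  # first timer to expire wins
```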

I guess every peer should be able to measure the latency to each of its living peers and send notifications to the fastest peer first, in order of increasing latency. To be significant, the mean latency value should be computed more or less like the load average.

The algorithm might even be generalized to become the default startup process. In this case, all you need to do is specify the list of peers and let the farm determine which will be the master. First come, first served.

Does that make sense?

Best regards,
Vincent