Discussion:
[Dnsmasq-discuss] TFTP Boot update "for those who find this problem in the future"
Philippe Faure
2009-08-25 01:01:09 UTC
It would seem that the network MTU was my limiting factor. With
Simon's help, we were able to find the problem and the solution.

My config file was too old to mention the switch:
tftp-no-blocksize

After I added it and restarted dnsmasq, the new system booted straight
to the install page.

I am using a boot client that is part of the motherboard.
MB: Asus, M4N78 Pro
Nvidia Boot Agent version: 249.0542.

Snippet from Simon's email:
OK, it looks like the client is asking for a blocksize (i.e. packet size)
of 1456 bytes, and that's too big for your network. Because of that,
in the end the client does something really strange which provokes the
"unsupported request" error.
Try adding
tftp-no-blocksize
to /etc/dnsmasq.conf. That will cause dnsmasq to reject the request from
the client for bigger blocks, and may be enough to make it all work.
Alternatively, if you can increase the MTU on the network, that might fix
things.
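To put numbers on that diagnosis, here is a small Python sketch of the on-the-wire size of one full TFTP DATA packet. It assumes IPv4 with no IP options (20-byte header), an 8-byte UDP header and the 4-byte TFTP DATA header (opcode plus block number); the function names are illustrative only. With a standard 1500-byte Ethernet MTU a 1456-byte block fits in one packet, but at the 1400-byte MTU Philippe describes later in the thread it has to be fragmented.

```python
# Header sizes, assuming IPv4 without options.
IPV4_HEADER = 20
UDP_HEADER = 8
TFTP_DATA_HEADER = 4  # 2-byte opcode + 2-byte block number
OVERHEAD = IPV4_HEADER + UDP_HEADER + TFTP_DATA_HEADER


def datagram_size(blksize: int) -> int:
    """IP packet size needed to carry one full TFTP DATA block."""
    return blksize + OVERHEAD


def largest_unfragmented_blksize(mtu: int) -> int:
    """Largest blocksize whose DATA packet still fits in a single IP packet."""
    return mtu - OVERHEAD


def needs_fragmenting(blksize: int, mtu: int) -> bool:
    """True if a full DATA block of this size exceeds the link MTU."""
    return datagram_size(blksize) > mtu


if __name__ == "__main__":
    for mtu in (1500, 1400):
        print(mtu, datagram_size(1456),
              largest_unfragmented_blksize(mtu), needs_fragmenting(1456, mtu))
    # 1500 1488 1468 False
    # 1400 1488 1368 True
```

Real networks can add further overhead (IP options, tunnels, VLAN tags on some paths), so the 32-byte figure is a lower bound.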
Philippe
r***@gmail.com
2009-08-25 04:14:55 UTC
I can't think of a single circumstance where a manufacturer-provided
boot PROM would have more appropriate network-specific settings than
the TFTP server configuration.

Maybe tftp-no-blocksize should be set by default (with a
tftp-honor-blocksize to negate it).

But I don't use BOOTP remote booting, so Simon probably has good
reasons for doing things the way they are.
_______________________________________________
Dnsmasq-discuss mailing list
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
Simon Kelley
2009-08-25 09:31:00 UTC
Post by r***@gmail.com
I can't think of a single circumstance where a manufacturer-provided
boot PROM would have more appropriate network-specific settings than
the TFTP server configuration.
Maybe tftp-no-blocksize should be set by default (with a
tftp-honor-blocksize to negate it).
But I don't use BOOTP remote booting, so Simon probably has good
reasons for doing things the way they are.
Setting tftp-no-blocksize forces 512-byte blocks and makes the
already-slow TFTP transfer three times slower. Since most netbooting
happens over a local net which is a physical ethernet with well-known
MTU, it makes sense for the client to request a blocksize suitable for
that media.
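A rough way to check the "three times slower" figure: because TFTP keeps only one block in flight, transfer time is dominated by the number of DATA packets, each costing a round trip. The Python sketch below counts packets for a hypothetical 20 MiB boot image; it ignores serialization time and retransmissions, so the ratio is only an approximation.

```python
def data_packets(file_size: int, blksize: int) -> int:
    """Number of DATA packets TFTP needs for a file of file_size bytes.

    A transfer ends with a packet shorter than blksize, so a file whose size
    is an exact multiple of blksize needs one extra, empty, final packet.
    """
    return file_size // blksize + 1


def slowdown(file_size: int, small: int, large: int) -> float:
    """Approximate time ratio, assuming one round trip per DATA packet."""
    return data_packets(file_size, small) / data_packets(file_size, large)


if __name__ == "__main__":
    image = 20 * 1024 * 1024  # hypothetical 20 MiB boot image
    print(data_packets(image, 512), data_packets(image, 1456))  # 40961 14404
    print(f"{slowdown(image, 512, 1456):.2f}")                  # 2.84
```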

It's not clear to me why the MTU on Philippe's network is smaller, but I
think a small MTU is a fairly rare occurrence. Even when it does
happen, it shouldn't be a show stopper: that takes badly broken client
firmware that has clearly never had any code-paths other than the most
common ones tested.

Cheers,

Simon.
r***@gmail.com
2009-08-25 13:52:23 UTC
Post by r***@gmail.com
I can't think of a single circumstance where a manufacturer-provided
boot PROM would have more appropriate network-specific settings than
the TFTP server configuration.
Maybe tftp-no-blocksize should be set by default (with a
tftp-honor-blocksize to negate it).
But I don't use BOOTP remote booting, so Simon probably has good
reasons for doing things the way they are.
Setting tftp-no-blocksize forces 512-byte blocks and makes the already-slow
TFTP transfer three times slower. Since most netbooting happens over a local
net which is a physical ethernet with well-known MTU, it makes sense for the
client to request a blocksize suitable for that media.
Is that 512 adjustable? The local dnsmasq admin can surely make a
better choice than the PROM developer. Plus, I think most TCP/IP
stacks automatically determine the path MTU. I don't know if dnsmasq could
retrieve the value estimated for some other local host on the same
interface and use it as a reasonable default in the absence of configuration.
There's probably no portable way to do that, though.

Also, a quick look at the protocol indicates that "only one packet may
be in-flight at a time", but data packets and acknowledgements all
carry sequence numbers, so I'm not sure what exactly about the format
requires stop-and-wait.
It's not clear to me why the MTU on Philippe's network is smaller, but I
think a small MTU is a fairly rare occurrence. Even when it does happen, it
shouldn't be a show stopper: that takes badly broken client firmware that
has clearly never had any code-paths other than the most common ones tested.
Passing through a switch which adds VLAN marking often causes
fragmentation of maximally sized payloads. Wireless hops could change
MSS as well.

But maybe the best solution would just be to mention tftp-no-blocksize
in the error message as a possible fix.
Simon Kelley
2009-08-25 14:07:32 UTC
Post by r***@gmail.com
Post by r***@gmail.com
I can't think of a single circumstance where a manufacturer-provided
boot PROM would have more appropriate network-specific settings than
the TFTP server configuration.
Maybe tftp-no-blocksize should be set by default (with a
tftp-honor-blocksize to negate it).
But I don't use BOOTP remote booting, so Simon probably has good
reasons for doing things the way they are.
Setting tftp-no-blocksize forces 512-byte blocks and makes the already-slow
TFTP transfer three times slower. Since most netbooting happens over a local
net which is a physical ethernet with well-known MTU, it makes sense for the
client to request a blocksize suitable for that media.
Is that 512 adjustable? The local dnsmasq admin can surely make a
better choice than the PROM developer.
Sort of. If the client doesn't invoke the blocksize extension, then it
has to be 512. If the client says "I want blocksize x", then the server
can reply "you can have blocksize y", where y <= x.
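The negotiation rule described here (from RFC 2348) can be sketched in a few lines of Python. The names `negotiate_blksize`, `server_max` and `honour_option` are hypothetical and do not come from dnsmasq's source; `honour_option=False` is meant to model the effect of tftp-no-blocksize, where the option is simply not acknowledged and the transfer falls back to 512-byte blocks.

```python
from typing import Optional

DEFAULT_BLKSIZE = 512                # RFC 1350 size when no option is agreed
MIN_BLKSIZE, MAX_BLKSIZE = 8, 65464  # valid blksize range in RFC 2348


def negotiate_blksize(requested: Optional[int], server_max: int,
                      honour_option: bool = True) -> int:
    """Blocksize a server would use for one transfer (illustrative sketch).

    requested:     the client's "blksize" option, or None if it sent none.
    server_max:    hypothetical server-side ceiling, e.g. derived from the MTU.
    honour_option: False models tftp-no-blocksize: the option is ignored, no
                   OACK is sent, and the transfer uses 512-byte blocks.
    """
    if requested is None or not honour_option:
        return DEFAULT_BLKSIZE
    if not MIN_BLKSIZE <= requested <= MAX_BLKSIZE:
        return DEFAULT_BLKSIZE  # this sketch simply ignores out-of-range requests
    # The value acknowledged in the OACK may be smaller than, but never
    # larger than, what the client asked for.
    return min(requested, max(server_max, MIN_BLKSIZE))


if __name__ == "__main__":
    print(negotiate_blksize(None, 1468))         # 512
    print(negotiate_blksize(1456, 1468))         # 1456
    print(negotiate_blksize(1456, 1368))         # 1368
    print(negotiate_blksize(1456, 1468, False))  # 512
```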
Post by r***@gmail.com
Plus, I think most TCP/IP
stacks automatically determine the path MTU. I don't know if dnsmasq could
retrieve the value estimated for some other local host on the same
interface and use it as a reasonable default in the absence of configuration.
There's probably no portable way to do that, though.
Path-MTU discovery is turned off for the UDP socket used for TFTP,
because the presence of the don't-fragment bit confuses some PXE ROMs.
Sadly, it looks like receiving fragmented packets confuses other PXE ROMs!
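On Linux, one way to get this behaviour is to set the `IP_MTU_DISCOVER` socket option to `IP_PMTUDISC_DONT`, which stops the kernel from setting the don't-fragment bit on outgoing datagrams. The Python sketch below assumes Linux; the numeric fallbacks 10 and 0 are the values from the Linux headers, used in case this Python build doesn't export the names. dnsmasq itself is written in C and may do this differently, so treat this as an illustration rather than its actual code.

```python
import socket
import sys

# Linux values from <linux/in.h>; fall back to them if the socket module
# does not export these names.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)


def make_tftp_socket() -> socket.socket:
    """UDP socket with path-MTU discovery disabled (DF bit not set) on Linux."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if sys.platform.startswith("linux"):
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DONT)
    return sock


if __name__ == "__main__":
    s = make_tftp_socket()
    if sys.platform.startswith("linux"):
        # Reads back the current mode; 0 means IP_PMTUDISC_DONT.
        print(s.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER))
    s.close()
```

The trade-off is exactly the one described above: without DF, an oversized packet is fragmented in transit instead of being rejected, and fragments are what some PXE ROMs cannot handle.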
Post by r***@gmail.com
Also, a quick look at the protocol indicates that "only one packet may
be in-flight at a time", but data packets and acknowledgements all
carry sequence numbers, so I'm not sure what exactly about the format
requires stop-and-wait.
It's specified in the RFC: the T in TFTP stands for "trivial".
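Whatever the block numbers could in principle allow, RFC 1350 has the sender wait for each ACK before sending the next block. The Python sketch below shows that lock-step rule; `send_and_wait` is a hypothetical stand-in for the network, and real timeouts and 16-bit block-number wraparound are not modelled.

```python
from typing import Callable, Dict, List


def stop_and_wait_send(data: bytes, blksize: int,
                       send_and_wait: Callable[[int, bytes], bool],
                       max_retries: int = 5) -> List[int]:
    """Send data in TFTP-style lock-step, one DATA block in flight at a time.

    send_and_wait(block_no, payload) stands in for "transmit this DATA packet
    and wait for ACK block_no"; it returns True if the ACK arrived before the
    timeout. Returns every block number transmitted, including retries.
    """
    transmitted: List[int] = []
    block_no, offset = 1, 0
    while True:
        payload = data[offset:offset + blksize]
        for _ in range(max_retries + 1):
            transmitted.append(block_no)
            if send_and_wait(block_no, payload):
                break  # ACK received: only now may the next block be sent
        else:
            raise TimeoutError(f"no ACK for block {block_no}")
        if len(payload) < blksize:
            return transmitted  # a short (possibly empty) block ends the transfer
        block_no += 1  # 16-bit wraparound is not modelled
        offset += blksize


if __name__ == "__main__":
    attempts: Dict[int, int] = {}

    def lossy(block_no: int, payload: bytes) -> bool:
        # Drop the first transmission of block 2, deliver everything else.
        attempts[block_no] = attempts.get(block_no, 0) + 1
        return not (block_no == 2 and attempts[block_no] == 1)

    print(stop_and_wait_send(b"x" * 1000, 512, lossy))  # [1, 2, 2]
```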
Post by r***@gmail.com
It's not clear to me why the MTU on Philippe's network is smaller, but I
think a small MTU is a fairly rare occurrence. Even when it does happen, it
shouldn't be a show stopper: that takes badly broken client firmware that
has clearly never had any code-paths other than the most common ones tested.
Passing through a switch which adds VLAN marking often causes
fragmentation of maximally sized payloads. Wireless hops could change
MSS as well.
A possible fix for some (but not all) situations is to check the MTU on
the interface handling the TFTP traffic and scale back blocksize
requests to match that.
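That mitigation might look roughly like the following Python sketch. It assumes Linux (the MTU is read from `/sys/class/net/<iface>/mtu`) and 32 bytes of IPv4/UDP/TFTP header overhead; `interface_mtu` and `scale_blksize` are hypothetical helpers, not dnsmasq functions. The local interface MTU says nothing about a smaller MTU on a later hop, such as a router or a VLAN-tagging switch, which is one reason this would only help in some situations.

```python
from pathlib import Path

# IPv4 header (20, no options) + UDP header (8) + TFTP DATA header (4).
OVERHEAD = 20 + 8 + 4


def interface_mtu(ifname: str) -> int:
    """Read an interface's MTU from Linux sysfs (Linux-only)."""
    return int(Path(f"/sys/class/net/{ifname}/mtu").read_text().strip())


def scale_blksize(requested: int, mtu: int) -> int:
    """Cap a client's blocksize request so a full DATA packet fits the MTU."""
    return min(requested, mtu - OVERHEAD)


if __name__ == "__main__":
    print(scale_blksize(1456, 1500))  # 1456: fits unfragmented
    print(scale_blksize(1456, 1400))  # 1368: scaled back
    lo = Path("/sys/class/net/lo/mtu")
    if lo.exists():
        print("lo MTU:", interface_mtu("lo"))
```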
Post by r***@gmail.com
But maybe the best solution would just be to mention tftp-no-blocksize
in the error message as a possible fix.
Easier said than done: the sequence we saw with the NVIDIA PXE ROM was:

PXE asks for data
{
PXE gets data (fragmented) and ignores it
server times out and retries
} repeat
PXE times out and sends a completely nonsensical ACK packet to the wrong port
dnsmasq generates "unsupported request" because it doesn't understand
the packet.
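The last two steps can be illustrated with a Python sketch of how a read-only TFTP server (dnsmasq's TFTP server only allows reading) might classify a packet arriving on its listening port. This is not dnsmasq's code: the function name, return strings and the `pxelinux.0` filename are hypothetical, and it assumes the "wrong port" was the server's well-known port 69 rather than the per-transfer port, which the thread doesn't say.

```python
import struct

# TFTP opcodes: 1-5 from RFC 1350, 6 (OACK) from RFC 2347.
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def classify_on_listen_port(packet: bytes) -> str:
    """How a read-only TFTP server might treat a packet on its listening port.

    Only a read request (RRQ) can start a transfer there. DATA and ACK
    packets belong on the per-transfer port, and a read-only server does
    not accept WRQ, so everything else is reported as unsupported.
    """
    if len(packet) < 2:
        return "unsupported request"
    (opcode,) = struct.unpack("!H", packet[:2])
    if OPCODES.get(opcode) == "RRQ":
        return "read request"
    return "unsupported request"


if __name__ == "__main__":
    rrq = struct.pack("!H", 1) + b"pxelinux.0\x00octet\x00"
    stray_ack = struct.pack("!HH", 4, 1)  # ACK for block 1, sent to the wrong port
    print(classify_on_listen_port(rrq))        # read request
    print(classify_on_listen_port(stray_ack))  # unsupported request
```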

The extent of brokenness in netboot firmware is astonishing.

Cheers,
Simon.
r***@gmail.com
2009-08-25 17:33:44 UTC
Post by Simon Kelley
Post by r***@gmail.com
But maybe the best solution would just be to mention tftp-no-blocksize
in the error message as a possible fix.
Easier said than done: the sequence we saw with the NVIDIA PXE ROM was
PXE asks for data
{
 PXE gets data (fragmented) and ignores it
 server times out and retries
What if dnsmasq automatically reduced the blocksize when retrying,
whenever the very first block times out?
Post by Simon Kelley
} repeat
PXE times out and sends a completely nonsensical ACK packet to the wrong port
dnsmasq generates "unsupported request" because it doesn't understand the
packet.
The extent of brokenness in netboot firmware is astonishing.
Simon Kelley
2009-08-25 17:46:59 UTC
Post by r***@gmail.com
Post by Simon Kelley
PXE asks for data
{
PXE gets data (fragmented) and ignores it
server times out and retries
What if dnsmasq automatically reduced the blocksize when retrying,
whenever the very first block times out?
No, it can't do that because it has no way to re-negotiate the
blocksize with the client and a packet whose size is less than
the blocksize is the EOF marker in TFTP.
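The objection is easy to see from the client side. This Python sketch applies TFTP's end-of-file rule: the first DATA block shorter than the negotiated blocksize ends the transfer. The `receive` helper and the 3072-byte test file are hypothetical. If the server had negotiated 1456 bytes and then resent block 1 as a 512-byte block, the client would accept it as a complete 512-byte file.

```python
from typing import Iterable


def receive(blocks: Iterable[bytes], blksize: int) -> bytes:
    """Reassemble a file from DATA payloads as a TFTP client would.

    The first payload shorter than the negotiated blksize is treated as the
    last block; nothing after it is ever acknowledged or read.
    """
    out = bytearray()
    for payload in blocks:
        out += payload
        if len(payload) < blksize:
            break  # short block == end of file
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 12  # hypothetical 3072-byte file
    normal = [data[i:i + 1456] for i in range(0, len(data), 1456)]
    # Server "helpfully" shrinks blocks to 512 bytes after a timeout.
    shrunk = [data[i:i + 512] for i in range(0, len(data), 512)]
    print(receive(normal, 1456) == data)  # True
    print(len(receive(shrunk, 1456)))     # 512: transfer ends after block 1
```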

Cheers,

Simon.

Philippe Faure
2009-08-25 13:54:37 UTC
The issue isn't really with the boot client, but with my network. I
had to pare back the MTU (it is set to 1400), so packets on my network
have to be smaller than normal. There is something "fishy" with my
router, ISP and work network that wouldn't let me access my home server
from work. I completely forgot about this limitation until Simon
mentioned blocksizes while debugging this problem. (I am going to be
replacing the router soon.)

Because of this limitation, the TFTP transfer had problems. I would
suggest leaving things the way they are, with tftp-no-blocksize available
as an option, since my case is a special case and probably not the norm.

Philippe