I'm still struggling with this problem. It has happened several more times, and each outage lasts until my DHCP lease is near expiration and my DHCP client falls back to broadcasting its requests. That can take quite a few hours.
The general sequence of events seems to be this:
- Traffic flows along fine.
- At some point, the next-hop (Sonic) router stops responding to ARP requests.
- Traffic continues to flow until the ARP cache entry goes stale.
- IPv4 traffic stops flowing. IPv6 traffic continues unabated.
- At some point hours later, the DHCP lease comes up for renewal. My DHCP client tries to send unicast DHCPREQUEST packets to the next-hop router (which I suspect is actually a DHCP relay, but may in fact be running a DHCP service). It is unable to send these packets, because there's no valid ARP entry, and ARP requests are still being ignored.
- Eventually, the DHCP lease is nearly expired, and my DHCP client falls back to sending DHCPREQUEST to the broadcast address.
- The next-hop Sonic router immediately responds. The ARP cache entry is filled. (I haven't actually caught the ARP request and response on the wire yet, but I assume they must occur.) Something about this exchange seems to trigger allowing ARP requests through again.
- IPv4 traffic starts flowing again.
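To put rough numbers on how long the outage can last, here is a minimal sketch of the lease timers, assuming the server does not override the RFC 2131 defaults (renewal at 50% of the lease, rebinding via broadcast at 87.5%). The 24-hour lease and the start time are made-up values for illustration; I haven't confirmed what Sonic actually hands out.

```python
from datetime import datetime, timedelta

def dhcp_timers(lease_start, lease_seconds, t1_fraction=0.5, t2_fraction=0.875):
    """Return (renew_at, rebind_at, expires_at) for a DHCP lease.

    t1_fraction and t2_fraction default to the RFC 2131 values; a server
    can override them with options 58 and 59, so treat these as assumptions.
    """
    renew_at = lease_start + timedelta(seconds=lease_seconds * t1_fraction)
    rebind_at = lease_start + timedelta(seconds=lease_seconds * t2_fraction)
    expires_at = lease_start + timedelta(seconds=lease_seconds)
    return renew_at, rebind_at, expires_at

if __name__ == "__main__":
    # Hypothetical lease: 24 hours, obtained at noon.
    start = datetime(2025, 1, 1, 12, 0, 0)
    renew_at, rebind_at, expires_at = dhcp_timers(start, 24 * 3600)
    print("unicast renewals start:   ", renew_at)
    print("broadcast rebinding starts:", rebind_at)
    print("lease expires:            ", expires_at)
    # Unicast renewals cannot go out without an ARP entry, so recovery
    # would have to wait for the broadcast phase.
    print("unicast-only window:      ", rebind_at - renew_at)
```

With those assumed values, the client would spend nine hours trying unicast renewals before it is allowed to broadcast, which is in line with outages lasting "quite a few hours".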
I have talked with Sonic tech support about this a couple times. The first time, they updated the firmware in my ONT, but that didn't help. Apparently, there was some unrelated known issue with the ONT getting addresses, but I'm not sure what that was about.
The second time, the tech looked at the packets coming from my router, and saw that IPv6 was still flowing, but didn't see the ARP requests from my router, even as I watched my router send them (via snooping on the ethernet interface). So it seems like the ONT stops bridging ARP packets, for some reason? I'm perplexed.
I've tried two different routers with different configurations. The ethernet interface error counters are at 0. And IPv6 doesn't have any problems talking with the same next-hop router (with the same MAC address for both IPv4 and IPv6, so almost certainly a single, dual-stack router).
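To line up the moment the ARP entry goes bad with the DHCP timers, something like the following could log the gateway's neighbor state over time. This is only a sketch, assuming a Linux-based router with iproute2's `ip` command; the interface name `eth0` and the gateway address `192.0.2.1` are placeholders, not my real values.

```python
import subprocess

def neighbor_state(ip_neigh_output, gateway_ip):
    """Return the neighbor state (e.g. REACHABLE, STALE, FAILED) for gateway_ip.

    Expects the text printed by `ip neigh show`, where each line starts with
    the neighbor address and ends with its state. Returns None if the
    gateway has no entry at all.
    """
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        if fields and fields[0] == gateway_ip:
            return fields[-1]
    return None

def read_ipv4_neighbors(interface):
    """Run `ip -4 neigh show dev <interface>` and return its output as text."""
    result = subprocess.run(
        ["ip", "-4", "neigh", "show", "dev", interface],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Example use on the router (placeholder interface and address):
#   state = neighbor_state(read_ipv4_neighbors("eth0"), "192.0.2.1")
#   print(state)
```

Run periodically with a timestamp, this would show when the entry moves from REACHABLE to STALE or FAILED, and whether that lines up with anything else in the logs.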
The next step appears to be to swap out the ONT. But this really seems like a software issue on the ONT or upstream, so I'm not sure if that will even help.
Statistics: Posted by gadams — Wed Jan 01, 2025 8:08 pm