Junos Nonstop Active Routing and rpd failure
We handled a support case in which the routing protocol process (rpd) on a Juniper MX router crashed.
This is serious and should not happen, but every piece of software has bugs, so it is not too surprising that rpd can crash as well. On Junos OS, rpd is the process that implements routing protocols such as BGP and OSPF. It maintains adjacencies and sessions with neighbouring routers.
The affected MX router was equipped with two routing engines, and Nonstop Active Routing (NSR) was active. It was therefore a surprise that the rpd crash caused downtime. The expectation was that the backup routing engine (RE) would take over without the neighbouring routers noticing.
It turned out that the router's configuration was missing the statement “set system switchover-on-routing-crash”.
Only when this statement is configured does a failover to the backup RE happen when rpd crashes.
Since then, we have seen more NSR configurations that are missing this important setting. We recommend that users of NSR always configure switchover-on-routing-crash.
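As a rough sketch, an NSR configuration that includes the statement could look like the following. Only the last line comes from the case described here; the first three lines are the usual NSR prerequisites on a dual-RE system and are an assumption about a typical setup, not the affected router's actual configuration.

```
set chassis redundancy graceful-switchover
set routing-options nonstop-routing
set system commit synchronize
set system switchover-on-routing-crash
```

To check an existing router, running "show configuration system | display set | match switchover" in operational mode should return the last line if the statement is present.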