Junos upgrade – filesystem is full
An extension of Juniper’s article KB31198, mainly addressing issues on EX Series switches.
No matter what you do in life or how you earn your money, you have come into contact with software upgrades at some point, and if you are a network engineer you may even have developed a certain dislike for the sound of that phrase. We could probably start a lively conversation about shared experiences in that field and everything that could go, or already went, wrong. We would barely scratch the surface with all the cases of devices not coming back up, booting with wrong or corrupted software, hardware failures, power surges and data loss, and we could create a new series of Tales from the Crypt (you get it, some of them become zombie devices. Please tell us you get it, our bonus depends on that ;)). But what if we cannot even start? What if there is an issue at the most fundamental stage of the process? We recently had a few cases where we couldn’t even upload the image to the target devices.
As a first step, it is always good to look for obvious mistakes. Maybe the switch is actually right and you have already run out of space: maybe you had one too many snapshots, or you were very liberal with logging and trace options. So let’s go and free some space, make some room! Below is a simple three-strike rule for what should be done as a first step on the path to making your engineering life easier.
- Try to actually free up some space
root@juniper> request system storage cleanup
- Remove old snapshots
root@juniper> request system snapshot delete *
- Try to use tmpfs to store an image (for example /tmp)
root@juniper> file copy <source> /tmp/<image>
It would be fair to give you at least a short explanation. Storage cleanup will only remove files from the following directories:
So, if you are trying to upload your image to a location which does not share disk space with them, then that will not help you much, just sayin’. Snapshots are generally a tricky topic, since different Juniper devices handle them in different ways. Some can only write them to external USB drives (QFX5100), since a snapshot cannot be stored on the same media that was used to boot the device. In general, snapshots are copies of the currently running software and configuration, so yeah, having several of them can quickly consume free space. Besides, one is usually enough.
Of course, before we start anything, there is a viable workaround: do not use local storage at all and just do the upgrade over the network. If you decide to go this way, there is no point in reading this article further. The caveat of this approach is that only TFTP and FTP are supported for it, since mgd (the management process) does not support SCP. But when this is not possible or not the desired solution, then, you guessed it, reading continues.
So let’s break it down: you want to download an image to the device’s local drive and get this. Obviously, the first thing a normal person would do is enter the “denial stage” and maybe shake a fist once or twice.
Then our normal person would check if there is REALLY enough space.
mzwk@ex42-01> show system storage
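If you would rather not eyeball the table, the check can be scripted. Below is a minimal Python sketch, assuming the usual df-style column layout of the output (Filesystem, Size, Used, Avail, Capacity, Mounted on). The sample text is made up for illustration and is not taken from a real device, and the helper names (`to_bytes`, `parse_storage`, `image_fits`) are our own.

```python
import re

# Illustrative sample only; real output differs per platform and release.
SAMPLE_OUTPUT = """\
Filesystem              Size       Used      Avail  Capacity   Mounted on
/dev/da0s1a             184M       150M        20M       88%  /
devfs                   1.0K       1.0K         0B      100%  /dev
/dev/da0s3e             123M       118M       1.2M       99%  /var
/dev/da0s3d             369M        10M       330M        3%  /var/tmp
"""

UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a df-style size such as '1.2M' or '0B' to bytes."""
    match = re.fullmatch(r"(-?[0-9.]+)([BKMGT])", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


def parse_storage(output):
    """Return {mount point: available bytes} for every data row."""
    free = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns with a percentage in the fifth;
        # the header splits into seven words and is skipped.
        if len(fields) == 6 and fields[4].endswith("%"):
            free[fields[5]] = to_bytes(fields[3])
    return free


def image_fits(output, mount, image_bytes):
    """True if the given mount point reports at least image_bytes available."""
    return parse_storage(output).get(mount, 0) >= image_bytes
```

With the sample above, `image_fits(SAMPLE_OUTPUT, "/var/tmp", 200 * 1024**2)` is true, while the same image would not fit on `/var`. Keep in mind that the image also needs room to be unpacked during installation, so treat the result as a lower bound.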
If the destination is, in fact, full, one can run a storage cleanup with “request system storage cleanup”, as described in our upgrade three-strike rule. Below you can see an example of command usage.
Furthermore, if there is still not enough space on the device after executing the storage cleanup, one may look into the user home directories, especially the /root/ folder.
Even when you are certain that there is so much space on your Juniper device that you could actually get lost in there, you can still hit an annoying issue, as you see below.
No matter what we try, we are not able to download an image to the target system. What we can do is push the image to the target device from a remote server and let mgd do the file handling. We have to be honest: to this day we are amazed that it works and that we somehow got the idea to even try it.
After finally managing to upload our future software image, we would like to point out two options worth adding to the upgrade procedure to make our life easier in the future. Both conserve disk space during a Junos upgrade: “no-copy” and “unlink”.
- The no-copy option prevents copies of the new packages from being created in /var/sw/pkg.
- The unlink option removes the package once it is installed.
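Both options are simply appended to the `request system software add` command. Here is a minimal Python sketch that only assembles the command string, for example to paste into a change plan. `build_upgrade_command` is a hypothetical helper of ours, and `/var/tmp/<image>` is a placeholder path, not a real package name.

```python
def build_upgrade_command(package_path, no_copy=True, unlink=True):
    """Assemble a 'request system software add' command line.

    package_path is whatever location the image was uploaded to;
    the value used below is a placeholder, not a real file.
    """
    parts = ["request", "system", "software", "add", package_path]
    if no_copy:
        # Do not keep copies of the package files after installation.
        parts.append("no-copy")
    if unlink:
        # Remove the uploaded package once it has been installed.
        parts.append("unlink")
    return " ".join(parts)


print(build_upgrade_command("/var/tmp/<image>"))
```

The sketch does not talk to a device; it only keeps the two space-saving options from being forgotten when the command is typed by hand.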
As a closing remark, if a switch is running an older release (e.g. 15.1X) and is to be upgraded to a recent release (18), a direct upgrade (with no interim releases) is normally possible, especially on EX Series fixed switches. Please note that if the switch is part of a Virtual Chassis, it may malfunction during such an upgrade and may eventually fail, causing a split cluster.
Whenever possible, for such a major “multi-hop” upgrade it is advised to split the chassis into standalone switches and upgrade them one by one. You may preconfigure the inter-switch ports upfront (e.g. in a management VLAN with the respective IP addresses) and then simply convert the VC ports to regular ports. Once all devices are upgraded, the VC may be recreated.
Creating a recovery snapshot
Once the upgrade to a stable Junos OS release is done, it is a very good practice to create a new recovery partition. Normally the recovery partition is created by a manual action. At factory state, it reflects the software image running on the device. To create a recovery snapshot, simply issue:
Unfortunately, it is also common that, after an upgrade that was affected by insufficient storage, the recovery snapshot creation process reports the same problem.
To fix this, execute the following command set:
Now, try to create a recovery snapshot once more and this time it should work like a charm.