DD-WRT Bites Me Again
Embedded devices often suck. No, really. They are challenging to troubleshoot. When they fail they become “black boxes.”
The house LAN Asus router using DD-WRT had failed to boot properly.
Short of obvious hardware malfunctions, when something fails abruptly I often ask myself, “What did I do?”
The previous day I had uploaded new SSH keys to the router and subsequently performed a backup. All seemed well until after powering off the router for the night. Powering off nightly has been my normal routine for the 15 or so years I have owned a router. On this particular morning the router did not respond.
Some of the Cliff Clavens of the world might smugly offer that they never power off the router. The argument is moot. In this failure case, not powering off nightly would only have delayed the inevitable and introduced confusion. The habit of powering down nightly provided a clue that something had changed from the previous day, rather than the router mysteriously “failing” weeks or months later when I no longer remembered the specific changes.
After moving to the spare WRT54GL backup router I returned to restoring the Asus router. First things first. I disconnected the WAN side Ethernet connection. While unlikely, if the device booted with the original Asus firmware I was not going to allow the vendor firmware to phone home.
The device would not respond. The remaining option was a hard reset. I used the infamous DD-WRT 30-30-30 reboot sequence. That failed to initialize the router. The Power LED indicated some kind of boot loop. Powering off for several minutes finally helped.
Partially good news: the device booted into DD-WRT rather than the Asus vendor firmware. The device had reset to the default vendor IP address of 192.168.1.1. As is the design of DD-WRT, I had to retype the device name and password. A reboot succeeded.
I uploaded a copy of the last device backup before the SSH mishap. That succeeded and I had the router restored to a known good configuration.
All seemed well but I knew better. I repeated the same task as before: uploading the new SSH keys.
The router again failed to reboot properly.
Why was updating SSH keys causing the router to malfunction?
Adding one SSH key at a time and rebooting after each eventually exposed a possible limit on storing keys, either in the number of keys or in the total storage allotted to them. With six keys there were no problems. Adding a seventh key launched the failure ritual. Checking available RAM showed only about 36 MB of the available 256 MB being used. Was the problem too many keys, something corrupted in one of the keys, or the order in which the keys were stored?
Oddly, the WRT54GL had eight keys stored and had no such problem. A hard limit on the number of keys therefore seemed unlikely. More likely something else was corrupting the configuration.
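One of the open questions above, whether a single key was corrupted, can be checked away from the router. The following is a minimal sketch, not anything DD-WRT provides: it assumes the keys are plain OpenSSH public key lines of the form "type base64 comment", and the function names check_key_line and summarize are made up for illustration. It confirms that each base64 blob decodes and that the key type embedded in the blob matches the declared type, then reports the total byte count in case the limit is about stored size rather than the number of keys.

```python
import base64
import binascii
import struct


def check_key_line(line):
    """Return (ok, reason) for one 'type base64 [comment]' public key line.

    Assumption: no authorized_keys options precede the key type.
    """
    fields = line.split()
    if len(fields) < 2:
        return False, "expected '<type> <base64> [comment]'"
    key_type, blob_b64 = fields[0], fields[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False, "base64 blob does not decode"
    if len(blob) < 4:
        return False, "blob too short to hold a key type"
    # An OpenSSH public key blob starts with a 4-byte big-endian length
    # followed by the key type name, which should match the first field.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len]
    if embedded != key_type.encode("utf-8"):
        return False, "declared type does not match type inside blob"
    return True, "ok"


def summarize(text):
    """Check every non-blank, non-comment line; return (results, total_bytes)."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    results = [(n,) + check_key_line(ln) for n, ln in enumerate(lines, start=1)]
    # Count one extra byte per key for the newline separator.
    total_bytes = sum(len(ln.encode("utf-8")) + 1 for ln in lines)
    return results, total_bytes
```

For example, results, total = summarize(open("authorized_keys").read()), where the file name is a placeholder for wherever the keys are kept. A key that passes this check can still be rejected by the router for other reasons, so a clean result only rules out the most obvious kind of corruption.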
I wasted several hours troubleshooting. Various problems appeared after restoring backups.
- The 5 GHz wireless MAC address conflicted with the 2.4 GHz address.
- There seemed to be no way to set the 5 GHz interface to AC/N-Mixed because the option never appeared.
- The 5 GHz interface always booted as Disabled.
- Accessing the device through SSH failed repeatedly until toggling the Apply Settings button.
- The backup restore often aborted before rebooting.
- Often the automatic reboot after restoring a backup resulted in the device resetting to the default vendor IP address.
I risked updating the firmware to the latest version. This seemed to resolve the 5 GHz wireless interface issues, but the Guest interface failed to function. SSH still failed.
I restored the device to the previously used firmware version. The 5 GHz interface remained intact and the Guest network was again functional. SSH still failed.
Various tweaks and changes made no difference. I enabled syslog logging with the hope of seeing a reason why SSH access was failing. Suddenly SSH access succeeded. I disabled logging and SSH continued to succeed.
I don’t know the root cause of the entire fiasco. Therefore I limited the number of SSH keys to fewer than seven and changed some LAN related shell scripts so they no longer depend on using SSH with the router.
A notable challenge with a failed “black box” router is the inconvenience. The loss of a router is not fatal but is irritating and time consuming to recover from. Spending time troubleshooting this device failure meant not doing other things and wasting too many hours.
I accept that creating robust and easy to use software is hard. Nonetheless this experience left me exhausted, to the point that I could not even get angry. Just plain worn out and feeling beat up. Dealing with these kinds of bugs is never fun because nobody learns the root cause.
Working with embedded devices is often painful. Many are underpowered and slow. Even when using free/libre software these embedded devices have an aura of being closed and proprietary in nature.
There are lessons here. I am reminded that the glory days of DD-WRT are long past. DD-WRT is hobbyist software and is not supported in any meaningful manner. Caveat emptor. I am reminded that I never liked using off-the-shelf “black box” devices. Time to start thinking about building a dedicated gateway and router based on a Linux distro. With such a system a failure is more easily remedied and troubleshooting is more accessible.
Posted: Usability Tagged: DD-WRT