First Boot Wizard crashes when radio without hardware exists in UCI
Problem Description
The First Boot Wizard (FBW) crashes during network scanning when there's a radio configured in UCI (/etc/config/wireless) but the corresponding physical hardware (phy) doesn't exist in the system.
Symptoms
- FBW starts scanning and detects mesh networks successfully
- The process crashes silently during the config download phase
- The frontend shows "Connection attempt not yet started" indefinitely
- The /tmp/scanning file remains true (no cleanup)
- No config files are downloaded to /tmp/fbw/
Steps to Reproduce
- Have a router with a stale UCI radio configuration (e.g. radio2) pointing to non-existent PCI hardware
- Start First Boot Wizard scan via lime-app
- FBW detects networks but crashes when trying to process them
Environment
Hardware: Router with 2 physical radios (phy0, phy1)
UCI Config: 3 radios configured (radio0, radio1, radio2)
```
# Physical radios
root@LiMe-1d2ae2:~# ls /sys/class/ieee80211/
phy0 phy1
# UCI radios
root@LiMe-1d2ae2:~# uci show wireless | grep "^wireless.radio"
wireless.radio0=wifi-device
wireless.radio0.path='platform/ahb/18100000.wmac'
wireless.radio1=wifi-device
wireless.radio1.path='pci0000:00/0000:00:00.0'
wireless.radio2=wifi-device
wireless.radio2.path='pci0000:01/0000:01:00.0' # <-- Hardware doesn't exist
```
Error Log
```
root@LiMe-1d2ae2:~# /bin/firstbootwizard
[FBW] Scanning...
/usr/bin/lua: /usr/lib/lua/lime/wireless.lua:19: wireless.get_phy_mac(..) failed reading: /sys/class/ieee80211/phy2/macaddress
stack traceback:
        [C]: in function 'assert'
        /usr/lib/lua/lime/wireless.lua:19: in function 'get_phy_mac'
        /usr/lib/lua/firstbootwizard.lua:110: in function 'func'
        /usr/lib/lua/firstbootwizard/functools.lua:63: in function </usr/lib/lua/firstbootwizard/functools.lua:60>
        (tail call): ?
        /usr/lib/lua/firstbootwizard.lua:127: in function 'cb'
        /usr/lib/lua/firstbootwizard/functools.lua:127: in function 'reduce'
        /usr/lib/lua/firstbootwizard.lua:430: in function 'get_all_networks'
        /bin/firstbootwizard:7: in main chunk
        [C]: ?
```
Root Cause Analysis
The bug occurs in this call chain:
- firstbootwizard.lua:110 - fbw.get_own_macs() iterates over all 5GHz radios
- firstbootwizard/utils.lua:78 - extract_phys_from_radios("radio2") returns "phy2":

```lua
function utils.extract_phys_from_radios(radio)
    return "phy"..radio.sub(radio, -1) -- Assumes radioN = phyN
end
```

- firstbootwizard.lua:110 - the result is passed to wireless.get_phy_mac("phy2")
- wireless.lua:19 - assert() raises an error when the file doesn't exist; nothing catches it, so the process terminates:

```lua
function wireless.get_phy_mac(phy)
    local path = "/sys/class/ieee80211/"..phy.."/macaddress"
    local mac = assert(fs.readfile(path), "wireless.get_phy_mac(..) failed reading: "..path):gsub("\n","")
    return utils.split(mac, ":")
end
```
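The call chain can be reproduced off-device with a small self-contained Lua sketch. It replaces sysfs with a hypothetical in-memory table (`fake_sysfs`, `readfile`, made-up MAC addresses) and copies only the shape of get_phy_mac(), extract_phys_from_radios() and the loop in get_own_macs(); it is a simplified model, not the FBW code itself.

```lua
-- Hypothetical in-memory stand-in for sysfs: phy0 and phy1 exist, phy2 does not
local fake_sysfs = {
    ["/sys/class/ieee80211/phy0/macaddress"] = "02:00:00:00:00:01\n",
    ["/sys/class/ieee80211/phy1/macaddress"] = "02:00:00:00:00:02\n",
}

-- Stand-in for fs.readfile(): returns nil for a missing file
local function readfile(path)
    return fake_sysfs[path]
end

-- Same shape as wireless.get_phy_mac(): assert() turns a failed read into an error
local function get_phy_mac(phy)
    local path = "/sys/class/ieee80211/"..phy.."/macaddress"
    local mac = assert(readfile(path), "wireless.get_phy_mac(..) failed reading: "..path):gsub("\n", "")
    return mac
end

-- Same naive mapping as utils.extract_phys_from_radios()
local function extract_phys_from_radios(radio)
    return "phy"..radio:sub(-1)
end

-- Simplified get_own_macs(): nothing here catches the error from get_phy_mac()
local function get_own_macs(radios)
    local macs = {}
    for _, radio in ipairs(radios) do
        table.insert(macs, get_phy_mac(extract_phys_from_radios(radio)))
    end
    return macs
end

-- radio2 maps to phy2, so the error escapes get_own_macs();
-- pcall() is used here only to display the message
local ok, err = pcall(get_own_macs, {"radio0", "radio1", "radio2"})
print(ok)   --> false
print(err)
```

In FBW there is no pcall() around this path, so the same error ends the process, which matches the traceback in the error log.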
Why the incorrect mapping happens
The code incorrectly assumes that radioN always corresponds to phyN:
- radio0 → phy0 ✅
- radio1 → phy1 ✅
- radio2 → phy2 ❌ (phy2 doesn't exist)

This is fragile because:
- Radio names are UCI configuration names (can be arbitrary)
- Phy names are kernel-assigned based on hardware detection order
- A radio can be removed/disabled in hardware but remain in UCI config
- Only the last character of the name is used, so a radio named radio10 would map to phy0
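A tiny sketch makes both failure modes visible. The set of present phys mirrors the `ls /sys/class/ieee80211/` output from the report; radio10 is a hypothetical name used only to show the last-character problem.

```lua
-- Naive mapping copied from utils.extract_phys_from_radios()
local function naive_phy(radio)
    return "phy"..radio:sub(-1)
end

-- Phys registered by the kernel on the router from the report
local present = { phy0 = true, phy1 = true }

-- Returns the guessed phy and whether it actually exists
local function guess(radio)
    local phy = naive_phy(radio)
    return phy, present[phy] == true
end

for _, radio in ipairs({ "radio0", "radio1", "radio2", "radio10" }) do
    print(radio, guess(radio))
end
```

radio2 yields a phy that does not exist (the crash in this issue), while radio10 yields phy0, which exists but belongs to a different radio, so that mistake would go unnoticed.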
Proposed Solutions
Solution 1: Graceful error handling (Quick fix)
Modify wireless.get_phy_mac() to return nil instead of crashing:
```lua
function wireless.get_phy_mac(phy)
    local path = "/sys/class/ieee80211/"..phy.."/macaddress"
    -- Check if phy exists before trying to read MAC
    if not fs.stat(path) then
        utils.log("wireless.get_phy_mac: phy "..phy.." does not exist, skipping")
        return nil
    end
    local mac = assert(fs.readfile(path), "wireless.get_phy_mac(..) failed reading: "..path):gsub("\n","")
    return utils.split(mac, ":")
end
```
Then update fbw.get_own_macs() (and any other caller that relies on a non-nil return) to skip nil results:
```lua
function fbw.get_own_macs()
    local radios = ft.map(utils.extract_prop(".name"), wireless.scandevices())
    local radios_5ghz = ft.filter(wireless.is5Ghz, radios)
    local phys = ft.map(utils.extract_phys_from_radios, radios_5ghz)
    -- Build the result with an explicit loop instead of mapping to nil and
    -- filtering afterwards: storing nil in an array leaves a hole, ipairs()
    -- stops at the first hole, and the # operator is unreliable around it.
    local macs = {}
    for _, phy in pairs(phys) do
        local mac = wireless.get_phy_mac(phy)
        if mac then
            table.insert(macs, table.concat(mac, ":"))
        end
    end
    return macs
end
```
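The logic of Solution 1 can be checked without a router. This sketch replaces fs.stat()/fs.readfile() with a hypothetical in-memory table and a local split() helper (assumptions, not the lime API), and also shows why skipping nils inside the loop is safer than mapping to nil and filtering afterwards.

```lua
-- Hypothetical in-memory stand-in for sysfs: phy2 is missing
local fake_sysfs = {
    ["/sys/class/ieee80211/phy0/macaddress"] = "02:00:00:00:00:01\n",
    ["/sys/class/ieee80211/phy1/macaddress"] = "02:00:00:00:00:02\n",
}

-- Local helper: split s on a single-character separator
local function split(s, sep)
    local parts = {}
    for part in s:gmatch("[^"..sep.."]+") do
        table.insert(parts, part)
    end
    return parts
end

-- Solution 1: return nil for a missing phy instead of raising
local function get_phy_mac(phy)
    local content = fake_sysfs["/sys/class/ieee80211/"..phy.."/macaddress"]
    if not content then
        return nil
    end
    return split((content:gsub("\n", "")), ":")
end

-- Collect only the MACs that could be read, without leaving holes
local function get_own_macs(phys)
    local macs = {}
    for _, phy in ipairs(phys) do
        local mac = get_phy_mac(phy)
        if mac then
            table.insert(macs, table.concat(mac, ":"))
        end
    end
    return macs
end

-- Counts the entries that ipairs() actually visits
local function count_ipairs(t)
    local n = 0
    for _ in ipairs(t) do n = n + 1 end
    return n
end

-- prints 2 followed by the MACs of phy0 and phy1
local macs = get_own_macs({ "phy0", "phy2", "phy1" })
print(#macs, macs[1], macs[2])

-- For contrast: mapping phy2 to nil leaves a hole at index 2,
-- so ipairs() visits only the first entry
local with_hole = { "02:00:00:00:00:01", nil, "02:00:00:00:00:02" }
print(count_ipairs(with_hole))  --> 1
```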
Solution 2: Correct radio→phy mapping (Proper fix)
Don't assume radioN = phyN. Instead, derive the phy from the radio's device path:
```lua
function wireless.get_phy_from_radio(radio_name)
    local uci = config.get_uci_cursor()
    local path = uci:get("wireless", radio_name, "path")
    if not path then
        utils.log("wireless.get_phy_from_radio: no path for radio "..radio_name)
        return nil
    end
    -- Find the phy whose device matches the radio's path. The "device" entry
    -- is a relative symlink (e.g. ../../../0000:01:00.0), so resolve it to an
    -- absolute /sys/devices/... path and require it to end with "/"..path;
    -- a plain substring match would also hit devices behind a PCI bridge.
    local suffix = "/"..path
    for phy_dir in fs.dir("/sys/class/ieee80211/") do
        if phy_dir ~= "." and phy_dir ~= ".." then
            local device = fs.realpath("/sys/class/ieee80211/"..phy_dir.."/device")
            if device and device:sub(-#suffix) == suffix then
                return phy_dir
            end
        end
    end
    utils.log("wireless.get_phy_from_radio: phy not found for radio "..radio_name.." with path "..path)
    return nil
end
```
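The matching step is the easiest part to get subtly wrong, so here it is in isolation as a self-contained sketch. It assumes the phy → device table has already been built by resolving each /sys/class/ieee80211/<phy>/device link; the values are hard-coded, hypothetical stand-ins based on the router from the report.

```lua
-- Hypothetical resolved targets of /sys/class/ieee80211/<phy>/device
local resolved_devices = {
    phy0 = "/sys/devices/platform/ahb/18100000.wmac",
    phy1 = "/sys/devices/pci0000:00/0000:00:00.0",
}

-- Return the phy whose device path ends in "/"..uci_path, or nil
local function phy_for_path(devices, uci_path)
    local suffix = "/"..uci_path
    for phy, device in pairs(devices) do
        -- Suffix comparison: an exact match of the trailing path components
        if device:sub(-#suffix) == suffix then
            return phy
        end
    end
    return nil
end

print(phy_for_path(resolved_devices, "pci0000:00/0000:00:00.0"))  --> phy1
print(phy_for_path(resolved_devices, "pci0000:01/0000:01:00.0"))  --> nil
```

The suffix comparison is a deliberate choice: a device behind a PCI bridge, such as /sys/devices/pci0000:00/0000:00:00.0/0000:01:00.0, contains the radio1 path as a substring but does not end with it.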
Solution 3: Filter radios during scandevices (Most robust)
Modify wireless.scandevices() to only return radios that have corresponding hardware:
```lua
function wireless.scandevices()
    local devices = {}
    local uci = config.get_uci_cursor()
    uci:foreach("wireless", "wifi-device", function(dev)
        -- Check if hardware exists for this radio
        local path = dev.path
        if path then
            -- Resolve each phy's relative "device" symlink and require the
            -- absolute path to end with "/"..path
            local suffix = "/"..path
            local phy_exists = false
            for phy_dir in fs.dir("/sys/class/ieee80211/") do
                if phy_dir ~= "." and phy_dir ~= ".." then
                    local device = fs.realpath("/sys/class/ieee80211/"..phy_dir.."/device")
                    if device and device:sub(-#suffix) == suffix then
                        phy_exists = true
                        break
                    end
                end
            end
            if phy_exists then
                devices[dev[".name"]] = dev
            else
                utils.log("wireless.scandevices: skipping radio "..dev[".name"].." - hardware not found")
            end
        else
            utils.log("wireless.scandevices: skipping radio "..dev[".name"].." - no path defined")
        end
    end)
    -- ... rest of the function
end
```

As written, this also skips radios that identify their hardware with the phy or macaddr option instead of path.
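The filtering rule can be exercised on its own with a self-contained sketch. The UCI sections and the list of resolved phy device paths below are hypothetical stand-ins for what uci:foreach() and the sysfs lookup would provide on the router from the report.

```lua
-- Hypothetical wifi-device sections, shaped like uci:foreach() passes them
local sections = {
    { [".name"] = "radio0", path = "platform/ahb/18100000.wmac" },
    { [".name"] = "radio1", path = "pci0000:00/0000:00:00.0" },
    { [".name"] = "radio2", path = "pci0000:01/0000:01:00.0" },
}

-- Hypothetical resolved device paths of the phys present in sysfs
local phy_devices = {
    "/sys/devices/platform/ahb/18100000.wmac",
    "/sys/devices/pci0000:00/0000:00:00.0",
}

-- True if some present phy's device path ends with "/"..path
local function has_hardware(path)
    local suffix = "/"..path
    for _, device in ipairs(phy_devices) do
        if device:sub(-#suffix) == suffix then
            return true
        end
    end
    return false
end

-- Keep radios whose path resolves to a present phy; report the rest
local function scandevices(secs)
    local devices, skipped = {}, {}
    for _, dev in ipairs(secs) do
        if dev.path and has_hardware(dev.path) then
            devices[dev[".name"]] = dev
        else
            table.insert(skipped, dev[".name"])
        end
    end
    return devices, skipped
end

-- radio2 is skipped, radio0 and radio1 are kept
local devices, skipped = scandevices(sections)
print(devices.radio2 == nil, skipped[1])
```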
Workaround
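Users can work around this by removing stale radio configurations. The script below deletes every wifi-device section whose path has no phy under /sys/devices:

```sh
# Identify and delete stale radios
for radio in $(uci show wireless | grep "=wifi-device" | cut -d. -f2 | cut -d= -f1); do
    path=$(uci get "wireless.$radio.path" 2>/dev/null)
    if [ -n "$path" ]; then
        # Check if hardware exists
        if ! ls -d /sys/devices/"$path"/ieee80211/phy* >/dev/null 2>&1; then
            echo "Stale radio: $radio (path: $path)"
            uci delete "wireless.$radio"
        fi
    fi
done
uci commit wireless
```

wifi-iface sections whose device option refers to a deleted radio are not removed by this script and may need to be cleaned up as well.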
Related Issues
Impact
- Severity: High - FBW completely breaks on affected systems
- Frequency: Medium - affects routers with hardware changes or stale configs
- User Experience: Critical - prevents initial network setup
Additional Context
This bug was discovered while debugging why FBW was stuck showing "Connection attempt not yet started" in lime-app. The silent crash leaves the system in an inconsistent state with /tmp/scanning locked to true, preventing subsequent scan attempts until manual cleanup.
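Independently of which fix is chosen, the stale flag suggests that the scan entry point has no error guard. A minimal sketch of such a guard, assuming a hypothetical set_scanning() helper in place of whatever FBW actually uses to write /tmp/scanning: the scan runs under pcall(), and the flag is cleared on both the success and the failure path.

```lua
-- Hypothetical stand-in for the /tmp/scanning flag file
local state = { scanning = false }

local function set_scanning(value)
    state.scanning = value
end

-- Run a scan function and always clear the flag, even if the scan raises
local function run_scan(scan)
    set_scanning(true)
    local ok, result = pcall(scan)
    set_scanning(false)
    if not ok then
        io.stderr:write("[FBW] scan failed: "..tostring(result).."\n")
        return nil, result
    end
    return result
end

-- A scan that fails the same way as in the error log
local res, err = run_scan(function()
    error("failed reading: /sys/class/ieee80211/phy2/macaddress")
end)
print(res, state.scanning)
```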