nRF52 DFU and the Missing Service Changed Indication

Firmware is Easy
Even when using Nordic’s pre-packaged SDK files for adding DFU (Device Firmware Update) capabilities, things can get tricky. We learned this as we added buttonless DFU to a customer project that used the nRF52832. Using nRF5 SDK v15.3.0, we had already given our device buttonless DFU capability for peers that were not bonded. To do this, we needed to add:
- The DFU bootloader
- DFU-specific SDK source files
- Automatic generation of the bootloader settings file upon project build
- Automatic generation of a full firmware image containing the bootloader, application, SoftDevice, and bootloader settings
With all of the components in place, we were able to wirelessly update our device through the nRF Connect app on our phones and desktops. The job wasn’t done yet, though. We wanted only mobile devices that had authenticated and bonded with our device to be able to start a DFU. To accomplish this, we needed a few more modules:
- FDS (Flash Data Storage), to store peer bond information persistently
- Peer Manager, to manage connections and events from bonded peer devices
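The policy these two modules make possible fits in a few lines. The sketch below is a hypothetical, host-compilable model, not SDK code. The names link_state_t, PEER_ID_NONE and dfu_entry_permitted() are invented for illustration. In the nRF5 SDK, the equivalent checks are made by the bonded variant of the buttonless DFU service, which is selected in sdk_config.h. We recall the option as NRF_DFU_BLE_BUTTONLESS_SUPPORTS_BONDS in SDK v15.x, but treat that name as an assumption and check it against your own sdk_config.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value meaning "no bonded peer" (illustrative only). */
#define PEER_ID_NONE 0xFFFFu

/* Hypothetical snapshot of the current link. In a real project this state
 * would be assembled from Peer Manager events (bond stored, link secured). */
typedef struct
{
    bool     encrypted; /* link is encrypted using a stored bond's keys */
    bool     bonded;    /* bond information is persisted (e.g. via FDS) */
    uint16_t peer_id;   /* hypothetical stand-in for a Peer Manager peer ID */
} link_state_t;

/* Returns true only when a DFU request should be honoured: the peer must be
 * bonded, the link must be encrypted, and the peer ID must be valid. */
bool dfu_entry_permitted(const link_state_t * p_link)
{
    if (p_link == NULL)
    {
        return false;
    }
    return p_link->bonded
        && p_link->encrypted
        && (p_link->peer_id != PEER_ID_NONE);
}
```

Under this model, an unbonded phone, or a bonded phone that has not yet encrypted the link, is refused.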
After navigating through all of the changes those modules required (re-jiggering existing functions, initializing all of the modules, messing with various #defines), we were finally ready to bond with our device and start testing DFU again. Everything worked great: buttonless DFU could be executed just like before. “Firmware is easy!” we said.
Firmware is Hard
Turns out it wasn’t that easy. We soon found that every attempt to reconnect to our device after a DFU failed. No matter what we tried, our test app could never connect to the device after a firmware update. The team spent a lot of time troubleshooting, including time on Nordic DevZone, trying to find the root cause, to no avail.
Things only started to become clearer when an eagle-eyed app developer mentioned a pattern: after every DFU, the BLE services and characteristics reported by the device were missing one of our custom services. If we connected to the device with nRF Connect after a DFU, the only service that showed up was the one our firmware application and bootloader shared. This meant that our mobile device was caching the services and characteristics that the DFU bootloader exposed during the DFU. That should not have been happening. One of the steps in implementing bonded DFU is to enable the Service Changed indication, which the device sends to the mobile app when a DFU happens. In Nordic projects, this is done in sdk_config.h:
// NRF_SDH_BLE_SERVICE_CHANGED - Include the Service Changed characteristic in the Attribute Table.
#ifndef NRF_SDH_BLE_SERVICE_CHANGED
#define NRF_SDH_BLE_SERVICE_CHANGED 1
#endif
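Why a dropped indication leaves a phone stuck is easier to see with a simplified model of a bonded GATT client's cache. Per the Bluetooth Core Specification, the Service Changed value is two little-endian 16-bit handles marking the start and end of the affected range. Everything else in this sketch is an assumption. gatt_cache_t, on_service_changed() and needs_discovery() are hypothetical names, and real phones implement their caching internally and may behave differently in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a bonded client's cached attribute table. */
typedef struct
{
    bool     valid;        /* cached handles may be reused on reconnect */
    uint16_t start_handle; /* first cached attribute handle */
    uint16_t end_handle;   /* last cached attribute handle */
} gatt_cache_t;

/* Decoded Service Changed value: the affected handle range. */
typedef struct
{
    uint16_t start_handle;
    uint16_t end_handle;
} sc_range_t;

/* The indication payload is two little-endian uint16 handles. */
static sc_range_t sc_range_decode(const uint8_t value[4])
{
    sc_range_t r;
    r.start_handle = (uint16_t)(value[0] | (value[1] << 8));
    r.end_handle   = (uint16_t)(value[2] | (value[3] << 8));
    return r;
}

/* On a Service Changed indication, drop the cache if the affected range
 * overlaps any cached handle. */
void on_service_changed(gatt_cache_t * p_cache, const uint8_t value[4])
{
    sc_range_t r = sc_range_decode(value);
    bool overlaps = (r.start_handle <= p_cache->end_handle)
                 && (r.end_handle >= p_cache->start_handle);
    if (overlaps)
    {
        p_cache->valid = false;
    }
}

/* On reconnection, the client only rediscovers if its cache is invalid. */
bool needs_discovery(const gatt_cache_t * p_cache)
{
    return !p_cache->valid;
}
```

In this model, if the indication never arrives, needs_discovery() keeps returning false and the client goes on using the table it cached from the bootloader. That matches the symptom we saw.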
With NRF_SDH_BLE_SERVICE_CHANGED set to 1, the device includes the Service Changed characteristic in its attribute table. A bonded mobile device that caches our services should then receive an indication telling it to rediscover them whenever they change. So something was causing the Service Changed indication that should have been firing to get dropped, and we set out to find where. After a thorough inspection of our debug logs over the course of a bonded DFU, we found one line in particular that raised suspicions:
Unexpected error when looking for service changed CCCD: NRF_ERROR_NOT_FOUND
This message comes from within the function service_changed_send_in_evt(). It is logged after a call to service_changed_cccd() returns an error. Armed with the suspect function and our symptom, we went to Nordic DevZone to see what answers we could find.
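The link between that log line and the missing indication can be shown with a small decision model. We have not reproduced the SDK's service_changed_send_in_evt() here. The enum and sc_decide() are hypothetical. The only facts they rely on are the CCCD bit assignments from the Bluetooth Core Specification, where bit 0 enables notifications and bit 1 enables indications. The point of the model is that if the CCCD lookup fails, there is nothing to check, so the indication cannot be sent, whatever the phone actually enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* CCCD bit assignments from the Bluetooth Core Specification. */
#define CCCD_NOTIFY_BIT   0x0001u
#define CCCD_INDICATE_BIT 0x0002u

/* Hypothetical outcomes for this model (not nRF5 SDK error codes). */
typedef enum
{
    SC_SEND,          /* CCCD found and indications enabled by the client */
    SC_SKIP_DISABLED, /* CCCD found, but the client did not enable indications */
    SC_SKIP_NOT_FOUND /* CCCD lookup failed: the case behind our log line */
} sc_decision_t;

/* lookup_ok mirrors whether a service_changed_cccd()-style search
 * succeeded; cccd is only meaningful when lookup_ok is true. */
sc_decision_t sc_decide(bool lookup_ok, uint16_t cccd)
{
    if (!lookup_ok)
    {
        return SC_SKIP_NOT_FOUND;
    }
    return (cccd & CCCD_INDICATE_BIT) ? SC_SEND : SC_SKIP_DISABLED;
}
```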
The Fix
In the first post we read, somebody had asked about the same symptom and suggested that the Service Changed indication could get dropped if an ATT MTU exchange was already in progress. We checked our debug logs, and sure enough, a few lines before the error that first caught our attention, there was a line showing an ATT MTU request being started. That request didn’t complete until after the ‘CCCD not found’ error was logged. In response to that post, the Nordic representative said that the service_changed_cccd() function was not correct and proposed changing it from:
static ret_code_t service_changed_cccd(uint16_t conn_handle, uint16_t * p_cccd)
{
    bool sc_found = false;
    uint16_t end_handle;
    ret_code_t err_code = sd_ble_gatts_initial_user_handle_get(&end_handle);
    ASSERT(err_code == NRF_SUCCESS);
    for (uint16_t handle = 1; handle < end_handle; handle++)
    {
        uint16_t uuid;
        ble_gatts_value_t value = {.p_value = (uint8_t *)&uuid, .len = 2, .offset = 0};
        err_code = sd_ble_gatts_value_get(conn_handle, handle, &value);
        if (err_code != NRF_SUCCESS)
        {
            return err_code;
        }
        else if (!sc_found && (uuid == BLE_UUID_GATT_CHARACTERISTIC_SERVICE_CHANGED))
        {
            sc_found = true;
        }
        else if (sc_found && (uuid == BLE_UUID_DESCRIPTOR_CLIENT_CHAR_CONFIG))
        {
            value.p_value = (uint8_t *)p_cccd;
            return sd_ble_gatts_value_get(conn_handle, ++handle, &value);
        }
    }
    return NRF_ERROR_NOT_FOUND;
}
to:
static ret_code_t service_changed_cccd(uint16_t conn_handle, uint16_t * p_cccd)
{
    bool sc_found = false;
    bool generic_att_found = false;
    ble_gatts_attr_md_t attr_md; // declared in the proposed patch but not used
    uint16_t end_handle;
    ret_code_t err_code = sd_ble_gatts_initial_user_handle_get(&end_handle);
    ASSERT(err_code == NRF_SUCCESS);
    for (uint16_t handle = 1; handle < end_handle; handle++)
    {
        ble_uuid_t uuid;
        ble_gatts_value_t value = {.p_value = (uint8_t *)&uuid.uuid, .len = 2, .offset = 0};
        if (!generic_att_found)
        {
            // Until the Generic Attribute service is found, read attribute values
            // (a primary service declaration's value is the service UUID).
            err_code = sd_ble_gatts_value_get(conn_handle, handle, &value);
        }
        else
        {
            // From then on, read each attribute's type (UUID) instead of its value.
            err_code = sd_ble_gatts_attr_get(handle, &uuid, NULL);
        }
        if (err_code != NRF_SUCCESS)
        {
            return err_code;
        }
        else if (uuid.uuid == BLE_UUID_GATT)
        {
            generic_att_found = true;
        }
        else if (!sc_found && (uuid.uuid == BLE_UUID_GATT_CHARACTERISTIC_SERVICE_CHANGED))
        {
            sc_found = true;
        }
        else if (sc_found && (uuid.uuid == BLE_UUID_DESCRIPTOR_CLIENT_CHAR_CONFIG))
        {
            // This handle is the CCCD itself, so its value is read directly (no ++handle).
            value.p_value = (uint8_t *)p_cccd;
            return sd_ble_gatts_value_get(conn_handle, handle, &value);
        }
    }
    return NRF_ERROR_NOT_FOUND;
}
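The corrected search strategy can be exercised off-target against a mock attribute table. It first finds the Generic Attribute service declaration (value 0x1801), then matches on attribute types to locate the Service Changed value (0x2A05) and the CCCD (0x2902) after it. In this sketch, mock_attr_t and find_sc_cccd() are hypothetical stand-ins for the SoftDevice calls, not the SDK v16.0.0 implementation. The mock also shows why type matching is the sturdier choice: a CCCD's value holds the client's configuration bits (0x0002 here), not the UUID 0x2902.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit UUIDs from the Bluetooth SIG assigned numbers. */
#define UUID_PRIMARY_SERVICE 0x2800u
#define UUID_CHARACTERISTIC  0x2803u
#define UUID_GATT_SERVICE    0x1801u
#define UUID_SERVICE_CHANGED 0x2A05u
#define UUID_CCCD            0x2902u

/* Hypothetical mock attribute: handle, attribute type, first 16 bits of value. */
typedef struct
{
    uint16_t handle;
    uint16_t type;
    uint16_t value16;
} mock_attr_t;

/* Returns the handle of the Service Changed CCCD, or 0 (an invalid ATT
 * handle) if it is not found. */
uint16_t find_sc_cccd(const mock_attr_t * p_table, size_t count)
{
    int gatt_found = 0;
    int sc_found = 0;
    for (size_t i = 0; i < count; i++)
    {
        const mock_attr_t * p = &p_table[i];
        if (!gatt_found)
        {
            /* Phase 1: look for the Generic Attribute service declaration. */
            gatt_found = (p->type == UUID_PRIMARY_SERVICE)
                      && (p->value16 == UUID_GATT_SERVICE);
        }
        else if (!sc_found && p->type == UUID_SERVICE_CHANGED)
        {
            sc_found = 1; /* Phase 2: Service Changed value attribute found */
        }
        else if (sc_found && p->type == UUID_CCCD)
        {
            return p->handle; /* the CCCD is matched by its type, not its value */
        }
    }
    return 0;
}
```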
The representative also mentioned that he had reported the issue internally so it could be fixed in a later SDK release. We knew SDK v16.0.0 had recently been released, so we looked at the function there for comparison. The new SDK’s version differed from both of these, but it retained the fix proposed in the post. Our fix was simply to replace the SDK v15.3.0 version of service_changed_cccd() with the one from SDK v16.0.0. Problem solved!
And if you have questions about an embedded project you’re working on, Dojo Five can help you with all aspects of your devops for embedded journey! We are always happy to hear about cool projects or interesting problems to solve, so don’t hesitate to reach out and chat with us on LinkedIn or through email!