Using FTDI for custom_io resulted in very unrealible reads. This patch allows more reliable use of FTDI custom io, like what might be needed when used on a mobile device like Android. --- src/cochran_commander.c | 34 +++++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-)
diff --git a/src/cochran_commander.c b/src/cochran_commander.c index 921c8c0..1de6c07 100644 --- a/src/cochran_commander.c +++ b/src/cochran_commander.c @@ -806,11 +806,39 @@ cochran_commander_device_set_fingerprint (dc_device_t *abstract, const unsigned
static dc_status_t +cochran_commander_read_retry(cochran_commander_device_t *device, dc_event_progress_t *progress, unsigned int address, unsigned char data[], unsigned int size) +{ + + dc_status_t rc = DC_STATUS_SUCCESS; + unsigned int nretries = 0; + int last_current = 0; + + if (progress) + last_current = progress->current; + + while (nretries < 3) { + if (nretries) + WARNING(device->base.context, "Read error, attempting retry."); + + rc = cochran_commander_read(device, progress, address, data, size); + if (rc == DC_STATUS_SUCCESS) + return rc; + + nretries++; + if (progress) + progress->current = last_current; + } + + return rc; +} + + +static dc_status_t cochran_commander_device_read (dc_device_t *abstract, unsigned int address, unsigned char data[], unsigned int size) { cochran_commander_device_t *device = (cochran_commander_device_t *) abstract;
- return cochran_commander_read(device, NULL, address, data, size); + return cochran_commander_read_retry(device, NULL, address, data, size); }
@@ -854,7 +882,7 @@ cochran_commander_device_dump (dc_device_t *abstract, dc_buffer_t *buffer) return rc;
// Read the sample data, logbook and sample data are contiguous - rc = cochran_commander_read (device, &progress, device->layout->rb_logbook_begin, dc_buffer_get_data(buffer), size); + rc = cochran_commander_read_retry (device, &progress, device->layout->rb_logbook_begin, dc_buffer_get_data(buffer), size); if (rc != DC_STATUS_SUCCESS) { ERROR (abstract->context, "Failed to read the sample data."); return rc; @@ -934,7 +962,7 @@ cochran_commander_device_foreach (dc_device_t *abstract, dc_dive_callback_t call }
// Request log book - rc = cochran_commander_read(device, &progress, layout->rb_logbook_begin, data.logbook, data.logbook_size); + rc = cochran_commander_read_retry(device, &progress, layout->rb_logbook_begin, data.logbook, data.logbook_size); if (rc != DC_STATUS_SUCCESS) { status = rc; goto error;
The parameters used with the FTDI USB serial port drivers didn't work well with directly with libftdi1. The new baud rate results in the same effective baud rates for both. The rbstream block size was reduced to help with the unreliability of the libftdi communications. --- src/cochran_commander.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/cochran_commander.c b/src/cochran_commander.c index c02bcb6..be41673 100644 --- a/src/cochran_commander.c +++ b/src/cochran_commander.c @@ -199,8 +199,8 @@ static const cochran_device_layout_t cochran_emc14_device_layout = { COCHRAN_MODEL_EMC_14, // model 32, // address_bits ENDIAN_LE, // endian - 806400, // baudrate - 65536, // rbstream_size + 850000, // baudrate + 32768, // rbstream_size 0x0D2, // cf_dive_count 0x13E, // cf_last_log 0x142, // cf_last_interdive @@ -224,8 +224,8 @@ static const cochran_device_layout_t cochran_emc16_device_layout = { COCHRAN_MODEL_EMC_16, // model 32, // address_bits ENDIAN_LE, // endian - 806400, // baudrate - 65536, // rbstream_size + 850000, // baudrate + 32768, // rbstream_size 0x0D2, // cf_dive_count 0x13E, // cf_last_log 0x142, // cf_last_interdive @@ -249,8 +249,8 @@ static const cochran_device_layout_t cochran_emc20_device_layout = { COCHRAN_MODEL_EMC_20, // model 32, // address_bits ENDIAN_LE, // endian - 806400, // baudrate - 65536, // rbstream_size + 850000, // baudrate + 32768, // rbstream_size 0x0D2, // cf_dive_count 0x13E, // cf_last_log 0x142, // cf_last_interdive @@ -372,7 +372,7 @@ cochran_commander_packet (cochran_commander_device_t *device, dc_event_progress_ // Give the DC time to process the command. dc_serial_sleep(device->port, 45);
- // Rates are odd, like 806400 for the EMC, 115200 for commander + // Rates are odd, like 850400 for the EMC, 115200 for commander status = dc_serial_configure(device->port, device->layout->baudrate, 8, DC_PARITY_NONE, DC_STOPBITS_TWO, DC_FLOWCONTROL_NONE); if (status != DC_STATUS_SUCCESS) { ERROR (abstract->context, "Failed to set the high baud rate."); @@ -523,7 +523,7 @@ cochran_commander_read (cochran_commander_device_t *device, dc_event_progress_t return DC_STATUS_UNSUPPORTED; }
- dc_serial_sleep(device->port, 800); + dc_serial_sleep(device->port, 550);
// set back to 9600 baud rc = cochran_commander_serial_setup(device);
On 2017-09-06 22:02, John Van Ostrand wrote:
The parameters used with the FTDI USB serial port drivers didn't work well with directly with libftdi1. The new baud rate results in the same effective baud rates for both.
What's the effective baudrate? Why don't you just use that effective baudrate, instead of 850000? Now the baudrate change looks rather arbitrary to me.
Other than that, the changes look good to me. For the retrying, I'll restore the old retry function (see commit b3d2c603ddec9758fb36706bbde46ce23ca9f0ed) because it handles fatal errors better.
Jef
On Mon, Sep 11, 2017 at 5:22 AM, Jef Driesen jef@libdivecomputer.org wrote:
On 2017-09-06 22:02, John Van Ostrand wrote:
The parameters used with the FTDI USB serial port drivers didn't work well with directly with libftdi1. The new baud rate results in the same effective baud rates for both.
What's the effective baudrate? Why don't you just use that effective baudrate, instead of 850000? Now the baudrate change looks rather arbitrary to me.
Other than that, the changes look good to me. For the retrying, I'll restore the old retry function (see commit b3d2c603ddec9758fb36706bbde46ce23ca9f0ed) because it handles fatal errors better.
I was working with FTDI to get mobile communications to work and high-speed data was coming back with bit 7 set on all bytes. While I was tracking that down I noticed that at the USB packet level the baud rate command was different, there was a different clock divisor for the custom baud rate. Clearly the ftdi_sio kernel module that translates serial commands to FTDI was translating baud differently than the libftdi1 library routines. I suspect either the logic is different or one of them is wrong about which FTDI chip it's talking to. I didn't debug deeper than that. I translated the working divisor back to a baud rate which turned out to be 857,143 baud. I expect the actual baud rate is tied to the clock on the CPU of the dive computer, we don't know what that is, so stating 857,143 as the baud rate suggests accuracy. I figured 850,000 was within the 5% required by the libftdi1 logic, visually more reasonable and about as wrong as 857,143.
I'll give that commit a try to see how it works. I know that one of the problems I was having was that after a baud rate change a heartbeat byte would be received at the wrong baud rate and show up as a 0xFE, that's the only case that is a permanent failure if we used that commit. I'm starting to wonder if the serial flush command given to the FTDI chip is working, or if the DC ignores the baud rate change some times.
On 2017-09-11 16:15, John Van Ostrand wrote:
On Mon, Sep 11, 2017 at 5:22 AM, Jef Driesen jef@libdivecomputer.org wrote:
On 2017-09-06 22:02, John Van Ostrand wrote:
The parameters used with the FTDI USB serial port drivers didn't work well with directly with libftdi1. The new baud rate results in the same effective baud rates for both.
What's the effective baudrate? Why don't you just use that effective baudrate, instead of 850000? Now the baudrate change looks rather arbitrary to me.
Other than that, the changes look good to me. For the retrying, I'll restore the old retry function (see commit b3d2c603ddec9758fb36706bbde46ce23ca9f0ed) because it handles fatal errors better.
I was working with FTDI to get mobile communications to work and high-speed data was coming back with bit 7 set on all bytes. While I was tracking that down I noticed that at the USB packet level the baud rate command was different, there was a different clock divisor for the custom baud rate. Clearly the ftdi_sio kernel module that translates serial commands to FTDI was translating baud differently than the libftdi1 library routines. I suspect either the logic is different or one of them is wrong about which FTDI chip it's talking to. I didn't debug deeper than that. I translated the working divisor back to a baud rate which turned out to be 857,143 baud. I expect the actual baud rate is tied to the clock on the CPU of the dive computer, we don't know what that is, so stating 857,143 as the baud rate suggests accuracy. I figured 850,000 was within the 5% required by the libftdi1 logic, visually more reasonable and about as wrong as 857,143.
This kind of info helps understanding the reason behind the change, so why not add it to the commit message? In a couple of months or years you probably won't remember either :-)
I'll give that commit a try to see how it works. I know that one of the problems I was having was that after a baud rate change a heartbeat byte would be received at the wrong baud rate and show up as a 0xFE, that's the only case that is a permanent failure if we used that commit. I'm starting to wonder if the serial flush command given to the FTDI chip is working, or if the DC ignores the baud rate change some times.
If you change the baudrate while data is arriving, then it might be that some data was still read with the old baudrate (because it was already sitting in the driver receive queue).
With fatal errors, I was referring to errors reported by operating system and/or driver. If for example the write call fails because the connection is down (usb cable unplugged, bluetooth connection lost) or some other serious I/O error happened, then there is no point in retrying. In that case you want to bailout as soon as possible. Typically only for errors at the communication protocol level (e.g. timeouts, corrupt data packets, etc) retrying helps. Those correspond to the DC_STATUS_TIMEOUT and DC_STATUS_PROTOCOL error codes. Everything else is pretty much fatal.
Jef
On Mon, Sep 11, 2017 at 11:13 AM, Jef Driesen jef@libdivecomputer.org wrote:
On 2017-09-11 16:15, John Van Ostrand wrote:
On Mon, Sep 11, 2017 at 5:22 AM, Jef Driesen jef@libdivecomputer.org wrote:
On 2017-09-06 22:02, John Van Ostrand wrote:
The parameters used with the FTDI USB serial port drivers didn't
work well with directly with libftdi1. The new baud rate results in the same effective baud rates for both.
What's the effective baudrate? Why don't you just use that effective baudrate, instead of 850000? Now the baudrate change looks rather arbitrary to me.
Other than that, the changes look good to me. For the retrying, I'll restore the old retry function (see commit b3d2c603ddec9758fb36706bbde46c e23ca9f0ed) because it handles fatal errors better.
I was working with FTDI to get mobile communications to work and high-speed data was coming back with bit 7 set on all bytes. While I was tracking that down I noticed that at the USB packet level the baud rate command was different, there was a different clock divisor for the custom baud rate. Clearly the ftdi_sio kernel module that translates serial commands to FTDI was translating baud differently than the libftdi1 library routines. I suspect either the logic is different or one of them is wrong about which FTDI chip it's talking to. I didn't debug deeper than that. I translated the working divisor back to a baud rate which turned out to be 857,143 baud. I expect the actual baud rate is tied to the clock on the CPU of the dive computer, we don't know what that is, so stating 857,143 as the baud rate suggests accuracy. I figured 850,000 was within the 5% required by the libftdi1 logic, visually more reasonable and about as wrong as 857,143.
This kind of info helps understanding the reason behind the change, so why not add it to the commit message? In a couple of months or years you probably won't remember either :-)
I'll give that commit a try to see how it works. I know that one of the
problems I was having was that after a baud rate change a heartbeat byte would be received at the wrong baud rate and show up as a 0xFE, that's the only case that is a permanent failure if we used that commit. I'm starting to wonder if the serial flush command given to the FTDI chip is working, or if the DC ignores the baud rate change some times.
If you change the baudrate while data is arriving, then it might be that some data was still read with the old baudrate (because it was already sitting in the driver receive queue).
That's what I thought, but a flush after the baud rate change didn't seem to make it less likely.
With fatal errors, I was referring to errors reported by operating system and/or driver. If for example the write call fails because the connection is down (usb cable unplugged, bluetooth connection lost) or some other serious I/O error happened, then there is no point in retrying. In that case you want to bailout as soon as possible. Typically only for errors at the communication protocol level (e.g. timeouts, corrupt data packets, etc) retrying helps. Those correspond to the DC_STATUS_TIMEOUT and DC_STATUS_PROTOCOL error codes. Everything else is pretty much fatal.
Ahh, I see it now. That commit should work fine.
On 2017-09-11 17:21, John Van Ostrand wrote:
On Mon, Sep 11, 2017 at 11:13 AM, Jef Driesen jef@libdivecomputer.org wrote:
If you change the baudrate while data is arriving, then it might be that some data was still read with the old baudrate (because it was already sitting in the driver receive queue).
That's what I thought, but a flush after the baud rate change didn't seem to make it less likely.
Maybe it works better when you purge the buffers before doing the break? And for the heartbeat you could try to use a small loop where you discard invalid heartbeat bytes, until you receive a good one.
With fatal errors, I was referring to errors reported by operating system and/or driver. If for example the write call fails because the connection is down (usb cable unplugged, bluetooth connection lost) or some other serious I/O error happened, then there is no point in retrying. In that case you want to bailout as soon as possible. Typically only for errors at the communication protocol level (e.g. timeouts, corrupt data packets, etc) retrying helps. Those correspond to the DC_STATUS_TIMEOUT and DC_STATUS_PROTOCOL error codes. Everything else is pretty much fatal.
Ahh, I see it now. That commit should work fine.
Patches have been pushed.
Jef