On 2014-03-12 03:17, Linus Torvalds wrote:
On Tue, Mar 11, 2014 at 5:20 PM, Hamish Moffatt hamish@cloud.net.au wrote:
Here's a log from Windows. There appears to be a whole lot less timing precision so it's a bit hard to compare, other than noting the absence of NAKs from the VT3.
Hmm. The windows log is for 0.5.0-devel, with the Linux one being 0.4.2. But I assume there are no big libdivecomputer changes.
[0.156] INFO: Write: size=2, data=8400
[0.156] INFO: Read: size=1, data=5A
[0.156] INFO: Read: size=17, data=4F43452056543320523244203531324BBF
[0.172] dc_device_dump
[0.172] INFO: Write: size=4, data=B1000000
[0.187] INFO: Read: size=1, data=5A
So the Linux dump sometimes got A5 instead of 5A.
But the *timing* is very different. Here's Linux getting the right data:
[0.104540] INFO: Write: size=2, data=8400
[0.104551] INFO: Read: size=1, data=5A
and here's the same thing with the wrong data:
[0.109560] INFO: Sleep: value=1
[0.110624] INFO: Write: size=4, data=B1000000
[0.110634] INFO: Read: size=1, data=A5
Note how it did a two-byte write, and then immediately a read within 10 microseconds.
Which is a bit odd. Jef, shouldn't the sleep be *after* the write, and before the read? If there are any duplex issues, the "read -> write" turnaround isn't the problem (because by the time the read returns, we certainly know the data had been fully sent from the other side), but the "write -> read" turnaround might be problematic if the reader gets confused by DSR coming on while it's still receiving.
With the half-duplex workaround, the sleep really does happen after the write. However, the logging for the serial_write() function happens at the end of the function, after the call to serial_sleep(). The result is that in the log the sleep appears to come before the write. This is indeed a bit confusing.
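To illustrate the ordering, here is a minimal Python sketch. It is not the actual libdivecomputer code: the function bodies, the `halfduplex_delay_ms` parameter and the `log` list are made up for the illustration. It only shows how logging at the end of the write function makes the sleep show up first in the log.

```python
import time

log = []  # log lines, in the order they are emitted


def serial_sleep(ms):
    """Sleep for ms milliseconds, then log the sleep (hypothetical stand-in)."""
    time.sleep(ms / 1000.0)
    log.append(f"Sleep: value={ms}")


def serial_write(data, halfduplex_delay_ms):
    """Write, then sleep, then log the write (hypothetical stand-in)."""
    # 1. The bytes go out first (simulated here, nothing is really sent).
    sent = len(data)
    # 2. The half-duplex workaround sleeps after the bytes were written.
    serial_sleep(halfduplex_delay_ms)
    # 3. Only now is the write logged, at the very end of the function.
    log.append(f"Write: size={sent}, data={data.hex().upper()}")
    return sent


serial_write(bytes.fromhex("B1000000"), 3)
print("\n".join(log))
```

The "Sleep" line is printed before the "Write" line, even though the sleep ran after the (simulated) write.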
But as Hamish already clarified, this is probably an attempt with a sleep call added before the write. The half-duplex hack always adds a 2ms fudge factor, so if this was the result of enabling the half-duplex hack, the sleep time should always be at least 2ms.
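That check can be done mechanically on the log. The sketch below is only an illustration: it assumes the logged sleep value is in milliseconds, and `FUDGE_MS` and `sleeps_below_fudge` are names I made up. The log lines are the ones quoted above.

```python
import re

FUDGE_MS = 2  # minimum sleep the half-duplex hack would produce (assumed to be in ms)


def sleeps_below_fudge(log_text):
    """Return the logged sleep values that are smaller than FUDGE_MS."""
    values = [int(v) for v in re.findall(r"Sleep: value=(\d+)", log_text)]
    return [v for v in values if v < FUDGE_MS]


log_text = (
    "[0.109560] INFO: Sleep: value=1\n"
    "[0.110624] INFO: Write: size=4, data=B1000000\n"
    "[0.110634] INFO: Read: size=1, data=A5\n"
)

# Any value listed here cannot have come from the half-duplex hack.
print(sleeps_below_fudge(log_text))
```

Under that assumption, the sleep of 1 in the quoted log is below the fudge factor, which fits a manually added sleep rather than the half-duplex hack.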
I no longer believe this is a half-duplex problem. The interface has three wires, so that's probably GND, TX and RX. In that case there shouldn't be any problem with starting a read before the write has finished. It would just take a bit longer before the read receives its data. With the Suuntos there were only two wires, which makes the timing of switching between read and write a lot more critical. The Suuntos also need the toggling of the RTS line, which is not the case for the Oceanics.
Hamish already confirmed that if he adds a small delay before the write, then everything starts to work fine. I don't have a good explanation for this. I get the impression that without the extra delay we are sending the next command too fast. Maybe the dive computer is still busy processing the previous command? But it has already sent the response to the previous command (because we have already processed it), so I have absolutely no idea what it could be doing. Also, why does it need this delay on Linux but not on Windows? Is the Windows driver maybe slower to send out the data, and this extra delay happens to be just enough to make it work?
If you look at the error handling, you'll see that there is already a 100ms delay before sending the command again. Since the second attempt is usually successful, that's another indication that the problem is read->write turnaround rather than the write->read.
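If the read->write turnaround really is the issue, one way to express the workaround is to enforce a minimum gap between the end of a read and the next write, instead of an unconditional sleep. This is a rough Python sketch, not libdivecomputer code: `TurnaroundGuard` and its methods are hypothetical, and the 1 ms gap is simply the value from the sleep in the quoted log (assumed to be milliseconds), not a measured requirement.

```python
import time


class TurnaroundGuard:
    """Enforce a minimum gap between the end of a read and the next write."""

    def __init__(self, min_gap_s, clock=time.monotonic, sleep=time.sleep):
        self.min_gap_s = min_gap_s
        self.clock = clock          # injectable so the guard can be tested
        self.sleep = sleep
        self.last_read_end = None   # no read has completed yet

    def note_read_done(self):
        """Call right after a read returns."""
        self.last_read_end = self.clock()

    def wait_before_write(self):
        """Sleep for what is left of the minimum gap; return the time slept."""
        if self.last_read_end is None:
            return 0.0
        remaining = self.min_gap_s - (self.clock() - self.last_read_end)
        if remaining > 0:
            self.sleep(remaining)
            return remaining
        return 0.0


guard = TurnaroundGuard(min_gap_s=0.001)
guard.note_read_done()             # e.g. right after reading the 5A
slept = guard.wait_before_write()  # call just before writing the next command
print(f"slept {slept * 1000:.3f} ms before the write")
```

The caller only pays the delay when the next write would otherwise follow the read too closely.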
Just a thought, but maybe one or more bytes of the command get lost somehow when trying to send too fast? If the dive computer gets no bytes at all, it doesn't reply and we get a timeout. If only the first few bytes get lost, the dive computer will see an incomplete command and will respond with a NAK to indicate it can't process the request. That would explain the two types of errors (timeout or NAK) we get, but then the next question is of course how these bytes disappear.
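To make the hypothesis concrete, here is a toy model in Python. It is pure speculation about the device side: `device_reply` is a made-up function, and the only facts taken from the logs are the 5A (ACK) and A5 (NAK) bytes and the four-byte B1000000 command.

```python
ACK = 0x5A  # reply seen in the logs when the command is accepted
NAK = 0xA5  # reply seen in the logs when the command is rejected


def device_reply(received, expected_len=4):
    """Hypothetical device: return ACK, NAK, or None (no reply, i.e. a timeout)."""
    if len(received) == 0:
        return None   # nothing arrived, so the device stays silent
    if len(received) < expected_len:
        return NAK    # incomplete command, cannot be processed
    return ACK        # complete command


command = bytes.fromhex("B1000000")

# Drop 0..4 leading bytes and see which kind of error the host would observe.
for lost in range(len(command) + 1):
    reply = device_reply(command[lost:])
    outcome = "timeout" if reply is None else f"{reply:02X}"
    print(f"lost {lost} byte(s): {outcome}")
```

In this model, losing nothing gives 5A, losing one to three bytes gives A5, and losing all four gives a timeout, which matches the two kinds of errors we see.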
Jef