[PATCH] Cochran_commander refactoring

John Van Ostrand john at vanostrand.com
Fri Jan 16 14:39:41 PST 2015


On Fri, Jan 16, 2015 at 12:14 PM, John Van Ostrand <john at vanostrand.com>
wrote:

> On Tue, Jan 13, 2015 at 8:06 AM, Jef Driesen <jef at libdivecomputer.org>
> wrote:
>
>> John,
>>
>> I finally had some time again for another look at the cochran backend.
>> This time I concentrated mainly on the data from your Commander. The main
>> issue there is that the profile ringbuffer is completely full. The oldest
>> logbook entries are pointing to profile data that has already been
>> overwritten with newer dives. Because this is not handled properly, those
>> older dives end up with completely bogus profile data.
>>
>> There are 347 logbook entries, and the end-of-profile (eop) pointer
>> contains the value 0x0009a20d. If I dump the pre, begin and end pointers
>> again, I get this:
>>
>>  346 00098290 00098478 00099d5a   488 1203
>>  345 00097d5f 00097d5f 00097ddf     0 1201
>>  344 0009661a 000966d3 000978ae   185 1201
>>  ...
>>  225 0009b16e 0009b16e 0009cbce     0 1201
>>  224 00099e22 00099e6f 0009acbd    77 1201 <- Contains the eop address.
>>  223 00096e85 00098109 00099971  4740 1201
>>  ...
>>   78 0009b79a 0009c2a1 0009d247  2823 1205
>>   77 00099eb5 00099edf 0009b2e9    42 1201 <- Yet another time here.
>>   76 00098588 000985c4 00099a04    60 1201
>>  ...
>>    1 00021928 00021a00 00023420   216 1201
>>    0 00020000 000201fd 00021477   509 1201
>>
>> As you can see dive #224 contains the eop address, which means part of
>> the profile belongs to dive #346. Thus all older dives no longer have valid
>> profile data.
>>
>> The easiest solution to fix this is to process dives from the oldest to
>> the newest one (which you already do), and sum the profile length. Once the
>> total length exceeds the size of the ringbuffer (i.e. the maximum amount of
>> profile data), all older dives will no longer have profile data. For those
>> dives you only have the logbook entry.
>>
>> I have attached a patch with a proof of concept implementation. It's far
>> from complete, but I think you'll get the idea. The patch also illustrates
>> how you can nicely concentrate all the main logic for downloading the dives
>> into the foreach function, instead of scattered over many functions. The
>> main idea is that you try to separate the download and parsing logic as
>> much as possible.
>>
>> (More comments inline.)
>>
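A minimal sketch of that bookkeeping in C, for illustration only. It is not the attached patch: `logentry_t`, `rb_distance()` and `mark_valid_profiles()` are made-up names, and it assumes the space a dive occupies runs from its pre-dive pointer up to the next dive's pre-dive pointer (or up to the eop pointer for the newest dive). It walks from the newest entry back to the oldest, adds up those spans, and flags every dive from the point where the total exceeds the ringbuffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical logbook entry, reduced to what this sketch needs. */
typedef struct {
    uint32_t pre;      /* pre-dive pointer: where this dive's data starts */
    int has_profile;   /* output: 1 if the profile data is still intact */
} logentry_t;

/* Bytes from a forward to b inside the ringbuffer [rb_begin, rb_end). */
static uint32_t rb_distance(uint32_t a, uint32_t b,
                            uint32_t rb_begin, uint32_t rb_end)
{
    return (b >= a) ? b - a : (rb_end - a) + (b - rb_begin);
}

/* entries[0] is the oldest dive, entries[count - 1] the newest.
 * Returns the number of dives that still have profile data. */
size_t mark_valid_profiles(logentry_t *entries, size_t count, uint32_t eop,
                           uint32_t rb_begin, uint32_t rb_end)
{
    assert(rb_begin < rb_end);

    uint32_t total = 0;
    uint32_t next = eop;    /* start of whatever follows the current dive */
    size_t valid = 0;
    int overwritten = 0;

    for (size_t i = count; i-- > 0; ) {
        if (!overwritten) {
            total += rb_distance(entries[i].pre, next, rb_begin, rb_end);
            if (total > rb_end - rb_begin)
                overwritten = 1;   /* this dive and all older ones are gone */
        }
        entries[i].has_profile = !overwritten;
        if (!overwritten)
            valid++;
        next = entries[i].pre;
    }
    return valid;
}
```

Dives flagged with has_profile == 0 would then be reported with their logbook entry only.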
>
> Is the application going to expect to see dives from oldest to newest?
>
>
>> On 2014-12-22 21:58, John Van Ostrand wrote:
>>
>>> On Mon, Dec 22, 2014 at 10:38 AM, Jef Driesen <jef at libdivecomputer.org>
>>>
>>>> A memory dump should be independent from whatever logic we use for
>>>> locating the dives. If you simply read everything from address 0 to the
>>>> end, then we can keep using those old memory dumps. If a device has
>>>> lots of memory, then yes, it may take a while to download, but that's
>>>> fine.
>>>> Downloading a memory dump is not the normal use-case. It's intended for
>>>> troubleshooting, and then you do want all the data. (For reference,
>>>> downloading a full memory on a Suunto Vyper takes 10-15 minutes, and
>>>> that's only 8KB of data!)
>>>>
>>>> If we really don't know the total amount of memory, then we'll have to
>>>> make an assumption. If that assumption turns out to be wrong, we can
>>>> always adjust it. But at least we always have all data up to a certain
>>>> address.
>>>>
>>>>
>>> I'll have to commit to knowing the end of memory then. Right now I've
>>> avoided stating the end explicitly because I'm not sure where it is. If all
>>> Cochran DCs return 0xFF when past memory I can be a little liberal and not
>>> crash.
>>>
>>
>> Rather than assuming a value that might be too large, I would start with
>> a relatively small value and increase it whenever there is evidence that there
>> is indeed more memory. Since the last memory area contains the profile
>> ringbuffer, this evidence will most likely be a ringbuffer pointer that
>> points past our assumed end. That's something that's very easy to check.
>>
>
> Certainly it's easier if we know where memory starts and ends and I may be
> close to figuring that out. I've programmed this with a few ideals in mind.
> The first is to not dump a dive just because we aren't absolutely sure of
> the data. I argue this because I assume most end users of libdivecomputer
> (e.g. subsurface users) are not relying on the data to validate diving
> algorithms, they want to see a graphical representation of their dive. If
> they are like me they want to see most of their dive even if the last few
> samples are wrong. This explains why I'd like to see an attempt to retrieve
> corrupt dives and dives that wrap the sample buffer. In my experience it
> shows useful data. I'll get to my other ideals later.
>
> The code is and was using the logic you describe above initially because I
> had limited experience with Cochran DCs. I use logic like that in
> cochran_get_sample_parms() where I iterate through the dives to establish
> a low/high range of sample data. I round down or up as appropriate to 16k
> boundaries. 16K representing 91 min of dive samples on an EMC or 136 min on
> a Commander. I figured this was a reasonable number to capture most
> recreational dives that wrapped. Rounding both values is an attempt to
> capture samples for dives that have wrapped the DC buffer. On a new
> computer the first log entry should point to the start of memory. However,
> consider a DC where the log book has wrapped. The log entry with the lowest
> start memory location will probably not point to the start of sample
> memory. The start of sample memory will likely have sample data from the
> last dive that wrapped. The same is true for the end of sample memory. In
> other words the log book is not guaranteed to have a dive that points to
> the exact start or end of memory. We could ignore any dive with wrapping
> sample data, as you recommend, but I think there's a good case for doing
> everything to present a dive to the end user.
>
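A rough sketch of that rounding in C, for illustration only. It is not the actual body of cochran_get_sample_parms(); `sample_range()`, `SAMPLE_BLOCK` and `END_UNKNOWN` are names invented here, and it assumes the begin and end pointers have already been pulled out of the log entries. Entries whose end pointer is FFFFFFFF are ignored for the upper bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_BLOCK 0x4000u      /* 16K rounding granularity */
#define END_UNKNOWN  0xFFFFFFFFu  /* end pointer of a dive that was never closed */

static uint32_t round_down_16k(uint32_t addr)
{
    return addr & ~(SAMPLE_BLOCK - 1u);
}

static uint32_t round_up_16k(uint32_t addr)
{
    return (addr + SAMPLE_BLOCK - 1u) & ~(SAMPLE_BLOCK - 1u);
}

/* Find the lowest and highest sample pointer over all log entries and
 * widen the range to 16K boundaries. */
void sample_range(const uint32_t *begin, const uint32_t *end, size_t count,
                  uint32_t *low, uint32_t *high)
{
    assert(count > 0);

    uint32_t lo = begin[0], hi = begin[0];
    for (size_t i = 0; i < count; i++) {
        if (begin[i] < lo) lo = begin[i];
        if (begin[i] > hi) hi = begin[i];
        if (end[i] != END_UNKNOWN && end[i] > hi) hi = end[i];
    }
    *low = round_down_16k(lo);
    *high = round_up_16k(hi);
}
```

Fed the Commander pointers visible in the table earlier in this thread (lowest begin 000201fd, highest end 0009d247), this gives a range of 00020000 through 000a0000.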
> The rounded sample-end-pointer was also used in
> cochran_guess_sample_end_address() prior to commit
> e4608460f246565a260e821f7869de668f79caa8. Guessing was done when the dive
> end wasn't recorded properly. The sample-end-pointer was also used to
> reassemble sample data for dives that wrapped the sample buffer. I should
> revisit this code to make sure it still works as intended.
>
> It wasn't until recent changes to the dump function that I decided I would
> commit to a memory end value. You expressed concern about guessing the end
> of samples, specifically where they wrapped and so I removed guessing
> entirely and went with hard coded values. I also based this decision on
> what Cochran's dealer said and what I've seen with the DCs I've tested.
> I've mentioned before that the DC can be configured to use less memory;
> dealers have the ability to restrict and grant memory usage. I'm told that
> the dealer's software does this and as such I don't know if Cochran charges
> the dealer or if this is pure profit for the dealer. Keeping that in mind I
> haven't seen a Cochran with restricted memory, partly perhaps because I
> have only seen one where sample memory was exhausted and possibly because
> dealers (maybe this one in particular) just grant all the memory or
> customers order it enabled. Basically I became much more comfortable hard
> coding the sample-end-pointer based on my experience.
>
> I can understand the point of view that we should fail when we are unsure
> of data and then expect the end user to file a bug report. This would
> likely get earlier reports of errors but in the interim many new users
> might be turned away when the software fails to work for them.
>
>
>> If we start with a value that is too large, and there is a device where
>> the profile data crosses the ringbuffer end, then we will get bogus profile
>> data. I'll illustrate with an example. Assume for a moment that the real
>> profile ringbuffer is in the range 0x200-0x400, and we (incorrectly) assume
>> the end is at 0x800. If there happens to be a dive that crosses the
>> ringbuffer end, for example from 0x380 to 0x280, then we'll read a first
>> part from 0x380 to 0x800 and a second part from 0x200 to 0x280.
>>
>> 200  280  380  400            800
>>  |xxxxx    xxxxx|              |    <- Correct
>>  |yyyyy    yyyyy|yyyyyyyyyyyyyy|    <- Wrong
>>
>> As you can see that first part contains an extra 0x400 bytes that
>> shouldn't be there. We can't easily detect this kind of error, other than
>> the user noticing the resulting profile is completely bogus.
>>
>> If we had estimated the amount of memory too small, for example at 0x300,
>> then we would be able to tell immediately that the 0x380 pointer is outside
>> the allowed range. Then we know we have to adjust the range.
>>
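To make the example concrete, here is a small sketch in C of the two operations involved: measuring a dive that may wrap around the ringbuffer end, and rejecting a pointer that falls outside the assumed range. The function names are invented for this illustration; they are not existing libdivecomputer functions.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr lies inside the assumed ringbuffer [rb_begin, rb_end]. */
int rb_contains(uint32_t addr, uint32_t rb_begin, uint32_t rb_end)
{
    return addr >= rb_begin && addr <= rb_end;
}

/* Number of bytes that get read for a dive running from begin to end,
 * wrapping from rb_end back to rb_begin when end < begin. */
uint32_t rb_length(uint32_t begin, uint32_t end,
                   uint32_t rb_begin, uint32_t rb_end)
{
    assert(rb_contains(begin, rb_begin, rb_end));
    assert(rb_contains(end, rb_begin, rb_end));

    if (end >= begin)
        return end - begin;
    return (rb_end - begin) + (end - rb_begin);
}
```

With the numbers above, rb_length(0x380, 0x280, 0x200, 0x400) gives 0x100 bytes, while the wrong end of 0x800 gives 0x500, which is the extra 0x400 bytes. With an assumed end of 0x300, rb_contains(0x380, 0x200, 0x300) fails, and that is the signal to enlarge the range.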
>>>> Since the other pieces of data (id, misc and config) have a fixed size,
>>>> the structure for a good memory dump could be as simple as concatenating
>>>> all pieces:
>>>>
>>>> id       67 bytes
>>>> misc     1500 bytes
>>>> config   2x512 (emc) or 4x512 (commander) bytes
>>>> memory   variable length, memory starting at address
>>>>
>>>> An alternative is to only include the memory part, and take care of the
>>>> other three separately, by means of the vendor event. This is how it's done
>>>> in several other backends (e.g. oceanic and reefnet).
>>>>
>>>>
>>> Considering I'm one of two users of this I'd like to have the extra data.
>>> And considering the person doing the dump knows the model it should be
>>> apparent how to parse the data so I don't need to make a generic format.
>>>
>>
>> I'm not sure I understand your concerns here. Of course we want that
>> extra data. To locate the dives, we need at a minimum the id and config0
>> packets. And depending on what we discover, we might need the others too at
>> some point. I only mentioned that instead of prepending the id, misc and
>> config blocks to the memory dump, we could deliver them separately by means
>> of the vendor event. That's how it's done in those other backends.
>>
>>>> Do you think we should consider the pre-dive event as part of the dive
>>>> data? Based on your description, these pre-dive events do not look really
>>>> interesting from a dive logbook point of view. I have no idea how the
>>>> parser should deal with it, other than just ignoring them.
>>>>
>>>>
>>> If the data were readily available I'd be interested in some of the data,
>>> like altitude and temp changes while travelling. However I've looked at
>>> some of the raw pre-dive data and I can't parse the event data. All that
>>> said I think this data is a passing interest.
>>>
>>> I took my cue from Analyst. It doesn't show the pre-dive events on dive
>>> profiles nor does it show the off-gassing data. It has an option to display
>>> "inter-dive" events and does so similar to a dive. I've not been interested
>>> in the data, except to find out what it was. I've been thinking of
>>> libdivecomputer in the context of Subsurface which has no mechanism to show
>>> inter-dive events or data. I think libdivecomputer would need some other
>>> mechanism to present this data and if only the Cochran does this it may not
>>> make sense to spend the time.
>>>
>>
>> That's also my impression.
>>
>> But remember that even if libdivecomputer itself doesn't use this
>> information or provide an API to parse it, there might be someone out
>> there who is interested in the raw data. It's perfectly possible to build a
>> Cochran-specific application that uses libdivecomputer only for downloading
>> the data, and does the parsing on its own. Probably not a very likely
>> scenario, but not impossible. Btw, that's also the reason why
>> libdivecomputer always provides access to the raw data!
>>
>> Note that if we now strip this info, and change our minds later on, then
>> it might be too late. Changing the data format will break backwards
>> compatibility.
>
>
> I agree it would be nice to provide this data via API. If we decide to
> provide this data via the existing API it's going to mess up a lot of
> existing applications. The inter-dive events may not be so bad, since there
> are usually only a few events, but keep in mind some dives have a lot. Also I
> don't know how to parse data from these events.
>
> The off-gassing data, if sent out through the samples_foreach function, are
> going to be incorporated into the applications' dive profiles and appear as
> flat trailing lines in the profile. Temperature might be the only
> interesting sample and barely so.
>
> Already other interesting data from the Cochran data stream is not
> provided through the API. Data like deepest stop time and tissue
> compartment loads are not accessible. To make off-gassing samples
> meaningful we should also present the in-dive TC loads and then we should
> indicate to the application that this is inter-dive data.
>
>
>>>> A pointer of FFFFFFFF means the end of dive was never logged. I had a
>>>> problem with my DC where it would reboot not logging the end of dive. The
>>>> code tries to salvage the dive logs.
>>>>
>>>>
>>> Can you recover some of the samples? Based on what I've seen so far (see
>>> my long table) it makes no sense to try to recover the samples, because it
>>> looks like at least part of the data you are recovering are the pre-dive
>>> events of another dive. That certainly doesn't look right.
>>>
>>> Look for example at dive #79:
>>>
>>> 78 0014b8f3 0014b94a 0014ef9d    87 1800
>>> 79 0014f6a5 0014f70d ffffffff   104 1800
>>> 80 0014f6a5 0015043c 00151442  3479 1373862
>>>
>>> Your code to guess the end pointer will replace ffffffff with 0015043c,
>>> which is the start of the next dive (#80). So for dive #79 you take the
>>> range 0014f70d to 0015043c as the sample data. But that memory range is
>>> also the last part of the pre-dive events for dive #80. This overlap
>>> doesn't make any sense to me.
>>>
>>>
>> You can see several of those starting about there. The battery was low and
>> the DC was starting to exhibit the crash problem so I was intentionally
>> invoking it about that time so I could be confident in describing the
>> problem.
>>
>> Take a look at
>>
>>  95 001627c7 0016447b ffffffff 7348 1451976
>>  96 001627c7 0016463c 001661b3 7797 1451976
>>
>> Those 7348 bytes of samples on dive 95 were a 40 minute dive in Tobermory,
>> Canada followed by another about that length. Both imported mostly
>> correctly. I'd rather have that flawed dive profile than nothing.
>>
>> I'm not sure what dive 79 was, but I figured the user would rather see
>> a corrupt dive imported than none so s/he could make a choice.
>>
>
>> If you look at dive #96, then there are 7797 bytes of pre-dive events in
>> the range 001627c7-0016463c. But 7348 of those bytes are also the pre-dive
>> events of dive #95 (range 001627c7-0016447b). That's what puzzles me. How
>> can the pre-dive events of two dives overlap? That simply doesn't make any
>> sense to me.
>>
>> If you then guess the end address of dive #95 to be the start address of
>> dive #96, then the samples of dive #95 end up somewhere in the pre-dive
>> data of dive #96. That's just weird.
>>
>>
>> The only explanation that I can come up with is that the dive computer
>> started writing dive #95. Then it failed before the dive was finished (e.g.
>> battery empty, reboot or whatever) and the end pointer didn't get written
>> correctly. Hence the ffffffff value. But the end-of-profile pointer is
>> probably only updated after the dive is finished. Hence the pre-dive
>> pointer of the next dive retains the same value. I suspect that the current
>> write position is somehow preserved and thus the next dive will start from
>> there again. But this is of course just guessing.
>>
>>
> That's what I think too. Given that flash can be slow and wears with
> writes, I expect it's logical to eliminate unnecessary writes. Based on what
> I've seen the Cochran code writes the first half of the log when the dive
> starts (e.g. deeper than 6ft). There are two end-points to a dive. The
> first is when the diver ascends to 3ft. I think at this point Cochran
> updates the sample-end-pointer in config0. Cochran then continues logging
> until the configured "Post dive surface interval minutes" (10 - 20 minutes)
> elapses. This is why you see 1800 bytes (10 min * 60 seconds * 3
> samples/sec) between dive-end-pointer and the pre-dive-pointer. At any time
> prior to this elapsing a submersion to below 6 ft continues the dive. I
> think this is why these post-dive samples exist. After the elapsed
> interval, the end-dive log is written and now the pre-dive-pointer is known
> and can be updated in config0.
>
> So although the pre-dive pointer is bogus on the dive after a corrupt
> dive, the dive sample data of the corrupt dive is intact but determining
> the end has been difficult. Now that I've thought through the Cochran
> logic, I should go over my logic to see if I can better determine the end
> of a corrupt dive.
>
>> I understand why you're trying to salvage those corrupt dives (in the
>> sense that having something is better than nothing at all), but I worry
>> that this will complicate the code or produce bogus data (which is even
>> worse). When I look at some of those salvaged dives, that seems to be the
>> case already. They end abruptly, which is probably normal. But there appears
>> to be some amount of "random" data (e.g. spikes in the profile) too. And
>> that's not good. Maybe this is just a bug in the parsing that we can fix? I
>> simply don't know at this point.
>>
>
> The code isn't too complicated because of it. A corrupt dive is known
> because of the FFFFFFFF dive-end-pointer. Knowing that, we loop, exiting on
> a memory boundary, when we see a sample that doesn't make sense (0xFF), or
> when we see normal end-dive events or normal start-dive events. If we don't
> find end-dive events and we stop because of start-dive events we may have
> processed some inter-dive events as sample data and I think that's why we
> see the spikes at the end. It's a few lines of code in the parse function
> plus the guess function and a few if-thens. Since corrupt dives don't have
> end-dive log data the code parses the samples for depth, temp and duration,
> etc.
>
> Now that I've walked through the Cochran logic and why it's writing sample
> pointers at certain times I might be able to do better guessing at
> end-dives.
>

I've looked into four of the corrupt dives on the EMC, dives 4, 6, 13 and
14. Given the existing logic, the strange data at the end of those dive
profiles is the inter-dive events before the next dive. There seem to be
three events: 0x10, 00, and 02. This is a crash and reboot. The data looks
like this:

dive 4/5
00004dc0  40 69 00 42 91 09 41 69  0a 43 93 0a 42 69 09 a8  |@i.B..Ai.C..Bi..|
00004dd0  c0 42 80 08 02 68 07 42  80 06 42 68 05 40 80 04  |.B...h.B..Bh.@..|
00004de0  42 68 04 42 80 03 40 68  02 40 80 02 02 68 01 40  |Bh.B..@h.@...h.@|
00004df0  80 01 01 68 00 41 80 00  40 68 00 01 80 00 41 68  |...h.A..@h....Ah|
00004e00  10 a4 70 36 2a 16 0e 11  0a 06 0e b5 02 00 00 00  |..p6*...........|
00004e10  00 c6 48 00 00 00 a5 70  36 2a 17 0e 11 0a 06 0e  |..H....p6*......|
00004e20  b5 02 00 00 00 00 c6 48  02 ab 70 36 2a 1d 0e 11  |.......H..p6*...|
00004e30  0a 06 0e f9 02 00 00 00  00 c6 48 00 e3 f3 40 02  |..........H...@.|
00004e40  00 01 69 00 02 10 00 40  69 00 41 12 00 41 69 00  |..i....@i.A..Ai.|


dive 6/7
000073c0  6d 00 02 0e 04 01 6d 03  40 0e 03 01 6c 02 02 0e  |m.....m.@...l...|
000073d0  02 01 6c 01 02 14 06 02  6c 09 01 18 0c 01 6c 0f  |..l.....l.....l.|
000073e0  02 16 11 40 6c 10 02 13  0e 01 6c 0c 40 11 09 40  |...@l.....l.@..@|
000073f0  6c 00 40 0b 00 40 6c 00  41 83 00 40 6c 00 40 84  |l.@..@l.A..@l.@.|
00007400  10 8b 80 36 2a 0d 16 12  0a 06 0e b3 02 00 00 00  |...6*...........|
00007410  00 02 49 00 00 00 8c 80  36 2a 0e 16 12 0a 06 0e  |..I.....6*......|
00007420  b3 02 00 00 00 00 02 49  02 92 80 36 2a 14 16 12  |.......I...6*...|
00007430  0a 06 0e f6 02 00 00 00  00 3d 49 00 e3 f3 40 08  |.........=I...@.|
00007440  00 02 6b 00 01 15 00 01  6b 00 01 1b 00 40 6b 00  |..k.....k....@k.|

dive 13/14
00014de0  01 40 69 01 40 0d ff 01  69 ff 01 08 00 41 69 00  |.@i.@...i....Ai.|
00014df0  01 05 07 40 69 09 40 03  0b 41 69 0d 41 83 0f 41  |...@i.@..Ai.A..A|
00014e00  10 a4 b4 3f 2a 1a 39 11  11 06 0e b7 02 00 00 00  |...?*.9.........|
00014e10  00 79 49 00 00 00 a5 b4  3f 2a 1b 39 11 11 06 0e  |.yI.....?*.9....|
00014e20  b7 02 00 00 00 00 79 49  02 ab b4 3f 2a 21 39 11  |......yI...?*!9.|
00014e30  11 06 0e fb 02 00 00 00  00 3d 49 00 e3 f3 40 00  |.........=I...@.|
00014e40  00 02 6a 00 42 07 00 40  6a 00 41 81 00 40 6a 00  |..j.B..@j.A..@j.|
00014e50  01 83 00 03 6a 00 01 11  00 40 6a 00 42 12 00 43  |....j....@j.B..C|

dive 14/15
000157e0  00 40 80 00 02 69 01 01  80 02 01 69 03 01 80 04  |.@...i.....i....|
000157f0  41 69 04 40 80 04 41 69  04 40 80 00 41 69 00 41  |Ai.@..Ai.@..Ai.A|
00015800  10 97 b8 3f 2a 11 0e 12  11 06 0e b4 02 00 00 00  |...?*...........|
00015810  00 3d 49 00 00 00 98 b8  3f 2a 12 0e 12 11 06 0e  |.=I.....?*......|
00015820  b4 02 00 00 00 00 3d 49  02 9e b8 3f 2a 18 0e 12  |......=I...?*...|
00015830  11 06 0e fb 02 00 00 00  00 3d 49 00 e3 f3 40 07  |.........=I...@.|
00015840  00 07 6d 00 02 2a 00 03  6d 00 02 38 00 01 6d 00  |..m..*..m..8..m.|

Look for the "e3 f3" bytes in the excerpts; that is the beginning of the
next dive. If you backtrack you'll see 3 interdive events. You can use the
table of sizes in the source code to isolate the events but it's easier to
look for the time stamp which ends with 2a and is four bytes in total. The
byte prior to the time stamp is the event code.

For example in excerpt 14/15 the breakdown is:

Dive 14 samples:
000157e0  00 40 80 00 02 69 01 01  80 02 01 69 03 01 80 04  |.@...i.....i....|
000157f0  41 69 04 40 80 04 41 69  04 40 80 00 41 69 00 41  |Ai.@..Ai.@..Ai.A|

Interdive events

Event 0x10
00015800  10 97 b8 3f 2a 11 0e 12  11 06 0e b4 02 00 00 00  |...?*...........|
00015810  00 3d 49 00 00

Event 00
                         00 98 b8  3f 2a 12 0e 12 11 06 0e
00015820  b4 02 00 00 00 00 3d 49

Event 02
                                   02 9e b8 3f 2a 18 0e 12
00015830  11 06 0e fb 02 00 00 00  00 3d 49 00

Dive 15 samples:
                                               e3 f3 40 07
00015840  00 07 6d 00 02 2a 00 03  6d 00 02 38 00 01 6d 00  |..m..*..m..8..m.|

I think this pattern may be specific to the EMC and specific to the failure
event, a crash/reboot in this case. I could cycle through the table of
inter-dive events to see if the byte code matches and test whether the
time stamp is reasonable. That should work to find the actual end of the
previous dive.
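
A minimal sketch of that idea in C, under some loud assumptions. The three sizes are simply counted from the 14/15 breakdown (21, 19 and 20 bytes, event code included), so `event_table` is a stand-in for the real table in the source. Reading the four time stamp bytes as a little-endian integer is also an assumption, and `guess_profile_end()`, `skip_events()` and the time stamp window are names invented for illustration. The caller would pick the window from the log entries on either side of the corrupt dive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inter-dive event sizes in bytes, event code included. Counted from the
 * dive 14/15 excerpt only; a stand-in for the full table in the backend. */
typedef struct {
    uint8_t code;
    size_t size;
} event_info_t;

static const event_info_t event_table[] = {
    {0x10, 21}, {0x00, 19}, {0x02, 20},
};

/* Assumption: the four bytes after the event code are a little-endian
 * time stamp (which is why it "ends with 2a" in the dumps). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Size of the event starting at data[pos], or 0 if nothing plausible is there. */
static size_t match_event(const uint8_t *data, size_t size, size_t pos,
                          uint32_t ts_min, uint32_t ts_max)
{
    for (size_t i = 0; i < sizeof(event_table) / sizeof(event_table[0]); i++) {
        const event_info_t *ev = &event_table[i];
        if (data[pos] != ev->code || ev->size > size - pos)
            continue;
        uint32_t ts = le32(data + pos + 1);
        if (ts >= ts_min && ts <= ts_max)
            return ev->size;
    }
    return 0;
}

/* Offset of the first plausible inter-dive event at or after start, i.e. the
 * guessed end of the corrupt dive's samples. Returns size if none is found. */
size_t guess_profile_end(const uint8_t *data, size_t size, size_t start,
                         uint32_t ts_min, uint32_t ts_max)
{
    assert(ts_min <= ts_max);
    for (size_t pos = start; pos < size; pos++) {
        if (match_event(data, size, pos, ts_min, ts_max))
            return pos;
    }
    return size;
}

/* Step over consecutive events from pos; returns where the next dive starts. */
size_t skip_events(const uint8_t *data, size_t size, size_t pos,
                   uint32_t ts_min, uint32_t ts_max)
{
    size_t n;
    while (pos < size && (n = match_event(data, size, pos, ts_min, ts_max)) != 0)
        pos += n;
    return pos;
}
```

Run against the 14/15 excerpt with a window around 2a3fb897, this puts the end of dive 14 at 00015800 and the start of dive 15 at 0001583c, where the "e3 f3" bytes are.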


> What does the Cochran application do with those corrupt dives? Does it show
> them? With or without a profile?
>

Cochran refuses to display a corrupt dive. You also mention that the
Commander shows some erratic dives, although in the context of reclaimed
and overwritten sample data. There were a lot of Commander dives that
looked corrupt in Analyst. I was happy that libdivecomputer/subsurface
mirrored those corrupt dive profiles because it suggested the sample data
was corrupt or that we were at least as capable as Analyst.


>
>>>> I've been holding off asking Cochran again for help. They previously
>>>> refused citing concern that open source code would reveal too much about
>>>> their algorithm. Now that the code exists I wonder if they would be willing
>>>> to help by telling me how to determine model and memory configuration, or
>>>> if they will demand that I stop. It's also the same reason I've been
>>>> trying to respect their privacy by not reading anything but my data.
>>>>
>>>>
>>> That's their usual argument. The truth is that I've never encountered a
>>> device where you can download anything worth protecting (e.g. firmware
>>> code, decompression algorithm). All you can access is the memory containing
>>> the dives.
>>>
>>
>> Have you had any success in changing their minds? Do you have any good
>> examples of DC manufacturers that have been open that I could use as
>> examples when talking to Cochran next?
>>
>
> My experience so far is that either they are open minded (HW, Reefnet,
> Atomics, Shearwater, etc) or not. Convincing them appears to be rather
> difficult :-(
>
>

>
> --
> John Van Ostrand
> At large on sabbatical
>
>


-- 
John Van Ostrand
At large on sabbatical