[PATCH] Cochran_commander refactoring
Jef Driesen
jef at libdivecomputer.org
Wed Jan 21 07:52:28 PST 2015
On 16-01-15 18:14, John Van Ostrand wrote:
> Is the application going to expect to see dives from oldest to newest?
Yes! This is very important for the "download only new dives" feature.
If the application receives the dives in newest first order, it can stop
the download as soon as it recognizes a dive that it already has.
Libdivecomputer's internal implementation (using the fingerprint
feature) also requires this.
That reminds me of a somewhat related issue. Currently the entire
profile ringbuffer (or at least the part that contains dives) gets
downloaded with a single large request. For applications not using
libdivecomputer's built-in fingerprint feature, that means we will first
download ALL dives, and then throw away the ones the application has
already downloaded before. That's not really optimal. Especially because
downloading only the new dives happens to be our most common use-case.
This could be improved by downloading the profile data with multiple
smaller requests. For example, one request per dive profile (or two
requests if the dive happens to cross the ringbuffer end). I believe
this is technically possible with the Cochran protocol. Multiple
requests will
probably add some extra overhead when doing a complete download. But I
doubt that outweighs the benefits in the normal case of downloading only
a few dives.
With multiple smaller requests it's also possible to cancel a download.
So that's an extra advantage that's nearly impossible to implement
correctly with just one large request.
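Since each dive occupies a contiguous range in the profile ringbuffer except when it wraps, the per-dive download splits into at most two linear reads. A minimal sketch of that split, assuming hypothetical ringbuffer boundary constants (the real Cochran addresses differ per model, and `split_profile` is not part of the actual backend):

```c
/* Hypothetical profile ringbuffer boundaries (illustrative only). */
#define RB_BEGIN 0x10000
#define RB_END   0x20000

/* Split one logical dive profile [begin, end) in the ringbuffer into
 * at most two linear requests. Returns the number of requests (1 or 2)
 * and fills addr[i]/len[i] with the range of each request. */
static int
split_profile (unsigned int begin, unsigned int end,
               unsigned int addr[2], unsigned int len[2])
{
    if (begin <= end) {
        /* No wrap: a single contiguous request suffices. */
        addr[0] = begin;
        len[0]  = end - begin;
        return 1;
    }

    /* The dive crosses the ringbuffer end: read the tail part first
     * (up to RB_END), then the head part (from RB_BEGIN). */
    addr[0] = begin;
    len[0]  = RB_END - begin;
    addr[1] = RB_BEGIN;
    len[1]  = end - RB_BEGIN;
    return 2;
}
```

Issuing these requests dive by dive, newest first, is what makes early termination possible: as soon as a fingerprint matches, the remaining dives simply aren't requested.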
>> Rather than assuming a value that might be too large, I would start
>> with a relatively small value and increase it whenever there is
>> evidence that there is indeed more memory. Since the last memory area
>> contains the profile ringbuffer, this evidence will most likely be a
>> ringbuffer pointer that points past our assumed end. That's something
>> that's very easy to check.
>>
>
> Certainly it's easier if we know where memory starts and ends, and I
> may be close to figuring that out. I've programmed this with a few
> ideals in mind. The first is to not dump a dive just because we aren't
> absolutely sure of the data. I argue this because I assume most end
> users of libdivecomputer (e.g. subsurface users) are not relying on
> the data to validate diving algorithms; they want to see a graphical
> representation of their dive. If they are like me, they want to see
> most of their dive even if the last few samples are wrong. This
> explains why I'd like to see an attempt to retrieve corrupt dives and
> dives that wrap the sample buffer. In my experience it shows useful
> data. I'll get to my other ideals later.
We're not dropping any dives, only informing the caller that we did
encounter data that we (currently) can't handle correctly. That's
something completely different!
Relying on guessing is always error-prone and unreliable. Just one wrong
sample is enough to render the recovery of a corrupted dive completely
useless. For example if there is one depth sample completely out of
range, then the application will scale the profile accordingly. The
result is that the good part of the profile will be squeezed completely,
and you won't be able to see anything useful at all. And things can get
much worse too (see examples below).
> [...]
>
> I can understand the point of view that we should fail when we are
> unsure of data and then expect the end user to file a bug report. This
> would likely get earlier reports of errors, but in the interim many
> new users might be turned away when the software fails to work for
> them.
If you ask me, silently ignoring errors is an order of magnitude worse
than gracefully failing with an error. Remember that if we ignore the
error and try to proceed anyway, there is absolutely no guarantee that
the result is correct. If you're lucky then the result is completely
bogus and the user will notice immediately (e.g. unrealistic profiles).
Smaller mistakes may go unnoticed at first, but by the time you notice
them it may already be too late for re-downloading. And last but not
least there is also the possibility of crashing the application.
I can give plenty of examples of real-life scenarios that I encountered
during libdivecomputer development. Two examples: If a ringbuffer
pointer is out of range, and you try to use it anyway with code that
expects a pointer in a certain range, then you'll likely end up with a
buffer overflow and crash the application. When parsing "random" data,
you can easily end up with depths (or times) that are way beyond the
normal range (e.g. order of several kilometres). Definitely a challenge
for the decompression algorithm in the application. (Hint: I ran out of
patience and just killed it.)
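The first failure mode is exactly where strict validation pays off: reject an out-of-range ringbuffer pointer before it ever becomes a buffer offset. A minimal sketch, with made-up boundary constants (not the real Cochran addresses):

```c
#include <stddef.h>

/* Hypothetical profile ringbuffer boundaries (illustrative only). */
#define RB_BEGIN 0x10000
#define RB_END   0x20000

/* Validate a ringbuffer pointer before converting it into an offset
 * into the downloaded buffer. An out-of-range pointer (e.g. the
 * 0xFFFFFFFF end pointer of a corrupt dive) is rejected instead of
 * causing an out-of-bounds read further down. */
static int
ringbuffer_offset (unsigned int address, size_t *offset)
{
    if (address < RB_BEGIN || address >= RB_END)
        return -1; /* Reject: would overflow the downloaded buffer. */

    *offset = address - RB_BEGIN;
    return 0;
}
```

Every caller then has to handle the error path explicitly, which is precisely the "fail gracefully instead of guessing" behaviour argued for above.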
Anyway, my experience is that being very strict pays off in the long
run. For the end-user, it may be a bit annoying initially, but it's
certainly not making things worse.
Note that I have the impression we're mixing two unrelated issues in
this discussion: (1) problems associated with the ringbuffer end, and
(2) recovery of corrupt dives.
For issue #1, this is all about finding the correct ringbuffer begin/end
address. Once we have those values, the problem is solved and there is
absolutely no need for any guessing. That's why I insist on being very
strict there. It will help us to find the correct values faster.
For issue #2 the situation is very different. Because we're dealing with
corrupt data here, this will not be as easy as finding some magic
values. Maybe it's indeed possible to recover some data without too many
serious issues. In that case I'm fine with doing that, but if not I lean
towards sticking to the logbook header only and dropping the partial
profile. But let's first see how far we get.
>> But remember that even if libdivecomputer itself doesn't use this
>> information nor provides an api to parse it, there might be someone
>> out there who is interested in the raw data. It's perfectly possible
>> to build a Cochran-specific application that uses libdivecomputer
>> only for downloading the data, and does the parsing on its own.
>> Probably not a very likely scenario, but not impossible. Btw, that's
>> also the reason why libdivecomputer always provides access to the
>> raw data!
>>
>> Note that if we now strip this info, and change our minds later on,
>> then it might be too late. Changing the data format will break
>> backwards compatibility.
>
> I agree it would be nice to provide this data via the API. If we
> decide to provide this data via the existing API, it's going to mess
> up a lot of existing applications. The inter-dive events may not be so
> bad, since there are usually only a few events, though keep in mind
> some dives have a lot. Also I don't know how to parse data from these
> events.
If we consider the inter-dive events to be part of the dive, and include
them in our raw dive data, that doesn't mean the parser has to do
anything with them. The parser can simply ignore them if they don't fit
well into the api. But they're still there, in case an application wants
to look at the raw binary data.
Just to make myself clear: for the inter-dive events I'm not sure
whether we should include them or not. The libdivecomputer api is based
around the concept of dives. Inter-dive events don't really fit well
into that model. They are not really part of the dive that comes before
or after them. We don't have any way to deliver non-dive data to the
application either. So the only options I see at the moment are to
include them as pre-dive data, or to discard them. Currently my
preference leans towards the latter option, but at this point I want to
leave all options open.
> The off-gassing data, if sent out through the samples_foreach
> function, is going to be incorporated into the applications' dive
> profiles and appear as flat trailing lines in the profile. Temperature
> might be the only interesting sample, and barely so.
I doubt this is off-gassing data. It's an artifact of how dive computers
determine the end of a dive. They typically consider a dive finished
after spending X amount of time above a certain depth threshold Y.
During this period they keep recording, because you might descend again
and resume the dive. When the dive is really over, some dive computers
keep this tail intact, while others rewind and discard it. It looks like
the Cochran has some sort of compromise: it leaves the tail intact, but
also indicates at which point it considered the dive finished.
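That end-of-dive rule can be sketched as a scan over the sample stream. The 1 m surface threshold and 10-minute timeout below are illustrative guesses, not Cochran's actual parameters, and `find_dive_end` is a hypothetical helper:

```c
/* Illustrative end-of-dive detection: a dive is considered finished
 * once the diver has stayed above SURFACE_DEPTH for SURFACE_TIMEOUT
 * consecutive seconds. Thresholds are made-up, not Cochran's. */
#define SURFACE_DEPTH   1.0       /* metres */
#define SURFACE_TIMEOUT (10 * 60) /* seconds */

/* Given per-second depth samples, return the index of the first sample
 * of the post-dive "tail" (the point where the dive is considered
 * finished), or n if the dive never ends within the samples. */
static unsigned int
find_dive_end (const double depth[], unsigned int n)
{
    unsigned int shallow = 0; /* consecutive seconds above threshold */

    for (unsigned int i = 0; i < n; ++i) {
        if (depth[i] <= SURFACE_DEPTH) {
            if (++shallow >= SURFACE_TIMEOUT)
                return i - (SURFACE_TIMEOUT - 1); /* start of the tail */
        } else {
            shallow = 0; /* Descended again: the dive resumes. */
        }
    }
    return n;
}
```

Resetting the counter on any descent is what produces the "you might descend again and resume the dive" behaviour described above.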
I would definitely include this data. Discarding the tail should be done
by the application, where it can be configured by the user. (I believe
subsurface already supports this.) Some users will want to see it, and
others won't.
> Already other interesting data from the Cochran data stream is not
> provided through the API. Data like deepest stop time and tissue
> compartment loads are not accessible. To make off-gassing samples
> meaningful we should also present the in-dive TC loads, and then we
> should indicate to the application that this is inter-dive data.
The libdivecomputer api supports deco/ndl info, but not tissue loadings.
The latter are too device specific. To be able to interpret tissue
loadings, you need to know the decompression algorithm and all its
parameters. That's info we simply don't have (and likely never will).
Implementing a decompression algorithm in the application makes a lot
more sense.
>> The only explanation that I can come up with is that the dive
>> computer started writing dive #95. Then it failed before the dive was
>> finished (e.g. battery empty, reboot or whatever) and the end pointer
>> didn't get written correctly. Hence the ffffffff value. But the
>> end-of-profile pointer is probably only updated after the dive is
>> finished. Hence the pre-dive pointer of the next dive retains the
>> same value. I suspect that the current write position is somehow
>> preserved and thus the next dive will start from there again. But
>> this is of course just guessing.
>>
> That's what I think too. Given that flash can be slow and wears with
> writes, I expect it's logical to eliminate unnecessary writes. Based
> on what I've seen, the Cochran code writes the first half of the log
> when the dive starts (e.g. deeper than 6 ft). There are two end-points
> to a dive. The first is when the diver ascends to 3 ft. I think at
> this point Cochran updates the sample-end-pointer in config0. Cochran
> then continues logging until the configured "Post dive surface
> interval minutes" (10 - 20 minutes) elapses. This is why you see 1800
> bytes (10 min * 60 seconds * 3 samples/sec) between dive-end-pointer
> and the pre-dive-pointer. At any time prior to this elapsing, a
> submersion to below 6 ft continues the dive. I think this is why these
> post-dive samples exist. After the elapsed interval, the end-dive log
> is written, and now the pre-dive-pointer is known and can be updated
> in config0.
>
> So although the pre-dive pointer is bogus on the dive after a corrupt
> dive, the dive sample data of the corrupt dive is intact, but
> determining the end has been difficult. Now that I've thought through
> the Cochran logic, I should go over my logic to see if I can better
> determine the end of a corrupt dive.
Makes sense.
Excellent info about that 1800-byte (EMC) or 1201-byte (Commander)
block. Initially I assumed this was some fixed-size trailer, and I had
no explanation for the fact that on some dives it's a few bytes larger.
But you are right: it's indeed the tail of the dive profile, and those
few extra bytes are events. More pieces start to fall into place!
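The block size falls straight out of that arithmetic. A trivial sanity check, using the EMC figures from the thread (10 minutes at 3 samples per second, one byte per sample, as the numbers above imply; `tail_size` is just an illustrative helper):

```c
/* Post-dive tail size in bytes: surface interval (minutes) times
 * sample rate (samples/sec) times sample size (bytes). Events recorded
 * during the interval account for the few extra bytes on some dives. */
static unsigned int
tail_size (unsigned int minutes, unsigned int samples_per_sec,
           unsigned int sample_size)
{
    return minutes * 60 * samples_per_sec * sample_size;
}
```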
>> I understand why you're trying to salvage those corrupt dives (in the
>> sense that having something is better than nothing at all), but I
>> worry that this will complicate the code or produce bogus data (which
>> is even worse). When I look at some of those salvaged dives, that
>> seems to be the case already. They end abruptly, which is probably
>> normal. But there appears to be some amount of "random" data (e.g.
>> spikes in the profile) too. And that's not good. Maybe this is just a
>> bug in the parsing that we can fix? I simply don't know at this
>> point.
>
> The code isn't too complicated because of it. A corrupt dive is known
> because of the FFFFFFFF dive-end-pointer. Knowing that, we loop,
> exiting on a memory boundary, when we see a sample that doesn't make
> sense (0xFF), or when we see normal end-dive events or normal
> start-dive events. If we don't find end-dive events and we stop
> because of start-dive events, we may have processed some inter-dive
> events as sample data, and I think that's why we see the spikes at the
> end. It's a few lines of code in the parse function plus the guess
> function and a few if-thens. Since corrupt dives don't have end-dive
> log data, the code parses the samples for depth, temp, duration, etc.
>
> Now that I've walked through the Cochran logic and why it's writing
> sample pointers at certain times, I might be able to do better
> guessing at end-dives.
Well the code is indeed not really complicated, but it's also not
without flaws. Those spikes at the end are examples of that. Fixing
those issues might turn out harder than expected. That's my main
concern.
I suggest we first take care of the non-corrupt dives, and get the
backend into a state where we can integrate it into libdivecomputer.
Once we have that, we take a step back and try to come up with a
solution to deal with those corrupt dives too. What do you think?
>> What does the Cochran application do with those corrupt dives? Does it
>> show them? With or without a profile?
>
> Cochran refuses to display a corrupt dive.
So if we're not able to recover corrupt dives, we're at least not doing
worse than Analyst. I think that's already a good starting point. If we
manage to do better, then great.
> You also mention that the Commander shows some erratic dives, although
> in the context of reclaimed and overwritten sample data. There were a
> lot of Commander dives that looked corrupt in Analyst. I was happy
> that libdivecomputer/subsurface mirrored those corrupt dive profiles,
> because it suggested the sample data was corrupt, or that we were at
> least as capable as Analyst.
Well, we're certainly not going to re-implement the bugs in the Analyst
application :-)
Jef