On 16-01-15 18:14, John Van Ostrand wrote:
Is the application going to expect to see dives from oldest to newest?
Yes! This is very important for the "download only new dives" feature. If the application receives the dives in newest first order, it can stop the download as soon as it recognizes a dive that it already has. Libdivecomputer's internal implementation (using the fingerprint feature) also requires this.
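The early-stop logic can be sketched roughly like this (hypothetical helper and sizes, not the actual libdivecomputer API; a real fingerprint is device specific):

```c
#include <string.h>
#include <assert.h>

#define FINGERPRINT_SIZE 4   /* illustrative size, not Cochran's actual one */
#define HEADER_SIZE 16       /* illustrative dive header size */

/* Walk the dive headers newest-first and stop as soon as we hit the dive
 * whose fingerprint matches the one stored from a previous download.
 * Everything older than that dive is already known.  Returns the number
 * of new dives. */
static int
count_new_dives(unsigned char headers[][HEADER_SIZE], int ndives,
                const unsigned char *fingerprint)
{
    int count = 0;
    for (int i = 0; i < ndives; i++) {
        if (memcmp(headers[i], fingerprint, FINGERPRINT_SIZE) == 0)
            break;  /* known dive reached; stop downloading */
        count++;
    }
    return count;
}
```

This only works when the dives arrive newest first; with the opposite order you would have to download everything before knowing where the new dives start.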
That reminds me of a somewhat related issue. Currently the entire profile ringbuffer (or at least the part that contains dives) gets downloaded with a single large request. For applications not using libdivecomputer's built-in fingerprint feature, that means we will first download ALL dives, and then throw away the ones the application has already downloaded before. That's not really optimal, especially because downloading only the new dives happens to be our most common use-case.
This could be improved by downloading the profile data with multiple smaller requests. For example one request per dive profile (or two requests if the dive happens to cross the ringbuffer end). I believe this is technically possible with the Cochran protocol. Multiple requests will probably add some extra overhead when doing a complete download. But I doubt that outweighs the benefits in the normal case of downloading only a few dives.
With multiple smaller requests it's also possible to cancel a download. That's an extra advantage, and one that's nearly impossible to implement correctly with just one large request.
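The per-dive request splitting could look something like this (the ringbuffer bounds are made-up constants; on the real device they come from the config data):

```c
#include <assert.h>

/* Hypothetical ringbuffer bounds; the real values come from the config. */
#define RB_BEGIN 0x10000u
#define RB_END   0x20000u

struct request {
    unsigned int address;
    unsigned int length;
};

/* Split one dive's profile into at most two read requests, depending on
 * whether it wraps around the ringbuffer end. */
static int
profile_requests(unsigned int begin, unsigned int end, struct request req[2])
{
    if (begin <= end) {
        req[0].address = begin;
        req[0].length  = end - begin;
        return 1;
    }
    /* The dive crosses the ringbuffer end: one request for the part up
     * to RB_END, and one for the part that wrapped to RB_BEGIN. */
    req[0].address = begin;
    req[0].length  = RB_END - begin;
    req[1].address = RB_BEGIN;
    req[1].length  = end - RB_BEGIN;
    return 2;
}
```

Cancelling a download then simply means not issuing the next request.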
Rather than assuming a value that might be too large, I would start with a relatively small value and increase it whenever there is evidence that there is indeed more memory. Since the last memory area contains the profile ringbuffer, this evidence will most likely be a ringbuffer pointer that points past our assumed end. That's something that's very easy to check.
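A minimal sketch of that grow-on-evidence check, with an invented initial size (the doubling policy is just one plausible choice):

```c
#include <assert.h>

/* Start from a conservative assumption and double it whenever a
 * ringbuffer pointer proves there is memory past the assumed end.
 * The initial value is made up for illustration. */
#define INITIAL_MEMSIZE 0x20000u

static unsigned int
adjust_memsize(unsigned int assumed, unsigned int rb_pointer)
{
    while (rb_pointer >= assumed)
        assumed *= 2;
    return assumed;
}
```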
Certainly it's easier if we know where memory starts and ends and I may be close to figuring that out. I've programmed this with a few ideals in mind. The first is to not dump a dive just because we aren't absolutely sure of the data. I argue this because I assume most end users of libdivecomputer (e.g. subsurface users) are not relying on the data to validate diving algorithms, they want to see a graphical representation of their dive. If they are like me they want to see most of their dive even if the last few samples are wrong. This explains why I'd like to see an attempt to retrieve corrupt dives and dives that wrap the sample buffer. In my experience it shows useful data. I'll get to my other ideals later.
We're not dropping any dives, only informing the caller that we did encounter data that we (currently) can't handle correctly. That's something completely different!
Relying on guessing is always error-prone and unreliable. Just one wrong sample is enough to render the recovery of a corrupted dive completely useless. For example if there is one depth sample completely out of range, then the application will scale the profile accordingly. The result is that the good part of the profile will be squeezed completely, and you won't be able to see anything useful at all. And things can get much worse too (see examples below).
[...]
I can understand the point of view that we should fail when we are unsure of data and then expect the end user to file a bug report. This would likely get earlier reports of errors but in the interim many new users might be turned away when the software fails to work for them.
If you ask me, silently ignoring errors is an order of magnitude worse than gracefully failing with an error. Remember that if we ignore the error and try to proceed anyway, there is absolutely no guarantee that the result is correct. If you're lucky then the result is completely bogus and the user will notice immediately (e.g. unrealistic profiles). Smaller mistakes may go unnoticed at first, but if you only realize it after a while, it might already be too late for re-downloading. And last but not least there is also the possibility of crashing the application.
I can give plenty of examples of real-life scenarios that I encountered during libdivecomputer development. Two examples: if a ringbuffer pointer is out of range, and you try to use it anyway with code that expects a pointer in a certain range, then you'll likely end up with a buffer overflow and crash the application. When parsing "random" data, you can easily end up with depths (or times) that are way beyond the normal range (e.g. on the order of several kilometres). Definitely a challenge for the decompression algorithm in the application. (Hint: I ran out of patience and just killed it.)
Anyway, my experience is that being very strict pays off in the long run. For the end-user, it may be a bit annoying initially, but it's certainly not making things worse.
Note that I have the impression we're mixing two unrelated issues in this discussion: (1) problems associated with the ringbuffer end, and (2) recovery of corrupt dives.
For issue #1, this is all about finding the correct ringbuffer begin/end address. Once we have those values, the problem is solved and there is absolutely no need for any guessing. That's why I insist on being very strict there. It will help us to find the correct values faster.
For issue #2 the situation is very different. Because we're dealing with corrupt data here, this will not be as easy as finding some magic values. Maybe it's indeed possible to recover some data without too much serious issues. In that case I'm fine doing that, but if not I lean towards sticking to the logbook header only and dropping the partial profile. But let's first see how far we get.
But remember that even if libdivecomputer itself doesn't use this information nor provides an api to parse it, there might be someone out there who is interested in the raw data. It's perfectly possible to build a cochran-specific application that uses libdivecomputer only for downloading the data, and does the parsing on its own. Probably not a very likely scenario, but not impossible. Btw, that's also the reason why libdivecomputer always provides access to the raw data!
Note that if we strip this info now, and change our minds later on, then it might be too late. Changing the data format will break backwards compatibility.
I agree it would be nice to provide this data via the API. But if we decide to provide it via the existing API, it's going to mess up a lot of existing applications. The inter-dive events may not be so bad, since there are usually only a few, but keep in mind some dives have a lot. Also I don't know how to parse the data from these events.
If we consider the inter-dive events to be part of the dive, and include it in our raw dive data, that doesn't mean the parser has to do anything with it. The parser can simply ignore it if it doesn't fit well into the api. But it's still there, in case an applications wants to look at the raw binary data.
Just to make myself clear: For the inter-dive events I'm not sure whether we should include it or not. The libdivecomputer api is based around the concept of dives. Inter-dive events don't really fit well into that model. They are not really part of the dive that comes before or after it. We don't have any way to deliver non-dive data to the application either. So the only options I see at the moment, are to include it as a pre-dive data, or to discard it. Currently my preference is more towards the latter option, but at this point I want to leave all options open.
The off-gassing data, if sent out through the samples_foreach function, is going to be incorporated into the applications' dive profiles and will appear as flat trailing lines in the profile. Temperature might be the only interesting sample, and barely so.
I doubt this is off-gassing data. It's an artifact of how dive computers determine the end of a dive. They typically consider a dive finished after spending X amount of time above a certain depth threshold Y. During this period they keep recording, because you might descend again and resume the dive. When the dive is really over, some dive computers keep this tail intact, others rewind and discard it. It looks like the Cochran has some sort of compromise: it leaves the tail intact, but also indicates at which point it considered the dive finished.
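That end-of-dive rule can be modelled in a few lines; all thresholds here are illustrative, not Cochran's actual values:

```c
#include <assert.h>

/* Toy model of the end-of-dive rule: the dive counts as finished after
 * `threshold_time` consecutive samples shallower than `threshold_depth`.
 * One sample per second is assumed. */
static int
dive_end_index(const double *depths, int n, double threshold_depth,
               int threshold_time)
{
    int shallow = 0;
    for (int i = 0; i < n; i++) {
        shallow = (depths[i] < threshold_depth) ? shallow + 1 : 0;
        if (shallow >= threshold_time)
            return i;   /* point where the computer considers it over */
    }
    return -1;          /* still diving at the end of the data */
}
```

Everything recorded after the returned index is exactly the "tail" discussed here, which the application can show or trim.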
I would definitely include this data. Discarding the tail should be done by the application, where it can be configured by the user. (I believe subsurface already supports this.) Some users will want to see it, and others won't.
Other interesting data from the Cochran data stream is already not provided through the API: data like deepest stop time and tissue compartment loads are not accessible. To make off-gassing samples meaningful we should also present the in-dive TC loads, and then we should indicate to the application that this is inter-dive data.
The libdivecomputer api supports deco/ndl info, but not tissue loadings. The latter are too device specific. To be able to interpret tissue loadings, you need to know the decompression algorithm and all its parameters. That's info we simply don't have (and likely never will). Implementing a decompression algorithm in the application makes a lot more sense.
The only explanation that I can come up with is that the dive computer started writing dive #95. Then it failed before the dive was finished (e.g. battery empty, reboot or whatever) and the end pointer didn't get written correctly. Hence the ffffffff value. But the end-of-profile pointer is probably only updated after the dive is finished. Hence the pre-dive pointer of the next dive retains the same value. I suspect that the current write position is somehow preserved and thus the next dive will start from there again. But this is of course just guessing.
That's what I think too. Given that flash can be slow and wears with writes, I expect it's logical to eliminate unnecessary writes. Based on what I've seen, the Cochran code writes the first half of the log when the dive starts (e.g. deeper than 6 ft). There are two end-points to a dive. The first is when the diver ascends to 3 ft. I think at this point Cochran updates the sample-end-pointer in config0. Cochran then continues logging until the configured "Post dive surface interval minutes" (10 - 20 minutes) elapses. This is why you see 1800 bytes (10 min * 60 seconds * 3 samples/sec) between the dive-end-pointer and the pre-dive-pointer. At any time prior to this elapsing, a submersion below 6 ft continues the dive. I think this is why these post-dive samples exist. After the interval elapses, the end-dive log is written; now the pre-dive-pointer is known and can be updated in config0.
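Writing that gap arithmetic out explicitly (the one-byte-per-sample record size is an assumption taken from the 10 min * 60 s * 3 samples/sec = 1800 figure):

```c
#include <assert.h>

/* Expected gap between the dive-end pointer and the next pre-dive
 * pointer: the post-dive surface interval, still sampled at the normal
 * rate.  One byte per sample record is assumed here. */
static unsigned int
post_dive_gap(unsigned int interval_min, unsigned int samples_per_sec,
              unsigned int bytes_per_sample)
{
    return interval_min * 60u * samples_per_sec * bytes_per_sample;
}
```

A gap that deviates from this value by a few bytes would then be the inter-dive events mentioned elsewhere in the thread.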
So although the pre-dive pointer is bogus on the dive after a corrupt dive, the dive sample data of the corrupt dive is intact but determining the end has been difficult. Now that I've thought through the Cochran logic, I should go over my logic to see if I can better determine the end of a corrupt dive.
Makes sense.
Excellent info about that 1800 (emc) or 1201 (commander) bytes block. Initially I assumed this was some fixed size trailer, and I had no explanation for the fact that on some dives it's a few bytes larger. But you are right. It's indeed the tail of the dive profile, and those few extra bytes are events. More pieces start to fall into place!
I understand why you're trying to salvage those corrupt dives (in the sense that having something is better than nothing at all), but I worry that this will complicate the code or produce bogus data (which is even worse). When I look at some of those salvaged dives, that seems to be the case already. They end abruptly, which is probably normal. But there appears to be some amount of "random" data (e.g. spikes in the profile) too. And that's not good. Maybe this is just a bug in the parsing that we can fix? I simply don't know at this point.
The code isn't too complicated because of it. A corrupt dive is known from the FFFFFFFF dive-end-pointer. Knowing that, we loop over the samples, exiting on a memory boundary, on a sample that doesn't make sense (0xFF), or on normal end-dive or start-dive events. If we don't find end-dive events and we stop because of start-dive events, we may have processed some inter-dive events as sample data, and I think that's why we see the spikes at the end. It's a few lines of code in the parse function plus the guess function and a few if-thens. Since corrupt dives don't have end-dive log data, the code parses the samples for depth, temp, duration, etc.
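The core of that guessing loop might look like this sketch (the two-byte sample size and the bare 0xFF test are simplifying assumptions, and event detection is omitted entirely):

```c
#include <assert.h>
#include <stddef.h>

/* When the dive-end pointer is 0xFFFFFFFF the end of the profile must
 * be guessed: scan forward until we hit what looks like unwritten
 * flash (0xFF filler) or run out of buffer.  Real code would also
 * stop on end-dive or start-dive events. */
static size_t
guess_profile_end(const unsigned char *data, size_t size)
{
    size_t i = 0;
    while (i + 2 <= size) {
        if (data[i] == 0xFF)
            break;          /* looks like unwritten memory */
        i += 2;             /* hypothetical two-byte sample record */
    }
    return i;
}
```

The spikes discussed above would correspond to inter-dive event bytes that slip past checks like these and get decoded as depth samples.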
Now that I've walked through the cochran logic and why it's writing sample pointers at certain times I might be able to do better guessing at end-dives.
Well the code is indeed not really complicated, but it's also not without flaws. Those spikes at the end are examples of that. Fixing those issues might turn out harder than expected. That's my main concern.
I suggest we first take care of the non-corrupt dives, and get the backend into a state where we can integrate it into libdivecomputer. Once we have that, we take a step back and try to come up with a solution to deal with those corrupt dives too. What do you think?
What does the Cochran application do with those corrupt dives? Does it show them? With or without a profile?
Cochran refuses to display a corrupt dive.
So if we're not able to recover corrupt dives, we're at least not doing worse than Analyst. I think that's already a good starting point. If we manage to do better, then great.
You also mention that the Commander shows some erratic dives, although in the context of reclaimed and overwritten sample data. There were a lot of Commander dives that looked corrupt in Analyst. I was happy that libdivecomputer/subsurface mirrored those corrupt dive profiles, because it suggested the sample data itself was corrupt, or that we were at least as capable as Analyst.
Well, we're certainly not going to re-implement the bugs in the Analyst application :-)
Jef