[PATCH] Cochran_commander refactoring
Jef Driesen
jef at libdivecomputer.org
Wed Jan 21 07:52:28 PST 2015
On 16-01-15 18:14, John Van Ostrand wrote:
> Is the application going to expect to see dives from oldest to newest?
Yes! This is very important for the "download only new dives" feature.
If the application receives the dives in newest first order, it can stop
the download as soon as it recognizes a dive that it already has.
Libdivecomputer's internal implementation (using the fingerprint
feature) also requires this.
That reminds me of a somewhat related issue. Currently the entire
profile ringbuffer (or at least the part that contains dives) gets
downloaded with a single large request. For applications not using
libdivecomputer's built-in fingerprint feature, that means we will first
download ALL dives, and then throw away the ones the application has
already downloaded before. That's not really optimal. Especially because
downloading only the new dives happens to be our most common use-case.
This could be improved by downloading the profile data with multiple
smaller requests. For example, one request per dive profile (or two
requests if the dive happens to cross the ringbuffer end). I believe
this is technically possible with the Cochran protocol. Multiple
requests will
probably add some extra overhead when doing a complete download. But I
doubt that outweighs the benefits in the normal case of downloading only
a few dives.
With multiple smaller requests it's also possible to cancel a download.
So that's an extra advantage that's nearly impossible to implement
correctly with just one large request.
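Since each dive occupies a contiguous range in the profile ringbuffer except when it wraps, the per-dive download splits into at most two linear reads. A minimal sketch of that split, assuming hypothetical ringbuffer boundary constants (the real Cochran addresses differ per model, and `split_profile` is not part of the actual backend):

```c
/* Hypothetical profile ringbuffer boundaries (illustrative only). */
#define RB_BEGIN 0x10000
#define RB_END   0x20000

/* Split one logical dive profile [begin, end) in the ringbuffer into
 * at most two linear requests. Returns the number of requests (1 or 2)
 * and fills addr[i]/len[i] with the range of each request. */
static int
split_profile (unsigned int begin, unsigned int end,
               unsigned int addr[2], unsigned int len[2])
{
    if (begin <= end) {
        /* No wrap: a single contiguous request suffices. */
        addr[0] = begin;
        len[0]  = end - begin;
        return 1;
    }

    /* The dive crosses the ringbuffer end: read the tail part first
     * (up to RB_END), then the head part (from RB_BEGIN). */
    addr[0] = begin;
    len[0]  = RB_END - begin;
    addr[1] = RB_BEGIN;
    len[1]  = end - RB_BEGIN;
    return 2;
}
```

Issuing these requests dive by dive, newest first, is what makes early termination possible: as soon as a fingerprint matches, the remaining dives simply aren't requested.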
>> Rather than assuming a value that might be too large, I would start
>> with a relatively small value and increase it whenever there is
>> evidence that there is indeed more memory. Since the last memory area
>> contains the profile ringbuffer, this evidence will most likely be a
>> ringbuffer pointer that points past our assumed end. That's something
>> that's very easy to check.
>>
>
> Certainly it's easier if we know where memory starts and ends, and I
> may be close to figuring that out. I've programmed this with a few
> ideals in mind. The first is to not dump a dive just because we aren't
> absolutely sure of the data. I argue this because I assume most end
> users of libdivecomputer (e.g. subsurface users) are not relying on
> the data to validate diving algorithms; they want to see a graphical
> representation of their dive. If they are like me, they want to see
> most of their dive even if the last few samples are wrong. This
> explains why I'd like to see an attempt to retrieve corrupt dives and
> dives that wrap the sample buffer. In my experience it shows useful
> data. I'll get to my other ideals later.
We're not dropping any dives, only informing the caller that we did
encounter data that we (currently) can't handle correctly. That's
something completely different!
Relying on guessing is always error-prone and unreliable. Just one wrong
sample is enough to render the recovery of a corrupted dive completely
useless. For example if there is one depth sample completely out of
range, then the application will scale the profile accordingly. The
result is that the good part of the profile will be squeezed completely,
and you won't be able to see anything useful at all. And things can get
much worse too (see examples below).
> [...]
>
> I can understand the point of view that we should fail when we are
> unsure of data and then expect the end user to file a bug report. This
> would likely get earlier reports of errors, but in the interim many
> new users might be turned away when the software fails to work for
> them.
If you ask me, silently ignoring errors is an order of magnitude worse
than gracefully failing with an error. Remember that if we ignore the
error and try to proceed anyway, there is absolutely no guarantee that
the result is correct. If you're lucky then the result is completely
bogus and the user will notice immediately (e.g. unrealistic profiles).
Smaller mistakes may go unnoticed at first, but by the time you notice
them it may already be too late for re-downloading. And last but not
least there is also the possibility of crashing the application.
I can give plenty of examples of real-life scenarios that I encountered
during libdivecomputer development. Two examples: If a ringbuffer
pointer is out of range, and you try to use it anyway with code that
expects a pointer in a certain range, then you'll likely end up with a
buffer overflow and crash the application. When parsing "random" data,
you can easily end up with depths (or times) that are way beyond the
normal range (e.g. order of several kilometres). Definitely a challenge
for the decompression algorithm in the application. (Hint: I ran out of
patience and just killed it.)
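The first failure mode is exactly where strict validation pays off: reject an out-of-range ringbuffer pointer before it ever becomes a buffer offset. A minimal sketch, with made-up boundary constants (not the real Cochran addresses):

```c
#include <stddef.h>

/* Hypothetical profile ringbuffer boundaries (illustrative only). */
#define RB_BEGIN 0x10000
#define RB_END   0x20000

/* Validate a ringbuffer pointer before converting it into an offset
 * into the downloaded buffer. An out-of-range pointer (e.g. the
 * 0xFFFFFFFF end pointer of a corrupt dive) is rejected instead of
 * causing an out-of-bounds read further down. */
static int
ringbuffer_offset (unsigned int address, size_t *offset)
{
    if (address < RB_BEGIN || address >= RB_END)
        return -1; /* Reject: would overflow the downloaded buffer. */

    *offset = address - RB_BEGIN;
    return 0;
}
```

Every caller then has to handle the error path explicitly, which is precisely the "fail gracefully instead of guessing" behaviour argued for above.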
Anyway, my experience is that being very strict pays off in the long
run. For the end-user, it may be a bit annoying initially, but it's
certainly not making things worse.
Note that I have the impression we're mixing two unrelated issues in
this discussion: (1) problems associated with the ringbuffer end, and
(2) recovery of corrupt dives.
For issue #1, this is all about finding the correct ringbuffer begin/end
address. Once we have those values, the problem is solved and there is
absolutely no need for any guessing. That's why I insist on being very
strict there. It will help us to find the correct values faster.
For issue #2 the situation is very different. Because we're dealing with
corrupt data here, this will not be as easy as finding some magic
values. Maybe it's indeed possible to recover some data without too many
serious issues. In that case I'm fine with doing that, but if not I lean
towards sticking to the logbook header only and dropping the partial
profile. But let's first see how far we get.
>> But remember that even if libdivecomputer itself doesn't use this
>> information nor provides an api to parse it, there might be someone
>> out there who is interested in the raw data. It's perfectly possible
>> to build a Cochran-specific application that uses libdivecomputer
>> only for downloading the data, and does the parsing on its own.
>> Probably not a very likely scenario, but not impossible. Btw, that's
>> also the reason why libdivecomputer always provides access to the
>> raw data!
>>
>> Note that if we now strip this info, and change our minds later on,
>> then it might be too late. Changing the data format will break
>> backwards compatibility.
>
> I agree it would be nice to provide this data via the API. If we
> decide to provide this data via the existing API, it's going to mess
> up a lot of existing applications. The inter-dive events may not be so
> bad, since there are usually only a few events, though keep in mind
> some dives have a lot. Also I don't know how to parse data from these
> events.
If we consider the inter-dive events to be part of the dive, and include
them in our raw dive data, that doesn't mean the parser has to do
anything with them. The parser can simply ignore them if they don't fit
well into the api. But they're still there, in case an application wants
to look at the raw binary data.
Just to make myself clear: for the inter-dive events I'm not sure
whether we should include them or not. The libdivecomputer api is based
around the concept of dives. Inter-dive events don't really fit well
into that model. They are not really part of the dive that comes before
or after them. We don't have any way to deliver non-dive data to the
application either. So the only options I see at the moment are to
include them as pre-dive data, or to discard them. Currently my
preference leans towards the latter option, but at this point I want to
leave all options open.
> The off-gassing data, if sent out through the samples_foreach
> function, is going to be incorporated into the applications' dive
> profiles and appear as flat trailing lines in the profile. Temperature
> might be the only interesting sample, and barely so.
I doubt this is off-gassing data. It's an artifact of how dive computers
determine the end of a dive. They typically consider a dive finished
after spending X amount of time above a certain depth threshold Y.
During this period they keep recording, because you might descend again
and resume the dive. When the dive is really over, some dive computers
keep this tail intact, while others rewind and discard it. It looks like
the Cochran has some sort of compromise: it leaves the tail intact, but
also indicates at which point it considered the dive finished.
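That end-of-dive rule can be sketched as a scan over the sample stream. The 1 m surface threshold and 10-minute timeout below are illustrative guesses, not Cochran's actual parameters, and `find_dive_end` is a hypothetical helper:

```c
/* Illustrative end-of-dive detection: a dive is considered finished
 * once the diver has stayed above SURFACE_DEPTH for SURFACE_TIMEOUT
 * consecutive seconds. Thresholds are made-up, not Cochran's. */
#define SURFACE_DEPTH   1.0       /* metres */
#define SURFACE_TIMEOUT (10 * 60) /* seconds */

/* Given per-second depth samples, return the index of the first sample
 * of the post-dive "tail" (the point where the dive is considered
 * finished), or n if the dive never ends within the samples. */
static unsigned int
find_dive_end (const double depth[], unsigned int n)
{
    unsigned int shallow = 0; /* consecutive seconds above threshold */

    for (unsigned int i = 0; i < n; ++i) {
        if (depth[i] <= SURFACE_DEPTH) {
            if (++shallow >= SURFACE_TIMEOUT)
                return i - (SURFACE_TIMEOUT - 1); /* start of the tail */
        } else {
            shallow = 0; /* Descended again: the dive resumes. */
        }
    }
    return n;
}
```

Resetting the counter on any descent is what produces the "you might descend again and resume the dive" behaviour described above.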
I would definitely include this data. Discarding the tail should be done
by the application, where it can be configured by the user. (I believe
subsurface already supports this.) Some users will want to see it, and
others won't.
> Already other interesting data from the Cochran data stream is not
> provided through the API. Data like deepest stop time and tissue
> compartment loads are not accessible. To make off-gassing samples
> meaningful we should also present the in-dive TC loads, and then we
> should indicate to the application that this is inter-dive data.
The libdivecomputer api supports deco/ndl info, but not tissue loadings.
The latter are too device specific. To be able to interpret tissue
loadings, you need to know the decompression algorithm and all its
parameters. That's info we simply don't have (and likely never will).
Implementing a decompression algorithm in the application makes a lot
more sense.
>> The only explanation that I can come up with is that the dive
>> computer started writing dive #95. Then it failed before the dive was
>> finished (e.g. battery empty, reboot or whatever) and the end pointer
>> didn't get written correctly. Hence the ffffffff value. But the
>> end-of-profile pointer is probably only updated after the dive is
>> finished. Hence the pre-dive pointer of the next dive retains the
>> same value. I suspect that the current write position is somehow
>> preserved and thus the next dive will start from there again. But
>> this is of course just guessing.
>>
> That's what I think too. Given that flash can be slow and wears with
> writes, I expect it's logical to eliminate unnecessary writes. Based
> on what I've seen, the Cochran code writes the first half of the log
> when the dive starts (e.g. deeper than 6 ft). There are two end-points
> to a dive. The first is when the diver ascends to 3 ft. I think at
> this point Cochran updates the sample-end-pointer in config0. Cochran
> then continues logging until the configured "Post dive surface
> interval minutes" (10 - 20 minutes) elapses. This is why you see 1800
> bytes (10 min * 60 seconds * 3 samples/sec) between dive-end-pointer
> and the pre-dive-pointer. At any time prior to this elapsing, a
> submersion to below 6 ft continues the dive. I think this is why these
> post-dive samples exist. After the elapsed interval, the end-dive log
> is written, and now the pre-dive-pointer is known and can be updated
> in config0.
>
> So although the pre-dive pointer is bogus on the dive after a corrupt
> dive, the dive sample data of the corrupt dive is intact, but
> determining the end has been difficult. Now that I've thought through
> the Cochran logic, I should go over my logic to see if I can better
> determine the end of a corrupt dive.
Makes sense.
Excellent info about that 1800-byte (EMC) or 1201-byte (Commander)
block. Initially I assumed this was some fixed-size trailer, and I had
no explanation for the fact that on some dives it's a few bytes larger.
But you are right: it's indeed the tail of the dive profile, and those
few extra bytes are events. More pieces start to fall into place!
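The block size falls straight out of that arithmetic. A trivial sanity check, using the EMC figures from the thread (10 minutes at 3 samples per second, one byte per sample, as the numbers above imply; `tail_size` is just an illustrative helper):

```c
/* Post-dive tail size in bytes: surface interval (minutes) times
 * sample rate (samples/sec) times sample size (bytes). Events recorded
 * during the interval account for the few extra bytes on some dives. */
static unsigned int
tail_size (unsigned int minutes, unsigned int samples_per_sec,
           unsigned int sample_size)
{
    return minutes * 60 * samples_per_sec * sample_size;
}
```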
>> I understand why you're trying to salvage those corrupt dives (in the
>> sense that having something is better than nothing at all), but I
>> worry that this will complicate the code or produce bogus data (which
>> is even worse). When I look at some of those salvaged dives, that
>> seems to be the case already. They end abruptly, which is probably
>> normal. But there appears to be some amount of "random" data (e.g.
>> spikes in the profile) too. And that's not good. Maybe this is just a
>> bug in the parsing that we can fix? I simply don't know at this
>> point.
>
> The code isn't too complicated because of it. A corrupt dive is known
> because of the FFFFFFFF dive-end-pointer. Knowing that, we loop,
> exiting on a memory boundary, when we see a sample that doesn't make
> sense (0xFF), or when we see normal end-dive events or normal
> start-dive events. If we don't find end-dive events and we stop
> because of start-dive events, we may have processed some inter-dive
> events as sample data, and I think that's why we see the spikes at the
> end. It's a few lines of code in the parse function plus the guess
> function and a few if-thens. Since corrupt dives don't have end-dive
> log data, the code parses the samples for depth, temp, duration, etc.
>
> Now that I've walked through the Cochran logic and why it's writing
> sample pointers at certain times, I might be able to do better
> guessing at end-dives.
Well the code is indeed not really complicated, but it's also not
without flaws. Those spikes at the end are examples of that. Fixing
those issues might turn out harder than expected. That's my main
concern.
I suggest we first take care of the non-corrupt dives, and get the
backend into a state where we can integrate it into libdivecomputer.
Once we have that, we take a step back and try to come up with a
solution to deal with those corrupt dives too. What do you think?
>> What does the Cochran application do with those corrupt dives? Does it
>> show them? With or without a profile?
>
> Cochran refuses to display a corrupt dive.
So if we're not able to recover corrupt dives, we're at least not doing
worse than Analyst. I think that's already a good starting point. If we
manage to do better, then great.
> You also mention that the Commander shows some erratic dives, although
> in the context of reclaimed and overwritten sample data. There were a
> lot of Commander dives that looked corrupt in Analyst. I was happy
> that libdivecomputer/subsurface mirrored those corrupt dive profiles,
> because it suggested the sample data was corrupt, or that we were at
> least as capable as Analyst.
Well, we're certainly not going to re-implement the bugs in the Analyst
application :-)
Jef