On 16-01-15 18:14, John Van Ostrand wrote:
Is the application going to expect to see dives from oldest to newest?
Yes! This is very important for the "download only new dives" feature. If the application receives the dives in newest first order, it can stop the download as soon as it recognizes a dive that it already has. Libdivecomputer's internal implementation (using the fingerprint feature) also requires this.
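The early-stop logic can be sketched roughly like this (hypothetical helper and sizes, not the actual libdivecomputer API; a real fingerprint is device specific):

```c
#include <string.h>
#include <assert.h>

#define FINGERPRINT_SIZE 4   /* illustrative size, not Cochran's actual one */
#define HEADER_SIZE 16       /* illustrative dive header size */

/* Walk the dive headers newest-first and stop as soon as we hit the dive
 * whose fingerprint matches the one stored from a previous download.
 * Everything older than that dive is already known.  Returns the number
 * of new dives. */
static int
count_new_dives(unsigned char headers[][HEADER_SIZE], int ndives,
                const unsigned char *fingerprint)
{
    int count = 0;
    for (int i = 0; i < ndives; i++) {
        if (memcmp(headers[i], fingerprint, FINGERPRINT_SIZE) == 0)
            break;  /* known dive reached; stop downloading */
        count++;
    }
    return count;
}
```

This only works when the dives arrive newest first; with the opposite order you would have to download everything before knowing where the new dives start.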
That reminds me of a somewhat related issue. Currently the entire profile ringbuffer (or at least the part that contains dives) gets downloaded with a single large request. For applications not using libdivecomputer's built-in fingerprint feature, that means we will first download ALL dives, and then throw away the ones the application has already downloaded before. That's not really optimal, especially because downloading only the new dives happens to be our most common use-case.
This could be improved by downloading the profile data with multiple smaller requests. For example one request per dive profile (or two requests if the dive happens to cross the ringbuffer end). I believe this is technically possible with the Cochran protocol. Multiple requests will probably add some extra overhead when doing a complete download. But I doubt that outweighs the benefits in the normal case of downloading only a few dives.
With multiple smaller requests it's also possible to cancel a download. That's an extra advantage, and one that's nearly impossible to implement correctly with just one large request.
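The per-dive request splitting could look something like this (the ringbuffer bounds are made-up constants; on the real device they come from the config data):

```c
#include <assert.h>

/* Hypothetical ringbuffer bounds; the real values come from the config. */
#define RB_BEGIN 0x10000u
#define RB_END   0x20000u

struct request {
    unsigned int address;
    unsigned int length;
};

/* Split one dive's profile into at most two read requests, depending on
 * whether it wraps around the ringbuffer end. */
static int
profile_requests(unsigned int begin, unsigned int end, struct request req[2])
{
    if (begin <= end) {
        req[0].address = begin;
        req[0].length  = end - begin;
        return 1;
    }
    /* The dive crosses the ringbuffer end: one request for the part up
     * to RB_END, and one for the part that wrapped to RB_BEGIN. */
    req[0].address = begin;
    req[0].length  = RB_END - begin;
    req[1].address = RB_BEGIN;
    req[1].length  = end - RB_BEGIN;
    return 2;
}
```

Cancelling a download then simply means not issuing the next request.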
Rather than assuming a value that might be too large, I would start with a relatively small value and increase it whenever there is evidence that there is indeed more memory. Since the last memory area contains the profile ringbuffer, this evidence will most likely be a ringbuffer pointer that points past our assumed end. That's something that's very easy to check.
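A minimal sketch of that grow-on-evidence check, with an invented initial size (the doubling policy is just one plausible choice):

```c
#include <assert.h>

/* Start from a conservative assumption and double it whenever a
 * ringbuffer pointer proves there is memory past the assumed end.
 * The initial value is made up for illustration. */
#define INITIAL_MEMSIZE 0x20000u

static unsigned int
adjust_memsize(unsigned int assumed, unsigned int rb_pointer)
{
    while (rb_pointer >= assumed)
        assumed *= 2;
    return assumed;
}
```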
Certainly it's easier if we know where memory starts and ends and I may be close to figuring that out. I've programmed this with a few ideals in mind. The first is to not dump a dive just because we aren't absolutely sure of the data. I argue this because I assume most end users of libdivecomputer (e.g. subsurface users) are not relying on the data to validate diving algorithms, they want to see a graphical representation of their dive. If they are like me they want to see most of their dive even if the last few samples are wrong. This explains why I'd like to see an attempt to retrieve corrupt dives and dives that wrap the sample buffer. In my experience it shows useful data. I'll get to my other ideals later.
We're not dropping any dives, only informing the caller that we did encounter data that we (currently) can't handle correctly. That's something completely different!
Relying on guessing is always error-prone and unreliable. Just one wrong sample is enough to render the recovery of a corrupted dive completely useless. For example if there is one depth sample completely out of range, then the application will scale the profile accordingly. The result is that the good part of the profile will be squeezed completely, and you won't be able to see anything useful at all. And things can get much worse too (see examples below).
[...]
I can understand the point of view that we should fail when we are unsure of data and then expect the end user to file a bug report. This would likely get earlier reports of errors but in the interim many new users might be turned away when the software fails to work for them.
If you ask me, silently ignoring errors is an order of magnitude worse than gracefully failing with an error. Remember that if we ignore the error and try to proceed anyway, there is absolutely no guarantee that the result is correct. If you're lucky then the result is completely bogus and the user will notice immediately (e.g. unrealistic profiles). Smaller mistakes may go unnoticed at first, but if you only realize it after a while, it might already be too late for re-downloading. And last but not least there is also the possibility of crashing the application.
I can give plenty of examples of real-life scenarios that I encountered during libdivecomputer development. Two examples: if a ringbuffer pointer is out of range, and you try to use it anyway with code that expects a pointer in a certain range, then you'll likely end up with a buffer overflow and crash the application. When parsing "random" data, you can easily end up with depths (or times) that are way beyond the normal range (e.g. on the order of several kilometres). Definitely a challenge for the decompression algorithm in the application. (Hint: I ran out of patience and just killed it.)
Anyway, my experience is that being very strict pays off in the long run. For the end-user, it may be a bit annoying initially, but it's certainly not making things worse.
Note that I have the impression we're mixing two unrelated issues in this discussion: (1) problems associated with the ringbuffer end, and (2) recovery of corrupt dives.
For issue #1, this is all about finding the correct ringbuffer begin/end address. Once we have those values, the problem is solved and there is absolutely no need for any guessing. That's why I insist on being very strict there. It will help us to find the correct values faster.
For issue #2 the situation is very different. Because we're dealing with corrupt data here, this will not be as easy as finding some magic values. Maybe it's indeed possible to recover some data without too much serious issues. In that case I'm fine doing that, but if not I lean towards sticking to the logbook header only and dropping the partial profile. But let's first see how far we get.
But remember that even if libdivecomputer itself doesn't use this information nor provides an api to parse it, there might be someone out there who is interested in the raw data. It's perfectly possible to build a cochran-specific application that uses libdivecomputer only for downloading the data, and does the parsing on its own. Probably not a very likely scenario, but not impossible. Btw, that's also the reason why libdivecomputer always provides access to the raw data!
Note that if we strip this info now, and change our minds later on, then it might be too late. Changing the data format will break backwards compatibility.
I agree it would be nice to provide this data via the API. But if we decide to provide it via the existing API, it's going to mess up a lot of existing applications. The inter-dive events may not be so bad, since there are usually only a few, but keep in mind some dives have a lot. Also I don't know how to parse the data from these events.
If we consider the inter-dive events to be part of the dive, and include it in our raw dive data, that doesn't mean the parser has to do anything with it. The parser can simply ignore it if it doesn't fit well into the api. But it's still there, in case an applications wants to look at the raw binary data.
Just to make myself clear: For the inter-dive events I'm not sure whether we should include it or not. The libdivecomputer api is based around the concept of dives. Inter-dive events don't really fit well into that model. They are not really part of the dive that comes before or after it. We don't have any way to deliver non-dive data to the application either. So the only options I see at the moment, are to include it as a pre-dive data, or to discard it. Currently my preference is more towards the latter option, but at this point I want to leave all options open.
The off-gassing data, if sent out through the samples_foreach function, is going to be incorporated into the applications' dive profiles and will appear as flat trailing lines in the profile. Temperature might be the only interesting sample, and barely so.
I doubt this is off-gassing data. It's an artifact of how dive computers determine the end of a dive. They typically consider a dive finished after spending X amount of time above a certain depth threshold Y. During this period they keep recording, because you might descend again and resume the dive. When the dive is really over, some dive computers keep this tail intact, others rewind and discard it. It looks like the Cochran has some sort of compromise: it leaves the tail intact, but also indicates at which point it considered the dive finished.
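That end-of-dive rule can be modelled in a few lines; all thresholds here are illustrative, not Cochran's actual values:

```c
#include <assert.h>

/* Toy model of the end-of-dive rule: the dive counts as finished after
 * `threshold_time` consecutive samples shallower than `threshold_depth`.
 * One sample per second is assumed. */
static int
dive_end_index(const double *depths, int n, double threshold_depth,
               int threshold_time)
{
    int shallow = 0;
    for (int i = 0; i < n; i++) {
        shallow = (depths[i] < threshold_depth) ? shallow + 1 : 0;
        if (shallow >= threshold_time)
            return i;   /* point where the computer considers it over */
    }
    return -1;          /* still diving at the end of the data */
}
```

Everything recorded after the returned index is exactly the "tail" discussed here, which the application can show or trim.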
I would definitely include this data. Discarding the tail should be done by the application, where it can be configured by the user. (I believe subsurface already supports this.) Some users will want to see it, and others won't.
Other interesting data from the Cochran data stream is already not provided through the API: data like deepest stop time and tissue compartment loads are not accessible. To make off-gassing samples meaningful we should also present the in-dive TC loads, and then we should indicate to the application that this is inter-dive data.
The libdivecomputer api supports deco/ndl info, but not tissue loadings. The latter are too device specific. To be able to interpret tissue loadings, you need to know the decompression algorithm and all its parameters. That's info we simply don't have (and likely never will). Implementing a decompression algorithm in the application makes a lot more sense.
The only explanation that I can come up with is that the dive computer started writing dive #95. Then it failed before the dive was finished (e.g. battery empty, reboot or whatever) and the end pointer didn't get written correctly. Hence the ffffffff value. But the end-of-profile pointer is probably only updated after the dive is finished. Hence the pre-dive pointer of the next dive retains the same value. I suspect that the current write position is somehow preserved and thus the next dive will start from there again. But this is of course just guessing.
That's what I think too. Given that flash can be slow and wears with writes, I expect it's logical to eliminate unnecessary writes. Based on what I've seen, the Cochran code writes the first half of the log when the dive starts (e.g. deeper than 6 ft). There are two end-points to a dive. The first is when the diver ascends to 3 ft. I think at this point Cochran updates the sample-end-pointer in config0. Cochran then continues logging until the configured "Post dive surface interval minutes" (10 - 20 minutes) elapses. This is why you see 1800 bytes (10 min * 60 seconds * 3 samples/sec) between the dive-end-pointer and the pre-dive-pointer. At any time prior to this elapsing, a submersion below 6 ft continues the dive. I think this is why these post-dive samples exist. After the interval elapses, the end-dive log is written; now the pre-dive-pointer is known and can be updated in config0.
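Writing that gap arithmetic out explicitly (the one-byte-per-sample record size is an assumption taken from the 10 min * 60 s * 3 samples/sec = 1800 figure):

```c
#include <assert.h>

/* Expected gap between the dive-end pointer and the next pre-dive
 * pointer: the post-dive surface interval, still sampled at the normal
 * rate.  One byte per sample record is assumed here. */
static unsigned int
post_dive_gap(unsigned int interval_min, unsigned int samples_per_sec,
              unsigned int bytes_per_sample)
{
    return interval_min * 60u * samples_per_sec * bytes_per_sample;
}
```

A gap that deviates from this value by a few bytes would then be the inter-dive events mentioned elsewhere in the thread.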
So although the pre-dive pointer is bogus on the dive after a corrupt dive, the dive sample data of the corrupt dive is intact but determining the end has been difficult. Now that I've thought through the Cochran logic, I should go over my logic to see if I can better determine the end of a corrupt dive.
Makes sense.
Excellent info about that 1800 (emc) or 1201 (commander) bytes block. Initially I assumed this was some fixed size trailer, and I had no explanation for the fact that on some dives it's a few bytes larger. But you are right. It's indeed the tail of the dive profile, and those few extra bytes are events. More pieces start to fall into place!
I understand why you're trying to salvage those corrupt dives (in the sense that having something is better than nothing at all), but I worry that this will complicate the code or produce bogus data (which is even worse). When I look at some of those salvaged dives, that seems to be the case already. They end abruptly, which is probably normal. But there appears to be some amount of "random" data (e.g. spikes in the profile) too. And that's not good. Maybe this is just a bug in the parsing that we can fix? I simply don't know at this point.
The code isn't too complicated because of it. A corrupt dive is known from the FFFFFFFF dive-end-pointer. Knowing that, we loop over the samples, exiting on a memory boundary, on a sample that doesn't make sense (0xFF), or on normal end-dive or start-dive events. If we don't find end-dive events and we stop because of start-dive events, we may have processed some inter-dive events as sample data, and I think that's why we see the spikes at the end. It's a few lines of code in the parse function plus the guess function and a few if-thens. Since corrupt dives don't have end-dive log data, the code parses the samples for depth, temp, duration, etc.
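The core of that guessing loop might look like this sketch (the two-byte sample size and the bare 0xFF test are simplifying assumptions, and event detection is omitted entirely):

```c
#include <assert.h>
#include <stddef.h>

/* When the dive-end pointer is 0xFFFFFFFF the end of the profile must
 * be guessed: scan forward until we hit what looks like unwritten
 * flash (0xFF filler) or run out of buffer.  Real code would also
 * stop on end-dive or start-dive events. */
static size_t
guess_profile_end(const unsigned char *data, size_t size)
{
    size_t i = 0;
    while (i + 2 <= size) {
        if (data[i] == 0xFF)
            break;          /* looks like unwritten memory */
        i += 2;             /* hypothetical two-byte sample record */
    }
    return i;
}
```

The spikes discussed above would correspond to inter-dive event bytes that slip past checks like these and get decoded as depth samples.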
Now that I've walked through the cochran logic and why it's writing sample pointers at certain times I might be able to do better guessing at end-dives.
Well the code is indeed not really complicated, but it's also not without flaws. Those spikes at the end are examples of that. Fixing those issues might turn out harder than expected. That's my main concern.
I suggest we first take care of the non-corrupt dives, and get the backend into a state where we can integrate it into libdivecomputer. Once we have that, we take a step back and try to come up with a solution to deal with those corrupt dives too. What do you think?
What does the Cochran application do with those corrupt dives? Does it show them? With or without a profile?
Cochran refuses to display a corrupt dive.
So if we're not able to recover corrupt dives, we're at least not doing worse than Analyst. I think that's already a good starting point. If we manage to do better, then great.
You also mention that the Commander shows some erratic dives, although in the context of reclaimed and overwritten sample data. There were a lot of Commander dives that looked corrupt in Analyst. I was happy that libdivecomputer/subsurface mirrored those corrupt dive profiles, because it suggested the sample data itself was corrupt, or that we were at least as capable as Analyst.
Well, we're certainly not going to re-implement the bugs in the Analyst application :-)
Jef