[PATCH] Cochran_commander refactoring

Jef Driesen jef at libdivecomputer.org
Mon Dec 22 07:38:03 PST 2014


On 08-12-14 00:50, John Van Ostrand wrote:
> On Sun, Dec 7, 2014 at 3:14 PM, Jef Driesen <jef at libdivecomputer.org> 
> wrote:
>> With the exception of the id, config and misc packets, the protocol is
>> mainly based on simple random access to the internal memory. There is 
>> no
>> dedicated command to read the logbook and samples. They are simply 
>> stored
>> in a certain area of the internal memory. So for the dump api it makes 
>> much
>> more sense to implement as full memory dump, instead of reading just 
>> two
>> "random" pieces of the memory. The result is that the dump would be 
>> much
>> more generic, because you no longer need to interpret the data (in 
>> order to
>> find the logbook and samples). This is important when you need the 
>> memory
>> dump to diagnose bugs in the logic for downloading the dives. The fact 
>> that
>> we don't seem to know the total amount of memory, might be a problem 
>> here.

What I meant here is that in the data that can be downloaded with the 
read command there are two main areas of interest: the logbook and 
sample ringbuffers. Right now, you read some part of each ringbuffer, 
and pack them into your memory dump structure. But because the start 
address and size depends on the current implementation, that's not very 
useful as a memory dump.

I'll illustrate with an example. Assume for a moment that the current 
code finds valid sample data in the area 0x094000 to 0x096000. Later on 
we discover a bug in the code, and the end address should have been 
0x098000. That means all memory dump that have been downloaded with the 
old code are now useless because they don't contain enough data. If the 
start address is wrong, it gets even worse.

A memory dump should be independent from whatever logic we use for 
locating the dives. If you simply read everything from address 0 to the 
end, then you we can keep using those old memory dumps. If a device has 
lots of memory, then yes, it may take a while to download, but that's 
fine. Downloading a memory dump is not the normal use-case. It's 
intended for troubleshooting, and then you do want all the data. (For 
reference, downloading a full memory on a Suunto Vyper takes 10-15 
minutes, and that's only 8KB of data!)

If we really don't know the total amount of memory, then we'll have to 
make an assumption. If that assumptions turns out to be wrong, we can 
always adjust it. But at least we always have all data up to a certain 
address.


Since the other pieces of data (id, misc and config) have a fixed size, 
the structure for a good memory dump could be as simple as concatenating 
all pieces:

id       67 bytes
misc     1500 bytes
config   2x512 (emc) or 4x512 (commander) bytes
memory   variable length, memory starting at address

An alternative is to only include the memory part, and take care of the 
other three separately, by means of the vendor event. This is how it's 
done in the several other backends (e.g. oceanic and reefnet).

> What I find interesting is that logbook data is at address 0 (for the 
> read
> command) and that the sample data read is far enough along to justify 
> the
> config pages and maybe some program data, although I have no idea how 
> large
> the program for this could be. GIven the background of Mike Cochran I
> suspect it's tight code.
> 
> So for the EMC 16 and 20H the logbook is from 0 to 0x80000  (1024 
> entries)
> then samples begine at 0x94000. I haven't looked to see what the 
> 0x14000
> bytes between the logbook and sample are, I suspect config and program
> data, although I'm surprised that doesn't start a 0.
> 
> Now for the EMC 20H the quoted maximum sample data is 1450 hours. 3 
> byte
> samples * 60 seconds * 60 minutes * 1450 hours is 0xEEF3E0, or close to
> 16MB, when added to the 0x94000 base. It's short but I suspect that 
> 1450 is
> a rounded number.
> 
> To add to this, EMC models are sold with different amounts of memory
> configured. Word from the dealer is that it's configurable via 
> software.
> That means each 20H is shipped with the same hardware configuration and
> that something in one of the other data blocks indicates how much 
> memory is
> enabled. I've been unable to find where that might be and I'm worried 
> it's
> stored as a feature code instead of simply the amount of memory.

My advice is to start with a very simple assumption. If your assumption 
is wrong, then someday someone will be affected, and report it. At that 
point you can fix your assumption.

Let's look at logbook area. It seems the ringbuffer is located at 
address 0 to 0x80000 (1024 entries of 512 bytes) for the emc and address 
0 to 0x20000 (512 entries of 256 bytes) for the commander . We also have 
the number of dives from the config block. If it's less than 1024/512, 
then we know the ringbuffer isn't full yet. A reasonable assumption is 
that it will be at offset 0. If there are more, then all entries will be 
valid and we can find the oldest one by means of the internal dive 
number. I think those are very reasonable assumptions. In fact that's 
pretty much the description of the OSTC3 algorithm.

Now, if a device is limited by its firmware to less than 1024/512 
logbook entries, then the above will work fine until someone exceeds 
this artificial limit. At that point our algorithm will fail, and we'll 
get a bugreport. But the good news is that we now also have real data to 
confirm and fix the problem.

For the sample area, we know the start address of the ringbuffer, but 
not the end. What we could do is assume the ringbuffer is unlimited in 
size (e.g. address 0xFFFFFFFF, which is the maximum we can support with 
a 32bit integer) and fail when the end address is before the start 
address. Normally that indicates that we reached the end of the 
ringbuffer. Again, we now have access to data where we can figure out 
the exact ringbuffer end.


The idea is that you always start with a very simple assumptions, and 
improve them if they turn out to be wrong. That's in a nutshell how 
things are usually done in libdivecomputer. When you make an (educated) 
guess, without any hard evidence, then you have a very high chance that 
your guess is still wrong. But the problem is that the resulting bugs 
will be a lot more subtle and harder to spot and fix.

For example if a ringbuffer end address is wrong, then you'll either 
miss a few bytes (address too small) or have some garbage bytes inserted 
(address too large). A few missing samples can easily go unnoticed.

>> The next thing where I have some concerns, is the logic how you find 
>> the
>> logbook and samples. It seems we only know the start address of the 
>> logbook
>> and sample ringbuffers (respectively 0 and 0x094000 for the emc), but 
>> not
>> the end address. And that results in a few issues:
> 
> We don't really know the start. I just worked with my first EMC-14 and 
> its
> samples start at 0x20000, mostly because it supports only 256 logs.

If the first dive starts at that address, then it's likely the start 
address of the ringbuffer. Of course it's not a guarantee, but 
nevertheless a good assumption. Let's say it's correct until proven 
wrong :-)

>> For the logbook, you assume the first logbook entry is at the address 
>> 0,
>> and then move forward until you reach the total number of dives. But 
>> in
>> practice, the number of dives that can be stored is not unlimited, and 
>> the
>> memory is most likely used in a circular way. Thus, once you have 
>> enough
>> dives, this assumption will no longer be correct. Ideally we need to 
>> figure
>> out the end of the ringbuffer, so we can take this into account 
>> correctly.
>> If we are unable to find this address, then we need at least some 
>> checks to
>> verify our assumptions (for example check that the dive number starts 
>> at
>> zero and always increases, check that we don't go past the sample 
>> area,
>> etc). If you ask me, returning an error is always better than silently
>> returning some wrong data.
>> 
> 
> The number of log book entries is also configured via software. In 
> other
> words the DC can be configured to ignore part of the logbook 
> ringbuffer. It
> might be a head/tail pointer or something.
> 
> You can see that the number of dives is available in the config data 
> block.
> I suspect that a starting dive number is also present  but since it's 
> set
> at 00 00 now it can't be identified. Also because of the insane number 
> of
> dives this computer will log it would take me 8 years to wrap mine.
> 
> Checking dive numbers will actually work for most divers. I suspect 
> that
> all but a few will never wrap the ring buffer before changing DCs.

See my explanation above. Have a look at the hw_ostc3_device_foreach 
function.

I believe I have found something else that will be of interest here.

At offset 0x142 in config0 there is an "end-of-profile" pointer. In the 
data I got from you it contains 0x00191467. That nicely corresponds to 
the end pointer (0x00190d5f) of the last logbook entry (#114), plus the 
1800 byte "trailer". This is "end-of-profile" pointer exist in most data 
formats and is typically used internally by the dive computer as the 
position to start writing the sample data for the next dive.

The next 4 bytes contains the value 0xE400. This is the address where 
the next logbook entry will be written (114 * 512). So this is probably 
the "end-of-logbook" pointer.

With this information, the ringbuffer starts to look very similar to 
what I've seen in other data formats.

>> A similar problem exist when reading the samples. You have some code 
>> that
>> tries to guess where the ringbuffer ends. But there is absolutely no
>> guarantee that those assumptions are correct. I think it's much safer 
>> to
>> not guess at all and just fail if the ringbuffer wrap point is crossed
>> (e.g. begin address is higher than the end address). Once someone hits 
>> this
>> error, we can investigate and determine the correct ringbuffer end 
>> address.
>> If we're guessing, and get it wrong, then we're silently returning 
>> bogus
>> data. That's not good. And the chance that your guess is wrong is 
>> pretty
>> high. Although I agree that the total memory is likely nicely aligned, 
>> the
>> sample ringbuffer can end anywhere. I've seen that more than once in 
>> other
>> backends.
>> 
> 
> It's more likely that a ring buffer will wrap than the logbook. I have 
> a
> Commander that has already wrapped (I bought it used) and so I have a
> working example. From USB traffic I could see the addresses used by 
> Analyst
> and it's a nice clean alignment. I have no evidence on an EMC but it 
> seems
> safe to assume it's a minor upgrade on the Commander software.

So looks like you already have some real data to figure out the 
ringbuffer end for the commander. Great.

Note that if the requested addresses are nicely aligned, then there is 
no guarantee that the ringbuffer end address is also nicely aligned. The 
application might read more data than it needs, there might be data that 
is not part of the ringbuffer but still used for something else, etc.

>> [One method that I have used successfully in the past to locate the
>> ringbuffer end, without having access to data that crosses it, is by
>> looking at the full memory dump. Usually unused data in flash memory
>> contains all 0xFF. So if there is another block located right after 
>> the
>> ringbuffer area, then the unused tail of the ringbuffer will contain 
>> 0xFF,
>> followed by some non 0xFF data from that other block. Thus the first
>> address containing non 0xFF bytes is very likely the end of the 
>> ringbuffer.]
> 
> There are three problems with that. First, for the EMC 20H, that's 16 
> MB of
> data and even at 800K baud it's going to be a while to download. 
> Probing
> might make more sense. Reading 1K at certain points as a test.

A long download might be annoying, but it's not a big deal either. It's 
not something you'll need to do every day.

> Second, since the sample ring buffer size is software configurable the 
> end
> may not be apparent. I expect the end of the buffer is truncated on a 
> DC
> configured with less memory. That means that the technique of probing 
> will
> probably return a larger size than it should.

That might be true, but we can't write code to handle things we don't 
know yet. Only for the things we do know.

> Third, I suspect that the end of sample memory is the end of physical
> memory. It's a guess based on the number of quoted number of samples
> supported and the starting address of samples. I don't know what will
> happen if we read beyond physical memory.

Without actually trying, I have no idea what will happen when you read 
past the available memory. The read request may fail, the device could 
automatically wrap around to some other address (and not necessary to 
zero), etc. All possible outcomes that I've seen in the past.

>> The next big question is where does the sample data begins/ends? So 
>> far
>> you have identified three pointers: pre, begin and end address. When I 
>> dump
>> those pointers (see below), I noticed something interesting:
>> 
>>   * The 5th column is the difference between begin and pre. According 
>> to
>> your documentation, these are inter dive events. They can be zero 
>> length
>> too. Should we consider those as part of the dive, or ignore them? 
>> Since I
>> don't know what's in those interdive events, I have no idea.
>> 
>>   * The 6th column is the difference between pre and end of the 
>> previous
>> entry. This is nearly always equal to 1800. I doubt that's a 
>> coincidence. I
>> guess that means there is some 1800 byte header or trailer. Do you 
>> have any
>> idea what's in there?
>> 
>> * When the end pointer is ffffffff, the next entry has the same pre
>> pointer. Because it makes no sense that the sample data for two 
>> different
>> dives start at the same address, I assume this means that for some 
>> reason
>> (e.g. battery empty, too short duration, etc) this dive has no sample 
>> data.
>> What you did in this case is take the data until the start of the next
>> dive. But if I'm right, that's the inter dive events. So this makes no
>> sense to me. Do you get anything useful from doing this?
>> 
>> 
> Pre-dive events consist of the following (and more) according to
> documentation.
> 
> - Initialization of the unit.
> - The unit is turned on
> - Low batteries
> - Altitude Changes of over 500 feet
> - Temperature Changes of 10 degrees
> - Sensor Malfunction
> - Analyst ® P.C. Communication
> 
> Interdive events like the above are multiple bytes in length, up to 20
> bytes. The DC has no on or off switch. It turns on under certain 
> conditions
> and off after a certain amount of off-gassing after a dive. So many 
> dives
> will have at least an On event occur before dives. The first dive has a 
> lot
> of events because I was configuring the device via Analyst and each 
> config
> change seems to be logged.
> 
> The DC also stays on after a dive and logs off-gassing. I'd bet that's 
> the
> 1800 bytes of logs after. Although I assumed it was longer. Off gassing
> samples are recognizable by the depth change and ascent rate samples 
> being
> 0x40 and 0x80 respectively (these decode to negative zero.)

Do you think we should consider the pre-dive event as part of the dive 
data? Based on your description, these pre-dive events do not look 
really interesting from a dive logbook point of view. I have no idea how 
the parser should deal with it, other than just ignoring them.

> A pointer of FFFFFFFF means the end of dive was never logged. I had a
> problem with my DC where it would reboot not logging the end of dive. 
> The
> code tries to salvage the dive logs.

Can you recover some of the samples? Based on what I've seen so far (see 
my long table) it makes no sense to try to recover the samples, because 
it looks like at least part of the data you are recovering are the 
pre-dive events of another dive. That certainly doesn't look right.

Look for example at dive #79:

78 0014b8f3 0014b94a 0014ef9d    87 1800
79 0014f6a5 0014f70d ffffffff   104 1800
80 0014f6a5 0015043c 00151442  3479 1373862

Your code to guess the end pointer will replace ffffffff with 0015043c, 
which is the start of the next dive (#80). So for dive #79 you take the 
range 0014f70d to 0015043c as the sample data. But that memory range is 
also the last part of the pre-dive events for dive #80. This overlap 
doesn't make any sense to me.

>> I also tried to parse the test data I received from you, but I can't 
>> get
>> anything useful out of it. That might be due to the fact that I don't 
>> know
>> at which address the sample data was read when you created that data 
>> (that
>> might have changed in your latest code). Btw, that's exactly why I 
>> prefer a
>> more generic dump format as explained above. Can you provide some new 
>> data
>> for me, or the correct address? I also tried to run the parser under
>> valgrind, and it reports a *huge* number of problems. That might be 
>> due to
>> this mismatch in data, but the parser should be robust against 
>> whatever you
>> throw at it. The rule of thumb is never trust the input data, and 
>> always
>> check everything before you use it. I didn't investigate further, 
>> because I
>> first want to confirm with good input :-)
>> 
> 
> As you can see from the source, the reads are based on data from the 
> config
> blocks and from the logbook. Config tells me how many logs to read. 
> Reading
> the logbook for pointers results in knowing which memory addresses to 
> read.
> 
> For an EMC 20H the sample data starts at 0x94000. It's been a while 
> since I
> checked that the code is actually reading from that point, but it does
> round down 16k.
> 
> I've not tried to read the data, like you have. My simulator used 
> earlier
> files with names that included their memory addresses. From a glance 
> the
> data looks good.
> 
> But let's go through an example.
> 
> [...]

Ignore my question. I think it was a mistake on my side. Instead of 
modify the code to read the data from files, I extended the 
libdivecomputer simulator with the cochran protocol. It produces 
reasonable looking profiles now, so I assume there was some bug in 
previous hack to load the data from files.

> I've been hesitant in reading the entire memory of the device. I may 
> try
> with my used device, which is a Commander.
> 
> I've been holding off asking Cochran again for help. They previously
> refused citing concern that open source code would reveal too much 
> about
> their algorithm. Now that the code exists I wonder if they would be 
> willing
> to help by telling me how to determine model and memory configuration, 
> or
> if they will demand that I stop.. It's also the same reason I've been
> trying to respect their privacy by not reading anything but my data.

That's their usual argument. The truth is that I've never encountered a 
device where you can download anything worth protecting (e.g. firmware 
code, decompression algorithm). All you can access is the memory 
containing the dives.

Jef


More information about the devel mailing list