[PATCH] Cochran_commander refactoring

John Van Ostrand john at vanostrand.com
Sun Dec 7 15:50:37 PST 2014


On Sun, Dec 7, 2014 at 3:14 PM, Jef Driesen <jef at libdivecomputer.org> wrote:

> John,
>
> I had another look at your latest code today. Armed with your
> documentation, I tried to understand the logic behind it. I have some
> comments and questions for you:
>
> The split between the config and data is already a nice improvement. There
> is still room for improvement in the sense that the config should be all
> constant and not based on the input data (e.g. the
> sample_memory_end_address field).
>
> There is a nasty bug in the cochran_commander_serial_setup() function. If
> there is an error, you close the serial port and free the device structure.
> But this function is called from several places, without checking the
> return value. You need to refactor this function to not free anything.
>
> (You also don't need to pass the dc_context_t pointer as a parameter,
> because it's available from the device struct.)
>
> The cochran_commander_serial_open() function is similar. Although it's
> not really a problem here, it's confusing that it frees the device struct.
> I would remove this function, and call serial_open() directly in the
> cochran_commander_device_open() function.
>
> I seperated those when I thought I had to close and re-open the port to
move from high-baud back to low. At the time i was still simulating the
Analyst software. I can recombine them.


>
> The remainder of this email is mostly some questions and comments on the
> data format an how you handle that in the code:
>
> With the exception of the id, config and misc packets, the protocol is
> mainly based on simple random access to the internal memory. There is no
> dedicated command to read the logbook and samples. They are simply stored
> in a certain area of the internal memory. So for the dump api it makes much
> more sense to implement as full memory dump, instead of reading just two
> "random" pieces of the memory. The result is that the dump would be much
> more generic, because you no longer need to interpret the data (in order to
> find the logbook and samples). This is important when you need the memory
> dump to diagnose bugs in the logic for downloading the dives. The fact that
> we don't seem to know the total amount of memory, might be a problem here.
>

What I find interesting is that logbook data is at address 0 (for the read
command) and that the sample data read is far enough along to justify the
config pages and maybe some program data, although I have no idea how large
the program for this could be. GIven the background of Mike Cochran I
suspect it's tight code.

So for the EMC 16 and 20H the logbook is from 0 to 0x80000  (1024 entries)
then samples begine at 0x94000. I haven't looked to see what the 0x14000
bytes between the logbook and sample are, I suspect config and program
data, although I'm surprised that doesn't start a 0.

Now for the EMC 20H the quoted maximum sample data is 1450 hours. 3 byte
samples * 60 seconds * 60 minutes * 1450 hours is 0xEEF3E0, or close to
16MB, when added to the 0x94000 base. It's short but I suspect that 1450 is
a rounded number.

To add to this, EMC models are sold with different amounts of memory
configured. Word from the dealer is that it's configurable via software.
That means each 20H is shipped with the same hardware configuration and
that something in one of the other data blocks indicates how much memory is
enabled. I've been unable to find where that might be and I'm worried it's
stored as a feature code instead of simply the amount of memory.


> The next thing where I have some concerns, is the logic how you find the
> logbook and samples. It seems we only know the start address of the logbook
> and sample ringbuffers (respectively 0 and 0x094000 for the emc), but not
> the end address. And that results in a few issues:
>

We don't really know the start. I just worked with my first EMC-14 and its
samples start at 0x20000, mostly because it supports only 256 logs.


> For the logbook, you assume the first logbook entry is at the address 0,
> and then move forward until you reach the total number of dives. But in
> practice, the number of dives that can be stored is not unlimited, and the
> memory is most likely used in a circular way. Thus, once you have enough
> dives, this assumption will no longer be correct. Ideally we need to figure
> out the end of the ringbuffer, so we can take this into account correctly.
> If we are unable to find this address, then we need at least some checks to
> verify our assumptions (for example check that the dive number starts at
> zero and always increases, check that we don't go past the sample area,
> etc). If you ask me, returning an error is always better than silently
> returning some wrong data.
>

The number of log book entries is also configured via software. In other
words the DC can be configured to ignore part of the logbook ringbuffer. It
might be a head/tail pointer or something.

You can see that the number of dives is available in the config data block.
I suspect that a starting dive number is also present  but since it's set
at 00 00 now it can't be identified. Also because of the insane number of
dives this computer will log it would take me 8 years to wrap mine.

Checking dive numbers will actually work for most divers. I suspect that
all but a few will never wrap the ring buffer before changing DCs.


> A similar problem exist when reading the samples. You have some code that
> tries to guess where the ringbuffer ends. But there is absolutely no
> guarantee that those assumptions are correct. I think it's much safer to
> not guess at all and just fail if the ringbuffer wrap point is crossed
> (e.g. begin address is higher than the end address). Once someone hits this
> error, we can investigate and determine the correct ringbuffer end address.
> If we're guessing, and get it wrong, then we're silently returning bogus
> data. That's not good. And the chance that your guess is wrong is pretty
> high. Although I agree that the total memory is likely nicely aligned, the
> sample ringbuffer can end anywhere. I've seen that more than once in other
> backends.
>

It's more likely that a ring buffer will wrap than the logbook. I have a
Commander that has already wrapped (I bought it used) and so I have a
working example. From USB traffic I could see the addresses used by Analyst
and it's a nice clean alignment. I have no evidence on an EMC but it seems
safe to assume it's a minor upgrade on the Commander software.


>
> [One method that I have used successfully in the past to locate the
> ringbuffer end, without having access to data that crosses it, is by
> looking at the full memory dump. Usually unused data in flash memory
> contains all 0xFF. So if there is another block located right after the
> ringbuffer area, then the unused tail of the ringbuffer will contain 0xFF,
> followed by some non 0xFF data from that other block. Thus the first
> address containing non 0xFF bytes is very likely the end of the ringbuffer.]
>

There are three problems with that. First, for the EMC 20H, that's 16 MB of
data and even at 800K baud it's going to be a while to download. Probing
might make more sense. Reading 1K at certain points as a test.

Second, since the sample ring buffer size is software configurable the end
may not be apparent. I expect the end of the buffer is truncated on a DC
configured with less memory. That means that the technique of probing will
probably return a larger size than it should.

Third, I suspect that the end of sample memory is the end of physical
memory. It's a guess based on the number of quoted number of samples
supported and the starting address of samples. I don't know what will
happen if we read beyond physical memory.


> The next big question is where does the sample data begins/ends? So far
> you have identified three pointers: pre, begin and end address. When I dump
> those pointers (see below), I noticed something interesting:
>
>  * The 5th column is the difference between begin and pre. According to
> your documentation, these are inter dive events. They can be zero length
> too. Should we consider those as part of the dive, or ignore them? Since I
> don't know what's in those interdive events, I have no idea.
>
>  * The 6th column is the difference between pre and end of the previous
> entry. This is nearly always equal to 1800. I doubt that's a coincidence. I
> guess that means there is some 1800 byte header or trailer. Do you have any
> idea what's in there?
>
> * When the end pointer is ffffffff, the next entry has the same pre
> pointer. Because it makes no sense that the sample data for two different
> dives start at the same address, I assume this means that for some reason
> (e.g. battery empty, too short duration, etc) this dive has no sample data.
> What you did in this case is take the data until the start of the next
> dive. But if I'm right, that's the inter dive events. So this makes no
> sense to me. Do you get anything useful from doing this?
>
>
Pre-dive events consist of the following (and more) according to
documentation.

- Initialization of the unit.
- The unit is turned on
- Low batteries
- Altitude Changes of over 500 feet
- Temperature Changes of 10 degrees
- Sensor Malfunction
- Analyst ® P.C. Communication

Interdive events like the above are multiple bytes in length, up to 20
bytes. The DC has no on or off switch. It turns on under certain conditions
and off after a certain amount of off-gassing after a dive. So many dives
will have at least an On event occur before dives. The first dive has a lot
of events because I was configuring the device via Analyst and each config
change seems to be logged.

The DC also stays on after a dive and logs off-gassing. I'd bet that's the
1800 bytes of logs after. Although I assumed it was longer. Off gassing
samples are recognizable by the depth change and ascent rate samples being
0x40 and 0x80 respectively (these decode to negative zero.)

A pointer of FFFFFFFF means the end of dive was never logged. I had a
problem with my DC where it would reboot not logging the end of dive. The
code tries to salvage the dive logs.


> I also tried to parse the test data I received from you, but I can't get
> anything useful out of it. That might be due to the fact that I don't know
> at which address the sample data was read when you created that data (that
> might have changed in your latest code). Btw, that's exactly why I prefer a
> more generic dump format as explained above. Can you provide some new data
> for me, or the correct address? I also tried to run the parser under
> valgrind, and it reports a *huge* number of problems. That might be due to
> this mismatch in data, but the parser should be robust against whatever you
> throw at it. The rule of thumb is never trust the input data, and always
> check everything before you use it. I didn't investigate further, because I
> first want to confirm with good input :-)
>

As you can see from the source, the reads are based on data from the config
blocks and from the logbook. Config tells me how many logs to read. Reading
the logbook for pointers results in knowing which memory addresses to read.

For an EMC 20H the sample data starts at 0x94000. It's been a while since I
checked that the code is actually reading from that point, but it does
round down 16k.

I've not tried to read the data, like you have. My simulator used earlier
files with names that included their memory addresses. From a glance the
data looks good.

But let's go through an example. I'm using hexdump to view the files and
doing this manually.

Take dive 45 from your list. The log entry can be found at 0x5A00 in the
logbook.bin file and we can see the pre-pointer of 0xf255b, sample ptr of
0xf2725 and end ptr of 0xf2761. When I subtract 0x94000 from those I get
0x5e55b, 0x5e725 and 0x5e761. So I search for 5e550 in the the hexdump to
get to the right place in the output. You can see 22 inter-dive events
starting a 0x5e55b. The events are:

02 81 e3 7e 2a 0b 0a 10 04 08 0e fd 02 00 00 0e 00 09 52 00
10 80 15 7f 2a 1e 2b 13 04 08 0e da 02 00 00 0e 00 6f 5e 6c 54
10 00 7e 7f 2a 16 09 03 05 08 0e d9 02 00 00 0e 00 6e 54 6f 5e
01 40 0c 81 2a 22 1c 07 06 08 0e d8 02 00 00 0f 02 bb 4c 00 00 0f 02
01 80 0d 81 2a 36 21 07 06 08 0e d8 02 00 00 0c 04 b2 4c 0f 02 0c 04
01 40 1e 81 2a 16 2d 08 06 08 0e d8 02 00 00 13 06 dc 4b 0c 04 13 06
01 00 25 81 2a 0a 0e 09 06 08 0e d8 02 00 00 1b 04 62 4b 13 06 8b 06
01 c0 30 81 2a 12 04 0a 06 08 0e d8 02 00 00 1e 02 b1 4a 1b 04 8b 06
01 80 35 81 2a 22 18 0a 06 08 0e d8 02 00 00 00 00 7b 4a 1e 02 8b 06
10 80 39 81 2a 26 29 0a 06 08 0e d8 02 00 00 ff 00 6d 4a 6e 54
03 6b 7a 82 2a 35 1e 09 07 08 0e d8 02 00 00 ff 00 c8 48
02 f9 7d 82 2a 03 2e 09 07 08 0e fd 02 00 00 9b 00 14 48 00
02 12 bf 82 2a 30 17 0e 07 08 0e fd 02 00 00 00 00 a3 4a 00
03 25 bf 82 2a 07 18 0e 07 08 0e da 02 00 00 00 00 c3 4a
03 29 cf 82 2a 1b 20 0f 07 08 0e d8 02 00 00 00 00 ba 51
02 b1 d1 82 2a 0f 2b 0f 07 08 0e fd 02 00 00 38 00 ea 50 00
02 1d 09 83 2a 2b 27 13 07 08 0e ff 02 00 00 00 00 5b 4e 00
03 2d 09 83 2a 3b 27 13 07 08 0e e1 02 00 00 00 00 59 4e
03 02 0b 83 2a 30 2f 13 07 08 0e d8 02 00 00 46 00 a4 4e
10 00 6e 83 2a 0a 32 02 08 08 0e d6 02 00 00 71 00 67 40 6d 4a
10 e8 10 84 2a 0e 19 0e 08 08 0e fc 02 00 00 00 00 8a 52 67 40
02 e8 10 84 2a 0e 19 0e 08 08 0e fc 02 00 00 00 00 8a 52 00

You can see in the above columns 2 through 5 is a 4 byte time stamp
(seconds since Jan 1, 1992). It's a consistent enough number that it helps
to manually delineate pre-dive events.

The dive starts. Often (I suspect always when the EMC is in FO2 mode, i.e.
open circuit) the dive starts with *dive events* e3 f3. A starting depth
change sample of 0x40 is also a hint to the start of a dive. Hints like
this are only used by the code that tried to read bad dives.

Fast forwarding to the end of the dive, I jump to 5e761 in the file and
just before that you'll see dive events e1 c0, surface and switch-to-air
events. Following that is off-gassing samples until the begin of the next
dive's pre-dive events (if any). Maybe this dive was setting a dive flag
for a training session. Maybe the off-gassing time of 10 minutes (1800
bytes) is based on the configured SIT interval after which a new dive can
start.

I've been hesitant in reading the entire memory of the device. I may try
with my used device, which is a Commander.

I've been holding off asking Cochran again for help. They previously
refused citing concern that open source code would reveal too much about
their algorithm. Now that the code exists I wonder if they would be willing
to help by telling me how to determine model and memory configuration, or
if they will demand that I stop.. It's also the same reason I've been
trying to respect their privacy by not reading anything but my data.


>
>
> Jef
>
>  i |   pre  | begin  |  end  | 5th | 6th
> ==========================================
>   0 00094000 00094834 00094a95 2100 606208
>   1 0009519d 0009519d 0009674b 0 1800
>   2 00096e53 00096e7d 000980c9 42 1800
>   3 000987d1 00098aa9 ffffffff 728 1800
>   4 000987d1 00098e3c 0009ac28 1643 624594
>   5 0009b330 0009b330 ffffffff 0 1800
>   6 0009b330 0009b43c 0009c2c4 268 635697
>   7 0009c9cc 0009ca46 0009e898 122 1800
>   8 0009efa0 0009f01d 000a07f4 125 1800
>   9 000a0efc 000a0f39 000a2d60 61 1800
>  10 000a3468 000a34a6 000a4f2a 62 1800
>  11 000a5632 000a5647 000a7207 21 1800
>  12 000a790f 000a7a3c ffffffff 301 1800
>  13 000a790f 000a8e3c ffffffff 5421 686352
>  14 000a790f 000a983c 000aa9e6 7981 686352
>  15 000ab0ee 000ab26c 000ab40a 382 1800
>  16 000abb12 000abb12 000abdc5 0 1800
>  17 000ac4cd 000ac4cd 000ad6c9 0 1800
>  18 000addd1 000ade4c 000ae5d3 123 1800
>  19 000aecdb 000aecdb 000b06cd 0 1800
>  20 000b0dd5 000b0e69 000b40aa 148 1800
>  21 000b47b2 000b481e ffffffff 108 1800
>  22 000b47b2 000b5466 000b70c3 3252 739251
>  23 000b77cb 000b77df 000b9d11 20 1800
>  24 000ba419 000ba469 000bda64 80 1800
>  25 000be16c 000be217 000c00ac 171 1800
>  26 000c07b4 000c084a 000c29ed 150 1800
>  27 000c30f5 000c34d6 000c4a9f 993 1800
>  28 000c51a7 000c51a7 000c6449 0 1800
>  29 000c6b51 000c6e74 000c6e7d 803 1800
>  30 000c7585 000c79ff 000c9d39 1146 1800
>  31 000ca441 000ca47f 000cc82d 62 1800
>  32 000ccf35 000ccf73 000ce838 62 1800
>  33 000cef40 000cefa7 000d15c7 103 1800
>  34 000d1ccf 000d1cf8 000d47ed 41 1800
>  35 000d4ef5 000d4f09 000d77c4 20 1800
>  36 000d7ecc 000d7ee0 000da532 20 1800
>  37 000dac3a 000dac4e 000dd172 20 1800
>  38 000dd87a 000dd88e 000df0bd 20 1800
>  39 000df7c5 000df7ee 000e2f19 41 1800
>  40 000e3621 000e364a 000e5730 41 1800
>  41 000e5e38 000e5e4c 000e880c 20 1800
>  42 000e8f14 000e8f28 000eb7cf 20 1800
>  43 000ebed7 000ebeeb 000eee22 20 1800
>  44 000ef52f 000ef543 000f1e4f 20 1805
>  45 000f255b 000f2725 000f2761 458 1804
>  46 000f2e69 000f2e69 000f599f 0 1800
>  47 000f60a7 000f60f9 000f8e7f 82 1800
>  48 000f9587 000f95b0 000fb7e2 41 1800
>  49 000fbeea 000fbfd8 000febe4 238 1800
>  50 000ff2ec 000ff397 00100e44 171 1800
>  51 0010154c 0010154c 0010160c 0 1800
>  52 00101d14 00101dd9 00103a7e 197 1800
>  53 00104186 001041f2 00105a4a 108 1800
>  54 00106152 00106167 00107b79 21 1800
>  55 00108281 001083d0 0010b8cc 335 1800
>  56 0010bfd4 0010c012 0010f0dc 62 1800
>  57 0010f7e4 0010f7f8 00111a89 20 1800
>  58 00112191 001121cf 00114e23 62 1800
>  59 0011552b 0011553f 00118412 20 1800
>  60 00118b1a 00118b58 0011b9a5 62 1800
>  61 0011c0ad 0011c0c1 0011e8ab 20 1800
>  62 0011efb3 0011efdc 00121e25 41 1800
>  63 0012252d 00122541 00124dc4 20 1800
>  64 001254cc 001254e0 00128055 20 1800
>  65 0012875d 00128771 0012b036 20 1800
>  66 0012b73e 0012b767 0012e4ba 41 1800
>  67 0012ebc2 0012ebd6 00130d22 20 1800
>  68 0013142a 0013143e 00133d2f 20 1800
>  69 00134437 00134437 00136ef7 0 1800
>  70 001375ff 00137628 0013912e 41 1800
>  71 00139836 0013984a 0013c3f2 20 1800
>  72 0013cafa 0013cdbc 00140ab6 706 1800
>  73 001411be 00141269 00143321 171 1800
>  74 00143a29 00143a29 001455fa 0 1800
>  75 00145d02 00145d42 00148ae2 64 1800
>  76 001491ea 0014933f 0014a805 341 1800
>  77 0014af0d 0014af0d 0014b1eb 0 1800
>  78 0014b8f3 0014b94a 0014ef9d 87 1800
>  79 0014f6a5 0014f70d ffffffff 104 1800
>  80 0014f6a5 0015043c 00151442 3479 1373862
>  81 00151b4a 00151b4a 00153815 0 1800
>  82 00153f1d 00153fb3 ffffffff 150 1800
>  83 00153f1d 0015643c ffffffff 9503 1392414
>  84 00153f1d 0015663c ffffffff 10015 1392414
>  85 00153f1d 001566e0 ffffffff 10179 1392414
>  86 00153f1d 0015943c ffffffff 21791 1392414
>  87 00153f1d 0015983c 0015a55f 22815 1392414
>  88 0015ac67 0015ad0e ffffffff 167 1800
>  89 0015ac67 0015ca3c 0015d1c0 7637 1420392
>  90 0015d8c8 0015d8f1 0015fa0c 41 1800
>  91 00160114 00160129 ffffffff 21 1800
>  92 00160114 0016043c ffffffff 808 1442069
>  93 00160114 00161c3c 001620bf 6952 1442069
>  94 001627c7 001627db ffffffff 20 1800
>  95 001627c7 0016447b ffffffff 7348 1451976
>  96 001627c7 0016463c 001661b3 7797 1451976
>  97 001668bb 0016694d 00167a66 146 1800
>  98 0016816e 0016853f 00169fa2 977 1800
>  99 0016a6aa 0016a6aa 0016c07d 0 1800
> 100 0016c785 0016cae1 0016f7b4 860 1800
> 101 0016febc 0016fee5 00172dee 41 1800
> 102 001734f6 0017350a 00175cdb 20 1800
> 103 001763e3 0017640c 00177487 41 1800
> 104 00177b8f 00177bd1 00179239 66 1800
> 105 00179941 00179955 0017aae1 20 1800
> 106 0017b1e9 0017b212 0017c680 41 1800
> 107 0017cd88 0017ce2f 0017fbce 167 1800
> 108 001802d6 001802ff 00182cfe 41 1800
> 109 00183406 0018342f 0018591f 41 1800
> 110 00186027 00186066 0018821d 63 1800
> 111 00188925 001889a2 0018b8a6 125 1800
> 112 0018bfae 0018c452 0018d833 1188 1800
> 113 0018df3b 0018df8e 00190d5f 83 1800
>



-- 
John Van Ostrand
At large on sabbatical
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://libdivecomputer.org/pipermail/devel/attachments/20141207/9042d66b/attachment-0001.html>


More information about the devel mailing list