Next round of api improvements.

Dirk Hohndel dirk at hohndel.org
Fri Nov 16 19:18:14 UTC 2012


Jef Driesen <jefdriesen at telenet.be> writes:
>> Yuck, yuck, yuck. It's standard and universally hated. It creates
>> significantly more pointless code and boilerplate on both sides of 
>> the
>> API for no real benefit.
>>
>> And it's a wonderful way to create subtle, really hard to debug 
>> problems
>> - you forget to reference the object and keep using it - Valgrind 
>> won't
>> find that you are using 'freed' memory because it isn't freed - it 
>> just
>> changes values under you. I absolutely HATE this idea.
>
> Personally, I don't think reference counting is that bad. On the 
> application side it's just a few extra dc_object_ref calls, and 
> dc_object_unref instead of dc_object_free. And you can equally well 
> shoot yourself in the foot with plain malloc. But okay, your preference 
> is noted :-)

I really want to stress my last point. It's debuggability that I am most
concerned about. If the application owns the memory, there are very
powerful tools to help you figure out where you shot yourself in the
foot (ask me how I know).

>> Please make the memory application owned and have the application be
>> responsible for freeing memory. Unless you are planning to run
>> libdivecomputer on a wristwatch the overhead for that is negligible 
>> with
>> today's OSs (that all have a reasonably smart malloc implementation 
>> that
>> will reuse the recently freed memory block on its own).
>
> I certainly agree that this is probably the most intuitive and simple 
> design.
>
> But while I'm not too concerned about the memory allocations by 
> themselves, I still find it a bit silly having to allocate resources 
> where it's not strictly necessary. For the sample data I simply can't 
> think of any use-case where you would want to hold on to the sample 
> objects outside the callback function. Hence my search for possible 
> alternatives.

I understand. But I think you are focusing on the wrong problem.
Having a consistent API that treats memory consistently and doesn't
require the application to know the internals of your implementation is
a HUGE advantage. It makes libdivecomputer easier to use.

"Any memory allocation that libdivecomputer returns to the application
is owned by the application and needs to be freed when it is no longer
needed". 

This is easy to understand, easy to implement, even easy to verify with
static analysis tools.

Elegance and best possible algorithms are honorable goals. But
consistency and ease of use (and debuggability) are more important, I
think.

>>> 2. Querying available sample values?
>>> =====================================
>>>
>>> With the current design, the samples are pushed to the application, 
>>> and
>>> there is no need to know which values are actually supported. You'll
>>> only get those values that are actually present. However, with the 
>>> new
>>> design, the application has to call an accessor function to retrieve 
>>> a
>>> piece of data, and thus it needs some way to query which values are
>>> presented. The only values that are guaranteed to be available for 
>>> each
>>> sample are time and depth. Everything else is optional.
>>>
>>> The level at which we can tell whether a particular sample value is
>>> present varies. It can be at the device level (for example a non-air
>>> integrated model will never store tank pressure data), at the dive 
>>> level
>>> (for example a model with a wireless pressure sensor can have dives 
>>> with
>>> and without tank pressure stored), or even at the sample level (for
>>> example a tank pressure every x sample).
>>
>> Or you have a Suunto dive computer and the sample data is randomly
>> missing... :-/
>
> What do you mean?

Oh - we have several Suunto DCs where the wireless sensor keeps dropping
out and randomly you have samples with missing pressure... I was
basically agreeing with you, giving another scenario where this could be
the case.

>> I was hoping for a somewhat different design.
>>
>> int dc_sample_get_basic(sample, basic_sample);
>>
>> returns DC_STATUS_SUCCESS if you got a sample back, various errors
>> otherwise.
>>
>> basic_sample_t {
>> 	int time; // in seconds
>> 	int depth; // in mm (or you can do a double in meters)
>> 	uint64_t data_available; // bitmap on what else is available
>> }
>>
>> Then your header files define those bits and the corresponding data
>> structures.
>>
>> #define DC_PRESSURE_SAMPLE 1<<0
>> dc_pressure_sample_t {
>> 	int nr_tanks;
>> 	int *tankpressure;
>> }
>>
>> So this tells you how many pressure samples and returns a pointer to 
>> the
>> array of pressures (or in the typical case to the single pressure) - 
>> an
>> integer in mbar or a double in bar or whatever.
>>
>> And for simple stuff just have an int (or you can go fancy and define
>> types for all of them).
>>
>> #define DC_TEMPERATURE_SAMPLE 1<<1
>> typedef dc_temperature_sample_t int; // temperature in mkelvin (or
>> whatever you want)
>>
>> This way ONE call gets you the basics that every dive computer 
>> supports
>> and at the same time the information what else is available for THIS
>> sample. And you can keep things API compatible by simply adding new 
>> bits
>> whenever you add a new piece of information that you can get in a 
>> sample
>> - the app can then do
>>
>> #ifdef DC_DECO_SAMPLE
>> ...
>> #endif
>>
>> and work with older and newer versions of libdivecomputer...
>
> I also considered using a bitmap (it was mentioned briefly somewhere in 
> my mail I think), but I couldn't find a way to make it work with vendor 
> extensions. A bitfield is limited to 8 to 64 different types, depending 
> on the integer size we decide to use. For the normal samples, that is 
> probably more than we'll ever need. But if every backend can support a 
> few extension types, and each one needs another bit, then we'll quickly 
> running out of bits. Do you have a solution for this?

Yes. Sort of. You have one bit in your bitmap that says "vendor
extension" and then the vendor extension can use another int (16bit or
something) with an enum (so 64k values for 16 bits) to indicate which
extension this is. So we don't waste the space in every sample, but if
there is an extension we have tons of room to grow and add extensions
for new things to come.

> While I'm definitely not opposed to the bitmap idea, I'm not that that 
> happy with the basic_sample_t struct. I would prefer to implement this 
> with separate functions. One for the bitmap itself, and one function for 
> each value. Since there are only two values that are always available, I 
> wouldn't treat them any different from the others.

The reason that I thought of treating them differently is that these two
values are available in every sample from every DC (otherwise, what's
the point of that DC). So you can have one call that gives you what you
can always get (time, depth) plus the information about everything else
that is available.

But if you'd rather have this be three API calls (get_time, get_depth,
get_available_data_bitmap or something), I'll be happy with that, too.

> For the multivalued data, I'm not sure how well such a struct 
> (containing a length and pointer to an array) will work for language 
> bindings (python, .net, etc). That's something we'll have to check.

I don't know about .net. In python this would work just fine.

> I prefer enums over pre-processor constants too. I know this will loose 
> the conditional compiling, but this is a trick that will work for C/C++ 
> only.

The problem with enums is that you now have to have strong versioning of
your API - something which libdivecomputer has not a good history of
doing (and I know all the reasons why - don't take this as criticism,
just as an observation). So if you use enums then I have to recreate the
logic of "which version of libdivecomputer supports which values for
this function" in my app - and so does everyone else. We need to
basically create a "version number to feature mapping" in the client.

If you do this with #defines then this is much easier to take care of.
At least for C/C++ programers - I have seen other language bindings
parse the defines for this as well, but don't know what the current
users of libdivecomputer are

>>> 3. Supported dive and sample data
>>> ==================================
>>>   * temperature
>>
>> air temperature, I assume?
>
> Depends. Different dive computers have different interpretations. It 
> could be air/water temperature, the min/max/avg value, and so on. If 
> that's even know to us. Not sure how to deal with this. We could just 
> supply a temperature, and leaving the actual interpretation to the user, 
> or we could have an extra flag with the exact interpretation (if known) 
> as you mention below. But even then we're still left with devices that 
> provide more than one temperature.

So offer a couple of values with flags that determin the semantics

* air_temperature
* min_water_temperature
* avg_water_temperature

And offer flags that clarify semantics

AIR_TEMP_IS_FIRST_SAMPLE
AVG_IS_LAST_TEMP
MIN_TEMP_IS_AT_MAX_DEPTH

(btw, I love the last one as I frequently dive in a water body that for
more than half of the year has temperature go UP as you go deeper...)

etc.

> For example for the suunto vyper family, there is a temperature at the 
> start/end of the dive, and at maximum depth. In this particular case I 
> solved this by injecting artificial temperature values into the samples.
>
>>>   * salinity, altitude/atmospheric pressure
>>>   * dive mode (open/closed circuit, gauge, etc)
>>
>>
>>> Samples:
>>>   * time
>>>   * depth
>>>   * temperature
>>>   * tank pressure
>>>   * heartrate
>>>   * airtime
>>
>> That seems like a good list. Heartrate is the one that seems a bit 
>> out
>> of place - are there many computers that support that? I would have 
>> seen
>> CNS, NDL as more likely candidates for the default list.
>
> Heartrate is only supported by the Uwatec Galileo as far as I know. It 
> certainly not a must-have feature, but I can imagine that if you have 
> such a device, this is an interesting value. Although the actual 
> usefulness is questionable if you ask me, but it's cool gadget :-)

So much of what high end DCs provide is more cool than entirely useful -
but I agree that we should make it available regardless. Visualization
is the application's job :-)

> CNS/OTU can be calculated from the samples, but we can consider to 
> support it too.

Agree on both parts. I know several DCs provide CNS information and
would love to be able to use that instead of the internally calculated
values in Subsurface.

> I have no idea which devices record this in the data. I 
> know the Suunto's have this weird OLF (Oxygen Limit Factor) value, which 
> is either the CNS or OTU, whichever is higher. And I think the Oceanics 
> have a scale from 1 to 5 or something. So that might be a bit more 
> difficult to support in a uniform way.

I'll keep singing the same song. Pick one semantic as default and
provide clearly specified flags to modify that.

> I assume with NDL, you mean the remaining no-deco time? I don't think 
> any device does support that. And then you are left with NDL equals the 
> absense of decostop info.

Several do. OSTC and Uemis Zurich both give you their current no deco
time in either every sample or every nth sample (OSTC).

>>>   * Deco stops (depth and time)
>>>   * Warnings, bookmarks, etc
>>>
>>> For the events, I'm not sure yet how the querying (see previous 
>>> item)
>>> should be done. The normal samples are continuous values, and will
>>> usually appear at least every X sample, or not at all. But the 
>>> events
>>> are discrete in nature and will only appear occasionally. Should 
>>> this be
>>> done differently in the api?
>>
>> We could do this with the bitmap proposed above, right?
>
> What I had in mind, is to have a single DC_SAMPLE_EVENT type. If this 
> bit is set, there are events available. The actual event value would 
> then be another bitmap containing all the events. With this approach we 
> can support several events without needing additional sample types for 
> each event type.

Good idea.

>>> We'll also have to agree on a data structure and the exact semantics
>>> for each of those pieces of data. Some won't need much discussion, 
>>> but
>>> others will. For example take the temperature. A device can record a
>>> single value which could be the minimum, maximum or average 
>>> temperature.
>>> It can have all three, or a full temperature profile. How do we deal
>>> with that? Things are not always as easy as they may look.
>>
>> I think the best way to deal with that is to have a bitfield flags 
>> that
>> is part of every sample. Be generous, at least 64 bits. And simply 
>> have
>> a clearly defined default semantic (whatever Suuntu uses, or some 
>> other
>> common DC) and whenever there's a different semantic for such a 
>> value,
>> add a flag bit that again has clear semantic.
>>
>> So for example by default temperature is the minimum temperature. And
>> then there can be a DC_DATA_TEMP_IS_AVERAGE bit that tells you that
>> instead this is the average temperature.
>>
>> So you have one of those flags bitfield in the dive data, and one in
>> each sample. And you do your darnedest to make sure that there is a
>> precise definition of the default meaning of each value, and a 
>> precise
>> definition of how the bits modify that.
>
> I fear that the use of extra flags will quickly lead to a mess again. 
> We now have this for the temperature, tomorrow we may need extra flags 
> for something else, and before we know we end up with a flags 
> everywhere.
>
> And it doesn't solve the case where there are multiple temperature 
> values either.

See what I wrote above. The moment you don't have precisely defined
semantics of the value (even if the semantic is just our current best
guess based on reverse engineering) the usefulness of that value to the
application declines dramatically. An app is always at liberty to ignore
that additional information ("hey, here's some temperature, no idea what
it is"), but if it isn't there all that will happen is that apps start
to make educated guesses based on the type of DC used. And that's just
ugly. 

I feel very strongly about that. I think and API that doesn't define
what the values it gives you actually mean is really of rather limited
use. 

> Sorry for the possible confusion, but imperial was never an option. 
> It's either metric or SI. The only difference between the two is the 
> temperature (Celsius vs Kelvin). For the other units it's just a 
> difference in scaling factors (Pascal vs bar).

Good!

/D




More information about the Devel mailing list