Next round of api improvements.

Thu Nov 15 18:43:57 UTC 2012

On Thu, 2012-11-15 at 12:16 +0100, Jef Driesen wrote:

> 1. Object ownership, lifetime and memory management.
> =====================================================
> 
> How should we deal with the ownership of the new objects?
> 
> With the current api, the application has to call dc_parser_new() 
> function explicitly, and their is no doubt that the application is the 
> owner of the returned object. However, because the new dive and sample 
> objects will be allocated internally, their ownership is not immediately 
> obvious. It all depends on the internal implementation. There are a 
> couple of options, each with their own advantages and disadvantages:
> 
> A. Library owns
> 
> The first option is to make the library the owner of the objects. In 
> this case, the object is allocated internally, passed to the callback 
> function, and destroyed internally again:
[...]
> The only way to keep a (valid) reference to the object is to make a 
> copy. In the current design, this works fine because the raw data is 
> trivial to copy. It's just a byte array after all. But the new 
> dive/sample objects are opaque objects and can't be copied unless a 
> helper function is provided to make a deep copy (e.g. a dc_dive_clone() 
> functions). But that's not exactly the most elegant interface if you ask 
> me.

That seems annoying to me. I won't furiously object to it, but I'm not a
huge fan of the idea.

> B. Application owns
> 
> The second option is to make the application the owner of the objects. 
> In this case, the object is allocated internally, passed to the callback 
> function, and should be destroyed by the application, not the library:
[...]
> This may work well for those scenario's where you really need to take 
> the ownership, but it adds unnecessary overhead when you don't need it 
> and there is no way to avoid this extra overhead. Take the sample 
> parsing for example. I think it's safe to assume that most applications 
> will convert the sample data to their native format on the fly, and thus 
> never keep a reference to the libdivecomputer sample objects. The result 
> is that a large number of sample objects will be allocated and 
> immediately destroyed again. Thus many memory allocations for no good 
> reasons. I don't like that. We should primarly aim for a good design, 
> but it doesn't hurt to keep an eye on the efficiency too.

I think you are overestimating the cost of this - I think this is a very
clean design that is easy to understand and easy to get right on the
application side. Definitely my favorite.

> Note that in my code snippet for the previous "library owns" case, 
> their is an equal amount of memory allocations taking place. But those 
> can easily be avoided by allocating just a single object, and re-using 
> it every time. However for the "application owns" case no such 
> optimization is possible.

Well, your C library will take care of this for you. The cost of
allocations is for all intents and purposes negligible.  

> C. Reference counting
> 
> An alternative option would be to make the new objects reference 
> counted. This changes nothing to the fact that we have to decide whether 
> the application has to take a reference explicitly (e.g. case A) or 
> gains it automatically (e.g. case B). But with the reference count, we 
> can detect whether the application still holds a reference or not, and 
> avoid unnecessary memory allocations:
[...]
> It's of course slightly more complex to implement, and we have to make 
> sure the reference counting is thread-safe too using atomic operations 
> or protected with a mutex), but I think the resulting api will be more 
> intuitive. Reference counting is pretty standard today.

Yuck, yuck, yuck. It's standard and universally hated. It creates
significantly more pointless code and boilerplate on both sides of the
API for no real benefit.

And it's a wonderful way to create subtle, really hard to debug problems
- you forget to reference the object and keep using it - Valgrind won't
find that you are using 'freed' memory because it isn't freed - it just
changes values under you. I absolutely HATE this idea.

Please make the memory application owned and have the application be
responsible for freeing memory. Unless you are planning to run
libdivecomputer on a wristwatch the overhead for that is negligible with
today's OSs (that all have a reasonably smart malloc implementation that
will reuse the recently freed memory block on its own).

> D. Mixed ownership
> 
> The last option is to allow different ownership rules for the dive and 
> sample objects. For example we could choose "applications owns" for the 
> dives, and "library owns" for the samples. In that case, the dives can 
> be used outside the callback function, while their is no unnecessary 
> overhead for the samples. The obvious downside is less consistency.

I vote for consistency any time.

> 2. Querying available sample values?
> =====================================
> 
> With the current design, the samples are pushed to the application, and 
> there is no need to know which values are actually supported. You'll 
> only get those values that are actually present. However, with the new 
> design, the application has to call an accessor function to retrieve a 
> piece of data, and thus it needs some way to query which values are 
> presented. The only values that are guaranteed to be available for each 
> sample are time and depth. Everything else is optional.
> 
> The level at which we can tell whether a particular sample value is 
> present varies. It can be at the device level (for example a non-air 
> integrated model will never store tank pressure data), at the dive level 
> (for example a model with a wireless pressure sensor can have dives with 
> and without tank pressure stored), or even at the sample level (for 
> example a tank pressure every x sample).

Or you have a Suunto dive computer and the sample data is randomly
missing... :-/

>  To support all possible cases 
> this querying should thus be done at the sample level.
> 
> I think the simplest solution is to use the accessor function itself to 
> probe for the presence of the info. You just call the accessor function 
> and if the info isn't available, a special error code like UNSUPPORTED 
> is returned:
> 
> if (dc_sample_get_depth (sample, &depth) == DC_STATUS_SUCCESS) {
>     /* Depth retrieved successfully. */
> } else {
>     /* No depth available (or an error occurred). */
> }
> 
> For multivalued data, like the tank pressure, we could then use an 
> extra index parameter in the accessor function, and then call it 
> repeatedly until the UNSUPPORTED error is returned:
> 
> i = 0;
> while (dc_sample_get_pressure (sample, &pressure, i) == 
> DC_STATUS_SUCCESS) {
>     /* Tank pressure retrieved successfully. */
> 
>     i++;
> }
> 
> An alternative solution for the multivalued data would be to retrieve 
> all values at once using an array:
> 
> dc_sample_get_pressure (dc_sample_t *sample, double pressure[], 
> unsigned int n);
> 
> But then the caller somehow has to know the maximum value for n, and 
> the api also has to communicate back the actual number of values 
> returned. The second part is easy with an extra output parameter, but I 
> don't know a nice solution for the first problem (other than a second 
> dc_sample_get_pressure_count function).
> 
> Alternatives for the probing approach outlined above, are an iterator 
> like interface to iterator over the available sample values, or a 
> function to retrieve a bitmap with the available sample values.

I was hoping for a somewhat different design.

int dc_sample_get_basic(sample, basic_sample);

returns DC_STATUS_SUCCESS if you got a sample back, various errors
otherwise.

basic_sample_t {
	int time; // in seconds
	int depth; // in mm (or you can do a double in meters)
	uint64_t data_available; // bitmap on what else is available
}

Then your header files define those bits and the corresponding data
structures.

#define DC_PRESSURE_SAMPLE 1<<0
dc_pressure_sample_t {
	int nr_tanks;
	int *tankpressure;
}

So this tells you how many pressure samples and returns a pointer to the
array of pressures (or in the typical case to the single pressure) - an
integer in mbar or a double in bar or whatever.

And for simple stuff just have an int (or you can go fancy and define
types for all of them).

#define DC_TEMPERATURE_SAMPLE 1<<1
typedef dc_temperature_sample_t int; // temperature in mkelvin (or
whatever you want)

This way ONE call gets you the basics that every dive computer supports
and at the same time the information what else is available for THIS
sample. And you can keep things API compatible by simply adding new bits
whenever you add a new piece of information that you can get in a sample
- the app can then do

#ifdef DC_DECO_SAMPLE
...
#endif

and work with older and newer versions of libdivecomputer...

> 3. Supported dive and sample data
> ==================================
> 
> Which dive data and sample data should we support?
> 
> I already made a tentative list, but this is open for discussion of 
> course. Items can be added or removed after discussion.
> 
> Dives:
>   * date/time
>   * divetime

length of dive as reported by DC?

>   * maxdepth
>   * gasmixes

That's variable length, right?

>   * tank pressures (begin/end)

ditto

>   * temperature

air temperature, I assume?

>   * salinity, altitude/atmospheric pressure
>   * dive mode (open/closed circuit, gauge, etc)

> Samples:
>   * time
>   * depth
>   * temperature
>   * tank pressure
>   * heartrate
>   * airtime

That seems like a good list. Heartrate is the one that seems a bit out
of place - are there many computers that support that? I would have seen
CNS, NDL as more likely candidates for the default list.
Also, your tank pressure contains the tank index as well, right?

> Events:
>   * Gas changes

as we discussed in earlier email - I think having the tank index in the
sample is a better solution...

>   * Deco stops (depth and time)
>   * Warnings, bookmarks, etc
> 
> For the events, I'm not sure yet how the querying (see previous item) 
> should be done. The normal samples are continuous values, and will 
> usually appear at least every X sample, or not at all. But the events 
> are discrete in nature and will only appear occasionally. Should this be 
> done differently in the api?

We could do this with the bitmap proposed above, right?

> We'll also have to agree on a data structure and the exact semantics 
> for each of those pieces of data. Some won't need much discussion, but 
> others will. For example take the temperature. A device can record a 
> single value which could be the minimum, maximum or average temperature. 
> It can have all three, or a full temperature profile. How do we deal 
> with that? Things are not always as easy as they may look.

I think the best way to deal with that is to have a bitfield flags that
is part of every sample. Be generous, at least 64 bits. And simply have
a clearly defined default semantic (whatever Suuntu uses, or some other
common DC) and whenever there's a different semantic for such a value,
add a flag bit that again has clear semantic.

So for example by default temperature is the minimum temperature. And
then there can be a DC_DATA_TEMP_IS_AVERAGE bit that tells you that
instead this is the average temperature.

So you have one of those flags bitfield in the dive data, and one in
each sample. And you do your darnedest to make sure that there is a
precise definition of the default meaning of each value, and a precise
definition of how the bits modify that.

> Closely related to the data structures is the choice of the unit 
> system. The choice is either metric or SI. SI units have the advantage 
> all values are positive, but applications are likely to use metric or 
> imperial. We can also consider to introduce a unit system as used in 
> subsurface, where all values are scaled to fit into an integer (e.g. 
> depths in millimeters instead of meters, and so on)? Or are we happy 
> with floating point values?

I'm obviously biased here... but in the end, I'll be happy with floats
or integers. But please, please, please, ONLY USE SI-related values.
It's the applications job to convert this into whatever insane
preferences the user has. Distance in ft, volume in liter, temperature
in C, pressure in psi... whatever. But libdivecomputer should ALWAYS
give us SI-related units (I say related because one can argue if C or K
is the right temperature value - we went with mK because that gives us
plenty of precision in an unsigned integer - but one might reasonably
argue that float centigrade is an equally valid SI-related unit... just
don't allow F or other random non SI units...). If the DC is silly
enough to return data in the unit system that the user has set up, then
libdivecomputer should convert that into sane units.

> That's it for now. Comments are welcome! If you think I missed 
> something important, don't hesitate to bring it up. It's not unlikely 
> that I forgot about something :-)

I like the overall direction. And I'm glad that we can discuss this hear
before it is set in stone... the current API has caused us some
frustration here and there and this is a great opportunity to create
something that can be maintained in a backwards compatible manner for a
long time to come...

/D