Next round of api improvements.

Thu Nov 15 11:16:14 UTC 2012

Hi,

For the next release, I'm planning to introduce a new dc_dive_t and 
dc_sample_t interface. This will again be a major change to the 
libdivecomputer api. Please note that the interface presented in this 
email is by no means a final version. It's intended to serve as a 
starting point for further discussions. Now, let's have a closer look.

The role of the new dc_dive_t object has been discussed before [1], so 
I'll just summarize here. In a nutshell, the new dc_dive_t object is 
basically a container for the raw dive data, with the parser 
functionality built-in. As a result, there is no need anymore to create 
a dc_parser_t object explicitely.

The dc_device_foreach() function will be modified to return dc_dive_t 
objects directly in the callback, instead of the raw dive data. The raw 
dive data (and fingerprint) can be retrieved from the dc_dive_t object 
by means of the dc_dive_get_raw() and dc_dive_get_fingerprint() 
functions. The dive metadata can be retrieved with a number of accessor 
functions, similar to the current dc_parser_t interface:

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    /* Get raw dive and fingerprint data. */
    dc_dive_get_raw (dive, &data, &size);
    dc_dive_get_fingerprint (dive, &fpdata, &fpsize);

    /* Get dive metadata directly. */
    dc_dive_get_datetime (dive, &datetime);
    dc_dive_get_maxdepth (dive, &maxdepth);
    ...
}

Note that this new dc_dive_t interface will also simplify the 
transition from a callback based interface to an iterator based 
interface, because the dc_iterator_next() function can now return the 
dc_dive_t object:

dc_dive_t *dive;
dc_iterator_t *iterator;
dc_device_iterator (device, &iterator);
while (dc_iterator_next (iterator, &dive) == DC_STATUS_SUCCESS) {
    /* Contents of the callback function goes here. */
}
dc_iterator_free(iterator);

Although this will be something for later, we can already keep this in 
mind of course.

For parsing the samples, I have something very similar in mind. With 
the current interface, the application receives a stream of sample 
values (e.g. time, depth, etc), and the time sample serves as the marker 
to start a new sample. However, with the new interface, the sample 
values would be accumulated internally in the new dc_sample_t object. 
Once the entire sample has been parsed, the sample is passed to the 
callback function. As a result, there is no need anymore to treat the 
time sample as special, and the individual sample values can be 
retrieved with a number of accessor functions:

void
sample_cb (dc_sample_t *sample, void *userdata)
{
    dc_sample_get_time (sample, &time);
    dc_sample_get_depth (sample, &depth);
    ...
}

With these new dive and sample objects it's also possible to add vendor 
specific extensions in the future. We just need a new accessor function 
and a corresponding data structure.

So far the overview of the high level design that I had in mind. The 
remainder of this email consists of more detailed information on a 
number of technical challenges that will have to be addressed in this 
new design.

1. Object ownership, lifetime and memory management.
=====================================================

How should we deal with the ownership of the new objects?

With the current api, the application has to call dc_parser_new() 
function explicitly, and their is no doubt that the application is the 
owner of the returned object. However, because the new dive and sample 
objects will be allocated internally, their ownership is not immediately 
obvious. It all depends on the internal implementation. There are a 
couple of options, each with their own advantages and disadvantages:

A. Library owns

The first option is to make the library the owner of the objects. In 
this case, the object is allocated internally, passed to the callback 
function, and destroyed internally again:

dc_device_foreach (...)
{
    while (!finished) {
       dc_dive_t *dive = NULL;
       dc_dive_new (&dive, ...);
       callback (dive, userdata);
       dc_dive_free (dive);
    }
}

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...
}

The consequence is that the lifetime of the object is limited to scope 
of the callback function, and an application won't be able to keep a 
reference to the object. This will be problematic for some legitimate 
scenario's. For example storing the objects in a list for later 
processing (e.g. after all dives have been downloaded), or passing 
objects between threads (e.g. download the dives in a worker thread, 
then and pass them to the main GUI thread to display them in real-time).

The only way to keep a (valid) reference to the object is to make a 
copy. In the current design, this works fine because the raw data is 
trivial to copy. It's just a byte array after all. But the new 
dive/sample objects are opaque objects and can't be copied unless a 
helper function is provided to make a deep copy (e.g. a dc_dive_clone() 
functions). But that's not exactly the most elegant interface if you ask 
me.

B. Application owns

The second option is to make the application the owner of the objects. 
In this case, the object is allocated internally, passed to the callback 
function, and should be destroyed by the application, not the library:

dc_device_foreach (...)
{
    while (!finished) {
       dc_dive_t *dive = NULL;
       dc_dive_new (&dive, ...);
       callback (dive, userdata);
    }
}

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...
    dc_dive_free (dive);
}

This may work well for those scenario's where you really need to take 
the ownership, but it adds unnecessary overhead when you don't need it 
and there is no way to avoid this extra overhead. Take the sample 
parsing for example. I think it's safe to assume that most applications 
will convert the sample data to their native format on the fly, and thus 
never keep a reference to the libdivecomputer sample objects. The result 
is that a large number of sample objects will be allocated and 
immediately destroyed again. Thus many memory allocations for no good 
reasons. I don't like that. We should primarly aim for a good design, 
but it doesn't hurt to keep an eye on the efficiency too.

Note that in my code snippet for the previous "library owns" case, 
their is an equal amount of memory allocations taking place. But those 
can easily be avoided by allocating just a single object, and re-using 
it every time. However for the "application owns" case no such 
optimization is possible.

C. Reference counting

An alternative option would be to make the new objects reference 
counted. This changes nothing to the fact that we have to decide whether 
the application has to take a reference explicitly (e.g. case A) or 
gains it automatically (e.g. case B). But with the reference count, we 
can detect whether the application still holds a reference or not, and 
avoid unnecessary memory allocations:

dc_device_foreach (...)
{
    dc_dive_t *dive = NULL;

    while (!finished) {
       if (dive && --dive->refcount == 0) {
          /* We just dropped the last reference and
             we can safely re-use the previous dive. */
       } else {
          /* The application still holds a reference to the
             previous dive and we have to allocate a new one. */
          dc_dive_new (&dive, ...);
       }

       callback (dive, userdata);
    }

    dc_dive_free (dive);
}

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...
    /* Optionally keep a reference. */
    dc_dive_ref (dive);
}

It's of course slightly more complex to implement, and we have to make 
sure the reference counting is thread-safe too using atomic operations 
or protected with a mutex), but I think the resulting api will be more 
intuitive. Reference counting is pretty standard today.

D. Mixed ownership

The last option is to allow different ownership rules for the dive and 
sample objects. For example we could choose "applications owns" for the 
dives, and "library owns" for the samples. In that case, the dives can 
be used outside the callback function, while their is no unnecessary 
overhead for the samples. The obvious downside is less consistency.

2. Querying available sample values?
=====================================

With the current design, the samples are pushed to the application, and 
there is no need to know which values are actually supported. You'll 
only get those values that are actually present. However, with the new 
design, the application has to call an accessor function to retrieve a 
piece of data, and thus it needs some way to query which values are 
presented. The only values that are guaranteed to be available for each 
sample are time and depth. Everything else is optional.

The level at which we can tell whether a particular sample value is 
present varies. It can be at the device level (for example a non-air 
integrated model will never store tank pressure data), at the dive level 
(for example a model with a wireless pressure sensor can have dives with 
and without tank pressure stored), or even at the sample level (for 
example a tank pressure every x sample). To support all possible cases 
this querying should thus be done at the sample level.

I think the simplest solution is to use the accessor function itself to 
probe for the presence of the info. You just call the accessor function 
and if the info isn't available, a special error code like UNSUPPORTED 
is returned:

if (dc_sample_get_depth (sample, &depth) == DC_STATUS_SUCCESS) {
    /* Depth retrieved successfully. */
} else {
    /* No depth available (or an error occurred). */
}

For multivalued data, like the tank pressure, we could then use an 
extra index parameter in the accessor function, and then call it 
repeatedly until the UNSUPPORTED error is returned:

i = 0;
while (dc_sample_get_pressure (sample, &pressure, i) == 
DC_STATUS_SUCCESS) {
    /* Tank pressure retrieved successfully. */

    i++;
}

An alternative solution for the multivalued data would be to retrieve 
all values at once using an array:

dc_sample_get_pressure (dc_sample_t *sample, double pressure[], 
unsigned int n);

But then the caller somehow has to know the maximum value for n, and 
the api also has to communicate back the actual number of values 
returned. The second part is easy with an extra output parameter, but I 
don't know a nice solution for the first problem (other than a second 
dc_sample_get_pressure_count function).

Alternatives for the probing approach outlined above, are an iterator 
like interface to iterator over the available sample values, or a 
function to retrieve a bitmap with the available sample values.

3. Supported dive and sample data
==================================

Which dive data and sample data should we support?

I already made a tentative list, but this is open for discussion of 
course. Items can be added or removed after discussion.

Dives:
  * date/time
  * divetime
  * maxdepth
  * gasmixes
  * tank pressures (begin/end)
  * temperature
  * salinity, altitude/atmospheric pressure
  * dive mode (open/closed circuit, gauge, etc)

Samples:
  * time
  * depth
  * temperature
  * tank pressure
  * heartrate
  * airtime

Events:
  * Gas changes
  * Deco stops (depth and time)
  * Warnings, bookmarks, etc

For the events, I'm not sure yet how the querying (see previous item) 
should be done. The normal samples are continuous values, and will 
usually appear at least every X sample, or not at all. But the events 
are discrete in nature and will only appear occasionally. Should this be 
done differently in the api?

We'll also have to agree on a data structure and the exact semantics 
for each of those pieces of data. Some won't need much discussion, but 
others will. For example take the temperature. A device can record a 
single value which could be the minimum, maximum or average temperature. 
It can have all three, or a full temperature profile. How do we deal 
with that? Things are not always as easy as they may look.

Closely related to the data structures is the choice of the unit 
system. The choice is either metric or SI. SI units have the advantage 
all values are positive, but applications are likely to use metric or 
imperial. We can also consider to introduce a unit system as used in 
subsurface, where all values are scaled to fit into an integer (e.g. 
depths in millimeters instead of meters, and so on)? Or are we happy 
with floating point values?

That's it for now. Comments are welcome! If you think I missed 
something important, don't hesitate to bring it up. It's not unlikely 
that I forgot about something :-)

[1] http://sourceforge.net/mailarchive/message.php?msg_id=29475503

Jef