API redesign progress

Fri Jun 29 14:54:54 UTC 2012

Hi Linus,

Allow me to start this email with some background information on the 
libdivecomputer project first.

The current libdivecomputer api has been build around the assumption of 
a "pipeline" model. In this model, there is a "device" component which 
takes care of the communication with the dive computer, and produces raw 
dives as the output. The "parser" component consumes those raw dives and 
translates them into some more structured data. The key concept is that 
both components are completely independent of each other. This provides 
a great amount of flexibility, because you are not limited to a single 
scenario.

I agree that most applications (subsurface, divinglog, etc) are only 
interested in the final data, and don't care about the extra flexibility 
or the intermediate data. But that doesn't mean there are no 
applications that do care. I can give you several examples.

One such example is macdive. Currently it uses only the device 
functionality, but not the libdivecomputer parser. The reason for that 
is partly legacy (the libdivecomputer parser wasn't ready back then) and 
partly because macdive supports some highly device specific features, 
which are unlikely to make it into the libdivecomputer api. Therefore 
access to the raw dive data is important.

Or, imagine the "dctool" command line application that I have been 
planning to write (but for which I had no time yet). The idea was to 
have a separate processes for downloading and parsing:

dctool download <args> | dctool parse <args>

The dives and their metadata would be serialized to some application 
defined data format, and read again by the parser tool. This is a very 
simple illustration of the offline parsing case, where there is no 
active connection with the device.

And last but not least, there is the scenario where an application uses 
the raw dive data (together with the associated metadata) as its 
internal data format. Again this requires offline parsing support. This 
may sound a bit strange at first, but if you think about it, it has 
several advantages. First of all it's the most compact format possible, 
and you never loose a single bit of information. If the application 
gains new features or is capable of parsing more information, you can 
apply that to all your old dives too. This is something you can't do if 
you only care about the parsed representation. Migrating to another 
application becomes easier too.

Anyway, the reason why I'm writing all this is to make clear there are 
perfectly valid scenarios besides the typical scenario used by 
subsurface. I don't mind adapting the api to something that makes the 
common scenario easier to use, but at the same time I don't want to drop 
the possibility to support alternative scenarios. The same goes for the 
other features you are not using or don't care about (e.g. fingerprint 
support). And yes I'm aware applications doing this kind of offline 
parsing will need some additional knowledge, but then that's the 
responsibility of the app developer, not the libdivecomputer api.

Having said all that, let's get back to your proposal! It basically 
comes down to a design where the dive callback function is changed into 
something like this:

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...
    dc_dive_get_datetime (dive, ...);
    dc_dive_get_something (dive, ...);
    dc_dive_get_anotherthing (dive, ...);
    ...
}

where the new dc_dive_t object is conceptually a container for storing 
the dive data, combined with a built-in parser. I don't have any 
particular arguments against such a change. I do have a couple of 
questions/remarks regarding the actual design and implementation. Let's 
go over them one by one.

1. Access to raw dive/fingerprint data

As you mentioned before, this can be achieved with a 
dc_dive_get_rawdata() and dc_dive_get_fingerprint() function, so that 
shouldn't be a problem.

2. Ownership and lifetime

I'm convinced the lifetime of the dc_device_t object should not be 
limited to the scope of the callback function. Because if you do that, 
you can't maintain a reference to the object for use outside the 
callback function. For example a perfectly valid scenario would be to 
append the dive object to a list inside the callback, and process them 
afterwards, when the download has finished.

This would require that the dive object is allocated by the device 
object, but not owned by it, and therefore should be freed by the 
application (e.g a dc_dive_free function is needed):

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...
    dc_dive_free (dive);
}

A side effect is that the dive object can't maintain a back reference 
to the device object, because the device connection might be closed 
before the dive object is freed:

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    mylist.add (dive);
}

dc_device_open (&device, ...);
dc_device_foreach (device, dive_cb);
dc_device_close (device);

foreach dive in mylist {
    ...
    dc_dive_free (dive);
}

I think that's actually a good thing, because leaving the device 
connection open longer than necessary is bad practice (e.g. higher power 
consumption for most devices). For the implementation this won't cause 
any problems at all.

Note that this is consistent with the device enumeration interface, 
where you have to call dc_descriptor_free() for each descriptor you 
obtain. I'm aware that the current implementation of the 
dc_descriptor_free() function is just an empty stub, but that may change 
some day.

3. Thread safety

I intended to introduce a new library context object to eliminate all 
global state (something which is present today in the logging code). 
Full thread safety could then be achieved by requiring that every thread 
operates on its own independent context. The context object is also 
convenient to perform any library initialization and shutdown tasks. 
However the new dc_dive_t object would cause some serious trouble here.

Assume you have an application that does the downloading on a worker 
thread and the processing of the dives on the main thread. According to 
the rule above, contexts shouldn't be shared between threads, and the 
worker thread will have to create its own context and pass it to the 
dc_device_open() function. Now, because the dive object is created 
directly by the device object, it will automatically inherit the same 
context. That means we won't be able to hand over the dive object to the 
main thread, without breaking our threading rule.

/* Shared data (synchronization omitted) */

queue_t queue;

/* Main thread */

worker_thread_run ();

while (1) {
    dc_dive_t dive = queue.pop ();

    /* Ouch, this dive is associated with
     * the context of the worker thread! */

    dc_dive_free (dive);
}

/* Worker thread */

int
dive_cb (dc_dive_t *dive, void *userdata)
{
    queue.push (dive);
}

dc_context_t *context = NULL;
dc_device_t *device = NULL;

dc_context_new (&context);

dc_device_open (&device, context, ...);
dc_device_foreach (device, dive_cb, ...);
dc_device_close (device);

dc_context_free (context);

The only workaround I can think of would be to make the context object 
itself thread-safe (e.g. with an internal mutex), so it can be shared 
safely between multiple threads. The disadvantage is that this approach 
would introduce some additional complexity and runtime overhead, 
compared to the simple thread-safety rule that we had first. We may also 
have to implement reference counting for the context object to make 
tracking its lifetime a bit easier and less error prone.

Of course the application will still need proper synchronization to 
protect the shared data (e.g. the queue in the pseudo code above).

4. Offline parsing

For the offline parsing scenario, I think we can do something very 
similar to what I proposed before for the dc_parser_t api:

dc_status_t
dc_dive_new (dc_dive_t **dive,
              const char *data, unsigned int size,
              dc_family_t type,
              const dc_event_devinfo_t *devinfo,
              const dc_event_clock_t *clock);

With this function we can create a dive object manually for the raw 
dive data and its metadata. Applications that have no need for this 
functionallity can just ignore this function.

5. Accessor functions

Is there a preference to have separate accessor functions instead of 
the ioctl like dc_dive_get_field() function? The main argument in favor 
for the ioctl like function was/is that it's a bit easier to add support 
for new fields. No new functions are required, just a new enum value. If 
you try to run a (compiled) application against an older version of the 
library, it would just return an unsupported error, instead of 
complaining about unresolved symbols. On the other hand it's a very ugly 
interface, and very easy to misuse (e.g. no type checking at all). So 
maybe we should switch to separate functions?

Jef