Hi Linus,
Allow me to start this email with some background information on the libdivecomputer project first.
The current libdivecomputer api has been build around the assumption of a "pipeline" model. In this model, there is a "device" component which takes care of the communication with the dive computer, and produces raw dives as the output. The "parser" component consumes those raw dives and translates them into some more structured data. The key concept is that both components are completely independent of each other. This provides a great amount of flexibility, because you are not limited to a single scenario.
I agree that most applications (subsurface, divinglog, etc) are only interested in the final data, and don't care about the extra flexibility or the intermediate data. But that doesn't mean there are no applications that do care. I can give you several examples.
One such example is macdive. Currently it uses only the device functionality, but not the libdivecomputer parser. The reason for that is partly legacy (the libdivecomputer parser wasn't ready back then) and partly because macdive supports some highly device specific features, which are unlikely to make it into the libdivecomputer api. Therefore access to the raw dive data is important.
Or, imagine the "dctool" command line application that I have been planning to write (but for which I had no time yet). The idea was to have a separate processes for downloading and parsing:
dctool download <args> | dctool parse <args>
The dives and their metadata would be serialized to some application defined data format, and read again by the parser tool. This is a very simple illustration of the offline parsing case, where there is no active connection with the device.
And last but not least, there is the scenario where an application uses the raw dive data (together with the associated metadata) as its internal data format. Again this requires offline parsing support. This may sound a bit strange at first, but if you think about it, it has several advantages. First of all it's the most compact format possible, and you never loose a single bit of information. If the application gains new features or is capable of parsing more information, you can apply that to all your old dives too. This is something you can't do if you only care about the parsed representation. Migrating to another application becomes easier too.
Anyway, the reason why I'm writing all this is to make clear there are perfectly valid scenarios besides the typical scenario used by subsurface. I don't mind adapting the api to something that makes the common scenario easier to use, but at the same time I don't want to drop the possibility to support alternative scenarios. The same goes for the other features you are not using or don't care about (e.g. fingerprint support). And yes I'm aware applications doing this kind of offline parsing will need some additional knowledge, but then that's the responsibility of the app developer, not the libdivecomputer api.
Having said all that, let's get back to your proposal! It basically comes down to a design where the dive callback function is changed into something like this:
int dive_cb (dc_dive_t *dive, void *userdata) { ... dc_dive_get_datetime (dive, ...); dc_dive_get_something (dive, ...); dc_dive_get_anotherthing (dive, ...); ... }
where the new dc_dive_t object is conceptually a container for storing the dive data, combined with a built-in parser. I don't have any particular arguments against such a change. I do have a couple of questions/remarks regarding the actual design and implementation. Let's go over them one by one.
1. Access to raw dive/fingerprint data
As you mentioned before, this can be achieved with a dc_dive_get_rawdata() and dc_dive_get_fingerprint() function, so that shouldn't be a problem.
2. Ownership and lifetime
I'm convinced the lifetime of the dc_device_t object should not be limited to the scope of the callback function. Because if you do that, you can't maintain a reference to the object for use outside the callback function. For example a perfectly valid scenario would be to append the dive object to a list inside the callback, and process them afterwards, when the download has finished.
This would require that the dive object is allocated by the device object, but not owned by it, and therefore should be freed by the application (e.g a dc_dive_free function is needed):
int dive_cb (dc_dive_t *dive, void *userdata) { ... dc_dive_free (dive); }
A side effect is that the dive object can't maintain a back reference to the device object, because the device connection might be closed before the dive object is freed:
int dive_cb (dc_dive_t *dive, void *userdata) { mylist.add (dive); }
dc_device_open (&device, ...); dc_device_foreach (device, dive_cb); dc_device_close (device);
foreach dive in mylist { ... dc_dive_free (dive); }
I think that's actually a good thing, because leaving the device connection open longer than necessary is bad practice (e.g. higher power consumption for most devices). For the implementation this won't cause any problems at all.
Note that this is consistent with the device enumeration interface, where you have to call dc_descriptor_free() for each descriptor you obtain. I'm aware that the current implementation of the dc_descriptor_free() function is just an empty stub, but that may change some day.
3. Thread safety
I intended to introduce a new library context object to eliminate all global state (something which is present today in the logging code). Full thread safety could then be achieved by requiring that every thread operates on its own independent context. The context object is also convenient to perform any library initialization and shutdown tasks. However the new dc_dive_t object would cause some serious trouble here.
Assume you have an application that does the downloading on a worker thread and the processing of the dives on the main thread. According to the rule above, contexts shouldn't be shared between threads, and the worker thread will have to create its own context and pass it to the dc_device_open() function. Now, because the dive object is created directly by the device object, it will automatically inherit the same context. That means we won't be able to hand over the dive object to the main thread, without breaking our threading rule.
/* Shared data (synchronization omitted) */
queue_t queue;
/* Main thread */
worker_thread_run ();
while (1) { dc_dive_t dive = queue.pop ();
/* Ouch, this dive is associated with * the context of the worker thread! */
dc_dive_free (dive); }
/* Worker thread */
int dive_cb (dc_dive_t *dive, void *userdata) { queue.push (dive); }
dc_context_t *context = NULL; dc_device_t *device = NULL;
dc_context_new (&context);
dc_device_open (&device, context, ...); dc_device_foreach (device, dive_cb, ...); dc_device_close (device);
dc_context_free (context);
The only workaround I can think of would be to make the context object itself thread-safe (e.g. with an internal mutex), so it can be shared safely between multiple threads. The disadvantage is that this approach would introduce some additional complexity and runtime overhead, compared to the simple thread-safety rule that we had first. We may also have to implement reference counting for the context object to make tracking its lifetime a bit easier and less error prone.
Of course the application will still need proper synchronization to protect the shared data (e.g. the queue in the pseudo code above).
4. Offline parsing
For the offline parsing scenario, I think we can do something very similar to what I proposed before for the dc_parser_t api:
dc_status_t dc_dive_new (dc_dive_t **dive, const char *data, unsigned int size, dc_family_t type, const dc_event_devinfo_t *devinfo, const dc_event_clock_t *clock);
With this function we can create a dive object manually for the raw dive data and its metadata. Applications that have no need for this functionallity can just ignore this function.
5. Accessor functions
Is there a preference to have separate accessor functions instead of the ioctl like dc_dive_get_field() function? The main argument in favor for the ioctl like function was/is that it's a bit easier to add support for new fields. No new functions are required, just a new enum value. If you try to run a (compiled) application against an older version of the library, it would just return an unsupported error, instead of complaining about unresolved symbols. On the other hand it's a very ugly interface, and very easy to misuse (e.g. no type checking at all). So maybe we should switch to separate functions?
Jef