Hi,
For the next release, I'm planning to introduce a new dc_dive_t and dc_sample_t interface. This will again be a major change to the libdivecomputer api. Please note that the interface presented in this email is by no means a final version. It's intended to serve as a starting point for further discussions. Now, let's have a closer look.
The role of the new dc_dive_t object has been discussed before [1], so I'll just summarize here. In a nutshell, the new dc_dive_t object is basically a container for the raw dive data, with the parser functionality built in. As a result, there is no longer any need to create a dc_parser_t object explicitly.
The dc_device_foreach() function will be modified to return dc_dive_t objects directly in the callback, instead of the raw dive data. The raw dive data (and fingerprint) can be retrieved from the dc_dive_t object by means of the dc_dive_get_raw() and dc_dive_get_fingerprint() functions. The dive metadata can be retrieved with a number of accessor functions, similar to the current dc_parser_t interface:
int
dive_cb (dc_dive_t *dive, void *userdata)
{
    /* Get raw dive and fingerprint data. */
    dc_dive_get_raw (dive, &data, &size);
    dc_dive_get_fingerprint (dive, &fpdata, &fpsize);

    /* Get dive metadata directly. */
    dc_dive_get_datetime (dive, &datetime);
    dc_dive_get_maxdepth (dive, &maxdepth);
    ...
}
Note that this new dc_dive_t interface will also simplify the transition from a callback based interface to an iterator based interface, because the dc_iterator_next() function can now return the dc_dive_t object:
dc_dive_t *dive;
dc_iterator_t *iterator;

dc_device_iterator (device, &iterator);
while (dc_iterator_next (iterator, &dive) == DC_STATUS_SUCCESS) {
    /* Contents of the callback function goes here. */
}
dc_iterator_free (iterator);
Although this will be something for later, we can already keep this in mind of course.
For parsing the samples, I have something very similar in mind. With the current interface, the application receives a stream of sample values (e.g. time, depth, etc), and the time sample serves as the marker to start a new sample. However, with the new interface, the sample values would be accumulated internally in the new dc_sample_t object. Once the entire sample has been parsed, the sample is passed to the callback function. As a result, there is no need anymore to treat the time sample as special, and the individual sample values can be retrieved with a number of accessor functions:
void
sample_cb (dc_sample_t *sample, void *userdata)
{
    dc_sample_get_time (sample, &time);
    dc_sample_get_depth (sample, &depth);
    ...
}
With these new dive and sample objects it's also possible to add vendor specific extensions in the future. We just need a new accessor function and a corresponding data structure.
So far the overview of the high level design that I had in mind. The remainder of this email consists of more detailed information on a number of technical challenges that will have to be addressed in this new design.
1. Object ownership, lifetime and memory management.
====================================================
How should we deal with the ownership of the new objects?
With the current api, the application has to call the dc_parser_new() function explicitly, and there is no doubt that the application is the owner of the returned object. However, because the new dive and sample objects will be allocated internally, their ownership is not immediately obvious. It all depends on the internal implementation. There are a couple of options, each with their own advantages and disadvantages:
A. Library owns
The first option is to make the library the owner of the objects. In this case, the object is allocated internally, passed to the callback function, and destroyed internally again:
dc_device_foreach (...)
{
    while (!finished) {
        dc_dive_t *dive = NULL;
        dc_dive_new (&dive, ...);

        callback (dive, userdata);

        dc_dive_free (dive);
    }
}
int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...
}
The consequence is that the lifetime of the object is limited to the scope of the callback function, and an application won't be able to keep a reference to the object. This will be problematic for some legitimate scenarios. For example storing the objects in a list for later processing (e.g. after all dives have been downloaded), or passing objects between threads (e.g. download the dives in a worker thread, and then pass them to the main GUI thread to display them in real-time).
The only way to keep a (valid) reference to the object is to make a copy. In the current design, this works fine because the raw data is trivial to copy. It's just a byte array after all. But the new dive/sample objects are opaque objects and can't be copied unless a helper function is provided to make a deep copy (e.g. a dc_dive_clone() function). But that's not exactly the most elegant interface if you ask me.
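To illustrate why such a clone function is awkward: it has to deep-copy the opaque object, including every internal buffer. A minimal sketch, where the struct layout and the dc_dive_clone() name are assumptions for illustration, not the real libdivecomputer API:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the real dc_dive_t would be opaque. */
typedef struct {
    unsigned char *raw; /* raw dive data */
    size_t size;
} dc_dive_t;

/* Deep copy: duplicates the object and its raw data buffer. */
static dc_dive_t *
dc_dive_clone (const dc_dive_t *dive)
{
    dc_dive_t *copy = malloc (sizeof (*copy));
    if (copy == NULL)
        return NULL;

    copy->raw = malloc (dive->size);
    if (copy->raw == NULL) {
        free (copy);
        return NULL;
    }

    memcpy (copy->raw, dive->raw, dive->size);
    copy->size = dive->size;
    return copy;
}
```

Every internal buffer has to be duplicated like this, so the clone function gets less elegant as the object grows.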
B. Application owns
The second option is to make the application the owner of the objects. In this case, the object is allocated internally, passed to the callback function, and should be destroyed by the application, not the library:
dc_device_foreach (...)
{
    while (!finished) {
        dc_dive_t *dive = NULL;
        dc_dive_new (&dive, ...);

        callback (dive, userdata);
    }
}
int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...

    dc_dive_free (dive);
}
This may work well for those scenarios where you really need to take the ownership, but it adds unnecessary overhead when you don't need it, and there is no way to avoid this extra overhead. Take the sample parsing for example. I think it's safe to assume that most applications will convert the sample data to their native format on the fly, and thus never keep a reference to the libdivecomputer sample objects. The result is that a large number of sample objects will be allocated and immediately destroyed again. Thus many memory allocations for no good reason. I don't like that. We should primarily aim for a good design, but it doesn't hurt to keep an eye on the efficiency too.
Note that in my code snippet for the previous "library owns" case, there is an equal number of memory allocations taking place. But those can easily be avoided by allocating just a single object and re-using it every time. However, for the "application owns" case no such optimization is possible.
C. Reference counting
An alternative option would be to make the new objects reference counted. This doesn't change the fact that we have to decide whether the application has to take a reference explicitly (as in case A) or gains it automatically (as in case B). But with the reference count, we can detect whether the application still holds a reference or not, and avoid unnecessary memory allocations:
dc_device_foreach (...)
{
    dc_dive_t *dive = NULL;

    while (!finished) {
        if (dive && --dive->refcount == 0) {
            /* We just dropped the last reference and we
               can safely re-use the previous dive. */
        } else {
            /* The application still holds a reference to
               the previous dive and we have to allocate a
               new one. */
            dc_dive_new (&dive, ...);
        }

        callback (dive, userdata);
    }

    dc_dive_free (dive);
}
int
dive_cb (dc_dive_t *dive, void *userdata)
{
    ...

    /* Optionally keep a reference. */
    dc_dive_ref (dive);
}
It's of course slightly more complex to implement, and we have to make sure the reference counting is thread-safe too (using atomic operations or a mutex), but I think the resulting api will be more intuitive. Reference counting is pretty standard today.
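A thread-safe reference count can be a single atomic integer. A minimal sketch, assuming C11 atomics; the dc_dive_new/dc_dive_ref/dc_dive_unref names mirror the proposal but are assumptions, not the real API:

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_int refcount;
    /* ... dive data ... */
} dc_dive_t;

/* New objects start with a single reference. */
static dc_dive_t *
dc_dive_new (void)
{
    dc_dive_t *dive = malloc (sizeof (*dive));
    if (dive != NULL)
        atomic_init (&dive->refcount, 1);
    return dive;
}

static void
dc_dive_ref (dc_dive_t *dive)
{
    atomic_fetch_add (&dive->refcount, 1);
}

/* Returns 1 if this call dropped the last reference (object freed). */
static int
dc_dive_unref (dc_dive_t *dive)
{
    if (atomic_fetch_sub (&dive->refcount, 1) == 1) {
        free (dive);
        return 1;
    }
    return 0;
}
```

The foreach loop can then check the unref return value to decide whether the object can be re-used or a new one must be allocated.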
D. Mixed ownership
The last option is to allow different ownership rules for the dive and sample objects. For example we could choose "application owns" for the dives, and "library owns" for the samples. In that case, the dives can be used outside the callback function, while there is no unnecessary overhead for the samples. The obvious downside is less consistency.
2. Querying available sample values?
====================================
With the current design, the samples are pushed to the application, and there is no need to know which values are actually supported. You'll only get those values that are actually present. However, with the new design, the application has to call an accessor function to retrieve a piece of data, and thus it needs some way to query which values are present. The only values that are guaranteed to be available for each sample are time and depth. Everything else is optional.
The level at which we can tell whether a particular sample value is present varies. It can be at the device level (for example a non-air integrated model will never store tank pressure data), at the dive level (for example a model with a wireless pressure sensor can have dives with and without tank pressure stored), or even at the sample level (for example a tank pressure stored only every x samples). To support all possible cases, this querying should thus be done at the sample level.
I think the simplest solution is to use the accessor function itself to probe for the presence of the info. You just call the accessor function and if the info isn't available, a special error code like UNSUPPORTED is returned:
if (dc_sample_get_depth (sample, &depth) == DC_STATUS_SUCCESS) {
    /* Depth retrieved successfully. */
} else {
    /* No depth available (or an error occurred). */
}
For multivalued data, like the tank pressure, we could use an extra index parameter in the accessor function, and call it repeatedly until the UNSUPPORTED error is returned:
i = 0;
while (dc_sample_get_pressure (sample, &pressure, i) == DC_STATUS_SUCCESS) {
    /* Tank pressure retrieved successfully. */
    i++;
}
An alternative solution for the multivalued data would be to retrieve all values at once using an array:
dc_sample_get_pressure (dc_sample_t *sample, double pressure[], unsigned int n);
But then the caller somehow has to know the maximum value for n, and the api also has to communicate back the actual number of values returned. The second part is easy with an extra output parameter, but I don't know a nice solution for the first problem (other than a second dc_sample_get_pressure_count function).
Alternatives to the probing approach outlined above are an iterator-like interface to iterate over the available sample values, or a function to retrieve a bitmap with the available sample values.
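The bitmap variant could look like this sketch; the flag names, the available field and dc_sample_has() are assumptions for illustration only:

```c
/* Hypothetical bitmap of available sample values. */
typedef enum {
    DC_SAMPLE_TIME        = 1 << 0,
    DC_SAMPLE_DEPTH       = 1 << 1,
    DC_SAMPLE_TEMPERATURE = 1 << 2,
    DC_SAMPLE_PRESSURE    = 1 << 3
} dc_sample_value_t;

typedef struct {
    unsigned int available; /* OR-ed dc_sample_value_t flags */
} dc_sample_t;

/* Returns nonzero if the given value is present in this sample. */
static int
dc_sample_has (const dc_sample_t *sample, dc_sample_value_t value)
{
    return (sample->available & value) != 0;
}
```

An application would test the bitmap once per sample and only call the accessors for values that are actually present.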
3. Supported dive and sample data
=================================
Which dive data and sample data should we support?
I already made a tentative list, but this is open for discussion of course. Items can be added or removed after discussion.
Dives:
 * date/time
 * divetime
 * maxdepth
 * gasmixes
 * tank pressures (begin/end)
 * temperature
 * salinity, altitude/atmospheric pressure
 * dive mode (open/closed circuit, gauge, etc)
Samples:
 * time
 * depth
 * temperature
 * tank pressure
 * heartrate
 * airtime
Events:
 * Gas changes
 * Deco stops (depth and time)
 * Warnings, bookmarks, etc
For the events, I'm not sure yet how the querying (see previous item) should be done. The normal samples are continuous values, and will usually appear at least every X sample, or not at all. But the events are discrete in nature and will only appear occasionally. Should this be done differently in the api?
We'll also have to agree on a data structure and the exact semantics for each of those pieces of data. Some won't need much discussion, but others will. For example take the temperature. A device can record a single value which could be the minimum, maximum or average temperature. It can have all three, or a full temperature profile. How do we deal with that? Things are not always as easy as they may look.
Closely related to the data structures is the choice of the unit system. The choice is either metric or SI. SI units have the advantage that all values are positive, but applications are likely to use metric or imperial. We could also consider introducing a unit system like the one used in subsurface, where all values are scaled to fit into an integer (e.g. depths in millimeters instead of meters, and so on). Or are we happy with floating point values?
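The subsurface-style integer units boil down to a fixed scale factor per quantity. A minimal sketch for depth, where the type and function names are assumptions for illustration:

```c
/* Depth stored in millimeters, so every realistic (non-negative)
   value fits comfortably in an int with no rounding surprises. */
typedef struct {
    int mm;
} depth_t;

/* Convert from meters, rounding to the nearest millimeter. */
static depth_t
depth_from_meters (double meters)
{
    depth_t depth = { (int)(meters * 1000.0 + 0.5) };
    return depth;
}

static double
depth_to_meters (depth_t depth)
{
    return depth.mm / 1000.0;
}
```

The same pattern would apply to pressures (e.g. millibar) and temperatures (e.g. millikelvin), at the cost of a conversion step at the API boundary.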
That's it for now. Comments are welcome! If you think I missed something important, don't hesitate to bring it up. It's not unlikely that I forgot about something :-)
[1] http://sourceforge.net/mailarchive/message.php?msg_id=29475503
Jef
On 15.11.2012 13:16, Jef Driesen wrote:
Note that this new dc_dive_t interface will also simplify the transition from a callback based interface to an iterator based interface, because the dc_iterator_next() function can now return the dc_dive_t object:
dc_dive_t *dive;
dc_iterator_t *iterator;

dc_device_iterator (device, &iterator);
while (dc_iterator_next (iterator, &dive) == DC_STATUS_SUCCESS) {
    /* Contents of the callback function goes here. */
}
dc_iterator_free (iterator);
dc_device_iterator sounds like it would iterate devices. Perhaps it could be called dc_device_iterate_dives, or dc_device_dive_iterator, or some variant thereof?
- Object ownership, lifetime and memory management.
=====================================================
How should we deal with the ownership of the new objects?
With the current api, the application has to call the dc_parser_new() function explicitly, and there is no doubt that the application is the owner of the returned object. However, because the new dive and sample objects will be allocated internally, their ownership is not immediately obvious. It all depends on the internal implementation. There are a couple of options, each with their own advantages and disadvantages:
What about utilizing the object hierarchy? Make the device object own the dives, and each dive object own its samples. This gives the application control over their lifetime, but doesn't require freeing everything separately.
The downside is a larger memory footprint while parsing, since all dives would be in memory simultaneously. I don't think it's an issue, but if it is, you could provide a dc_device_discard_dives function to get rid of unwanted dives.
On the other hand, this would save some work when iterating over the same dives a second time.
- Querying available sample values?
=====================================
I think the simplest solution is to use the accessor function itself to probe for the presence of the info. You just call the accessor function and if the info isn't available, a special error code like UNSUPPORTED is returned:
if (dc_sample_get_depth (sample, &depth) == DC_STATUS_SUCCESS) {
    /* Depth retrieved successfully. */
} else {
    /* No depth available (or an error occurred). */
}
Are there any legitimate error conditions other than that piece of data not being present in the sample? I try to avoid out parameters due to the greater amount of scaffolding required in calling code when compared to using the return value, but then again I mostly write C++ which provides out-of-band error handling with exceptions.
But then the caller somehow has to know the maximum value for n, and the api also has to communicate back the actual number of values returned. The second part is easy with an extra output parameter, but I don't know a nice solution for the first problem (other than a second dc_sample_get_pressure_count function).
A typical solution is to allow passing a null pointer for the array, in which case no data is written to it but the count is still returned.
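That convention could look like the sketch below; the stored/stored_n parameters stand in for the sample's internal data, and the whole signature is an assumption for illustration:

```c
#include <stddef.h>

/* Pass pressure == NULL to query the count without copying anything;
   the return value is always the number of values available. */
static unsigned int
dc_sample_get_pressure (const double *stored, unsigned int stored_n,
                        double pressure[], unsigned int n)
{
    if (pressure != NULL) {
        unsigned int count = (n < stored_n) ? n : stored_n;
        for (unsigned int i = 0; i < count; i++)
            pressure[i] = stored[i];
    }
    return stored_n;
}
```

The application first calls with NULL to size its buffer, then a second time to fetch the values.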
Alternatives to the probing approach outlined above are an iterator-like interface to iterate over the available sample values, or a function to retrieve a bitmap with the available sample values.
I went with the bitmap approach in a similar scenario. It works well enough and produces more compact code.
- Supported dive and sample data
==================================
- airtime
Does some computer actually report this? If tank pressure is also reported, airtime can be calculated from that. In my experience it's not too stable over the course of the dive, so it would be of little value, but perhaps others disagree.
For the events, I'm not sure yet how the querying (see previous item) should be done. The normal samples are continuous values, and will usually appear at least every X sample, or not at all. But the events are discrete in nature and will only appear occasionally. Should this be done differently in the api?
I'd add a separate API for iterating events. They're different enough from normal samples that trying to force them into one API is only going to cause headache down the road.
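A separate event API could be a small iterator along these lines; the event type names, structs and dc_event_iterator_next() are assumptions for illustration, not a proposal for the final API:

```c
#include <stddef.h>

typedef enum {
    DC_EVENT_GASCHANGE,
    DC_EVENT_DECOSTOP,
    DC_EVENT_BOOKMARK
} dc_event_type_t;

typedef struct {
    dc_event_type_t type;
    unsigned int time; /* seconds since the start of the dive */
} dc_event_t;

typedef struct {
    const dc_event_t *events;
    size_t count;
    size_t position;
} dc_event_iterator_t;

/* Returns 1 and fills *out while events remain, 0 at the end. */
static int
dc_event_iterator_next (dc_event_iterator_t *iterator, dc_event_t *out)
{
    if (iterator->position >= iterator->count)
        return 0;
    *out = iterator->events[iterator->position++];
    return 1;
}
```

Since events carry a timestamp, the application can still correlate them with the regular samples while keeping the two APIs independent.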
We'll also have to agree on a data structure and the exact semantics for each of those pieces of data. Some won't need much discussion, but others will. For example take the temperature. A device can record a single value which could be the minimum, maximum or average temperature. It can have all three, or a full temperature profile. How do we deal with that? Things are not always as easy as they may look.
Minimum, maximum and average values belong in the dive, not in the samples. I dunno what to do if it's not known which of those a particular computer provides, though.
On 2012-11-15 17:56, Mikko Rasa wrote:
On 15.11.2012 13:16, Jef Driesen wrote:
Note that this new dc_dive_t interface will also simplify the transition from a callback based interface to an iterator based interface, because the dc_iterator_next() function can now return the dc_dive_t object:
dc_dive_t *dive;
dc_iterator_t *iterator;

dc_device_iterator (device, &iterator);
while (dc_iterator_next (iterator, &dive) == DC_STATUS_SUCCESS) {
    /* Contents of the callback function goes here. */
}
dc_iterator_free (iterator);
dc_device_iterator sounds like it would iterate devices. Perhaps it could be called dc_device_iterate_dives, or dc_device_dive_iterator, or some variant thereof?
It's just a tentative name and can be changed.
- Object ownership, lifetime and memory management.
=====================================================
How should we deal with the ownership of the new objects?
With the current api, the application has to call the dc_parser_new() function explicitly, and there is no doubt that the application is the owner of the returned object. However, because the new dive and sample objects will be allocated internally, their ownership is not immediately obvious. It all depends on the internal implementation. There are a couple of options, each with their own advantages and disadvantages:
What about utilizing the object hierarchy? Make the device object own the dives, and each dive object own its samples. This gives the application control over their lifetime, but doesn't require freeing everything separately.
The downside is a larger memory footprint while parsing, since all dives would be in memory simultaneously. I don't think it's an issue, but if it is, you could provide a dc_device_discard_dives function to get rid of unwanted dives.
On the other hand, this would save some work when iterating over the same dives a second time.
This is a bad idea. Some devices consume significantly more power when connected to the host system, and therefore the connection should be closed as soon as possible. But if the device object owns the dives, the connection can't be closed until the application is done with the dives.
- Querying available sample values?
=====================================
I think the simplest solution is to use the accessor function itself to probe for the presence of the info. You just call the accessor function and if the info isn't available, a special error code like UNSUPPORTED is returned:
if (dc_sample_get_depth (sample, &depth) == DC_STATUS_SUCCESS) {
    /* Depth retrieved successfully. */
} else {
    /* No depth available (or an error occurred). */
}
Are there any legitimate error conditions other than that piece of data not being present in the sample? I try to avoid out parameters due to the greater amount of scaffolding required in calling code when compared to using the return value, but then again I mostly write C++ which provides out-of-band error handling with exceptions.
Real errors are rare, but they can happen. For the majority of the devices our knowledge is based on reverse engineering, so there is always a possibility that there are errors in our knowledge. Thus the most common parsing error is that we encounter some data that doesn't match our expectations (e.g. DC_STATUS_DATAFORMAT).
In theory we could catch some of those errors when constructing the dive/sample object, but then the error is pretty much fatal. I mean if the dc_xxx_foreach (or dc_iterator_next) function fails, there is no way to proceed anymore with the downloading or parsing. But in many cases it's still possible to move to the next dive or sample.
Take for example the Suunto data format. The dives are stored in a linked list, so you need zero knowledge of the actual dive format to locate and download the dives. Even if some of the dives will fail to parse correctly (for example when the data format has been changed to support a new feature), that has no impact on the ability to download the dives. If this data format error is detected by the parser while constructing the dive object, we have two options:
1. Fail to construct the dive object. This leaves the foreach() function no other option than to abort the downloading. The result is that a minor problem with just one dive will cause the entire download to fail.
2. Construct the dive object anyway in an error state. The application will still get the dive, and downloading will continue. It may not be able to parse this dive completely, but depending on the severity of the error you may still be able to get some info. For example the raw dive data will always be available. Some of the fields may still be parse-able too. But this only works if the individual accessor functions can return an error.
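Option 2 implies that each accessor reports its own status, so a dive constructed in an error state can still serve the fields that did parse. A sketch of that behaviour; the status codes mirror the discussion, but the struct fields are assumptions for illustration:

```c
typedef enum {
    DC_STATUS_SUCCESS = 0,
    DC_STATUS_UNSUPPORTED = -1,
    DC_STATUS_DATAFORMAT = -2
} dc_status_t;

typedef struct {
    int have_maxdepth;    /* did this field parse successfully? */
    double maxdepth;
    int dataformat_error; /* did the parser hit unexpected data? */
} dc_dive_t;

/* Per-field status: a partially parsed dive still yields the
   fields that are available, instead of failing as a whole. */
static dc_status_t
dc_dive_get_maxdepth (const dc_dive_t *dive, double *value)
{
    if (!dive->have_maxdepth)
        return dive->dataformat_error ?
            DC_STATUS_DATAFORMAT : DC_STATUS_UNSUPPORTED;

    *value = dive->maxdepth;
    return DC_STATUS_SUCCESS;
}
```

The download loop never has to abort: a broken dive simply answers DC_STATUS_DATAFORMAT for the fields it couldn't parse.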
Note that in the current design this is less of a problem, because downloading the dives and creating a parser are completely decoupled. A failure in creating a parser will never affect the downloading.
But then the caller somehow has to know the maximum value for n, and the api also has to communicate back the actual number of values returned. The second part is easy with an extra output parameter, but I don't know a nice solution for the first problem (other than a second dc_sample_get_pressure_count function).
A typical solution is to allow passing a null pointer for the array, in which case no data is written to it but the count is still returned.
Alternatives to the probing approach outlined above are an iterator-like interface to iterate over the available sample values, or a function to retrieve a bitmap with the available sample values.
I went with the bitmap approach in a similar scenario. It works well enough and produces more compact code.
- Supported dive and sample data
==================================
- airtime
Does some computer actually report this? If tank pressure is also reported, airtime can be calculated from that. In my experience it's not too stable over the course of the dive, so it would be of little value, but perhaps others disagree.
There are several devices which support it. Uwatec, Oceanic, and probably a few others too. It may not be the most critical piece of information, but I do get requests for it. So while it's not on my must-have list, it's something we can consider adding.
You would be surprised what all those dive computers support. We definitely won't support everything, but we want to make a good trade-off between what we consider important and what is available.
For the events, I'm not sure yet how the querying (see previous item) should be done. The normal samples are continuous values, and will usually appear at least every X sample, or not at all. But the events are discrete in nature and will only appear occasionally. Should this be done differently in the api?
I'd add a separate API for iterating events. They're different enough from normal samples that trying to force them into one API is only going to cause headache down the road.
Do you have a suggestion for the api?
We'll also have to agree on a data structure and the exact semantics for each of those pieces of data. Some won't need much discussion, but others will. For example take the temperature. A device can record a single value which could be the minimum, maximum or average temperature. It can have all three, or a full temperature profile. How do we deal with that? Things are not always as easy as they may look.
Minimum, maximum and average values belong in the dive, not in the samples. I dunno what to do if it's not known which of those a particular computer provides, though.
I was referring to the dive temperature here, not the sample temperature. Indeed, it makes no sense to have min/max/avg for the samples.
Jef
Jef Driesen jefdriesen@telenet.be writes:
Are there any legitimate error conditions other than that piece of data not being present in the sample? I try to avoid out parameters due to the greater amount of scaffolding required in calling code when compared to using the return value, but then again I mostly write C++ which provides out-of-band error handling with exceptions.
Real errors are rare, but they can happen. For the majority of the devices our knowledge is based on reverse engineering, so there is always a possibility that there are errors in our knowledge. Thus the most common parsing error is that we encounter some data that doesn't match our expectations (e.g. DC_STATUS_DATAFORMAT).
And that is a frequent thing, as DC vendors often have no regard for backward compatibility when they make incompatible changes in new versions of a product. So libdivecomputer's resilience to that happening is a FEATURE and something we should strive for.
In theory we could catch some of those errors when constructing the dive/sample object, but then the error is pretty much fatal. I mean if the dc_xxx_foreach (or dc_iterator_next) function fails, there is no way to proceed anymore with the downloading or parsing. But in many cases it's still possible to move to the next dive or sample.
Take for example the Suunto data format. The dives are stored in a linked list, so you need zero knowledge of the actual dive format to locate and download the dives. Even if some of the dives will fail to parse correctly (for example when the data format has been changed to support a new feature), that has no impact on the ability to download the dives. If this data format error is detected by the parser while constructing the dive object, we have two options:
1. Fail to construct the dive object. This leaves the foreach() function no other option than to abort the downloading. The result is that a minor problem with just one dive will cause the entire download to fail.

2. Construct the dive object anyway in an error state. The application will still get the dive, and downloading will continue. It may not be able to parse this dive completely, but depending on the severity of the error you may still be able to get some info. For example the raw dive data will always be available. Some of the fields may still be parse-able too. But this only works if the individual accessor functions can return an error.
And if we are lucky we can still get at least the basics (depth and time) and do a somewhat useful download even if more advanced information can't be processed.
Note that in the current design this is less of a problem, because downloading the dives and creating a parser are completely decoupled. A failure in creating a parser will never affect the downloading.
And that's a property we should maintain.
- Supported dive and sample data
==================================
- airtime
Does some computer actually report this? If tank pressure is also reported, airtime can be calculated from that. In my experience it's not too stable over the course of the dive, so it would be of little value, but perhaps others disagree.
There are several devices which support it. Uwatec, Oceanic, and probably a few others too. It may not be the most critical piece of information, but I do get requests for it. So while it's not on my must-have list, it's something we can consider adding.
I like having it - but struggle with how to visualize it in the dive log application. I tried a few things that all ended up looking kinda odd :-)
You would be surprised what all those dive computers support. We definitely won't support everything, but we want to make a good trade-off between what we consider important and what is available.
As I mentioned before - we should standardize everything where we can create some well defined semantic (and possible variations) that we can provide to the application. The less needs to be parsed from raw data the better (but we should still maintain the ability for the app to parse the raw data itself if all else fails).
So while I don't know how I'd visualize the airtime, I definitely encourage you to have this as a field that you support. And if different DCs have different semantics (is this to "tank empty" or to a threshold pressure? What's that pressure? Does it include the ascent, with or without safety stop, or does it assume you stay at this depth until your tank is empty (brilliant plan...)?), this should be handled by flags to make sure that the meaning of the value is well defined.
/D
On Thu, 2012-11-15 at 12:16 +0100, Jef Driesen wrote:
- Object ownership, lifetime and memory management.
=====================================================
How should we deal with the ownership of the new objects?
With the current api, the application has to call the dc_parser_new() function explicitly, and there is no doubt that the application is the owner of the returned object. However, because the new dive and sample objects will be allocated internally, their ownership is not immediately obvious. It all depends on the internal implementation. There are a couple of options, each with their own advantages and disadvantages:
A. Library owns
The first option is to make the library the owner of the objects. In this case, the object is allocated internally, passed to the callback function, and destroyed internally again:
[...]
The only way to keep a (valid) reference to the object is to make a copy. In the current design, this works fine because the raw data is trivial to copy. It's just a byte array after all. But the new dive/sample objects are opaque objects and can't be copied unless a helper function is provided to make a deep copy (e.g. a dc_dive_clone() function). But that's not exactly the most elegant interface if you ask me.
That seems annoying to me. I won't furiously object to it, but I'm not a huge fan of the idea.
B. Application owns
The second option is to make the application the owner of the objects. In this case, the object is allocated internally, passed to the callback function, and should be destroyed by the application, not the library:
[...]
This may work well for those scenarios where you really need to take the ownership, but it adds unnecessary overhead when you don't need it, and there is no way to avoid this extra overhead. Take the sample parsing for example. I think it's safe to assume that most applications will convert the sample data to their native format on the fly, and thus never keep a reference to the libdivecomputer sample objects. The result is that a large number of sample objects will be allocated and immediately destroyed again. Thus many memory allocations for no good reason. I don't like that. We should primarily aim for a good design, but it doesn't hurt to keep an eye on the efficiency too.
I think you are overestimating the cost of this - I think this is a very clean design that is easy to understand and easy to get right on the application side. Definitely my favorite.
Note that in my code snippet for the previous "library owns" case, there is an equal number of memory allocations taking place. But those can easily be avoided by allocating just a single object and re-using it every time. However, for the "application owns" case no such optimization is possible.
Well, your C library will take care of this for you. The cost of allocations is for all intents and purposes negligible.
C. Reference counting
An alternative option would be to make the new objects reference counted. This changes nothing to the fact that we have to decide whether the application has to take a reference explicitly (e.g. case A) or gains it automatically (e.g. case B). But with the reference count, we can detect whether the application still holds a reference or not, and avoid unnecessary memory allocations:
[...]
It's of course slightly more complex to implement, and we have to make sure the reference counting is thread-safe too (using atomic operations, or protected with a mutex), but I think the resulting api will be more intuitive. Reference counting is pretty standard today.
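A minimal sketch of what such reference counting could look like, using C11 atomics. The dc_object_* names are hypothetical (the real API would also hide the struct behind an opaque pointer):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical sketch: these names are not part of the current API. */
typedef struct dc_object_t {
    atomic_int refcount;
    /* ... object data ... */
} dc_object_t;

static dc_object_t *
dc_object_new (void)
{
    dc_object_t *object = malloc (sizeof (dc_object_t));
    if (object)
        atomic_init (&object->refcount, 1); /* Caller holds the first reference. */
    return object;
}

static void
dc_object_ref (dc_object_t *object)
{
    atomic_fetch_add (&object->refcount, 1);
}

static void
dc_object_unref (dc_object_t *object)
{
    /* atomic_fetch_sub returns the previous value, so 1 means this
     * was the last reference and the object can be destroyed. */
    if (atomic_fetch_sub (&object->refcount, 1) == 1)
        free (object);
}
```

The library can then check the count after the callback returns: if the application took no extra reference, the same object can be recycled for the next sample instead of being freed and reallocated.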
Yuck, yuck, yuck. It's standard and universally hated. It creates significantly more pointless code and boilerplate on both sides of the API for no real benefit.
And it's a wonderful way to create subtle, really hard to debug problems - you forget to reference the object and keep using it - Valgrind won't find that you are using 'freed' memory because it isn't freed - it just changes values under you. I absolutely HATE this idea.
Please make the memory application owned and have the application be responsible for freeing memory. Unless you are planning to run libdivecomputer on a wristwatch the overhead for that is negligible with today's OSs (that all have a reasonably smart malloc implementation that will reuse the recently freed memory block on its own).
D. Mixed ownership
The last option is to allow different ownership rules for the dive and sample objects. For example we could choose "application owns" for the dives, and "library owns" for the samples. In that case, the dives can be used outside the callback function, while there is no unnecessary overhead for the samples. The obvious downside is less consistency.
I vote for consistency any time.
- Querying available sample values?
=====================================
With the current design, the samples are pushed to the application, and there is no need to know which values are actually supported. You'll only get those values that are actually present. However, with the new design, the application has to call an accessor function to retrieve a piece of data, and thus it needs some way to query which values are present. The only values that are guaranteed to be available for each sample are time and depth. Everything else is optional.
The level at which we can tell whether a particular sample value is present varies. It can be at the device level (for example a non-air integrated model will never store tank pressure data), at the dive level (for example a model with a wireless pressure sensor can have dives with and without tank pressure stored), or even at the sample level (for example a tank pressure every x sample).
Or you have a Suunto dive computer and the sample data is randomly missing... :-/
To support all possible cases this querying should thus be done at the sample level.
I think the simplest solution is to use the accessor function itself to probe for the presence of the info. You just call the accessor function and if the info isn't available, a special error code like UNSUPPORTED is returned:
if (dc_sample_get_depth (sample, &depth) == DC_STATUS_SUCCESS) {
    /* Depth retrieved successfully. */
} else {
    /* No depth available (or an error occurred). */
}
For multivalued data, like the tank pressure, we could then use an extra index parameter in the accessor function, and then call it repeatedly until the UNSUPPORTED error is returned:
i = 0;
while (dc_sample_get_pressure (sample, &pressure, i) == DC_STATUS_SUCCESS) {
    /* Tank pressure retrieved successfully. */
    i++;
}
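A self-contained mock of this probing pattern, with a hypothetical dc_sample_t that simply wraps an array of pressures. The struct layout and DC_STATUS_UNSUPPORTED value are stand-ins for illustration:

```c
#include <stddef.h>

typedef enum {
    DC_STATUS_SUCCESS = 0,
    DC_STATUS_UNSUPPORTED = -1
} dc_status_t;

/* Hypothetical sample holding zero or more tank pressures. */
typedef struct {
    unsigned int ntanks;
    double pressure[3]; /* bar */
} dc_sample_t;

/* Probing accessor: returns UNSUPPORTED once the index runs past the
 * available values, so the caller can loop until failure. */
static dc_status_t
dc_sample_get_pressure (const dc_sample_t *sample, double *value, unsigned int idx)
{
    if (idx >= sample->ntanks)
        return DC_STATUS_UNSUPPORTED;
    *value = sample->pressure[idx];
    return DC_STATUS_SUCCESS;
}

/* The caller can also count the values by probing, which removes the
 * need for a separate dc_sample_get_pressure_count function. */
static unsigned int
count_pressures (const dc_sample_t *sample)
{
    double pressure;
    unsigned int i = 0;
    while (dc_sample_get_pressure (sample, &pressure, i) == DC_STATUS_SUCCESS)
        i++;
    return i;
}
```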
An alternative solution for the multivalued data would be to retrieve all values at once using an array:
dc_status_t dc_sample_get_pressure (dc_sample_t *sample, double pressure[], unsigned int n);
But then the caller somehow has to know the maximum value for n, and the api also has to communicate back the actual number of values returned. The second part is easy with an extra output parameter, but I don't know a nice solution for the first problem (other than a second dc_sample_get_pressure_count function).
Alternatives to the probing approach outlined above are an iterator-like interface to iterate over the available sample values, or a function to retrieve a bitmap with the available sample values.
I was hoping for a somewhat different design.
int dc_sample_get_basic(sample, basic_sample);
returns DC_STATUS_SUCCESS if you got a sample back, various errors otherwise.
basic_sample_t {
    int time;                // in seconds
    int depth;               // in mm (or you can do a double in meters)
    uint64_t data_available; // bitmap on what else is available
}
Then your header files define those bits and the corresponding data structures.
#define DC_PRESSURE_SAMPLE (1 << 0)

dc_pressure_sample_t {
    int nr_tanks;
    int *tankpressure;
}
So this tells you how many pressure samples there are and returns a pointer to the array of pressures (or in the typical case to the single pressure) - an integer in mbar or a double in bar or whatever.
And for simple stuff just have an int (or you can go fancy and define types for all of them).
#define DC_TEMPERATURE_SAMPLE (1 << 1)

typedef int dc_temperature_sample_t; // temperature in mkelvin (or whatever you want)
This way ONE call gets you the basics that every dive computer supports and at the same time the information about what else is available for THIS sample. And you can keep things API compatible by simply adding new bits whenever you add a new piece of information that you can get in a sample - the app can then do
#ifdef DC_DECO_SAMPLE
...
#endif
and work with older and newer versions of libdivecomputer...
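A runnable sketch of that proposal, with illustrative bit values and field widths (only time, depth and the bitmap are guaranteed; everything else is fetched only if its bit is set):

```c
#include <stdint.h>

/* Sketch of the proposal above; names and bit positions are illustrative. */
#define DC_PRESSURE_SAMPLE    (UINT64_C(1) << 0)
#define DC_TEMPERATURE_SAMPLE (UINT64_C(1) << 1)

typedef struct {
    int time;                /* seconds */
    int depth;               /* mm */
    uint64_t data_available; /* bitmap of what else is present */
} basic_sample_t;

/* One call fills in time/depth plus the availability bitmap; the app
 * then asks only for the optional values whose bits are set. */
static int
has_sample_data (const basic_sample_t *sample, uint64_t bit)
{
    return (sample->data_available & bit) != 0;
}
```

Adding a new optional value later only requires defining a new bit, so existing applications keep working unchanged.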
- Supported dive and sample data
==================================
Which dive data and sample data should we support?
I already made a tentative list, but this is open for discussion of course. Items can be added or removed after discussion.
Dives:
- date/time
- divetime
length of dive as reported by DC?
- maxdepth
- gasmixes
That's variable length, right?
- tank pressures (begin/end)
ditto
- temperature
air temperature, I assume?
- salinity, altitude/atmospheric pressure
- dive mode (open/closed circuit, gauge, etc)
Samples:
- time
- depth
- temperature
- tank pressure
- heartrate
- airtime
That seems like a good list. Heartrate is the one that seems a bit out of place - are there many computers that support that? I would have seen CNS, NDL as more likely candidates for the default list. Also, your tank pressure contains the tank index as well, right?
Events:
- Gas changes
as we discussed in earlier email - I think having the tank index in the sample is a better solution...
- Deco stops (depth and time)
- Warnings, bookmarks, etc
For the events, I'm not sure yet how the querying (see previous item) should be done. The normal samples are continuous values, and will usually appear at least every X sample, or not at all. But the events are discrete in nature and will only appear occasionally. Should this be done differently in the api?
We could do this with the bitmap proposed above, right?
We'll also have to agree on a data structure and the exact semantics for each of those pieces of data. Some won't need much discussion, but others will. For example take the temperature. A device can record a single value which could be the minimum, maximum or average temperature. It can have all three, or a full temperature profile. How do we deal with that? Things are not always as easy as they may look.
I think the best way to deal with that is to have a flags bitfield that is part of every sample. Be generous, at least 64 bits. And simply have a clearly defined default semantic (whatever Suunto uses, or some other common DC) and whenever there's a different semantic for such a value, add a flag bit that again has a clear semantic.
So for example by default temperature is the minimum temperature. And then there can be a DC_DATA_TEMP_IS_AVERAGE bit that tells you that instead this is the average temperature.
So you have one of those flags bitfield in the dive data, and one in each sample. And you do your darnedest to make sure that there is a precise definition of the default meaning of each value, and a precise definition of how the bits modify that.
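As a sketch, assuming an illustrative dive_data_t struct (only the DC_DATA_TEMP_IS_AVERAGE flag name is taken from the proposal above; everything else is made up):

```c
#include <stdint.h>

/* Semantic-modifier flags; bit positions are illustrative. */
#define DC_DATA_TEMP_IS_AVERAGE (UINT64_C(1) << 0)

typedef struct {
    int temperature_mk;  /* millikelvin; by DEFAULT the minimum temperature */
    uint64_t flags;      /* modifiers for the default meanings */
} dive_data_t;

/* Default semantic: the temperature is the minimum, unless a flag
 * says it should be interpreted differently. */
static const char *
temperature_semantics (const dive_data_t *dive)
{
    if (dive->flags & DC_DATA_TEMP_IS_AVERAGE)
        return "average";
    return "minimum";
}
```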
Closely related to the data structures is the choice of the unit system. The choice is either metric or SI. SI units have the advantage that all values are positive, but applications are likely to use metric or imperial. We can also consider introducing a unit system as used in subsurface, where all values are scaled to fit into an integer (e.g. depths in millimeters instead of meters, and so on). Or are we happy with floating point values?
I'm obviously biased here... but in the end, I'll be happy with floats or integers. But please, please, please, ONLY USE SI-related values. It's the application's job to convert this into whatever insane preferences the user has. Distance in ft, volume in liter, temperature in C, pressure in psi... whatever. But libdivecomputer should ALWAYS give us SI-related units (I say related because one can argue if C or K is the right temperature value - we went with mK because that gives us plenty of precision in an unsigned integer - but one might reasonably argue that float centigrade is an equally valid SI-related unit... just don't allow F or other random non SI units...). If the DC is silly enough to return data in the unit system that the user has set up, then libdivecomputer should convert that into sane units.
That's it for now. Comments are welcome! If you think I missed something important, don't hesitate to bring it up. It's not unlikely that I forgot about something :-)
I like the overall direction. And I'm glad that we can discuss this here before it is set in stone... the current API has caused us some frustration here and there and this is a great opportunity to create something that can be maintained in a backwards compatible manner for a long time to come...
/D
On 2012-11-15 19:43, Dirk Hohndel wrote:
On Thu, 2012-11-15 at 12:16 +0100, Jef Driesen wrote:
C. Reference counting
An alternative option would be to make the new objects reference counted. This changes nothing to the fact that we have to decide whether the application has to take a reference explicitly (e.g. case A) or gains it automatically (e.g. case B). But with the reference count, we can detect whether the application still holds a reference or not, and avoid unnecessary memory allocations:
[...]
It's of course slightly more complex to implement, and we have to make sure the reference counting is thread-safe too (using atomic operations, or protected with a mutex), but I think the resulting api will be more intuitive. Reference counting is pretty standard today.
Yuck, yuck, yuck. It's standard and universally hated. It creates significantly more pointless code and boilerplate on both sides of the API for no real benefit.
And it's a wonderful way to create subtle, really hard to debug problems
- you forget to reference the object and keep using it - Valgrind
won't find that you are using 'freed' memory because it isn't freed - it just changes values under you. I absolutely HATE this idea.
Personally, I don't think reference counting is that bad. On the application side it's just a few extra dc_object_ref calls, and dc_object_unref instead of dc_object_free. And you can equally well shoot yourself in the foot with plain malloc. But okay, your preference is noted :-)
Please make the memory application owned and have the application be responsible for freeing memory. Unless you are planning to run libdivecomputer on a wristwatch the overhead for that is negligible with today's OSs (that all have a reasonably smart malloc implementation that will reuse the recently freed memory block on its own).
I certainly agree that this is probably the most intuitive and simple design.
But while I'm not too concerned about the memory allocations by themselves, I still find it a bit silly having to allocate resources where it's not strictly necessary. For the sample data I simply can't think of any use-case where you would want to hold on to the sample objects outside the callback function. Hence my search for possible alternatives.
D. Mixed ownership
The last option is to allow different ownership rules for the dive and sample objects. For example we could choose "application owns" for the dives, and "library owns" for the samples. In that case, the dives can be used outside the callback function, while there is no unnecessary overhead for the samples. The obvious downside is less consistency.
I vote for consistency any time.
I usually prefer consistency too...
The idea originated from the fact that there exist two types of containers. In a traditional container (e.g. array, linked list, etc) the elements are already owned by the container. So when you iterate such a container, you just get a direct reference to the element in the container. There is no need to deal with ownership anywhere, because that's already taken care of.
But in our case, the containers act more like generators, where the elements are produced on-the-fly. There it makes sense to have the application own the generated items. However, for the samples, you can consider a dive as a traditional container for the samples. The samples are already present, they just need to be wrapped in a device independent structure.
- Querying available sample values?
=====================================
With the current design, the samples are pushed to the application, and there is no need to know which values are actually supported. You'll only get those values that are actually present. However, with the new design, the application has to call an accessor function to retrieve a piece of data, and thus it needs some way to query which values are present. The only values that are guaranteed to be available for each sample are time and depth. Everything else is optional.
The level at which we can tell whether a particular sample value is present varies. It can be at the device level (for example a non-air integrated model will never store tank pressure data), at the dive level (for example a model with a wireless pressure sensor can have dives with and without tank pressure stored), or even at the sample level (for example a tank pressure every x sample).
Or you have a Suunto dive computer and the sample data is randomly missing... :-/
What do you mean?
To support all possible cases this querying should thus be done at the sample level.
I think the simplest solution is to use the accessor function itself to probe for the presence of the info. You just call the accessor function and if the info isn't available, a special error code like UNSUPPORTED is returned:
if (dc_sample_get_depth (sample, &depth) == DC_STATUS_SUCCESS) {
    /* Depth retrieved successfully. */
} else {
    /* No depth available (or an error occurred). */
}
For multivalued data, like the tank pressure, we could then use an extra index parameter in the accessor function, and then call it repeatedly until the UNSUPPORTED error is returned:
i = 0;
while (dc_sample_get_pressure (sample, &pressure, i) == DC_STATUS_SUCCESS) {
    /* Tank pressure retrieved successfully. */
    i++;
}
An alternative solution for the multivalued data would be to retrieve all values at once using an array:
dc_status_t dc_sample_get_pressure (dc_sample_t *sample, double pressure[], unsigned int n);
But then the caller somehow has to know the maximum value for n, and the api also has to communicate back the actual number of values returned. The second part is easy with an extra output parameter, but I don't know a nice solution for the first problem (other than a second dc_sample_get_pressure_count function).
Alternatives to the probing approach outlined above are an iterator-like interface to iterate over the available sample values, or a function to retrieve a bitmap with the available sample values.
I was hoping for a somewhat different design.
int dc_sample_get_basic(sample, basic_sample);
returns DC_STATUS_SUCCESS if you got a sample back, various errors otherwise.
basic_sample_t {
    int time;                // in seconds
    int depth;               // in mm (or you can do a double in meters)
    uint64_t data_available; // bitmap on what else is available
}
Then your header files define those bits and the corresponding data structures.
#define DC_PRESSURE_SAMPLE (1 << 0)

dc_pressure_sample_t {
    int nr_tanks;
    int *tankpressure;
}
So this tells you how many pressure samples there are and returns a pointer to the array of pressures (or in the typical case to the single pressure) - an integer in mbar or a double in bar or whatever.
And for simple stuff just have an int (or you can go fancy and define types for all of them).
#define DC_TEMPERATURE_SAMPLE (1 << 1)

typedef int dc_temperature_sample_t; // temperature in mkelvin (or whatever you want)
This way ONE call gets you the basics that every dive computer supports and at the same time the information about what else is available for THIS sample. And you can keep things API compatible by simply adding new bits whenever you add a new piece of information that you can get in a sample
- the app can then do
#ifdef DC_DECO_SAMPLE
...
#endif
and work with older and newer versions of libdivecomputer...
I also considered using a bitmap (it was mentioned briefly somewhere in my mail I think), but I couldn't find a way to make it work with vendor extensions. A bitfield is limited to 8 to 64 different types, depending on the integer size we decide to use. For the normal samples, that is probably more than we'll ever need. But if every backend can support a few extension types, and each one needs another bit, then we'll quickly run out of bits. Do you have a solution for this?
While I'm definitely not opposed to the bitmap idea, I'm not all that happy with the basic_sample_t struct. I would prefer to implement this with separate functions. One for the bitmap itself, and one function for each value. Since there are only two values that are always available, I wouldn't treat them any differently from the others.
For the multivalued data, I'm not sure how well such a struct (containing a length and pointer to an array) will work for language bindings (python, .net, etc). That's something we'll have to check.
I prefer enums over pre-processor constants too. I know this will lose the conditional compilation, but that is a trick that will work for C/C++ only.
- Supported dive and sample data
==================================
Which dive data and sample data should we support?
I already made a tentative list, but this is open for discussion of course. Items can be added or removed after discussion.
Dives:
- date/time
- divetime
length of dive as reported by DC?
Yes, and if such value isn't available, fallback to a value calculated from the samples. Exactly the same as we do today.
- maxdepth
- gasmixes
That's variable length, right?
- tank pressures (begin/end)
ditto
Yes, multiple gases and tanks will be fully supported.
- temperature
air temperature, I assume?
Depends. Different dive computers have different interpretations. It could be air/water temperature, the min/max/avg value, and so on. If that's even known to us. Not sure how to deal with this. We could just supply a temperature, and leave the actual interpretation to the user, or we could have an extra flag with the exact interpretation (if known) as you mention below. But even then we're still left with devices that provide more than one temperature.
For example for the Suunto Vyper family, there is a temperature at the start/end of the dive, and at maximum depth. In this particular case I solved this by injecting artificial temperature values into the samples.
- salinity, altitude/atmospheric pressure
- dive mode (open/closed circuit, gauge, etc)
Samples:
- time
- depth
- temperature
- tank pressure
- heartrate
- airtime
That seems like a good list. Heartrate is the one that seems a bit out of place - are there many computers that support that? I would have seen CNS, NDL as more likely candidates for the default list.
Heartrate is only supported by the Uwatec Galileo as far as I know. It's certainly not a must-have feature, but I can imagine that if you have such a device, this is an interesting value. Although the actual usefulness is questionable if you ask me, but it's a cool gadget :-)
CNS/OTU can be calculated from the samples, but we can consider supporting it too. I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Factor) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
I assume with NDL you mean the remaining no-deco time? I don't think any device supports that. And then you are left with NDL being just the absence of deco stop info.
Also, your tank pressure contains the tank index as well, right?
Of course.
Events:
- Gas changes
as we discussed in earlier email - I think having the tank index in the sample is a better solution...
Indeed.
- Deco stops (depth and time)
- Warnings, bookmarks, etc
For the events, I'm not sure yet how the querying (see previous item) should be done. The normal samples are continuous values, and will usually appear at least every X sample, or not at all. But the events are discrete in nature and will only appear occasionally. Should this be done differently in the api?
We could do this with the bitmap proposed above, right?
What I had in mind, is to have a single DC_SAMPLE_EVENT type. If this bit is set, there are events available. The actual event value would then be another bitmap containing all the events. With this approach we can support several events without needing additional sample types for each event type.
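That could look roughly like this sketch (all names, widths and bit positions are illustrative):

```c
#include <stdint.h>

/* One bit in the sample bitmap says "events are present"... */
#define DC_SAMPLE_EVENT (UINT64_C(1) << 5)

/* ...and the event value itself is a second bitmap with the events. */
#define DC_EVENT_GASCHANGE (UINT32_C(1) << 0)
#define DC_EVENT_DECOSTOP  (UINT32_C(1) << 1)
#define DC_EVENT_BOOKMARK  (UINT32_C(1) << 2)

typedef struct {
    uint64_t data_available; /* which sample values are present */
    uint32_t events;         /* valid only if DC_SAMPLE_EVENT is set */
} sample_t;

/* Several events can be reported in one sample without needing a
 * separate sample type per event. */
static int
sample_has_event (const sample_t *sample, uint32_t event)
{
    return (sample->data_available & DC_SAMPLE_EVENT) != 0
        && (sample->events & event) != 0;
}
```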
We'll also have to agree on a data structure and the exact semantics for each of those pieces of data. Some won't need much discussion, but others will. For example take the temperature. A device can record a single value which could be the minimum, maximum or average temperature. It can have all three, or a full temperature profile. How do we deal with that? Things are not always as easy as they may look.
I think the best way to deal with that is to have a flags bitfield that is part of every sample. Be generous, at least 64 bits. And simply have a clearly defined default semantic (whatever Suunto uses, or some other common DC) and whenever there's a different semantic for such a value, add a flag bit that again has a clear semantic.
So for example by default temperature is the minimum temperature. And then there can be a DC_DATA_TEMP_IS_AVERAGE bit that tells you that instead this is the average temperature.
So you have one of those flags bitfield in the dive data, and one in each sample. And you do your darnedest to make sure that there is a precise definition of the default meaning of each value, and a precise definition of how the bits modify that.
I fear that the use of extra flags will quickly lead to a mess again. We now have this for the temperature, tomorrow we may need extra flags for something else, and before we know it we end up with flags everywhere.
And it doesn't solve the case where there are multiple temperature values either.
Closely related to the data structures is the choice of the unit system. The choice is either metric or SI. SI units have the advantage that all values are positive, but applications are likely to use metric or imperial. We can also consider introducing a unit system as used in subsurface, where all values are scaled to fit into an integer (e.g. depths in millimeters instead of meters, and so on). Or are we happy with floating point values?
I'm obviously biased here... but in the end, I'll be happy with floats or integers. But please, please, please, ONLY USE SI-related values. It's the application's job to convert this into whatever insane preferences the user has. Distance in ft, volume in liter, temperature in C, pressure in psi... whatever. But libdivecomputer should ALWAYS give us SI-related units (I say related because one can argue if C or K is the right temperature value - we went with mK because that gives us plenty of precision in an unsigned integer - but one might reasonably argue that float centigrade is an equally valid SI-related unit... just don't allow F or other random non SI units...). If the DC is silly enough to return data in the unit system that the user has set up, then libdivecomputer should convert that into sane units.
Sorry for the possible confusion, but imperial was never an option. It's either metric or SI. The only difference between the two is the temperature (Celsius vs Kelvin). For the other units it's just a difference in scaling factors (Pascal vs bar).
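In other words, the convertible cases boil down to an offset (temperature) or a scale factor (pressure, depth, etc). An illustrative pair of helpers, using the mK convention mentioned above:

```c
/* Celsius to millikelvin: 0 C = 273.15 K = 273150 mK.
 * (Helper names are illustrative, not library API.) */
static int
celsius_to_mkelvin (double celsius)
{
    return (int) ((celsius + 273.15) * 1000.0 + 0.5); /* round to nearest */
}

/* bar to pascal: 1 bar = 100000 Pa, a pure scale factor. */
static double
bar_to_pascal (double bar)
{
    return bar * 100000.0;
}
```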
That's it for now. Comments are welcome! If you think I missed something important, don't hesitate to bring it up. It's not unlikely that I forgot about something :-)
I like the overall direction. And I'm glad that we can discuss this here before it is set in stone... the current API has caused us some frustration here and there and this is a great opportunity to create something that can be maintained in a backwards compatible manner for a long time to come...
Great.
Jef
Jef Driesen jefdriesen@telenet.be writes:
Yuck, yuck, yuck. It's standard and universally hated. It creates significantly more pointless code and boilerplate on both sides of the API for no real benefit.
And it's a wonderful way to create subtle, really hard to debug problems
- you forget to reference the object and keep using it - Valgrind
won't find that you are using 'freed' memory because it isn't freed - it just changes values under you. I absolutely HATE this idea.
Personally, I don't think reference counting is that bad. On the application side it's just a few extra dc_object_ref calls, and dc_object_unref instead of dc_object_free. And you can equally well shoot yourself in the foot with plain malloc. But okay, your preference is noted :-)
I really want to stress my last point. It's debuggability that I am most concerned about. If the application owns the memory, there are very powerful tools to help you figure out where you shot yourself in the foot (ask me how I know).
Please make the memory application owned and have the application be responsible for freeing memory. Unless you are planning to run libdivecomputer on a wristwatch the overhead for that is negligible with today's OSs (that all have a reasonably smart malloc implementation that will reuse the recently freed memory block on its own).
I certainly agree that this is probably the most intuitive and simple design.
But while I'm not too concerned about the memory allocations by themselves, I still find it a bit silly having to allocate resources where it's not strictly necessary. For the sample data I simply can't think of any use-case where you would want to hold on to the sample objects outside the callback function. Hence my search for possible alternatives.
I understand. But I think you are focusing on the wrong problem. Having a consistent API that treats memory consistently and doesn't require the application to know the internals of your implementation is a HUGE advantage. It makes libdivecomputer easier to use.
"Any memory allocation that libdivecomputer returns to the application is owned by the application and needs to be freed when it is no longer needed".
This is easy to understand, easy to implement, even easy to verify with static analysis tools.
Elegance and best possible algorithms are honorable goals. But consistency and ease of use (and debuggability) are more important, I think.
- Querying available sample values?
=====================================
With the current design, the samples are pushed to the application, and there is no need to know which values are actually supported. You'll only get those values that are actually present. However, with the new design, the application has to call an accessor function to retrieve a piece of data, and thus it needs some way to query which values are present. The only values that are guaranteed to be available for each sample are time and depth. Everything else is optional.
The level at which we can tell whether a particular sample value is present varies. It can be at the device level (for example a non-air integrated model will never store tank pressure data), at the dive level (for example a model with a wireless pressure sensor can have dives with and without tank pressure stored), or even at the sample level (for example a tank pressure every x sample).
Or you have a Suunto dive computer and the sample data is randomly missing... :-/
What do you mean?
Oh - we have several Suunto DCs where the wireless sensor keeps dropping out and randomly you have samples with missing pressure... I was basically agreeing with you, giving another scenario where this could be the case.
I was hoping for a somewhat different design.
int dc_sample_get_basic(sample, basic_sample);
returns DC_STATUS_SUCCESS if you got a sample back, various errors otherwise.
basic_sample_t {
    int time;                // in seconds
    int depth;               // in mm (or you can do a double in meters)
    uint64_t data_available; // bitmap on what else is available
}
Then your header files define those bits and the corresponding data structures.
#define DC_PRESSURE_SAMPLE (1 << 0)

dc_pressure_sample_t {
    int nr_tanks;
    int *tankpressure;
}
So this tells you how many pressure samples there are and returns a pointer to the array of pressures (or in the typical case to the single pressure) - an integer in mbar or a double in bar or whatever.
And for simple stuff just have an int (or you can go fancy and define types for all of them).
#define DC_TEMPERATURE_SAMPLE (1 << 1)

typedef int dc_temperature_sample_t; // temperature in mkelvin (or whatever you want)
This way ONE call gets you the basics that every dive computer supports and at the same time the information what else is available for THIS sample. And you can keep things API compatible by simply adding new bits whenever you add a new piece of information that you can get in a sample
- the app can then do
#ifdef DC_DECO_SAMPLE
...
#endif
and work with older and newer versions of libdivecomputer...
I also considered using a bitmap (it was mentioned briefly somewhere in my mail I think), but I couldn't find a way to make it work with vendor extensions. A bitfield is limited to 8 to 64 different types, depending on the integer size we decide to use. For the normal samples, that is probably more than we'll ever need. But if every backend can support a few extension types, and each one needs another bit, then we'll quickly run out of bits. Do you have a solution for this?
Yes. Sort of. You have one bit in your bitmap that says "vendor extension" and then the vendor extension can use another int (16bit or something) with an enum (so 64k values for 16 bits) to indicate which extension this is. So we don't waste the space in every sample, but if there is an extension we have tons of room to grow and add extensions for new things to come.
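An illustrative encoding of that idea: one reserved bit in the sample bitmap, plus a separate 16-bit extension id (all names and widths are made up for the sketch):

```c
#include <stdint.h>

/* One reserved bit says "a vendor extension is present"... */
#define DC_VENDOR_EXTENSION (UINT64_C(1) << 63)

typedef struct {
    uint64_t data_available;
    uint16_t extension_id;   /* valid only if DC_VENDOR_EXTENSION is set */
} sample_t;

/* ...and the 16-bit id selects WHICH extension, giving 64k possible
 * extensions at the cost of a single bitmap bit. Returns 1 and stores
 * the id if an extension is present, 0 otherwise. */
static int
sample_get_extension (const sample_t *sample, uint16_t *id)
{
    if (!(sample->data_available & DC_VENDOR_EXTENSION))
        return 0;
    *id = sample->extension_id;
    return 1;
}
```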
While I'm definitely not opposed to the bitmap idea, I'm not all that happy with the basic_sample_t struct. I would prefer to implement this with separate functions. One for the bitmap itself, and one function for each value. Since there are only two values that are always available, I wouldn't treat them any differently from the others.
The reason that I thought of treating them differently is that these two values are available in every sample from every DC (otherwise, what's the point of that DC). So you can have one call that gives you what you can always get (time, depth) plus the information about everything else that is available.
But if you'd rather have this be three API calls (get_time, get_depth, get_available_data_bitmap or something), I'll be happy with that, too.
For the multivalued data, I'm not sure how well such a struct (containing a length and pointer to an array) will work for language bindings (python, .net, etc). That's something we'll have to check.
I don't know about .net. In python this would work just fine.
I prefer enums over pre-processor constants too. I know this will lose the conditional compilation, but that is a trick that will work for C/C++ only.
The problem with enums is that you now have to have strong versioning of your API - something which libdivecomputer has not a good history of doing (and I know all the reasons why - don't take this as criticism, just as an observation). So if you use enums then I have to recreate the logic of "which version of libdivecomputer supports which values for this function" in my app - and so does everyone else. We need to basically create a "version number to feature mapping" in the client.
If you do this with #defines then this is much easier to take care of. At least for C/C++ programmers - I have seen other language bindings parse the defines for this as well, but I don't know what the current users of libdivecomputer are.
- Supported dive and sample data
==================================
- temperature
air temperature, I assume?
Depends. Different dive computers have different interpretations. It could be air/water temperature, the min/max/avg value, and so on. If that's even known to us. I'm not sure how to deal with this. We could just supply a temperature and leave the actual interpretation to the user, or we could have an extra flag with the exact interpretation (if known) as you mention below. But even then we're still left with devices that provide more than one temperature.
So offer a couple of values with flags that determine the semantics:
- air_temperature
- min_water_temperature
- avg_water_temperature
And offer flags that clarify semantics
AIR_TEMP_IS_FIRST_SAMPLE
AVG_IS_LAST_TEMP
MIN_TEMP_IS_AT_MAX_DEPTH
(btw, I love the last one as I frequently dive in a water body that for more than half of the year has temperature go UP as you go deeper...)
etc.
For example, for the Suunto Vyper family there is a temperature at the start/end of the dive, and at maximum depth. In this particular case I solved it by injecting artificial temperature values into the samples.
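That injection trick can be sketched like this (a sketch with hypothetical types, not the actual Vyper backend code): the three discrete readings become synthetic samples at the start, at maximum depth, and at the end of the dive.

```c
#include <assert.h>

/* Hypothetical sample type for a single injected temperature reading. */
struct temp_sample {
    unsigned int time;  /* seconds since the start of the dive */
    double celsius;
};

/* Turn the three discrete Vyper-style readings into synthetic samples
 * at t = 0, t = time of maximum depth, and t = dive time. */
static int inject_temperatures(unsigned int divetime, unsigned int maxdepth_time,
                               double t_start, double t_maxdepth, double t_end,
                               struct temp_sample out[3])
{
    out[0].time = 0;             out[0].celsius = t_start;
    out[1].time = maxdepth_time; out[1].celsius = t_maxdepth;
    out[2].time = divetime;      out[2].celsius = t_end;
    return 3; /* number of samples written */
}
```

An application drawing the profile then simply plots a few temperature dots instead of a full curve.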
- salinity, altitude/atmospheric pressure
- dive mode (open/closed circuit, gauge, etc)
Samples:
- time
- depth
- temperature
- tank pressure
- heartrate
- airtime
That seems like a good list. Heartrate is the one that seems a bit out of place - are there many computers that support that? I would have seen CNS, NDL as more likely candidates for the default list.
Heartrate is only supported by the Uwatec Galileo as far as I know. It's certainly not a must-have feature, but I can imagine that if you have such a device, this is an interesting value. The actual usefulness is questionable if you ask me, but it's a cool gadget :-)
So much of what high end DCs provide is more cool than entirely useful - but I agree that we should make it available regardless. Visualization is the application's job :-)
CNS/OTU can be calculated from the samples, but we can consider supporting it too.
Agree on both parts. I know several DCs provide CNS information and would love to be able to use that instead of the internally calculated values in Subsurface.
I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Factor) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
I'll keep singing the same song. Pick one semantic as default and provide clearly specified flags to modify that.
I assume with NDL, you mean the remaining no-deco time? I don't think any device supports that. And then you are left with NDL simply meaning the absence of deco stop info.
Several do. OSTC and Uemis Zurich both give you their current no deco time in either every sample or every nth sample (OSTC).
- Deco stops (depth and time)
- Warnings, bookmarks, etc
For the events, I'm not sure yet how the querying (see previous item) should be done. The normal samples are continuous values, and will usually appear at least every X samples, or not at all. But the events are discrete in nature and will only appear occasionally. Should this be done differently in the api?
We could do this with the bitmap proposed above, right?
What I had in mind, is to have a single DC_SAMPLE_EVENT type. If this bit is set, there are events available. The actual event value would then be another bitmap containing all the events. With this approach we can support several events without needing additional sample types for each event type.
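A minimal sketch of that two-level scheme (the event names are made up for illustration): one DC_SAMPLE_EVENT bit in the sample bitmap, and a second bitmap holding the individual events, so one sample can carry several events at once.

```c
#include <assert.h>

/* Hypothetical event bitmap: each bit is one event type, so a single
 * sample can report several events without extra sample types. */
enum {
    EVENT_ASCENT    = 1u << 0,
    EVENT_DECOSTOP  = 1u << 1,
    EVENT_BOOKMARK  = 1u << 2,
    EVENT_VIOLATION = 1u << 3,
};

/* Count how many events are set in an event bitmap. */
static int count_events(unsigned int events)
{
    int n = 0;
    while (events) {
        n += events & 1u;
        events >>= 1;
    }
    return n;
}
```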
Good idea.
We'll also have to agree on a data structure and the exact semantics for each of those pieces of data. Some won't need much discussion, but others will. For example take the temperature. A device can record a single value which could be the minimum, maximum or average temperature. It can have all three, or a full temperature profile. How do we deal with that? Things are not always as easy as they may look.
I think the best way to deal with that is to have a bitfield of flags that is part of every sample. Be generous, at least 64 bits. And simply have a clearly defined default semantic (whatever Suunto uses, or some other common DC) and whenever there's a different semantic for such a value, add a flag bit that again has a clear semantic.
So for example by default temperature is the minimum temperature. And then there can be a DC_DATA_TEMP_IS_AVERAGE bit that tells you that instead this is the average temperature.
So you have one of those flags bitfield in the dive data, and one in each sample. And you do your darnedest to make sure that there is a precise definition of the default meaning of each value, and a precise definition of how the bits modify that.
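A sketch of that default-plus-modifier idea (DC_DATA_TEMP_IS_AVERAGE is taken from the example above; everything else is hypothetical):

```c
#include <assert.h>

/* Hypothetical flag: by default the temperature is the minimum;
 * this bit changes its meaning to the average temperature. */
#define DC_DATA_TEMP_IS_AVERAGE (1u << 0)

enum temp_kind { TEMP_MINIMUM, TEMP_AVERAGE };

/* Resolve the semantics of the temperature value from the flags. */
static enum temp_kind temp_semantics(unsigned int flags)
{
    return (flags & DC_DATA_TEMP_IS_AVERAGE) ? TEMP_AVERAGE : TEMP_MINIMUM;
}
```

The application only ever needs the default interpretation plus whatever modifier bits it chooses to honor.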
I fear that the use of extra flags will quickly lead to a mess again. We now have this for the temperature, tomorrow we may need extra flags for something else, and before we know it we end up with flags everywhere.
And it doesn't solve the case where there are multiple temperature values either.
See what I wrote above. The moment you don't have precisely defined semantics of the value (even if the semantic is just our current best guess based on reverse engineering) the usefulness of that value to the application declines dramatically. An app is always at liberty to ignore that additional information ("hey, here's some temperature, no idea what it is"), but if it isn't there all that will happen is that apps start to make educated guesses based on the type of DC used. And that's just ugly.
I feel very strongly about that. I think an API that doesn't define what the values it gives you actually mean is really of rather limited use.
Sorry for the possible confusion, but imperial was never an option. It's either metric or SI. The only difference between the two is the temperature (Celsius vs Kelvin). For the other units it's just a difference in scaling factors (Pascal vs bar).
Good!
/D
On 2012-11-16 20:18, Dirk Hohndel wrote:
Jef Driesen jefdriesen@telenet.be writes:
But while I'm not too concerned about the memory allocations by themselves, I still find it a bit silly having to allocate resources where it's not strictly necessary. For the sample data I simply can't think of any use-case where you would want to hold on to the sample objects outside the callback function. Hence my search for possible alternatives.
I understand. But I think you are focusing on the wrong problem. Having a consistent API that treats memory consistently and doesn't require the application to know the internals of your implementation is a HUGE advantage. It makes libdivecomputer easier to use.
"Any memory allocation that libdivecomputer returns to the application is owned by the application and needs to be freed when it is no longer needed".
This is easy to understand, easy to implement, even easy to verify with static analysis tools.
Elegance and best possible algorithms are honorable goals. But consistency and ease of use (and debuggability) are more important, I think.
If the "application owns" is the preferred option, then that's fine for me. No problem.
I also considered using a bitmap (it was mentioned briefly somewhere in my mail I think), but I couldn't find a way to make it work with vendor extensions. A bitfield is limited to 8 to 64 different types, depending on the integer size we decide to use. For the normal samples, that is probably more than we'll ever need. But if every backend can support a few extension types, and each one needs another bit, then we'll quickly run out of bits. Do you have a solution for this?
Yes. Sort of. You have one bit in your bitmap that says "vendor extension" and then the vendor extension can use another int (16bit or something) with an enum (so 64k values for 16 bits) to indicate which extension this is. So we don't waste the space in every sample, but if there is an extension we have tons of room to grow and add extensions for new things to come.
If I understand your proposal correctly, you would have a data structure similar to the existing vendor data:
struct dc_sample_vendor_t {
    unsigned int type;
    unsigned int size;
    const void *data;
};
That would work for one vendor extension per sample, but I don't see how you would make that work nicely in case there can be multiple extensions per sample. It also makes it harder to use a custom data structure for each extension, instead of just a blob of raw data. Yes, you can abuse the void pointer to point to any structure, but that isn't the prettiest interface.
Let's assume that a device supports a heartrate and an airtime value per sample. And assume for a moment those are not supported with the main api.
If we introduce some kind of a generic dc_sample_get_vendor() function, then we have to make sure it can return both the heartrate and airtime values. We can probably solve that in a similar way as you proposed for the tank pressures. However, my idea was that the backend would provide some dc_xxx_sample_get_heartrate() and dc_xxx_sample_get_airtime() functions with a corresponding data structure. But then a single bit to indicate the presence of a vendor extension isn't sufficient. If every such extension needs its own bit in the main bitfield, then we'll quickly run out of bits. There are many backends and many possible extensions, so we have a virtually unlimited number of possible extensions. We would at least need an independent bitmap just for the extensions, or some other solution.
I think this is something we'll have to explore further.
While I'm definitely not opposed to the bitmap idea, I'm not that happy with the basic_sample_t struct. I would prefer to implement this with separate functions: one for the bitmap itself, and one function for each value. Since there are only two values that are always available, I wouldn't treat them any differently from the others.
The reason that I thought of treating them differently is that these two values are available in every sample from every DC (otherwise, what's the point of that DC). So you can have one call that gives you what you can always get (time, depth) plus the information about everything else that is available.
But if you'd rather have this be three API calls (get_time, get_depth, get_available_data_bitmap or something), I'll be happy with that, too.
For the multivalued data, I'm not sure how well such a struct (containing a length and pointer to an array) will work for language bindings (python, .net, etc). That's something we'll have to check.
I don't know about .net. In python this would work just fine.
I'm pretty sure such a structure isn't supported out of the box and will need some manual marshaling on the .net side. So my guess is that it can be done, but it won't be very pretty.
We can maybe consider to provide language bindings for the most popular languages (.net and python) ourselves?
I prefer enums over pre-processor constants too. I know this will lose the conditional compilation, but that is a trick that will work for C/C++ only.
The problem with enums is that you now have to have strong versioning of your API - something which libdivecomputer has not a good history of doing (and I know all the reasons why - don't take this as criticism, just as an observation). So if you use enums then I have to recreate the logic of "which version of libdivecomputer supports which values for this function" in my app - and so does everyone else. We need to basically create a "version number to feature mapping" in the client.
If you do this with #defines then this is much easier to take care of. At least for C/C++ programmers - I have seen other language bindings parse the defines for this as well, but I don't know what the current users of libdivecomputer are.
As far as I know there are at least .NET (divinglog) and python (kenozooid) based applications. I got quite some questions about .NET support, so I assume there will be others too.
Anyway, I think it's very important to not treat these non C/C++ applications as second class citizens, especially because at least one of them (divinglog) is likely the application with the largest userbase. So that means all features should be usable from the language bindings too.
BTW, that's for example one of the main reasons the logging interface doesn't expose a vararg interface, but calls vsnprintf internally to convert the messages to strings. Strings are supported by all languages, varargs are not.
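The vsnprintf trick looks roughly like this (a sketch; the names here are not the real libdivecomputer logging API): the callback only ever receives a finished string, so bindings never need to marshal varargs.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logging callback type: takes a plain string, which every
 * language binding can handle. */
typedef void (*log_func_t)(const char *message);

static char last_message[256];

/* Example callback that just captures the formatted message. */
static void capture(const char *message)
{
    snprintf(last_message, sizeof(last_message), "%s", message);
}

/* The library formats internally with vsnprintf and hands the callback
 * a finished string; varargs never cross the API boundary. */
static void log_emit(log_func_t cb, const char *fmt, ...)
{
    char buffer[256];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buffer, sizeof(buffer), fmt, ap);
    va_end(ap);
    cb(buffer);
}
```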
I guess that once we have decided on the final list of dive and sample data that we will support, the chances that we'll have to introduce additional samples in the future are relatively small. I mean the goal for v1.0 was to maintain backwards compatibility from then on. If we then document the first version number in which every feature appeared, then it shouldn't be that much of a hassle to do:
#if DC_VERSION_CHECK(x,y,z)
#endif
and then language bindings can do the same at runtime.
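For illustration, DC_VERSION_CHECK could be defined along these lines (the version numbers here are placeholders, not a real release):

```c
/* Placeholder version of the library these macros describe. */
#define DC_VERSION_MAJOR 1
#define DC_VERSION_MINOR 2
#define DC_VERSION_MICRO 3

/* True when the library is at least version x.y.z. */
#define DC_VERSION_CHECK(x, y, z) \
    (DC_VERSION_MAJOR > (x) || \
     (DC_VERSION_MAJOR == (x) && (DC_VERSION_MINOR > (y) || \
      (DC_VERSION_MINOR == (y) && DC_VERSION_MICRO >= (z)))))
```

Since the macros expand to integer constants, the same comparison also works in a runtime function that bindings can call.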
But okay, this is something we can discuss further.
- Supported dive and sample data
==================================
- temperature
air temperature, I assume?
Depends. Different dive computers have different interpretations. It could be air/water temperature, the min/max/avg value, and so on. If that's even known to us. I'm not sure how to deal with this. We could just supply a temperature and leave the actual interpretation to the user, or we could have an extra flag with the exact interpretation (if known) as you mention below. But even then we're still left with devices that provide more than one temperature.
So offer a couple of values with flags that determine the semantics:
- air_temperature
- min_water_temperature
- avg_water_temperature
And offer flags that clarify semantics
AIR_TEMP_IS_FIRST_SAMPLE
AVG_IS_LAST_TEMP
MIN_TEMP_IS_AT_MAX_DEPTH
(btw, I love the last one as I frequently dive in a water body that for more than half of the year has temperature go UP as you go deeper...)
To be honest, I don't really like this. This again tastes like adapting our model to that of the dive computers. If we supply three values, then it breaks again if there is some dive computer out there that supports some fourth value. And with those flags, the semantics can be altered in many different ways. I fear this will quickly get messy on the application side.
If you take a step back and think about it, the temperature is just a poor substitute for those devices without a full temperature profile. If you have a full temperature profile, then those min/max/avg values and begin/end values suddenly become a lot less interesting.
So it might be more interesting to have the begin/end temperature and the temperature at maximum depth delivered in the sample data. We already do this today for the Suunto Vyper backend, and I think it works quite well. Instead of a temperature graph, you now draw a few temperature dots. If we do that, we can then reserve the temperature field exclusively for the true single temperature value case.
Heartrate is only supported by the Uwatec Galileo as far as I know. It's certainly not a must-have feature, but I can imagine that if you have such a device, this is an interesting value. The actual usefulness is questionable if you ask me, but it's a cool gadget :-)
So much of what high end DCs provide is more cool than entirely useful - but I agree that we should make it available regardless. Visualization is the application's job :-)
Yes, that's the idea. As long as it's reasonably device independent (e.g. there is no reason why other devices can't include a heartrate monitor, and beats per minute is a standard unit too) and doesn't complicate the api (e.g. if you don't need it, you just don't read the value and that's it), then I have no real objections. The same for other types like CNS/OTU.
But things like for example tissue saturation in the decompression model of device X, or RGBM warning Y, that's a whole other story.
I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Factor) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
I'll keep singing the same song. Pick one semantic as default and provide clearly specified flags to modify that.
But then you introduce extra complexity on the application side. If you get a CNS value, then you have to check all those flags to figure out whether it's a percentage (default), or some scale index, or that weird Suunto OLF value, or...
You can't just take the value anymore and treat it as a percentage. From my point of view that makes the abstraction provided by libdivecomputer less useful. I mean, how different is that from doing a model based interpretation:
if (suunto)
    /* OLF */
else if (oceanic)
    /* Range index */
else
    /* Percentage */
I think we are better off always providing a percentage, and if that is not available then you don't get a CNS value at all.
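That policy could be expressed with a status code rather than flag checks (a sketch with hypothetical names):

```c
enum { DC_OK = 0, DC_UNSUPPORTED = -1 };

/* Hypothetical accessor: hand out a CNS value only when the device
 * reports a true percentage; a Suunto OLF or an Oceanic 1-5 index is
 * reported as "unsupported" instead of a misleading number. */
static int get_cns(int has_percentage, double raw, double *cns)
{
    if (!has_percentage)
        return DC_UNSUPPORTED;
    *cns = raw;
    return DC_OK;
}
```

The application then either gets a percentage it can trust, or nothing at all.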
What I had in mind, is to have a single DC_SAMPLE_EVENT type. If this bit is set, there are events available. The actual event value would then be another bitmap containing all the events. With this approach we can support several events without needing additional sample types for each event type.
Good idea.
Then the next step will be to make a list of the events we want.
See what I wrote above. The moment you don't have precisely defined semantics of the value (even if the semantic is just our current best guess based on reverse engineering) the usefulness of that value to the application declines dramatically. An app is always at liberty to ignore that additional information ("hey, here's some temperature, no idea what it is"), but if it isn't there all that will happen is that apps start to make educated guesses based on the type of DC used. And that's just ugly.
I feel very strongly about that. I think an API that doesn't define what the values it gives you actually mean is really of rather limited use.
I agree, although not as strong as you do.
Let's say we define the meaning of the temperature as the minimum water temperature, which is probably the most common value. But then you happen to have some dive computer which only provides the average water temperature. Since that's all we have, that's what we give to the application and goes into your logbook. Is the small difference in semantics really that important, considering we don't have any other info? I mean to the user it won't make that much difference, because the semantics will remain consistent across all his dives with that device. In a paper logbook you would also just write down the value that you get from your dive computer.
So while I agree that crystal clear semantics would be ideal, I also believe some small deviations are acceptable. You can also put it another way: how can we even be sure about the actual semantics, given the fact that it's all based on reverse engineering? There is always an uncertainty factor present.
I think it gets a lot more dramatic if an application interprets a value incorrectly because it fails to take one of those flags into account. For example, interpreting the Oceanic CNS range index as a percentage, which means it will never go past 5% (or some other small value). Ouch.
Jef
On Wed, 2012-11-21 at 17:57 +0100, Jef Driesen wrote:
I also considered using a bitmap (it was mentioned briefly somewhere in my mail I think), but I couldn't find a way to make it work with vendor extensions. A bitfield is limited to 8 to 64 different types, depending on the integer size we decide to use. For the normal samples, that is probably more than we'll ever need. But if every backend can support a few extension types, and each one needs another bit, then we'll quickly run out of bits. Do you have a solution for this?
Yes. Sort of. You have one bit in your bitmap that says "vendor extension" and then the vendor extension can use another int (16bit or something) with an enum (so 64k values for 16 bits) to indicate which extension this is. So we don't waste the space in every sample, but if there is an extension we have tons of room to grow and add extensions for new things to come.
If I understand your proposal correctly, you would have a data structure similar to the existing vendor data:
struct dc_sample_vendor_t {
    unsigned int type;
    unsigned int size;
    const void *data;
};
Yes, something like that.
That would work for one vendor extension per sample, but I don't see how you would make that work nicely in case there can be multiple extensions per sample. It also makes it harder to use a custom data structure for each extension, instead of just a blob of raw data. Yes, you can abuse the void pointer to point to any structure, but that isn't the prettiest interface.
Let's assume that a device supports a heartrate and an airtime value per sample. And assume for a moment those are not supported with the main api.
If we introduce some kind of a generic dc_sample_get_vendor() function, then we have to make sure it can return both the heartrate and airtime values. We can probably solve that in a similar way as you proposed for the tank pressures. However, my idea was that the backend would provide some dc_xxx_sample_get_heartrate() and dc_xxx_sample_get_airtime() functions with a corresponding data structure. But then a single bit to indicate the presence of a vendor extension isn't sufficient. If every such extension needs its own bit in the main bitfield, then we'll quickly run out of bits. There are many backends and many possible extensions, so we have a virtually unlimited number of possible extensions. We would at least need an independent bitmap just for the extensions, or some other solution.
There are several problems with that idea. Obviously you run out of bits quickly. But more importantly you create a massive amount of function entry points in the library. So your API gets very complex and even worse, for every new feature that you support you add new functions and now you need matching preprocessor code that allows the application to appropriately do conditional compiles...
This brings us straight back to the enum vs. #define discussion we had somewhere else. Think about the users of your library, the people writing applications. It is very unpopular to create a hard link between a specific version of an app and a specific version of a library - especially as more software starts using libdivecomputer. So you should assume that applications want to be able to say "works with libdivecomputer 0.3.0 or later" - but then they need a rational way to test if a specific feature is available.
Why do I mention this here? Because with your function name based approach, you now need a matching feature flag for each function that you introduce. Not my idea of a great API.
Now imagine the other extreme.
dc_sample_get_vendor() returns a structure like this:
struct dc_sample_vendor_t {
    unsigned int type;
    unsigned int size;
    const void *data;
};
The type is a unique number that you assign that precisely describes the data that is delivered. Even better, you can reuse that number if multiple computers deliver the same data... let's assume heartrate... my guess is almost every computer will give that to you as an integer and the backend can pad this so it's 16 bit wide (and deal with other tricks that the dc might do to fit it into the fewest number of bits). So now you have ONE type
#define DC_VSAMPLE_HEARTRATE 0xABC
and data is known to be simply a pointer to the number of beats per minute.
This seems MUCH cleaner than an API that requires you to call
dc_uwatec_get_sample_heartrate()
if the user happens to have an Uwatec computer with heartrate support but
dc_suunto_get_sample_heartrate()
if it's a Suunto that now supports heartrate.
Instead what you do is this.
Foreach(sample) {
    dc_get_sample_basic(&basic_data); // basic_data has time, depth
    while (dc_sample_get_vendor(time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
Now as I said, that's an extreme example and pushes EVERYTHING into the sample_get_vendor logic. What might be more reasonable is to have a few common values (as you suggested earlier):
Foreach(sample) {
    dc_get_sample_basic(&basic_data); // basic_data has time, depth
    if (basic_data.extradatabits & DC_SAMPLE_TANK_PRESSURE)
        dc_get_tank_pressure(time, &tank_pressure);
    if (basic_data.extradatabits & DC_SAMPLE_DECO_NDL)
        dc_get_deco_ndl(time, &deco_ndl);
    while (dc_sample_get_vendor(time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
So we break out the common pieces of information and make them available with their own API functions... but that of course brings us back to the type specific flags:
struct dc_deco_ndl_t {
    int type;
    int depth;    // in mm
    int duration; // in s
    int tts;      // in s
};
Where type then tells you if this dive computer provides no duration, just the duration of the deepest stop, just the time to surface, or both.
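So interpreting the struct could look like this (the flag names are hypothetical; the struct mirrors the dc_deco_ndl_t sketch above):

```c
/* Hypothetical type flags: which optional fields carry valid data. */
#define DECO_HAS_DURATION (1 << 0) /* duration of the deepest stop */
#define DECO_HAS_TTS      (1 << 1) /* time to surface */

struct deco_ndl {
    int type;     /* combination of the flags above */
    int depth;    /* in mm */
    int duration; /* in s */
    int tts;      /* in s */
};

static int deco_has_tts(const struct deco_ndl *d)
{
    return (d->type & DECO_HAS_TTS) != 0;
}

static int deco_has_duration(const struct deco_ndl *d)
{
    return (d->type & DECO_HAS_DURATION) != 0;
}
```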
While I'm definitely not opposed to the bitmap idea, I'm not that happy with the basic_sample_t struct. I would prefer to implement this with separate functions: one for the bitmap itself, and one function for each value. Since there are only two values that are always available, I wouldn't treat them any differently from the others.
The reason that I thought of treating them differently is that these two values are available in every sample from every DC (otherwise, what's the point of that DC). So you can have one call that gives you what you can always get (time, depth) plus the information about everything else that is available.
But if you'd rather have this be three API calls (get_time, get_depth, get_available_data_bitmap or something), I'll be happy with that, too.
For the multivalued data, I'm not sure how well such a struct (containing a length and pointer to an array) will work for language bindings (python, .net, etc). That's something we'll have to check.
I don't know about .net. In python this would work just fine.
I'm pretty sure such a structure isn't supported out of the box and will need some manual marshaling on the .net side. So my guess is that it can be done, but it won't be very pretty.
We can maybe consider to provide language bindings for the most popular languages (.net and python) ourselves?
Not a bad idea - but what is a bad idea is to then go for the least common denominator and create an ugly, badly implemented C API just because it's harder to map a well designed API into .net constructs.
You can always have the .net binding create its own data structures that follow the preferred way to do things in .net - but don't force that onto the C API...
I prefer enums over pre-processor constants too. I know this will lose the conditional compilation, but that is a trick that will work for C/C++ only.
The problem with enums is that you now have to have strong versioning of your API - something which libdivecomputer has not a good history of doing (and I know all the reasons why - don't take this as criticism, just as an observation). So if you use enums then I have to recreate the logic of "which version of libdivecomputer supports which values for this function" in my app - and so does everyone else. We need to basically create a "version number to feature mapping" in the client.
If you do this with #defines then this is much easier to take care of. At least for C/C++ programmers - I have seen other language bindings parse the defines for this as well, but I don't know what the current users of libdivecomputer are.
As far as I know there are at least .NET (divinglog) and python (kenozooid) based applications. I got quite some questions about .NET support, so I assume there will be others too.
Anyway, I think it's very important to not treat these non C/C++ applications as second class citizens, especially because at least one of them (divinglog) is likely the application with the largest userbase. So that means all features should be usable from the language bindings too.
Certainly. But since these apps are binary-only they live in a somewhat different world where they can simply take a random snapshot of libdivecomputer, implement all they want from the API and not worry about libdivecomputer changing (or the user picking a different version of libdivecomputer). For many other applications life isn't that simple.
Sure, I could fork libdivecomputer and simply pull whatever version I want to rely on into the Subsurface git and tell people to just build from that version - but that's not my vision of how software engineering across projects should work - I wouldn't want to do that for libxml2, glib, gtk, libc... so why should I have to do that for libdivecomputer.
So fundamentally:
If your API does not allow an application a sane way to detect which features are supported and how to conditionally compile for those features, then your API is broken.
BTW, that's for example one of the main reasons the logging interface doesn't expose a vararg interface, but calls vsnprintf internally to convert the messages to strings. Strings are supported by all languages, varargs are not.
I guess that once we have decided on the final list of dive and sample data that we will support, the chances that we'll have to introduce additional samples in the future are relatively small. I mean the goal for v1.0 was to maintain backwards compatibility from then on. If we then document the first version number in which every feature appeared, then it shouldn't be that much of a hassle to do:
#if DC_VERSION_CHECK(x,y,z)
#endif
and then language bindings can do the same at runtime.
Certainly. But think twice about what your API will look like. How many functions will you have? What data types do they return? And how are they extended to provide more data as it becomes available from dive computers (think details about the deco model, saturation of different compartments, gyro information about vertical and lateral movement, pictures that were taken during that sample, voice recordings of the conversation during the dive... I know you think I'm insane, but look at what the Suunto Solution provided and what you can get from a state of the art DC like the OSTC or the Uemis SDA... to assume that the speed of innovation will suddenly come to a halt is really foolish).
That's why I really wasn't joking with my extreme example above. Have the smallest possible API with the most flexible data types that you can think of. In my model it is easy and transparent to add
#define DC_VSAMPLE_GYRO 0xDEAD
and then have the data pointer point to a
struct dc_gyro_t {
    int flags;
    int delta_north; // in mm
    int delta_west;  // in mm
    int accuracy;    // in mm
};
which allows you to track the movement of the diver. And all the application does is
#if defined(DC_VSAMPLE_GYRO)
...
#endif
in the switch statement that loops over all the vendor samples. Clean, easy to maintain, easy to handle from an application point of view.
Now imagine the same thing with an API that only supports per vendor data without complex data structures. So now you are adding the following entry points to your API:
dc_atomics_gyro_get_delta_north()
dc_atomics_gyro_get_delta_west()
dc_atomics_gyro_get_delta_accuracy()

dc_uemis_gyro_get_delta_north()
dc_uemis_gyro_get_delta_west()
dc_uemis_gyro_get_delta_accuracy()
etc. And imagine the code flow in the application that has to duplicate all this crap.
But okay, this is something we can discuss further.
Certainly :-)
- Supported dive and sample data
==================================
- temperature
air temperature, I assume?
Depends. Different dive computers have different interpretations. It could be air/water temperature, the min/max/avg value, and so on. If that's even known to us. Not sure how to deal with this. We could just supply a temperature and leave the actual interpretation to the user, or we could have an extra flag with the exact interpretation (if known) as you mention below. But even then we're still left with devices that provide more than one temperature.
So offer a couple of values with flags that determine the semantics
- air_temperature
- min_water_temperature
- avg_water_temperature
And offer flags that clarify semantics
AIR_TEMP_IS_FIRST_SAMPLE
AVG_IS_LAST_TEMP
MIN_TEMP_IS_AT_MAX_DEPTH
(btw, I love the last one as I frequently dive in a water body that for more than half of the year has temperature go UP as you go deeper...)
To be honest, I don't really like this. This again tastes like adapting our model to that of the dive computers. If we supply three values, then it breaks again if there is some dive computer out there that supports some fourth value. And with those flags, the semantics can be altered in many different ways. I fear this will quickly get messy on the application side.
Quite the contrary. This allows the application to actually understand the data it gets in a dive computer INDEPENDENT way. If all the app cares for is "give me some temperature" then it ignores the flags. If the application would like to be able to plot the temperature where it was taken (even on the dive computer that may only record the temperature at the deepest point of the plot or the minimum temp or something) then it can do so in a sane way.
What you are proposing is to simply pretend that all dive computers are the same and they give us a "temperature". And if the application wants to know what this temperature actually means, tough shit, write DC specific code.
At this point I might as well abandon using libdivecomputer (or use libdivecomputer to do the downloading of raw data) and write the rest in my application - that way at least I know what I'm getting.
Do you see my point? I don't want to adapt libdivecomputer to ONE SPECIFIC divecomputer. I want libdivecomputer to be designed so that it can give me accurate and well defined information for EVERY divecomputer it supports. That's why I want the flags. That's why I want the flexibility.
You seem to want to have an API which provides the information as it would come from some idealized divecomputer with a fixed feature set and ill defined semantics and then let the application deal with the rest. I want the API to be able to provide us the maximum amount of data with the best possible semantic definition for each and every divecomputer out there - and not dumb it down to some least common denominator or some middle ground.
If you take a step back and think about it, the temperature is just a poor substitute for those devices without a full temperature profile. If you have a full temperature profile, then those min/max/avg values and begin/end values suddenly become a lot less interesting.
Yes - but lots of divecomputers don't provide this. And anyway - temperature is just an easy to understand example - the same logic applies for almost every data point that you provide.
So it might be more interesting to have the begin/end temperature, and the temperature at maximum depth, delivered in the sample data. We already do this today for the suunto vyper backend, and I think it works quite well. Instead of a temperature graph, you now draw a few temperature dots. If we do that, we can then reserve the temperature field exclusively for the true single temperature value case.
And again, you take a specific computer and try to make the model match that computer.
I try to provide an API that allows every computer to provide as much data as they have. And then have the app decide which information it wants to display and how it wants to display it.
So if your computer gives you a full temperature graph on every sample, GREAT. If the computer doesn't give sample temperature at all but only a beginning and min temperature - provide that instead. And if it is end temperature and temperature at maximum depth, then provide that AND DON'T PRETEND THEY ARE BEGINNING TEMPERATURE AND MINIMUM TEMPERATURE. Provide them as what they are and let the app decide what to do with them.
Heartrate is only supported by the Uwatec Galileo as far as I know. It's certainly not a must-have feature, but I can imagine that if you have such a device, this is an interesting value. The actual usefulness is questionable if you ask me, but it's a cool gadget :-)
So much of what high end DCs provide is more cool than entirely useful - but I agree that we should make it available regardless. Visualization is the application's job :-)
Yes, that's the idea. As long as it's reasonably device independent (e.g. there is no reason why other devices can't include a heartrate monitor, and beats per minute is a standard unit too) and doesn't complicate the api (e.g. if you don't need it, you just don't read the value and that's it), then I have no real objections. The same for other types like CNS/OTU.
But things like for example tissue saturation in the decompression model of device X, or RGBM warning Y, that's a whole other story.
I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Factor) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
I'll keep singing the same song. Pick one semantic as default and provide clearly specified flags to modify that.
But then you introduce extra complexity on the application side. If you get a CNS value, then you have to check all those flags to figure out whether it's a percentage (default), or some scale index, or that weird suunto OLF value, or...
You can't just take the value anymore and treat it as a percentage. From my point of view that makes the abstraction provided by libdivecomputer less useful. I mean how different is that from doing a model based interpretation:
if (suunto)
    /* OLF */
else if (oceanic)
    /* Range index */
else
    /* Percentage */
I think we are better off always providing a percentage, and if that is not available then you don't get a CNS value at all.
No, Jef, that is NOT being better off. That is better off assuming all you care about are the people who have a computer that follows the model that you pick and everyone else can go screw themselves. We are better off if we provide the app as much information as is there.
And if application writers are as lazy as you seem to assume they are more than welcome to say
get the CNS
if (flag & CNS_IS_PERCENTAGE)
    show cns value
else
    tell user to go fuck themselves and buy my dive computer
Subsurface would rather be able to figure out a way to display the CNS regardless how the dive computer manufacturer decides to encode it - and figure out new ways to do this if new dive computers with a better (or even only different) model show up.
What I had in mind, is to have a single DC_SAMPLE_EVENT type. If this bit is set, there are events available. The actual event value would then be another bitmap containing all the events. With this approach we can support several events without needing additional sample types for each event type.
Good idea.
Then the next step will be to make a list of the events we want.
See what I wrote above. The moment you don't have precisely defined semantics of the value (even if the semantic is just our current best guess based on reverse engineering) the usefulness of that value to the application declines dramatically. An app is always at liberty to ignore that additional information ("hey, here's some temperature, no idea what it is"), but if it isn't there all that will happen is that apps start to make educated guesses based on the type of DC used. And that's just ugly.
I feel very strongly about that. I think an API that doesn't define what the values it gives you actually mean is really of rather limited use.
I agree, although not as strong as you do.
Let's say we define the meaning of the temperature as the minimum water temperature, which is probably the most common value. But then you happen to have some dive computer which only provides the average water temperature. Since that's all we have, that's what we give to the application and goes into your logbook. Is the small difference in semantics really that important, considering we don't have any other info? I mean to the user it won't make that much difference, because the semantics will remain consistent across all his dives with that device. In a paper logbook you would also just write down the value that you get from your dive computer.
Again, you assume that the application is lazy. If what libdivecomputer does is "heck, it's the only temperature I get, let's claim it's the minimum", then all the application can do is show the minimum temperature. And if my computer picks the temperature at maximum depth, and I am diving in Hoodsport in the winter, and the computer gives 10C as temperature at maximum depth, but the lazy-ass app then claims "minimum temperature was 10C", then I will think that the application author is on crack, while in reality it is libdivecomputer that is giving me information that is WRONG and therefore worthless.
Why is this so hard to understand?
If the other application authors really are that lazy, fine, they can ignore the flags that correctly specify what type of information is provided. Or if there is no sane way to present the information that is provided (heck, what DO those Suunto OLF values mean? beats me...)
But that is a decision you should LEAVE TO THE APP and not make in libdivecomputer.
From what you are writing I think that you want an API that simplifies things to a degree that destroys information. And I am vehemently opposed to that.
So while I agree that crystal clear semantics would be ideal, I also believe some small deviations are acceptable. You can also put it another way: how can we even be sure about the actual semantics, given that it's all based on reverse engineering? There is always an uncertainty factor present.
It is less and less based on reverse engineering as more and more dive computer vendors seem willing to give us API information. And I think this trend will only continue (of the ones that I have contacted, only one so far has told me "no, we will not provide that").
I think it gets a lot more dramatic if an application interprets a value incorrectly because it fails to take one of those flags into account. For example interpreting the oceanic CNS range index as a percentage, which means it will never go past 5% (or some other small value). Ouch.
So you are saying you are doing a better job by not giving any data at all in that case, instead of providing the data with a flag that defines what it means?
Strange.
/D
On 2012-11-21 19:22, Dirk Hohndel wrote:
On Wed, 2012-11-21 at 17:57 +0100, Jef Driesen wrote:
That would work for one vendor extension per sample, but I don't see how you would make that work nicely in case there can be multiple extensions per sample. It also makes it harder to use a custom data structure for each extension, instead of just a blob of raw data. Yes, you can abuse the void pointer to point to any structure, but that isn't the prettiest interface.
Let's assume that a device supports a heartrate and an airtime value per sample. And assume for a moment those are not supported with the main api.
If we introduce some kind of a generic dc_sample_get_vendor() function, then we have to make sure it can return both the heartrate and airtime value. We can probably solve that in a similar way as you proposed for the tank pressures. However, my idea was that the backend would provide some dc_xxx_sample_get_heartrate() and dc_xxx_sample_get_airtime() functions with a corresponding data structure. But then a single bit to indicate the presence of a vendor extension isn't sufficient. If every such extension needs its own bit in the main bitfield, then we'll quickly run out of bits. There are many backends and many possible extensions, so we have a virtually unlimited number of possible extensions. We would at least need an independent bitmap only for the extensions, or some other solution.
There are several problems with that idea. Obviously you run out of bits quickly. But more importantly you create a massive amount of function entry points in the library. So your API gets very complex and even worse, for every new feature that you support you add new functions and now you need matching preprocessor code that allows the application to appropriately do conditional compiles...
If it's just the number of functions, then it's easy to achieve the same result with just a single function of course. For example with a function that works like the dc_parser_get_field function in the current api:
dc_sample_get_vendor (sample, type, &value);
I'm fine with a one function approach, so that's not the main problem.
This brings us straight back to the enum vs. #define discussion we had somewhere else. Think about the users of your library, the people writing applications. It is very unpopular to create a hard link between a specific version of an app and a specific version of a library - especially as more software starts using libdivecomputer. So you should assume that applications want to be able to say "works with libdivecomputer 0.3.0 or later" - but then they need a rational way to test if a specific feature is available.
How does this change depending on whether you conditionally compile with:
#if DC_VERSION_CHECK(x,y,z)
or:
#ifdef DC_SAMPLE_FOO
In both cases, the compiled version will depend on a certain minimum version, but the code can be compiled for several versions.
In the first case the developer has to look up once (e.g. when doing the implementation) from which version on a particular feature is supported. But from there on, it's crystal clear from reading the code which version you need to get that feature. In the second case, you can blindly add the #ifdef, but now you can't tell from reading the code which libdivecomputer version you need. So the "works with libdivecomputer x.y.z or later" part just became more difficult to see if you ask me.
Why do I mention this here? Because with your function name based approach, you now need a matching feature flag for each function that you introduce. Not my idea of a great API.
Now imagine the other extreme.
dc_sample_get_vendor() returns a structure like this:
struct dc_sample_vendor_t {
    unsigned int type;
    unsigned int size;
    const void *data;
};
The type is a unique number that you assign that precisely describes the data that is delivered. Better yet, you can reuse that number if multiple computers deliver the same data... let's assume heartrate... my guess is most every computer will give that to you as an integer and the backend can pad this so it's 16 bit wide (and deal with other tricks that the dc might do to fit it into the fewest number of bits). So now you have ONE type
#define DC_VSAMPLE_HEARTRATE 0xABC
and data is known to be simply a pointer to the number of beats per minute.
This seems MUCH cleaner than an API that requires you to call
dc_uwatec_get_sample_heartrate()
if the user happens to have an Uwatec computer with heartrate support but
dc_suunto_get_sample_heartrate()
if it's a Suunto that now supports heartrate.
Instead what you do is this.
Foreach (sample) {
    dc_get_sample_basic (&basic_data); // basic_data has time, depth
    while (dc_sample_get_vendor (time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
Now as I said, that's an extreme example and pushes EVERYTHING into the sample_get_vendor logic. What might be more reasonable is to have a few common values (as you suggested earlier):
Foreach (sample) {
    dc_get_sample_basic (&basic_data); // basic_data has time, depth
    if (basic_data.extradatabits & DC_SAMPLE_TANK_PRESSURE)
        dc_get_tank_pressure (time, &tank_pressure);
    if (basic_data.extradatabits & DC_SAMPLE_DECO_NDL)
        dc_get_deco_ndl (time, &deco_ndl);
    while (dc_sample_get_vendor (time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
So we break out the common pieces of information and make them available with their own API functions... but that of course brings us back to the type specific flags:
struct dc_deco_ndl_t {
    int type;
    int depth;    // in mm
    int duration; // in s
    int tts;      // in s
};
Where type then tells you if this dive computer provides no duration, just the duration of the deepest stop, just the time to surface, or both.
The heartrate was maybe a very bad example. If Uwatec and Suunto both support a heartrate with the same beats per minute format, then that's already a first indication that it should probably be supported through the main interface, and not as a vendor extension. But okay, this was just a theoretical example.
Anyway, I see the vendor extensions as something to support things that are really *highly vendor specific*. Take for example the oceanics. They have a fixed 8 bytes of data per sample. Each model stores roughly the same set of information (o2bg, n2bg, pressure, depth, vari, dtr, atr, deco, temperature, alarms, etc), but each model packs this data slightly differently into those 8 bytes. (This is a real pain in the ass, and the reason why the oceanic code needs by far the most bugfixes.) Here we could support the basic fields (time, depth, temperature, etc) through the standard interface, and leave everything else as raw data. But as a vendor extension, we could take care of the crazy oceanic data packing internally, and present the data in some nicer struct:
struct dc_oceanic_sample_t {
    unsigned int o2bg, n2bg; /* O2 and N2 bar graph */
    unsigned int dtr, atr;   /* Dive and air time remaining */
    ...
};
This type of extension is extremely unlikely to be shared between different backends. So I think that one of the reasons we have a different opinion is that we also have a different view on what we consider to be a vendor extension.
I'm also convinced that there will probably be very few applications that will actually make use of the vendor extensions. If you look at the general purpose logbook applications that exist today (divinglog, macdive, etc), then you'll notice they only support the device independent features that would be supported by the standard interface. I don't see how you can even write an application that supports all possible features of each device, considering the vast amount of differences. The only applications that support these features are the vendor applications, which are by definition tied to some specific devices. But such applications are not really our (main) target audience.
BTW, these vendor extensions have the lowest possible priority for me. There will be plenty of work to get the standard interface implemented and reverse engineering bugs to fix. So the first version(s) will likely support no vendor extensions at all. However while this is pretty low on my priority list, I think it's good to keep future extensibility in mind.
We can maybe consider to provide language bindings for the most popular languages (.net and python) ourselves?
Not a bad idea - but what is a bad idea is to then go for the least common denominator and create an ugly, badly implemented C API just because it's harder to map a well designed API into .net constructs.
You can always have the .net binding create its own data structures that follow the preferred way to do things in .net - but don't force that onto the C API...
The reason why I mentioned maintaining the language bindings as part of the project is that, if there is some ugly and obscure construct required, then we can take care of that ourselves, and hide it from the user. Then we don't have to worry anymore about users having trouble with the .NET integration.
Unfortunately I speak from experience here. I spent considerable amounts of time on the .NET interface for the current api (for divinglog). There were several non-trivial issues (calling conventions, interactions with the garbage collector, manual marshaling, etc) that required quite some research on my side to figure out why things were not working as expected. I'm pretty sure the average .NET developer wouldn't be able to get this right either.
So I'm not saying we should model after the least common denominator, just that we have to take care that whatever api we pick, it should be possible to use from other languages. That's why I referred to the varargs functions for example.
Anyway, I think it's very important to not treat these non C/C++ applications as second class citizens, especially because at least one of them (divinglog) is likely the application with the largest userbase. So that means all features should be usable from the language bindings too.
Certainly. But since these apps are binary-only they live in a somewhat different world where they can simply take a random snapshot of libdivecomputer, implement all what they want from the API and not worry about libdivecomputer changing (or the user picking a different version of libdivecomputer). For many other applications life isn't that simple.
The rules for binary compatibility are the same for C and .NET applications. If your application depends on symbol X, and you switch back to an older library without this symbol, then both the C and .NET app will fail. They may fail in some different way, but that doesn't matter for this discussion.
Sure, I could fork libdivecomputer and simply pull whatever version I want to rely on into the Subsurface git and tell people to just build from that version - but that's not my vision of how software engineering across projects should work - I wouldn't want to do that for libxml2, glib, gtk, libc... so why should I have to do that for libdivecomputer.
So fundamentally:
If your API does not allow an application a sane way to detect which features are supported and how to conditionally compile for those features, then your API is broken.
Sure, that's definitely not what we want :-)
I just try to understand why you dislike the DC_VERSION_CHECK and prefer the DC_SAMPLE_FOO conditional compilation. In the end, regardless of how you do the conditional compilation, your application will need to know the data format associated with each vendor extension. That is also true whether you use an individual function for each extension, or one function with some type indicator. I mean everything that you wrote about the gyro stuff below can be done with both your proposal and mine.
I guess that once we have decided on the final list of dive and sample data that we will support, the chances that we'll have to introduce additional samples in the future are relatively small. I mean the goal for v1.0 was to maintain backwards compatibility from then on. If we then document the first version number in which every feature appeared, then it shouldn't be that much of a hassle to do:
#if DC_VERSION_CHECK(x,y,z)
#endif
and then language bindings can do the same at runtime.
Certainly. But think twice about what your API will look like. How many functions will you have? What data types do they return? And how are they extended to provide more data as that becomes available from dive computers (think details about the deco model, saturation of different compartments, gyro information about vertical and lateral movement, pictures that were taken during that sample, voice recordings of the conversation during the dive... I know you think I'm insane, but look at what the Suunto Solution provided and what you can get from a state of the art DC like the OSTC or the Uemis SDA... to assume that the speed of innovation will suddenly come to a halt is being really foolish).
That's why I really wasn't joking with my extreme example above. Have the smallest possible API with the most flexible data types that you can think of. In my model it is easy and transparent to add
#define DC_VSAMPLE_GYRO 0xDEAD
and then have the data pointer point to a
struct dc_gyro_t {
    int flags;
    int delta_north; // in mm
    int delta_west;  // in mm
    int accuracy;    // in mm
};
which allows you to track the movement of the diver. And all the application does is
#if defined(DC_VSAMPLE_GYRO)
...
#endif
in the switch statement that loops over all the vendor samples. Clean, easy to maintain, easy to handle from an application point of view.
Now imagine the same thing with an API that only supports per vendor data without complex data structures. So now you are adding the following entry points to your API:
dc_atomics_gyro_get_delta_north()
dc_atomics_gyro_get_delta_west()
dc_atomics_gyro_get_delta_accuracy()

dc_uemis_gyro_get_delta_north()
dc_uemis_gyro_get_delta_west()
dc_uemis_gyro_get_delta_accuracy()
etc. And imagine the code flow in the application that has to duplicate all this crap.
Oh, but that was definitely not my intention. There is absolutely no reason to limit a vendor extension to just a single value. It's perfectly acceptable to use a struct like your dc_gyro_t.
To be honest, I don't really like this. This again tastes like adapting our model to that of the dive computers. If we supply three values, then it breaks again if there is some dive computer out there that supports some fourth value. And with those flags, the semantics can be altered in many different ways. I fear this will quickly get messy on the application side.
Quite the contrary. This allows the application to actually understand the data it gets in a dive computer INDEPENDENT way. If all the app cares for is "give me some temperature" then it ignores the flags. If the application would like to be able to plot the temperature where it was taken (even on the dive computer that may only record the temperature at the deepest point of the plot or the minimum temp or something) then it can do so in a sane way.
What you are proposing is to simply pretend that all dive computers are the same and they give us a "temperature". And if the application wants to know what this temperature actually means, tough shit, write DC specific code.
[...]
I'll keep singing the same song. Pick one semantic as default and provide clearly specified flags to modify that.
At this point I might as well abandon using libdivecomputer (or use libdivecomputer to do the downloading of raw data) and write the rest in my application - that way at least I know what I'm getting.
Do you see my point? I don't want to adapt libdivecomputer to ONE SPECIFIC divecomputer. I want libdivecomputer to be designed so that it can give me accurate and well defined information for EVERY divecomputer it supports. That's why I want the flags. That's why I want the flexibility.
You seem to want to have an API which provides the information as it would come from some idealized divecomputer with a fixed feature set and ill defined semantics and then let the application deal with the rest. I want the API to be able to provide us the maximum amount of data with the best possible semantic definition for each and every divecomputer out there - and not dumb it down to some least common denominator or some middle ground.
[...]
So it might be more interesting to have the begin/end temperature, and the temperature at maximum depth, delivered in the sample data. We already do this today for the suunto vyper backend, and I think it works quite well. Instead of a temperature graph, you now draw a few temperature dots. If we do that, we can then reserve the temperature field exclusively for the true single temperature value case.
And again, you take a specific computer and try to make the model match that computer.
I try to provide an API that allows every computer to provide as much data as they have. And then have the app decide which information it wants to display and how it wants to display it.
So if your computer gives you a full temperature graph on every sample, GREAT. If the computer doesn't give sample temperature at all but only a beginning and min temperature - provide that instead. And if it is end temperature and temperature at maximum depth, then provide that AND DON'T PRETEND THEY ARE BEGINNING TEMPERATURE AND MINIMUM TEMPERATURE. Provide them as what they are and let the app decide what to do with them.
I'm not trying to make the model match one specific dive computer. Placing the different temperature values on the profile would be completely independent of any dive computer model, for example. It can be done for every model. Imagine there is some crazy dive computer that gives you the temperature at the decostops. That would still work!
Let me go back to the temperature problem, and formulate it in a different way. At the moment we already have five possible interpretations (begin/end, min/max/avg, and at maximum depth). A device can support one or more of them, in all possible combinations. Let's say I agree with you and we want to preserve the exact semantics. What should the api and data structure look like? Just a single temperature with a flag won't work for devices that have two values. If we extend the data structure with some more temperature fields (and flags), then it will break again if there is another device with one more value. How would you solve this problem?
So I'm not saying I'm against trying to preserve the exact temperature semantics. What I don't like are the flags you proposed like MIN_TEMP_IS_AT_MAXDEPTH, because it alters the interpretation of the min_temperature field. I'm not convinced that's the right solution. So maybe we should think again and try to come up with a third solution. I'll start by providing a couple of ideas that crossed my mind:
I already mentioned placing the different temperature values into the samples. We could also support each temperature variant as a different type: DC_FIELD_TEMPERATURE_{MIN,MAX,...}. When a particular variant isn't available we don't set the corresponding bit (or return UNSUPPORTED). Or we define a structure with all the variants we wish to support, along with a bitmap to tell which fields are present (or when using Kelvins, we could consider a value of 0 as not present):
struct dc_field_temperature_t {
    bitmap;
    minimum;
    maximum;
    ...
};
But then you introduce extra complexity on the application side. If you get a CNS value, then you have to check all those flags to figure out whether it's a percentage (default), or some scale index, or that weird suunto OLF value, or...
You can't just take the value anymore and treat it as a percentage. From my point of view that makes the abstraction provided by libdivecomputer less useful. I mean how different is that from doing a model based interpretation:
if (suunto)
    /* OLF */
else if (oceanic)
    /* Range index */
else
    /* Percentage */
I think we are better off always providing a percentage, and if that is not available then you don't get a CNS value at all.
No, Jef, that is NOT being better off. That is better off assuming all you care about are the people who have a computer that follows the model that you pick and everyone else can go screw themselves. We are better off if we provide the app as much information as is there.
And if application writers are as lazy as you seem to assume they are more than welcome to say
get the CNS
if (flag & CNS_IS_PERCENTAGE)
    show cns value
else
    tell user to go fuck themselves and buy my dive computer
Subsurface would rather be able to figure out a way to display the CNS regardless how the dive computer manufacturer decides to encode it - and figure out new ways to do this if new dive computers with a better (or even only different) model show up.
CNS values are defined as percentages, just like depths are defined as lengths. The fact that some vendors choose to store it differently (to save some bits?) doesn't change that. Just like we convert all depths to a standard unit, we would do the same for CNS. If that's not possible because it's some vendor specific concept like OLF, then we can't do the conversion and we act as if the device doesn't support CNS. If you allow the CNS value to contain different types of values (e.g. CNS or OLF depending on the flag), then why not also leave the depth in imperial or metric units, and provide a flag with the units? Because according to me that's exactly the same as what you propose here for CNS. (I know CNS is just an example here, but the same logic applies to other values too of course.)
From my point of view, one of the main advantages of using libdivecomputer is that it provides an abstraction. Call that abstraction an idealized dive computer if you like. But as is the case with all abstractions, you can't support every single feature that's out there. So yes, in some cases you will end up with a more limited subset than what the device supports. I believe that's a trade-off you have to make.
I agree, although not as strongly as you do.
Let's say we define the meaning of the temperature as the minimum water temperature, which is probably the most common value. But then you happen to have some dive computer which only provides the average water temperature. Since that's all we have, that's what we give to the application, and that's what goes into your logbook. Is the small difference in semantics really that important, considering we don't have any other info? I mean, to the user it won't make that much difference, because the semantics will remain consistent across all his dives with that device. In a paper logbook you would also just write down the value that you get from your dive computer.
Again, you assume that the application is lazy. If what libdivecomputer does is "heck, it's the only temperature I get, let's claim it's the minimum", then all the application can do is show the minimum temperature. If my computer is picking the temperature at maximum depth, and I am diving in Hoodsport in the winter, and the computer gives 10C as the temperature at maximum depth, but the lazy-ass app then claims "minimum temperature was 10C", then I will think that the application author is on crack, while in reality it is libdivecomputer that is giving me information that is WRONG and therefore worthless.
Why is this so hard to understand?
If the other application authors really are that lazy, fine, they can ignore the flags that correctly specify what type of information is provided. Or if there is no sane way to present the information that is provided (heck, what DO those Suunto OLF values mean? beats me...)
Suunto OLF (Oxygen Limit Fraction) is the maximum value of CNS and OTU, both expressed as percentages. So it's either CNS or OTU, but you can't tell which one.
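As a side note, the relationship just described can be sketched in a couple of lines of C; suunto_olf() is a purely hypothetical helper, only meant to illustrate why the mapping is not invertible:

```c
/* Hypothetical illustration: given both loadings as percentages, the
 * displayed OLF is simply the larger of the two. Going the other way
 * (from OLF back to CNS or OTU) is impossible, because taking the
 * maximum discards which of the two values it came from. */
static double
suunto_olf (double cns_percent, double otu_percent)
{
    return cns_percent > otu_percent ? cns_percent : otu_percent;
}
```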
But that is a decision you should LEAVE TO THE APP and not make in libdivecomputer.
From what you are writing I think that you want an API that simplifies things to a degree that destroys information. And I am vehemently opposed to that.
While I really understand your concern regarding the temperature problem (and we can probably find some way around it), I really don't get the CNS one. For the temperature, ignoring the flag will still yield a temperature. The semantics may be a bit wrong, but it's still a temperature value. However for the CNS value you would get something completely different! So this is not about applications being lazy, but about providing an abstract model with standardized units. By not supporting certain vendor specific concepts (e.g. OLF) we are actually preserving that information, albeit in the form of raw data or a vendor extension.
So while I agree that crystal clear semantics would be ideal, I also believe some small deviations are acceptable. You can also put it another way: how can we even be sure about the actual semantics, given that it's all based on reverse engineering? There is always an uncertainty factor present.
It is less and less based on reverse engineering as more and more dive computer vendors seem willing to give us API information. And I think this trend will only continue (of the ones that I have contacted, only one so far has told me "no, we will not provide that").
I don't know which vendors you contacted, but as far as I know the "big players" like Suunto, Uwatec, Mares, etc have no interest in supporting third-party applications. Only the relatively new players on the market are supporting us, like Reefnet, HW, Atomics, Shearwater, etc. No matter how great those few vendors are, the majority of divers are still using a device from one of the big players.
I think it gets a lot more dramatic if an application interprets a value incorrectly because it fails to take one of those flags into account. For example interpreting the Oceanic CNS range index as a percentage, which means it will never go past 5% (or some other small value). Ouch.
So you are saying you are doing a better job by not giving any data at all in that case, instead of providing the data with a flag that defines what it means?
Yes. See above for my reason behind it. If it doesn't fit into the abstraction provided by libdivecomputer, then it's likely a vendor specific feature, and should be treated likewise.
Jef
On Wed, Nov 21, 2012 at 4:57 PM, Jef Driesen jefdriesen@telenet.be wrote: [...]
Let's say we define the meaning of the temperature as the minimum water temperature, which is probably the most common value. But then you happen to have some dive computer which only provides the average water temperature. Since that's all we have, that's what we give to the application, and that's what goes into your logbook. Is the small difference in semantics really that important, considering we don't have any other info?
Another example is Reefnet Sensus devices and their time implementation (IMHO, quite reasonable solution as it is... hm... headless device? no buttons, no display). It reports time according to its internal clock, so to calculate the real time you need the start of the internal clock, time of dive according to the internal clock and real time of data download.
I believe libdc will handle the above with no problem while parsing data during download from the dive logger.
But when you want to reparse the data (which, I hope, will always be possible with libdc), then an application needs to be aware of Sensus' time semantics.
Best regards,
w
On 24-11-12 13:26, Artur Wroblewski wrote:
On Wed, Nov 21, 2012 at 4:57 PM, Jef Driesen jefdriesen@telenet.be wrote: [...]
Let's say we define the meaning of the temperature as the minimum water temperature, which is probably the most common value. But then you happen to have some dive computer which only provides the average water temperature. Since that's all we have, that's what we give to the application, and that's what goes into your logbook. Is the small difference in semantics really that important, considering we don't have any other info?
Another example is Reefnet Sensus devices and their time implementation (IMHO, quite reasonable solution as it is... hm... headless device? no buttons, no display). It reports time according to its internal clock, so to calculate the real time you need the start of the internal clock, time of dive according to the internal clock and real time of data download.
I believe libdc will handle the above with no problem while parsing data during download from the dive logger.
But when you want to reparse the data (which, I hope, will always be possible with libdc), then an application needs to be aware of Sensus' time semantics.
Re-parsing (or offline parsing as I usually call it) will remain supported. Of course the application will have to store the device and host clock timestamps along with the data. But that's why we have the DC_EVENT_CLOCK event, and that won't change. In the current design, you pass the two clock timestamps to the reefnet_sensusultra_parser_create() function. In the new design that will be done in a similar way, except that for the immediate parsing this will all be taken care of internally.
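The clock calculation described here can be sketched as follows; the function name is made up, and the assumption that the device ticks are in seconds is illustrative only:

```c
#include <time.h>

/* Sketch of the offline Sensus clock calculation. The device reports
 * times as ticks of its internal clock; together with the device and
 * host timestamps captured at download time (the DC_EVENT_CLOCK pair),
 * the wall-clock time of a dive can be reconstructed afterwards. */
static time_t
sensus_dive_time (unsigned int devtime_dive,     /* device ticks at dive start */
                  unsigned int devtime_download, /* device ticks at download */
                  time_t systime_download)       /* host clock at download */
{
    /* Assuming one tick per second: the dive started this many
     * seconds before the download happened. */
    return systime_download - (time_t)(devtime_download - devtime_dive);
}
```

This is exactly why the two timestamps have to be stored alongside the raw data: without them the device ticks cannot be anchored to real time.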
Jef
On Fri, Nov 16, 2012 at 6:40 PM, Jef Driesen jefdriesen@telenet.be wrote:
On 2012-11-15 19:43, Dirk Hohndel wrote:
Please make the memory application owned and have the application be responsible for freeing memory. Unless you are planning to run libdivecomputer on a wristwatch the overhead for that is negligible with today's OSs (that all have a reasonably smart malloc implementation that will reuse the recently freed memory block on its own).
I certainly agree that this is probably the most intuitive and simple design.
But while I'm not too concerned about the memory allocations by themselves, I still find it a bit silly having to allocate resources where it's not strictly necessary. For the sample data I simply can't think of any use-case where you would want to hold on to the sample objects outside the callback function. Hence my search for possible alternatives.
I agree with Dirk. Trust the users of your lib and keep it simple.
Create an instance of data for each sample, or have one data container and require the application to copy the data before the next iteration step.
There are plenty of data containers and frameworks (e.g. language bindings) which can be used. The simpler libdc is, the easier integration is going to be. No need to reinvent your own.
[...]
CNS/OTU can be calculated from the samples
IMHO, you can't if you don't have previous dive information as you don't know the initial CNS.
[...]
, but we can consider supporting it too. I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Fraction) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
If some devices do something their own way, e.g. OLF or a 1-5 scale, then ignore it until they document how to convert it into CNS. For those who really want to deal with such "inventions", there still is the raw data. The rest of us will benefit from a simpler (but still flexible) interface.
I assume with NDL, you mean the remaining no-deco time? I don't think any device does support that.
The OSTC supports this.
[...]
Best regards,
w
On 2012-11-16 20:58, Artur Wroblewski wrote:
On Fri, Nov 16, 2012 at 6:40 PM, Jef Driesen jefdriesen@telenet.be wrote:
On 2012-11-15 19:43, Dirk Hohndel wrote:
Please make the memory application owned and have the application be responsible for freeing memory. Unless you are planning to run libdivecomputer on a wristwatch the overhead for that is negligible with today's OSs (that all have a reasonably smart malloc implementation that will reuse the recently freed memory block on its own).
I certainly agree that this is probably the most intuitive and simple design.
But while I'm not too concerned about the memory allocations by themselves, I still find it a bit silly having to allocate resources where it's not strictly necessary. For the sample data I simply can't think of any use-case where you would want to hold on to the sample objects outside the callback function. Hence my search for possible alternatives.
I agree with Dirk. Trust the users of your lib and keep it simple.
Create an instance of data for each sample, or have one data container and require the application to copy the data before the next iteration step.
There are plenty of data containers and frameworks (e.g. language bindings) which can be used. The simpler libdc is, the easier integration is going to be. No need to reinvent your own.
See my response to Dirk. Using "application owns" everywhere to make the api consistent is fine with me.
CNS/OTU can be calculated from the samples
IMHO, you can't if you don't have previous dive information as you don't know the initial CNS.
True, but nobody says that an application can't take into account a series of dives :-)
But that doesn't mean we won't support it. We are just trying to see what is out there, what users want to have, and then make a good decision based on that.
, but we can consider supporting it too. I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Fraction) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
If some devices do something their own way, e.g. OLF or a 1-5 scale, then ignore it until they document how to convert it into CNS. For those who really want to deal with such "inventions", there still is the raw data. The rest of us will benefit from a simpler (but still flexible) interface.
That's also how I think about it.
I assume with NDL, you mean the remaining no-deco time? I don't think any device does support that.
The OSTC supports this.
And we'll support it too. It's already implemented and will be committed soon :-)
Jef
On Wed, 2012-11-21 at 17:58 +0100, Jef Driesen wrote:
CNS/OTU can be calculated from the samples
IMHO, you can't if you don't have previous dive information as you don't know the initial CNS.
True, but nobody says that an application can't take into account a series of dives :-)
That's what the first set of code that I wrote for Subsurface was doing. I ended up throwing that away because of some algorithmic issues, but I will add a better version of it eventually. It will definitely walk the whole dive list to determine how much residual CNS needs to be taken into account (and interestingly enough, there appear to be conflicting formulas for how to do that).
But that doesn't mean we won't support it. We are just trying to see what is out there, what users want to have, and then make a good decision based on that.
:-)
, but we can consider supporting it too. I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Fraction) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
If some devices do something their own way, e.g. OLF or a 1-5 scale, then ignore it until they document how to convert it into CNS. For those who really want to deal with such "inventions", there still is the raw data. The rest of us will benefit from a simpler (but still flexible) interface.
That's also how I think about it.
Then I guess we agree to disagree.
/D
Hi,
There have been some interesting discussions since I posted my original email. Some of those discussions happened off list (mainly because they were already ongoing before I sent out my email), so I think it's a good idea to summarize what has been discussed so far: the decisions that have been made already (or not), new issues that popped up, etc.
BTW, there will be a slight change of plans regarding the next v0.3 release. The changes that are being discussed in this thread will be postponed until after the v0.3 release. The v0.3 release will be an intermediate release, with some smaller api cleanups (which should not affect most apps), some new features, and bugfixes. This will also give us some more time to work out the new api design.
- Object ownership, lifetime and memory management.
=====================================================
The application will own all objects. Simplicity and consistency are considered more important than efficiency.
- Querying available sample values?
=====================================
The current proposal is that each sample will provide a bitmap with the available values. We'll introduce a new function to retrieve the bitmap (the name is tentative; suggestions are welcome):
dc_sample_get_available (sample, &bitmap);
and then applications can test the returned bitmap for the individual types, and if present, call the corresponding getter function:
if (bitmap & DC_SAMPLE_FOO) {
    dc_sample_get_foo (sample, &foo);
}
- Supported dive and sample data
==================================
3a. Dive data
==============
The original list hasn't changed much:
* date/time
* divetime
* maxdepth
* gasmixes
* tank pressures (begin/end)
* temperature
For multivalued fields, like the gas mixes and tank pressure, the current proposal is to use a struct with a count and a pointer to the actual data array:
struct {
    unsigned int count;
    foo_t *foo;
};
Note that such a struct would behave a bit differently from the others. Normally the caller passes a pointer to a struct (or plain C datatype) to the getter function, so the storage is provided by the caller and the getter function just fills in the value(s). But in this case we return a pointer to a foo_t struct that is not owned by the sample object. This may not be such a big deal considering the app will own the sample object, but I wanted to make sure this doesn't go unnoticed :-)
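To make the count-plus-pointer layout concrete, here is a sketch using gas mixes as the example; the dc_gasmix_t and dc_field_gasmixes_t names and their fields are assumptions for illustration, not an agreed api:

```c
/* Hypothetical gas mix record. */
typedef struct dc_gasmix_t {
    double oxygen;   /* Fraction, e.g. 0.32 for EAN32. */
    double helium;
} dc_gasmix_t;

/* Hypothetical multivalued field: a count plus a pointer to an array
 * whose storage is owned by the dive object, not the caller. */
typedef struct dc_field_gasmixes_t {
    unsigned int count;
    dc_gasmix_t *mixes;
} dc_field_gasmixes_t;

/* Example consumer: find the richest mix of the dive. */
static double
highest_oxygen (const dc_field_gasmixes_t *g)
{
    double max = 0.0;
    for (unsigned int i = 0; i < g->count; ++i) {
        if (g->mixes[i].oxygen > max)
            max = g->mixes[i].oxygen;
    }
    return max;
}
```

The ownership caveat above matters here: the mixes pointer stays valid only as long as the object it came from is alive.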
Support for salinity and atmospheric pressure has already been requested. The primary use-case is probably for doing decompression calculations. This has already been implemented in the master branch using a structure like this:
struct dc_salinity_t {
    enum {DC_WATER_FRESH, DC_WATER_SALT} type;
    density; /* Optional, zero if unavailable. */
};
For devices which don't support atmospheric pressure directly, and provide altitude ranges instead (e.g. "0-300m", "300-1000m", "1000-1500m"), an estimate of the atmospheric pressure can be calculated with the barometric formula [1]. For ranges, the upper value of the range is probably the best choice, because it yields the lowest pressure. However, because diving at sea level is probably the most common case, an exception could be made to always consider the first range to be at sea level (e.g. 1013.25 mbar).
There has been some discussion regarding the exact semantics of the temperature field. On one hand it's important to clearly define the exact semantics, but on the other hand the device may provide one or more temperature values with different semantics. For example, we can define the temperature field to be the minimum temperature, but the device can provide the minimum, maximum or average value, or the temperature before and after the dive, or the temperature at maximum depth, or one or more of the above. How do we support this?
One option is to keep the exact semantics vague, and just supply whatever temperature value the device supplies. This is not ideal, because it leaves applications that care about the exact interpretation in the cold, and it also doesn't work well in case the device provides more than one value.
An alternative option is to move the temperature data into the samples. All the values mentioned above can easily be attached to the correct sample. You still won't get a full temperature profile (only a few data points), but at least there is no confusion about the semantics. For example, a temperature value attached to the sample at maximum depth is clearly the temperature at max depth, and so on.
Another suggestion was to use flags to make the exact semantics clear. For example, if the default semantics is the minimum temperature, provide a flag to overrule this interpretation with another one (e.g. at max depth, etc):
struct temperature {
    flags;
    value;
    ... /* Maybe multiple values here? */
};
And last but not least we can provide a struct with all possible values:
struct temperature {
    minimum;
    maximum;
    average;
    ...
};
and then provide some means to let the application know which values are available (using a bitmap again, or some magic values to indicate "not present").
No decision for the temperature issue has been made yet.
3b. Sample data
================
The basic list doesn't need much discussion:
* time
* depth
* temperature
* tank pressure
There are also a couple of less important ones, but as long as they are reasonably device independent, I see no reason not to support them too:
* heartrate
* airtime
* CNS/OTU
* ...
Note that this is definitely not a complete list. We'll probably need a few extra types for tec divers (setpoints), and there may be others that I forgot.
There has been some discussion regarding the CNS values (but the underlying discussion applies to any other type too). The natural unit for the CNS value is obviously a percentage. But there are devices that don't supply a percentage. For example, Suunto has OLF (Oxygen Limit Fraction), which is a hybrid between CNS and OTU (i.e. whichever is higher), and Oceanic has their O2 bar graphs. Although both give some indication of the CNS loading, they are not easily translated to a percentage.
What to do in this case? Just not support it, or again provide some flag with the exact interpretation?
3c. Events
===========
The current proposal is to reserve the events exclusively for things that have no duration (and thus no need for the begin/end flags) and don't need to carry any data values. In that case, we can use a simple bitfield to get all the events for the sample at once.
dc_sample_get_available (sample, &bitmap);
if (bitmap & DC_SAMPLE_EVENTS) {
    dc_sample_get_events (sample, &events);

    if (events & DC_SAMPLE_EVENT_BOOKMARK) {
        /* Bookmark event. */
    }
    if (events & DC_SAMPLE_EVENT_VIOLATION) {
        /* Decostop violation event. */
    }
}
While this should work well for true events like decostop violations, bookmarks, etc, we'll need another solution for things like gas changes, decompression info, etc. The idea is to deliver this info as sample values, rather than events.
For example, the gas switch event would be changed into an "active gasmix" sample value. The main difference is that you will now get the current gasmix on every sample, and not just at the gas change. For the gasmix sample value, the corresponding data type would be the index of the gasmix in the header (see above).
Decompression data can be supported in a similar way. For each sample you can get the deco status, with a data structure like this:
dc_deco_t {
    enum {NDL, DECOSTOP} type;
    time;  /* Optional */
    depth; /* Optional */
};
If you're not in deco, the type will be set to NDL, and the time contains the remaining no decompression limit (if available). If you're in deco, the type will be set to DECOSTOP, and the time and depth contain the time and depth of the next decompression stop (if available). With such a structure we can support both devices that provide just a simple ndl/deco flag, and devices that provide some more detailed info.
We already committed some changes to the master branch to move into that direction. The DC_EVENT_DECOSTOP now contains the depth (in meters) and time (in seconds) as two packed 16bit values, and a new DC_EVENT_NDL was introduced. It's not perfect yet, because for the current api we were limited to the existing data structures (hence the packing), but it's a step in the direction of the model explained here.
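Unpacking such a packed value on the application side is straightforward. Note that which 16-bit half holds the depth and which holds the time is an assumption in this sketch (check the actual DC_EVENT_DECOSTOP definition); the bit operations themselves are the point:

```c
/* Sketch: split a 32-bit event value into its two packed 16-bit
 * halves. Assumed layout: depth in the high word, time in the low
 * word (verify against the real DC_EVENT_DECOSTOP definition). */
static void
unpack_decostop (unsigned int value, unsigned int *depth, unsigned int *time_s)
{
    *depth  = (value >> 16) & 0xFFFF; /* meters (assumed) */
    *time_s = value & 0xFFFF;         /* seconds (assumed) */
}
```

The 16-bit packing also shows the limitation mentioned above: it caps both values at 65535 and forces integer units, which is exactly why a dedicated dc_deco_t structure is the cleaner long-term solution.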
Of course we can consider to extend the type with deepstops and safety stops, or add an extra field for TTS (Time To Surface), and things like that. But the main point is that we move away from the event model as much as possible.
3d. Units
==========
The two choices are SI (second, meter, Kelvin, Pascal) or metric (second, meter, Celsius, bar). The only difference between the pressure units "Pascal" and "bar" is a scaling factor, so then it comes down to Kelvin or Celsius for the temperatures.
I'm also seriously considering adopting the subsurface style unit system, where the units are carefully chosen to make them fit into an integer: second, millimeter, millikelvin and millibar. This should provide plenty of precision for our purpose, while avoiding all kinds of floating point issues, such as rounding (e.g. 15.0 may actually be 14.99999) and equality (comparing floating point values for equality is tricky).
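To illustrate (the typedefs and helper names here are only a sketch, not a proposed api):

```c
/* Sketch of subsurface-style integer units: depths in millimeters,
 * temperatures in millikelvin, pressures in millibar. */
typedef int depth_mm_t;
typedef int temperature_mk_t;
typedef int pressure_mbar_t;

/* Converting 15.0 C (i.e. 15000 millicelsius) to millikelvin and back
 * is exact in integer arithmetic, sidestepping the 14.99999-style
 * rounding and equality problems mentioned above. */
static temperature_mk_t
celsius_to_mk (int millicelsius)
{
    return millicelsius + 273150; /* 0 C = 273.15 K = 273150 mK */
}

static int
mk_to_millicelsius (temperature_mk_t mk)
{
    return mk - 273150;
}
```

With millimeter depths and millibar pressures, any realistic dive value fits comfortably in a 32-bit integer, and round-tripping through the api never loses precision.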
No decision has been made yet. I would like to know what application developers (and not just the subsurface ones) think of this.
3e. Vendor extensions
======================
We want some way to support vendor specific extensions, but it's not completely clear yet what the interface should look like. It's not the highest priority, because the above is far more important, but it's good to keep extensibility in mind!
Anyway, this has been discussed on the mailing list, so I won't repeat all the proposals again here.
[1] http://en.wikipedia.org/wiki/Barometric_formula
Jef
Hi Jef,
Thanks for the detailed summary email. That's really helpful.
Overall I think we made great progress here. To keep this email manageable I'll cut everything out where I would otherwise just say "excellent - I'm glad that's the direction we are taking" :-)
Jef Driesen jefdriesen@telenet.be writes:
There has been some discussion regarding the exact semantics of the temperature field. On one hand it's important to clearly define the exact semantics, but on the other hand the device may provide one or more temperature values with different semantics. For example, we can define the temperature field to be the minimum temperature, but the device can provide the minimum, maximum or average value, or the temperature before and after the dive, or the temperature at maximum depth, or one or more of the above. How do we support this?
One option is to keep the exact semantics vague, and just supply whatever temperature value the device supplies. This is not ideal, because it leaves applications that care about the exact interpretation in the cold, and it also doesn't work well in case the device provides more than one value.
An alternative option is to move the temperature data into the samples. All the values mentioned above can easily be attached to the correct sample. You still won't get a full temperature profile (only a few data points), but at least there is no confusion about the semantics. For example, a temperature value attached to the sample at maximum depth is clearly the temperature at max depth, and so on.
Another suggestion was to use flags to make the exact semantics clear. For example, if the default semantics is the minimum temperature, provide a flag to overrule this interpretation with another one (e.g. at max depth, etc):
struct temperature {
    flags;
    value;
    ... /* Maybe multiple values here? */
};
And last but not least we can provide a struct with all possible values:
struct temperature {
    minimum;
    maximum;
    average;
    ...
};
and then provide some means to let the application know which values are available (using a bitmap again, or some magic values to indicate "not present").
No decision for the temperature issue has been made yet.
Since I was somewhat outspoken on this before, I'll comment again.
The one solution that I don't like is "let's keep it vague and provide some value where it is rather unclear what it might mean". That is in general bad API design - especially as in many cases we do know exactly what the temperature is supposed to mean.
I still like the idea of the flags where one flag could indeed be DC_WE_DONT_KNOW_EXACTLY_WHAT_TYPE_OF_TEMP_THIS_IS (ok, maybe a shorter name would be better). But as an application author, if the backend KNOWS that this is the air temperature before the dive, then I want it to tell me. And if the backend knows that this is the minimum water temperature (vs. the temperature at the deepest spot), then I want to know that, too. Flags are not the only possible implementation - but they seem a straightforward one. And usually "simple is good".
3b. Sample data
The basic list doesn't need much discussion:
- time
- depth
- temperature
- tank pressure
There are also a couple less important ones, but as long as they are reasonable device independent, I see no reason not to support them too:
- heartrate
- airtime
- CNS/OTU
- ...
Note that this is definitely not a complete list. We'll probably need a few extra types for tec divers (setpoints), and there may be others that I forgot.
There has been some discussion regarding the CNS values (but the underlying discussion applies to any other type too). The natural unit for the CNS value is obviously a percentage. But there are devices that don't supply a percentage. For example, Suunto has OLF (Oxygen Limit Fraction), which is a hybrid between CNS and OTU (i.e. whichever is higher), and Oceanic has their O2 bar graphs. Although both give some indication of the CNS loading, they are not easily translated to a percentage.
What to do in this case? Just not support it, or again provide some flag with the exact interpretation?
You know what I'll say... I'd love to get the data that is there, plus a flag that helps me make sense of it. The Suunto OLF could be roughly mapped to percentages (in the application, not in libdivecomputer), I guess (especially if the app does its own CNS calculation and can then show the data from the dive computer as additional data to the user).
3c. Events
The current proposal is to reserve the events exclusively for things that have no duration (and thus no need for the begin/end flags) and don't need to carry any data values. In that case, we can use a simple bitfield to get all the events for the sample at once.
dc_sample_get_available (sample, &bitmap);
if (bitmap & DC_SAMPLE_EVENTS) {
    dc_sample_get_events (sample, &events);

    if (events & DC_SAMPLE_EVENT_BOOKMARK) {
        /* Bookmark event. */
    }
    if (events & DC_SAMPLE_EVENT_VIOLATION) {
        /* Decostop violation event. */
    }
}
While this should work well for true events like decostop violations, bookmarks, etc, we'll need another solution for things like gas changes, decompression info, etc. The idea is to deliver this info as sample values, rather than events.
Yay! Yay! Yay! (oops, I said I would remove things where I just wholeheartedly agree, but this one I am really, really happy about :-) )
For example the gasswitch event would be changed into an "active gasmix" sample value. The main difference is that you will now get the current gasmix on every sample, and not just the gaschange. For the gasmix sample value, the corresponding data type would be the index of the gasmix in the header (see above).
Decompression data can be supported in a similar way. For each sample you can get the deco status, with a data structure like this:
dc_deco_t {
    enum {NDL, DECOSTOP} type;
    time;  /* Optional */
    depth; /* Optional */
};
If you're not in deco, the type will be set to NDL, and the time contains the remaining no decompression limit (if available). If you're in deco, the type will be set to DECOSTOP, and the time and depth contain the time and depth of the next decompression stop (if available). With such a structure we can support both devices that provide just a simple ndl/deco flag, and devices that provide some more detailed info.
BTW: I ran into a fun special case when implementing this in Subsurface (which, coincidentally, stores all the ndl/deco info in samples...): the special case is the safety stop. You have NDL remaining (obviously) and you have a stop (a weak obligation) with a time and depth. Subsurface doesn't show a ceiling for such stops but shows detailed information if your mouse hovers over the profile.
I don't think it's necessary to have an additional time value in the dc_deco_t (because the NDL at the safety stop is usually close to infinity and isn't really a critical piece of information at that stage), but maybe we could add a third value to the enum {NDL, DECOSTOP, SAFETYSTOP} ?
We already committed some changes to the master branch to move into that direction. The DC_EVENT_DECOSTOP now contains the depth (in meters) and time (in seconds) as two packed 16bit values, and a new DC_EVENT_NDL was introduced. It's not perfect yet, because for the current api we were limited to the existing data structures (hence the packing), but it's a step in the direction of the model explained here.
Yes - and if I may say so, it's very appreciated by the Subsurface team. I am happy with the stop-gap implementation in 0.3 and even more happy with the direction this is going for the next-API.
Of course we can consider to extend the type with deepstops and safety stops, or add an extra field for TTS (Time To Surface), and things like that. But the main point is that we move away from the event model as much as possible.
Ahh - should have read the whole section before responding above.
I think safety stop and deep stops could just be different enum values. TTS would have to be a fourth member of the dc_deco_t structure; I would still have it in there as the deco obligation is a major component of TTS...
Thanks, Jef!
/D