On Wed, 2012-11-21 at 17:57 +0100, Jef Driesen wrote:
I also considered using a bitmap (it was mentioned briefly somewhere in my mail I think), but I couldn't find a way to make it work with vendor extensions. A bitfield is limited to 8 to 64 different types, depending on the integer size we decide to use. For the normal samples, that is probably more than we'll ever need. But if every backend can support a few extension types, and each one needs another bit, then we'll quickly run out of bits. Do you have a solution for this?
Yes. Sort of. You have one bit in your bitmap that says "vendor extension" and then the vendor extension can use another int (16bit or something) with an enum (so 64k values for 16 bits) to indicate which extension this is. So we don't waste the space in every sample, but if there is an extension we have tons of room to grow and add extensions for new things to come.
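To sketch that idea in code (all names here are made up for illustration, not an actual libdivecomputer API):

/* illustrative only: one bit of the per-sample bitmap is reserved to
 * flag "a vendor extension is present in this sample" */
#define DC_SAMPLE_TIME    (1u << 0)
#define DC_SAMPLE_DEPTH   (1u << 1)
/* ... */
#define DC_SAMPLE_VENDOR  (1u << 31)

/* the extension itself is identified by a separate 16-bit type field,
 * so the main bitmap never runs out of bits */
unsigned short vendor_type; /* room for 64k distinct extensions */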
If I understand your proposal correctly, you would have a data structure similar to the existing vendor data:
struct dc_sample_vendor_t {
    unsigned int type;
    unsigned int size;
    const void *data;
};
Yes, something like that.
That would work for one vendor extension per sample, but I don't see how you would make that work nicely in case there can be multiple extensions per sample. It also makes it harder to use a custom data structure for each extension, instead of just a blob of raw data. Yes, you can abuse the void pointer to point to any structure, but that isn't the prettiest interface.
Let's assume that a device supports a heartrate and an airtime value per sample. And assume for a moment those are not supported by the main API.
If we introduce some kind of generic dc_sample_get_vendor() function, then we have to make sure it can return both the heartrate and airtime value. We can probably solve that in a similar way as you proposed for the tank pressures. However, my idea was that the backend would provide some dc_xxx_sample_get_heartrate() and dc_xxx_sample_get_airtime() functions with a corresponding data structure. But then a single bit to indicate the presence of a vendor extension isn't sufficient. If every such extension needs its own bit in the main bitfield, then we'll quickly run out of bits. There are many backends and many possible extensions, so we have a virtually unlimited number of possible extensions. We would at least need an independent bitmap just for the extensions, or some other solution.
There are several problems with that idea. Obviously you run out of bits quickly. But more importantly, you create a massive number of function entry points in the library. So your API gets very complex, and even worse, for every new feature that you support you add new functions, and now you need matching preprocessor code that allows the application to do the appropriate conditional compiles...
This brings us straight back to the enum vs. #define discussion we had somewhere else. Think about the users of your library, the people writing applications. It is very unpopular to create a hard link between a specific version of an app and a specific version of a library - especially as more software starts using libdivecomputer. So you should assume that applications want to be able to say "works with libdivecomputer 0.3.0 or later" - but then they need a rational way to test if a specific feature is available.
Why do I mention this here? Because with your function name based approach, you now need a matching feature flag for each function that you introduce. Not my idea of a great API.
Now imagine the other extreme.
dc_sample_get_vendor() returns a structure like this:
struct dc_sample_vendor_t {
    unsigned int type;
    unsigned int size;
    const void *data;
};
The type is a unique number that you assign that precisely describes the data that is delivered. Better yet, you can reuse that number if multiple computers deliver the same data... let's assume heartrate... my guess is most every computer will give that to you as an integer, and the backend can pad this so it's 16 bits wide (and deal with other tricks that the DC might do to fit it into the fewest number of bits). So now you have ONE type
#define DC_VSAMPLE_HEARTRATE 0xABC
and data is known to be simply a pointer to the number of beats per minute.
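For example, a consumer could then do something like this (hypothetical code, assuming the 16-bit padding just described):

if (sample_vendor.type == DC_VSAMPLE_HEARTRATE &&
    sample_vendor.size == sizeof (unsigned short)) {
        const unsigned short *bpm = sample_vendor.data;
        printf ("heartrate: %u bpm\n", (unsigned int) *bpm);
}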
This seems MUCH cleaner than an API that requires you to call
dc_uwatec_get_sample_heartrate()
if the user happens to have an Uwatec computer with heartrate support but
dc_suunto_get_sample_heartrate()
if it's a Suunto that now supports heartrate.
Instead what you do is this.
Foreach(sample) {
    dc_get_sample_basic(&basic_data); // basic_data has time, depth
    while (dc_sample_get_vendor(time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
Now as I said, that's an extreme example and pushes EVERYTHING into the sample_get_vendor logic. What might be more reasonable is to have a few common values (as you suggested earlier):
Foreach(sample) {
    dc_get_sample_basic(&basic_data); // basic_data has time, depth
    if (basic_data.extradatabits & DC_SAMPLE_TANK_PRESSURE)
        dc_get_tank_pressure(time, &tank_pressure);
    if (basic_data.extradatabits & DC_SAMPLE_DECO_NDL)
        dc_get_deco_ndl(time, &deco_ndl);
    while (dc_sample_get_vendor(time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
So we break out the common pieces of information and make them available with their own API functions... but that of course brings us back to the type specific flags:
struct dc_deco_ndl_t {
    int type;
    int depth;    // in mm
    int duration; // in s
    int tts;      // in s
};
Where type then tells you if this dive computer provides no duration, just the duration of the deepest stop, just the time to surface, or both.
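For example (flag names invented here just to show how the type field would be queried):

/* invented flag names for illustration */
#define DC_DECO_HAS_DURATION (1 << 0)
#define DC_DECO_HAS_TTS      (1 << 1)

if (deco_ndl.type & DC_DECO_HAS_DURATION)
    plot_stop (deco_ndl.depth, deco_ndl.duration); /* app's own plotting code */
if (deco_ndl.type & DC_DECO_HAS_TTS)
    plot_time_to_surface (deco_ndl.tts);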
While I'm definitely not opposed to the bitmap idea, I'm not all that happy with the basic_sample_t struct. I would prefer to implement this with separate functions. One for the bitmap itself, and one function for each value. Since there are only two values that are always available, I wouldn't treat them any differently from the others.
The reason that I thought of treating them differently is that these two values are available in every sample from every DC (otherwise, what's the point of that DC). So you can have one call that gives you what you can always get (time, depth) plus the information about everything else that is available.
But if you'd rather have this be three API calls (get_time, get_depth, get_available_data_bitmap or something), I'll be happy with that, too.
For the multivalued data, I'm not sure how well such a struct (containing a length and pointer to an array) will work for language bindings (python, .net, etc). That's something we'll have to check.
I don't know about .net. In python this would work just fine.
I'm pretty sure such a structure isn't supported out of the box and will need some manual marshaling on the .net side. So my guess is that it can be done, but it won't be very pretty.
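For reference, the kind of struct under discussion would look something like this (a hypothetical type, not part of the current API):

/* hypothetical multivalued sample: a count plus a pointer to an array,
 * which is the part that needs manual marshaling on the .NET side */
struct dc_tank_pressures_t {
    unsigned int ntanks;
    const double *pressures; /* in bar, ntanks entries */
};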
We could maybe consider providing language bindings for the most popular languages (.net and python) ourselves?
Not a bad idea - but what is a bad idea is to then go for the least common denominator and create an ugly, badly implemented C API just because it's harder to map a well designed API into .net constructs.
You can always have the .net binding create its own data structures that follow the preferred way to do things in .net - but don't force that onto the C API...
I prefer enums over pre-processor constants too. I know this loses the conditional compilation, but that is a trick that works for C/C++ only anyway.
The problem with enums is that you now have to have strong versioning of your API - something libdivecomputer does not have a good history of doing (and I know all the reasons why - don't take this as criticism, just as an observation). So if you use enums then I have to recreate the logic of "which version of libdivecomputer supports which values for this function" in my app - and so does everyone else. We basically need to create a "version number to feature mapping" in the client.
If you do this with #defines then this is much easier to take care of. At least for C/C++ programmers - I have seen other language bindings parse the defines for this as well, but I don't know what the current users of libdivecomputer are.
As far as I know there are at least .NET (divinglog) and python (kenozooid) based applications. I got quite some questions about .NET support, so I assume there will be others too.
Anyway, I think it's very important to not treat these non C/C++ applications as second class citizens, especially because at least one of them (divinglog) is likely the application with the largest userbase. So that means all features should be usable from the languages bindings too.
Certainly. But since these apps are binary-only they live in a somewhat different world where they can simply take a random snapshot of libdivecomputer, implement all they want from the API and not worry about libdivecomputer changing (or the user picking a different version of libdivecomputer). For many other applications life isn't that simple.
Sure, I could fork libdivecomputer and simply pull whatever version I want to rely on into the Subsurface git and tell people to just build from that version - but that's not my vision of how software engineering across projects should work - I wouldn't want to do that for libxml2, glib, gtk, libc... so why should I have to do that for libdivecomputer?
So fundamentally:
If your API does not allow an application a sane way to detect which features are supported and how to conditionally compile for those features, then your API is broken.
BTW, that's for example one of the main reasons the logging interface doesn't expose a vararg interface, but calls vsnprintf internally to convert the messages to strings. Strings are supported by all languages, varargs are not.
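Roughly like this (a simplified sketch of that idea, not the actual implementation; the context type and callback fields are assumed):

#include <stdarg.h>
#include <stdio.h>

/* sketch: the public entry point accepts varargs, but the callback a
 * binding registers only ever receives a finished string */
void
dc_context_log (dc_context_t *context, const char *format, ...)
{
    char message[256];
    va_list args;

    va_start (args, format);
    vsnprintf (message, sizeof (message), format, args);
    va_end (args);

    context->logfunc (context, message, context->userdata);
}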
I guess that once we have decided on the final list of dive and sample data that we will support, the chances that we'll have to introduce additional samples in the future are relatively small. I mean the goal for v1.0 was to maintain backwards compatibility from then on. If we then document the first version number in which every feature appeared, then it shouldn't be that much of a hassle to do:
#if DC_VERSION_CHECK(x,y,z)
#endif
and then language bindings can do the same at runtime.
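For example (a sketch; it assumes a runtime query along the lines of dc_version() filling a struct with major/minor/micro fields):

/* compile time, C/C++ only */
#if DC_VERSION_CHECK(0, 3, 0)
    /* heartrate samples exist in these headers */
#endif

/* run time, usable from language bindings too */
dc_version_t version;
dc_version (&version);
if (version.major > 0 || version.minor >= 3) {
    /* heartrate samples exist in this library */
}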
Certainly. But think twice about what your API will look like. How many functions will you have? What data types do they return? And how are they extended to provide more data as that becomes available from dive computers (think details about the deco model, saturation of different compartments, gyro information about vertical and lateral movement, pictures that were taken during that sample, voice recordings of the conversation during the dive... I know you think I'm insane, but look at what the Suunto Solution provided and what you can get from a state of the art DC like the OSTC or the Uemis SDA... to assume that the speed of innovation will suddenly come to a halt is really foolish).
That's why I really wasn't joking with my extreme example above. Have the smallest possible API with the most flexible data types that you can think of. In my model it is easy and transparent to add
#define DC_VSAMPLE_GYRO 0xDEAD
and then have the data pointer point to a
struct dc_gyro_t {
    int flags;
    int delta_north; // in mm
    int delta_west;  // in mm
    int accuracy;    // in mm
};
which allows you to track the movement of the diver. And all the application does is
#if defined(DC_VSAMPLE_GYRO)
...
#endif
in the switch statement that loops over all the vendor samples. Clean, easy to maintain, easy to handle from an application point of view.
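Concretely, the new case in that switch might look like this (a sketch; plot_movement() stands in for the app's own code):

#if defined(DC_VSAMPLE_GYRO)
case DC_VSAMPLE_GYRO: {
    const struct dc_gyro_t *gyro = sample_vendor.data;
    /* track the diver's movement; plot_movement() is application code */
    plot_movement (gyro->delta_north, gyro->delta_west, gyro->accuracy);
    break;
}
#endif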
Now imagine the same thing with an API that only supports per vendor data without complex data structures. So now you are adding the following entry points to your API:
dc_atomics_gyro_get_delta_north()
dc_atomics_gyro_get_delta_west()
dc_atomics_gyro_get_delta_accuracy()
dc_uemis_gyro_get_delta_north()
dc_uemis_gyro_get_delta_west()
dc_uemis_gyro_get_delta_accuracy()
etc. And imagine the code flow in the application that has to duplicate all this crap.
But okay, this is something we can discuss further.
Certainly :-)
- Supported dive and sample data
==================================
- temperature
air temperature, I assume?
Depends. Different dive computers have different interpretations. It could be air/water temperature, the min/max/avg value, and so on. If that's even known to us. Not sure how to deal with this. We could just supply a temperature and leave the actual interpretation to the user, or we could have an extra flag with the exact interpretation (if known) as you mention below. But even then we're still left with devices that provide more than one temperature.
So offer a couple of values with flags that determine the semantics
- air_temperature
- min_water_temperature
- avg_water_temperature
And offer flags that clarify semantics
AIR_TEMP_IS_FIRST_SAMPLE
AVG_IS_LAST_TEMP
MIN_TEMP_IS_AT_MAX_DEPTH
(btw, I love the last one as I frequently dive in a water body that for more than half of the year has temperature go UP as you go deeper...)
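As a sketch (the names are invented here to illustrate the value-plus-flags idea):

/* invented names: each temperature value carries flags that pin
 * down what the value actually means on this particular computer */
struct dc_temperature_t {
    double value;       /* in degrees Celsius */
    unsigned int flags; /* e.g. AIR_TEMP_IS_FIRST_SAMPLE */
};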
To be honest, I don't really like this. This again feels like adapting our model to that of the dive computers. If we supply three values, then it breaks again if there is some dive computer out there that supports some fourth value. And with those flags, the semantics can be altered in many different ways. I fear this will quickly get messy on the application side.
Quite the contrary. This allows the application to actually understand the data it gets in a dive computer INDEPENDENT way. If all the app cares for is "give me some temperature" then it ignores the flags. If the application would like to be able to plot the temperature where it was taken (even on the dive computer that may only record the temperature at the deepest point of the plot or the minimum temp or something) then it can do so in a sane way.
What you are proposing is to simply pretend that all dive computers are the same and they give us a "temperature". And if the application wants to know what this temperature actually means, tough shit, write DC specific code.
At this point I might as well abandon using libdivecomputer (or use libdivecomputer to do the downloading of raw data) and write the rest in my application - that way at least I know what I'm getting.
Do you see my point? I don't want to adapt libdivecomputer to ONE SPECIFIC divecomputer. I want libdivecomputer to be designed so that it can give me accurate and well defined information for EVERY divecomputer it supports. That's why I want the flags. That's why I want the flexibility.
You seem to want an API which provides the information as it would come from some idealized divecomputer with a fixed feature set and ill-defined semantics, and then let the application deal with the rest. I want the API to be able to provide us the maximum amount of data with the best possible semantic definition for each and every divecomputer out there - and not dumb it down to some least common denominator or some middle ground.
If you take a step back and think about it, the temperature is just a poor substitute for those devices without a full temperature profile. If you have a full temperature profile, then those min/max/avg values and begin/end values suddenly become a lot less interesting.
Yes - but lots of divecomputers don't provide this. And anyway - temperature is just an easy to understand example - the same logic applies for almost every data point that you provide.
So it might be more interesting to have the begin/end temperature, and the temperature at maximum depth, delivered in the sample data. We already do this today for the suunto vyper backend, and I think it works quite well. Instead of a temperature graph, you now draw a few temperature dots. If we do that, we can then reserve the temperature field exclusively for the true single temperature value case.
And again, you take a specific computer and try to make the model match that computer.
I try to provide an API that allows every computer to provide as much data as they have. And then have the app decide which information it wants to display and how it wants to display it.
So if your computer gives you a full temperature graph on every sample, GREAT. If the computer doesn't give sample temperature at all but only a beginning and min temperature - provide that instead. And if it is end temperature and temperature at maximum depth, then provide that AND DON'T PRETEND THEY ARE BEGINNING TEMPERATURE AND MINIMUM TEMPERATURE. Provide them as what they are and let the app decide what to do with them.
Heartrate is only supported by the Uwatec Galileo as far as I know. It's certainly not a must-have feature, but I can imagine that if you have such a device, this is an interesting value. Although the actual usefulness is questionable if you ask me, but it's a cool gadget :-)
So much of what high end DCs provide is more cool than entirely useful - but I agree that we should make it available regardless. Visualization is the application's job :-)
Yes, that's the idea. As long as it's reasonably device independent (e.g. there is no reason why other devices can't include a heartrate monitor, and beats per minute is a standard unit too) and doesn't complicate the API (e.g. if you don't need it, you just don't read the value and that's it), then I have no real objections. The same for other types like CNS/OTU.
But things like for example tissue saturation in the decompression model of device X, or RGBM warning Y, that's a whole other story.
I have no idea which devices record this in the data. I know the Suuntos have this weird OLF (Oxygen Limit Factor) value, which is either the CNS or OTU, whichever is higher. And I think the Oceanics have a scale from 1 to 5 or something. So that might be a bit more difficult to support in a uniform way.
I'll keep singing the same song. Pick one semantic as default and provide clearly specified flags to modify that.
But then you introduce extra complexity on the application side. If you get a CNS value, then you have to check all those flags to figure out whether it's a percentage (default), or some scale index, or that weird suunto OLF value, or...
You can't just take the value anymore and treat it as a percentage. From my point of view that makes the abstraction provided by libdivecomputer less useful. I mean, how different is that from doing a model based interpretation:
if (suunto)
    /* OLF */
else if (oceanic)
    /* Range index */
else
    /* Percentage */
I think we are better off always providing a percentage, and if that is not available then you don't get a CNS value at all.
No, Jef, that is NOT being better off. That is better off assuming all you care about are the people who have a computer that follows the model that you pick and everyone else can go screw themselves. We are better off if we provide the app as much information as is there.
And if application writers are as lazy as you seem to assume they are more than welcome to say
get the CNS
if (flag & CNS_IS_PERCENTAGE)
    show cns value
else
    tell user to go fuck themselves and buy my dive computer
Subsurface would rather be able to figure out a way to display the CNS regardless how the dive computer manufacturer decides to encode it - and figure out new ways to do this if new dive computers with a better (or even only different) model show up.
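In code, the flag-based approach might look like this (flag names invented for illustration; the show_* calls stand in for application code):

if (cns.flags & DC_CNS_IS_PERCENTAGE)
    show_percentage (cns.value);
else if (cns.flags & DC_CNS_IS_OLF)
    show_olf (cns.value);         /* Suunto: max of CNS and OTU */
else if (cns.flags & DC_CNS_IS_RANGE_INDEX)
    show_range_index (cns.value); /* Oceanic: scale of 1 to 5 */
else
    hide_cns ();                  /* unknown semantics - app decides */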
What I had in mind, is to have a single DC_SAMPLE_EVENT type. If this bit is set, there are events available. The actual event value would then be another bitmap containing all the events. With this approach we can support several events without needing additional sample types for each event type.
Good idea.
Then the next step will be to make a list of the events we want.
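A sketch of how that could look (the event names and accessor are invented here):

/* invented event bits for illustration */
#define DC_EVENT_ASCENT_WARNING (1 << 0)
#define DC_EVENT_DECO_VIOLATION (1 << 1)
#define DC_EVENT_GAS_SWITCH     (1 << 2)

if (basic_data.extradatabits & DC_SAMPLE_EVENT) {
    unsigned int events = 0;
    dc_sample_get_events (time, &events); /* hypothetical accessor */
    if (events & DC_EVENT_GAS_SWITCH)
        mark_gas_switch (time); /* application code */
}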
See what I wrote above. The moment you don't have precisely defined semantics of the value (even if the semantic is just our current best guess based on reverse engineering) the usefulness of that value to the application declines dramatically. An app is always at liberty to ignore that additional information ("hey, here's some temperature, no idea what it is"), but if it isn't there all that will happen is that apps start to make educated guesses based on the type of DC used. And that's just ugly.
I feel very strongly about that. I think an API that doesn't define what the values it gives you actually mean is really of rather limited use.
I agree, although not as strongly as you do.
Let's say we define the meaning of the temperature as the minimum water temperature, which is probably the most common value. But then you happen to have some dive computer which only provides the average water temperature. Since that's all we have, that's what we give to the application and that's what goes into your logbook. Is the small difference in semantics really that important, considering we don't have any other info? I mean, to the user it won't make that much difference, because the semantics will remain consistent across all his dives with that device. In a paper logbook you would also just write down the value that you get from your dive computer.
Again, you assume that the application is lazy. If what libdivecomputer does is "heck, it's the only temperature I get, let's claim it's the minimum", then all the application can do is show the minimum temperature. If my computer picks the temperature at maximum depth, and I am diving in Hoodsport in the winter, and the computer gives 10C as the temperature at maximum depth, but the lazy-ass app then claims "minimum temperature was 10C", then I will think that the application author is on crack - while in reality it is libdivecomputer that is giving me information that is WRONG and therefore worthless.
Why is this so hard to understand?
If the other application authors really are that lazy, fine, they can ignore the flags that correctly specify what type of information is provided. Or if there is no sane way to present the information that is provided (heck, what DO those Suunto OLF values mean? beats me...)
But that is a decision you should LEAVE TO THE APP and not make in libdivecomputer.
From what you are writing I think that you want an API that simplifies things to a degree that destroys information. And I am vehemently opposed to that.
So while I agree that crystal clear semantics would be ideal, I also believe some small deviations are acceptable. You can also put it another way: how can we even be sure about the actual semantics, given the fact that it's all based on reverse engineering? There is always an uncertainty factor present.
It is less and less based on reverse engineering as more and more dive computer vendors seem willing to give us API information. And I think this trend will only continue (of the ones that I have contacted, only one so far has told me "no, we will not provide that").
I think it gets a lot more dramatic if an application interprets a value incorrectly because it fails to take one of those flags into account. For example interpreting the oceanic CNS range index as a percentage, which means it will never go past 5% (or some other small value). Ouch.
So you are saying you are doing a better job by not giving any data at all in that case, instead of providing the data with a flag that defines what it means?
Strange.
/D