On 2012-11-21 19:22, Dirk Hohndel wrote:
On Wed, 2012-11-21 at 17:57 +0100, Jef Driesen wrote:
That would work for one vendor extension per sample, but I don't see how you would make that work nicely when there can be multiple extensions per sample. It also makes it harder to use a custom data structure for each extension, instead of just a blob of raw data. Yes, you can abuse the void pointer to point to any structure, but that isn't the prettiest interface.
Let's assume that a device supports a heartrate and an airtime value per sample. And assume for a moment those are not supported with the main api.
If we introduce some kind of a generic dc_sample_get_vendor() function, then we have to make sure it can return both the heartrate and airtime value. We can probably solve that in a similar way as you proposed for the tank pressures. However, my idea was that the backend would provide some dc_xxx_sample_get_heartrate() and dc_xxx_sample_get_airtime() functions with a corresponding data structure. But then a single bit to indicate the presence of a vendor extension isn't sufficient. If every such extension needs its own bit in the main bitfield, then we'll quickly run out of bits. There are many backends and many possible extensions, so we have a virtually unlimited number of possible extensions. We would at least need an independent bitmap just for the extensions, or some other solution.
There are several problems with that idea. Obviously you run out of bits quickly. But more importantly you create a massive amount of function entry points in the library. So your API gets very complex and even worse, for every new feature that you support you add new functions and now you need matching preprocessor code that allows the application to appropriately do conditional compiles...
If it's just the number of functions, then it's easy to achieve the same result with just a single function of course. For example with a function that works like the dc_parser_get_field function in the current api:
dc_sample_get_vendor (sample, type, &value);
I'm fine with a one function approach, so that's not the main problem.
This brings us straight back to the enum vs. #define discussion we had somewhere else. Think about the users of your library, the people writing applications. It is very unpopular to create a hard link between a specific version of an app and a specific version of a library - especially as more software starts using libdivecomputer. So you should assume that applications want to be able to say "works with libdivecomputer 0.3.0 or later" - but then they need a rational way to test if a specific feature is available.
How does this change depending on whether you conditionally compile with:
#if DC_VERSION_CHECK(x,y,z)
or:
#ifdef DC_SAMPLE_FOO
In both cases, the compiled version will depend on a certain minimum version, but the code can be compiled for several versions.
In the first case the developer has to look up once (e.g. when doing the implementation) from which version on a particular feature is supported. But from there on, it's crystal clear from reading the code which version you need to get that feature. In the second case, you can blindly add the #ifdef, but now you can't tell from reading the code which libdivecomputer version you need. So the "works with libdivecomputer x.y.z or later" part just became more difficult to see if you ask me.
Why do I mention this here? Because with your function name based approach, you now need a matching feature flag for each function that you introduce. Not my idea of a great API.
Now imagine the other extreme.
dc_sample_get_vendor() returns a structure like this:
struct dc_sample_vendor_t {
    unsigned int type;
    unsigned int size;
    const void *data;
};
The type is a unique number that you assign that precisely describes the data that is delivered. Better yet, you can reuse that number if multiple computers deliver the same data... let's assume heartrate... my guess is that almost every computer will give that to you as an integer, and the backend can pad this so it's 16 bit wide (and deal with other tricks that the dc might do to fit it into the fewest number of bits). So now you have ONE type
#define DC_VSAMPLE_HEARTRATE 0xABC
and data is known to be simply a pointer to the number of beats per minute.
This seems MUCH cleaner than an API that requires you to call
dc_uwatec_get_sample_heartrate()
if the user happens to have an Uwatec computer with heartrate support but
dc_suunto_get_sample_heartrate()
if it's a Suunto that now supports heartrate.
Instead what you do is this.
Foreach(sample) {
    dc_get_sample_basic(&basic_data); // basic_data has time, depth
    while (dc_sample_get_vendor(time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
Now as I said, that's an extreme example and pushes EVERYTHING into the sample_get_vendor logic. What might be more reasonable is to have a few common values (as you suggested earlier):
Foreach(sample) {
    dc_get_sample_basic(&basic_data); // basic_data has time, depth
    if (basic_data.extradatabits & DC_SAMPLE_TANK_PRESSURE)
        dc_get_tank_pressure(time, &tank_pressure);
    if (basic_data.extradatabits & DC_SAMPLE_DECO_NDL)
        dc_get_deco_ndl(time, &deco_ndl);
    while (dc_sample_get_vendor(time, &sample_vendor)) {
        switch (sample_vendor.type) {
        case DC_VSAMPLE_HEARTRATE:
            ...
            break;
        ...
        default:
            // Jef added more features, need to update app
        }
    }
}
So we break out the common pieces of information and make them available with their own API functions... but that of course brings us back to the type specific flags:
struct dc_deco_ndl_t {
    int type;
    int depth;    // in mm
    int duration; // in s
    int tts;      // in s
}
Where type then tells you if this dive computer provides no duration, just the duration of the deepest stop, just the time to surface, or both.
The heartrate was maybe a very bad example. If Uwatec and Suunto both support a heartrate with the same beats per minute format, then that's already a first indication that it should probably be supported through the main interface, and not as a vendor extension. But okay, this was just a theoretical example.
Anyway, I see the vendor extensions as something to support things that are really *highly vendor specific*. Take for example the oceanics. They have a fixed 8 bytes of data per sample. Each model stores roughly the same set of information (o2bg, n2bg, pressure, depth, vari, dtr, atr, deco, temperature, alarms, etc), but each model packs this data slightly differently into those 8 bytes. (This is a real pain in the ass, and the reason why the oceanic code needs by far the most bugfixes.) Here we could support the basic fields (time, depth, temperature, etc) through the standard interface, and leave everything else as raw data. But as a vendor extension, we could take care of the crazy oceanic data packing internally, and present the data in some nicer struct:
struct dc_oceanic_sample_t {
    unsigned int o2bg, n2bg; /* O2 and N2 bar graph */
    unsigned int dtr, atr;   /* Dive and air time remaining */
    ...
};
This type of extension is extremely unlikely to be shared between different backends. So I think that one of the reason we have a different opinion is that we also have a different view on what we consider to be a vendor extension.
I'm also convinced that there will probably be very few applications that actually make use of the vendor extensions. If you look at the general purpose logbook applications that exist today (divinglog, macdive, etc), then you'll notice they only support the device independent features that would be supported by the standard interface. I don't see how you can even write an application that supports all possible features of each device, considering the vast amount of differences. The only applications that support these features are the vendor applications, which are by definition tied to some specific devices. But such applications are not really our (main) target audience.
BTW, these vendor extensions have the lowest possible priority for me. There will be plenty of work to get the standard interface implemented and reverse engineering bugs to fix. So the first version(s) will likely support no vendor extensions at all. However while this is pretty low on my priority list, I think it's good to keep future extensibility in mind.
We can maybe consider to provide language bindings for the most popular languages (.net and python) ourselves?
Not a bad idea - but what is a bad idea is to then go for the least common denominator and create an ugly, badly implemented C API just because it's harder to map a well designed API into .net constructs.
You can always have the .net binding create its own data structures that follow the preferred way to do things in .net - but don't force that onto the C API...
The reason why I mentioned maintaining the language bindings as part of the project is that, if there is some ugly and obscure construct required, then we can take care of that ourselves, and hide it from the user. Then we don't have to worry anymore about users having trouble with the .NET integration.
Unfortunately I speak from experience here. I spent considerable amounts of time on the .NET interface for the current api (for divinglog). There were several non-trivial issues (calling conventions, interactions with the garbage collector, manual marshaling, etc) that required quite some research on my side to figure out why things were not working as expected. I'm pretty sure the average .NET developer wouldn't be able to get this right either.
So I'm not saying we should model after the least common denominator, just that we have to take care that whatever api we pick, it should be possible to use from other languages. That's why I referred to the varargs functions for example.
Anyway, I think it's very important to not treat these non C/C++ applications as second class citizens, especially because at least one of them (divinglog) is likely the application with the largest userbase. So that means all features should be usable from the language bindings too.
Certainly. But since these apps are binary-only they live in a somewhat different world where they can simply take a random snapshot of libdivecomputer, implement all what they want from the API and not worry about libdivecomputer changing (or the user picking a different version of libdivecomputer). For many other applications life isn't that simple.
The rules for binary compatibility are the same for C and .NET applications. If your application depends on symbol X, and you switch back to an older library without this symbol, then both the C and .NET app will fail. They may fail in some different way, but that doesn't matter for this discussion.
Sure, I could fork libdivecomputer and simply pull whatever version I want to rely on into the Subsurface git and tell people to just build from that version - but that's not my vision of how software engineering across projects should work. I wouldn't want to do that for libxml2, glib, gtk, libc... so why should I have to do that for libdivecomputer?
So fundamentally:
If your API does not allow an application a sane way to detect which features are supported and how to conditionally compile for those features, then your API is broken.
Sure, that's definitely not what we want :-)
I'm just trying to understand why you dislike the DC_VERSION_CHECK and prefer the DC_SAMPLE_FOO conditional compilation. In the end, regardless of how you do the conditional compilation, your application will need to know the data format associated with each vendor extension. That is also true whether you use an individual function for each extension, or one function with some type indicator. I mean everything that you wrote about the gyro stuff below can be done with both your proposal and mine.
I guess that once we have decided on the final list of dive and sample data that we will support, the chances that we'll have to introduce additional samples in the future are relatively small. I mean the goal for the v1.0 was to maintain backwards compatibility from then on. If we then document the first version number in which every feature appeared, then it shouldn't be that much of a hassle to do:
#if DC_VERSION_CHECK(x,y,z)
#endif
and then language bindings can do the same at runtime.
Certainly. But think twice about what your API will look like. How many functions will you have? What data types do they return? And how are they extended to provide more data as that becomes available from dive computers (think details about the deco model, saturation of different compartments, gyro information about vertical and lateral movement, pictures that were taken during that sample, voice recordings of the conversation during the dive... I know you think I'm insane, but look at what the Suunto Solution provided and what you can get from a state of the art DC like the OSTC or the Uemis SDA... to assume that the speed of innovation will suddenly come to a halt is being really foolish).
That's why I really wasn't joking with my extreme example above. Have the smallest possible API with the most flexible data types that you can think of. In my model it is easy and transparent to add
#define DC_VSAMPLE_GYRO 0xDEAD
and then have the data pointer point to a
struct dc_gyro_t {
    int flags;
    int delta_north; // in mm
    int delta_west;  // in mm
    int accuracy;    // in mm
};
which allows you to track the movement of the diver. And all the application does is
#if defined(DC_VSAMPLE_GYRO)
...
#endif
in the switch statement that loops over all the vendor samples. Clean, easy to maintain, easy to handle from an application point of view.
Now imagine the same thing with an API that only supports per vendor data without complex data structures. So now you are adding the following entry points to your API:
dc_atomics_gyro_get_delta_north()
dc_atomics_gyro_get_delta_west()
dc_atomics_gyro_get_delta_accuracy()
dc_uemis_gyro_get_delta_north()
dc_uemis_gyro_get_delta_west()
dc_uemis_gyro_get_delta_accuracy()
etc. And imagine the code flow in the application that has to duplicate all this crap.
Oh, but that was definitely not my intention. There is absolutely no reason to limit a vendor extension to just a single value. It's perfectly acceptable to use a struct like your dc_gyro_t.
To be honest, I don't really like this. This again smacks of adapting our model to that of the dive computers. If we supply three values, then it breaks again if there is some dive computer out there that supports some fourth value. And with those flags, the semantics can be altered in many different ways. I fear this will quickly get messy on the application side.
Quite the contrary. This allows the application to actually understand the data it gets in a dive computer INDEPENDENT way. If all the app cares for is "give me some temperature" then it ignores the flags. If the application would like to be able to plot the temperature where it was taken (even on the dive computer that may only record the temperature at the deepest point of the plot or the minimum temp or something) then it can do so in a sane way.
What you are proposing is to simply pretend that all dive computers are the same and they give us a "temperature". And if the application wants to know what this temperature actually means, tough shit, write DC specific code.
[...]
I'll keep singing the same song. Pick one semantic as default and provide clearly specified flags to modify that.
At this point I might as well abandon using libdivecomputer (or use libdivecomputer to do the downloading of raw data) and write the rest in my application - that way at least I know what I'm getting.
Do you see my point? I don't want to adapt libdivecomputer to ONE SPECIFIC divecomputer. I want libdivecomputer to be designed so that it can give me accurate and well defined information for EVERY divecomputer it supports. That's why I want the flags. That's why I want the flexibility.
You seem to want to have an API which provides the information as it would come from some idealized divecomputer with a fixed feature set and ill defined semantics and then let the application deal with the rest. I want the API to be able to provide us the maximum amount of data with the best possible semantic definition for each and every divecomputer out there - and not dumb it down to some least common denominator or some middle ground.
[...]
So it might be more interesting to have the begin/end temperature, and the temperature at maximum depth delivered in the sample data. We already do this today for the suunto vyper backend, and I think it works quite well. Instead of a temperature graph, you now draw a few temperature dots. If we do that, we can then reserve the temperature field exclusively for the true single temperature value case.
And again, you take a specific computer and try to make the model match that computer.
I try to provide an API that allows every computer to provide as much data as they have. And then have the app decide which information it wants to display and how it wants to display it.
So if your computer gives you a full temperature graph on every sample, GREAT. If the computer doesn't give sample temperature at all but only a beginning and min temperature - provide that instead. And if it is end temperature and temperature at maximum depth, then provide that AND DON'T PRETEND THEY ARE BEGINNING TEMPERATURE AND MINIMUM TEMPERATURE. Provide them as what they are and let the app decide what to do with them.
I'm not trying to make the model match one specific dive computer. Placing the different temperature values on the profile would be completely independent of any dive computer model for example. It can be done for every model. Imagine there is some crazy dive computer that gives you temperature at the decostops. That would still work!
Let me go back to the temperature problem, and formulate it in a different way. At the moment we already have five possible interpretations (begin/end, min/max/avg, and at maximum depth). A device can support one or more of them, in all possible combinations. Let's say I agree with you and we want to preserve the exact semantics. What should the api and data structure look like? Just a single temperature with a flag won't work for devices that have two values. If we extend the data structure with some more temperature fields (and flags), then it will break again if there is another device with one more value. How would you solve this problem?
So I'm not saying I'm against trying to preserve the exact temperature semantics. What I don't like are the flags you proposed like MIN_TEMP_IS_AT_MAXDEPTH, because it alters the interpretation of the min_temperature field. I'm not convinced that's the right solution. So maybe we should think again and try to come up with a third solution. I'll start by providing a couple of ideas that crossed my mind:
I already mentioned placing the different temperature values into the samples. We could also support each temperature variant as a different type: DC_FIELD_TEMPERATURE_{MIN,MAX,...}. When a particular variant isn't available we don't set the corresponding bit (or return UNSUPPORTED). Or we define a structure with all the variants we wish to support, along with a bitmap to tell which fields are present (or when using Kelvins, we could consider a value of 0 as not present):
struct dc_field_temperature_t {
    bitmap;
    minimum;
    maximum;
    ...
}
But then you introduce extra complexity on the application side. If you get an CNS value, then you have to check all those flags to figure out whether it's a percentage (default), or some scale index, or that weird suunto OLF value, or...
You can't just take the value anymore and treat it as a percentage. From my point of view that makes the abstraction provided by libdivecomputer less useful. I mean how different is that from doing a model based interpretation:
if (suunto)
    /* OLF */
else if (oceanic)
    /* Range index */
else
    /* Percentage */
I think we are better off always providing a percentage, and if that is not available then you don't get a CNS value at all.
No, Jef, that is NOT being better off. That is better off assuming all you care about are the people who have a computer that follows the model that you pick and everyone else can go screw themselves. We are better off if we provide the app as much information as is there.
And if application writers are as lazy as you seem to assume they are more than welcome to say
get the CNS
if (flag & CNS_IS_PERCENTAGE)
    show cns value
else
    tell user to go fuck themselves and buy my dive computer
Subsurface would rather be able to figure out a way to display the CNS regardless how the dive computer manufacturer decides to encode it - and figure out new ways to do this if new dive computers with a better (or even only different) model show up.
CNS values are defined as percentages, just like depths are defined as lengths. The fact that some vendors choose to store it differently (to save some bits?) doesn't change that. Just like we convert all depths to a standard unit, we would do the same for CNS. If that's not possible because it's some vendor specific concept like OLF, then we can't do the conversion and we act as if the device doesn't support CNS. If you allow the CNS value to contain different types of values (e.g. CNS or OLF depending on the flag), then why not also leave the depth in imperial or metric units, and provide a flag with the units? Because to my mind that's exactly the same as what you propose here for CNS. (I know CNS is just an example here, but the same logic applies to other values too of course.)
From my point of view, one of the main advantages of using libdivecomputer is that it provides an abstraction. Call that abstraction an idealized dive computer if you like. But as is the case with all abstractions, you can't support every single feature that's out there. So yes, in some cases you will end up with a more limited subset than what the device supports. I believe that's a trade-off you have to make.
I agree, although not as strong as you do.
Let's say we define the meaning of the temperature as the minimum water temperature, which is probably the most common value. But then you happen to have some dive computer which only provides the average water temperature. Since that's all we have, that's what we give to the application and goes into your logbook. Is the small difference in semantics really that important, considering we don't have any other info? I mean to the user it won't make that much difference, because the semantics will remain consistent across all his dives with that device. In a paper logbook you would also just write down the value that you get from your dive computer.
Again, you assume that the application is lazy. If what libdivecomputer does is "heck, it's the only temperature I get, let's claim it's the minimum", then all the application can do is show the minimum temperature. And if my computer picks the temperature at maximum depth, and I am diving in Hoodsport in the winter, and the computer gives 10C as the temperature at maximum depth, but the lazy-ass app then claims "minimum temperature was 10C", then I will think that the application author is on crack, while in reality it is libdivecomputer that is giving me information that is WRONG and therefore worthless.
Why is this so hard to understand?
If the other application authors really are that lazy, fine, they can ignore the flags that correctly specify what type of information is provided. Or if there is no sane way to present the information that is provided (heck, what DO those Suunto OLF values mean? beats me...)
Suunto OLF (Oxygen Limit Fraction) is the maximum value of CNS and OTU, both expressed as percentages. So it's either CNS or OTU, but you can't tell which one.
But that is a decision you should LEAVE TO THE APP and not make in libdivecomputer.
From what you are writing I think that you want an API that simplifies things to a degree that destroys information. And I am vehemently opposed to that.
While I really understand your concern regarding the temperature problem (and we can probably find some way around it), I really don't get the CNS one. For the temperature, ignoring the flag will still yield a temperature. The semantics may be a bit wrong, but it's still a temperature value. However for the CNS value you would get something completely different! So this is not about applications being lazy, but about providing an abstract model with standardized units. By not supporting certain vendor specific concepts (e.g. OLF) we are actually preserving that information, albeit in the form of raw data or a vendor extension.
So while I agree that crystal clear semantics would be ideal, I also believe some small deviations are acceptable. You can also put it another way: how can we even be sure about the actual semantics, given that it's all based on reverse engineering and there is always an uncertainty factor present?
It is less and less based on reverse engineering as more and more dive computer vendors seem willing to give us API information. And I think this trend will only continue (of the ones that I have contacted, only one so far has told me "no, we will not provide that").
I don't know which vendors you contacted, but as far as I know the "big players" like Suunto, Uwatec, Mares, etc have no interest in supporting third-party applications. Only the relatively new players on the market are supporting us, like Reefnet, HW, Atomics, Shearwater, etc. No matter how great those few vendors are, the majority of the divers are still using a device from one of the big players.
I think it gets a lot more dramatic if an application interprets a value incorrectly because it fails to take one of those flags into account. For example interpreting the oceanic CNS range index as a percentage, which means it will never go past 5% (or some other small value). Ouch.
So you are saying you are doing a better job by not giving any data at all in that case, instead of providing the data with a flag that defines what it means?
Yes. See above for my reason behind it. If it doesn't fit into the abstraction provided by libdivecomputer, then it's likely a vendor specific feature, and should be treated likewise.
Jef