Next round of api improvements.

Jef Driesen jefdriesen at telenet.be
Sat Nov 24 06:57:37 UTC 2012


On 2012-11-21 19:22, Dirk Hohndel wrote:
> On Wed, 2012-11-21 at 17:57 +0100, Jef Driesen wrote:
>> That would work for a one vendor extension per sample, but I don't 
>> see
>> how you would make that work nicely in case there can be multiple
>> extensions per sample. It also makes it harder to use a custom data
>> structure for each extension, instead of just a blob of raw data. 
>> Yes,
>> you can abuse the void pointer to point to any structure, but that 
>> isn't
>> the most pretty interface.
>>
>> Let's assume that a device supports a heartrate and an airtime value
>> per sample. And assume for a moment those are not supported with the
>> main api.
>>
>> If we introduce some kind of a generic dc_sample_get_vendor() 
>> function,
>> then we have to make sure it can return both the heartrate and 
>> airtime
>> value. We can probably solve that in a similar way as you proposed 
>> for
>> the tank pressures. However, my idea was that the backend would 
>> provide
>> some dc_xxx_sample_get_heartrate() and dc_xxx_sample_get_airtime()
>> functions with a corresponding data structure. But then a single bit 
>> to
>> indicate the presence of a vendor extension isn't sufficient. If 
>> every
>> such extensions needs it own bit in the main bitfield, then we'll
>> quickly run out of bits. There are many backends and many possible
>> extensions, so we have a virtually unlimited number of possible
>> extensions. We would at least need an independent bitmap only for 
>> the
>> extensions, or some other solution.
>
> There are several problems with that idea. Obviously you run out of 
> bits
> quickly. But more importantly you create a massive amount of function
> entry points in the library. So your API gets very complex and even
> worse, for every new feature that you support you add new functions 
> and
> now you need matching preprocessor code that allows the application 
> to
> appropriately do conditional compiles...

If it's just the number of functions, then it's easy to achieve the 
same result with just a single function of course. For example with a 
function that works like the dc_parser_get_field function in the current 
api:

dc_sample_get_vendor (sample, type, &value);

I'm fine with a one function approach, so that's not the main problem.

> This brings us straight back to the enum vs. #define discussion we 
> had
> somewhere else. Think about the users of your library, the people
> writing applications. It is very unpopular to create a hard link 
> between
> a specific version of an app and a specific version of a library -
> especially as more software starts using libdivecomputer. So you 
> should
> assume that applications want to be able to say "works with
> libdivecomputer 0.3.0 or later" - but then they need a rational way 
> to
> test if a specific feature is available.

How does this change depending on whether you conditionally compile 
with:

#if DC_VERSION_CHECK(x,y,z)

or:

#ifdef DC_SAMPLE_FOO

In both cases, the compiled version will depend on a certain minimum 
version, but the code can be compiled for several versions.

In the first case the developer has to look up once (e.g. when doing 
the implementation) from which version onwards a particular feature is 
supported. But from then on, it's crystal clear from reading the code 
which version you need to get that feature. In the second case, you can 
blindly add the #ifdef, but now you can't tell from reading the code 
which libdivecomputer version you need. So the "works with 
libdivecomputer x.y.z or later" part just became harder to see, if you 
ask me.

> Why do I mention this here? Because with your function name based
> approach, you now need a matching feature flag for each function that
> you introduce. Not my idea of a great API.
>
> Now imagine the other extreme.
>
> dc_sample_get_vendor() returns a structure like this:
>
> struct dc_sample_vendor_t {
>      unsigned int type;
>      unsigned int size;
>      const void *data;
>  };
>
> The type is a unique number that you assign that precisely describes 
> the
> data that is delivered. Even, you can reuse that number if multiple
> computers deliver the same data... let's assume heartrate... my guess
> most every computer will give that to you as an integer and the 
> backend
> can pad this so it's 16 bit wide (and deal with other tricks that the 
> dc
> might do to fit it into the fewest number of bits). So now you have 
> ONE
> type
>
> #define DC_VSAMPLE_HEARTRATE 0xABC
>
> and data is known to be simply a pointer to the number of beats per
> minute.
>
> This seems MUCH cleaner than an API that requires you to call
>
> dc_uwatec_get_sample_heartrate()
>
> if the user happens to have an Uwatec computer with heartrate support
> but
>
> dc_suunto_get_sample_heartrate()
>
> if it's a Suunto that now supports heartrate.
>
> Instead what you do is this.
>
> Foreach(sample) {
> 	dc_get_sample_basic(&basic_data); // basic_data has time, depth
> 	while (dc_sample_get_vendor(time, &sample_vendor)) {
> 		switch(sample_vendor.type) {
> 		case DC_VSAMPLE_HEARTRATE:
> 			...
> 			break;
> 		...
> 		default:
> 			// Jef added more features, need to update app
> 		}
> 	}
> }
>
> Now as I said, that's an extreme example and pushes EVERYTHING into 
> the
> sample_get_vendor logic. What might be more reasonable is to have a 
> few
> common values (as you suggested earlier):
>
> Foreach(sample) {
> 	dc_get_sample_basic(&basic_data); // basic_data has time, depth
> 	if (basic_data.extradatabits & DC_SAMPLE_TANK_PRESSURE)
> 		dc_get_tank_pressure(time, &tank_pressure);
> 	if (basic_data.extradatabits & DC_SAMPLE_DECO_NDL)
> 		dc_get_deco_ndl(time, &deco_ndl);
> 	while (dc_sample_get_vendor(time, &sample_vendor)) {
> 		switch(sample_vendor.type) {
> 		case DC_VSAMPLE_HEARTRATE:
> 			...
> 			break;
> 		...
> 		default:
> 			// Jef added more features, need to update app
> 		}
> 	}
> }
>
> So we break out the common pieces of information and make them 
> available
> with their own API functions... but that of course brings us back to 
> the
> type specific flags:
>
> struct dc_deco_ndl_t {
> 	int type;
> 	int depth; // in mm
>  	int duration; // in s
> 	int tts; // in s
> }
>
> Where type then tells you if this dive computer provides no duration,
> just the duration of the deepest stop, just the time to surface, or
> both.

The heartrate was maybe a very bad example. If Uwatec and Suunto both 
support a heartrate with the same beats per minute format, then that's 
already a first indication that it should probably be supported through 
the main interface, and not as a vendor extension. But okay, this was 
just a theoretical example.

Anyway, I see the vendor extensions as something to support things that 
are really *highly vendor specific*. Take for example the oceanics. They 
have a fixed 8 bytes of data per sample. Each model stores roughly the 
same set of information (o2bg, n2bg, pressure, depth, vari, dtr, atr, 
deco, temperature, alarms, etc), but each model packs this data slightly 
differently into those 8 bytes. (This is a real pain in the ass, and the
reason why the oceanic code needs by far the most bugfixes.) Here we 
could support the basic fields (time, depth, temperature, etc) through 
the standard interface, and leave everything else as raw data. But as a 
vendor extension, we could take care of the crazy oceanic data packing 
internally, and present the data in some nicer struct:

struct dc_oceanic_sample_t {
    unsigned int o2bg, n2bg; /* O2 and N2 bar graph */
    unsigned int dtr, atr; /* Dive and air time remaining */
    ...
};

This type of extension is extremely unlikely to be shared between 
different backends. So I think that one of the reasons we have a 
different opinion is that we also have a different view on what we 
consider to be a vendor extension.

I'm also convinced that there will probably be very few applications 
that actually make use of the vendor extensions. If you look at the 
general purpose logbook applications that exist today (divinglog, 
macdive, etc), then you'll notice they only support the device 
independent features that would be supported by the standard interface. 
I don't see how you can even write an application that supports all 
possible features of each device, considering the vast amount of 
differences. The only applications that support these features are the 
vendor applications, which are by definition tied to some specific 
devices. But such applications are not really our (main) target 
audience.

BTW, these vendor extensions have the lowest possible priority for me. 
There will be plenty of work to get the standard interface implemented 
and reverse engineering bugs to fix. So the first version(s) will likely 
support no vendor extensions at all. However, while this is pretty low on 
my priority list, I think it's good to keep future extensibility in 
mind.

>> We can maybe consider to provide language bindings for the most 
>> popular
>> languages (.net and python) ourselves?
>
> Not a bad idea - but what is a bad idea is to then go for the least
> common denominator and create an ugly, badly implemented C API just
> because it's harder to map a well designed API into .net constructs.
>
> You can always have the .net binding create its own data structures 
> that
> follow the preferred way to do things in .net - but don't force that
> onto the C API...

The reason why I mentioned maintaining the language bindings as part of 
the project is that, if there is some ugly and obscure construct 
required, then we can take care of that ourselves, and hide it from the 
user. Then we don't have to worry anymore about users having trouble 
with the .NET integration.

Unfortunately I speak from experience here. I spent a considerable 
amount of time on the .NET interface for the current api (for 
divinglog). There were several non-trivial issues (calling conventions, 
interactions with the garbage collector, manual marshaling, etc) that 
required quite some research on my side to figure out why things were 
not working as expected. I'm pretty sure the average .NET developer 
wouldn't be able to get this right either.

So I'm not saying we should model after the least common denominator, 
just that we have to take care that whatever api we pick, it should be 
possible to use from other languages. That's why I referred to the 
varargs functions for example.

>> Anyway, I think it's very important to not treat these non C/C++
>> applications as second class citizens, especially because at least 
>> one
>> of them (divinglog) is likely the application with the largest 
>> userbase.
>> So that means all features should be usable from the languages 
>> bindings
>> too.
>
> Certainly. But since these apps are binary-only they live in a 
> somewhat
> different world where they can simply take a random snapshop of
> libdivecomputer, implement all what they want from the API and not 
> worry
> about libdivecomputer changing (or the user picking a different 
> version
> of libdivecomputer). For many other applications life isn't that 
> simple.

The rules for binary compatibility are the same for C and .NET 
applications. If your application depends on symbol X, and you switch 
back to an older library without this symbol, then both the C and .NET 
app will fail. They may fail in some different way, but that doesn't 
matter for this discussion.

> Sure, I could fork libdivecomputer and simply pull whatever version I
> want to rely on into the Subsurface git and tell people to just build
> from that version - but that's not my vision of how software 
> engineering
> across projects should work - I wouldn't want to do that for libxml2,
> glib, gtk, libc... so why should I have to do that for 
> libdivecomputer.
>
> So fundamentally:
>
> If your API does not allow a an application a sane way to detect 
> which
> features are supported and how to conditionally compile for those
> features, then your API is broken.

Sure, that's definitely not what we want :-)

I'm just trying to understand why you dislike the DC_VERSION_CHECK and 
prefer the DC_SAMPLE_FOO conditional compilation. In the end, regardless 
of how you do the conditional compilation, your application will need to 
know the data format associated with each vendor extension. That is also 
true whether you use an individual function for each extension, or one 
function with some type indicator. I mean everything that you wrote 
about the gyro stuff below can be done with both your proposal and mine.

>> I guess that once we have decided on the final list of dive and 
>> sample
>> data that we will support, the chances that we'll have to introduce
>> additional samples in the future are relative small. I mean the goal 
>> for
>> the v1.0 was to maintain backwards compatibility from then on. If we
>> then document the first version number in which every feature 
>> appeared,
>> then it shouldn't be that much of a hassle to do:
>>
>> #if DC_VERSION_CHECK(x,y,z)
>>
>> #endif
>>
>> and then language bindings can do the same at runtime.
>
> Certainly. But think twice how your API will look like. How many
> functions will you have? What data types do they return? And how are
> they extended to provide more data as that becomes available from 
> dive
> computers (think details about the deco model, saturation of 
> different
> compartments, gyro information about vertical and lateral movement,
> pictures that were taken during that sample, voice recordings of the
> conversation during the dive... I know you think I'm insane, but look 
> at
> what the Suunto Solution provided and what you can get from a state 
> of
> the art DC like the OSTC or the Uemis SDA... to assume that the speed 
> of
> innovation will suddenly come to a halt is being really foolish).
>
> That's why I really wasn't joking with my extreme example above. Have
> the smallest possible API with the most flexible data types that you 
> can
> think of. In my model it is easy and transparent to add
>
> #define DC_VSAMPLE_GYRO 0xDEAD
>
> and then have the data pointer point to a
>
> struct dc_gyro_t {
> 	int flags;
> 	int delta_north; // in mm
> 	int delta_west;  // in mm
> 	int accuracy;    // in mm
> };
>
> which allows you to track the movement of the diver. And all the
> application does is
>
> #if defined(DC_VSAMPLE_GYRO)
> 	...
> #endif
>
> in the switch statement that loops over all the vendor samples. 
> Clean,
> easy to maintain, easy to handle from an application point of view.
>
> Now imagine the same thing with an API that only supports per vendor
> data without complex data structures. So now you are adding the
> following entry points to your API:
>
> dc_atomics_gyro_get_delta_north()
> dc_atomics_gyro_get_delta_west()
> dc_atomics_gyro_get_delta_accuracy()
>
> dc_uemis_gyro_get_delta_north()
> dc_uemis_gyro_get_delta_west()
> dc_uemis_gyro_get_delta_accuracy()
>
> etc. And imagine the code flow in the application that has to 
> duplicate
> all this crap.

Oh, but that was definitely not my intention. There is absolutely no 
reason to limit a vendor extension to just a single value. It's 
perfectly acceptable to use a struct like your dc_gyro_t.

>> To be honest, I don't really like this. This tastes again as 
>> adapting
>> our model to that of the dive computers. If we supply three values, 
>> then
>> it breaks again if there is some dive computer out there that 
>> supports
>> some fourth value. And with those flags, the semantics can be 
>> altered in
>> many different ways. I fear this will quickly get messy on the
>> application side.
>
> Quite the contrary. This allows the application to actually 
> understand
> the data it gets in a dive computer INDEPENDENT way. If all the app
> cares for is "give me some temperature" then it ignores the flags. If
> the application would like to be able to plot the temperature where 
> it
> was taken (even on the dive computer that may only record the
> temperature at the deepest point of the plot or the minimum temp or
> something) then it can do so in a sane way.
>
> What you are proposing is to simply pretend that all dive computers 
> are
> the same and they give us a "temperature". And if the application 
> wants
> to know what this temperature actually means, tough shit, write DC
> specific code.
>
> [...]
>
>> > I'll keep singing the same song. Pick one semantic as default and
>> > provide clearly specified flags to modify that.
>>
> At this point I might as well abandon using libdivecomputer (or use
> libdivecomputer to do the downloading of raw data) and write the rest 
> in
> my application - that way at least I know what I'm getting.
>
> Do you see my point? I don't want to adapt libdivecomputer to ONE
> SPECIFIC divecomputer. I want libdivecomputer to be designed so that 
> it
> can give me accurate and well defined information for EVERY 
> divecomputer
> it supports. That's why I want the flags. That's why I want the
> flexibility.
>
> You seem to want to have an API which provides the information as it
> would come from some idealized divecomputer with a fixed feature set 
> and
> ill defined semantics and then let the application deal with the 
> rest.
> I want the API to be able to provide us the maximum amount of data 
> with
> the best possible semantic definition for each and every divecomputer
> out there - and not dumb it down to some least common denominator or
> some middle ground.
>
> [...]
>
>> So it might be more interesting to have the begin/end temperature, 
>> and
>> the temperature at maximum depth delivered in the sample data. We
>> already do this today for the suunto vyper backend, and I think it 
>> works
>> quite well. Instead of temperature graph, you now draw a few 
>> temperature
>> dots. If we do that, we can then reserve the temperature field
>> exclusively for the true single temperature value case.
>
> And again, you take a specific computer and try to make the model 
> match
> that computer.
>
> I try to provide an API that allows every computer to provide as much
> data as they have. And then have the app decide which information it
> wants to display and how it wants to display it.
>
> So if your computer gives you a full temperature graph on every 
> sample,
> GREAT. If the computer doesn't give sample temperature at all but 
> only a
> beginning and min temperature - provide that instead. And if it is 
> end
> temperature and temperature at maximum depth, then provide that AND
> DON'T PRETEND THEY ARE BEGINNING TEMPERATURE AND MINIMUM TEMPERATURE.
> Provide them as what they are and let the app decide what to do with
> them.

I'm not trying to make the model match one specific dive computer. 
Placing the different temperature values on the profile would be 
completely independent of any dive computer model for example. It can be 
done for every model. Imagine there is some crazy dive computer that 
gives you temperature at the decostops. That would still work!

Let me go back to the temperature problem, and formulate it in a 
different way. At the moment we already have six possible 
interpretations (begin, end, min, max, avg, and at maximum depth). A device 
can support one or more of them, in all possible combinations. Let's say 
I agree with you and we want to preserve the exact semantics. What should 
the api and data structure look like? Just a single temperature with a 
flag won't work for devices that have two values. If we extend the data 
structure with some more temperature fields (and flags), then it will 
break again if there is another device with one more value. How would 
you solve this problem?

So I'm not saying I'm against trying to preserve the exact temperature 
semantics. What I don't like are the flags you proposed like 
MIN_TEMP_IS_AT_MAXDEPTH, because it alters the interpretation of the 
min_temperature field. I'm not convinced that's the right solution. So 
maybe we should think again and try to come up with a third solution. 
I'll start by providing a couple of ideas that crossed my mind:

I already mentioned placing the different temperature values into the 
samples. We could also support each temperature variant as a different 
type: DC_FIELD_TEMPERATURE_{MIN,MAX,...}. When a particular variant 
isn't available we don't set the corresponding bit (or return 
UNSUPPORTED). Or we define a structure with all the variants we wish to 
support, along with a bitmap to tell which fields are present (or when 
using Kelvins, we could consider a value of 0 as not present):

struct dc_field_temperature_t {
    unsigned int bitmap;
    double minimum;
    double maximum;
    ...
};

>> But then you introduce extra complexity on the application side. If 
>> you
>> get an CNS value, then you have to check all those flags to figure 
>> out
>> whether it's a percentage (default), or some scale index, or that 
>> weird
>> suunto OLF value, or...
>>
>> You can't just take the value anymore and treat is as a percentage.
>>  From my point of view that makes the abstraction provided by
>> libdivecomputer less useful. I mean how different is that from doing 
>> a
>> model based interpretation:
>>
>> if (suunto)
>>     /* OLF */
>> else if (oceanic)
>>     /* Range index */
>> else
>>     /* Percentage */
>>
>> I think we are better off always providing a percentage, and if that 
>> is
>> not available then you don't get a CNS value at all.
>
> No, Jef, that is NOT being better off. That is better off assuming 
> all
> you care about are the people who have a computer that follows the 
> model
> that you pick and everyone else can go screw themselves. We are 
> better
> off if we provide the app as much information as is there.
>
> And if application writers are as lazy as you seem to assume they are
> more than welcome to say
>
> get the CNS
> if (flag & CNS_IS_PERCENTAGE)
> 	show cns value
> else
> 	tell user to go fuck themselves and buy my dive computer
>
> Subsurface would rather be able to figure out a way to display the 
> CNS
> regardless how the dive computer manufacturer decides to encode it - 
> and
> figure out new ways to do this if new dive computers with a better 
> (or
> even only different) model show up.

CNS values are defined as percentages, just like depths are defined as 
lengths. The fact that some vendors choose to store it differently (to 
save some bits?) doesn't change that. Just like we convert all depths to 
a standard unit, we would do the same for CNS. If that's not possible 
because it's some vendor specific concept like OLF, then we can't do the 
conversion and we act as if the device doesn't support CNS. If you allow 
the CNS value to contain different types of values (e.g. CNS or OLF 
depending on the flag), then why not also leave the depth in imperial or 
metric units, and provide a flag with the units? Because in my opinion 
that's exactly the same as what you propose here for CNS. (I know CNS is 
just an example here, but the same logic applies to other values too of 
course.)

From my point of view, one of the main advantages of using 
libdivecomputer is that it provides an abstraction. Call that 
abstraction an idealized dive computer if you like. But as is the case 
with all abstractions, you can't support every single feature that's out 
there. So yes, in some cases you will end up with a more limited subset 
than what the device supports. I believe that's a trade-off you have to 
make.

>> I agree, although not as strong as you do.
>>
>> Let's say we define the meaning of the temperature as the minimum 
>> water
>> temperature, which is probably the most common value. But then you
>> happen to have some dive computer which only provides the average 
>> water
>> temperature. Since that's all we have, that's what we give to the
>> application and goes into your logbook. Is the small difference in
>> semantics really that important, considering we don't have any other
>> info? I mean to the user it won't make that much difference, because 
>> the
>> semantics will remain consistent across all his dives with that 
>> device.
>> In a paper logbook you would also just write down the value that you 
>> get
>> from your dive computer.
>
> Again, you assume that the application is lazy. If what 
> libdivecomputer
> does is "heck, it's the only temperature I get, let's claim it's the
> minimum" then all the application can do is show the minimum 
> temperature
> and if my computer is picking the temperature at maximum depth and I 
> am
> diving in Hoodsport in the winter and the computer give 10C as
> temperature at maximum depth but lazy ass app then claims "minimum
> temperature was 10C then I will think that the application author is 
> on
> crack while in reality it is libdivecomputer that is giving me
> information that is WRONG and therefore worthless.
>
> Why is this so hard to understand.
>
> If the other application authors really are that lazy, fine, they can
> ignore the flags that correctly specify what type of information is
> provided. Or if there is no sane way to present the information that 
> is
> provided (heck, what DO those Suunto OLF values mean? beats me...)

Suunto OLF (Oxygen Limit Fraction) is the maximum value of CNS and OTU, 
both expressed as percentages. So it's either CNS or OTU, but you can't 
tell which one.

> But that is a decision you should LEAVE TO THE APP and not make in
> libdivecomputer.
>
> From what you are writing I think that you want an API that 
> simplifies
> things to a degree that destroys information. And I am vehemently
> opposed to that.

While I really understand your concern regarding the temperature 
problem (and we can probably find some way around it), I really don't 
get the CNS one. For the temperature, ignoring the flag will still yield 
a temperature. The semantics may be a bit wrong, but it's still a 
temperature value. However for the CNS value you would get something 
completely different! So this is not about applications being lazy, but 
about providing an abstract model with standardized units. By not 
supporting certain vendor specific concepts (e.g. OLF) we are actually 
preserving that information, albeit in the form of raw data or a vendor 
extension.

>> So while I agree that a crystal clear semantics would be ideal, I 
>> also
>> believe some small deviations are acceptable. You can also put it in
>> another way: how can we even be sure about the actual semantics 
>> given
>> the fact that it's all based on reverse engineering, so there is 
>> always
>> an uncertainty factor present.
>
> It is less and less based on reverse engineering as more and more 
> dive
> computer vendors seem willing to give us API information. And I think
> this trend will only continue (of the ones that I have contacted, 
> only
> one so far has told me "no, we will not provide that").

I don't know which vendors you contacted, but as far as I know the "big 
players" like Suunto, Uwatec, Mares, etc have no interest in supporting 
third-party applications. Only the relative new players on the market 
are supporting us, like Reefnet, HW, Atomics, Shearwater, etc. No matter 
how great those few vendors are, the majority of the divers are still 
using a device from one of the big players.

>> I think it gets a lot more dramatically if an application interprets 
>> a
>> value incorrect because it fails to take into account one of those 
>> flags
>> correctly. For example interpreting the oceanic CNS range index as a
>> percentage, which means it will never go past 5% (or some other 
>> small
>> value). Ouch.
>
> So you are saying you are doing a better job by not giving any data 
> at
> all in that case, instead of providing the data with a flag that 
> defines
> what it means?

Yes. See above for my reason behind it. If it doesn't fit into the 
abstraction provided by libdivecomputer, then it's likely a vendor 
specific feature, and should be treated likewise.

Jef




