Hello guys
I am facing certain errors while trying to use the simulator with the universal script. I have taken the following steps.
1. I git cloned the libdivecomputer repository. Then I used these commands to compile and build.
    $ autoreconf --install
    $ ./configure --enable-pty --disable-shared
    $ make
    $ sudo make install
2. Then I used socat to make virtual serial connections. I confirmed that the files ttyS0 and ttyS1 are present in the /tmp folder.
    $ socat PTY,link=/tmp/ttyS0 PTY,link=/tmp/ttyS1
3. I attached the binary of a Suunto Gekko to /tmp/ttyS0.
    $ ./simulator-linux -b vyper -p /tmp/ttyS0 -l simulator.logs gekko.linus.bin
    CONFIGURATION: backend=3 devname=/tmp/ttyS0 devtime=0 systime=0 model=0 filename=gekko.linus.bin size=8192 header=gekko.linus.bin.header, size=0 systime=gekko.linus.bin.systime, ticks=0 lines=0 0 is_spyder=0
4. I tried to download the dive data present in this binary by using both subsurface and the universal script.
    $ ./universal -n "Suunto Gekko" -b vyper -l divedownload.log -d divelogs.xml /tmp/ttyS1
    DATETIME 2014-03-30T18:00:39Z (1396202439)
    VERSION 0.5.0-devel (fa90009c293a9ca1a4eaf49a478b381477695c0a)
    Opening the device (Suunto Gekko, /tmp/ttyS1).
    ERROR: Invalid argument (22) [in ../../source/src/serial_posix.c:415 (serial_configure)]
    ERROR: Failed to set the terminal attributes. [in ../../source/src/suunto_vyper.c:121 (suunto_vyper_device_open)]
    ../../source/examples/universal.c:681: Error opening device.
    Result: Input/output error
It used to work earlier. I don't understand what went wrong. I also tried switching to the release-0.4 branch. But the error persists.
Can anyone please help?
Venkatesh Shukla
B. Tech (Electrical Engineering), III Year
Indian Institute of Technology, Banaras Hindu University
Ph No. +91 8960 579 122
Email: venkatesh.shukla.eee11@iitbhu.ac.in
On Sun, Mar 30, 2014 at 11:17 AM, Venkatesh Shukla IIT BHU <venkatesh.shukla.eee11@iitbhu.ac.in> wrote:
> It used to work earlier. I don't understand what went wrong. I also tried switching to the release-0.4 branch. But the error persists.
Hmm. I wonder what glibc does for that "tcsetattr()". I'm thinking that the baud-setting makes it do something that the pty's don't like.
Can you please do an strace on the universal binary using something like
strace -o trace -f -TTtt ..cmdline-here..
to capture the actual system calls and what exactly fails.
Linus
On Sun, 2014-03-30 at 11:51 -0700, Linus Torvalds wrote:
> Can you please do an strace on the universal binary using something like
> strace -o trace -f -TTtt ..cmdline-here..
> to capture the actual system calls and what exactly fails.
I ran into the same situation on Fedora 20 just recently but ran out of time trying to debug it.
Linus, have you tried if the simulator works for you at this point? You might be able to reproduce this on your system :-)
/D
On Mon, Mar 31, 2014 at 12:21 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Can you please do an strace on the universal binary using something like
> strace -o trace -f -TTtt ..cmdline-here..
> to capture the actual system calls and what exactly fails.
I have done as you asked. The output is here : http://pastebin.com/E50ayiSs
I have recently shifted to Fedora 20. It worked in Ubuntu earlier.
On Sun, Mar 30, 2014 at 12:14 PM, Venkatesh Shukla IIT BHU <venkatesh.shukla.eee11@iitbhu.ac.in> wrote:
> I have done as you asked. The output is here : http://pastebin.com/E50ayiSs
> I have recently shifted to Fedora 20. It worked in Ubuntu earlier.
It does indeed look like some glibc idiocy. Sometimes I wonder what the f*ck is wrong with libc people.
The kernel didn't actually complain at all, and what seems to happen is that glibc tries to be "helpful", and after it has done the TCSETS it does a TCGETS and compares the result. It notices that it tried to *set* c_cflag to 0xfbb but then it reads back c_cflag as 0xebb.
The difference is octal 0400, which is PARENB. Because under Linux, pty's always do
    tty->termios.c_cflag &= ~(CSIZE | PARENB);
    tty->termios.c_cflag |= (CS8 | CREAD);
and glibc tries to be "helpful" and notice that some of the settings don't stick. For no good reason.
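The arithmetic above can be sketched in a few lines of C. This is only an illustration, not kernel or glibc code: the helper names `pty_adjust_cflag` and `cflag_lost_bits` are made up here, and the baud bits are left out of the example value so the sketch does not depend on how a platform encodes speeds. On x86 Linux, adding B2400 to the flags below should correspond to the 0xfbb seen in the trace.

```c
#include <termios.h>

/* Hypothetical helper: mimics the c_cflag adjustment quoted above for
 * Linux ptys. The character size is forced to CS8, parity generation is
 * cleared and the receiver is enabled. */
static tcflag_t pty_adjust_cflag(tcflag_t requested)
{
    tcflag_t cflag = requested;
    cflag &= ~(tcflag_t)(CSIZE | PARENB);
    cflag |= (CS8 | CREAD);
    return cflag;
}

/* Hypothetical helper: the bits that differ between what was requested
 * and what a read-back would report. A strict "set, read back, compare"
 * check treats any non-zero result as a failure. */
static tcflag_t cflag_lost_bits(tcflag_t requested)
{
    return requested ^ pty_adjust_cflag(requested);
}
```

For a request of CS8 | CREAD | PARENB | PARODD | HUPCL | CLOCAL, the only bit that does not survive is PARENB, which is exactly the mismatch that makes the strict comparison fail.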
Now, I'm not saying that the kernel necessarily has to do the above for pty's (it could just ignore the bits, and not set them to what pty's actually do), but Linux has always done the above, so it's some idiotic change in glibc that now causes problems.
Don't ask me why the glibc people decided to "improve" on tcsetattr() this way.
Funnily enough, I do *not* find this in the glibc sources for tcsetattr (sysdeps/unix/sysv/linux/tcsetattr.c):
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/tcs...
so I wonder if this is some Fedora "feature".
Linus
On Sun, 2014-03-30 at 12:52 -0700, Linus Torvalds wrote:
> Funnily enough, I do *not* find this in the glibc sources for tcsetattr (sysdeps/unix/sysv/linux/tcsetattr.c):
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/tcs...
> so I wonder if this is some Fedora "feature".
Hmm, that would be even more sad. What do you recommend as workaround? I don't see anything wrong with what the kernel does and while I don't know why glibc would be so 'helpful', I don't have enough data to argue one way or another.
Should libdivecomputer ignore the error when on a PTY?
/D
On Sun, Mar 30, 2014 at 12:56 PM, Dirk Hohndel <dirk@hohndel.org> wrote:
> Should libdivecomputer ignore the error when on a PTY?
Yeah, I don't see any better approach.
In fact, I'd suggest that libdivecomputer get rid of all those checks entirely, and get rid of the whole "ENABLE_PTY" thing. Maybe the errors can be logged to the log-file, but they should be considered at most warnings, not errors.
Some settings just don't make sense for all serial lines. Maybe they are pty's. Maybe there is some glibc idiocy. Maybe they are some emulated other thing that doesn't do a complete job, but works well enough otherwise. Who cares? Making the errors fatal just makes things *worse*.
And hiding them under ENABLE_PTY doesn't really help anything, just adds configuration complexity and makes the code uglier.
Linus
On 2014-03-30 22:14, Linus Torvalds wrote:
> In fact, I'd suggest that libdivecomputer get rid of all those checks entirely, and get rid of the whole "ENABLE_PTY" thing. Maybe the errors can be logged to the log-file, but they should be considered at most warnings, not errors.
> Some settings just don't make sense for all serial lines. Maybe they are pty's. Maybe there is some glibc idiocy. Maybe they are some emulated other thing that doesn't do a complete job, but works well enough otherwise. Who cares? Making the errors fatal just makes things *worse*.
> And hiding them under ENABLE_PTY doesn't really help anything, just adds configuration complexity and makes the code uglier.
To be honest, I don't feel comfortable removing the error checking for the syscalls, and not just because omitting error checking is usually the wrong thing to do.
First of all, this only seems to be a problem for pty's and not the real thing. The use of pty's is basically a hack to make the simulator work. For that purpose I'm happy with the --enable-pty option. Only a libdivecomputer developer will ever need it.
For real serial ports, the settings that are used by libdivecomputer should always be supported by the underlying hardware. If some communication protocol requires for example RTS/DTR lines, then it makes no sense to use a piece of hardware that doesn't support that. That can never work, and returning an error is perfectly acceptable if you ask me.
But what worries me a lot more is that trying to communicate with a real device using the wrong settings may actually harm the device. This has already happened in the past with the Mares Icon HD and Matrix. Both use the same protocol, but trying to talk to a Matrix with the Icon HD baudrate will cause the device to lock up completely. Luckily there was no permanent damage and the device could be revived by removing the batteries. But that's pretty scary and something I want to avoid at all cost. This Mares problem is now gone, but that doesn't mean it can't happen for others.
So I'm not convinced ignoring the return value of a syscall, and hoping for the best, is the right solution here. For pty's I'm fine with that, but not for real serial ports.
BTW, the reason why we have the --enable-pty option, and not some autodetection at runtime, is that I couldn't find an easy way to tell the difference between a pty and a real serial port. No doubt it can be done some way (e.g. the major number for pty's?), but I need something that works on other unix like systems too.
Jef
On Mon, 2014-03-31 at 10:36 +0200, Jef Driesen wrote:
> To be honest, I don't feel comfortable removing the error checking for the syscalls, and not just because omitting error checking is usually the wrong thing to do.
But instead of making the error fatal, how about you just log it and try to keep going, anyway?
Think of it this way: "what's the worst that could happen?"
a) the error is real, communication will fail (same result as today)
b) the error is bogus, communication will succeed (winning!)
> First of all, this only seems to be a problem for pty's and not the real thing. The use of pty's is basically a hack to make the simulator work. For that purpose I'm happy with the --enable-pty option. Only a libdivecomputer developer will ever need it.
That assumes there are no issues with the libraries used or the kernel used and the specific serial device.
> For real serial ports, the settings that are used by libdivecomputer should always be supported by the underlying hardware. If some communication protocol requires for example RTS/DTR lines, then it makes no sense to use a piece of hardware that doesn't support that. That can never work, and returning an error is perfectly acceptable if you ask me.
Printing an error is, I agree. Forcefully aborting? I think we'll have to agree to disagree here.
> But what worries me a lot more is that trying to communicate with a real device using the wrong settings may actually harm the device. This has already happened in the past with the Mares Icon HD and Matrix. Both use the same protocol, but trying to talk to a Matrix with the Icon HD baudrate will cause the device to lock up completely. Luckily there was no permanent damage and the device could be revived by removing the batteries. But that's pretty scary and something I want to avoid at all cost. This Mares problem is now gone, but that doesn't mean it can't happen for others.
But that wasn't a problem of RTS/DTR or other serial parameters. That was a USB communication issue. And IIRC this wouldn't have been caught by an error return, either.
> So I'm not convinced ignoring the return value of a syscall, and hoping for the best, is the right solution here. For pty's I'm fine with that, but not for real serial ports.
It's your library... I think it makes much more sense to try to keep going and fail if the responses of actual communication make no sense, not if something odd happens during the setup of the communication.
> BTW, the reason why we have the --enable-pty option, and not some autodetection at runtime, is that I couldn't find an easy way to tell the difference between a pty and a real serial port. No doubt it can be done some way (e.g. the major number for pty's?), but I need something that works on other unix like systems too.
I'm not aware of an easy way to do that, either. Linus?
/D
On Mon, Mar 31, 2014 at 7:38 AM, Dirk Hohndel <dirk@hohndel.org> wrote:
> > BTW, the reason why we have the --enable-pty option, and not some autodetection at runtime, is that I couldn't find an easy way to tell the difference between a pty and a real serial port. No doubt it can be done some way (e.g. the major number for pty's?), but I need something that works on other unix like systems too.
> I'm not aware of an easy way to do that, either. Linus?
There is no portable way to do it. Looking at the major number is indeed the only way, but even under Linux there are multiple different major numbers (legacy pty numbers vs modern ones: major number 3 vs 136-143), and they don't match between unixes anyway.
I'm sure there are heuristics to guess that work well enough in practice. Like just looking at the name you used to open things (but you'd need to follow symlinks etc). But no, nothing simple to say "this is a pseudo-tty vs a real serial line"
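As a rough illustration of the major-number heuristic described here, a Linux-only check could look like the sketch below. The function names are hypothetical, the major numbers are the ones mentioned above (3 for legacy pty slaves, 136-143 for modern ones), and `major()` from `<sys/sysmacros.h>` is a glibc/Linux facility, so none of this carries over to other unixes.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(); Linux/glibc specific */

/* Hypothetical helper: 1 if the major number is one of the Linux pty
 * slave majors mentioned in the thread (legacy 3, modern 136-143). */
static int linux_major_is_pty(unsigned int maj)
{
    return maj == 3 || (maj >= 136 && maj <= 143);
}

/* Hypothetical helper: 1 if fd refers to a character device whose major
 * number looks like a Linux pty slave, 0 otherwise (including errors).
 * Using fstat() on the open descriptor sidesteps the symlink problem,
 * because the link was already resolved when the device was opened. */
static int fd_looks_like_linux_pty(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0 || !S_ISCHR(st.st_mode))
        return 0;
    return linux_major_is_pty(major(st.st_rdev));
}
```

This remains a heuristic: it says nothing about other operating systems and would need updating if the kernel's device numbering ever changed.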
Linus
On 2014-03-31 16:38, Dirk Hohndel wrote:
> On Mon, 2014-03-31 at 10:36 +0200, Jef Driesen wrote:
> > To be honest, I don't feel comfortable removing the error checking for the syscalls, and not just because omitting error checking is usually the wrong thing to do.
> But instead of making the error fatal, how about you just log it and try to keep going, anyway?
> Think of it this way: "what's the worst that could happen?"
> a) the error is real, communication will fail (same result as today)
> b) the error is bogus, communication will succeed (winning!)
I get your point, but please read on...
> > First of all, this only seems to be a problem for pty's and not the real thing. The use of pty's is basically a hack to make the simulator work. For that purpose I'm happy with the --enable-pty option. Only a libdivecomputer developer will ever need it.
> That assumes there are no issues with the libraries used or the kernel used and the specific serial device.
True, but I think that's a reasonable assumption. If we can no longer rely on the kernel or device drivers to get things right, then I think we're in far more trouble. Of course there can and will be bugs (as we just discovered the hard way), but those are supposed to get fixed, right?
Linus' mail to the fedora/redhat developers made that pretty clear I think.
> > But what worries me a lot more is that trying to communicate with a real device using the wrong settings may actually harm the device. This has already happened in the past with the Mares Icon HD and Matrix. Both use the same protocol, but trying to talk to a Matrix with the Icon HD baudrate will cause the device to lock up completely. Luckily there was no permanent damage and the device could be revived by removing the batteries. But that's pretty scary and something I want to avoid at all cost. This Mares problem is now gone, but that doesn't mean it can't happen for others.
> But that wasn't a problem of RTS/DTR or other serial parameters. That was a USB communication issue. And IIRC this wouldn't have been caught by an error return, either.
Yes and no.
The scenario where this particular problem happened, was that a user accidentally selected the Icon HD protocol variant to connect to a Matrix dive computer. That happens to be the default choice when using the libdivecomputer universal application with just the -b iconhd option. To select the Matrix variant you need the extra -m 0x0F option (or use the -n "Mares Matrix" option).
The key difference between the two protocol variants was the different baudrate. You are right that trying to set the wrong baudrate did not cause the tcsetattr call to fail. That's because the usb-serial driver/chip happens to support that baudrate too. But, assume for a moment we ignore tcsetattr errors, and try to proceed. How do we know the baudrate was changed correctly? We simply don't know. It could have failed before it even tried to change the baudrate. The only case where we do know, is when it returns without errors. The result is that trying to proceed anyway after an error may put the device at risk!
Note that this Mares issue is gone now, because we are no longer using different baudrates, but that's irrelevant for this discussion.
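One way to narrow the "how do we know the baudrate was changed" question, sketched here under assumptions, is to read the settings back with tcgetattr() and compare the reported speed with the one requested. The helper name `termios_speed_is` is hypothetical, and a matching read-back only shows what the driver reports, not what the hardware actually did.

```c
#include <termios.h>

/* Hypothetical helper: 1 if both the input and output speed stored in a
 * termios structure (for example one filled in by tcgetattr() after a
 * tcsetattr() call) equal the wanted speed constant, 0 otherwise. */
static int termios_speed_is(const struct termios *tio, speed_t want)
{
    return cfgetispeed(tio) == want && cfgetospeed(tio) == want;
}
```

A caller would fill a structure with tcgetattr() after configuring the port and refuse to talk to the device when the check returns 0.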
> > So I'm not convinced ignoring the return value of a syscall, and hoping for the best, is the right solution here. For pty's I'm fine with that, but not for real serial ports.
> It's your library... I think it makes much more sense to try to keep going and fail if the responses of actual communication make no sense, not if something odd happens during the setup of the communication.
I'm aware that in practice this risk will probably be very small. But if I have to choose between keeping the strict error checking, and removing it as a workaround for a bug in one particular linux distribution (and which only affects developers but not end users!), then that's an easy choice for me.
For as long as I have been working on the libdivecomputer project, I've always been very cautious and never caused any damage to any dive computer, and I would like to continue that trend. I can assure you that the incident with the Mares Matrix was pretty scary, and I never want to go through that again. If that causes some minor inconvenience here and there, then that's a price I'm willing to pay. I hope you'll understand that.
Note that I absolutely don't mind enabling the pty "hack" on the tcsetattr call too. I think that should already be sufficient to work around this glibc bug in fedora/redhat.
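A minimal sketch of what extending the pty "hack" to the tcsetattr result might look like follows. It is not libdivecomputer's actual code: the helper name is hypothetical, ENABLE_PTY is assumed to be the macro defined by --enable-pty, and tolerating only EINVAL is an assumption based on the "Invalid argument (22)" error in the original report.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decides whether a failed tcsetattr() call should
 * abort opening the device. `err` is the errno value saved right after
 * the call. In a build with ENABLE_PTY defined, EINVAL is downgraded to
 * a warning; every other error, and every error in a normal build,
 * stays fatal. */
static int tcsetattr_failure_is_fatal(int err)
{
#ifdef ENABLE_PTY
    if (err == EINVAL) {
        fprintf(stderr, "warning: tcsetattr: %s (ignored in pty build)\n",
                strerror(err));
        return 0;
    }
#endif
    fprintf(stderr, "error: tcsetattr: %s\n", strerror(err));
    return 1;
}
```

Keeping the decision in one small function leaves the strict behaviour for real serial ports untouched, which is the property argued for above.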
Jef
Yes indeed, this seems to be a Fedora "feature".
There's a Fedora-specific insane patch:
glibc-fedora-linux-tcsetattr.patch
that explicitly adds that completely insane check for no good reason, and has this moronic comment:
+ /* The Linux kernel has a bug which silently ignore the invalid
+    c_cflag on pty. We have to check it here. */
because the kernel is *right* to silently ignore bits like that that don't make sense, and it looks like the idiotic Fedora patch actively tries to break this whole concept.
Adding Jeff and Jakub to the participants list, because I can't make heads or tails of the Fedora glibc package history, because it seems to be very messy. The patch seems to have come into the fedora glibc package git tree in commit c7aa529bd29880da55414efcf317319c8dd790e0, which says
    Author: Jakub Jelinek <jakub@fedoraproject.org>
    Date: Sat Sep 25 08:23:12 2004 +0000

        auto-import glibc-2.3.3-57 on branch devel from glibc-2.3.3-57.src.rpm
but the actual change is not listed in the ChangeLog, so I really don't know who to blame.
Jeff, Jakub: the whole point is to allow people to use pty's instead of "real" tty's, so erroring out when you set bits that don't make sense for a pty is _exactly_ the wrong thing to do, and is the reason why the Linux kernel silently ignores those bits.
Your glibc patch (that hasn't been accepted upstream, because apparently saner people existed there) is stupid and pointless. In fact, it makes me think that in order to *avoid* your idiotic patch, the kernel should now stop changing termios, and just silently ignore those bits - which would just hide the fact that the kernel ignores those bits better. But WHY THE F*CK DOES GLIBC CARE? The kernel tries to be *nice* and show that those bits don't make sense for a pty, but glibc then takes that "nice" and screws things up on purpose.
Can you please hunt down whoever did this, and try to figure out why that idiotic patch exists? And perhaps slip some slow-acting and painful poison in their coffee? Or at least make sure they don't reproduce?
Quoting my previous email for history (not cleaned up, I should have done more editing).
Linus
On Sun, Mar 30, 2014 at 12:52 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sun, Mar 30, 2014 at 12:14 PM, Venkatesh Shukla IIT BHU <venkatesh.shukla.eee11@iitbhu.ac.in> wrote:
> > I have done as you asked. The output is here : http://pastebin.com/E50ayiSs
> > I have recently shifted to Fedora 20. It worked in Ubuntu earlier.
> It does indeed look like some glibc idiocy. Sometimes I wonder what the f*ck is wrong with libc people.
> The kernel didn't actually complain at all, and what seems to happen is that glibc tries to be "helpful", and after it has done the TCSETS it does a TCGETS and compares the result. It notices that it tried to *set* c_cflag to 0xfbb but then it reads back c_cflag as 0xebb.
> The difference is octal 0400, which is PARENB. Because under Linux, pty's always do
>     tty->termios.c_cflag &= ~(CSIZE | PARENB);
>     tty->termios.c_cflag |= (CS8 | CREAD);
> and glibc tries to be "helpful" and notice that some of the settings don't stick. For no good reason.
> Now, I'm not saying that the kernel necessarily has to do the above for pty's (it could just ignore the bits, and not set them to what pty's actually do), but Linux has always done the above, so it's some idiotic change in glibc that now causes problems.
> Don't ask me why the glibc people decided to "improve" on tcsetattr() this way.
> Funnily enough, I do *not* find this in the glibc sources for tcsetattr (sysdeps/unix/sysv/linux/tcsetattr.c):
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/tcs...
> so I wonder if this is some Fedora "feature".
On 2014-03-30 21:52, Linus Torvalds wrote:
> The kernel didn't actually complain at all, and what seems to happen is that glibc tries to be "helpful", and after it has done the TCSETS it does a TCGETS and compares the result.
I'm also doing this check in libdivecomputer. This is even mentioned in the tcsetattr man pages:
"Note that tcsetattr() returns success if any of the requested changes could be successfully carried out. Therefore, when making multiple changes it may be necessary to follow this call with a further call to tcgetattr() to check that all changes have been performed successfully."
Because libdivecomputer makes multiple changes to the termios structure, I just followed this advice...
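The read-back check recommended by the man page can be sketched as a plain comparison of two termios structures. This is an illustration under assumptions rather than the code libdivecomputer uses: the function name `termios_differs` and the `ignore_cflag` parameter are hypothetical, with the latter added to show how bits that a pty is known to override (such as PARENB) could be excluded from the comparison.

```c
#include <string.h>
#include <termios.h>

/* Hypothetical helper: compares the settings passed to tcsetattr()
 * (`want`) with the ones returned by a following tcgetattr() (`got`).
 * Bits set in `ignore_cflag` are masked out of c_cflag before the
 * comparison. Returns 0 when everything else matches, 1 otherwise. */
static int termios_differs(const struct termios *want,
                           const struct termios *got,
                           tcflag_t ignore_cflag)
{
    if (want->c_iflag != got->c_iflag)
        return 1;
    if (want->c_oflag != got->c_oflag)
        return 1;
    if (want->c_lflag != got->c_lflag)
        return 1;
    if ((want->c_cflag & ~ignore_cflag) != (got->c_cflag & ~ignore_cflag))
        return 1;
    return memcmp(want->c_cc, got->c_cc, sizeof want->c_cc) != 0;
}
```

With an empty mask this behaves like a strict check and flags the pty case from this thread as a mismatch; passing PARENB as the mask accepts it while still catching any other difference.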
I simply don't know this stuff well enough, especially at the time when I wrote that code, to judge whether this kind of check is needed/stupid/whatever.
Jef