Issues with ISCAN serial port recording

Hi all,

I’m having an issue running DatarecordingFromISCANDemo on a new psychophysics rig. I’m able to read signals from the eye-tracker via a serial port input, but there are long pauses (200-1800ms) between some of the recordings (see example output below - a plot of time differences in ms between successive time stamps).

I’m running the command:

DatarecordingFromISCANDemo('/dev/ttyS4',120,[],115200)

I’m sending data through the serial output of an ISCAN ETL-200 eye-tracker, configured to output Pupil.H1 and Pupil.V1 measurements in ASCII format, at baud rate 115200, and sampling rate 120Hz. I’ve tried multiple baud rates from 19200-115200, as well as binary format, and have obtained the same results.

I’m running Psychtoolbox version 3.0.17 - Flavor: Debian package - psychtoolbox-3 (3.0.17.6.dfsg1-1~nd20.04+1), within MATLAB R2021a, on an Ubuntu 20.04.2 LTS desktop. The serial input is received by a Syba SD-PEX15022 PCIe card (https://www.amazon.com/Syba-Serial-RS232-Components-SD-PEX15022/dp/B003D3MFHM) with no driver - it was advertised as working out of the box on Linux.

Other hardware specs:

  • Processor: Intel Core i7-9700
  • Motherboard: Asus Prime Z390-A
  • GPU: AMD Radeon RX 5700 (ASRock Phantom Gaming D)

Any idea as to what might be causing the delays?

Thanks,

Ben Deen
Postdoctoral Fellow, Rockefeller University

DatarecordingBaud115200Ascii

We’ve purchased PTB paid support to address this issue. Authentication token: UYJUG-B7-2021616114631:276caaad5a358565aa107395db188dada3c4347ab14f2727702dea74abf96358

Thanks,

Ben

Hi Ben,

thanks for doing business with us, your key is now activated. So far you used 30 of the 60 minutes of your priority support request, spent by myself reading and thinking about your problem.

Not yet, but lets check:

  1. Set the tracker to stream ASCII data. The expectation of that demo would be to receive 16 Bytes per tracker sample, 14 Bytes with the ASCII data about the eye sample, two terminator bytes of CR + LF to mark the end of the packet. Given that the demo was written without ever testing it on such an eyetracker, due to lack of hardware, and the original requester of this feature did not ever bother to give feedback, there could be a mismatch in expectations between sender and receiver. Or a new software version might send something else, given that that demo is very old by now. The time when the LF terminator byte 10 is received, is what is time-stamped.

  2. For improved timing, make sure the Linux lowlatency kernel is installed: In a terminal, sudo apt install linux-lowlatency, then reboot. If the right kernel was chosen, a uname -r should report “lowlatency” in the name, as should PTB’s Screen('Computer') output in various fields. Not that it is likely to matter for a low sampling rate of 120 Hz, but better safe than sorry, and it may be beneficial for other timing related stuff anyway. Anything improved?

  3. Is the output from the eye-tracker meaningful and what you expect? Post the Matlab output from a subset of the eyetracker samples, maybe also add a temporary disp(pktdata); command around line 256, so we can verify the binary data received, and if the CR+LF is as expected.

  4. The next thing to try would be if this also happens if data is not collected in the background by our processing thread, but by the main Matlab interpreter thread:
    edit DatarecordingFromISCANDemo.m
    Comment out lines 203 and 213, the ones with the IOPort('ConfigureSerialPort', myport, asyncSetup); commands.
    Do the timestamps improve?

  5. Another thing to try would be if Octave gives different results than Matlab, in case this is some interference of Matlab. sudo apt install octave-psychtoolbox-3 + octave or octave --gui in a terminal. Then more off then run the demo again.

-mario

[Time spent so far on this request: 36 Minutes, 24 Minutes left. Auth: UYJUG-B7-2021616114631:276caaad5a358565aa107395db188dada3c4347ab14f2727702dea74abf96358]

Hi Mario,

Thanks for your suggestions.

I was able to install the lowlatency kernel, but when I boot with this kernel, I’m not able to successfully run DatarecordingFromISCANDemo. I get the error:

Error in function OpenSerialPort: 	Usage error
Error opening serial port device /dev/ttyS4 - The serial port is already open, close it first! Device or resource busy(16) [EBUSY EPERM]. Could be a permission problem as well.

This error occurs even when IOPort has not been run prior to the DatarecordingFromISCAN call. I tried running IOPort(‘CloseAll’) first, but this didn’t solve the problem. I also tried rebooting several times, but got the same error.

For the remaining tests, I used the generic kernel. I tried running DatarecordingFromISCAN with background data collection commands (lines 203 and 213) commented, but the timing behavior didn’t change. I also tried running the function in Octave, but again got the same timing behavior.

Below are some example output lines from the function, with disp(pktdata) added to line 256:

ReceiveTime: 1624637732.934553 seconds GetSecs time: Datastring =  -46.8	 -20.8
    32    45    52    54    46    56     9    32    45    50    48    46    56     9    13    10

ReceiveTime: 1624637732.934557 seconds GetSecs time: Datastring =  -47.2	 -21.6
    32    45    52    55    46    50     9    32    45    50    49    46    54     9    13    10

ReceiveTime: 1624637732.934561 seconds GetSecs time: Datastring =  -48.4	 -21.3
    32    45    52    56    46    52     9    32    45    50    49    46    51     9    13    10

ReceiveTime: 1624637732.934565 seconds GetSecs time: Datastring =  -46.5	 -20.6
    32    45    52    54    46    53     9    32    45    50    48    46    54     9    13    10

ReceiveTime: 1624637732.934569 seconds GetSecs time: Datastring =  -50.3	 -21.1
    32    45    53    48    46    51     9    32    45    50    49    46    49     9    13    10

ReceiveTime: 1624637732.934574 seconds GetSecs time: Datastring =  -48.6	 -21.5
    32    45    52    56    46    54     9    32    45    50    49    46    53     9    13    10

ReceiveTime: 1624637732.934578 seconds GetSecs time: Datastring =  -48.7	 -21.4
    32    45    52    56    46    55     9    32    45    50    49    46    52     9    13    10

ReceiveTime: 1624637732.934582 seconds GetSecs time: Datastring =  -45.9	 -21.9
    32    45    52    53    46    57     9    32    45    50    49    46    57     9    13    10

ReceiveTime: 1624637732.934586 seconds GetSecs time: Datastring =  -48.1	 -21.6
    32    45    52    56    46    49     9    32    45    50    49    46    54     9    13    10

ReceiveTime: 1624637732.934591 seconds GetSecs time: Datastring =  -48.9	 -21.1
    32    45    52    56    46    57     9    32    45    50    49    46    49     9    13    10

ReceiveTime: 1624637732.934595 seconds GetSecs time: Datastring =  -48.5	 -21.5
    32    45    52    56    46    53     9    32    45    50    49    46    53     9    13    10

ReceiveTime: 1624637732.934600 seconds GetSecs time: Datastring =  -49.0	 -20.8
    32    45    52    57    46    48     9    32    45    50    48    46    56     9    13    10

ReceiveTime: 1624637732.934604 seconds GetSecs time: Datastring =  -45.4	 -19.6
    32    45    52    53    46    52     9    32    45    49    57    46    54     9    13    10

ReceiveTime: 1624637732.934608 seconds GetSecs time: Datastring =  -45.2	 -19.6
    32    45    52    53    46    50     9    32    45    49    57    46    54     9    13    10

ReceiveTime: 1624637732.934613 seconds GetSecs time: Datastring =  -50.6	 -20.0 -3
    32    45    53    48    46    54     9    32    45    50    48    46    48    32    45    51

ReceiveTime: 1624637732.935390 seconds GetSecs time: Datastring = 5.3	 -18.0
    53    46    51     9    32    45    49    56    46    48     9    13    10     0     0     0

ReceiveTime: 1624637732.944549 seconds GetSecs time: Datastring =  -25.3	 -17.6
    32    45    50    53    46    51     9    32    45    49    55    46    54     9    13    10

ReceiveTime: 1624637733.261033 seconds GetSecs time: Datastring =   -22.4	 -17.0
    32     0    45    50    50    46    52     9    32    45    49    55    46    48     9    13

ReceiveTime: 1624637733.261040 seconds GetSecs time: Datastring = 
    10     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0

ReceiveTime: 1624637733.261041 seconds GetSecs time: Datastring =  -25.3	 -17.5
    32    45    50    53    46    51     9    32    45    49    55    46    53     9    13    10

ReceiveTime: 1624637733.261045 seconds GetSecs time: Datastring =  -22.9	 -17.3
    32    45    50    50    46    57     9    32    45    49    55    46    51     9    13    10

ReceiveTime: 1624637733.261049 seconds GetSecs time: Datastring =  -19.8	 -16.1
    32    45    49    57    46    56     9    32    45    49    54    46    49     9    13    10

ReceiveTime: 1624637733.261053 seconds GetSecs time: Datastring =  -27.7	 -16.6
    32    45    50    55    46    55     9    32    45    49    54    46    54     9    13    10

ReceiveTime: 1624637733.261057 seconds GetSecs time: Datastring =  -24.1	 -17.3
    32    45    50    52    46    49     9    32    45    49    55    46    51     9    13    10

ReceiveTime: 1624637733.261062 seconds GetSecs time: Datastring =  -23.3	 -16.9
    32    45    50    51    46    51     9    32    45    49    54    46    57     9    13    10

ReceiveTime: 1624637733.261080 seconds GetSecs time: Datastring =  -24.3	 -17.1
    32    45    50    52    46    51     9    32    45    49    55    46    49     9    13    10

For most time points, the readout looks good and agrees with the signal coming from the ISCAN. However, it seems that the packets read don’t always consist of the expected set of bytes. For instance, on some lines (which often coincide with the “pauses” in data recording), the last two bytes don’t correspond to CR/LF.

What’s the output of sudo dmesg after such a failure? What kernel version?

Ok, so no Matlab or multithreading problem.

Was this done with the background read disabled? I’d continue with that for the moment to simplify debugging.

Yes, that’s not good. I’ve poured over all the output provided by you and our source code and the ETL-200 manual for hours now, and what is not clear at all is if this is a cause or a symptom. The timestamps suggest that the data read calls do not return the data at 1/120 secs intervals, but a bunch of packets is buffered up somewhere, then received in one go. The missing CR/LF look like missing data, which screws up the expected formatting.

It’s not clear if this is a problem at the sender side or at the receiver side, and the pattern of timing makes no sense. Could be a software or hardware problem on either side.

Do you have a way to add a data field to the stream with a serial number or maybe a video frame counter? The only online spec for ISCAN i can find with a quick google suggests a field named .POR.Frms1… could maybe be abused for this? Do you receive as many samples as you would expect, e.g. 600 for a 5 second run? I wonder if some data loss between sender and receiver would explain occassional missing CR+LF bytes.

Another thing to try would be choosing binary mode for ISCAN and change the maxReadQuantum variable to 6 Bytes per sample, but double-check if that is the right size of a packet with the ETL-200 manual, i think it is?

How confident are you that the tracker is actually sending all data? Do you have a second computer that you could try as receiver?

Can you extend the script to save the output like content of the tpkt variables etc. in a mat file, and use Octave for this instead of Matlab?

What’s the terminal output of sudo dmesg after such a run?

Btw., i’m afraid you will need to buy more priority support. We are now at 210 minutes of work time spent, which is 150 minutes more than actual paid for with a license. We’d need two more licenses just to not lose money with this support request, more for further diagnosis.

I’m currently out of town and won’t be able to continue debugging the rig until next week. That said, I’ll respond the questions that I can answer now.

On the same machine, we have a Windows 10 installation, and we’ve tested receiving the serial input directly in PuTTY on Windows. This gave the correct number of samples, and the data were streamed continuously with no apparent time lags. This suggests that the eye-tracker is sending the correct information, and that the problem is a software issue on the receiver side. I haven’t tried running this test in Linux, but it’s on my todo list.

I’ve also tried using binary mode, with 6 bytes per sample (which is the right number when the serial output is set to send two values, like H and V position). I ran into the same issues with this format - long temporal lags, and a stream of input that didn’t correspond exactly to what I expected. For instance, sometimes the initial two header bytes (value 68) were seperated with a value 0 byte.

Regarding priority support, we would be happy to purchase more support if it will help diagnose the issue. Here are the remaining two authentication tokens from our first license, which should bring us to 180 minutes of covered time:

UYJUG-B7-2021628144742:6d2896bc4c94c2fa7f4fe58c6a64491f3b922e05b20ed79f555a2176a2da17d7

UYJUG-B7-2021628144852:f1b4c946514bea96592691b599f14a9337758b52d12b3d07a2cfc5271f15df99

Re. purchasing more support - If we’ve already purchased a membership, is there a way to buy a few more hours of support on top of this? I don’t see an option like this on psychtoolbox.net.

Hmm, that sounds indeed like it might be a software problem on the receiver side, ie. PTB bug, Linux serial driver configuration issue, or Linux serial driver bug for your specific serial port card.

Running the test on Linux would indeed make sense.

Have you tried how the ISCAN recording demo behaves on Windows-10 in ASCII mode?

Oh, that’s not how it works. One license gives you up to 3 support incidents for a total of up to 1 hour of my time, not 3x one hour each. You’d have to buy two more licenses ie. pay 2 x 150 Euros for us to just break even on this, because i spent way more than 1 hour on your support request.

Currently our support model is not really profitable, because labs don’t use it the way it was intended to be used. The idea was that many labs buy the membership, but only maybe 20% actually use it for priority support. This is a “statistically a donation, but not formally on paper, so it can be paid by lab funds” model that many labs indicated they would be on board with when we surveyed them a while ago. Most of the money would pay for ongoing development and maintenance. The thing is priced at 150 Euros + tax to work under this assumption. So far it is not working, not at this price point, and especially not if i don’t learn to cut off support at exactly 60 minutes without further payment, but my old habits of unconditional helpfulness die slowly…

No, we don’t have such an option yet. I think it would probably make sense to offer bigger packages with multiple hours in them, given that support requests rarely can be handled within 1 hour. Thanks for the feedback.

Ofc. i can’t promise that this issue will get successfully resolved, only that we try further to get to the root of the problem.

Another thing to try, in case this is a driver problem: In the past i had good experiences with USB-to-serial converter dongles that have a FTDI converter chip in them. I seem to remember the FT232BM being a good choice, for serial port data streaming with good timing, but i could misremember the exact model i tested.

-mario

So, we tested our serial port with PuTTY on Linux, and found the same timing issue, suggesting that the issue relates to the way Linux was handling our serial port. We then tested a USB to serial converter, and the timing issue in PTB was resolved - we’re now getting samples at a consistent rate of 120Hz. Still not sure what exactly is wrong with our initial port, but we’re sticking with the USB solution for now.

Sorry, I misunderstood how the licenses work. We’ll purchase two more licenses to cover your effort so far. I don’t think we’ll need more support for now, since we’ve resolved the problem and it doesn’t seem to be a PTB issue. Thanks again for your help!

What exact model of USB-serial adapter did you buy - useful as reference for others?

This is definitely a misconfiguration or limitation of the Linux serial port driver for your specific PCI card.

In terms of misconfiguration, if your PCI board is /dev/ttyS4, then what is the output of:

setserial -a /dev/ttyS4 and of sudo cat /sys/class/tty/ttyS4/rx_trig_bytes ?

A echo 1 | sudo tee /sys/class/tty/ttyS4/rx_trig_bytes would be worth a try, to lower the threshold at which receiver interrupts fire to the minimum of 1 Byte, instead of the potentially higher threshold set by default.

Other than that, looking at the driver source code i couldn’t see anything obviously fishy. However, the commit log says that the driver for your specific model has only been lightly tested for simple use cases, and another peculiarity is that boards like yours with the Moschip MCS 9922 chip and other 99xx series chips do use MSI-X/MSI interrupts, instead of legacy interrupts like almost all other 3rd party pci serial boards from other vendors would do.

Therefore i wonder if something goes wrong there specifically wrt. MSI? Could be that the chip claims it can do MSI interrupts but has some hardware design bugs, and the MS-Windows driver knows that and doesn’t use MSI for it, but the Linux driver does. Such hardware flaws that need to be worked around by the operating system are not uncommon, e.g., the current driver for PCI serial ports lists 92 such special cases for 93 different devices from different manufacturers alone.

Could be interesting to see the sudo dmesg output after a failed attempt of using that PCI board for the eye tracker.

Or it is something else…

Thanks, that is appreciated. Also please spread the word about our 150 Euro community memberships, we need any word of mouth advertising we can get to keep PTB financed and alive.

-mario

The USB serial adapter we’re using is a Sabrent USB 2.0 to Serial (9-Pin) DB-9 RS-232 Adapter Cable (Amazon.com: Sabrent USB 2.0 to Serial (9-Pin) DB-9 RS-232 Adapter Cable 6ft Cable [FTDI Chipset] (CB-FTDI): Computers & Accessories).

Output of setserial -a:

/dev/ttyS4, Line 4, UART: 16650V2, Port: 0x3010, IRQ: 127
	Baud_base: 115200, close_delay: 50, divisor: 0
	closing_wait: 3000
	Flags: spd_normal skip_test

rx_trig_bytes was set to 16 by default. I tried a value of 1, but this didn’t change the behavior.

I’ll send the output of dmesg via Dropbox link when I’m back at my laptop.

Output of dmesg at Dropbox - ISCANOldSerialInputDmesg.txt - Simplify your life

Ok, looking at that output, it could be some problem with MSI interrupts. specifically this
[ 6.323784] genirq: Flags mismatch irq 128. 00000080 (ttyS5) vs. 00000000 (xhci_hcd) [ 6.324128] genirq: Flags mismatch irq 128. 00000080 (ttyS5) vs. 00000000 (xhci_hcd)
looks fishy. The serial card seems to try to share MSI interrupt 128 with the USB host controller driver - unusual by itself, because usually one uses MSI to avoid having to share irqs. The USB controller doesn’t want to share, the serial port driver wants to, and the conflict will potentially mess things up wrt. irq handling. This wanting to share MSI interrupts looks like a potential driver bug to me.

You can try the following to revert to legacy interrupts instead of MSI interrupts, where irq sharing is the norm. The interesting thing is that out of hundreds of supported pci-serial port cards, only 4 models use MSI interrupts, yours being one of the lucky four. So if there is a bug in the serial-pci driver wrt. MSI, your card would be one of the few affected and you got really unlucky. Also this is the kind of bug that has a low probability of getting caught during testing, which would explain why it may have slipped through.

# Disable PCI serial port driver, shut down card:
echo 0000:02:00.0 | sudo tee /sys/bus/pci/drivers/serial/unbind
echo 0000:02:00.1 | sudo tee /sys/bus/pci/drivers/serial/unbind

# Disallow use of MSI/MSI-X interrupts on pci serial port card:
echo 0 | sudo tee /sys/bus/pci/devices/0000:02:00.0/msi_bus
echo 0 | sudo tee /sys/bus/pci/devices/0000:02:00.1/msi_bus

# Restart driver, reinitialize card, hopefully without MSI irqs now:
echo 0000:02:00.0 | sudo tee /sys/bus/pci/drivers/serial/bind
echo 0000:02:00.1 | sudo tee /sys/bus/pci/drivers/serial/bind

# Check log output and interrupt assignments
cat /proc/interrupts
sudo dmesg

Again the output produced by this would be helpful, also again the output of setserial -a.

-mario

This seems to have solved the problem! Thanks.

Log output and interrupts: Dropbox - ISCANOldSerialInputDmesgInterrupts21-7-28.txt - Simplify your life

So it now works with your PCI serial card without timing issues or data loss?

If so, then this probably confirms a MSI irq sharing bug in the driver. I have submitted a bug report and potential proposed fix for the problem to the linux-serial kernel mailing list:

https://lore.kernel.org/linux-serial/20210729043306.18528-1-mario.kleiner.de@gmail.com/T/#mad3deed4dd6dd85b68153ea8b80bf34f5f0d234d

Lets see how this goes, and if my fix is correct, or if it is “barking up the wrong tree”. Apparently they are currently discussing a redesign of how MSI interrupts are to be handled for PCI serial port cards in general…

You would have to repeat those commands each time you reboot the machine, probably put it into a script. You could also add the kernel boot parameter irq=nomsi to globally disable all use of MSI interrupts on your machine as a very blunt but simple fix – assuming none of the other hardware critically needs MSI’s that is. Or use the USB-serial converter of course.

-mario

Yes, after running the commands you sent I’m able to receive ISCAN input via the PCI serial port without timing issues or data loss - the packets are consistently received at 120Hz.