Inconsistent Clock Behavior on Laptop

asali · March 6, 2024, 12:06pm

Hello,

I tried to post yesterday but it got flagged as spam. Sorry for another post (please disregard that one if it is still in your queue).

We’ve noticed a rare defect in our stimulus onset times that are recorded in our fMRI setup. We get an error message that says a flip occurred before the time it was requested. After the incorrect earlier than expected timestamp, it will jump forward some and then continue. The GetSecs clock (and times from KBQueue commands) remain consistent with onset times.

We are simultaneously collecting eye tracking data with an eyelink 1000+ and so I have those stimulus onset times to compare to our psychtoolbox times. When one of these jumps happens, the eyelink clock shows that there is a delay that occurred at the time of the error.

I’ve tried to troubleshoot this myself, but it is very rare (has happened 3 times over around 80 participants with 3-6 task runs each).

For participants with no visible errors, the eyeylink-derived onsets stay in sync with psychtoolbox. The onset times derived from eyelink alternates between being close to identical to the psych toolbox-derived time and about 8 ms faster from trial-to-trial. I haven’t been able to figure out why that is occurring.

I’ve started to run tests without the MRI display and in one of those cases I observed the psychtoolbox times start in sync with eyelink times, but then jump to being about 700 ms faster than eyelink times.

I’m assuming this isn’t a quick answer, so just let me know in that case and I’ll pay for support. I’d really appreciate help troubleshooting what is going on and confirming that the data we have with no visible errors are reliable.

I’ve pasted the control window output from a run with the error below. Thank you!

PTB-INFO: This is Psychtoolbox-3 for GNU/Linux X11, under Matlab 64-Bit (Version 3.0.18 - Build date: Jun 27 2022).
PTB-INFO: OS support status: Linux 5.18.10-76051810-generic Supported.
PTB-INFO: Type 'PsychtoolboxVersion' for more detailed version information.
PTB-INFO: Most parts of the Psychtoolbox distribution are licensed to you under terms of the MIT License, with
PTB-INFO: some restrictions. See file 'License.txt' in the Psychtoolbox root folder for the exact licensing conditions.
PTB-INFO: For information about paid support, support memberships and other commercial services, please type
PTB-INFO: 'PsychPaidSupportAndServices'.
PTB-INFO: OpenGL-Renderer is AMD :: AMD RENOIR (DRM 3.46.0, 5.18.10-76051810-generic, LLVM 12.0.0) :: 4.6 (Compatibility Profile) Mesa 21.2.6
PTB-INFO: VBL startline = 1080 , VBL Endline = -1
PTB-INFO: Measured monitor refresh interval from VBLsync = 16.667604 ms [59.996624 Hz]. (50 valid samples taken, stddev=0.000674 ms.)
PTB-INFO: Reported monitor refresh interval from operating system = 16.667222 ms [59.998001 Hz].
PTB-INFO: Small deviations between reported values are normal and no reason to worry.
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
Snd(): Initializing PsychPortAudio driver for sound output.
Snd(): PsychPortAudio will be blocked for use by your own code until you call Snd('Close');
Snd(): If you want to use PsychPortAudio and Snd in the same session, make sure to open your
Snd(): stimulation sound device via calls to, e.g., pahandle = PsychPortAudio('Open', ...);
Snd(): *before* the first call to Snd() or any function that might use Snd(), e.g., Beeper()
Snd(): or the Eyelink functions with auditory feedback. Call Snd('Open', pahandle) next, to
Snd(): share that pahandle audio device with Snd(), Beeper() et al. for optimal collaboration.
Snd(): Consider using PsychPortAudio('OpenSlave', ...); on a master device and pass that slave
Snd(): handle to Snd('Open', ...) if you want to allow Snd() to operate fully independently.
Snd(): You may want to read 'help Snd' about other points to consider wrt. pahandle sharing.
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
PTB-INFO: Using modified PortAudio V19.7.0-devel, revision 147dd722548358763a8b649b3e4b41dfffbcfbb6
 | None: No Response 
 | None: No Response 
 | Press 7&: Response Inaccurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
 | None: No Response 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
PTB-ERROR: OpenML timestamping reports that flip completed before it was scheduled [Scheduled no earlier than 1707739887.292816 secs, completed at 1707739884.800377 secs]!
PTB-ERROR: This could mean that sync of bufferswaps to vertical retrace is broken or some other driver bug! Check your system setup!
 | Press 2@: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | None: No Response 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Inaccurate 
 | Press 2@: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 7&: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
 | Press 2@: Response Accurate 
ans =
   1.7077e+09
Total Time 446.07
[ Warning: Directory already exists.]  
[ > In <a href="matlab:matlab.internal.language.introspective.errorDocCallback('gradCPT_main_v4', '/home/salilab/Desktop/EyetrackCode/gradCPT_main_v4.m', 219)" style="font-weight:bold">gradCPT_main_v4</a> (<a href="matlab: opentoline('/home/salilab/Desktop/EyetrackCode/gradCPT_main_v4.m',219,0)">line 219</a>)]  
saving data- wait!Receiving data file 'S071R2.edf'
ReceiveFile status 10411493
Data file 'S071R2.edf' can be found in '/home/salilab/Desktop/EyetrackCode/data'
INFO: PTB's Screen('Flip', 10) command seems to have missed the requested stimulus presentation deadline
INFO: a total of 15 times out of a total of 8844 flips during this session.
INFO: This number is fairly accurate (and indicative of real timing problems in your own code or your system)
INFO: if you provided requested stimulus onset times with the 'when' argument of Screen('Flip', window [, when]);
INFO: If you called Screen('Flip', window); without the 'when' argument, this count is more of a ''mild'' indicator
INFO: of timing behaviour than a hard reliable measurement. Large numbers may indicate problems and should at least
INFO: deserve your closer attention. Cfe. 'help SyncTrouble', the FAQ section at www.psychtoolbox.org and the
INFO: examples in the PDF presentation in PsychDocumentation/Psychtoolbox3-Slides.pdf for more info and timing tips.

mariokleiner · March 8, 2024, 4:37am

I have picked your specific problem for “free support on resolving a non-trivial issue”, sponsored by Mathworks Neuroscience - MATLAB and Simulink Solutions - MATLAB & Simulink. You get advice and help on this topic for free. Mathworks provided us with 5 such “Jokers” for free support in the time period October 2023 to June 2024, and you get to use the third one out of five.

So I think this problem you observed is the first time I’ve heard this being reported by anyone in the 17+ years of PTB on Linux. Something possible, but unlikely.

My guess is that what happens is that your computers wall-clock, also known as Unix gettimeofday clock or Posix clock CLOCK_REALTIME, jumped back by multiple seconds in the middle of a running trial. This would cause time to go backwards, consistently across GetSecs, Screen timestamps, PsychPortAudio etc.

On Linux we use CLOCK_REALTIME for all timing in the GetSecs reference frame. This clock measures absolute time in seconds since the beginning of the year 1970 UTC. On a typical networked system setup, this clock will be set and adjusted by NTP - the network time protocol - and thereby synchronized to world time with all other networked computers. On a non-networked machine, or with NTP disabled or not working for some reason, the time will be what the system administrator has set in time & date settings, maintained by the computers hardware RTC clock while the machine is sleeping or shut down. It will be subject to inaccuracies of the RTC.

That is great for synchronizing multi-machine setups. The downside is if something goes wrong with NTP. If the system administrator sets a time in the past, time can go backwards. Under automatic NTP adjustment, normally the time server will slowly adjust the clock, speeding it up or slowing it down very slightly over time to keep the machines time in sync to world time - and thereby other NTP driven computers - down to errors of at most a few milliseconds. If NTP for some reason detects a substantial deviation between your computers clock and network time, e.g., of many seconds, it may decide to perform a “step change” and set the computers clock instantly to network provided world time. A backwards step change - if NTP thinks the computers clock is ahead of world time by multiple seconds and needs strong adjustment - would cause GetSecs time and all other time to jump backward and would explain what you’ve seen.

You’d probably find a log entry in the system log about this step change if you grep ntp /var/log/syslog or grep NTP /var/log/syslog, or grep timesyncd /var/log/syslog for such log messages from NTP. E.g., this is an excerpt from my machine doing initial sync after startup to the timeserver of Ubuntu (ntp.ubuntu.com):

“Mar 7 17:20:43 touchy systemd-timesyncd[734]: Initial synchronization to time server [2620:2d:4000:1::40]:123 (ntp.ubuntu.com).”

Do you find similar messages in your system log around the time the glitch happened, hinting to a step adjustment? What timeserver does the machine use? An unreliable timeserver, network, or computer hardware clock RTC could cause something like that.

In theory a broken computer hardware clock in your laptop could cause the same problem, but that is very unlikely, as Linux can operate its system clock based on different high precision hardware clocks, and it continuously monitors multiple of them for consistency, so it can quickly detect faulty or unreliable hardware clocks and switch to a more reliable “backup clock” if needed. Examples of such clocks are TSC, HPET, LAPIC timer, pm-timer. A grep clocksource /var/log/syslog will tell you about clock selection.

It is on my todo list as a rather low priority item to allow Linux PTB users to select the type of clock to use for time keeping. Another choice than CLOCK_REALTIME would be CLOCK_MONOTONIC, which counts time since operating system boot and is normally not subject to rapid or step adjustments of time, only to slow continuous adjustments by NTP. This would be in line with what PTB on macOS and Windows currently does. Maybe it is time to tackle this item to allow dealing with such cases. Each clock type has its pro’s and con’s, but so far there was no evidence it would matter in practice at all.

So check your log files and report back. If this was a step change adjustment, then those are exceedingly rare and most likely only the one specific experiment trial was affected during which the step change happened. Still one would want to find out, why the system would do such a step change in your environment during a steady-state condition, minutes after system boot or changes in network configuration / settings etc. Maybe there is something weird in the local ethernet/wifi network setup and behaviour in your fMRI scanner room?

-mario

asali · March 13, 2024, 2:28am

Hi Mario,

Thanks for your response and for your time. I really appreciate this! I also was extremely surprised when I saw this behavior.

I’ve looked through the log files and I see the timesyncd: Initial synchronization to time server entries during the runs when the time jumped backwards (at least for the two times that happened recently enough for the logs to still be present). Here is the full entry for the most recent case:

salilab-laptop systems-timesyncd[681]: Initial synchronization to time server 185.125.190.56.123 (ntp.ubuntu.com).

After the run with the error, I restarted the computer and it looks like there is another log entry around that time that is identical except for a different number in the brackets:

salilab-laptop systems-timesyncd[690]: Initial synchronization to time server 185.125.190.56.123 (ntp.ubuntu.com).

Is it normal for the synchronization to occur again after I restart the computer?

grep clocksource /var/log/syslog for the appropriate syslog returns this:

Mar  1 15:15:03 salilab-laptop kernel: [    0.036162] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  1 15:15:03 salilab-laptop kernel: [    0.094686] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  1 15:15:03 salilab-laptop kernel: [    0.114712] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e56f852fd, max_idle_ns: 440795258506 ns
Mar  1 15:15:03 salilab-laptop kernel: [    0.274897] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  1 15:15:03 salilab-laptop kernel: [    1.607117] clocksource: Switched to clocksource tsc-early
Mar  1 15:15:03 salilab-laptop kernel: [    1.956017] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  1 15:15:03 salilab-laptop kernel: [    3.034749] tsc: Refined TSC clocksource calibration: 1810.578 MHz
Mar  1 15:15:03 salilab-laptop kernel: [    3.034759] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1a193238fbe, max_idle_ns: 440795231981 ns
Mar  1 15:15:03 salilab-laptop kernel: [    3.034812] clocksource: Switched to clocksource tsc
Mar  1 15:15:03 salilab-laptop kernel: [    3.648691] clocksource: timekeeping watchdog on CPU6: Marking clocksource 'tsc' as unstable because the skew is too large:
Mar  1 15:15:03 salilab-laptop kernel: [    3.648702] clocksource:                       'hpet' wd_nsec: 516852141 wd_now: 43975b3 wd_last: 3c889f5 mask: ffffffff
Mar  1 15:15:03 salilab-laptop kernel: [    3.648707] clocksource:                       'tsc' cs_nsec: 512868678 cs_now: 54a0c59f0 cs_last: 512b33620 mask: ffffffffffffffff
Mar  1 15:15:03 salilab-laptop kernel: [    3.648710] clocksource:                       'tsc' is current clocksource.
Mar  1 15:15:03 salilab-laptop kernel: [    3.648725] tsc: Marking TSC unstable due to clocksource watchdog
Mar  1 15:15:03 salilab-laptop kernel: [    3.649047] clocksource: Checking clocksource tsc synchronization from CPU 8 to CPUs 0-1,6-7,11-12.
Mar  1 15:15:03 salilab-laptop kernel: [    3.649224] clocksource: Switched to clocksource hpet
Mar  5 00:53:46 salilab-laptop kernel: [    0.035851] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  5 00:53:46 salilab-laptop kernel: [    0.096384] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  5 00:53:46 salilab-laptop kernel: [    0.116407] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e60a6ae91, max_idle_ns: 440795285270 ns
Mar  5 00:53:46 salilab-laptop kernel: [    0.277538] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  5 00:53:46 salilab-laptop kernel: [    1.608587] clocksource: Switched to clocksource tsc-early
Mar  5 00:53:46 salilab-laptop kernel: [    1.957851] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  5 00:53:46 salilab-laptop kernel: [    3.136658] tsc: Refined TSC clocksource calibration: 1796.624 MHz
Mar  5 00:53:46 salilab-laptop kernel: [    3.136679] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e5b473752, max_idle_ns: 440795288133 ns
Mar  5 00:53:46 salilab-laptop kernel: [    3.136752] clocksource: Switched to clocksource tsc
Mar  5 00:57:45 salilab-laptop kernel: [    0.036549] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  5 00:57:45 salilab-laptop kernel: [    0.097503] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  5 00:57:45 salilab-laptop kernel: [    0.117528] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e57f90bde, max_idle_ns: 440795289086 ns
Mar  5 00:57:45 salilab-laptop kernel: [    0.274663] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  5 00:57:45 salilab-laptop kernel: [    1.617568] clocksource: Switched to clocksource tsc-early
Mar  5 00:57:45 salilab-laptop kernel: [    1.958879] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  5 00:57:45 salilab-laptop kernel: [    3.069589] tsc: Refined TSC clocksource calibration: 1796.624 MHz
Mar  5 00:57:45 salilab-laptop kernel: [    3.069602] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e5b473752, max_idle_ns: 440795288133 ns
Mar  5 00:57:45 salilab-laptop kernel: [    3.069659] clocksource: Switched to clocksource tsc
Mar  6 22:28:00 salilab-laptop kernel: [    0.035751] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  6 22:28:00 salilab-laptop kernel: [    0.096970] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  6 22:28:00 salilab-laptop kernel: [    0.116996] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e5397e320, max_idle_ns: 440795226971 ns
Mar  6 22:28:00 salilab-laptop kernel: [    0.273329] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  6 22:28:00 salilab-laptop kernel: [    1.642076] clocksource: Switched to clocksource tsc-early
Mar  6 22:28:00 salilab-laptop kernel: [    1.990585] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  6 22:28:00 salilab-laptop kernel: [    3.104329] tsc: Refined TSC clocksource calibration: 1796.624 MHz
Mar  6 22:28:00 salilab-laptop kernel: [    3.104346] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e5b473752, max_idle_ns: 440795288133 ns
Mar  6 22:28:00 salilab-laptop kernel: [    3.104421] clocksource: Switched to clocksource tsc

The timesyncd in question occurred on Mar 1 at 15:07:57 for reference.

There is a note in there: timekeeping watchdog on CPU6: Marking clocksource 'tsc' as unstable because the skew is too large It then says it switches to hpet. This however happened after the clock jumping back error. I believe it occurred when I restarted the computer right after the run with the error ended. There are no clocksource messages from the day of the error before it actually occurred. I haven’t checked clocksource for the other day there was an error. Would that be useful?

Would it be helpful for me to upload the full log file? I’m not sure why the step change would occur when it did. We were in the second run of the task so it had been going for a while. This computer is a laptop and the hospital network is ultra locked down. Could it be a strange interaction with that network? We haven’t seen the error anywhere else, but we mostly use the laptop at the hospital.

I log the trigger pulses coming from the scanner with KbQueueCheck and I can see that the times for the most part are very close to the perfect 1.6 second increments I would expect. These runs with the errors of course show instances where the time difference between successive pulses is negative. For most of the runs I’ve checked, if I look at the last trigger pulse received, it is less than +/- 20 ms of the total time the scanner is set to run. For at least one run, the maximum offset was around 40 ms. Do you think this is just normal clock drift and within the range that I should expect or a symptom of this larger issue? My runs are between 7-8 minutes long.

Thanks again and let me know what else would be useful.
Anthony

mariokleiner · March 18, 2024, 6:00am

That sounds like the troublemaker if it happens about at the time when your clock jumps backwards.

Yes. The expected behavior is that you get an “Initial synchronization…” once during system startup, or more specifically the first time an internet connection to a new reference NTP (=Network Time Protocol) server is made, in your case ntp.ubuntu.com.

Your computer has a builtin battery backed hardware clock known as RTC (RealTimeClock), a quartz clock, which roughly keeps date and time at second resolution, while the machine is powered off, just like a digital watch. During system startup, an optimal high precision hardware clock source (= way faster and lower overhead to read by the operating system, way higher precision, typically sub-nanoseconds to microseconds, depending on selected hardware clock source) is selected, and then initialized with current date and time from reading that battery backed, always-on, but lower precision RTC. Then the system runs off that clock source until the point in time when you get an internet connection and the NTP client (here systemd-timesyncd) can get high precision world time (ie. UTC, ultimately driven by an atomic clock). That moment, the initial synchronization, will adjust the clock source to the authoritative real world time, a potentially discontinuous jump if there is a large difference between “official” NTP world time and the current time from your current clock source. If the difference is small, it will instead gradually adjust (“slew”) your computers clock source to match NTP time in a continuous fashion without ever having jumps of time forward or backward.

As the hardware RTC of your computer, and the clock source used during system operation, are not as accurate as atomic clocks, they will, depending on many factors, including power supply stability, temperature of the electronics, manufacturing tolerances, ageing of the hardware, drift out of sync with a previously set UTC world time. This is why frequent adjustment is needed to keep the computers time in sync with world time down to milliseconds deviation. Usually at least at system shutdown, usually the current world time will also be used to reset the hardware RTC if possible, so that the RTC starts from a good starting point for timekeeping while the machine is off. If your machine was off for long, or has a strongly drifting RTC clock or clock source for some reason, the error at time of initial sync could have accumulated to a large enough value to trigger a step change where time jumps forward or backward. Also, if you dual-boot Windows and Linux and one operating system is set to expect the RTC to run on local time, and the other expects RTC running on UTC time, then each reboot into the other OS may leave the RTC off by hours from what real UTC is. Similar if the battery of your RTC is dead and the RTC loses time each time the machine is powered off, or the laptops battery runs empty.

So far the theory. In practice, a lot of the details are configurable. There are multiple different NTP client/server implementations available to select from, depending on precision and functionality requirements. E.g., you can manually install and enable the classic “ntpd”, or also more recent “chrony”, for higher precision needs (cfe. https://ubuntu.com/server/docs/how-to-serve-the-network-time-protocol-with-chrony), which query multiple NTP servers on the network, compare latencies and timing jitter and whatnot to find the most reliable server(s) to sync to, and do clever “multi-modal” integration to get extra exact results. They can also act as time servers for your local network, to keep local lab time across local machines and equipment especially consistent. There’s also PTP (Precision Time Protocol) for even higher precision requirements, when milliseconds don’t cut it, and you want to synchronize local machines down to nanoseconds precision - This is what, e.g., high energy particle physics labs like CERN use to sync up their thousands computers and data acquisition and control units down to micro- or nanoseconds, as needed when stuff happens at the speed of light, cfe. https://white-rabbit.web.cern.ch/

The default choice of many current Linux distributions like Ubuntu 20.04-LTS and later Ubuntu’s, is a more simple “consumer” NTP client, called systemd-timesyncd, which only speaks the SNTP (= “Simple NTP”) protocol, and only queries one NTP server for time, a choice that provides less precision, but “good enough” for typical laptop and desktop use, and probably has other advantages like more easy “plug & play” setup, probably lower overhead and better flexibility / automatic dynamic configuration for the “world traveling” laptop ( or tablet/smartphone/home appliance/smart TV), that travels through different time zones and many different wifi networks, as you roam the world or local town with it.

So specifically for the default choice of systemd-timesyncd:

Terminal commands like timedatectl show-timesync or timedatectl timesync-status give you information about where the machine gets its internet time from, with which precision and which parameters.

Ideally you’d only see one “initial time sync” during machine startup / initial connection to the internet and thereby the chosen NTP server, and from there on, the adjustments are gradual without jumps.

In practice, from looking at the source code of timesyncd (cfe. systemd/src/timesync/timesyncd-manager.c at 9cebda59e818cdb89dc1e53ab5bb51b91b3dc3ff · systemd/systemd · GitHub), it seems the “initial time sync” happens every time the timesyncd gets restarted for some reason, or a new NTP server is chosen, or the connection to the given NTP server is lost and then later reestablished, iow. the “initial time sync” is when timesyncd starts speaking to a “new” NTP server.
Indeed, looking at this commit from March 2022 - timesyncd: improve log message whe getting a reply from server · systemd/systemd@922a650 · GitHub - the log output message was reworded to make it more clear what is really meant with “initial time sync”, but this change was too late to be present in Ubuntu 22.04-LTS.

Looking at the actual clock adjustment routine (systemd/src/timesync/timesyncd-manager.c at 9cebda59e818cdb89dc1e53ab5bb51b91b3dc3ff · systemd/systemd · GitHub) shows that a step change of the clock instead of a gradual slow change, leading to forward or backwards time jumps, would be done whenever the discrepancy between system time and NTP network reference time is greater than 0.4 seconds (systemd/src/timesync/timesyncd-manager.c at 9cebda59e818cdb89dc1e53ab5bb51b91b3dc3ff · systemd/systemd · GitHub).

Triggers of additional “initial time sync” (with potential time jump if things go badly), e.g., in the middle of a work session, would be some network trouble / disrupted network connection / NTP network time query trouble, or if the timesync service gets restarted. Apparently, as a quick Google shows, this can happen if your network uses DHCP for dynamically assigning IP addresses to your machine, and lease duration is short, iow. your machine gets assigned a new IP address frequently by the local network.

grep clocksource /var/log/syslog for the appropriate syslog returns this:

Mar  1 15:15:03 salilab-laptop kernel: [    0.036162] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  1 15:15:03 salilab-laptop kernel: [    0.094686] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  1 15:15:03 salilab-laptop kernel: [    0.114712] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e56f852fd, max_idle_ns: 440795258506 ns
Mar  1 15:15:03 salilab-laptop kernel: [    0.274897] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  1 15:15:03 salilab-laptop kernel: [    1.607117] clocksource: Switched to clocksource tsc-early
Mar  1 15:15:03 salilab-laptop kernel: [    1.956017] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  1 15:15:03 salilab-laptop kernel: [    3.034749] tsc: Refined TSC clocksource calibration: 1810.578 MHz
Mar  1 15:15:03 salilab-laptop kernel: [    3.034759] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1a193238fbe, max_idle_ns: 440795231981 ns
Mar  1 15:15:03 salilab-laptop kernel: [    3.034812] clocksource: Switched to clocksource tsc
Mar  1 15:15:03 salilab-laptop kernel: [    3.648691] clocksource: timekeeping watchdog on CPU6: Marking clocksource 'tsc' as unstable because the skew is too large:
Mar  1 15:15:03 salilab-laptop kernel: [    3.648702] clocksource:                       'hpet' wd_nsec: 516852141 wd_now: 43975b3 wd_last: 3c889f5 mask: ffffffff
Mar  1 15:15:03 salilab-laptop kernel: [    3.648707] clocksource:                       'tsc' cs_nsec: 512868678 cs_now: 54a0c59f0 cs_last: 512b33620 mask: ffffffffffffffff
Mar  1 15:15:03 salilab-laptop kernel: [    3.648710] clocksource:                       'tsc' is current clocksource.
Mar  1 15:15:03 salilab-laptop kernel: [    3.648725] tsc: Marking TSC unstable due to clocksource watchdog
Mar  1 15:15:03 salilab-laptop kernel: [    3.649047] clocksource: Checking clocksource tsc synchronization from CPU 8 to CPUs 0-1,6-7,11-12.
Mar  1 15:15:03 salilab-laptop kernel: [    3.649224] clocksource: Switched to clocksource hpet
Mar  5 00:53:46 salilab-laptop kernel: [    0.035851] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  5 00:53:46 salilab-laptop kernel: [    0.096384] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  5 00:53:46 salilab-laptop kernel: [    0.116407] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e60a6ae91, max_idle_ns: 440795285270 ns
Mar  5 00:53:46 salilab-laptop kernel: [    0.277538] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  5 00:53:46 salilab-laptop kernel: [    1.608587] clocksource: Switched to clocksource tsc-early
Mar  5 00:53:46 salilab-laptop kernel: [    1.957851] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  5 00:53:46 salilab-laptop kernel: [    3.136658] tsc: Refined TSC clocksource calibration: 1796.624 MHz
Mar  5 00:53:46 salilab-laptop kernel: [    3.136679] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e5b473752, max_idle_ns: 440795288133 ns
Mar  5 00:53:46 salilab-laptop kernel: [    3.136752] clocksource: Switched to clocksource tsc
Mar  5 00:57:45 salilab-laptop kernel: [    0.036549] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  5 00:57:45 salilab-laptop kernel: [    0.097503] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  5 00:57:45 salilab-laptop kernel: [    0.117528] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e57f90bde, max_idle_ns: 440795289086 ns
Mar  5 00:57:45 salilab-laptop kernel: [    0.274663] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  5 00:57:45 salilab-laptop kernel: [    1.617568] clocksource: Switched to clocksource tsc-early
Mar  5 00:57:45 salilab-laptop kernel: [    1.958879] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  5 00:57:45 salilab-laptop kernel: [    3.069589] tsc: Refined TSC clocksource calibration: 1796.624 MHz
Mar  5 00:57:45 salilab-laptop kernel: [    3.069602] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e5b473752, max_idle_ns: 440795288133 ns
Mar  5 00:57:45 salilab-laptop kernel: [    3.069659] clocksource: Switched to clocksource tsc
Mar  6 22:28:00 salilab-laptop kernel: [    0.035751] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
Mar  6 22:28:00 salilab-laptop kernel: [    0.096970] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
Mar  6 22:28:00 salilab-laptop kernel: [    0.116996] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e5397e320, max_idle_ns: 440795226971 ns
Mar  6 22:28:00 salilab-laptop kernel: [    0.273329] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
Mar  6 22:28:00 salilab-laptop kernel: [    1.642076] clocksource: Switched to clocksource tsc-early
Mar  6 22:28:00 salilab-laptop kernel: [    1.990585] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Mar  6 22:28:00 salilab-laptop kernel: [    3.104329] tsc: Refined TSC clocksource calibration: 1796.624 MHz
Mar  6 22:28:00 salilab-laptop kernel: [    3.104346] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e5b473752, max_idle_ns: 440795288133 ns
Mar  6 22:28:00 salilab-laptop kernel: [    3.104421] clocksource: Switched to clocksource tsc

The timesyncd in question occurred on Mar 1 at 15:07:57 for reference.

There is a note in there: timekeeping watchdog on CPU6: Marking clocksource 'tsc' as unstable because the skew is too large It then says it switches to hpet. This however happened after the clock jumping back error. I believe it occurred when I restarted the computer right after the run with the error ended. There are no clocksource messages from the day of the error before it actually occurred. I haven’t checked clocksource for the other day there was an error. Would that be useful?

That message is indeed from system reboot, when the kernel tests all available hardware clock sources to pick the best clock source for further timekeeping, based on various criteria and consistency/reliability checks. It means that the preferred clock source TSC (processor TimeStampCounter) has been detected as unreliable/imprecise on at least one processor core, so the system switches from TSC to HPET (HighPrecisionEventTimer) as a backup clock source. TSC has especially low overhead on Intel compatible processors, HPET has higher overhead and a bit lower precision, but is a fine backup choice. The kernel will continue to periodically check clock source reliability during system runtime, to transparently handle timing trouble.

But this is unrelated to your time jump problem, as it got fixed by Linux long before your work session started.

Following the longer explanation from above, likely yes, the trigger is a strange interaction with the hospital network. E.g., if the heavily locked down network and its firewalls and stuff interfere with proper high precision access to hospital external NTP timeservers like ntp.ubuntu.com, or if the network frequently assigns (leases) new IP addresses to your laptop inside the scanner room, retriggering the “Initial time sync”. If either the problematic network, or inaccurate clocks in your laptop, cause more than 0.4 seconds of time error to accumulate between last gradual NTP time update and the new update from Initial sync, you would expect a time jump to correct for the more than 0.4 seconds error.

If you look at the log preceeding the “Initial time sync message” which correlates with your time jump, you may find clues about network related causes, e.g., messages from NetworkManager, dhcp, or similar about losing and reestablishing network connections, getting new DHCP leases etc.

In the end this means that you can be somewhat confident that only single trials in your experiments were affected by such time jump problems.

Sounds reasonable to me, as 40 msecs in 8 minutess ~ 83 ppm. In the end you see the combined clock drift of the scanners clock and the laptops clock. I don’t know if typical fMRI scanners themselves can be network synchronized to a NTP server, or act as NTP servers themselves, to facilitate consistent timekeeping in such a setup. E.g., Wikipedia (High Precision Event Timer - Wikipedia) mentions that some bad HPET hardware timer implementations had over 800 ppm drift, if that would be chosen as backup time source. Not sure what a typical TSC has in terms of expected drift. But that one kernel message about “TSC unstable” suggests that your specific laptop may not be the most accurate, unless that was some one-off problem. In the end this is why things like NTP exist, to sync to some reference clock of supposedly higher quality.

Back to how to solve your problem though:

A way to avoid time jumps is to upgrade to the latest Psychtoolbox release 3.0.19.8 or later, where, motivated by your problem, I now implemented the ability to select the type of clock time base to use for Psychtoolbox timing functions like GetSecs,WaitSecs, PsychPortAudio, Screen visual timing and timestamping etc.

You can use the new PsychTweak subfunction PsychTweak('GetSecsClock', clockid); at the beginning of a script (before any other calls to Psychtoolbox mex files!) to select a clock with a different clockid. A clockid setting of 0 uses the default CLOCK_REALTIME clock, as in the past. A clockid setting of 1 uses CLOCK_MONOTONIC instead. CLOCK_MONOTONIC, as opposed to default CLOCK_REALTIME, does not maintain time in seconds since 1st January 1970 0:00:00 UTC, but has an unspecified zero point, but usually 0 denotes time of operating system boot. CLOCK_MONOTONIC is always guaranteed to be monotonically increasing, not subject to step adjustments by the system administrator, NTP, PTP, or other external clock reference sources. This prevents synchronization to other computers on the network/setup to exactly the same time, but it should avoid clock trouble caused by time jumping. CLOCK_MONOTONIC’s rate still gets adjusted to a world reference clock if a NTP or PTP server or such is active, iow. should keep clock drift across different machines minimal. CLOCK_MONOTONIC behaves more like the GetSecs clocks for Psychtoolbox on MS-Windows and macOS. For now the new monotonic mode is only lightly tested, and only available on Psychtoolbox for Intel processors, not yet for RaspberryPi - this will follow in a future release, when I’m back home with my Pi’s again.

So if you don’t need to sync time exactly across computers, a PsychTweak('GetSecsClock', 1); as the first command in your script should fix the problem of time jumps.

Another option is to disable NTP, e.g., while you are on the hospitals network and run your sessions. timedatectl set-ntp false can stop NTP, and timedatectl set-ntp true can restart it, or the corresponding GUI check box in the systems “Date and Time” settings GUI, I think “Automatic Date and Time” checkbox. With NTP fully disabled, the clock would only run on the laptops local clock, which avoids any NTP and network related problems, but could cause higher clock drift between laptop and fMRI scanner or Eyelink computer if the laptop hardware clock would now have high uncorrected drift.
Or option 3, to figure out if and why there are network problems on the hospitals network, how to avoid that, or if they even run their own local NTP server for such cases.

Hope this helps, please report back.
-mario

[Work time spent on this Mathworks sponsored issue: 19 hours 15 minutes].

P.S. For detailed debug output of timesyncd: DebuggingSystemd - Ubuntu Wiki

Example output on my machine of timedatectl timesync-status:

       Server: 2620:2d:4000:1::3f (ntp.ubuntu.com)
Poll interval: 2min 8s (min: 32s; max 34min 8s)   
         Leap: normal                             
      Version: 4                                  
      Stratum: 2                                  
    Reference: C279CFF9                           
    Precision: 1us (-25)                          
Root distance: 784us (max: 5s)                    
       Offset: -6.832ms                           
        Delay: 11.071ms                           
       Jitter: 74.806ms                           
 Packet count: 5                                  
    Frequency: +49,305ppm

…suggests about 49 ppm clock drift, which would predict ~23 msecs clock drift in 8 minutes without proper NTP correction. This on a year 2016 MS Surface Pro 6 tablet.

P.P.S: Some other choices of NTP servers, beyond the default ntp.ubuntu.com:
pool.ntp.org: the internet cluster of ntp servers

asali · August 1, 2024, 1:39am

Hi Mario,

Thanks so much for your wonderful attention to this issue and for explaining everything so carefully. I’m sorry that it has taken me so long to respond!

I have not yet tested the new psychtoolbox version with your solution on this machine, but that sounds like it should do the trick.

I did notice a few odd things that complicate our explanation a little. There have been two occasions when instead of the “flip completed before it was scheduled” message, we instead see “OpenML timestamping reports that flip completed before its requested target time.” The wording is a little different in this message and the time offset seems to be less. I was able to grab the syslog for one of these instances and there’s no synchronization message at the time of the psychtoolbox warning so maybe it is unrelated?

Could you help me understand the difference between those two psychtoolbox messages? One one occasion, the “flip requested before it’s requested time” message occurred after I had been testing versions of a psychtoolbox script for several hours on my home network so the hospital network explanation can’t be the whole story.

We’ve also seen the “completed before it was scheduled” message at least once when not at the hospital. Do you think this means the timing on this computer is just generally pretty poor? It is a relatively new machine, but perhaps we just got unlucky?

I plan to test the new psychtoolbox release to see if this makes our issue go away, but I’m also in the market for a new laptop. This is the newer model of our currently not working correctly laptop: https://system76.com/laptops/pangolin. Is there something I should look for or avoid in a future purchase?

Would a photodiode be the only way to test whether there are other timing issues going on for our old machine outside of what psychtoolbox is catching? It seems like these errors are rare and localized to single runs of the task, but I’d like to be able to test to make sure. The laptop appears to run very well according to the metrics psychtoolbox reports in all but these odd occasions.

Thanks again!
Anthony

mariokleiner · August 6, 2024, 2:39pm

That should be the first thing to do.

I did notice a few odd things that complicate our explanation a little. There have been two occasions when instead of the “flip completed before it was scheduled” message, we instead see “OpenML timestamping reports that flip completed before its requested target time.” The wording is a little different in this message and the time offset seems to be less. I was able to grab the syslog for one of these instances and there’s no synchronization message at the time of the psychtoolbox warning so maybe it is unrelated?

Could you help me understand the difference between those two psychtoolbox messages? One one occasion, the “flip requested before it’s requested time” message occurred after I had been testing versions of a psychtoolbox script for several hours on my home network so the hospital network explanation can’t be the whole story.

Your home network could have a similar problem, only following the advice in the previous messages from me and using the new PTB will tell you.

Just switching off network time sync, as explained before, would obviously eliminate network issues / NTP issues as culprit. If you’d still have trouble, the problem would likely be with your computers clocks.

That other message you’ve seen is just another consistency check triggering, but in both cases the OS reported stimulus onset / flip timestamp is compared against timestamps that should be in the past, but aren’t. Originally it was the requested ‘tWhen’ time passed into Screen(‘Flip’);, with the other error message it is a timestamps taken before issuing the swap request to the operating system. These tests are supposed to find problems with the graphics/display system, but in your case they trigger due to the computers clock misreporting time, ie. due to a small or big jump backwards. Different error messages, most likely same cause.

If network time protocol NTP problems would be ruled out, as explained before, then it could be a broken/unreliable hardware clock as the last explanation. Normally on standard PCs/Laptops with Intel architecture, e.g., Intel and AMD laptops/ PC’s / Apple Intel Macs, the processors TSC (TimeStampCounter) is used for highest efficiency and time resolution. These clocks are quick to read by the operating system, as they are per processor core - or group of cores, ie. built into the main processor die. There have been occasions where TSC’s were unreliable on some machines, and that’s why Linux periodically and at startup checks TSC’s for consistency/reliability and switches to other fallback time sources if needed. You could be unlucky though, if the TSC would work at startup and for a while, but randomly temporarily misreport time in the middle of a running experiment session and before a periodic Linux TSC check detects and fixes the problem, so a glitch would get through.

In such a case, you could try to enforce use of the less efficient HPET timer to work around the problem, will have a performance impact…

Add a Linux kernel boot parameter to force use of HPET as less efficient clock source from the beginning, by editing the line GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub and add a force-override for clock source to HPET: clocksource=hpet
E.g., an edited line might look similar to this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash clocksource=hpet"
Update the grub bootloader configuration to this new setting via
sudo update-grub

For a more detailed overview of various TSC related problems nowadays, I just found this article:

Surprisingly, efficient and totally reliable timekeeping is still not a solved problem on all hardware out there, with the blame on both processor vendors and BIOS/EFI firmware writers. This rather recent article from 2021 goes into even more detail:
https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/

Don’t know. Some of the article above suggests that modern Intel processors since Haswell (~year 2013) are considered a bit more reliable against buggy/broken BIOS/EFI firmware in the TSC department, than AMD processors. But then in the performance performance/price and integrated graphics department they are currently a bit worse. And the extra robustness of Intel is only needed in case your mainboard has broken/shoddy vendor firmware that tampers with the TSC during system management interrupts (SMI’s), and the Linux kernel should detect such tampering during its periodic clock correctness checks. Myself I have never observed trouble on AMD cpu’s and my main development machine has an AMD cpu.

I guess so. The PTB correctness checks for timing and visual presentation are most advanced on Linux with open-source display drivers (AMD and Intel graphics, RaspberryPi), so they should catch as much as can be caught with software only. And clearly, in your case PTB did detect troublesome trials. But beyond that, only stuff like photo diodes can give additional certainty.

I would appreciate if you could buy a paid support membership and post a authentication token here. This response took quite a bit of work time, more than the 30 minutes normally covered by a support membership, and this response is no longer covered by Mathworks paying for it, so it is a financial loss for us if you don’t.

-mario

asali · August 13, 2024, 12:55pm

Thanks so much, Mario. I appreciate this extra information and all of the help you’ve offered on this issue. I purchased a support membership and I am very grateful for all you do!

Authentication token: XAGJT-JS-202481385230:3f076d1f726e23ba1d88039000b5cb39c5e500b29655f55a5ab912f7bd6257b5

mariokleiner · August 13, 2024, 4:31pm

Thanks for your support! I’ve marked the license as used up, with 42 minutes extra work time used for my recent answers, so everything in this discussion thread is fully paid.

Topic		Replies	Views
PTB BETA update "Spooling up the spinner!"	3	359	May 13, 2017
syncing problem	7	5723	March 20, 2014
Timing issues without any missed frames Hardware linux	10	1644	June 15, 2021
Synchronization failure only when external USB device connected Hardware windows , linux , io , gpu	25	1438	December 9, 2022
VBL (vertical retrace) syncing problem Hardware linux , gpu	22	5992	February 7, 2021

Inconsistent Clock Behavior on Laptop

Related topics