Temporal compression of auditory stimuli on Windows WASAPI

Dear PTB community,

we have noticed strange behavior when playing auditory stimuli with PsychPortAudio: when we record the output with an oscilloscope, the sound is compressed in time by about 2.5% (50 ms for a 2 s stimulus), as shown in the attached picture, while the timestamps returned by PsychPortAudio indicate the correct duration.

We have tried several systems, all running Windows (7 and 10) with WASAPI. When we play a WAV file of the stimulus with the default Windows player, no compression shows up on the oscilloscope. We tried a different sampling rate and different latency settings for PsychPortAudio, but we cannot figure out the cause. We also checked with a sound that is modulated at onset and offset, to see where the output is truncated or compressed, but all parts of the sound are shortened equally.

To make sure it is not our code, we used the BeepDemo by Peter Scarfe and set the stimulus duration to 2s. Still the same behavior.

System information is below.

We would greatly appreciate help with this.

Best wishes,

Sophie

Info given by InitializePsychSound(1):

PTB-INFO: Using modified PortAudio V19.6.0-devel, revision unknown
PTB-INFO: New audio device -1 with handle 0 opened as PortAudio stream:
PTB-INFO: For 2 channels Playback: Audio subsystem is Windows WASAPI, Audio device name is Haut-parleurs / écouteurs (Realtek Audio)
PTB-INFO: Real samplerate 48000.000000 Hz. Input latency 0.000000 msecs, Output latency 22.000000 msecs.

System Info:

PsychtoolboxVersion = ‘3.0.16 - Flavor : beta - Corresponds to SVN Revision 10420’
Running on Windows 10 (Version 10.0.17763)

Dell PC, Latitude E7470. 64 bit. BIOS Version : Dell Inc 1.15.4, 12/05/2017
Processor : Intel® Core™ i7-6600U CPU @ 2.60GHz, 2801 MHz, 2 cores, 4 logical processors

System sound devices : 1. Realtek Audio. driver : 6.0.1.6125, 04/12/2018
2.Son Intel® for screens. driver : 10.24.0.3, 04/12/2018

That’s weird. An error of maybe 20 ppm could be expected due to tolerances in the quartz oscillator of the soundcard clock, but 2.5% is way too much.
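
To put the two numbers side by side, here is a small worked calculation. It is plain MATLAB/Octave arithmetic with no PTB calls, the variable names are only illustrative, and it assumes the 2 s stimulus and the rough 20 ppm and 2.5% figures mentioned in this thread:

```matlab
% Expected vs. reported timing error for a 2 s stimulus.
stimDur  = 2.0;     % nominal stimulus duration in seconds
clockTol = 20e-6;   % ~20 ppm oscillator tolerance (rough figure, see above)
reported = 0.025;   % 2.5% compression as reported from the oscilloscope

expectedErrMs = stimDur * clockTol * 1000;  % error explainable by clock tolerance
reportedErrMs = stimDur * reported * 1000;  % error reported in the first post

fprintf('Clock tolerance: %.3f ms, reported: %.1f ms (factor %.0f)\n', ...
        expectedErrMs, reportedErrMs, reportedErrMs / expectedErrMs);
```

So clock tolerance would account for about 0.04 ms over 2 s, while the reported 50 ms is roughly 1250 times larger.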

Does the sound sound correct, i.e., no audible artifacts? Have you tried different models of sound cards / chips?

Have you tried different settings for the 'reqlatencyclass' parameter of PsychPortAudio('Open', …)? Any difference for values 0, 1, 2, 3 on Windows-10?

What’s the debug output at PsychPortAudio('Verbosity', 10)?

Hi Mario, thank you for the reply. Unfortunately, things are a bit slow now, and we currently don’t have access to the oscilloscope. Here is the output with 'Verbosity', 10 for reqlatencyclass 1 and 4.

%% reqlatencyclass 1

PsychPortAudio('Verbosity', 10);

pahandle = PsychPortAudio('Open', [], 1, 1, freq, nrchannels);

PTB-DEBUG: PortAudio says: WASAPI: IAudioClient2 set properties: IsOffload = 0, Category = 0, Options = 1

PTB-DEBUG: PortAudio says: WASAPI: IAudioClient2 set properties: IsOffload = 0, Category = 0, Options = 1

PTB-DEBUG: PortAudio says: wFormatTag =WAVE_FORMAT_EXTENSIBLE

PTB-DEBUG: PortAudio says: SubFormat =KSDATAFORMAT_SUBTYPE_IEEE_FLOAT

PTB-DEBUG: PortAudio says: Samples.wValidBitsPerSample =32

PTB-DEBUG: PortAudio says: dwChannelMask =0x3

PTB-DEBUG: PortAudio says: nChannels =2

PTB-DEBUG: PortAudio says: nSamplesPerSec =48000

PTB-DEBUG: PortAudio says: nAvgBytesPerSec=384000

PTB-DEBUG: PortAudio says: nBlockAlign =8

PTB-DEBUG: PortAudio says: wBitsPerSample =32

PTB-DEBUG: PortAudio says: cbSize =22

PTB-DEBUG: PortAudio says: WASAPI::OpenStream(output): framesPerUser[ 480 ] framesPerHost[ 1056 ] latency[ 22.00ms ] exclusive[ NO ] wow64_fix[ NO ] mode[ POLL ]

PTB-INFO: New audio device -1 with handle 1 opened as PortAudio stream:

PTB-INFO: For 2 channels Playback: Audio subsystem is Windows WASAPI, Audio device name is Haut-parleurs (Périphérique High Definition Audio)

PTB-INFO: Real samplerate 48000.000000 Hz. Input latency 0.000000 msecs, Output latency 22.000000 msecs.

%% reqlatencyclass 4

PsychPortAudio('Verbosity', 10);

pahandle = PsychPortAudio('Open', [], 1, 4, freq, nrchannels);

PTB-DEBUG: PortAudio says: WASAPI: IAudioClient2 set properties: IsOffload = 0, Category = 0, Options = 2

PTB-DEBUG: PortAudio says: WASAPI: IAudioClient2 set properties: IsOffload = 0, Category = 0, Options = 2

PTB-DEBUG: PortAudio says: wFormatTag =WAVE_FORMAT_EXTENSIBLE

PTB-DEBUG: PortAudio says: SubFormat =KSDATAFORMAT_SUBTYPE_PCM

PTB-DEBUG: PortAudio says: Samples.wValidBitsPerSample =24

PTB-DEBUG: PortAudio says: dwChannelMask =0x3

PTB-DEBUG: PortAudio says: nChannels =2

PTB-DEBUG: PortAudio says: nSamplesPerSec =48000

PTB-DEBUG: PortAudio says: nAvgBytesPerSec=384000

PTB-DEBUG: PortAudio says: nBlockAlign =8

PTB-DEBUG: PortAudio says: wBitsPerSample =32

PTB-DEBUG: PortAudio says: cbSize =22

PTB-DEBUG: PortAudio says: WASAPI::OpenStream(output): framesPerUser[ 480 ] framesPerHost[ 480 ] latency[ 10.00ms ] exclusive[ YES ] wow64_fix[ NO ] mode[ EVENT ]

PTB-INFO: New audio device -1 with handle 2 opened as PortAudio stream:

PTB-INFO: For 2 channels Playback: Audio subsystem is Windows WASAPI, Audio device name is Haut-parleurs (Périphérique High Definition Audio)

PTB-INFO: Real samplerate 48000.000000 Hz. Input latency 0.000000 msecs, Output latency 10.000000 msecs.

I did some quick testing with a Windows-10 tablet and a Realtek HDA sound chip, (ab)using the sound card of a second computer as an oscilloscope. I could reproduce some of the shortening under the (default) reqlatencyclass 1 setting. The problem went away if reqlatencyclass 2 or higher was used, with a shortening error < 1 msec over 2 seconds of playback (i.e., ~1999.5 msecs measured instead of 2000). Of course, a sound card is not as precise as a real oscilloscope, and small errors are expected due to potential effects of the bandpass filters in sound cards, other fun electronic artifacts, manufacturing tolerances and slight clock deviations, but < 1 msec is reasonable.

Just for reference, another thing that worked even at the default reqlatencyclass 1 setting was to specify the desired stop time for the sound as a parameter to PsychPortAudio('Start'), setting the stop time 2 seconds after the start time, e.g.,

t = GetSecs; tStart = t + 1; tEnd = tStart + 2;
PsychPortAudio('Start', pahandle, 1, tStart, 1, tEnd);

This way PsychPortAudio actively schedules sound offset as it does sound onset. This suggests that the timestamps calculated by our driver should be fine and can be used for precise sound onset/offset scheduling inside the driver.
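
For anyone who wants to try this, here is a minimal sketch of the scheduled onset/offset approach as a complete script. It assumes a working Psychtoolbox installation and a stereo output device; the 500 Hz tone and the variable names are arbitrary choices for illustration, not taken from the original test. Note that it only reports the driver's own timestamps, so it cannot replace an external measurement of the actual output:

```matlab
% Sketch: schedule both sound onset and offset, then compare driver timestamps.
InitializePsychSound(1);

freq       = 48000;  % sampling rate in Hz
nrchannels = 2;      % stereo playback
dur        = 2;      % desired sound duration in seconds

% Arbitrary 500 Hz test tone, duplicated into both channels (one row each).
tone = MakeBeep(500, dur, freq);
pahandle = PsychPortAudio('Open', [], 1, 1, freq, nrchannels);
PsychPortAudio('FillBuffer', pahandle, [tone; tone]);

% Start 1 s from now, block until the real onset, stop 2 s after tStart.
tStart = GetSecs + 1;
tEnd   = tStart + dur;
realStart = PsychPortAudio('Start', pahandle, 1, tStart, 1, tEnd);

% Wait for playback to end and fetch the driver's estimated stop time.
[~, ~, ~, estStop] = PsychPortAudio('Stop', pahandle, 1);
fprintf('Driver-reported duration: %.2f ms\n', (estStop - realStart) * 1000);

PsychPortAudio('Close', pahandle);
```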

What didn’t work at the reqlatencyclass 1 setting, at least on Windows-10 with those HDA sound chips, was playing a sound vector of 2 * 48000 == 96000 samples at a sampling rate of 48000 samples/sec and getting an actual sound duration of 2 seconds, as one would rightfully expect.

My hunch is that the compression happens because of some off-by-n errors in buffer adaptation. If (see your debug output) the framesPerUser size of the audio buffers that PsychPortAudio provides to the underlying PortAudio engine doesn’t match the framesPerHost size of the buffers consumed by the Windows WASAPI sound system, then the buffer size has to be adapted, i.e., content from multiple small user buffers is packed into one bigger host buffer (e.g., 480 vs. 1056 at reqlatencyclass 1 on your setup). Maybe, because 1056 is not an integral multiple of 480, something goes wrong with the repacking, e.g., a sample or two goes missing for each source buffer, so that samples are dropped evenly across the whole sound vector and the sound data ends up shorter than the 96000 samples. Whether this is a bug in the buffer processor of the underlying PortAudio library or a bug in Windows WASAPI’s kernel mixer (which mixes sound from multiple audio apps/sources playing to one physical sound device), I don’t know, and I don’t have the time at the moment to look into this further.
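
As a rough sanity check of the numbers involved, here is a back-of-the-envelope calculation in plain MATLAB/Octave. The loss model (a fixed number n of samples lost per 480-frame user buffer) is purely hypothetical and only meant to show the orders of magnitude; it does not confirm or rule out the hunch:

```matlab
% Back-of-the-envelope check of the buffer-repacking hunch.
freq          = 48000;     % samples per second
nSamples      = 2 * freq;  % 96000 samples for a 2 s sound
framesPerUser = 480;       % from the debug output at reqlatencyclass 1
framesPerHost = 1056;      % from the debug output at reqlatencyclass 1

ratio        = framesPerHost / framesPerUser;  % 2.2, not an integer
nUserBuffers = nSamples / framesPerUser;       % 200 user buffers in total

% Hypothetical: n samples lost per user buffer (n is an assumed value).
for n = [1 2 12]
    lostMs = n * nUserBuffers / freq * 1000;
    fprintf('%2d samples/buffer lost -> %.1f ms shorter (%.2f%%)\n', ...
            n, lostMs, 100 * n / framesPerUser);
end
```

Under this assumed model, losing one or two samples per buffer would shorten a 2 s sound by roughly 4 to 8 ms, whereas the 50 ms from the original report would correspond to about 12 lost samples per 480-frame buffer.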

In reqlatencyclass 2, PsychPortAudio acquires exclusive access to the audio hardware, so the hardware can be programmed to use a fitting host buffer size. This avoids the need for buffer size adaptation/repacking and disables the Windows kernel mixer. You see that with exclusive access framesPerUser == framesPerHost == 480, exclusive = YES, and latency is reduced from 22 msecs to 10 msecs. The downside of reqlatencyclass >= 2 is that all other sound apps will stop working, as the sound hardware is taken away from them.

So I guess my recommendation is to always run your experiment setup at reqlatencyclass 2, 3 or 4 for actual data collection. This optimizes the hardware most aggressively for performance and timing, while disrupting any use of other sound apps at the same time. It also disrupts parallel sound output from movies played with our own built-in movie playback functions, because the underlying GStreamer multimedia framework counts as a separate audio app from the point of view of the operating system.

On Linux, PsychPortAudio basically always runs at a reqlatencyclass of 2 by default, unless one takes measures to allow sharing the sound card.

On macOS, audio device sharing at reqlatencyclass 1 is used by default, like on Windows, so similar bugs could cause similar effects, in case somebody wants to test this.

By the way, I couldn’t measure such compression with reqlatencyclass == 0. In that case the prehistoric Windows MME sound system is used. Of course, under MME you might get the duration right, but then latency is in the multi-hundred msecs range and absolute timing precision is equally bad, so this is not much of an option.

If my hunch about the problem is correct, you’d expect different performance depending on the sound hardware used, as different hardware may use different framesPerHost host buffer sizes at reqlatencyclass 1, also depending on audio driver and operating system.

Life is full of trade-offs…
-mario

Hi again, we were just able to get back to the institute now.
It turns out that our problem was not caused by PTB, but that our oscilloscope’s timing is off.
With another one, the output duration is fine. Sorry for the false alarm!

Of course now you’d need a 3rd oscilloscope to decide which of the two is wrong?

We recorded the stimuli with another laptop’s built-in microphone, which was also fine. So it is 2:1.

Okay, that’s good. But then my own results showed something like what you reported when using another laptop’s sound card, so it’s 2:2 :).

I guess my recommendation would be to still use reqlatencyclass 2, 3 or 4 instead of the default of 1 for data collection, just to rule out the possibility of some artifacts on some systems.

We will pay attention to that, thank you!