Need help lowering audio latency. How low can Windows go?

Hello,

First of all, please note that I may be missing some fundamental audio tech knowledge or use incorrect terminology. I’m doing my best to learn as I go.

Simply put, I’m unable to get round-trip latency below ~50 ms when using PTB. That number comes from the beta function AudioFeedbackLatencyTest as well as from external testing. I’m aiming for 12 ms of latency, but could settle for 24 ms if necessary.

I don’t think PTB is the cause of the latency, per se. In Audacity, when I monitor the experiment mic through the experiment headphones or external speakers using WASAPI drivers, I get a similar amount of latency. That makes me think the latency is either inherent to WASAPI or caused by something else in my setup, so I’ve tried testing other parts of the setup.

  1. I ran LatencyMon, a program that looks for drivers in your system that cause latency while an audio stream is running. Overall, drivers contributed only a couple hundred microseconds of latency, so everything seems fine there.

  2. I’m using a USB audio interface, the MOTU Microbook II. Sound from the mic can be routed through the CueMix FX (MOTU’s software for controlling the hardware and mixing/routing), and back to the headphones with ~3 ms latency. This is not direct monitoring on the interface itself; it’s doing a round trip through the USB cable. So seemingly this interface itself isn’t the issue either. For what it’s worth, I’ve also tried a Scarlett Solo, which produced even worse results, ~160 ms round trip latency.

  3. I’ve read several blog/support posts on other websites which give Windows setting recommendations for reducing latency. None of those have helped.

I have no special sound card installed in my computer. It’s just the default on the motherboard. High Definition Audio Device is the name, if I’m looking at the right thing in the Device Manager. So I imagine that’s not helping.

Based on this Windows documentation, it sounds like WASAPI is capable of 10 ms round trip latency, but only when using the IAudioClient3 interface. Is that interface being used in PTB? (Sorry, I don’t really know how to check this myself.)

To sum up, here’s what I’m looking for:

  1. To people who’ve made latency-sensitive audio experiments in Windows, what’s the lowest you’ve achieved?
  2. Thoughtful recommendations for how to achieve low latency. (Yes, you can tell me “use Mac or Linux” if that’s what you really think.)

Have you tried setting the reqlatencyclass parameter in the ‘Open’ call to 4? That’s the most aggressive setting: it asks for the lowest latency and only works on Windows-10, not on earlier Windows versions. AudioFeedbackLatencyTest hasn’t been updated in a long time, not since we added Wasapi support, so it might not provide the best possible results.

I’ve measured 10 msecs one-way latency on some onboard HDA sound chip with Windows-10, so 20 msecs roundtrip should be possible under the right circumstances; however, I haven’t tested roundtrip in quite a while. DelayedSoundFeedbackDemo.m might be worth a try and a read.

It also depends on the task you want to achieve. What is it?

After using a reqlatencyclass of 4 for the input and output devices, I got round-trip values of ~32 ms. Thanks for the boost. (Previously I’d gotten PTB errors when trying to open these devices with a reqlatencyclass > 1, but I think that was user error.) 32 ms is better, but it’s probably still not going to cut it.

All my tasks will rely on voice onset triggering some action. For the initial experiment, I want to trigger an audio file to play on voice onset. Low latency is important because the goal of the experiment is to replace long voice onset time (VOT) stops like /t/ (50 ms) with short ones like /d/ (17 ms). So, realistically, within 17 ms of someone saying /t/, I need to start playing an audio file so they hear /d/. (A replication of Mitsuya, 2014)

For future experiments, we might trigger the Audapter software package to do some form of formant manipulation, which would then be fed back to the speaker as feedback. That would have similarly stringent latency requirements.

32 msecs is better, but still seems higher than what should be possible. Can you show the output with PsychPortAudio(‘Verbosity’, 10)? Have you tried different sampling frequencies? Depending on driver/OS/hardware, sometimes doubling the frequency, e.g., from 48000 Hz to 96000 Hz, can cut the latency in half. You mentioned different sound cards; did you try all of them?
Btw., PsychPortAudio does use IAudioClient3 if the OS + driver + hardware combination supports it. The Windows docs you referenced also contain installation instructions for a Microsoft “inbox HDA” sound driver, which may give lower latency (or screw up your onboard HDA sound chip).
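The reasoning behind doubling the sampling frequency is plain arithmetic: if the driver keeps the same buffer size in frames, each buffer covers half as much time at twice the rate. A minimal Python sketch of that calculation (the helper name `buffer_duration_ms` is hypothetical, not part of PsychPortAudio). Whether a given driver actually keeps the frame count fixed when the rate changes is an assumption; the MOTU results later in this thread suggest it doesn’t always.

```python
def buffer_duration_ms(frames: int, sample_rate_hz: float) -> float:
    """Duration of one audio buffer of `frames` sample frames, in milliseconds.
    Hypothetical helper, only illustrating the frames / rate relationship."""
    return 1000.0 * frames / sample_rate_hz


# Same buffer size in frames at two sampling rates (assumes the driver keeps it fixed).
for rate in (48000, 96000):
    print(f"960 frames @ {rate} Hz -> {buffer_duration_ms(960, rate):.1f} ms")
```

This prints 20.0 ms for 48000 Hz and 10.0 ms for 96000 Hz. If the driver instead scales its buffer in frames with the rate, the duration, and hence the latency, stays the same.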

Giving Linux a try is another option, given how easy a dual-boot setup is. During past testing I did manage to achieve playback latencies of 5-7 msecs on some onboard HDA sound chips. I haven’t measured roundtrip in a while, but it is at least worth a try with yours. Also, for timing-critical stuff, Linux is almost always superior.

In this shared folder are some results of running AudioFeedbackLatencyTest with Verbosity == 10. The txt files are over 1000 lines long, so I’m not posting them directly here. Note that what the latency test reports as “mean offset latency” matches my independent round-trip measurements; the latency test’s “roundtrip latency” is much shorter than my measured round-trip latency.

I tested with sampling rates of 48 and 96 kHz and got very similar results for the MOTU Microbook, still right around 32 ms. For the Focusrite Scarlett Solo, going from 48 to 96 kHz reduced latency from 125 to 114 ms. The RME Babyface Pro FS at 96 kHz with reqlatencyclass == 4 got 80 ms latency. That finding was particularly painful, since the Babyface is touted as top of the line. All devices are using up-to-date first-party drivers.

My computer already has the HDAudio device installed, and as far as I can tell, it is in use. It would be great to continue using Windows for various reasons, not the least of which is that I’ve never used Linux before. But if Windows ain’t doing it, I’ll figure out another OS.

Don’t trust that script too much. It was an early hack, with big fat warnings added that you shouldn’t trust it. I’ve just improved the script quite a bit. One still shouldn’t totally trust it, as I don’t have my measurement equipment where I am, so the new script is not yet properly verified. But the numbers I get on my 10-year-old laptop with Linux and onboard sound at least look pretty plausible and consistent.

Here’s the new script:

https://raw.githubusercontent.com/kleinerm/Psychtoolbox-3/master/Psychtoolbox/PsychTests/AudioFeedbackLatencyTest.m

That said, the numbers you get with the old script are ridiculous. The debug output suggests stuff is working correctly, but the measurements are absurd. Something must be weirdly broken with your USB audio interfaces on Windows.

With the new script, this:

AudioFeedbackLatencyTest(0) would give you the measured/estimated input latency from sound onset to detection by the script as “Mean input latency”, i.e., the expected delay your “voice key” would have from the subject’s voice onset to when it could call PsychPortAudio(‘Start’) to play the perturbed feedback. “Mean scheduling offset” should report less than 1-2 msecs on a properly working system, usually below 1 msec.

AudioFeedbackLatencyTest(1), on the other hand, would give you the measured delay from calling PsychPortAudio(‘Start’) in response to detected voice onset to the sound actually playing, reported as “Mean startup to sound output delay”. Obviously this all assumes that timing and timestamping work on your setup, which they do on my Linux system, but they seem to be badly bungled on your machine.
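A “voice key” here just means code that watches the captured input and fires as soon as the sound energy crosses a threshold. Below is a minimal, hypothetical Python sketch of such a block-based detector (the function names, block sizes and threshold are illustrative assumptions, not taken from AudioFeedbackLatencyTest). It shows one reason input buffer size matters: the detector can only decide once a whole block has arrived, so larger capture blocks add directly to the input latency. It ignores hardware and driver delays entirely.

```python
import math
from typing import Optional, Sequence


def detect_onset_block(samples: Sequence[float], block_size: int,
                       threshold: float) -> Optional[int]:
    """Hypothetical voice key: scan fixed-size blocks and return the index of the
    first block whose RMS amplitude exceeds `threshold`, or None if none does."""
    n_blocks = len(samples) // block_size  # a trailing partial block is ignored
    for b in range(n_blocks):
        block = samples[b * block_size:(b + 1) * block_size]
        rms = math.sqrt(sum(x * x for x in block) / block_size)
        if rms > threshold:
            return b
    return None


def decision_time_ms(block_index: int, block_size: int, sample_rate_hz: float) -> float:
    """Earliest time (ms after capture start) at which the detector can act:
    it has to wait until the whole block containing the onset has arrived."""
    return 1000.0 * (block_index + 1) * block_size / sample_rate_hz


# Synthetic input at 48 kHz: 100 ms of silence followed by a constant "voice".
fs = 48000
signal = [0.0] * 4800 + [0.5] * 4800
for block in (48, 480):
    idx = detect_onset_block(signal, block, threshold=0.1)
    print(block, idx, decision_time_ms(idx, block, fs))
```

With the onset at 100 ms, 48-frame blocks allow a decision at 101 ms, while 480-frame blocks push it to 110 ms.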

The numbers I get on my 10-year-old Linux laptop with onboard sound suggest that something like 14 msecs or less for your task would be doable on that machine.

So with the right equipment your task should be doable, for what it’s worth.

Thanks for the new script. Results are again in my shared folder, in the subfolder dated 6-17-20. It seems “Mean scheduling offset” is at least one culprit, since my results showed 30 ms as opposed to your <1 ms.

And I agree, the values I’m getting are ridiculously bad. But I do think they are somewhat specific to Psychtoolbox. I can run real-time audio through Audapter, another MEX-based Matlab program, which uses ASIO drivers, and independently measure 15 ms round-trip latency. That’s with the same hardware setup and Windows settings I’m using when running PTB.

I also tested these devices on a separate Windows 10 PC and got identical results: 15 ms round trip latency in Audapter, 30+ ms in PTB.

Also in the shared folder are results from PsychPortAudioTimingTest, with results for absolute timing and scheduled timing. If I’m reading the results correctly, the scheduled timing system in PTB appears to be working well.

I tested on my Win-10 machine as well, and something seems broken in Windows WASAPI on the input side, at least for reqlatencyclass > 1. The timestamps I get are just as bogus as yours, apparently because Wasapi delivers more data every second than is physically possible for the selected sample rate, as if capture were running at 1.5x the requested speed.
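The sanity check behind that observation is simple: count how many frames the capture side delivers over a known wall-clock interval and compare the implied rate with the requested one. A hypothetical Python sketch follows; the frame counts are made-up illustrative numbers, not measurements from this thread, and how you would collect frame counts and elapsed time from your own capture loop is left as an assumption.

```python
def effective_rate_hz(frames_delivered: int, elapsed_s: float) -> float:
    """Sample rate implied by how many frames arrived in a wall-clock interval."""
    return frames_delivered / elapsed_s


def rate_ratio(frames_delivered: int, elapsed_s: float, nominal_hz: float) -> float:
    """Implied rate divided by requested rate; close to 1.0 on a sane system."""
    return effective_rate_hz(frames_delivered, elapsed_s) / nominal_hz


def looks_plausible(ratio: float, tolerance: float = 0.01) -> bool:
    """Hypothetical tolerance check: flag anything more than 1% off nominal."""
    return abs(ratio - 1.0) <= tolerance


# Illustrative numbers only: 10 s of capture at a requested 48 kHz.
healthy = rate_ratio(480000, 10.0, 48000)    # 1.0
anomalous = rate_ratio(720000, 10.0, 48000)  # 1.5, like the behaviour described above
print(healthy, looks_plausible(healthy))
print(anomalous, looks_plausible(anomalous))
```

A ratio near 1.5 means any timestamps derived from the delivered sample count cannot be trusted, regardless of what the measured latency figures say.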

The output side looks more trustworthy, and at least that always worked very well when I tested against external equipment. So I’d distrust “Mean scheduling offset” and “Mean startup to sound output delay” completely. Things seem to work with the default reqlatencyclass 1, though, but then the latency on the input side is much higher.

On the other hand, “Sound output delay 19.981800 ms.” looks plausible. I don’t have my equipment around at the moment to really test this, but at least so far I haven’t ever seen that part fail.

And “Mean roundtrip latency” is based purely on GetSecs measurements of the elapsed time from calling PPA(‘Start’) for playback to when the while loop that checks for sound onset breaks. So that shouldn’t be too far off.

So that would suggest about 10 msecs for detecting noise via the microphone, and 20 msecs to hear a sound after giving the ‘Start’ command.
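Adding the two components gives a rough end-to-end estimate that can be compared against the ~17 ms budget from the /t/ to /d/ replacement paradigm described earlier. A back-of-the-envelope Python sketch, assuming the two stages simply add and that the script’s own processing time between detection and ‘Start’ is negligible (the function name is hypothetical):

```python
# Hypothetical helper: assumes the two stages simply add and that processing
# time between detecting voice onset and calling 'Start' is negligible.
def total_feedback_latency_ms(input_detect_ms: float, output_start_ms: float) -> float:
    """Voice onset -> detection, plus 'Start' command -> audible sound."""
    return input_detect_ms + output_start_ms


BUDGET_MS = 17.0  # short-VOT /d/ (~17 ms) from the Mitsuya (2014) replication paradigm

total = total_feedback_latency_ms(10.0, 20.0)  # rough figures from the post above
print(f"{total:.1f} ms, within budget: {total <= BUDGET_MS}")
```

That gives about 30 ms, consistent with the ~30 ms measured in this thread and well over budget, so the output side would have to shrink as well as the input side.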

Here’s what you can try: specify the optional ‘buffersize’ parameter in PsychPortAudio(‘Open’,…) as something low, e.g., 48 if you use a sample rate of 48000 Hz, and see if you can force your sound hardware into a lower latency. Tinker until the line “PTB-DEBUG: PortAudio says: WASAPI::OpenStream(output): framesPerUser[ 960 ] framesPerHost[ 960 ] latency[ 10.00ms ] exclusive[ YES ] wow64_fix[ NO ] mode[ EVENT ]” looks like this one, ideally with framesPerUser and framesPerHost matching but with smaller numbers, or at least with framesPerHost being an integral multiple of framesPerUser.
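When scanning a long Verbosity 10 log for that line, a small script can pull out the relevant fields and apply the “integral multiple” rule of thumb. This is a hypothetical Python sketch: it assumes the debug line is formatted exactly like the one quoted above (other PortAudio versions may print it differently), and the helper names are illustrative only.

```python
import re

# The debug line exactly as quoted in the post above.
LINE = ("PTB-DEBUG: PortAudio says: WASAPI::OpenStream(output): framesPerUser[ 960 ] "
        "framesPerHost[ 960 ] latency[ 10.00ms ] exclusive[ YES ] wow64_fix[ NO ] mode[ EVENT ]")


def parse_wasapi_open(line: str) -> dict:
    """Extract framesPerUser, framesPerHost and latency from a WASAPI::OpenStream
    debug line; assumes the 'name[ value ]' layout shown above."""
    fields = {}
    for key in ("framesPerUser", "framesPerHost"):
        m = re.search(rf"{key}\[\s*(\d+)\s*\]", line)
        if m:
            fields[key] = int(m.group(1))
    m = re.search(r"latency\[\s*([\d.]+)ms\s*\]", line)
    if m:
        fields["latency_ms"] = float(m.group(1))
    return fields


def host_is_integral_multiple(frames_per_user: int, frames_per_host: int) -> bool:
    """True if the host buffer holds a whole number of user buffers."""
    return frames_per_user > 0 and frames_per_host % frames_per_user == 0


f = parse_wasapi_open(LINE)
print(f, host_is_integral_multiple(f["framesPerUser"], f["framesPerHost"]))
```

For the quoted line this reports 960/960 frames at 10.00 ms, which satisfies the rule; the goal of tinkering with ‘buffersize’ is to keep that check true while the frame counts get smaller.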

Thanks for confirming I’m not crazy! All right, I’ll try playing around with those settings to see if I can get some more serviceable numbers.

If I’m understanding you correctly, something would need to be changed in PTB before these issues would be resolved for all users. Should this be added to the known bugs list in the meantime? And if I may be so bold, do you have a sense of when you might get around to looking into this issue more?

I spent almost a whole work day on this in total, and the result is that this is definitely not a Psychtoolbox bug, but a bug or limitation in the underlying third-party Portaudio library or even in the Windows Wasapi sound system. Looking at the Portaudio source code for Wasapi shows that Windows Wasapi has quirks (to say it in the most friendly way) on the sound input side that make this challenging for Portaudio to handle: tons of special cases depending on the sound hardware connected, the audio driver used, and the specific sound settings requested. The amount of complexity needed is pretty awful, so I’m not that surprised bugs could easily creep in on the capture side for various sound cards and go unnoticed for a long time.

Diagnosing further whether this is a Portaudio problem or an MS-Windows problem, and fixing it (if it is fixable from our side at all, i.e., not an MS-Windows bug), would take a lot of time and effort, so I will almost certainly not work on this anytime soon, and not without being contracted and paid to do so. If I spend a day investigating something and at the end I’m more surprised that it works at all than that it has bugs, that usually doesn’t bode well for an easy or quick fix.

What I can say is that there are fundamentally different processing paths in Portaudio depending on whether you use reqlatencyclass 1 or > 1, and the low-latency settings you’d need (lower than 20 msecs input latency) seem to be even more broken, so at the moment you probably can’t get precise timestamps for voice onset and low latency at the same time. There are also different processing paths for full-duplex vs. half-duplex, and various special cases depending on hardware properties.

I guess the best you can do is tinker. As far as I understand your paradigm, you don’t need precise timestamps for voice onset, only a quick, low-latency way to detect that voice onset happened, so you can quickly call PsychPortAudio(‘Start’) for the playback part. So it may work for you to just push input latency down by tinkering with the buffersize parameter and setting reqlatencyclass to 4.

Or switch to Linux, where this stuff should work better, especially for timing requirements like yours, as long as you use a reasonably well-supported sound card. Onboard HDA sound usually works well if the sound chip actually sticks to the HDA standard and is not super brand-new or exotic, as do UAC-compliant USB sound cards and hardware that is officially supported by ALSA.

Thanks for your unending support on this, Mario. I’ll do some tinkering in the near future and will report back here with what numbers I get for documentation purposes. But long term I’ll probably plan on using Mac/Linux whenever I can get away with it.

I’m marking your previous post as the solution, since it resolves my initial question, as far as I’m concerned.

Yes, please report back, others might appreciate it. And who knows, maybe there are some new clues to be had from new results.

Also, try to stay away from Apple’s toys if you can. The recommendation is Linux for demanding stuff, and Windows-10 as a fallback or for not very demanding stuff. Apple is the worst hardware and software platform one could use for data collection for anything non-trivial, and I can only see this getting worse in the foreseeable future.