Re: Recording with PsychPortAudio

On Sep 11, 2009, at 4:40 PM, Raymond Stanley wrote:

> Hello Mario,

Hi Raymond. I'll forward this mail to the Psychtoolbox forum, which is
the proper place to ask general questions about Psychtoolbox.

>
> Thank you for taking the time to read this email, and taking the
> time to be such an integral part of a tool that is essential for so
> much research.
>
> If and when you have the time, I have a relatively simple
> issue about using the PsychPortAudio interface for recording that
> I'd like to ask you about:
>
> I'd like to use the record function to record the stimulus output
> and a voice response from a participant, save that to a wave file,
> and compute the reaction time post-experiment. I see that you've
> created some elegant online voice detection schemes, but I'd rather
> be able to listen to the responses in the data analysis stage.

There are two demos that show voice triggers. SimpleVoiceTriggerDemo
just waits for the trigger and returns the timestamp.
BasicSoundInputDemo additionally records the wave data and saves it
to a file. BasicSoundInputDemo, however, doesn't use low-latency
mode, so if you really want to use it for reliable voice triggers,
you should modify the 'Open' call to use low-latency mode, just as in
SimpleVoiceTriggerDemo. The difference exists because voice triggers
were added to BasicSoundInputDemo later on, and I don't want to use
low-latency mode by default in that demo: it would cause failures on
MS-Windows with non-ASIO hardware.

Another limitation of the demos is that they only use a simple
intensity threshold for detecting response onset, not the more clever
filtering schemes that would be better suited for vocal responses. The
point of the demos is to demonstrate how to get very precise
(sample-accurate) timestamps for events in the recorded sound stream,
not to be the "mother of all voice triggers". It would be very simple
to build on these demos, though, and add fancier features such as a
small integration window in time or some filtering for more robustness.

>
> I've found conflicting evidence in different places about whether
> the timing of the record function has been found to be stable. So,
> I have two questions:
>
> 1) Does the record function have timing as stable as the play function?

What kind of conflicting evidence? I never got any feedback about
that part, so you must know more than me.

In theory it should have timing as stable as the play function, i.e.,
the reported audio onset/capture timestamps should be just as accurate,
usually down to sub-millisecond accuracy. The timestamping mechanism
behind capture timestamps is the same as the one for playback
timestamps.

In practice the mechanisms rely on the operating system and on the
drivers and hardware of your sound system. A misconfiguration of the
sound system, or bugs in the OS, drivers or hardware, could lead to
wrong timestamps and timing. As opposed to visual stimulus onset,
where we have a lot of consistency checks in place to spot such bugs,
there is no automated way to do the same for sound. Our driver has to
fully rely on the hardware doing the right thing. This is why you
must verify the timing at least once with some test procedure to make
sure it works for your specific setup.

In practice the timing mechanisms work well on the tested Linux and
MacOS/X systems, as well as on Windows machines with a soundcard that
has native ASIO support, e.g., many Creative Labs cards, M-Audio
cards, RME cards... Native support does not mean "Asio4All", which
is a software emulation for cards that don't have native support.
"Asio4All" may work very well for some cards and not well for
others. It is a hit-and-miss thing.

I tested the playback timing extensively on all operating systems,
but I only tested the capture timing indirectly on MacOS and Linux, as
follows:

There is a script called KeyboardLatencyTest. It can be used to test
the timing accuracy/latency/variability of response devices.
Initially it could only test keyboards, hence the name. The current
version also tests mouse button responses, the Cedrus USB based
response boxes, the RTBox response box, the CMU button box and the
PST button box. The script runs ten test trials. In each trial the
user has to hit a button on the response device hard, while the script
uses PsychPortAudio's "Voicetrigger" feature to detect the timestamp
of the noise made by hitting the response button. It compares that
audio timestamp of the button press with the button press timestamp
collected from the response device, prints the difference, and
computes the mean deviation and standard deviation.
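The per-trial comparison described above boils down to a few lines of arithmetic. The following Python sketch is not KeyboardLatencyTest itself; `compare_timestamps` is a hypothetical helper that assumes both timestamp lists are on the same clock (in seconds) and interprets "mean deviation" as the mean of the per-trial differences. The sign convention (device minus audio) is an arbitrary choice.

```python
import statistics
from typing import List, Sequence, Tuple


def compare_timestamps(audio_ts: Sequence[float],
                       device_ts: Sequence[float]) -> Tuple[List[float], float, float]:
    """Hypothetical helper: per-trial difference (device - audio) in msec,
    plus the mean and sample standard deviation of those differences."""
    if len(audio_ts) != len(device_ts) or len(audio_ts) < 2:
        raise ValueError("need two equally long sequences with at least 2 trials")
    diffs = [(d - a) * 1000.0 for a, d in zip(audio_ts, device_ts)]
    for trial, diff in enumerate(diffs, start=1):
        print(f"Trial {trial:2d}: device - audio = {diff:+.3f} msec")
    return diffs, statistics.mean(diffs), statistics.stdev(diffs)
```

A small mean tells you the two clocks agree on average; a small standard deviation tells you the combined timestamping jitter of both devices is low.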

If you have, e.g., a MacBook Pro, you can try it yourself. The
microphone is inside the left speaker, so you should place external
response boxes or mice close to the left side of the computer and
then run the script in a silent room. On MS-Windows you'd need a
properly placed external microphone and an ASIO-capable soundcard.

The script basically measures the combined error and variability in
timestamping of both the response device and the PsychPortAudio
capture timestamps. I tested this setup on both OS/X and Linux with
the MacBook Pro's built-in sound chip and microphone with the Cedrus
response box, the RTBox and the PST button box, as well as with mice
and keyboards. While mice and keyboards showed the expected huge
latency and variability, the response boxes all showed almost perfect
timestamps, with an accuracy better than 1 msec and a variability of
at most 1 msec.

From that I conclude that the accuracy of both our response box
drivers and of the audio capture timestamping must be better than
1 msec. Theoretically it could also happen that both the response
devices and the audio driver have huge errors, but that by pure
chance those errors are exactly the same in magnitude but of opposite
sign, so they cancel each other out perfectly. But given that the
same audio driver and hardware were tested over many trials with many
different response devices (each with its own type of connection and
driver/timestamping algorithm), that both modules (audio driver vs.
response box drivers) are completely independent in their operation,
and that I always got the same consistently low error, it is much
more likely that everything worked correctly and our drivers are
trustworthy.
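The independence argument can be made a bit more explicit. Assume (as a simplification) that in each trial the true press time is $T$, the audio driver reports $T + e_a$ and the response device reports $T + e_r$, with the acoustic travel time from button to microphone taken as negligible. The test only observes the difference $d$, and if the two error sources are statistically independent, their variances add:

```latex
d = (T + e_r) - (T + e_a) = e_r - e_a, \qquad
\mathbb{E}[d] = \mu_r - \mu_a, \qquad
\operatorname{Var}(d) = \sigma_r^2 + \sigma_a^2
\;\Longrightarrow\; \sigma_a \le \sigma_d, \quad \sigma_r \le \sigma_d .
```

So a measured standard deviation $\sigma_d$ of at most 1 msec bounds the jitter of each component on its own. Only the mean biases $\mu_r$ and $\mu_a$ could mask each other, and that would require $\mu_r \approx \mu_a$ to hold by coincidence for every independently tested device.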

So I believe it works correctly and is very accurate if your sound
hardware/drivers are free of bugs and properly configured, which my
test setups apparently are.

In the end you'll need to repeat such tests on your hardware. It could
be that some device drivers out there are buggy or misconfigured,
which would explain the conflicting evidence. If you have a supported
response box, you can test with the KeyboardLatencyTest script. If you
only have a mouse or keyboard, you can't test that way, as they are
expected to have enormous errors.

Another practical test would be to get a loopback cable that feeds
sound back from the line-out to the line-in, and then write a test
script in the spirit of KeyboardLatencyTest that outputs a test tone,
records it again and compares the sound output onset and sound capture
onset timestamps for consistency.
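The analysis half of such a loopback test could look like the following Python sketch. It assumes, hypothetically, that for each trial you have saved the reported playback onset timestamp, the timestamp of the first captured sample, and the captured samples, all on the same clock. The function names are illustrative, not part of Psychtoolbox, and onset detection here is a plain first-sample-above-threshold rule. If both timestamps are correct, the per-trial difference should ideally be close to zero and, above all, stable across trials.

```python
import statistics
from typing import Iterable, List, Optional, Sequence, Tuple

# One loopback trial: (playback onset timestamp in s, capture start timestamp in s,
# captured mono samples). Both timestamps are assumed to be on the same clock.
Trial = Tuple[float, float, Sequence[float]]


def first_crossing(samples: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first sample whose magnitude exceeds `threshold`, or None."""
    return next((i for i, x in enumerate(samples) if abs(x) > threshold), None)


def loopback_latencies(trials: Iterable[Trial], fs: float,
                       threshold: float = 0.1) -> List[float]:
    """Per-trial (captured onset - reported playback onset) in msec."""
    result = []
    for play_onset, capture_start, samples in trials:
        idx = first_crossing(samples, threshold)
        if idx is None:
            raise ValueError("test tone not found in the capture buffer")
        captured_onset = capture_start + idx / fs  # sample index -> seconds
        result.append((captured_onset - play_onset) * 1000.0)
    return result


def summarize(latencies_ms: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across trials (needs >= 2 trials)."""
    return statistics.mean(latencies_ms), statistics.stdev(latencies_ms)
```

A large but constant mean would point to a fixed, correctable offset; a large standard deviation would indicate unreliable timestamps.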


> 2) Would any record "timing" issues manifest themselves as a distortion
> of time between two points in a saved file, or would they only show
> up as problems with the onset latency of the recording?

I don't understand this question. The timestamps would be wrong.
Maybe you'd also hear audible artifacts and glitches/dropouts/
distortions. Also, the structure returned by
PsychPortAudio('GetStatus', ...) contains a subfield 'xruns', which
would show a non-zero value: if the driver detects overflows or
underruns of its audio buffers, it increments the xruns count. The
driver can't detect all types of malfunctions, though. If xruns is
> zero then something certainly went wrong. If xruns is zero then
either everything went fine, or some malfunction slipped through
undetected.

best,
-mario


>
> Thanks very much-
> Ray
>
> --
> Raymond M. Stanley, Ph.D.
> Postdoctoral Fellow
> Memory And Cognition Lab
> Volen National Center for Complex Systems
> Brandeis University

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner@...
office: +49 (0)7071/601-1623
fax: +49 (0)7071/601-616
www: http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)

I did some testing of full-duplex stimulus delivery and recording to verify that no samples are dropped or added between recordings. My goal is to do post-hoc calculations of reaction time based on voice responses, where I'm interested in relative within-subject differences between conditions delivered in a hearing experiment.

I thought I'd report my results here in case anybody else had similar goals.

I used a Mac Mini 2 GHz Intel with 2 GB RAM, running OS X 10.5.8, with an M-Audio "Mobilepre USB" sound card.

Here was my input/output setup:

microphone --> input 1 of soundcard
headphone output of sound card --> input 2 of sound card ("stimulus patch loop")
line out of sound card --> audiometer --> headphones

I played a sound file with two clicks (1 msec long each), spaced two seconds apart. I placed the headphones on the microphone. I let the first click play, and then, before the second click came, I unplugged the stimulus patch loop cable. This gave me a recording that included the stimulus patch loop and the audiometer + microphone response for the first click, and then only the audiometer + microphone response for the second click. I repeated this 5 times.

The PsychPortAudio('GetStatus') command showed no reports of overruns/underruns.

The most important finding (for my purposes) was that the time between the end of the stimulus presentation and the beginning of the recorded click was consistently about 1999.8 milliseconds, +/- 0.2 milliseconds.
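For anyone who wants to make this kind of click-interval measurement post hoc in a script rather than by inspecting the waveform (this is not necessarily how the numbers above were obtained), here is a minimal Python sketch using the standard `wave` module. It assumes a 16-bit PCM WAV file; the threshold and the "dead time" that keeps one click from being counted twice are arbitrary example values, and the function names are hypothetical.

```python
import struct
import wave
from typing import BinaryIO, List, Tuple, Union


def read_channel(f: Union[str, BinaryIO], channel: int = 0) -> Tuple[List[float], int]:
    """Read one channel of a 16-bit PCM WAV file as floats in [-1, 1)."""
    with wave.open(f, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        nch, fs, n = w.getnchannels(), w.getframerate(), w.getnframes()
        if not 0 <= channel < nch:
            raise ValueError("no such channel")
        raw = w.readframes(n)
    # Samples are interleaved per frame: ch0, ch1, ..., ch0, ch1, ...
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in ints[channel::nch]], fs


def click_onsets(samples: List[float], fs: int, threshold: float = 0.2,
                 dead_time: float = 0.05) -> List[float]:
    """Onset times (s, relative to file start) of clicks: the first sample above
    `threshold`, after which `dead_time` seconds are skipped so a single click
    (and its ringing) is not counted twice."""
    onsets: List[float] = []
    skip_until = 0
    for i, x in enumerate(samples):
        if i >= skip_until and abs(x) > threshold:
            onsets.append(i / fs)
            skip_until = i + int(dead_time * fs)
    return onsets
```

Usage, with a hypothetical file name: `samples, fs = read_channel("recording.wav", channel=0)`, then `onsets = click_onsets(samples, fs)` and `[(b - a) * 1000 for a, b in zip(onsets, onsets[1:])]` gives the inter-click intervals in msec. Because everything is counted in samples of one recording, the resolution is one sample period (about 0.023 msec at 44.1 kHz).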

Here are some other observations. The first click, which included both the stimulus and the response recording, was about 1.7 milliseconds long in total. The difference in amplitude between the stimulus patch loop and the microphone response suggested that 1.2 milliseconds came from the original stimulus patch loop, and the remaining 0.5 milliseconds or so was the lagged audiometer + microphone response to that stimulus. Also, there was a consistent 74 milliseconds of silence at the beginning of the recorded sound file, even though there was no silence at the beginning of the source sound file.

For my purposes, these results suggest that this system will be plenty accurate, and I will move ahead with this method. The only caveat this testing reveals is that there could be a small, constant absolute error due to the lag of the stimulus passing through the audiometer (0.5 milliseconds or less), but this should be constant across the conditions delivered in my experiment.
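Why a constant lag does not matter for within-subject comparisons can be written out in one line. Assuming the lag $c$ (here at most about 0.5 msec) is identical for every trial $i$ in both conditions $A$ and $B$, it adds to each measured reaction time and therefore drops out of the difference of condition means:

```latex
\mathrm{RT}^{\mathrm{meas}}_{i} = \mathrm{RT}^{\mathrm{true}}_{i} + c
\;\Longrightarrow\;
\overline{\mathrm{RT}}^{\mathrm{meas}}_{A} - \overline{\mathrm{RT}}^{\mathrm{meas}}_{B}
= \bigl(\overline{\mathrm{RT}}^{\mathrm{true}}_{A} + c\bigr) - \bigl(\overline{\mathrm{RT}}^{\mathrm{true}}_{B} + c\bigr)
= \overline{\mathrm{RT}}^{\mathrm{true}}_{A} - \overline{\mathrm{RT}}^{\mathrm{true}}_{B} .
```

This only holds if $c$ really is the same across conditions; a lag that varied with, e.g., stimulus level would not cancel.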

-Ray

**********************************************************************
Raymond M. Stanley, Ph.D.
Postdoctoral Fellow
Memory And Cognition Lab
Volen National Center for Complex Systems
Brandeis University
Boston, MA