RESEARCH
We are always recruiting participants for our studies.
If you have no history of speech and/or hearing disorders and
would like to serve as a volunteer, please email Sarah Sullivan.

3-D reconstructions of static male and female vocal tract
shapes during the production of an American English vowel (image
provided by Dr. Story).
Project: Vocal Tract Normalization
We are currently studying listeners’
abilities to understand speech that is produced by a variety of
speakers. Acoustical analyses
reveal large degrees of variation amongst speech produced by different
talkers. For example, the word
‘dog’ produced by two different people will be acoustically very
different. A listener, however,
will likely perceive both productions as ‘dog.’ This perceptual constancy in light of
acoustic variability is referred to as talker normalization. The most commonly cited example of
talker normalization is the perception of speech produced by males and
females. Despite widely
different acoustic signals, resulting largely from differences in vocal
tract size/shape, listeners have little difficulty understanding speech
produced by either sex. Dr. Brad Story, one of our collaborators here in the Speech, Language, and
Hearing Sciences department at UA, has examined variance in vowel
productions using Magnetic Resonance Imaging (MRI) and X-ray microbeam
technology. These images have
shown that many of the differences between talkers result from
differences in their neutral vocal tract shape (the shape of the
airspace when producing a neutral vowel). We will use Dr. Story’s model
to investigate the role neutral vocal tract size/shape plays in talker
normalization.
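As a rough illustration of why vocal tract size matters acoustically, the sketch below treats the neutral vocal tract as an idealized uniform tube, closed at the glottis and open at the lips, whose resonances follow the quarter-wavelength formula F_n = (2n - 1)c / 4L. This is a textbook simplification and not Dr. Story's model; the two tract lengths and the speed-of-sound value are illustrative assumptions only.

```python
# Idealized uniform-tube approximation of a neutral vocal tract.
# Assumption: speed of sound of about 35000 cm/s in warm, moist air.
SPEED_OF_SOUND_CM_PER_S = 35000.0


def tube_formants(length_cm, n_formants=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# Hypothetical tract lengths chosen for illustration, not measured values.
longer_tract = tube_formants(17.5)
shorter_tract = tube_formants(14.5)

for f_long, f_short in zip(longer_tract, shorter_tract):
    print(f"{f_long:7.1f} Hz (17.5 cm)  vs  {f_short:7.1f} Hz (14.5 cm)")
```

Under these assumptions, every resonance of the shorter tube is higher by the same ratio (17.5 / 14.5), so the same neutral vowel yields systematically different acoustics for the two talkers. Real vocal tracts also differ in shape, which this tube model ignores.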
Story and Titze (2002) paper on neutral vocal tracts
Story (2005) paper on vocal tract model

Stimuli were created by manipulating the duration of the silent gap between the “s”
and “a” in the word “say.”
Project: Speech in Noise
We are conducting a series of experiments studying how
different listening environments (e.g., quiet, noisy) affect speech
perception. It has been shown that
the perception of words such as “say” or “stay” can be shifted in a
predictable manner by changing the duration of a silent gap between the
“s” and the vowel in “say” (Best, Morrongiello, and Robson, 1981). A number of similar findings involving
different words, syllables, and complex non-speech sounds are well
documented in the area of speech perception. An individual’s perception, however,
might also be affected by the listening context in which the speech is
presented (e.g., the words are embedded in background noise). Specifically, we predict that many
classic perceptual shift studies may result in different findings if the
same stimuli are presented in a variety of noises (e.g., speech-shaped
noise, white noise, etc.). It is
typical for experimenters to present speech (or complex speech-like sounds)
in quiet. These tasks are generally
easy for normal listeners and fail to accurately model real-life listening
situations. Most real-world speech
perception takes place in non-ideal environments filled with varying
degrees of background noise. The
results will inform us about how listeners integrate acoustic cues to
categorize speech sounds phonemically.
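The two stimulus manipulations described above, varying the silent gap and embedding the result in noise, can be sketched roughly as follows. This is a minimal illustration and not the lab's actual procedure: the placeholder segments, the 16 kHz sample rate, the 0 to 100 ms gap steps, and the 0 dB signal-to-noise ratio are all assumptions, and white Gaussian noise stands in for speech-shaped noise, which would additionally require spectral shaping.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def insert_gap(frication, vowel, gap_ms, sample_rate=SAMPLE_RATE):
    """Return frication + silence + vowel, with gap_ms of silence in between."""
    n_gap = int(round(gap_ms * sample_rate / 1000.0))
    return list(frication) + [0.0] * n_gap + list(vowel)


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise RMS ratio equals snr_db, then add it."""
    noise = noise[:len(speech)]  # match lengths before measuring the noise level
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(speech, noise)]


# Hypothetical placeholder segments standing in for recorded "s" and vowel portions.
rng = random.Random(0)
frication = [rng.uniform(-0.1, 0.1) for _ in range(1600)]  # 100 ms noise burst
vowel = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(3200)]  # 200 ms tone

# Continuum of tokens whose silent gap grows in 20 ms steps (illustrative values).
continuum = {gap_ms: insert_gap(frication, vowel, gap_ms) for gap_ms in range(0, 101, 20)}

# Embed every token in the same white-noise sample at 0 dB SNR.
noise = [rng.gauss(0.0, 1.0) for _ in range(len(continuum[100]))]
noisy = {gap_ms: mix_at_snr(token, noise, snr_db=0.0) for gap_ms, token in continuum.items()}
```

Note that this sketch measures the speech level over the whole token, silent gap included, so tokens with longer gaps have a slightly lower overall RMS. A real experiment would need to decide whether to equate levels over the speech portions only.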
2008 AAS Poster
Best, Morrongiello, and Robson (1981) paper

Ambiguous “say-stay” stimuli were embedded in quiet and speech-shaped noise.

Neural correlates of speech perception
Project: High Density
Electroencephalic Event-related Bandpower as a Biomarker for Disordered
Speech Perception
This collaborative project
between UA (Dr. Lotto’s Auditory Cognitive Science Lab), ASU (Dr. Julie
Liss’ Motor Speech Disorders Lab), and the Mayo Clinic in Scottsdale (Dr.
John Caviness) seeks to merge three strong independent lines of research
with emerging state-of-the-art functional brain mapping technology. Our ultimate goal is to define the
temporal-spatial cortical activation associated with the perception of
intelligible speech. The topic of
the neural correlates of speech processing is of broad interest, with both
contemporary clinical and basic science implications. The recent surge of
papers on the topic includes results of functional neuroimaging studies
that define the anatomical substrates of speech processing. However, the temporal sequence of
activation of these structures, an aspect critical to processing degraded
speech such as that produced in dysarthria, remains to be established. We are in a unique position to tackle
this question because of our constellation of expertise in perceptual
processing of normal and disordered speech and state-of-the-art
electroencephalography (EEG) mapping.
This project will incorporate behavioral data on speech perception
and production with neurophysiologic data to develop and test a model of
how listeners map from auditory to semantic representations. The novel EEG
modeling techniques along with work on speech perception at the phonemic
and sentence level will be the basis for a unique research program with
theoretical and clinical impact.

Speaker Normalization
Project: Talker/Speaker Normalization
Numerous studies have demonstrated that the perception
of a single speech sound, typically a vowel or consonant, can be altered by
the characteristics of surrounding sounds, sentences, or carrier phrases
(Ladefoged and Broadbent, 1957; Mann, 1980; Summerfield, 1981; Lotto and
Kluender, 1998; Holt, 2005). It
has been suggested that this perceptual shift allows listeners to account
for individual speaker variability in speech production. That is, the listener ‘tunes’ or ‘normalizes’
his/her speech sound categories in accordance with the particular
characteristics of the talker. The
underlying assumption is that in order to have robust spoken language
perception, one cannot rely on average categories for individual speech sounds
across speakers. However, in the 50
years of study regarding “talker normalization,” there has been no
demonstration that the perceptual shifts are necessary or even helpful in
the understanding of normal spoken language. We are in the process of testing the
importance of talker normalization for comprehension of spoken language in
a number of ways.
Classic Ladefoged and Broadbent (1957) paper

Top-down lexical influences or general auditory effects?
Project: A General Auditory Explanation for Lexical (i.e., TRACE model) Context Effects
In 1988, Elman and McClelland presented data
suggesting that context effects can be triggered by “illusory phonemes.” In their study, listeners were asked to
participate in a phoneme identification task whereby context words (e.g.,
“foolish” and “Christmas”) were followed by a target sound (an ambiguous
/t/-/k/ or /d/-/g/). Manipulations
were made to the final sound of the context word to create an intermediate
“sh”/“s” sound. The TRACE model was
then used to accurately predict listeners’ phoneme identification shifts,
through the use of top-down lexical influences. However, there may be a simpler
explanation for these findings, one that relies on general auditory
contrast effects like those obtained by Lotto and Kluender (1998). This study tests whether acoustic
characteristics of the context words, as opposed to the linguistic content,
can account for the findings.
Elman & McClelland (1988) paper
Lotto & Kluender (1998) paper