University of Arizona
Department of Speech, Lang., & Hearing Sciences
Lab: Auditory Cognitive Science Lab (ACSL)
Lab Director: Dr. Andrew J. Lotto
[Images: Vocal Tracts; Pink Noise; Spectrogram]
My role:
As a research scientist and lab
manager in Dr. Lotto’s laboratory at U of A, I helped create and
continue to maintain the department’s first Auditory Cognitive Science
Lab. I provide assistance with
study design/development, stimuli creation, subject recruitment, as well as
the collection/entry/organization/analysis/presentation of data. I am also responsible for
creating/managing lab databases, writing research reports, supervising
student researchers, and designing/creating/maintaining webpages. In addition to my normal lab duties,
I am also involved in the development and organization of the Auditory
Cognitive Science Society (ACSS) as well as the coordination of ACSS
meetings.
Project: Vocal Tract Normalization
Currently, we are studying
listeners’ abilities to understand speech that is produced by a
variety of speakers. Acoustical
analyses reveal large degrees of variation amongst speech produced by
different talkers. For example,
the word ‘dog’ produced by two different people will be
acoustically very different. A
listener, however, will likely perceive both productions as
‘dog.’ This
perceptual constancy in light of acoustic variability is referred to as
talker normalization. The most
commonly cited example of talker normalization is the perception of speech
produced by males and females.
Despite widely different acoustic signals, resulting largely from
differences in vocal tract size/shape, listeners have little difficulty
understanding speech produced by either sex. Dr. Story, one of our collaborators here in the
Speech, Language, and Hearing Sciences department at UA, has examined
variance in vowel productions using Magnetic Resonance Imaging (MRI) and
X-ray microbeam technology.
These images have shown that many of the differences between talkers
result from differences in their neutral vocal tract shape (the shape of
the airspace when producing a neutral vowel). We
will use Dr. Story’s model to investigate the role neutral vocal
tract size/shape plays in talker normalization.
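As a rough illustration of why vocal tract size matters acoustically, the sketch below uses the textbook approximation of a neutral vowel as a uniform tube that is closed at the glottis and open at the lips, with resonances at F_n = (2n - 1)c / 4L. This is not Dr. Story's model; the tube lengths and the speed of sound are assumed values chosen only for illustration.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, moist air


def tube_formants(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


# Hypothetical lengths: a 17.5 cm tract and one that is 15% shorter.
long_tract = tube_formants(17.5)
short_tract = tube_formants(17.5 * 0.85)

for name, formants in (("17.5 cm", long_tract), ("15% shorter", short_tract)):
    print(name, [round(f) for f in formants])
```

Under these assumptions, shortening the tube by 15% raises every resonance by the same factor (about 18%), which is one simple way to see how the same vowel can differ acoustically across talkers.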
Story and Titze (2002) paper on neutral vocal tracts
Story (2005) paper on vocal tract model
Project: Speech in Noise
We are conducting a series of experiments studying how different listening
environments (e.g., quiet, noisy) affect speech perception. It has been shown that the
perception of words such as “say” or “stay” can be
shifted in a predictable manner by changing the duration of a silence gap
between the “s” and the vowel in “say” (Best,
Morrongiello, and Robson, 1981).
A number of similar findings involving different words, syllables,
and complex non-speech sounds are well documented in the area of speech
perception. An
individual’s perception, however, might also be affected by the
listening context in which the speech is presented (e.g., when the words are
embedded in background noise).
Specifically, we predict that many classic perceptual shift studies
may result in different findings if the same stimuli are presented in a
variety of noises (e.g., speech-shaped noise, white noise). It is typical for experimenters to
present speech (or complex speech-like sounds) in quiet. These tasks are generally easy for
normal listeners and fail to accurately model real-life listening
situations. Most real-world
speech perception takes place in non-ideal environments filled with varying
degrees of background noise.
The results will inform us about how listeners integrate acoustic
cues to categorize speech sounds phonemically.
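One common way to present the same stimulus in different noise backgrounds is to scale the noise to a target signal-to-noise ratio (SNR) before adding it to the speech. The sketch below shows that arithmetic only; the sine-wave "speech", the Gaussian white noise, the sampling rate, and the 5 dB target are hypothetical stand-ins, not the stimuli or levels used in these experiments.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_noise_to_snr(speech, noise, snr_db):
    """Return a copy of `noise` scaled so that 20*log10(rms(speech)/rms(noise)) == snr_db."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    return [gain * n for n in noise]


# Hypothetical stand-ins: one second of a 220 Hz tone and white noise at 16 kHz.
fs = 16000
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

scaled_noise = scale_noise_to_snr(speech, noise, snr_db=5.0)
mixture = [s + n for s, n in zip(speech, scaled_noise)]
achieved_snr_db = 20 * math.log10(rms(speech) / rms(scaled_noise))
print(f"achieved SNR: {achieved_snr_db:.2f} dB")
```

Swapping in a different noise type (for example, speech-shaped noise) only changes how `noise` is generated; the scaling step stays the same.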
2008 AAS Poster
Best, Morrongiello, and Robson (1981) paper
Project: Talker/Speaker Normalization
Numerous studies have demonstrated that the perception of a single speech sound,
typically a vowel or consonant, can be altered by the characteristics of
surrounding sounds, sentences, or carrier phrases (Ladefoged and Broadbent,
1957; Mann, 1980; Summerfield, 1981; Lotto and Kluender, 1998; Holt,
2005). It has been suggested
that this perceptual shift allows listeners to account for individual
speaker variability in speech production. That is, the listener
‘tunes’ or ‘normalizes’ his/her speech sound
categories in accordance with the particular characteristics of the
talker. The underlying
assumption is that in order to have robust spoken language perception, one
cannot rely on average categories for individual speech sounds across
speakers. However, in the 50
years of study regarding “talker normalization,” there has been
no demonstration that the perceptual shifts are necessary or even helpful in
the understanding of normal spoken language. I am in the process of testing the
importance of talker normalization for comprehension of spoken language in
a number of ways.
Classic Ladefoged and Broadbent (1957) paper
Project: A General Auditory Explanation for Lexical (i.e., TRACE model) Context Effects
In 1988, Elman and McClelland presented data suggesting that context effects
can be triggered by “illusory phonemes.” In their study, listeners
participated in a phoneme identification task in which context words
(e.g., “foolish” and “Christmas”) were followed by
a target sound (an ambiguous /t/-/k/ or /d/-/g/). The final
sound of each context word was manipulated to create an intermediate
“sh”/“s” sound. The TRACE model was then used to
accurately predict listeners’ phoneme identification shifts, through
the use of top-down lexical influences. However, there may be a simpler
explanation for these findings, one that relies on general auditory
contrast effects like those obtained by Lotto and Kluender (1998). This study tests whether acoustic
characteristics of the context words, as opposed to the linguistic content,
can account for the findings.
Elman & McClelland (1988) paper
Lotto & Kluender (1998) paper
University of Texas
Psychology Department
Lab: Auditory Cognition & Speech Perception Lab
Lab Director: Dr. Randy L. Diehl
[Images: Me at UT-Austin; Psych Building; Bayesian Experiments; Thomas Bayes]
My Role:
As a graduate research assistant, I was involved in all aspects of the
following projects, from start to finish. This included literature
reviews/summaries, project design, obtaining IRB approval, stimuli
creation, subject recruitment, data collection, data entry, statistical
analyses, and presentation of the results at various professional
conferences (Acoustical Society of America and Cognitive Science Society)
as well as at departmental talks.
For much of my time as an R.A. in Dr. Diehl’s lab, I also
served as lab manager, in which capacity I was responsible for equipment
maintenance, backing up lab computers and the server, ordering equipment/supplies,
designing/creating/maintaining the lab webpage, interviewing undergraduate
research assistant applicants, and organizing weekly journal discussion
groups.
Project: Applying Bayesian Statistical Models to Auditory Categorization (master’s thesis topic)
A number of auditory tasks, including speech perception, require listeners to
categorize stimuli on the basis of one or more features of the input.
In many cases, especially speech, there is no one-to-one mapping between values
along continuous features and discrete categories (e.g., phonemes).
How then do perceptual systems categorize stimuli under uncertainty?
One possible solution is to use probabilistic information from experienced
stimulus distributions to optimize accuracy.
We propose that perceivers combine distributional knowledge about the
acoustic environment with the information provided by the signal in order
to make optimal (i.e., maximally accurate) categorical decisions.
Statistical approaches such as this are widely used in vision research but
are rarely applied to auditory or speech perception. Our goal in this
study was to develop a framework that will provide testable hypotheses
about the nature of statistical (distributional) learning in auditory perception,
in general, and speech perception, specifically.
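To make the idea of an optimal categorical decision concrete, here is a minimal sketch of Bayes' rule applied to two categories that overlap along a single acoustic cue. The Gaussian category shapes, their means and standard deviations, and the prior values are assumptions chosen for illustration; they are not the distributions or the model from the thesis.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_a(x, cat_a, cat_b, prior_a=0.5):
    """P(category A | cue value x) for two Gaussian categories given as (mean, sd)."""
    joint_a = gaussian_pdf(x, *cat_a) * prior_a
    joint_b = gaussian_pdf(x, *cat_b) * (1.0 - prior_a)
    return joint_a / (joint_a + joint_b)


def categorize(x, cat_a, cat_b, prior_a=0.5):
    """Choose the category with the higher posterior probability."""
    return "A" if posterior_a(x, cat_a, cat_b, prior_a) >= 0.5 else "B"


# Hypothetical experienced distributions along one cue, in arbitrary units.
cat_a = (10.0, 8.0)
cat_b = (50.0, 8.0)

for cue in (20.0, 30.0, 32.0, 40.0):
    print(cue, round(posterior_a(cue, cat_a, cat_b), 3), categorize(cue, cat_a, cat_b))
```

With equal priors and equal variances the category boundary sits midway between the two means. Making category A more frequent (for example, `prior_a=0.8`) moves the boundary toward category B, so an ambiguous cue value such as 32 is then labeled "A". Boundary shifts of this kind are the sort of testable prediction a distributional account provides.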
150th ASA Minneapolis poster
149th ASA Vancouver poster
26th CogSci Chicago poster
146th ASA Austin invited address
Tutorial on Bayesian Approach
Holt & Lotto (2003) Lay Language Paper on Statistical Learning of Speech (and Non-Speech) Categories
Boys Town National Research Hospital
Labs: Speech Perception Lab & Speech Production Lab
Lab Director: Dr. Andrew J. Lotto
[Images: Perception Lab; Production Lab; Child with CI; Tiger]
My role:
As a research technician, I was
responsible for setting up new Speech Production and Speech Perception labs
in the newly constructed Lied
Learning & Technology Center at BTNRH. This involved purchasing equipment,
computers, software, supplies, and furniture, and then transforming
the empty space into usable lab space. I also performed hearing screenings
and equipment calibration, recruited/scheduled research subjects, collected/entered/analyzed/presented
data, and oriented new personnel (research assistant and post-doctoral
student) to the labs.
Project: Effects of Communication Mode and Inflection on CI-user Speech (Funded by NIH-NIDCD)
The goal of this study was to determine whether the acoustics of speech produced by
children with cochlear implants (CIs) could be affected by variables of the
elicitation task. Two variables
were examined: communication mode and inflection of a sign model. Based on previous findings with normal-hearing (NH)
adults (Schiavetti,
Whitehead, Metz, & Moore, 1999; Schiavetti, Whitehead, Metz, Whitehead,
& Mignerey, 1996; Whitehead, Schiavetti, Metz, & Farrell, 1999;
Whitehead, Schiavetti, Whitehead, & Metz, 1995), we
predicted that spoken word, vowel, and voice onset time (VOT) durations would be lengthened
during simultaneous communication (SC) compared to speech alone (SA) and
that these perturbations would be greatest for children with limited speech
skills. These predictions were
not supported by the data.
Communication mode did not significantly impact any of these
temporal measures and there was no interaction with PBK grouping of the
children. We also proposed that
temporal disturbances would be more substantial for signs with multiple
movements (Whitehead,
Schiavetti, Whitehead, & Metz, 1997), but
this was also not borne out in the data.
Project: Strategies used to Increase Speech Clarity by Normal-Hearing Children (Funded by NIH-NIDCD)
Clinicians commonly ask children to produce their “best
speech” during intervention.
However, it is unclear whether children know how to make their speech
clearer. The strategies used by
children with and without hearing loss have implications for maximizing
intelligibility and for understanding the development of communication
competency. As a first step
toward this understanding, children (7 to 12 years of age) with normal
hearing were asked to read ten simple sentences. They were told that they were testing a
new computer program designed to recognize speech. There was, in fact, no recognition
program and each child received the same “output”
feedback. After providing
“normal” speech to allow the program to “get used to
their voices”, they subsequently produced their “best”
and then “very best, very clearest” speech in order to see how
accurate the recognition program could be. Acoustic analyses (intensity,
fundamental and first two formant frequencies for all vowels, as well as
sentence, vowel, and VOT durations) were performed on recorded waveforms
from each repetition in order to determine what the children were varying
to comply with “best speech” instructions. The results demonstrate large
individual differences in strategies and persistent gender
differences.
148th ASA San Diego poster
Project: Tiger Auditory Perception and Vocalizations
While at BTNRH, I collaborated with Edward Walsh, PhD and Joann McGee, PhD on a
study aimed at describing the hearing capabilities of tigers and the acoustic
properties of their calls. Audiograms indicate that the tiger’s
auditory system is highly sensitive to low frequency sounds and possibly
infrasonic sounds. We are in the process of analyzing the acoustic properties
of tiger vocalizations to determine whether production data are consistent
with the perceptual findings. Low frequency sounds travel farther and
would be adaptive for a solitary tiger that needs to maintain hunting
territories and communicate with possible mates. Through this
research, it may be possible to identify individual tigers based on their
unique vocalizations. Tiger identification through acoustic measures
would be useful in the monitoring of tigers in their natural habitat.
Walsh et al. (2003) ASA Lay Language paper
NewScientist.com Tiger article
Washington State University
Psychology Department
Lab: Speech Perception Lab
Lab Director: Dr. Andrew J. Lotto
[Images: Me at WSU; Breathiness; Locus of Context Effects Expt.]
My Role:
I worked as an undergraduate research assistant in Dr. Lotto’s Speech
Perception Lab for three semesters.
Although the level of my involvement in the following projects
varied to some degree, I was generally responsible for literature
reviews/summaries, subject recruitment, data collection, data entry, data
analyses, and data presentation (in the form of a poster presented at the
Acoustical Society of America in 2002). I also participated in regular lab
meetings and journal discussion groups as well as performed general
clerical duties (e.g., made copies, ran errands, picked up articles from
the library).
Project: Auditory Enhancement in Female Speech (Funded by NIH-NIDCD)
One of the more notable differences between male and female speech is that
females have a higher fundamental frequency (f0) than males (largely
because female vocal folds are, on average, shorter and thinner than male vocal
folds; female vocal tracts are also, on average, 15% shorter than male vocal
tracts, which raises formant frequencies). Signals with higher
f0s are represented by fewer harmonics and, therefore, should result in
intelligibility problems. That
is, vowels produced by females (with their high f0) are under-sampled
relative to vowels produced by males.
Basic acoustic manipulations demonstrate this idea: as you raise a
vowel in f0, identification scores drop. It turns out, however, that female
speech is actually more
intelligible than male speech.
This means that females must do something to compensate for their
high f0s.
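The under-sampling argument can be checked with simple arithmetic: harmonics occur at integer multiples of f0, so a higher f0 leaves fewer harmonics to trace out the vowel's spectral envelope within a given bandwidth. In the sketch below, the f0 values (120 Hz and 220 Hz) and the 4000 Hz upper limit are assumed values chosen for illustration, not measurements from this project.

```python
def harmonic_frequencies(f0_hz, max_hz):
    """Frequencies (Hz) of all harmonics of f0 that fall at or below max_hz."""
    count = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


# Hypothetical values: a lower and a higher f0, counted up to 4000 Hz.
low_f0_harmonics = harmonic_frequencies(120, 4000)
high_f0_harmonics = harmonic_frequencies(220, 4000)

print(len(low_f0_harmonics), "harmonics at f0 = 120 Hz")
print(len(high_f0_harmonics), "harmonics at f0 = 220 Hz")
```

With these numbers the higher f0 yields roughly half as many harmonics in the same frequency range, which means a coarser sampling of the formant peaks.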
A basic observation of English-speaking females is that they use breathy
phonation more often than males.
A popular theory explains this as “If a woman can manage to
sound as though she is sexually aroused, she may be regarded as more
desirable or with greater approbation by a male interlocutor than if she
speaks with an ordinary modal voice” (Bladon & Henton,
1985). This explanation is
clearly flawed, however, in that it can’t explain why prepubescent
boys also use breathy speech (certainly they aren’t trying to sound
sexually excited!). Acoustical
analysis of breathy voice reveals an increase in the fundamental component,
an increase in spectral tilt, and an increase in noise at higher
frequencies. It seems plausible
that the noise from breathiness is filtered by the vocal tract, which would
result in a clearer spectral envelope, leading to better
intelligibility. Dr. Andrew
Lotto has hypothesized that females use breathy phonation contrastively in
order to make vowels more distinctive (i.e., ideally, females should make
high tense vowels breathy but not the lax counterparts). Simply put, Dr. Lotto believes that
breathiness is utilized to compensate for the challenges of high f0 and not
a result of social pressure.
Project: Learning Complex Auditory Categories (Funded by NSF & NIH-NIDCD)
Exciting new research has promoted a vital sub-field of speech perception research
concerned with describing the function of categories in the development and
maintenance of language-appropriate perception. Recent work has suggested
that at least part of the formation and function of phonetic categories is
a result of general perceptual categorization mechanisms not specific to
speech or language. Thus, there now appears to be opportunity for an
integration of general categorization research with work on first and
second language acquisition. Unfortunately, much of what is known about
perceptual categorization has been derived from examination of categories
that are fundamentally different from phonetic categories. Moreover, it is
empirically difficult to examine influences of categorization using speech
stimuli because it is extraordinarily difficult to determine a detailed
history of experience. Pilot work has suggested the utility of using
complex non-speech sounds in probing the learning mechanisms that drive
auditory categorization. These sounds can be synthesized to mimic
complexities of phonetic categories and distributions of stimulus
presentation can be theoretically derived to model aspects of phonetic
categories while maintaining full experimental control over experience.
The main goals of this work are threefold. The first goal is to provide a
detailed database of the formation and structure of complex auditory
categories. There is a dearth of research in this area and the proposed
work will be useful in developing a taxonomy of auditory learning and
testing extant models of general perceptual categorization (which have been
based primarily on data from visual tasks). Experiments using explicit and
incidental learning procedures will map the development of categorical
response structures as listeners gain experience with novel stimuli. The
second goal is to compare the resulting structures that arise from these
categorization tasks to structures typical of speech categories such as
categorical perception and the "perceptual magnet" effect. The
third goal is to develop efficient methods of exposure and training to
teach non-native contrasts to second-language learners. Learning the sound
contrasts for a non-native language is an extremely difficult task.
Exposing the mechanisms of complex category learning could illuminate
potential aids to training individuals to discriminate these complex speech
categories. These aids could extend easily to other complex learning tasks
such as musical training, acoustic warning systems or auditory data
displays.
Project: Locus of Context Effects
Previous work has demonstrated that non-speech sounds with the appropriate spectral
characteristics can affect the identification of speech sounds (Lotto &
Kluender, 1998). It has been
proposed that these spectral context effects are due to interactions in the
peripheral auditory system. For
example, they could be the result of masking at the auditory nerve or of
auditory enhancement effects that have been demonstrated to be monaural
(Summerfield & Assmann, 1989).
To examine the locus of the context effect, synthesized syllables
varying from /da/ to /ga/ were preceded by single formant stimuli that
mimicked the third formant of the syllables /al/ and /ar/. The non-speech stimulus was
presented either to the same or opposite ear as the target speech
stimulus. Subjects’
speech identifications were shifted as a function of context in predicted
directions for both presentation conditions. However, the size of the shift was
smaller when the context was in the contralateral ear to the target
syllable. These results agree
well with similar results for speech contexts. The data suggest that the context
effects occur at multiple levels of the auditory system and are not simply
examples of masking or auditory enhancement.
143rd ASA Pittsburgh poster
Project: Auditory Skills Therapy CD-ROM
Recent studies indicate that auditory training is a useful intervention tool,
particularly for those with language impairments and auditory processing
disorder (APD). To address this
urgent clinical need, Dr. Gail Chermak (Washington State University),
Dr. Frank Musiek (University of Connecticut), and Dr. Andrew Lotto (University of Arizona) have begun developing a
CD-ROM targeted toward these special populations. The training CD will contain a
number of basic auditory exercises such as intensity training tasks,
frequency training tasks, and temporal training tasks. The CD is being designed for
use in both clinical and home settings in a kid-friendly format
(every task is embedded into a game format).