SARAH C. SULLIVAN


 

 

 

 research

  

 

University of Arizona

Department of Speech, Language, and Hearing Sciences

Lab: Auditory Cognitive Science Lab (ACSL)

Lab Director: Dr. Andrew J. Lotto

 

[Images: Vocal Tracts; Pink Noise; Spectrogram]

 


 

My role:  As a research scientist and lab manager in Dr. Lotto’s laboratory at U of A, I helped create and continue to maintain the department’s first Auditory Cognitive Science Lab.  I assist with study design/development, stimuli creation, and subject recruitment, as well as the collection/entry/organization/analysis/presentation of data.  I am also responsible for creating/managing lab databases, writing research reports, supervising student researchers, and designing/creating/maintaining webpages.  In addition to my normal lab duties, I am involved in the development and organization of the Auditory Cognitive Science Society (ACSS) as well as the coordination of ACSS meetings.

 

Project: Vocal Tract Normalization

 

Currently, we are studying listeners’ abilities to understand speech that is produced by a variety of speakers.  Acoustical analyses reveal large degrees of variation amongst speech produced by different talkers.  For example, the word ‘dog’ produced by two different people will be acoustically very different.  A listener, however, will likely perceive both productions as ‘dog.’  This perceptual constancy in light of acoustic variability is referred to as talker normalization.  The most commonly cited example of talker normalization is the perception of speech produced by males and females.  Despite widely different acoustic signals, resulting largely from differences in vocal tract size/shape, listeners have little difficulty understanding speech produced by either sex.  Dr. Story, one of our collaborators here in the Speech, Language, and Hearing Sciences department at UA, has examined variance in vowel productions using Magnetic Resonance Imaging (MRI) and X-ray microbeam technology.  These images have shown that many of the differences between talkers result from differences in their neutral vocal tract shape (the shape of the airspace when producing a neutral vowel).  We will use Dr. Story’s model to investigate the role neutral vocal tract size/shape plays in talker normalization. 

 

Story and Titze (2002) paper on neutral vocal tracts

 

Story 2005 paper on vocal tract model

 

Project: Speech in Noise

 

We are conducting a series of experiments studying how different listening environments (e.g., quiet, noisy) affect speech perception.  It has been shown that the perception of words such as “say” or “stay” can be shifted in a predictable manner by changing the duration of a silent gap between the “s” and the vowel in “say” (Best, Morrongiello, and Robson, 1981).  A number of similar findings involving different words, syllables, and complex non-speech sounds are well documented in the area of speech perception.  An individual’s perception, however, might also be affected by the listening context in which the speech is presented (e.g., when the words are embedded in background noise).  Specifically, we predict that many classic perceptual shift studies may yield different findings if the same stimuli are presented in a variety of noises (e.g., speech-shaped noise, white noise).  It is typical for experimenters to present speech (or complex speech-like sounds) in quiet.  These tasks are generally easy for normal listeners and fail to accurately model real-life listening situations.  Most real-world speech perception takes place in non-ideal environments filled with varying degrees of background noise.  The results will inform us about how listeners integrate acoustic cues to categorize speech sounds phonemically.
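As a rough illustration of how a stimulus might be embedded in background noise, the sketch below scales a noise signal so that the mixture has a chosen signal-to-noise ratio (SNR).  The project description does not say how noise levels are set, so the use of SNR, the function names, and the test tone are all assumptions made only for this example.

```python
import math
import random

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, based on RMS amplitudes."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

def mix_at_snr(signal, noise, target_snr_db):
    """Scale the noise to the target SNR and add it to the signal.

    Returns (mixture, scaled_noise). Both inputs must have equal length.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    target_noise_rms = rms(signal) / (10.0 ** (target_snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled

if __name__ == "__main__":
    # Hypothetical stand-ins: a 440 Hz tone for the speech token and
    # Gaussian white noise for the background, both at 16 kHz.
    rate = 16000
    tone = [math.sin(2.0 * math.pi * 440.0 * t / rate) for t in range(rate // 10)]
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(len(tone))]
    mixed, scaled_noise = mix_at_snr(tone, white, 6.0)
    print(round(snr_db(tone, scaled_noise), 3))
```

The same scaling step would apply to any noise type (white, pink, or speech-shaped); only the noise samples passed in would change.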

 

2008 AAS Poster

 

Best, Morrongiello, and Robson (1981) paper

 

Project: Talker/Speaker Normalization

 

Numerous studies have demonstrated that the perception of a single speech sound, typically a vowel or consonant, can be altered by the characteristics of surrounding sounds, sentences, or carrier phrases (Ladefoged and Broadbent, 1957; Mann, 1980; Summerfield, 1981; Lotto and Kluender, 1998; Holt, 2005).  It has been suggested that this perceptual shift allows listeners to account for individual speaker variability in speech production.  That is, the listener ‘tunes’ or ‘normalizes’ his/her speech sound categories in accordance with the particular characteristics of the talker.  The underlying assumption is that in order to have robust spoken language perception, one cannot rely on average categories for individual speech sounds across speakers.  However, in the 50 years of study regarding “talker normalization,” there has been no demonstration that the perceptual shifts are necessary or even helpful in the understanding of normal spoken language.  I am in the process of testing the importance of talker normalization for comprehension of spoken language in a number of ways.

 

Classic Ladefoged and Broadbent (1957) paper

 

Project: A General Auditory Explanation for Lexical (i.e., TRACE model) Context Effects

 

In 1988, Elman and McClelland presented data suggesting that context effects can be triggered by “illusory phonemes.”  In their study, listeners completed a phoneme identification task in which context words (e.g., “foolish” and “Christmas”) were followed by a target sound (an ambiguous /t/-/k/ or /d/-/g/).  The final sound of the context word was manipulated to create an intermediate “sh”/“s” sound.  The TRACE model was then used to accurately predict listeners’ phoneme identification shifts through the use of top-down lexical influences.  However, there may be a simpler explanation for these findings, one that relies on general auditory contrast effects like those obtained by Lotto and Kluender (1998).  This study tests whether acoustic characteristics of the context words, as opposed to the linguistic content, can account for the findings.

 

Elman & McClelland (1988) paper

 

Lotto & Kluender (1998) paper

 


 

University of Texas

Psychology Department

Lab: Auditory Cognition & Speech Perception Lab

Lab Director: Dr. Randy L. Diehl

 

[Images: Me at UT-Austin; Psych Building; Bayesian Experiments; Thomas Bayes]

 


 

My Role: As a graduate research assistant, I was involved in all aspects of the following projects, from start to finish.  This included literature reviews/summaries, project design, obtaining IRB approval, stimuli creation, subject recruitment, data collection, data entry, statistical analyses, and presentation of the results at various professional conferences (Acoustical Society of America and Cognitive Science Society) as well as at departmental talks.  For much of my time as an R.A. in Dr. Diehl’s lab, I also served as lab manager, a role in which I was responsible for equipment maintenance, backing up lab computers and the server, ordering equipment/supplies, designing/creating/maintaining the lab webpage, interviewing undergraduate research assistant applicants, and organizing weekly journal discussion groups.

 

Project: Applying Bayesian Statistical Models to Auditory Categorization (master’s thesis topic)

 

A number of auditory tasks, including speech perception, require listeners to categorize stimuli on the basis of one or more features of the input.  In many cases, especially speech, there is no one-to-one mapping between values along continuous features and discrete categories (e.g., phonemes).  How then do perceptual systems categorize stimuli under uncertainty?  One possible solution is to use probabilistic information from experienced stimulus distributions to optimize accuracy.

 

We propose that perceivers incorporate distributional knowledge about the acoustic environment with the information provided by the signal in order to make optimal (i.e., maximal accuracy) categorical decisions.  Statistical approaches such as this are widely used in vision research but are rarely applied to auditory or speech perception.  Our goal in this study was to develop a framework that will provide testable hypotheses about the nature of statistical (distributional) learning in auditory perception, in general, and speech perception, specifically.
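As a rough illustration of the idea (a minimal sketch, not the model developed in the thesis), suppose two categories each generate values along a single acoustic dimension from a Gaussian distribution.  A listener who knows those distributions maximizes accuracy by choosing the category with the higher posterior probability.  The category labels, means, standard deviations, and priors below are hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x, categories):
    """Return P(category | x) for each category via Bayes' rule.

    categories maps a label to (prior, mean, sd).
    """
    joint = {label: prior * gaussian_pdf(x, mean, sd)
             for label, (prior, mean, sd) in categories.items()}
    total = sum(joint.values())
    return {label: p / total for label, p in joint.items()}

def categorize(x, categories):
    """Pick the label with the highest posterior probability."""
    post = posterior(x, categories)
    return max(post, key=post.get)

# Hypothetical categories along one acoustic dimension (arbitrary units).
CATEGORIES = {
    "A": (0.5, 20.0, 10.0),  # (prior, mean, sd)
    "B": (0.5, 50.0, 10.0),
}

if __name__ == "__main__":
    for value in (10.0, 30.0, 40.0, 60.0):
        print(value, categorize(value, CATEGORIES))
```

With equal priors and equal spreads, the decision boundary falls midway between the two means.  Making one category more probable a priori moves the boundary toward the other category's mean, which is the kind of distribution-dependent shift such a framework can turn into a testable prediction.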

 

150th ASA Minneapolis poster

 

149th ASA Vancouver poster

 

26th CogSci Chicago poster

 

146th ASA Austin invited address

 

Tutorial on Bayesian Approach

 

Holt & Lotto (2003) Lay Language Paper on Statistical Learning of Speech (and Non-Speech) Categories

 


 

Boys Town National Research Hospital

Labs: Speech Perception Lab & Speech Production Lab

Lab Director: Dr. Andrew J. Lotto

 

 

 

[Images: Perception Lab; Production Lab; Child with CI; Tiger]

 


 

My role:  As a research technician, I was responsible for setting up new Speech Production and Speech Perception labs in the newly constructed Lied Learning & Technology Center at BTNRH.  This involved purchasing equipment, computers, software, and supplies, as well as furniture, and then transforming the empty space into usable lab space.  I also performed hearing screenings and equipment calibration, recruited/scheduled research subjects, collected/entered/analyzed/presented data, and oriented new personnel (a research assistant and a post-doctoral student) to the labs.

 

Project: Effects of Communication Mode and Inflection on CI-user Speech (Funded by NIH-NIDCD)

 

The goal of this study was to determine whether the acoustics of speech produced by children with cochlear implants (CIs) could be affected by variables of the elicitation task.  Two variables were examined: communication mode and inflection of a sign model.  Based on previous findings with normal-hearing (NH) adults (Schiavetti, Whitehead, Metz, & Moore, 1999; Schiavetti, Whitehead, Metz, Whitehead, & Mignerey, 1996; Whitehead, Schiavetti, Metz, & Farrell, 1999; Whitehead, Schiavetti, Whitehead, & Metz, 1995), we predicted that spoken word, vowel, and voice onset time (VOT) durations would be lengthened during simultaneous communication (SC) compared to speech alone (SA) and that these perturbations would be greatest for children with limited speech skills.  These predictions were not supported by the data.  Communication mode did not significantly affect any of these temporal measures, and there was no interaction with PBK grouping of the children.  We also proposed that temporal disturbances would be more substantial for signs with multiple movements (Whitehead, Schiavetti, Whitehead, & Metz, 1997), but this was also not borne out in the data.

 

Project: Strategies used to Increase Speech Clarity by Normal-Hearing Children (Funded by NIH-NIDCD)

 

It is common for clinicians to ask children to produce their “best speech” during intervention.  However, it is unclear whether children know how to make their speech clearer.  The strategies used by children with and without hearing loss have implications for maximizing intelligibility and for understanding the development of communication competency.  As a first step toward this understanding, children (7 to 12 years of age) with normal hearing were asked to read ten simple sentences.  They were told that they were testing a new computer program designed to recognize speech.  There was, in fact, no recognition program, and each child received the same “output” feedback.  After providing “normal” speech to allow the program to “get used to their voices”, they produced their “best” and then “very best, very clearest” speech in order to see how accurate the recognition program could be.  Acoustic analyses (intensity, fundamental and first two formant frequencies for all vowels, as well as sentence, vowel, and VOT durations) were performed on recorded waveforms from each repetition in order to determine what the children were varying to comply with “best speech” instructions.  The results demonstrate large individual differences in strategies and persistent gender differences.

 

148th ASA San Diego poster

 

Project: Tiger Auditory Perception and Vocalizations

While at BTNRH, I collaborated with Edward Walsh, PhD and Joann McGee, PhD on a study aimed at describing the hearing capabilities of tigers and the acoustic properties of their calls.  Audiograms indicate that the tiger’s auditory system is highly sensitive to low frequency sounds and possibly infrasonic sounds.  We are in the process of analyzing the acoustic properties of tiger vocalizations to determine whether production data are consistent with the perceptual findings.  Low frequency sounds travel farther and would be adaptive for the solitary tiger who wishes to maintain hunting territories and communicate with possible mates.  Through this research, it may be possible to identify individual tigers based on their unique vocalizations.  Tiger identification through acoustic measures would be useful in the monitoring of tigers in their natural habitat.

 

Walsh et al. (2003) ASA Lay Language paper

 

NewScientist.com Tiger article

 


 

Washington State University

Psychology Department

Lab: Speech Perception Lab

Lab Director: Dr. Andrew J. Lotto

 

 

[Images: Me at WSU; Breathiness; Locus of Context Effects Expt.]

 


 

My Role: I worked as an undergraduate research assistant in Dr. Lotto’s Speech Perception Lab for three semesters.  Although the level of my involvement in the following projects varied to some degree, I was generally responsible for literature reviews/summaries, subject recruitment, data collection, data entry, data analyses, and data presentation (in the form of a poster presented at the Acoustical Society of America in 2002).  I also participated in regular lab meetings and journal discussion groups and performed general clerical duties (e.g., made copies, ran errands, picked up articles from the library).

 

Project: Auditory Enhancement in Female Speech (Funded by NIH-NIDCD)

 

One of the more notable differences between male and female speech is that females have a higher fundamental frequency (f0) than males (largely because female vocal folds are, on average, shorter and thinner than male vocal folds; female vocal tracts are also, on average, 15% shorter than male vocal tracts).  Signals with higher f0s are represented by fewer harmonics and, therefore, should result in intelligibility problems.  That is, vowels produced by females (with their high f0) are under-sampled relative to vowels produced by males.  Basic acoustic manipulations demonstrate this idea: as you raise a vowel in f0, identification scores drop.  It turns out, however, that female speech is actually more intelligible than male speech.  This means that females must do something to compensate for their high f0s.
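The under-sampling argument can be made concrete with a little arithmetic.  Harmonics fall at integer multiples of f0, so the number of harmonics available to outline the spectral envelope below a given frequency is roughly that frequency divided by f0.  The f0 values and the 3000 Hz cutoff in this sketch are hypothetical round numbers chosen only for illustration.

```python
def harmonics_below(f0_hz, cutoff_hz):
    """Count harmonics (integer multiples of f0) at or below cutoff_hz."""
    return int(cutoff_hz // f0_hz)

def harmonic_frequencies(f0_hz, cutoff_hz):
    """List the harmonic frequencies at or below cutoff_hz."""
    count = harmonics_below(f0_hz, cutoff_hz)
    return [f0_hz * k for k in range(1, count + 1)]

if __name__ == "__main__":
    cutoff = 3000  # hypothetical upper edge of the region of interest, in Hz
    for f0 in (100, 200):  # hypothetical lower and higher f0 values, in Hz
        print(f0, harmonics_below(f0, cutoff))
```

Doubling f0 halves the number of harmonics in the same frequency range, so the spectral envelope (and with it the formant peaks that distinguish vowels) is sampled at half as many points.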

 

A basic observation of English-speaking females is that they use breathy phonation more often than males.  A popular theory explains this as follows: “If a woman can manage to sound as though she is sexually aroused, she may be regarded as more desirable or with greater approbation by a male interlocutor than if she speaks with an ordinary modal voice” (Bladon & Henton, 1985).  This explanation is clearly flawed, however, in that it can’t explain why prepubescent boys also use breathy speech (certainly they aren’t trying to sound sexually excited!).  Acoustical analysis of breathy voice reveals an increase in the fundamental component, an increase in spectral tilt, and an increase in noise at higher frequencies.  It seems plausible that the noise from breathiness is filtered by the vocal tract, which would result in a clearer spectral envelope, leading to better intelligibility.  Dr. Andrew Lotto has hypothesized that females use breathy phonation contrastively in order to make vowels more distinctive (i.e., ideally, females should make high tense vowels breathy but not the lax counterparts).  Simply put, Dr. Lotto believes that breathiness is used to compensate for the challenges of high f0 and is not a result of social pressure.

 

Project: Learning Complex Auditory Categories (Funded by NSF & NIH-NIDCD)

Exciting new research has promoted a vital sub-field of speech perception research concerned with describing the function of categories in the development and maintenance of language-appropriate perception. Recent work has suggested that at least part of the formation and function of phonetic categories is a result of general perceptual categorization mechanisms not specific to speech or language. Thus, there now appears to be opportunity for an integration of general categorization research with work on first and second language acquisition. Unfortunately, much of what is known about perceptual categorization has been derived from examination of categories that are fundamentally different from phonetic categories. Moreover, it is empirically difficult to examine influences of categorization using speech stimuli because it is extraordinarily difficult to determine a detailed history of experience. Pilot work has suggested the utility of using complex non-speech sounds in probing the learning mechanisms that drive auditory categorization. These sounds can be synthesized to mimic complexities of phonetic categories and distributions of stimulus presentation can be theoretically derived to model aspects of phonetic categories while maintaining full experimental control over experience.

The main goals of this work are threefold. The first goal is to provide a detailed database of the formation and structure of complex auditory categories. There is a dearth of research in this area and the proposed work will be useful in developing a taxonomy of auditory learning and testing extant models of general perceptual categorization (which have been based primarily on data from visual tasks). Experiments using explicit and incidental learning procedures will map the development of categorical response structures as listeners gain experience with novel stimuli. The second goal is to compare the resulting structures that arise from these categorization tasks to structures typical of speech categories such as categorical perception and the "perceptual magnet" effect. The third goal is to develop efficient methods of exposure and training to teach non-native contrasts to second-language learners. Learning the sound contrasts for a non-native language is an extremely difficult task. Exposing the mechanisms of complex category learning could illuminate potential aids to training individuals to discriminate these complex speech categories. These aids could extend easily to other complex learning tasks such as musical training, acoustic warning systems or auditory data displays.

Project: Locus of Context Effects

 

Previous work has demonstrated that non-speech sounds with the appropriate spectral characteristics can affect the identification of speech sounds (Lotto & Kluender, 1998).  It has been proposed that these spectral context effects are due to interactions in the peripheral auditory system.  For example, they could be the result of masking at the auditory nerve or of auditory enhancement effects that have been demonstrated to be monaural (Summerfield & Assmann, 1989).  To examine the locus of the context effect, synthesized syllables varying from /da/ to /ga/ were preceded by single-formant stimuli that mimicked the third formant of the syllables /al/ and /ar/.  The non-speech stimulus was presented either to the same ear as the target speech stimulus or to the opposite ear.  Subjects’ speech identifications were shifted as a function of context in the predicted directions for both presentation conditions.  However, the size of the shift was smaller when the context was presented to the ear contralateral to the target syllable.  These results agree well with similar results for speech contexts.  The data suggest that the context effects occur at multiple levels of the auditory system and are not simply examples of masking or auditory enhancement.

 

143rd ASA Pittsburgh poster

 

Project: Auditory Skills Therapy CD-ROM

 

Recent studies indicate that auditory training is a useful intervention tool, particularly for those with language impairments and auditory processing disorder (APD).  Because of this urgent clinical need, Dr. Gail Chermak (Washington State University), Dr. Frank Musiek (University of Connecticut), and Dr. Andrew Lotto (University of Arizona) have begun developing a CD-ROM targeted towards these special populations.  The training CD will contain a number of basic auditory exercises, such as intensity training tasks, frequency training tasks, and temporal training tasks.  The CD is being designed for use in both clinical and home settings in a kid-friendly format (every task is embedded in a game).