Daniel Duran

Dissertation

Daniel Duran: Computer simulation experiments in phonetics and phonology: simulation technology in linguistic research on human speech. Dissertation, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart. 2013.
URL: http://dx.doi.org/10.18419/opus-3202

[Abstract] [BibTeX]

One goal of this thesis is to give a representative overview of computer simulation experiments in phonetics and phonology. A number of research disciplines concerned with human speech perception and production are identified as relevant to the subject of this thesis, in particular computational psychology and the cognitive sciences. Important methods and research approaches using simulation technology for the study of human speech can also be found in natural language processing and speech signal processing, as well as in areas of computer science such as artificial intelligence and machine learning. Due to this interdisciplinary breadth of the topic, this thesis comprises a comprehensive overview of representative publications.

This dissertation offers a comprehensive overview of a promising approach to the study of spoken language, and is as such the first of its kind in phonetics and phonology. Computer simulation experiments can not only complement laborious empirical studies, and to some extent replace them; simulation technology also offers the potential to study theoretical models and the interactions of their components directly. This allows for testing existing hypotheses and generating new ones.

@PHDTHESIS{Duran2013, author = {Duran, Daniel}, title = {Computer simulation experiments in phonetics and phonology: simulation technology in linguistic research on human speech}, school = {Universit{\"a}t Stuttgart}, year = {2013}, type = {Doctoral dissertation}, url = {http://elib.uni-stuttgart.de/opus/volltexte/2013/8789} }

Papers and presentations

2016

Daniel Duran, Natalie Lewandowski and Antje Schweitzer: A 3D computer game for testing perception of acoustic detail in speech. 22nd International Congress on Acoustics (ICA) in Buenos Aires, Argentina.

[Abstract] [BibTeX] [PDF]

We present a novel experimental framework for perception studies, with an application focusing on attention to fine phonetic detail in natural speech perception. Traditional psychological experiments in research on speech perception do not provide a natural testing scenario, being notorious for supervision and lack of naturalness. A solution to this problem is to employ a computer game in which attention to fine phonetic detail comes naturally. Computer games are increasingly used in psychology and in studying emotional speech production, where the communication in multi-player games is recorded. Our novel framework implements a traditional psycholinguistic AB test paradigm within a computer game. Using a state-of-the-art game engine, we developed a first-person shooter. This genre is ideally suited to a test scenario which requires the subjects to click on a specific point on the screen as fast as possible. The player moves around within a virtual 3D environment and reacts to stimuli presented by enemies which belong to two different categories, each of which is associated with one response key. The two categories are initially distinguished by visual and acoustic cues (e.g. different colors and different sounds). Gradually, the visual cues are removed, so that the subject has to attend to the acoustic cues and react accordingly. A further important aspect of our framework is the subjects' high involvement in the game and their motivation to solve the task; in traditional psychological experiments, on the other hand, subjects may easily get tired or bored by the repetitive, unnatural task. We discuss practical and theoretical challenges encountered in implementing a psychological test within a computer game.

@InProceedings{Duran++2016ica, Title = {A 3D computer game for testing perception of acoustic detail in speech}, Author = {Duran, Daniel and Lewandowski, Natalie and Schweitzer, Antje}, Booktitle = {22nd {International} {Congress} on {Acoustics} {ICA} 2016: {Proceedings}}, Year = {2016}, Address = {Buenos Aires, Argentina}, Month = sep, Note = {Paper ICA2016-738}, Pages = {Paper ICA2016--738}, Publisher = {Asociación de Acústicos Argentinos, AdAA}, Url = {http://www.ica2016.org.ar/ica2016proceedings/ica2016/ICA2016-0738.pdf} }

Natalie Lewandowski, Carolin Krämer, Daniel Duran and Antje Schweitzer: Impact of personality and social factors on phonetic convergence. Presentation at: 22nd AMLaP conference, Architectures and Mechanisms for Language Processing in Bilbao, Spain.

[Abstract] [BibTeX]

The current study investigated the impact of personality and social factors on phonetic convergence within the GECO database (46 spontaneous German dialogs of approx. 25 minutes length each). We quantified convergence by amplitude envelope similarities between identical words that were said multiple times by both dialog partners. Linear mixed models were employed to predict the envelope similarities using four personality dimensions from a self-monitoring test as fixed effects: sensitivity to expressive behavior and social cues (henceforth sensitivity), acting behavior, other-directedness and extraversion; further fixed effects were time (early vs. late in the dialog) and post-dialog self-ratings (self-perceived dominance and self-confidence). Word type was included as a random factor. Main effects were found for dominance, acting, and extraversion, as well as two-way interactions of time with dominance, acting, and sensitivity, respectively. The more extroverted both speakers were, the higher in general was the amplitude envelope similarity in the dialog. A large difference in dominance between the partners increased similarity over time (i.e. convergence), whereas higher acting and sensitivity scores decreased envelope similarity over time (i.e. divergence), albeit to a much smaller extent. The results support the idea that personality plays an important role in phonetic adaptation during conversational interactions.
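
As an illustration only (the paper's exact similarity measure is not reproduced here), amplitude envelope similarity between two tokens of the same word can be sketched as the correlation of their smoothed, length-normalized envelopes. The signals, window size, and sampling below are hypothetical:

```python
import numpy as np

def amplitude_envelope(signal, win=400):
    """Rectify the signal and smooth it with a moving average
    to obtain a coarse amplitude envelope."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")

def envelope_similarity(a, b, n_points=100):
    """Correlate two envelopes after resampling both word tokens
    to a common length (tokens differ in duration)."""
    env_a, env_b = amplitude_envelope(a), amplitude_envelope(b)
    grid = np.linspace(0, 1, n_points)
    ra = np.interp(grid, np.linspace(0, 1, len(env_a)), env_a)
    rb = np.interp(grid, np.linspace(0, 1, len(env_b)), env_b)
    return np.corrcoef(ra, rb)[0, 1]

# Two hypothetical tokens of the "same word": identical carrier,
# slightly different durations
t1 = np.linspace(0, 1, 8000)
t2 = np.linspace(0, 1, 7000)
token_a = np.sin(2 * np.pi * 120 * t1) * np.hanning(len(t1))
token_b = np.sin(2 * np.pi * 120 * t2) * np.hanning(len(t2))
print(envelope_similarity(token_a, token_b))  # near 1 for similar envelopes
```

A value near 1 would indicate highly similar envelopes; tracking this value between early and late portions of a dialog is one way to operationalize convergence over time.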

@misc{Lewandowski++2016amlap, Title = {Impact of personality and social factors on phonetic convergence}, Author = {Lewandowski, Natalie and Krämer, Carolin and Duran, Daniel and Schweitzer, Antje}, HowPublished = {Presentation at: Architectures and Mechanisms for Language Processing {(AMLAP)}}, Month = sep, Year = {2016}, Address = {Bizkaia Aretoa, Bilbao, Spain}, Url = {http://www.bcbl.eu/events/amlap2016/en/} }

Antje Schweitzer, Natalie Lewandowski and Daniel Duran: Does phonetic convergence reflect personality? Presentation at: Personality in speech perception & production (LabPhon15 Satellite Workshop).

[Abstract] [BibTeX] [PDF]

Phonetic convergence, or imitation, a phenomenon by which two speakers become increasingly similar in their phonetic behavior, has received considerable attention since the 1960s and 1970s. In recent years it has been suggested that imitation/convergence is a driving force behind sound change. Our contribution to the workshop assesses the role of personality in phonetic convergence, and thus, potentially, in sound change from one individual to the next.

@misc{Schweitzer++2016, Title = {Does phonetic convergence reflect personality?}, Author = {Schweitzer, Antje and Lewandowski, Natalie and Duran, Daniel}, HowPublished = {Presentation at Personality in speech perception \& production (LabPhon15 Satellite Workshop)}, Month = jul, Year = {2016}, Address = {Ithaca, NY}, Keywords = {convergence, GECO}, Type = {Presentation}, Url = {http://www.phonetik.uni-muenchen.de/institut/veranstaltungen/labphon15-satellite-personality/abstracts/Schweitzer%20et%20al.pdf} }

Grzegorz Dogil, Jagoda Bruni, Daniel Duran, Justus Roux and Andries Coetzee: Social dynamics and phonological strength: Post-nasal devoicing in Tswana. Presentation at: LabPhon15: Speech Dynamics and Phonological Representation at Cornell University, Ithaca, NY USA. 2016.

[Abstract] [BibTeX] [PDF]

This study describes the influence of social and political changes within the South African phylum. The socio-political situation in South Africa has changed from the very restrictive language policies that characterized the political system of the country for several generations to the very liberal language policies introduced at the outset of South Africa as a "rainbow nation" in 1994.

@inproceedings{Dogil++2016, address = {Cornell University, Ithaca, NY USA}, title = {Social dynamics and phonological strength: {Post}-nasal devoicing in {Tswana}}, url = {http://www.labphon.org/labphon15/long_abstracts/LabPhon15_Revised_abstract_125.pdf}, author = {Dogil, Grzegorz and Bruni, Jagoda and Duran, Daniel and Roux, Justus and Coetzee, Andries W.}, booktitle = {LabPhon15: Speech Dynamics and Phonological Representation}, month = jul, year = {2016}, pages = {Abstract 125} }

Daniel Duran, Jagoda Bruni, Justus Roux and Grzegorz Dogil: Simulation as a mean to investigate phonological evolution of Tswana. Presentation at: 6th International Conference on Bantu Languages (Bantu 6) at the University of Helsinki in Finland. 2016.

[Abstract] [BibTeX]

This study presents an attempt to investigate the historical development of devoicing processes in Tswana. We use computational simulations as a means to demonstrate processes of post-nasal stop devoicing over generations and across different social statuses in the society of speakers. According to the literature (Meinhof, 1932; Hyman, 2001), post-nasal stops in the Sotho-Tswana group show unintuitive devoicing behavior, unintuitive in the sense that during production greater articulatory effort is required to terminate voicing than to maintain it. It is claimed (Meinhof, 1932) that nasals preceding stops appeared in Bantu languages in order to facilitate voicing production during the stop and were lost later during evolutionary language changes, as in Swahili. However, current acoustic studies on Tswana (Coetzee and Pretorius, 2010) demonstrate not only that nasals remained in that language but also that they occur before voiceless stops as well as before voiced ones.

Coetzee, A. W., & Pretorius, R. (2010). Phonetically grounded phonology and sound change: The case of Tswana labial plosives. Journal of Phonetics, 38(3), 404–421.

Hyman, L. M. (2001). The limits of phonetic determinism in phonology: *NC revisited. In E. Hume & K. Johnson (Eds.), The role of speech perception in phonology (pp. 141–185). New York: Academic Press.

Meinhof, C. (1932). Introduction to the phonology of the Bantu languages: being the English version of “Grundriss einer Lautlehre der Bantusprachen.” (N. J. Van Warmelo, Trans.). Berlin; London: Dietrich Reimer (Ernst Vohsen); Williams & Norgate, Ltd.

@misc{Duran++2016bantu, Title = {Simulation as a mean to investigate phonological evolution of {Tswana}}, Author = {Bruni, Jagoda and Duran, Daniel and Dogil, Grzegorz and Roux, Justus}, HowPublished = {Presentation at the 6th International Conference on Bantu Languages {(BANTU 6)}}, Month = jun, Year = {2016}, Address = {University of Helsinki, Finland}, Type = {Presentation}, Url = {http://blogs.helsinki.fi/bantu-6/} }

Jagoda Bruni, Daniel Duran, Grzegorz Dogil: Usage-based phonology and simulations as means to investigate unintuitive voicing behavior. In: Proceedings of the Annual Meetings on Phonology, 2. 2016.

[Abstract] [BibTeX] [www]

We present an attempt at using computational simulations of the voicing behavior of Tswana post-nasal stops. Previous approaches to phonological simulations (e.g. Boersma & Hamann, 2008) put a strong emphasis on the functional bias and its role in language change. We base our investigations on the assumption that social biases might play an even greater role in the formation and change of phonologically and phonetically driven sociolinguistic processes (Nettle, 1999; Coetzee & Pretorius, 2010).

Boersma, P., & Hamann, S. (2008). The evolution of auditory dispersion in bidirectional constraint grammars. Phonology, 25, 217–270.

Coetzee, A. W., & Pretorius, R. (2010). Phonetically grounded phonology and sound change: The case of Tswana labial plosives. Journal of Phonetics, 38(3), 404–421.

Nettle, D. (1999). Using Social Impact Theory to simulate language change. Lingua, 108(2–3), 95–117.

@article{bruni++2016amp, Title = {Usage-based phonology and simulations as means to investigate unintuitive voicing behavior}, Author = {Bruni, Jagoda and Duran, Daniel and Dogil, Grzegorz}, Journal = {Proceedings of the Annual Meetings on Phonology}, Year = {2016}, Month = jun, Volume = {2}, Doi = {10.3765/amp.v2i0.3746}, ISSN = {2377-3324}, Url = {http://journals.linguisticsociety.org/proceedings/index.php/amphonology/article/view/3746}, Urldate = {2016-07-08} }

2015

Daniel Duran: In-silico Phonetik. Presented at: 11. Tagung Phonetik und Phonologie im deutschsprachigen Raum (P&P 11), Philipps-Universität Marburg, Germany, October 8, 2015.

[Abstract] [BibTeX] [PDF]

A presentation of the history of computational modelling and simulations in the fields of phonetics and phonology.

@misc{duran2015pup, address = {Marburg (Lahn)}, type = {Poster presentation}, title = {In-silico {Phonetik}}, url = {http://www.online.uni-marburg.de/pundp11/poster/Duran.pdf}, author = {Duran, Daniel}, month = oct, year = {2015} }

Daniel Duran: Perceptual magnets in different neighborhoods. In: A. Leemann, M.-J. Kolly, S. Schmid and V. Dellwo (eds.): "Trends in Phonetics and Phonology: Studies from German-speaking Europe", Peter Lang, 2015, pp. 225–237. ISBN 978-3-0343-1653-8

[Abstract] [BibTeX]

This paper discusses the implementation of a possible exemplar-theoretic model of the perceptual magnet effect – the apparent warping of the perceptual phonetic space due to which two distinct speech sounds appear to be perceived as more similar if they are close to the prototypical representation of a given category. One prominent computational model of the perceptual magnet effect is examined, some of its weaknesses are pointed out, and an alternative model is proposed. In particular, the consequences of a strict interpretation of exemplars as discrete elements in a set are examined.

@incollection{duran2015, address = {Frankfurt am Main / Bern}, title = {Perceptual magnets in different neighborhoods}, isbn = {978-3-0343-1653-8}, booktitle = {Trends in {Phonetics} and {Phonology}: {Studies} from {German}-speaking {Europe}}, publisher = {Peter Lang}, author = {Duran, Daniel}, editor = {Leemann, A. and Kolly, M.-J. and Schmid, S. and Dellwo, V.}, year = {2015}, pages = {225--237} }

Jagoda Bruni, Daniel Duran, Grzegorz Dogil: Unintuitive phonetic behavior in Tswana post-nasal stops. In: Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech): 1725–1729.

[Abstract] [BibTeX] [www]

This article describes the phonetic process of post-nasal devoicing in Tswana. We propose a multi-agent exemplar model with various interaction schemes which include factors like functional and social biases in order to account for this counter-intuitive phenomenon. Our novel hybrid multi-agent modeling framework facilitates investigation of sound change by combining the sociophonetic model of Nettle [22] and the exemplar-based model of Wedel [26] into a single unified model.

[22] Nettle, Daniel (1999). Using social impact theory to simulate language change. Lingua 108:2-3, 95–117.

[26] Wedel, Andrew (2004). Category competition drives contrast maintenance within an exemplar-based production/perception loop. Proceedings of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology, Association for Computational Linguistics, Barcelona, Spain, 1–10.

@inproceedings{bruni++2015is, address = {Dresden}, title = {Unintuitive Phonetic Behavior in {Tswana} Post-Nasal Stops}, url = {http://www.isca-speech.org/archive/interspeech_2015/i15_1725.html}, booktitle = {Proceedings of the 16th Annual Conference of the International Speech Communication Association ({Interspeech})}, publisher = {ISCA Archive}, author = {Bruni, Jagoda and Duran, Daniel and Dogil, Grzegorz}, year = {2015}, pages = {1725--1729} }

Lisa Lange, Bartholomäus Pfeiffer, Daniel Duran: ABIMS — Auditory Bewildered Interaction Measurement System. In: Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech): 1074-1075

[Abstract] [BibTeX] [www]

We present a novel computer game-based framework for phonetic perception experiments. We employ the game environment of a first-person shooter game where players interact with and respond to animated creatures in different virtual environments. Whenever the player encounters such an animated creature, a sound stimulus is played and the creature changes its color depending on the type of stimulus. Throughout the game, colors are getting harder to distinguish until the two creature types (corresponding to two stimulus types) are virtually identical in color. Data can be collected in various ways and without the need for specific additional tests. A first pilot study was conducted in which the game was well received by the subjects and which already highlights the potential of this investigative framework.

@inproceedings{lange_abims_2015, address = {Dresden}, title = {{ABIMS} – Auditory Bewildered Interaction Measurement System}, url = {http://www.isca-speech.org/archive/interspeech_2015/i15_1074.html}, booktitle = {Proceedings of the 16th Annual Conference of the International Speech Communication Association ({Interspeech})}, publisher = {ISCA Archive}, author = {Lange, Lisa and Pfeiffer, Bartholomäus and Duran, Daniel}, year = {2015}, pages = {1074--1075} }

Daniel Duran, Jagoda Bruni, Michael Walsh and Grzegorz Dogil: A Hybrid Model to Investigate Language Change. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: the University of Glasgow. ISBN 978-0-85261-941-4. Paper number 735.

[Abstract] [BibTeX]

In this paper we propose a hybrid multi-agent modelling framework which facilitates investigation into sound change by combining the socio-phonetic model of Nettle [14] and the exemplar-based model of Wedel [21] into a single unified model. The framework facilitates simulation scenarios of different social networks with varying interaction schemes and social distances between the agents. Additionally, the structure of an individual agent's mental lexicon is embedded in an exemplar-theoretic setting. The goal of this modelling framework is to enable examination of competition between different phonetic forms. Though in no way limited to this particular example, we illustrate our new hybrid framework with the case of competition between phonetically intuitive /mb/ and unintuitive [mp] voicing variants of post-nasal stops in Tswana.

@inproceedings{duran++2015icphs, address = {Glasgow, UK}, title = {A {Hybrid} {Model} to {Investigate} {Language} {Change}}, isbn = {978-0-85261-941-4}, url = {http://www.icphs2015.info/pdfs/Papers/ICPHS0735.pdf}, booktitle = {Proceedings of the 18th {International} {Congress} of {Phonetic} {Sciences}}, author = {Duran, Daniel and Bruni, Jagoda and Walsh, Michael and Dogil, Grzegorz}, editor = {{The Scottish Consortium for ICPhS 2015}}, year = {2015}, note = {Paper number 735} }

2014

Jagoda Bruni, Daniel Duran and Grzegorz Dogil: Usage-based phonology and simulations as means to investigate unintuitive voicing behavior. Presentation at: Annual Meeting on Phonology 2014 at MIT. Cambridge, MA.

[Abstract] [BibTeX]

This paper describes preliminary work concerning phonetic simulations of unintuitive voicing behavior in Tswana. We adapt and combine the methods proposed by [Nettle,1999] and [Wedel,2004] by modeling competition between variants undergoing functional and social selection during language acquisition over many generations. With our simulation experiments we investigate the influence of various parameters and compare the results against the reported “unintuitive” voicing behavior in Tswana and its diachronic development.

@Misc{bruni++2014, title = {Usage-based phonology and simulations as means to investigate unintuitive voicing behavior}, author = {Bruni, Jagoda and Duran, Daniel and Dogil, Grzegorz}, howpublished = {Presentation}, month = sep, year = {2014}, url = {http://phonology2014.mit.edu/wp-content/uploads/2014/09/Bruni_et_al_abstract.pdf} }

Natalie Lewandowski, Antje Schweitzer, Daniel Duran and Grzegorz Dogil: An exemplar-based hybrid model of phonetic adaptation. Presentation at GURT 2014 – Usage-based Approaches to Language, Language Learning, and Multilingualism. Georgetown University, Washington D.C.

[Abstract] [BibTeX]

We present a hybrid model of phonetic convergence in a usage-based framework. A speaker’s phonetic and cognitive abilities (talent and mental flexibility), together with certain personality features are suggested to impact the core component of the adaptation mechanism which is involved in the storage and re-usage of richly indexed exemplars.

@MISC{lewandowski++2014, author = {Lewandowski, Natalie and Schweitzer, Antje and Duran, Daniel and Dogil, Grzegorz}, title = {An exemplar-based hybrid model of phonetic adaptation}, month = mar, year = {2014}, address = {Georgetown University, Washington {D.C.}}, type = {Presentation}, url = {http://www8.georgetown.edu/college/gurt/2014/} }

2013

Mihael Duran and Daniel Duran: Vorstandswechsel und Medieneinfluss: Eine Sentiment-Analyse. Contribution to: 11. Jahrestagung des Arbeitskreises Empirische Personal- und Organisationsforschung (AKempor); Munich.

[BibTeX]

@MISC{duran+duran2013, author = {Duran, Mihael and Duran, Daniel}, title = {Vorstandswechsel und Medieneinfluss: Eine {Sentiment-Analyse}}, month = nov, year = {2013}, address = {Ludwig-Maximilians-Universit{\"a}t, M{\"u}nchen}, url = {http://www.hrmresearch.de/akempor/pmwiki.php/Site/ElfteTagung} }

Michael Walsh, Daniel Duran and Jagoda Bruni: Exemplar-based categorisation of vowels using acoustic and articulatory data. In: Phonetik & Phonologie 9 — Book of Abstracts, p.96, 2013, Zurich.

[Abstract] [BibTeX] [PDF (Book of Abstracts)]

Exemplar models of speech perception have been shown to be suitable for modelling a variety of phenomena, such as vowel identification, frequency of occurrence effects, sex identification, the emergence of grammatical knowledge, syllable duration variability, entrenchment and lenition, among others. One of the best known exemplar-based models of categorisation, Nosofsky’s Generalized Context Model (GCM), which employs weighted Euclidean distance to compare incoming percepts to exemplars stored in memory, has been applied to a number of these areas. However, with regard to vowel categorisation, the GCM has, thus far, only been applied to highly controlled cases, e.g. hid, had, hod etc.

Here, we present the results of a number of experiments in which the GCM is applied to acoustic data (represented in terms of amplitude envelopes over 8 frequency bands), and to a combination of acoustic and articulatory data, from the MOCHA (Multi-CHannel Articulatory) database. This corpus contains phonetically annotated recordings of Southern English speech of two native speakers, one male, one female.
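The GCM decision rule mentioned above can be sketched as follows. The exponential decay of similarity with weighted distance and the choice rule over summed similarities follow the standard GCM formulation; the feature dimensions, weights, sensitivity parameter, and toy exemplar memory are hypothetical:

```python
import math

def gcm_category_probs(stimulus, exemplars, weights, c=1.0):
    """Generalized Context Model sketch: similarity to each stored
    exemplar decays exponentially with weighted Euclidean distance;
    category probabilities follow the choice rule over summed
    per-category similarities."""
    sums = {}
    for features, category in exemplars:
        d = math.sqrt(sum(w * (x - y) ** 2
                          for w, x, y in zip(weights, stimulus, features)))
        sums[category] = sums.get(category, 0.0) + math.exp(-c * d)
    total = sum(sums.values())
    return {cat: s / total for cat, s in sums.items()}

# Toy exemplar memory: two vowel categories in a 2-D (F1, F2)-like space
memory = [((1.0, 1.0), "i"), ((1.2, 0.9), "i"),
          ((3.0, 3.0), "a"), ((2.8, 3.2), "a")]
probs = gcm_category_probs((1.1, 1.0), memory, weights=(1.0, 1.0), c=2.0)
print(probs)  # the "i" category should clearly win
```

Extending this from controlled cases to acoustic and articulatory data amounts to widening the feature vectors and adjusting the dimension weights.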

@INPROCEEDINGS{walsh++2013, author = {Walsh, Michael and Duran, Daniel and Bruni, Jagoda}, title = {Exemplar-based categorisation of vowels using acoustic and articulatory data}, booktitle = {Book of Abstracts {Phonetik \& Phonologie 9}}, year = {2013}, pages = {96}, address = {Z{\"u}rich}, month = oct, }

Daniel Duran: The strength of perceptual magnets determined by different neighbourhoods. In: Phonetik & Phonologie 9 — Book of Abstracts, p.29, 2013, Zurich.

[Abstract] [BibTeX] [PDF (Book of Abstracts)]

A considerable body of research has been devoted to the perceptual magnet effect (PME). One prominent model of the PME was proposed by Lacerda [1995]*. His model describes the PME as an emergent property of exemplar-based categorization of perceived stimuli. Newly encountered stimuli are categorized based on per-category similarities within a local neighbourhood around the input stimulus in the respective perceptual space. Thus, the PME arises from the distribution of categorical information of the exemplars stored in memory. That distribution, however, is modelled as being continuous, which is a strong assumption.

This study investigates the consequences of a strict interpretation of exemplars as discrete elements in a set. This assumption introduces considerable perturbations in comparison to Lacerda’s original model. It is shown how the size and shape of the neighbourhood determine the behaviour of the model, leading to very different categorization effects. Essentially, it is not only the process of locally determining the similarity to competing categories which gives rise to a language-specific, “warped” perceptual space. The simulations show how the PME depends on both the distributions of exemplars and the shape of the neighbourhood.

* Francisco Lacerda. The perceptual-magnet effect: An emergent consequence of exemplar-based phonetic memory. In K. Elenius and P. Branderud, editors, Proceedings of the 13th International Congress of Phonetic Sciences, volume 2, pages 140–147, Stockholm, 1995.
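A minimal sketch of the "discrete exemplars" reading discussed above: only exemplars falling inside a fixed neighbourhood around the stimulus contribute to a category's pull, so the neighbourhood's size directly changes the categorization outcome. The 1-D space, radius values, and exemplar positions are hypothetical:

```python
def neighborhood_category_strengths(stimulus, exemplars, radius):
    """Count, per category, the exemplars that fall inside a fixed
    neighbourhood around the stimulus; with discrete exemplars the
    category strengths jump as the radius changes."""
    counts = {}
    for value, category in exemplars:
        if abs(value - stimulus) <= radius:
            counts[category] = counts.get(category, 0) + 1
    return counts

# 1-D toy space: category A densely clustered (prototype-like) around 1.0,
# category B sparse around 3.0
memory = [(1.0, "A"), (1.05, "A"), (0.95, "A"), (1.1, "A"),
          (2.9, "B"), (3.1, "B")]
print(neighborhood_category_strengths(1.5, memory, radius=0.6))
print(neighborhood_category_strengths(1.5, memory, radius=2.0))
```

With the small radius only category A exerts any pull on the stimulus; enlarging the neighbourhood brings category B into play, illustrating how neighbourhood shape alone can alter the apparent warping of the perceptual space.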

@INPROCEEDINGS{duran2013pp, author = {Duran, Daniel}, title = {The strength of perceptual magnets determined by different neighbourhoods}, booktitle = {Book of Abstracts {Phonetik \& Phonologie 9}}, year = {2013}, pages = {29}, address = {Z{\"u}rich}, month = oct, }

Tomasz Kuczmarski, Daniel Duran, Norbert Kordek and Jagoda Bruni: Magnet Effect in the perception of Mandarin Chinese T1 and T3. In: Book of Abstracts of the 8th Conference of the European Association of Chinese Linguistics (EACL 8), 2013, p.19, Paris.

[Abstract] [BibTeX] [PDF (Book of Abstracts)]

This work is an attempt to study the Magnet Effect in the perception of Mandarin Chinese T1 and T3 by systematic manipulation of a synthetic F0 contour. For this purpose, a simple method of F0 contour approximation using second-order polynomial functions was proposed and evaluated in a preliminary study. A syllable consisting of a single vowel /a/ was selected from a citation tones speech corpus. The natural F0 and duration of T1 and T3 realized within that syllable, as occurring in the speech corpus, set the lower and upper bounds for a discrete space of anchor points. The natural constraints of the human articulatory and auditory systems were used to determine the total number of anchor points and their spacing.

The least squares method was used to determine the best-fitting second-order polynomials for all possible anchor point triplets, where the x values of the first and third points were fixed at the opposing ends. Additionally, curves that were open downwards and curves whose vertices appeared beyond the allowed space were eliminated. The resulting F0 curves were resynthesized using Praat and presented to native speakers in a perception experiment.
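The fitting-and-filtering step can be sketched as below. The quadratic least-squares fit and the two elimination criteria follow the description above; the anchor values and allowed F0 range are hypothetical, not taken from the corpus:

```python
import numpy as np

def fit_f0_curve(anchors, f0_min, f0_max):
    """Fit a second-order polynomial through three (time, F0) anchor
    points by least squares; reject curves that open downwards or
    whose vertex leaves the allowed F0 space."""
    t = np.array([p[0] for p in anchors])
    f0 = np.array([p[1] for p in anchors])
    a, b, c = np.polyfit(t, f0, 2)           # f0(t) = a*t**2 + b*t + c
    if a < 0:
        return None                          # open downwards: eliminated
    if a > 0:
        vertex_f0 = c - b * b / (4 * a)      # F0 at the parabola's vertex
        if not (f0_min <= vertex_f0 <= f0_max):
            return None                      # vertex beyond allowed space
    return np.poly1d([a, b, c])

# Hypothetical dipping contour with endpoints fixed at the opposing ends
curve = fit_f0_curve([(0.0, 120.0), (0.5, 100.0), (1.0, 130.0)],
                     f0_min=80.0, f0_max=200.0)
print(curve(0.5))  # the fit passes through the middle anchor
```

Iterating this over all admissible anchor triplets yields the discrete stimulus space; the surviving curves would then be resynthesized for the perception experiment.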

@INPROCEEDINGS{kuczmarski++2013eacl, author = {Kuczmarski, Tomasz and Duran, Daniel and Kordek, Norbert and Bruni, Jagoda}, title = {Magnet Effect in the perception of Mandarin Chinese T1 and T3}, booktitle = {Book of abstracts of the 8th Conference of the European Association of Chinese Linguistics}, year = {2013}, pages = {19}, address = {Paris}, month = sep, }

Daniel Duran, Jagoda Bruni and Grzegorz Dogil: Acoustic and articulatory information as joint factors coexisting in the context sequence model of speech production. In: Proceedings of Meetings on Acoustics (POMA), 2013 Vol. 19, pp. 060091. ICA 2013 Montréal, Canada.

[Abstract] [BibTeX] [DOI: 10.1121/1.4799009]

This study presents the integration of an articulatory factor into the Context Sequence Model (CSM) of speech production using Polish sonorant data measured with the Electromagnetic Articulograph Technology (EMA). Based on exemplar-theoretic assumptions, the CSM models the speech production-perception loop operating on a flat, detail-rich memory of previously processed speech utterance exemplars. Selection of an item for production is based on context matching, comparing the context of the currently produced utterance with the contexts of stored candidate items in memory. We extended the CSM by incorporating articulatory information in parallel to the acoustic representation of the speech exemplars. Our study demonstrates that memorized raw articulatory information—movement habits of the speaker—can be available during speech production. Successful incorporation of this factor shows that not only acoustic but also articulatory information can be made directly available during speech production.
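The context-matching selection step described above can be sketched as follows. This is a deliberate simplification of the CSM, not its implementation: context vectors, the distance measure, and the stored fields are hypothetical, but it shows how articulatory information can sit in memory alongside the acoustic representation:

```python
import math

def select_exemplar(current_context, memory):
    """Choose the stored exemplar whose recorded context is most
    similar to the context of the utterance being produced."""
    def context_distance(ctx_a, ctx_b):
        # Euclidean distance over fixed-length context feature vectors
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(ctx_a, ctx_b)))
    return min(memory,
               key=lambda item: context_distance(current_context, item["context"]))

# Hypothetical flat, detail-rich memory: each exemplar stores acoustic
# and articulatory detail plus the context in which it occurred
memory = [
    {"context": (0.2, 0.8), "acoustic": "token-1", "articulatory": "ema-1"},
    {"context": (0.9, 0.1), "acoustic": "token-2", "articulatory": "ema-2"},
]
best = select_exemplar((0.25, 0.75), memory)
print(best["acoustic"])  # the exemplar from the most similar context
```

Because selection returns the whole stored record, the articulatory detail ("movement habits of the speaker") is available during production for free once it is memorized in parallel with the acoustics.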

@INPROCEEDINGS{duran++2013ica, author = {Duran, Daniel and Bruni, Jagoda and Dogil, Grzegorz}, title = {Acoustic and articulatory information as joint factors coexisting in the context sequence model of speech production}, booktitle = {Proceedings of Meetings on Acoustics ({POMA}), {ICA} 2013}, year = {2013}, volume = {19}, number = {1}, pages = {060091}, address = {Montr{\'e}al, Canada}, publisher = {Acoustical Society of America}, doi = {10.1121/1.4799009}, }

Tomasz Kuczmarski, Daniel Duran, Norbert Kordek and Jagoda Bruni: Second-degree polynomial approximation of Mandarin Chinese lexical tone pitch contours — a preliminary evaluation. In: Petra Wagner (ed.): Elektronische Sprachsignalverarbeitung 2013 — Tagungsband der 24. Konferenz; Bielefeld, 26–28 March 2013. TUDpress, pp. 218–222. ISBN: 978-3-944331-03-4

[Abstract] [BibTeX]

The current paper presents a preliminary evaluation of a second-degree polynomial pitch stylization method for Mandarin Chinese (MC) cited lexical tones. This study was devised to verify methodological assumptions for a subsequent work where a systematic manipulation of the F0 curve in MC syllables will be used to study the perceptual Magnet Effect. For this purpose, a number of MC syllables representing various phonological templates were chosen from a single speaker corpus. Stylized pitch curves were resynthesized and compared with their natural counterparts in a discrimination experiment. The results of native speakers’ judgments show that the approximation method is adequate for the desired application.

@INPROCEEDINGS{kuczmarski++2013, author = {Kuczmarski, Tomasz and Duran, Daniel and Kordek, Norbert and Bruni, Jagoda}, title = {Second-degree polynomial approximation of Mandarin Chinese lexical tone pitch contours -- a preliminary evaluation}, booktitle = {Elektronische Sprachsignalverarbeitung 2013}, editor = {Wagner, Petra}, number = {65}, series = {Studientexte zur Sprachkommunikation}, year = {2013}, pages = {218--222}, address = {Dresden}, publisher = {{TUDpress}}, isbn = {978-3-944331-03-4}, }

Daniel Duran, Jagoda Bruni and Grzegorz Dogil: Modeling multi-modal factors in speech production with the Context Sequence Model. In: Petra Wagner (ed.): Elektronische Sprachsignalverarbeitung 2013 — Tagungsband der 24. Konferenz; Bielefeld, 26–28 March 2013. TUDpress, pp. 86–92. ISBN: 978-3-944331-03-4

[Abstract] [BibTeX]

This article describes modeling speech production with multi-modal factors integrated into the Context Sequence Model. It is posited that articulatory information can be successfully incorporated and stored in parallel to the acoustic information in a speech production model. Results demonstrate that a memory sensitive to rich context and enlarged by the additional inputs facilitates exemplar weighting and selection during speech production.

@INPROCEEDINGS{duran++2013essv, author = {Duran, Daniel and Bruni, Jagoda and Dogil, Grzegorz}, title = {Modeling multi-modal factors in speech production with the Context Sequence Model}, booktitle = {Elektronische Sprachsignalverarbeitung 2013}, editor = {Wagner, Petra}, number = {65}, series = {Studientexte zur Sprachkommunikation}, year = {2013}, pages = {86--92}, address = {Dresden}, publisher = {{TUDpress}}, isbn = {978-3-944331-03-4}, }

2012

Daniel Duran, Jagoda Bruni, Hinrich Schütze and Grzegorz Dogil: Specification in context – incorporation of an articulatory factor into the Context Sequence Model. Presentation at the 43rd Poznań Linguistic Meeting (PLM), September 2012, Poznań, Poland.

[Abstract] [BibTeX] [PDF]

This study presents the integration of an articulatory factor into the Context Sequence Model (CSM) of speech production using Polish sonorant data measured with the Electromagnetic Articulograph Technology. Based on exemplar-theoretic assumptions, the CSM models the speech production-perception loop operating on a flat, detail-rich memory of previously processed speech utterance exemplars. Selection of an item for production is based on context matching, comparing the context of the currently produced utterance with the contexts of stored candidate items in memory. Reconsidering the basic assumptions of the CSM—the perception-production feedback loop and the detailed episodic memory—we extended the work by incorporating articulatory information in parallel to the acoustic representation of the speech exemplars used by the model.

@INPROCEEDINGS{duran++2012plm, author = {Duran, Daniel and Bruni, Jagoda and Sch{\"u}tze, Hinrich and Dogil, Grzegorz}, title = {Specification in context -- incorporation of an articulatory factor into the Context Sequence Model}, booktitle = {43rd Poznań Linguistic Meeting ({PLM2012}) -- Book of Abstracts}, year = {2012}, url = {http://ifa.amu.edu.pl/plm_old/2012/files/Abstracts/PLM2012_Abstract_Duran_etal.pdf} }
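The context-matching step described in the abstract above (comparing the current production context with the contexts of stored candidate exemplars) can be sketched in miniature as follows. The feature vectors, cosine similarity measure, and all names are illustrative assumptions for this sketch, not the CSM's actual representation:

```python
import numpy as np

def best_exemplar(current_context, memory, contexts):
    """Pick the stored exemplar whose stored context is most similar
    (by cosine similarity) to the current production context.
    memory[i] is a candidate exemplar; contexts[i] is the feature
    vector of the context in which memory[i] was originally stored."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [cos(current_context, c) for c in contexts]
    return memory[int(np.argmax(scores))], max(scores)

# Toy vectors standing in for acoustic (or articulatory) context features.
ctx = np.array([1.0, 0.0, 1.0])
mem = ["exemplar_a", "exemplar_b"]
ctxs = [np.array([1.0, 0.1, 0.9]), np.array([0.0, 1.0, 0.0])]
chosen, score = best_exemplar(ctx, mem, ctxs)
```

Enriching the memory with articulatory information, as the study proposes, would amount to extending each context vector with articulatory dimensions before matching.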

Daniel Duran, Jagoda Bruni, Michael Walsh, Hinrich Schütze and Grzegorz Dogil: Phonological constraints verified by a rich memory exemplar model: extrametricality and articulatory binding in Polish obstruent-sonorant rhymes. In: Book of Abstracts of the 13th Conference on Laboratory Phonology, 2012, pp. 217-218. Stuttgart, Germany.

[Abstract] [BibTeX]

We present a study in which unlabeled articulatory data of Polish vowels in VC and VCC rhymes are automatically analyzed in order to investigate the phonological behavior of vowels in such contexts. Polish shows differences in the behavior of voicing in obstruent-sonorant clusters. From the acoustic point of view, sonorants preceded by a voiceless obstruent in word-initial VCC clusters tend to be voiced for nearly their entire duration, undergoing only slight initial devoicing. In word-final position, however, sonorants in a similar phonetic environment tend to be fully devoiced.

This study investigates, using a clustering methodology, whether word-final sonorant influence on the preceding vowel is visible. If such an influence is present, there should be information in the vowel segment from the right context which should allow a clustering method to group the vowels into distinct sets according to their context.

@INPROCEEDINGS{duran++LabPhon13, author = {Duran, Daniel and Bruni, Jagoda and Walsh, Michael and Sch{\"u}tze, Hinrich and Dogil, Grzegorz}, title = {Phonological constraints verified by a rich memory exemplar model: extrametricality and articulatory binding in Polish obstruent-sonorant rhymes}, booktitle = {Book of Abstracts of the 13th Conference on Laboratory Phonology, Stuttgart, Germany}, year = {2012}, editor = {Schweitzer, Antje and Lintfert, Britta}, pages = {217-218}, publisher = {Chair of Experimental Phonetics, Institute of Natural Language Processing, Universit{\"a}t Stuttgart}, }

2011

Daniel Duran, Jagoda Bruni, Hinrich Schütze and Grzegorz Dogil: Using unlabeled EMA data in a speech production model with a rich memory. Presentation at: 7. Tagung zu Phonetik und Phonologie im deutschsprachigen Raum (P&P 7), October 2011, Osnabrück, Germany.

[Abstract] [BibTeX] [PDF]

We present a pilot study which integrates articulatory information into the Context Sequence Model (CSM) of speech production. The CSM is an exemplar-theoretic model which builds on the concept of the speech perception-production loop and incorporates a rich acoustic memory of past speech items, stored sequentially in their original context. In the present study, we enrich the original acoustic memory of the CSM with articulatory information by using continuous Electromagnetic Midsagittal Articulography (EMA) measurements. To our knowledge, there are no existing speech production models which use the full continuous EMA signals directly and in the same way as acoustic speech signals.

In a first series of experiments, we used data from a Polish corpus designed to investigate the coordination between articulatory gestures within syllables in onset and coda positions. Our results indicate that it might be possible to incorporate articulatory information into speech perception-production models using raw EMA data (without having to manually label specific articulatory landmarks). This also allows using unlabeled EMA traces in acquisition models without having to justify an a priori defined set of discrete gestural features or landmarks.

@MISC{duran++pup7, author = {Duran, Daniel and Bruni, Jagoda and Sch{\"u}tze, Hinrich and Dogil, Grzegorz}, title = {Using unlabeled {EMA} data in a speech production model with a rich memory}, howpublished = {Presentation at: 7. Tagung zu Phonetik und Phonologie im deutschsprachigen Raum (P\&P 7)}, month = {October}, year = {2011}, url = {http://www.home.uni-osnabrueck.de/tmeisenb/Duran_et_al.pdf} }

Daniel Duran, Jagoda Bruni, Grzegorz Dogil and Hinrich Schütze: Speech Events are Recoverable from Unlabeled Articulatory Data: Using an Unsupervised Clustering Approach on Data Obtained from Electromagnetic Midsagittal Articulography (EMA). Interspeech Conference Proceedings, 2011, pp. 2201-2204. Florence, Italy.

[Abstract] [BibTeX] [www]

Some models of speech perception/production and language acquisition make use of a quasi-continuous representation of the acoustic speech signal. We investigate whether such models could potentially profit from incorporating articulatory information in an analogous fashion. In particular, we investigate how articulatory information represented by EMA measurements can influence unsupervised phonetic speech categorization. Incorporating the acoustic signal and non-synthetic, raw articulatory data, we present first results of a clustering procedure of the kind applied in numerous language acquisition and speech perception models. We observe that unlabeled articulatory data, i.e. data without previously assumed landmarks, yield good clustering results. A more effective clustering outcome for plosives than for vowels seems to support the motor view of speech perception.

@INPROCEEDINGS{duran++2011interspeech, author = {Duran, Daniel and Bruni, Jagoda and Dogil, Grzegorz and Sch{\"u}tze, Hinrich}, title = {Speech events are recoverable from unlabeled articulatory data: Using an unsupervised clustering approach on data obtained from Electromagnetic Midsagittal Articulography (EMA)}, booktitle = {Interspeech Conference Proceedings}, year = {2011}, pages = {2201-2204}, address = {Florence, Italy}, month = {August}, url = {http://www.isca-speech.org/archive/interspeech_2011/i11_2201.html} }
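The unsupervised categorization described in the abstract above can be illustrated with a plain k-means clustering over frame vectors that concatenate acoustic and articulatory dimensions. The toy data, feature layout, and cluster count below are illustrative assumptions for this sketch, not the study's actual features or algorithm:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means on the row vectors of X; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every frame to every centroid, then nearest assignment.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Toy frames: two acoustic dimensions concatenated with two raw EMA
# coordinates, forming two well-separated categories.
rng = np.random.default_rng(1)
plosive_like = rng.normal(loc=[0, 0, 5, 5], scale=0.2, size=(20, 4))
vowel_like = rng.normal(loc=[3, 3, 0, 0], scale=0.2, size=(20, 4))
X = np.vstack([plosive_like, vowel_like])
labels, _ = kmeans(X, k=2)
```

The point of the sketch is that no landmark labeling is required: the raw articulatory coordinates simply enter the feature vectors alongside the acoustics.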

Daniel Duran, Jagoda Bruni, Hinrich Schütze and Grzegorz Dogil: Context Sequence Model of Speech Production Enriched with Articulatory Features. Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS), 2011, pp. 615-618. Hong Kong.

[Abstract] [BibTeX] [PDF]

This study describes the integration of an articulatory factor into the exemplar-based Context Sequence Model (CSM) of speech production, which builds on the concept of a speech perception-production loop. It has been demonstrated that the selection of new exemplars for speech production is based on about 0.5 s of preceding acoustic context and on the linguistic match of the following context of the exemplars. This investigation presents the role of the articulatory features integrated into the exemplar weighting process.

@INPROCEEDINGS{duran++2011ICphS, author = {Duran, Daniel and Bruni, Jagoda and Sch\"{u}tze, Hinrich and Dogil, Grzegorz}, title = {Context Sequence Model of Speech Production Enriched with Articulatory Features}, booktitle = {Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS)}, year = {2011}, pages = {615-618}, address = {Hong Kong}, month = {August}, url = {http://www.icphs2011.hk/resources/OnlineProceedings/RegularSession/Duran/Duran.pdf} }

2010

Daniel Duran, Hinrich Schütze, Bernd Möbius and Michael Walsh: A Computational Model of Unsupervised Speech Segmentation for Correspondence Learning. Research on Language & Computation, 2010, Vol. 8, No. 2-3, pp. 133-168.

[Abstract] [BibTeX] [DOI: 10.1007/s11168-011-9075-4]

In this paper, we develop a new conceptual framework for an important problem in language acquisition, the correspondence problem: the fact that a given utterance has different manifestations in the speech and articulation of different speakers and that the correspondence of these manifestations is difficult to learn. We put forward the Correspondence-by-Segmentation Hypothesis, which states that correspondence is primarily learned by first segmenting speech in an unsupervised manner and then mapping the acoustics of different speakers onto each other. We show that a rudimentary segmentation of speech can be learned in an unsupervised fashion. We then demonstrate that, using the previously learned segmentation, different instances of a word can be mapped onto each other with high accuracy when trained on utterance-label pairs for a small set of words.

@ARTICLE{duran++2010RLC, author = {Duran, Daniel and Sch{\"u}tze, Hinrich and M{\"o}bius, Bernd and Walsh, Michael}, title = {A Computational Model of Unsupervised Speech Segmentation for Correspondence Learning}, journal = {Research on Language \& Computation}, year = {2010}, volume = {8}, pages = {133-168}, number = {2-3}, doi = {10.1007/s11168-011-9075-4}, url = {http://dx.doi.org/10.1007/s11168-011-9075-4} }
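The unsupervised segmentation step underlying the Correspondence-by-Segmentation Hypothesis can be illustrated by a simple change-detection heuristic: hypothesize a boundary wherever consecutive feature frames differ sharply. The threshold, feature vectors, and function names are illustrative assumptions for this sketch, not the article's actual segmentation model:

```python
import numpy as np

def segment_boundaries(frames, threshold=1.0):
    """Hypothesize segment boundaries at frame indices where the
    Euclidean distance between consecutive feature frames exceeds
    a threshold."""
    dists = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    return [i + 1 for i, d in enumerate(dists) if d > threshold]

# Toy sequence: three steady-state stretches with abrupt changes between them.
frames = np.array([[0.0, 0.0]] * 5 + [[3.0, 3.0]] * 5 + [[0.0, 5.0]] * 5)
bounds = segment_boundaries(frames)
```

Under the hypothesis, segments obtained this way (rather than raw signals) would then serve as the units for mapping different speakers' acoustics onto each other.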

Daniel Duran, Hinrich Schütze and Bernd Möbius: Towards a computational model of unsupervised speech segmentation for correspondence learning. Presented at the Positional Phenomena in Phonology and Phonetics workshop, Glow 33, 2010. Wrocław, Poland.

[Abstract] [BibTeX] [PDF (Workshop abstract booklet)]

We address one of the fundamental components of linguistic competence: recognizing various percepts from different modalities as corresponding to the same underlying phone. Mature speakers have learned correspondences between all of the different manifestations of a phone: They are able to relate their own articulatory and auditory feedback as well as the perceived gestures and sounds from other speakers to specific phones. Because of the significant differences between these different percepts, a simple correlation analysis operating directly on the speech signal is unlikely to be the basis for learning correspondence. Our hypothesis is that the child is able to establish correspondence between acoustics and articulation after succeeding in segmenting speech.

@MISC{duran2010GLOW, author = {Daniel Duran and Hinrich Sch{\"u}tze and Bernd M{\"o}bius}, title = {Towards a computational model of unsupervised speech segmentation for correspondence learning}, howpublished = {Poster}, month = {April}, year = {2010}, note = {Presented at GLOW XXXIII, Wroc{\l}aw, Poland}, url = {http://www.ifa.uni.wroc.pl/~glow33/PHONO-ABSTR.pdf} }

contact: www.ims.uni-stuttgart.de/institut/mitarbeiter/durandl/