Atsushi Fukada, Maki Hirotani, Kazumi Matsumoto Cantrell
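This site demonstrates how to collect, edit, and annotate speech samples, and how to generate objective measures of fluency, accuracy, and complexity. The tools used are Speak Everywhere, an audio editor such as Audacity, Praat with the Syllable Nuclei script, and CAF Calculator.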
Audio samples can be collected using any audio recording device or recording software on a computer. Recording one sample at a time, however, is extremely time-consuming. For example, a researcher who wants to collect four narrative tasks from each of 30 students over the course of a semester will have to schedule 120 recording sessions. Speak Everywhere (http://speak-everywhere.com) can be deployed to facilitate this task. It is a web-based oral practice and assessment platform that allows foreign language instructors to create online oral exercises in various formats. When students work on the exercises, their productions are captured automatically and submitted to the server. Instructors can then review the submissions, grade them, and give feedback on them. For fluency studies, monologue, Q&A, and oral reading tasks may be useful. Instructors who use Speak Everywhere in their teaching only have to set up speaking tasks and assign them at appropriate times; students' audio samples then accumulate automatically, which makes audio data collection much easier. A sample Speak Everywhere screen is shown below:
*Speak Everywhere is a commercial product.
Assuming that you have audio samples now, this section demonstrates how to get them ready for Fluency Calculator.
(1) Pre-editing audio files
Follow these steps to pre-edit audio files. Any full-featured audio editing program can be used for this editing, such as Audacity, a well-known free program.
Watch this video for a demonstration.
(2) Run audio analysis tool Syllable Nuclei and get ready for annotation work
Syllable Nuclei is a Praat script written by Nivja de Jong and Ton Wempe (De Jong & Wempe 2008; 2009). Praat (http://www.praat.org), written by Paul Boersma and David Weenink, is a program designed to perform a variety of speech analyses. Because it is highly programmable, researchers can write scripts to create custom analysis programs, and Syllable Nuclei is one such script. It distinguishes sounding from silent portions of a recording and identifies syllables.
The following clip demonstrates how to run Syllable Nuclei to generate a TextGrid file. It also shows how to get five annotation tiers ready (SYLL, PAUS, ASUN, DYSF, and SENT).
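To give a rough idea of what separating sounding from silent portions involves, here is a minimal Python sketch. It is not the Syllable Nuclei algorithm itself, which runs inside Praat. It assumes per-frame intensity values in dB are already available, and the function name, parameter names, and default values are all illustrative assumptions.

```python
def find_intervals(intensity_db, frame_s=0.01, threshold_db=-25.0, min_pause_s=0.3):
    """Label runs of frames as 'sounding' or 'silent'.

    intensity_db: per-frame intensity values in dB (assumed input).
    threshold_db: frames quieter than (maximum + threshold_db) count as silent.
    min_pause_s:  silent runs shorter than this are merged into speech.
    Returns a list of (label, start_s, end_s) tuples.
    """
    if not intensity_db:
        return []
    cutoff = max(intensity_db) + threshold_db

    # Step 1: group consecutive frames that share the same status.
    runs = []  # each run is [label, first_frame, last_frame + 1]
    for i, value in enumerate(intensity_db):
        label = "silent" if value < cutoff else "sounding"
        if runs and runs[-1][0] == label:
            runs[-1][2] = i + 1
        else:
            runs.append([label, i, i + 1])

    # Step 2: relabel short silent runs as speech and merge neighbours.
    merged = []
    for label, start, end in runs:
        if label == "silent" and (end - start) * frame_s < min_pause_s:
            label = "sounding"
        if merged and merged[-1][0] == label:
            merged[-1][2] = end
        else:
            merged.append([label, start, end])

    return [(label, start * frame_s, end * frame_s) for label, start, end in merged]


# Toy input: speech, a 0.05 s dip, speech, a 0.4 s pause, speech.
frames = [60.0] * 10 + [20.0] * 5 + [60.0] * 10 + [20.0] * 40 + [60.0] * 10
result = find_intervals(frames)
print([label for label, _, _ in result])  # ['sounding', 'silent', 'sounding']
```

The short dip is absorbed into the surrounding speech, while the longer gap survives as a silent interval. This is why the boundaries produced automatically still need to be checked by ear in the later steps.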
(3) Edit syllables
Syllable Nuclei detects syllables automatically, but it makes errors, so you need to go through the file and make corrections. The following video first shows how to play sounding segments. To play a segment, click a sounding segment on the PAUS tier to select it and press the Tab key. (Tab also works as a stop button.) The video then shows how to add a syllable on the SYLL tier. The exact placement of an added syllable does not matter; an approximate location will do. To delete a syllable, select it and press Alt-Backspace.
(4) Work on the PAUS tier
On the PAUS tier, play each segment to make sure the boundaries are correct. Move the boundary lines to make adjustments as necessary. Hesitations like "Um..." count as filled pauses. Mark such segments as "fp" as demonstrated in the following video.
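Once the PAUS tier is cleaned up, counts and total durations per label can be tallied from its intervals. The sketch below is a hedged illustration, not the calculator's own code. It assumes the tier has already been read into (start, end, label) tuples, and that unedited intervals are labeled "silent" and "sounding"; only the "fp" label is taken from the instructions above.

```python
def tally_paus_tier(intervals):
    """Count intervals and sum their durations for each label.

    intervals: iterable of (start_s, end_s, label) tuples (assumed input format).
    Returns {label: {"count": int, "time": float}}.
    """
    totals = {}
    for start, end, label in intervals:
        entry = totals.setdefault(label, {"count": 0, "time": 0.0})
        entry["count"] += 1
        entry["time"] += end - start
    return totals


paus = [
    (0.0, 0.5, "silent"),
    (0.5, 2.0, "sounding"),
    (2.0, 2.5, "fp"),        # "Um..." marked as a filled pause
    (2.5, 4.0, "sounding"),
    (4.0, 5.0, "silent"),
]
totals = tally_paus_tier(paus)
print(totals["fp"])      # {'count': 1, 'time': 0.5}
print(totals["silent"])  # {'count': 2, 'time': 1.5}
```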
(5) Mark dysfluency features
Identify dysfluency features such as self-corrections (SC), repetitions (RP), reformulations (RF), and stuttering (ST), and mark them with these labels on the DYSF tier, as demonstrated in the video.
(6) Mark AS-Units and sentences
Identify AS-Units and mark them as either "E+" (AS-Unit that contains errors) or "E-" (error-free AS-Unit) as shown in the video. If an AS-Unit is not a clause, mark it as "E". It will be counted as an AS-Unit for the purpose of calculating pause-related measures, but will NOT be included in accuracy-related computations. If you want to annotate the number of clauses per AS-Unit, use this format: E+N or E-N, where N is a single digit from 0 to 9.
Also, mark sentences on the SENT tier.
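The AS-Unit label format can be checked mechanically before processing. The following is a minimal sketch, not part of the calculator. The function name parse_as_label and the keys of the returned dictionary are hypothetical, and rejecting a label such as "E3" is an assumption, since the format described above does not mention it.

```python
import re

# "E", optionally followed by "+" or "-", optionally followed by one digit.
_AS_LABEL = re.compile(r"E([+-])?([0-9])?")


def parse_as_label(label):
    """Parse an AS-Unit label such as "E+", "E-2", or "E".

    Returns a dict with:
      kind:    "with-errors", "error-free", or "non-clause"
      clauses: the annotated clause count, or None if not annotated
    Raises ValueError for labels outside the described format.
    """
    match = _AS_LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not an AS-Unit label: {label!r}")
    sign, digit = match.groups()
    if sign is None:
        if digit is not None:
            # A digit without "+" or "-" is not described; treated as an error here.
            raise ValueError(f"clause count without E+ or E-: {label!r}")
        return {"kind": "non-clause", "clauses": None}
    kind = "with-errors" if sign == "+" else "error-free"
    clauses = int(digit) if digit is not None else None
    return {"kind": kind, "clauses": clauses}


print(parse_as_label("E+2"))  # {'kind': 'with-errors', 'clauses': 2}
print(parse_as_label("E-"))   # {'kind': 'error-free', 'clauses': None}
print(parse_as_label("E"))    # {'kind': 'non-clause', 'clauses': None}
```

Running a check like this over every AS-Unit label before the final step can catch typing slips such as "E +" or "e-" early.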
(7) Save your annotations
To save your annotation work, select File / Save TextGrid as text file. (Do this often!) All TextGrid file names must end with .TextGrid. File names are used as subject IDs and appear in the first column of the output file.
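Because file names double as subject IDs, it can help to check them before processing. This small sketch is only an illustration. It assumes the subject ID is the file name without the .TextGrid extension and that the extension check is case-sensitive; neither detail is confirmed here, and the function name subject_id is hypothetical.

```python
from pathlib import PurePath


def subject_id(file_name):
    """Return the subject ID for a TextGrid file name, or None if the
    name does not end with ".TextGrid" (case-sensitive, by assumption)."""
    path = PurePath(file_name)
    if path.suffix != ".TextGrid":
        return None
    return path.stem


print(subject_id("s01_task2.TextGrid"))  # s01_task2
print(subject_id("s01_task2.textgrid"))  # None
```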
The last step is completely automated. Run CAF Calculator, click Browse and select the data folder you created above, and click START PROCESSING. As long as your TextGrid files are correctly formatted, the program should terminate normally, and a CSV file named "measures.csv", which can be opened in Excel, should appear in the data folder.
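The output file can also be read programmatically. The sketch below is a hedged example that assumes "measures.csv" starts with a header row and uses the first column as the subject ID. The column names in the sample data (measure_a, measure_b) are placeholders, not the calculator's actual headers.

```python
import csv
import io


def load_measures(csv_file):
    """Map each subject ID (first column) to a dict of its remaining columns.

    csv_file: an open text file or file-like object.
    Assumes the first row is a header row; values are kept as strings.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


# Stand-in for the real file; the column names are placeholders.
sample = io.StringIO("file,measure_a,measure_b\ns01,1.5,2\ns02,3.0,4\n")
table = load_measures(sample)
print(table["s01"])  # {'measure_a': '1.5', 'measure_b': '2'}
```

With a real output file, the same function could be called on a file opened with open(path, newline="").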
The measures calculated are as follows:
1. Total response time. The time in seconds from the beginning of an audio response to the end of it.
2. Speech time. The sum of all sounding intervals in seconds (excluding fillers).
3. Total number of syllables. All syllables in the file.
4. Number of AS-Units.
5. Number of error-free AS-Units.
6. Number of AS-Units with errors.
7. Error-free AS-Unit ratio. (Number of error-free AS-Units) / (Number of error-free AS-Units + Number of AS-Units with errors) * 100
8. Number of sentences.
9. Clause count. The number of clauses in the file.
10. Syntactic complexity. Clause count / Number of AS-Units
11. Silent pause count. The number of all silent pauses.
12. Silent pause time. The time in seconds of the sum of the duration of all silent pauses.
13. Filled pause count. The number of all filled pauses.
14. Filled pause time. The time in seconds of the sum of the duration of all filled pauses.
15. Silent pause count within AS. The number of silent pauses within AS-Unit intervals.
16. Silent pause time within AS. The time in seconds of the sum of the duration of silent pauses falling within AS-Units.
17. Silent pause count between AS. The number of silent pauses between AS-Unit intervals.
18. Silent pause time between AS. The time in seconds of the sum of the duration of silent pauses falling outside AS-Units.
19. Filled pause count within AS. The number of filled pauses within AS-Unit intervals.
20. Filled pause time within AS. The time in seconds of the sum of the duration of filled pauses falling within AS-Units.
21. Filled pause count between AS. The number of filled pauses between AS-Unit intervals.
22. Filled pause time between AS. The time in seconds of the sum of the duration of filled pauses falling outside AS-Units.
23. Speech rate. (Total number of syllables) / (Total response time) * 60
24. Articulation rate. (Total number of syllables) / (Speech time + Filled pause time) * 60
25. Mean length of run. (Total number of syllables) / (Number of runs), where a run is a sounding interval.
26. Silent pause ratio. Silent pause time as a percentage of Total response time.
27. Phonation time ratio. (Speech time) / (Total response time) * 100
28. Silent and filled pause ratio. (Silent pause time + Filled pause time) / (Total response time) * 100
29. Silent pause ratio within AS. (Silent pause time within AS-Units) / (Total response time) * 100
30. Silent and filled pause ratio within AS. (Silent pause time within AS-Units + Filled pause time within AS-Units) / (Total response time) * 100
31. Ratio of silent pause time between AS to total response time. (Silent pause time between AS-Units) / (Total response time) * 100
32. Ratio of silent and filled pause time between AS to total response time. (Silent pause time between AS-Units + Filled pause time between AS-Units) / (Total response time) * 100
33. Repeat count. The number of repeat intervals (RP) on the DYSF tier.
34. Reformulation count. The number of reformulation intervals (RF) on the DYSF tier.
35. Stutter count. The number of stutter intervals (ST) on the DYSF tier.
36. Self-correction count. The number of self-correction intervals (SC) on the DYSF tier.
37. Repeat time. The total duration of repeat intervals (RP) on the DYSF tier.
38. Reformulation time. The total duration of reformulation intervals (RF) on the DYSF tier.
39. Stutter time. The total duration of stutter intervals (ST) on the DYSF tier.
40. Self-correction time. The total duration of self-correction intervals (SC) on the DYSF tier.
41. DYSF time. The total duration of all dysfluency intervals on the DYSF tier.
42. Repeat ratio. (Repeat time) / (Total Response Time) * 60
43. Reformulation ratio. (Reformulation time) / (Total Response Time) * 60
44. Stutter ratio. (Stutter time) / (Total Response Time) * 60
45. Self-correction ratio. (Self-correction time) / (Total Response Time) * 60
46. DYSF ratio. (DYSF time) / (Total Response Time) * 60
47. Effective syllable count. Total number of syllables – syllables in repeat, stutter, and self-correction intervals
48. AS-Unit time. The time in seconds of the sum of the duration of all AS-Units.
49. AS-Unit speech rate. Effective syllable count / AS-Unit time * 60
50. Sounding count. The number of all sounding intervals.
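Several of the formulas above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not CAF Calculator's implementation. The function names, argument names, and dictionary keys are hypothetical. The rule that a pause counts as "within" an AS-Unit only when it lies entirely inside one AS-Unit interval is also an assumption.

```python
def split_pauses(pauses, as_units):
    """Split pauses into those within and those between AS-Units.

    pauses, as_units: lists of (start_s, end_s) tuples.
    Assumption: a pause is "within" only if fully contained in one AS-Unit.
    """
    within, between = [], []
    for start, end in pauses:
        inside = any(u_start <= start and end <= u_end for u_start, u_end in as_units)
        (within if inside else between).append((start, end))
    return within, between


def rate_measures(total_time, speech_time, filled_pause_time,
                  silent_pause_time, syllables, runs):
    """Compute measures 23 to 28 from the list above. Times are in seconds."""
    return {
        # Syllables per minute over the whole response.
        "speech_rate": syllables / total_time * 60,
        # Syllables per minute over speech time plus filled pause time.
        "articulation_rate": syllables / (speech_time + filled_pause_time) * 60,
        # Average number of syllables per sounding interval.
        "mean_length_of_run": syllables / runs,
        # Percentages of total response time.
        "silent_pause_ratio": silent_pause_time / total_time * 100,
        "phonation_time_ratio": speech_time / total_time * 100,
        "silent_and_filled_pause_ratio":
            (silent_pause_time + filled_pause_time) / total_time * 100,
    }


as_units = [(0.0, 4.0), (5.0, 9.0)]
silent_pauses = [(1.0, 1.5), (4.0, 5.0), (6.0, 6.5), (9.0, 10.0)]
within, between = split_pauses(silent_pauses, as_units)
print(len(within), len(between))  # 2 2

measures = rate_measures(total_time=40.0, speech_time=25.0, filled_pause_time=5.0,
                         silent_pause_time=10.0, syllables=60, runs=8)
print(measures["speech_rate"])        # 90.0
print(measures["articulation_rate"])  # 120.0
```

In the second example, 60 syllables over a 40-second response give a speech rate of 90 syllables per minute. Dividing the same 60 syllables by the 30 seconds of speech and filled pauses gives an articulation rate of 120.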