A method for the automated production of MathSpeak™ versions of Math and Science materials is also needed in order to make a useful product. This method will be a module added to gh's existing production process, known as the Media Conversion Process™ (MCP). This system, which is explained in more detail below, represents proprietary gh technology and is a significant competitive advantage in an industry where the use of technology is the exception rather than the norm. It is expected that the MathSpeak™ technology will be added to the MCP as an additional media product. The computer-generated MathSpeak™ audio files may be disseminated via any of four potential distribution channels: hard copy (Audio CD or cassette), electronic (Digital Talking Book), telephone (audio only), and Interactive Cable (visual and audio rendering).
The MCP accepts the client-provided input format and converts it into an internal XML-based standard. This process involves the efforts of a Data Processing Specialist, who uses a semi-automated, custom gh toolset to visually format and mark up the data prior to automated conversion to XML. The XML data is then passed through the gh conversion engines in the processing-output stage, where a multitude of custom gh conversion tools produce each desired output from the XML data. The processing-output stage is 99% automated and requires only the supervision of a skilled gh Translator. During each stage of the process, and especially after the output is produced, the product is reviewed by a QC Specialist.
MathSpeak™ will fit quite seamlessly into the existing MCP production facilities at gh. During the project, a number of research questions specifically geared towards the inclusion of MathSpeak™ in the MCP will be addressed. These questions can be divided into three main stages: 1) creation of an XML-based specification for MathSpeak™ (XML Schema development), 2) an XSLT from gh XML to MathSpeak™ VoiceXML, and 3) automated generation of an audio file from the MathSpeak™ VoiceXML file. XML is a universal method for data storage and exchange that is used heavily in the gh MCP. XSLT, or Extensible Stylesheet Language Transformations, is a language by which one "flavor" of XML can be converted to another. In general, the process of converting a source document into a MathSpeak™ audio product occurs in three main steps, as shown below:
The input stage involves the re-authoring of the source material into MathML format using proprietary gh authoring tools. This input is then converted using Process I above into a proprietary "gh XML" format. This part of the MCP is already developed and in use for other media types.
The second process, O, is the step that will require development for the MathSpeak™ product to be integrated into the gh MCP. This step converts the gh XML into a more specific "flavor" of XML, such as VoiceXML, from which the output can be produced. This is typically accomplished by use of XSLT. Next, a rendering engine is used to automatically create the output product as an electronic file, from which physical hard copies can be mastered. A summary of this process is shown below:
Step Ox involves an XSLT to convert the gh XML into VoiceXML, which can be used to automatically generate computer-synthesized speech. Step Oy involves the actual generation of this computer-synthesized speech as an electronic master audio file. Finally, step Oz produces the physical copies of the book or test on Audio CDs (or CD-ROMs) for use by the individual customers. Steps Ox and Oy do not currently exist for MathSpeak™, although gh has some experience with computer-generated audio for non-math content, which will be helpful in developing these steps. Step Oz is already in use by gh for the production of human audio recordings for several customers.
An XML Schema is a special file that defines the features (including the elements and their attributes) that are permitted in a particular XML vocabulary. For example, the commonly used DTD (Document Type Definition) is one kind of schema. gh will develop a Schema for MathSpeak™ that will encompass all of the needed features of MathSpeak™ as a specific subset of both the general gh XML and MathML (both Presentation and Content markup), which is the coding language of choice for mathematics. This Schema will be developed using the MSXML v4.0 SDK and will conform to the proposed W3C XML 2.0 specification.
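As a rough illustration of what a schema contributes, the sketch below checks a document against a table of permitted elements and attributes. It is only a toy under stated assumptions: the element names (`expression`, `fraction`, `number`) and attribute names are hypothetical and are not taken from the gh XML or MathSpeak™ specifications, and a real schema would also constrain nesting, ordering, and data types.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: element name -> set of permitted attribute names.
# These names are placeholders for illustration, not gh's actual format.
ALLOWED = {
    "expression": {"id"},
    "fraction": set(),
    "number": {"pronounce"},
}


def violations(xml_text):
    """Return a list of messages for elements or attributes outside ALLOWED."""
    problems = []
    # iter() visits the root and all descendants in document order.
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag not in ALLOWED:
            problems.append(f"unknown element <{elem.tag}>")
            continue
        for attr in elem.attrib:
            if attr not in ALLOWED[elem.tag]:
                problems.append(
                    f"attribute '{attr}' not allowed on <{elem.tag}>"
                )
    return problems


GOOD = (
    '<expression id="e1"><fraction>'
    "<number>1</number><number>2</number>"
    "</fraction></expression>"
)
BAD = '<expression><radical/><number speed="fast">3</number></expression>'
```

Calling `violations(GOOD)` yields an empty list, while `violations(BAD)` reports the unknown `radical` element and the unexpected `speed` attribute.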
This process is largely theoretical in nature and will involve more time spent thinking about the interrelationships of mathematical entities and their speech analogs than time spent actually writing code. The critical part of this step is to develop a specification that affords a one-to-one relationship between each fundamental mathematical entity in MathML and each spoken representation as defined by the MathSpeak™ specification developed in the first part of the research. The main challenge in developing the MathSpeak™ XML Schema will be to encompass the elements and attributes of MathML above within a special "gh Namespace" used specifically for accessibility purposes. In other words, additional information will be required above and beyond the MathML code given above to ensure that the desired output features (such as pronunciation) are met.
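The two requirements in this paragraph, a one-to-one mapping and extra accessibility information carried in a separate namespace, can be sketched as follows. Everything here is an assumption made for illustration: the namespace URI `urn:example:gh-accessibility`, the `pronounce` attribute, and every spoken form other than the fraction pair are hypothetical placeholders, not part of the MathSpeak™ specification.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
GH_NS = "urn:example:gh-accessibility"  # hypothetical "gh Namespace" URI

# MathML element -> (opening words, closing words). Only the fraction pair
# appears in this document; the other entries are placeholder wording.
SPOKEN_FORMS = {
    "mfrac": ("BEGIN FRACTION", "END FRACTION"),
    "msqrt": ("BEGIN ROOT", "END ROOT"),
    "msup": ("BEGIN SUPERSCRIPT", "END SUPERSCRIPT"),
}


def is_one_to_one(table):
    """True when no two MathML elements share the same spoken form."""
    forms = list(table.values())
    return len(forms) == len(set(forms))


# MathML carrying an extra, namespaced pronunciation hint (assumed attribute).
SAMPLE = (
    f'<math xmlns="{MATHML_NS}" xmlns:gh="{GH_NS}">'
    '<mi gh:pronounce="theta">&#952;</mi>'
    "</math>"
)


def pronunciation_hints(xml_text):
    """Collect (element text, hint) pairs for elements with gh:pronounce."""
    hints = []
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells namespaced attribute names as "{uri}local".
        hint = elem.get(f"{{{GH_NS}}}pronounce")
        if hint is not None:
            hints.append((elem.text, hint))
    return hints
```

Keeping the hints in their own namespace leaves the MathML itself untouched, so tools that ignore the gh attributes should still be able to read the document as ordinary MathML.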
During this step, XSLT will be used to convert the internal gh XML file into the actual VoiceXML file needed for generation of audio. VoiceXML is an XML standard for voice dialogs that is used primarily by large phone companies for speech recognition purposes; gh, however, uses it for the production of speech output as opposed to speech input. The XSLT will replace each MathSpeak™ construct with an instruction telling the speech rendering engine what to speak and how to speak it.
Note that original elements such as the MathML <mfrac> ... </mfrac> element, which is used as a container for a fraction, will be converted to the MathSpeak™ reserved words BEGIN FRACTION ... END FRACTION by the XSLT. These reserved words will themselves be surrounded by VoiceXML instructions to the TTS engine to pause slightly and change the voice from male to female in order to improve clarity for the listener. Many other audio enhancements can be made with VoiceXML as well.
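The fraction conversion described above can be sketched in a few lines. This is not the planned XSLT: it is a Python stand-in that performs the same kind of tree rewrite, and it emits only a `<prompt>` fragment rather than a complete VoiceXML document. The `<break>` and `<voice gender="...">` elements are the standard speech-markup elements allowed inside a VoiceXML prompt. The separator word OVER and the function names are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def reserved(prompt, words):
    """Append a slight pause, then the reserved words in the female voice."""
    ET.SubElement(prompt, "break", {"strength": "weak"})
    ET.SubElement(prompt, "voice", {"gender": "female"}).text = words


def speak(node, prompt):
    """Walk the MathML tree and append speech instructions to the prompt."""
    if local_name(node.tag) == "mfrac":
        numerator, denominator = list(node)  # mfrac has exactly two children
        reserved(prompt, "BEGIN FRACTION")
        speak(numerator, prompt)
        reserved(prompt, "OVER")  # assumed separator, not from the source
        speak(denominator, prompt)
        reserved(prompt, "END FRACTION")
    elif len(node) > 0:
        for child in node:
            speak(child, prompt)
    elif node.text and node.text.strip():
        # Ordinary content is read in the default (male) voice.
        voice = ET.SubElement(prompt, "voice", {"gender": "male"})
        voice.text = node.text.strip()


def to_prompt(mathml_text):
    """Build a VoiceXML-style <prompt> element from a MathML string."""
    prompt = ET.Element("prompt")
    speak(ET.fromstring(mathml_text), prompt)
    return prompt


SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mfrac><mi>a</mi><mi>b</mi></mfrac></math>"
)
```

For the sample fraction, `to_prompt(SAMPLE)` produces voice elements that read BEGIN FRACTION, a, OVER, b, END FRACTION in order, with each reserved phrase preceded by a weak break and spoken in the female voice. `ET.tostring(to_prompt(SAMPLE), encoding="unicode")` serializes the fragment for inspection.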
After the VoiceXML file has been generated, the actual master audio file must be automatically created. This is done with the assistance of a Text-to-Speech (TTS) engine. A TTS engine converts the VoiceXML document into a sequence of phonemes, or basic units of sound, along with special commands as to how those phonemes should be synthesized into an audio file that a user can listen to. gh currently uses off-the-shelf TTS software for in-house audio generation. For the MathSpeak™ project, however, gh will develop a specialized TTS engine for the correct pronunciation, diction, clarity, and audio effects needed for proper rendering of the math content.