VoiceType (Speech Recognition)

What Is VTD?

OS/2 Warp 4.0 (formerly known as "Merlin") includes a technology IBM calls "VoiceType Dictation" ("VTD"), which lets the computer accept spoken input in addition to keyboard, mouse, and other traditional computer inputs. This technology was previously available for some Aptiva models and as an (expensive) add-on hardware/software product. Warp 4.0 includes a version that works with ordinary hardware.

To function well, VTD requires at least 24-32MB of RAM and a mid-range Pentium (P90, say), along with a 16-bit sound card whose OS/2 MMPM/2 drivers support the DART multimedia extensions. VTD requires the Pentium not so much for its raw speed as for its superior floating-point capability. Certain non-Intel CPUs, such as the Cyrix or IBM 6x86, generally have weaker floating-point performance, so with them it's often necessary to use a somewhat faster CPU (many NexGen CPUs don't include a floating-point unit at all).

The 16-bit sound card is required because 8 bits is not enough resolution to capture voice accurately. I know of no way around this limitation, except for older GUS cards: the latest Manley drivers can record at 8 bits and convert the result to 16 bits for VTD, but I don't know how well VTD functions with this arrangement.
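To illustrate why 8-bit input is a real limitation even when it is converted to 16 bits, here is a small Python sketch. It is not how the Manley drivers or VTD actually work (I don't know their internals); the function names and the simple shift-based conversion are my own assumptions for illustration. It shows that widening 8-bit samples to 16 bits changes the format but cannot restore the missing resolution, and it uses the standard rule of thumb for the ideal signal-to-noise ratio of a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB).

```python
import math

def quantize_sine(bits, n=2000):
    """Quantize one cycle of a full-scale sine wave to signed integers
    of the given bit depth (illustrative stand-in for a recording)."""
    full_scale = 2 ** (bits - 1) - 1
    return [round(full_scale * math.sin(2 * math.pi * i / n)) for i in range(n)]

def widen_8bit_to_16bit(samples_u8):
    """Hypothetical conversion: unsigned 8-bit PCM (0..255) to signed
    16-bit PCM by re-centering and shifting left 8 bits."""
    return [(s - 128) << 8 for s in samples_u8]

def ideal_snr_db(bits):
    """Textbook quantization SNR for a full-scale sine wave."""
    return 6.02 * bits + 1.76

# "Record" at 8 bits (unsigned, as 8-bit PCM usually is), then widen.
recorded_u8 = [q + 128 for q in quantize_sine(8)]
widened = widen_8bit_to_16bit(recorded_u8)

# Record the same signal natively at 16 bits.
native16 = quantize_sine(16)

print("distinct levels, 8-bit widened:", len(set(widened)))
print("distinct levels, native 16-bit:", len(set(native16)))
print("ideal SNR at 8 bits:  %.2f dB" % ideal_snr_db(8))
print("ideal SNR at 16 bits: %.2f dB" % ideal_snr_db(16))
```

The widened samples fit in 16 bits, but every one of them is a multiple of 256, so the signal still has at most 256 distinct levels. Whether that is good enough for VTD in practice is exactly the open question noted above.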

The version of VTD that shipped with some Aptivas could use an Mwave DSP to offload much of the processing, hence lowering the necessary CPU level. Warp 4.0's VTD, however, does not include this support, though Mwave boards are still usable as sound input devices for VTD. Some other sound cards include DSPs, but most are much less powerful than the Mwave DSPs, so it's unlikely they would be able to help much in this respect, even if somebody were to attempt to write VTD support code for them (not a trivial undertaking in itself).

Supported Sound Boards

Boards that have been reported to work well with VTD include the Creative Labs Sound Blaster 16, 32, and AWE32 boards; some ESS-based products; all Gravis boards (with the Manley 1.20 beta drivers; this includes the older GUS boards with 8-bit input); assorted Mwave-based boards; the MediaVision PAS (using a dynamic microphone, or a battery adapter pack with the microphone that comes with Warp 4.0); an Ensoniq Soundscape (but only after changing a jumper setting); and assorted Crystal Semiconductor-based boards.

Boards that have been reported not to work include all 8-bit cards except the GUS boards with 8-bit input; Aztech boards; most Reveal boards; most OPTi-based boards, unless they can be made to work with other drivers; and some Crystal Semiconductor-based products.

Note that some boards appear on both the "good" and "bad" lists. This may be due to different driver revisions, conflicts with other hardware, installation problems, or different microphones. (For information on microphones, check Shure's web page, which has lots of good technical information on this topic.)

VTD and Full-Duplex Sound

Unfortunately, OS/2 cannot play back system sounds while VTD is in use. In principle, a full-duplex sound card (see the Technology section) would permit the use of system sounds while VTD is active, but in practice this doesn't work. I don't understand the details, but apparently DART, which VTD uses, disables output when VTD uses input, so even full-duplex sound boards like the Gravis or Mwave-based products go mute when VTD is in use. I know of no way around this limitation.
Copyright © 1996, 1997 Rod Smith, rodsmith@rodsbooks.com