Audio Recognition - Non-Speech Signal

Audio recognition for non-speech signals, such as music, natural sounds, animal sounds, and human non-speech sounds (coughing, crying, laughing, sneezing, ...), is finding rapidly growing application in many fields, including military, security, environment, and robots/toys.

For example, under a contract with SONY, Japan, CYBIT successfully developed a new technology, audio pattern matching, for automatic audio quality control during mass production of Blu-ray disc players. More recently, CYBIT has also developed a unique (human) non-speech recognition technology for a customer's toy product.
 

Audio Recognition - Speech Signal

Under many hands-busy and eyes-busy conditions, voice command and control using speech recognition provides the best means of human-machine interface. SR is a language-independent speech recognition technology: it can be used with any language because it requires user training (in the user's chosen language) before use. The training is very easy, requiring only ONE utterance for each speech template to be recognized. In most applications, the training is also beneficial, as the machine/device responds only to the owner's voice commands.
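The page does not say which matching method SR uses internally. As an illustration only, the sketch below shows one common way to do speaker-trained, template-based recognition: dynamic time warping (DTW) between the incoming utterance and each stored template. The function names, the feature representation (a list of per-frame feature vectors), and the example values are all assumptions made for this sketch, not part of SR.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Assumption: each utterance is already converted to a list of per-frame
    feature vectors of equal dimension (the feature type is not specified
    by the source).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "features", one training utterance per command.
templates = {
    "go": [(0.0,), (1.0,), (2.0,)],
    "stop": [(2.0,), (1.0,), (0.0,)],
}
spoken = [(0.0,), (0.0,), (1.0,), (2.0,)]  # a slower rendition of "go"
best = recognize(spoken, templates)
```

Because the templates come from the user's own voice, a scheme like this is independent of language, and DTW absorbs differences in speaking rate between the training utterance and the command.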

SR is useful for many hardware and software applications, but with its small hardware resource requirements, it is especially well suited to mobile and portable electronic devices, including game pads, cellphones, PDAs, wireless-controlled robots, and so on.

SR as described below applies to 4-kHz telephone-bandwidth speech input (with an 8-kHz sample rate) for low-cost implementation. For applications that require higher recognition accuracy, SR can easily be extended to wider-bandwidth speech input (such as 7 kHz or higher) for further enhanced recognition performance.

 

FEATURES

  • Applicable to all languages.
  • Accurate recognition of short to long utterances: sentences, phrases, digits, English alphabet (A-Z).
  • Minimal training for new speakers: TWO utterances for each speech template (more training is better but two are sufficient).
  • Flexible vocabulary size depending on applications.
  • Input format: 16 bit/sample linear PCM, 8-kHz sample rate.
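As a rough illustration of the input format listed above, the following sketch decodes raw 16-bit linear PCM bytes into normalized samples and shows why an 8-kHz sample rate corresponds to a 4-kHz speech bandwidth (the Nyquist limit). The function names are hypothetical, and the little-endian byte order is an assumption; the page does not specify byte order or a container format.

```python
import struct

SAMPLE_RATE_HZ = 8000   # from the feature list
BYTES_PER_SAMPLE = 2    # 16 bit/sample linear PCM


def decode_pcm16(raw, little_endian=True):
    """Convert raw 16-bit linear PCM bytes into floats in [-1.0, 1.0).

    Assumption: little-endian byte order unless stated otherwise.
    """
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("byte count is not a multiple of 2")
    count = len(raw) // BYTES_PER_SAMPLE
    fmt = ("<" if little_endian else ">") + str(count) + "h"
    return [s / 32768.0 for s in struct.unpack(fmt, raw)]


def usable_bandwidth_hz(sample_rate_hz=SAMPLE_RATE_HZ):
    """Nyquist limit: the highest frequency representable at this sample rate."""
    return sample_rate_hz / 2


def duration_seconds(raw, sample_rate_hz=SAMPLE_RATE_HZ):
    """Length of a raw mono PCM buffer in seconds."""
    return len(raw) / BYTES_PER_SAMPLE / sample_rate_hz


samples = decode_pcm16(struct.pack("<3h", 0, 16384, -32768))
```

One second of input in this format occupies 16,000 bytes, which gives a sense of the small buffer sizes involved on portable devices.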
 

PERFORMANCE

  • Recognition Accuracy: tested under quiet conditions - digits (99.7%), phrases (99%, size of 20), A-Z (90%).
  • Field Test: tested under driving conditions on highways and local roads - at 35 mph: 98% accuracy for spoken phrases; at 60 mph: 95% accuracy for spoken phrases.
