L&H Game ASDK for Sony PlayStation 2
Sony PlayStation 2 Developers Conference, San Jose, March 2001
L&H Games ASDK Presentation
  • Lernout & Hauspie
      • Technologies
      • Products
  • Speech in Games
  • Games ASDK (our suggestion)
      • Engine Properties
      • Concepts
      • Development Process
          • Grammars
          • Integration
  • Engine: flexible, comprehensive feature-set, targeted
L&H Technologies
L&H Desktop Dictation Products
  • Dragon NaturallySpeaking®
  • L&H Voice Xpress™
  • The world's most popular continuous dictation products
  • SDK available for free download: build your own speech-enabled applications under Windows
Jun 6, 2009
Diverse other Speech Products based on L&H technologies
  • CrimeTalk Reporter ST Microelectronics VX Medicine Xsara (Citroen) Seaman (Vivarium/Sega)
Lernout & Hauspie Speech Products
  • A complete portfolio of Speech and Language technologies
      • Speech Recognition (DNS), Text-to-Speech (RealSpeak), Speaker Verification, Speech and Music coders, Phonetic tools
      • Intelligent Content Management tools (spelling and grammar checkers, topic identifier, summarizer, translators, audio mining)
  • Most complete set of languages
  • Broad range of (application-specific) development kits
  • Optimized development kits for each market segment
  • Speech and Language technologies:
      • Ensure maximum interaction and sense of realism
      • Allow for natural dialog and interaction
  • Speech is the most powerful interface
Speech Recognition for Games
  • Speech as an interface
      • Most natural & intuitive
      • We all know how to use it
      • Use speech & game-pad at the same time
  • Examples
      • Interact with NPCs
      • Voice command and control of complex games
          • Simulation games
          • Realtime strategy & multiplayer RPGs
          • Turn-based games
          • Empire building
      • Input to virtual/animated character
          • Build a personal relationship
      • (moving strategic units and looking for strategic resources)
  • New genre for speech-enabled games?
L&H Games ASDK for PlayStation 2
  • The only vendor of a Speech Recognition SDK for the PS2
  • Two parts
      • Development
      • Runtime
  • Brings code into events and user interface
  • Hides linguistic and phonetic knowledge
  • Lets developers concentrate on program flow and functionality
Types of Speech Recognition
  • Continuous versus Discrete
  • Command-and-Control versus Dictation
  • Vocabulary size
  • Speaker-independent versus Speaker-dependent
  • Phoneme-based versus Word-based
  • Environment (office, telephony, car, …)
  • Language dependency
Basics on the ASR1600 engine
Basic characteristics of the ASR1600:
  • continuous speech recognition engine
  • command-and-control
  • speaker-independent, with the ability to train words
  • phoneme-based
  • noise robust
  • trained for office environment
  • input audio format: 11 kHz 16 bit linear PCM
  • language modules (speaker-independent)
Definition of concepts
  • Grammar
      • list of sentences/words that can be recognized
      • rule-based
          • a rule can be e.g. a list of names or commands
          • a rule can also be more complex, and e.g. define any string of one or more digits
          • the recognizer only considers the start rules
          • one rule may refer to another (local) rule
  • Source grammar
      • source code that describes the rules in a readable format
      • syntax: L&H BNF grammar format
  • Binary grammar
      • compiled source grammar
      • binary format (unreadable) used by the engine
      • can be saved to file
Definition of concepts
  • Context
      • binary grammars cannot be loaded directly on an engine
      • one or more binary grammars can be converted to one context
          • rule resolution
          • all start rules in parallel
          • binary grammars must have been compiled for the same language
      • an engine must have one and only one context loaded
      • context subsets (i.e. rules and/or grammars) can be dynamically activated or deactivated while the context is loaded on the engine
      • a context cannot be saved to a file
Development Process
  • Offline grammar creation
      • Offline development tools for Win32 platform
      • Development of grammars and contexts
      • Testing and tuning
  • Runtime: Implementation / Integration
Offline Context Creation
  • Not part of the compile / debug / build cycle
  • Tested on PC
  • Exported to a format suitable for the GSAPI runtime system
Grammar Specification
  • L&H Backus-Naur format for grammar description
  • Context Information File
      • Specification of events & parameters
      • On a per-context basis
Representation of a Grammar
  • Specified as a list by designer or programmer
  • Think of it as a directed graph
Grammar Simplification
  • When sentences are generated, the tool walks the graph and specifies longer paths in terms of shorter paths:
      faster go ret_faster_at_1 move ret_faster_at_1
L&H BNF Grammar
!start <start>;
<verb>: play | stop;
<object>: music | radio;
<start>: <verb> !optional(<object>);
Recognized sentences: play, stop, play music, stop music, play radio, stop radio
!optional defines an optional expression: the expression can be spoken, but does not have to be.
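The way the start rule expands into the six sentences listed on this slide can be sketched in C. This is only an illustration of the rule semantics, not the L&H grammar compiler; the function name `expand_start` and the fixed 32-byte sentence buffers are assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Alternatives of <verb> and <object>, copied from the slide's grammar. */
static const char *const VERBS[]   = { "play", "stop" };
static const char *const OBJECTS[] = { "music", "radio" };

#define N_VERBS   (sizeof VERBS / sizeof VERBS[0])
#define N_OBJECTS (sizeof OBJECTS / sizeof OBJECTS[0])

/* Hypothetical helper: writes every sentence matched by
 * <start>: <verb> !optional(<object>) into out and returns the count. */
size_t expand_start(char out[][32], size_t max_out)
{
    size_t n = 0;
    for (size_t v = 0; v < N_VERBS; ++v) {
        /* Case 1: the optional <object> is not spoken. */
        if (n < max_out)
            snprintf(out[n++], 32, "%s", VERBS[v]);
        /* Case 2: the optional <object> is spoken. */
        for (size_t o = 0; o < N_OBJECTS; ++o) {
            if (n < max_out)
                snprintf(out[n++], 32, "%s %s", VERBS[v], OBJECTS[o]);
        }
    }
    return n;
}
```

Each verb contributes one sentence without an object and one per object, so two verbs and two objects give 2 × (1 + 2) = 6 sentences.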
L&H BNF Grammar
grammar1.bnf (import rule):
!start <rule> ;
!import <dirs> ;
<rule>: go <dirs> ;
grammar2.bnf (export rule):
!export <dirs> ;
<dirs> : left | right ;
Context: go left, go right
Development Process
  • Offline Context Creation
  • Implementation / Integration
GSAPI Implementation / Integration
  • More like a server than just a library
  • Call gsapi_ExecServer() every vsync (we’re not dropping frames, now are we?)
  • Initialization couldn’t be easier
      • Initialize API
      • Load a Language
      • Open a recognizer
Error = gsapi_Init(cbResult, 0);
Error = gsapi_LanguageLoadBuffer(pLng, 0);
Error = gsapi_EngineOpen(0, &hEngine);
GSAPI Implementation / Integration
  • Specify mode with gsapi_EngineSetMode()
      • GSAPI_MODE_CONTINUOUS: speech recognition is always active (always listening in the background… privacy issues for Q&A)
      • GSAPI_MODE_SINGLE: speech recognition is activated and deactivated (PTT, dialog)
  • In Continuous mode, load and activate a context, and start the recognizer.
  • In Single mode, begin recognition in response to an event (e.g. a button press). The recognizer will automatically stop. The programmer then has the opportunity to set another context, and to enable or disable rules.
      • gsapi_EngineDisableAllWords()
      • gsapi_EngineEnableWords()
      • gsapi_EngineStart()
Engine Operating Modes
  • Record mode
      • Not intended for recognition
      • Wave data can be obtained through notification callback
Result Callback
  • Upon initializing GSAPI, provide a result callback:
GSAPI_ERROR gsapi_Init(GSAPI_CBRESULT pResultCallback, unsigned long u32UserData)
  • When a result is available, the engine calls it:
VOID (* GSAPI_CBRESULT)(GSAPI_HENGINE hEngine, GSAPI_ACTION Action, GSAPI_RESULT* pResult, unsigned long u32UserData)
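The callback-with-user-data pattern described on this slide can be sketched with a small mock. The real GSAPI headers are not shown in the deck, so every type below is a stand-in with an assumed definition, the `mock_` functions are hypothetical replacements for the engine side, and treating 0 as success is an assumption; only the callback parameter list follows the slide.

```c
#include <stddef.h>

/* Stand-in types: ASSUMED definitions, the real ones live in the GSAPI headers. */
typedef int   GSAPI_ERROR;                     /* assumption: 0 means success */
typedef void *GSAPI_HENGINE;
typedef int   GSAPI_ACTION;
typedef struct { int placeholder; } GSAPI_RESULT; /* real fields are unknown */
typedef void (*GSAPI_CBRESULT)(GSAPI_HENGINE hEngine, GSAPI_ACTION Action,
                               GSAPI_RESULT *pResult, unsigned long u32UserData);

/* Mock engine side: remembers the callback and user data, like gsapi_Init
 * is described to do. Hypothetical, for illustration only. */
static GSAPI_CBRESULT s_callback  = NULL;
static unsigned long  s_user_data = 0;

GSAPI_ERROR mock_Init(GSAPI_CBRESULT pResultCallback, unsigned long u32UserData)
{
    if (pResultCallback == NULL)
        return -1;                 /* assumed error code */
    s_callback  = pResultCallback;
    s_user_data = u32UserData;
    return 0;
}

/* Mock of "a result is available": invokes the registered callback. */
void mock_DeliverResult(GSAPI_RESULT *pResult)
{
    if (s_callback != NULL)
        s_callback(NULL, 0, pResult, s_user_data);
}

/* Game side: the user data is used here as a player index. */
#define MAX_PLAYERS 2
static int g_results_per_player[MAX_PLAYERS];

void OnSpeechResult(GSAPI_HENGINE hEngine, GSAPI_ACTION Action,
                    GSAPI_RESULT *pResult, unsigned long u32UserData)
{
    (void)hEngine; (void)Action; (void)pResult;
    if (u32UserData < MAX_PLAYERS)
        g_results_per_player[u32UserData]++;
}

int results_for_player(unsigned long player)
{
    return player < MAX_PLAYERS ? g_results_per_player[player] : 0;
}
```

Passing a small integer as user data avoids casting pointers through `unsigned long`, which is not portable to platforms where pointers are wider than `long`.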
Tips
  • Don’t just translate controller input, create a SUI
  • We can provide state-of-the-art speech technologies, but don’t ignore limitations
  • Plan and be creative
  • Speech has strengths & weaknesses, so design it in from the start!
      • E.g. adding speech to a game that’s done: speech will never seem to be a natural part of the application unless the interface is redesigned so speech parts make more sense.
  • Users don’t want to learn commands by heart. You can make it intuitive.
  • Misrecognitions should not be fatal
  • Smart responses can add to the experience
Engine Parameters
  • Quick preview
  • Demonstrate flexibility
  • Comprehensive set of features
  • Chosen with the game developer in mind, but
  • With many years of experience in the industry to know “what works”
Engine parameters
  • N-best
      • maximum number of possible recognition results that should be returned per utterance
  • Accuracy
      • the maximum number of hypotheses that are simultaneously searched at any time: if the number of hypotheses grows beyond the accuracy parameter, the ones that score worst are removed from the search
      • affects
          • the precision of the recognizer
          • the CPU load and memory consumption
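The pruning behaviour described for the accuracy parameter can be sketched as follows. This is a simplified illustration, not the ASR1600 search: it assumes a higher score is better and sorts the whole hypothesis list for clarity, whereas a real recognizer would prune incrementally during the search. The function name `prune_hypotheses` is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: orders scores from best (highest) to worst. */
static int by_score_desc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Keeps at most `accuracy` hypotheses: sorts scores best-first in place and
 * returns how many survive. Entries past the returned count are the
 * worst-scoring hypotheses and are considered removed from the search. */
size_t prune_hypotheses(double *scores, size_t n, size_t accuracy)
{
    qsort(scores, n, sizeof scores[0], by_score_desc);
    return n < accuracy ? n : accuracy;
}
```

A larger limit keeps more candidates alive, which is why the slide ties this parameter to both precision and CPU and memory cost.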
Engine parameters
  • Rejection
      • the confidence level gives an indication of how confident the engine is that the recognized result was actually spoken, compared to the average speech model
      • the lower this parameter, the more the engine will lean towards the average speech model
      • does not influence the actual recognition results, only the confidence levels
      • the default is set such that, for a confidence level of 5000, the number of false rejections is equal to the number of false acceptances
Engine parameters
[Figure: number of false acceptances (#FAs) and false rejections (#FRs) plotted against the threshold (confidence level), from 0 to 10000 with 5000 marked]
  • False acceptance (FA): a wrong result is accepted
  • False rejection (FR): a correct result is rejected
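The trade-off between false acceptances and false rejections can be made concrete with a small counting routine. This is a sketch under assumptions: the `Sample` and `ErrorCounts` types are hypothetical, and accepting a result when its confidence is greater than or equal to the threshold is an assumed convention that the slides do not state.

```c
#include <stddef.h>

/* Hypothetical labelled test sample: engine confidence (0..10000) and
 * whether the recognized result was actually what the user said. */
typedef struct {
    int confidence;
    int correct;     /* 1 = right result, 0 = wrong result */
} Sample;

typedef struct {
    int false_accepts;  /* wrong results that were accepted  */
    int false_rejects;  /* correct results that were rejected */
} ErrorCounts;

/* Counts FAs and FRs for one threshold.
 * ASSUMPTION: a result is accepted when confidence >= threshold. */
ErrorCounts count_errors(const Sample *samples, size_t n, int threshold)
{
    ErrorCounts counts = { 0, 0 };
    for (size_t i = 0; i < n; ++i) {
        int accepted = samples[i].confidence >= threshold;
        if (accepted && !samples[i].correct)
            counts.false_accepts++;
        if (!accepted && samples[i].correct)
            counts.false_rejects++;
    }
    return counts;
}
```

Raising the threshold trades false acceptances for false rejections; sweeping it over recorded test utterances is one way to pick a value that suits a given game.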
Engine parameters
  • Garbage
      • garbage model (<…>): absorbs speech that is of no importance
      • the garbage parameter modulates the match on the garbage model: the lower the parameter, the more the engine will lean towards the garbage model and the more keywords could be missed
  • Gender selection
      • can be any combination of male, female, child
      • by default, all genders are active: the engine detects the gender of the speaker
      • can only be set if a language (i.e. context) is loaded on the engine
      • all ASR1600 languages only support male, female and child
Engine parameters
  • Speech detection
      • enables or disables the “Voice Activity Detector” (VAD)
      • if the VAD detects speech in the input signal, the engine goes from low CPU mode to full recognition mode
      • disable if the application requires tight control over the starting point of speech
  • Sensitivity
      • minimal energy jump (dB/100) in the input signal that is perceived by the VAD as being speech
      • the higher, the more deaf the engine is
      • the lower, the higher the risk that the engine starts full recognition on background speech
  • Minimum duration for start
      • minimum duration of speech in milliseconds to trigger the VAD
      • prevents the engine from reacting to short, high-energy events
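How sensitivity and minimum duration interact can be sketched with a toy energy-based detector. This is not the ASR1600 VAD: the fixed noise floor, the per-frame energies in dB, and the function name `vad_trigger_frame` are all assumptions chosen to illustrate the two parameters.

```c
#include <stddef.h>

/* Toy voice activity detector (illustrative only).
 * frame_db:         energy of each audio frame in dB
 * noise_floor_db:   assumed constant background level in dB
 * sensitivity:      minimal energy jump in dB/100, as on the slide
 * frame_ms:         duration of one frame in milliseconds
 * min_duration_ms:  speech must last at least this long to trigger
 * Returns the index of the first frame of the run that triggers,
 * or -1 if no run is long enough. */
int vad_trigger_frame(const double *frame_db, size_t n, double noise_floor_db,
                      int sensitivity, int frame_ms, int min_duration_ms)
{
    double jump_db = sensitivity / 100.0;
    size_t run_start = 0;
    int run_frames = 0;

    for (size_t i = 0; i < n; ++i) {
        if (frame_db[i] - noise_floor_db >= jump_db) {
            if (run_frames == 0)
                run_start = i;
            run_frames++;
            if (run_frames * frame_ms >= min_duration_ms)
                return (int)run_start;
        } else {
            run_frames = 0;  /* a short burst ended before triggering */
        }
    }
    return -1;
}
```

With a 30 ms minimum duration, a single loud 10 ms frame such as a click is ignored, while a sustained run of loud frames triggers at its first frame.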
Engine parameters
  • Trailing silence detection
      • enables or disables the trailing silence detector
      • the trailing silence detector triggers the engine to calculate a result
      • disable if the application requires tight control over the end point of speech
  • Reaction time
      • minimum trailing silence in milliseconds
      • the higher it is, the slower the response
      • has to be longer than the duration of a pause between two words
  • Time out
      • indicates the longest time (in seconds) an utterance can take (low CPU mode included)
      • set to zero to disable time-out
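The reaction-time rule can be illustrated with a toy endpoint detector. It assumes the audio has already been classified into speech and non-speech frames; the function name `end_of_utterance_frame` and the frame-flag input are hypothetical and not part of the ASDK.

```c
#include <stddef.h>

/* Toy trailing-silence detector (illustrative only).
 * is_speech:   1 for frames classified as speech, 0 for silence
 * frame_ms:    duration of one frame in milliseconds
 * reaction_ms: minimum trailing silence before a result is calculated
 * Returns the index of the frame at which the utterance is considered
 * finished, or -1 if the trailing silence never gets long enough. */
int end_of_utterance_frame(const int *is_speech, size_t n,
                           int frame_ms, int reaction_ms)
{
    int seen_speech = 0;
    int silence_ms = 0;

    for (size_t i = 0; i < n; ++i) {
        if (is_speech[i]) {
            seen_speech = 1;
            silence_ms = 0;          /* speech resumed, restart the count */
        } else if (seen_speech) {
            silence_ms += frame_ms;
            if (silence_ms >= reaction_ms)
                return (int)i;
        }
    }
    return -1;
}
```

If the reaction time is no longer than the pause between two words, the detector fires inside the utterance and cuts it short, which is why the value has to exceed a typical inter-word pause at the cost of a slower response.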
Engine parameters
  • Continuous mode
      • if disabled: the engine stops automatically after each recognition
      • if enabled: the engine recognizes multiple utterances one after the other, until the application explicitly aborts the recognition process
      • if enabled, trailing silence detection and speech detection are enabled and time-out is disabled
  • Far talk
      • on/off parameter
      • if off, then AGC is to use small increments when changing the volume settings
Engine parameters
  • AGC
      • set to FALSE (0) to disable
      • if enabled: the engine will request the optimal gain
      • if the engine recognizes directly from the WaveIn device:
          • set to TRUE (1) to enable
          • the engine will automatically adjust the gain of the WaveIn device
          • will be reset to 0 if the WaveIn device does not allow AGC (immediately or when calling lhs_asrRecAcqOpen)
      • if the engine acquires audio data from the application:
          • set to the current gain (1-65535) to enable
          • requests are notified through a callback function
