September 25, 2012




AT&T SPEECH API DEEP DIVE
                      Michael Owens (@mko on Twitter, mowens on Github)
                      Jay Lieske ( jay.lieske@att.com, jayatyp on Github)




                                                                                                                                  AT&T Developer Program
   © 2012 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
WHAT IS THE AT&T SPEECH API?




How the AT&T Speech API Works




Powered by AT&T WATSON℠
    • Developed over 20+ years
    • Optimized for different usage scenarios:
      • Web Search
      • Business Search
      • Question & Answer
      • Voicemail-to-Text
      • Short Message (SMS)
      • TV Search/Remote (U-Verse)
      • Generic Speech-to-Text
Simple Speech-to-Text
    • One REST endpoint
    • Accepts audio in WAV or AMR
    • Structured JSON response
       • Text spoken by user
       • Metrics to evaluate recognition quality
    • AT&T Native SDKs for Android and iOS
     handle audio capture and streaming




Apps in the Wild




    AT&T Translator          Speak4it          U-Verse Easy Remote



GETTING STARTED WITH THE AT&T SPEECH API




Sign Up for API Access
    • j.mp/ATTDevSignUp
    • Free API Access for
     DevLab Attendees
    • Detailed Instructions in
     your Attendee Packet
    • Sign up with code
     “APILAB12”
    • AT&T Staff is on hand to
     answer questions and
     help get you set up

Before You Code
    • Get your API Keys from Developer portal:
      • Client ID (“API Key” on the AT&T Developer Portal)
      • Client Secret (“Secret Key” on the AT&T Developer Portal)
    • OAuth 2.0 client_credentials grant type
    • OAuth 2.0 access_token
    • Audio File Types:
      • AMR: narrowband, 12.2 kbits/s, 8 kHz sampling
      • WAV: 16 bit PCM WAV, single channel, 8 kHz sampling
    • Audio File Length:
      • Voicemail: 4 minutes or less
      • Other: 1 minute or less
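
    The WAV requirements above can be checked locally before a file is sent. The following is a minimal sketch, assuming a canonical RIFF header in which the "fmt " chunk starts at byte 12. The helper name checkWavHeader is hypothetical and is not part of any AT&T SDK.

```javascript
// Hypothetical helper: checks a WAV buffer against the format listed above
// (16-bit PCM, single channel, 8 kHz sampling).
// Assumption: canonical 44-byte header with the "fmt " chunk at byte 12.
function checkWavHeader(buf) {
  if (buf.length < 44) {
    return { ok: false, reason: 'header too short' };
  }
  if (buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ') {
    return { ok: false, reason: 'not a canonical RIFF/WAVE file' };
  }
  const format = buf.readUInt16LE(20);        // 1 means uncompressed PCM
  const channels = buf.readUInt16LE(22);
  const sampleRate = buf.readUInt32LE(24);
  const bitsPerSample = buf.readUInt16LE(34);

  if (format !== 1) return { ok: false, reason: 'not PCM' };
  if (channels !== 1) return { ok: false, reason: 'not single channel' };
  if (sampleRate !== 8000) return { ok: false, reason: 'not 8 kHz' };
  if (bitsPerSample !== 16) return { ok: false, reason: 'not 16 bit' };
  return { ok: true };
}
```

    Files whose header carries extra chunks before "fmt " would be rejected by this sketch even if the audio itself is acceptable.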


Step 1: Connect via OAuth
    Request Method:   POST
    Request URL:      https://api.att.com/oauth/token
    Request Headers:  Content-Type: application/x-www-form-urlencoded
    Request Body:     client_id=ATT_API_CLIENT_ID
                      &client_secret=ATT_API_CLIENT_SECRET
                      &grant_type=client_credentials
                      &scope=SPEECH

    Response Body:    {
                        "access_token": "xxyz123"
                      }
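
    The token request above could be assembled in Node.js roughly as follows. This is a sketch only: the function names and the shape of the returned object are hypothetical, and the endpoint is assumed to be https://api.att.com/oauth/token.

```javascript
// Hypothetical helper: builds the pieces of the Step 1 token request.
// It does not send anything; pass the result to the HTTP client of your choice.
function buildTokenRequest(clientId, clientSecret) {
  // URLSearchParams form-encodes the fields in insertion order.
  const body = new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'client_credentials',
    scope: 'SPEECH'
  }).toString();

  return {
    method: 'POST',
    url: 'https://api.att.com/oauth/token', // assumed endpoint
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(body)
    },
    body: body
  };
}

// Hypothetical helper: extracts the token used in Step 2 from the JSON response.
function parseTokenResponse(json) {
  const token = JSON.parse(json).access_token;
  if (!token) {
    throw new Error('No access_token in response');
  }
  return token;
}
```

    Keeping the request construction separate from the network call makes it easy to inspect exactly what will be sent before any credentials leave the machine.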




Step 2: POST Audio to AT&T (Non-Streaming HTTP Request)
    Request Method:   POST
    Request URL:      https://api.att.com/rest/1/SpeechToText
    Request Headers:  Accept: application/json
                      Authorization: Bearer xxyz123
                      Content-Type: audio/wav
                      Content-Length: 1534
                      X-SpeechContext: BusinessSearch
    Request Body:     AUDIO_BINARY_DATA

    Note: The audio binary data goes directly in the POST body,
    not in a MIME attachment.
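
    As an illustration, the headers for this non-streaming request could be built from an in-memory audio buffer like this. The function name buildSpeechRequest is hypothetical; the option names (hostname, path, headers) follow what Node's https.request accepts.

```javascript
// Hypothetical helper: builds request options for a non-streaming
// SpeechToText call. `audio` is a Buffer holding the whole WAV file.
function buildSpeechRequest(accessToken, audio, speechContext) {
  return {
    method: 'POST',
    hostname: 'api.att.com',
    path: '/rest/1/SpeechToText',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': 'audio/wav',
      // The raw audio bytes are the entire body, so the length is the
      // buffer length (no multipart or MIME wrapping).
      'Content-Length': audio.length,
      'X-SpeechContext': speechContext
    }
  };
}
```

    The options object can then be handed to https.request(options, callback), with the audio sent by calling end(audio) on the returned request.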


Step 2: POST Audio to AT&T (Streaming HTTP Request)
    Request Method:   POST
    Request URL:      https://api.att.com/rest/1/SpeechToText
    Request Headers:  Accept: application/json
                      Authorization: Bearer xxyz123
                      Content-Type: audio/amr
                      Transfer-Encoding: chunked
                      X-SpeechContext: QuestionAndAnswer
    Request Body:     200
                      AUDIO_BINARY_DATA_CHUNK
                      200
                      AUDIO_BINARY_DATA_CHUNK
                      0

    Note: Numbers are the recommended chunk size in hexadecimal format.
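
    The "200" in the request body is hexadecimal, so each recommended chunk is 512 bytes. To make the wire format concrete, here is a small sketch that frames a buffer by hand using standard HTTP/1.1 chunked transfer coding. The function name frameChunked is hypothetical, and in practice Node's HTTP client applies this framing itself when a request is written without a Content-Length.

```javascript
// Hypothetical helper: frames audio as an HTTP/1.1 chunked body.
// Each chunk is preceded by its size in hexadecimal plus CRLF and followed
// by CRLF; a zero-size chunk terminates the body.
function frameChunked(audio, chunkSize) {
  const parts = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    const chunk = audio.subarray(offset, offset + chunkSize);
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii'));
    parts.push(chunk);
    parts.push(Buffer.from('\r\n', 'ascii'));
  }
  parts.push(Buffer.from('0\r\n\r\n', 'ascii'));
  return Buffer.concat(parts);
}

// Frame 600 bytes of placeholder audio using the recommended 0x200 chunk size.
const framed = frameChunked(Buffer.alloc(600, 1), 0x200);
```

    With 600 bytes of input, the first chunk is announced as "200" and the final, shorter chunk carries its own smaller hexadecimal size.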
AT&T SPEECH API EXAMPLE APPLICATION
    Download the Source:
    https://github.com/attdevsupport/2012DevLabExamples




Transcription in Three Steps
    1. Capture Audio Input
    Capturing audio input differs from platform to platform.
    In our Basic Example, we use a small Adobe Flex app to access the mic via Flash, capture the audio in one of the two accepted formats, then save that newly created audio file to disk on the server.
    In our Speech Labs, we will look at the methods by which you can capture and stream audio directly to the Speech API.

    2. POST Audio to AT&T
    Once the audio input has been captured, we send the compatible audio file from our server to the Speech API using a simple POST.
    In our Basic Example, we use a small Node.js module called “Watson.js” (NPM: “watson-js”) to authenticate with the Speech API via OAuth and then POST the audio file.
    In our Speech Labs, we will do this on iOS, Android, and Web.

    3. Use AT&T API Response
    The AT&T API sends back an easy-to-parse JSON object with the interpreted text.
    In our Basic Example, we output this to the user’s screen pretty printed and syntax highlighted, but you could do much more.
    In our Speech Labs, we will look at other ways to use this data, like searching for businesses on Foursquare.




Watson.js
    Node.js API Wrapper for the AT&T
    Speech API

     GitHub: http://github.com/mowens/watson-js/
     NPM: https://npmjs.org/package/watson-js




Using Watson.js
    1. Require API Wrapper
          var WatsonClient = require('watson-js');

    2. Set API Client Options
          var options = {
              client_id: ATT_API_CLIENT_ID,
              client_secret: ATT_API_CLIENT_SECRET,
              access_token: ACCESS_TOKEN,
              scope: "SPEECH",
              context: "Generic",
              access_token_url: "https://api.att.com/oauth/token",
              api_domain: "api.att.com"
           };

    3. Instantiate New API Client
          var Watson = new WatsonClient.Watson(options);

The Methods of Watson.js
    Watson.getAccessToken(callback)
    Requests a new OAuth Access Token using the Client Credentials
    grant type and passes the returned Access Token to the
    callback function.


    Watson.speechToText(speechFile, accessToken, callback)
    Pipes a speech file (given as an absolute file location) to the
    AT&T Speech API using the given access token. The API
    response is passed to the callback function as parsed JSON.
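
    The two methods are typically chained: fetch a token, then transcribe. The sketch below shows one way to do that. The helper name transcribe is hypothetical, and the (err, accessToken) callback shape for getAccessToken is an assumption based on Node.js convention rather than something stated on the slide.

```javascript
// Hypothetical helper: `watson` is expected to be a client created with
// new WatsonClient.Watson(options).
// Assumption: getAccessToken calls back with (err, accessToken).
function transcribe(watson, speechFile, callback) {
  watson.getAccessToken(function (err, accessToken) {
    if (err) {
      // Could not obtain a token, so there is nothing to send.
      callback(err);
      return;
    }
    // speechToText calls back with (err, reply), where reply is parsed JSON.
    watson.speechToText(speechFile, accessToken, callback);
  });
}
```

    Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before real credentials are available.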



AT&T SPEECH API EXAMPLE APP CODE WALKTHROUGH
    Using the AT&T Speech API to convert
    generic audio to text in a web browser.
    example-basic in the examples repo




Frameworks & Requirements:
    Server-side:
    • Node.js:                                  JavaScript platform for building fast, scalable network apps
    • FS:                                       Node.js File System module
    • Express:                                  Minimal web application framework for Node.js
    • Optimist:                                 Lightweight option parsing module for Node.js
    • HBS:                                      Express View Engine wrapper for Handlebars
    • Watson.js:                                Simple API Wrapper for AT&T Speech API

    Client-side:
    • jQuery:                                   The gold standard of client-side JavaScript libraries
    • swfobject:                                JavaScript to make embedding Flash objects easier
    • Bootstrap:                                Twitter’s CSS framework for quickly developing web apps


Capture Audio Input
    recorder.swf:
            Adobe Flex app that accesses the user’s microphone and emits events to JS
    recorder.js:
            JavaScript interface to receive events, update UI, and POST file to Node.js
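    Node.js upload script:
            var fs = require('fs');

            // Copy the uploaded temp file to its destination
            function cp(source, destination, callback) {
              fs.readFile(source, function(err, buf) {
                if (err) { callback(err); return; }
                fs.writeFile(destination, buf, callback);
              });
            }

            // Receive the recorded audio file from recorder.js
            app.post('/upload', function(req, res) {
              var file = req.files.upload_file.filename;
              cp(file.path, __dirname + file.name, function(err) {
                // Pass any copy errors back to client-side JS
                if (err) { res.send({ error: err }); return; }
                res.send({ saved: 'saved' });
              });
            });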

POST Audio to AT&T
    AJAX Request via POST from client side to Node.js
    // Receive an AJAX POST from client-side JavaScript
    app.post('/speechToText', function(req, res) {

      // Pass the audio file and access token to AT&T Speech API
      Watson.speechToText(__dirname + '/public/audio/audio.wav',
      this.access_token, function(err, reply) {

           // Pass any errors associated with API call to client-side JS
           if(err) { res.send({ error: err }); return; }

           // Return the parsed JSON to client-side JavaScript
           res.send(reply);
           return;

      });

    });


Use Speech API Response
    Example API Response, returned from call using Content-Type of ‘application/json’:

    {
      "Recognition": {
        "ResponseId": "74a964bf2fe",
        "NBest": [ {
          "WordScores": [1, 0.75, 1, 0.75],
          "Confidence": 0.75,
          "Grade": "accept",
          "ResultText": "This is a test.",
          "Words": ["This", "is", "a", "test."],
          "LanguageId": "en-us",
          "Hypothesis": "This is a test."
        } ]
      }
    }

    Response Parameter    What the Response Parameter Means
    Recognition           Body object for the AT&T Speech API Response
    ResponseId            Unique Identifier for a specific API call
    NBest                 Array of hypothesis objects (possible transcriptions of audio data).
    ResultText            Plain-text, cleaned up representation of the Hypothesis. This should be used when displaying the text to users.
    Confidence            Confidence score for the overall Hypothesis. Scored on a scale from 0 (not confident) to 1.0 (very confident)
    Grade                 Recommended action to take with the current Hypothesis: accept, reject, or confirm
    Words                 Array of the individual words. Confidence scores for each word are available in the WordScores array.
    WordScores            Array of individual confidence scores for each word in the ResultText parameter. Corresponds to Words array.
    LanguageId            Representation of the response language. Supports English & Spanish in Generic; English-only in other contexts.
    Hypothesis            The raw transcription of the audio that was interpreted.
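
    One way to act on these fields is sketched below. The function name interpretReply, the 0.8 word-score threshold, and the choice to treat the first NBest entry as the preferred hypothesis are all assumptions made for illustration.

```javascript
// Hypothetical helper: reduces a parsed Speech API reply to the pieces an
// app usually needs.
// Assumptions: the first NBest entry is the preferred hypothesis, and 0.8 is
// an arbitrary cut-off for flagging uncertain words.
function interpretReply(reply, minWordScore = 0.8) {
  const nbest = (reply && reply.Recognition && reply.Recognition.NBest) || [];
  if (nbest.length === 0) {
    return { action: 'reject', text: '', uncertainWords: [] };
  }
  const best = nbest[0];
  const scores = best.WordScores || [];
  // Words and WordScores are parallel arrays, so index i refers to the same word.
  const uncertainWords = (best.Words || []).filter(function (word, i) {
    return scores[i] < minWordScore;
  });
  return {
    action: best.Grade,      // accept, reject, or confirm
    text: best.ResultText,   // the cleaned-up text meant for display
    uncertainWords: uncertainWords
  };
}

const sample = {
  Recognition: {
    ResponseId: '74a964bf2fe',
    NBest: [{
      WordScores: [1, 0.75, 1, 0.75],
      Confidence: 0.75,
      Grade: 'accept',
      ResultText: 'This is a test.',
      Words: ['This', 'is', 'a', 'test.'],
      LanguageId: 'en-us',
      Hypothesis: 'This is a test.'
    }]
  }
};
const summary = interpretReply(sample);
```

    An app might display the text directly when the action is accept, and ask the user to verify it when the action is confirm.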


Up Next:




                                     Michael Fitzpatrick

Up Next:




                                                          Jason Goecke
                                                           Adam Kalsey
ADVANCED EXAMPLES
    What can you do with Speech-to-text?
     You could…
     • Make your mobile or web application accessible with voice commands
     • Post tweets using voice commands in a simple Twitter app
     • Add on-the-fly transcripts while recording in a podcasting app
     • Add captioning to videos hosted on your website automatically
     • Create real-time closed captions of a conference speaker’s presentation
     • Search for nearby places to check in at on Foursquare




Speech Labs
    We’re now going to break out into three clusters, each focusing on a
    different technology stack. Work independently or with a partner!

    Web (Flex + Node.js)
    In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or you can start from a boilerplate app that uses Foursquare to search for locations and allow you to check-in from your web browser!

    iOS (Objective-C)
    In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.

    Android (Java)
    In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.




September 25, 2012




THANKS! ANY QUESTIONS?
                      Michael Owens (@mko on Twitter, mowens on Github)
                      Jay Lieske ( jay.lieske@att.com, jayatyp on Github)




                                                                                                                                  AT&T Developer Program
   ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property.

More Related Content

PDF
OAuth 2.0 Updates #technight in Osaka
PDF
Incorporating OAuth: How to integrate OAuth into your mobile app
PDF
OpenID Connect via WebIntents
PDF
OAuth 2.0 & OpenID Connect @ OpenSource Conference 2011 Tokyo #osc11tk
PDF
OAuth 1.0
PPSX
SMS Passcode - Vcw Sales Presentation
PDF
Nordic APIs - Building a Secure API
PDF
AT&T API Platform
OAuth 2.0 Updates #technight in Osaka
Incorporating OAuth: How to integrate OAuth into your mobile app
OpenID Connect via WebIntents
OAuth 2.0 & OpenID Connect @ OpenSource Conference 2011 Tokyo #osc11tk
OAuth 1.0
SMS Passcode - Vcw Sales Presentation
Nordic APIs - Building a Secure API
AT&T API Platform

Similar to AT&T 2012 DevLab Speech API Deep Dive (20)

PPTX
Multi-Network Location & SMS APIs
PDF
AT&T Enhanced WebRTC API Overview
PDF
"Reinventing the Dialplan" slides from Twilio's Astricon 2009 talk
PDF
The Nexmo Voice API - AAT 2016
PPTX
Codestrong 2012 breakout session at&t api platform and trends
PDF
Mobicents Summit 2012 - Jonas Borjesson - Introduction to Twilio
PPT
Enterprise Global Messaging
PDF
Open Source Telephony Disruptive Solutions
PDF
Tring Me Overview Dec08
PDF
Infiltrating Telecoms Using Ruby
PDF
Twilio Voice Applications with Amazon AWS S3 and EC2
PDF
Customized IVR Implementation Using Voicexml on SIP (Voip) Communication Plat...
KEY
Teleku eComm Slides
PDF
Building A Great API - Evan Cooke, Cloudstock, December 2010
PDF
Exploring Mobile Apps Categories and Successful Mobile VAS and Multimedia App...
KEY
Build travel apps the easy way
PDF
Telephony Through Ruby Colored Lenses
PDF
Contextual Voice/Communications as an App or App Feature (on Android)
PDF
VoiceCon: Developing Voice Apps Using Mashups and SOA
PPTX
Introduction to IP telephony & VoIP
Multi-Network Location & SMS APIs
AT&T Enhanced WebRTC API Overview
"Reinventing the Dialplan" slides from Twilio's Astricon 2009 talk
The Nexmo Voice API - AAT 2016
Codestrong 2012 breakout session at&t api platform and trends
Mobicents Summit 2012 - Jonas Borjesson - Introduction to Twilio
Enterprise Global Messaging
Open Source Telephony Disruptive Solutions
Tring Me Overview Dec08
Infiltrating Telecoms Using Ruby
Twilio Voice Applications with Amazon AWS S3 and EC2
Customized IVR Implementation Using Voicexml on SIP (Voip) Communication Plat...
Teleku eComm Slides
Building A Great API - Evan Cooke, Cloudstock, December 2010
Exploring Mobile Apps Categories and Successful Mobile VAS and Multimedia App...
Build travel apps the easy way
Telephony Through Ruby Colored Lenses
Contextual Voice/Communications as an App or App Feature (on Android)
VoiceCon: Developing Voice Apps Using Mashups and SOA
Introduction to IP telephony & VoIP
Ad

AT&T 2012 DevLab Speech API Deep Dive

  • 2. September 25, 2012 AT&T SPEECH API DEEP DIVE Michael Owens (@mko on Twitter, mowens on Github) Jay Lieske ( jay.lieske@att.com, jayatyp on Github) AT&T Developer Program ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property.
  • 3. WHAT IS THE AT&T SPEECH API? 2 ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 4. How the AT&T Speech API Works 2 ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 5. Powered by AT&T WATSON℠ • Developed 20+ years • Optimized for different usage scenarios: • Web Search • Business Search • Question & Answer • Voicemail-to-Text • Short Message (SMS) • TV Search/Remote (U-Verse) • Generic Speech-to-Text 2 ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 6. Simple Speech-to-Text • One REST endpoint • Accepts audio in WAV or AMR • Structured JSON response • Text spoken by user • Metrics to evaluate recognition quality • AT&T Native SDKs for Android and iOS handle audio capture and streaming 2 ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 7. Apps in the Wild AT&T-Translator Speak4it U4Verse-Easy-Remote 2 ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 8. GETTING STARTED WITH THE AT&T SPEECH API 3 ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 9. Sign Up for API Access • j.mp/ATTDevSignUp • Free API Access for DevLab Attendees • Detailed Instructions in your Attendee Packet • Sign up with code “APILAB12” • AT&T Staff is on hand to answer questions and help get you set up 2 ©"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 10. Before You Code
    • Get your API Keys from the Developer Portal:
      • Client ID (“API Key” on the AT&T Developer Portal)
      • Client Secret (“Secret Key” on the AT&T Developer Portal)
    • OAuth 2.0 client_credentials grant type
    • OAuth 2.0 access_token
    • Audio File Types:
      • AMR: narrowband, 12.2 kbit/s, 8 kHz sampling
      • WAV: 16-bit PCM WAV, single channel, 8 kHz sampling
    • Audio File Length:
      • Voicemail: 4 minutes or less
      • Other: 1 minute or less
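    Checking a WAV file against these limits before uploading can save a failed API call. The following is a minimal sketch, not part of the original deck: `checkWavHeader` and `makeWavHeader` are hypothetical helper names, and the check assumes the common 44-byte RIFF/WAVE header layout in which the "fmt " chunk comes first and the "data" chunk follows immediately. Real files may carry extra chunks and would need a fuller parser.

```javascript
'use strict';

// Limits taken from the "Before You Code" slide: 16-bit PCM, mono, 8 kHz.
const REQUIRED = { format: 1, channels: 1, sampleRate: 8000, bitsPerSample: 16 };

// Hypothetical helper: validates a canonical 44-byte WAV header.
// maxSeconds is 60 for most contexts and 240 for voicemail, per the slide.
function checkWavHeader(buf, maxSeconds) {
  if (buf.length < 44) {
    return { ok: false, reason: 'header too short' };
  }
  if (buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ' ||
      buf.toString('ascii', 36, 40) !== 'data') {
    return { ok: false, reason: 'not a canonical 44-byte RIFF/WAVE header' };
  }
  const found = {
    format: buf.readUInt16LE(20),        // 1 means uncompressed PCM
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34)
  };
  for (const key of Object.keys(REQUIRED)) {
    if (found[key] !== REQUIRED[key]) {
      return { ok: false, reason: key + ' is ' + found[key] + ', expected ' + REQUIRED[key] };
    }
  }
  const byteRate = buf.readUInt32LE(28);
  if (byteRate === 0) {
    return { ok: false, reason: 'byte rate is 0' };
  }
  const seconds = buf.readUInt32LE(40) / byteRate; // data bytes / bytes per second
  if (seconds > maxSeconds) {
    return { ok: false, reason: 'audio is too long' };
  }
  return { ok: true, seconds: seconds };
}

// Hypothetical helper for the demo: builds a header without any audio data.
function makeWavHeader(sampleRate, channels, bitsPerSample, dataBytes) {
  const h = Buffer.alloc(44);
  h.write('RIFF', 0, 'ascii');
  h.writeUInt32LE(36 + dataBytes, 4);
  h.write('WAVE', 8, 'ascii');
  h.write('fmt ', 12, 'ascii');
  h.writeUInt32LE(16, 16);                                     // size of the fmt chunk
  h.writeUInt16LE(1, 20);                                      // PCM
  h.writeUInt16LE(channels, 22);
  h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(sampleRate * channels * bitsPerSample / 8, 28);
  h.writeUInt16LE(channels * bitsPerSample / 8, 32);
  h.write('data', 36, 'ascii');
  h.writeUInt16LE(bitsPerSample, 34);
  h.writeUInt32LE(dataBytes, 40);
  return h;
}

console.log(checkWavHeader(makeWavHeader(8000, 1, 16, 160000), 60));
console.log(checkWavHeader(makeWavHeader(44100, 2, 16, 160000), 60));
```

    In a real app the buffer would come from the first 44 bytes of the recorded file rather than from `makeWavHeader`.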
  • 11. Step 1: Connect via OAuth
    Request Method: POST
    Request URL: https://api.att.com/oauth/token
    Request Headers:
      Content-Type: application/x-www-form-urlencoded
    Request Body:
      client_id=ATT_API_CLIENT_ID&client_secret=ATT_API_CLIENT_SECRET&grant_type=client_credentials&scope=SPEECH
    Response Body:
      { "access_token": "xxyz123" }
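    The token request above can be assembled with Node's built-in `querystring` module. This is a sketch rather than the presenters' code: `buildTokenRequest` is a hypothetical name, and the host `api.att.com` is taken from the `api_domain` value shown on slide 17. The function only builds the request description, so it can be inspected without touching the network.

```javascript
'use strict';
const querystring = require('querystring');

// Hypothetical helper: builds the options and form-encoded body for the
// OAuth 2.0 client_credentials request described on the slide.
function buildTokenRequest(clientId, clientSecret) {
  const body = querystring.stringify({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'client_credentials',
    scope: 'SPEECH'
  });
  return {
    options: {
      method: 'POST',
      hostname: 'api.att.com',      // assumption: same host as api_domain on slide 17
      path: '/oauth/token',
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

const tokenRequest = buildTokenRequest('my-client-id', 'my-client-secret');
console.log(tokenRequest.body);
```

    To send it, pass `tokenRequest.options` to `https.request`, write `tokenRequest.body` with `req.end(...)`, and read `access_token` from the parsed JSON response.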
  • 12. Step 2: POST Audio to AT&T (Non-Streaming HTTP Request)
    Request Method: POST
    Request URL: https://api.att.com/rest/1/SpeechToText
    Request Headers:
      Accept: application/json
      Authorization: Bearer xxyz123
      Content-Type: audio/wav
      Content-Length: 1534
      X-SpeechContext: BusinessSearch
    Request Body:
      AUDIO_BINARY_DATA
    Note: The Audio Binary Data goes directly in the POST Body, not in a MIME Attachment.
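    A non-streaming request like this one can be made with Node's built-in `https` module. The sketch below is an illustration under assumptions, not the deck's implementation: `buildSpeechRequest` and `sendAudio` are hypothetical names, and the host `api.att.com` is taken from the `api_domain` value on slide 17. The audio buffer is written straight into the request body, as the slide's note requires.

```javascript
'use strict';
const https = require('https');

// Hypothetical helper: builds the request options shown on the slide.
function buildSpeechRequest(accessToken, audio, contentType, speechContext) {
  return {
    method: 'POST',
    hostname: 'api.att.com',          // assumption: same host as api_domain on slide 17
    path: '/rest/1/SpeechToText',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': contentType,    // 'audio/wav' or 'audio/amr'
      'Content-Length': audio.length, // byte length of the raw audio
      'X-SpeechContext': speechContext
    }
  };
}

// Hypothetical helper: sends the raw audio bytes as the POST body and
// hands the parsed JSON reply to callback(err, reply).
function sendAudio(accessToken, audio, contentType, speechContext, callback) {
  const options = buildSpeechRequest(accessToken, audio, contentType, speechContext);
  const req = https.request(options, function (res) {
    const parts = [];
    res.on('data', function (part) { parts.push(part); });
    res.on('end', function () {
      let reply;
      try {
        reply = JSON.parse(Buffer.concat(parts).toString('utf8'));
      } catch (parseError) {
        callback(parseError);
        return;
      }
      callback(null, reply);
    });
  });
  req.on('error', callback);
  req.end(audio); // no multipart wrapper: the body is the audio itself
}

console.log(buildSpeechRequest('xxyz123', Buffer.alloc(1534), 'audio/wav', 'BusinessSearch').headers);
```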
  • 13. Step 2: POST Audio to AT&T (Streaming HTTP Request)
    Request Method: POST
    Request URL: https://api.att.com/rest/1/SpeechToText
    Request Headers:
      Accept: application/json
      Authorization: Bearer xxyz123
      Content-Type: audio/amr
      Transfer-Encoding: chunked
      X-SpeechContext: QuestionAndAnswer
    Request Body:
      200
      AUDIO_BINARY_DATA_CHUNK
      200
      AUDIO_BINARY_DATA_CHUNK
      0
    Note: Numbers are the recommended chunk size in hexadecimal format (200 hex = 512 bytes). The final 0 marks the end of the body.
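    The chunked body on this slide follows standard HTTP/1.1 chunked transfer encoding: each chunk is preceded by its size in hexadecimal, and a zero-length chunk ends the body. The sketch below only illustrates that wire format; `frameChunked` is a hypothetical name. In practice Node's `http`/`https` client applies this framing itself when you call `req.write(chunk)` without setting a Content-Length header, so you would not normally build it by hand.

```javascript
'use strict';

// Hypothetical helper: frames a buffer the way "Transfer-Encoding: chunked"
// puts it on the wire. The default chunk size is the slide's 0x200 (512 bytes).
function frameChunked(audio, chunkSize) {
  const size = chunkSize || 0x200;
  const parts = [];
  for (let offset = 0; offset < audio.length; offset += size) {
    const chunk = audio.slice(offset, offset + size);
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii')); // hex size line
    parts.push(chunk);                                                    // raw audio bytes
    parts.push(Buffer.from('\r\n', 'ascii'));                             // chunk terminator
  }
  parts.push(Buffer.from('0\r\n\r\n', 'ascii')); // zero-length chunk ends the body
  return Buffer.concat(parts);
}

// 515 bytes of stand-in data: one full 512-byte chunk plus a 3-byte chunk.
const framed = frameChunked(Buffer.alloc(0x200 + 3, 0x41));
console.log(framed.length);
```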
  • 14. AT&T SPEECH API EXAMPLE APPLICATION
    Download the Source: https://github.com/attdevsupport/2012DevLabExamples
  • 15. Transcription in Three Steps
    1. Capture Audio Input
      Capturing audio input differs from platform to platform.
      In our Basic Example, we use a small Adobe Flex app to access the mic via Flash, capture the audio in one of the two accepted formats, then save that newly created audio file to disk on the server.
      In our Speech Labs, we will look at the methods by which you can capture and stream audio directly to the Speech API.
    2. POST Audio to AT&T
      Once the audio input has been captured, we send the compatible audio file from our server to the Speech API using a simple POST.
      In our Basic Example, we use a small Node.js module called “Watson.js” (NPM: “watson-js”) to OAuth to the Speech API and then POST the audio file.
      In our Speech Labs, we will do this on iOS, Android, and Web.
    3. Use AT&T API Response
      The AT&T API sends back a very easy-to-parse JSON object with the interpreted text.
      In our Basic Example, we output this to the user’s screen pretty-printed and syntax-highlighted, but you could do much more.
      In our Speech Labs, we will look at other ways to use this data, like searching for businesses on Foursquare.
  • 16. Watson.js
    Node.js API Wrapper for the AT&T Speech API
    GitHub: http://github.com/mowens/watson-js/
    NPM: https://npmjs.org/package/watson-js
  • 17. Using Watson.js
    1. Require API Wrapper
      var WatsonClient = require('watson-js');
    2. Set API Client Options
      var options = {
        client_id: ATT_API_CLIENT_ID,
        client_secret: ATT_API_CLIENT_SECRET,
        access_token: ACCESS_TOKEN,
        scope: "SPEECH",
        context: "Generic",
        access_token_url: "https://api.att.com/oauth/token",
        api_domain: "api.att.com"
      };
    3. Instantiate New API Client
      var Watson = new WatsonClient.Watson(options);
  • 18. The Methods of Watson.js
    • Watson.getAccessToken(callback)
      Requests a new OAuth Access Token using the Client Credentials grant type and passes the returned Access Token to the callback function.
    • Watson.speechToText(speechFile, accessToken, callback)
      Pipes a speech file (given as an absolute file location) to the AT&T Speech API using the given access token. The API response is passed to the callback function as parsed JSON.
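    The two methods are typically chained: fetch a token, then transcribe. The sketch below shows that flow against any object exposing the two signatures from the slide. Two things are assumptions: that `getAccessToken` calls back with `(err, accessToken)` (the slide only says the token is passed to the callback), and the names `transcribe` and `stubWatson`, which are hypothetical. `stubWatson` is a stand-in used so the example runs without credentials or network access; it is not the real watson-js client.

```javascript
'use strict';

// Hypothetical helper: requests a token, then transcribes one file.
// Assumes getAccessToken calls back with (err, accessToken).
function transcribe(watson, speechFile, callback) {
  watson.getAccessToken(function (err, accessToken) {
    if (err) {
      callback(err);
      return;
    }
    watson.speechToText(speechFile, accessToken, callback);
  });
}

// Stand-in for the real client, for demonstration only.
const stubWatson = {
  getAccessToken: function (callback) {
    callback(null, 'xxyz123');
  },
  speechToText: function (speechFile, accessToken, callback) {
    callback(null, {
      Recognition: { NBest: [{ ResultText: 'This is a test.', Grade: 'accept' }] },
      requestedWith: { speechFile: speechFile, accessToken: accessToken }
    });
  }
};

transcribe(stubWatson, '/tmp/audio.wav', function (err, reply) {
  if (err) {
    console.error(err);
    return;
  }
  console.log(reply.Recognition.NBest[0].ResultText);
});
```

    With the real module, the `Watson` instance created on the previous slide would take the place of `stubWatson`.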
  • 19. AT&T SPEECH API EXAMPLE APP CODE WALKTHROUGH
    Using the AT&T Speech API to convert generic audio to text in a web browser.
    example-basic in the examples repo
  • 20. Frameworks & Requirements:
    Server-side:
      • Node.js: JavaScript platform for building fast, scalable network apps
      • FS: Node.js File System module
      • Express: Minimal web application framework for Node.js
      • Optimist: Lightweight option parsing module for Node.js
      • HBS: Express View Engine wrapper for Handlebars
      • Watson.js: Simple API Wrapper for AT&T Speech API
    Client-side:
      • jQuery: The gold standard of client-side JavaScript libraries
      • swfobject: JavaScript to make embedding Flash objects easier
      • Bootstrap: Twitter’s CSS framework for quickly developing web apps
  • 21. Capture Audio Input
    • recorder.swf: Adobe Flex app that accesses the user’s microphone and emits events to JS
    • recorder.js: JavaScript interface to receive events, update UI, and POST file to Node.js
    • Node.js upload script:
      // Assumes fs = require('fs') and an Express app are already set up
      // Copy the uploaded temp file to its destination
      function cp(source, destination, callback) {
        fs.readFile(source, function(err, buf) {
          // Stop here if the upload could not be read
          if (err) { callback(err); return; }
          fs.writeFile(destination, buf, callback);
        });
      }
      app.post('/upload', function(req, res) {
        cp(req.files.upload_file.filename.path,
           __dirname + '/' + req.files.upload_file.filename.name,
           function(err) {
             // Report a failed copy instead of always answering "saved"
             if (err) { res.send({ error: err }); return; }
             res.send({ saved: 'saved' });
           });
      });
  • 22. POST Audio to AT&T
    AJAX Request via POST from client side to Node.js
      // Receive an AJAX POST from client-side JavaScript
      app.post('/speechToText', function(req, res) {
        // Pass the audio file and access token to AT&T Speech API
        Watson.speechToText(__dirname + '/public/audio/audio.wav', this.access_token, function(err, reply) {
          // Pass any errors associated with API call to client-side JS
          if(err) { res.send({ error: err }); return; }
          // Return the parsed JSON to client-side JavaScript
          res.send(reply);
          return;
        });
      });
  • 23. Use Speech API Response
    Example API Response, returned from call using Content-Type of ‘application/json’:
      {
        "Recognition": {
          "ResponseId": "74a964bf2fe",
          "NBest": [ {
            "WordScores": [1, 0.75, 1, 0.75],
            "Confidence": 0.75,
            "Grade": "accept",
            "ResultText": "This is a test.",
            "Words": ["This", "is", "a", "test."],
            "LanguageId": "en-us",
            "Hypothesis": "This is a test."
          } ]
        }
      }
    Response Parameter: What the Response Parameter Means
      • Recognition: Body object for the AT&T Speech API Response
      • ResponseId: Unique Identifier for a specific API call
      • NBest: Array of hypothesis objects (possible transcriptions of audio data).
      • ResultText: Plain-text, cleaned up representation of the Hypothesis. This should be used when displaying the text to users.
      • Confidence: Confidence score for the overall Hypothesis. Scored on a scale from 0 (not confident) to 1.0 (very confident)
      • Grade: Recommended action to take with the current Hypothesis: accept, reject, or confirm
      • Words: Array of the individual words. Confidence scores for each word are available in the WordScores array.
      • WordScores: Array of individual confidence scores for each word in the ResultText parameter. Corresponds to Words array.
      • LanguageId: Representation of the response language. Supports English & Spanish in Generic; English-only in other contexts.
      • Hypothesis: The raw transcription of the audio that was interpreted.
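    One way to act on these fields is to pick the highest-confidence hypothesis, follow its Grade, and flag words whose WordScores are low. The sketch below is an illustration, not the deck's code: `interpretRecognition` and its 0.8 word threshold are hypothetical choices, and treating a missing or empty NBest array as "reject" is an assumption about how an app might handle that case.

```javascript
'use strict';

// Hypothetical helper: reduces an API response to the fields an app acts on.
function interpretRecognition(response, wordThreshold) {
  const threshold = wordThreshold === undefined ? 0.8 : wordThreshold;
  const nbest = response && response.Recognition && response.Recognition.NBest;
  if (!Array.isArray(nbest) || nbest.length === 0) {
    // Assumption: nothing recognized is handled like a rejected hypothesis.
    return { action: 'reject', text: '', confidence: 0, doubtfulWords: [] };
  }
  // Keep the hypothesis with the highest overall Confidence.
  const best = nbest.reduce(function (a, b) {
    return b.Confidence > a.Confidence ? b : a;
  });
  const words = best.Words || [];
  const scores = best.WordScores || [];
  // Words and WordScores are parallel arrays, so index i lines them up.
  const doubtfulWords = words.filter(function (word, i) {
    return scores[i] < threshold;
  });
  return {
    action: best.Grade,        // accept, reject, or confirm
    text: best.ResultText,     // the cleaned-up text meant for display
    confidence: best.Confidence,
    doubtfulWords: doubtfulWords
  };
}

// The example response from the slide.
const sampleResponse = {
  Recognition: {
    ResponseId: '74a964bf2fe',
    NBest: [{
      WordScores: [1, 0.75, 1, 0.75],
      Confidence: 0.75,
      Grade: 'accept',
      ResultText: 'This is a test.',
      Words: ['This', 'is', 'a', 'test.'],
      LanguageId: 'en-us',
      Hypothesis: 'This is a test.'
    }]
  }
};

console.log(interpretRecognition(sampleResponse));
```

    An app might display `text` directly on "accept", ask the user to verify on "confirm" (highlighting `doubtfulWords`), and prompt for a new recording on "reject".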
  • 24. Up Next: Michael Fitzpatrick
  • 25. Up Next: Jason Goecke, Adam Kalsey
  • 26. ADVANCED EXAMPLES
    What can you do with Speech-to-Text? You could…
    • Make your mobile or web application accessible with voice commands
    • Post tweets using voice commands in a simple Twitter app
    • Add on-the-fly transcripts while recording in a podcasting app
    • Add captioning to videos hosted on your website automatically
    • Create real-time closed captions of a conference speaker’s presentation
    • Search for nearby places to check in at on Foursquare
  • 27. Speech Labs
    We’re now going to break out into three clusters, each focusing on a different technology stack. Work independently or with a partner!
    • Web (Flex + Node.js)
      In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or start from a boilerplate app that uses Foursquare to search for locations and lets you check in from your web browser!
    • iOS (Objective-C)
      In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.
    • Android (Java)
      In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.
  • 28. THANKS! ANY QUESTIONS?
    September 25, 2012
    Michael Owens (@mko on Twitter, mowens on GitHub)
    Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)
    AT&T Developer Program