How to correctly specify SSML in an Alexa Skill lambda function?

Per the alexa-sdk source code for response.js on GitHub, the speechOutput value in your code is expected to be a string. response.js is responsible for building the response object you're trying to build in your code:

this.handler.response = buildSpeechletResponse({
    sessionAttributes: this.attributes,
    output: getSSMLResponse(speechOutput),
    shouldEndSession: true
});

Digging deeper, buildSpeechletResponse() invokes createSpeechObject(), which is directly responsible for creating the outputSpeech object in the Alexa Skills Kit response.

So for simple responses with no advanced SSML functionality, just send a string as that first parameter on :tell and let alexa-sdk handle it from there.
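
As a minimal sketch of that simple case, the handler below passes a plain string to :tell. The intent name HelloIntent and the spoken text are made up for illustration and are not from the question:

```javascript
// Handler map in the style alexa-sdk expects.
// 'HelloIntent' is a hypothetical intent name used only for this sketch.
var handlers = {
    'HelloIntent': function () {
        // A plain string is all :tell needs here; no outputSpeech object is built by hand.
        this.emit(':tell', 'Hello from my skill');
    }
};
```

In a real skill this map would be registered with alexa.registerHandlers(handlers) before calling alexa.execute().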


For advanced SSML functionality, such as pauses, take a look at the ssml-builder npm package. It lets you wrap your response content in SSML without having to hand-write or hardcode the SSML markup yourself.

Example usage:

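var Speech = require('ssml-builder');

var speech = new Speech();

speech.say('This is a test response & works great!');
speech.pause('100ms');
speech.say('How can I help you?');
// Passing true leaves out the <speak> wrapper, since alexa-sdk adds it.
var speechOutput = speech.ssml(true);
// Use the same SSML for both the speech output and the reprompt.
this.emit(':ask', speechOutput, speechOutput);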

This example emits an ask response where both the speech output and the reprompt speech are set to the same value. SSML Builder will correctly handle the ampersand (which is an invalid character in SSML) and inject a 100ms pause between the two say statements.

Example response:

Alexa Skills Kit will emit the following response object for the code above:

{
  "outputSpeech": {
    "type": "SSML",
    "ssml": "<speak> This is a test response and works great! <break time='100ms'/> How can I help you? </speak>"
  },
  "shouldEndSession": false,
  "reprompt": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak> This is a test response and works great! <break time='100ms'/> How can I help you? </speak>"
    }
  }
}
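
If you assemble the SSML string by hand instead, the ampersand has to be dealt with yourself. The sketch below is only an illustration: sanitizeForSsml is a hypothetical helper, not part of ssml-builder or alexa-sdk, and it simply mirrors the "&" to "and" substitution visible in the response above.

```javascript
// Hypothetical helper, not part of ssml-builder or alexa-sdk.
function sanitizeForSsml(text) {
    // A raw ampersand is invalid in SSML, so replace it with the word "and".
    return text.replace(/&/g, 'and');
}

// Build the same markup as the example above by plain string concatenation.
var handBuilt = '<speak> ' +
    sanitizeForSsml('This is a test response & works great!') +
    " <break time='100ms'/> " +
    sanitizeForSsml('How can I help you?') +
    ' </speak>';

console.log(handBuilt);
```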

This is an old question, but I recently had a similar problem and wanted to contribute an answer that doesn't need extra dependencies.

As mentioned, speechOutput is supposed to be a string, so the reason Alexa says "object object" is that it was given a JSON object instead.
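
A rough sketch of why that happens, assuming the SDK wraps whatever it receives in <speak> tags by string concatenation. The function wrapInSpeak is a hypothetical stand-in for that step, not the actual alexa-sdk code:

```javascript
// Hypothetical approximation of the wrapping step; not alexa-sdk's actual code.
function wrapInSpeak(speechOutput) {
    return '<speak> ' + speechOutput + ' </speak>';
}

// Passing an object: JavaScript coerces it to the string "[object Object]".
var fromObject = wrapInSpeak({ type: 'SSML', ssml: 'This <break time="0.3s" /> should work' });
console.log(fromObject); // <speak> [object Object] </speak>

// Passing the markup as a plain string keeps the SSML tags intact.
var fromString = wrapInSpeak('This <break time="0.3s" /> should work');
console.log(fromString); // <speak> This <break time="0.3s" /> should work </speak>
```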

Writing your handler as follows

'Speaketh': function () {
    var speechOutput = 'This <break time="0.3s" /> should work';

    this.emit(':tellWithCard', speechOutput, SKILL_NAME, "some text here");
}

returns this response

{
  ...
  "response": {
    "outputSpeech": {
      "ssml": "<speak> This <break time=\"0.3s\" /> should work </speak>",
      "type": "SSML"
    },
    ...
  }
}