Tips for Using Free Text to Speech (TTS) with Switchvox Phone Systems

It’s exciting to see how easy it is for our resellers and customers to customize their phone systems to meet their business needs, and how willing they are to share their tips with others. In this guest blog post, Dan Ribar, CIO of 1st Guard Corporation, explains how to use a free version of Text to Speech in a Switchvox phone system. Thanks for sharing, Dan!

The natural progression for a new telephone system is typically to find its way to some form of dynamic IVR that looks to a database for its information. My office has had the Digium Switchvox system for a couple of years now, and we feel pretty good about writing basic IVRs. Now, it’s time for some cool text-to-speech (TTS).

Why do I need TTS?  In a basic IVR,  you can record messages that may never need to change, like “Press one to reach customer service.”  But, you may want to have a bit more dynamic text coming back to the end user, like “Your account is in great standing.  Your account balance is $123.45 and you have one pending claim….” We wanted that dynamic text and began looking for a way to handle it.
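As a rough illustration of what "dynamic text" means here, the sketch below assembles that kind of sentence from an account record. Python is used only for illustration, and the `Account` fields and the `build_message` helper are hypothetical names, not part of Switchvox or of the setup described in this post.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record; a real IVR would load these values from its database.
    standing: str        # e.g. "great"
    balance: float       # account balance in dollars
    pending_claims: int  # number of open claims


def build_message(account: Account) -> str:
    """Turn an account record into the sentence the TTS engine will speak."""
    # Pick the singular or plural noun so the sentence reads naturally.
    claims = "claim" if account.pending_claims == 1 else "claims"
    return (
        f"Your account is in {account.standing} standing. "
        f"Your account balance is ${account.balance:,.2f} "
        f"and you have {account.pending_claims} pending {claims}."
    )


if __name__ == "__main__":
    print(build_message(Account("great", 123.45, 1)))
```

The resulting string is what would be handed to the TTS engine in place of a pre-recorded prompt.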

After a few weeks of research, I found several options:

1.  Buy a TTS engine that runs on your server.  (This will cost $1,000 and up.)

2.  Use a free TTS web service.  (I couldn’t find any voices that could be easily understood so this wasn’t a good option.)

3.  Use a paid TTS web service.  (Well, cost is always an issue, but more importantly I didn’t want to rely on the performance of a web-based service to feed a dynamic IVR.)

4.  Use the TTS built into the Microsoft .NET Framework. (Sounds great, but it requires a physical server with a sound card. Not a good option since my office is 100% virtual server based.)

5.  Write your own.

Now option five seems a bit daunting.  I mean,  how do you write a TTS engine?  Simple answer is you don’t.  What you can do is package some free components to make it all work.  Here’s how it was implemented at my office:

First, I downloaded the free TTS engine called eSpeak and installed it on the IIS server. If you want to see how it works, just install it on your PC and try out the command line.
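For anyone who wants to experiment before touching the server, here is a minimal sketch that builds an eSpeak command line and runs it if eSpeak is installed. It assumes the executable is on the PATH under the name `espeak`; the `build_espeak_command` helper and the output name `hello.wav` are made up for this example, and the voice, speed and amplitude values simply mirror the ones used in the listener later in this post.

```python
import shutil
import subprocess


def build_espeak_command(words, wav_path, exe="espeak",
                         voice="en-us", speed=150, amplitude=120):
    """Return the eSpeak argument list that writes `words` to a wav file."""
    return [
        exe,
        "-v", voice,           # voice to use
        "-s", str(speed),      # speed
        "-a", str(amplitude),  # amplitude (volume)
        "-w", wav_path,        # write a wav file instead of playing the audio
        words,                 # the text to speak, passed as one argument
    ]


if __name__ == "__main__":
    cmd = build_espeak_command("Hello World", "hello.wav")
    if shutil.which(cmd[0]):
        # An argument list (not a single string) avoids shell quoting issues.
        subprocess.run(cmd, check=True)
    else:
        print("eSpeak not found on PATH; would have run:", cmd)
```

Passing the text as its own list element means spaces and quotes in the sentence need no extra escaping on the command line.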

Then, I used Visual Studio to write (or enhance, in this case) the HTTP listener for the TTS requests. I wanted a REST-like solution so it would integrate well into the IVR on the Switchvox.

Some code would be nice, too. This is what the ASP.NET listener looks like:

Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    If Not IsPostBack Then
        ' Text to speak, passed in as tts.aspx?myWords=...
        Dim myWords As String = Request.QueryString("myWords")
        If myWords Is Nothing Then myWords = ""
        ' Strip double quotes so the text cannot break out of the quoted argument below
        myWords = myWords.Replace("""", "")

        ' FileName is assumed to be a String declared outside this listing
        ' (its declaration is not shown here); a timestamp is appended to it
        FileName &= Trim(Now().Year.ToString) & Trim(Now().DayOfYear.ToString) & Trim(Now().Hour.ToString) & Trim(Now().Minute.ToString) & Trim(Now().Millisecond.ToString)
        FileName &= ".wav"

        Dim p As New Diagnostics.Process
        ' eSpeak options:
        ' s = speed
        ' p = pitch
        ' a = amplitude or volume
        Dim args As String = "-v en-us -s 150 -a 120 -w " & FileName & " """ & myWords & """ "
        p.StartInfo.Arguments = args
        p.StartInfo.FileName = "d:/data/espeak/command_line/espeak.exe"
        p.StartInfo.UseShellExecute = False
        p.StartInfo.CreateNoWindow = True
        p.StartInfo.RedirectStandardError = True
        p.Start()
        ' Read stderr before waiting so a full error buffer cannot stall eSpeak
        Dim ttsErrors As String = p.StandardError.ReadToEnd
        p.WaitForExit()

        ' Send the generated wav file back to the caller
        Response.Clear()
        Response.ClearHeaders()
        Response.ContentType = "audio/wav"
        Response.AddHeader("Content-Disposition", "inline; filename=test.wav")
        Response.TransmitFile(FileName)
        Response.End()
    End If
End Sub

That’s it! Pretty easy, right? If your listener is called tts.aspx, you just call it with:

tts.aspx?myWords='Hello World'

…and it returns a wav file.

It’s simple to then integrate it into Switchvox. In your IVR, add an action type of ‘Play Sound From URL’ and add this line:

http://mysite.mydomain.com/tts.aspx?myWords='Hello World'
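When the words come from a database rather than being typed by hand, they may contain spaces, ampersands or dollar signs, which need percent-encoding before they go into a query string (an unencoded & would end the myWords value early). The sketch below shows one way to build such a URL. It is written in Python purely for illustration, `build_tts_url` is a hypothetical helper, and whether your IVR needs the encoding done for it is something to verify on your own system.

```python
from urllib.parse import quote, urlencode


def build_tts_url(base_url, words):
    """Return base_url with `words` percent-encoded as the myWords parameter."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base_url + "?" + urlencode({"myWords": words}, quote_via=quote)


if __name__ == "__main__":
    print(build_tts_url("http://mysite.mydomain.com/tts.aspx", "Hello World"))
```

On the listener side, ASP.NET decodes the value again when it is read through Request.QueryString, so the page receives the original text.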

That’s almost everything. All that’s left is to review the following:

Is this solution:

Pretty cool? YES

Free? YES

LAN based? YES

Works well with ASP.NET, Visual Studio and IIS? YES

About the author

Dan Ribar is a lifetime technologist, born and raised in Indiana. After graduating from Purdue in 1983 with a degree in Electrical Engineering, Mr. Ribar started his professional career at Dana Corporation that same year. Working in parallel with the American automotive industry, Mr. Ribar became a leader in CAD, CAM, CIM, networking and security for Dana Corporation and then for multiple integration companies in the Detroit area. Mr. Ribar then moved on to Tropicana & PepsiCo, Cellular One, and for the last five years has been with 1st Guard Corporation as their CIO.

3 Responses to “Tips for Using Free Text to Speech (TTS) with Switchvox Phone Systems”

  1. roderickm

    For Switchvox fans that don’t dabble in Visual Studio programming and IIS administration, a web service may be a good fit. VoiceForge.com is a great TTS web service from Cepstral, the same folks that made the “Allison” voice and over 50 others.

  2. Jon Daley

    I am surprised to see some notable items missing here. Why don’t you mention flite, and the others like it? Free, non-web, standalone libraries that easily integrate into Asterisk?

    They’re not great, and I don’t use them for customer-facing parts, but I use them for employee-facing parts of the system, and it seems like they are worth mentioning on your list, since someone might find them useful, and much more useful than a 3rd party service, and way less work than compiling your own TTS engine.

    As for Roderick’s statement about Cepstral and Allison – I thought Allison was a real person, or maybe Roderick means “the same folks who recorded the Allison voice”, though I didn’t think she was related to Cepstral at all.

  3. Dan Ribar

    Good points Jon.

    We were looking for a specific integration style (SOAP vs command line vs REST vs API, etc.) and just landed on eSpeak. My point was only to show that it is very easy to get a free TTS brewing. Like everything in technology, there are as many options as there are opinions, and most of them work :)

    After using eSpeak for a while, we ended up adding in the AT&T Natural Voices for our production system. Not best for everyone, but a great solution for us.

    Thanks again.
