This post is part of the series on Text-to-Speech (TTS) for eLearning written by Dr. Joel Harband and edited by me. The other posts are:
- Text-to-Speech Overview and NLP Quality,
- Digital Signal Processor and Text-to-Speech,
- Using Text-to-Speech in an eLearning Course,
- Text-to-Speech eLearning Tools - Integrated Products,
- Text-to-Speech vs Human Narration for eLearning,
- Using Punctuation and Mark-Up Language to Increase Text-to-Speech Quality, and
- Text-to-Speech Examples.
In this post, we will look more closely at the costs of Text-to-Speech and at the issues around TTS voice licensing and pricing. The subject matters because it helps e-learning practitioners understand which TTS tools they are legally allowed to use for their specific applications, and lets them estimate what those tools will cost under the different pricing models.
Voice Talent Rates and Implications on TTS Licensing and Pricing
When the TTS voice manufacturers looked for a pricing model for their product, they naturally looked at the model used by real voice talents, including the rates and the types of usage.
Here are some examples of rates charged by voice talents suited for e-learning (voice talents for advertising can cost a lot more). The rates are given in dollars per hour of recorded sound.
Site | Rate per hour |
 | $1200 |
 | $800-$1200 |
 | $1800 (median) |
 | $650 |
The average for this type of voice talent is about $1200/hour.
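As a rough check on that figure, the four quoted rates can be averaged directly. This is only a sketch: it assumes the "$800-$1200" range can be represented by its midpoint and that the "$1800 (median)" entry can be treated like any other rate, neither of which the source states.

```python
from statistics import mean

# Hourly rates in dollars, taken from the table above.
# Assumption: the "$800-$1200" range is represented by its midpoint.
rates = [1200, (800 + 1200) / 2, 1800, 650]

average_rate = mean(rates)
print(f"Average rate: ${average_rate:.2f}/hour")
```

Under these assumptions the result is a little below $1200/hour, which is consistent with the rounded figure used in the rest of this post.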
In addition, voice talents and voice producers tend to charge according to the purpose of the recording and the size of its intended audience. If the recording is going to be heard by many people and helps the customer make a lot of money, the voice talents expect to be paid more than if the recording has a very limited use. The rates will be different for local, regional or national broadcasts. For example, a 30-second recording for a TV commercial broadcast all over the US may cost several times more than a five-minute recording for a local documentary (voice123.com).
The TTS vendors learned two things from the voice-talent pricing model:
- The price level – because of their perceived lower voice quality, TTS voices need to be priced much lower than their real counterparts to be an attractive alternative.
- The price of a TTS voice needs to be set according to the value of its use to the customer.
Based on the second criterion, the TTS vendors work with the following two general usage categories:
- Personal Use - reading books, making sound files for personal use, etc. This is a low value usage.
- Audio Distribution – a sound file that was created with the voice is distributed and played to an audience. This usage has a higher value.
Audio Distribution
Audio distribution, which is similar to broadcasting a voice talent recording, is considered to be a usage that is more valuable to the customer. It is divided into the following categories, listed in order of increasing value to the customer:
- Internal audio distribution - a company puts an audio training presentation made with the TTS sound file on an internal server, intended for company employees only.
- Public audio distribution - a company puts an audio product presentation made with the TTS sound file on a public web site, intended for a more general audience. This category includes call centers.
- Selling a TTS sound file for profit - a company sells an audio training presentation made with the TTS sound file.
E-learning courses created by corporations are typically in the category of internal audio distribution – the course is put on the company server to be accessed by company employees.
TTS Voice Licenses
The TTS voice vendors enforce the intended usage of a voice by a voice license, which precisely describes the restrictions on the use of the voice. Strictly speaking, the vendors sell a license to use the voice software rather than selling the voice itself.
The following licenses are used:
Personal Use License – this license covers personal use of the TTS voice by the customer and expressly prohibits audio distribution, so it cannot be used for e-learning. Examples of products that use voices with a personal license are Natural Reader, TextAloud, Read the Words, and Spoken Text.
Personal licenses are sold for a low fixed price. In general, a price less than $50 indicates a personal license.
Audio Distribution License (ADL) – this is the license that permits audio distribution and is the type of license required for e-learning in business and education. The voice vendors have set the rate for an ADL at around one third to two thirds of the price of the equivalent voice talent ($1200/hour), depending on the TTS voice quality. The basic TTS voices cost $360/hour of recorded time, and for the best TTS voices the rate can go up to $720/hour of recorded time.
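To see how the quoted ADL rates relate to the voice-talent rate, the short sketch below works out each rate per minute and as a share of $1200/hour. The labels and variable names are illustrative only; the numbers are the ones quoted in this post.

```python
VOICE_TALENT_RATE = 1200  # dollars per recorded hour, the average quoted in this post

# ADL rates quoted in this post, in dollars per hour of recorded time.
adl_rates = {"basic TTS voice": 360, "best TTS voice": 720}

summary = {}
for label, hourly in adl_rates.items():
    summary[label] = {
        "per_minute": hourly / 60,
        "share_of_talent_rate": hourly / VOICE_TALENT_RATE,
    }
    print(
        f"{label}: ${hourly}/hour = ${hourly / 60:.2f}/min, "
        f"{hourly / VOICE_TALENT_RATE:.0%} of the voice-talent rate"
    )
```

The quoted rates work out to 30% and 60% of the voice-talent rate, slightly under the "one third to two thirds" rule of thumb.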
The TTS voices with ADL are sold in three ways:
- Fixed Price – this model allows a company to purchase any TTS voice with full audio distribution rights from a vendor without being tied to a specific authoring tool (e.g., Captivate) and its bundled voices. One vendor I talked to (Neospeech) told me that some companies had selected this model. Acapela also offers a model like this for unrestricted ADL. The fixed price depends on the exact usage. A typical price for internal training courses is $2500, which corresponds to about 7 hours of recorded time at the basic rate of $360/hour.
- On-Demand - Voices can be purchased via on-demand web services or desktop products that accept text and generate sound files. These charge according to the duration of the generated file, at a rate of about $6/min ($360/hour).
- From a Reseller – A company purchases voices from a reseller at a reduced price. The price is lower because the reseller has bought voices in volume. This is the case for Adobe Captivate and Tuval Software’s Speech-Over Professional: those companies resell the voices bundled with their products, and the voice license terms are specified with the product. At present, the purchase price for these tools includes a fixed price for the voice license that does not depend on the duration of the generated file.
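The fixed-price and on-demand models can be compared with a simple break-even calculation. The sketch below uses only the figures quoted above ($2500 fixed, about $6/min on demand) and assumes a single voice and no other fees; the function names are hypothetical. The reseller model is left out because no price for it is given here.

```python
FIXED_PRICE = 2500       # dollars, typical fixed price for internal training courses
ON_DEMAND_PER_MIN = 6    # dollars per minute of generated audio (about $360/hour)


def on_demand_cost(minutes: float) -> float:
    """Cost of generating `minutes` of audio under the on-demand model."""
    return minutes * ON_DEMAND_PER_MIN


def cheaper_model(minutes: float) -> str:
    """Name the cheaper of the two models for a given amount of audio."""
    return "on-demand" if on_demand_cost(minutes) < FIXED_PRICE else "fixed price"


# Amount of audio at which both models cost the same.
break_even_hours = FIXED_PRICE / ON_DEMAND_PER_MIN / 60

print(f"Break-even: about {break_even_hours:.1f} hours of recorded time")
print("2 hours of audio:", cheaper_model(2 * 60))
print("10 hours of audio:", cheaper_model(10 * 60))
```

Under these assumptions, on-demand is cheaper for a course with a couple of hours of narration, while the fixed price pays off once the total approaches 7 hours.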
Course developers will generally use bundled products, i.e., voices from a reseller, although the other models will suit some cases.