Import the gTTS library and the os module (used here to play the converted audio), then create the text we want to convert to speech. gTTS supports multiple languages; please refer to the documentation for the full list.
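A minimal runnable sketch of this quickstart; the output filename and the mpg123 player call are illustrative choices, not part of gTTS:

    from gtts import gTTS
    import os

    # The text to convert to speech
    text = ("Global warming is the long-term rise in the average "
            "temperature of the Earth's climate system")

    # Convert the text to speech and save it as an MP3 file
    tts = gTTS(text=text, lang='en')
    tts.save('gw.mp3')

    # Play the saved file (assumes the mpg123 command-line player is installed)
    os.system('mpg123 gw.mp3')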
The gtts.tokenizer module powers the default pre-processing and tokenizing features of gTTS and provides tools to easily expand them. gtts.tts.gTTS takes two arguments, pre_processor_funcs (list of functions) and tokenizer_func (function). See: Pre-processing, Tokenizing.
Definitions¶
Pre-processor: Function that takes text and returns text. Its goal is to modify text (for example correcting pronunciation), and/or to prepare text for proper tokenization (for example ensuring spacing after certain characters).
Tokenizer: Function that takes text and returns it split into a list of tokens (strings). In the gTTS context, its goal is to cut the text into smaller segments that do not exceed the maximum character size allowed for each TTS API request, while making the speech sound natural and continuous. It does so by splitting text where speech would naturally pause (for example on '.') while handling where it should not (for example on '10.5' or 'U.S.A.'). Such rules are called tokenizer cases, which it takes a list of.
Tokenizer case: Function that defines one of the specific cases used by gtts.tokenizer.core.Tokenizer. More specifically, it returns a regex object that describes what to look for in a particular case. gtts.tokenizer.core.Tokenizer then creates its main regex pattern by joining all tokenizer cases with '|'.
Pre-processing¶
You can pass a list of any functions to gtts.tts.gTTS's pre_processor_funcs attribute to act as pre-processors (as long as each takes a string and returns a string).
By default, gtts.tts.gTTS takes a list of the following pre-processors, applied in order:
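A sketch of that default list as it could be passed explicitly, using the four pre-processors documented below (the sample text is arbitrary):

    from gtts import gTTS
    from gtts.tokenizer import pre_processors

    # The default pre-processors, in their order of application
    tts = gTTS("some text", pre_processor_funcs=[
        pre_processors.tone_marks,
        pre_processors.end_of_line,
        pre_processors.abbreviations,
        pre_processors.word_sub,
    ])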
gtts.tokenizer.pre_processors.abbreviations(text)[source]¶
Remove periods after an abbreviation from a list of known abbreviations that can be spoken the same without that period. This prevents having to handle tokenization of that period.
Note
Could potentially remove the ending period of a sentence.
Note
Abbreviations that Google Translate can't pronounce without (or even with) a period should be added as a word substitution with a PreProcessorSub pre-processor. Ex.: 'Esq.', 'Esquire'.
gtts.tokenizer.pre_processors.end_of_line(text)[source]¶
Re-form words cut by end-of-line hyphens.
Remove the hyphen-newline sequence ('-\n').
gtts.tokenizer.pre_processors.tone_marks(text)[source]¶
Add a space after tone-modifying punctuation.
Because the tone_marks tokenizer case will split after a tone-modifying punctuation mark, make sure there's whitespace after.
gtts.tokenizer.pre_processors.word_sub(text)[source]¶
Word-for-word substitutions.
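A quick illustration using the default substitution list, which includes the ('Esq.', 'Esquire') pair (the sample text is arbitrary):

    >>> from gtts.tokenizer import pre_processors
    >>> pre_processors.word_sub("Esq. Bacon")
    'Esquire Bacon'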
This module provides two classes to help build pre-processors: gtts.tokenizer.core.PreProcessorRegex (for regex-based replacing, as re.sub would do) and gtts.tokenizer.core.PreProcessorSub (for word-for-word replacements).
The run(text) method of those objects returns the processed text.
Speech corrections (word substitution)¶
The default substitutions are defined by the gtts.tokenizer.symbols.SUB_PAIRS list. Add a custom one by appending to it:
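For example (the 'sub.'/'submarine' pair and the sample sentence are illustrative):

    >>> from gtts import gTTS
    >>> import gtts.tokenizer.symbols
    >>>
    >>> gtts.tokenizer.symbols.SUB_PAIRS.append(('sub.', 'submarine'))
    >>> text = "Have you seen the Queen's new sub.?"
    >>> tts = gTTS(text)
    >>> tts.save('queen.mp3')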
Abbreviations¶
The default abbreviations are defined by the gtts.tokenizer.symbols.ABBREVIATIONS list. Add a custom one to it to remove the period from a new abbreviation. Note: the default list already includes an extensive list of English abbreviations that Google Translate will read even without the period.
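For example (the 'anon' abbreviation and the sample text are illustrative):

    >>> from gtts import gTTS
    >>> import gtts.tokenizer.symbols
    >>>
    >>> gtts.tokenizer.symbols.ABBREVIATIONS.append('anon')
    >>> text = "Have you seen anon. lately?"
    >>> tts = gTTS(text)
    >>> tts.save('anon.mp3')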
See gtts.tokenizer.pre_processors for more examples.
Tokenizing¶
You can pass any function to gtts.tts.gTTS's tokenizer_func attribute to act as the tokenizer (as long as it takes a string and returns a list of strings).
By default, gTTS takes gtts.tokenizer.core.Tokenizer's gtts.tokenizer.core.Tokenizer.run(), initialized with the default tokenizer cases:
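A sketch of that default initialization, assuming the default set is the non-legacy tokenizer cases documented below:

    from gtts.tokenizer.core import Tokenizer
    from gtts.tokenizer import tokenizer_cases

    # The default tokenizer: the run() method of a Tokenizer
    # built from the standard tokenizer cases
    Tokenizer([
        tokenizer_cases.tone_marks,
        tokenizer_cases.period_comma,
        tokenizer_cases.colon,
        tokenizer_cases.other_punctuation,
    ]).run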
The available tokenizer cases are as follows:
gtts.tokenizer.tokenizer_cases.colon()[source]¶
Colon case.
Match a colon ':' only if not preceded by a digit. Mainly to prevent a cut in the middle of time notations, e.g. 10:01.
gtts.tokenizer.tokenizer_cases.legacy_all_punctuation()[source]¶
Match all punctuation.
Use as the only tokenizer case to mimic gTTS 1.x tokenization.
gtts.tokenizer.tokenizer_cases.other_punctuation()[source]¶
Match other punctuation.
Match other punctuation to split on; punctuation that naturally inserts a break in speech.
gtts.tokenizer.tokenizer_cases.period_comma()[source]¶
Period and comma case.
Match if not preceded by '.' and only if followed by a space. Won't cut in the middle of/after dotted abbreviations; won't cut numbers.
Note
Won't match if a dotted abbreviation ends a sentence.
Note
Won't match the end of a sentence if not followed by a space.
gtts.tokenizer.tokenizer_cases.tone_marks()[source]¶
Keep tone-modifying punctuation by matching the following character.
Assumes the tone_marks pre-processor was run for cases where there might not be any space after a tone-modifying punctuation mark.
A tokenizer case is a function that returns a compiled regex object to be used in a re.split() context.
gtts.tokenizer.core.Tokenizer takes a list of tokenizer cases and joins their patterns with '|' into one single pattern.
This module provides a class to help build tokenizer cases: gtts.tokenizer.core.RegexBuilder. See gtts.tokenizer.core.RegexBuilder and gtts.tokenizer.tokenizer_cases for examples.
Even though gtts.tokenizer.core.Tokenizer works well in this context, there are far more advanced tokenizers and tokenizing techniques. As long as you can restrict the length of output tokens, you can use any tokenizer you'd like, such as the ones in NLTK.
The Google Translate text-to-speech API accepts a maximum of 100 characters.
If after tokenization any of the tokens is larger than 100 characters, it will be split in two:
On the last space character that is closest to, but before the 100th character;
Between the 100th and 101st characters if there's no space.
gtts.tokenizer module reference (gtts.tokenizer)¶
gtts.tokenizer.core.RegexBuilder(pattern_args, pattern_func, flags=0)[source]¶
Builds regex using arguments passed into a pattern template.
Builds a regex object for which the pattern is made from an argument passed into a template. If more than one argument is passed (iterable), each pattern is joined by '|' (regex alternation 'or') to create a single pattern.
Parameters:
pattern_args (iterable) – String element(s) to be each passed to pattern_func to create a regex pattern. Each element is re.escape'd before being passed.
pattern_func (callable) – A 'template' function that should take a string and return a string. It should take an element of pattern_args and return a valid regex pattern group string.
flags – re flag(s) to compile with the regex.
Example
To create a simple regex that matches on the characters 'a', 'b',or 'c', followed by a period:
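A sketch of such a builder:

    >>> from gtts.tokenizer.core import RegexBuilder
    >>>
    >>> rb = RegexBuilder('abc', lambda x: r"{}\.".format(x))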
Looking at rb.regex, we get the following compiled regex:
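With the sketch above (the exact escaping shown in the repr can vary slightly by Python version):

    >>> print(rb.regex)
    re.compile('a\\.|b\\.|c\\.')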
The above is fairly simple, but this class can help in writing more complex repetitive regex, making them more readable and easier to create by using existing data structures.
Example
To match the character following the words 'lorem', 'ipsum', 'meili'or 'koda':
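A sketch using a lookbehind template:

    >>> from gtts.tokenizer.core import RegexBuilder
    >>>
    >>> words = ['lorem', 'ipsum', 'meili', 'koda']
    >>> rb = RegexBuilder(words, lambda x: r"(?<={}).".format(x))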
Looking at rb.regex, we get the following compiled regex:
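With the sketch above:

    >>> print(rb.regex)
    re.compile('(?<=lorem).|(?<=ipsum).|(?<=meili).|(?<=koda).')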
gtts.tokenizer.core.PreProcessorRegex(search_args, search_func, repl, flags=0)[source]¶
Regex-based substitution text pre-processor.
Runs a series of regex substitutions (re.sub) from each regex of a gtts.tokenizer.core.RegexBuilder with an extra repl replacement parameter.
Parameters:
search_args (iterable) – String element(s) to be each passed to search_func to create a regex pattern. Each element is re.escape'd before being passed.
search_func (callable) – A 'template' function that should take a string and return a string. It should take an element of search_args and return a valid regex search pattern string.
repl (string) – The common replacement passed to the sub method for each regex. Can be a raw string (in the case of a regex backreference, for example).
flags – re flag(s) to compile with each regex.
Example
Add '!' after the words 'lorem' or 'ipsum', while ignoring case:
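A sketch of such a pre-processor:

    >>> import re
    >>> from gtts.tokenizer.core import PreProcessorRegex
    >>>
    >>> pp = PreProcessorRegex(['lorem', 'ipsum'],
    ...                        lambda x: "({})".format(x),
    ...                        r'\1!',
    ...                        re.IGNORECASE)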
In this case, the regex is a group and the replacement uses its backreference '\1' (as a raw string). Looking at pp we get the following list of search/replacement pairs:
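Approximately (the repr formatting may differ):

    >>> print(pp)
    (re.compile('(lorem)', re.IGNORECASE), repl='\1!'),
    (re.compile('(ipsum)', re.IGNORECASE), repl='\1!')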
It can then be run on any string of text:
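For instance, with the sketch above:

    >>> pp.run("LOREM ipSuM")
    'LOREM! ipSuM!'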
See gtts.tokenizer.pre_processors for more examples.
run(text)[source]¶
Run each regex substitution on text.
Parameters: text (string) – the input text.
Returns: text after all substitutions have been sequentially applied.
Return type: string
gtts.tokenizer.core.PreProcessorSub(sub_pairs, ignore_case=True)[source]¶
Simple substitution text pre-processor.
Performs string-for-string substitution from a list of find/replace pairs. It abstracts gtts.tokenizer.core.PreProcessorRegex with a default simple substitution regex.
Parameters:
sub_pairs (list) – A list of tuples of the style (<search str>, <replace str>).
ignore_case (bool) – Ignore case during search. Defaults to True.
Example
Replace all occurrences of 'Mac' with 'PC' and 'Firefox' with 'Chrome':
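A sketch of such a pre-processor:

    >>> from gtts.tokenizer.core import PreProcessorSub
    >>>
    >>> sub_pairs = [('Mac', 'PC'), ('Firefox', 'Chrome')]
    >>> pp = PreProcessorSub(sub_pairs)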
Looking at pp, we get the following list of search (regex)/replacement pairs:
It can then be run on any string of text:
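For instance, with the sketch above:

    >>> pp.run("I use firefox on my mac")
    'I use Chrome on my PC'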
See gtts.tokenizer.pre_processors for more examples.
run(text)[source]¶
Run each substitution on text.
Parameters: text (string) – the input text.
Returns: text after all substitutions have been sequentially applied.
Return type: string
gtts.tokenizer.core.Tokenizer(regex_funcs, flags=re.IGNORECASE)[source]¶
An extensible but simple generic rule-based tokenizer.
A generic and simple string tokenizer that takes a list of functions (called tokenizer cases) returning regex objects and joins them by '|' (regex alternation 'or') to create a single regex to use with the standard re.split() function.
regex_funcs is a list of any function that can return a regex (from re.compile()) object, such as a gtts.tokenizer.core.RegexBuilder instance (and its regex attribute).
See the gtts.tokenizer.tokenizer_cases module for examples.
Parameters:
regex_funcs (list) – List of functions returning compiled regex objects. Each function's pattern will be joined into a single pattern and compiled.
flags – re flag(s) to compile with the final regex. Defaults to re.IGNORECASE.
Note
When the regex objects obtained from regex_funcs are joined, their individual re flags are ignored in favour of flags.
Raises: TypeError – When an element of regex_funcs is not a function, or is a function that does not return a compiled regex object.
Warning
Joined regex patterns can easily interfere with one another in unexpected ways. It is recommended that each tokenizer case operate on distinct or non-overlapping characters/sets of characters (for example, a tokenizer case for the period ('.') should also handle not matching/cutting on decimals, instead of making that a separate tokenizer case).
Example
A tokenizer with two simple cases (note: these are bad cases to tokenize on; this is simply a usage example):
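A sketch of such a tokenizer:

    >>> import re
    >>> from gtts.tokenizer.core import RegexBuilder, Tokenizer
    >>>
    >>> def case1():
    ...     return re.compile(r"\,")
    >>> def case2():
    ...     return RegexBuilder('abc', lambda x: r"{}\.".format(x)).regex
    >>>
    >>> t = Tokenizer([case1, case2])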
Looking at case1().pattern, we get:
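With the sketch above:

    >>> case1().pattern
    '\\,'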
Looking at case2().pattern, we get:
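Likewise:

    >>> case2().pattern
    'a\\.|b\\.|c\\.'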
Finally, looking at t, we get them combined:
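Approximately (the Tokenizer repr formatting may differ):

    >>> print(t)
    re.compile('\\,|a\\.|b\\.|c\\.', re.IGNORECASE) from: [case1, case2]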
It can then be run on any string of text:
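For instance, with the sketch above:

    >>> t.run("Hello, my name is Linda a. Call me Lin, b. I'm your friend")
    ['Hello', ' my name is Linda ', ' Call me Lin', ' ', " I'm your friend"]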
run(text)[source]¶
Tokenize text.
Parameters: text (string) – the input text to tokenize.
Returns: A list of strings (tokens) split according to the tokenizer cases.
Return type: list
symbols.ABBREVIATIONS = ['dr', 'jr', 'mr', 'mrs', 'ms', 'msgr', 'prof', 'sr', 'st']¶
symbols.SUB_PAIRS = [('Esq.', 'Esquire')]¶
symbols.ALL_PUNC = '?!？！.,¡()[]¿…‥،;：—。，、：\n'¶
symbols.TONE_MARKS = '?!？！'¶