PDF Text-to-Speech


Authentication

For this project I used Google Cloud's Text-to-Speech client library for Python, which handles authentication programmatically rather than requiring hand-built request headers. I authenticated with locally stored Application Default Credentials (ADC).
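As a rough illustration of what "locally stored" means here, the sketch below looks for an ADC file in the places the Google client libraries conventionally check: the path named by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, then the file written by `gcloud auth application-default login`. The helper name `find_adc_file` is hypothetical and not part of any Google library. The lookup is simplified: it ignores other ADC sources such as a cloud metadata server or a relocated gcloud config directory.

```python
import os
from pathlib import Path
from typing import Optional


def find_adc_file() -> Optional[Path]:
    """Return the local ADC credential file if one exists, else None.

    Hypothetical helper: a simplified version of the file lookup the
    client libraries perform, for checking a local setup.
    """
    # 1. An explicit path in the environment takes priority.
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # 2. Otherwise fall back to the well-known gcloud location.
    if os.name == "nt":
        base = Path(os.environ.get("APPDATA", "")) / "gcloud"
    else:
        base = Path.home() / ".config" / "gcloud"
    path = base / "application_default_credentials.json"
    return path if path.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    print(f"ADC file: {found}" if found else "No local ADC file found")
```

If this reports no file, running `gcloud auth application-default login` is the usual way to create one before `texttospeech.TextToSpeechClient()` is instantiated.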

I also built a tkinter GUI for this project, through which the user could select a PDF from their files and then listen to the generated MP3 in the same window. I don't believe it offered any advantage over simply writing the MP3 file to the user's local files, though.


Google's Text-to-Speech Technology

Google's offering here is very impressive: the script below turns the text extracted from a PDF into an MP3 with a single synthesize_speech request.


main.py

                            
"""Reads the text of a PDF and synthesizes it to speech as an MP3 file.
Make sure to be working in a virtual environment.

Note: the text is sent as plain text. If you switch to ssml input, it must
be well-formed according to:
    https://www.w3.org/TR/speech-synthesis/
"""

from google.cloud import texttospeech
from pypdf import PdfReader

# Open the PDF that will be read aloud
reader = PdfReader("sample.pdf")

# Report how many pages were found
print(len(reader.pages))

# Join the text of every page with a newline so words at page boundaries
# do not run together. The `or ""` guards against pages that yield no
# text, e.g. scanned images without a text layer.
text_string = "\n".join(page.extract_text() or "" for page in reader.pages)

print(text_string)

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(text=text_string)

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary, so open the file in "wb" mode.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)

# Report success only after the file has been written and closed.
print('Audio content written to file "output.mp3"')
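
One practical limit of the script above: the Text-to-Speech API caps the size of a single synthesize_speech request. The cap was documented as 5,000 bytes of input when I last checked, which is an assumption to verify against the current quota page. A PDF of more than a few pages is therefore likely to be rejected if sent as one string. The sketch below is one possible workaround, not part of the original project: a hypothetical helper, `split_text`, that breaks the extracted text into whitespace-delimited chunks whose UTF-8 size stays under a limit. It collapses runs of whitespace, which should not matter for speech.

```python
def split_text(text: str, max_bytes: int = 5000) -> list[str]:
    """Split text into chunks whose UTF-8 encoding fits within max_bytes.

    Hypothetical helper. Chunks break on whitespace so words stay whole;
    a single word larger than max_bytes is hard-split between characters.
    The 5000 default is an assumed API limit, not a value from the project.
    """
    if max_bytes < 4:
        # A single UTF-8 character can need up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")

    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue

        # The word does not fit: close the current chunk first.
        if current:
            chunks.append(current)

        # Hard-split an oversized word on character boundaries.
        while len(word.encode("utf-8")) > max_bytes:
            piece = word.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
            chunks.append(piece)
            word = word[len(piece):]
        current = word

    if current:
        chunks.append(current)
    return chunks


# Possible use with the names from main.py (left commented out because it
# needs credentials and a network connection):
#
# audio = b"".join(
#     client.synthesize_speech(
#         input=texttospeech.SynthesisInput(text=chunk),
#         voice=voice,
#         audio_config=audio_config,
#     ).audio_content
#     for chunk in split_text(text_string)
# )
```

Concatenating the returned MP3 segments byte for byte usually plays back correctly, but that depends on the player. Writing one file per chunk is the more conservative option.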
                            
                        