
Pdf To Podcast Using Llama On Together


Open In Colab

A Quick Implementation of PDF to Podcast Using Llama 3.1 on Together.ai

Introduction

In this notebook we will see how to easily create a podcast from a PDF using Llama 3.1 70B (or 8B) hosted on Together.ai.

The quickest way to try the whole notebook is to open the Colab link above, then select Runtime - Run all.

This cookbook is inspired by NotebookLM's podcast generation feature and a recent open-source implementation, Open NotebookLM. We will walk through how you can build a PDF-to-podcast pipeline.

Given any PDF we will generate a conversation between a host and a guest discussing and explaining the contents of the PDF.

In doing so we will learn the following:

  1. How we can use JSON mode and structured generation with open models like Llama 3.1 70B to extract a script for the podcast, given text from the PDF.
  2. How we can use TTS models to bring this script to life as a conversation.

You can easily get free trial API keys at Together.ai and cartesia.ai. After that, replace the keys below.

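The key-setup cell might look like the following sketch. The environment-variable names are assumptions, and the placeholder strings must be replaced with your actual keys:

```python
import os

# Replace the placeholders with your trial keys from Together.ai and Cartesia.
# The variable names below are illustrative assumptions, not fixed requirements.
os.environ["TOGETHER_API_KEY"] = "your-together-api-key"
os.environ["CARTESIA_API_KEY"] = "your-cartesia-api-key"
```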

Define Dialogue Schema with Pydantic

We need a way of telling the LLM what the structure of the podcast script between the guest and host will look like. We will do this using pydantic models.

Below we define the required classes.

  • The overall conversation consists of lines said by either the host or the guest. The DialogueItem class specifies the structure of these lines.
  • The full script is a combination of multiple lines performed by the speakers. Here we also include a scratchpad field to allow the LLM to ideate and brainstorm the overall flow of the script before actually generating the lines. The Dialogue class specifies this.
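A minimal sketch of these classes, assuming pydantic v2 and illustrative field names (`speaker`, `text`, `dialogue` are assumptions, not necessarily the notebook's exact names):

```python
from typing import List, Literal

from pydantic import BaseModel


class DialogueItem(BaseModel):
    """One line of the podcast, spoken by either the host or the guest."""
    speaker: Literal["Host", "Guest"]
    text: str


class Dialogue(BaseModel):
    """The full script: a planning scratchpad plus the ordered lines."""
    # The scratchpad lets the LLM brainstorm the flow before writing lines.
    scratchpad: str
    dialogue: List[DialogueItem]
```

Using `Literal` for the speaker field constrains the model so it cannot invent a third voice.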

Call Llama 3.1 to Generate Podcast Script

Below we call Llama-3.1-70B to generate a script for our podcast. We will also be able to read its scratchpad and see how it structured the overall conversation. We can also call Llama-3.1-8B, but the output may not be as good as 70B's - e.g. with the system prompt above, 70B generates more natural output with occasional verbal fillers such as Uhh, Hmm, Ah, and Well.

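A sketch of the generation call, assuming Together's OpenAI-compatible chat client and the model name `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`; the system prompt here is an illustrative stand-in for the notebook's actual prompt:

```python
import json

# Illustrative stand-in for the notebook's actual system prompt.
SYSTEM_PROMPT = (
    "You are producing an engaging podcast. Given a document, first brainstorm "
    "in the scratchpad, then write a dialogue between a Host and a Guest."
)

def build_messages(pdf_text: str) -> list:
    """Assemble the chat messages sent to the model."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": pdf_text},
    ]

def generate_script(client, pdf_text: str,
                    model: str = "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo") -> dict:
    """Call the model in JSON mode and parse the structured script.

    `client` is assumed to be a Together (or OpenAI-compatible) chat client.
    """
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(pdf_text),
        # JSON mode: constrain the output to valid JSON matching our schema.
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```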

Load in PDF of Choice

Here we will load in an academic paper that proposes using multiple open-source language models collaboratively to outperform proprietary models that are much larger!

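The loading step might look like the following sketch, assuming the `pypdf` package; the truncation limit is an arbitrary assumption chosen to keep the prompt within the model's context window:

```python
def truncate_for_context(text: str, max_chars: int = 40_000) -> str:
    """Crudely cap the text length so the prompt fits in the context window."""
    return text[:max_chars]

def load_pdf_text(path: str) -> str:
    """Extract the text of every page of the PDF into one string."""
    from pypdf import PdfReader  # deferred so the sketch loads without pypdf installed
    reader = PdfReader(path)
    pages = [page.extract_text() or "" for page in reader.pages]
    return truncate_for_context("\n".join(pages))
```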

Generate Script

Below we generate the script and print out the lines.

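Printing the lines can be as simple as the following, assuming the generated script is the Dialogue structure (as a parsed dict) with illustrative `speaker`/`text` field names:

```python
def format_script(script: dict) -> str:
    """Render the dialogue as 'Speaker: line' rows (field names are assumptions)."""
    return "\n".join(f"{item['speaker']}: {item['text']}" for item in script["dialogue"])
```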

Generate Podcast Using TTS

Below we read through the script and choose the TTS voice depending on the speaker. We define a host voice ID and a guest voice ID.

We loop through the lines in the script and generate each one by a call to the TTS model with the specific voice and line configuration. The lines are all appended to the same buffer, and once the script finishes we write it out to a WAV file, ready to be played.

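The loop described above can be sketched as follows. The TTS client call is abstracted behind `tts_fn`, a hypothetical stand-in for the Cartesia API call (whose exact SDK signature is not shown here), assumed to return raw 16-bit mono PCM samples at 44.1 kHz:

```python
import wave

def synthesize_podcast(script: dict, tts_fn, host_voice_id: str, guest_voice_id: str,
                       out_path: str = "podcast.wav") -> str:
    """Generate audio line by line, append to one buffer, and write a WAV file.

    `tts_fn(text, voice_id)` is a hypothetical wrapper around the TTS API that
    returns raw 16-bit mono PCM audio at 44.1 kHz.
    """
    buffer = bytearray()
    for item in script["dialogue"]:
        # Pick the voice based on who is speaking this line.
        voice = host_voice_id if item["speaker"] == "Host" else guest_voice_id
        buffer.extend(tts_fn(item["text"], voice))
    # Wrap the accumulated PCM samples in a WAV container.
    with wave.open(out_path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(44100)  # 44.1 kHz
        wav_file.writeframes(bytes(buffer))
    return out_path
```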