Spring AI Speech to Text Example
🎤 Spring AI Whisper Model: Speech-to-Text Magic! 🚀
Spring AI currently plays nice with OpenAI’s and Azure OpenAI’s Whisper model for turning your multilingual speech into transcribed text (JSON or TEXT format). Wanna stay updated on the latest models? Check out OpenAI’s site! 🧙‍♂️✨
1️⃣ Getting Started: Let’s Transcribe Like a Pro! 🎙️
Since Spring AI’s transcription support is all about Whisper right now, we roll with OpenAiAudioTranscriptionModel to convert audio files into text. If Spring AI gets more transcription buddies in the future, the plan is to extract a common AudioTranscriptionModel interface.
🔧 First Things First: Add This Dependency
Before diving in, make sure you add the spring-ai-openai-spring-boot-starter dependency to your project. Trust me, it’s the secret sauce! 🏗️
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
🔥 Autoconfiguration Magic
Spring Boot’s auto-config does the heavy lifting, creating an OpenAiAudioTranscriptionModel instance with these default settings:
# 🔑 API Key - Don’t forget this one!
spring.ai.openai.api-key=${OPENAI_API_KEY}
# 🎙️ Default Configuration
spring.ai.openai.audio.transcription.options.model=whisper-1
# Response format: json, text, srt, verbose_json or vtt
spring.ai.openai.audio.transcription.options.response-format=json
# Sampling temperature, between 0 and 1
spring.ai.openai.audio.transcription.options.temperature=0
# Timestamp granularities: segment or word (or both)
spring.ai.openai.audio.transcription.options.timestamp_granularities=segment
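A quick aside on comments in application.properties: a `#` only starts a comment at the beginning of a line, so explanatory notes are safest on their own line. The sketch below uses plain `java.util.Properties` to show what happens to a trailing `# ...`. It is not Spring Boot's own loader, though as far as I know that one treats trailing comments the same way. The class name `InlineCommentDemo` is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class: shows how java.util.Properties reads a trailing "# ..." on a value line.
class InlineCommentDemo {

    // Parses the given .properties text and returns the value stored under the given key.
    static String valueOf(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "spring.ai.openai.audio.transcription.options.response-format";
        // Trailing comment on the value line: it ends up inside the value.
        System.out.println(valueOf(key + "=json # json, text, srt", key));
        // Comment on its own line: the value stays clean.
        System.out.println(valueOf("# json, text, srt\n" + key + "=json", key));
    }
}
```

If a property value looks right but the app rejects it, a stray inline comment is worth ruling out first.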
📜 Choose Your Format
Pick your flavor of transcription output:
- json: Structured response format (for the nerds 🤓)
- text: Plain ol’ text (classic!)
- srt: Subtitle file format 🎬
- verbose_json: More metadata than you’ll ever need
- vtt: Perfect for web video captions 🖥️
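If you plan to save the transcription to disk, the format you pick usually decides the file extension. Here is a minimal sketch of that mapping. `TranscriptFileType` is a hypothetical helper, not a Spring AI class, and the extensions are simply the conventional ones for each format.

```java
import java.util.Locale;

// Hypothetical helper (not part of Spring AI): maps a response-format name to a file extension.
enum TranscriptFileType {
    JSON("json"), TEXT("txt"), SRT("srt"), VERBOSE_JSON("json"), VTT("vtt");

    private final String extension;

    TranscriptFileType(String extension) {
        this.extension = extension;
    }

    // Accepts the lowercase names used in application.properties, e.g. "verbose_json".
    // Throws IllegalArgumentException for names that are not in the list above.
    static TranscriptFileType fromFormat(String format) {
        return valueOf(format.trim().toUpperCase(Locale.ROOT));
    }

    // Builds an output file name such as "speech.srt" from a base name.
    String fileName(String baseName) {
        return baseName + "." + extension;
    }
}
```

For example, `TranscriptFileType.fromFormat("srt").fileName("speech")` gives you a name ready for a subtitle file.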
🎛️ More Configuration Options
Want to get fancy? Customize your transcription output with these:
# 📝 Guide the model’s style or continue a previous segment
spring.ai.openai.audio.transcription.options.prompt={prompt}
# 🌍 Language setting (ISO-639-1 format, e.g., "en")
spring.ai.openai.audio.transcription.options.language={language}
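Since the language setting expects a two-letter ISO-639-1 code, a cheap sanity check before sending a request can catch typos like "english" or "EN". The sketch below leans on the JDK's own list from `Locale.getISOLanguages()`. `LanguageCodes` is a made-up helper name, and the check only confirms the code is a known two-letter code, not that Whisper actually supports that language.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: validates a language code against the JDK's ISO 639 two-letter list.
class LanguageCodes {

    // Two-letter language codes known to the JDK (all lowercase).
    private static final Set<String> ISO_639_1 =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    // Returns true only for an exact, lowercase two-letter code such as "en" or "de".
    static boolean isIso6391(String code) {
        return code != null && ISO_639_1.contains(code);
    }
}
```

A call like `LanguageCodes.isIso6391("en")` passes, while `"english"` and `"EN"` are rejected.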
🏗️ Inject the Transcription Model Into Your Spring App
@RestController
class TranscriptionController {

    private final OpenAiAudioTranscriptionModel transcriptionModel;

    TranscriptionController(OpenAiAudioTranscriptionModel transcriptionModel) {
        this.transcriptionModel = transcriptionModel;
    }

    // handler methods
}
2️⃣ Manual Configuration: Take Control! 🎛️
You’re a DIYer? No problem! Just need the API key:
var openAiAudioApi = new OpenAiAudioApi(System.getenv("OPENAI_API_KEY"));
var transcriptionModel = new OpenAiAudioTranscriptionModel(openAiAudioApi);
Wanna tweak things further? Use OpenAiAudioTranscriptionOptions like a boss:
OpenAiAudioTranscriptionOptions options = OpenAiAudioTranscriptionOptions.builder()
    .language("en")
    .prompt("Create transcription for this audio file.")
    .temperature(0f)
    .responseFormat(OpenAiAudioApi.TranscriptResponseFormat.TEXT)
    .build();
var openAiAudioApi = new OpenAiAudioApi(System.getenv("OPENAI_API_KEY"));
var transcriptionModel = new OpenAiAudioTranscriptionModel(openAiAudioApi, options);
3️⃣ Speech-to-Text Transcription Example 🎤➡️📜
Time to put it all together! Use the call() method to transcribe an audio file.
@Value("classpath:speech.mp3")
Resource audioFile;
AudioTranscriptionResponse response = transcriptionModel.call(new AudioTranscriptionPrompt(audioFile));
String text = response.getResult().getOutput();
Or integrate it into a REST API:
@RestController
public class OpenAiSpeechToTextController {

    private final OpenAiAudioTranscriptionModel transcriptionModel;

    @Autowired
    public OpenAiSpeechToTextController(OpenAiAudioTranscriptionModel transcriptionModel) {
        this.transcriptionModel = transcriptionModel;
    }

    // Transcribes the bundled audio file with the default options
    @GetMapping("/transcription")
    public String getSpeechToText(@Value("classpath:speech.mp3") Resource audioFile) {
        return transcriptionModel.call(new AudioTranscriptionPrompt(audioFile))
            .getResult()
            .getOutput();
    }

    // Same idea, but with per-request options overriding the defaults
    @GetMapping("/transcription_v2")
    public String getSpeechToTextDetails(@Value("classpath:speech.mp3") Resource audioFile) {
        var transcriptionOptions = OpenAiAudioTranscriptionOptions.builder()
            .responseFormat(OpenAiAudioApi.TranscriptResponseFormat.TEXT)
            .temperature(0f)
            .build();
        var transcriptionResponse = transcriptionModel
            .call(new AudioTranscriptionPrompt(audioFile, transcriptionOptions));
        return transcriptionResponse.getResult().getOutput();
    }
}
🔊 Note: We’re using speech.mp3 for demo purposes. Replace it with your own files!
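When you swap in your own files, a quick pre-flight check can save a wasted API call. The sketch below is based on my reading of OpenAI's speech-to-text docs at the time of writing, which list a 25 MB upload limit and a fixed set of audio formats. Treat both the limit (taken here as 25 * 1024 * 1024 bytes) and the extension list as assumptions to verify against the current docs. `AudioPreflight` is a hypothetical helper, not part of Spring AI.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: rough client-side check before sending a file for transcription.
class AudioPreflight {

    // Assumed upload limit (25 MB); verify against the current OpenAI documentation.
    static final long MAX_BYTES = 25L * 1024 * 1024;

    // Assumed accepted extensions; verify against the current OpenAI documentation.
    static final Set<String> EXTENSIONS =
            Set.of("mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm");

    // Returns true if the file name has an accepted extension and the size is within the limit.
    static boolean looksUploadable(String fileName, long sizeBytes) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext) && sizeBytes > 0 && sizeBytes <= MAX_BYTES;
    }
}
```

With a Spring `Resource`, you could feed it `resource.getFilename()` and `resource.contentLength()` and bail out early when the check fails.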
4️⃣ Summary: What We Learned Today 🎓
- Spring AI’s speech-to-text support is built around the Whisper model (OpenAI and Azure OpenAI) for now.
- OpenAiAudioTranscriptionModel is your go-to class for transcriptions.
- Just add spring-ai-openai-spring-boot-starter, and boom 💥, you’re good to go!
- Use application.properties to tweak the default configurations.
- Prefer manual control? Use the OpenAiAudioTranscriptionOptions builder.
🎉 Now go forth and transcribe like a pro! 🎙️✨
_Happy Learning! 🚀_