Mistral
Configuration
'mistral' => [
'api_key' => env('MISTRAL_API_KEY', ''),
'url' => env('MISTRAL_URL', 'https://api.mistral.ai/v1'),
],
Provider-specific options
Audio Processing
Mistral provides advanced speech-to-text capabilities through their Voxtral models, offering state-of-the-art transcription accuracy with native multilingual support and audio understanding features.
Speech-to-Text
Mistral's Voxtral models deliver exceptional speech recognition performance, outperforming industry standards like Whisper large-v3 across multiple languages and acoustic environments.
Basic STT Usage
use Prism\Prism\ValueObjects\Media\Audio;
$audioFile = Audio::fromPath('/path/to/recording.mp3');
$response = Prism::audio()
->using('mistral', 'voxtral-mini-2507')
->withInput($audioFile)
->asText();
echo "Transcription: " . $response->text;
Model Selection Guide
Choose the right Voxtral model for your use case:
// For production transcription with highest accuracy
$response = Prism::audio()
->using('mistral', 'voxtral-small-latest')
->withInput($audioFile)
->asText();
// For efficient transcription and edge deployment
$response = Prism::audio()
->using('mistral', 'voxtral-mini-latest')
->withInput($audioFile)
->asText();
// For optimized transcription-only service
$response = Prism::audio()
->using('mistral', 'voxtral-mini-2507')
->withInput($audioFile)
->asText();
Language Detection and Specification
Voxtral automatically detects languages or you can specify them for better accuracy:
$response = Prism::audio()
->using('mistral', 'voxtral-mini-2507')
->withInput($audioFile)
->withProviderOptions([
'language' => 'en', // ISO-639-1 code (optional)
'temperature' => 0.0, // Lower for more deterministic results
])
->asText();
// Multilingual support - single model handles multiple languages
$response = Prism::audio()
->using('mistral', 'voxtral-mini-2507')
->withInput($multilingualAudioFile)
->withProviderOptions([
// Auto-detection works well for mixed-language content
'temperature' => 0.1,
])
->asText();
Timestamps and Segmentation
Get detailed timing information with your transcriptions:
$response = Prism::audio()
->using('mistral', 'voxtral-mini-2507')
->withInput($audioFile)
->withProviderOptions([
'timestamp_granularities' => ['segment'], // Available: word, segment
'response_format' => 'json',
])
->asText();
// Access segment information with timestamps
$segments = $response->additionalContent['segments'] ?? [];
foreach ($segments as $segment) {
echo "Text: " . $segment['text'] . "\n";
echo "Start: " . $segment['start'] . "s\n";
echo "End: " . $segment['end'] . "s\n";
}
Context and Prompts
Improve transcription accuracy with contextual information:
$response = Prism::audio()
->using('mistral', 'voxtral-mini-2507')
->withInput($audioFile)
->withProviderOptions([
'prompt' => 'This is a medical consultation discussing patient symptoms and treatment options.',
'language' => 'en',
'temperature' => 0.0, // Deterministic for medical content
])
->asText();
Long-form Audio Processing
Voxtral handles extended audio without chunking:
// Process up to 30 minutes of audio in a single request
$longAudioFile = Audio::fromPath('/path/to/long_meeting.wav');
$response = Prism::audio()
->using('mistral', 'voxtral-small-latest')
->withInput($longAudioFile)
->withProviderOptions([
'timestamp_granularities' => ['segment'],
'language' => 'en',
])
->asText();
echo "Full transcription: " . $response->text;
// Access usage information
if ($response->usage) {
echo "Audio duration: " . $response->usage->promptTokens . " tokens\n";
echo "Total tokens: " . $response->usage->totalTokens . "\n";
}
Creating Audio Objects
use Prism\Prism\ValueObjects\Media\Audio;
// From local file path
$audio = Audio::fromPath('/path/to/audio.mp3');
// From remote URL
$audio = Audio::fromUrl('https://example.com/recording.wav');
// From base64 encoded data
$audio = Audio::fromBase64($base64AudioData, 'audio/mpeg');
// From binary content
$audioContent = file_get_contents('/path/to/audio.wav');
$audio = Audio::fromContent($audioContent, 'audio/wav');
Documents
The text generation part of the exposed Facade only allows documents to be passed in through via URL. See the documents on how to do that.
OCR
Mistral provides an OCR endpoint which can be used to extract text from documents. This OCR endpoint can be used like this:
use Prism\Prism\Enums\Provider;
use Prism\Prism\Prism;
use Prism\Prism\Tool;
use Prism\Prism\ValueObjects\Messages\UserMessage;
use Prism\Prism\ValueObjects\Messages\SystemMessage;
use Prism\Prism\Providers\Mistral\Mistral;
use Prism\Prism\Providers\Mistral\ValueObjects\OCRResponse;
/** @var Mistral $provider */
$provider = Prism::provider(\Prism\Prism\Enums\Provider::Mistral);
/** @var OCRResponse $ocrResponse */
$ocrResponse = $provider->ocr(
'mistral-ocr-latest',
Document::fromUrl('https://prismphp.com/storage/prism-text-generation.pdf')
);
/**
* Just need the full text of all the pages combined? Use the toText() method.
*/
$text = $ocrResponse->toText();
TIP
The OCR endpoint response time can vary depending on the size of the document. We recommend doing this in the background like a queue with a longer timeout.