azure-ai-contentunderstanding-py
Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video.
- risk: unknown
- source: community
- date added: 2026-02-27
Azure AI Content Understanding SDK for Python
Multimodal AI service that extracts semantic content from documents, video, audio, and image files for RAG and automated workflows.
Installation
pip install azure-ai-contentunderstanding
Environment Variables
CONTENTUNDERSTANDING_ENDPOINT=https://<resource>.cognitiveservices.azure.com/
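A missing or malformed endpoint usually surfaces later as a confusing connection error, so it can help to validate the variable up front. The following is a minimal sketch; `get_endpoint` is a hypothetical helper name, not part of the SDK, and the HTTPS check is an assumption about what a valid endpoint looks like.

```python
import os
from urllib.parse import urlparse


def get_endpoint(env=None):
    """Return the Content Understanding endpoint, or raise a clear error."""
    # Allow a plain dict to be passed in so the helper is easy to test.
    env = os.environ if env is None else env
    endpoint = env.get("CONTENTUNDERSTANDING_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("CONTENTUNDERSTANDING_ENDPOINT is not set")
    parsed = urlparse(endpoint)
    # Assumption: a usable endpoint is an https URL with a host name.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Endpoint must be an https URL, got: {endpoint!r}")
    return endpoint
```

Calling `get_endpoint()` with no arguments reads from the process environment.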
Authentication
    import os

    from azure.ai.contentunderstanding import ContentUnderstandingClient
    from azure.identity import DefaultAzureCredential

    # Read the resource endpoint from the environment
    endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]

    # Authenticate with whichever Azure credential is available locally
    credential = DefaultAzureCredential()
    client = ContentUnderstandingClient(endpoint=endpoint, credential=credential)
Core Workflow
Content Understanding operations are asynchronous long-running operations:
- Begin Analysis: start the analysis operation with `begin_analyze()`, which returns a poller
- Poll for Results: poll until the analysis completes (the SDK handles this inside `.result()`)
- Process Results: extract structured results from `AnalyzeResult.contents`
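The three steps above can be sketched as a single helper. This is an illustration only: `analyze_to_markdown` is a hypothetical name, and it assumes a client object that exposes `begin_analyze(analyzer_id=..., inputs=...)` and content items with a `markdown` attribute, as in the examples in this document.

```python
def analyze_to_markdown(client, analyzer_id, inputs):
    """Run one analysis and return the markdown of every content item."""
    # Step 1: start the long-running operation; this returns a poller.
    poller = client.begin_analyze(analyzer_id=analyzer_id, inputs=inputs)
    # Step 2: block until the service reports that the analysis is finished.
    result = poller.result()
    # Step 3: read the structured results from result.contents (a list).
    return [content.markdown for content in (result.contents or [])]
```

Returning a list rather than only the first item keeps the helper usable when an analysis yields more than one content item.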
Prebuilt Analyzers
| Analyzer | Content Type | Purpose |
|---|---|---|
| `prebuilt-documentSearch` | Documents | Extract markdown for RAG applications |
| `prebuilt-imageSearch` | Images | Extract content from images |
| `prebuilt-audioSearch` | Audio | Transcribe audio with timing |
| `prebuilt-videoSearch` | Video | Extract frames, transcripts, summaries |
| `prebuilt-invoice` | Documents | Extract invoice fields |
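When inputs arrive as mixed URLs, one way to choose among the search analyzers in the table is by file extension. The sketch below is an assumption-laden convenience: the extension mapping and the `pick_analyzer` name are hypothetical and are not defined by the service.

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the prebuilt analyzers listed above.
_ANALYZER_BY_EXTENSION = {
    ".pdf": "prebuilt-documentSearch",
    ".docx": "prebuilt-documentSearch",
    ".jpg": "prebuilt-imageSearch",
    ".jpeg": "prebuilt-imageSearch",
    ".png": "prebuilt-imageSearch",
    ".mp3": "prebuilt-audioSearch",
    ".wav": "prebuilt-audioSearch",
    ".mp4": "prebuilt-videoSearch",
    ".mov": "prebuilt-videoSearch",
}


def pick_analyzer(url, default="prebuilt-documentSearch"):
    """Choose a prebuilt analyzer ID from the extension in the URL path."""
    # Use only the path so query strings such as SAS tokens are ignored.
    path = urlparse(url).path
    extension = os.path.splitext(path)[1].lower()
    return _ANALYZER_BY_EXTENSION.get(extension, default)
```

Unknown extensions fall back to the document analyzer here; raising an error instead may be preferable when the input set is supposed to be closed.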
Analyze Document
    import os

    from azure.ai.contentunderstanding import ContentUnderstandingClient
    from azure.ai.contentunderstanding.models import AnalyzeInput
    from azure.identity import DefaultAzureCredential

    endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
    client = ContentUnderstandingClient(
        endpoint=endpoint,
        credential=DefaultAzureCredential(),
    )

    # Analyze document from URL
    poller = client.begin_analyze(
        analyzer_id="prebuilt-documentSearch",
        inputs=[AnalyzeInput(url="https://example.com/document.pdf")],
    )
    result = poller.result()

    # Access markdown content (contents is a list)
    content = result.contents[0]
    print(content.markdown)
Access Document Content Details
    from azure.ai.contentunderstanding.models import DocumentContent, MediaContentKind

    content = result.contents[0]

    # Narrow to DocumentContent before reading document-only attributes
    if content.kind == MediaContentKind.DOCUMENT:
        document_content: DocumentContent = content  # type: ignore
        print(document_content.start_page_number)
Analyze Image
    from azure.ai.contentunderstanding.models import AnalyzeInput

    poller = client.begin_analyze(
        analyzer_id="prebuilt-imageSearch",
        inputs=[AnalyzeInput(url="https://example.com/image.jpg")],
    )
    result = poller.result()

    content = result.contents[0]
    print(content.markdown)
Analyze Video
    from azure.ai.contentunderstanding.models import AnalyzeInput

    poller = client.begin_analyze(
        analyzer_id="prebuilt-videoSearch",
        inputs=[AnalyzeInput(url="https://example.com/video.mp4")],
    )
    result = poller.result()

    # Access video content (AudioVisualContent)
    content = result.contents[0]

    # Transcript phrases carry start and end offsets in milliseconds
    for phrase in content.transcript_phrases or []:
        print(f"[{phrase.start_time_ms} - {phrase.end_time_ms}]: {phrase.text}")

    # Key frames (video only) are reported as millisecond offsets
    for frame_time_ms in content.key_frame_times_ms or []:
        print(f"Key frame at {frame_time_ms} ms")
Analyze Audio
    from azure.ai.contentunderstanding.models import AnalyzeInput

    poller = client.begin_analyze(
        analyzer_id="prebuilt-audioSearch",
        inputs=[AnalyzeInput(url="https://example.com/audio.mp3")],
    )
    result = poller.result()

    # Access audio transcript; each phrase has a start offset in milliseconds
    content = result.contents[0]
    for phrase in content.transcript_phrases or []:
        print(f"[{phrase.start_time_ms}] {phrase.text}")
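Raw timing values are hard to read in a transcript. The sketch below formats an offset as a timestamp, assuming the timing values are integer millisecond offsets; check the units your SDK version returns before relying on it. `format_timestamp` is a hypothetical helper, not an SDK function.

```python
def format_timestamp(ms):
    """Format a millisecond offset as HH:MM:SS.mmm."""
    # Split off the millisecond remainder, then peel off seconds and minutes.
    total_seconds, millis = divmod(int(ms), 1000)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
```

For example, `format_timestamp(3723004)` returns `"01:02:03.004"`.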
Custom Analyzers
Create custom analyzers with field schemas for specialized extraction:
    from azure.ai.contentunderstanding.models import AnalyzeInput

    # Create custom analyzer
    analyzer = client.create_analyzer(
        analyzer_id="my-invoice-analyzer",
        analyzer={
            "description": "Custom invoice analyzer",
            "base_analyzer_id": "prebuilt-documentSearch",
            "field_schema": {
                "fields": {
                    "vendor_name": {"type": "string"},
                    "invoice_total": {"type": "number"},
                    "line_items": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "description": {"type": "string"},
                                "amount": {"type": "number"},
                            },
                        },
                    },
                }
            },
        },
    )

    # Use custom analyzer
    poller = client.begin_analyze(
        analyzer_id="my-invoice-analyzer",
        inputs=[AnalyzeInput(url="https://example.com/invoice.pdf")],
    )
    result = poller.result()

    # Extracted fields are attached to each content item in result.contents
    content = result.contents[0]
    print(content.fields["vendor_name"])
    print(content.fields["invoice_total"])
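Nested schema dictionaries are easy to mistype. As a sketch, a small helper can expand a shorthand into the same dict shape as the `field_schema` in the example above. The shorthand and the `build_field_schema` name are hypothetical conveniences, not part of the SDK: a string is a scalar type, and a one-element list holding a dict is an array of objects.

```python
def _expand(spec):
    """Expand one shorthand entry into a field definition dict."""
    if isinstance(spec, str):
        # Scalar field, for example "string" or "number".
        return {"type": spec}
    if isinstance(spec, list) and len(spec) == 1 and isinstance(spec[0], dict):
        # Array of objects: the single dict describes each item's properties.
        properties = {name: _expand(sub) for name, sub in spec[0].items()}
        return {"type": "array", "items": {"type": "object", "properties": properties}}
    raise TypeError(f"Unsupported field spec: {spec!r}")


def build_field_schema(fields):
    """Build a field_schema dict from a {"name": spec} shorthand mapping."""
    return {"fields": {name: _expand(spec) for name, spec in fields.items()}}


invoice_schema = build_field_schema(
    {
        "vendor_name": "string",
        "invoice_total": "number",
        "line_items": [{"description": "string", "amount": "number"}],
    }
)
```

Here `invoice_schema` has the same structure as the hand-written `field_schema` in the custom analyzer example.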
Analyzer Management
    # List all analyzers
    analyzers = client.list_analyzers()
    for analyzer in analyzers:
        print(f"{analyzer.analyzer_id}: {analyzer.description}")

    # Get specific analyzer
    analyzer = client.get_analyzer("prebuilt-documentSearch")

    # Delete custom analyzer
    client.delete_analyzer("my-custom-analyzer")
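Because the listing mixes prebuilt and custom analyzers, cleanup scripts often need only the custom ones. A minimal sketch, assuming that prebuilt analyzer IDs always start with `prebuilt-` (true of every ID in this document, but an assumption in general) and that each listed item has an `analyzer_id` attribute as shown above; `custom_analyzer_ids` is a hypothetical helper name.

```python
def custom_analyzer_ids(analyzers, prebuilt_prefix="prebuilt-"):
    """Return the sorted IDs of analyzers that are not prebuilt."""
    return sorted(
        analyzer.analyzer_id
        for analyzer in analyzers
        # Assumption: the prefix alone distinguishes prebuilt from custom.
        if not analyzer.analyzer_id.startswith(prebuilt_prefix)
    )
```

The result of `client.list_analyzers()` can be passed in directly, for example `custom_analyzer_ids(client.list_analyzers())`.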
Async Client
    import asyncio
    import os

    from azure.ai.contentunderstanding.aio import ContentUnderstandingClient
    from azure.ai.contentunderstanding.models import AnalyzeInput
    from azure.identity.aio import DefaultAzureCredential


    async def analyze_document():
        endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]

        # Close both the async credential and the client when done
        async with DefaultAzureCredential() as credential, ContentUnderstandingClient(
            endpoint=endpoint, credential=credential
        ) as client:
            poller = await client.begin_analyze(
                analyzer_id="prebuilt-documentSearch",
                inputs=[AnalyzeInput(url="https://example.com/doc.pdf")],
            )
            result = await poller.result()
            content = result.contents[0]
            return content.markdown


    asyncio.run(analyze_document())
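The async client pays off when many files are analyzed at once, but unbounded fan-out can hit service rate limits. The sketch below caps concurrency with a semaphore. `analyze_many` is a hypothetical helper, `analyze_one` stands for any coroutine function you supply (for example, one that wraps `begin_analyze` for a single URL), and the default limit of 4 is an arbitrary assumption.

```python
import asyncio


async def analyze_many(urls, analyze_one, max_concurrency=4):
    """Run analyze_one(url) for every URL with a cap on concurrent calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _guarded(url):
        # At most max_concurrency coroutines are inside this block at once.
        async with semaphore:
            return await analyze_one(url)

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(_guarded(url) for url in urls))
```

Sharing one client across all calls, rather than opening a client per URL, keeps connection reuse effective.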
Content Types
| Class | For | Provides |
|---|---|---|
| `DocumentContent` | PDF, images, Office docs | Pages, tables, figures, paragraphs |
| `AudioVisualContent` | Audio, video files | Transcript phrases, timing, key frames |
Both derive from MediaContent which provides basic info and markdown representation.
Model Imports
    from azure.ai.contentunderstanding.models import (
        AnalyzeInput,
        AnalyzeResult,
        MediaContentKind,
        DocumentContent,
        AudioVisualContent,
    )
Client Types
| Client | Purpose |
|---|---|
| `ContentUnderstandingClient` | Sync client for all operations |
| `ContentUnderstandingClient` (aio) | Async client for all operations |
Best Practices
- Use `begin_analyze` with `AnalyzeInput`: this is the correct method signature
- Access results via `result.contents[0]`: results are returned as a list
- Use prebuilt analyzers for common scenarios (document/image/audio/video search)
- Create custom analyzers only for domain-specific field extraction
- Use the async client for high-throughput scenarios, with `azure.identity.aio` credentials
- Handle long-running operations: video and audio analysis can take minutes
- Use URL sources when possible to avoid upload overhead
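For the long-running operations mentioned above, a bare `.result()` call gives no feedback while a video is being processed. A minimal sketch of waiting with progress logging and an overall deadline follows. It assumes the poller exposes `done()`, `status()`, `wait(timeout=...)`, and `result()`, as the `LROPoller` in azure-core does; `wait_with_progress` and its default intervals are hypothetical.

```python
import time


def wait_with_progress(poller, poll_interval=5.0, max_wait=600.0, log=print):
    """Wait for a poller to finish, logging status, with an overall deadline."""
    deadline = time.monotonic() + max_wait
    while not poller.done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Analysis still running after {max_wait} seconds")
        log(f"status: {poller.status()}")
        # Returns when the operation completes or the interval elapses.
        poller.wait(timeout=poll_interval)
    return poller.result()
```

Raising `TimeoutError` stops the local wait only; the operation may still be running on the service.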
When to Use
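Use this skill when a task calls for extracting content from documents, images, audio, or video with the Azure AI Content Understanding SDK for Python, for example to produce markdown for RAG or to pull structured fields with a custom analyzer.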