Yurii Shunkin

AI-Powered Video Translation Platform

AI-powered PoC development that ensures automatic translation of videos with realistic lip-sync and voice cloning

About the project

Client:

Leobit's Internal Project

Location:

USA

Company Size:

100+ Employees

Technologies:

Angular

.NET

HeyGen API

Azure Blob Storage

Azure Cosmos DB

Azure Functions

Application Insights

Azure Key Vault

Azure AD B2C

 

This proof of concept showcases cutting-edge AI-powered video translation technology that enables seamless multilingual content delivery. Built on advanced AI voice cloning and lip-sync technology, the solution generates natural-sounding translated audio that closely matches the original speaker’s voice characteristics. The lip-sync engine synchronizes mouth movements with the translated speech, preserving visual authenticity and delivering an immersive multilingual viewing experience.

We went through a thorough R&D process, researching and testing various existing tools, and eventually selected HeyGen because it delivered the best results and offered robust API capabilities. We then developed a prototype of an automated video translation process and hosted the solution on the Azure cloud.

Vitalii Datsyshyn

Solution Architect at Leobit


Customer

Leobit’s R&D team embarked on a two-week experimental project to explore how artificial intelligence could enhance the quality and efficiency of video content dubbing. The primary objective was to create a lightweight, AI-powered proof of concept that would translate video content into multiple languages while preserving its visual and emotional authenticity.

Business Challenge

Global content distribution requires significant resources for manual translation and localization. Our PoC demonstrates how AI can eliminate language barriers by automatically translating video content while maintaining visual authenticity through advanced lip-sync technology and voice cloning capabilities. This solution addresses the growing demand for multilingual content in educational platforms, corporate training, marketing campaigns, and international communication.

Why Leobit

This initiative built upon Leobit’s prior experience with AI-powered solutions and cloud-native architectures. The PoC served both as a technical experiment and a foundation for future client solutions in the media, education, and enterprise sectors.

Project in detail

Leobit successfully delivered a robust PoC that demonstrates the potential of AI in transforming multilingual video content.


The system supports 175 languages and dialects thanks to HeyGen’s contextual translation and voice generation capabilities. After the source language was detected, users could select one or more target languages through the UI. On the back end, Leobit designed a queue-based architecture using Azure Functions to process translation requests in parallel. For each selected target language, a translation task was submitted to HeyGen, which returned an AI-dubbed video with synchronized voice and lip movements. Each translated video was stored separately in Azure Blob Storage, and its processing status was tracked in Azure Cosmos DB.
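To illustrate the per-language split, here is a minimal TypeScript sketch of how one upload could be turned into independent tasks, each with its own status record similar to the documents tracked in Cosmos DB. The field names, the status values, and the rule that skips the source language (translating into it would be a no-op) are assumptions for illustration, not the project's actual schema.

```typescript
// Hypothetical status values for a single target-language task.
type TaskStatus = "queued" | "processing" | "completed" | "failed";

// Hypothetical shape of one status document (one per target language).
interface TranslationTask {
  id: string;              // unique per video + language
  videoId: string;         // identifier of the uploaded source video
  sourceLanguage: string;  // detected language code, e.g. "en"
  targetLanguage: string;  // requested language code, e.g. "de"
  status: TaskStatus;
  outputBlobPath?: string; // set once the dubbed video is stored
}

// Fan out one upload into independent tasks, one per selected language.
// Languages are compared case-insensitively; duplicates and the source
// language itself are skipped (an assumed policy).
function createTranslationTasks(
  videoId: string,
  sourceLanguage: string,
  targetLanguages: string[],
): TranslationTask[] {
  const sourceKey = sourceLanguage.trim().toLowerCase();
  const seen = new Set<string>();
  const tasks: TranslationTask[] = [];
  for (const raw of targetLanguages) {
    const lang = raw.trim();
    const key = lang.toLowerCase();
    if (key === "" || key === sourceKey || seen.has(key)) continue;
    seen.add(key);
    tasks.push({
      id: videoId + ":" + key,
      videoId,
      sourceLanguage,
      targetLanguage: lang,
      status: "queued",
    });
  }
  return tasks;
}

const tasks = createTranslationTasks("vid-001", "en", ["de", "fr", "DE", "en"]);
console.log(JSON.stringify(tasks.map((t) => t.id))); // ["vid-001:de","vid-001:fr"]
```

Keeping one record per language means a slow or failed language never blocks the status of the others.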

Leobit used HeyGen’s built-in transcription features to generate subtitles and improve the readability of video content. After initial processing, the transcription returned by HeyGen was parsed and enhanced on the back end using natural language rules and punctuation correction logic. We also implemented additional processing to remove filler words, standardize formatting, and break the transcript into logically segmented subtitles. These subtitles can then be displayed alongside the video or exported in standard formats, such as SRT or VTT.
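The filler removal, segmentation, and SRT export steps can be sketched in TypeScript as shown below. It assumes the transcript arrives as a list of timed words, which is a hypothetical input format (HeyGen's actual transcript structure is not described here), and it uses an illustrative English filler list and a maximum of seven words per cue. Punctuation correction is not covered.

```typescript
// Hypothetical timed-word input; the real HeyGen transcript format may differ.
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

interface Cue {
  start: number;
  end: number;
  text: string;
}

// Illustrative filler list; a production list would be language-specific.
const FILLERS = ["um", "uh", "erm", "hmm"];

function isFiller(word: string): boolean {
  const bare = word.toLowerCase().replace(/[^a-z]/g, "");
  return FILLERS.indexOf(bare) !== -1;
}

// Drop fillers, then group words into cues, closing a cue at
// sentence-ending punctuation or when it reaches maxWords.
function buildCues(words: TimedWord[], maxWords = 7): Cue[] {
  const cues: Cue[] = [];
  let current: TimedWord[] = [];
  const flush = () => {
    if (current.length === 0) return;
    cues.push({
      start: current[0].start,
      end: current[current.length - 1].end,
      text: current.map((w) => w.text).join(" "),
    });
    current = [];
  };
  for (const w of words) {
    if (isFiller(w.text)) continue;
    current.push(w);
    if (/[.!?]$/.test(w.text) || current.length >= maxWords) flush();
  }
  flush();
  return cues;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, width: number) => ("000" + n).slice(-width);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Numbered cues separated by blank lines, as SRT expects.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => i + 1 + "\n" + srtTime(c.start) + " --> " + srtTime(c.end) + "\n" + c.text + "\n")
    .join("\n");
}

const words: TimedWord[] = [
  { text: "Um,", start: 0.0, end: 0.3 },
  { text: "Welcome", start: 0.4, end: 0.8 },
  { text: "to", start: 0.8, end: 0.9 },
  { text: "Leobit.", start: 0.9, end: 1.5 },
  { text: "Let's", start: 1.7, end: 1.9 },
  { text: "begin.", start: 1.9, end: 2.25 },
];
console.log(toSrt(buildCues(words)));
```

A WebVTT export differs mainly in its leading `WEBVTT` header line and the use of `.` instead of `,` before the milliseconds.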

Leobit engineered the back end to accept multiple target languages in a single job submission. When the user selects multiple languages, each one is treated as a separate asynchronous translation task using Azure Durable Functions. The back end automatically handles queuing, parallel execution, and result aggregation. Azure Cosmos DB tracks the processing status of each target language individually, so the front end can display progress updates in real time.
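Azure Durable Functions describes this kind of workflow as the fan-out/fan-in pattern. The project's orchestration runs in .NET 8; the TypeScript sketch below only mirrors the idea with plain Promises, showing how a failure in one language is recorded rather than failing the whole job, and how overall progress can be derived from per-language statuses. `TranslateFn` is a hypothetical stand-in for the activity that calls HeyGen, and `onUpdate` stands in for persisting the status document.

```typescript
type LanguageStatus = "pending" | "completed" | "failed";

interface LanguageResult {
  language: string;
  status: LanguageStatus;
  outputUrl?: string;
  error?: string;
}

// Hypothetical activity signature: translates one video into one language
// and resolves with the URL of the dubbed output.
type TranslateFn = (videoId: string, language: string) => Promise<string>;

// Fan out one task per language, then fan in all results. Each task
// converts its own failure into a "failed" result, so Promise.all
// always resolves with one entry per language, in input order.
function runTranslationJob(
  videoId: string,
  languages: string[],
  translate: TranslateFn,
  onUpdate: (result: LanguageResult) => void,
): Promise<LanguageResult[]> {
  const tasks = languages.map((language) =>
    translate(videoId, language)
      .then(
        (outputUrl): LanguageResult => ({ language, status: "completed", outputUrl }),
        (err): LanguageResult => ({ language, status: "failed", error: String(err) }),
      )
      .then((result) => {
        onUpdate(result); // e.g. write the per-language status for the front end
        return result;
      }),
  );
  return Promise.all(tasks);
}

// Overall progress as the share of languages that finished, either way.
function progressPercent(results: LanguageResult[]): number {
  if (results.length === 0) return 100;
  const done = results.filter((r) => r.status !== "pending").length;
  return Math.round((done / results.length) * 100);
}

// Fake translator for demonstration: the code "xx" always fails.
const fakeTranslate: TranslateFn = (videoId, language) =>
  language === "xx"
    ? Promise.reject(new Error("unsupported language"))
    : Promise.resolve("https://example.blob.core.windows.net/out/" + videoId + "-" + language + ".mp4");

runTranslationJob("vid-001", ["de", "xx", "fr"], fakeTranslate, (r) =>
  console.log("update:", r.language, r.status),
).then((results) => console.log("progress:", progressPercent(results) + "%"));
```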

Leobit built the system to accept multiple input types (e.g., MP4, MOV, AVI) and generate outputs in different resolutions and codecs. The front end includes a format checker and resolution selector, while the back end uses Azure Media Services and format conversion tools where necessary to ensure compatibility with HeyGen’s input requirements. After receiving processed outputs, the back end converts the results into the user’s preferred output format and resolution before storing them in Azure Blob Storage.
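A front-end format checker of this kind can start as an extension allow-list plus a resolution selector that only offers presets the source can support. In the TypeScript sketch below, the allow-list uses the three formats named above, while the resolution presets and the "never upscale" rule are assumptions for illustration.

```typescript
const ACCEPTED_EXTENSIONS = ["mp4", "mov", "avi"]; // formats named in the case study

// Hypothetical output presets offered by the resolution selector.
const RESOLUTIONS: { [name: string]: { width: number; height: number } } = {
  "720p": { width: 1280, height: 720 },
  "1080p": { width: 1920, height: 1080 },
  "4k": { width: 3840, height: 2160 },
};

// Lowercased extension without the dot, or "" if there is none.
function getExtension(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
}

function isAcceptedFormat(fileName: string): boolean {
  return ACCEPTED_EXTENSIONS.indexOf(getExtension(fileName)) !== -1;
}

// Offer only presets that do not exceed the source height, so the
// conversion step never upscales (an assumed policy).
function availableResolutions(sourceHeight: number): string[] {
  return Object.keys(RESOLUTIONS).filter((name) => RESOLUTIONS[name].height <= sourceHeight);
}

console.log(isAcceptedFormat("Intro.MOV")); // true
console.log(isAcceptedFormat("clip.mkv"));  // false
console.log(availableResolutions(1080));    // [ '720p', '1080p' ]
```

An extension check is only a first filter; the back-end conversion step would still need to inspect the actual container and codec, since a file name can be wrong.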

The Angular front end provided an interface for users to input and manage custom terms, along with their preferred translations, to ensure brand consistency and prevent misinterpretation of critical terms. These entries were saved in Azure Cosmos DB and dynamically injected into the translation requests as part of the metadata payload sent to HeyGen. Although HeyGen’s support for custom vocabulary is limited in its current API version, we structured the integration in a way that will allow easy adaptation as more advanced support becomes available.
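The TypeScript sketch below shows one way glossary entries could be turned into request metadata. Because HeyGen's custom vocabulary support is limited, the request shape and the `glossary` field are hypothetical. Keeping the mapping in a single function is what makes later adaptation cheap: only this function changes when the API gains richer support.

```typescript
// A custom term as entered in the UI (hypothetical shape).
interface GlossaryEntry {
  term: string;
  targetLanguage: string;
  translation: string;
}

interface GlossaryPair {
  source: string;
  target: string;
}

// Hypothetical request shape; the real HeyGen payload may differ.
interface TranslationRequest {
  videoUrl: string;
  outputLanguage: string;
  metadata: { glossary: GlossaryPair[] };
}

// Keep only entries for this target language and de-duplicate terms
// case-insensitively. The last entry wins, but the term keeps the
// position of its first appearance (Map insertion order).
function buildTranslationRequest(
  videoUrl: string,
  outputLanguage: string,
  entries: GlossaryEntry[],
): TranslationRequest {
  const byTerm = new Map<string, GlossaryPair>();
  for (const e of entries) {
    if (e.targetLanguage !== outputLanguage) continue;
    const source = e.term.trim();
    byTerm.set(source.toLowerCase(), { source, target: e.translation.trim() });
  }
  return { videoUrl, outputLanguage, metadata: { glossary: Array.from(byTerm.values()) } };
}

const req = buildTranslationRequest("https://example.com/video.mp4", "de", [
  { term: "Leobit", targetLanguage: "de", translation: "Leobit" },
  { term: "proof of concept", targetLanguage: "de", translation: "Machbarkeitsnachweis" },
  { term: "proof of concept", targetLanguage: "fr", translation: "preuve de concept" },
]);
console.log(JSON.stringify(req.metadata));
// {"glossary":[{"source":"Leobit","target":"Leobit"},{"source":"proof of concept","target":"Machbarkeitsnachweis"}]}
```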

Integration with HeyGen API

Leobit’s integration with the HeyGen API served as the backbone of the PoC’s AI-powered video translation capabilities. During the first week of development, our developer thoroughly analyzed HeyGen’s documentation and capabilities to design a modular back-end workflow using Azure Functions. The integration was structured around RESTful calls to HeyGen’s services, which enabled automatic video ingestion, language detection, translation, and AI-generated voiceovers with synchronized lip movements.

We used dependency injection within the .NET 8 back end to create loosely coupled service wrappers around the HeyGen endpoints. These service layers made testing, monitoring, and error handling easier. The output from HeyGen was then passed to Azure Blob Storage for secure handling, while the related metadata was stored in Azure Cosmos DB to track processing status, logs, and language configurations.
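The project's wrappers are written in .NET 8. The TypeScript sketch below shows the same idea so that all examples on this page use one language. The rest of the back end depends only on an interface, and the HTTP transport is passed in through the constructor, so a test can substitute a fake. The class name, endpoint path, header names, and response field (`job_id`) are assumptions for illustration, not taken from HeyGen's documentation.

```typescript
// Minimal transport abstraction so tests can inject a fake instead of real HTTP.
type HttpPost = (url: string, headers: { [name: string]: string }, body: unknown) => Promise<unknown>;

// What the rest of the back end depends on (hypothetical interface).
interface VideoTranslationService {
  submit(videoUrl: string, outputLanguage: string): Promise<string>; // resolves to a job id
}

// Hypothetical wrapper around a HeyGen translation endpoint.
class HeyGenTranslationService implements VideoTranslationService {
  private readonly post: HttpPost;
  private readonly baseUrl: string;
  private readonly apiKey: string;

  // Dependencies are injected rather than created here.
  constructor(post: HttpPost, baseUrl: string, apiKey: string) {
    this.post = post;
    this.baseUrl = baseUrl;
    this.apiKey = apiKey;
  }

  async submit(videoUrl: string, outputLanguage: string): Promise<string> {
    // Path, header, and field names below are assumptions for illustration.
    const response = await this.post(
      this.baseUrl + "/video_translate",
      { "X-Api-Key": this.apiKey, "Content-Type": "application/json" },
      { video_url: videoUrl, output_language: outputLanguage },
    );
    const jobId = (response as { data?: { job_id?: string } } | null)?.data?.job_id;
    if (!jobId) throw new Error("Translation request did not return a job id");
    return jobId;
  }
}

// In a test, a fake transport replaces the real HTTP call.
const fakePost: HttpPost = async () => ({ data: { job_id: "job-42" } });
const service: VideoTranslationService = new HeyGenTranslationService(fakePost, "https://api.example.test", "test-key");
service.submit("https://example.com/video.mp4", "de").then((id) => console.log(id)); // job-42
```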

Intelligent language detection

To enable automatic source language identification, Leobit implemented a dedicated Azure Function that handled initial video ingestion and language detection. Once a user uploaded a video through the Angular front end, the video file was securely stored in Azure Blob Storage. The back end then triggered a HeyGen API call to initiate language recognition.

The API returned a language code (e.g., en, de, fr), which was stored in Azure Cosmos DB along with a unique file identifier. This allowed the rest of the translation workflow to dynamically adapt based on the detected source language, without requiring any manual user input.
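Every later step depends on the stored code, so it is worth validating it once at ingestion. The TypeScript sketch below does that, assuming codes of the form `en` or `en-US`. The document fields and the normalization rule (lowercase language, uppercase region, the usual casing for BCP 47 tags) are assumptions. The case study only states that the code is stored together with a unique file identifier.

```typescript
// Hypothetical shape of the document stored after detection.
interface VideoRecord {
  id: string;             // unique file identifier
  blobPath: string;       // location of the upload in Blob Storage
  sourceLanguage: string; // normalized code, e.g. "en" or "en-US"
  detectedAt: string;     // ISO 8601 timestamp
}

// Accept a two- or three-letter language subtag, optionally followed by a
// two-letter region, and normalize its casing. Returns null otherwise.
function normalizeLanguageCode(raw: string): string | null {
  const match = /^([a-zA-Z]{2,3})(?:[-_]([a-zA-Z]{2}))?$/.exec(raw.trim());
  if (!match) return null;
  const language = match[1].toLowerCase();
  return match[2] ? language + "-" + match[2].toUpperCase() : language;
}

// Build the record to persist; reject codes that do not look valid.
function buildVideoRecord(fileId: string, blobPath: string, detected: string, now: Date): VideoRecord {
  const sourceLanguage = normalizeLanguageCode(detected);
  if (sourceLanguage === null) {
    throw new Error("Unrecognized language code: " + detected);
  }
  return { id: fileId, blobPath, sourceLanguage, detectedAt: now.toISOString() };
}

console.log(normalizeLanguageCode("EN"));      // en
console.log(normalizeLanguageCode("pt_br"));   // pt-BR
console.log(normalizeLanguageCode("english")); // null
```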

API integration ready

Leobit developed the back end in accordance with RESTful principles, which allow the PoC to evolve into a fully operational microservice or SaaS module that can be embedded into larger video management or e-learning platforms.

This makes the solution API-ready, allowing external applications to trigger video uploads, initiate translations, and retrieve results programmatically. All endpoints are protected with secure authentication via Azure AD B2C, and request logs are captured through Azure Application Insights for monitoring and debugging.
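An external application would typically call such an API with an Azure AD B2C access token sent as a Bearer token. The TypeScript client below is a minimal sketch of that. The routes (`/api/videos/{id}/translations`) are hypothetical, since the case study does not list the actual endpoints. The fetch function and token source are injected, so the client can be exercised without a network or a real login.

```typescript
// Minimal fetch-like contract, injected so the client is testable.
interface FetchInit {
  method: string;
  headers: { [name: string]: string };
  body?: string;
}
interface FetchResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type Fetcher = (url: string, init: FetchInit) => Promise<FetchResponse>;

// Hypothetical client for the PoC's REST endpoints.
class TranslationApiClient {
  private readonly fetcher: Fetcher;
  private readonly baseUrl: string;
  private readonly getToken: () => Promise<string>; // e.g. an access token obtained via MSAL

  constructor(fetcher: Fetcher, baseUrl: string, getToken: () => Promise<string>) {
    this.fetcher = fetcher;
    this.baseUrl = baseUrl;
    this.getToken = getToken;
  }

  // Attach the Bearer token, send JSON when there is a body, fail on non-2xx.
  private async request(method: string, path: string, body?: unknown): Promise<unknown> {
    const token = await this.getToken();
    const init: FetchInit = { method, headers: { Authorization: "Bearer " + token } };
    if (body !== undefined) {
      init.headers["Content-Type"] = "application/json";
      init.body = JSON.stringify(body);
    }
    const res = await this.fetcher(this.baseUrl + path, init);
    if (!res.ok) throw new Error(method + " " + path + " failed with status " + res.status);
    return res.json();
  }

  // Start translations of an already uploaded video into several languages.
  startTranslations(videoId: string, languages: string[]): Promise<unknown> {
    return this.request("POST", "/api/videos/" + encodeURIComponent(videoId) + "/translations", { languages });
  }

  // Read the per-language translation status of a video.
  getTranslationStatus(videoId: string): Promise<unknown> {
    return this.request("GET", "/api/videos/" + encodeURIComponent(videoId) + "/translations");
  }
}
```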

Explore the solution prototype

Experience seamless AI-powered video translation. This proof of concept showcases cutting-edge dubbing technology, built with Angular 20 and .NET 8, and powered by Azure services and HeyGen AI to deliver intelligent, real-time video localization.

Technology Solution

  • HeyGen API integration enables automatic language detection, contextual translation, lip-sync generation, and voice cloning.
  • Serverless back end with .NET 8 and Azure Functions, which ensures cost efficiency and high availability without the need to manage infrastructure.
  • RESTful endpoints make it easy to integrate with external systems, content platforms, or enterprise applications for broader adoption.
  • The user interface was developed using Angular 20, with standalone components and reactive signals, to ensure a smooth and responsive experience throughout the video upload and translation process.

Value Delivered

  • By integrating advanced lip-sync and voice cloning technologies, the solution significantly enhances the realism and emotional impact of dubbed videos.
  • Support for custom vocabulary and transcript enhancement means the system can be adapted for domain-specific content, ensuring translation accuracy and brand consistency.