AI-powered Solution for Transcribing Voice Recordings
Custom development of a PoC for a tool that uses AI models for speech-to-text transcription
About the project
- Client: Leobit's Internal Project
- Location: USA
- Company Size: 100+ Employees
- Industry: Information Technology
- Solution: Custom Software
Leobit built an AI-powered solution for transcribing voice recordings. The app can record speech (including in the background) and uses AI algorithms to deliver textual transcriptions of these recordings. Our team used Flutter to ensure a consistent user experience across iOS/macOS and Android devices.
The tool has significant potential for capturing interactions: it can support Leobit’s internal workflows and can be customized to meet our customers’ specific needs. Its cross-platform design makes it easy to use, so people can rely on it even during brief or occasional interactions.
Customer
This was an internal Leobit project, planned and designed as an experiment with an AI-powered solution for transcribing voice recordings. The solution is meant to facilitate our internal workflows and to be usable by our customers across different industries.
Business Challenge
Sales teams, project managers, and developers often need a reliable way to capture and transcribe customer interactions efficiently. To address this, we developed a convenient, portable app that leverages AI-powered speech-to-text technology to ensure accurate and seamless transcription of oral communication. This PoC supports Leobit’s internal teams and can also be customized to meet the specific needs of our customers.
Project in detail
The Leobit team set out to create a convenient and portable solution for transcribing voice recordings. The project began with a comprehensive yet fast planning phase.
During planning, our team defined the solution’s architecture and chose the AI model that would support its core functionality. To deliver the tool quickly on several platforms, we decided to use Flutter for cross-platform development. Accordingly, we chose a package from the Flutter ecosystem, whisper_flutter_new, to provide AI-powered speech-to-text transcription.
Our team used the BLoC design pattern in Flutter to keep the presentation layer and the business logic clearly separated, which helped us deliver the tool quickly. The app stores recordings in the device’s local file storage. We also integrated the FFmpeg library for audio processing: it improves the quality of uploaded audio and removes background noise and silent passages. The whisper_flutter_new package lets the app work with OpenAI’s Whisper models.
Our QA specialists applied Flutter unit testing to review the app’s logic in isolation. We also ran widget tests to see how components of the app’s UI work together, as well as golden tests to examine the solution’s visual appearance. Upon successfully completing all these QA workflows, we deployed the tool.
AI-powered speech-to-text functionality
The solution relies on OpenAI’s Whisper models for its core functionality. We can switch models depending on the purpose and accuracy requirements. These models provide efficient speech-to-text conversion in multiple languages, with strong recognition of diverse accents.
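One way to think about switching models is as a trade-off between model size and accuracy. The Python sketch below is an assumption about how such a choice could be expressed, not the app's logic or the whisper_flutter_new API: `WHISPER_MODELS` and `pick_model` are hypothetical names, and the parameter counts are the approximate figures listed in OpenAI's Whisper README.

```python
from typing import Dict

# Approximate parameter counts in millions, per OpenAI's Whisper README.
WHISPER_MODELS: Dict[str, int] = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def pick_model(max_params_millions: int) -> str:
    """Return the largest model within the budget, or the smallest if none fits."""
    fitting = [
        name for name, size in WHISPER_MODELS.items()
        if size <= max_params_millions
    ]
    if not fitting:
        # Nothing fits the budget: fall back to the smallest model.
        return min(WHISPER_MODELS, key=lambda name: WHISPER_MODELS[name])
    # Larger models are generally more accurate but slower and heavier.
    return max(fitting, key=lambda name: WHISPER_MODELS[name])
```

A quick note on a phone might use a small budget, while an important customer call transcribed on capable hardware could justify a larger one.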
Convenient UI/UX design across different platforms
With the BLoC design pattern in Flutter, we separated the application’s concerns (presentation layer and business logic), which made development faster and more efficient. The app offers a clean, minimal UI with support for dark and light modes. It delivers a consistent user experience across iOS and Android devices.
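The separation that BLoC provides can be shown with a language-neutral sketch. The app itself is written in Flutter, so the Python below is only an illustration of the pattern: the UI sends events in, the business-logic component maps them to states, and the UI re-renders from each new state. All class and field names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Events: what the UI reports to the business logic.
@dataclass(frozen=True)
class RecordingStarted:
    pass


@dataclass(frozen=True)
class RecordingStopped:
    path: str


@dataclass(frozen=True)
class TranscriptReady:
    text: str


Event = Union[RecordingStarted, RecordingStopped, TranscriptReady]


# State: the only thing the UI needs in order to render.
@dataclass(frozen=True)
class State:
    status: str = "idle"  # idle | recording | transcribing | done
    transcript: str = ""


class TranscriptionBloc:
    """Maps incoming events to states and notifies listeners of each new state."""

    def __init__(self) -> None:
        self.state = State()
        self._listeners: List[Callable[[State], None]] = []

    def listen(self, callback: Callable[[State], None]) -> None:
        self._listeners.append(callback)

    def add(self, event: Event) -> None:
        if isinstance(event, RecordingStarted):
            self._emit(State(status="recording"))
        elif isinstance(event, RecordingStopped):
            self._emit(State(status="transcribing"))
        elif isinstance(event, TranscriptReady):
            self._emit(State(status="done", transcript=event.text))

    def _emit(self, state: State) -> None:
        self.state = state
        for callback in self._listeners:
            callback(state)
```

Because the UI only calls `add` and `listen`, the transcription logic can be tested without rendering any screen, which is the kind of isolation unit tests benefit from.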
Strong potential for continuous improvement
The solution has significant potential for continuous improvement. In particular, we can improve transcription quality for diverse languages by connecting the tool to more specialized OpenAI models. We can also extend the solution with text summarization and formatting, or make audio processing more efficient through integration with specialized AI tools. Our specialists are also ready to leverage Flutter to build a desktop version of the tool.
Technology Solutions
- Convenient and manageable cross-platform architecture built on the Flutter BLoC pattern.
- Access to OpenAI’s Whisper models and the ability to switch between them.
- Strong capabilities for improving audio quality ensured through the integration with the FFmpeg library.
- Efficient usage of the device’s features for storing recordings and running background workflows.
Value Delivered
- Strong potential for continuous improvement.
- A project that explores Flutter’s connectivity with OpenAI’s models.
- A tool that can be used across various industries, including healthcare, education and research, journalism and media.