Tech | 30th October 2025, 4:16 PM

Pixa AI has introduced Luna AI, a novel speech-to-speech foundational model designed to revolutionize human-AI interactions. This model bypasses the conventional speech-to-text and text-to-speech conversion steps, processing audio directly to generate speech output. This direct audio processing significantly reduces latency, allowing for more responsive and natural conversations, complete with nuances like singing, whispering, and emotional inflection.
Pixa AI's founder, Sparsh Agrawal, emphasizes that Luna AI is built with an 'emotional first' approach, aiming to make AI conversations feel human rather than robotic. According to the company's internal evaluations, Luna AI outperforms leading real-time systems on three measures. In Automatic Speech Recognition (ASR), it recorded a 5.24% error rate, lower than Deepgram Nova (8.38%) and ElevenLabs Scribe (5.81%). On Text-to-Speech Word Error Rate (TTS WER), where lower is also better, Luna AI scored 1.3%, ahead of Sesame (2.9%) and GPT-4o TTS (3.2%). Its Mean Opinion Score (MOS) for naturalness was 4.62, compared with 4.15 for GPT-realtime.
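For readers unfamiliar with the error-rate figures quoted above, here is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance between a reference transcript and a system's output, divided by the number of reference words. The article does not describe Pixa AI's evaluation protocol, so the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the sample sentences are illustrative assumptions rather than the company's method. For TTS, a common practice is to transcribe the synthesized audio with an ASR system and score that transcript against the input text in the same way.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; normalization here is just
    lowercasing and whitespace splitting (an assumption).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (excluding the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substitution ("sat" -> "sit") and one
    # deleted word ("the") against a six-word reference.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.1%}")  # prints "WER: 33.3%"
```

On this definition, a 5.24% error rate corresponds to roughly five word-level errors per hundred reference words. Results depend heavily on the test set and on text normalization, which is one reason figures from different evaluations are hard to compare directly.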
The company is actively pursuing B2B applications through a licensing-led business model, targeting sectors such as entertainment (collaborating with European companies), automotive (for in-car infotainment systems), and AI toys (with a US-based company). Other potential applications include mental health counseling, elderly companionship, and children's education. A pilot with a large company for customer call automation showed increased customer engagement and conversion rates.
Luna AI currently supports English. Pixa AI plans to roll out multilingual capabilities for 12 major Indian languages and additional global languages within three months. The startup, backed by investors including Nikhil Kamath, Kunal Shah, and Kunal Kapoor, is also planning to expand its team and is engaging with the IndiaAI Mission for GPU access.
Impact: This advancement in AI technology could significantly boost the Artificial Intelligence and Technology sectors. It opens new avenues for conversational AI applications, potentially leading to increased investment in AI startups and companies, and enhancing user experiences across various industries. The development of advanced AI models like Luna AI is crucial for India's ambition in the global AI landscape. Rating: 7/10
Key terms
Speech-to-Speech: A technology where an AI model takes spoken audio as input and generates spoken audio as output directly, without converting to text in between.
Foundational Model: A large AI model trained on a vast amount of data that can be adapted for a wide range of downstream tasks.
Latency: The delay between an action or input and the system's response.
Emotional Intelligence: The ability of an AI to understand, interpret, and respond to human emotions.
Automatic Speech Recognition (ASR): The technology that converts spoken language into written text.
Text-to-Speech Word Error Rate (TTS WER): A metric measuring the accuracy of converting written text into spoken audio.
Mean Opinion Score (MOS): A subjective measure used to evaluate the quality of speech or audio, often on a scale of 1 to 5, with higher scores indicating better quality.
Licensing-led business model: A strategy where a company grants permission to others to use its intellectual property (like technology or software) for a fee.
Proof of concept (POC): A small-scale test or demonstration to prove that a concept or idea is feasible.
IndiaAI Mission: A government initiative in India aimed at promoting the development and adoption of Artificial Intelligence.
GPU: Graphics Processing Unit, a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images.