R&D
As an expert and innovative bespoke player, we aim every day to push the boundaries of voice.
Acapela Group is actively working on Deep Neural Networks (DNN) and we are very enthusiastic and proud to present the first achievements of our research in this fascinating field, creating new opportunities for voice interfaces.
Humanoid intelligent companions, multilingual conversations, singing voices, expressive reading and transmission of emotions, Internet of Things, biometrics and multimodal man-machine interaction are some of the domains we have been seriously involved in for over a decade, partnering with experts worldwide.
2021 – FABLANG
2018 – VIADUCT
2018 – VOICI
2017 – EMPATHIC
2017 – ARCHIBALD
2014 – ChaNTeR
2013 – PATH
2013 – I-Treasures
2012 – DBOX
2012 – Mardi
2012 – Content4All
2011 – DIYSE
2010 – EMOSPEECH
2010 – BIOSPEAK
2010 – ROMEO
2009 – GVLEX
2009 – FRANEL
2008 – HMMTTS
2007 – INDIGO
2005 – BOON Companion
2004 – DIVINES
2003 – E! 2990 MAJORCALL (Majordome CRM Call Centers)
2003 – STOP
2003 – NORMALANGUE
2003 – ULYCES
The recent innovations in voice AI and the massive deployment of voice technologies have considerably raised user expectations in terms of new usages, performance and personalization. But apart from the major international languages, the linguistic offering remains limited.
The FabLang project aims to develop a web platform that will allow Acapela Group and its customers to develop new languages, dialects and voices with accents that are under-resourced and not available yet.
Acapela will use its expertise in Deep Neural Networks and voice AI, along with its large corpus of data available in up to 34 languages, to train the algorithms and move forward on new additional languages and dialects.
The objective of the project is to answer this demand through an innovative collaborative platform. Using a multi-language learning approach, FabLang will facilitate training for the target languages, and users will benefit from the other resources present on the platform. This project is financed by the Public Service of Wallonia (Department of Research and Technological Development).
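As a rough illustration of the multi-language learning idea (not Acapela's actual pipeline), the sketch below uses a tiny PyTorch model whose shared layers stand in for knowledge learned from an existing multilingual corpus and whose output head is fine-tuned on a small amount of data from an under-resourced target language; the model, data and hyper-parameters are all invented for illustration.

```python
# Illustrative transfer-learning loop (not Acapela's actual pipeline):
# a model pre-trained on high-resource languages is fine-tuned on a
# small corpus from an under-resourced target language.
import torch
import torch.nn as nn

class TinyAcousticModel(nn.Module):
    """Toy stand-in for a multilingual acoustic/prosodic model."""
    def __init__(self, n_feats=80, n_hidden=128, n_out=80):
        super().__init__()
        self.backbone = nn.GRU(n_feats, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        h, _ = self.backbone(x)
        return self.head(h)

def finetune(model, loader, epochs=5, lr=1e-4, freeze_backbone=True):
    # Freeze shared layers learned from the multilingual corpus and
    # adapt only the output head to the new language's data.
    if freeze_backbone:
        for p in model.backbone.parameters():
            p.requires_grad = False
    opt = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model

# Usage with random tensors standing in for real speech features:
model = TinyAcousticModel()
fake_batch = [(torch.randn(4, 50, 80), torch.randn(4, 50, 80))]
finetune(model, fake_batch, epochs=1)
```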
VIADUCT (Voice Interface for Autonomous Driving based on User experienCe Techniques) is a project part of the Pôle de compétitivité MecaTech, 23rd Project Call granted by the Walloon Region.
The product resulting from the VIADUCT project is a multimodal, adaptive and voice-centric human-machine interface based on voice technologies for driving semi-autonomous cars (MultiModal Voice-centric HMI).
This product integrates two innovative technological bricks:
– A multimodal conversational agent based on new voice technologies optimized for vehicles: Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). This agent manages effective communication between the driver (or a passenger) and the vehicle, and is able to adapt to the driver’s profile, in particular to elderly drivers, by taking into account the decline of their visual and auditory abilities.
– A driver monitoring system (DMS), based on the technology available and sold by AW, augmented with software functions that detect the physical, psychological, physiological and cognitive state of the driver or passengers in order to dynamically adapt the behaviour of the conversational agent.
The dynamic adaptability of the VIADUCT HMI will first be applied to the situation of older drivers, but it is also applicable in any other situation where the capacity of the driver is altered (sudden illness, disability, etc.).
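To make this adaptation mechanism concrete, here is a minimal, hypothetical Python sketch (not the VIADUCT implementation): it derives text-to-speech output settings from an invented driver profile and DMS state; all field names and thresholds are assumptions for illustration.

```python
# A minimal, hypothetical sketch of how a voice HMI could adapt its output
# to a driver profile and to the state reported by a driver monitoring
# system (DMS). Names and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class DriverProfile:
    age: int
    hearing_loss_db: float = 0.0   # average hearing loss (assumption)
    visual_acuity: float = 1.0     # 1.0 = normal vision

@dataclass
class DmsState:
    drowsiness: float = 0.0        # 0 (alert) .. 1 (asleep)
    cognitive_load: float = 0.0    # 0 (idle) .. 1 (overloaded)

def tts_settings(profile: DriverProfile, dms: DmsState) -> dict:
    """Derive speech-output parameters from driver profile and DMS state."""
    settings = {"rate": 1.0, "volume_db": 0.0, "verbosity": "normal"}
    if profile.age >= 70 or profile.hearing_loss_db > 20:
        settings["rate"] = 0.85          # slower speech for elderly drivers
        settings["volume_db"] = 6.0      # louder prompts
    if dms.cognitive_load > 0.7:
        settings["verbosity"] = "terse"  # shorten prompts under high load
    if dms.drowsiness > 0.6:
        settings["volume_db"] += 3.0     # more salient alerts when drowsy
    return settings

print(tts_settings(DriverProfile(age=74), DmsState(drowsiness=0.7)))
```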
This project will help finance our R&D efforts on related ASR and TTS topics and reinforce the position of Acapela in the automotive sector. Additionally, a new collaboration with AW Europe for the exploitation of the project results is already planned.
This three-year project will mobilize six people at Acapela (two additional recruitments forecast) to develop, in French:
The VIADUCT project is the result of an action plan developed by AW Technical Center and Acapela Group to address the challenges of voice interfaces in cars. With their expertise in automotive technologies, vehicle information systems, Artificial Intelligence and voice technologies, AWTCE and Acapela have mobilized the best skills available in Wallonia for the achievement of this project:
‘VOICI’ is part of ‘Clean Sky 2’ (CS2), which targets European aeronautics research and innovation and aims to make the global aviation industry ‘future proof’, that is, providing safe, seamless and sustainable air mobility to meet the needs of citizens. The first call of CS2 includes 29 topics and has a total funding budget of €205m from Horizon 2020.
Within the 6th call of Clean Sky 2, the Voice Crew Interaction (VOICI) project aims to develop the technology that implements an intelligent voice crew interaction system as a “natural crew assistant” in a cockpit environment up to TRL 3.
The main goal of the project is to provide a proof-of-concept demonstrator of a natural crew assistant that is capable of listening to all communications occurring in the cockpit, either between crew members or between crew and ATC, recognizing and interpreting speech content, interacting with the crew and fulfilling crew requests, so as to simplify crew tasks and reduce workload.
The topic leader has predefined sound recording, voice recognition and artificial intelligence as the three main technology components of the system, which must fulfil specific requirements such as robustness in noisy environments, a high recognition rate and accurate interpretation of requests. An audio evaluation environment will be developed to allow evaluation of the sound recording and voice recognition systems and of the natural crew assistant according to evaluation scenarios provided by the topic manager.
Acapela will work on the development of a specific voice for the cockpit environment to provide clear and understandable vocal information to the crew using different technologies: CTS, TTS, DNN.
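As a deliberately simplified illustration of the "interpret and fulfil crew requests" step, the sketch below matches a transcribed utterance to a known intent with keyword rules; the intents and phrases are invented examples and bear no relation to the actual VOICI system.

```python
# A toy, hypothetical sketch of matching a transcribed crew utterance
# to an intent by keyword rules. Intents and phrases are invented.
INTENTS = {
    "set_altitude": ["set altitude", "climb to", "descend to"],
    "read_checklist": ["checklist", "read the checklist"],
    "radio_frequency": ["tune", "frequency"],
}

def interpret(transcript: str) -> str:
    """Return the first intent whose trigger phrase appears in the text."""
    text = transcript.lower()
    for intent, phrases in INTENTS.items():
        if any(p in text for p in phrases):
            return intent
    return "unknown"

print(interpret("Please read the checklist for approach"))  # read_checklist
```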
The EMPATHIC Research & Innovation project will research, innovate, explore and validate new paradigms and platforms, laying the foundation for future generations of Personalised Virtual coaches.
The project is part of the Horizon 2020 programme, the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020). The consortium comprises 10 partners, including health-maintenance end-user organizations, technology developers, academic and research institutes and system integrators.
Innovative multimodal face analytics, adaptive spoken dialogue systems and natural language interfaces are part of what the project will research and innovate, in order to help dependent aging persons and their carers.
Acapela will provide a new TTS technology based on Deep Neural Networks, and adapted expressive speech which will enhance the expressive possibilities of the dialogue system and adapt it to the user’s emotions and mood to improve the believability, naturalness and adaptability of the interaction. Four languages are targeted: English, French, Spanish and Norwegian.
The project will use remote non-intrusive technologies to extract physiological markers of emotional states in real-time for online adaptive responses of the coach, and advance holistic modelling of behavioral, computational, physical and social aspects of a personalized expressive virtual coach.
It will include a demonstration and validation phase with clearly-defined realistic use cases. It will focus on evidence-based, user-validated research and integration of intelligent user and context sensing methods through voice, eye and facial analysis, intelligent heuristics (complex interaction, user intention detection, distraction estimation, system decision), visual and spoken dialogue system, and system reaction capabilities. Through measurable end-user validation, to be performed in 3 different countries (Spain, Norway and France) with 3 distinct languages and cultures (plus English for R&D), the proposed methods and solutions will ensure usefulness, reliability, flexibility and robustness.
This project aims to intensify the use of digital audiovisual content by accelerating its availability. While keeping the major needs of the audiovisual sector in focus, the Archibald project sees the Sonuma archives as an opportunity for incubating projects that meet the expectations of application fields such as voice technologies, research and education.
Those objectives will be achieved by combining leading edge expertise in voice technologies available in Wallonia (Acapela and Cental), the professional experience and the needs of targeted users (media, Acapela, Sonuma SA, Universities and High schools, etc.) and the audio/text and metadata content available with the 140,000 hours of audiovisual records already digitized by Sonuma SA.
The outcome will result in the delivery of technological modules and two pilot experiments. The scientific context covers several fields of application: audio, automatic language processing and indexing/classification of digital documents.
The recent development of Deep Neural Network technologies has made it possible to use these technologies in the fields mentioned above.
The goals of this project are therefore to:
These technological modules are important for the industrial developments of Sonuma SA and Acapela as well as for the international positioning of Wallonia as a major digital player.
The system will sing the words of a song, and the synthesizer will work in two modes: ‘song from text’ and ‘virtual singer’. In the first mode, the user enters a text to be sung along with a score (durations and pitches), and the machine transforms it into sound. In the second, ‘virtual singer’ mode, the user controls the song synthesizer in real time via specific interfaces, just like playing an instrument.
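A minimal sketch of what a ‘song from text’ input could look like is given below: each syllable of the lyrics is paired with a pitch and a duration taken from the score. The data structure and field names are assumptions for illustration, not the actual ChaNTeR format.

```python
# A minimal, hypothetical representation of a 'song from text' input:
# each syllable of the lyrics carries a pitch and a duration from the score.
from dataclasses import dataclass

@dataclass
class SungNote:
    syllable: str      # text to be sung
    midi_pitch: int    # pitch from the score (MIDI note number)
    duration_s: float  # note duration in seconds

score = [
    SungNote("frè", 67, 0.5),   # G4
    SungNote("re", 69, 0.5),    # A4
    SungNote("Jac", 71, 0.5),   # B4
    SungNote("ques", 67, 0.5),  # G4
]

def total_duration(notes):
    """Length of the synthesized phrase implied by the score."""
    return sum(n.duration_s for n in notes)

print(f"{total_duration(score):.1f} s of singing to synthesize")
```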
To achieve the synthesizer, the project will combine advanced voice transformation techniques, including analysis and processing of the parameters of the vocal tract and the glottal source, with state-of-the-art know-how in unit selection for speech synthesis, rule-based singing synthesis systems, and innovative gesture control interfaces. The project focuses on capturing and reproducing a variety of vocal styles (e.g. lyrical/classical, popular/song).
A prototype system for singing synthesis will be developed to be used by project partners to offer synthesized singing voice and singing instrument products that are currently lacking, or to improve the functions of existing products. The project will offer musicians and performers a new artistic approach to synthesized song, new means of creation that make interactive experiences with a sung voice possible.
ANR (The French National Research Agency), LIMSI, IRCAM and DUALO.
The diversity of needs of each person with autism calls for flexible and individualized communication tools. PATH aims to provide individuals with autism, their families and therapists with custom tools to generate or enhance communication via a collaborative platform.
PATH combines a technological dimension (speech synthesis, speech recognition, eye-movement tracking, embedded technologies) with a participatory dimension (cloud computing, sharing, ‘custom’ adaptation).
Mons University (SUSA), ULG, TRIPTYK, MULTITEL
Cultural expression is not limited to architecture, monuments or collections of artifacts. It also includes fragile intangible live expressions, which involve knowledge and skills. Such expressions include music, dance, singing, theatre, human skills and craftsmanship. These manifestations of human intelligence and creativeness constitute our Intangible Cultural Heritage (ICH).
The main objective of i-Treasures is to develop an open and extendable platform to provide access to ICH resources, enable knowledge exchange between researchers and contribute to the transmission of rare know-how from Living Human Treasures to apprentices. To this end, the project aims to go beyond the mere digitization of cultural content.
Combining conventional learning procedures and advanced services, such as singing voice synthesis and sensorimotor learning through an interactive 3D environment, i-Treasures is expected to break new ground in education and knowledge transfer of ICH.
Centre for Research and Technology Hellas, Université Pierre et Marie Curie, Centre National de la Recherche Scientifique, Université de Mons, Consiglio Nazionale delle Ricerche, University College London, Turk Telekom Company, University System of Maryland, Aristotle University of Thessaloniki, University of Macedonia.
D-Box’s main goal is to develop and test an innovative architecture for conversational agents whose purpose is to support multilingual collaboration between users on a common problem in an interactive application. The interactive agent will enable type-written and/or spoken collaboration in the users’ native language by mediating communication: all user interactions will be transmitted through the D-Box multilingual agent.
Mipumi, IDIAP, KOMEI, Saarland University
We believe that the interaction must have a physical realization, anchored in the real world to be natural and effective. In order to embody interactive systems, we propose to use humanoid robots.
Robots, endowed with perception but also with the means to act on the environment, allow the integration of a physical context into the interaction, for the machine as well as for humans.
SUPELEC, LIA, LAAS
The basic concept behind this project is to allow anyone, including people suffering from visual disabilities (elderly or blind people), to access the same information as other people.
Multitel.
The Do-it-Yourself Smart Experiences project (DiYSE) aims at enabling ordinary people to easily create, setup and control applications in their smart living environments as well as in the public Internet-of-Things space, allowing them to leverage aware services and smart objects for obtaining highly personalised, social, interactive, flowing experiences at home and in the city.
The partners come from France, Belgium, Spain, Greece, Turkey, Finland and Ireland.
Alcatel-Lucent Bell Labs France, AnswareTech, Archos, Atos Origin, Catholic University of Leuven – DistriNet, Catholic University of Leuven – CUO, ENSIIE, FeedHenry, Finwe, Forthnet, Geniem, Geosparc, Information & Image Management Systems (IMS), Institut TELECOM Sud Paris, Mobilera, Neotiq, Philips Innovative Applications, Pozitim, Rinnekoti-Säätiö, Tecnalia-European Software Institute (ESI), Tecnalia-Robotiker, Thales Communications, There Corporation, Turkcell Teknoloji, Universidad Politécnica de Madrid, University of Alcalá, University of Applied Sciences LAUREA, University of Mons, University of Oulu, University of Tampere, Videra, Vrije Universiteit Brussel – SOFT, Vrije Universiteit Brussel – SMIT, Vrije Universiteit Brussel – Starlab, VTT – Technical Research Centre of Finland, Waterford Institute of Technology, Wiktio.
Virtual worlds are a new way of socializing. They allow users to embody an avatar evolving in a three-dimensional representation of a real or imaginary place, in which they can usually meet other users and interact with them.
In this way, such applications digitally extend users’ social lives. E-learning solutions can be as simple as forms to fill in or be developed using technologies from the computer entertainment field. The latter type of e-learning solution is known as serious games: they aim at merging educational content into a gaming design, allowing users to actively learn and improve their skills. Virtual worlds and serious games offer a good technological answer to this challenge, since they give users virtual experiences of real situations. While existing solutions have reached a satisfying level of physical immersion, the next steps consist in providing users with a higher level of interaction, both with other users and with the virtual humans populating the digital environments.
Today’s applications lack verbal and emotional interactions. Filling this gap would give the virtual experience greater realism. For instance, an avatar’s lip and face animation should be coherent not only with the phrasing but also with the emotional message (anger, pity, etc.). Spoken interactions (in other words, dialogues) are thus an important aspect to focus on in order to improve users’ experience. More precisely, synthesized speech and face animations should take into account verbal and non-verbal components (mainly emotions) to fully convey the speaker’s intentions. Allowing users’ avatars, but also virtual humans, to handle emotions will definitely improve the immersiveness of virtual worlds and serious games.
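As a toy illustration of keeping lip animation coherent with both the phonetic and the emotional content, the sketch below maps phonemes to visemes and applies emotion-dependent facial offsets; the viseme inventory and the offsets are invented for illustration, not the project’s actual models.

```python
# A toy sketch of driving lip animation from synthesized phonemes while
# keeping it coherent with an emotional state; the viseme inventory and
# the emotion offsets are invented for illustration.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "i": "spread", "u": "rounded",
}

EMOTION_OFFSETS = {          # extra facial parameters per emotion
    "neutral": {"brow_raise": 0.0, "mouth_open_scale": 1.0},
    "anger":   {"brow_raise": -0.4, "mouth_open_scale": 1.2},
    "pity":    {"brow_raise": 0.3, "mouth_open_scale": 0.8},
}

def animation_frames(phonemes, emotion="neutral"):
    """Yield one facial-animation frame per phoneme, biased by emotion."""
    base = EMOTION_OFFSETS[emotion]
    for ph in phonemes:
        yield {"viseme": PHONEME_TO_VISEME.get(ph, "rest"), **base}

for frame in animation_frames(["b", "a"], emotion="anger"):
    print(frame)
```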
INRIA Lorraine (Parole and Talaris), Artefacto
BioSpeak partner companies will benefit from state of the art algorithms for speaker validation integrated into their products. The BioSpeak project aims to create robust and scalable tools for Interactive Voice Response (IVR) systems, able to process thousands of channels in parallel using state of the art algorithms. These tools will allow multilingual interoperability and they will be designed to work on security and telephony focused environments.
This project will develop biometric tools based on ALIZE, an open-source library designed for research and experimentation on signal processing and statistical algorithms used in biometric authentication. Although very complete, ALIZE is not yet ready to be used in large-scale commercial applications with real-time, multi-channel audio processing needs.
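For readers unfamiliar with speaker verification, the sketch below illustrates generic GMM-UBM scoring with scikit-learn (it does not use the ALIZE API): a universal background model and a speaker model are trained on stand-in feature vectors, and a test utterance is accepted or rejected based on their log-likelihood ratio. All data and the threshold are synthetic assumptions.

```python
# Generic illustration (not the ALIZE API) of GMM-UBM speaker verification
# scoring: a universal background model (UBM) is trained on many speakers,
# a target model is trained on the claimed speaker, and a test utterance
# is accepted if the log-likelihood ratio exceeds a threshold.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for acoustic feature vectors (e.g. MFCC frames).
background = rng.normal(0.0, 1.0, size=(2000, 12))   # many speakers
target_enrol = rng.normal(0.5, 1.0, size=(300, 12))  # claimed speaker
test_utt = rng.normal(0.5, 1.0, size=(200, 12))      # utterance to verify

ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(background)
spk = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(target_enrol)

# Average per-frame log-likelihood ratio between speaker model and UBM.
llr = spk.score(test_utt) - ubm.score(test_utt)
decision = "accept" if llr > 0.0 else "reject"   # threshold is illustrative
print(f"LLR = {llr:.2f} -> {decision}")
```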
University of Swansea, ValidSoft, Multitel, Calistel, University of Avignon
This €10 million project is subsidized up to €4.9 million. The project’s objective is to develop a humanoid robot that can act as a comprehensive assistant for persons suffering from loss of autonomy. With this target in mind, the robot has to be able to interact with most familiar objects and movements (open and close a door, grasp a glass, a bottle, a bunch of keys…). It will also have to assist people who need to move around their home and be able to help them should they fall on the ground. Beyond its physical abilities, Romeo has to come with a very “human-friendly” interface, with voice and gestures as the principal means of communication with the robot. It will have to understand what is said to it, carry out simple conversations and even sense the intentions and emotions of its interlocutor in order to deduce the actions it has to perform.
ALDEBARAN, VOXLER, SpirOps, AsAnAngel, LISV, LIMSI, LAAS, CEA-LIST, Paris Telecom, INRIA, LPPA (Collège de France), Institut de la Vision.
GV-LEX is subsidized by the French National Research Agency (ANR) in the scope of the 2009 call “Content and Interaction”. Members of the consortium are ALDEBARAN Robotics (project holder), Acapela, CNRS/LIMSI and Telecom Paris Tech. Its aim is to make the robot NAO and the avatar Greta capable of reading texts for several minutes without boring the listener with a monotonous computer voice. To reach this objective, we propose to bring expressiveness into the speech synthesis itself as well as to take advantage of the robot or virtual human embodiment: both are capable of performing expressive gestures while talking.
Aldebaran Robotics, LIMSI, Telecom Paris Tech.
Specifically, learning activities are developed from reports of the regional television stations WTV (West Flanders), C9 (Nord-Pas-de-Calais) and NoTélé (Hainaut), together with three universities: KU Leuven Campus Kortrijk on the Flemish side, the University Lille III Charles de Gaulle on the French side and the Polytechnic Faculty of Mons in Wallonia.
K.U.Leuven Campus Kortrijk, Lille3 Charles De Gaulle, Faculté Polytechnique de Mons, WTV, C9, NoTélé, Televic, BLCC, VDAB, Forem, AVnet, ILT
Intelligibility and expressivity have become the keywords in speech synthesis. To address this, a system (HTS) based on the statistical generation of voice parameters from Hidden Markov Models has recently shown its potential efficiency and flexibility.
Nevertheless, this approach has not yet reached maturity and is limited by the buzziness it produces. This inconvenience is largely due to the parametric representation of speech, which induces a loss of voice quality. The first part of this thesis is consequently devoted to the high-quality analysis of speech. In the future, applications oriented towards voice conversion and expressive speech synthesis could also be carried out.
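The toy sketch below gives an intuition of statistical parametric synthesis in the spirit of HTS: each HMM state carries a mean acoustic parameter vector and a duration, and a parameter trajectory is generated by repeating those means. Real HTS additionally uses dynamic features, variances and a vocoder, so this is only a caricature hinting at why over-smoothed parameters can sound buzzy; all values are invented.

```python
# Toy illustration of statistical parametric synthesis in the spirit of HTS:
# each HMM state stores a mean acoustic parameter vector and a duration,
# and the parameter trajectory is generated by repeating the state means.
import numpy as np

class State:
    def __init__(self, mean_params, n_frames):
        self.mean_params = np.asarray(mean_params)  # e.g. f0 + energy
        self.n_frames = n_frames                    # duration in frames

def generate_trajectory(states):
    """Stack each state's mean vector for its duration (no variance used)."""
    return np.vstack([np.tile(s.mean_params, (s.n_frames, 1))
                      for s in states])

# Invented 3-state model of one phoneme: [f0_hz, energy]
phoneme = [State([120.0, 0.6], 5), State([135.0, 0.8], 8), State([125.0, 0.5], 4)]
traj = generate_trajectory(phoneme)
print(traj.shape)  # (17, 2) frames of acoustic parameters
```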
FPMs
A key enabling technology for next-generation robots for the service, domestic and entertainment market is Human-Robot-Interaction. A robot that collaborates with humans on a daily basis – be this in care applications, in a professional or private context – requires interactive skills that go beyond keyboards, button clicks or metallic voices.
For this class of robots, human-like interactivity is a fundamental part of their functionality. INDIGO aims to develop human-robot communication technology for intelligent mobile robots that operate and serve tasks in populated environments. In doing so, the project will involve technologies from various sectors, and will attempt to introduce advances in respective areas, i.e. natural language interaction, autonomous navigation, visual perception, dialogue systems, and virtual emotions.
The project will address human-robot communication from two sides: by enabling robots to correctly perceive and understand natural human behaviour and by making them act in ways that are familiar to humans.
FORTH-ICS, University of Edinburgh, Albert Ludwigs University of Freiburg, University of Athens, University of Geneva, NEOGPS, HANSON ROBOTICS, Fondation Hellenic World, NCSR.
The consortium’s interest in ACS is motivated by the desire to develop intelligent companions and domestic assistants that could exhibit some human-like cognitive abilities (e.g. adaptiveness to the interaction context, adaptiveness to the user) and thus gain in acceptance.
BERCHET, CEA, Wany Robotics, Eurecom, Generation 5, Thales, Philips, Sound Intelligence, University of Groningen, University of Utrecht, CRIFA
The goal of DIVINES is to develop new knowledge towards renewed feature extraction and modelling techniques with better capabilities, particularly in handling the intrinsic variabilities of speech. First, human and machine performance and the effect of intrinsic variabilities will be compared through a diagnostic procedure. The outcomes of this analysis will then be exploited to target feature extraction and acoustic and lexical modelling. Compatibility with techniques dealing with noise, and integration within current systems, are also part of the objectives.
The project is relevant to the “multimodal interfaces” objective as it concerns more accurate and adaptable recognition of spoken language. This is central to the concept of multimodal man-machine interaction where the speech understanding service is likely to remain an independent component in a modular design. Advances in this field could be decisive in realizing the vision of natural interactivity.
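For context, the sketch below computes a conventional MFCC front-end with librosa, the kind of baseline feature extraction that research such as DIVINES aims to improve for robustness to intrinsic variability; the audio file name is a placeholder.

```python
# A standard front-end baseline (MFCC features plus deltas), of the kind
# that robustness-oriented ASR research seeks to improve upon.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)     # placeholder file path
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 cepstral coefficients
delta = librosa.feature.delta(mfcc)                 # first derivatives
print(mfcc.shape, delta.shape)                      # (13, n_frames) each
```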
THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARNING (MCGILL UNIVERSITY), FRANCE TELECOM SA, LOQUENDO SPA, UNIVERSITE D’AVIGNON ET DU PAYS-VAUCLUSE, INSTITUT EURECOM, CARL VON OSSIETZKY UNIVERSITAET OLDENBURG, POLITECNICO DI TORINO
Integration of speech technology with communication, marketing and customer-related services in a single, comfortable process enabling instantaneous mobile access to crucial business information.
MULTITEL, Software 602, GVZ, Vecsys, ENST, Knowledge S.A., University of Patras, Harpax (Italy)
Speech dynamics and voice quality analysis for improved speech synthesis.
It aims at improving speech synthesis technologies by exploiting speech dynamics, a field that has so far been largely unexplored. The aim of the project is to develop a software library to modify dynamics in concatenative speech synthesis (diphones and non-uniform units). To do so, not only the modification of prosody is envisaged: the voice quality should also be adapted to the desired perceived phonation.
“5e Saison”, a French company specialized in digital sound processing (France).
Development of the Arabic TTS system. New voice: Bruno was recorded in this project. MixLP method: separation of the signal source and the vocal tract. TCTS Lab, the Circuit Theory and Signal Processing laboratory of the Faculté Polytechnique de Mons (FPMs).
To achieve this objective, the project consortium brings together seven industrial and academic partners in the field of speech technologies, covering four complementary angles: upstream research, speech technology vendors, voice platform vendors and component vendors.
This sub-project is divided into two phases: the first, lasting one year, is more general and covers all relevant standards for speech technology; the second concentrates on the main standard, VoiceXML, over a period of two years.
SIEMENS, TELISMA, IDYLIC, STMicroelectronics, LORIA, ENST Paris
The UlyCEs project aimed to develop a telematics platform for the automotive industry, based on Windows CE technology.
EZOS, TWIN DEVELOPMENT, GILLET Automobile
The project is financed by the French Ministry of Research in the context of the Technolangue programme.
This evaluation campaign is intended to expand upon the ARC-AUPELF (now AUF) campaign of 1996-1999, the only previous evaluation campaign for text-to-speech systems for the French language. The EvaSy campaign is subdivided into three components:
– evaluation of the grapheme-to-phoneme module (see the sketch after this list),
– evaluation of prosody and expressivity,
– global evaluation of the quality of the synthesised speech.
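As an illustration of the first component, the sketch below scores a hypothetical grapheme-to-phoneme module with a phoneme error rate, i.e. the edit distance between reference and predicted phoneme strings; the phoneme data are invented and do not come from the EvaSy campaign.

```python
# Hypothetical illustration of grapheme-to-phoneme (G2P) evaluation:
# phoneme error rate (PER) computed from the Levenshtein distance
# between reference and predicted phoneme sequences.
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)]

def phoneme_error_rate(pairs):
    """PER over (reference, hypothesis) phoneme-sequence pairs."""
    errors = sum(edit_distance(r, h) for r, h in pairs)
    total = sum(len(r) for r, _ in pairs)
    return errors / total

# Toy example with an invented SAMPA-like transcription pair.
pairs = [(["S", "a~", "t"], ["S", "a~", "t", "@"])]
print(f"PER = {phoneme_error_rate(pairs):.2%}")
```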
ELDA (Evaluations and Language Resources distribution Agency), LIMSI, Equipe de recherche DELIC (Description Linguistique Informatisée sur Corpus), Université de Provence, CRISCO (Centre de Recherches Inter-langues sur la Signification en Contexte), ICP (Institut de la Communication Parlée), LIA (Laboratoire Informatique d’Avignon), MULTITEL ASBL