Research
Sauti Project
Building speech and language resources for under-resourced African languages.
- Cohort: Ongoing research initiative
- Cadence: Continuous
- Location: Pan-African
- Status: Active
The challenge
Many modern speech systems still struggle with accent diversity. They often perform well for speakers whose accents are already well represented in existing datasets, but noticeably worse for African speakers of English and other under-represented speech communities.
This creates real challenges in education, work, customer service, and digital communication. A learner may spend more effort trying to understand an unfamiliar accent than absorbing the content itself. A customer and a service representative may both speak English, yet still experience communication gaps because of accent differences.
There is also a major research gap. African accent data remains limited, under-annotated, and difficult to access, which slows down progress in building speech technologies that are inclusive, accurate, and useful for African users.
Our approach
The Sauti Project brings together data, tools, models, and applications to support accent-aware speech technology.
The work is organised into four connected components.
SautiDB
Spearheads the collection, curation, and annotation of African-accented speech data. It provides a foundation for research in accent classification, speech recognition, speaker representation, and accent-aware learning technologies.
SautiClean
Supports the preparation of speech data by cleaning, organising, and improving the quality of recorded audio for research and model development.
SautiClassify
Focuses on identifying and classifying accent patterns across different African English speakers. This helps researchers and developers better understand accent variation and build more responsive speech systems.
SautiTranslate
Explores how educational and audio content can be adapted into accents that listeners are more familiar with, reducing cognitive burden and improving comprehension.
The Sauti Project takes a community-driven approach to building speech resources for under-represented African languages.
Get involved
We welcome researchers, linguists, native speakers, educators, developers, and community organisers who want to support African speech research.
You can contribute by donating your voice, supporting data collection, collaborating on research, or helping build tools for accent-aware speech technology.
Reach out at research@tri-ai.org.