#3. Language belongs to the people, not the platforms
What if the frontline of innovation isn’t Silicon Valley? What if it’s Sokoto, Kigali, or Accra?
Generative image of a row of Black African dolls in vibrant textiles, inspired by Zohra Opoku’s installation “Give Me Back My Black Dolls.” These embody the truth that language belongs to the people, not the platforms.
Most AI is built in labs, far from the people it affects.
But what if the frontline of innovation isn’t Silicon Valley but Sokoto, Kigali, or Accra?
Most mainstream Natural Language Processing (NLP) models are trained using a pipeline that centralises computation but marginalises context.
Data is scraped, labelled, and scaled, often without the involvement of the very communities it represents. The result? Language models that encode hierarchy, not humanity.
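To make that pipeline concrete, here is a schematic sketch of the scrape-and-scale approach using the Hugging Face datasets library. The dataset ID is a placeholder, not a real corpus, and the filtering heuristic stands in for the kind of context-free cleaning these pipelines apply:

```python
from datasets import load_dataset  # Hugging Face `datasets` library

# Schematic only: "example-org/web-crawl-sw" is a placeholder standing in
# for any large web-scraped Swahili corpus. No speaker community is consulted
# at any step; text is pulled, filtered by crude heuristics, and scaled up.
corpus = load_dataset("example-org/web-crawl-sw", split="train", streaming=True)

# A typical context-free cleaning pass: drop short lines, keep the rest,
# regardless of dialect, register, or whether the text was meant to be public.
cleaned = (row["text"] for row in corpus if len(row["text"].split()) >= 5)
```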
David Adelani’s work with Masakhane and related participatory research efforts reconfigures this dynamic. Instead of treating African languages as “low resource,” it reframes them as underrepresented by design and builds the resource base by centring the people who speak them.
This is not crowdsourcing; it’s co-creation.
Participation in this model isn’t transactional. It’s relational.
It invites multidisciplinary collaboration: computer scientists learning from linguists, local speakers becoming language technologists, and volunteers contributing to an open research commons.
Together, they build language datasets and benchmarks in community. The result is more than data: it’s a social infrastructure of trust, representation, and agency.
It exemplifies what Adelani calls a community-grounded data methodology: one that embeds cultural, contextual, and linguistic specificity from the outset, increasing not just model performance, but legitimacy.
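To illustrate the difference, a community-grounded record might keep the context a scraped corpus discards. The field names below are invented for this sketch, not a published Masakhane schema:

```python
# Illustrative only: these field names are assumptions, not an actual
# Masakhane schema. The point is that context, provenance, and licence
# travel with the text instead of being stripped away.
record = {
    "text": "Ẹ kú àárọ̀",               # Yoruba morning greeting
    "language": "yor",                   # ISO 639-3 code
    "translation_en": "Good morning",
    "register": "respectful greeting",   # cultural context a crawler would lose
    "contributor": "volunteer-042",      # provenance: who contributed, with consent
    "licence": "CC-BY-4.0",              # explicitly part of an open commons
}
```

Keeping context attached to each record is what lets models learn honorifics, dialect, and register rather than flattening them away.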
Why does this matter? Because inclusion at the dataset level leads to more credible, culturally grounded, and usable AI systems. And because when people shape the tools from the ground up, they’re more likely to trust, adopt, and innovate with them.
And it works.
Swahili–English and Yoruba–English translation models built by Masakhane have outperformed tools like Google Translate, especially on idioms, cultural nuance, and local expressions (a minimal usage sketch appears below).
Open, peer-reviewed resources like AfroLID (language identification) and AfroBench (model evaluation) have established benchmarks for dozens of African languages that previously had no digital presence.
Pan-African research networks now connect cities like Nairobi, Lagos, Accra, and Cape Town with labs in Toronto and Montreal, forming one of the world’s most dynamic, decentralised NLP ecosystems.
UNESCO, BigScience, and other global bodies are turning to Masakhane for guidance on ethical dataset design, linguistic inclusion, and participatory AI governance.
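For readers who want to try these community models, many checkpoints are published under the Masakhane organisation on Hugging Face. A minimal sketch with a placeholder model ID (pick a real one from https://huggingface.co/masakhane):

```python
from transformers import pipeline

# The model ID below is a placeholder; browse https://huggingface.co/masakhane
# for the community's actual published translation checkpoints.
yor_to_en = pipeline("translation", model="masakhane/example-yor-en-mt")

result = yor_to_en("Mo fẹ́ràn èdè mi.")  # "I love my language."
print(result[0]["translation_text"])
```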
So why does this matter?
Because language isn’t just a technical problem. It’s a human one. When AI is built with communities, not just for them, it becomes more than a tool. It becomes a mirror of dignity, difference, and design justice.
Africa isn’t waiting to be included in AI. It’s already shaping how inclusion is done.