There's a stat that should bother everyone in EdTech: fewer than 10% of language learners ever reach conversational fluency. Not "read a novel" fluency. Not "pass a C2 exam" fluency. Conversational. As in: you're standing in front of a real person and you can hold a basic exchange.
Over 100 million people use Duolingo every month. A billion are learning a new language right now. And yet almost none of them will learn to speak it. So what's going on?
The most popular language apps are built around recognition. You see a word, you tap the right translation. You hear a sentence, you pick the matching picture. You drag tiles into the correct order. It feels like learning. Your streak goes up. The owl is happy.
But recognition is not production. Knowing that "mesa" means "table" when you see it on screen is completely different from producing "mesa" when you're pointing at a table and trying to say something. The first is retrieval from a prompt. The second is retrieval from meaning, under time pressure, out loud. They use different cognitive pathways.
Recognition: see "mesa" → select "table"; hear a word → pick the image; rearrange words into a sentence; choose the missing word from options.
Production: no prompts, no tiles, no options; retrieve the word and articulate it; do it within the length of a natural pause; listen, think, and speak in real time.
Reading and recognition: matching words to meanings, translating sentences, picking the right answer. This is what most apps are built around, and they do it well.
Listening: hearing the language spoken in real contexts, through podcasts, audio lessons, and native speakers. Valuable, and increasingly available.
Speaking: actually producing language by forming your own sentences, out loud, in real time. It is the hardest skill, the most desirable, and the one almost no app teaches.
This is well-documented in language acquisition research:
Swain's output hypothesis: comprehension alone isn't enough. Learners must be pushed to produce language to develop fluency.
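Nation's four strands model: a balanced course gives roughly equal time to meaning-focused input, meaning-focused output, language-focused learning, and fluency development. Output and fluency are two of the four strands, not optional extras.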
DeKeyser's skill acquisition theory: maps the journey from declarative knowledge ("I know this word") through procedural practice ("I can use it in context") to automaticity ("it comes out without thinking"). Most apps stop at step one.
You feel fluent because you're scoring 95% on quizzes. But quizzes test recognition. The moment someone asks you a question in real life, the quiz score is irrelevant. You need to formulate a response, retrieve the right words, assemble them in the right order, and say them out loud. And you need to do it in about two seconds, because that's how long a conversational pause lasts before it gets awkward.
Two seconds. That's the gap between knowing a language and speaking it.
The fix isn't more quizzes. It isn't longer streaks. It's practice that actually requires you to open your mouth. Repeatedly, in realistic contexts, with feedback, at a difficulty level that stretches you without breaking you.
The technology to do this at scale exists now. Voice AI can hold a real conversation, adapt to your level, and give you feedback. The marginal cost is near zero. That's what we're working on at eevi.
A billion learners deserve better.