
When Google Translate launched in 2006, it supported two languages. By 2016, that number had grown to 103. Then expansion started to slow; over the next five years, just six languages were added to the service. Google had hit a bottleneck. To learn to translate, its machine learning models needed a large body of “parallel” text: text in one language paired with a translation into another. Unfortunately, that kind of data existed only for a small fraction of languages.

In 2022, Google researchers made a breakthrough. They assembled two datasets: one of parallel text in 112 languages and another of untranslated text in over 1,000 languages. With the first, they trained a model to translate text from one language to another; with the second, they trained it to reconstruct randomly garbled text in the same language. When the researchers combined these tasks during training, the model learned to generalize its translation abilities to languages it had never seen translated before. This capability, called zero-shot translation, brought 24 new languages to the service.

Large language models have unlocked even greater advancements. On Thursday, citing advances driven by one such model, Google added support for 110 languages, some spoken by just a few thousand people. As these models continue to improve, language barriers will become increasingly obsolete.

Malcolm Cochran, Digital Communications Manager
