“In a new preprint, researchers from UC Santa Cruz show that it is possible to eliminate the most computationally expensive operation in running large language models, matrix multiplication, while maintaining performance. By getting rid of matrix multiplication and running their algorithm on custom hardware, the researchers found that they could run a billion-parameter-scale language model on just 13 watts, roughly the power draw of a lightbulb and more than 50 times more efficient than typical hardware.

Even with a slimmed-down algorithm and much lower energy consumption, the new open-source model achieves the same performance as state-of-the-art models like Meta’s Llama.”

From UC Santa Cruz.
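
The quote doesn’t explain how matrix multiplication gets eliminated. In the preprint it refers to, dense weights are constrained to ternary values {-1, 0, +1}, so every “multiply” in a linear layer reduces to an add, a subtract, or a skip. Here is a minimal NumPy sketch of that idea; the function name and shapes are illustrative, not the authors’ code, and a real implementation would quantize trained weights rather than use random ones.

```python
import numpy as np

def ternary_linear(x, w_ternary):
    """Apply a linear layer whose weights are constrained to {-1, 0, +1}.

    Because every weight is -1, 0, or +1, each output element is just a
    sum of (possibly negated) inputs -- no multiplications are needed.
    Here that is expressed with masked additions; dedicated hardware can
    exploit the same structure directly.
    """
    # x: (batch, in_features); w_ternary: (in_features, out_features)
    pos = (w_ternary == 1)   # weights that add the input
    neg = (w_ternary == -1)  # weights that subtract the input
    out = np.zeros((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        # Sum inputs where the weight is +1, subtract where it is -1;
        # zero weights are simply skipped.
        out[:, j] = x[:, pos[:, j]].sum(axis=1) - x[:, neg[:, j]].sum(axis=1)
    return out

# Tiny usage example with random ternary weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float32)
w = rng.integers(-1, 2, size=(8, 4)).astype(np.float32)  # values in {-1, 0, +1}
# Matches an ordinary matmul, but the loop itself never multiplies.
assert np.allclose(ternary_linear(x, w), x @ w, atol=1e-5)
```

The energy savings come from exactly this substitution: additions are far cheaper than multiplications in silicon, and the custom hardware mentioned in the quote is built around that trade.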