You don't need a sledgehammer to crack a nut.
Jonathan Frankle researches artificial intelligence, not pistachios, but the same philosophy applies to his "lottery ticket hypothesis." It posits that, hidden within massive neural networks, more compact subnetworks can perform the same task more efficiently. The trick is to find those "lucky" subnetworks, called winning lottery tickets.
In a new paper, Frankle and his colleagues discovered such subnetworks hiding in BERT, a modern neural network approach to natural language processing (NLP). As a branch of artificial intelligence, NLP aims to decipher and analyze human language, with applications such as intelligent text generation or online chatbots. In computational terms, BERT is bulky and typically demands supercomputing power that is unavailable to most users. Access to BERT's winning lottery ticket could level the playing field, potentially allowing more users to develop effective NLP tools on a smartphone, no sledgehammer needed.
"We are getting to the point where we need to make these models more compact and efficient, "says Frankl, adding that this progress may one day" lower the barriers to entry " to NLP.
Frankle, a PhD student in Michael Carbin's group at the MIT Computer Science and Artificial Intelligence Laboratory, co-authored the study, which will be presented next month at the Conference on Neural Information Processing Systems. Tianlong Chen of the University of Texas at Austin is the lead author of the paper, which includes collaborators Zhangyang Wang, also of the University of Texas at Austin, as well as Shiyu Chang, Sijia Liu, and Yang Zhang, all of the MIT-IBM Watson AI Lab.
You have probably interacted with a BERT network today. It is one of the technologies behind Google's search engine, and it has sparked excitement among researchers since Google released BERT in 2018. BERT is a method for creating neural networks: algorithms that use layered nodes, or "neurons," to learn to perform a task by training on numerous examples. BERT is trained by repeatedly attempting to fill in words left out of a passage of text, and its power lies in the gigantic size of this initial training dataset. Users can then fine-tune BERT's neural network for a particular task, such as building a customer-service chatbot. But wrangling BERT takes enormous computing power.
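The fill-in-the-blank training described above can be pictured with a small sketch. It is a toy illustration, not BERT's actual pipeline: the `mask_tokens` helper, the 15 percent masking rate, and the whitespace tokenizer are all assumptions made for the example.

```python
import random

MASK = "[MASK]"  # placeholder standing in for a hidden word


def mask_tokens(tokens, mask_rate, rng):
    """Hide a random subset of tokens.

    Returns the masked sequence plus a dict mapping each hidden
    position to the word a model would be trained to recover.
    """
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # the answer the model must predict
        masked[pos] = MASK          # what the model actually sees
    return masked, targets


# Hypothetical example sentence, split on whitespace for simplicity.
sentence = "you do not need a sledgehammer to crack a nut".split()
masked, targets = mask_tokens(sentence, mask_rate=0.15, rng=random.Random(0))
print(" ".join(masked))
print(targets)
```

During training, a model sees only the masked sentence and is scored on how well it recovers the hidden words. Repeating this over a very large text collection is what makes the initial training so expensive.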
"The standard BERT model these days - the garden variety - has 340 million parameters," says Frankl, adding that this number can reach 1 billion. A supercomputer may be required to fine-tune such a massive network. "It's just obscenely expensive. This is beyond the computational capabilities of you or me."
Chen agrees. Despite BERT's surge in popularity, such models "suffer from the huge size of the network," he says. Fortunately, "the lottery ticket hypothesis seems to be the solution."
To cut computing costs, Chen and his colleagues set out to find a smaller model hidden in BERT. They experimented by iteratively removing parameters from the full BERT network, then comparing the new subnetwork's performance with that of the original BERT model. They ran this comparison on a range of NLP tasks, from answering questions to filling in a blank word in a sentence.
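The prune-and-compare loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it removes the smallest-magnitude weights first (a common pruning criterion that the article does not confirm), it omits the retraining a real experiment would run between rounds, and `toy_score` is a hypothetical stand-in for measuring accuracy on an NLP task.

```python
def magnitude_prune(weights, mask, fraction):
    """Drop the smallest-magnitude `fraction` of the weights still kept."""
    alive = [i for i, kept in enumerate(mask) if kept]
    n_remove = max(1, int(len(alive) * fraction))
    alive.sort(key=lambda i: abs(weights[i]))  # smallest magnitudes first
    new_mask = list(mask)
    for i in alive[:n_remove]:
        new_mask[i] = False
    return new_mask


def toy_score(weights, mask):
    """Hypothetical stand-in for task accuracy: share of total weight
    magnitude that survives pruning (1.0 for the full network)."""
    total = sum(abs(w) for w in weights)
    kept = sum(abs(w) for w, k in zip(weights, mask) if k)
    return kept / total


def iterative_prune(weights, evaluate, rounds, fraction, tolerance):
    """Prune round by round, stopping once the subnetwork scores more
    than `tolerance` below the full network."""
    mask = [True] * len(weights)
    baseline = evaluate(weights, mask)
    for _ in range(rounds):
        candidate = magnitude_prune(weights, mask, fraction)
        # A real experiment would retrain the surviving weights here.
        if evaluate(weights, candidate) < baseline - tolerance:
            break  # this round cost too much performance, keep the last mask
        mask = candidate
    return mask


def sparsity(mask):
    """Fraction of parameters removed."""
    return 1.0 - sum(mask) / len(mask)


# Made-up weights: four large values and six near-zero ones.
weights = [0.01, -0.02, 0.03, 5.0, -4.0, 0.04, 3.0, -0.05, 0.02, 6.0]
mask = iterative_prune(weights, toy_score, rounds=10, fraction=0.2, tolerance=0.05)
print(mask, sparsity(mask))
```

On this toy input the loop discards the six near-zero weights and stops before touching the four large ones, leaving a subnetwork with 60 percent of its parameters removed. The `tolerance` parameter plays the role of the comparison against the original model: pruning continues only while the subnetwork keeps up.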
The researchers found successful subnetworks that were 40 to 90 percent slimmer than the original BERT model, depending on the task. In addition, they were able to identify those winning lottery tickets before running any task-specific fine-tuning, a finding that could further minimize computing costs for NLP. In some cases, a subnetwork picked for one task could be repurposed for another, though Frankle notes that this transferability was not universal. Still, Frankle is more than happy with the group's results.
"I was shocked that it even worked," he says. "It's not something I took for granted. I expected a much more unpleasant result than we got."
According to Ari Morcos, a scientist at Facebook AI Research, the discovery of a winning ticket in a BERT model is "compelling." "These models are becoming more common," says Morcos. "So it's important to understand whether the lottery ticket hypothesis is correct." He adds that the finding could allow BERT-like models to run on far less computing power, "which could make a big difference given that these extremely large models are currently very expensive to operate."
Frankle agrees. He hopes this work will make BERT more accessible, because it bucks the trend of ever-growing NLP models. "I don't know how much more we can achieve using these supercomputer-style calculations," he says. "We'll have to lower the barrier to entry." Identifying a lean, lottery-winning subnetwork does just that, allowing developers who lack the computing power of Google or Facebook to still perform advanced NLP. "The hope is that it will lower the cost, that it will make it more accessible to everyone ... for the little guys who only have a laptop," says Frankle. "It's really interesting for me."