The paper that laid the foundation for modern artificial intelligence: the story of eight Google employees

Source: AI style

Eight Google employees met by chance and co-wrote the groundbreaking “Transformer” paper, a technological breakthrough that revolutionized the field of artificial intelligence, especially the understanding and generation of human-like text.

In the spring of 2017, a scientific paper titled “Attention Is All You Need” was born. Its eight authors were all from Google, although one of them had already left the company by then. Noam Shazeer, the most senior of the group, was surprised to find his name listed first when he saw an early draft, which seemed to imply that his contribution was the most important. His response: “I hadn’t thought about it deliberately.”

In academia, how to order authors’ names has always been a delicate balancing act: whose name goes first, and whose goes last, especially in a case like this, where each person left a distinctive mark on a genuine team effort. As they rushed to finish the paper, the researchers finally decided to break with convention and not rank the contributors at all. They added an asterisk and a footnote next to each name: “Equal contributor,” with a note that “the listing order is random.” The paper was then submitted to a prestigious AI conference, where it went on to spark a revolution.

Name: NOAM SHAZEER / Occupation: Co-founder and CEO of CHARACTER AI

Now, as the “Attention” paper approaches its seventh anniversary, it has achieved legendary status. Its authors started with a thriving artificial intelligence technology, neural networks, and took it to another level: they created a digital system so powerful that its output can feel like the product of an alien intelligence. That architecture, known as the “transformer,” has become the mysterious force behind all of today’s astonishing AI products, including ChatGPT and the image generators DALL-E and Midjourney.

Shazeer joked that if he had known how famous the paper would become, he “might have worried more about the ordering of the author list.” Today, all eight authors have become minor celebrities. “Somebody asked me for a selfie, because I once worked on a paper,” said Llion Jones, who is listed fifth.

Name: LLION JONES / Occupation: Co-founder of SAKANA AI

“Without the transformer, I don’t think we would be here today,” said the world-renowned AI scientist Geoffrey Hinton, who is not one of the paper’s authors. He was referring to the epoch-shifting moment we are now in, as companies such as OpenAI build systems that in some ways even surpass human output.

All eight authors have since left Google. Now, like millions of others, they are working in some way with the technology they created in 2017. I interviewed the eight “transformer” authors to piece together a fuller picture of this breakthrough: a gathering of human minds that created a machine which may ultimately lead to our own self-destruction.

The story of the transformer begins with the fourth name on the list: Jakob Uszkoreit. His father, Hans Uszkoreit, was a well-known computational linguist. In the late 1960s, Hans was imprisoned in East Germany for 15 months for protesting the Soviet invasion of Czechoslovakia. After his release, he fled to West Germany and studied computer science and linguistics in Berlin. He later came to the United States and was working at SRI, the research institute in Menlo Park, California, when Jakob was born. Eventually the family returned to Germany, where Jakob went to university.

Name: JAKOB USZKOREIT / Occupation: Co-founder and CEO of INCEPTIVE

Although he had no intention of focusing on language, when he began graduate school he interned at Google’s Mountain View office and joined the company’s translation team. He abandoned his PhD plans and, in 2012, decided to join a Google team working on a system that could answer users’ questions directly on the search page without redirecting them to other websites. At the time, Apple had just released Siri, a virtual assistant that promised to deliver one-shot answers in casual conversation, and Google executives believed Siri could threaten their search traffic. They began paying much more attention to Uszkoreit’s new team.

“It was a misplaced panic,” Uszkoreit said. Siri never really threatened Google. But he welcomed the chance to dig into systems in which computers converse with humans. At the time, recurrent neural networks, once a backwater of academia, had suddenly begun to outperform other approaches to AI engineering. These networks consist of many layers through which information is passed again and again to identify the best response.

Neural networks were racking up huge successes in areas such as image recognition, and an AI renaissance was suddenly under way. Google frantically reshuffled its workforce to adopt these techniques. The company wanted systems that could produce human-like responses: auto-completing sentences in email, or powering relatively simple customer-service chatbots.

But the field ran into limitations. Recurrent neural networks struggled with longer chunks of text. To understand “two hits” in the sentence “Joe is a baseball player, and after a good breakfast he went to the park and got two hits,” a language model has to remember the information about baseball. In human terms, it has to keep paying attention.

The accepted fix was a technique called long short-term memory (LSTM), which let language models process longer and more complex sequences of text. But the computer still handled those sequences strictly in order, word by tedious word, missing contextual clues that might appear later in the passage. “The approach we were applying was basically a stopgap measure,” Uszkoreit said. “We couldn’t really get the right stuff to work at scale.”
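
To make that sequential bottleneck concrete, here is a minimal sketch (not drawn from the team’s code) of how a recurrent model with an LSTM cell digests a sentence one token at a time; the dimensions, random embeddings, and PyTorch usage are illustrative assumptions.

```python
# Illustrative sketch (assumed PyTorch, made-up sizes): an LSTM cell folds each
# word into a single running state, strictly in order, so anything said early
# in a long passage has to survive every later update to matter at the end.
import torch
import torch.nn as nn

embed_dim, hidden_dim = 64, 128
cell = nn.LSTMCell(embed_dim, hidden_dim)

tokens = torch.randn(12, 1, embed_dim)   # 12 word embeddings, batch of 1, in sentence order
h = torch.zeros(1, hidden_dim)           # hidden state: the model's working memory
c = torch.zeros(1, hidden_dim)           # cell state: the "long short-term memory"

for x_t in tokens:                       # word by word, no peeking ahead
    h, c = cell(x_t, (h, c))

print(h.shape)  # torch.Size([1, 128]) -- the whole sentence squeezed into one state
```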

Around 2014, he began to conceive of a different approach, which he called self-attention. Such a network could translate a word by referring to any other part of the passage. Those other parts can clarify a word’s intent and help the system produce a good translation. “It actually considers everything at once and gives you an efficient way of looking at many inputs at the same time and then taking something out in a rather selective way,” he said. Although AI scientists are careful not to confuse the metaphor of neural networks with the way biological brains actually work, Uszkoreit does seem to believe that self-attention bears some resemblance to the way humans process language.

Uszkoreit believed a self-attention model could be faster and more effective than recurrent neural networks. The way it handles information is also well suited to the mass-produced parallel-processing chips powering the machine learning boom. Instead of a linear approach (looking at each word in order), it takes a more parallel one (looking at many words at once). If done right, Uszkoreit suspected, you could use self-attention exclusively and get better results.
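
As a rough illustration of the idea, the snippet below sketches the kind of scaled dot-product self-attention the paper would later describe, letting every word weigh every other word in a single matrix operation; the shapes, random weights, and single attention head are simplifying assumptions, not the team’s actual code.

```python
# Illustrative self-attention sketch: all positions are compared with all
# others at once, which maps naturally onto parallel hardware.
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # project every token in parallel
    scores = q @ k.T / np.sqrt(k.shape[-1])          # every word scores every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the sentence
    return weights @ v                               # mix values by attention weight

seq_len, d_model = 8, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))              # 8 token embeddings
Wq, Wk, Wv = [rng.normal(size=(d_model, d_model)) for _ in range(3)]

out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (8, 16): all 8 positions updated simultaneously
```

Unlike the recurrent loop above, nothing here forces the model to walk through the sentence in order; the whole comparison is one batched matrix computation.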

Not everyone thought the idea would change the world, including Uszkoreit’s father, who had picked up two Google Faculty Research Awards while his son worked at the company. “People were skeptical, because it threw out all the existing neural architectures,” Jakob Uszkoreit said. Say goodbye to recurrent neural networks? Heresy! “My dad and I didn’t exactly see eye to eye in our dinner-table conversations.”

Uszkoreit persuaded a few colleagues to experiment with self-attention. Their work showed promise, and in 2016 they published a paper on it. Uszkoreit wanted to push the research further (the team’s experiments used only tiny snippets of text), but none of his collaborators were interested. Instead, like gamblers content to leave the casino with a modest win, they applied the lessons they had learned across Google, including in search and, eventually, advertising. In many ways it was a remarkable success, but Uszkoreit didn’t want to stop there.

Uszkoreit believed self-attention could take on much bigger tasks. He would lay out his vision to anyone who would listen, and to some who would not, sketching it on whiteboards in Building 1945, named for its address on Charleston Road at the northern edge of the Google campus.

One day in 2016, Uszkoreit had lunch at a Google café with a scientist named Illia Polosukhin. Born in Ukraine, Polosukhin had been at Google for nearly three years. He had been assigned to the team answering questions posed directly in the search box. Things were not going well. “To answer something on Google.com, you need something very cheap and high-performance,” Polosukhin said, “because you only have milliseconds to respond.” When Polosukhin aired his complaints, Uszkoreit had no trouble proposing a remedy. “He suggested, why not use self-attention?” Polosukhin said.

Name: ILLIA POLOSUKHIN / Occupation: Co-founder of NEAR

Polosukhin sometimes worked with a colleague named Ashish Vaswani. Born in India and raised mostly in the Middle East, Vaswani went to the University of Southern California, where he earned his PhD in the university’s elite machine translation group. Afterward he moved to Mountain View to join Google, specifically a new unit called Google Brain. He described Brain as “a radical team” that believed “neural networks would advance human understanding.” But he was still looking for a big project to work on. His team worked in Building 1965, next door to Building 1945, and he heard about the self-attention idea. Could that be the project? He agreed to work on it.

The three researchers drew up a design document titled “Transformers: Iterative Self-Attention and Processing for Various Tasks.” They chose the name “transformer” from “day one,” Uszkoreit said. The idea was that the mechanism would transform the information it took in, allowing the system to extract as much understanding as possible, or at least give that appearance. Besides, Uszkoreit had fond childhood memories of playing with Hasbro’s action-figure toys. “I had two little Transformer toys as a kid,” he said. The document ended with a cartoon of six Transformers shooting lasers at one another over mountainous terrain.

Name: ASHISH VASWANI / Occupation: Co-founder and CEO of ESSENTIAL AI

The document’s opening sentence was also a bit cocky: “We are great.”

In early 2017, Polosukhin left Google to start his own company. By then, new collaborators were joining. An Indian engineer named Niki Parmar had worked for an American software company in India before moving to the United States. She earned a master’s degree from the University of Southern California in 2015 and was recruited by all the big tech companies. She chose Google. When she started, she joined Uszkoreit and worked on model variants to improve Google search.

Another new member was Llion Jones. Born and raised in Wales, he loved computers “because it wasn’t normal.” At the University of Birmingham he took an AI course and became curious about neural networks, which were presented as little more than a historical curiosity. He received his master’s degree in July 2009 and, unable to find work during the recession, lived on benefits for several months. He found a job at a local company and then applied to Google as a “desperate move.” He got the job and eventually landed in Google Research, where his manager was Polosukhin.

One day, Jones heard about the concept of self-attention from a colleague named Mat Kelcey, and he later joined the transformer team. (Jones later ran into Kelcey and briefed him on the transformer project. Kelcey didn’t buy it. “I told him, ‘I’m not sure that will work,’ which was basically the biggest misprediction of my life,” Kelcey says now.)

Name: NIKI PARMAR / Occupation: Co-founder of ESSENTIAL AI

The transformer work attracted other Google Brain researchers who were also trying to improve large language models. This third wave included Łukasz Kaiser, a Polish-born theoretical computer scientist, and his intern, Aidan Gomez. Gomez grew up in a small farming village in Ontario, Canada, where his family tapped maple trees every spring for syrup.

As a junior at the University of Toronto, he fell in love with AI at first sight and joined Geoffrey Hinton’s machine learning lab. He began reaching out to people at Google who had written interesting papers, with ideas for extending their work. Kaiser took the bait and invited him to intern. It was only months later that Gomez learned those internships were meant for doctoral students, not undergraduates like him.

Kaiser and Gomez quickly realized that self-attention looked like a promising, and more radical, solution to the problem they were working on. “We had a deliberate conversation about whether we wanted to merge the two projects,” Gomez said. The answer was yes.

The transformer team set about building a self-attention model to translate text from one language to another. They measured its performance with a benchmark called BLEU, which compares a machine’s output to the work of a human translator. From the start, the new model did well. “We went from having no proof of concept to having something that was at least on par with the best alternative approaches to LSTMs at the time,” Uszkoreit said. But compared with long short-term memory, “it wasn’t better.”
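
For readers unfamiliar with the metric, here is a toy example of a BLEU-style comparison; it assumes the third-party sacrebleu package is installed, and the sentences and score are invented for illustration rather than taken from the team’s test data.

```python
# Toy BLEU check (assumes `pip install sacrebleu`): the machine's output is
# scored by how many of its n-grams overlap with a human reference translation.
import sacrebleu

hypotheses = ["The cat sat quietly on the mat."]              # machine translation output
references = [["A cat was sitting quietly on the mat."]]      # one human reference per sentence

result = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {result.score:.1f}")   # higher means closer overlap with the human version
```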

They hit a plateau, until one day in 2017, when Noam Shazeer stumbled upon their project. Shazeer was a veteran Googler who had joined the company in 2000 and was something of an in-house legend, starting with his work on the company’s early advertising system. Shazeer had been working on deep learning for five years and had recently grown interested in large language models. But those models were nowhere near producing the fluid conversations he believed were possible.

As Shazeer recalls it, he was walking down a corridor in Building 1965, past Kaiser’s workspace, when he found himself overhearing a spirited discussion. “I remember Ashish was talking about the idea of using self-attention, and Niki was very excited about it. I thought, wow, that sounds like a great idea. This looks like a fun, smart group of people doing something promising.” Shazeer found the existing recurrent neural networks “irritating” and thought: “Let’s go replace them!”

Shazeer joining the team was critical. “These theoretical or intuitive mechanisms, like self-attention, always require very careful implementation, often by a small number of experienced ‘magicians,’ to show any signs of life,” Uszkoreit said. Shazeer began working his magic right away. He decided to write his own version of the transformer team’s code. “I took the basic idea and made the thing up myself,” he said.

Occasionally he would ask Kaiser questions, but mostly, he said, he “just acted on it for a while and came back and said, ‘Look, it works.’” Using what team members would later describe with words like “magic,” “alchemy,” and “bells and whistles,” he took the system to a new level.

“That kicked off a sprint,” Gomez said. They were motivated, and they wanted to hit an approaching deadline: May 19, the submission date for papers to the Neural Information Processing Systems conference, the biggest AI event of the year, to be held in December. As Silicon Valley’s winter turned to spring, the pace of experiments picked up. They tested two transformer models: one produced with 12 hours of training, and a more powerful version called Big, trained over three and a half days. They set them to work translating English to German.

The base model outperformed all competitors, and Big earned a BLEU score that decisively shattered previous records while also being more computationally efficient. “We had done it, faster than anyone else,” Parmar said. “And that was only the beginning, because the numbers kept improving.” When Uszkoreit heard the news, he pulled out an old bottle of champagne he had been keeping in his mountain-expedition truck to celebrate.

The team’s pace turned frantic in the last two weeks before the deadline. Though some members still had desks in Building 1945, they mostly worked out of 1965 because the micro-kitchen there had a better espresso machine. “People were barely sleeping,” recalled Gomez, who, as the intern, was busy debugging while also producing the visualizations and diagrams for the paper. In projects like this, it is common to run ablations: removing pieces to verify that what remains is enough to do the job.

“We tried every possible combination of tricks and modules, which ones helped and which ones didn’t. We kept ripping things out and swapping them in,” Gomez said. “Why is the model behaving in this counterintuitive way? Oh, it’s because we forgot to do the masking properly. Does it work now? OK, on to the next one. All of the components of what we now call the transformer were the output of this extremely high-paced, iterative trial and error.” With Shazeer’s implementations in hand, the ablations produced “something simple,” Jones commented. “Noam is a wizard.”
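
The ablation loop Gomez describes can be pictured schematically as below; the component names and the evaluate() placeholder are hypothetical stand-ins, since the team’s actual experiment harness is not shown here.

```python
# Schematic ablation study (hypothetical components and a dummy scorer): drop
# one piece at a time, re-evaluate, and see how much the score moves.
COMPONENTS = ["multi_head_attention", "positional_encoding",
              "residual_connections", "label_smoothing"]

def evaluate(enabled: frozenset) -> float:
    """Placeholder: a real ablation would train a small model using only the
    enabled components and return its BLEU score on a held-out set."""
    return 10.0 + 4.0 * len(enabled)   # dummy numbers for illustration only

baseline = evaluate(frozenset(COMPONENTS))
print(f"all components: {baseline:.1f}")

for removed in COMPONENTS:                              # ablate one component at a time
    enabled = frozenset(c for c in COMPONENTS if c != removed)
    score = evaluate(enabled)
    print(f"without {removed:<22} {score:5.1f}  (delta {score - baseline:+.1f})")
```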

Vaswani remembers crashing one night on an office couch while the team was writing the paper. He stared at the curtains separating the couch from the rest of the room, struck by the pattern on the fabric, which looked to him like synapses and neurons. Gomez was there too, and Vaswani told him that the work they were doing would go beyond machine translation. “Ultimately, like the human brain, you need to unify all these modalities (speech, audio, vision) under a single architecture,” he said. “I had a strong hunch that we were onto something more general.”

In Google’s upper ranks, however, the work was seen as just another interesting AI project. When the authors were asked whether their bosses often summoned them for updates on the project, the answer was: not really. But “we knew it was probably a pretty big deal,” Uszkoreit said. “And it led us to actually obsess over one of the sentences at the end of the paper.”

That sentence foreshadowed what might come next: the application of transformer models to essentially all forms of human expression. “We are excited about the future of attention-based models,” they wrote. “We plan to extend the Transformer to problems involving input and output modalities other than text” and to investigate “images, audio and video.”

A few nights before the deadline, Uszkoreit realized they needed a title. Jones noted that the team had landed on a radical rejection of the accepted techniques in favor of one mechanism: attention. The Beatles, he recalled, had named a song “All You Need Is Love.” Why not call the paper “Attention Is All You Need”?

“I’m British,” Jones said. “It literally took five seconds of thought. I didn’t think they would use it.”

They kept gathering results from their experiments right up to the deadline. “The English-French numbers came in five minutes before we submitted the paper,” Parmar said. “I was sitting in the micro-kitchen in 1965, getting in that last number.” With barely two minutes to spare, they sent the paper off.

Like almost every tech company, Google promptly filed a provisional patent on the work. The reason was not to block others from using the ideas, but to build up its patent portfolio for defensive purposes. (The company’s attitude: “if the technology advances, Google will benefit.”)

When the transformer team heard back from the conference’s peer reviewers, the reaction was a mixed bag. “One was positive, one was extremely positive, and the third was, ‘This is OK,’” Parmar said. The paper was accepted for one of the evening poster sessions.

By December, the paper was causing a stir. Their four-hour session on December 6 was jammed with scientists wanting to know more. The authors talked until they were hoarse. At 10:30 pm, when the session closed, a crowd still lingered. “Security had to tell us to leave,” Uszkoreit said. Perhaps the most gratifying moment for him was when the computer scientist Sepp Hochreiter came up to praise the work, quite a compliment considering that Hochreiter was the co-inventor of long short-term memory, which the transformer had just displaced as the go-to tool in the AI toolbox.

The transformer did not instantly take over the world, or even Google. Kaiser recalls that around the time the paper was published, Shazeer proposed to Google executives that the company abandon the entire search index and train a huge network with transformers; basically, use transformers to change how Google organizes information. At the time, even Kaiser thought the idea was ridiculous. Now the conventional wisdom is that it’s only a matter of time.

A startup called OpenAI pounced much faster. Shortly after the paper was published, OpenAI’s chief scientist, Ilya Sutskever, who knew the transformer team from his time at Google, suggested that one of its scientists, Alec Radford, work on the idea. The result was the first GPT products. As OpenAI CEO Sam Altman told me last year, “When the transformer paper came out, I don’t think anyone at Google realized what it meant.”

The picture inside Google is more complicated. “It was pretty evident to us that the transformers could do really magical things,” Uszkoreit said. “Now, you may ask, why wasn’t there ChatGPT from Google back in 2018? Realistically, we could have had GPT-3 or even 3.5 by 2020. The real question isn’t, did they see it? The question is, why didn’t we do anything with the fact that we had seen it? The answer is tricky.”

Many tech critics point to Google’s transition from an innovation-centered playground to a bottom-line-focused bureaucracy. As Gomez told the Financial Times, “They weren’t modernizing. They weren’t adopting this tech.” But pulling that off would have taken a lot of daring for a giant company whose technology had led the industry and reaped enormous profits for decades. Google did begin integrating transformers into products in 2018, starting with its translation tool. That same year it introduced BERT, a new transformer-based language model, and began applying it to search the following year.

Name: AIDAN GOMEZ / Occupation: Co-founder and CEO of COHERE

But these under-the-hood changes seem timid compared with OpenAI’s leap and Microsoft’s bold integration of transformer-based systems into its product line. When I asked CEO Sundar Pichai why his company hadn’t been first to launch a large language model like ChatGPT, he argued that in this case Google found it advantageous to let others lead. “It’s not fully clear to me that it would have worked as well. The fact is, we can do more after people see how it works,” he said.

What is beyond dispute is that all eight authors of the paper have left Google. Polosukhin’s company, Near, built a blockchain whose tokens have a market capitalization of about $4 billion. Parmar and Vaswani became business partners in 2021, co-founding Adept (valued at $1 billion); they are now running their second company, Essential AI ($8 million in funding).

Llion Jones’ Tokyo-based Sakana AI is valued at $200 million. Shazeer, who left in October 2021, co-founded Character AI (valued at $5 billion). Aidan Gomez, the one-time intern, co-founded Cohere in Toronto in 2019 (valued at $2.2 billion). Jakob Uszkoreit’s biotech company, Inceptive, is valued at $300 million. All of those companies (except Near) are built on transformer technology.

Name: LUKASZ KAISER / Occupation: OPENAI Fellow

Kaiser is the only one who hasn’t started a company. He joined OpenAI and became one of the inventors of a new technology called Q*, which Altman said last year would “push the veil of ignorance back and the frontier of discovery forward.” (When I tried to ask Kaiser about it in our interview, OpenAI’s PR person practically leaped across the table to silence him.)

Does Google miss these defectors? Of course, along with others who have migrated from the company to new AI startups. (Pichai reminded me, when I asked him about the transformer departures, that industry darling OpenAI has seen defections too: “The AI field is very, very dynamic,” he said.) But Google can rightly boast that it created an environment that supported the pursuit of unconventional ideas. “In a lot of ways Google has been way ahead: they invested in the right minds and created an environment where we could explore and push the limits,” Parmar said. “It took time for it to be adopted. Google had more at stake.”

Without that environment: no transformer. Not only were the authors all Google employees, they also worked in the same offices. Hallway run-ins and overheard lunch conversations led to major moments. The group was also culturally diverse. Six of the eight authors were born outside the United States; the other two are, respectively, the child of two green-card-holding Germans who were temporarily in California, and a first-generation American whose family had fled persecution.

Speaking from his office in Berlin, Uszkoreit said innovation is all about the right conditions. “It’s getting people who are super excited about something together at the right point in their lives,” he said. “If you have that, and you have fun doing it, and you’re working on the right problems—and you’re lucky—magic happens.”

Something magical also happened between Uszkoreit and his famous father. After all those dinner-table debates, Hans Uszkoreit, his son reports, has now co-founded a company that is building large language models. Using transformers, of course.
