Lihong Li, a senior principal scientist in Amazon Ads, has won the Seoul Test of Time Award for the 2010 paper “A contextual-bandit approach to personalized news article recommendation.” The paper, coauthored by Wei Chu, John Langford, and Robert E. Schapire, introduced an innovative approach to personalized recommendation engines.
The Seoul Test of Time Award is “awarded annually to the author or authors of a paper presented at a previous World Wide Web conference that has, as the name suggests, stood the test of time.”
“The paper tackles an important problem from a new angle that turned out to be one of the basic techniques in the years after publication,” Li said. “The paper considers recommendation as a reinforcement learning problem, which was not a popular view at the time.”
Li and his colleagues, who worked at Yahoo! Labs in 2010, introduced a new way of thinking about personalized recommendation engines. The team addressed the challenge of building a personalized recommendation engine that directly maximizes a utility function measuring user satisfaction.
Recommender systems at the time relied on previous user activity to provide meaningful recommendations at the individual level. However, the paper notes, “In many web-based scenarios, the content universe undergoes frequent changes, with content popularity that also changes over time. In addition, there are new visitors to a site without any historical consumption record.”
“These issues make traditional recommender-system approaches difficult to use,” the paper states. “Thus, it becomes important to learn the goodness of match between user interests and content from user interactions when one or both of them are new.”
Contextual bandits
The paper proposed a “contextual-bandit approach to personalized recommendation of news content,” in which “a learning algorithm sequentially selects articles to serve users based on contextual information about users and articles, while simultaneously adapting its article-selection strategy to maximize total user clicks.”
“News content changes every hour of the day,” Li said. “That’s why we need a solution that quickly adapts to changing content and recommends the best content to users. Thus, the solution has to strike a balance between two competing goals: maximizing user satisfaction and gathering information about the ‘goodness of match’ between user interests and content. Contextual bandits are a special class of reinforcement learning problems that are well suited to this scenario.”
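To make the balance Li describes concrete, here is a minimal sketch of a linear upper-confidence-bound bandit in Python. It is an illustration under stated assumptions, not the authors' implementation: the class and function names, the exploration weight `alpha`, and the toy two-article, two-user-type simulation with made-up click probabilities are all hypothetical. Each article keeps a small regression model of clicks given context; the score adds an uncertainty bonus to the predicted click rate, so rarely shown articles still get tried.

```python
import math
import random


class LinUCBArm:
    """Per-article ridge-regression click model with an uncertainty bonus."""

    def __init__(self, dim):
        # A_inv is the inverse of (I + sum of x x^T); b is the sum of reward * x.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(a * vi for a, vi in zip(row, v)) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Predicted click rate plus alpha times the confidence width."""
        theta = self._A_inv_times(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        A_inv_x = self._A_inv_times(x)
        width = math.sqrt(sum(v * xi for v, xi in zip(A_inv_x, x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Fold one observation in via a Sherman-Morrison rank-one update."""
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(v * xi for v, xi in zip(A_inv_x, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_article(arms, x, alpha=1.0):
    """Pick the article whose upper confidence bound is highest for context x."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def simulate(rounds=2000, seed=0, alpha=1.0):
    """Toy environment (assumed): each user type prefers a different article."""
    rng = random.Random(seed)
    click_prob = [[0.8, 0.2], [0.2, 0.8]]  # click_prob[user_type][article]
    arms = [LinUCBArm(2), LinUCBArm(2)]
    clicks = 0.0
    for _ in range(rounds):
        user_type = rng.randrange(2)
        x = [1.0, 0.0] if user_type == 0 else [0.0, 1.0]
        article = choose_article(arms, x, alpha)
        reward = 1.0 if rng.random() < click_prob[user_type][article] else 0.0
        arms[article].update(x, reward)  # only the shown article learns
        clicks += reward
    return arms, clicks
```

Calling `choose_article(arms, x, alpha=0.0)` after `simulate()` turns off the exploration bonus and returns the article the learned models currently rate highest for that context. A larger `alpha` explores more aggressively, at the cost of showing more articles that turn out to be poor matches.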
The paper develops practical contextual bandit algorithms that optimize user engagement metrics, such as click-through rates, downstream returns, or other business impacts. Li later worked to extend the approach to scenarios where utility is measured in terms of long-term engagement.
“In reality, decisions change users’ behavior and, in turn, change the way they interact with the site in the future and the future utility,” Li said. “So a system must be able to take these long-term effects into account and make decisions that maximize long-term utility.”
The authors reported that their “computationally efficient contextual bandit algorithm” not only produced higher click-through rates but also addressed the scalability challenge, because it could be “reliably evaluated offline using previously recorded random traffic.” The evaluation technique itself has also found use in other web-based scenarios.
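The idea of evaluating a policy offline on recorded random traffic can be sketched as a simple replay loop. The Python below is an illustrative sketch, not the paper's code: the function names, the event format `(context, shown_article, clicked)`, and the synthetic log with made-up click probabilities are all assumptions. It relies on the log having been collected by showing articles uniformly at random, and it keeps only the events where the policy under test would have shown the same article as the log did.

```python
import random


def replay_evaluate(policy, logged_events):
    """Estimate a policy's click-through rate from uniformly random logs.

    Each event is (context, shown_article, clicked). Events where the policy
    disagrees with the logged article are skipped, since the log says nothing
    about what the user would have done with a different article.
    """
    matched = 0
    clicks = 0
    for context, shown_article, clicked in logged_events:
        if policy(context) == shown_article:
            matched += 1
            clicks += clicked
    ctr = clicks / matched if matched else float("nan")
    return ctr, matched


def make_random_log(n_events, seed=0):
    """Synthetic log (assumed): two user types, two articles, uniform logging."""
    rng = random.Random(seed)
    click_prob = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.8}
    log = []
    for _ in range(n_events):
        user_type = rng.randrange(2)
        article = rng.randrange(2)  # logging policy: uniform at random
        clicked = int(rng.random() < click_prob[(user_type, article)])
        log.append((user_type, article, clicked))
    return log


def personalized(user_type):
    """Hypothetical policy: show each user type its matching article."""
    return user_type


def always_first(user_type):
    """Hypothetical baseline: show article 0 to everyone."""
    return 0
```

On a log from `make_random_log(20000)`, `replay_evaluate(personalized, log)` and `replay_evaluate(always_first, log)` each use roughly half of the events, and the personalized policy should score noticeably higher. A learning policy could be evaluated the same way by letting it update only on the matched events.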
The road to the award
Li received a bachelor’s degree in computer science and technology from Tsinghua University in Beijing and went on to earn a Master of Science in computing science at the University of Alberta. He earned his Ph.D. in computer science from Rutgers University, where he worked on reinforcement learning.
During his time at Rutgers, Li met two mentors who would later become coauthors on the award-winning paper. Schapire was a Princeton professor on Li’s thesis defense committee, and Langford was Li’s internship mentor at Yahoo! in 2007. In October 2020, Li joined Amazon as a senior principal scientist.
“One thing that attracted me is the customer-obsessed culture at Amazon, which uses solid science, technologies, and solutions to tackle deep customer problems,” Li said. “Contextual bandits, and more generally reinforcement learning techniques, can help Amazon meet customer needs in shopping, entertainment, and beyond, as well as play a key role in improving large language models.”
Li and his colleagues received the Seoul Test of Time Award at The Web Conference 2023 in Austin, Texas.
“I was thrilled, and winning was totally unexpected,” Li said.
The Web Conference (formerly known as the International World Wide Web Conference, abbreviated WWW) is an annual international academic conference on the future directions of the World Wide Web, which was first conceived in 1989 by Tim Berners-Lee at CERN in Geneva.
“Researchers often publish innovations in papers. When an invention stays on paper and is not used in the real world, it doesn’t feel like the story is complete,” Li said. “This award is a recognition that the invention has had long-lasting influence, not only on the problem we were working on, but also on the field and on other parts of the industry. I am grateful to be a recipient of the award and am happy to see that this 13-year-old work is still useful.”