RecSys: Rajeev Rastogi on three recommendation system challenges

Rajeev Rastogi, vice president of applied science in Amazon’s International Emerging Stores division.

In a keynote address at this year’s ACM Conference on Recommender Systems (RecSys), which starts next week, Rajeev Rastogi, vice president of applied science in Amazon’s International Emerging Stores division, will discuss three problems his team has tackled in its recommendation algorithms: making recommendations in directed graphs; training machine learning models when target labels change over time; and using estimates of prediction uncertainty to improve models’ accuracy.

“The connection is that these are general techniques that cut across many different recommendation problems,” Rastogi explains. “And these are things that we actually use in practice. They make a difference in the real world.”

Directed graphs

The first problem involves directed graphs, or graphs whose edges describe relationships that run in only one direction.

“Directed graphs have applications in many different domains,” Rastogi explains, “from citation networks, where an edge u→v indicates that paper u cites paper v; to social networks, where an edge u→v indicates that user u follows another user v; to e-commerce, where an edge u→v indicates that customers bought product u before they bought product v.”

Although the problem of exploiting directed graphs is general, the researchers in Rastogi’s organization focused on the last of these cases: related-product recommendation, where the goal is to predict what other products might interest a customer who has just made a purchase.

“The interesting part here is that related-product relationships are actually asymmetric,” Rastogi explains. “If you have, for example, two nodes, a phone and a phone case, then given a phone, you would recommend a phone case. But if the customer has bought a phone case, you would not recommend a phone, because they probably already have one.”

As with many graph-based applications, the Amazon team’s solution to the problem of asymmetric related-product recommendation involves graph neural networks (GNNs), in which each node of a graph is embedded in a representational space, such that geometric relationships between embeddings carry information about the nodes’ relationships in the network. The embedding process is iterative: each iteration factors in information about nodes at greater removes, until each node’s embedding carries information about its entire neighborhood.

“A single embedding space does not have the expressive power to model the asymmetric relationships between nodes in directed graphs,” Rastogi explains. “Something that we borrowed from previous work is to represent each node with dual embeddings, and one of our novel contributions is to learn the dual embeddings in a GNN setting that exploits the entire graph structure.”

At center is a graph indicating the relationships between a mobile phone and related products, such as a case, a power adapter, and a screen protector. At left is a schematic illustration of the embedding (vector representation) of the phone node in a conventional graph neural network (GNN); at right is the dual embedding of the phone node as both a recommendation target and a recommendation source in the new approach. From “BLADE: Biased Neighborhood Sampling Based Graph Neural Network for Directed Graphs”.
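
The core of the dual-embedding idea can be sketched in a few lines of Python. The snippet below is an illustrative example, not the BLADE implementation; the node names, embedding dimension, and scoring function are assumptions made for the sketch.

import numpy as np

# Illustrative sketch only: dual (source/target) embeddings for a directed
# product graph. Random initialization stands in for learned embeddings.
rng = np.random.default_rng(0)
dim = 16
nodes = ["phone", "phone_case"]

source_emb = {n: rng.normal(size=dim) for n in nodes}  # node's role as recommendation source
target_emb = {n: rng.normal(size=dim) for n in nodes}  # node's role as recommendation target

def edge_score(u, v):
    """Score for recommending v to a customer who just bought u (edge u -> v)."""
    return float(source_emb[u] @ target_emb[v])

# With a single shared embedding per node, emb[u] @ emb[v] would equal
# emb[v] @ emb[u], so phone -> case and case -> phone could never be scored
# differently. With dual embeddings, the two directions can diverge:
print(edge_score("phone", "phone_case"))
print(edge_score("phone_case", "phone"))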

“Then we had an additional technique, adaptive sampling,” Rastogi adds. “Vanilla GNNs sample fixed neighborhood sizes for each node. But we found that low-degree nodes” (that is, nodes with few connections to other nodes) “have suboptimal performance when you use fixed neighborhood sizes for every node.

“So we actually choose to sample larger neighborhoods for low-degree nodes and smaller neighborhoods for high-degree nodes. It’s a bit of a balancing act, but it gives us much better results.”
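
A rough sketch of that degree-dependent sampling, assuming a simple budget that decays with node degree (the cutoffs and decay rule here are invented for illustration; BLADE’s actual scheme differs and can also reach further into the graph for low-degree nodes):

import random

def adaptive_neighborhood_sample(neighbors, max_size=25, min_size=5):
    """Per-hop neighborhood sampling whose budget shrinks as degree grows.

    Low-degree nodes keep all (or nearly all) of their scarce neighbors,
    while high-degree nodes are subsampled toward min_size.
    """
    degree = len(neighbors)
    if degree == 0:
        return []
    # Sampling budget decays with degree.
    budget = max(min_size, min(max_size, int(max_size * 10 / (degree + 10))))
    return random.sample(neighbors, min(budget, degree))

# A node with 3 neighbors keeps all of them; one with 500 is cut down to 5.
print(adaptive_neighborhood_sample(["case", "adapter", "screen_protector"]))
print(len(adaptive_neighborhood_sample([f"product_{i}" for i in range(500)])))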

Delayed feedback

A typical machine learning (ML) model is trained on labeled data, and the model must learn to predict the labels (its training targets) from the data. The second problem Rastogi will address in his talk is how best to train a model when you know that the target labels may change in the near future.

“This is again a very common problem across many different domains,” Rastogi says. “In recommendations, there may be a time delay of a few days between a customer’s looking at a recommendation and purchasing the product.

“There is a trade-off here: if you use all the training data in real time, some of it from recent interactions may have target labels that are wrong, because they will change over time. On the other hand, if you ignore all the training data from, say, the past five days, you miss out on the most recent data, and your model will not be as good.

“Here’s what we do: let p(x, y) be the true data distribution and q(x, y) be the data distribution you observe in the training set. Our importance sampling strategy uses the ratio p(x, y) divided by q(x, y) as the importance weight.
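
In its simplest form, training with such weights just means scaling each example’s contribution to the loss by the estimated ratio. The sketch below assumes the per-example ratios are already available; estimating them is the hard part Rastogi goes on to describe.

import numpy as np

def importance_weighted_loss(per_example_loss, p_over_q):
    """Average training loss with each example weighted by w = p(x, y) / q(x, y).

    `per_example_loss`: losses computed on the observed (possibly stale) labels.
    `p_over_q`: per-example estimates of the true-to-observed distribution ratio.
    """
    loss = np.asarray(per_example_loss, dtype=float)
    w = np.asarray(p_over_q, dtype=float)
    return float(np.sum(w * loss) / np.sum(w))  # self-normalized weighted average

# A recent, possibly mislabeled example (weight 0.3) contributes less than
# older examples whose labels have settled (weight 1.0).
print(importance_weighted_loss([0.7, 0.2, 0.4], [0.3, 1.0, 1.0]))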

“Our main innovation centers on techniques to compute these importance weights in new scenarios. One is where we take into account pre-conversion signals. People tend to do something before they convert; they may add the product to the shopping cart, or they may click on the product to examine it, before completing the purchase.

“But then that makes the calculation of the importance weights a little more complex. If it is very likely that the target label will actually change from 0 (a negative example) to 1, then the importance weight should be much lower than if the likelihood of the example changing were very low. In essence, what you are trying to do is learn from the data the likelihood that the target label will change in the future and capture that in the importance weights.”
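
One plausible reading of that idea, sketched with a made-up helper function rather than the team’s actual formulation, is to discount each observed negative by its estimated probability of flipping to a positive once the delayed feedback arrives:

def negative_example_weight(p_flip):
    """Weight for a training example currently labeled 0 (no conversion yet).

    `p_flip` is the estimated probability that the label will flip to 1 once
    delayed feedback arrives; in practice it would come from an auxiliary model
    fed with pre-conversion signals such as add-to-cart or product-click events.
    The more likely the flip, the less the example should count as a negative.
    """
    return 1.0 - p_flip

print(negative_example_weight(0.05))  # stable negative: near full weight (0.95)
print(negative_example_weight(0.60))  # likely future positive: heavily discounted (0.40)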

Prediction uncertainty

Finally, Rastogi says, the third technique he will discuss in his talk is the use of uncertainty estimates to improve the accuracy of model predictions.

“ML models typically return point estimates,” Rastogi explains. “But usually there is an underlying probability distribution. In some cases, you might know that there is a 0.5 chance that this customer will buy the product. But in other cases, it could be anywhere between 0.2 and 0.8.

“We trained a binary classifier to predict ad click probability for an ads recommendation application. For each sample in the holdout set, we generated both the model score, which is the probability prediction, and also an uncertainty estimate, which is how sure I am about that predicted probability.

“If I looked at a large number of examples in the holdout set with a model score of 0.5, you would expect about 50% of them to result in clicks: that’s the empirical positivity rate. If the score were 0.8, the empirical positivity rate should be about 80%.

“But what we found is that as the variance of the model score increased, the empirical positivity rate went down. If I have a score of 0.8, I could say, yes, it is between 0.79 and 0.81, which corresponds to a low variance. Or I could say it is between 0.65 and 0.95, indicating a high variance.
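
The holdout analysis Rastogi describes can be approximated with a simple tabulation like the one below; the bin width and the variance cutoff are illustrative assumptions.

import numpy as np

def empirical_positivity_by_bucket(scores, stds, clicks, score_bin=0.1, high_std=0.1):
    """Tabulate click (positivity) rates by model-score bucket and variance level.

    `scores`: predicted click probabilities on the holdout set.
    `stds`:   per-prediction uncertainty estimates (standard deviations).
    `clicks`: observed 0/1 click outcomes.
    """
    scores, stds, clicks = map(np.asarray, (scores, stds, clicks))
    for lo in np.arange(0.0, 1.0, score_bin):
        in_bin = (scores >= lo) & (scores < lo + score_bin)
        for label, mask in (("low variance", in_bin & (stds < high_std)),
                            ("high variance", in_bin & (stds >= high_std))):
            if mask.any():
                print(f"score [{lo:.1f}, {lo + score_bin:.1f}), {label}: "
                      f"positivity {clicks[mask].mean():.2f} over {mask.sum()} samples")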

“This has consequences for choosing the decision threshold for binary classifiers. Traditionally, binary classifiers use a single threshold on model scores. But now, since the empirical positivity rate depends on both the model score and the uncertainty estimate, just choosing a single threshold would be suboptimal.”
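
One way such an uncertainty-aware rule might look, as an illustrative sketch rather than the team’s production method, is to penalize the score by its uncertainty before applying the threshold:

def predict_click(score, std, threshold=0.7, penalty=1.0):
    """Uncertainty-aware thresholding sketch.

    Instead of comparing the raw model score to a single threshold, discount the
    score by its estimated standard deviation, reflecting the finding that
    high-variance scores have lower empirical positivity rates. `penalty` would
    be tuned on a holdout set.
    """
    return (score - penalty * std) >= threshold

print(predict_click(0.80, 0.01))  # confident 0.8: passes
print(predict_click(0.80, 0.15))  # uncertain 0.8 (roughly the 0.65-0.95 case): fails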

Members of Rastogi’s organization are currently writing a paper about their prediction uncertainty work, but the method is already in production.

“A lot of what people publish papers about is forgotten and never used,” Rastogi says. “At Amazon, we do science that actually makes a difference for customers and addresses customer pain points. These are three examples of doing customer-obsessed science that actually makes a difference in the real world.”
