More than 60% of the sales in Amazon’s store come from independent sellers. One of the big drivers of this growth has been Fulfillment by Amazon (FBA), an optional program that lets sellers outsource order fulfillment to Amazon. FBA gives customers access to a large selection of products with fast delivery speeds, and it lets sellers leverage Amazon’s global logistics network and advanced technology to pick, pack, and ship customer orders and to handle customer service and returns. FBA also uses advanced optimization and machine learning models to provide sellers with inventory recommendations, such as how much of which products to stock.
The goal of these recommendations is to improve sellers’ performance, for example, by maximizing seller-relevant performance metrics such as returns, units shipped, and customer clicks on product listings. To determine whether the recommendations are working, we would like to compare the results that sellers get from adopting FBA recommendations with the results they would get from not adopting them.
But performing this comparison is not as simple as comparing the results of two seller populations, those that follow the recommendations and those that don’t. That is because of so-called selection bias: the very traits that lead some sellers to follow a recommendation could mean that, had they not followed it, their results would have differed from those of sellers who actually did not follow the same recommendation.
At this year’s meeting of the Institute for Operations Research and the Management Sciences (INFORMS), we are presenting a tutorial that shows how to use advanced causal machine learning methods to filter out selection bias when estimating the effects of FBA recommendations.
To build the causal model, we use double machine learning. Specifically, we train two machine learning models: one predicts whether each seller will follow the recommendation, based on inputs such as inventory management history and product characteristics; the other predicts the seller’s outcome, using the same inputs as the first model plus the seller’s adoption decision. We use the predictions of these models to account for any selection bias that can be explained by the observed data, as we explain below.
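As a minimal sketch of this two-model setup (on simulated data, with hypothetical feature names rather than the production FBA inputs), one might train the pair in Python roughly as follows:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical stand-ins for seller inputs (the real models use hundreds of
# features drawn from inventory management history and product characteristics).
features = rng.normal(size=(n, 5))

# Simulated adoption decision and seller outcome, for illustration only.
adopted = rng.binomial(1, 1 / (1 + np.exp(-features[:, 0])))   # followed the recommendation?
outcome = 2.0 * adopted + features[:, 1] + rng.normal(size=n)  # e.g., units shipped

# Model 1: predicts whether a seller will follow the recommendation,
# using the seller/product features alone.
decision_model = GradientBoostingClassifier().fit(features, adopted)

# Model 2: predicts the seller's outcome, using the same features
# plus the seller's adoption decision.
outcome_model = GradientBoostingRegressor().fit(
    np.column_stack([features, adopted]), outcome
)
```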
Using this method, we have shown whether, and by how much, FBA recommendations improve sellers’ results. We surface these effect estimates to sellers through the Seller Central site to raise awareness and encourage adoption.
Selection bias
To measure and monitor the effect of such recommendations, we would ideally run experiments regularly. But we do not run such experiments, because we want to maintain a positive seller experience and sellers’ trust, and we do not want to adversely affect seller decisions. Let’s explain.
An experiment involves two groups: a treatment group that receives an intervention (e.g., a recommendation) and a control group that does not. A well-designed experiment randomly assigns some participants to the treatment group and others to the control group to ensure unbiased comparisons.
To avoid exposing sellers to such differential treatment, we instead rely on data that we collect by observing sellers’ decisions and the resulting outcomes. Our methodology is therefore suited to settings where experimentation is impractical or impossible (e.g., health care, where experimentation could interfere with patient treatment and outcomes).
Selection bias arises when the assignment to treatment and control groups is not random and the factors that determine group membership also affect the outcomes. In our case, the treatment group consists of sellers who decide to align their actions with the recommendations, and the control group consists of sellers who choose not to follow the recommendations. In other words, sellers are not randomly assigned but rather self-select into one of these groups.
It is therefore possible that sellers who are proactive and knowledgeable about managing their inventory end up in the treatment group, while sellers who are less engaged with inventory management end up in the control group. In that case, it would probably be wrong to attribute the treatment group’s better results exclusively to the FBA recommendations.
It is also possible that members of the control group already have such a thorough understanding of inventory management that they feel they don’t need FBA’s recommendations and, as a consequence, their results are better than the treatment group’s even without the intervention. Either way, simply comparing the results of the two groups is inadequate: another method is necessary to carefully quantify the causal effect of following FBA recommendations.
Double machine learning
Our solution is to use double machine learning (DML), which uses two models to estimate causal effects: one model estimates the expected seller outcome, given the decision to adopt or not adopt the recommendation; the other estimates the propensity to adopt the recommendation. Variation in these propensities is the source of selection bias.
Each model receives hundreds of inputs, including inventory management and product data. For each seller, we compute the residual of the seller performance model (the difference between the model’s prediction and the actual outcome) and the residual of the seller decision model (the difference between the model’s prediction and the seller’s actual decision to follow the recommendation). These residuals capture the unexplained variation in the seller’s outcome and the seller’s decision, that is, the variation not explained by the observable data.
In this way, we remove any influence our inputs (e.g., the seller’s level of experience) may have on the estimate of the treatment effect. When we regress the residuals of the outcome model on the residuals of the decision model, we estimate the effect of the unexplained variation in treatment status on the unexplained variation in the outcome. The resulting estimand is the causal effect of the seller’s decision to follow the recommendations on the outcome.
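To make the residualization concrete, here is a minimal, self-contained sketch of a standard “partialling-out” DML estimator on synthetic data, with gradient-boosted trees as the nuisance models; in this textbook formulation the outcome nuisance model is fit on the covariates alone, and the production FBA pipeline may differ in its details:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 10))                       # observed seller/product covariates (synthetic)
propensity = 1 / (1 + np.exp(-X[:, 0]))            # adoption depends on covariates -> selection bias
T = rng.binomial(1, propensity)                    # 1 = followed the recommendation
Y = 1.5 * T + 2.0 * X[:, 0] + rng.normal(size=n)   # seller outcome; true effect of adoption is 1.5

# Cross-fitting: each observation's nuisance predictions come from models
# trained on the other folds, so the residuals are computed out of sample.
y_hat = np.zeros(n)
t_hat = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    y_hat[test] = GradientBoostingRegressor().fit(X[train], Y[train]).predict(X[test])
    t_hat[test] = GradientBoostingClassifier().fit(X[train], T[train]).predict_proba(X[test])[:, 1]

# Residuals: the variation in the outcome and in the adoption decision
# that the observed covariates do not explain.
y_res = Y - y_hat
t_res = T - t_hat

# Final stage: regress the outcome residuals on the decision residuals.
# The slope is the estimated effect of following the recommendation.
effect = LinearRegression(fit_intercept=False).fit(t_res.reshape(-1, 1), y_res).coef_[0]
naive = Y[T == 1].mean() - Y[T == 0].mean()
print(f"DML effect estimate:       {effect:.2f}  (true effect: 1.5)")
print(f"Naive difference in means: {naive:.2f}  (inflated by selection bias)")
```

The cross-fitting loop keeps the final regression from overfitting to the nuisance models’ errors, and the naive difference in means illustrates how a simple comparison overstates the effect when adoption itself depends on the covariates.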
In our tutorial, we show how to use this method to compute the average treatment effect (ATE), the average treatment effect on the treated (ATT), and the conditional average treatment effect (CATE). ATE is the overall effect of the treatment (following the FBA recommendation) on the entire population of FBA sellers. It answers the question “On average, how much does following the recommendation change the seller’s outcome compared to not following the recommendation?”
ATT focuses on the sellers who actually followed the recommendation. It answers the question “For those who followed the recommendation, what was the average effect compared to not following the recommendation?”
CATE drills down further and looks at specific subgroups based on attributes such as product category or current inventory level. It answers the question “For a particular group of sellers and products, how does following the recommendation affect them compared to not following the recommendation?”
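For readers who want to see the three estimands side by side in code, the sketch below uses the open-source econML library (one possible implementation, not necessarily the one used for FBA) on synthetic data; all feature names, data, and parameter choices are illustrative assumptions:

```python
import numpy as np
from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 3))                              # e.g., product-category score, inventory level (synthetic)
W = rng.normal(size=(n, 5))                              # other controls (synthetic)
T = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))          # adoption decision
Y = (1.0 + 0.5 * X[:, 0]) * T + W[:, 0] + rng.normal(size=n)   # effect varies with X[:, 0]

est = LinearDML(
    model_y=GradientBoostingRegressor(),                 # outcome nuisance model
    model_t=GradientBoostingClassifier(),                # decision (propensity) nuisance model
    discrete_treatment=True,
    cv=5,
    random_state=0,
)
est.fit(Y, T, X=X, W=W)

ate = est.ate(X)                                         # average effect over all sellers
att = est.effect(X[T == 1]).mean()                       # average effect over sellers who adopted
cate_subgroup = est.effect(X[X[:, 1] > 1.0]).mean()      # effect for one illustrative subgroup

print(f"ATE:             {ate:.2f}")
print(f"ATT:             {att:.2f}")
print(f"CATE (subgroup): {cate_subgroup:.2f}")
```

Here ate() averages the estimated effect over all sellers, averaging effect() over the adopters gives an ATT-style estimate, and evaluating effect() on a subgroup yields a CATE for that subgroup.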
Our approach is agnostic about the type of machine learning model used. But we observe that, given the scale and tabular nature of our data, gradient-boosted decision trees offer a good compromise between the high efficiency but lower accuracy of linear regression models and the high accuracy but lower efficiency of deep learning models. Readers who are interested in the details can attend our INFORMS tutorial or read our paper in the upcoming edition of Tutorials in Operations Research.
Finally, before we make recommendations to sellers to help improve their results, we do rigorous scientific work to build the recommendation algorithms, monitor their results, and revise and rebuild them to make sure that sellers’ results really do improve.
Acknowledgments:
Xiaoxi Zhao, Ethan Dee, and Vivian Yu for contributing to the tutorial; FBA researchers for contributing to the seller assistance efficacy workstream; Michael Miksis for managing the related product and program; FBA product leaders and engineers for launching the results of this workstream in their respective products; Alexandre Belloni and Xinyang Shen for their constructive suggestions; and WW FBA management for their support.