Recall that serves ranking: multi-objective ranking-distillation recall
2022-07-22 14:57:00 【kaiyuan_sjtu】
Author | It's raining
Edited by | NewBeeNLP
A recommender system generally has several stages, such as recall, pre-ranking, and fine ranking, each with its own role.
Recall retrieves all the content a user might be interested in; ranking (pre-ranking and fine ranking) predicts how interested the user is in each item. Recall supplies the data for the later stages of the whole pipeline and sets the upper bound on the final effect; ranking analyzes, scores, and orders the recalled candidates to decide what the user ultimately sees. Recall can thus be said to serve the downstream ranking models.
Recall and ranking are two stages of recommendation and largely independent processes; in industry they are often owned by different teams, so their objectives can diverge. For example, fine ranking may want to score highly content with a high conversion rate or long dwell time, while recall still retrieves the content users merely like to click, which caps what fine ranking can achieve.
To align recall with downstream ranking, we need a recall method that optimizes for the ranking model's objective and supplies the data the ranking model needs.
Multi-objective distillation recall
Tencent proposed DMTL, a recall method in which a multi-objective MMoE distills a DSSM (see Figure 1).
The MMoE's two objectives are click and reading duration. A DSSM alone models only the user's click interest and recalls content the user is likely to click, without accounting for whether the user reads it for long. The authors therefore use the MMoE to distill the DSSM, adding the duration signal.
For the click task, positive samples are clicked items, and negatives are randomly sampled in proportion to click frequency. Unclicked exposures are not used as negatives; after all, DMTL is a recall model, not a ranking model.
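As a sketch of this negative-sampling scheme (the smoothing exponent `alpha` is an assumption borrowed from word2vec-style unigram sampling; the article only says negatives follow click frequency):

```python
import random
from collections import Counter

def sample_negatives(click_log, k, alpha=0.75, seed=0):
    """Draw k negative items in proportion to their click frequency.
    click_log is a list of clicked item ids; alpha smooths the
    frequency distribution (assumption, not from the article)."""
    counts = Counter(click_log)                 # item -> click count
    items = list(counts)
    weights = [counts[i] ** alpha for i in items]
    rng = random.Random(seed)
    return rng.choices(items, weights=weights, k=k)
```

Frequently clicked items are sampled as negatives more often, which keeps the model from trivially memorizing popular content.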
For the duration task, positives are clicked items read for more than 50s (the average of all durations); everything else is a negative. Let y = 1 denote a click-positive sample and z = 1 a duration-positive sample (reading time over 50s). x is the sample's feature vector, x = (u, v), where u is the user features and v is the content features.
p(y=1|x) is the predicted click-through rate pCTR, p(z=1|y=1, x) is the predicted conversion rate pCVR, and p(y=1, z=1|x) is pCTCVR.
The authors use the MMoE to learn CTR and CVR simultaneously, then distill pCTCVR into the DSSM. The MMoE is the teacher network and the DSSM is the student network.
The predicted pCTCVR = pCTR × pCVR.
The loss fitting CTCVR is the cross-entropy: L_ctcvr = −[yz · log(pCTCVR) + (1 − yz) · log(1 − pCTCVR)].
The loss fitting CTR is: L_ctr = −[y · log(pCTR) + (1 − y) · log(1 − pCTR)].
The final training loss of the MMoE is: L_mmoe = L_ctr + L_ctcvr.
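The two loss terms above can be sketched in plain Python (a minimal scalar version; the real model computes these over batches of tensor predictions):

```python
import math

def bce(p, label, eps=1e-7):
    """Binary cross-entropy for one prediction, clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def mmoe_loss(p_ctr, p_cvr, y, z):
    """Teacher (MMoE) loss: CTR loss on the click label y plus CTCVR
    loss on the joint label y*z, with pCTCVR = pCTR * pCVR
    (the ESMM-style factorisation described above)."""
    p_ctcvr = p_ctr * p_cvr
    return bce(p_ctr, y) + bce(p_ctcvr, y * z)
```

Note that CVR is never supervised directly; it is only trained through the product pCTR × pCVR, which avoids the sample-selection bias of fitting CVR on clicked impressions alone.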
To distill the CTCVR score into the DSSM, first define the score learned by the DSSM: p_dssm = σ(⟨e_u, e_v⟩),
where e_u is the user representation learned by the user tower and e_v is the content representation learned by the content tower.
The distillation loss is defined as the KL divergence: L_kd = KL(pCTCVR ‖ p_dssm).
The final loss is the sum of the losses of the MMoE (teacher network) and the DSSM (student network): L = L_mmoe + L_kd.
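A minimal sketch of the student score and the distillation term, assuming a sigmoid inner-product DSSM score and a Bernoulli KL divergence (the paper's exact similarity function may differ):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dssm_score(user_emb, item_emb):
    """Student score: sigmoid of the inner product between the
    user-tower and content-tower embeddings (a common DSSM choice,
    assumed here)."""
    return sigmoid(sum(u * v for u, v in zip(user_emb, item_emb)))

def distill_loss(p_teacher, p_student, eps=1e-7):
    """KL(teacher || student) between the Bernoulli distributions
    defined by the MMoE pCTCVR and the DSSM score."""
    p = min(max(p_teacher, eps), 1 - eps)
    q = min(max(p_student, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
```

The KL term is zero when the student matches the teacher and grows as the DSSM score drifts away from pCTCVR, so minimizing it pulls the recall score toward the multi-objective ranking score.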
The paper's experiments show improvements in both offline AUC and average online reading duration.
Offline distillation method
DMTL trains the MMoE and the DSSM simultaneously. With a lot of data, the time cost of joint training is high. Instead, we can first train an MMoE, save its logits, and use those saved logits to guide the DSSM's training.
In a recommender system, the DSSM and the MMoE are each updated daily. We can save the logits the MMoE generates on a given day and use them for the next day's DSSM training. Because the DSSM keeps computing, for each user in a batch, logits against all items, the MMoE may never have computed logits for a particular <user, item> pair; in that case we can save the fine-ranking scores (such as the ctr and cvr scores) instead.
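A file-based sketch of this save-and-reuse step (the function names and the tab-separated format are illustrative; a production system would use a feature store or Hive table):

```python
def save_teacher_scores(scores, path):
    """Persist the day's teacher scores keyed by (user, item) so
    they can be looked up during the next day's DSSM training."""
    with open(path, "w", encoding="utf-8") as f:
        for (user, item), score in scores.items():
            f.write(f"{user}\t{item}\t{score}\n")

def load_teacher_scores(path):
    """Load yesterday's scores back into a {(user, item): score} dict."""
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            user, item, score = line.rstrip("\n").split("\t")
            scores[(user, item)] = float(score)
    return scores
```

During DSSM training, each <user, item> pair in the batch looks up its saved teacher score; pairs the teacher never scored can simply be skipped in the distillation term.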
The fine-ranking scores obtained by offline distillation are not fresh; after all, they were produced yesterday and may differ from the scores learned online today. In practice, though, I found that yesterday's scores still achieve good results.
In Zhang Junlin's "Applications of knowledge distillation in recommender systems" [3], the author describes many offline distillation methods, both with logits and without logits (point-wise, pair-wise, list-wise). The author tried the point-wise method, using the fine-ranking order as the weight; online duration, click, interaction, and other metrics improved by between 2+% and 6+%.
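One way to read "using the fine-ranking order as the weight" is a rank-discounted point-wise loss; the 1/log2(rank + 1) discount below is my assumption, not a formula from the article:

```python
import math

def rank_weight(rank):
    """Higher fine-ranking positions (rank 1 is best) get larger
    weight; the log-discount form is an assumption."""
    return 1.0 / math.log2(rank + 1)

def pointwise_distill_loss(student_scores, teacher_ranking, eps=1e-7):
    """Treat each teacher-ranked item as a positive for the student,
    weighted by its fine-ranking position (best-first list)."""
    loss = 0.0
    for rank, item in enumerate(teacher_ranking, start=1):
        p = min(max(student_scores[item], eps), 1 - eps)
        loss += rank_weight(rank) * -math.log(p)
    return loss
```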
I also used the multi-objective fine-ranking scores and order to distill a DSSM. The effect on click-through rate was small (perhaps because other recall models already lift clicks, leaving the DSSM at a disadvantage on CTR), but the interaction rate improved substantially.
References
[1] Distillation based Multi-task Learning: A Candidate Generation Model for Improving Reading Duration.
[2] Modeling task relationships in multi-task learning with multi-gate mixture-of-experts.
[3] Zhang Junlin: Applications of knowledge distillation in recommender systems: https://zhuanlan.zhihu.com/p/143155437