[NeurIPS 2023] Estimating Propensity for Causality-based Recommendation without Exposure Data
- References
- Paper: Estimating Propensity for Causality-based Recommendation without Exposure Data
- Official Code: GitHub Repository
- Conference: NeurIPS 2023
- Presentation Slides
Abstract
- Causality-based Recommendation Systems [RS]:
- [Focus] item exposure [cause]→ user-item interactions [causal effects == result]
- ↔︎ Conventional Correlation-based RS (TODO: look up papers on representative classical RS models)
- Existing Causality-based RS
: require additional input (exposure data and/or propensity scores) for training. (TODO: look up papers on early causality-based RS models)
    1. exposure data
    2. propensity scores == probability of exposure
    3. exposure data & propensity scores
- [Problem] Such data: often NOT available in real-world situations b/c technical or privacy constraints
- [Solution in this paper]
    - Propensity Estimation for Causality-based Recommendation (PROPCARE)
    - Uses only interaction data, for both training and inference
    - [Method] Relating the pairwise characteristics between propensity and item popularity
→ Theoretical analysis on the *bias of the causal effect* under our model estimation.
→ Empirically evaluate PROPCARE through both quantitative and qualitative experiments.
1. Introduction
- Where RS are used: applications such as
- streaming services
- online shopping
- job searching
- Primary aim of RS: boosting sales, user engagement, …
⇒ relies on user interactions (clicking or purchasing items)
user-item interactions == clicking, purchasing
Classical paradigm:
- Predict user-item interactions
- Interaction probability → recommend items to users. See refs. [8, 22, 33, 35].
- [Limitation] Ignore the Causal Impact behind recommendation
Recent Studies for Causality-based Recommendation Systems [RS]
- Treatments [Cause] → User’s behavior [Result]. See refs. [38, 30, 28, 37].
- Treatments: Recommend/Expose the item or not
- Behavior: Click or Purchase
- Assumption (1): from the perspective of recommending items,
    - items with a higher causal effect > items with a higher interaction probability.
    - The causal effect on the user’s behavior is quantified ← observed data & counterfactual treatment
- Assumption (2): exposure data or propensity scores are observable at the training stage.
    - Exposure data == whether the item was recommended to the user
        - Exposure == Recommendation
    - Propensity scores == probability that the item is recommended/exposed to the user
- [Problem] Exposure data cannot be obtained.
    e.g. purchase history on an e-commerce platform is feasible to collect,
    - but whether a purchase happened w/ or w/o exposure is unknown, b/c technical and privacy constraints.
    No exposure data and/or propensity scores at the training stage
    → existing causality-based recommenders [RS] cannot be used.
[Solution in this paper] for Causality-based RS
- Setup: exposure and propensity scores: NOT observable
- Some previous works: estimate propensity scores ← e.g. for addressing biases in recommendation. See refs. [38, 42, 1, 14, 21].
- Limitations
- Most SOTA [state-of-the-art] methods: need exposure data to train the propensity estimator.
- Failure to integrate prior knowledge into the propensity estimator → not robust estimation
- [Paper Solution] New framework: PROPCARE
- Estimates the propensity score and exposure of each item for each user
- Resolves the above limitations
- Resolves the data-gap problem
- Observation: pairwise characteristic of (propensity scores, item popularity)
- When? == when the probability of user-item interaction is well controlled.
- Assumption 1: the probability of user-item interaction is well controlled. Empirically validated in Sect. 4.2.
- → item popularity is incorporated as prior knowledge for propensity estimation.
- Theoretical analysis on the bias of the estimated causal effect.
- → reveals the factors influencing the estimation.
- → informs model & experiment design.
Previous Propensity Estimation Vs PROPCARE

- Advantages of PROPCARE.
- Needs no propensity or exposure data at all.
- Incorporates prior information for robust estimation.
- Paper Contributions
- Previous causality-based RS: propensity score and/or exposure data are often unavailable but required for model training or inference. → This problem is resolved.
- Integrating the pairwise (propensity, item popularity) relationship as prior knowledge → more robust propensity estimation
- Analysis of the factors influencing the model.
- Validating the effectiveness of PROPCARE through quantitative and qualitative results.
3. Preliminaries
Data notations
- Typical recommendation dataset
- \(D = \{(Y_{u,i})\}\): collection of observed training user-item interaction data
- Interactions between users and items: purchases or clicks [Result]
- \(Y_{u,i} \in \{0,1\}\): observed interaction
- \(Y_{u,i} = 1\): user u interacted with item i
- User \(u \in \{1, 2, 3,..., U\}\)
- Item \(i \in \{1, 2, 3,..., I\}\)
- Unobservable indicator variable for Exposure [Cause]
- \(Z_{u,i} \in \{0, 1\}\)
- \(Z_{u,i} = 1\): item i is exposed/recommended to user u
- Propensity score := probability of exposure
- \(p_{u,i} := P\left(Z_{u,i}=1\right)\)
Causal effect modeling
- Potential outcomes for different exposure statuses: \(Z_{u,i} = 0 \ or \ 1\)
- Superscript: exposure status. [Cause]
- Value: interaction status. [Result]
- \(Y_{u,i}^0 \in \{0,1\}\): interaction when NOT exposed
- \(Y_{u,i}^1 \in \{0,1\}\): interaction when exposed
- Problem: in the real world, only one of \(Y_{u,i}^0\) or \(Y_{u,i}^1\) can be observed for a given (u,i). [Counterfactual Nature] See ref. [9].
- Counterfactual Model
- Causal Effect: \(\tau_{u,i} := Y_{u,i}^1-Y_{u,i}^0\in \{-1, 0, 1\}\) captures the Exposure(Recommendation) → Interaction relationship
\(\tau_{u,i} = 1\): recommending item i to user u ⇒ increases user-item interaction.
- All three parties (users, sellers, platforms) benefit from recommendations that yield positive causal effects.
\(\tau_{u,i} = -1\): recommending item i to user u ⇒ decreases user-item interaction.
\(\tau_{u,i} = 0\): recommending or not ⇒ has no effect on the user-item interaction.
Causal effect estimation
- Causal effect \(\tau_{u,i}\): cannot be computed directly from observed data because of its [Counterfactual Nature]. ⇒ Estimation is needed.
- \(Y_{u,i}^1, Y_{u,i}^0\) cannot be observed simultaneously.
- CausCF or doubly robust estimator
- Direct parametric models, so sensitive to the prediction error of the potential outcomes
- → need high-quality labeled exposure data. (for parametric models)
- → not this paper’s setup.
- This paper: uses the IPS estimator \(\in\) non-parametric approaches.
[Appendix B]
\[ \hat{Y}_{u,i}^1 = \frac{Z_{u,i} Y_{u,i}}{p_{u,i}} , \quad \hat{Y}_{u,i}^0 = \frac{\left(1-Z_{u,i}\right) Y_{u,i}}{1-p_{u,i}} \text{: Unbiased (IPS) Estimator} \\ \text{Since} \qquad Y_{u,i}=Z_{u,i}Y_{u,i}^1 + \left(1-Z_{u,i}\right)Y_{u,i}^0 \quad (a.10) \\ \qquad \qquad \mathbb{E}\left[Z_{u,i}\right]= 1\cdot p_{u,i} + 0\cdot \left(1-p_{u,i}\right)=p_{u,i} \]
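The unbiasedness of the IPS estimators above can be checked with a tiny Monte-Carlo simulation (a sketch with synthetic Bernoulli exposure; the function name and constants are illustrative, not from the paper):

```python
import random

def ips_estimates(p, y1, y0, n=200_000, seed=0):
    """Monte-Carlo check that the IPS estimators are unbiased.

    p  : true propensity P(Z = 1)
    y1 : potential outcome when exposed
    y0 : potential outcome when not exposed
    """
    rng = random.Random(seed)
    s1 = s0 = 0.0
    for _ in range(n):
        z = 1 if rng.random() < p else 0
        y = z * y1 + (1 - z) * y0          # Eq. (a.10): only one outcome is observed
        s1 += z * y / p                    # \hat{Y}^1 = Z * Y / p
        s0 += (1 - z) * y / (1 - p)        # \hat{Y}^0 = (1 - Z) * Y / (1 - p)
    return s1 / n, s0 / n

est1, est0 = ips_estimates(p=0.3, y1=1, y0=0)
# est1 should be close to Y^1 = 1 and est0 close to Y^0 = 0, up to Monte-Carlo noise
```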
Interaction model
- Assumption: Relationship betw. interactions, propensity and relevance
\[ y_{u,i} = p_{u,i} \times r_{u,i} \quad (2) \]
- \(y_{u,i}=P\left(Y_{u,i} =1\right)\): Probability of interaction betw. user u and item i [Result: Interaction Prob.]
- \(p_{u,i} := P\left(Z_{u,i}=1\right)\): Propensity score := probability of exposure [Cause: Exposure Prob.]
- \(r_{u,i}\): Probability that item i is relevant to user u [Relevance Prob.]
4. Proposed Approach: PROPCARE
4.1. Naive propensity estimator
Setup: only interaction data are observable
Objective: Estimate propensity scores and exposure
- Main focus: Estimation of propensity scores \(\hat{p_{u,i}}\)
- → The corresponding exposure can easily be sampled from the propensity, using a threshold. [Section 4.3]
Naive Loss Function for the interaction model \(y_{u,i} = p_{u,i} r_{u,i}\)
\[ \mathcal{L}_{\text{naive}} = - Y_{u,i} \times \log f_p(\mathbf{x}_{u,i}; \Theta_p) f_r(\mathbf{x}_{u,i}; \Theta_r) - (1 - Y_{u,i}) \times \log(1 - f_p(\mathbf{x}_{u,i}; \Theta_p) f_r(\mathbf{x}_{u,i}; \Theta_r)) \quad (3) \]
- \(\mathbf{x}_{u,i} = f_e \left(u,i;\Theta_e\right)\): Joint user-item embedding output
- \(f_e\): learnable embedding function
- \(\mathbf{x}_{u,i}\): used as input to \(f_p, f_r\).
- \(f_p\): learnable propensity function → Estimated propensity score \(\hat{p_{u,i}}\)
- \(f_r\): learnable relevance function → Estimated relevance probability \(\hat{r_{u,i}}\)
- Each learnable function \(f_*\) has parameters → \(\Theta_*\): parameter set of \(f_*\), learned as an MLP.
- Problem: \(\hat{y_{u,i}} = f_p \left(\mathbf{x_{u,i}};\Theta_p\right) \times f_r \left(\mathbf{x_{u,i}};\Theta_r\right)\): \(f_p\) and \(f_r\) appear only as a product
- → \(f_p\) or \(f_r\) cannot be learned individually from \(\mathcal{L}_{\text{naive}}\); only the product \(\hat{y_{u,i}}=\hat{p_{u,i}} \times \hat{r_{u,i}}\) is learned.
- → The propensity score cannot be estimated: \(\hat{p_{u,i}} = f_p \left(\mathbf{x_{u,i}};\Theta_p\right)\) is not identifiable.
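The identifiability problem can be seen directly from Eq. (3): any two factorizations with the same product give exactly the same loss (a minimal numeric sketch; `naive_loss` is an illustrative name):

```python
import math

def naive_loss(y, p_hat, r_hat):
    """Eq. (3): binary cross-entropy on the product p_hat * r_hat."""
    y_hat = p_hat * r_hat
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Two different (propensity, relevance) factorizations with the same product 0.24 ...
a = naive_loss(1, 0.6, 0.4)
b = naive_loss(1, 0.4, 0.6)
# ... are indistinguishable to the loss, so f_p alone cannot be recovered.
assert math.isclose(a, b)
```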
4.2. Incorporating prior knowledge
- Fix for the naive loss: constrain \(f_p\) or \(f_r\) with prior knowledge.
- Observation: more popular items will have a higher chance to be exposed [Popularity → Exposure Prob.]
\(pop_i\): Popularity of item i
\[ pop_i := \frac{\sum_{u=1}^U Y_{u,i}}{\sum_{j=1}^I \sum_{u=1}^U Y_{u,j}} \]
- Of the interactions over all (user, item) pairs, how many users interacted with item i?
- Popularity of item i ⇒ share of item i among all interactions.
Intuitive, but not adequate on its own to explain the popularity–exposure relationship.
→ Counterpoint: items with a high interaction probability also tend to have a high chance of exposure.
- [Interaction Prob. → Exposure Prob.]
- [Problem] Popularity, Interaction Prob. → Exposure Prob.
- Causal effect of interest: Exposure → Interaction
- Thus, the popularity factor must be disentangled.
[Fix] Integrate popularity as prior knowledge for propensity/exposure estimation.
- → with the interaction prob. controlled → Assumption 1 (Pairwise Relationship on Popularity and Propensity)
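Since \(pop_i\) depends only on observed interactions, it can be computed straight from the \(Y_{u,i}\) matrix (a minimal sketch; `popularity` is an illustrative name):

```python
def popularity(Y):
    """pop_i = (# interactions with item i) / (# all interactions).

    Y: list of user rows, Y[u][i] in {0, 1}.
    """
    total = sum(sum(row) for row in Y)
    n_items = len(Y[0])
    return [sum(row[i] for row in Y) / total for i in range(n_items)]

Y = [[1, 0, 1],
     [1, 1, 0],
     [1, 0, 0]]
pop = popularity(Y)   # item 0: 3/5, items 1 and 2: 1/5 each
```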
Assumption 1 (Pairwise Relationship on Popularity and Propensity) == Prior Knowledge
- user: u
- pair of items: (i, j)
- \(pop_i > pop_j, \ y_{u,i} \approx y_{u,j}\ \Rightarrow \ p_{u,i} > p_{u,j}\)
- More precisely, \(y_{u,i} \approx y_{u,j} \ \Rightarrow \ \left( pop_i > pop_j\Leftrightarrow \ p_{u,i} > p_{u,j} \right )\)
Empirical validation of Assumption 1
Validated empirically on three datasets (DH_original, DH_personalized and ML) by calculating the per-bin ratio below.

Figure 2
- x-axis: bins [intervals]: inverse similarity in interaction prob., \(|y_{u,j} - y_{u,i}|\)
- n.b. inverse similarity == distance.
- y-axis: \(\text{ratio}_b\): per bin, the fraction of item pairs satisfying Assumption 1
- Estimate \(y_{u,i}\) from \(Y_{u,i}\) using logistic matrix factorization. See ref. [11].
- Obtain \(p_{u,i}\) from the ground-truth values in the datasets
- For each user, place item pairs (i,j) into bins by \(|y_{u,j} - y_{u,i}|\) [similarity in interaction prob.]
- Since the assumption requires \(y_{u,i} \approx y_{u,j}\), only \(|y_{u,j} - y_{u,i}|\) up to 0.5 is examined.
- Compute \(\text{ratio}_b\): for user u, the fraction of item pairs in bin b satisfying \(pop_i > pop_j \Rightarrow \ p_{u,i} > p_{u,j}\).
\[ ratio_b = \frac{1}{U} \sum_{u=1}^U \frac{\text{\# item pairs }(i,j) \text{ for user } u \text{ in bin } b \text{ s.t. } (p_{u,j} - p_{u,i})(\text{pop}_j - \text{pop}_i) > 0}{\text{\# item pairs }(i,j) \text{ sampled for user } u \text{ in bin } b}\quad (4) \]
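Eq. (4) can be sketched for a single user as follows (an illustrative implementation; the function name, bin count, and 0.5 cutoff follow the description above but the code itself is not from the paper):

```python
from itertools import combinations

def ratio_per_bin(y, p, pop, n_bins=5, max_dist=0.5):
    """Eq. (4) for one user: per bin of |y_i - y_j|, the fraction of item
    pairs whose popularity order agrees with their propensity order.

    y, p : per-item interaction probability and ground-truth propensity
    pop  : per-item popularity
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for i, j in combinations(range(len(y)), 2):
        d = abs(y[i] - y[j])
        if d >= max_dist:                  # only examine y_i ≈ y_j pairs
            continue
        b = int(d / max_dist * n_bins)     # bin index by |y_i - y_j|
        totals[b] += 1
        if (p[j] - p[i]) * (pop[j] - pop[i]) > 0:
            hits[b] += 1                   # popularity and propensity agree
    return [h / t if t else None for h, t in zip(hits, totals)]

r = ratio_per_bin(y=[0.1, 0.12, 0.5], p=[0.1, 0.2, 0.4], pop=[0.2, 0.3, 0.5])
```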
Integrating prior knowledge
In \(\mathcal{L_\text{naive}}\), only \(\hat{y_{u,i}}=\hat{p_{u,i}} \times \hat{r_{u,i}}\) is learned → use a separate loss so that \(f_p\) and \(f_r\) can be learned individually.
When \(pop_i > pop_j, y_{u,i} \approx y_{u,j}\), we should have \(f_p \left(\mathbf{x_{u,i}}\right) > f_p \left(\mathbf{x_{u,j}}\right)\). From Assumption 1 [Prior Knowledge]
\[ \text{loss} = -\log\left[ \sigma\left( f_p \left(\mathbf{x_{u,i}}\right)-f_p \left(\mathbf{x_{u,j}}\right) \right) \right] \in \left(0, \infty \right) \quad (5) \]
\(f_p \left(\mathbf{x_{u,i}}\right)-f_p \left(\mathbf{x_{u,j}}\right) \uparrow \ \Rightarrow \text{loss} \downarrow\): minimizing the loss therefore aligns with our prior knowledge.
\(\sigma\): Sigmoid function.
This loss exploits popularity in a pairwise manner.
Advantages of the above \(\text{loss}\)
- Decouples \(f_p\) and \(f_r\)
- Uses only item popularity, which is computable from interaction data alone ← \(pop_i := \frac{\sum_{u=1}^U Y_{u,i}}{\sum_{j=1}^I \sum_{u=1}^U Y_{u,j}}\)
- Since \(pop_i \neq pop_j\), prevents the predictions \(\hat{p_{u,i}} \approx 0 \ \text{or}\ 1\). [Remark 2 in Section 4.5]
Final Popularity-loss function: Popularity → Exposure Prob. \(\hat{p_{u,i}}\) [Assumption 1]
\[ \mathcal{L}_{\text{pop}} = -\kappa_{u,i,j} \log \left[ \sigma(\text{sgn}_{i,j} \cdot (f_p(\mathbf{x}_{u,i}) - f_p(\mathbf{x}_{u,j}))) + \sigma(\text{sgn}_{i,j} \cdot (f_r(\mathbf{x}_{u,j}) - f_r(\mathbf{x}_{u,i}))) \right] \quad (6) \\ \arg\min_{\Theta_\text{pop}} \mathcal{L_\text{pop}} = \hat{\Theta_{\text{pop}}} \ \rightarrow \ \hat{p_{u,i}}, \hat{r_{u,i}} \]
- \(sgn_{i,j} = \text{sign}\left(pop_i - pop_j\right) \in \{1, -1\}\): covers both \(pop_i > pop_j\) and \(pop_i < pop_j\).
- \(\kappa_{u,i,j} = e^{\eta\left(y_{u,i}-y_{u,j}\right)^2}, \eta<0\): weighting function. \(|y_{u,i}-y_{u,j}| \downarrow \ \Rightarrow \ \kappa_{u,i,j} \uparrow\)
- \(\eta\): learnable parameter
- Gives a larger loss weight the better a triplet matches the condition of Assumption 1: accounts for the \(y_{u,i} \approx y_{u,j}\) condition.
- Interaction model: \(y_{u,i} = p_{u,i} \times r_{u,i}\) ⇒ for a fixed \(y_{u,i}\), \(p_{u,i} \uparrow \ \Rightarrow r_{u,i} \downarrow\)
- Considering not only \(f_p\) but also \(f_r\) → further improves model training.
- Why the relevance term is j - i: \(p_{u,i} \uparrow \ \Rightarrow r_{u,i} \downarrow\)
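Eq. (6) for a single \((u,i,j)\) triplet can be sketched as below (illustrative scalar version, assuming scalar outputs of \(f_p, f_r\); note that, as written in Eq. (6), the log's argument is a sum of two sigmoids and can exceed 1, so the loss can go negative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def l_pop(fp_i, fp_j, fr_i, fr_j, pop_i, pop_j, y_i, y_j, eta=-1.0):
    """Eq. (6) for one triplet (u, i, j).

    sgn pushes the propensity of the more popular item up and, via the
    interaction model y = p * r, its relevance down (hence the j - i
    order in the relevance term).
    """
    sgn = 1.0 if pop_i > pop_j else -1.0
    kappa = math.exp(eta * (y_i - y_j) ** 2)  # larger weight when y_i ≈ y_j
    return -kappa * math.log(
        sigmoid(sgn * (fp_i - fp_j)) + sigmoid(sgn * (fr_j - fr_i))
    )

# When pop_i > pop_j, ranking p_hat_{u,i} above p_hat_{u,j} gives a lower loss:
good = l_pop(2.0, 0.0, 0.0, 0.0, pop_i=0.3, pop_j=0.1, y_i=0.2, y_j=0.2)
bad = l_pop(0.0, 2.0, 0.0, 0.0, pop_i=0.3, pop_j=0.1, y_i=0.2, y_j=0.2)
```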
4.3. Propensity learning
\(\mathcal{L_\text{naive}}\): learns \(\hat{y_{u,i}}=\hat{p_{u,i}} \times \hat{r_{u,i}}\) → optimizes the interaction model \(y_{u,i} = p_{u,i} \times r_{u,i}\)
\(\mathcal{L}_{\text{pop}}\): learns \(\hat{p_{u,i}},\hat{r_{u,i}}\). [pairwise loss] → uses popularity as prior information for propensity learning.
→ Combine them into a total loss function \(\mathcal{L_\text{total}}\).
Regularization of the propensity score \(\hat{p_{u,i}}\) [to prevent \(\hat{p_{u,i}} \approx 0 \ \text{or}\ 1\)].
→ Regularization term: \(\mu \cdot \text{KLD}\left(Q \| \text{Beta}(\alpha, \beta)\right)\), regularization parameter: \(\mu\)
\[ \mathcal{L_\text{total}} = \sum_{u,i,j} \left(\mathcal{L}_{\text{naive}} + \lambda \cdot\mathcal{L}_{\text{pop}}\right) + \mu \cdot \text{KLD}\left(Q \| \text{Beta}(\alpha, \beta)\right) \quad (7) \\ \arg\min_{\Theta_{\text{total}}} \mathcal{L_\text{total}} = \hat{\Theta_{\text{total}}} \ \rightarrow \ \hat{p_{u,i}}, \hat{r_{u,i}} \]
Because only a few popular items have a high exposure probability, the ground-truth propensity scores follow a long-tailed distribution.
- → So the (likewise long-tailed) Beta distribution is used to regularize the propensity scores. [Prior work: see refs. [4, 15].]
- \(Q\): Empirical distribution of all estimated propensity scores \(\hat{p_{u,i}}\)
- \(\alpha, \beta\): parameters which are selected to simulate a long-tailed shape.
- \(\text{KLD}\left(\cdot \| \cdot\right)\): Kullback-Leibler divergence betw. two distributions → the smaller, the closer the estimated distribution is to the target.
\(\lambda, \mu\): trade-off hyper-parameters [weighting terms]
- \(\lambda\): balances \(\mathcal{L_\text{naive}}\) and \(\mathcal{L}_{\text{pop}}\).
- \(\mu\): controls the regularization.
Estimated propensity score \(\hat{p_{u,i}}\) → predict \(\hat{Z_{u,i}}\).
- \(\hat{Z_{u,i}}=1 \quad if \ \ \text{Norm}\left(\hat{p_{u,i}}\right) \geq \epsilon\)
- \(\hat{Z_{u,i}}=0 \quad \text{otherwise}\)
- \(\epsilon\): threshold hyper-parameter
- \(\text{Norm}\): Normalization function such as Z-score normalization
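The thresholding step can be sketched as below (illustrative; z-score normalization is just one choice of \(\text{Norm}\), and the sketch assumes the scores are not all identical so the standard deviation is nonzero):

```python
import statistics

def estimate_exposure(p_hat, eps=0.0):
    """Z_hat = 1 iff z-score-normalized propensity >= eps (Sect. 4.3).

    eps is the threshold hyper-parameter.
    """
    mu = statistics.mean(p_hat)
    sd = statistics.pstdev(p_hat)          # assumes sd > 0
    return [1 if (p - mu) / sd >= eps else 0 for p in p_hat]

# With eps = 0, items with above-average propensity are treated as exposed:
z = estimate_exposure([0.9, 0.1, 0.5, 0.05])
```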
Algorithms [in Appendix A]: update all learnable parameters based on the total loss

4.4. Causality-based recommendation
- DLCE: Debiased Learning for the Causal Effect. See ref. [30]. (Note to self: the DLCE loss is not yet fully understood.)
SOTA [state-of-the-art] Causality-based Recommender w/ IPS estimator
Input: Interaction \(Y_{u,i}\), Exposure \(Z_{u,i}\), Propensity \(p_{u,i}\)
Output: Ranking Score \(\hat{s_{u,i}}\) for each user-item (u,i) pair
- Ranking score of each item, used to rank the recommendations for user u.
Given \((u, i, j) \quad s.t. \ i \neq j\), the DLCE loss function
\[ \mathcal{L}_{\text{DLCE}}= \frac{Z_{u,i}Y_{u,i}}{\max(p_{u,i},\chi^1)} \log \left(1+e^{-\omega(\hat{s_{u,i}}-\hat{s_{u,j}})}\right) + \frac{(1-Z_{u,i})Y_{u,i}}{\max(1-p_{u,i},\chi^0)} \log \left(1+e^{\omega(\hat{s}_{u,i}-\hat{s}_{u,j})}\right) \quad (8) \]
Equivalently, written with indicator functions:
\[ \mathcal{L}_{\text{DLCE}}= \frac{Y_{u,i}}{\max(p_{u,i},\chi^1)} \log \left(1+e^{-\omega(\hat{s}_{u,i}-\hat{s}_{u,j})}\right) \times \mathbb{I} \left(Z_{u,i}=1\right) \\ \qquad \qquad \qquad \ + \ \frac{Y_{u,i}}{\max(1-p_{u,i},\chi^0)} \log \left(1+e^{\omega(\hat{s}_{u,i}-\hat{s}_{u,j})}\right) \times \mathbb{I} \left(Z_{u,i}=0\right) \]
\[ \hat{s_{u,i}} = f_s \left(u,i,\Theta_s\right), \quad \arg\min_{\Theta_{s}} \mathcal{L_\text{DLCE}} = \hat{\Theta_{s}} \ \rightarrow \ \hat{s_{u,i}} \]
- \(\chi^1, \chi^0, \omega\): hyper-parameters
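Eq. (8) for a single \((u,i,j)\) triplet can be sketched as below (illustrative scalar version; the hyper-parameter defaults are arbitrary, not from the paper):

```python
import math

def dlce_loss(Y_ui, Z_ui, p_ui, s_ui, s_uj, chi1=0.1, chi0=0.1, omega=1.0):
    """Eq. (8): pairwise DLCE loss for one (u, i, j) triplet.

    The exposed branch pushes s_ui above s_uj; the unexposed branch
    (interaction without exposure) pushes it down. chi1/chi0 clip the
    propensity to keep the IPS weights bounded.
    """
    w1 = Z_ui * Y_ui / max(p_ui, chi1)
    w0 = (1 - Z_ui) * Y_ui / max(1 - p_ui, chi0)
    return (w1 * math.log(1 + math.exp(-omega * (s_ui - s_uj)))
            + w0 * math.log(1 + math.exp(omega * (s_ui - s_uj))))
```

PC-DLCE in Eq. (8') is the same computation with \(\hat{Z_{u,i}}, \hat{p_{u,i}}\) plugged in for the ground truth.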
[This paper] Use the estimates instead of the ground truth.
- \(p_{u,i} \rightarrow \hat{p_{u,i}}\)
- \(Z_{u,i} \rightarrow \hat{Z_{u,i}}\)
\[ \mathcal{L}_{\text{PC-DLCE}}= \frac{\hat{Z_{u,i}}Y_{u,i}}{\max(\hat{p_{u,i}},\chi^1)} \log \left(1+e^{-\omega(\hat{s_{u,i}}-\hat{s_{u,j}})}\right) + \frac{(1-\hat{Z_{u,i}})Y_{u,i}}{\max(1-\hat{p_{u,i}},\chi^0)} \log \left(1+e^{\omega(\hat{s_{u,i}}-\hat{s_{u,j}})}\right) \quad \left(8'\right) \]
\[ \hat{s_{u,i}} = f_s \left(u,i,\Theta_s\right), \quad \arg\min_{\Theta_{s}} \mathcal{L_\text{PC-DLCE}} = \hat{\Theta_{s}} \ \rightarrow \ \hat{s_{u,i}} \]
- PC: PROPCARE
4.5. Theoretical property
The IPS estimator of the causal effect with ground-truth inputs: unbiased.
\[ \hat{\tau_{u,i}}^{IPS}=\frac{Z_{u,i} Y_{u,i}}{p_{u,i}} - \frac{\left(1-Z_{u,i}\right) Y_{u,i}}{1-p_{u,i}} \text{: Unbiased Estimator} \quad (1) \]
[This paper] Using the estimates instead of the ground truth → the IPS estimator becomes biased.
- \(p_{u,i} \rightarrow \hat{p_{u,i}}\)
- \(Z_{u,i} \rightarrow \hat{Z_{u,i}}\)
\[ \hat{\tau_{u,i}}^{PC-IPS}=\frac{\hat{Z_{u,i}} Y_{u,i}}{\hat{p_{u,i}}} - \frac{\left(1-\hat{Z_{u,i}}\right) Y_{u,i}}{1-\hat{p_{u,i}}} \text{: Biased Estimator} \quad (1') \]
Proposition 1
\[ \text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)= \left( \frac{p_{u,i}+\mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right]}{\hat{p_{u,i}}} -1 \right) Y_{u,i}^1 - \left( \frac{1-p_{u,i}-\mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right]}{1-\hat{p_{u,i}}} -1 \right) Y_{u,i}^0 \ \quad (9) \]
Remark 1
Major factors in \(\text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)\):
\[ \frac{p_{u,i}}{\hat{p_{u,i}}}, \ \frac{1-p_{u,i}}{1-\hat{p_{u,i}}}, \ \mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right] \quad \ \rightarrow \quad \left( \hat{p_{u,i}}=p_{u,i}, \ \hat{Z_{u,i}}=Z_{u,i} \ \Rightarrow \text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)=0\right) \]
Remark 2
\[ \hat{p_{u,i}} \approx \text{0 or 1} \ \Rightarrow \ \text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right) \approx \pm \infty \]
- Exposure variable \(Z_{u,i} \in \{0, 1\}\) is binary → use binary classification metrics such as the F1 score.
- \(\mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right]\) enters \(\text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)\), so the better \(\hat{Z_{u,i}}\) is estimated, the smaller the bias.
- Propensity \(p_{u,i} := P\left(Z_{u,i}=1\right)\) is continuous → use metrics such as KLD and Kendall’s Tau. [Section 5.2]
- By Remark 2 we need \(\hat{p_{u,i}} \not\approx \text{0 or 1}\) → regularization as in Eq. (7) is advisable.
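Remarks 1 and 2 can be checked numerically by evaluating the closed form of Eq. (9) (a sketch; `pc_ips_bias` is an illustrative name, with `dz` standing for \(\mathbb{E}[\hat{Z}_{u,i}-Z_{u,i}]\)):

```python
def pc_ips_bias(p, p_hat, dz, y1, y0):
    """Eq. (9): bias of the PC-IPS estimator.

    p, p_hat : true and estimated propensity
    dz       : E[Z_hat - Z]
    y1, y0   : potential outcomes
    """
    term1 = ((p + dz) / p_hat - 1) * y1
    term0 = ((1 - p - dz) / (1 - p_hat) - 1) * y0
    return term1 - term0

# Remark 1: perfect estimates give zero bias
assert pc_ips_bias(0.3, 0.3, 0.0, 1, 1) == 0.0
# Remark 2: p_hat near 0 (or 1) blows the bias up
big = pc_ips_bias(0.3, 1e-6, 0.0, 1, 1)
```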
5. Experiment
- Shows that PROPCARE is effective in both quantitative & qualitative experiments.
5.1. Experiment setup
Datasets
- Three standard causality-based recommendation benchmarks: DH_original, DH_personalized, MovieLens 100K (ML 100K)
- DH_original, DH_personalized \(\in\) DunnHumby dataset
- Purchase and promotion logs at an offline retail store over a 93-week period.
- DH_original: weekly flyers → exposure → ground-truth propensity scores
- DH_personalized: simulation → ground-truth propensity scores
- ML 100K
- Users’ ratings on movies
- Simulated propensity scores ← ratings & user behaviors
- PROPCARE: ground-truth propensity scores → used only for evaluating model output.
- Note: ground-truth values are NOT used at the training stage
- Datasets → training/validation/test sets
- Statistics: counts or average values of key variables [User, Item, Observed Interaction, Exposure, Causal Effect, Propensity]

Baselines
- PROPCARE vs baselines [other methods]
- Propensity estimators
- Ground-truth: propensity \(p_{u,i}\), exposure \(Z_{u,i}\) → input of DLCE at training
- Ground-truth: datasets → propensity score & exposure values
- Estimate propensity \(\hat{p_{u,i}}\) → derive exposure \(\hat{Z_{u,i}}\) → input of DLCE at training
- Random: propensity scores drawn randomly from \(\left(0, 1\right)\)
- Item Popularity (POP): propensity scores = normalization of POP into \(\left(0, 1\right)\)
- CJBPR: propensity → relevance → propensity → relevance → … point-wise optimization
- EM: Expectation-Maximization algorithm → point-wise propensity score learning
Parameter settings
- Validation data → Tuning hyper-parameters
- PROPCARE: Use the trade-off hyper-parameters as
- \(\lambda = 10\)
- \(\mu=0.4\)
- Other settings: Appendix C.2.
- PROPCARE: Use the trade-off hyper-parameters as
Evaluation metrics
- Performance of Causality-based Recommendation → Evaluation metrics [Appendix C.3.]
- CP@10, CP@100: Causal effect-based Precision (CP)
- CDCG: Causal effect-based Discounted Cumulative Gain (CDCG)
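Sketches of the two metric families, assuming CP@k averages the causal effect over the top-k list and CDCG discounts it by rank, i.e. the standard causal analogues of precision and DCG (the exact definitions are in Appendix C.3; the function names are illustrative):

```python
import math

def cp_at_k(tau_ranked, k):
    """Causal precision@k: mean causal effect over the top-k ranked items."""
    return sum(tau_ranked[:k]) / k

def cdcg(tau_ranked):
    """Causal DCG: rank-discounted sum of causal effects."""
    return sum(t / math.log2(rank + 2) for rank, t in enumerate(tau_ranked))

tau = [1, 0, -1, 1]        # causal effects of items, in ranked order
cp2 = cp_at_k(tau, 2)      # (1 + 0) / 2 = 0.5
```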
5.2. Results and discussions
- PROPCARE > baselines; additional experiments in [Appendix D.]
Performance comparison
- PROPCARE > Baselines in two aspects
- The downstream causality-based recommendation using the estimated propensity and exposure
- The accuracy of the estimated propensity and exposure

- Performance of causality-based recommendation [comparison of evaluation metrics]
- Ground-truth: best performance. [largest evaluation metrics]
- Because the true propensity and exposure values are used directly in DLCE.
- → But unavailable in the real world.
- PROPCARE: closest to the ground-truth values; in particular, only a small gap on DH_personalized
- PROPCARE > CJBPR, EM
- → the pairwise method based on Assumption 1 works well.

- Propensity & exposure estimation accuracy
- POP: best among the baselines on Kendall’s Tau.
- But Table 2 shows POP’s causality metrics are poor.
- Large KLD → the propensity score distribution is poorly estimated. → ill-fit propensity distribution.
- Low F1 score → exposure estimation is also poor.
- PROPCARE: strong on F1 score and KLD & correspondingly strong causality metrics in Table 2.
- Tau scores slightly worse than some baselines, but the other two metrics are good.
- → Both the propensity score and the exposure are well estimated. → good causal performance.
- Causality-based recommendation is influenced by multiple factors. → What are the influencing factors? [last paragraph of 5.2]
Ablation study
\[ \mathcal{L}_{\text{pop}} = -\kappa_{u,i,j} \log \left[ \sigma(\text{sgn}_{i,j} \cdot (f_p(\mathbf{x}_{u,i}) - f_p(\mathbf{x}_{u,j}))) + \sigma(\text{sgn}_{i,j} \cdot (f_r(\mathbf{x}_{u,j}) - f_r(\mathbf{x}_{u,i}))) \right] \quad (6) \]
Derive 5 variants
- NO_P: Removing the constraint on estimated \(\hat{p_{u,i}}\) by deleting the term with \(f_p(\mathbf{x}_{u,i}) − f_p(\mathbf{x}_{u,j})\)
- NO_R: Removing the constraint on estimated \(\hat{r_{u,i}}\) by deleting the term with \(f_r(\mathbf{x}_{u,j}) − f_r(\mathbf{x}_{u,i})\)
- NO_P_R: Removing \(\mathcal{L}_{\text{pop}}\) entirely from the overall loss \(\mathcal{L}_{\text{total}}\) to eliminate Assumption 1 altogether
- NEG: Reversing Assumption 1 by replacing \(\text{Sgn}_{i,j}\) with \(-\text{Sgn}_{i,j}\) to assume that more popular items have smaller propensity scores
- Removing the condition \(\left( pop_i > pop_j\Leftrightarrow \ p_{u,i} > p_{u,j} \right )\)
- \(\kappa=1\): Setting all \(\kappa_{u,i,j} = 1\) → equal weighting of all training triplets.
- Removing the condition \(y_{u,i} \approx y_{u,j}\)

- x-axis: dataset
- y-axis: performance
- PROPCARE: best performance
- NEG: worst performance → Assumption 1 is the most important ingredient.
Effect of regularization
\[ \mathcal{L_\text{total}} = \sum_{u,i,j} \left(\mathcal{L}_{\text{naive}} + \lambda \cdot\mathcal{L}_{\text{pop}}\right) + \mu \cdot \text{KLD}\left(Q \| \text{Beta}(\alpha, \beta)\right) \quad (7) \\ \arg\min_{\Theta_{\text{total}}} \mathcal{L_\text{total}} = \hat{\Theta_{\text{total}}} \ \rightarrow \ \hat{p_{u,i}}, \hat{r_{u,i}} \]
Regularization parameter: \(\mu\)

- \(\mu \approx 0 \ \Rightarrow \ \text{CDCG performance} \downarrow\)
- \(\mu \uparrow \ \Rightarrow \ \text{CDCG performance} \uparrow\), peaking at \(\mu \in \left[0.2, 0.8\right]\)
Factors influencing causality-based recommendation
Method 1: inject noise into the ground-truth propensity or exposure values [panels (b), (a) respectively]

- While using the ground-truth propensity scores \(p_{u,i}\) for DLCE training, randomly flip a fraction of \(Z_{u,i}\) between 0↔1. [corrupting \(Z_{u,i}\)]
- x-axis: flip ratio
- y-axis: CDCG performance
- Performance drops sharply as the corruption grows.
- → Causality-based recommendation: very sensitive to the estimation of exposure.
- While using the ground-truth exposure \(Z_{u,i}\) for DLCE training, add Gaussian noise to the propensity scores. [corrupting \(p_{u,i}\)]
- x-axis: variance of the noise
- y-axis: CDCG performance
- Performance degrades moderately as the corruption grows.
- → Causality-based recommendation: moderately sensitive to the estimation of propensity scores.
Method 2: correlation betw. estimation accuracy & recommendation performance
Dataset: only DH_original [seemingly just one of the three datasets picked.]

- x-axis: estimation accuracy
- KLD, Kendall’s Tau: propensity scores
- F1 score: exposure
- y-axis: CDCG performance [recommendation performance]
5.3. Case study
PROPCARE: Ranking-based Recommendation

- Top-5 recommended items [User ID 2308, DH_personalized dataset]
- Ground-truth: DLCE generated the ranking list effectively.
- Most items have a positive causal effect.
- \(\tau_{u,i} := Y_{u,i}^1-Y_{u,i}^0 = 1\)
- → Recommending item i to user u ⇒ increases user-item interaction [click or purchase].
- All items with a positive causal effect were purchased. → The goal of causality-based recommendation is achieved.
- CJBPR, PROPCARE: the purchased items have differing causal effects.
- CJBPR - strawberries: no causal effect
- Recommending or not ⇒ has no effect on the user-item interaction [purchasing].
- PROPCARE - infant soy: positive causal effect
- Recommending item i to user u ⇒ increases user-item interaction [click or purchase].
- POP: even recommends an item with a negative causal effect (tortilla chips).
- → POP: not a good method.
6. Conclusion
- PROPCARE: w/o ground-truth of propensity and exposure data
- Observation of (propensity scores, item popularity) → Key Assumption → Prior Information → Causality-based RS
- Factors for bias in estimated causal effects
- Empirical studies: PROPCARE > Baselines [other methods]
- Future research suggestions:
- Direct exposure estimation w/o propensity scores [i.e., w/o propensity estimation]
- Parametric causal effect estimators [the IPS estimator is a non-parametric approach]
