[NeurIPS 2023] Estimating Propensity for Causality-based Recommendation without Exposure Data

Causality-based RS without exposure data & propensity scores
Published

July 15, 2025


Abstract

  • Causality-based Recommendation Systems [RS]:
    • [Focus] item exposure [cause] → user-item interactions [effect]
    • ↔︎ Conventional correlation-based RS (TODO: look up papers on representative classic RS models)

  • Existing Causality-based RS: require additional input [exposure data and/or propensity scores] for training. (TODO: look up papers on early causality-based RS models)
    1. exposure data
    2. propensity scores == probability of exposure
    3. exposure data & propensity scores
    • [Problem] Such data are often not available in real-world situations b/c of technical or privacy constraints
    • [Solution in this paper]
      • Propensity Estimation for Causality-based Recommendation (PROPCARE)
      • Uses only interaction data, for both training and inference
      • [Method] Relates the pairwise characteristics between propensity and item popularity

    → Theoretical analysis on the *bias of the causal effect* under the model estimation.

    → Empirical evaluation of PROPCARE through both quantitative and qualitative experiments.

    

1. Introduction

  • RS are widely used in applications such as
    • streaming services
    • online shopping
    • job searching
  • Primary aim of RS: boosting sales, user engagement, …
    • ⇒ Relies on user interactions (clicking or purchasing items)

    • user-item interactions == clicking, purchasing

Classical paradigm:

  1. Predict user-item interactions
  2. Interaction probability → recommend items to users. See references [8, 22, 33, 35].
  • [Limitation] Ignores the causal impact behind recommendation

Recent Studies for Causality-based Recommendation Systems [RS]

  • Treatments [cause] → user's behavior [result]. See references [38, 30, 28, 37].
    • Treatments: recommend/expose the item or not
    • Behavior: click or purchase
  • Assumption (1): from the perspective of recommending items,
    • items with a higher causal effect > items with a higher interaction probability.
  • Quantify the causal effect on the user's behavior ← observed data & counterfactual treatment
  • Assumption (2): exposure data or propensity scores are observable at training time.
    • Exposure data == whether the item was recommended to the user
      • Exposure == Recommendation
    • Propensity scores == probability that the item is recommended/exposed to the user
  • [Problem] Exposure data cannot be obtained.
    • e.g. on an e-commerce platform, item purchase history is feasible to collect.

      • But whether purchases happened w/ or w/o exposure is unknown, b/c of technical and privacy constraints.
    • No exposure data and/or propensity scores at training time

      → existing causality-based recommenders [RS] cannot be used.

[Solution in this paper] for Causality-based RS

  • Setup: exposure and propensity scores are not observable
  • Some previous works estimate propensity scores, e.g. for addressing biases in recommendation. See references [38, 42, 1, 14, 21].
    • Limitations
      1. Most SOTA [state-of-the-art] methods require exposure data to train the propensity estimator.
      2. They fail to integrate prior knowledge into the propensity estimator → not robust estimation
  • [Paper solution] New framework: PROPCARE
    • Estimates the propensity score and exposure of each item for each user
    1. Resolves the above limitations
    2. Resolves the data-gap problem
  • Observation: pairwise characteristic of (propensity scores, item popularity)
    • When? == when the probability of user-item interaction is well controlled.
    • Assumption 1: the probability of user-item interaction is well controlled; empirically validated in Sect. 4.2.
      • → item popularity is incorporated as prior knowledge for propensity estimation.
  • Theoretical analysis on the bias of the estimated causal effect.
    • → reveals the factors that influence the estimation.
    • → informs model & experiment design.

Previous Propensity Estimation Vs PROPCARE

  • Advantages of PROPCARE
    1. Requires no propensity or exposure data at all.
    2. Incorporates prior information for robust estimation.
  • Paper contributions
    1. Previous causality-based RS: propensity scores and/or exposure data are often unavailable but required for model training or inference. → Resolves this problem.
    2. Integrates the pairwise (propensity, item popularity) relationship as prior knowledge → more robust propensity estimation
    3. Provides an analysis of the factors that influence the model.
    4. Validates the effectiveness of PROPCARE through quantitative and qualitative results.

3. Preliminaries

Data notations

  • Typical recommendation dataset
    • \(D = \{(Y_{u,i})\}\): collection of observed training user-item interaction data
  • Interactions between users and items: purchases or clicks [Result]
    • \(Y_{u,i} \in \{0,1\}\): observed interaction
      • \(Y_{u,i} = 1\): user u interacted with item i
    • User \(u \in \{1, 2, 3,..., U\}\)
    • Item \(i \in \{1, 2, 3,..., I\}\)
  • Unobservable indicator variable for Exposure [Cause]
    • \(Z_{u,i} \in \{0, 1\}\)
      • \(Z_{u,i} = 1\): item i is exposed/recommended to user u
  • Propensity score := probability of exposure
    • \(p_{u,i} := P\left(Z_{u,i}=1\right)\)

Causal effect modeling

  • Potential outcomes for different exposure statuses: \(Z_{u,i} = 0 \ or \ 1\)
    • Superscript: exposure status [cause]
    • Value: interaction status [result]
    • \(Y_{u,i}^0 \in \{0,1\}\): interaction when exposure ❌
    • \(Y_{u,i}^1 \in \{0,1\}\): interaction when exposure ⭕
    • Problem: in the real world, only one of \(Y_{u,i}^0\) or \(Y_{u,i}^1\) can be observed for a given (u,i). [Counterfactual nature] See reference [9].
  • Counterfactual model
    • Causal effect: \(\tau_{u,i} := Y_{u,i}^1-Y_{u,i}^0\in \{-1, 0, 1\}\), i.e., the exposure (recommendation) → interaction relationship
      • \(\tau_{u,i} = 1\): recommending item i to user u ⇒ increases user-item interaction.

        • All three parties (users, sellers, platforms) benefit from recommendations that yield positive causal effects.
      • \(\tau_{u,i} = -1\): recommending item i to user u ⇒ decreases user-item interaction.

      • \(\tau_{u,i} = 0\): recommending or not ⇒ has no effect on the user-item interaction.

Causal effect estimation

  • Causal effect \(\tau_{u,i}\): cannot be computed directly from observed data due to its [counterfactual nature]. ⇒ Estimation needed.
    • \(Y_{u,i}^1\) and \(Y_{u,i}^0\) cannot be observed simultaneously.
  • CausCF or the doubly robust estimator
    • are direct parametric models, hence sensitive to prediction errors in the potential outcomes
      • → require high-quality labeled exposure data (for parametric models)
      • → not this paper's setup.
  • This paper uses the IPS estimator \(\in\) non-parametric approaches.
    • [Appendix B]

      \[ \hat{Y}_{u,i}^1 = \frac{Z_{u,i} Y_{u,i}}{p_{u,i}} , \quad \hat{Y}_{u,i}^0 = \frac{\left(1-Z_{u,i}\right) Y_{u,i}}{1-p_{u,i}} \text{: Unbiased (IPS) Estimator} \\ \text{Since} \qquad Y_{u,i}=Z_{u,i}Y_{u,i}^1 + \left(1-Z_{u,i}\right)Y_{u,i}^0 \quad (a.10) \\ \qquad \qquad \mathbb{E}\left[Z_{u,i}\right]= 1\cdot p_{u,i} + 0\cdot \left(1-p_{u,i}\right)=p_{u,i} \]

    \[ \hat{\tau_{u,i}}^{IPS}= \hat{Y}_{u,i}^1 - \hat{Y}_{u,i}^0 = \frac{Z_{u,i} Y_{u,i}}{p_{u,i}} - \frac{\left(1-Z_{u,i}\right) Y_{u,i}}{1-p_{u,i}} \text{: Also Unbiased Estimator} \quad (1) \]
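The unbiasedness of Eq. (1) can be checked with a quick Monte Carlo sketch (my own toy setup, not from the paper): when the true propensity p is used, the per-sample IPS estimates average to the true causal effect \(Y^1 - Y^0\).

```python
# Minimal Monte Carlo check that the IPS estimator in Eq. (1) is unbiased
# when the true propensity p_{u,i} is plugged in. Toy values are my own.
import numpy as np

rng = np.random.default_rng(0)
p = 0.3            # true propensity score P(Z = 1)
y1, y0 = 1, 0      # potential outcomes -> true causal effect tau = 1

z = rng.binomial(1, p, size=200_000)          # exposure Z ~ Bernoulli(p)
y = z * y1 + (1 - z) * y0                     # observed interaction (Eq. a.10)
tau_ips = z * y / p - (1 - z) * y / (1 - p)   # per-sample IPS estimate (Eq. 1)

print(round(tau_ips.mean(), 2))               # averages to ~1.0 = y1 - y0
```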

Interaction model

  • Assumption: Relationship betw. interactions, propensity and relevance

\[ y_{u,i} = p_{u,i} \times r_{u,i} \quad (2) \]

  • \(y_{u,i}=P\left(Y_{u,i} =1\right)\): Probability of interaction betw. user u and item i [Result: Interaction Prob.]
  • \(p_{u,i} := P\left(Z_{u,i}=1\right)\): Propensity score := probability of exposure [Cause: Exposure Prob.]
  • \(r_{u,i}\): Probability that item i is relevant to user u [Relevance Prob.]

4. Proposed Approach: PROPCARE

4.1. Naive propensity estimator

  • Setup: only interaction data are observable

  • Objective: Estimate propensity scores and exposure

    1. Main focus: estimation of the propensity scores \(\hat{p_{u,i}}\)
    2. → The corresponding exposure can then easily be sampled from the propensity using a threshold. [Section 4.3]
  • Naive Loss Function for the interaction model \(y_{u,i} = p_{u,i} r_{u,i}\)

    \[ \mathcal{L}_{\text{naive}} = - Y_{u,i} \times \log f_p(\mathbf{x}_{u,i}; \Theta_p) f_r(\mathbf{x}_{u,i}; \Theta_r) - (1 - Y_{u,i}) \times \log(1 - f_p(\mathbf{x}_{u,i}; \Theta_p) f_r(\mathbf{x}_{u,i}; \Theta_r)) \quad (3) \]

    • \(\mathbf{x}_{u,i} = f_e \left(u,i;\Theta_e\right)\): joint user-item embedding output
      • \(f_e\): learnable embedding function
      • \(\mathbf{x}_{u,i}\): used as the input of \(f_p, f_r\)
    • \(f_p\): learnable propensity function → estimated propensity score \(\hat{p_{u,i}}\)
    • \(f_r\): learnable relevance function → estimated relevance probability \(\hat{r_{u,i}}\)
    • Each learnable function \(f_*\) has parameters → \(\Theta_*\): parameter set of \(f_*\), learned as an MLP.
    • Problem: \(\hat{y_{u,i}} = f_p \left(\mathbf{x_{u,i}};\Theta_p\right) \times f_r \left(\mathbf{x_{u,i}};\Theta_r\right)\) appears only as the product of \(f_p\) and \(f_r\)
      • \(\mathcal{L}_{\text{naive}}\) cannot train \(f_p\) or \(f_r\) individually; only the product \(\hat{y_{u,i}}=\hat{p_{u,i}} \times \hat{r_{u,i}}\) is learned.
      • → The propensity score cannot be estimated: \(\hat{p_{u,i}} = f_p \left(\mathbf{x_{u,i}};\Theta_p\right)\) is not recoverable.
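The non-identifiability of the naive loss can be seen in a toy example (my own, not from the paper): any two factorizations with the same product \(p \times r\) yield exactly the same value of \(\mathcal{L}_{\text{naive}}\).

```python
# Toy illustration of why L_naive in Eq. (3) cannot identify f_p and f_r:
# any (p, r) pair with the same product p * r gives exactly the same loss,
# so the two factors are not separable from interactions alone.
import math

def l_naive(y_obs: int, p: float, r: float) -> float:
    """Binary cross-entropy of the product model y_hat = p * r (Eq. 3)."""
    y_hat = p * r
    return -(y_obs * math.log(y_hat) + (1 - y_obs) * math.log(1 - y_hat))

# Two very different factorizations with the same product 0.18
loss_a = l_naive(1, p=0.2, r=0.9)
loss_b = l_naive(1, p=0.6, r=0.3)
print(math.isclose(loss_a, loss_b))  # True: the loss cannot tell them apart
```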

4.2. Incorporating prior knowledge

  • Fix for the naive loss function: constrain \(f_p\) or \(f_r\) with prior knowledge.
  • Observation: more popular items will have a higher chance to be exposed [Popularity → Exposure Prob.]
    • \(pop_i\): popularity of item i

      \[ pop_i := \frac{\sum_{u=1}^U Y_{u,i}}{\sum_{j=1}^I \sum_{u=1}^U Y_{u,j}} \]

      • Of all interactions over (user, item) pairs, how large is the share of users interacting with item i?
      • Popularity of item i ⇒ the share of item i among all interactions.
    • Intuitive, but not adequate by itself to capture the popularity-exposure relationship.

    • → Confound: items with a high interaction probability also tend to have a high chance of exposure.

      • [Interaction Prob. → Exposure Prob.]
      • [Problem] Both popularity and interaction prob. → exposure prob.
      • Causal effect of interest: exposure → interaction
        • Thus the interaction-probability factor must be controlled for.
    • [Fix] Integrate popularity as prior knowledge for propensity/exposure estimation.

      • → Controls the interaction prob. → Assumption 1 (Pairwise Relationship on Popularity and Propensity)
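The popularity definition above is a one-liner in practice; a small sketch (my own toy matrix):

```python
# Item popularity pop_i: item i's share of all observed interactions Y_{u,i}.
import numpy as np

# Toy binary interaction matrix Y of shape (U users, I items)
Y = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

pop = Y.sum(axis=0) / Y.sum()   # pop_i = sum_u Y[u,i] / sum_{j,u} Y[u,j]
print(pop)                      # item 0 gets the largest share
```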

Assumption 1 (Pairwise Relationship on Popularity and Propensity) == Prior Knowledge

  • user: u
  • pair of items: (i, j)
  • \(pop_i > pop_j, \ y_{u,i} \approx y_{u,j}\ \Rightarrow \ p_{u,i} > p_{u,j}\)
    • More precisely, \(y_{u,i} \approx y_{u,j} \ \Rightarrow \ \left( pop_i > pop_j\Leftrightarrow \ p_{u,i} > p_{u,j} \right )\)
    : the orderings of popularity and propensity agree, given a fixed interaction prob.

Empirical validation of Assumption 1

Empirical validation on three datasets (DH_original, DH_personalized and ML) that satisfy Assumption 1, by calculating \(\text{ratio}_b\) below.

Figure 2

  • x-axis: bins [intervals] of the inverse similarity in interaction prob., \(|y_{u,j} - y_{u,i}|\)
    • ps. inverse similarity == distance.
  • y-axis: \(\text{ratio}_b\): for each bin, the fraction of item pairs satisfying Assumption 1
  1. Estimate \(y_{u,i}\) from \(Y_{u,i}\) using logistic matrix factorization. See reference [11].
  2. Obtain \(p_{u,i}\) from ground-truth values in the datasets.
  3. For each user, place item pairs (i,j) into bins by \(|y_{u,j} - y_{u,i}|\) [similarity in interaction prob.]
    • Since the assumption requires \(y_{u,i} \approx y_{u,j}\), only \(|y_{u,j} - y_{u,i}|\) up to 0.5 is examined.
  4. Compute \(\text{ratio}_b\): among the item pairs (i,j) of user u in bin b, the fraction satisfying \(pop_i > pop_j \Rightarrow \ p_{u,i} > p_{u,j}\). [a probability]
\[ ratio_b = \frac{1}{U} \sum_{u=1}^U \frac{\text{\# item pairs }(i,j) \text{ for user } u \text{ in bin } b \text{ s.t. } (p_{u,j} - p_{u,i})(\text{pop}_j - \text{pop}_i) > 0}{\text{\# item pairs }(i,j) \text{ sampled for user } u \text{ in bin } b}\quad (4) \]
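A rough sketch (my own reading of Eq. (4)) of \(\text{ratio}_b\) for a single user and a single bin: the fraction of sampled item pairs (i, j) whose popularity ordering agrees with their propensity ordering.

```python
# ratio_b for one user and one bin (Eq. 4): fraction of item pairs whose
# popularity and propensity orderings agree. Toy values are my own.
import numpy as np

def ratio_in_bin(pairs, pop, p_u):
    """pairs: list of (i, j); pop: popularity per item; p_u: propensity per item."""
    agree = sum(
        (p_u[j] - p_u[i]) * (pop[j] - pop[i]) > 0  # orderings agree
        for i, j in pairs
    )
    return agree / len(pairs)

pop = np.array([0.5, 0.3, 0.2])
p_u = np.array([0.8, 0.4, 0.5])          # item 2's propensity disagrees with pop
pairs = [(0, 1), (0, 2), (1, 2)]
print(ratio_in_bin(pairs, pop, p_u))     # 2 of 3 pairs agree
```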

Integrating prior knowledge

  • In \(\mathcal{L_\text{naive}}\), only the product \(\hat{y_{u,i}}=\hat{p_{u,i}} \times \hat{r_{u,i}}\) is learned → use a separated loss function so that \(f_p\) and \(f_r\) can each be trained.

  • When \(pop_i > pop_j, y_{u,i} \approx y_{u,j}\), we need \(f_p \left(\mathbf{x_{u,i}}\right) > f_p \left(\mathbf{x_{u,j}}\right)\). From Assumption 1 [prior knowledge]:

    \[ \text{loss} = -\log\left[ \sigma\left( f_p \left(\mathbf{x_{u,i}}\right)-f_p \left(\mathbf{x_{u,j}}\right) \right) \right] \in \left(0, \infty \right) \quad (5) \]

    • \(f_p \left(\mathbf{x_{u,i}}\right)-f_p \left(\mathbf{x_{u,j}}\right) \uparrow \ \Rightarrow \text{loss} \downarrow\): minimizing the loss therefore pushes the model toward our prior knowledge.

    • \(\sigma\): sigmoid function.

    • The loss exploits popularity in a pairwise fashion.

  • Advantages of the above \(\text{loss}\)

    1. Separates \(f_p\) and \(f_r\)
    2. Uses only item popularity, which is computable from interaction data alone ← \(pop_i := \frac{\sum_{u=1}^U Y_{u,i}}{\sum_{j=1}^I \sum_{u=1}^U Y_{u,j}}\)
    3. Since \(pop_i \neq pop_j\), it keeps the estimates \(\hat{p_{u,i}}\) from collapsing to \(\approx 0 \ \text{or}\ 1\). [Remark 2 in Section 4.5]
  • Final popularity loss function: popularity → exposure prob. \(\hat{p_{u,i}}\) [Assumption 1]

    \[ \mathcal{L}_{\text{pop}} = -\kappa_{u,i,j} \log \left[ \sigma(\text{sgn}_{i,j} \cdot (f_p(\mathbf{x}_{u,i}) - f_p(\mathbf{x}_{u,j}))) + \sigma(\text{sgn}_{i,j} \cdot (f_r(\mathbf{x}_{u,j}) - f_r(\mathbf{x}_{u,i}))) \right] \quad (6) \\ \arg\min_{\Theta_\text{pop}} \mathcal{L_\text{pop}} = \hat{\Theta_{\text{pop}}} \ \rightarrow \ \hat{p_{u,i}}, \hat{r_{u,i}} \]

    1. \(sgn_{i,j} = \text{sign}\left(pop_i - pop_j\right) \in \{1, -1\}\): covers both \(pop_i > pop_j\) and \(pop_i < pop_j\).
    2. \(\kappa_{u,i,j} = e^{\eta\left(y_{u,i}-y_{u,j}\right)^2}, \eta<0\): weighting function. \(|y_{u,i}-y_{u,j}| \downarrow \ \ \Rightarrow \ \kappa_{u,i,j} \uparrow\)
      • \(\eta\): learnable parameter
      • Puts larger weight on triplets that better match the condition \(y_{u,i} \approx y_{u,j}\) of Assumption 1.
    3. Interaction model: \(y_{u,i} = p_{u,i} \times r_{u,i}\) ⇒ for a fixed \(y_{u,i}\), \(p_{u,i} \uparrow \ \ \Rightarrow r_{u,i} \downarrow\)
      • Considers not only \(f_p\) but also \(f_r\) → further improves model training.
      • This is why the relevance term is ordered j - i: \(p_{u,i} \uparrow \ \Rightarrow r_{u,i} \downarrow\)
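Eq. (6) can be sketched for a single triplet (my own numpy version, under simplifying assumptions: `fp_i`, `fr_i` etc. stand in for the network outputs, and \(\eta < 0\) is fixed rather than learned):

```python
# Pairwise popularity loss L_pop (Eq. 6) for one triplet (u, i, j).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def l_pop(fp_i, fp_j, fr_i, fr_j, pop_i, pop_j, y_i, y_j, eta=-1.0):
    sgn = np.sign(pop_i - pop_j)                    # sgn_{i,j}
    kappa = np.exp(eta * (y_i - y_j) ** 2)          # larger when y_i ~ y_j
    inner = sigmoid(sgn * (fp_i - fp_j)) + sigmoid(sgn * (fr_j - fr_i))
    return -kappa * np.log(inner)

# With pop_i > pop_j, the loss is smaller when fp_i > fp_j (matches Assumption 1)
good = l_pop(fp_i=0.9, fp_j=0.2, fr_i=0.3, fr_j=0.8,
             pop_i=0.4, pop_j=0.1, y_i=0.3, y_j=0.3)
bad = l_pop(fp_i=0.2, fp_j=0.9, fr_i=0.8, fr_j=0.3,
            pop_i=0.4, pop_j=0.1, y_i=0.3, y_j=0.3)
print(good < bad)  # True
```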

4.3. Propensity learning

  1. \(\mathcal{L_\text{naive}}\): learns the product \(\hat{y_{u,i}}=\hat{p_{u,i}} \times \hat{r_{u,i}}\) → optimizes the interaction model \(y_{u,i} = p_{u,i} \times r_{u,i}\)

  2. \(\mathcal{L}_{\text{pop}}\): learns \(\hat{p_{u,i}}\) and \(\hat{r_{u,i}}\) separately. [Pairwise loss] Popularity is used as prior information for propensity learning.

    → Combine them into a total loss function \(\mathcal{L_\text{total}}\).

  3. Regularization of the propensity score \(\hat{p_{u,i}}\) [to prevent \(\hat{p_{u,i}} \approx 0 \ \text{or}\ 1\)].

    Regularization term: \(\mu \cdot \text{KLD}\left(Q \| \text{Beta}(\alpha, \beta)\right)\), regularization parameter: \(\mu\)

    \[ \mathcal{L_\text{total}} = \sum_{u,i,j} \left(\mathcal{L}_{\text{naive}} + \lambda \cdot\mathcal{L}_{\text{pop}}\right) + \mu \cdot \text{KLD}\left(Q \| \text{Beta}(\alpha, \beta)\right) \quad (7) \\ \arg\min_{\Theta_{\text{total}}} \mathcal{L_\text{total}} = \hat{\Theta_{\text{total}}} \ \rightarrow \ \hat{p_{u,i}}, \hat{r_{u,i}} \]

    • Since only a few popular items have a high exposure probability, the ground-truth propensity scores follow a long-tailed distribution.

      • → A Beta distribution, which is likewise long-tailed, is used to regularize the propensity scores. [Prior works: see references [4, 15].]
      • \(Q\): empirical distribution of all estimated propensity scores \(\hat{p_{u,i}}\)
      • \(\alpha, \beta\): parameters selected to simulate a long-tailed shape.
      • \(\text{KLD}\left(\cdot \| \cdot\right)\): Kullback-Leibler divergence betw. two distributions. → The smaller it is, the closer the estimated distribution is to the target.
    • \(\lambda, \mu\): trade-off hyper-parameters [weighting terms]

      • \(\lambda\): balances \(\mathcal{L_\text{naive}}\) and \(\mathcal{L}_{\text{pop}}\).
      • \(\mu\): balances the regularization.
    • Estimated propensity score \(\hat{p_{u,i}}\) → predict \(\hat{Z_{u,i}}\).

      • \(\hat{Z_{u,i}}=1 \quad if \ \ \text{Norm}\left(\hat{p_{u,i}}\right) \geq \epsilon\)
      • \(\hat{Z_{u,i}}=0 \quad \text{otherwise}\)
      • \(\epsilon\): threshold hyper-parameter
      • \(\text{Norm}\): normalization function such as Z-score normalization
    • Algorithms [in Appendix A]: update all learnable parameters based on the total loss
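The exposure-derivation step above is simple to sketch (my own toy version of the Section 4.3 thresholding, not the paper's exact procedure):

```python
# Derive binary exposure Z_hat from estimated propensities via z-score
# normalization Norm(.) and a threshold epsilon.
import numpy as np

def derive_exposure(p_hat: np.ndarray, eps: float = 0.0) -> np.ndarray:
    norm = (p_hat - p_hat.mean()) / p_hat.std()   # Norm(.): z-score normalization
    return (norm >= eps).astype(int)              # Z_hat = 1 iff Norm(p_hat) >= eps

p_hat = np.array([0.05, 0.10, 0.60, 0.90])        # toy estimated propensities
print(derive_exposure(p_hat))                     # high-propensity items get Z_hat = 1
```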

4.4. Causality-based recommendation

  • DLCE: Debiased Learning for the Causal Effect. See reference [30]. (Note to self: the DLCE loss is not yet fully understood.)
    • SOTA [state-of-the-art] causality-based recommender w/ IPS estimator

    • Input: interaction \(Y_{u,i}\), exposure \(Z_{u,i}\), propensity \(p_{u,i}\)

    • Output: ranking score \(\hat{s_{u,i}}\) for each user-item (u,i) pair

      • The ranking score of each item, used to build the recommendation ranking for user u.
    • Given \((u, i, j) \quad s.t. \ i \neq j\), the DLCE loss function

      \[ \mathcal{L}_{\text{DLCE}}= \frac{Z_{u,i}Y_{u,i}}{\max(p_{u,i},\chi^1)} \log \left(1+e^{-\omega(\hat{s_{u,i}}-\hat{s_{u,j}})}\right) + \frac{(1-Z_{u,i})Y_{u,i}}{\max(1-p_{u,i},\chi^0)} \log \left(1+e^{\omega(\hat{s}_{u,i}-\hat{s}_{u,j})}\right) \quad (8) \]

      \[ \mathcal{L}_{\text{DLCE}}= \frac{Y_{u,i}}{\max(p_{u,i},\chi^1)} \log \left(1+e^{-\omega(\hat{s}_{u,i}-\hat{s}_{u,j})}\right) \times \mathbb{I} \left(Z_{u,i}=1\right) \\ \qquad \qquad \qquad \ + \ \frac{Y_{u,i}}{\max(1-p_{u,i},\chi^0)} \log \left(1+e^{\omega(\hat{s}_{u,i}-\hat{s}_{u,j})}\right) \times \mathbb{I} \left(Z_{u,i}=0\right) \quad (8) \]

      \[ \hat{s_{u,i}} = f_s \left(u,i,\Theta_s\right), \quad \arg\min_{\Theta_{s}} \mathcal{L_\text{DLCE}} = \hat{\Theta_{s}} \ \rightarrow \ \hat{s_{u,i}} \]

      • \(\chi^1, \chi^0, \omega\): hyper-parameters
    • [This paper] Use estimates instead of the ground truth.

      • \(p_{u,i} \rightarrow \hat{p_{u,i}}\)
      • \(Z_{u,i} \rightarrow \hat{Z_{u,i}}\)

      \[ \mathcal{L}_{\text{PC-DLCE}}= \frac{\hat{Z_{u,i}}Y_{u,i}}{\max(\hat{p_{u,i}},\chi^1)} \log \left(1+e^{-\omega(\hat{s_{u,i}}-\hat{s_{u,j}})}\right) + \frac{(1-\hat{Z_{u,i}})Y_{u,i}}{\max(1-\hat{p_{u,i}},\chi^0)} \log \left(1+e^{\omega(\hat{s_{u,i}}-\hat{s_{u,j}})}\right) \quad \left(8'\right) \]

      \[ \hat{s_{u,i}} = f_s \left(u,i,\Theta_s\right), \quad \arg\min_{\Theta_{s}} \mathcal{L_\text{PC-DLCE}} = \hat{\Theta_{s}} \ \rightarrow \ \hat{s_{u,i}} \]

      • PC: PROPCARE
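The pairwise DLCE objective in Eq. (8) can be sketched for a single triplet (my own toy numpy version; `chi1`, `chi0`, `omega` stand for the hyper-parameters \(\chi^1, \chi^0, \omega\)):

```python
# DLCE loss (Eq. 8) for one (u, i, j) triplet; s_i, s_j are ranking scores.
import numpy as np

def dlce_loss(y_i, z_i, p_i, s_i, s_j, chi1=0.1, chi0=0.1, omega=1.0):
    pos = z_i * y_i / max(p_i, chi1) * np.log1p(np.exp(-omega * (s_i - s_j)))
    neg = (1 - z_i) * y_i / max(1 - p_i, chi0) * np.log1p(np.exp(omega * (s_i - s_j)))
    return pos + neg

# For an exposed, clicked item i, the loss falls as its score rises above s_j
print(dlce_loss(y_i=1, z_i=1, p_i=0.5, s_i=2.0, s_j=0.0) <
      dlce_loss(y_i=1, z_i=1, p_i=0.5, s_i=0.0, s_j=2.0))  # True
```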

4.5. Theoretical property

  • The original IPS estimator of the causal effect: an unbiased estimator.

    \[ \hat{\tau_{u,i}}^{IPS}=\frac{Z_{u,i} Y_{u,i}}{p_{u,i}} - \frac{\left(1-Z_{u,i}\right) Y_{u,i}}{1-p_{u,i}} \text{: Unbiased Estimator} \quad (1) \]

  • [This paper] Use estimates instead of the ground truth. → The IPS estimator becomes biased.

    • \(p_{u,i} \rightarrow \hat{p_{u,i}}\)
    • \(Z_{u,i} \rightarrow \hat{Z_{u,i}}\)

    \[ \hat{\tau_{u,i}}^{PC-IPS}=\frac{\hat{Z_{u,i}} Y_{u,i}}{\hat{p_{u,i}}} - \frac{\left(1-\hat{Z_{u,i}}\right) Y_{u,i}}{1-\hat{p_{u,i}}} \text{: Biased Estimator} \quad (1') \]
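A quick Monte Carlo check (my own toy setup) that plugging a wrong propensity into Eq. (1') biases the estimator, consistent with Proposition 1: here the exposure is estimated correctly (\(\hat{Z}=Z\)) but \(\hat{p} = 0.6 \neq p = 0.3\).

```python
# Bias of the plug-in IPS estimator (Eq. 1') when p_hat != p, with Z_hat = Z.
import numpy as np

rng = np.random.default_rng(1)
p, p_hat = 0.3, 0.6
y1, y0 = 1, 0                     # true causal effect tau = 1

z = rng.binomial(1, p, size=200_000)
y = z * y1 + (1 - z) * y0
tau_pc = z * y / p_hat - (1 - z) * y / (1 - p_hat)

# Bias predicted by Eq. (9) with E[Z_hat - Z] = 0
bias_pred = (p / p_hat - 1) * y1 - ((1 - p) / (1 - p_hat) - 1) * y0
print(round(tau_pc.mean() - (y1 - y0), 2), bias_pred)  # both ~ -0.5
```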

Proposition 1

\[ \text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)= \left( \frac{p_{u,i}+\mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right]}{\hat{p_{u,i}}} -1 \right) Y_{u,i}^1 - \left( \frac{1-p_{u,i}-\mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right]}{1-\hat{p_{u,i}}} -1 \right) Y_{u,i}^0 \ \quad (9) \]

Remark 1

Major factors of \(\text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)\):

\[ \frac{p_{u,i}}{\hat{p_{u,i}}}, \ \frac{1-p_{u,i}}{1-\hat{p_{u,i}}}, \ \mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right] \quad \ \rightarrow \quad \left( \hat{p_{u,i}}=p_{u,i}, \ \hat{Z_{u,i}}=Z_{u,i} \ \Rightarrow \text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)=0\right) \]

Remark 2

\[ \hat{p_{u,i}} \approx \text{0 or 1} \ \Rightarrow \ \text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right) \approx \pm \infty \]

  • The exposure variable \(Z_{u,i} \in \{0, 1\}\) is binary. → Use binary classification metrics such as the F1 score.
    • Since \(\mathbb{E}\left[ \hat{Z_{u,i}}-Z_{u,i} \right]\) enters \(\text{bias}\left(\hat{\tau_{u,i}}^{PC-IPS}\right)\), a better estimate \(\hat{Z_{u,i}}\) yields a smaller bias.
  • The propensity \(p_{u,i} := P\left(Z_{u,i}=1\right)\) is continuous. → Use metrics such as KLD and Kendall's Tau. [Section 5.2]
    • By Remark 2, \(\hat{p_{u,i}} \not\approx \text{0 or 1}\) is required. → Regularization as in Eq. (7) is advisable.

5. Experiment

  • We show that PROPCARE is effective in both quantitative & qualitative experiments.

5.1. Experiment setup

Datasets

  • Three standard causality-based recommendation benchmarks: DH_original, DH_personalized, MovieLens 100K (ML 100K)
  • DH_original, DH_personalized \(\in\) DunnHumby dataset
    • Purchase and promotion logs at offline retail stores over a 93-week period.
    • DH_original: weekly flyers → exposure → ground-truth propensity scores
    • DH_personalized: simulation → ground-truth propensity scores
  • ML 100K
    • Users' ratings on movies
    • Simulated propensity scores ← ratings & user behaviors
  • PROPCARE: ground-truth propensity scores are used only for evaluating model output.
    • Note: no ground-truth values are used during training ❌
  • Datasets → training/validation/test sets
    • Statistics: counts or average values of key variables [user, item, observed interaction, exposure, causal effect, propensity]

Baselines

  • PROPCARE vs baselines [other methods]
  • Propensity estimators
    • Ground-truth values: propensity \(p_{u,i}\), exposure \(Z_{u,i}\) → input to DLCE during training
      1. Ground-truth: propensity scores & exposure values taken from the datasets
    • Estimate propensity \(\hat{p_{u,i}}\) → derive exposure \(\hat{Z_{u,i}}\) → input to DLCE during training
      1. Random: propensity scores drawn randomly \(\in \left(0, 1\right)\)
      2. Item Popularity (POP): propensity scores = normalized popularity \(\in \left(0, 1\right)\)
      3. CJBPR: propensity → relevance → propensity → relevance → … point-wise optimization
      4. EM: Expectation-Maximization algorithm → point-wise propensity score learning

Parameter settings

  • Validation data → tuning of hyper-parameters
    • PROPCARE trade-off hyper-parameters:
      • \(\lambda = 10\)
      • \(\mu=0.4\)
    • Other settings: Appendix C.2.

Evaluation metrics

  • Performance of Causality-based Recommendation → Evaluation metrics [Appendix C.3.]
    1. CP@10, CP@100: Causal effect-based Precision (CP)
    2. CDCG: Causal effect-based Discounted Cumulative Gain (CDCG)
    \[ \text{CP@K} = \frac{1}{U} \sum_{u=1}^{U} \sum_{i=1}^{I} \frac{\mathbf{1}(\text{rank}_u(\hat{s}_{u,i}) \le K)\tau_{u,i}}{K} \quad (a.11) \\ \text{CDCG} = \frac{1}{U} \sum_{u=1}^{U} \sum_{i=1}^{I} \frac{\tau_{u,i}}{\log_2 (1 + \text{rank}_u(\hat{s}_{u,i}))} \quad (a.12) \]
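A compact sketch (my own, following Eqs. (a.11)-(a.12)) of the two causal metrics for a single user: items are ranked by score \(\hat{s}\), and each item's causal effect \(\tau\) contributes according to its rank.

```python
# CP@K and CDCG for one user (Eqs. a.11-a.12).
import numpy as np

def cp_at_k(s_hat, tau, k):
    top_k = np.argsort(-s_hat)[:k]        # indices of the K highest-ranked items
    return tau[top_k].sum() / k

def cdcg(s_hat, tau):
    ranks = np.empty_like(s_hat, dtype=int)
    ranks[np.argsort(-s_hat)] = np.arange(1, len(s_hat) + 1)  # rank_u(s_hat)
    return (tau / np.log2(1 + ranks)).sum()

s_hat = np.array([2.0, 0.5, 1.0])         # toy ranking scores
tau = np.array([1, -1, 0])                # per-item causal effects
print(cp_at_k(s_hat, tau, k=2))           # top-2 = items 0 and 2 -> (1 + 0)/2
```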

5.2. Results and discussions

  • PROPCARE > baselines; additional experiments in [Appendix D.]

Performance comparison

  • PROPCARE > Baselines in two aspects
    1. The downstream causality-based recommendation using the estimated propensity and exposure
    2. The accuracy of the estimated propensity and exposure

  • Performance of causality-based recommendation [comparison of evaluation metrics]
  • Ground-truth: best performance [largest evaluation metrics]
    • Because the true propensity and exposure values are fed directly to DLCE.
      • → But unusable in the real world.
    • PROPCARE: closest to the ground-truth performance; on DH_personalized in particular, the gap is small.
    • PROPCARE > CJBPR, EM
      • → The pairwise method based on Assumption 1 is effective.

  • Propensity & exposure estimation accuracy
  • POP: best among the baselines by Kendall's Tau.
    • But Table 2 shows that POP's causality metrics are poor.
      • The large KLD indicates the propensity score distribution was poorly estimated → ill-fit propensity distribution.
    • The low F1 score shows exposure estimation was also poor.
  • PROPCARE: strong on F1 score and KLD & correspondingly good causality metrics in Table 2.
    • Its Tau scores are slightly worse than other baselines, but the other two metrics are strong.
      • → Both propensity scores and exposure are well estimated. → good causal performance.
  • Causality-based recommendation is influenced by multiple factors. → What are the influencing factors? [last paragraph of Sect. 5.2]

Ablation study

\[ \mathcal{L}_{\text{pop}} = -\kappa_{u,i,j} \log \left[ \sigma(\text{sgn}_{i,j} \cdot (f_p(\mathbf{x}_{u,i}) - f_p(\mathbf{x}_{u,j}))) + \sigma(\text{sgn}_{i,j} \cdot (f_r(\mathbf{x}_{u,j}) - f_r(\mathbf{x}_{u,i}))) \right] \quad (6) \]

  • Derive 5 variants

    1. NO_P: Removing the constraint on estimated \(\hat{p_{u,i}}\) by deleting the term with \(f_p(\mathbf{x}_{u,i}) − f_p(\mathbf{x}_{u,j})\)
    2. NO_R: Removing the constraint on estimated \(\hat{r_{u,i}}\) by deleting the term with \(f_r(\mathbf{x}_{u,j}) − f_r(\mathbf{x}_{u,i})\)
    3. NO_P_R: Removing \(\mathcal{L}_{\text{pop}}\) entirely from the overall loss \(\mathcal{L}_{\text{total}}\) to eliminate Assumption 1 altogether
    4. NEG: Reversing Assumption 1 by replacing \(\text{Sgn}_{i,j}\) with \(-\text{Sgn}_{i,j}\) to assume that more popular items have smaller propensity scores
      • Removing the condition \(\left( pop_i > pop_j\Leftrightarrow \ p_{u,i} > p_{u,j} \right )\)
    5. \(\kappa=1\): Setting all \(\kappa_{u,i,j} = 1\) → equal weighting of all training triplets.
      • Removing the condition \(y_{u,i} \approx y_{u,j}\)

    • x-axis: dataset
    • y-axis: performance
    • PROPCARE: best performance
    • NEG: worst performance → Assumption 1 is the most important component.

Effect of regularization

\[ \mathcal{L_\text{total}} = \sum_{u,i,j} \left(\mathcal{L}_{\text{naive}} + \lambda \cdot\mathcal{L}_{\text{pop}}\right) + \mu \cdot \text{KLD}\left(Q \| \text{Beta}(\alpha, \beta)\right) \quad (7) \\ \arg\min_{\Theta_{\text{total}}} \mathcal{L_\text{total}} = \hat{\Theta_{\text{total}}} \ \rightarrow \ \hat{p_{u,i}}, \hat{r_{u,i}} \]

  • Regularization parameter: \(\mu\)

    • \(\mu \approx 0 \ \Rightarrow \ \text{performance CDCG} \downarrow\)
    • \(\mu \uparrow \ \Rightarrow \ \text{performance CDCG} \uparrow, \quad \mu_{\text{peak}} \in \left[0.2, 0.8\right]\)

Factors influencing causality-based recommendation

  • Method 1: inject noise into the ground-truth propensity or exposure values [panels (b), (a) respectively]

      1. Use ground-truth propensity scores \(p_{u,i}\) for DLCE training while randomly flipping part of \(Z_{u,i}\) between 0↔1. [corrupting \(Z_{u,i}\)]
      • x-axis: flip ratio

      • y-axis: CDCG performance

      • Performance drops sharply as the corruption ratio grows.

        • → Causality-based recommendation is highly sensitive to the estimation of exposure.
      2. Use ground-truth exposure \(Z_{u,i}\) for DLCE training while adding Gaussian noise to the propensity scores. [corrupting \(p_{u,i}\)]
      • x-axis: variance of the noise
      • y-axis: CDCG performance
      • Performance degrades moderately as the corruption grows.
        • → Causality-based recommendation is moderately sensitive to the estimation of propensity scores.
  • Method 2: correlation betw. estimation accuracy & recommendation performance

    • Dataset: only the DH_original dataset [seemingly just one of the three picked]

      • x-axis: estimation accuracy
        • KLD, Kendall's Tau: propensity scores
        • F1 score: exposure
      • y-axis: CDCG performance [recommendation performance]

5.3. Case study

  • PROPCARE: ranking-based recommendation

    • Top-5 recommended items [User ID 2308, DH_personalized dataset]
    1. Ground-truth: DLCE generates the ranking list effectively.
      • Most items have a positive causal effect.
        • \(\tau_{u,i} := Y_{u,i}^1-Y_{u,i}^0 = 1\)
        • Recommending item i to user u ⇒ increases user-item interaction [click or purchase].
      • All items with a positive causal effect were purchased. → The goal of causality-based recommendation is achieved.
    2. CJBPR, PROPCARE: the purchased items differ in their causal effects.
      • CJBPR - strawberries: no causal effect ❌
        • Recommending or not ⇒ has no effect on the user-item interaction [purchasing].
      • PROPCARE - infant soy: positive causal effect
        • Recommending item i to user u ⇒ increases user-item interaction [click or purchase].
    3. POP: even recommends an item with a negative causal effect (tortilla chips).
      • → POP is not a good method.

6. Conclusion

  • PROPCARE: works w/o ground truth of propensity and exposure data
  • Observation on (propensity scores, item popularity) → key assumption → prior information → causality-based RS
  • Analyzed the factors behind the bias in estimated causal effects
  • Empirical studies: PROPCARE > baselines [other methods]
  • Future research suggestions:
    1. Direct exposure estimation w/o propensity scores [i.e., w/o propensity estimation]
    2. Parametric causal effect estimators [the IPS estimator is a non-parametric approach]