Full code repository is available here.

PURPOSE

I aim to uncover changes in the sentiment (positive or negative) and bias (warmth and competence stereotypes) expressed toward the implicated groups (e.g., Israeli and/or Jewish people and Arab and/or Muslim people) by political influencers on X from both sides of the political divide, in the year before and the year after the onset of the 2023 Israel-Hamas conflict.

To skip ahead to the results section, click here.

BACKGROUND

To understand how stereotypes emerge and evolve, Fiske et al. (2002) proposed the Stereotype Content Model (SCM), which posits that all stereotypes convey two primary dimensions: warmth and competence.

Early SCM studies found that Jewish, Turkish, Arab, and Middle Eastern people were regarded as low in warmth (LW) and low in competence (LC) relative to white Americans. The one exception was the high competence (HC) attributed to Jewish people (Fiske et al., 2002; Lee & Fiske, 2006; Cuddy et al., 2007).
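The two-dimensional logic can be illustrated by placing a group in one of the four SCM cells from a warmth score and a competence score. The sketch below does this. The 1 to 5 rating scale, the midpoint split at 3.0, and the function name scm_quadrant are assumptions for illustration. This is not the scoring procedure used in this project or in the original SCM studies.

```python
# Hypothetical sketch: map warmth/competence ratings to SCM cell labels.
# Assumes ratings on a 1-5 scale split at an assumed midpoint of 3.0;
# a score exactly at the midpoint is treated as "low".


def scm_quadrant(warmth: float, competence: float, midpoint: float = 3.0) -> str:
    """Return a label such as 'LW-HC' (low warmth, high competence)."""
    w = "HW" if warmth > midpoint else "LW"
    c = "HC" if competence > midpoint else "LC"
    return f"{w}-{c}"


if __name__ == "__main__":
    # Made-up ratings for illustration only, not results from any study
    print(scm_quadrant(2.4, 3.8))
    print(scm_quadrant(2.1, 2.5))
```

Under this labeling, the pattern described above for Jewish people corresponds to the LW-HC cell.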

Figure 1. Fiske et al.'s (2002) findings on how Arab, Jewish, and other social groups were perceived in American society.


TOOLS AND TECHNIQUES

  1. Web scraping: Selenium, Beautiful Soup
  2. Data cleaning/manipulation: Pandas
  3. Stereotype Dictionary-based Text Analysis: SADCAT (WordNet; Nicolas et al., 2019), LIWC (Cohn et al., 2004)
  4. Natural Language Processing: word embedding analysis (BERT and word2vec models), WEAT (Caliskan et al., 2017; Charlesworth et al., 2021)
  5. Statistical Modeling/Visualization: R (lme4's lmer, tidyverse, ggplot2) and Python (matplotlib)
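WEAT (Caliskan et al., 2017) uses cosine similarity to compare how strongly two sets of target words associate with two sets of attribute words. For example, the targets could be terms for two groups and the attributes could be warmth and coldness terms.

The sketch below implements the standard WEAT effect size in plain Python. Several parts are assumptions:

* The toy 2-D vectors stand in for real word2vec or BERT embeddings.
* The function names are illustrative.
* The use of the sample standard deviation is an implementation choice; libraries differ.
* The permutation test for significance is omitted.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w: Vector, A: List[Vector], B: List[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return (statistics.mean(cosine(w, a) for a in A)
            - statistics.mean(cosine(w, b) for b in B))


def weat_effect_size(X: List[Vector], Y: List[Vector],
                     A: List[Vector], B: List[Vector]) -> float:
    """Standardized difference in A-vs-B association between targets X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Sample standard deviation over all target words (an implementation choice)
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real word2vec/BERT embeddings
    warm = [[1.0, 0.0], [0.9, 0.1]]     # attribute set A (e.g., warmth words)
    cold = [[0.0, 1.0], [0.1, 0.9]]     # attribute set B (e.g., coldness words)
    group_x = [[0.8, 0.2], [0.7, 0.3]]  # target set X
    group_y = [[0.2, 0.8], [0.3, 0.7]]  # target set Y
    print(round(weat_effect_size(group_x, group_y, warm, cold), 3))
```

A positive effect size means X is more strongly associated with A (relative to B) than Y is. Swapping X and Y flips the sign.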

DATA CLEANING & PREPROCESSING

  1. Raw Data

Tweets were collected with an automated web scraper built on Selenium, and the scraped data were exported to JSON files. The raw HTML was then parsed with BeautifulSoup into CSV files in the following format.

Note: The data below are for illustration only.

| Poster | Date | Tweet | Likes | Retweets | Replies | Views |
|---|---|---|---|---|---|---|
| [account] | 1/27/24 18:53 | You dont mourn… | 2956 | 66 | 296 | 147823 |
| [account] | 1/28/24 14:51 | [Person]: Protesters calling… | 2977 | 5325 | 1863 | 10771519 |
| [account] | 2/5/24 13:13 | Ive yet to hear [person] say… | 6595 | 1518 | 550 | 230148 |
| [account] | 2/6/24 16:06 | Over 6 million ppl have been… | 1817 | 670 | 46 | 46360 |

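A minimal standard-library sketch of the JSON-to-CSV step follows. It assumes each scraped tweet was saved as a JSON object whose keys mirror the table columns. The key names, the helpers records_to_csv and period, and the window boundaries are assumptions, not the project's actual code.

The sketch also tags each tweet as Before or After. It assumes the study periods are one year on either side of 7 October 2023, the date of the Hamas attack that began the conflict. Engagement counts are assumed to be plain digits; abbreviated values such as "1.2K" would need extra parsing.

```python
import csv
import io
import json
from datetime import datetime
from typing import Dict, List, Optional

# Assumed study windows: one year on either side of 7 October 2023
WINDOW_START = datetime(2022, 10, 7)
CONFLICT_START = datetime(2023, 10, 7)
WINDOW_END = datetime(2024, 10, 7)

FIELDS = ["Poster", "Date", "Tweet", "Likes", "Retweets", "Replies", "Views", "Period"]
COUNT_FIELDS = ["Likes", "Retweets", "Replies", "Views"]


def period(ts: datetime) -> Optional[str]:
    """Label a timestamp 'Before' or 'After' the conflict, or None if outside both windows."""
    if WINDOW_START <= ts < CONFLICT_START:
        return "Before"
    if CONFLICT_START <= ts < WINDOW_END:
        return "After"
    return None


def records_to_csv(records: List[Dict[str, str]]) -> str:
    """Turn scraped tweet records (hypothetical JSON keys) into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        # Dates look like "1/27/24 18:53" in the sample data
        ts = datetime.strptime(rec["Date"], "%m/%d/%y %H:%M")
        label = period(ts)
        if label is None:
            continue  # skip tweets outside the study windows
        row = {"Poster": rec["Poster"], "Date": ts.strftime("%Y-%m-%d %H:%M"),
               "Tweet": rec["Tweet"], "Period": label}
        for field in COUNT_FIELDS:
            row[field] = int(rec[field])  # assumes plain digits, not "1.2K"
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    scraped = json.loads(
        '[{"Poster": "[account]", "Date": "1/27/24 18:53", "Tweet": "You dont mourn...",'
        ' "Likes": "2956", "Retweets": "66", "Replies": "296", "Views": "147823"}]'
    )
    print(records_to_csv(scraped))
```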
  2. Subject and Tweet Anonymization

All eligible accounts (those with relevant tweets in both study periods) are anonymized with a unique two-digit subject ID. The name-to-ID mapping is recorded in Subject Ledger.csv.

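import random
import pandas as pd

# Draw unique two-digit IDs (10-99 inclusive); range's upper bound is exclusive, hence 100
random_ids = random.sample(range(10, 100), len(df))

df["Subject ID"] = random_ids

# Use to create a new ledger file mapping each account name to its subject ID
df_nametoID = df[["Name", "Subject ID"]]
df_nametoID.to_csv('Supplementary Materials/Subject Ledger.csv', index=False)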
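random.sample draws without replacement, so the two-digit pool (10 through 99, or 90 IDs) caps the study at 90 accounts. Beyond that, random.sample raises ValueError.

A hedged sketch of a helper that makes this limit explicit follows. It also seeds the draw so the ledger can be regenerated identically. The name assign_subject_ids and the seed value are hypothetical.

```python
import random
from typing import Dict, List, Optional


def assign_subject_ids(names: List[str], low: int = 10, high: int = 99,
                       seed: Optional[int] = None) -> Dict[str, int]:
    """Map each unique name to a distinct random ID in [low, high], inclusive."""
    unique_names = list(dict.fromkeys(names))  # drop duplicates, keep first-seen order
    pool = range(low, high + 1)
    if len(unique_names) > len(pool):
        raise ValueError(
            f"{len(unique_names)} names but only {len(pool)} IDs between {low} and {high}"
        )
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return dict(zip(unique_names, rng.sample(pool, len(unique_names))))


if __name__ == "__main__":
    # Hypothetical usage with a DataFrame that has a "Name" column:
    #   id_map = assign_subject_ids(df["Name"].tolist(), seed=2024)
    #   df["Subject ID"] = df["Name"].map(id_map)
    print(assign_subject_ids(["account_a", "account_b", "account_c"], seed=2024))
```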

Each analyzed tweet is assigned a unique four-digit tweet ID. These IDs are recorded in Tweets Ledger.csv.

import random
import pandas as pd

# Draw unique four-digit IDs (1000-9999 inclusive), one per tweet across both corpora
tweets_random_ids = random.sample(range(1000, 10000),
                                  len(df_noName_A) + len(df_noName_B))

# Add a new column to indicate the source of the corpus
df_tweet_before["Source"] = "Before"
df_tweet_after["Source"] = "After"

# Concatenate the DataFrames and save the tweet ledger
df_tweet_ledger = pd.concat([df_tweet_before, df_tweet_after], ignore_index=True)
df_tweet_ledger.to_csv('Supplementary Materials/Tweets Ledger.csv', index=False)
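The snippet above draws one pool of tweet IDs but does not show how the IDs are attached to individual tweets. One option is to draw a single sample sized to both corpora and slice it by corpus size, so that no ID repeats across the Before and After sets.

The sketch below shows this approach. The helper split_tweet_ids and the "Tweet ID" column name are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def split_tweet_ids(n_before: int, n_after: int,
                    seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Draw distinct four-digit IDs for both corpora, then split them by corpus."""
    rng = random.Random(seed)
    # One draw over 1000-9999 guarantees no ID is shared between the two corpora
    ids = rng.sample(range(1000, 10000), n_before + n_after)
    return ids[:n_before], ids[n_before:]


if __name__ == "__main__":
    # Hypothetical usage; the "Tweet ID" column name is an assumption:
    #   before_ids, after_ids = split_tweet_ids(len(df_tweet_before), len(df_tweet_after))
    #   df_tweet_before["Tweet ID"] = before_ids
    #   df_tweet_after["Tweet ID"] = after_ids
    before_ids, after_ids = split_tweet_ids(3, 2, seed=7)
    print(before_ids, after_ids)
```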

Example: reusing the ledger file to assign existing subject IDs with the pandas Series.map method.

import pandas as pd

# Load the existing ledger file
ledger = pd.read_csv("Supplementary Materials/Subject Ledger.csv")

# Build a {Name: Subject ID} lookup from the ledger
ledger_dict = dict(zip(ledger["Name"], ledger["Subject ID"]))

# Replace each account name with its subject ID; names missing from the ledger become NaN
df["Subject ID"] = df["Name"].map(ledger_dict)
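Series.map returns NaN for any name missing from ledger_dict, such as an account added after the ledger was first written. A hypothetical helper, extend_ledger, could give such names IDs that are not already taken in the two-digit range.

The sketch below shows that helper. Writing the updated mapping back to Subject Ledger.csv is left out, and the function name and seed are illustrative assumptions.

```python
import random
from typing import Dict, Iterable, Optional


def extend_ledger(ledger: Dict[str, int], names: Iterable[str], low: int = 10,
                  high: int = 99, seed: Optional[int] = None) -> Dict[str, int]:
    """Return a copy of ledger with unused IDs assigned to any names it lacks."""
    updated = dict(ledger)  # leave the caller's dict untouched
    new_names = [n for n in dict.fromkeys(names) if n not in updated]
    free_ids = sorted(set(range(low, high + 1)) - set(updated.values()))
    if len(new_names) > len(free_ids):
        raise ValueError("not enough unused IDs left in the ledger range")
    rng = random.Random(seed)
    updated.update(zip(new_names, rng.sample(free_ids, len(new_names))))
    return updated


if __name__ == "__main__":
    # Hypothetical usage before mapping:
    #   ledger_dict = extend_ledger(ledger_dict, df["Name"], seed=2024)
    #   df["Subject ID"] = df["Name"].map(ledger_dict)
    print(extend_ledger({"account_a": 10, "account_b": 11}, ["account_a", "account_c"], seed=3))
```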