Full code repository is available here.
I aim to uncover changes in sentiment (positive or negative) and bias (warmth and competence stereotypes) expressed toward the implicated groups (e.g., Israeli and/or Jewish people and Arabic and/or Muslim people) by bipartisan political influencers on X in the year before and the year after the start of the 2023 Israel-Hamas conflict.
To skip ahead to the results section, please click here.
<aside> 💡
BACKGROUND
To understand how stereotypes emerge and evolve, Fiske et al. (2002) proposed the Stereotype Content Model (SCM), which posits that all stereotypes convey two primary dimensions: warmth and competence.
Consistent with early SCM work, Jewish, Turkish, Arabic, and Middle Eastern people are regarded as having low warmth (LW) and low competence (LC) compared to white Americans, the only exception being the high competence (HC) attributed to Jewish people (Fiske et al., 2002; Lee & Fiske, 2006; Cuddy et al., 2007).
</aside>

Figure 1. Fiske et al.'s (2002) findings on how Arabic and Jewish people and other social groups were perceived in American society.
The data collection process began with an automated web scraper for tweets (Selenium), and the scraped data were exported to JSON files (BeautifulSoup). The raw HTML was then parsed into CSV files in the following format.
Note: The data shown below are for presentation purposes only.
| Poster | Date | Tweet | No of Likes | No of Retweets | No of Replies | No of Views |
|---|---|---|---|---|---|---|
| [account] | 1/27/24 18:53 | You dont mourn… | 2956 | 66 | 296 | 147823 |
| [account] | 1/28/24 14:51 | [Person]: Protesters calling… | 2977 | 5325 | 1863 | 10771519 |
| [account] | 2/5/24 13:13 | Ive yet to hear [person] say… | 6595 | 1518 | 550 | 230148 |
| [account] | 2/6/24 16:06 | Over 6 million ppl have been… | 1817 | 670 | 46 | 46360 |
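As a rough illustration of how a file in this format might be read back for analysis, the sketch below parses two of the sample rows using only the standard library. The column names follow the table above; the `%m/%d/%y %H:%M` date format is inferred from the sample rows, and `parse_rows` is a hypothetical helper rather than part of the project's code.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the sample table above (presentation data only).
SAMPLE_CSV = """Poster,Date,Tweet,No of Likes,No of Retweets,No of Replies,No of Views
[account],1/27/24 18:53,You dont mourn…,2956,66,296,147823
[account],2/6/24 16:06,Over 6 million ppl have been…,1817,670,46,46360
"""

COUNT_COLUMNS = ["No of Likes", "No of Retweets", "No of Replies", "No of Views"]


def parse_rows(text):
    """Hypothetical helper: turn CSV text into typed dictionaries."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        # Date format assumed from the sample rows (month/day/two-digit year).
        row["Date"] = datetime.strptime(raw["Date"], "%m/%d/%y %H:%M")
        # Engagement counts arrive as strings and are converted to integers.
        for col in COUNT_COLUMNS:
            row[col] = int(raw[col])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_CSV)
print(rows[0]["Date"], rows[0]["No of Views"])
```

Converting dates and counts up front makes it easier to split tweets into the two studied periods and to compare engagement numerically later on.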
All eligible accounts (accounts with relevant tweets in both studied periods) are anonymized with a two-digit subject ID, which is recorded in Subject Ledger.csv.
import random

# Draw one unique two-digit ID (10-99) per account; range() excludes its stop value
random_ids = random.sample(range(10, 100), len(df))
df["Subject ID"] = random_ids
# Use to create a new ledger file
df_nametoID = df[["Name", "Subject ID"]]
df_nametoID.to_csv('Supplementary Materials/Subject Ledger.csv', index=False)
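The subject anonymization step can also be sketched as a small standalone helper. This is a minimal sketch under stated assumptions, not the project's actual code: `assign_subject_ids`, its parameters, and the placeholder account names are all hypothetical. It draws IDs without replacement from the full two-digit range and fails loudly if there are more subjects than available IDs.

```python
import random


def assign_subject_ids(names, low=10, high=99, seed=None):
    """Hypothetical helper: map each unique name to a distinct ID in [low, high]."""
    unique_names = sorted(set(names))
    pool = range(low, high + 1)  # inclusive upper bound, so `high` is a valid ID
    if len(unique_names) > len(pool):
        raise ValueError("More subjects than available IDs")
    # Seeding is optional; it only makes the generated ledger reproducible.
    rng = random.Random(seed)
    ids = rng.sample(pool, len(unique_names))  # sampling without replacement
    return dict(zip(unique_names, ids))


# Placeholder names, not real accounts.
subject_ledger = assign_subject_ids(["account_a", "account_b", "account_c"], seed=0)
print(subject_ledger)
```

Sampling without replacement guarantees that no two accounts share an ID, which is what allows the ledger to be used for re-identification by the researcher later.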
All analyzed tweets are anonymized with a four-digit tweet ID, which is recorded in Tweets Ledger.csv.
# Draw one unique four-digit ID (1000-9999) per tweet across both corpora
tweets_random_ids = random.sample(range(1000, 10000),
                                  len(df_noName_A) + len(df_noName_B))
# Add a new column to indicate the source of the corpus
df_tweet_before["Source"] = "Before"
df_tweet_after["Source"] = "After"
# Concatenate the DataFrames
df_tweet_ledger = pd.concat([df_tweet_before, df_tweet_after], ignore_index=True)
df_tweet_ledger.to_csv('Supplementary Materials/Tweets Ledger.csv', index=False)
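For illustration, the tweet ledger construction can be sketched end to end on placeholder data. The snippet assumes pandas is installed; the `Tweet ID` column name and the toy DataFrames are assumptions made for this example, not taken from the project.

```python
import random

import pandas as pd

# Placeholder corpora; the real DataFrames hold the scraped tweets.
df_before = pd.DataFrame({"Tweet": ["tweet one", "tweet two"]})
df_after = pd.DataFrame({"Tweet": ["tweet three"]})

# Tag each corpus with its period, then stack them into one ledger.
df_before["Source"] = "Before"
df_after["Source"] = "After"
tweet_ledger = pd.concat([df_before, df_after], ignore_index=True)

# Draw one ID per row from a single pool so no ID repeats across periods.
# "Tweet ID" is an assumed column name.
tweet_ledger["Tweet ID"] = random.sample(range(1000, 10000), len(tweet_ledger))
print(tweet_ledger)
```

Drawing all IDs in one call, after concatenation, avoids the collisions that could occur if each period were sampled separately.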
An example of using the ledger file with the pandas map function:
import pandas as pd

# Load the existing ledger file
ledger = pd.read_csv("Supplementary Materials/Subject Ledger.csv")
ledger_records = ledger.to_dict(orient='records')
# Build a name -> subject ID lookup
ledger_dict = {}
for record in ledger_records:
    ledger_dict[record['Name']] = record['Subject ID']
df["Subject ID"] = df["Name"].map(ledger_dict)
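One caveat with `map` is that names missing from the ledger silently become `NaN`. The sketch below, which assumes pandas is installed and uses placeholder names and IDs, shows one possible way to check for unmapped names before dropping the identifying column. The `name_to_id` and `df_anonymized` variables are hypothetical.

```python
import pandas as pd

# Placeholder ledger and data; names and IDs are illustrative only.
ledger = pd.DataFrame({"Name": ["account_a", "account_b"], "Subject ID": [42, 17]})
df = pd.DataFrame({"Name": ["account_b", "account_a", "account_b"],
                   "Tweet": ["t1", "t2", "t3"]})

# A Series indexed by name works as a lookup table for map().
name_to_id = ledger.set_index("Name")["Subject ID"]
df["Subject ID"] = df["Name"].map(name_to_id)

# map() yields NaN for names missing from the ledger, so check before dropping names.
missing = df.loc[df["Subject ID"].isna(), "Name"].unique()
if len(missing) > 0:
    raise KeyError(f"Names not found in ledger: {list(missing)}")

# Only the anonymized ID is kept for downstream analysis.
df_anonymized = df.drop(columns="Name")
print(df_anonymized)
```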