A Psycholinguistic Analysis of Changes in Stereotypes and Hate Speech Associated with Muslim and/or Arab Individuals and Jewish and/or Israeli Individuals in Political Discourse on “X” in the Year Before and After October 7th

The full code repository is available here.

PURPOSE

I aim to uncover changes in sentiment (positive or negative) and bias (warmth and competence stereotypes) expressed toward the implicated groups (i.e., Israeli and/or Jewish people and Arab and/or Muslim people). The posts studied come from political influencers of both parties on X, in the year before and the year after the onset of the 2023 Israel-Hamas conflict on October 7, 2023.

To skip ahead to the results section, click here.

<aside> 💡

BACKGROUND

To understand how stereotypes emerge and evolve, Fiske et al. (2002) proposed the Stereotype Content Model (SCM). The SCM posits that stereotypes are organized along two primary dimensions: warmth and competence.

The warmth dimension captures perceived sociability and morality.
The competence dimension captures perceived intelligence, independence, and capability.

Early SCM studies found that Jewish, Turkish, Arab, and Middle Eastern people were perceived as low in warmth (LW) and low in competence (LC) relative to white Americans (Fiske et al., 2002; Lee & Fiske, 2006; Cuddy et al., 2007). The one exception was Jewish people, who were perceived as high in competence (HC).

</aside>

Figure 1. Fiske et al.’s (2002) findings on how Arab, Jewish, and other social groups were perceived in American society.
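The analysis measures warmth and competence in text with stereotype dictionaries. The sketch below shows the basic idea of dictionary-based scoring: each tweet is tokenized, and the share of its tokens that appear in a warmth word list and a competence word list is reported. The tiny word lists are hypothetical placeholders, not the SADCAT dictionaries. The helper name `score_tweet` is also illustrative.

```python
import re

# Hypothetical toy dictionaries; the actual analysis uses SADCAT, which is far larger
WARMTH_WORDS = {"friendly", "kind", "honest", "trustworthy", "cruel", "hostile"}
COMPETENCE_WORDS = {"intelligent", "capable", "skilled", "competent", "incompetent", "lazy"}


def score_tweet(text: str) -> dict:
    """Return the proportion of a tweet's tokens found in each dimension's word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"warmth": 0.0, "competence": 0.0}
    n = len(tokens)
    return {
        "warmth": sum(t in WARMTH_WORDS for t in tokens) / n,
        "competence": sum(t in COMPETENCE_WORDS for t in tokens) / n,
    }


print(score_tweet("They are kind and honest but not very capable"))
```

This sketch counts any mention of a dimension, so "kind" and "cruel" both raise the warmth score. A fuller version would score the high and low poles of each dimension separately.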

TOOLS AND TECHNIQUES

Web scraping: Selenium, Beautiful Soup
Data cleaning/manipulation: pandas
Dictionary-based stereotype text analysis: SADCAT (WordNet; Nicolas et al., 2019), LIWC (Cohn et al., 2004)
Natural language processing: word-embedding analysis (BERT and word2vec models), WEAT (Caliskan et al., 2017; Charlesworth et al., 2021)
Statistical modeling/visualization: R (lme4's lmer, tidyverse, ggplot2), Python (matplotlib)
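WEAT (Caliskan et al., 2017) measures how strongly two sets of target words (X, Y) differ in their association with two sets of attribute words (A, B) in an embedding space. The sketch below computes the WEAT effect size in pure Python under a few assumptions:

- Embeddings are already available as plain lists of floats, for example exported from a word2vec model.
- The 2-D toy vectors are illustrative, not real embeddings.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus mean similarity to B."""
    return statistics.mean(cosine(w, a) for a in A) - statistics.mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Standardized difference in mean association between target sets X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Sample standard deviation over X ∪ Y; some implementations use the population SD instead
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Toy 2-D vectors: X leans toward A, Y leans toward B
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
X = [[1.0, 0.1], [1.0, 0.2]]
Y = [[0.1, 1.0], [0.2, 1.0]]
print(weat_effect_size(X, Y, A, B))
```

In an actual analysis, X and Y would be terms referring to the two groups, and A and B would be attribute words such as warm versus cold traits. WEAT also assesses significance with a permutation test, which this sketch omits.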

DATA CLEANING & PREPROCESSING

Raw Data

Tweets were collected with an automated web scraper built on Selenium, and the scraped data were exported to JSON files. The raw HTML was then parsed with Beautiful Soup into CSV files in the following format.

Note: The data below are shown for illustration only.

| Poster | Date | Tweet | No of Likes | No of Retweets | No of Replies | No of Views |
|---|---|---|---|---|---|---|
| [account] | 1/27/24 18:53 | You dont mourn… | 2956 | 66 | 296 | 147823 |
| [account] | 1/28/24 14:51 | [Person]: Protesters calling… | 2977 | 5325 | 1863 | 10771519 |
| [account] | 2/5/24 13:13 | Ive yet to hear [person] say… | 6595 | 1518 | 550 | 230148 |
| [account] | 2/6/24 16:06 | Over 6 million ppl have been… | 1817 | 670 | 46 | 46360 |
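The Date column uses an M/D/YY HH:MM format. The sketch below shows how each row could be assigned to a study period. It rests on two assumptions, since the exact window boundaries are not stated here:

- Each period is the 365 days before or after October 7, 2023.
- Tweets posted on October 7 itself count as "After".

`label_period` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Assumed cutoff and window length (not confirmed by the source)
CUTOFF = datetime(2023, 10, 7)
WINDOW = timedelta(days=365)


def label_period(date_str: str):
    """Map an 'M/D/YY HH:MM' timestamp to 'Before', 'After', or None if outside the window."""
    ts = datetime.strptime(date_str, "%m/%d/%y %H:%M")
    if CUTOFF - WINDOW <= ts < CUTOFF:
        return "Before"
    if CUTOFF <= ts < CUTOFF + WINDOW:
        return "After"
    return None  # outside both study periods


for d in ["1/27/24 18:53", "2/6/24 16:06", "3/15/23 09:00", "6/1/22 12:00"]:
    print(d, label_period(d))
```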

Subject and Tweet Anonymization

All eligible accounts (those with relevant tweets in both study periods) are anonymized with a random two-digit subject ID. The mapping from account name to ID is recorded in Subject Ledger.csv.

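import random
import pandas as pd

# Assign each eligible account (one row per account in df) a unique two-digit ID, 10-99 inclusive
random_ids = random.sample(range(10, 100), len(df))
df["Subject ID"] = random_ids

# Save the name-to-ID mapping as a new ledger file
df_nametoID = df[["Name", "Subject ID"]]
df_nametoID.to_csv("Supplementary Materials/Subject Ledger.csv", index=False)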
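The eligibility rule stated above, that an account has relevant tweets in both periods, can be expressed as a simple filter. The sketch below assumes tweets have already been labeled with a period. The (poster, period) pairs and the helper `eligible_posters` are hypothetical.

```python
from collections import defaultdict


def eligible_posters(rows):
    """Return posters with at least one relevant tweet in both the Before and After periods."""
    periods = defaultdict(set)
    for poster, period in rows:
        periods[poster].add(period)
    return sorted(p for p, seen in periods.items() if {"Before", "After"} <= seen)


# Hypothetical (poster, period) pairs for illustration
rows = [
    ("acct_a", "Before"), ("acct_a", "After"),
    ("acct_b", "After"),
    ("acct_c", "Before"), ("acct_c", "After"),
]
print(eligible_posters(rows))  # → ['acct_a', 'acct_c']
```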

All analyzed tweets are anonymized with a random four-digit tweet ID, and the IDs are recorded in Tweets Ledger.csv.

import random
import pandas as pd

# Tag each tweet with the period it came from
df_tweet_before["Source"] = "Before"
df_tweet_after["Source"] = "After"

# Combine both periods into a single ledger
df_tweet_ledger = pd.concat([df_tweet_before, df_tweet_after], ignore_index=True)

# Draw one unique four-digit ID (1000-9999 inclusive) per tweet and attach it;
# the "Tweet ID" column name is illustrative
tweets_random_ids = random.sample(range(1000, 10000), len(df_tweet_ledger))
df_tweet_ledger["Tweet ID"] = tweets_random_ids

df_tweet_ledger.to_csv("Supplementary Materials/Tweets Ledger.csv", index=False)

An existing ledger file can be reused to reapply the same subject IDs with the pandas map function, for example:

import pandas as pd

# Load the existing subject ledger
ledger = pd.read_csv("Supplementary Materials/Subject Ledger.csv")

# Build a {Name: Subject ID} lookup table
ledger_dict = dict(zip(ledger["Name"], ledger["Subject ID"]))

# Replace account names with their stored subject IDs
df["Subject ID"] = df["Name"].map(ledger_dict)
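One practical wrinkle when reusing the ledger is that `Series.map` with a dictionary returns NaN for any name not already in it. The sketch below tops up the ledger before mapping, using only the standard library. It assumes new accounts should receive unused two-digit IDs (10-99). `extend_ledger` is a hypothetical helper, not part of the original repository.

```python
import random


def extend_ledger(ledger: dict, names, low=10, high=99):
    """Give every name missing from the ledger a fresh ID in [low, high] that is not already used."""
    used = set(ledger.values())
    available = [i for i in range(low, high + 1) if i not in used]
    # dict.fromkeys removes duplicate names while keeping their first-seen order
    new_names = [n for n in dict.fromkeys(names) if n not in ledger]
    if len(new_names) > len(available):
        raise ValueError("Not enough unused IDs left in the range")
    updated = dict(ledger)
    updated.update(zip(new_names, random.sample(available, len(new_names))))
    return updated


ledger = {"Alice": 12, "Bob": 47}
print(extend_ledger(ledger, ["Alice", "Carol", "Dave", "Carol"]))
```

Existing entries are never reassigned, so IDs already used in earlier outputs stay stable.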