Moderating Tamil Content on Social Media

Graphic for CDT Research report, entitled “Moderating Tamil Content on Social Media.” Illustration of a hand, with a variety of golden rings and bracelets on their wrist and fingers, seen pinching / holding on to a blue speech bubble with three dots indicating that someone is contemplating expressing themselves. A deep green background with a kolam pattern.

Tamil is a language with a long history. It is spoken by over 80 million people worldwide, or over 1% of the world’s population, and early inscriptions in the language date back to the 5th century B.C.E. (Murugan & Visalakshi, 2024). The language is spoken widely in India (predominantly in Tamil Nadu and Puducherry), in Sri Lanka, and across diaspora communities in Malaysia, Thailand, Canada, the United Kingdom, the United States, and beyond. Despite the widespread use of the language, there remains limited understanding of how major social media platforms moderate content in Tamil. This report examines the online experiences of Tamil users and explores the challenges of applying consistent content moderation processes for this language.

This report is part of a series that examines content moderation within low-resource and indigenous languages in the Global South. Low-resource languages are languages in which sufficient high-quality data is not available to train models, making it difficult to develop robust content moderation systems, particularly automated systems (Nicholas & Bhatia, 2023). In previous case studies conducted in the series, we found that this lack of high-quality and native datasets impeded effective and accurate moderation of Maghrebi Arabic and Kiswahili content (Elswah, 2024a; Elswah, 2024b). Inconsistent and inaccurate content moderation results in lower trust among users in the Global South, and limits their ability to express themselves freely and access information.

This report dives into Tamil speakers’ experiences on the web, particularly on popular social media platforms and online forums run by Western and Indian companies. We highlight the impact of Tamil speakers’ perception of poor content moderation, particularly against a backdrop of democratic backsliding and growing repression of speech and civic participation in India and Sri Lanka (Vesteinsson, 2024; Nadaradjane, 2022). Ultimately, what emerges in this case study is a fragmented information environment where Tamil speakers perceive over-moderation while simultaneously encountering under-moderated feeds full of hate speech.

We used a mixed-method approach, which included an online survey of 147 frequent social media users in India and Sri Lanka; 17 in-depth interviews with content moderators, content creators, platforms’ Trust & Safety representatives, and digital rights advocates; and a roundtable discussion with Tamil machine learning and data experts. The methods are detailed in the report’s appendix.

Based on these methods, we found that:

1. Tamil speakers use a range of Western-based social media platforms and Indian platforms. Our survey indicates that Western social media platforms are more popular among Tamil speakers, while local TikTok alternatives are gaining popularity due to India’s TikTok ban. Online, Tamil speakers use tactics to circumvent content moderation, employing “algospeak” or computer-mediated communication, and, at other times, code-mixed and transliterated Tamil using Latin script for ease and convenience. These tactics complicate moderation.

2. Tech companies pursue various approaches to moderate Tamil content online, but mostly adhere to global or localized approaches. The global approach employs the same policies for all users worldwide, and relies on moderators and policy members who are not hired based on linguistic or regional expertise. Moderators are assigned content from across the world. In contrast, the local approach tailors some policies to meet Tamil language-specific guidance, and relies on more Tamil speakers to moderate content. Some Indian companies employ a hybrid approach, often making occasional localized adjustments for Tamil speakers.

3. Tamil speakers, like others, routinely face inconsistent moderation, which they attribute to the fact that their primary language is not English. On the one hand, they encounter what they believe are under-moderated information environments, full of targeted abuse in Tamil. On the other hand, they encounter what they suspect is unfair over-moderation targeting Tamil speech in particular.

4. A majority of survey respondents are concerned about politically motivated moderation and believe that content removals and restrictions are used to silence their voices online, particularly when they speak about politics. A few users also suspect that they experience “shadowbanning,” or a range of opaque, undisclosed moderation decisions by platforms, particularly when they use certain words or symbols commonly used by or associated with the Tamil community.

5. Despite a vibrant Tamil computing community, investment in automated moderation in Tamil still falls significantly short due to a lack of accessible resources and will, as well as financial constraints on smaller social media companies.

Read the full report.
