Previous research on digital censorship has largely focused on applications and networks that are heavily controlled by oppressive governments, such as that of China. My research goal is to broaden these studies beyond censorship in autocracies to include Western social media, with an explicit focus on Turkish censorship of Twitter, while showing evidence of how aggregating users’ data from public APIs can lead to privacy leaks of users’ political affiliations.
In this research, we make fundamental contributions in the areas of censorship and privacy. We conducted large-scale measurements of Twitter in Turkey and introduced an approach to systematically label censored Twitter posts, showing that it is possible to find more censored tweets than those published by Twitter. We show a simple way to bypass Twitter censorship from inside the censoring country without a proxy or VPN, which ordinary users can apply. Our framework can be applied to any geographical boundary or targeted user group. We propose a novel set of rules to construct a data flow graph of censored tweets that unmasks influential users and their community associations, which in turn reveals their political and social affiliations. This aggregate analysis of publicly reachable tweets is particularly critical to users’ privacy if the users are unmasked by oppressive governments or by malicious persons who can thereafter leverage these users to spread malicious content. Using standard machine learning and NLP algorithms for topic clustering, we show that the dynamics of censorship in democratic countries, such as Turkey, differ from those in dictatorial regimes, which target discussions of collective action. Our results show that the censors in Turkey target topics that could negatively impact the outcome of an election and the political interests of the ruling party (AKP). Expanding this work to examine the impact of the failed 2016 Turkish coup on censorship, we identified a new censored topic that is also deemed adversarial to the AKP. The overwhelming majority of the censored tweets pre-coup, from 2014 to 2015, concern government corruption and Kurdish/terrorism issues. Some pro-government tweets are also censored, because current Turkish law enables individuals to pursue due process against defamatory posts.
On the other hand, post-coup results show that, in addition to the previously identified censored Kurdish topics, topics related to the Gülen movement, which the government claims masterminded the coup, are also heavily censored. More notably, this empirical comparison reveals evidence of a 72% decline in the volume of censored tweets and a 40% decrease in overall streamed tweets post-coup, which is likely attributable to self-censorship. The footprint of active users in our data shows that 22% of users self-censored by switching their accounts to “protected” mode and 5% by deleting their accounts, contributing to both declines. Unlike activists, who regularly post political content and are likely to be censored entirely, the self-censoring users who switched their privacy setting or deleted their accounts are average users who tweeted mostly neutral posts and few political ones during the coup.