With all the discussion going on recently, and this post providing some data, I was inspired to try an objective approach to measuring the toxicity in the sub. This analysis is meant to be independent. It does not promote any particular view or say what should or shouldn't be done; it just provides data so we can have a meaningful discussion without relying solely on opinions and perceptions (TL;DR at the end).

The approach uses a neural network (BERT, more specifically) to predict whether a piece of text is toxic or not. BERT is an NN architecture that revolutionized natural language processing and reached state of the art on most NLP benchmarks when it was released (2018). Since then, the transformer architecture has been applied extensively to many NLP tasks.

The NN I used was trained for a Kaggle competition about predicting toxicity, where the organizers provided a dataset in which they labeled a lot of texts as toxic or not. Because of this, the neural network is trained to predict what the dataset creators considered toxic comments. That is still subjective, but it nonetheless provides a consistent metric across all the samples we will analyse, and we can use it to make comparisons and view trends. They defined toxicity as "anything rude, disrespectful or otherwise likely to make someone leave a discussion".

I got the code and the neural network weights from this GitHub repo (Sentiment Analysis on the comments of Pewdiepie videos). My adapted code and all the data used to generate the charts can be found in this repo (credit goes to the original author). You can investigate all the details in the repository, including which posts the NN classified as toxic or not (in the CSV files), and verify the results for yourself or produce new analyses.

The neural network is trained to predict if a piece of text is toxic or not. Given a text, it will output a value between 0 and 1 that we can interpret as the probability that the NN thinks the text is toxic. If the value is greater than 0.5, we consider the text as toxic. This value is called "Toxic Score" here for simplicity.
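To make the cut-off concrete, here is a minimal sketch of how a toxic score turns into a toxic / not toxic label. The function name `is_toxic` and the example scores are made up for illustration; this is not the code from the repo, only the 0.5 rule described above.

```python
TOXIC_THRESHOLD = 0.5  # cut-off described in the post


def is_toxic(score: float, threshold: float = TOXIC_THRESHOLD) -> bool:
    """Label a text as toxic when its score is strictly above the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    return score > threshold


# Made-up scores for illustration only, not real model output.
example_scores = [0.03, 0.50, 0.51, 0.97]
labels = [is_toxic(s) for s in example_scores]
print(labels)  # [False, False, True, True]
```

Note that a score of exactly 0.5 is not counted as toxic here, since the rule is "greater than 0.5".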

The first thing being debated recently is whether the sub has been getting more toxic. Here is the data I got from the experiment:

The X axis is the date and the Y axis is the average toxic score for all threads created on that day. The blue line is the average over a week and the pink line is the average over a month. The yellow vertical lines mark the release date of a new league.
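The chart quantities described above can be sketched roughly like this. It assumes the weekly and monthly lines are trailing averages of the daily values over 7 and 30 days; the exact windowing used for the charts may differ, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def daily_average(threads):
    """threads: iterable of (creation_date, toxic_score) pairs.

    Returns {date: average toxic score of the threads created that day}.
    """
    by_day = defaultdict(list)
    for day, score in threads:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


def trailing_average(daily, window_days):
    """Average the daily values over the window ending on each day.

    Days without any thread are simply skipped (an assumption).
    """
    smoothed = {}
    for day in sorted(daily):
        window = [
            daily[day - timedelta(days=k)]
            for k in range(window_days)
            if day - timedelta(days=k) in daily
        ]
        smoothed[day] = mean(window)
    return smoothed


# Made-up threads for illustration only.
threads = [
    (date(2020, 6, 1), 0.2),
    (date(2020, 6, 1), 0.4),
    (date(2020, 6, 2), 0.6),
]
daily = daily_average(threads)
weekly = trailing_average(daily, 7)    # would be the blue line
monthly = trailing_average(daily, 30)  # would be the pink line
```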

I plotted the averages instead of the raw daily data to help see trends, because daily values are too chaotic. The first thing we can notice, as expected, is that toxicity goes way down right before a league, then spikes in the first weeks after release, and then gradually goes down again (on average). We have the lowest lows right before every league release and the highest highs right after. It's a repeating pattern that follows the league release cycle. For those who have been here for some time this is nothing new.

One particularly interesting thing to notice, though, is that the Metamorph, Delirium and Harvest leagues appear to have had lower lows before league start and higher highs after it. I interpret this as the sub getting more polarized in these last 3 leagues: we see very low toxicity in the weeks before release (lower than in recent years), followed by higher spikes in toxicity (greater in magnitude and duration) and then sudden drops. But give your thoughts and interpretation in the comments. On average there doesn't seem to be a clear trend of steadily increasing toxicity over time; instead the spikes are more pronounced and take longer to come down, but they also come down further. I will leave the cause of this to be discussed.

Also interesting to notice is how the Harbinger and Incursion leagues have the lowest spikes in toxicity over the course of the league, and were generally well received by the community (they were also simple leagues, just kill mobs, and it is harder to upset a lot of players with that). Talisman was an early league where we saw some high spikes (I didn't play it, but people say it was received badly). Breach league, which is a favorite of many, received its share of toxic posts right after league start, but the score seems to have dropped faster. IIRC Breach was a difficult league for new players at the beginning, as in you opened a breach and just died swarmed by mobs (there was also a bug where mobs were invulnerable at the edge). Finally, the Betrayal league toxic score just didn't seem to go down all league, probably due to the performance problems that league's content caused for non-SSD users.

The other thing I wanted to do is to compare PoE sub with other gaming subs, with some notoriously known for being toxic:

Average Toxic Score (past 7 months)

Toxic Scores for gaming subs over time in the past 7 months

Compared to other gaming subs, PoE appears to be in a good spot so far (on average), but it scores higher than other ARPG competitors like D3 and Wolcen. On the Delirium league release, though, the sub's toxicity score spiked to #1 for a short period of time, even surpassing Dota2, the most toxic sub on average. For the Harvest league release, the sub seems to be less toxic so far (we still have limited data).
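The comparison between subs boils down to averaging the toxic score per sub and sorting. A minimal sketch, with placeholder sub names and made-up scores (not the real data, which is in the CSV files in the repo), and a hypothetical helper `rank_subs`:

```python
from collections import defaultdict
from statistics import mean


def rank_subs(rows):
    """rows: iterable of (subreddit, toxic_score) pairs.

    Returns a list of (subreddit, average score), most toxic first.
    """
    by_sub = defaultdict(list)
    for sub, score in rows:
        by_sub[sub].append(score)
    averages = {sub: mean(scores) for sub, scores in by_sub.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder names and made-up scores for illustration only.
rows = [
    ("sub_a", 0.10), ("sub_a", 0.30),
    ("sub_b", 0.40), ("sub_b", 0.60),
    ("sub_c", 0.05),
]
ranking = rank_subs(rows)
for sub, avg in ranking:
    print(f"{sub}: {avg:.2f}")
```

Restricting `rows` to a date range before calling the function would give the per-period view used in the second chart.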

Another interesting thing to notice is that the Valorant sub started out the lowest in toxicity, and after the game released its toxicity grew over time, recently surpassing PoE. It also appears (as expected?) that PvP games are higher in toxicity than PvE games (with the exception of Warframe?). There are more interesting things I would like to discuss, but I will do so in the comments so as not to make this post longer and go off topic.


TL;DR:

  • Things that were to be expected:
    • The sub gets way less toxic right before league release, when everyone is excited, and more toxic right after every league release, with no exception, since 2015 (and even since 2013, but I did not include that in the chart because the data has higher variance due to the lower post count at the time);
    • Harbinger and Incursion were the most well received leagues in the sub in terms of low toxicity;
    • The PoE sub is less toxic than notoriously toxic subs (especially PvP games);
  • Unexpected things / My interpretations:
    • The sub doesn't appear to be getting more toxic over time on average;
    • The sub got more polarized since Delirium league (by polarized I mean very low toxicity right before release and then higher spikes with increased durations). I will leave the causes of this to be discussed;
    • Despite the discussions about toxicity, Harvest has low toxicity scores so far (the league is still going, so we might only see the effects later). I think what is prompting the recent discussions is the higher contrast between the sub before and after league release (especially Delirium), over a very short amount of time, plus the release of the new stash tabs. I actually expected to see a higher spike because of the new stash tabs (and also when the salvage box was released), but according to the data the discussions didn't seem to be much more toxic than average. Human fatigue might also be a factor: if you have been here riding the constant swings in toxicity for a long time, it can be exhausting;
    • The Betrayal league score just didn't seem to go down; my interpretation is that this is probably because its content affected non-SSD users and took time to fix. Partly because of this, after Synthesis league the mod team held another discussion with the community about being stricter on content that is considered toxic in the sub. The solution is still unclear to me and difficult, but I will give my personal thoughts at the end.

Limitations of this analysis:

  • The text analysis is limited to the first 128 characters (this is due to the neural network architecture and also computational constraints). If the text doesn't display toxicity in the first 128 characters, the sample will be classified as not toxic, and vice versa. I don't think this is a problem because it averages out. But you can tweak the analysis and see if you get different results with the code I provided.
  • If you investigate what the NN classifies as toxic, you will notice misclassifications, e.g. "Look at this fucking sick item I made" tends to be given a higher toxic score because of the swearing. But then again I think this averages out, because we are only looking for trends over time and comparisons between subs, not for a perfect accuracy score.
  • The NN can't capture toxicity in images (memes)
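To illustrate the first limitation in the list above, here is a minimal sketch of what the model gets to "see". It assumes the limit is applied as a plain cut at 128 characters, as described; the actual preprocessing in the repo may work differently, and `visible_text` is a hypothetical helper, not a function from the repo.

```python
MAX_CHARS = 128  # limit stated in the post


def visible_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Return the part of the text that would be analysed (assumed plain cut)."""
    return text[:max_chars]


# Made-up example: a long neutral opening followed by a rude ending.
neutral_opening = "I have been playing since closed beta and wanted to share some thoughts. " * 2
post = neutral_opening + "Anyway, whoever designed this is an idiot."

seen = visible_text(post)
print(len(post), len(seen))
print("idiot" in post, "idiot" in seen)
```

In this example the rude part sits past the cut, so only the neutral opening would be scored. Raising `max_chars` is the kind of tweak you could try when rerunning the analysis.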

* (Blue asterisk on the first chart): I investigated what happened on this spike (the date is 2019-10-02) and it was a wave of spam threads, apparently created by bots, related to porn websites.

Personal note:

I tried to leave my personal thoughts out of the analysis, but because I think I have a good suggestion for the problem, I will explain my point of view and give the suggestion at the end. If you are not interested, just skip it. This is a very difficult problem with no clear solution. In light of the mods removing posts that they consider to be toxic, there are obvious concerns, which I share, about "censoring" valid complaints about the game. This is a very difficult thing to balance. I don't want things to be censored, but I also don't want GGG to stop communicating with us in this sub.

In the 2019 discussion prompted by the mods at the time, which was very similar to this one, I was against removing toxic posts or comments and in favor of letting people do their job and downvote stuff. This thread specifically, I think, captures the essence of the problem (mods are human and have subjective views on what is toxic), and this is me playing devil's advocate with him (even though I agree with his point) to show that there is no easy solution (would you rather have GGG stop communicating here?).

Since then I noticed that my proposed solution of letting people downvote toxic comments doesn't work every time (it works most of the time, but not on specific occasions). I've seen multiple poorly worded toxic comments get a lot of upvotes, because other people must sense the same frustration, so they upvote, regardless, even if they would not word it the same way. This is even worse at league start because people enjoying the game are not checking reddit posts to downvote stuff. The people that are mostly on reddit and not playing the game are the people that are not satisfied. So someone makes a toxic post or comment, and it gets heavily upvoted, because the people sorting by new or reading all comments are mostly people not satisfied with the league. So one would argue that we just shrug it off and get used to it, because we see the same patterns every time. But I think something should be done if GGG stopped communicating through the sub or hinted that it would decrease communication. I think this might be the case as indicated in the mod post, saying GGG employees contacted them.

GGG employees are humans like us, so some of the stuff that gets posted can affect them personally, and we ought to have sympathy for them. Yes, GGG is a company and they are in it to make a profit, as they should in a healthy way. But that is no excuse to treat the employees differently than you would like to be treated (this happens more because of anonymity). Some people are so invested in this game that, when they are frustrated, they don't have the clarity of mind to calm down in the moment and just lash out, saying whatever they want however they want (unfortunately there is no way of fixing people's emotional control). But maybe there is a way to not let it escalate out of proportion?

So is there some middle ground? This is my suggestion, and I don't know if it's possible or practical (mods, please answer): if there was a way to keep a log of all the posts and comments removed, the community could police the mods (even if you need to develop something for this, ask the community, there are a lot of devs here). Otherwise the community won't know what is getting removed and can't decide whether the removal decisions are fair. Actually they can't be fair for everyone, because people have different definitions of what is toxic. But at least we can see what is getting removed and push back if we think valid criticism is being removed.

Also my suggestion for GGG: don't wait for things to blow up to address them; be very quick and assertive (I think this is getting better in recent leagues). In my opinion the community has been more polarized in the past leagues because people discovered that if a problem blows out of proportion, it's more likely to get addressed ("make those angry reddit threads", as jokingly said by Chris and taken literally by some). So people learned that when they want something, making a lot of noise gets attention, but that often brings heavy emotions out of people.


