What issues do Indian Media talk about?
A data-driven approach using Times of India headlines data from 2001 to 2020.
Introduction
There has been a lot of talk about news outlets being biased and publishing certain stories for propaganda purposes.
I have my own view on this, and you might have a different one. But let’s look at what the data suggests.

I will use the Times of India headlines dataset available here. It has 3,297,172 headlines, along with each headline’s category and publication date. The dataset spans from 2001-01-01 to 2020-06-30.
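If you want to follow along, the first step is loading the headlines. The sketch below assumes the file is a CSV with the columns `publish_date` (formatted as `YYYYMMDD`), `headline_category` and `headline_text`. Those column names are an assumption about the dataset rather than something stated in this post, so check them against your copy. It uses only the Python standard library and parses a tiny made-up sample instead of the real file.

```python
import csv
import io
from datetime import datetime

# ASSUMPTION: the CSV has the columns publish_date (YYYYMMDD),
# headline_category and headline_text. Verify against your copy of the data.
def load_headlines(fileobj):
    """Parse the headlines CSV into a list of dicts with a real date object."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "date": datetime.strptime(row["publish_date"], "%Y%m%d").date(),
            "category": row["headline_category"],
            "text": row["headline_text"],
        })
    return rows

# Made-up sample in the assumed format (not real rows from the dataset).
sample = io.StringIO(
    "publish_date,headline_category,headline_text\n"
    "20010101,sports.cricket,India win toss\n"
    "20200630,india,Covid cases rise\n"
)
headlines = load_headlines(sample)
print(len(headlines), headlines[0]["date"], headlines[-1]["date"])
```

With the real file, you would pass something like `open(path, newline="", encoding="utf-8")` (where `path` is wherever you saved it) instead of the `StringIO` sample.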
Part I: News before and after the coronavirus?
As we all know, the news in 2020 has been dominated by COVID-19, as the graph below shows. But I was interested in what India was reading about before COVID-19.
Below you can see what India was reading about in 2019 vs 2020.
Since the dataset ends in June 2020, the number of headlines for 2020 (87k+) is lower than for 2019 (170k+). The 2020 figures cover only half a year and might not be representative.
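One simple way to compare a partial year with a full one is to divide each year’s headline count by the number of months it covers. Using the rounded figures above, that is roughly 87k / 6 ≈ 14.5k headlines per month for the first half of 2020 versus roughly 170k / 12 ≈ 14.2k per month for 2019. The minimal sketch below does this normalisation on made-up dates; the function name is hypothetical, and it only counts months that actually appear in the data.

```python
from collections import Counter
from datetime import date

def headlines_per_month(dates):
    """Return {year: average headlines per month}, counting only the
    months that actually appear in the data for that year."""
    per_year = Counter(d.year for d in dates)
    months_seen = {}
    for d in dates:
        months_seen.setdefault(d.year, set()).add(d.month)
    return {year: per_year[year] / len(months_seen[year]) for year in per_year}

# Made-up dates: 2019 has two headlines in every month, 2020 only January to June.
toy_dates = [date(2019, m, 1) for m in range(1, 13) for _ in range(2)]
toy_dates += [date(2020, m, 1) for m in range(1, 7) for _ in range(2)]

print(Counter(d.year for d in toy_dates))  # raw totals: 2020 looks half as busy
print(headlines_per_month(toy_dates))      # per-month rates are the same
```

The raw totals differ by a factor of two, but the per-month rates are equal, which is exactly the distortion the caveat above warns about.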


The graph above shows that in 2019 India had varied reading interests, including Bollywood, astrology, politics, crime, and cricket!
Since 2019 was an election year in India, much of the news was dominated by “Lok Sabha elections” and “PM Narendra Modi”. In 2020, on the other hand, the news has been dominated by the coronavirus.
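The post does not say exactly how the top topics per year were computed; they could come from the headline categories, from frequent phrases, or both. As one possibility, here is a sketch that ranks categories per year, assuming you already have `(year, category)` pairs (for example from the `headline_category` column, whose name is itself an assumption). The data below is made up.

```python
from collections import Counter, defaultdict

def top_categories_by_year(rows, k=3):
    """Return {year: [(category, count), ...]} with the k most common
    categories per year. `rows` is an iterable of (year, category) pairs."""
    counts = defaultdict(Counter)
    for year, category in rows:
        counts[year][category] += 1
    return {year: counter.most_common(k) for year, counter in counts.items()}

# Made-up (year, category) pairs, for illustration only.
toy_rows = (
    [(2019, "politics")] * 3 + [(2019, "cricket")] * 2 + [(2019, "astrology")]
    + [(2020, "coronavirus")] * 4 + [(2020, "politics")]
)
print(top_categories_by_year(toy_rows, k=2))
```

`Counter.most_common` breaks ties by first occurrence, so if two categories have the same count, their order depends on the order of the input rows.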
Part II: Sentiments about major political parties in India?
Much of the news in India is about politics, so I was interested in how much has been written about the country’s two biggest political parties and what sentiment that coverage conveys about them.
Yes, I am talking about BJP vs Congress.
In the entire dataset, BJP was mentioned 17,684 times and Congress 10,404 times, so BJP was mentioned about 1.7 times as often as Congress.
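The post does not specify what counts as a “mention”. The sketch below assumes it means a case-insensitive whole-word match in the headline text, counted once per headline; the pattern names and the function are hypothetical. It also checks the ratio quoted above: 17,684 / 10,404 ≈ 1.70.

```python
import re

# ASSUMPTION: a mention is a case-insensitive whole-word match, one per headline.
PARTY_PATTERNS = {
    "BJP": re.compile(r"\bbjp\b", re.IGNORECASE),
    "Congress": re.compile(r"\bcongress\b", re.IGNORECASE),
}

def count_mentions(headlines):
    """Count how many headlines mention each party at least once."""
    return {
        party: sum(1 for h in headlines if pattern.search(h))
        for party, pattern in PARTY_PATTERNS.items()
    }

# Made-up headlines, for illustration only.
toy_headlines = [
    "BJP wins seat",
    "Congress, BJP clash in assembly",
    "bjp-led alliance meets",
    "Congressional hearing ends",  # not matched: "congress" is not a whole word here
]
print(count_mentions(toy_headlines))
print(round(17684 / 10404, 2))  # ratio quoted in the text
```

A plain keyword match like this will also catch unrelated uses of “Congress” (for example, the US Congress), so the real counts may include some noise.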
The graph below shows the aggregated sentiment of headlines about the two parties across the entire dataset.

BJP has been mentioned more than Congress in all three sentiment categories: positive, negative, and neutral. Interestingly, for both parties, positive coverage outweighs negative coverage.
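The post does not say which sentiment tool was used; common choices for short texts include TextBlob’s polarity score and NLTK’s VADER analyzer. Whatever the scorer, the three categories come from thresholding a polarity score. The sketch below uses a tiny hypothetical lexicon (not a real sentiment library) so it runs offline, and a ±0.05 threshold, which mirrors the cut-off VADER’s documentation suggests for its compound score.

```python
import re
from collections import Counter

# HYPOTHETICAL toy lexicon, for illustration only; a real analysis would
# use a proper sentiment model or lexicon.
TOY_LEXICON = {"win": 1.0, "wins": 1.0, "praise": 1.0, "boost": 0.5,
               "loses": -1.0, "scam": -1.0, "slams": -0.5, "protest": -0.5}

def polarity(headline):
    """Average word score over the headline's lowercase word tokens."""
    words = re.findall(r"[a-z]+", headline.lower())
    if not words:
        return 0.0
    return sum(TOY_LEXICON.get(w, 0.0) for w in words) / len(words)

def label(score, threshold=0.05):
    """Map a polarity score to positive / negative / neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def sentiment_counts(headlines):
    return Counter(label(polarity(h)) for h in headlines)

toy = ["BJP wins bypoll", "Congress slams BJP over scam", "BJP meets today"]
print(sentiment_counts(toy))
```

Swapping in a real scorer only means replacing `polarity`; the labelling and counting stay the same.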
Now let’s look at how these sentiments change year over year. The graphs below show all three sentiment categories for Congress and BJP from 2001 to 2020.



We can see a high degree of correlation between the two parties for all three sentiments. BJP also has more positive and more negative headlines than Congress across the years. This suggests that BJP does get more media coverage, both positive and negative!
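The correlation above is read off the graphs. To put a number on it, one option is to build a yearly count series per party for a given sentiment and compute Pearson’s correlation between the two series. The sketch below assumes hypothetical `(year, party, sentiment)` records and uses made-up counts. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
from collections import Counter
from math import sqrt

def yearly_series(records, party, sentiment, years):
    """Count records per year for one (party, sentiment) pair.
    `records` is an iterable of (year, party, sentiment) tuples."""
    counts = Counter(records)
    return [counts[(year, party, sentiment)] for year in years]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Made-up yearly counts of positive headlines, for illustration only.
toy_counts = {
    (2017, "BJP", "positive"): 4, (2018, "BJP", "positive"): 6,
    (2019, "BJP", "positive"): 10, (2017, "Congress", "positive"): 2,
    (2018, "Congress", "positive"): 3, (2019, "Congress", "positive"): 5,
}
records = [key for key, n in toy_counts.items() for _ in range(n)]
years = [2017, 2018, 2019]
bjp = yearly_series(records, "BJP", "positive", years)
congress = yearly_series(records, "Congress", "positive", years)
print(bjp, congress, round(pearson(bjp, congress), 3))
```

Raw yearly counts for both parties may rise and fall together simply because the total volume of political headlines changes from year to year. Dividing each year’s counts by that year’s total before correlating is one way to check whether the relationship goes beyond overall volume.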
Part III: What were the political parties talking about in 2019?
As you might know, 2019 was an election year in India, so naturally there was a lot of news about politics. Let us see what BJP and Congress were talking about in 2019. BJP was covered in 1,152 articles and Congress in 840. No surprises here!


Obviously, both parties are talking about the Lok Sabha elections in 2019. Farm loans seem to be a key issue in the 2019 elections (as they are in most elections, in my opinion). Another interesting point is that articles featuring Congress mention Prime Minister Narendra Modi a lot, while Congress party leader Rahul Gandhi does not appear in the top 10 trigrams for BJP. We also see the verb “win” in the context of BJP, as BJP won the 2019 general elections in India.
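The post does not describe how headlines were tokenized or which stop words were removed before counting trigrams. The sketch below assumes lowercase word tokens, a tiny hypothetical stop-word list, and that a headline covers a party if the party name appears as a whole word. The function names and headlines are made up. Library alternatives include `nltk.util.ngrams` or scikit-learn’s `CountVectorizer(ngram_range=(3, 3))`.

```python
import re
from collections import Counter

# HYPOTHETICAL minimal stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "for", "on", "and", "is", "at"}

def tokens(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def top_trigrams(headlines, keyword, k=10):
    """Return (number of matching headlines, top-k trigrams) for headlines
    that contain `keyword` as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    matching = [h for h in headlines if pattern.search(h)]
    counts = Counter()
    for h in matching:
        words = tokens(h)
        counts.update(zip(words, words[1:], words[2:]))
    return len(matching), counts.most_common(k)

# Made-up 2019 headlines, for illustration only.
headlines_2019 = [
    "BJP to win Lok Sabha elections",
    "Congress attacks PM Narendra Modi",
    "Congress promises farm loan waiver",
    "Lok Sabha elections: Congress targets PM Narendra Modi",
]
n_congress, top_congress = top_trigrams(headlines_2019, "Congress", k=1)
print(n_congress, top_congress)
```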
Conclusion
In this article, we looked at what India reads about, using the Times of India news headlines dataset from 2001 to 2020.
- We found that before COVID-19 took over the news, India read a lot about politics, cricket, crime, and Bollywood. I was surprised to see astrology among the top results as well.
- Then we looked at sentiment analysis for the major political parties in India, BJP and Congress. It is noteworthy that BJP is always in the news, for both good and bad reasons.
- Additionally, we saw what both political parties were talking about in the 2019 general elections. Headlines about Congress frequently mention Prime Minister Narendra Modi. Also, farmers seem to be at the center of the 2019 elections, as they are at the moment in 2020.
The findings here are observational, not the result of a formal study. So the real question remains:
What issues do Indian Media talk about?
The code for the analysis is available on my GitHub here!
If you want me to answer specific questions based on this dataset, leave them in the comments below!