Analyzing OngoingWorlds posts
π
Jul 15, 2017
β 2 minutes
The previous post used Scrapy to extract post data from the website OngoingWorlds. Here are a few conclusions from that spider crawl:
I collected the game ID, post ID and date/time for each post from the play-by-email roleplaying community OngoingWorlds into an Sqlite3 database. Even with this very limited dataset, some interesting queries can be run:
Most popular games (by number of posts)
SELECT game_id, COUNT(*) FROM post GROUP BY game_id ORDER BY COUNT(*) DESC LIMIT 10;
Rank | Game | Total posts |
---|---|---|
1 | Blue Dwarf | 15040 |
2 | Hero High | 3453 |
3 | 2778 A.D. | 2894 |
4 | The Land of Ecilith | 2276 |
5 | The Avengers~Lower Levels | 2111 |
6 | Heroes Association | 1288 |
7 | Hunted | 1265 |
8 | Circle of Nine | 1176 |
9 | Fairy Tail ZERO | 1125 |
10 | MLP fans! | 1118 |
Most popular games (per hour of the day)
SELECT
game_id AS GameID,
(
SELECT strftime("%H", timestamp) AS Hour
FROM post AS inner
WHERE inner.game_id = outer.game_id
GROUP BY Hour
ORDER BY COUNT(*) DESC
LIMIT 1) AS MostPopularHour,
COUNT(*) TotalPosts
FROM post AS outer
GROUP BY GameID
ORDER BY MostPopularHour, TotalPosts DESC
For easier viewing I exported the result to a CSV spreadsheet, as follows:
sqlite3 -header -csv results.db 'SELECT game_id AS GameID, (SELECT strftime("%H", timestamp) AS Hour FROM post AS inner WHERE inner.game_id = outer.game_id GROUP BY Hour ORDER BY COUNT(*) DESC LIMIT 1) AS MostPopularHour, COUNT(*) TotalPosts FROM post AS outer GROUP BY GameID ORDER BY MostPopularHour, TotalPosts DESC' > results-agg.csv