Go back to main download site
To download a corpus, select a corpus size (given in number of sentences) and download the corresponding data file.
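For readers who process these listings in a script, the size labels and rows can be read roughly as sketched below. This is only an illustration: the function names `size_to_sentences` and `parse_row` are hypothetical, and it assumes that "K" stands for one thousand sentences and "M" for one million, and that a row consists of a year, an optional country name, and then the size labels.

```python
import re

# Assumption: "K" means thousand and "M" means million sentences.
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}
_SIZE_RE = re.compile(r"^(\d+)([KM])$")


def size_to_sentences(label):
    """Convert a size label such as '300K' or '1M' to a sentence count."""
    match = _SIZE_RE.match(label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size label: {label!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]


def parse_row(row):
    """Split a listing row into (year, country or None, sentence counts).

    Hypothetical helper: assumes the row starts with a year and ends
    with one or more size labels; anything in between is the country.
    """
    tokens = row.split()
    year = int(tokens[0])
    sizes = []
    # Peel size labels off the end so multi-word countries stay intact.
    while len(tokens) > 1 and _SIZE_RE.match(tokens[-1].upper()):
        sizes.append(tokens.pop())
    sizes.reverse()
    country = " ".join(tokens[1:]) or None
    return year, country, [size_to_sentences(s) for s in sizes]
```

For example, `parse_row("2015 Soviet Union 10K 30K 100K 300K 1M")` keeps "Soviet Union" as a single country value, while a row without a country, such as `"2007 10K 30K 100K 300K 1M"`, yields `None` in that position.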
News
Year Country Downloads
2007 10K 30K 100K 300K 1M
2009 10K 30K 100K 300K 1M
2010 10K 30K 100K 300K 1M
2019 10K 30K 100K 300K 1M
2020 10K 30K 100K 300K 1M
2021 10K 30K 100K 300K 1M
2022 10K 30K 100K 300K 1M
Newscrawl-public
Year Country Downloads
2018 10K 30K 100K 300K 1M
Web
Year Country Downloads
2002 10K 30K 100K 300K 1M
2014 Kazakhstan 10K 30K 100K 300K 1M
2014 Lithuania 10K 30K 100K 300K 1M
2014 Moldova 10K 30K 100K 300K 1M
2015 Azerbaijan 10K 30K 100K 300K 1M
2015 Estonia 10K 30K 100K 300K 1M
2015 Latvia 10K 30K 100K 300K 1M
2015 Lithuania 10K 30K 100K 300K 1M
2015 Soviet Union 10K 30K 100K 300K 1M
2015 Tajikistan 10K 30K 100K 300K 1M
2015 Tuvalu 10K 30K 100K 300K 1M
2015 Ukraine 10K 30K 100K 300K 1M
2015 Uzbekistan 10K 30K 100K 300K 1M
2016 Lithuania 10K 30K 100K 300K 1M
2016 Moldova 10K 30K 100K 300K 1M
2016 Tajikistan 10K 30K 100K 300K 1M
2016 Uzbekistan 10K 30K 100K 300K 1M
2017 Georgia 10K 30K 100K 300K 1M
Web-public
Year Country Downloads
2019 Russia 10K 30K 100K 300K 1M
Wikipedia
Year Country Downloads
2007 10K 30K 100K 300K 1M
2010 10K 30K 100K 300K 1M
2014 10K 30K 100K 300K 1M
2016 10K 30K 100K 300K 1M
2021 10K 30K 100K 300K 1M