A corpus of newswire stories recently made available by Reuters, Ltd. Details about the collection and how to obtain it can be found at the Reuters home page for corpora. There is also a mailing list for discussions about the collection. Along with Yiming Yang, Tony Rose, and Fan Li, I have written a JMLR paper that describes the collection and defines a corrected version of it, RCV1-v2. Two formatted versions of RCV1-v2, along with other useful files, are available as online appendices to that paper.
Return to home page for David D. Lewis