Threat Intelligence Blog

Posted March 17, 2010

Docstoc is an online document sharing service that lets users upload files such as Microsoft Office documents, text files, and PDF files and share them with the wider internet community. Launched in November 2007, the service has become very popular as a way to find and distribute content in those formats and now hosts more than 13 million documents.

In May 2009 Docstoc launched DocCash, announced as “a service where users can now make money by uploading documents to Docstoc”. Under the DocCash program, users are paid a share of the Google AdSense earnings generated when the documents they uploaded are viewed. The service expressly prohibits uploading and sharing documents whose copyright the user does not own, and it will remove content and even ban users who violate this policy when violations are brought to its attention. However, paying users per view creates an environment ripe for copyright abuse, one that is far too easy and inviting for anyone looking to make a quick dollar.

Take, for example, the following Docstoc user profiles, all publicly available. For each account, we noted the number of documents uploaded and the time elapsed between the first and last upload, both as of this writing.

Example 1: profile of user svhUSER1
Number of documents: 4,033
Time between first and last file uploaded: less than 24 hours. All files uploaded on March 3, 2010.

Example 2: profile of user scwUSER2
Number of documents: 3,683
Time between first and last file uploaded: less than 24 hours. All files uploaded on March 7, 2010.

Example 3: profile of user mnyUSER3
Number of documents: 4,283
Time between first and last file uploaded: less than 24 hours. All files uploaded on March 7, 2010.

Example 4: profile of user hilUSER4
Number of documents: 17,142
Time between first and last file uploaded: 34 days, from November 26, 2009 to December 30, 2009.

While it is remotely possible, it is very unlikely that the owners of these accounts hold the copyright to such large amounts of content. More likely, they scraped search engine results pages for queries like filetype:doc or filetype:pdf and then used Docstoc’s API to upload the files automatically, which would explain how so much content was posted so quickly.
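The upload volumes above translate into sustained rates that are hard to reach by hand. The short Python sketch below does that arithmetic for Examples 1 to 3, treating “less than 24 hours” as a 24-hour upper bound. The `Account` class and the 60-uploads-per-hour cutoff are hypothetical illustrations, not a Docstoc or Cyveillance metric.

```python
from dataclasses import dataclass


@dataclass
class Account:
    label: str
    documents: int
    max_hours: float  # upper bound on time between first and last upload

    def min_uploads_per_hour(self) -> float:
        # If every upload fits inside max_hours, the average rate is at least this.
        return self.documents / self.max_hours

    def max_seconds_per_upload(self) -> float:
        # The average gap between consecutive uploads can be no longer than this.
        return self.max_hours * 3600 / self.documents


# Figures from the profiles above; "less than 24 hours" is treated as 24 hours.
ACCOUNTS = [
    Account("Example 1", 4033, 24),
    Account("Example 2", 3683, 24),
    Account("Example 3", 4283, 24),
]

# Hypothetical cutoff: sustained rates above this are assumed to indicate scripted uploads.
AUTOMATION_THRESHOLD_PER_HOUR = 60


def looks_automated(account: Account, threshold: float = AUTOMATION_THRESHOLD_PER_HOUR) -> bool:
    return account.min_uploads_per_hour() > threshold


if __name__ == "__main__":
    for acct in ACCOUNTS:
        print(
            f"{acct.label}: >= {acct.min_uploads_per_hour():.0f} uploads/hour, "
            f"one every <= {acct.max_seconds_per_upload():.1f} s, "
            f"automated={looks_automated(acct)}"
        )
```

For Example 1 this works out to at least 168 uploads per hour, or one document roughly every 21 seconds, sustained for a full day.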

In fact, Cyveillance has found a significant number of documents posted to Docstoc that carry copyright notices naming parties other than the account owners. It is critical for brand and copyright owners to vigorously protect their intellectual property and, when infringements are identified, to pursue the offenders. Otherwise, they risk not only their brand equity but also the potential loss of copyright protection as their content becomes public domain.
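One way to look for such documents is to pull the copyright notices out of a document’s text and compare the named owners with the uploading account. The Python sketch below assumes the document text has already been extracted; its regular expression covers only a few common notice forms (“Copyright”, “(c)” or “©” followed by a year and a name), and the function names are hypothetical.

```python
import re

# Illustrative pattern only; real-world copyright notices vary far more widely.
NOTICE_RE = re.compile(
    r"(?:copyright|\(c\)|©)[\s©]*"   # "Copyright", "(c)" or "©", optionally followed by another ©
    r"(\d{4}(?:\s*-\s*\d{4})?)"       # a year or year range, e.g. 2009 or 2008-2010
    r"[\s,]+([^\n.;]+)",               # owner name, up to the next period, semicolon or newline
    re.IGNORECASE,
)


def extract_notices(text: str) -> list[tuple[str, str]]:
    """Return (year, owner) pairs for every copyright notice found in text."""
    notices = []
    for match in NOTICE_RE.finditer(text):
        # Drop a trailing "all rights reserved" that shares the owner's sentence.
        owner = re.split(r",?\s*all rights reserved", match.group(2), flags=re.IGNORECASE)[0]
        notices.append((match.group(1), owner.strip()))
    return notices


def foreign_notices(text: str, uploader_names: list[str]) -> list[tuple[str, str]]:
    """Return notices whose owner does not mention any of the uploader's known names."""
    names = [name.lower() for name in uploader_names]
    return [
        (year, owner)
        for year, owner in extract_notices(text)
        if not any(name in owner.lower() for name in names)
    ]
```

A non-empty result from `foreign_notices` only marks a document for human review; a notice naming another party does not by itself prove infringement.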

In the following two examples, the account owners again attempt to earn money by uploading vast amounts of content to the site. In these cases, however, the account owners appear to have scraped content from different sources across the web, stitched small fragments together into meaningless paragraphs on a single topic, and uploaded the results to Docstoc as rich text files. The spammers are likely hoping that this esoteric content, although of little or no value, will draw traffic from long-tail search queries.
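Stitched spam of this kind can, in principle, be spotted with w-shingling: break a document into overlapping runs of k words and measure how much of it reappears in known web pages. The Python sketch below assumes you already have the text of candidate source pages; k = 5 and the three thresholds are illustrative assumptions that would need tuning on real data, and the function names are hypothetical.

```python
def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def source_coverage(document: str, sources: dict[str, str], k: int = 5) -> dict[str, float]:
    """Fraction of the document's shingles that also appear in each source."""
    doc = shingles(document, k)
    if not doc:
        return {}
    return {name: len(doc & shingles(src, k)) / len(doc) for name, src in sources.items()}


def looks_stitched(
    document: str,
    sources: dict[str, str],
    k: int = 5,
    min_share: float = 0.1,    # a source must supply at least this share to count
    min_sources: int = 2,      # stitched text draws on several sources
    min_total: float = 0.6,    # and most of the document is borrowed overall
) -> bool:
    doc = shingles(document, k)
    if not doc:
        return False
    covered: set[tuple[str, ...]] = set()
    contributing = 0
    for src in sources.values():
        overlap = doc & shingles(src, k)
        if len(overlap) / len(doc) >= min_share:
            contributing += 1
        covered |= overlap
    return contributing >= min_sources and len(covered) / len(doc) >= min_total
```

Requiring several contributing sources separates stitched text from a straightforward copy of a single page, which the copyright-notice approach is better suited to catch.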

Example 5: profile of user serUSER5
Number of documents: 64,166
Time between first and last file uploaded: 6 days, from March 9, 2010 to March 15, 2010.

Example 6: profile of user qeeUSER6
Number of documents: 2,510
Time between first and last file uploaded: about 20 days, from February 25, 2010 to the date of this writing (March 17, 2010).

As with youtube.com, blogspot.com, and other sites where users can add content, spam and the display of copyrighted content are problems. The situation is made even worse when users are paid to upload content. Like those services, Docstoc has come of age, and it has a responsibility to provide an environment that clearly discourages copyright abuse and to take strong steps to ensure that the content its users upload does not violate its own policies. Otherwise, it risks becoming known as a passive accomplice in copyright abuse and spam generation.

To reduce the chance that content which should not be made public is copied from a company’s website and posted by others on services like Docstoc, Cyveillance recommends that companies regularly check that neither their sensitive internal documents nor their public but copyrighted documents have been posted online by others, including their vendors, partners, or employees. As we advise our own customers, brand and copyright owners need to take an aggressive posture in protecting themselves; otherwise, the value of their investments is diminished.
