Estimating the damage done by #copyright to #Wikipedia:

"While copyright governs the distribution of creative content in industries like publishing and computer software, its impact on creative reuse has largely evaded empirical analysis. I use the digitization of both copyrighted and non-copyrighted issues of one publication, Baseball Digest, to measure the impact of copyright on a prominent venue for reuse: Wikipedia. While the overall impact of digitization on reuse is positive, copyright hurts both the extent of reuse and the level of internet traffic to affected Wikipedia pages. The impact of copyright is more pronounced for images compared to text and becomes economically significant only post-digitization.

First, I identify a set of about five hundred baseball players nominated for election to the Baseball Hall of Fame and a similar number of comparable basketball players, each of whom was active between 1944 and 1984. For these players I collect data on their Wikipedia pages before and after the digitization event, including the number of images, the amount of text and internet traffic. The research design (see Figure 2) proceeds by comparing the change in content between the In-Copyright pages (player debut after 1964) and Out-of-Copyright pages (player debut before 1964) for both baseball and basketball players, before and after digitization. Estimating these three levels of differences provides me with an estimate of the impact of copyright on reuse in Wikipedia that is robust to a number of alternative explanations.
The results suggest that copyright significantly impacts the reuse of images but not text. Out-of-Copyright players are estimated to have about 93%-124% more images after digitization (depending on the specification) as compared to In-Copyright players. Second, I find that while overall levels of reuse increase substantially post-digitization, digitization disproportionately benefits Out-of-Copyright players. Instrumental variables estimation suggests that an increase in images for Out-of-Copyright pages is associated with about a 25% increase in traffic. In calculations reported in Section 4 I estimate that the lower bound of the loss to society from this diminished value of Wikipedia is on the order of $300,000 annually.

Google Books is a Google initiative that has as its objective the digitization of all books ever published. On 9th December 2008, Google Books announced that it would digitize magazines in addition to books, including all issues of “Baseball Digest” published between 1942 and 2008. Issues of the magazine published before 1964 are out of copyright, while those published after will retain copyright till 2019 (see section 6.1 for more details). Therefore, even though all digital issues of Baseball Digest were freely available to be read, only those published before 1964 could be legally reused.
Baseball Digest therefore represents a unique case where, depending on the publication date of the periodical (before or after 1964) and the date of access (before or after December 2008), the underlying material differs across both the nature of digitization and copyright. Digitized issues are easily accessed after December 2008 but not before; issues published before 1964 are out of copyright, while those published after remain under copyright.

Means of outcome variables by treatment and control groups are presented in Table 2. Analyzing the differences in the number of images for Out-of-Copyright and In-Copyright pages before digitization reveals that copyright has a small but significant impact on the use of images. While the difference in the number of images is close to zero for basketball players, the “before digitization” difference in the number of images for baseball players is about 0.093 images.
Table 2 also indicates that Out-of-Copyright baseball players have on average 0.303 images before digitization, which increases to 1.672 images after. Correspondingly, traffic also increases from 75.568 to 141.226 page-views per month. For In-Copyright players, however, the gains are more modest. Images increase from 0.21 images per page to 0.906 images per page, while traffic increases from 57.188 to 100.554 page-views per month. The differences are not so stark for basketball players. Out-of-Copyright basketball players experience a gain of 0.262 images as compared to a similar 0.186 images for In-Copyright basketball players. Gains in mean traffic differ somewhat between these groups, however, though standard errors are large. Out-of-Copyright basketball players gain about 67.317 page-views per month while In-Copyright players gain about 86.037 page-views per month. This suggests that more recent players are experiencing a greater increase in traffic as compared to older players over time.

Finally, I also consider the impact of copyright on text. The results indicate that the estimates for β4 in regressions that consider the reuse of text are slightly negative and not significant at the 10% level (Table 5), stressing that copyright does not seem to impact the reuse of text in the same way that it does images. Interviews with Wikipedia editors suggest that while it is easy to summarize text without violating copyright, doing so for images is hard. The intricacies of “fair use” law and the differing nature of the underlying media therefore seem to be one reason for this variation in the impact of copyright. Proportional models that use log-transformed dependent variables (Table B.3) seem to be in accordance with these results.

Ultimately, the evidence from these three tests taken together seems to suggest strongly that an important channel through which copyright affects internet traffic to Wikipedia pages is the reduced number of images on a given page caused by copyright.

The study reveals that copyright does seem to have a negative impact on digital reuse. A number of results of the study point to this conclusion. First, reuse of content on Wikipedia from Baseball Digest increases substantially post-digitization. When compared to basketball players, baseball players have a 157% increase in images and a 12.82% increase in the amount of text on their pages. Second, copyright reduces the level of reuse once digitized content is made available – players who make their debut after 1964 are less likely to benefit from digitization than are players who played before 1964. Third, the impact of copyright on reuse is particularly salient for the reuse of images, while text seems less affected. Out-of-Copyright players had on average a 93%-124% greater number of images after digitization as compared to In-Copyright players. Finally, copyright on Baseball Digest post-1964 is shown to have an impact on internet traffic. Out-of-Copyright player pages receive approximately 25% more visits per month as compared to In-Copyright players. I also find a positive correlation between the increase in the number of images and internet traffic to pages. Instrumental variables estimates suggest that the increase in images leads directly to higher levels of internet traffic on baseball players’ pages.
A back-of-the-envelope calculation suggests that a lower bound on the loss to social welfare from copyright is about $267,335 annually for Wikipedia. In order to arrive at this estimate two pieces of data are needed: (a) the approximate value of a page-view to society and (b) estimated page-views lost due to copyright. For piece (a) I use webindetail.com, which provides the estimated daily earnings of Wikipedia from potential advertising, which would equal $2.2 million daily for about 400 million daily page-views. This translates into a value to Wikipedia of about $0.0055 per page-view from advertising. For piece (b) results from this study suggest that for every missing image, a Wikipedia page receives about 33.1 fewer page-views per month (Table 6) and that pages have 0.598 fewer images on average (Table 5) due to copyright. Therefore, a page affected by copyright is expected to lose about $0.1088 per month. For the set of 319 pages affected by copyright in this study, this translates into an annual loss of about $416 or a net present value of about $20,800. Assuming that about 5% of all 4.1 million articles on Wikipedia are affected in a similar way, this translates into an annual loss of about $267,335 or a net present value of about $13.36 million. These estimates are economically significant for Wikipedia in the light of estimates of the economic value of Wikipedia itself, which is pegged at about $43.5 million per year (Greenstein, 2013). Further, these estimates represent only a lower bound on lost surplus because advertising rates capture only the valuation advertisers place on reader eyeballs and do not calculate value to readers, including the value of derivative works of Wikipedia pages."
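
The arithmetic in the excerpt can be retraced in a few lines of Python. This is only a rough sketch built from the figures quoted above, not the paper's own code: the function and constant names are made up for illustration, the basketball figures enter as gains because the excerpt does not quote their before/after means, and the 2% discount rate is an assumption inferred from the quoted ratio of net present value to annual loss ($20,800 / $416 = 50).

```python
# Mean images per page (before, after digitization), as quoted from Table 2.
IMAGES_BASEBALL_OUT = (0.303, 1.672)  # Out-of-Copyright baseball players
IMAGES_BASEBALL_IN = (0.21, 0.906)    # In-Copyright baseball players
# Only the gains are quoted for basketball players.
GAIN_BASKETBALL_OUT = 0.262
GAIN_BASKETBALL_IN = 0.186


def gain(before_after):
    """Change in a group mean from before to after digitization."""
    before, after = before_after
    return after - before


def triple_difference():
    """Raw difference-in-difference-in-differences of mean images per page."""
    baseball_dd = gain(IMAGES_BASEBALL_OUT) - gain(IMAGES_BASEBALL_IN)
    basketball_dd = GAIN_BASKETBALL_OUT - GAIN_BASKETBALL_IN
    return baseball_dd - basketball_dd


# Inputs to the back-of-the-envelope welfare estimate, as quoted.
DAILY_AD_VALUE = 2.2e6         # dollars per day (webindetail.com estimate)
DAILY_PAGE_VIEWS = 400e6       # page-views per day
VIEWS_PER_IMAGE = 33.1         # monthly page-views lost per missing image (Table 6)
IMAGES_LOST = 0.598            # fewer images per page due to copyright (Table 5)
SAMPLE_PAGES = 319             # affected pages in the study
ALL_ARTICLES = 4.1e6           # Wikipedia articles
SHARE_AFFECTED = 0.05          # assumed share of affected articles
DISCOUNT_RATE = 0.02           # ASSUMPTION: inferred from 20,800 / 416 = 50


def value_per_view():
    """Advertising value of one page-view, in dollars."""
    return DAILY_AD_VALUE / DAILY_PAGE_VIEWS


def monthly_loss_per_page():
    """Dollar loss per affected page per month."""
    return VIEWS_PER_IMAGE * IMAGES_LOST * value_per_view()


def annual_loss(pages):
    """Annual dollar loss across a given number of affected pages."""
    return monthly_loss_per_page() * 12 * pages


def net_present_value(annual):
    """Value of a perpetual annual loss at the assumed discount rate."""
    return annual / DISCOUNT_RATE


if __name__ == "__main__":
    print(f"triple difference in images: {triple_difference():.3f}")
    print(f"value per page-view: ${value_per_view():.4f}")
    print(f"monthly loss per page: ${monthly_loss_per_page():.4f}")
    print(f"annual loss, sample: ${annual_loss(SAMPLE_PAGES):,.2f}")
    wiki_pages = ALL_ARTICLES * SHARE_AFFECTED
    print(f"annual loss, Wikipedia-wide: ${annual_loss(wiki_pages):,.0f}")
```

Two things stand out. The raw triple difference from the Table 2 means comes to roughly 0.597 images, which lines up with the 0.598 figure quoted from Table 5. And the unrounded Wikipedia-wide loss comes out near $267,800 rather than $267,335; the quoted figure is reproduced exactly if the rounded $416 sample loss is scaled up instead (416 / 319 × 205,000).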