Self-archiving FAQs


1. Preservation

"I worry about self-archiving because archived eprints may not continue to exist or to be accessible in perpetuum on-line, the way they were on-paper."

This worry is misplaced. It is not really a worry about self-archiving at all, but about the online medium itself. As such, it needs to be directed toward the primary database in question, which is the toll-access refereed journal literature, currently in the hands of publishers and libraries, and most of it already in both paper and digital form. That is the official version of record. If you are worried about the preservation of the online version, it is to its publishers and subscribing/licensing librarians that your worry needs to be addressed. The preprints and postprints that are being self-archived by their authors in their institutional eprint archives today are intended to maximize impact by providing immediate open access; they are merely open-access supplements to that toll-based primary literature at this time, not substitutes for it.


To put even this misdirected worry into perspective, we must remember that print-on-paper is not permanent either. The only relevant parameter is the probability of future access. The on-paper probability, such as it is, is achieved by generating (a) multiple copies that are (b) geographically distributed (c) in a (relatively) robust medium and can be made (d) visible to the human eye.


All four of these properties can be achieved (and have been) on-line too, and the resulting preservation probability can be made as good as, or even better than, the current probability on-paper. That should be the end of the story: once this concern is no longer grounded in actual, objective probabilities, but only in prior habits and attendant intuitions, we are talking about biases and superstitions, not about actual risks.
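The role of multiple, distributed copies in this probability can be illustrated with a minimal sketch. It assumes, purely for illustration, that each copy is lost independently with the same probability over some period; the function names and the figures used are hypothetical and do not come from any preservation study.

```python
def prob_all_copies_lost(p_single_loss: float, n_copies: int) -> float:
    """Probability that every one of n copies is lost.

    Assumes losses are independent and equally likely for each copy,
    which is a simplification (a single disaster can hit several copies).
    """
    if not 0.0 <= p_single_loss <= 1.0:
        raise ValueError("p_single_loss must be between 0 and 1")
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    return p_single_loss ** n_copies


def prob_future_access(p_single_loss: float, n_copies: int) -> float:
    """Probability that at least one copy survives."""
    return 1.0 - prob_all_copies_lost(p_single_loss, n_copies)
```

Under these assumptions, adding copies raises the probability of future access regardless of whether the copies are on paper or on-line; what matters is how many there are and how independent their risks are.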


There are a few side issues: People worry about global power-failures, or global dictatorships. They should remind themselves that these are matters of probability too, and have their equivalents in paper.

People also, by analogy with current unreadable documents in obsolete word-processors or peripherals, worry about whether the digital code, even if preserved, will always be accessible and visible to the eye. The answer is again probability: The reason print-on-paper has been faithfully preserved across generations (when it has been) is that the literate world's collective interests were vested in ensuring that it should do so. This same continuity of collective interests will exist for the digital corpus too, for the same reasons, except that digital code will be much easier to keep migrating to every successive new technology than print on-paper to every successive building or regime ever was. (And there is always the option for those who are still not confident enough in the technology, despite the odds, of printing out hard copies as back-up: Indeed, that is a good way to put the magnitude of one's preservation worries to the test: Who will still feel the need to keep hard copies, and of how much of the corpus, once it's all on-line and accessible to everyone, everywhere, at all times?)


In short, setting up active preservation programs implemented by digital librarians is indeed important and necessary; but it would be completely irrational to interpret the need for robust preservation programs as a reason for any hesitation or delay whatsoever about proceeding with self-archiving right now -- a fortiori, because, for the time being, self-archiving is merely a supplement to, not a substitute for, the existing modes of preservation, on paper and online. If and when the day should ever come when primary journal publishers decide to downsize and become peer-review service-providers only, cutting costs by offloading the access and archiving burden entirely onto the network of institutional archives, then that institutional network will be quite ready, willing and able to take over the distributed digital preservation burden for its collective research legacy. But that time is not now, hence this worry (about self-archiving now) is misplaced.


2. Authentication

"I worry about self-archiving because you can never be sure whether you are reading the definitive version of an eprint on-line, the way you can be sure on-paper."

Again, the rational way to put this into context and proportion is to remind ourselves that the authenticity of an on-paper version is just a matter of probability too, and that the very same factors that maximize that probability on-paper can maximize it on-line too. Indeed, if we wish, we can make both the probability and the verifiability of authenticity on-line much higher than they currently are on-paper through techniques such as public hash/time-stamping and encryption.
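As a rough illustration of what hash/time-stamping involves, here is a minimal sketch using Python's standard hashlib module. The function names and the record layout are hypothetical. Note also that a locally generated time-stamp proves nothing by itself: the sketch assumes the digest would be published or lodged with an independent party, which is what makes it verifiable.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a draft as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def make_record(data: bytes, when: Optional[datetime] = None) -> dict:
    """Pair a draft's digest with the time it was recorded (hypothetical layout)."""
    when = when or datetime.now(timezone.utc)
    return {"sha256": fingerprint(data), "recorded_at": when.isoformat()}


def matches(data: bytes, record: dict) -> bool:
    """Check whether a copy is byte-for-byte the draft that was recorded."""
    return fingerprint(data) == record["sha256"]
```

A reader holding the published record can then check any copy of the eprint: a single altered character changes the digest, so the check fails.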

Nor should the authentication issue be confused with the issue of Peer Review (7) or Journal Certification (5) (separate questions), nor with the question of "Version Control (23)": There will be self-archived preprints, revised drafts, final accepted drafts (postprints; but not necessarily the publisher's PDF), updated, corrected post-postprints, peer comments, author replies, revised second editions. In all of this, the refereed, accepted final draft is one crucial "milestone," but not the only one, in the embryology of knowledge (and not even always the best one).


And last, some of the "authentication" worries arise from conflating self-archiving and self-publication. To say it in longhand: The main objective of the self-archiving initiative is the freeing of the refereed drafts from access/impact barriers. The refereed draft has already been "authenticated" by the journal that peer-reviewed it. Do not confuse that authentication with some worry you may have about whether this self-archived draft is indeed what the author purports it to be. The only thing the author is "self-certifying" in this case is that this is indeed the journal-certified final draft. There is of course always a possibility that it is not the journal-certified final draft; but that was also true when the author sent you an on-paper reprint. The probabilities can, as usual, be tightened to make them as high as we feel comfortable with in either case (especially with institution-CV-based self-archiving). And, as in the case of preservation, self-archiving is at this stage merely a supplement, not a substitute for existing forms of authentication. (Eprints, however, should always contain a link to the DOI of the publisher's official version.)


So, again, there are no rational authentication concerns at all to deter us from self-archiving immediately.


3. Corruption

"I worry about self-archiving because eprints can be altered or otherwise corrupted on-line in ways they could not be corrupted on-paper."

If the "authentication" worry (2) is the worry about "self-corruption" by the author who has self-archived his own paper, this second "corruption" worry is about "allo-corruption" by parties other than the author.

Again, the answer is that simple and effective means are available to ensure that an on-line draft is uncorrupted with as high a probability as we feel we need. So this too is a non-problem. (Nor should it, again, be conflated with self-publication issues, which are irrelevant to the self-archiving of refereed, journal-published papers.) Whatever level of incorruptibility we feel we need, we can have it for self-archived papers too.

Consequently, corruptibility worries provide no rational basis at all for deterring us from self-archiving immediately.


4. Navigation (info-glut)

"I worry about self-archiving because there is already too much to read, and it is already too hard to navigate it on paper; adding eprints will just make this situation even worse."

This worry deserves even less space than the others. It is incontestable that the information glut (see http://www.sims.berkeley.edu/how-much-info/summary.html) is far more navigable and manageable on-line than on-paper.


The primary objective of self-archiving is to free the refereed journal literature from impact-blocking access-tolls on-line. That literature is already being published on-paper. (If you think it should not be, it is with the journals and their referees that you need to take issue, not with self-archiving or the on-line medium!) When it is all accessible toll-free on-line, there is no need for anyone to feel any more (or less) obliged to read the refereed literature than they did on-paper. Keeping it either off-line or toll-based is certainly no cure for the information glut (if there is one); it merely makes the existing access-tolls the arbitrary arbiters of whether or not one reads something, rather than the reader's own rational judgement. (And unrefereed preprints can of course always be ignored altogether, if the reader wishes, on-line just as on-paper.)

In short, no rational deterrent at all to immediate self-archiving from concerns about navigation or information glut.


5. Certification

"I worry about self-archiving because papers are not certified on-line, the way they are in a journal on-paper."

This worry is again based on conflating publication and archiving: The journal publisher (and referees) provide the certification; the archive merely provides access. The author, in self-archiving, "self-certifies" his refereed, published draft as indeed being the self-same draft that the journal refereed and published (and certified). And this being the case is, as usual, a matter of probability, whether on-line or on-paper. And that probability can be made as high as we feel we need.

Again, no rational deterrent to immediate self-archiving in the certification worry.


6. Evaluation

"I worry about self-archiving because there is no evaluative process on-line as there is on-paper."

Again, a conflation of publishing and archiving: Journal editors and their referees evaluate drafts and revisions, and if/when they are satisfied that their journal's quality standards have been met, they certify the final draft as having met them (peer review). The author self-archives the peer-reviewed postprints (and unrefereed preprints, and perhaps revised post-postprints), tagging them correspondingly. We can decide how high a probability we need that the peer-reviewed draft is indeed the peer-reviewed draft, but that is not the problem of evaluation, but just the question of Authentication (2) again.

So there is no rational deterrent to immediate self-archiving anywhere in the evaluation worry.


7. Peer review

"I worry about self-archiving because on-line eprints are not refereed, as they are on-paper: What will become of peer review?"

Again, a conflation of publishing and archiving, as well as of preprints and postprints: The author self-archives both pre-refereeing preprints and refereed postprints (etc.), and each is clearly tagged as such. The peer review continues to be performed by the referees, as it always was. Peer review is medium-independent, and self-archiving in no way alters the peer review system.


Part of the impetus for the groundless worry that self-archiving or open access are somehow at odds with peer review comes from "peer-review reformers," who have somehow managed to link their completely independent reform agenda to the open-access agenda (probably because of a misinterpretation of the implications of self-archiving the unrefereed preprint).


Those who wish to reform or replace peer review first need to go out and test their alternatives, to demonstrate whether or not they will generate a literature of a quality, reliability, and usability at least equal to the one we have now. But meanwhile, self-archiving is about providing open access to the peer-reviewed literature we have now, such as it is, to free it from access-tolls, not from peer review.



The Invisible Hand of Peer Review.


Peer Review Reform Hypothesis-Testing http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0480.html

A Note of Caution About "Reforming the System" http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1170.html

Self-Selected Vetting vs. Peer Review: Supplement or Substitute? http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2341.html

No rational deterrent to immediate self-archiving in the peer-review worry.


8. Paying the piper

"I worry about self-archiving because someone surely has to pay for all this: you can't get something for nothing!"

There are many fallacies embedded in this worry, among them misunderstandings about the nature of global networked communication. Internet connectivity is now a standard part of the infrastructure of most of the world's universities and research institutions. If you are not equally worried about who pays for your emails, websites, and web-browsing, you should not be worrying about your self-archiving either. Moreover, paying access-tolls is not paying the pertinent piper here anyway! (I.e., it is not publishers who are paying for universities' network infrastructure!)


The refereed research literature is minuscule compared to the rest of the traffic on the Web. It is the flea on the tail of the dog. Worry about the storage and bandwidth for the growing daily creation and use of audio, video, and multimedia (most of it non-research use!) by researchers at universities and research institutions before even beginning to fret about the refereed flea.


As usual, there is also some of the archiving/publishing conflation here, thinking that we must find some sort of counterpart for the printing/distribution costs, somewhere. But there isn't any. The cost per-paper of permanent online archiving is virtually zero, yet everyone, everywhere, has access to it all, forever. This is a Gutenberg expense that has simply vanished in the PostGutenberg Galaxy, leaving only the Cheshire Cat's Grin.

There is indeed one essential publishing cost that still needs to be paid, but it has nothing to do with Internet use: It is the cost of implementing peer review. That cost, however, is only 10-30% of the access-tolls currently being paid, and hence could easily be paid out of the annual toll savings.


The last of the "who-pays-the-piper" worries is, I think, a variant of the Capitalism (14) worry. The best way to dispel it is to note that refereed publishing in the PostGutenberg Galaxy, once the literature has been freed through self-archiving, is likely (apart from whatever optional add-on products and services there may still be a market for) to downsize into a service (peer review), provided to the author-institution, instead of the toll-based product (the text) that was provided to the reader-institution in the Gutenberg era.

Nothing hinges on this, however, for as long as the world wants to keep paying for the toll-based product, even after the refereed literature has been self-archived, the piper will be fully paid, yet the literature will be free of all its access/impact barriers.

No rational deterrent to immediate self-archiving in the who-pays-the-piper worries.


9. Downsizing

"I worry about self-archiving because it may force journal publishers to shrink to a non-sustainable size, and then where would we be?"

No one can predict with certainty the evolutionary path that scientific/scholarly journal publishing will take once toll-free online access to the entire refereed corpus provided by author/institution self-archiving has prevailed. The toll-based market for the on-paper version, for the publisher's official on-line version or for other options may continue indefinitely, or it might shrink but re-stabilize at a lower level, or it might disappear altogether -- and this could happen relatively slowly or relatively quickly.


It is not clear in advance which of the current established journal publishers will want to continue doing what, under what conditions. The bottom line is that the only remaining essential service will be peer review. If and when that is the only service for which there remains a market, either current toll-access journal publishers will be able and willing to downsize to that new open-access journal niche, or they will terminate their journal operations, in which case their titles (that is, each journal's editor, editorial board, referees, and authorship) will simply migrate to new on-line-only open-access journal publishers who stand ready to adapt to the new niche [e.g., the Institute of Physics's New Journal of Physics, Public Library of Science, and BioMed Central]. Because self-archiving is distributed, gradual and anarchic, rather than growing locally, suddenly, and systematically, journal by journal, the evolution will be gradual rather than abrupt, leaving plenty of time to adjust to a leveraged transition.

No rational deterrent to immediate self-archiving in worries about publisher downsizing.


10. Copyright

"I worry about self-archiving because it is illegal, it violates copyright agreements, and can jeopardize my career and livelihood."

Please see the sections on copyright and on legal ways to self-archive despite restrictive copyright transfer agreements.

In brief, over 90% of journals already officially support self-archiving, and among those who do not yet support it, many will agree to author self-archiving if the author asks; and for those that still don't, self-archiving the preprint before submission and a "corrigenda" file after acceptance is sufficient, and completely legal. What career and livelihood depend on are peer review and research impact, and all self-archiving authors continue to enjoy both; neither one needs to be sacrificed for the other.


Ironically, open access is also being held back by those well-meaning advocates who think that open access is dependent upon or equivalent to copyright reform, with authors needing to retain copyright. (This is as incorrect and counterproductive as the belief that open access requires or entails peer-review reform, or that the only way to attain open access is through a transition to open-access ("golden") journal publishing.)

Another alternative is to provide "almost-OA" by depositing embargoed articles as Closed Access instead of Open Access. Institutional Repositories can all implement the semi-automatic

No rational deterrent to immediate self-archiving in copyright worries.


11. Plagiarism

"I worry about self-archiving because it is so much easier to steal someone else's text on-line, and publish it as one's own, than it is to do so on-paper."

This is again a matter of probability: Yes, "it is much easier to steal someone else's text on-line, and publish it as one's own, than it is to do so on-paper," but it is also much easier to detect such thefts on-line; and it is possible to do both (steal and detect) on-paper too.


Depending on how important we find it to do so, we can make escape from detection so improbable on-line that it becomes harder to plagiarize on-line than on-paper. It is not clear, however, whether it is even all that important to do so. Worries about plagiarism are usually based on the archiving/publishing conflation: Once one's findings have been refereed and published, it is hard for anyone else to derive any benefit from them at the expense of the author (the peer-reviewed version settles all subsequent authorship disputes).
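To give a sense of why detection is easier on-line, here is a minimal sketch of one common approach to spotting copied text: comparing overlapping word sequences ("shingles") between two documents. This is an illustrative technique only, not a method described in this FAQ; the function names and the choice of three-word shingles are assumptions.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences in a text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A high overlap score between a newly posted text and an earlier archived one flags the pair for a human to inspect; such a comparison across a whole corpus is practical on-line in a way it never was on-paper.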


Pre-refereeing preprints are another story; they are dealt with partly in the prior discussion of Authentication (2), and partly under Priority (12), below.


For refereed postprints, however, refraining from self-archiving them because of worries about plagiarism would be no more rational than refraining from publishing them on-paper in the first place, for the very same reason.

No rational deterrent to immediate self-archiving in plagiarism worries.


12. Priority

"I worry about self-archiving because one cannot establish priority on-line as one can on-paper."

Establishing priority is again a matter of probability, but it can readily be made much more definitive and reliable (and earlier) on-line than on-paper if we wish. See Authentication (2). More important, for the all-important refereed postprints, priority has already been established by publishing them, and the self-archiving is merely to maximize access and impact.

No rational deterrent to immediate self-archiving in priority worries.


13. Censorship

"I worry about self-archiving because censors could decide what can and cannot appear on-line."

This worry too is probably based in part on the usual archiving/publishing conflation (casting the Web and the Archive in the role of a Publisher who refuses to publish your work).

It is true that one's on-line literary goods are at the mercy of the archives and archivists. But one's analog on-paper literary goods were likewise at the mercy of the libraries. They could have chosen to "censor" our work too.

Again, it is just a matter of deciding how tight we wish to make the probabilities in this medium. Mirroring, caching/harvesting and distributed coding already go some way toward taking it out of any potentially sinister local hands. And for refereed, accepted postprints, this argument (against enhancing their access) makes no sense at all.

No rational deterrent to immediate self-archiving in worries about censorship.


14. Capitalism

"I worry about self-archiving because access-tolls are hallmarks of capitalism, market economics, supply and demand, free enterprise. Give-aways smack either of socialism, or market interference, or non-sustainability."

This too is merely a superstition. There are plenty of perfectly capitalistic precedents for give-aways, advertising being the most prominent one. If the thought of advertisers curtailing the potential impact of their ads by charging potential customers for access to them makes no sense, then it makes just as little sense to curtail the potential impact of research findings by charging potential users for access to them.


Nor is there any market interference in self-archiving one's own refereed research: If institutions and individuals want to pay for access-tolls to the on-paper version, or the publisher's official PDF, or further options, they can still do so; but there is no longer any need or justification for continuing to hold the essentials (the peer-reviewed draft) hostage to those toll-based options in the PostGutenberg era, any more than there was any need or justification for continuing to hold the essentials of long-distance communication hostage to postal transport costs in the era of telephony. (Rather than capitalism being under assault from self-archiving, trying to prevent researchers from benefiting from this new, more efficient and economical way of disseminating and maximizing the impact of their refereed research smacks of protectionism.)


Two variants on the capitalism-worry arise from scepticism about the eventual transition from providing a toll-based product to the reader-institution to providing a peer-review service to the author-institution. Note that, strictly speaking, it is not even necessary to answer these worries, as this eventual transition is hypothetical, whereas freeing the refereed literature now through self-archiving is not; but here are replies anyway:

Question 1: "Won't paying directly for the peer review service lead to inflated peer-review costs by the most prestigious journals?"

Question 2: "Won't peer-review revenues lower standards, so that lower-quality work is accepted in order to get more peer-review revenue?"

The answer to both is similar: Referees referee for free, and journal quality and prestige (and impact) depend on rejection rates. Trying to inflate revenue by lowering acceptance thresholds simply lowers quality, thereby favoring the competition, with its higher standards. This is a built-in counterweight. Likewise for raising peer-review rates: As referees referee for free, there is no reason one journal should charge more than another, and if they do, they risk driving not only the authors but also the unpaid referees to the competition, because the competitive commodity in this anomalous give-away domain is quality, and nothing else.


A proposal has occasionally been voiced to preserve access-toll-barriers by buying authors off from self-archiving, by offering to share the revenue with them (royalty payments). But the trade-off between imprint-income and impact-income is so disproportionate for this anomalous domain that there is not faintly enough money available to make (refereed-research) authors prefer sacrificing their potential impact in exchange.

No rational deterrent to immediate self-archiving in worries about capitalism.


15. Readability

"I worry about self-archiving because it is inconvenient to read texts on screen, and hard on the eyes. It is also not suitable for bed, beach or bathroom reading."

At the moment it is undeniable that for extended, discursive reading, on-paper is still preferable to on-line. This will no doubt change, but even now it is no reason whatsoever for not self-archiving. First, a large proportion of the scientific and scholarly use of the refereed research literature consists of browsing and searching, not linear reading, and for this, on-line navigation is already incomparably superior. Second, there is still that vast potential readership to consider, whose access to your research in any form is currently blocked by unaffordable access tolls (Odlyzko 1999a, 1999b; http://www.arl.org/stats/index.html); for that entire disenfranchised population, it's either online or not at all. And last, even for linear reading, the archived version can always be printed off.

No rational deterrent to immediate self-archiving in worries about readability.


16. Graphics

"I worry about self-archiving because on-line graphics have coarser resolution than on-paper and require too much storage capacity and transmission time."

Graphics too will no doubt improve. With a few exceptions, such as fine arts and histology, digital graphics are already good enough. Users can always decide whether or not they feel they need to access the deluxe hard copy; no need to make a pre-emptive decision on their behalf, as the on-line version is in any case a supplement, not a substitute, for the time being. And graphics are quite a natural test-bed to see whether there is still any market left for any toll-based add-ons. In many cases, web illustrations are already considerably better than paper, with the potential for higher resolution and greater dynamic range, especially as links. This is particularly true for illustrations in fields where the data are collected digitally in the first place, such as Astronomy.

No rational deterrent to immediate self-archiving in worries about graphics.


17. Publishers' future

"I worry about self-archiving because of what it might do to journal publishers' future."

See the replies about Paying the Piper (8), Downsizing (9), and Capitalism (14), but note that this is all speculation and hypothesis, on both sides: If and when it should ever become necessary to do so -- it is not yet clear whether and when it will be necessary and all evidence to date is to the contrary -- then those journal publishers who are willing and able to cut inessential costs and downsize to a new open-access journal-publishing niche will be able to do so in a leveraged transition. In cases where they are not willing or able, new online-only open-access journal publishers [e.g., the Institute of Physics's New Journal of Physics, Public Library of Science, BioMed Central] will stand ready to take over the titles. The remaining peer-review service costs per submitted paper can be paid for by the author-institution out of 10-30% of its annual 100% access toll-savings. And refereed journal publication is only a small portion of publication, most of the rest of which, being non-give-away, will proceed on-line much the way it does on-paper.

No rational deterrent to immediate self-archiving in worries about publishers' future.


18. Libraries'/Librarians' future

"I worry about self-archiving because of what it might do to libraries' and librarians' future."

The refereed serials literature is all going on-line anyway, irrespective of the speed or success of the self-archiving initiative. If this requires restructuring of some librarian skills and functions, this will take place in any case. Some have thought that managing digital serials collections will fill the gap, but it is not clear how much management those will need, apart from paying the annual access toll-bills! Author/Institution Eprint Archives, on the other hand, will call for more digital librarian skills, in everything from helping researchers to do the self-archiving, to maintaining the institution's Eprint Archive and seeing to its continued interoperability with the rest of the world's Eprint Archives, its upgrading, and its preservation.


Moreover, in implementing and maintaining the institutional Eprint Archives, Libraries will be investing in the solution of their serials crisis. Of the 100% annual access-toll budget that this can potentially save, after 10-30% of it has been redirected to cover author-institution peer-review costs, the remaining 70-90% can be used to fund other librarians' activities, including the purchase of non-give-away materials such as books (whether on-paper or on-line).
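The arithmetic behind these percentages can be spelled out in a minimal sketch. The 10-30% peer-review share is the estimate given in this FAQ; the function name and the budget figure in the usage note are hypothetical and chosen only to make the split concrete.

```python
from typing import Tuple


def split_toll_budget(annual_tolls: float, peer_review_share: float) -> Tuple[float, float]:
    """Split an annual access-toll budget into (peer-review cost, remaining savings).

    peer_review_share is the fraction of the toll budget redirected to
    author-institution peer-review costs (estimated above at 0.10 to 0.30).
    """
    if annual_tolls < 0:
        raise ValueError("annual_tolls must be non-negative")
    if not 0.0 <= peer_review_share <= 1.0:
        raise ValueError("peer_review_share must be between 0 and 1")
    peer_review = annual_tolls * peer_review_share
    return peer_review, annual_tolls - peer_review
```

For a hypothetical library paying 1,000,000 a year in access tolls, split_toll_budget(1_000_000, 0.10) and split_toll_budget(1_000_000, 0.30) bracket the outcome: between 100,000 and 300,000 redirected to peer review, leaving between 900,000 and 700,000 for other library activities.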

No rational deterrent to immediate self-archiving in worries about libraries'/librarians' future.


19. Learned Societies' future

"I worry about self-archiving because of what it might do to Learned Societies' future."

Learned Societies are potential allies in and beneficiaries of the self-archiving initiative. First, they are us. Whatever is good for research, and for research impact, is therefore also good for Learned Societies.

But many of them are also journal publishers, and hence may one day be facing downsizing pains. Unlike commercial publishers, however, their first and last allegiance will of course be to research and researchers, that is, us. We will hear rationalizations about needing the access-toll revenues to fund "good works" such as meetings, scholarships and lobbying. But it will quickly become evident that, on the one hand, some of these good works are not essentials either, and certainly nothing that we would want to sacrifice research impact for; and, on the other hand, the subset of these good works that really is essential (e.g., meetings) will prove able to fund itself in other ways, rather than needing to be subsidized at the expense of research impact. (Imagine explicitly asking the society membership, once the causal connection between access and impact becomes common knowledge: "Are you willing to continue subsidizing your society's good works with your own lost research impact, by forgoing open access and letting toll-access continue to decide who can and cannot use your [give-away] research?")


Learned Societies (and perhaps also University Presses) are also natural candidates for taking over the serials titles of commercial journal publishers who prefer to discontinue journal operations rather than scale down to just becoming peer-review service providers.

No rational deterrent to immediate self-archiving in worries about Learned Societies' future.


20. University conspiracy

"I worry about self-archiving because I worry that universities may have other plans for their researchers' writings, such as Eprint Archive Access-Tolls."

This worry seems to be based on some (one hopes) over-suspicious views about university administrators and their motives.


We should not forget that the give-away refereed literature is esoteric, with virtually no "market" per paper. So whereas there might be a basis for suspicion about what our hard-pressed universities might like to do if they could get their hands on our exoteric, non-give-away work (royalty-bearing books and textbooks), there's not much they could do to squeeze revenue out of our no-market, give-away refereed research reports even if they wanted to. On the contrary, our universities, like ourselves, co-benefit far more from the potential impact-income of our research output -- maximized by removing all access-barriers -- than from any potential imprint-income that could be squeezed out of it by in effect co-opting the "P" from the publishers' S/L/P (Subscription/License/Pay-Per-View) access-tolls and using it to charge institutional archive access-tolls.

Moreover, our universities' potential access-toll savings, and relief from their serials crises, are completely dependent on freeing access to our research. Any sign of university-levied archive-access tolls would simply serve to keep the current access tolls in place (simply changing the hand on the udder of the toll-based cash-cow).

No rational deterrent to immediate self-archiving in worries about University conspiracy.


21. Serendipity

"I worry about self-archiving because of those lucky happenstances that happen only when browsing index cards, library shelves, and journal contents."

This worry, despite its charm, does not deserve much space: With time, it will become evident that on-screen digital searching and browsing can be every bit as serendipitous as on-paper analog searching and browsing; chance adjacency effects are every bit as potent either way. The searching and browsing will simply be less exhausting to the limbs and fingers.

No rational deterrent to immediate self-archiving in worries about loss of serendipity.


22. Tenure/Promotion

"I worry about self-archiving because it does not count as refereed publication, and might even interfere with the chances for refereed publication."

Yet another instance of the archiving/publishing conflation: The self-archiving initiative is aimed at freeing refereed publication from toll-based access/impact barriers (not from refereeing). Unrefereed preprints do not count as publications on-line any more than they do on-paper.

The other half of this worry is probably a variant of the Copyright (10) concerns (q.v.) as well as concerns about Embargo policies (Harnad 2000a, 2000b), both of which are groundless.

No rational deterrent to immediate self-archiving in worries about tenure/promotion.


23. Version control

"I worry about self-archiving because there may be many versions and there is no way to be sure which is which, and whether it is the right one."

There will be self-archived preprints, revised drafts, final accepted drafts (postprints, but not necessarily the publisher's proprietary PDF), updated, corrected post-postprints, peer comments, author replies, and revised second editions. OAI-compliant Eprint Archives will tag each version with a unique identifier. All versions will be retrieved by a cross-archive OAI search, and the "hits" can then be identified and compared by the user to select the most recent, official or definitive draft, exactly as if they had all been found in the same index catalogue.
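
As a minimal sketch of that comparison step, assume each hit from a cross-archive search carries an identifier, a version label and a deposit date. The field names and sample records below are hypothetical illustrations, not taken from the OAI protocol or from any particular archive software.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record shape: these field names are assumptions made for
# illustration, not part of the OAI protocol or any archive software.
@dataclass(frozen=True)
class EprintVersion:
    identifier: str   # unique identifier assigned by the archive
    stage: str        # e.g. "preprint", "postprint", "corrected postprint"
    deposited: date   # date on which this version was deposited

def most_recent(versions):
    """Return the version with the latest deposit date."""
    return max(versions, key=lambda v: v.deposited)

def refereed_only(versions):
    """Keep only versions produced after peer review (anything but a preprint)."""
    return [v for v in versions if v.stage != "preprint"]

# Invented sample hits for a single paper.
hits = [
    EprintVersion("oai:example.org:101", "preprint", date(2002, 1, 15)),
    EprintVersion("oai:example.org:102", "postprint", date(2002, 9, 3)),
    EprintVersion("oai:example.org:103", "corrected postprint", date(2003, 2, 20)),
]

latest = most_recent(hits)
print(latest.identifier, latest.stage)
```

Whether "most recent" or "first refereed" is the right rule depends on what the user needs; the point is only that tagged versions can be sorted and filtered mechanically.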


24. Napster

"I worry about self-archiving because it seems to be stealing, like Napster or Gnutella."

Author-end give-aways of their own digital products via self-archiving are the antithesis of consumer-end rip-offs of others' non-give-away digital products via Napster (www.napster.com) or Gnutella (gnutella.wego.com).

It is very important to clearly distinguish and distance the two, because any inadvertent or willful conflation of the self-archiving initiative with Napster can only retard the progress of the self-archiving initiative toward the optimal and inevitable. ("Information is free" is nonsense: There is and always was both give-away and non-give-away information. Steal the latter and you simply kill the incentive to provide it in the first place.)


25. Mark-Up

"I worry about self-archiving because it would jeopardize proper mark-up."

Mark-up (the tagging of all functional parts of a document, such as titles, headings, sections, figures, tables, paragraphs, and any other potentially identifiable and manipulable sub-parts) is becoming increasingly important in digital documents. The most general mark-up "language" is called SGML, and the subset of SGML that has been provisionally adopted for digital documents on the web is called XML. Most authors today use either Word, PDF, HTML, or TEX to create and render their documents. The documents thus produced do not have markup that is rich enough or flexible enough to allow important functions such as reference linking, flexible re-formatting, and reliable, intact migration to future formats for permanent preservation. This richer markup is currently provided by publishers; it must be done by hand and is therefore costly.


Hence an Eprint archive of documents self-archived without XML markup is only a short-term archive. A long-term archive requires the rich markup provided by publishers. But if present-day user preference for the free open-access documents prevents publishers from being able to recover their markup costs, will both the benefits of markup and the long-term functionality of the archived documents be lost?


The solution to this problem is the following:

(1) For now, self-archiving is not a substitute for what publishers do and provide, but a supplement to it, providing a parallel open-access version of the peer-reviewed text for any user whose institution cannot afford access to the publisher's toll-access version. The publisher's marked-up version will have more functionality, for those who can afford to pay for it, but the peer-reviewed full-text will at last be accessible to everyone, already maximizing its research impact today. This is the immediate short-term goal of self-archiving.

(2) Once the short-term goal of open access is attained, several alternative sequels become possible, and no one yet knows which of them will actually take place. The two main alternatives are:

(a) Nothing else changes. The self-archived version is accessible to all would-be users for free, and the publisher's marked-up version continues to be accessible only to those who can afford to pay. The publisher's revenues continue to pay for the mark-up, and its benefits are reserved for those who can afford to pay for them, as before, but the full-text without the markup (in WORD, HTML, PDF, or TEX) is available to everyone else. It should be clear that if (a) is the eventual outcome, then that is no reason to hold us back from immediate self-archiving, as we have everything to gain from it (maximized access), and nothing to lose. The status quo continues, in parallel, along with the immediate effects of open access.


There is another possibility, however, and perhaps a more likely one:

(b) User preference for the open-access version reduces demand for the publisher's marked-up version to such an extent that its costs can no longer be covered from access tolls as they had been in the past. How is markup to be provided and paid for now?


If (b) is the eventual outcome, then because open-access will prevail, the cost-recovery can no longer be on the reader/institution end, in the form of access tolls. However, the reader/institutions also happen to be the author/institutions. Hence they are in a position to redirect a portion of their annual windfall toll savings to cover the remaining essential costs per outgoing paper rather than per incoming paper, as now. The collective cost currently paid by all subscribing institutions combined averages $1500 per incoming paper. If all subscribing institutions instead get back their portions of these costs, then the ~$500 per paper cost of peer review can easily be paid out of these annual windfall savings, with plenty of savings to spare. The cost per-paper of physical archiving is negligible: How much would markup cost, per paper, over and above peer review?


No one knows exactly, yet, but it is likely that a good deal of the task of markup can be offloaded onto the authors, just as digital text preparation has been, with the development of user-friendly XML markup tools. WORD will soon generate automatic XML versions, just as it now generates automatic HTML (and they will no doubt be equally inadequate, needing to be supplemented by some windows-based hand-manipulation by the author). But overall, it is likely that the pressure of necessity will inspire more and more effective and easy-to-use author-based markup capability.

The pressure of necessity that drives these adaptive changes, however, will come from the existence of the free open-access version. So markup concerns provide no reason to hold us back from immediate self-archiving.
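
The per-paper arithmetic behind outcome (b) can be sketched as follows. The two dollar figures are the ones quoted above; the function name is illustrative, and the calculation assumes that the quoted averages apply uniformly across papers.

```python
# Figures quoted in the text above (US dollars per paper).
TOLL_COST_PER_PAPER = 1500        # collective tolls currently paid per incoming paper
PEER_REVIEW_COST_PER_PAPER = 500  # approximate cost of peer review per outgoing paper

def windfall_left_for_markup(toll_cost, peer_review_cost):
    """Savings per paper that remain once peer review has been paid for."""
    return toll_cost - peer_review_cost

remaining = windfall_left_for_markup(TOLL_COST_PER_PAPER, PEER_REVIEW_COST_PER_PAPER)
print(remaining)  # 1000
```

On these figures, any markup cost below about $1000 per paper could still be covered out of the toll savings.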


26. Classification

"I worry about self-archiving because we would first need a subject classification system."

There are (at least) two ways to think of University Digital Archives, both of them important and valid, but definitely not the same:


(1) The University Digital Archive as the university digital library -- or, more specifically, the university digital library for all of the university's own scholarly, scientific and pedagogic output. (This includes journal articles, books, teaching materials, and any other digital content the university produces and wishes to include in its digital output.) See SPARC's position paper on institutional repositories and MIT's DSpace. There is no question but that a rigorous system of classification and tagging -- to make such a total university digital output navigable and integrable and interoperable with corresponding digital output from other universities in similar University Digital Archives -- would be extremely important to have, indeed a prerequisite for the usefulness and usability of such a university digital output library.


(2) The University Eprint Archive as a means of providing open access to all of the university's peer-reviewed research output (before and after peer review). Almost without exception, this is the work that also appears in the peer-reviewed journals sooner or later (indeed, that is how it gets peer-reviewed).


It should be clear that (2) is a very special subset of (1). But it should be equally clear that that special subset does not have any particular or pressing classification problem! These are not books. They are journal articles. Our journal articles are not indexed in our university library card catalogues (only the journals in which they appear are). When we want to search the journal literature, we do not look to any university classification system: we go to indexing services such as INSPEC, MEDLINE, ISI, etc. (Those do have their own classification systems, but it is unlikely that any of those classifications could out-perform google-style boolean search on an inverted full-text index, especially if aided by citation-frequency-based, hit-based, recency-based, or relevance-based ranking of search output, as done, for example, by citebase).
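
To make the phrase "boolean search on an inverted full-text index" concrete, here is a minimal sketch. The three-paper corpus and its citation counts are invented for illustration, and the ranking shown (by citation count alone) is just one of the options mentioned above; it is not a description of how google or citebase actually work.

```python
from collections import defaultdict

# Toy corpus: texts and citation counts are invented for illustration.
papers = {
    1: ("open access and research impact", 40),
    2: ("peer review in online journals", 12),
    3: ("research impact of open archives", 25),
}

def build_index(docs):
    """Map each word to the set of paper ids whose text contains it."""
    index = defaultdict(set)
    for paper_id, (text, _citations) in docs.items():
        for word in text.lower().split():
            index[word].add(paper_id)
    return index

def search_all(index, docs, *terms):
    """Boolean AND over the terms, ranked by citation count (highest first)."""
    if not terms:
        return []
    matches = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return sorted(matches, key=lambda pid: docs[pid][1], reverse=True)

index = build_index(papers)
print(search_all(index, papers, "research", "impact"))  # [1, 3]
```

No subject classification is consulted at any point: the words of the full text and a ranking signal are enough to return relevant hits.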

It is important to make it crystal clear that the peer-reviewed research corpus -- and those University Eprint Archives for which that particular corpus is the main target literature at this time -- do not have a classification problem, and need not and should not wait for any solution to any classification problem before getting on with the infinitely more pressing task of getting themselves filled with their university's research output -- so that they can at last start plugging the chronic leak in its potential impact!


Agenda (1) (the university digital output library) is very important and worth pursuing; it is also an extremely valuable collaborator to agenda (2) (open access to peer-reviewed research through institutional self-archiving) -- but only if the two agendas facilitate rather than restrain one another (as any implication that agenda (2) has classification problems to solve would most definitely do).


27. Secrecy

"I worry about self-archiving because it would compromise the secrecy of patents and sponsored research."

Self-archiving is only for research results one wishes to make public, just as publishing is. Whatever one does not wish to publish, one does not self-archive. (Eprint Archives also have the option of depositing a text for internal use only, not accessible to the public, if/when this is judged useful.)


28. Affordability

"I worry about self-archiving because it will interfere with making toll-access more affordable."

The immediate purpose of self-archiving is to maximize research impact, not to make toll-access more affordable. Research impact has been unavoidably lost (by research and researchers) since the beginning of refereed research publication because of the high costs of providing paper access. The online medium now makes it possible for them to put an end to this cumulative impact loss. Of course, universally affordable toll-access would have the same effect (if it were truly universal -- i.e., if the universities of all potential users of all refereed research could afford to access it all). It would be splendid if journal publishers could provide universally affordable toll-access, and they are certainly encouraged to work toward doing so. But in the meanwhile, it is quite understandable that today's researchers prefer not to wait (for when and if universally affordable toll-access arrives). They will self-archive to maximize their research impact now (while they are still alive and compos mentis).


Some may think the competition to the toll-access version from the open-access version will keep toll-access less affordable; some may think it will have the opposite effect, encouraging cost-cutting and downsizing to the essentials, making it more affordable. If the price of the value-added toll-access version becomes affordable enough, and the demand for its added-value is sufficient to sustain the market, then it is demand for the open-access version that will shrink, and along with it the incentive to self-archive, for the universal affordability will make any further impact loss negligible.

That is not where we are right now, however, and researchers would be rather foolish to wait patiently to see how things may or may not eventually turn out, continuing to renounce their potential daily impact even today, when that is no longer necessary.


29. Sitting Pretty

"I don't worry about self-archiving because there is really no problem: My institution gives me all the access and impact I want or need already. I'm satisfied!"


If a researcher -- especially a researcher at a well-off institution -- does not exercise some critical reflection, the natural feeling is: "Where's the problem? I and others at my institution were already well-off in paper days. Now, in the online era we are even better off, with desktop online access to everything, instead of having to walk to the library, and with licensing 'big deals' that get us even more journals than we used to have!"


This is related in part to the "Harvards vs. Have-Nots" misconception (http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/3178.html). It is also a symptom of not having understood the causal connection between access and impact. Yes, the better-off institutions enjoy better access to the peer-reviewed journal literature than the less-well-off institutions (and better access than they had in paper-days). But no institution can afford toll-access to all or even most of the 25,000 peer-reviewed journals that exist. And most institutions can afford toll-access to only a small and shrinking portion of them (http://www.arl.org/stats/index.html). And even the Harvards (not only the Have-Nots) are groaning under their growing serials-budget expenditures. So no researcher, at any institution, has access to more than a fraction of what there is. And usage patterns in those lucky fields where open online access is most advanced show that when everything is accessible and a keystroke is the only barrier, users make vastly more use of the literature (http://cfa-www.harvard.edu/~kurtz/jasis-abstract.html).


So much for access. The other side of the coin is even more important: Researchers at prestigious institutions will also say that they only write for one another. But they don't really mean it. All researchers are interested in their research impact (citation counts), not only because that is one of the things that advances their careers and funds their further research, but also because it is a measure of the size and importance of their contribution to knowledge. Few researchers are aware -- because the data on the strong causal connection between access and impact are new and still being gathered -- of the size of their own and their institution's cumulative daily, weekly, monthly, and yearly impact-loss owing to access-denial to those would-be users world-wide whose institutions cannot afford the toll-access to their work (http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0025.gif).

In this equation, the Harvards are losing almost as much as the Have-Nots, because they are losing the potential impact from the users at the Have-Not institutions, which vastly outnumber the Harvards! Yes, the Harvards may be somewhat better off in their own access to the research output of others; but the following is just as true of them as it is of the Have-Nots: For every one of the 2,500,000 articles published annually in the 25,000 research journals it is a fact that it is not accessible to most of its potential users because of unaffordable toll-barriers. And (this too is critical): this would remain true even if all 25,000 journals were sold at cost.

It remains only to point out to the researchers who think they are sitting pretty today exactly how big their cumulating daily, weekly, monthly, and yearly impact loss is, as long as they delay making their research output open-access by self-archiving it. Estimates like the Kurtz study above show that this needless impact loss is substantial in terms of download impact. According to the most widely cited study of citation impact it is 336% (http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0006.gif).


30. Rechanneling toll-savings

"I worry about self-archiving because if the toll system collapses, there will be no way for my institution to rechannel its library toll-savings from buying in the peer-reviewed research output of other institutions to paying journals for the peer-reviewing of our own institutional research output."

Where there is a will, there is a way. Necessity is the Mother of Invention.


31. Waiting for Gold

"I worry about self-archiving because open-access journals are the only stable solution."

There are two roads to open access to the 2,500,000 articles appearing yearly in the planet's 25,000 peer-reviewed journals. The "golden road" is to create or convert 25,000 open access journals. The "green road" is for authors to self-archive the articles they publish in the toll-access journals. The golden road awaits the creation or conversion of the remaining 21,500 journals, one by one. (About 3500 open-access journals have been created since 1991: http://www.doaj.org/.) The green road awaits only self-archiving. The roads are both worth taking and complementary, but the golden road is long, slow, and uncertain, whereas the green road is short, fast, and already proven (already providing three times as much open access yearly, and stably ongoing since 1991). The optimal open-access strategy is hence a dual one:

(1: GOLD) Publish your articles in an open-access journal whenever a suitable one exists today (currently 3500, <15%)


(2: GREEN) Publish the rest of your articles in the toll-access journal of your choice (currently 21,500, >85%) and also self-archive them in your institutional open-access eprint archives.
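
The percentages quoted in (1) and (2) follow directly from the journal counts given above; a quick check, using only those counts (the variable names are illustrative):

```python
# Journal counts quoted in the text above.
TOTAL_JOURNALS = 25_000
OPEN_ACCESS_JOURNALS = 3_500

toll_access_journals = TOTAL_JOURNALS - OPEN_ACCESS_JOURNALS
gold_share = 100 * OPEN_ACCESS_JOURNALS / TOTAL_JOURNALS   # percent of journals that are gold
green_share = 100 * toll_access_journals / TOTAL_JOURNALS  # percent still toll-access

print(toll_access_journals, gold_share, green_share)  # 21500 14.0 86.0
```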

It has to be clearly understood that (1) the library's serials crisis and its attendant journal pricing/affordability problem is not the same as (2) the researchers' article access/impact problem. OA self-archiving (green) solves (2) but not (1). If and only if the stable end-game for journal publishing in the online age is indeed destined to be gold (and no one today knows whether or not that is so), then the green road is also the fastest and surest way to get us there. See Publishers' future (17). But there is no certainty that gold is indeed the destined end-game, and publishers know this (that is why 92% of journals are already green). Hence it is irrational for librarians or researchers to resist or delay self-archiving -- with its certain capacity to generate 100% OA once it is mandated, leading to its certain benefits for research -- for the reason that it is merely green and not gold, and does not necessarily lead to gold!


32. Poisoned Apple

"I worry about self-archiving even if the journal gives me the green light to do so, because if I do, the light may change to red."

Over ninety percent of journals have already given their green light to author self-archiving for at least six reasons. Here are those six reasons, in approximate order of priority:


(1) OA is Optimal and Inevitable for Research and Researchers. Open Access (OA) is clearly on the way. Its benefits to research and researchers -- in terms of enhanced research usage and impact -- are demonstrated and undeniable. Its progress is unstoppable. Going green is a natural way for research journal publishers to show support for OA and confirm that they are not in conflict with what is in the best interests of research and researchers. Opposing OA today is becoming increasingly bad public relations for journal publishers.


(2) Green is a Hedge Against Gold. At the same time, the risks of converting to OA journal publishing ("gold") are still considerable: There are still uncertainties about who will pay and with what, and how much it should cost and for what. The OA cost-recovery model has not yet been tested long, and only by about 5% of journals. Hence going green is a rational hedge against pressure to go gold: "If authors want OA so badly, let them show it by providing it for themselves, with our green light and blessing, rather than pressuring us to make all the sacrifices, and take all the risk upon ourselves."


(3) The Risk of Going Green is Low: There are physics journals that have been effectively green since 1991, and some of their contents have been 100% OA through self-archiving for years now, yet their subscription revenues have not eroded. The American Physical Society (APS) was the first green publisher. One physics journal (JHEP), born gold (subsidised), even converted back to green, successfully, by migrating to a green publisher (IOP).


(4) If/When It Ever Came To That, Green Would Allow Publishers a Gradual Leveraged Transition to Gold. OA growth by author self-archiving is gradual and anarchic, article by article, rather than sudden and all-or-none, journal by journal. It gives journal publishers time to adapt to OA. If and when there should ever be a transition to gold, a prior green preparatory phase will allow this to be a stable leveraged transition rather than an abrupt and catastrophic one. (Equally important, the very same user-institution subscription/license cancellations that would drive the transition to gold, if/when they occurred, would at the same time generate the annual author-institution windfall savings that would then cover the institutional costs for author-institution-end (outgoing) payment for publication in place of the current user-institution-end (incoming) payment for subscription.)


(5) OA Enhances Journal Impact Too. Enhanced impact not only benefits research, as well as authors and their careers, but it benefits journals too, as the journal impact factor (which helps sell journals) is the average of its articles' individual impacts.


(6) Research Institutions and Funders Are Already Beginning to Mandate Self-Archiving. See ROARMAP for the over 40 Green OA self-archiving mandates already adopted worldwide.


In the light of these 6 reasons for publishers to go green now, it is hard to imagine why anyone would dream that authors taking publishers up on their green light today, by going ahead and self-archiving -- thereby generating still more OA, and still more demand for and reliance upon OA -- would make it easier rather than harder for any journal not to be green than it is today, when 92% are already green. On the contrary, authors failing to go ahead and self-archive even now that the publisher's light is green would give opponents of OA strong grounds for arguing that the research community does not need or want OA as much as it purports to after all, and hence that there is no real call for either green or gold!


33. IRs: DL or OA?

"I don't bother with research self-archiving because it's just a small subcomponent of the Institutional Digital Library of the future, and not the first or highest priority ."

Reaching 100% Open Access is an urgent, specific, focussed priority for research institutions. The purpose is to put an end to the needless loss of institutional research impact, productivity and progress that occurs just because researchers at other institutions who wish to access, use, apply, and build upon any given institution's research output cannot all do so, because their own institution cannot afford the journal in which the research happens to be published.

The library community can play an invaluable role in solving this problem if it stays focussed on the solution, which is to fill their own institution's dedicated Open Access Institutional Repositories (OA IRs) with 100% of their own annual institutional research output (primarily journal articles and theses) as quickly as possible.


If libraries instead merely subordinate OA self-archiving to the far more diffuse and open-ended target of generic Digital Library (DL) archiving, then the library community will have missed its historic opportunity to contribute to the OA solution. University provosts and pro-vice-chancellors are, naturally enough, turning to their libraries in order to implement their OA IRs. It is up to the libraries to stay focussed on the OA target rather than letting it dissipate in digital diffuseness. Once they have achieved 100% annual self-archiving for their own institutional research output, they can progress naturally in other digital directions. An institution can even create two IRs -- one of them an OA IR, focussed on and dedicated specifically to OA content (research articles and theses), the other a generic DL IR for all other kinds of digital content -- as long as the OA IR is given full priority until it has reached its annual 100%. Other DL contents will only benefit from the expertise and direction gained from the successful convergence on the specific OA target content.


The individual efforts of dedicated and focussed librarians at CERN, Queensland University of Technology, and University of Minho, for example, have created the most successful OA IRs to date, providing models for institutions worldwide to emulate. If institutions instead adopt the diffuse, generic DL model for their IRs, they will, like Leacock's horseman, jump on their institutional steeds and ride off in all digital directions, getting nowhere fast.


How the library community proceeds at this crucial juncture will determine what causal role it will have played in the historic transition to OA. Although researchers began OA self-archiving earlier, the library community will get uncontested historic credit for having sounded the clarion call -- for what eventually became OA -- in alerting the academic community to the serials crisis. This gave birth to SPARC, which was initially dedicated only to using libraries' consortial bargaining power to try to drive journal prices down into a more affordable range; SPARC at first did not promote OA self-archiving; but once its attention was drawn to it, SPARC joined the OA movement in 2001. Although it first lent more support to OA publishing ("gold"), SPARC and the world research library community eventually also began to put their weight behind OA self-archiving, supporting it in two ways:


(1) Individual institutional libraries and librarians began explicitly promoting and supporting OA self-archiving (e.g., CalTech, St. Andrews) and


(2) SPARC put its official support behind Institutional Repositories (IRs).

But the problem now is that library community support for OA IRs has also become conflated with an entirely different agenda for IRs: the generic Digital Library (DL) agenda, whose first priority is not maximising the usage and impact of institutional research output, but the curation and preservation of all manner of institutional digital content: incoming/outgoing, research/teaching/archival, OA/non-OA, text/non-text. This diffuse and open-ended agenda now risks blunting and dissipating the potential impact of the library-based OA IR movement, sending it off instead in all digital directions for years to come. One can only hope that the library community will realise that providing immediate access to its own institutional research output is far more urgent at this historic juncture than curating and preserving generic digital content.


34. Priorities

"I haven't the time to self-archive: I already have more to do than I can manage."

(a) Self-archiving takes less than 10 minutes per paper. (How many papers does one write per year?)

(b) Self-archiving can be delegated (to library staff, to students, to clerical help).

(c) The benefits of self-archiving -- in terms of impact, impact income, and research progress -- probably make the slight extra time per paper as well spent as the time writing the paper itself.

(d) Two international, interdisciplinary JISC author surveys have found that 95% of authors will find the time to self-archive if required by their institutions or research funders. The results from the institutions that have already mandated self-archiving confirm this.

A variant on this question concerns self-archiving mandates -- by researchers' funders and universities -- that are increasingly being adopted, proposed and petitioned for:

If OA is so beneficial for research and researchers, and they want it so much, then why don't they self-archive spontaneously, without the need for mandates?

(1) Self-archiving mandates are no more (nor less) a matter of coercion than publish-or-perish mandates:

(2) Is it in researchers' interests to publish their findings? Yes. Can they be relied upon to do so without publish-or-perish? No.

(3) Why don't more researchers self-archive spontaneously? The reasons (in no particular order) are:

(3a) Unawareness of the possibility of self-archiving

(3b) Unawareness of the benefits of self-archiving

(3c) Worries that self-archiving might be illegal

(3d) Worries that self-archiving might reduce one's chances of getting published

(3e) Worries that self-archiving means abandoning peer review

(3f) Worries that self-archiving is technically hard to do

(3g) Worries that self-archiving is time-consuming

(3i) Laziness

(3j) and dozens of other worries

(4) But not only do Alma Swan's international, interdisciplinary author surveys show that 95% of authors will nevertheless comply with self-archiving mandates (over 80% of them willingly), but Arthur Sale's actual analyses of the success rate of mandated institutional repositories, compared to unmandated ones, fully bear out the results of the surveys: Self-archiving mandates work, just as publish-or-perish mandates (and public smoking bans, and seat-belt regulations) work.

Swan, A. (2006) The culture of Open Access: researchers' views and responses, in Jacobs, N., Eds. Open Access: Key Strategic, Technical and Economic Aspects, chapter 7. Chandos.

Sale, A. (2006) The Impact of Mandatory Policies on ETD Acquisition. D-Lib Magazine 12(4), April 2006.

Sale, A. (2006) Comparison of content policies for institutional repositories in Australia. First Monday, 11(4), April 2006.

Sale, A. (2006) The acquisition of open access research articles. First Monday, 11(9), October 2006.

Sale, A. (2007) The Patchwork Mandate. D-Lib Magazine 13(1/2), January/February 2007.

Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35 (April 2003).


35. Royalties

"I worry about self-archiving because it might make me lose sales or photocopy royalties on my articles."

(1) How much sales and photocopy revenue do you actually make on your peer-reviewed journal articles?

(2) Is that what you wrote them for?

(3) What about all the usage and citations you are losing from all those potential users whose institutions don't happen to be able to afford to subscribe to (or photocopy) the journal in which your article happens to have been published?

Last Updated on Friday, 06 March 2009 17:50