2026 public data file now available
Once a year we release all metadata records for content registered with Crossref in a public data file. This year’s version, containing nearly 180 million records, is now available. It includes metadata associated with all Crossref-registered DOIs in JSON-lines format.
All our metadata is openly available via our REST API at all times, and this file provides the same information in one place for those who find that format useful for their tools and analysis. You can access the file via Academic Torrents at https://doi-org.pluma.sjfc.edu/10.13003/nggf-vt1j or directly from AWS. For further guidance and tips, see our documentation. The complete compressed files total 208 GB.
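Because each line of a JSON-lines file is one self-contained record, the snapshot can be streamed one record at a time instead of being loaded into memory all at once. The Python sketch below shows one way to do this. It assumes your copy is unpacked as a directory of gzip-compressed files named `*.jsonl.gz`, and that records use the same field names as REST API work records (`abstract`, `author`, and an `ORCID` key on authors). Both are assumptions to check against our documentation, and the function names are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_records(directory: str) -> Iterator[dict]:
    """Yield one metadata record per line from every *.jsonl.gz file in `directory`.

    Assumption: the snapshot is stored as gzip-compressed JSON Lines files
    matching *.jsonl.gz; adjust the glob pattern if your copy differs.
    """
    for path in sorted(Path(directory).glob("*.jsonl.gz")):
        # "rt" decompresses on the fly and returns text lines, so only one
        # record is held in memory at a time.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)


def summarise(records: Iterable[dict]) -> dict:
    """Count records, and how many carry an abstract or at least one author ORCID iD.

    Assumption: field names follow the REST API work format ("abstract",
    "author", and "ORCID" on each author object).
    """
    counts = {"records": 0, "with_abstract": 0, "with_orcid": 0}
    for record in records:
        counts["records"] += 1
        if record.get("abstract"):
            counts["with_abstract"] += 1
        if any("ORCID" in author for author in record.get("author", [])):
            counts["with_orcid"] += 1
    return counts
```

For example, `summarise(iter_records("/path/to/snapshot"))` walks the whole snapshot in a single streaming pass. Processing the files one at a time also makes it straightforward to parallelise the work across files if a single pass is too slow.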
Our metadata has several sources:
- Primarily, it comes from records deposited by more than 24,000 members in more than 160 countries. This year, we are pleased to have welcomed members from a number of new countries by expanding our GEM program, which supports participation in the Crossref community from those in the most economically disadvantaged regions.
- Second, we enrich the data with automated matches, for example by adding DOIs to deposited references and organisation identifiers to funders. We are renewing our matching processes, starting later this year with matching funders to ROR identifiers.
- Finally, we use selected third-party sources to enrich the metadata. Currently we include retractions from the Retraction Watch database.
Most of the data can be freely reused and is not subject to copyright. Some limitations apply to abstracts. See our documentation for more details about licensing.
Why do we do this?
The community is key to everything we do. Without the thousands of members depositing metadata, we would have nothing to share. And without countless organisations and individuals making use of the metadata, it would have no impact or value. Our mission is to serve our community, and making metadata publicly and openly available is one of our key values. The public data file is just one of a number of ways in which we enable metadata retrieval.
Over the last year, there have been over 600 downloads of the public data file. In addition, we see around 2 billion hits to our public APIs each month. We are always excited to hear about the diverse and interesting ways in which metadata can be used.
What’s different this year?
Thanks to this rich metadata, the records deposited with Crossref are interconnected by many types of relationships between works, people, and organisations that together tell the story of the research endeavor. The latest public data file reflects the current state of the research nexus as we know it, and we're delighted to share it with the community.
This year’s dataset contains 12.7 million new records (a 7.6% increase since last year). Across the board, we’re also seeing richer metadata records, with more abstracts (up 15%), ORCID identifiers for authors (up 20%), ROR identifiers for organisations (up 250%), and links to grant identifiers for funding (reaching 50,000 records).
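As a rough consistency check of these figures, assuming the 7.6% growth is measured against last year's total record count, the previous file would have held about 167 million records:

$$
N_{\text{last year}} \approx \frac{12.7\ \text{million}}{0.076} \approx 167\ \text{million}, \qquad N_{\text{this year}} \approx 167 + 12.7 \approx 180\ \text{million}
$$

This agrees with the "nearly 180 million records" in this year's file.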
Research integrity is a current theme in our community. We can see that members increasingly use metadata to signal trust in their works. 27% more records now have Crossmark enabled, meaning that the responsible member is open about its research integrity practices and committed to communicating corrections, retractions, and other post-publication changes. In addition, this year's snapshot contains retractions from the Retraction Watch database.
If you have any questions or feedback about the public data file, or would like to discuss how you can use it, head over to our community forum and join the conversation.