Crossref and the Public Knowledge Project (PKP) have been working closely together for many years, sharing resources and supporting our overlapping communities of organisations involved in communicating research. Now we’re delighted to share that we have agreed on a new set of objectives for our partnership, centred on further development of the tools that our shared community relies upon, as well as building capacity to enable richer metadata registration for organisations using the Open Journal Systems (OJS).
To mark Crossref’s 25th anniversary, we launched our first Metadata Awards to highlight members with the best metadata practices.
GigaScience Press, based in Hong Kong, was the leader among small publishers, defined as organisations with less than USD 1 million in publishing revenue or expenses. We spoke with Scott Edmunds, Ph.D., Editor-in-Chief at GigaScience Press, about how discoverability drives their high metadata standards.
What motivates your organisation/team to work towards high-quality metadata? What objectives does it support for your organisation?
Our objective is to communicate science openly and collaboratively, without barriers, to solve problems in a data- and evidence-driven manner through Open Science publishing. High-quality metadata helps us address these objectives by improving the discoverability, transparency, and provenance of the work we publish. It is integral to the FAIR principles and the UNESCO Open Science Recommendation, and plays a role in making research more accessible to both humans and machines. As one of the authors of the FAIR principles paper and an advisor to the Make Data Count project, I've also been personally mindful of practising what I preach.
On behalf of the Nominating Committee, I’m pleased to share the slate of candidates for the 2025 board election.
Each year we do an open call for board interest. This year, the Nominating Committee received 51 submissions from members worldwide to fill five open board seats.
We have four large member seats and one small member seat open for election in 2025. We maintain a balanced board of 8 large member seats and 8 small member seats. Size is determined by the organization’s membership tier (small members fall in the $0-$1,650 tiers and large members in the $3,900-$50,000 tiers).
In 2022, we wrote a blog post, “Rethinking staff travel, meetings, and events”, outlining our new approach to staff travel, meetings, and events, with the goal of not going back to ‘normal’ after the pandemic. We said that in the future we would report on our efforts to balance in-person and online events, support work-life balance for staff, and track our carbon emissions. In December 2024, we published “Summary of the environmental impact of Crossref”, which gave an overview of 2023 and provided our first report on carbon emissions. Since our report on 2023 only just made it into 2024, we are happy to report on 2024 a little earlier in the year.
If you take a peek at our blog, you’ll notice that metadata and community are the most frequently used categories. This is not a coincidence – community is central to everything we do at Crossref. Our first-ever Metadata Sprint was a natural step in strengthening both. Cue fanfare! And what better way to celebrate 25 years of Crossref?
We designed the Crossref Metadata Sprint as a relatively short event where people can form teams and tackle small, well-defined problems. What kind of problems? While we expected many to involve coding, teams also explored documenting, translating, researching, or anything else that taps into our open, member-curated metadata. With this format, we wanted to create a space for networking, collaboration, and feedback, centered on co-creation using the scholarly metadata from our REST API, the Public Data File, and other sources.
What we learned in planning
The journey towards the event was full of valuable lessons from our community. Our initial call received submissions from 71 people, which was exciting but presented our first challenge: we felt the event would work better with a smaller group. Another challenge was that many enthusiastic people from different regions of the world were eager to join but needed support to attend in person. This reminded us how global our community is, and how important it is to think about different ways of making participation possible, especially in future events.
We also wanted to make sure that participation wasn’t limited by technical background. The selection process included a preliminary review by several members of our team to bring in a mix of perspectives and reduce bias. The event welcomed participants of all expertise levels, including colleagues who had never worked with APIs before. We sought to provide common ground for everyone through several group calls, where we introduced our tools and collected participants’ requests for tools and specific data, as well as their questions, to help them prepare for the sprint.
At the Crossref Metadata Sprint
I recently came across the following quote from a recognized data scientist:
Numbers have an important story to tell. They rely on you to give them a clear and convincing voice. (Stephen Few) 1
It made me think that we could substitute metadata for numbers and the idea would still hold. Surrounded by the paleontological collections of the National Museum of Natural History in Madrid, 21 participants and 5 Crossref staff came together on 8 April to work on twelve different projects. These ranged from improvements to our Public Data File formats and exploring metadata completeness, to tackling multilingual metadata challenges, understanding the citation impact of retracted works, and connecting Retraction Watch metadata with metadata from other knowledge graphs.
The different teams that participated in the first Crossref Metadata Sprint.
The first hours were the most energetic (but not chaotic!): most participants were meeting in person for the first time, ideas were exchanged, and pre-formed groups became more stable (though one advantage of the format is that teams don't have to be rigid). Twelve coffee- and tea-powered projects started taking shape, a few of which are part of larger ideas still under development. By the end of the second day, we saw:
Author changes between preprints and published articles.
Coverage of funding information by publisher.
Enriching citations with Crossref metadata.
Funding metadata completeness.
Improvements to the Public Data File.
Interoperability between Crossref DOIs and hash-based identifiers.
University of Tetova’s metadata coverage.
Retraction Watch data mash-up.
Perspectives on AI-driven multilingual metadata.
Public Data File in Google BigQuery.
Visibility of retractions across citations.
Visualising Crossref geographic member data.
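To make a project like “Coverage of funding information by publisher” more concrete, here is a minimal Python sketch of one way such coverage could be measured. It assumes work records shaped like the JSON items returned by the Crossref REST API `/works` endpoint (found under `message.items`, each with a `publisher` string and an optional `funder` list). The function name and sample records are hypothetical, and this is not the code any sprint team actually used.

```python
from collections import defaultdict


def funder_coverage_by_publisher(items):
    """Return {publisher: share of works with at least one funder entry}.

    Hypothetical helper: assumes each item is a dict shaped like a
    Crossref REST API work record ("publisher", optional "funder").
    """
    totals = defaultdict(int)
    with_funder = defaultdict(int)
    for work in items:
        publisher = work.get("publisher", "unknown")
        totals[publisher] += 1
        # A missing "funder" key or an empty list both count as no funding metadata.
        if work.get("funder"):
            with_funder[publisher] += 1
    return {p: with_funder[p] / totals[p] for p in totals}


# Hypothetical sample records (10.5555 is a test prefix, not real works).
sample_items = [
    {"DOI": "10.5555/example-1", "publisher": "Publisher A",
     "funder": [{"name": "Example Funder"}]},
    {"DOI": "10.5555/example-2", "publisher": "Publisher A"},
    {"DOI": "10.5555/example-3", "publisher": "Publisher B", "funder": []},
]

print(funder_coverage_by_publisher(sample_items))
# {'Publisher A': 0.5, 'Publisher B': 0.0}
```

Treating an empty `funder` list the same as a missing one keeps the measure simple; a fuller analysis might also check whether each funder entry carries an identifier or award number.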
Our team joined some of these projects, offering insights and feedback to the participants. We ended the first day with a group dinner and came back re-energised for the second, which started with everybody fully immersed in their tasks. As we approached the end, the groups prepared quick slides for short presentations (which you can find here).
Our team and the participants left excited and looking forward to the next opportunity to collaborate. We see real potential in recreating these spaces, and we’ll work on future editions in a different location. All of the project summaries and notes will remain in our metadata sprint GitLab repo. Would you like to know more about any of these ideas? Let us know in the comments.
The first Crossref Metadata Sprint in a nutshell
Participants
None of this would’ve been possible without our enthusiastic participants. Huge thanks to everyone! Here is the full list of those who attended our inaugural Sprint: