Metadata Matching

Metadata matching is the task of finding an identifier for an item based on a structured or unstructured “description” of it. Some examples include:

  • finding a DOI for a cited article based on a citation string
  • finding the ROR ID for an organisation based on an affiliation string
  • finding the ORCID ID for a researcher based on the person’s name and affiliation
  • finding the grant DOI based on an award number and a funder name

Whether done manually as part of the publishing workflow, semi-automatically as an auto-complete functionality for forms, or fully automatically to fill in metadata gaps in larger databases, metadata matching gives us a more complete picture of the research nexus by discovering missing relationships between entities across the scholarly record.

Read more about metadata matching in the blog series:

In April 2025, we launched the matching project, which is a major effort to rebuild Crossref’s metadata matching workflows using modern software development and data science practices. The goal is to create a dedicated consolidated matching workflow that will eventually replace all existing and future production matching processes, with results made available through the REST API. This project covers six matching tasks: bibliographic reference matching, funder name matching, preprint matching, affiliation matching, grant matching, and title matching.

project phase | matching task | input | target identifier | status
1 | funder matching | funder organisation name | ROR ID | in production as part of the legacy CS system; matches available in the REST API
2 | preprint matching | journal article metadata | preprint DOI | in production as part of the legacy CS system; matches not available in the REST API
2 | affiliation matching | affiliation string | ROR ID | not in production
2 | grant matching | funding metadata | grant DOI | not in production
3 | reference matching | bibliographic reference | DOI | in production as part of the legacy CS system; matches available in the REST API
3 | title matching | journal title | internal Crossref journal ID | in production as part of the legacy CS system; matches not available in the REST API
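For reference matching, where matches are available in the REST API, a consumer may want to separate DOIs supplied by the publisher from DOIs added by Crossref's matching. The sketch below assumes that each entry in the `reference` list of a work's `message` object carries a `doi-asserted-by` field (`publisher` or `crossref`) when it has a `DOI`. The record itself is a trimmed, hypothetical example with made-up DOIs, and `group_references` is an illustrative helper, not part of any Crossref library.

```python
# Hypothetical, trimmed work record in the shape of the REST API works
# "message" object; only the fields used below are included, DOIs are made up.
work = {
    "DOI": "10.5555/example.1",
    "reference": [
        {"key": "ref1", "DOI": "10.5555/example.2", "doi-asserted-by": "publisher"},
        {"key": "ref2", "DOI": "10.5555/example.3", "doi-asserted-by": "crossref"},
        {"key": "ref3", "unstructured": "Smith J. An unmatched citation. 1999."},
    ],
}


def group_references(record: dict) -> dict[str, list[str]]:
    """Group reference keys by the source of their DOI: the depositing
    publisher, Crossref matching, or no DOI at all ("unmatched")."""
    groups: dict[str, list[str]] = {}
    for ref in record.get("reference", []):
        if "DOI" in ref:
            # Assumed field; references lacking it are grouped under "unknown".
            source = ref.get("doi-asserted-by", "unknown")
        else:
            source = "unmatched"
        groups.setdefault(source, []).append(ref.get("key", "?"))
    return groups


print(group_references(work))
# {'publisher': ['ref1'], 'crossref': ['ref2'], 'unmatched': ['ref3']}
```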

Additional reading and resources

Funder name matching

More coming soon…

Preprint matching

Background reading: Discovering relationships between preprints and journal articles

Recommended strategy: code

A ground truth evaluation dataset: dataset
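A ground truth dataset is typically used to measure how often a matcher links the right items. The sketch below computes precision, recall and F1 under the assumption that both the ground truth and the matcher's output can be reduced to sets of (input, identifier) pairs; the format of the linked dataset is not described here, and the pairs in the example are hypothetical.

```python
def evaluate(predicted: set[tuple[str, str]],
             truth: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted (input, identifier) links
    against a ground-truth set of links."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical article -> preprint links.
truth = {("article-a", "preprint-1"), ("article-b", "preprint-2"),
         ("article-c", "preprint-3"), ("article-d", "preprint-4")}
predicted = {("article-a", "preprint-1"), ("article-b", "preprint-2"),
             ("article-c", "preprint-9")}

precision, recall, f1 = evaluate(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.50 f1=0.57
```

Note that a wrong link (`article-c` to `preprint-9`) lowers both precision and recall, while a missing link (`article-d`) only lowers recall.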

A dataset with relationships between preprints and journal articles discovered by matching within Crossref data: dataset
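To make the preprint matching task concrete, the sketch below scores candidate preprints against a journal article using title similarity and author surname overlap, and accepts the best candidate only above a threshold. This is a deliberately naive stand-in, not the recommended strategy linked above: the 0.7/0.3 weights, the 0.8 threshold, the record fields and the helper names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def surname_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of two lists of author surnames."""
    sa, sb = {normalise(s) for s in a}, {normalise(s) for s in b}
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def match_preprint(article: dict, preprints: list[dict],
                   threshold: float = 0.8) -> Optional[str]:
    """Return the DOI of the best-scoring preprint, or None if none is good enough."""
    best_doi, best_score = None, 0.0
    for candidate in preprints:
        title_sim = SequenceMatcher(None, normalise(article["title"]),
                                    normalise(candidate["title"])).ratio()
        # Hypothetical weighting of title and author evidence.
        score = 0.7 * title_sim + 0.3 * surname_overlap(article["authors"],
                                                        candidate["authors"])
        if score > best_score:
            best_doi, best_score = candidate["DOI"], score
    return best_doi if best_score >= threshold else None


# Hypothetical candidates with made-up DOIs.
preprints = [
    {"DOI": "10.5555/pp1", "title": "Deep Learning for Citation Parsing",
     "authors": ["Smith", "Jones"]},
    {"DOI": "10.5555/pp2", "title": "A survey of graph databases", "authors": ["Brown"]},
]
article = {"title": "Deep learning for citation parsing", "authors": ["Smith", "Jones"]}
print(match_preprint(article, preprints))
```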

Affiliation matching

Recommended strategy: code

A ground truth evaluation dataset: dataset

A dataset with relationships involving research organisations discovered by matching within Crossref data: dataset
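Affiliation strings usually mix several parts (department, organisation, city, country), so one simple idea is to compare each comma-separated part with known organisation names. The sketch below does exactly that with `difflib`; it is an illustrative baseline, not the recommended strategy linked above. The organisation list, the placeholder identifiers (not real ROR IDs) and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical organisation names keyed by placeholder identifiers (not real ROR IDs).
ORGANISATIONS = {
    "ror-example-1": "University of Warsaw",
    "ror-example-2": "University of Oxford",
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def match_affiliation(affiliation: str, orgs: dict[str, str],
                      threshold: float = 0.85) -> Optional[str]:
    """Compare each comma-separated part of the affiliation string with every
    organisation name; return the best identifier if it scores above threshold."""
    best_id, best_score = None, 0.0
    for part in affiliation.split(","):
        part = normalise(part)
        if not part:
            continue
        for org_id, name in orgs.items():
            score = SequenceMatcher(None, part, normalise(name)).ratio()
            if score > best_score:
                best_id, best_score = org_id, score
    return best_id if best_score >= threshold else None


print(match_affiliation("Department of Physics, University of Warsaw, Warsaw, Poland",
                        ORGANISATIONS))
```

Splitting on commas keeps the department and location parts from diluting the similarity score, but it will fail on affiliations that do not use commas, which is one reason real strategies need more than this.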

Grant matching

Background reading:

Citation matching

Background reading:

Recommended strategy for unstructured citation matching: code
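A common starting point for unstructured citation matching is to search the REST API with the whole citation string and then decide whether to accept the top hit. The sketch below only builds the request URL (using the `query.bibliographic` and `mailto` parameters of the works endpoint) and picks a candidate from an already-fetched response, so it runs offline. It is not the recommended strategy linked above: the `min_score` cut-off is a hypothetical parameter, and relevance scores are not probabilities, so any threshold would need to be tuned against ground truth data. The sample response and DOIs are made up.

```python
from typing import Optional
from urllib.parse import urlencode

API = "https://api.crossref.org/works"


def build_query_url(citation: str, rows: int = 5, mailto: Optional[str] = None) -> str:
    """URL for a bibliographic search with the raw citation string."""
    params = {"query.bibliographic": citation, "rows": rows}
    if mailto:
        params["mailto"] = mailto  # identifies the caller to the API
    return f"{API}?{urlencode(params)}"


def pick_candidate(response: dict, min_score: float) -> Optional[str]:
    """Return the DOI of the highest-scoring item if it reaches min_score."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    top = max(items, key=lambda item: item.get("score", 0.0))
    return top["DOI"] if top.get("score", 0.0) >= min_score else None


# Hypothetical, trimmed search response.
sample = {"message": {"items": [{"DOI": "10.5555/a", "score": 72.1},
                                {"DOI": "10.5555/b", "score": 40.0}]}}
print(build_query_url("Smith J. An example. 2020", rows=3))
print(pick_candidate(sample, min_score=50.0))
```

Fetching the URL (for example with `urllib.request.urlopen`) and parsing the JSON body would produce the `response` dictionary passed to `pick_candidate`.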

Title matching

More coming soon…

Page owner: Dominika Tkaczyk   |   Last updated 2025-July-25