AI-assisted data extraction for systematic reviews

Systematic-review extraction, with the receipts.

Paperglean reads your PDFs and fills your extraction template — every value grounded to a verbatim quote on a numbered page. You verify in one click, and the journal-ready audit package writes itself.

model claude-opus-4-8 · prompts sha256 7c2b… · reviewed by j.osei · κ = 0.91

The review screen. Click an evidence chip and the source viewer jumps to the page, quote highlighted — verification is one look and one click, not a re-read.

verbatim citation grounding | dual reviewer · Cohen's κ | prompt hashes archived | pinned model versions | one-zip audit package | Zotero import

The pipeline

From PDF to verified, exportable data — every step on the record.

One uniform path for every paper. Each stage writes to an append-only audit log, and provenance — models, prompt hashes, timestamps — is stamped onto every row you export.

01 Upload

Drag-and-drop full-text PDFs, or import a Zotero collection with its curated, DOI-first metadata.

02 Uniform OCR

One engine produces one canonical, archived text per paper — the exact text the model cites against. A quality gate catches scans and garbled OCR.

03 Grounded extraction

Claude fills your template; the Citations API binds every value to a verbatim quote and page [1]. A second pass maps values to your controlled vocabulary; evidence is re-attached in code and can't be altered.
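A minimal sketch of that separation, for readers who want to see the idea in code. The names `Evidence`, `ExtractedValue`, `map_to_vocab`, and `normalise`, and the toy vocabulary, are assumptions made for illustration, not Paperglean's actual code. The point it shows: the vocabulary pass only ever receives the value string, and the evidence record is carried across untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    cited_text: str  # verbatim quote from the archived text
    page: int


@dataclass(frozen=True)
class ExtractedValue:
    field: str
    value: str
    evidence: Optional[Evidence]


# Hypothetical controlled vocabulary: raw spelling -> canonical term.
VOCAB = {
    "rct": "randomised controlled trial",
    "randomized controlled trial": "randomised controlled trial",
}


def map_to_vocab(raw: str) -> str:
    """Stand-in for the second pass; it sees the value string and nothing else."""
    return VOCAB.get(raw.strip().lower(), raw)


def normalise(row: ExtractedValue) -> ExtractedValue:
    # Only row.value goes through the mapping step. The frozen Evidence object
    # is copied across by reference, so this step has no way to edit it.
    return replace(row, value=map_to_vocab(row.value))
```

With this shape, `normalise(row).evidence is row.evidence` holds for every row, which is one way to make "can't be altered" checkable rather than promised.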

04 Human verification

A primary reviewer verifies every row; a configurable sample goes to a second reviewer. Field-level disagreements surface on the Agreement page.

05 Export & audit

A workbook shaped by your template, evidence quotes as cell comments — plus the one-zip audit package: METHODS.md, prompts, provenance, hashes.

Why grounding matters

Citations, not vibes.

Generic AI summarisation paraphrases — and sometimes invents. Paperglean never asks the model to be trusted: every extracted value is bound to a verbatim cited_text and page from the archived source, so checking a value means reading one highlighted sentence, not the whole paper.
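Because the quote is verbatim, grounding can also be checked mechanically. The sketch below assumes the archived text is available as a hypothetical `pages` mapping from page number to page text, and it collapses whitespace before comparing so that OCR line breaks do not cause false alarms; both choices are illustrative assumptions, not a description of Paperglean's internals.

```python
import re


def _collapse(text: str) -> str:
    """Collapse every run of whitespace to one space (assumed tolerance for OCR line breaks)."""
    return re.sub(r"\s+", " ", text).strip()


def quote_is_grounded(cited_text: str, page: int, pages: dict[int, str]) -> bool:
    """Return True only if cited_text appears verbatim on the stated page."""
    page_text = pages.get(page)
    if page_text is None or not cited_text.strip():
        return False  # unknown page or empty quote: treat as ungrounded
    return _collapse(cited_text) in _collapse(page_text)
```

A value whose quote fails this check is exactly the case a reviewer should see flagged before verifying anything.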

The evidence record is attached outside the model after extraction and never passes through the normalisation step — it cannot be altered or fabricated downstream. If a value has no supporting quote, the review screen says so, loudly.

Reviewers can flip the source viewer between the original PDF and the OCR text at any time, to see exactly what the model read.

“…respondents were willing to pay a premium of 18–24% for garments carrying a verified organic-cotton label…”
cited_text · p. 7 | field effect_size | confidence high | verified by j.osei · 2026-06-04 14:12 UTC

The workbench

Everything a rigorous review needs. Nothing it doesn't.

Built around how evidence-synthesis teams actually work — templates, second reviewers, agreement statistics, and a record you can hand to an editor.

Template-driven extraction

Declare your fields, controlled vocabularies, row unit, and export shape in a per-project template — a form editor with raw-JSON mode for power users. The same pipeline serves any review domain, no code change.
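To make the raw-JSON mode concrete, here is a hypothetical template and a small sanity check. The key names (`row_unit`, `fields`, `vocabulary`, `export`) are invented for this sketch and are not Paperglean's documented schema; the example only illustrates how fields, controlled vocabularies, row unit, and export shape might sit together in one document.

```python
import json

# Hypothetical template; every key name here is an assumption for illustration.
TEMPLATE_JSON = """
{
  "row_unit": "study",
  "fields": [
    {"name": "study_design", "type": "categorical",
     "vocabulary": ["RCT", "cohort", "survey"]},
    {"name": "effect_size", "type": "text"}
  ],
  "export": {"sheet": "extraction", "columns": ["study_design", "effect_size"]}
}
"""


def validate_template(template: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the template passes."""
    problems = []
    fields = template.get("fields", [])
    names = [f.get("name") for f in fields]
    if not template.get("row_unit"):
        problems.append("row_unit is missing")
    if len(names) != len(set(names)):
        problems.append("duplicate field names")
    for f in fields:
        if f.get("type") == "categorical" and not f.get("vocabulary"):
            problems.append(f"{f.get('name')}: categorical field needs a vocabulary")
    for col in template.get("export", {}).get("columns", []):
        if col not in names:
            problems.append(f"export column {col!r} is not a declared field")
    return problems


template = json.loads(TEMPLATE_JSON)
```

Calling `validate_template(template)` on the example returns an empty list; removing a vocabulary or exporting an undeclared column produces one message per problem.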

Dual-reviewer agreement

A second reviewer independently verifies a configurable sample. Per-field agreement and Cohen's κ are computed continuously; conflicts queue up for resolution, field by field.
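For reference, Cohen's κ compares observed agreement p_o with the agreement p_e expected by chance from each reviewer's label frequencies: κ = (p_o − p_e) / (1 − p_e). The function below is a plain-Python sketch of that standard formula for one field; the function name and the handling of the degenerate single-label case are assumptions, and it is not presented as Paperglean's implementation.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two reviewers labelling the same items on one field."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both reviewers must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where the two labels match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each reviewer's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both reviewers used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "yes", "no", "yes"])
# observed = 0.75, expected = 0.5, so kappa = 0.5
```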

One-zip audit package

Generated METHODS.md, per-paper provenance, the full grounded evidence record, exact prompts and template, cost logs, and a SHA-256 manifest — ready to attach to a journal submission.
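A SHA-256 manifest of this kind can be rebuilt and checked by anyone who receives the zip. The sketch below assumes the package has been unpacked into a directory and writes one line per file in the `sha256sum` style (digest, two spaces, relative path); the function names and the choice of that line format are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> str:
    """Return manifest text covering every file under package_dir, in sorted order."""
    lines = []
    for path in sorted(p for p in package_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(package_dir).as_posix()
        lines.append(f"{sha256_file(path)}  {rel}")
    return "\n".join(lines) + "\n"
```

Sorting the paths keeps the manifest stable across runs, so two manifests of the same package compare equal byte for byte.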

Zotero import

Pull the included papers' PDF attachments straight from a Zotero collection. Zotero stays the bibliographic database of record; imports are content-hash deduplicated, so re-runs are safe.
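Content-hash deduplication is what makes a repeated import harmless: the same bytes always produce the same digest, whatever the file is called. A minimal sketch, assuming SHA-256 as the content hash and a hypothetical `import_attachments` helper that receives `(filename, bytes)` pairs:

```python
import hashlib


def import_attachments(attachments, seen_hashes: set) -> list:
    """Import only attachments whose content has not been seen before.

    attachments: iterable of (filename, data) pairs, data being the PDF bytes.
    seen_hashes: set of hex digests already imported; updated in place.
    """
    imported = []
    for name, data in attachments:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen_hashes:
            continue  # identical bytes already imported, possibly under another name
        seen_hashes.add(digest)
        imported.append((name, digest))
    return imported
```

Running the same collection through twice imports every paper the first time and nothing the second time.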

Figure-aware extraction

Key results often live in charts, not prose. Figures are located by OCR, re-cropped at high resolution from the original PDF, and shown to the extraction model alongside the text.

Provenance, pinned

Exact model versions — never floating aliases — stamped on every paper. DOI identity via Crossref, duplicate detection, and an append-only audit log of every pipeline step and human correction.
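One common way to make an append-only log tamper-evident is to chain entries by hash, so that editing any past entry invalidates everything after it. The source does not say how Paperglean's log is built; the sketch below is an assumption-laden illustration with hypothetical names (`append_entry`, `verify_chain`) and an in-memory list standing in for storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _hash_body(event: dict, prev_hash: str) -> str:
    body = {"event": event, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    entry = {
        "event": event,
        "prev_hash": prev_hash,
        "entry_hash": _hash_body(event, prev_hash),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _hash_body(entry["event"], entry["prev_hash"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

An event might record a pipeline step together with the exact model ID and prompt hash used, or a human correction with the reviewer's name and the old and new values.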

Journal compliance

Built to get past reviewers and editors.

An extraction tool is only useful if your methods survive peer review. Paperglean is designed around RAISE, the consensus framework for responsible AI in evidence synthesis that journals and institutions are adopting [2], and the disclosure requirements of Cochrane, BMJ, The Lancet, and PRISMA-AI.

How Paperglean maps to each requirement →

  • Human oversight: No value reaches an export without an attributed human verification.
  • Transparency: Tool, model versions, prompts, and verification procedure are documented automatically.
  • Reproducibility: Exact prompts, template, and model IDs ship in every audit package, enough to replicate.
  • Validation: Inter-rater agreement and Cohen's κ on the double-reviewed sample, reported as standard.

Human data-extraction error rates reach ~50% in systematic reviews. AI-assisted extraction with human verification is not just permitted; it is arguably the more rigorous method [2].

after the RAISE consensus framework · PMC12644243

Pricing

Pay per paper. Never per feature.

typical paper ≈ $0.08–0.15 · sign-up credit $2 ≈ 13–25 papers [3]
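The arithmetic behind the credit estimate, worked in whole cents to avoid floating-point rounding. At the quoted per-paper range, a $2 credit covers 13 papers at the top of the range and 25 at the bottom; the function name is hypothetical.

```python
def papers_covered(credit_cents: int, cost_per_paper_cents: int) -> int:
    """Whole papers a credit pays for at a fixed per-paper cost."""
    return credit_cents // cost_per_paper_cents


fewest = papers_covered(200, 15)  # $2.00 at $0.15 per paper -> 13
most = papers_covered(200, 8)     # $2.00 at $0.08 per paper -> 25
```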

Traceability is the product, so no plan hides it: the audit package, dual review, agreement statistics, and Zotero import ship on the free tier too.

Your next review, with the receipts.

Create a project, drop in three PDFs, and watch every value arrive with its quote attached.

Notes & sources — our claims come grounded, too

  1. Anthropic Citations API: extracted values are bound to verbatim source spans with page locations; see the methods record in any audit package.
  2. RAISE — Responsible AI use in Systematic Evidence Synthesis, multi-stakeholder consensus framework: PMC12644243.
  3. Typical per-paper cost $0.08–0.15 depending on paper length and template complexity; the credit ledger itemises provider cost and platform markup per paper.