AI-assisted data extraction for systematic reviews
Paperglean reads your PDFs and fills your extraction template — every value grounded to a verbatim quote on a numbered page. You verify in one click, and the journal-ready audit package writes itself.
model claude-opus-4-8 · prompts sha256 7c2b… · reviewed by j.osei · κ = 0.91
The review screen. Click an evidence chip and the source viewer jumps to the page, quote highlighted — verification is one look and one click, not a re-read.
The pipeline
One uniform path for every paper. Each stage writes to an append-only audit log, and provenance — models, prompt hashes, timestamps — is stamped onto every row you export.
Drag-and-drop full-text PDFs, or import a Zotero collection with its curated, DOI-first metadata.
One engine produces one canonical, archived text per paper — the exact text the model cites against. A quality gate catches scans and garbled OCR.
Claude fills your template; the Citations API binds every value to a verbatim quote and page.¹ A second pass maps values to your controlled vocabulary; evidence is re-attached in code and can't be altered.
A primary reviewer verifies every row; a configurable sample goes to a second reviewer. Field-level disagreements surface on the Agreement page.
A workbook shaped by your template, evidence quotes as cell comments — plus the one-zip audit package: METHODS.md, prompts, provenance, hashes.
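As a rough illustration of the "hashes" part of the audit package, here is a minimal sketch of building a SHA-256 manifest over an unpacked package directory. The function names `sha256_of` and `build_manifest` and the directory layout are assumptions made for this example, not Paperglean's actual code or file format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file's relative POSIX path to its SHA-256 hex digest.

    Hypothetical helper: the real manifest format is not documented here.
    """
    return {
        p.relative_to(package_dir).as_posix(): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }
```

Anyone holding the zip can recompute the digests and compare them against the manifest to confirm that no file changed after export.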
Why grounding matters
Generic AI summarisation paraphrases, and sometimes invents. Paperglean never asks you to trust the model: every extracted value is bound to a verbatim cited_text and page from the archived source, so checking a value means reading one highlighted sentence, not the whole paper.
The evidence record is attached outside the model after extraction and never passes through the normalisation step — it cannot be altered or fabricated downstream. If a value has no supporting quote, the review screen says so, loudly.
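The grounding check described above can be sketched in a few lines. This is an illustration under stated assumptions, not Paperglean's implementation: the `Evidence` class and `is_grounded` function are hypothetical names, the archived text is assumed to be a list of per-page strings, and collapsing whitespace before matching is a choice made for this example. Only the `cited_text` and page fields come from the description above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record: a value plus its supporting quote."""
    value: str
    cited_text: str
    page: int  # 1-based page number in the archived text


def _normalise(text: str) -> str:
    # Assumption: line-wrap differences should not defeat a verbatim match.
    return re.sub(r"\s+", " ", text).strip()


def is_grounded(evidence: Evidence, pages: list[str]) -> bool:
    """True only if the quote occurs verbatim on the cited page."""
    if not evidence.cited_text.strip():
        return False  # no supporting quote at all
    if not 1 <= evidence.page <= len(pages):
        return False  # cited page does not exist
    return _normalise(evidence.cited_text) in _normalise(pages[evidence.page - 1])
```

A value whose record fails this check is the case a review screen would need to flag, since there is no sentence for the reviewer to read.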
Reviewers can flip the source viewer between the original PDF and the OCR text at any time, to see exactly what the model read.
“…respondents were willing to pay a premium of 18–24% for garments carrying a verified organic-cotton label…”
The workbench
Built around how evidence-synthesis teams actually work — templates, second reviewers, agreement statistics, and a record you can hand to an editor.
Declare your fields, controlled vocabularies, row unit, and export shape in a per-project template — a form editor with raw-JSON mode for power users. The same pipeline serves any review domain, no code change.
A second reviewer independently verifies a configurable sample. Per-field agreement and Cohen's κ are computed continuously; conflicts queue up for resolution, field by field.
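For readers unfamiliar with the statistic, Cohen's κ compares observed agreement p_o with the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it for one field across the double-reviewed sample; the function name and the example labels are invented for illustration, and this is the textbook two-rater formula rather than a description of how Paperglean computes it internally.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Two reviewers agree on 3 of 4 study-design values.
primary = ["RCT", "RCT", "cohort", "cohort"]
second = ["RCT", "RCT", "cohort", "RCT"]
print(cohens_kappa(primary, second))  # 0.5
```

Here p_o = 0.75 and p_e = 0.5, so raw agreement of 75% corresponds to κ = 0.5 once chance agreement is discounted.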
Generated METHODS.md, per-paper provenance, the full grounded evidence record, exact prompts and template, cost logs, and a SHA-256 manifest — ready to attach to a journal submission.
Pull the included papers' PDF attachments straight from a Zotero collection. Zotero stays the bibliographic database of record; imports are content-hash deduplicated, so re-runs are safe.
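Content-hash deduplication is what makes a repeated import harmless. A minimal sketch, assuming SHA-256 as the content hash and an in-memory dictionary as the store; both are assumptions for illustration, and `content_hash` and `import_attachments` are hypothetical names, not part of Paperglean or the Zotero API.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Identify a file by its bytes, not by its filename or Zotero key."""
    return hashlib.sha256(data).hexdigest()


def import_attachments(store: dict[str, bytes], attachments: list[bytes]) -> int:
    """Add unseen attachments to `store`; return how many were new."""
    added = 0
    for data in attachments:
        key = content_hash(data)
        if key not in store:  # identical bytes were already imported
            store[key] = data
            added += 1
    return added
```

Running the same import twice adds nothing the second time, because every attachment hashes to a key that is already present.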
Key results often live in charts, not prose. Figures are located by OCR, re-cropped at high resolution from the original PDF, and shown to the extraction model alongside the text.
Exact model versions — never floating aliases — stamped on every paper. DOI identity via Crossref, duplicate detection, and an append-only audit log of every pipeline step and human correction.
Journal compliance
An extraction tool is only useful if your methods survive peer review. Paperglean is designed around RAISE, the consensus framework for responsible AI in evidence synthesis that journals and institutions are adopting,² and around the disclosure requirements of Cochrane, BMJ, The Lancet, and PRISMA-AI.
Human data-extraction error rates reach ~50% in systematic reviews. AI-assisted extraction with human verification is not just permitted; it is arguably the more rigorous method.²
after the RAISE consensus framework · PMC12644243
Pricing
typical paper ≈ $0.08–0.15 · sign-up credit $2 ≈ 15–25 papers³
Traceability is the product, so no plan hides it: the audit package, dual review, agreement statistics, and Zotero import ship on the free tier too.
Create a project, drop in three PDFs, and watch every value arrive with its quote attached.