Extract structured content (sections, figures, tables, references) from a PDF URL.
Runtime config
{
"url_env": "SCIDEX_V1_TOOLS_PDF_EXTRACT_URL",
"url_default": "http://v1.scidex.ai/internal/tools/pdf_extract",
"timeout_secs": 300
}
Input schema
JSON Schema
{
"type": "object",
"required": [
"url"
],
"properties": {
"url": {
"type": "string",
"description": "PDF URL or file path accessible to the extractor."
},
"extract_tables": {
"type": "boolean",
"default": true
},
"extract_figures": {
"type": "boolean",
"default": true
},
"extract_references": {
"type": "boolean",
"default": true
}
}
}
Output schema
JSON Schema
{
"type": "object",
"properties": {
"tables": {
"type": "array",
"items": {
"type": "object"
}
},
"figures": {
"type": "array",
"items": {
"type": "object"
}
},
"sections": {
"type": "array",
"items": {
"type": "object",
"properties": {
"text": {
"type": "string"
},
"heading": {
"type": "string"
}
}
}
},
"references": {
"type": "array",
"items": {
"type": "object"
}
}
}
}
Invoke
The invoke form is generated from the input schema and uses the same surface that agents call via scidex.tool.invoke.
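Whether a call comes from the form or from an agent, its arguments must match the input schema: url is required, and the three extract_* flags are booleans that default to true. The following is a minimal client-side sketch of that check. It assumes you want to fill in defaults before sending, although the server may apply them itself. prepare_pdf_extract_args is a hypothetical helper and is not part of the SciDEX API.

```python
from typing import Any

# Defaults copied from the input schema; "url" is the only required property.
PDF_EXTRACT_DEFAULTS = {
    "extract_tables": True,
    "extract_figures": True,
    "extract_references": True,
}


def prepare_pdf_extract_args(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: check raw args against the input schema and fill defaults.

    Hand-rolled for illustration; a real client would more likely use a full
    JSON Schema validator.
    """
    if "url" not in raw:
        raise ValueError("missing required property 'url'")
    if not isinstance(raw["url"], str):
        raise TypeError("'url' must be a string")
    args: dict[str, Any] = {"url": raw["url"]}
    for key, default in PDF_EXTRACT_DEFAULTS.items():
        value = raw.get(key, default)
        # JSON Schema "boolean" excludes 0/1, and isinstance(1, bool) is False.
        if not isinstance(value, bool):
            raise TypeError(f"'{key}' must be a boolean")
        args[key] = value
    # The schema does not set additionalProperties, so extra keys are allowed;
    # this sketch passes them through unchanged.
    for key, value in raw.items():
        args.setdefault(key, value)
    return args
```

For example, `prepare_pdf_extract_args({"url": "https://example.org/paper.pdf", "extract_references": False})` keeps tables and figures enabled and turns off only reference extraction.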
Examples
input
{
"url": "https://example.org/paper.pdf"
}
For agents: scidex.tool.invoke
To invoke this tool from an agent, pass arguments that match the input schema above. The runtime dispatches the call via http_post, sandboxed under sandbox_backend=<bwrap>, and returns a tool_result envelope with the canonical render_hints applied.
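The runtime config at the top of this page names an environment variable, a default URL, and a timeout for this http_post dispatch. The following is a minimal sketch of how those three fields might be used. Neither of the following assumptions is stated on this page:

- The environment variable, when set and non-empty, overrides url_default.
- The tool endpoint accepts the arguments as a JSON body.

The sketch also leaves out the bwrap sandboxing. resolve_endpoint and http_post_tool are hypothetical names.

```python
import json
import os
import urllib.request
from collections.abc import Mapping

# Runtime config as listed for this tool.
RUNTIME_CONFIG = {
    "url_env": "SCIDEX_V1_TOOLS_PDF_EXTRACT_URL",
    "url_default": "http://v1.scidex.ai/internal/tools/pdf_extract",
    "timeout_secs": 300,
}


def resolve_endpoint(config: dict, environ: Mapping[str, str] = os.environ) -> str:
    """Assumed precedence: a non-empty env var named by url_env wins over url_default."""
    override = environ.get(config["url_env"], "")
    return override or config["url_default"]


def http_post_tool(args: dict, config: dict = RUNTIME_CONFIG) -> dict:
    """Hypothetical direct POST of tool args as JSON, honoring timeout_secs.

    This is not the runtime's own http_post and runs without any sandbox.
    """
    req = urllib.request.Request(
        resolve_endpoint(config),
        data=json.dumps(args).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=config["timeout_secs"]) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the default URL sits under an /internal/ path, it may not be reachable from outside the SciDEX deployment. Setting the environment variable is the natural way to point a local test at a different extractor.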
POST /api/scidex/rpc
{
"verb": "scidex.tool.invoke",
"args": {
"name": "pdf_extract",
"args": {
"url": "https://example.org/paper.pdf"
}
}
}
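The following is a minimal Python sketch of sending the RPC request above and reading section headings from the result. It makes three assumptions:

- base_url is a placeholder for whichever host serves /api/scidex/rpc.
- Authentication is not covered.
- The 300-second default timeout simply mirrors timeout_secs from the runtime config.

This page does not specify where the output-schema object sits inside the tool_result envelope, so section_headings expects the caller to extract that object first. All three function names are hypothetical.

```python
import json
import urllib.request


def build_invoke_request(tool_args: dict) -> dict:
    """Wrap tool arguments in the scidex.tool.invoke RPC body."""
    return {
        "verb": "scidex.tool.invoke",
        "args": {"name": "pdf_extract", "args": tool_args},
    }


def invoke_pdf_extract(base_url: str, tool_args: dict, timeout: float = 300) -> dict:
    """POST the RPC body to {base_url}/api/scidex/rpc and return the decoded JSON envelope."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/scidex/rpc",
        data=json.dumps(build_invoke_request(tool_args)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def section_headings(output: dict) -> list[str]:
    """List headings from an object shaped like the output schema.

    "heading" is not required by the schema, so a missing one becomes "".
    """
    return [section.get("heading", "") for section in output.get("sections", [])]
```

Keeping build_invoke_request separate from the network call makes it easy to check that the body matches the documented shape before anything is sent.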