pdf_extract

v0.1.0 | http_post | active | net: external | by system | ~$0.050/call

Extract structured content (sections, figures, tables, references) from a PDF URL.

Runtime config

{
  "url_env": "SCIDEX_V1_TOOLS_PDF_EXTRACT_URL",
  "url_default": "http://v1.scidex.ai/internal/tools/pdf_extract",
  "timeout_secs": 300
}
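
As a rough illustration of how a client might read this config, the sketch below resolves the endpoint from the environment variable named in url_env and falls back to url_default. The precedence (env var wins when set and non-empty) is an assumption, since the page does not state how the runtime applies these fields, and resolve_endpoint is a hypothetical helper, not part of the tool.

```python
import os

# Values copied from the runtime config above.
CONFIG = {
    "url_env": "SCIDEX_V1_TOOLS_PDF_EXTRACT_URL",
    "url_default": "http://v1.scidex.ai/internal/tools/pdf_extract",
    "timeout_secs": 300,
}


def resolve_endpoint(config, environ=None):
    """Return (url, timeout_secs) for the tool.

    Assumption: a non-empty environment variable named by config["url_env"]
    overrides config["url_default"].
    """
    environ = os.environ if environ is None else environ
    url = environ.get(config["url_env"]) or config["url_default"]
    return url, config["timeout_secs"]
```

Passing environ explicitly keeps the helper easy to test without touching the real process environment.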

Input schema

JSON Schema

{
  "type": "object",
  "required": [
    "url"
  ],
  "properties": {
    "url": {
      "type": "string",
      "description": "PDF URL or file path accessible to the extractor."
    },
    "extract_tables": {
      "type": "boolean",
      "default": true
    },
    "extract_figures": {
      "type": "boolean",
      "default": true
    },
    "extract_references": {
      "type": "boolean",
      "default": true
    }
  }
}
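
A minimal sketch of building an input object that satisfies this schema: url is required and must be a string, and the three boolean flags default to true. The helper name build_input is hypothetical, and the explicit type checks are a client-side convenience rather than anything the page says the service enforces.

```python
def build_input(url, extract_tables=True, extract_figures=True, extract_references=True):
    """Build a pdf_extract input dict matching the input schema above."""
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    flags = {
        "extract_tables": extract_tables,
        "extract_figures": extract_figures,
        "extract_references": extract_references,
    }
    for name, value in flags.items():
        # The schema types each flag as a boolean.
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean")
    return {"url": url, **flags}
```

For example, build_input("https://example.org/paper.pdf", extract_figures=False) would skip figure extraction while leaving the other flags at their defaults.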

Output schema

JSON Schema

{
  "type": "object",
  "properties": {
    "tables": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "figures": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "sections": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string"
          },
          "heading": {
            "type": "string"
          }
        }
      }
    },
    "references": {
      "type": "array",
      "items": {
        "type": "object"
      }
    }
  }
}
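
Because the output schema marks no property as required and leaves table, figure, and reference items as untyped objects, a consumer probably wants to read the result defensively. The sketch below assumes a result dict shaped like this schema; summarize_result is a hypothetical helper.

```python
def summarize_result(result):
    """Summarize a pdf_extract result dict shaped like the output schema above.

    Every key is treated as optional, since the schema lists no required fields.
    """
    sections = result.get("sections", [])
    return {
        # Sections are the only items with declared fields (heading, text).
        "headings": [section.get("heading", "") for section in sections],
        "text_chars": sum(len(section.get("text", "")) for section in sections),
        "n_tables": len(result.get("tables", [])),
        "n_figures": len(result.get("figures", [])),
        "n_references": len(result.get("references", [])),
    }
```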

Invoke

Schema-driven form. This is the same surface that agents call via scidex.tool.invoke.

Examples

input

{
  "url": "https://example.org/paper.pdf"
}

For agents: scidex.tool.invoke

Invoke this tool from an agent. The required argument shape is the input_schema above; the runtime dispatches via http_post, sandboxed under sandbox_backend=<bwrap>. The call returns a tool_result envelope with the canonical render_hints applied.

POST /api/scidex/rpc
{
  "verb": "scidex.tool.invoke",
  "args": {
    "name": "pdf_extract",
    "args": {
      "url": "https://example.org/paper.pdf"
    }
  }
}
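
The RPC body can be assembled programmatically. This sketch only builds and serializes the JSON body for POST /api/scidex/rpc. It does not send it, because the host, authentication, and the exact shape of the tool_result envelope are not given on this page. The inner args must satisfy the input schema, so url is included. build_rpc_body is a hypothetical helper.

```python
import json


def build_rpc_body(tool_name, tool_args):
    """Build the scidex.tool.invoke request body as a dict."""
    return {
        "verb": "scidex.tool.invoke",
        "args": {
            "name": tool_name,
            "args": tool_args,
        },
    }


body = build_rpc_body("pdf_extract", {"url": "https://example.org/paper.pdf"})
# Serialized form, ready to be used as the POST payload by an HTTP client.
payload = json.dumps(body)
```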