v0.1.0 http_post active ~$0.100/call

Extracts figures and their captions from a paper PDF, with optional OCR and caption summarization.

Runtime config

{
  "url_env": "SCIDEX_V1_TOOLS_FIGURE_EXTRACT_URL",
  "url_default": "http://v1.scidex.ai/internal/tools/figure_extract",
  "timeout_secs": 600
}
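A minimal sketch of how a caller might resolve the endpoint from this config. It assumes, without confirmation from the page, that the environment variable named by `url_env` overrides `url_default` when it is set and non-empty. The helper name `resolve_endpoint` is hypothetical.

```python
import os

# Runtime config copied from the block above.
RUNTIME_CONFIG = {
    "url_env": "SCIDEX_V1_TOOLS_FIGURE_EXTRACT_URL",
    "url_default": "http://v1.scidex.ai/internal/tools/figure_extract",
    "timeout_secs": 600,
}


def resolve_endpoint(config, environ=None):
    """Return (url, timeout_secs) for the tool.

    Assumption: the variable named by config["url_env"] takes precedence
    over config["url_default"] when it is set and non-empty.
    """
    environ = os.environ if environ is None else environ
    url = environ.get(config["url_env"]) or config["url_default"]
    return url, config["timeout_secs"]
```

Passing `environ` explicitly keeps the lookup testable without touching the real process environment.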

Input

{
  "type": "object",
  "required": [
    "pdf_url"
  ],
  "properties": {
    "ocr": {
      "type": "boolean",
      "default": false
    },
    "pdf_url": {
      "type": "string"
    },
    "summarize_captions": {
      "type": "boolean",
      "default": false
    }
  }
}
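The input schema can be checked client-side before a paid call is made. The sketch below is a hand-written check for this one schema, not a general JSON Schema validator. The function name `normalize_input` is hypothetical, and filling in the `default` values on the client is an assumption about how a caller might use them.

```python
def normalize_input(payload):
    """Validate a figure_extract input object and fill in schema defaults.

    Mirrors the schema above: pdf_url is a required string; ocr and
    summarize_captions are optional booleans that default to False.
    Extra keys are kept because the schema does not forbid them.
    """
    if not isinstance(payload, dict):
        raise TypeError("input must be a JSON object")
    if "pdf_url" not in payload:
        raise ValueError("missing required property: pdf_url")
    if not isinstance(payload["pdf_url"], str):
        raise TypeError("pdf_url must be a string")

    normalized = dict(payload)  # shallow copy; the caller's dict is untouched
    for flag in ("ocr", "summarize_captions"):
        value = normalized.setdefault(flag, False)
        if not isinstance(value, bool):
            raise TypeError(f"{flag} must be a boolean")
    return normalized
```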

Output

{
  "type": "object",
  "properties": {
    "figures": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "page": {
            "type": "integer"
          },
          "caption": {
            "type": "string"
          },
          "ocr_text": {
            "type": "string"
          },
          "figure_id": {
            "type": "string"
          },
          "image_url": {
            "type": "string"
          },
          "caption_summary": {
            "type": "string"
          }
        }
      }
    }
  }
}
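One way to consume the output is to map each entry of `figures` onto a small typed record. This is a sketch only: the schema lists no `required` properties, so every field is treated as optional here, and the names `Figure` and `parse_figures` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Figure:
    # Every field is optional because the output schema marks none as required.
    figure_id: Optional[str] = None
    page: Optional[int] = None
    caption: Optional[str] = None
    ocr_text: Optional[str] = None
    image_url: Optional[str] = None
    caption_summary: Optional[str] = None


def parse_figures(response: dict) -> List[Figure]:
    """Convert a response object into Figure records, ignoring unknown keys."""
    known = {f.name for f in fields(Figure)}
    return [
        Figure(**{k: v for k, v in item.items() if k in known})
        for item in response.get("figures", [])
    ]
```

Presumably `ocr_text` and `caption_summary` are populated only when the matching input flags are enabled, but the page does not state this, so the sketch does not depend on it.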

Resource caps

{
  "network": "external",
  "memory_peak_mb": 4096,
  "max_duration_secs": 600
}

Invoke

Schema-driven form. This is the same surface that agents call via scidex.tool.invoke.

Examples

input
{
  "pdf_url": "https://example.org/paper.pdf",
  "summarize_captions": true
}
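Since the tool is listed as `http_post`, the example input above could be sent roughly as follows. This sketch assumes the endpoint accepts the input object directly as a JSON request body and returns the output object as JSON. Neither detail is confirmed by this page, and any authentication the internal endpoint requires is not shown. `build_request` and `invoke` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(url, payload):
    """Build a JSON POST request without sending it.

    Assumption: the service takes the input object as the raw JSON body.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(url, payload, timeout_secs=600):
    """Send the request and decode the JSON response (performs network I/O)."""
    request = build_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout_secs) as response:
        return json.loads(response.read().decode("utf-8"))


example_input = {
    "pdf_url": "https://example.org/paper.pdf",
    "summarize_captions": True,
}
request = build_request(
    "http://v1.scidex.ai/internal/tools/figure_extract", example_input
)
```

The default `timeout_secs=600` matches both the runtime config and the `max_duration_secs` resource cap, so a client using it should not give up before the service does.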
