Extract information from one or more documents.

Given a document, this endpoint will extract information from it.

Supported document formats are:
- pdf
- jpeg / jpg
- png
- tiff
- webp

Path Parameters

organization_id integer required
unique ID of your workspace
project_id integer required
is a unique ID of your project
model_id integer required
is a unique ID of your trained AI model

Query Parameters

return_bboxes string
whether or not to return bounding boxes
return_annotated_pages string
whether or not to return the images correspondent to each page, with the found matches, as a base64 encoded string

multipart/form-data

Request Body required

file binary required
file to upload

Responses

application/json

Schema
Example (from schema)

Schema

model string
Name of the model used to extract the information
all_required_fields_found boolean
If all required labels were found while processing the documents
all_confidence_thresholds_met boolean
If all the expected confidence for the labels were met while processing the documents
all_data_conversion_passed boolean
If all data was converted according to the labels configuration while processing the documents
total_credits_used integer
Total of credits consumed by the extraction
documents object[]
Information extracted from the documents, per document
name string
Name of the document
all_required_fields_found boolean
If all required labels were found while processing this specific document
all_confidence_thresholds_met boolean
If all the expected confidence for the labels were met while processing this specific document
all_data_conversion_passed boolean
If all data was converted according to the labels configuration while processing this specific document
credits_used integer
Total of credits consumed by the extraction for this specific document
matches object[]
All the information extracted, per document
bboxes array[]
All the bounding boxes found, per document
annotated_pages string[]
All the document pages with annotated date
hitl object
Information about the HITL
review_url string
the API endpoint to retrieve that review
status string
the HITL status (pending or completed)
result object[]
All the issues found by the HITL
confidences object[]
List all the matches bellow the expected threshold
label string
Label associated with the value
value string
Value found
expected_confidence number
Expected confidence for the label
found number
Found confidence for the label
labels object[]
All project labels and configuration for each label
id integer
The ID of the label
name string
The label's name.

{
  "model": "Invoice Extraction v1",
  "all_required_fields_found": true,
  "all_confidence_thresholds_met": true,
  "all_data_conversion_passed": true,
  "total_credits_used": 24,
  "documents": [
    {
      "name": "Invoice.pdf",
      "all_required_fields_found": true,
      "all_confidence_thresholds_met": true,
      "all_data_conversion_passed": true,
      "credits_used": 24,
      "matches": [
        {}
      ],
      "bboxes": [
        [
          [
            {
              "TLx": 0.05868118572292801,
              "TLy": 0.03465982028241335,
              "TRx": 0.06412583182093164,
              "TRy": 0.03465982028241335,
              "BRx": 0.06412583182093164,
              "BRy": 0.04150620453572957,
              "BLx": 0.05868118572292801,
              "BLy": 0.05868118572292801,
              "text": "John Doe",
              "label": "First name",
              "confidence": 0.9936525225639343
            }
          ]
        ]
      ],
      "annotated_pages": [
        "data:image/png;base64,iVBORw0KGgo..."
      ],
      "hitl": {
        "review_url": "https://api.deepopinion.ai/organizations/1/projects/1/hitl/1/reviews/1",
        "status": "pending",
        "result": [
          {
            "confidences": [
              {
                "label": "First name",
                "value": "John Doe",
                "expected_confidence": 90,
                "found": 70
              }
            ]
          }
        ]
      }
    }
  ],
  "labels": [
    {
      "id": 1,
      "name": "POS"
    }
  ]
}

Extract information from one or more documents.​

Extract information from one or more documents.