Extract information from one or more documents.
Given one or more uploaded documents, this endpoint extracts information from them using the selected model.
Supported document formats are:
- pdf
- jpeg / jpg
- png
- tiff
- webp
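A client may want to reject unsupported files before uploading. The sketch below checks a filename against the formats listed above. The helper name `is_supported_document` is hypothetical, and checking only the extension is an assumption: this section does not say how the server itself detects the format.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpeg", ".jpg", ".png", ".tiff", ".webp"}


def is_supported_document(filename: str) -> bool:
    """Return True if the filename's extension is in the supported list.

    Assumption: the extension reflects the real file format. The server
    may apply its own, stricter checks.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `Invoice.PDF` passes, while an extension that is not listed above (for example `.docx`) is rejected.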
Path Parameters
- organization_id integer required
Unique ID of your workspace
- project_id integer required
Unique ID of your project
- model_id integer required
Unique ID of your trained AI model
Query Parameters
- return_bboxes string
Whether to return bounding boxes
- return_annotated_pages string
Whether to return an image of each page, annotated with the matches found, as a base64-encoded string
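The path and query parameters above can be combined into a request URL roughly as follows. This is only a sketch: the base URL and the path layout in `PATH_TEMPLATE` are hypothetical placeholders (this section does not state the endpoint path), and encoding the string-typed flags as `"true"` / `"false"` is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: neither the base URL nor the path layout
# is given in this section. Replace both with the documented values.
BASE_URL = "https://api.example.com"
PATH_TEMPLATE = (
    "/organizations/{organization_id}/projects/{project_id}"
    "/models/{model_id}/extract"
)


def build_extract_url(
    organization_id: int,
    project_id: int,
    model_id: int,
    return_bboxes: bool = False,
    return_annotated_pages: bool = False,
    base_url: str = BASE_URL,
    path_template: str = PATH_TEMPLATE,
) -> str:
    """Fill in the three path parameters and append the two query flags."""
    path = path_template.format(
        organization_id=organization_id,
        project_id=project_id,
        model_id=model_id,
    )
    # The query parameters are typed as strings; "true"/"false" is an
    # assumed encoding, not something this section confirms.
    query = urlencode(
        {
            "return_bboxes": str(return_bboxes).lower(),
            "return_annotated_pages": str(return_annotated_pages).lower(),
        }
    )
    return f"{base_url}{path}?{query}"
```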
Request Body required (multipart/form-data)
- file binary required
File to upload
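For clients that do not use an HTTP library with built-in multipart support, the request body can be assembled by hand. The sketch below encodes a single `file` part using only the standard library. The function name `encode_file_upload` is hypothetical, and the sketch stops short of sending the request because the endpoint URL and the authentication header are not specified in this section.

```python
import mimetypes
import uuid


def encode_file_upload(filename: str, content: bytes, field_name: str = "file"):
    """Build a multipart/form-data body with one binary file part.

    Returns (body, headers). The field name defaults to "file", the
    only request-body field listed above.
    """
    boundary = uuid.uuid4().hex
    # Fall back to a generic type when the extension is unknown.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + content + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return body, headers
```

The returned body and headers can then be passed to any HTTP client, together with whatever authentication header the API requires.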
Responses
- 200
- 400
- 401
- 404
- 500
200 OK
- application/json
Schema
- model string
Name of the model used to extract the information
- all_required_fields_found boolean
Whether all required labels were found while processing the documents
- all_confidence_thresholds_met boolean
Whether every label met its expected confidence threshold while processing the documents
- all_data_conversion_passed boolean
Whether all data was converted according to the label configuration while processing the documents
- total_credits_used integer
Total credits consumed by the extraction
- documents object[]
Information extracted from the documents, one entry per document
  - name string
  Name of the document
  - all_required_fields_found boolean
  Whether all required labels were found while processing this document
  - all_confidence_thresholds_met boolean
  Whether every label met its expected confidence threshold while processing this document
  - all_data_conversion_passed boolean
  Whether all data was converted according to the label configuration while processing this document
  - credits_used integer
  Total credits consumed by the extraction for this document
  - matches object[]
  All the information extracted from this document
  - bboxes array[]
  All the bounding boxes found in this document
  - annotated_pages string[]
  All the pages of this document with annotated data
  - hitl object
  Information about the HITL
    - review_url string
    API endpoint to retrieve the review
    - status string
    HITL status (pending or completed)
    - result object[]
    All the issues found by the HITL
      - confidences object[]
      All the matches below the expected threshold
        - label string
        Label associated with the value
        - value string
        Value found
        - expected_confidence number
        Expected confidence for the label
        - found number
        Confidence found for the label
- labels object[]
All project labels and the configuration for each label
  - id integer
  ID of the label
  - name string
  Name of the label
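The three per-document flags in the schema above lend themselves to a simple triage step. The sketch below lists the documents for which at least one flag is false. The function name is hypothetical, and treating a missing flag as a failure is an assumption made here for safety, not documented behavior.

```python
# Per-document boolean flags, as named in the schema above.
DOCUMENT_FLAGS = (
    "all_required_fields_found",
    "all_confidence_thresholds_met",
    "all_data_conversion_passed",
)


def documents_needing_attention(response: dict) -> list[str]:
    """Return the names of documents where any quality flag is not true."""
    flagged = []
    for doc in response.get("documents", []):
        # Assumption: a missing flag is treated the same as False.
        if not all(doc.get(flag, False) for flag in DOCUMENT_FLAGS):
            flagged.append(doc.get("name", "<unnamed>"))
    return flagged
```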
Example (from schema)
{
  "model": "Invoice Extraction v1",
  "all_required_fields_found": true,
  "all_confidence_thresholds_met": true,
  "all_data_conversion_passed": true,
  "total_credits_used": 24,
  "documents": [
    {
      "name": "Invoice.pdf",
      "all_required_fields_found": true,
      "all_confidence_thresholds_met": true,
      "all_data_conversion_passed": true,
      "credits_used": 24,
      "matches": [
        {}
      ],
      "bboxes": [
        [
          [
            {
              "TLx": 0.05868118572292801,
              "TLy": 0.03465982028241335,
              "TRx": 0.06412583182093164,
              "TRy": 0.03465982028241335,
              "BRx": 0.06412583182093164,
              "BRy": 0.04150620453572957,
              "BLx": 0.05868118572292801,
              "BLy": 0.05868118572292801,
              "text": "John Doe",
              "label": "First name",
              "confidence": 0.9936525225639343
            }
          ]
        ]
      ],
      "annotated_pages": [
        "data:image/png;base64,iVBORw0KGgo..."
      ],
      "hitl": {
        "review_url": "https://api.deepopinion.ai/organizations/1/projects/1/hitl/1/reviews/1",
        "status": "pending",
        "result": [
          {
            "confidences": [
              {
                "label": "First name",
                "value": "John Doe",
                "expected_confidence": 90,
                "found": 70
              }
            ]
          }
        ]
      }
    }
  ],
  "labels": [
    {
      "id": 1,
      "name": "POS"
    }
  ]
}
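In the example above, `bboxes` is nested several lists deep, and the low-confidence matches sit under `hitl.result[].confidences`. The sketch below shows one way to walk both structures for a single entry of `documents`. Both function names are hypothetical. Because the meaning of each nesting level in `bboxes` is not stated in this section, the traversal simply flattens whatever depth it finds.

```python
def iter_boxes(node):
    """Yield every bounding-box object found at any nesting depth."""
    if isinstance(node, dict):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from iter_boxes(child)


def low_confidence_matches(document: dict) -> list[tuple]:
    """Collect (label, found, expected_confidence) from the HITL result.

    Assumption: `hitl` may be absent or empty when no review exists.
    """
    hitl = document.get("hitl") or {}
    rows = []
    for issue in hitl.get("result") or []:
        for entry in issue.get("confidences") or []:
            rows.append(
                (entry["label"], entry["found"], entry["expected_confidence"])
            )
    return rows
```

For example, `[box["text"] for box in iter_boxes(document["bboxes"])]` gives the text of every box in a document, regardless of how the boxes are grouped.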
- 400
No document was provided. At least one is required.
- 401
The user is not authenticated: the token is either missing or invalid.
- 404
The project or model doesn't exist, or the user doesn't have the required permission to access the resource.
- 500
Something is wrong with the API.
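A client can turn these status codes into readable errors. The following is a minimal sketch, assuming the caller already has the numeric status code from its HTTP library. The `ExtractionError` class and `check_status` function are hypothetical names, and the messages paraphrase the descriptions above.

```python
# Meanings paraphrased from the response descriptions above.
ERROR_MEANINGS = {
    400: "No document was provided; at least one is required.",
    401: "Not authenticated: the token is missing or invalid.",
    404: "Project or model not found, or insufficient permission.",
    500: "Something is wrong with the API.",
}


class ExtractionError(Exception):
    """Raised when the extraction endpoint returns a non-200 status."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def check_status(status_code: int) -> None:
    """Return silently on 200, otherwise raise ExtractionError."""
    if status_code == 200:
        return
    meaning = ERROR_MEANINGS.get(status_code, "Unexpected status code.")
    raise ExtractionError(status_code, meaning)
```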