Changelog

April 16, 2025

🚀Features

Standardize V2.1 released.
The only difference is that fields which weren't found are now an explicit null. For example, we now produce {"rentAmount": null}, where previously we omitted rentAmount altogether if it was not found.
Scheduled change: Version 2.1 will become the default for all API users and website visitors on April 23, 2025.
If you must retain the old behavior, you can set stdVersion=2.0 in the API.
Expanded file‑format support – Word (.doc, .docx), HTML, plain text, and JSON uploads are now supported via both the website and API. Word and HTML files are converted to PDF; plain text and JSON files are parsed natively and do not consume parsing credits. Excel support is coming soon.
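
For the V2.1 null change described above, client code that used to test for a missing key may want to treat an explicit null the same way. The following is a minimal sketch, assuming the standardization result has already been loaded into a Python dict. The helper name was_found is hypothetical; only rentAmount comes from the example above.

```python
import json

def was_found(result: dict, name: str) -> bool:
    """True only when the field is present with a non-null value.

    Works for both styles: a key that is omitted (V2.0) and a key
    that is present but null (V2.1) are both reported as not found.
    """
    return result.get(name) is not None

v20_result = json.loads('{}')                     # V2.0 style: key omitted
v21_result = json.loads('{"rentAmount": null}')   # V2.1 style: explicit null

print(was_found(v20_result, "rentAmount"))
print(was_found(v21_result, "rentAmount"))

# A plain membership test is what distinguishes the two versions:
print("rentAmount" in v20_result, "rentAmount" in v21_result)
```

Code that relies on `"rentAmount" in result` to detect a found field is the pattern most likely to change behavior under V2.1, since the key is now always present.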

Parallel batch processing: Standardize batch jobs now execute documents in parallel instead of sequentially, significantly reducing overall processing time.

The promotion period for high effort level standardization ends April 23, 2025. From that date, setting effortLevel='high' will be charged 4 credits per page. This is double the standard rate of 2 credits per standardization.
The promotion period for the review action ends April 23, 2025. From that date onwards, the review action will be charged 2 credits per page. This was previously free for our beta users.

February 25, 2025

Standardization V2 is now the default on the website. We will continue supporting all versions on the API and website, but encourage users to try the new version. It will keep improving in the coming weeks, and we appreciate the feedback from users that is helping us make it better.

The document viewer on the website has received a makeover, letting you move more seamlessly between viewing your document, text results, and standardizations. We hope you enjoy it!
Uploading documents now auto-suggests relevant schemas based on the document type, allowing for a more streamlined flow.

The cost of schema creation and refinement has been reduced from 2 credits per page to 1 credit per page. We think this will let users feel more free to create more schemas and iterate on them in the Improve tab.
The cost of classification has been reduced from 1 credit per document to 0.1 credit per page (rounded up), as this is fairer for most users.
The cost of analyze has been reduced to 0.5 credit per page (rounded up), and has been simplified to uniform pricing regardless of whether you are analyzing a single document or multiple documents. The behavior has also changed to be more verbose.
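
To illustrate the per-page rates above, here is a small sketch of the arithmetic. It assumes that "rounded up" applies to the total for a document (pages times rate), which this changelog does not spell out; the function and constant names are hypothetical. Exact fractions are used instead of floats because 0.1 cannot be represented exactly in binary floating point, and a float product can round up one credit too many.

```python
import math
from fractions import Fraction

# Rates taken from the entries above, expressed as exact fractions.
CLASSIFY_RATE = Fraction(1, 10)   # 0.1 credit per page
ANALYZE_RATE = Fraction(1, 2)     # 0.5 credit per page

def credits_for(pages: int, rate: Fraction) -> int:
    """Credits for one document, rounding the total up to a whole credit.

    Assumption: rounding is applied once to pages * rate, not per page.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return math.ceil(pages * rate)

print(credits_for(3, CLASSIFY_RATE))    # 3 pages of classification
print(credits_for(30, CLASSIFY_RATE))   # 30 pages of classification
print(credits_for(3, ANALYZE_RATE))     # 3 pages of analyze
```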

February 8, 2025

We are pleased to announce that Standardization V2 has been released! It is available via both the website and the API. The new version is faster, more accurate, and more flexible than the previous one. However, it might not immediately work better for everyone, so V1 is still available and will remain the default on the website for a while longer. From the API, Standardization V2 has a separate endpoint: /v2/standardize/batch (see the API docs for details). Also see the new article on Standardization in the Help Center, which explains how the new version works and what its input parameters are for. Please start experimenting with V2 and give us feedback; it will keep improving over the next weeks before the launch is finalized and it becomes the new default.

February 4, 2025

The parsing model has been upgraded to V2 in the API (in addition to the website). The POST /document endpoint accepts an optional parseVersion parameter, which can be set to 1 or 2 (default is now 2).
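
As a rough sketch of how a client might pin the parser version, the helper below builds only the parseVersion part of a POST /document request. Whether the parameter belongs in the JSON body or elsewhere is not stated in this changelog, so the helper just returns a dict to merge into whatever request the API docs describe; the function name is hypothetical.

```python
from typing import Optional

VALID_PARSE_VERSIONS = {1, 2}  # values named in the entry above

def parse_version_params(parse_version: Optional[int] = None) -> dict:
    """Return the parseVersion parameter to merge into a POST /document request.

    Passing None omits the parameter so the API applies its own default.
    """
    if parse_version is None:
        return {}
    if parse_version not in VALID_PARSE_VERSIONS:
        raise ValueError(f"parseVersion must be 1 or 2, got {parse_version!r}")
    return {"parseVersion": parse_version}

print(parse_version_params())    # rely on the API default
print(parse_version_params(1))   # explicitly keep the old parser
```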
We have improved how we spatially display documents to the AI in standardization and analysis, which should improve results.

January 18, 2025

Standardizations can now be downloaded as Excel files from the API as well, under the endpoint /standardization/{standardization_id}/download/excel-url, which gives you a temporary URL to download the Excel file. This feature is free of charge.

The parsing model has been upgraded to a new version (V2), which improves accuracy with tables, checkmarks, and handwriting recognition. This update is now the default on the website. The POST /document endpoint now accepts an optional parseVersion parameter, which can be set to 1 or 2. In the API, the default remains 1 for now but will switch to 2 in one week. To continue using the old version, set parseVersion to 1.

January 6, 2025

We added the ability to download individual standardizations as an Excel file. This feature is currently available only via the website, under the Standardization tab: click Download -> Excel. The Excel file contains the same information as the standardization details page, but in a more structured format: non-array fields are in a sheet called 'main', and each array field is in a separate sheet named after that field. This feature is free of charge, and will be available in the API soon.
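
The sheet layout described above can be pictured with a short sketch. This is an illustration of the described layout only, not the actual export code; the function name and the sample field names are hypothetical, and it does not address nested objects or an array field that happens to be named 'main'.

```python
def split_into_sheets(standardization: dict) -> dict:
    """Group fields the way the Excel export is described:
    non-array fields go to 'main', each array field gets its own sheet."""
    sheets = {"main": {}}
    for name, value in standardization.items():
        if isinstance(value, list):
            sheets[name] = value            # one sheet per array field
        else:
            sheets["main"][name] = value    # scalar fields share 'main'
    return sheets

# Hypothetical standardization result for a lease document.
example = {
    "tenantName": "Jane Doe",
    "rentAmount": 1500,
    "payments": [
        {"month": "January", "amount": 1500},
        {"month": "February", "amount": 1500},
    ],
}

sheets = split_into_sheets(example)
print(list(sheets))     # sheet names
print(sheets["main"])   # non-array fields
```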

In document parsing, we removed the underscore padding in tables, as it caused issues with some documents. Newly parsed documents will revert to the previous behavior of having empty table cells filled with a simple empty string. For standardization with standardizationMode='sectionBased', we will still use padding to improve results.

January 3, 2025

You can now right-click tabs in the dashboard menu to open them in a new tab (this was not possible before).
We have disabled the mobile view for the dashboard, as it was not optimized for mobile devices and caused issues.

December 30, 2024

We added an API endpoint to POST a new schema from scratch. Until now, a schema could only be created by updating an existing schema, but now you can add a schema object directly using the API. The endpoint is POST /schema; see the API docs for details.

December 17, 2024

Fixed a bug where schemas were allowed to have fields with type=enum, which is not a valid type in JSON Schema (enum is an additional key on a field, not a type). Only the types 'string', 'number', 'integer', 'boolean', 'object', and 'array' are now allowed.
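
To make the distinction concrete, the sketch below shows an enum field written the valid way (type 'string' plus an enum key) and a small checker for the allowed types. The checker is a hypothetical illustration, not the validation the service runs; it only walks properties and items, and it ignores type values that are not plain strings.

```python
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "object", "array"}

def invalid_types(schema: dict, path: str = "$") -> list:
    """Return (path, type) pairs for every field whose type is not allowed."""
    problems = []
    field_type = schema.get("type")
    if isinstance(field_type, str) and field_type not in ALLOWED_TYPES:
        problems.append((path, field_type))
    # Recurse into object properties and array items.
    for name, sub in schema.get("properties", {}).items():
        problems.extend(invalid_types(sub, f"{path}.{name}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(invalid_types(items, f"{path}[]"))
    return problems

# Invalid: 'enum' used as a type.
bad_schema = {
    "type": "object",
    "properties": {"status": {"type": "enum", "enum": ["active", "expired"]}},
}

# Valid: 'enum' is an extra key alongside a real type.
good_schema = {
    "type": "object",
    "properties": {"status": {"type": "string", "enum": ["active", "expired"]}},
}

print(invalid_types(bad_schema))
print(invalid_types(good_schema))
```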

December 7, 2024

Added the ability to download a PDF with the OCR layer baked in. Available both in the API, at the endpoint document/{document_id}/download/ocr-url, and via the website, under the Documents tab: click Download -> File (OCR Layer). In more detail, this feature lets you download your PDF, which may be handwritten or contain images, with DocuPanda's OCR layer placed on top of the document in an invisible font at the word level. This allows you to search your PDF, or highlight and copy text from it, even if the original document was just a scan. This service is free of charge.

In document parsing, we added underscore padding in tables (instead of empty string), which improves readability / rendering and standardization results, as it makes it easier for the AI to keep track of table columns. This affects anyone using the document.result.text output in its raw form, or anyone using standardization with standardizationMode='sectionBased'.