In the United States, tax season descends upon the country every April, requiring millions of Americans to spend hours deciphering cryptic documents and performing complex math just to figure out what they owe. Wouldn’t it be grand if a computer could take all the relevant documents and extract exactly what the IRS is looking for? Lending Document AI from Google Cloud supports common document types used for income tax filing, such as W-2s and 1099s. These advancements in machine learning technology now make it possible to alleviate some of the anxiety leading up to April 15th.
Lending Document AI is a Document Understanding solution that allows for classification and parsing of documents commonly used in the mortgage lending industry. The data in these unstructured files is then converted into a structured format, which can be stored in a database or used for analysis and calculations. You can read more about the product in the announcement blog post. For this tax filing use case, we will focus on automatically classifying and parsing the 2020 editions of the following forms:
This sample application creates an automated pipeline for bulk uploads of PDFs. The Lending Document Splitter & Classifier classifies each document, and each PDF is then sent to the appropriate specialized parser to extract the data. The extracted data can then be used to calculate an individual tax return and fill out a Form 1040.
Let’s explore how this application works. You can check out the sample code in this GitHub Repository.
Here is an outline of the architecture of this application. As you can see, it utilizes Cloud Run and Firestore in Native Mode for the web application in addition to Document AI.
- The User uploads multiple PDF files to the web application, hosted on Cloud Run.
- An API call is made to the Lending Document Splitter & Classifier for each PDF file.
- The output of the classifier (e.g. W-2, 1099-MISC, etc.) is then mapped to an appropriate specialized parser in the Google Cloud Project.
- Each document file is sent to the appropriate specialized parser that matches the document type.
- The entities are extracted by the parser processor and the data is written to Firestore.
- The raw data is then retrieved from Firestore and displayed to the User, showing the file classification and the extracted values from each form.
- The data values from all the forms are used together to calculate an individual income tax return.
- The Calculated Tax Rates/Incomes/Deductions are displayed to the User in a Tabular Format matching the IRS Form 1040. The app also displays which form data was used for each field. (Some output fields use values from multiple forms, such as line 25b.)
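The routing in the middle of this flow (classifier output mapped to a specialized parser) comes down to a lookup. Here is a minimal sketch, assuming the classifier result has already been reduced to (label, confidence) pairs. The label strings, the processor IDs, and the function names are hypothetical placeholders for illustration, not values taken from the sample repository.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from a classifier label to the ID of the specialized
# parser that handles it. Real processor IDs are generated when the
# processors are created in your own project.
PARSER_BY_LABEL = {
    "w2_2020": "W2_PARSER_ID",
    "1099misc_2020": "MISC_1099_PARSER_ID",
}


def pick_label(predictions: List[Tuple[str, float]]) -> str:
    """Return the label the classifier is most confident about."""
    if not predictions:
        raise ValueError("classifier returned no predictions")
    label, _confidence = max(predictions, key=lambda pair: pair[1])
    return label


def route(predictions: List[Tuple[str, float]]) -> Optional[str]:
    """Return the parser ID for the top label, or None if it is unsupported."""
    return PARSER_BY_LABEL.get(pick_label(predictions))
```

Returning None for an unrecognized label lets the caller decide whether to skip the file or report it to the user, rather than sending it to the wrong parser.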
Want to try this out for yourself? Here’s how you can deploy and run this application using a Google Cloud Project. You can run this in Cloud Shell (Quickstart) or on your local machine.
NOTE: The Lending Processors in this demo are in Limited GA as of March 2022. If you have a business use case for these processors, you can fill out and submit the Document AI limited access customer request form.
1. Clone the GitHub Repository to get the sample code.
git clone https://github.com/GoogleCloudPlatform/document-ai-samples.git
2. Enter the directory for the tax pipeline demo
3. Install Python and the Google Cloud SDK if they aren’t already installed.
4. Install the Python libraries:
pip install -r requirements.txt
5. If you don’t already have one, create a new Google Cloud project and enable billing.
6. Enable the Document AI API:
7. Set up application default credentials:
Deploy demo application
1. Edit the config.yaml file, adding your own project details:
docai_processor_location: us # Document AI Processor Location (us OR eu)
docai_project_id: YOUR_PROJECT_ID # Project ID for Document AI Processors
collection: tax_demo_documents # Set with your preferred Firestore Collection Name
project_id: YOUR_PROJECT_ID # Project ID for Firestore Database
2. Run setup scripts to create the processors and Cloud Run app in your project.
gcloud run deploy tax-demo --source .
3. Visit the Deployed Web Page (You should get a link from the deployment command)
4. Upload Documents. I created some sample documents you can download from the sample-docs folder of the repository.
This demo currently supports the following Document Types (2020 Editions)
5. Click the “Upload” button and wait for processing to complete.
- The page will display the steps completed for each document file. These are also written to stdout for troubleshooting purposes.
6. View the extracted values from each file.
7. Click “Calculate Taxes” to see the tax calculation output
Warning: This is NOT financial advice. This demo is for educational purposes only.
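To give a feel for how values from several forms can feed one output field, here is a small sketch of the aggregation idea. It assumes each parsed document has been reduced to a dictionary with a file name, a form type, and string field values. The field name federal_income_tax_withheld, the LINE_SOURCES table, and the function name are made up for illustration and are not the repository's actual rules. The line numbers follow the 2020 Form 1040, where line 25a is withholding from W-2s and line 25b is withholding from 1099s.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rules: Form 1040 line -> (form type, field name) pairs that
# contribute to it. Field names are placeholders, not real parser output.
LINE_SOURCES = {
    "25a": [("W-2", "federal_income_tax_withheld")],
    "25b": [
        ("1099-MISC", "federal_income_tax_withheld"),
        ("1099-INT", "federal_income_tax_withheld"),
    ],
}


def fill_lines(documents):
    """Sum contributing fields per 1040 line and record which files were used."""
    totals = defaultdict(Decimal)  # Decimal() is 0, so sums start at zero
    sources = defaultdict(list)
    for doc in documents:
        for line, rules in LINE_SOURCES.items():
            for form, field in rules:
                if doc["form"] == form and field in doc["fields"]:
                    # Decimal avoids binary floating-point rounding on money.
                    totals[line] += Decimal(doc["fields"][field])
                    sources[line].append(doc["file"])
    return dict(totals), dict(sources)
```

Keeping the list of contributing files alongside each total is what makes it possible to show the user which form data was used for each field.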
Congratulations! You now have a fully functional tax processing application that can also be modified for use with other workflows that require data from multiple specialized documents.
The Document AI API is flexible and modular enough that most of the code in this example can be reused for any specialized processor.
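As a rough illustration of why the code carries over, the two pieces below do not depend on which specialized processor is used. The first builds the processor resource name in the projects/.../locations/.../processors/... format that the API expects. The second flattens a list of extracted entities into a dictionary. It assumes each entity exposes type_, mention_text, and confidence attributes, as entities in the Document AI Python client do. The helper names and the sample entity types are my own placeholders, and the stand-in objects exist only so the sketch runs without calling the API.

```python
from types import SimpleNamespace


def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name for any Document AI processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def entities_to_dict(entities) -> dict:
    """Map entity type -> text, keeping the most confident entity per type."""
    best = {}
    for entity in entities:
        current = best.get(entity.type_)
        if current is None or entity.confidence > current.confidence:
            best[entity.type_] = entity
    return {type_: entity.mention_text for type_, entity in best.items()}


# Stand-in entities for demonstration only; a real response would supply
# them from the processed document. The type names here are invented.
sample_entities = [
    SimpleNamespace(type_="EmployerName", mention_text="Acme Corp", confidence=0.98),
    SimpleNamespace(type_="Wages", mention_text="52000.00", confidence=0.71),
    SimpleNamespace(type_="Wages", mention_text="52,000.00", confidence=0.95),
]
fields = entities_to_dict(sample_entities)
```

Only the processor ID and the entity type names change from one form to the next, so the same two helpers could sit behind a W-2 parser or a 1099 parser.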
Now tax returns can be filed with minimal manual effort!
If you want to learn more about Document AI, check out the Cloud Documentation and these videos:
- Getting started with the Document AI platform
- Process billions of pages and cut operational costs with DocAI
And if you want more hands-on experience, I recommend following these step-by-step codelabs to get started with the key features of Document AI:
By: Holt Skinner (Developer Relations Engineer)
Source: Google Cloud Blog