
Automating Income Taxes With Document AI

  • aster.cloud
  • April 30, 2022
  • 4 minute read

In the United States, tax season descends upon the country every April, requiring millions of Americans to spend hours deciphering cryptic documents and performing complex math just to figure out what they owe. Wouldn’t it be grand if a computer could take all the relevant documents and extract exactly what the IRS is looking for? Lending Document AI from Google Cloud supports common document types used for income tax filing, such as W-2s and 1099s. These advances in machine learning technology make it possible to alleviate some of the anxiety leading up to April 15th.

Lending Document AI is a Document Understanding solution that allows for classification and parsing of documents commonly used in the mortgage lending industry. The data in these unstructured files is then converted into a structured format, which can be stored in a database or used for analysis and calculations. You can read more about the product in the announcement blog post. For this tax filing use case, we will focus on automatically classifying and parsing the 2020 editions of the following forms:


  • W-2
  • 1099-DIV
  • 1099-INT
  • 1099-MISC
  • 1099-NEC
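To make “converted into a structured format” concrete, here is a minimal sketch that flattens a parser's extracted entities into a field-to-values mapping that could be stored in a database. It assumes each entity exposes `type_` and `mention_text`, as `Document.Entity` does in the `google-cloud-documentai` Python client; the function name `entities_to_record` and the field names in the demo are hypothetical, not taken from the sample repository.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Iterable


def entities_to_record(entities: Iterable) -> dict[str, list[str]]:
    """Flatten extracted entities into a field -> values mapping.

    Each entity is assumed to expose ``type_`` and ``mention_text``, the
    attribute names used by ``Document.Entity`` in the Document AI Python
    client. A field can appear more than once on a form, so values are
    collected in lists, in the order the parser returned them.
    """
    record: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        record[entity.type_].append(entity.mention_text.strip())
    return dict(record)


if __name__ == "__main__":
    # Stand-in entities with hypothetical field names; real ones come from
    # a specialized parser's response (document.entities).
    fake_entities = [
        SimpleNamespace(type_="employer_name", mention_text="Example Corp "),
        SimpleNamespace(type_="wages", mention_text="50000.00"),
    ]
    print(entities_to_record(fake_entities))
```

Collecting values into lists rather than overwriting keeps repeated fields (for example, several state entries on one form) from being silently dropped.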

This sample application creates an automated pipeline: the user bulk-uploads a collection of PDFs, the Lending Document Splitter & Classifier classifies each document, and each PDF is then sent to the appropriate specialized parser to extract its data. The extracted data can then be used to calculate an individual tax return and fill out Form 1040.
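The classify-then-route step can be sketched as follows. This is a minimal sketch, not the repository's code: `classify` and `parse` are injected callables standing in for the real Document AI calls, and the labels and processor IDs in `PARSER_FOR_LABEL` are hypothetical placeholders. Files whose label has no matching parser are kept with `fields=None` so they can still be reported to the user.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical classifier labels -> hypothetical parser processor IDs.
# Replace both sides with the labels your classifier actually returns and
# the IDs of the parsers created in your project.
PARSER_FOR_LABEL: dict[str, str] = {
    "W-2": "w2-parser-id",
    "1099-DIV": "1099div-parser-id",
    "1099-INT": "1099int-parser-id",
    "1099-MISC": "1099misc-parser-id",
    "1099-NEC": "1099nec-parser-id",
}


@dataclass
class RoutedDocument:
    file_name: str
    label: str
    fields: Optional[dict]  # None when no parser matches the label


def route_documents(
    files: dict[str, bytes],
    classify: Callable[[bytes], str],
    parse: Callable[[str, bytes], dict],
) -> list[RoutedDocument]:
    """Classify each file, then send it to the parser that matches its label."""
    routed = []
    for file_name, content in files.items():
        label = classify(content)
        processor_id = PARSER_FOR_LABEL.get(label)
        fields = parse(processor_id, content) if processor_id else None
        routed.append(RoutedDocument(file_name, label, fields))
    return routed


if __name__ == "__main__":
    # Fake classifier and parser so the routing logic runs without API calls.
    def fake_classify(content: bytes) -> str:
        return "W-2" if content.startswith(b"w2") else "unknown"

    def fake_parse(processor_id: str, content: bytes) -> dict:
        return {"parsed_by": processor_id}

    files = {"a.pdf": b"w2 bytes", "b.pdf": b"other bytes"}
    for doc in route_documents(files, fake_classify, fake_parse):
        print(doc)
```

Injecting the two callables keeps the routing logic testable without network access; in a real pipeline they would wrap requests to the classifier and to the specialized parsers.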

Overview

Let’s explore how this application works. You can check out the sample code in the GoogleCloudPlatform/document-ai-samples GitHub repository.

Here is an outline of the application's architecture. In addition to Document AI, it uses Cloud Run to host the web application and Firestore in Native mode to store the extracted data.

  1. The user uploads multiple PDF files to the web application, hosted on Cloud Run.
  2. An API call is made to the Lending Document Splitter & Classifier for each PDF file.
  3. The classifier's output (e.g., W-2, 1099-MISC) is mapped to the appropriate specialized parser in the Google Cloud project.
  4. Each document file is sent to the specialized parser that matches its document type.
  5. The parser extracts the entities, and the data is written to Firestore.
  6. The raw data is retrieved from Firestore and displayed to the user, showing each file's classification and the values extracted from each form.
  7. The values from all the forms are used together to calculate an individual income tax return.
  8. The calculated tax rates, incomes, and deductions are displayed to the user in a table matching IRS Form 1040. The app also shows which form's data was used for each field. (Some output fields, such as line 25b, use values from multiple forms.)
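Step 8 notes that some Form 1040 fields combine several forms. As an illustration, here is a minimal sketch of the federal withholding lines on the 2020 Form 1040: line 25a totals box 2 of each W-2, line 25b totals box 4 (federal income tax withheld) of each 1099, and line 25d adds lines 25a through 25c. The input shape, a list of (form type, {box number: amount}) pairs, is a hypothetical stand-in for the parsers' actual output fields, and `line_25_withholding` is not a function from the sample repository.

```python
from decimal import Decimal

# Box holding "Federal income tax withheld" on each supported 2020 form.
WITHHOLDING_BOX = {
    "W-2": "2",
    "1099-DIV": "4",
    "1099-INT": "4",
    "1099-MISC": "4",
    "1099-NEC": "4",
}


def line_25_withholding(forms: list[tuple[str, dict[str, str]]]) -> dict[str, Decimal]:
    """Sum federal income tax withheld into Form 1040 lines 25a-25d.

    ``forms`` is a hypothetical shape: (form_type, {box_number: amount}).
    Line 25c (other forms) is not covered by this demo's form types, so it
    stays at zero.
    """
    totals = {"25a": Decimal("0"), "25b": Decimal("0"), "25c": Decimal("0")}
    for form_type, boxes in forms:
        box = WITHHOLDING_BOX.get(form_type)
        if box is None or not boxes.get(box):
            continue  # unsupported form or empty withholding box
        # Strip currency formatting before converting the extracted text.
        amount = Decimal(boxes[box].replace("$", "").replace(",", ""))
        line = "25a" if form_type == "W-2" else "25b"
        totals[line] += amount
    totals["25d"] = totals["25a"] + totals["25b"] + totals["25c"]
    return totals


if __name__ == "__main__":
    parsed = [
        ("W-2", {"2": "$1,200.50"}),
        ("1099-INT", {"4": "10.00"}),
        ("1099-NEC", {"4": "5.00"}),
    ]
    print(line_25_withholding(parsed))
```

`Decimal` is used instead of `float` so currency sums do not pick up binary rounding errors.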

Step-by-Step directions

Want to try this out for yourself? Here’s how you can deploy and run this application in a Google Cloud project. You can run these steps in Cloud Shell (Quickstart) or on your local machine.

 

NOTE: The Lending Processors in this Demo are in Limited GA as of March 2022. If you have a business use case for these processors, you can fill out and submit the Document AI limited access customer request form.

Install dependencies

1. Clone the GitHub Repository to get the sample code.

git clone https://github.com/GoogleCloudPlatform/document-ai-samples.git

2. Enter the directory for the tax pipeline demo:

cd document-ai-samples/tax-processing-pipeline-python

 

3. Install Python and the Google Cloud SDK if they aren’t already installed.

4. Install the Python libraries:

pip install -r requirements.txt

5. If you don’t already have one, create a new Google Cloud project and enable billing.

6. Enable the Document AI API:

 

gcloud services enable documentai.googleapis.com

7. Set up application default credentials:

gcloud auth application-default login
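Before deploying, a small preflight check can confirm that the tools used above are available. This is a minimal sketch under stated assumptions: `missing_tools` reports which of `git` and `gcloud` are not on `PATH`, and `adc_path` returns where application default credentials are expected, which is the `GOOGLE_APPLICATION_CREDENTIALS` variable if it is set, otherwise the default file that `gcloud auth application-default login` writes on Linux and macOS (Windows and a custom `CLOUDSDK_CONFIG` use a different directory). Both helper names are hypothetical.

```python
import os
import shutil
import sys
from pathlib import Path

# Tools used by the commands above (git clone, gcloud services/auth/run).
REQUIRED_TOOLS = ["git", "gcloud"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from ``tools`` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def adc_path() -> Path:
    """Expected location of application default credentials.

    Assumes the default Linux/macOS gcloud config directory unless
    GOOGLE_APPLICATION_CREDENTIALS points to a key file explicitly.
    """
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    return Path.home() / ".config" / "gcloud" / "application_default_credentials.json"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    missing = missing_tools(REQUIRED_TOOLS)
    print("Missing on PATH:", ", ".join(missing) if missing else "none")
    creds = adc_path()
    print(f"Credentials file {creds}: {'found' if creds.exists() else 'not found'}")
```

The script only reports problems rather than exiting, so it is safe to run at any point in the setup.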

 

Deploy demo application

1. Edit the config.yaml file, adding your own project details:

docai_processor_location: us # Document AI processor location (us OR eu)
docai_project_id: YOUR_PROJECT_ID # Project ID for Document AI processors
firestore:
  collection: tax_demo_documents # Set to your preferred Firestore collection name
  project_id: YOUR_PROJECT_ID # Project ID for the Firestore database

 

2. Run the setup script to create the Document AI processors, then deploy the web application to Cloud Run:

python3 setup.py
gcloud run deploy tax-demo --source .

 

3. Visit the deployed web page. The deployment command should output a link to it.

 

4. Upload documents. I created some sample documents you can download from the sample-docs folder of the repository.

This demo currently supports the following document types (2020 editions):

  • W-2
  • 1099-DIV
  • 1099-INT
  • 1099-MISC
  • 1099-NEC

5. Click the “Upload” button and wait for processing to complete.

  • The page will display the steps completed for each document file. These are also written to stdout for troubleshooting purposes.

 

6. View the extracted values from each file.

 

7. Click “Calculate Taxes” to see the tax calculation output.
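Steps 6 and 7 work from the values stored in the Firestore collection configured in step 1. Here is a minimal sketch of reading them back, assuming the `google-cloud-firestore` client library is installed and config.yaml has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The helper names are hypothetical, and the shape of each stored document depends on how the app writes it.

```python
def firestore_target(config: dict) -> tuple[str, str]:
    """Return (project_id, collection) from the parsed config.yaml mapping."""
    firestore_cfg = config.get("firestore") or {}
    project_id = firestore_cfg.get("project_id")
    collection = firestore_cfg.get("collection")
    # Catch the placeholder value left over from the template config.
    if not project_id or project_id == "YOUR_PROJECT_ID":
        raise ValueError("Set firestore.project_id in config.yaml")
    if not collection:
        raise ValueError("Set firestore.collection in config.yaml")
    return project_id, collection


def fetch_extracted_values(config: dict) -> dict[str, dict]:
    """Read every stored document back from Firestore, keyed by document ID."""
    # Imported lazily so the config helper above works without the library.
    from google.cloud import firestore  # pip install google-cloud-firestore

    project_id, collection = firestore_target(config)
    db = firestore.Client(project=project_id)
    return {snap.id: snap.to_dict() for snap in db.collection(collection).stream()}
```

For example, `fetch_extracted_values(yaml.safe_load(open("config.yaml")))` would return the stored records once application default credentials are set up. Validating the config first gives a clear error instead of a failed API call when the placeholder project ID is still in place.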


Conclusion

Warning: This is NOT financial advice; it is for educational purposes only.

Congratulations! You now have a fully functional tax processing application that can also be modified for use with other workflows that require data from multiple specialized documents.

The Document AI API is flexible and modular enough that most of the code in this example can be reused for any specialized processor.
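As an illustration of that reuse, here is a minimal sketch of a processor-agnostic call using the `google-cloud-documentai` Python client: the same request works for the classifier and for every specialized parser, and only the processor ID changes. The helper names `documentai_endpoint` and `process_file` are hypothetical, and the location check mirrors the `us`/`eu` options listed in config.yaml.

```python
def documentai_endpoint(location: str) -> str:
    """Regional Document AI endpoint for the locations config.yaml lists."""
    if location not in ("us", "eu"):
        raise ValueError(f"Unsupported location: {location!r}")
    return f"{location}-documentai.googleapis.com"


def process_file(project_id: str, location: str, processor_id: str,
                 file_path: str, mime_type: str = "application/pdf"):
    """Send one file to any Document AI processor and return the Document."""
    # Imported lazily so the endpoint helper works without the library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai  # pip install google-cloud-documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=documentai_endpoint(location))
    )
    # Full resource name: projects/{project}/locations/{location}/processors/{id}
    name = client.processor_path(project_id, location, processor_id)
    with open(file_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type=mime_type)
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw_document)
    )
    return result.document
```

For example, `process_file("my-project", "us", "PROCESSOR_ID", "w2.pdf").entities` would return the parsed fields, where `my-project`, `PROCESSOR_ID`, and `w2.pdf` are placeholders for your own values.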

Now tax returns can be filed with minimal manual effort!

 

If you want to learn more about Document AI, check out the Cloud Documentation and these videos:

  • Getting started with the Document AI platform
  • Process billions of pages and cut operational costs with DocAI

And if you want more hands-on experience, I recommend following these step-by-step codelabs to get started with the key features of Document AI:

  • Optical Character Recognition (OCR) with Document AI (Python)
  • Form Parsing with Document AI (Python)
  • Specialized Processors with Document AI (Python)
  • Managing Document AI processors with Python

 

 

By: Holt Skinner (Developer Relations Engineer)
Source: Google Cloud Blog

