Case Study No. 3

AI Agent that transforms scanned packing lists into Structured Records.

A food company that fulfills approximately 1,700 orders weekly (around 6,800 orders per month) is seeking to build an automated workflow for processing packing lists.

The project involves reading PDF packing lists stored in Google Drive, extracting structured data (order number, product names, and quantities), detecting pen color in checkboxes to assign packer identity (with a predefined mapping), and updating a Google Sheet with the results. The workflow should be automated, low-maintenance once set up, and ideally cost-efficient.

Warehouse worker in a safety vest and helmet holding a tablet, with icons showing packages being scanned, approved, and marked as pending.

Project Scope

The fulfillment team hand-checks every PDF packing list, then re-keys order numbers, SKUs, and quantities into a Google Sheet. With nearly 7,000 orders a month, this manual transcription is slow, error-prone, and offers no clear audit trail of which packer prepared each order.

  • Industry: Fresh & packaged foods
  • Order Volume: ≈ 1 700 orders per week (≈ 6 800 per month)
  • Current Assets: PDF packing lists in Google Drive, a shared Google Sheet for fulfilment metrics

Project Objectives

Design and implement a hands-off, low-maintenance workflow that:

  • Reads PDFs directly from Google Drive as soon as they land in the folder.
  • Extracts structured data including order number, order name, and quantity with high accuracy.
  • Identifies the packer automatically by detecting the ink colour used to tick check-boxes (e.g., blue = Alicia, black = Ravi, red = Moana).
  • Writes the parsed data into the master Google Sheet in real time, appending a packer name for traceability.

AI Agent Toolkit

To automate the extraction of data from scanned packing lists, each tool plays a clear role from handling incoming files to pulling out key details and saving them in a tracking system.

We’ve also added simple analogies to help explain what each tool does and how they work together as one smooth process.

Tool or Service
n8n
Role in Workflow
Coordinates the step-by-step process of parsing and logging packing lists.
Analogy
Workflow supervisor – controls the full data extraction process.
Tool or Service
Google Drive
Role in Workflow
Provides the source PDFs containing the scanned packing lists.
Analogy
Incoming mailbox – where scanned delivery documents are dropped off.
Tool or Service
PDF.co API
Role in Workflow
Converts PDFs into image files for better OCR and visual processing.
Analogy
Document converter – turns physical scans into readable digital formats.
Tool or Service
OpenAI GPT-4o
Role in Workflow
Reads the images and extracts key structured fields like item names, quantities, pen colors, etc.
Analogy
Data analyst – reads the documents and fills in the correct fields from handwritten info.
Tool or Service
Google Sheets
Role in Workflow
Logs the extracted information such as order numbers, SKUs, and packers into a structured table.
Analogy
Operations ledger – keeps a live, organized record of processed packing list data.

The Solution

A fully automated data extraction and processing agent was built using n8n, Google Drive, PDF.co, OpenAI GPT-4o, and Google Sheets. Once a new PDF packing list is uploaded to Google Drive, the system triggers automatically.

The PDF is converted into a readable image using PDF.co, then analyzed step-by-step by GPT to extract order names, order numbers, quantities, dates, and even pen color used and it is used to identify the packer. Finally, the parsed and enriched data is appended into a Google Sheet with no manual intervention required, ensuring fast, accurate, and consistent data logging.

Google Sheet template for tracking orders with columns for ID, order number, product names, quantity, date, pen color, associated packer, and status.
Set-up

Before running the automated workflow in n8n, make sure the Google Sheet is structured to capture the extracted data accurately.

The current sheet is organized into the following columns:

Automation workflow that retrieves files from Google Drive, processes PDFs to extract order details like names, numbers, quantities, dates, and pen colors, identifies the associated packer, and logs the data into Google Sheets.

1

ID

A unique identifier assigned to each processed packing list entry. This ensures no duplicate records are stored and helps track each row distinctly

2

Order Number

Extracted from the packing list, this identifies the unique order code

3

Product Names

Lists the product names as detected by GPT-4o from the scanned PDF.

4

Quantity

Displays the corresponding quantity of each product.

6

Date

Indicates the processing or packing date as recognized from the document.

7

Pen Color

Stores the detected pen color used to mark or check items, which is used to determine the packer’s identity.

8

Packer

Based on pen color analysis, this column logs the associated packer’s name.

9

Status

A checkbox column used for manual review or marking completion after validation.

Building the Solution

Illustration of an n8n automation pulling multiple PDF files from Google Drive for processing.

1Detect New Files from Google Drive

n8n continuously monitors a designated Google Drive folder for newly uploaded PDF packing lists. As soon as a new file is detected, the automation is triggered automatically with no manual checking required.

This workflow runs reliably in the background, ensuring every new packing list is processed in real time without delay or oversight.

Google Sheet template for managing AI-generated blog content, captions for multiple platforms, email content, and publishing status.

2PDF to Image Conversion and Hosting with PDF.co

The automated agent integrates with PDF.co to convert scanned PDFs into high quality images, enabling accurate data extraction using ChatGPT. When a PDF is uploaded to Google Drive, each page is processed and securely hosted by PDF.co for fast and reliable access in the workflow.

This streamlined setup reduces latency and supports efficient steps such as structured data extraction and pen color based packer identification

How Prompt Engineering powers this AI Agent

What makes this solution effective is not just the use of GPT 4o, but the intentional design behind each prompt. Every step in the workflow leverages these four prompt engineering techniques: Zero-Shot Prompting, Role Instructioning, Input Reference, and Output Constraints.

These structured instructions ensure the AI knows exactly what to extract, how to behave, and how to return precise, clean outputs every time. As a result, each output aligns perfectly with the task at hand and requires no manual editing or supervision.
Interested? Let's talk

3AI Data Extraction with ChatGPT

The automated agent utilizes ChatGPT to perform data extraction by prompting the AI model to identify key fields such as order name, amount, quantity, and date from each scanned packing list image. Once the images are hosted via PDF.co, a structured prompt is sent to ChatGPT, guiding it to detect and return the relevant information as clean, structured text.

This method of prompt-based extraction eliminates the need for complex parsing logic and minimizes manual data handling. It ensures accurate, scalable data capture that adapts to varied packing list formats with ease.

Order Names Extraction

Smart Prompting

Your prompt looks like:

You are a document parser.

Analyze the attached scanned order form image and extract only the Orders Name.

Here is the scanned order form image: {{ $('PDFco Api').item.json.body[0] }}

Return only the Orders Name, with no additional words, explanations, or punctuation.

Prompt Breakdown

Structured AI prompt for extracting only the order number from a scanned order form image, with labeled sections for role instructioning, task specification, input referencing, and output formatting.

Prompting Technique

How it's used?

Role Instructioning
The prompt starts by assigning the AI a specific identity: “You are a document parser,” which defines its purpose and scope within the task.
Task Specification
It explicitly tells the AI to “extract only the Orders Name,” ensuring a focused and constrained operation.
Input Referencing
The prompt points to the source of data using a reference token: {{ $('PDFco Api').item.json.body[0] }}, directing the AI to analyze a specific input.
Output Formatting
It ends with clear formatting rules: “Return only the Orders Name, with no additional words, explanations, or punctuation,” to ensure clean, predictable output.

Order Number Extraction

Smart Prompting

Your prompt looks like:

You are a document parser.

Analyze the attached scanned order form image and extract only the Order Number.

Here is the scanned order form image: {{ $('PDFco Api').item.json.body[0] }}

Return only the Order Number, with no additional words, explanations, or punctuation.

Prompt Breakdown

Structured AI prompt for extracting only the order name from a scanned order form image, with labeled sections for role instructioning, task specification, input referencing, and output formatting.

Prompting Technique

How it's used?

Role Instructioning
The AI is assigned a specific function at the start: “You are a document parser.” This sets the role and narrows the expected behavior.
Task Specification
The instruction clearly states what to extract: “extract only the Order Number,” guiding the AI to focus on one specific data point.
Input Referencing
The prompt identifies the input using a token: {{ $('PDFco Api').item.json.body[0] }}, telling the AI exactly which data to process.
Output Formatting
The prompt identifies the input using a token: {{ $('PDFco Api').item.json.body[0] }}, telling the AI exactly which data to process.

Quantity Extraction

Smart Prompting

Your prompt looks like:

You are a document parser.

Analyze the scanned order form image at: {{ $('PDFco Api').item.json.body[0] }}

Extract only the Quantity values (not the "Ship Quantity"). If multiple orders are present, extract all corresponding quantities.

Return only the Quantity values—no extra words, explanations, or punctuation.

Prompt Breakdown

Image showing a structured AI prompt for extracting only the quantity values from a scanned order form image, with labeled sections for role instructioning, input referencing, task specification, and output formatting.

Prompting Technique

How it's used?

Role Instructioning
The AI is assigned the role of a “document parser” at the beginning, establishing its function and narrowing its behavior.
Task Specification
The instruction clearly defines the extraction scope: “Extract only the Quantity values (not the 'Ship Quantity')...”, ensuring the task is precise and well-bounded.
Input Referencing
The prompt includes the exact location of the scanned form image via a dynamic token: {{ $('PDFco Api').item.json.body[0] }}, telling the AI where to look.
Output Formatting
The AI is directed to return results in a strict format: “no extra words, explanations, or punctuation,” which enforces consistency and clean data extraction.

4Pen Color Analysis with ChatGP

The automated agent includes a Pen Color Analysis step powered by OpenAI. Using advanced image recognition, the system detects the color of the checkmarks in each scanned packing list image and maps it to a specific packer using a predefined color to packer reference.

This step ensures that every entry in the Google Sheet includes not only the order details but also the correct packer identity automatically assigned by the AI.

Quantity Extraction

Smart Prompting

Your prompt looks like:

You are a document parser.

Analyze the scanned order form image at: {{ $('PDFco Api').item.json.body[0] }}.

Locate the scribbled or check mark pen color beside the "Packer" section and extract it. 

Only output is just the pen color (e.g. Red, Blue, Purple)

Prompt Breakdown

Structured AI prompt for identifying an associated name based on pen color used, including an embedded color-to-name mapping, with labeled sections for role instructioning, task specification, output formatting, embedded knowledge base, and input referencing.

Prompting Technique

How it's used?

Role Instructioning
The AI is given a clear identity at the start: “You are a document parser,” which sets its function and expected behavior.
Task Specification
The AI is told “Locate the scribbled or check mark pen color beside the ‘Packer’ section and extract it,” which defines the specific task and visual cue to focus on.
Input Referencing
The input source is precisely defined using: {{ $('PDFco Api').item.json.body[0] }}, ensuring the AI knows what to analyze.
Output Formatting
The AI is told “Locate the scribbled or check mark pen color beside the ‘Packer’ section and extract it,” which defines the specific task and visual cue to focus on.

Associated Packer

Smart Prompting

Your prompt looks like:

You are a document parser.

Extract and identify the associated name based on the pen color used.

Return only the name—no additional words, formatting, or punctuation.

Color-to-Name Mapping:
Red → Olivia Thompson
Blue → Jack Williams
Purple → Sophie Patel

Here: {{ $json.content }}

Prompt Breakdown

Structured AI prompt for extracting only the pen color from a scanned order form, with labeled sections for role instructioning, input referencing, task specification, and output formatting.

Prompting Technique

How it's used?

Role Instructioning
The AI is assigned the role “You are a document parser,” which clearly defines its task and function.
Task Specification
The instruction “Extract and identify the associated name based on the pen color used” gives the AI a specific goal tied to visual and logical processing.
Input Referencing
The input source is precisely defined using: {{ $('PDFco Api').item.json.body[0] }}, ensuring the AI knows what to analyze.
Output Formatting
The AI is told “Locate the scribbled or check mark pen color beside the ‘Packer’ section and extract it,” which defines the specific task and visual cue to focus on.
Embedded Knowledge Base
The prompt includes an inline reference table: “Color-to-Name Mapping” and this acts as a local rulebook, letting the AI interpret the color-to-name mapping without outside data.

5Recording AI Generated Data to Google Sheets

The final step records all extracted data points retrieved using ChatGPT directly into a Google Sheet for structured reporting and easy review. Each product in the packing list is entered as a separate row, including fields such as order name, order number, quantity, date, pen color, and assigned packer.

By using ChatGPT, the agent ensures accurate and consistent extraction before writing to the sheet. This organized data logging provides real time and structured records of packing activities, making it easy to monitor, review, and analyze order fulfillment at scale.

Google Sheet containing completed order data with columns for ID, order number, product names, quantity, date, pen color, associated packer, and status, with all entries marked as completed.
Results at a glance.
This automation replaces repetitive manual encoding with a fully autonomous, AI-powered system that processes scanned packing lists, extracts structured data, and updates records in real time.

It ensures speed, accuracy, and traceability for over 6,000+ orders every month, all without daily human involvement.

Automated PDF Intake and Conversion

Instant File Detection and Image Preparation

Every time a new packing list is added to a designated Google Drive folder, the system automatically:

- Detects the new file via n8n’s Google Drive trigger
- Uses PDF.co to convert the PDF into high-quality image files
- Prepares each file for AI-based extraction by removing formatting noise

This ensures that scanned documents are clean, consistent, and ready for analysis without any manual conversion.

AI Extraction of Order and Packer Data

Instant File Smart Text Parsing + Visual Ink Detection and Image Preparation

Each image is passed through OpenAI GPT Vision, which extracts:

- Order number, product name, quantity, and other key fields
- Packer identity by detecting ink color used in checkbox markings

The combination of text recognition and color-based logic allows the system to go beyond OCR, providing context-aware, structured results from semi-structured input.

Real-Time Update to Master Google Sheet

Live Fulfillment Logging With Full Traceability

As soon as data is extracted, it’s instantly appended to a shared Google Sheet used for tracking order fulfillment. The log includes:

- Full order details (date, items, quantities)
- The detected packer name based on ink color
- Time-stamped entries for auditing and validation

This provides a centralized, always-up-to-date view of operations, accessible by any team member, anytime.

Fully Hands-Free Processing at Scale

Built for Volume, Designed for Peace of Mind

Capable of handling thousands of PDFs per month, this system:

- Runs without daily human input
- Requires little to no manual correction
- Reduces fulfillment errors caused by data entry mistakes
- Saves hundreds of hours monthly by eliminating repetitive admin work

The automation is reliable, scalable, and built to last, ensuring lean operations with high output accuracy.

Comparison:
AI-Powered Packing List Parser vs. Manual Data Entry Staff

This is a comparison of the estimated monthly costs and daily output capacity between an Automated AI Agent and traditional content or virtual assistants (VAs).

Evaluation Criteria

Automated AI Agent

Manual Content Assistant / Virtual Assistant (VA)

Estimated Monthly Cost
✅ Ranges from $80 to $200 per month, covering automation tools, OCR parsing, and cloud storage infrastructure
❌ Ranges from $600 to $2,500 per month, depending on full-time salary or hourly data entry wages
Daily Output Capacity
✅ Can process between 100 to 500 scanned packing lists per day, triggered by folder updates on Google Drive
❌ Ranges from $600 to $2,500 per month, depending on full-time salary or hourly data entry wages

What's next?

Book a Call

This AI agent shows that when precision parsing meets thoughtful prompt engineering, the result isn’t just automation, it’s transformation.

If you’re overwhelmed by manual data entry, complex parsing, or repetitive workflows, this is your blueprint.

Built once. Scales without limits.

Whether you’re managing a growing volume of scanned forms or looking for a seamless way to capture every detail with accuracy, this extraction engine can be cloned, adapted and launched to fit your exact workflow and data structure.

Ready to automate

your data capture without
compromising precision?

Fill out the form and let’s get started.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.