Project Setup
In this first step, you will set up the project structure, install all dependencies, and verify that the FastAPI server starts with PyMuPDF and the OpenAI SDK installed. By the end of this lesson, you will have a running server ready to accept document uploads.
Architecture Overview
The Document Intelligence App has four main processing stages that documents flow through:
- Upload Handler: Accepts PDF, image, and document files via drag-and-drop or API endpoint.
- Text Extraction: PyMuPDF extracts text, tables, and layout from digital PDFs; tabula-py handles complex table structures.
- Vision Analysis: GPT-4 Vision processes scanned documents, handwritten notes, and complex visual layouts.
- Structured Output: Pydantic models validate and structure extracted data into clean JSON for downstream systems.
         Document Upload
                |
                v
       [File Type Detection]
           /           \
   Digital PDF     Scanned/Image
       |                 |
       v                 v
   [PyMuPDF]      [GPT-4 Vision]
       |                 |
       v                 v
 Text + Tables  Visual Understanding
           \           /
            v         v
       [Pydantic Validation]
                |
                v
      [Structured JSON Output]
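The branch between PyMuPDF and GPT-4 Vision can be driven by a simple heuristic: if a PDF's text layer yields almost no characters per page, treat it as scanned. A minimal sketch of that routing decision (the function name and threshold are illustrative, not part of the project code; in practice PyMuPDF's `page.get_text()` would supply the character count):

```python
def choose_route(extracted_chars: int, page_count: int,
                 min_chars_per_page: int = 25) -> str:
    """Pick a processing branch for a PDF.

    A digital PDF with a real text layer yields plenty of extractable
    characters; a scanned PDF yields few or none, so it goes to vision.
    """
    if page_count == 0:
        return "vision"
    if extracted_chars / page_count >= min_chars_per_page:
        return "pymupdf"
    return "vision"

print(choose_route(12000, 10))  # -> pymupdf (1200 chars/page: real text layer)
print(choose_route(0, 10))      # -> vision  (no text layer at all)
```

The threshold is a tuning knob: some digital PDFs contain a few stray characters of metadata text, so "any text at all" is too loose a test.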
Step 1: Create the Project Structure
Create the following directory structure:
doc-intelligence/
+-- .env
+-- .env.example
+-- requirements.txt
+-- app/
|   +-- __init__.py
|   +-- main.py                  # FastAPI entry point
|   +-- config.py                # Environment config
|   +-- extraction/
|   |   +-- pdf_extractor.py     # PyMuPDF text extraction
|   |   +-- table_extractor.py   # tabula table extraction
|   |   +-- layout_analyzer.py   # Page layout analysis
|   +-- vision/
|   |   +-- vision_analyzer.py   # GPT-4 Vision integration
|   +-- structuring/
|   |   +-- schemas.py           # Pydantic extraction schemas
|   |   +-- extractor.py         # Field extraction logic
|   +-- pipeline/
|   |   +-- processor.py         # Document processing pipeline
|   |   +-- queue.py             # Async job queue
|   +-- models/
|       +-- document.py          # Document data models
+-- frontend/
|   +-- index.html               # Upload and review UI
+-- uploads/
+-- results/
+-- tests/
    +-- test_extraction.py
Run these commands to create the structure:
# Create project directory
mkdir -p doc-intelligence/{app/{extraction,vision,structuring,pipeline,models},frontend,uploads,results,tests}
# Create __init__.py files
touch doc-intelligence/app/__init__.py
touch doc-intelligence/app/extraction/__init__.py
touch doc-intelligence/app/vision/__init__.py
touch doc-intelligence/app/structuring/__init__.py
touch doc-intelligence/app/pipeline/__init__.py
touch doc-intelligence/app/models/__init__.py
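As a quick sanity check, you can count the package markers you just created (six packages: app plus its five sub-packages):

```shell
# Should print 6 if the scaffold is complete
find doc-intelligence -name "__init__.py" | wc -l
```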
Step 2: Define Dependencies
Create requirements.txt with all the packages we need. Note two system-level prerequisites: tabula-py calls into tabula-java under the hood, so it needs a Java runtime on your PATH, and python-magic depends on the system libmagic library:
# requirements.txt
fastapi==0.115.6
uvicorn[standard]==0.34.0
python-dotenv==1.0.1
pydantic-settings==2.7.1
python-multipart==0.0.20
# PDF Processing
PyMuPDF==1.25.1
tabula-py==2.9.3
Pillow==11.1.0
# AI / Vision
openai==1.58.1
# Async processing
aiofiles==24.1.0
celery[redis]==5.4.0
# Utilities
httpx==0.28.1
python-magic==0.4.27
Step 3: Environment Configuration
Create .env.example and then copy it to .env:
# .env.example
OPENAI_API_KEY=sk-your-key-here
OPENAI_VISION_MODEL=gpt-4o
OPENAI_CHAT_MODEL=gpt-4o-mini
UPLOAD_DIR=uploads
RESULTS_DIR=results
MAX_FILE_SIZE_MB=50
ALLOWED_EXTENSIONS=pdf,png,jpg,jpeg,tiff,bmp
LOG_LEVEL=INFO
Now create the config module that loads these values with validation:
# app/config.py
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings loaded from environment variables."""

    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    # OpenAI
    openai_api_key: str
    openai_vision_model: str = "gpt-4o"
    openai_chat_model: str = "gpt-4o-mini"

    # File handling
    upload_dir: str = "uploads"
    results_dir: str = "results"
    max_file_size_mb: int = 50
    allowed_extensions: str = "pdf,png,jpg,jpeg,tiff,bmp"

    # Logging
    log_level: str = "INFO"

    @property
    def allowed_ext_list(self) -> list[str]:
        return [ext.strip() for ext in self.allowed_extensions.split(",")]

    @property
    def max_file_size_bytes(self) -> int:
        return self.max_file_size_mb * 1024 * 1024


@lru_cache()
def get_settings() -> Settings:
    return Settings()
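Why the `@lru_cache()` wrapper? It turns `get_settings()` into a lazily created singleton: the .env file is read once, and every route handler that calls `get_settings()` shares the same `Settings` object. A tiny standalone demonstration of that behavior (`FakeSettings` is a stand-in so this runs without a .env file, not project code):

```python
from functools import lru_cache

class FakeSettings:
    """Stand-in for Settings, so the caching behavior is visible."""
    def __init__(self) -> None:
        print("loading settings...")

@lru_cache()
def get_settings() -> FakeSettings:
    return FakeSettings()

a = get_settings()  # prints "loading settings..." (first and only construction)
b = get_settings()  # cache hit: nothing printed
assert a is b       # every caller shares one instance
```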
Step 4: Create the FastAPI Entry Point
Create app/main.py with file upload support:
# app/main.py
import logging
from pathlib import Path

from fastapi import FastAPI, UploadFile, File, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse

from app.config import get_settings

settings = get_settings()

# Create directories
Path(settings.upload_dir).mkdir(parents=True, exist_ok=True)
Path(settings.results_dir).mkdir(parents=True, exist_ok=True)

logging.basicConfig(
    level=getattr(logging, settings.log_level),
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="Document Intelligence API",
    description="AI-powered document parsing and data extraction",
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

app.mount("/static", StaticFiles(directory="frontend"), name="static")


@app.get("/")
async def root():
    return FileResponse("frontend/index.html")


@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "vision_model": settings.openai_vision_model,
        "max_file_size_mb": settings.max_file_size_mb,
        "allowed_extensions": settings.allowed_ext_list,
    }


@app.post("/api/upload")
async def upload_document(file: UploadFile = File(...)):
    """Upload a document for processing."""
    ext = file.filename.rsplit(".", 1)[-1].lower() if file.filename else ""
    if ext not in settings.allowed_ext_list:
        raise HTTPException(
            status_code=400,
            detail=f"File type .{ext} not allowed. Allowed: {settings.allowed_ext_list}",
        )

    content = await file.read()
    if len(content) > settings.max_file_size_bytes:
        raise HTTPException(
            status_code=400,
            detail=f"File too large. Max: {settings.max_file_size_mb}MB",
        )

    # Keep only the base name so a crafted filename (e.g. "../../etc/passwd")
    # cannot escape the upload directory
    safe_name = Path(file.filename).name
    file_path = Path(settings.upload_dir) / safe_name
    with open(file_path, "wb") as f:
        f.write(content)

    logger.info(f"Uploaded: {safe_name} ({len(content)} bytes)")
    return {
        "filename": safe_name,
        "size_bytes": len(content),
        "status": "uploaded",
        "message": "File uploaded. Use /api/process to extract data.",
    }
Step 5: Verify the Setup
# Create virtual environment and install
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Copy env and add your API key
cp .env.example .env
# Start the server
uvicorn app.main:app --reload --port 8000
# Test health endpoint
curl http://localhost:8000/health
# Test file upload
curl -X POST http://localhost:8000/api/upload -F "file=@sample.pdf"
Key Takeaways
- The project separates concerns into extraction, vision, structuring, and pipeline packages.
- PyMuPDF handles digital PDFs, GPT-4 Vision handles scanned/visual documents, Pydantic validates output.
- File upload validation prevents oversized files and unsupported formats from entering the pipeline.
- The configuration module centralizes all settings and validates them at startup.
What's Next
In the next lesson, you will build the PDF text and table extraction module — the code that reads PDF files, extracts text with layout awareness, and pulls structured data from tables using PyMuPDF and tabula.
Lilly Tech Systems