How to Use a PDF API in Python: Compress, Convert, and Extract Data in Minutes
Developer | May 28, 2026 | 10 min read

Step-by-step guide to using the AllPDFMagic REST API from Python. Compress PDFs, extract text, convert Word/Excel files, and pull structured invoice data — all in under 50 lines of code.

AllPDFMagic Team

Working with PDFs in Python has traditionally meant combining libraries such as PyPDF2, reportlab, or pdfminer, each of which covers only part of the job and can struggle with malformed or scanned files. A PDF API moves that work to a server: you send an HTTP request with a file and get a processed file back. There are no PDF dependencies to manage and no C libraries to compile; what remains on your side is ordinary HTTP error handling, covered in the Error handling section.

This guide walks through using the AllPDFMagic REST API from Python to compress PDFs, extract text, convert Office files, and pull structured data from invoices — all in under 50 lines of code.

Prerequisites

You need Python 3.8+ and the requests library:

pip install requests

Get a free API key at allpdfmagic.com/dashboard/api — 500 calls/month, no credit card required.

Basic setup

import requests

API_KEY = "apm_live_your_key_here"
BASE_URL = "https://www.allpdfmagic.com/api/v1"

session = requests.Session()
session.headers["Authorization"] = f"Bearer {API_KEY}"

Using a Session means the auth header is included on every request without repeating yourself.
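Hardcoding the key is fine for a quick test, but for code that ends up in version control it is usually safer to read it from the environment. Here is a minimal sketch, assuming an environment variable named ALLPDFMAGIC_API_KEY. That name is chosen here for illustration and is not something the API requires.

```python
import os


def load_api_key(env_var: str = "ALLPDFMAGIC_API_KEY") -> str:
    """Read the API key from the environment.

    The variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return api_key


# Usage with the session from the setup above:
# session.headers["Authorization"] = f"Bearer {load_api_key()}"
```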

Compress a PDF

import os

def compress_pdf(input_path: str, output_path: str, level: str = "medium") -> int:
    """Compress a PDF. Level: low | medium | high. Returns the number of bytes saved."""
    with open(input_path, "rb") as f:
        response = session.post(
            f"{BASE_URL}/pdf/compress",
            files={"file": (input_path, f, "application/pdf")},
            data={"level": level},
        )
    response.raise_for_status()
    with open(output_path, "wb") as out:
        out.write(response.content)
    # Compare file sizes on disk to report the saving
    original = os.path.getsize(input_path)
    compressed = os.path.getsize(output_path)
    return original - compressed

saved_bytes = compress_pdf("report.pdf", "report_compressed.pdf", level="high")
print(f"Saved {saved_bytes / 1024:.1f} KB")
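A raw byte count is hard to judge without the original size next to it. The sketch below turns the two sizes into a short summary with a percentage. The helper name compression_summary is hypothetical and not part of the API. It also covers the case where the output is no smaller than the input, which can happen with PDFs that are already well optimized.

```python
def compression_summary(original_bytes: int, compressed_bytes: int) -> str:
    """Describe the size change as KB saved and a percentage of the original."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    saved = original_bytes - compressed_bytes
    if saved <= 0:
        return "No saving: the output is not smaller than the input"
    percent = saved / original_bytes * 100
    return f"Saved {saved / 1024:.1f} KB ({percent:.1f}% of the original)"


print(compression_summary(2_048_000, 1_024_000))
# Saved 1000.0 KB (50.0% of the original)
```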

Extract all text from a PDF

For PDFs with a text layer (not scanned), the /convert/pdf-to-txt endpoint is instant and free:

def extract_text(pdf_path: str) -> str:
    with open(pdf_path, "rb") as f:
        response = session.post(
            f"{BASE_URL}/convert/pdf-to-txt",
            files={"file": (pdf_path, f, "application/pdf")},
        )
    response.raise_for_status()
    return response.text

text = extract_text("contract.pdf")
print(text[:500])  # First 500 chars

For scanned PDFs without a text layer, use OCR instead:

def ocr_pdf(pdf_path: str) -> str:
    with open(pdf_path, "rb") as f:
        response = session.post(
            f"{BASE_URL}/ai/ocr",
            files={"file": (pdf_path, f, "application/pdf")},
        )
    response.raise_for_status()
    data = response.json()
    return data.get("text", "")
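If you do not know in advance whether a file is scanned, one option is to try the text endpoint first and fall back to OCR when it returns almost nothing. The sketch below assumes the text endpoint returns an empty or near-empty string for scanned PDFs rather than an error, which this guide does not confirm. The 20-character threshold is an arbitrary choice. The two functions are passed in as arguments so the helper stays independent of the HTTP code.

```python
from typing import Callable


def text_with_ocr_fallback(
    pdf_path: str,
    extract: Callable[[str], str],
    ocr: Callable[[str], str],
    min_chars: int = 20,
) -> str:
    """Try the text-layer function first; use OCR if it returns almost nothing."""
    text = extract(pdf_path)
    if len(text.strip()) >= min_chars:
        return text
    # Too little text: treat the file as scanned and run OCR instead
    return ocr(pdf_path)


# Usage with the functions defined above:
# text = text_with_ocr_fallback("scan.pdf", extract_text, ocr_pdf)
```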

Convert Word/Excel/PPT to PDF

import os
from pathlib import Path

def to_pdf(input_path: str, output_path: str) -> None:
    """Convert Word, Excel, or PPT file to PDF."""
    ext = Path(input_path).suffix.lower()
    endpoint_map = {
        ".docx": "word-to-pdf", ".doc": "word-to-pdf",
        ".xlsx": "excel-to-pdf", ".xls": "excel-to-pdf",
        ".pptx": "ppt-to-pdf",  ".ppt": "ppt-to-pdf",
    }
    endpoint = endpoint_map.get(ext)
    if not endpoint:
        raise ValueError(f"Unsupported extension: {ext}")

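    mime_map = {
        ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        # Legacy binary Office formats
        ".doc": "application/msword",
        ".xls": "application/vnd.ms-excel",
        ".ppt": "application/vnd.ms-powerpoint",
    }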
    mime = mime_map.get(ext, "application/octet-stream")

    with open(input_path, "rb") as f:
        response = session.post(
            f"{BASE_URL}/convert/{endpoint}",
            files={"file": (input_path, f, mime)},
        )
    response.raise_for_status()
    with open(output_path, "wb") as out:
        out.write(response.content)

to_pdf("quarterly_report.docx", "quarterly_report.pdf")
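To convert a whole folder, you can loop over its files and call the converter for each supported extension. This is a minimal sketch. The name convert_folder and the output naming scheme are assumptions of this example. It appends .pdf to the full file name (report.docx becomes report.docx.pdf) so that report.docx and report.xlsx do not overwrite each other's output.

```python
from pathlib import Path
from typing import Callable, List

OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def convert_folder(folder: str, convert: Callable[[str, str], None]) -> List[Path]:
    """Convert each Office file in `folder`; returns the list of output paths."""
    written = []
    for src in sorted(Path(folder).iterdir()):
        if not src.is_file() or src.suffix.lower() not in OFFICE_EXTENSIONS:
            continue
        # Keep the original extension in the name to avoid collisions
        dest = src.with_name(src.name + ".pdf")
        convert(str(src), str(dest))
        written.append(dest)
    return written


# Usage with the function defined above:
# convert_folder("./reports", to_pdf)
```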

Extract structured invoice data with AI

This is where PDF APIs earn their keep. Instead of writing regex patterns for every invoice format, you send the PDF to the AI endpoint, which is designed to cope with varied layouts:

import json

def extract_invoice(pdf_path: str) -> dict:
    """Extract structured data from an invoice PDF using AI."""
    with open(pdf_path, "rb") as f:
        response = session.post(
            f"{BASE_URL}/ai/extract-invoice",
            files={"file": (pdf_path, f, "application/pdf")},
        )
    response.raise_for_status()
    return response.json()

invoice = extract_invoice("invoice_vendor_abc.pdf")
print(json.dumps(invoice, indent=2))
# Output:
# {
#   "vendor": "ABC Supplies Pvt Ltd",
#   "invoice_number": "INV-2024-0891",
#   "date": "2024-11-15",
#   "total": 48150.00,
#   "currency": "INR",
#   "line_items": [
#     {"description": "Office chairs x10", "quantity": 10, "unit_price": 4500, "amount": 45000},
#     {"description": "GST 7%", "quantity": 1, "unit_price": 3150, "amount": 3150}
#   ]
# }
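Because the values come from an AI model, it is worth running a cheap arithmetic check before trusting them. The sketch below assumes the response has the shape of the sample output above: numeric total, and line_items with quantity, unit_price, and amount, with tax listed as its own line item. The helper name check_invoice_totals is hypothetical. Real responses may differ, for example by reporting tax separately.

```python
from typing import List


def check_invoice_totals(invoice: dict, tolerance: float = 0.01) -> List[str]:
    """Return a list of problems found; an empty list means the numbers add up."""
    problems = []
    items = invoice.get("line_items") or []
    for i, item in enumerate(items):
        expected = item.get("quantity", 0) * item.get("unit_price", 0)
        if abs(expected - item.get("amount", 0)) > tolerance:
            problems.append(f"line {i}: quantity x unit_price does not match amount")
    items_sum = sum(item.get("amount", 0) for item in items)
    if abs(items_sum - invoice.get("total", 0)) > tolerance:
        problems.append("line item amounts do not sum to total")
    return problems


# Usage with the function defined above:
# for problem in check_invoice_totals(extract_invoice("invoice_vendor_abc.pdf")):
#     print(problem)
```

Invoices that fail the check are good candidates for manual review.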

Processing a batch of invoices

import glob
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List  # list[dict] as an annotation needs Python 3.9+

def process_invoice_batch(folder: str) -> List[dict]:
    """Process all invoice PDFs in a folder concurrently."""
    paths = glob.glob(f"{folder}/*.pdf")
    results = []
    errors = []

    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(extract_invoice, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                data = future.result()
                data["_source_file"] = path
                results.append(data)
            except Exception as e:
                errors.append({"file": path, "error": str(e)})

    if errors:
        print(f"Failed: {len(errors)} files")
        for err in errors:
            print(f"  {err['file']}: {err['error']}")
    return results

invoices = process_invoice_batch("./vendor_invoices")
total = sum(inv.get("total", 0) for inv in invoices)
print(f"Processed {len(invoices)} invoices. Total: ₹{total:,.2f}")
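Once a batch has been processed, a CSV file is often the easiest way to hand the results to a spreadsheet or accounting tool. This is a minimal sketch using the standard csv module. The column list is taken from the sample output earlier in this guide and is an assumption about what the API returns. Line items are left out because they do not fit a single row.

```python
import csv
from typing import Iterable

FIELDS = ["_source_file", "vendor", "invoice_number", "date", "total", "currency"]


def write_invoice_csv(invoices: Iterable[dict], csv_path: str) -> int:
    """Write one row per invoice with header-level fields; returns the row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for inv in invoices:
            # Missing fields become empty cells instead of raising an error
            writer.writerow({key: inv.get(key, "") for key in FIELDS})
            count += 1
    return count


# Usage with the batch result from above:
# write_invoice_csv(invoices, "invoices.csv")
```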

Check your quota

It is worth checking your remaining quota before starting a large batch:

def check_quota() -> dict:
    response = session.get(f"{BASE_URL}/usage/check")
    response.raise_for_status()
    return response.json()

quota = check_quota()
print(f"{quota['remaining']} API calls remaining this month")
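One way to act on that number is to process only as many files as the quota allows and defer the rest. The sketch below assumes one API call per file, as described in the FAQ, and uses the remaining field shown above. The helper name split_by_quota is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_quota(paths: Sequence[str], remaining: int) -> Tuple[List[str], List[str]]:
    """Split paths into those that fit in the remaining quota and those that do not."""
    allowed = max(remaining, 0)
    return list(paths[:allowed]), list(paths[allowed:])


# Usage with the functions defined above:
# now, later = split_by_quota(glob.glob("./vendor_invoices/*.pdf"), check_quota()["remaining"])
# print(f"Processing {len(now)} files now, deferring {len(later)}")
```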

Error handling

import requests

def safe_compress(input_path: str, output_path: str) -> bool:
    try:
        with open(input_path, "rb") as f:
            response = session.post(
                f"{BASE_URL}/pdf/compress",
                files={"file": (input_path, f, "application/pdf")},
                data={"level": "medium"},
                timeout=60,
            )
        if response.status_code == 429:
            print("Monthly quota reached. Upgrade at allpdfmagic.com/pricing")
            return False
        if response.status_code == 422:
            error = response.json().get("error", "Validation error")
            print(f"Invalid request: {error}")
            return False
        response.raise_for_status()
        with open(output_path, "wb") as out:
            out.write(response.content)
        return True
    except requests.Timeout:
        print(f"Request timed out for {input_path}")
        return False
    except requests.RequestException as e:
        # Covers HTTP errors from raise_for_status() as well as connection failures
        print(f"Request failed: {e}")
        return False

Key status codes

Status  Meaning
200     Success: the response body is the processed file or JSON
401     Invalid or missing API key
422     Validation error: check the error field in the response body
429     Monthly quota exceeded: upgrade or wait for the reset
500     Backend error: retry after a few seconds
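The table suggests retrying after a few seconds on a 500. This is a minimal sketch of that idea with exponential backoff. The helper name, the attempt count, and the delays are all assumptions of this example, not values recommended by the API. The request is passed in as a function so that the file is reopened on every attempt. The helper stops retrying on any status below 500, since repeating a 401, 422, or 429 will not help.

```python
import time
from typing import Any, Callable


def call_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until the response is not a 5xx or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code < 500 or attempt == max_attempts:
            return response
        # Wait 2s, then 4s, then 8s, ... before the next attempt
        sleep(base_delay * 2 ** (attempt - 1))


# Usage with the session from the setup above:
# def send():
#     with open("report.pdf", "rb") as f:
#         return session.post(f"{BASE_URL}/pdf/compress",
#                             files={"file": ("report.pdf", f, "application/pdf")},
#                             timeout=60)
# response = call_with_retry(send)
```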

What's next

  • API Reference — all 50+ endpoints with parameters
  • SDK Guide — the official Python SDK that wraps all of this
  • Dashboard — get your API key and check usage

Frequently Asked Questions

What do I need to install to use the API from Python?

Just the requests library (pip install requests). The API is a standard HTTP REST API, so no special PDF libraries are needed. Everything runs server-side.

Can the API extract text from scanned PDFs?

Yes. Use /ai/ocr for scanned PDFs without a text layer. For PDFs with embedded text, /convert/pdf-to-txt is instant and free.

Is there a free tier?

The Starter tier gives 500 API calls per month at no cost. Each file operation (compress, convert, extract) counts as one call. No credit card required.

How accurate is the AI invoice extraction?

The AI extraction is highly accurate for standard invoice formats. It handles vendor names, amounts, line items, GST numbers, and dates. Edge cases like handwritten invoices or non-standard layouts may need manual review.

Tags: pdf api python, python pdf processing, pdf api tutorial, extract invoice data python, compress pdf python, pdf to text python, document automation python, allpdfmagic api
