SERP N-gram Extractor

Use cases

Content gap analysis

Page title optimisation

Understanding SERP content patterns

Competitive content research

Fetches SERP results via the ValueSERP API and extracts page content using Trafilatura, with no timeout applied.
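
The fetch-and-extract step might look roughly like the sketch below. It is an illustration under assumptions, not the script's actual code: the endpoint URL, the parameter names (api_key, q, location, device) and the organic_results / title / link response fields reflect the ValueSERP search API as commonly documented and should be checked against the current docs. The helper names build_serp_url, fetch_serp and extract_page_text are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed ValueSERP search endpoint; verify against the official documentation.
VALUESERP_ENDPOINT = "https://api.valueserp.com/search"


def build_serp_url(api_key, keyword, location, device="desktop"):
    """Build the request URL from the four inputs the tool takes."""
    params = {
        "api_key": api_key,
        "q": keyword,
        "location": location,
        "device": device,  # "desktop" or "mobile"
    }
    return VALUESERP_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_serp(api_key, keyword, location, device="desktop"):
    """Return (title, url) pairs for the organic results (assumed response shape)."""
    url = build_serp_url(api_key, keyword, location, device)
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return [(r.get("title"), r.get("link")) for r in data.get("organic_results", [])]


def extract_page_text(url):
    """Download a page and return its main text, or None if the download fails."""
    import trafilatura  # third-party; imported lazily so the rest runs without it

    downloaded = trafilatura.fetch_url(url)
    return trafilatura.extract(downloaded) if downloaded else None
```

Keeping URL construction separate from the network call makes the request easy to inspect before spending API credits.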

Generates bigrams via custom find_ngrams() using zip iteration.
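
A minimal sketch of the zip-based approach, assuming find_ngrams() takes a token list and an n-gram size (the real signature may differ):

```python
def find_ngrams(tokens, n=2):
    """Return n-grams as tuples by zipping n staggered slices of the token list."""
    # For n=2 this is zip(tokens, tokens[1:]), which pairs each token with its successor.
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = "best running shoes for beginners".split()
bigrams = find_ngrams(tokens)
print(bigrams)
# [('best', 'running'), ('running', 'shoes'), ('shoes', 'for'), ('for', 'beginners')]
```

Because zip stops at the shortest input, a list with fewer than n tokens simply yields no n-grams.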

Filters out NLTK English stopwords and uses collections.Counter for frequency analysis.
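
The filtering and counting step could be sketched as follows. To keep the example runnable without downloads, it uses a small stand-in stopword set; the actual tool uses the NLTK English list. Whether the script filters before or after building bigrams is not stated, so the order here is an assumption, and count_terms is a hypothetical helper name.

```python
from collections import Counter

# Stand-in stopword set (an assumption for this sketch). With NLTK installed,
# run nltk.download("stopwords") once, then use
# set(nltk.corpus.stopwords.words("english")) for the full English list.
STOPWORDS = {"the", "a", "an", "for", "and", "of", "to", "in", "is"}


def count_terms(tokens, stopwords=STOPWORDS):
    """Drop stopwords, then count how often each remaining token appears."""
    kept = [token for token in tokens if token not in stopwords]
    return Counter(kept)


counts = count_terms("the best shoes for the best runners".split())
print(counts.most_common(1))
# [('best', 2)]
```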

Normalises text with special character removal and lowercase conversion.
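
One way to do this normalisation is shown below, as a sketch only. The exact character class the script strips is not specified. This version keeps ASCII letters, digits and whitespace, so accented characters are removed too. The function name normalise is hypothetical.

```python
import re


def normalise(text):
    """Lowercase the text, replace special characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # anything but ASCII letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()


print(normalise("Best Running-Shoes (2024)!"))
# best running shoes 2024
```

Replacing special characters with a space rather than deleting them avoids fusing hyphenated words such as "Running-Shoes" into a single token.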

Platform

Python script (requires Python 3.x)

Input

ValueSERP API key

Target search keyword

Geographic location

Device type (desktop/mobile)

Output

Three CSVs: content bigrams with frequency counts, title keywords (frequency > 1), SERP titles with URLs.
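
As an illustration of the frequency > 1 filter on the title keywords file, the sketch below writes a Counter to CSV with a minimum-count threshold. The column names and the write_counts helper are assumptions, not the script's real ones. It writes to an in-memory buffer here, but any open file handle works the same way.

```python
import csv
import io
from collections import Counter


def write_counts(handle, header, counts, min_count=1):
    """Write (term, frequency) rows, most frequent first, skipping rare terms."""
    writer = csv.writer(handle)
    writer.writerow(header)
    for term, freq in counts.most_common():
        if freq >= min_count:
            writer.writerow([term, freq])


title_counts = Counter(["best", "shoes", "best", "guide"])
buffer = io.StringIO()
# min_count=2 keeps only keywords that appear more than once.
write_counts(buffer, ["keyword", "frequency"], title_counts, min_count=2)
print(buffer.getvalue())
```

For the bigram file, the same helper can be used after joining each bigram tuple into a single string, for example " ".join(bigram).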

Features

I offer this as a managed service. You get the insights without touching the tool.

Discover which descriptive words competitors use in titles that you are missing.

Extract content blocks and XPath patterns using Claude Haiku for template analysis.

Find cannibalising pages by clustering URLs that share SERP overlap.

Monthly retainers or one-off projects. No lengthy reports that sit in a drawer.