Template Fingerprinting
Use cases
- Auditing large sites by template instead of individual pages
- Finding pages using wrong or outdated templates
- Understanding site structure and page types
- Prioritising template-level technical SEO fixes
Classifies pages into template groups using TF-IDF vectorisation and K-Means clustering on HTML structure.
Extracts four feature dimensions: tag counts, CSS classes, ID attributes, and meta tags.
Defaults to 5 clusters and uses a fixed random state (42), so repeated runs on the same input give the same assignments.
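To make the four feature dimensions concrete, here is a minimal sketch of how a page's HTML could be reduced to structural tokens using only Python's standard library. The `StructureParser` class, the `structural_tokens` helper, and the `tag:` / `class:` / `id:` / `meta:` token prefixes are illustrative assumptions, not the app's actual code.

```python
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects structural tokens from HTML and ignores visible text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Dimension 1: every opening tag counts once.
        self.tokens.append(f"tag:{tag}")
        # Dimension 2: each CSS class name on the element.
        for cls in (attrs.get("class") or "").split():
            self.tokens.append(f"class:{cls}")
        # Dimension 3: the element's id attribute, if present.
        if attrs.get("id"):
            self.tokens.append(f"id:{attrs['id']}")
        # Dimension 4: meta tags, keyed by their name or property.
        if tag == "meta":
            name = attrs.get("name") or attrs.get("property")
            if name:
                self.tokens.append(f"meta:{name}")


def structural_tokens(html):
    """Return the list of structural tokens for one HTML document."""
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


sample = (
    '<html><head><meta name="description" content="x"></head>'
    '<body><div id="main" class="product card"><h1>Title</h1></div></body></html>'
)
tokens = structural_tokens(sample)
```

Two pages built from the same template should produce near-identical token lists even when their visible text differs, which is what makes these tokens useful as clustering input.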
Streamlit App
Platform: Browser-based (no installation required)
Input: Crawl CSV with URLs
Output: CSV with template cluster assignments
Features
- TF-IDF vectorisation of HTML structural features
- K-Means clustering (configurable cluster count, default 5)
- Four feature dimensions: tag counts, CSS classes, IDs, meta tags
- Reproducible results (random state 42)
- Bulk URL fetching with progress indicator
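The vectorisation and clustering features listed above can be sketched with scikit-learn, assuming each page has already been reduced to a whitespace-separated string of structural tokens. The `cluster_pages` function, the token format, and the `token_pattern` and `n_init` settings are assumptions made for illustration; only the default of 5 clusters and the random state of 42 come from the description above.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_pages(token_docs, n_clusters=5, random_state=42):
    """Assign each page (one token string per page) to a template cluster."""
    # Treat every whitespace-separated token as one feature, case preserved,
    # so a token like "class:product" is not split on the colon.
    vectorizer = TfidfVectorizer(token_pattern=r"\S+", lowercase=False)
    matrix = vectorizer.fit_transform(token_docs)
    # K-Means cannot produce more clusters than there are pages.
    k = min(n_clusters, len(token_docs))
    # A fixed random_state makes the cluster assignments reproducible.
    model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
    return model.fit_predict(matrix).tolist()


# Hypothetical token strings: two product-like pages, two article-like pages.
pages = [
    "tag:div tag:div tag:h1 class:product class:price id:buy",
    "tag:div tag:div tag:div tag:h1 class:product class:price id:buy",
    "tag:article tag:p tag:p class:post class:author meta:author",
    "tag:article tag:p tag:p tag:p class:post class:author meta:author",
]
labels = cluster_pages(pages, n_clusters=2)
```

The cluster numbers themselves are arbitrary: what matters is which pages share a number, not whether a given template ends up as 0 or 1.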
How to use
1. Upload CSV with URL list (requires "Address" column)
2. Set number of template clusters to detect
3. Run analysis (fetches HTML, extracts features, clusters)
4. Review cluster assignments (Type 0, Type 1, etc.)
5. Download CSV with original data plus Cluster and Page Type columns
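The input and output shapes in these steps can be illustrated with a short standard-library sketch. The `add_cluster_columns` helper is hypothetical, and deriving the Page Type value as "Type N" from the cluster number is an assumption based on the labels mentioned in the review step.

```python
import csv
import io


def add_cluster_columns(csv_text, labels):
    """Append Cluster and Page Type columns to a crawl CSV, one label per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if "Address" not in fieldnames:
        raise ValueError('input CSV needs an "Address" column')
    rows = list(reader)
    if len(rows) != len(labels):
        raise ValueError("need exactly one cluster label per row")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*fieldnames, "Cluster", "Page Type"], lineterminator="\n"
    )
    writer.writeheader()
    for row, label in zip(rows, labels):
        # Original columns are kept as-is; the two new columns are appended.
        writer.writerow({**row, "Cluster": label, "Page Type": f"Type {label}"})
    return out.getvalue()


crawl = "Address,Status\nhttps://example.com/a,200\nhttps://example.com/b,200\n"
result = add_cluster_columns(crawl, [0, 1])
```

Checking for the "Address" column up front mirrors the requirement in step 1 and fails early, before any pages would be fetched.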
Want me to run this for you?
I offer this as a managed service. You get the insights without touching the tool.
Related Tools
Archive.org Broken Link Mapper
Technical SEO: Find lost URLs via Archive.org and auto-map redirects using fuzzy matching.
LLM Sitemap Creator
Technical SEO: Use GPT to generate hierarchical sitemap structures from keywords.
Sitemap URL Extractor
Technical SEO: Extract all URLs from XML sitemap indexes and child sitemaps.