
Keyword Deduplication Tool

Use cases

  • Cleaning keyword lists before clustering
  • Removing near-duplicate keywords from exports
  • Consolidating keyword research from multiple sources

Uses rapidfuzz's token_sort_ratio scorer with process.extractOne() to identify near-duplicate keywords at a similarity threshold of 99 on rapidfuzz's 0 to 100 scale.

Automatically detects file encoding via chardet (first 100,000 bytes).
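Encoding detection on a byte sample might look like the following. This is a sketch under assumptions: the helper name detect_encoding and the UTF-8 fallback are illustrative and not taken from the app.

```python
import chardet

SAMPLE_SIZE = 100_000  # bytes, the sample size stated above


def detect_encoding(raw: bytes, sample_size: int = SAMPLE_SIZE) -> str:
    """Guess the text encoding from the first sample_size bytes of a file."""
    result = chardet.detect(raw[:sample_size])
    # chardet may report no encoding when it cannot decide;
    # falling back to UTF-8 is an assumption made for this sketch.
    return result["encoding"] or "utf-8"


raw_bytes = b"keyword,volume\nrunning shoes,1200\n"
encoding = detect_encoding(raw_bytes)
text = raw_bytes.decode(encoding)
```

The detected name can then be passed on to the CSV reader, for example as the encoding argument of pandas.read_csv.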

Keeps the variant with the highest search volume and exports both the processed and the dropped keywords for review.

Uses stqdm for progress tracking.

Streamlit App

Platform

Browser-based (no installation required)

Input

CSV or Excel file with keywords and search volumes

Encoding auto-detected via chardet

Output

Excel with deduplicated and dropped keywords


Features

  • Rapidfuzz token_sort_ratio with 99-point threshold
  • Chardet encoding detection (100,000-byte sample)
  • Keeps highest volume variant automatically
  • stqdm progress bar integration
  • Two-sheet Excel output (xlsxwriter)
  • Supports CSV and Excel (.xlsx) input

How to use

  1. Upload CSV or Excel file with keyword data
  2. Select keyword column and volume column from dropdowns
  3. Click Dedupe to run rapidfuzz matching
  4. Review progress via stqdm progress bar
  5. Download Excel with two sheets: Processed Keywords and Dropped Keywords
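The two-sheet download from the last step could be produced roughly as follows. This is a minimal sketch assuming pandas with the xlsxwriter engine; the helper name build_workbook and the sample rows are illustrative only.

```python
import io

import pandas as pd


def build_workbook(processed: pd.DataFrame, dropped: pd.DataFrame) -> bytes:
    """Return an .xlsx file as bytes, with one sheet per DataFrame."""
    buffer = io.BytesIO()
    # The context manager saves the workbook into the buffer on exit.
    with pd.ExcelWriter(buffer, engine="xlsxwriter") as writer:
        processed.to_excel(writer, sheet_name="Processed Keywords", index=False)
        dropped.to_excel(writer, sheet_name="Dropped Keywords", index=False)
    return buffer.getvalue()


# Made-up rows standing in for the deduplication results.
processed_df = pd.DataFrame({"keyword": ["men running shoes"], "volume": [1200]})
dropped_df = pd.DataFrame({"keyword": ["running shoes men"], "volume": [500]})
workbook_bytes = build_workbook(processed_df, dropped_df)
```

Writing to an in-memory buffer rather than a file on disk suits a browser-based app, since the resulting bytes can be handed straight to a download control such as Streamlit's st.download_button.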
