AI Clipboard Guardian

Your AI conversations deserve privacy.

Core Sentinel monitors your clipboard in real-time, detects PII before it reaches any LLM, and gives you one-click remediation — redact, rephrase, or encrypt.

Problem: Teams are pasting secrets, customer data, and regulated PII into AI chats without realizing the exposure risk until it is too late.


Works with ChatGPT · Claude · Gemini · Copilot · Any LLM

Current production support: Windows 10/11. Linux and iOS are planned next.

Clipboard Input

Patient: John Doe

SSN: 123-45-6789

API_KEY: sk_live_xxxxxx

↓

Sentinel Scan Engine

Regex + NER + Classifier running...

Risk score: 92 · Action: BLOCK

↓

Safe Output

Patient: J*** D**

SSN: ***-**-6789

API_KEY: ENC::9f8a...

Who Is This For

Use cases across high-trust industries

👨‍💻

Developers

Stop pasting stack traces with API keys, database URIs, and internal IPs into ChatGPT. Sentinel catches credentials even when they're embedded in code.

🏥

Healthcare / HIPAA

Patient names, DOBs, diagnoses, and medication lists get flagged before they reach any AI, which supports your HIPAA compliance workflow.

💼

Finance / SOX

Account numbers, transaction details, SSNs in financial documents — Sentinel blocks them from reaching uncontrolled AI endpoints.

⚖️

Enterprise / Legal

Contracts with party names, case numbers, privileged communications — keep attorney-client privilege intact when using AI assistants.

Try It Live

Interactive widget demo in your browser

Explore warn/block behavior, remediation actions, drag behavior, file-scan simulation, and guided overlay tour — no installation required.

New here? Start in 3 steps

1. Run the live widget walkthrough in this section.
2. Open the Dashboard Hub to compare Privacy, ML, and Admin views.
3. Review the architecture section below to understand the detection + remediation flow.


The Problem

Every paste is a potential data leak

Whether you're a developer sharing code snippets, an HR professional discussing candidates, or a doctor describing symptoms — your clipboard carries secrets. Core Sentinel catches them before they leave your machine.

🧠

67% of employees paste sensitive data into AI tools

Source: Cyberhaven 2024

💸

$4.45M average cost of a data breach

Source: IBM 2023

🛡

92% of organizations lack AI data loss controls

Source: Gartner 2024

How It Works

8-step technical pipeline (full walkthrough)

This is the exact runtime path from keyboard paste event to safety decision and model learning. It is intentionally deep, reproducible, and auditable.

Clipboard hooking + LLM context guard

Sentinel runs as a PyQt6 tray process and intercepts paste events only when the active window matches a supported LLM target. This avoids unnecessary scanning and limits latency overhead.

if is_llm_window(active_title):
    text = clipboard.text()
    on_paste_detected(text)
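A minimal sketch of the window guard, assuming the active window title is already available as a string. The name `is_llm_window` comes from the snippet above; the `SUPPORTED_TARGETS` list and the substring-matching strategy are illustrative assumptions, not Sentinel's actual implementation.

```python
# Hypothetical list of title fragments; the real target list is not given here.
SUPPORTED_TARGETS = ("chatgpt", "claude", "gemini", "copilot")


def is_llm_window(active_title: str) -> bool:
    """Return True when the window title mentions a supported LLM target."""
    title = active_title.casefold()  # case-insensitive comparison
    return any(target in title for target in SUPPORTED_TARGETS)
```

Matching on title fragments keeps the check cheap enough to run on every paste, at the cost of occasional false matches on unrelated windows whose titles happen to contain one of the names.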

↓

Text normalization + chunking (512 token windows)

Large clipboard content is normalized, then chunked into overlapping windows for robust inference on long inputs while preserving semantic context around entities.

windows = make_windows(text, size=512, overlap=128)
for w in windows:
    score_window(w)
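One way `make_windows` could work is sketched below. It assumes whitespace-separated words as a stand-in for model tokens; a real pipeline would presumably use the classifier's own tokenizer, which is not shown in this document.

```python
def make_windows(text: str, size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into overlapping windows of at most `size` tokens.

    Assumption: whitespace-separated words approximate model tokens.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    tokens = text.split()
    if not tokens:
        return []
    step = size - overlap  # how far each window advances
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return windows
```

With the defaults, each window advances by 384 tokens, so an entity that straddles a window boundary still appears whole in at least one window as long as it is shorter than the 128-token overlap.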

↓

Layer 1: Regex & deterministic patterns

Critical token families are matched first with deterministic regex. This catches exact leak signatures with near-zero recall loss on known formats.

Pattern family	Examples	Runtime action
SSN / National ID	123-45-6789	High-risk candidate
Credit card / PAN	4111 1111 1111 1111	High-risk candidate
Email / Phone	john@corp.com, +1-202-555-0110	Medium-risk candidate
Secrets / tokens	sk_live_..., JWT, API key	Force block candidate
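The pattern families in the table can be sketched with Python's `re` module. The patterns and the `scan_patterns` helper below are simplified illustrations, not Sentinel's actual rules; production rules would likely be stricter (for example, checksum validation on card numbers).

```python
import re

# Illustrative patterns only; each key mirrors one family from the table.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "secret": re.compile(r"\bsk_live_[A-Za-z0-9]+"),
}


def scan_patterns(text: str) -> list[tuple[str, str]]:
    """Return (family, matched_text) pairs for every pattern hit."""
    hits = []
    for family, pattern in PATTERNS.items():
        hits.extend((family, m.group()) for m in pattern.finditer(text))
    return hits
```

Each hit carries its family name so a later stage can map it to the runtime action listed in the table.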

↓

Layer 2: spaCy NER semantic entity extraction

NER provides contextual entities beyond strict formatting, including PERSON / ORG / GPE / DATE / MONEY. This catches natural-language leakage that regex misses.

PERSON ORG GPE DATE MONEY
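A small sketch of filtering NER output down to the labels listed above. The helper `sensitive_entities` is hypothetical; it accepts any objects exposing `.text` and `.label_`, which is the shape of the spans in spaCy's `doc.ents`. A named tuple stands in for those spans so the example runs without a model download.

```python
from collections import namedtuple

# Entity labels named in the text above.
SENSITIVE_LABELS = {"PERSON", "ORG", "GPE", "DATE", "MONEY"}


def sensitive_entities(ents) -> list[tuple[str, str]]:
    """Keep (text, label) pairs whose label is in SENSITIVE_LABELS."""
    return [(e.text, e.label_) for e in ents if e.label_ in SENSITIVE_LABELS]


# Stand-in for spaCy spans (assumption: only .text and .label_ are needed).
FakeSpan = namedtuple("FakeSpan", ["text", "label_"])
sample = [FakeSpan("John Doe", "PERSON"), FakeSpan("third", "ORDINAL")]
print(sensitive_entities(sample))  # [('John Doe', 'PERSON')]
```

With spaCy installed and a pipeline loaded as `nlp`, the same filter could be applied as `sensitive_entities(nlp(text).ents)`.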

↓

Layer 3: Fine-tuned TinyBERT contextual risk model

Windows are passed through a TinyBERT sequence classifier to estimate contextual breach probability (e.g., medical narratives, legal clauses, financial records).

Model card

Base: TinyBERT_General_4L_312D
Params: ~14M
Context: 512 tokens
Inference: ~22 ms/window

Validation metrics

Precision: 0.945
Recall: 0.959
AUPRC: 0.751
FPR: 52.6%

↓

Risk aggregation + decision thresholds

Signals from regex, NER, and model probabilities are merged into a final 0-100 score and mapped to policy outcomes:

0-40: silent allow
40-70: warn + review
70-100: block + remediation panel
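The threshold mapping can be sketched as follows. Two assumptions are made here because the text does not settle them: the merge takes the strongest of the three signals (the real formula is not specified), and a score sitting exactly on a boundary (40 or 70) goes to the stricter outcome.

```python
def aggregate_risk(regex_score: float, ner_score: float, model_prob: float) -> int:
    """Merge three signals (each in 0..1) into a 0-100 score.

    Assumption: use the strongest signal so a single critical regex hit
    is not diluted by quiet NER or model outputs.
    """
    strongest = max(regex_score, ner_score, model_prob)
    return round(100 * min(max(strongest, 0.0), 1.0))


def decide(score: int) -> str:
    """Map a 0-100 score to a policy outcome using the 40/70 thresholds."""
    if score >= 70:
        return "block"
    if score >= 40:
        return "warn"
    return "allow"
```

Under these assumptions, `decide(aggregate_risk(0.2, 0.1, 0.92))` returns "block", matching the risk score of 92 in the example near the top of the page.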

↓

Remediation workflow (one click)

Users can safely continue work with guided transformations: Redact (mask tokens), Rephrase (PII-safe rewrite), Encrypt (AES-256 reversible protection), or Override with audit trace.
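The Redact action can be illustrated with two small helpers that reproduce the masked output shown in the example near the top of the page. The function names and masking rules are assumptions for illustration; the Rephrase and Encrypt paths are not sketched here.

```python
import re


def redact_ssn(text: str) -> str:
    """Mask all but the last four digits of SSN-shaped numbers."""
    return re.sub(r"\b\d{3}-\d{2}-(\d{4})\b", r"***-**-\1", text)


def redact_name(name: str) -> str:
    """Keep each word's first letter and mask the remaining characters."""
    return " ".join(w[0] + "*" * (len(w) - 1) for w in name.split())


print(redact_ssn("SSN: 123-45-6789"))  # SSN: ***-**-6789
print(redact_name("John Doe"))         # J*** D**
```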

↓

Active learning feedback loop

User corrections (false positives/negatives) are stored as supervised signals, then queued into periodic retraining runs. This keeps policy aligned with real operational usage without requiring raw user data collection.
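One possible shape for a stored correction is sketched below. The `CorrectionEvent` record, its field names, and the choice to keep only already-redacted text are assumptions made to illustrate the idea of learning from corrections without retaining raw clipboard content.

```python
from dataclasses import dataclass

_SEVERITY = {"allow": 0, "warn": 1, "block": 2}


@dataclass(frozen=True)
class CorrectionEvent:
    redacted_text: str     # masked text, never the raw clipboard content
    predicted_action: str  # what the pipeline decided: allow / warn / block
    corrected_action: str  # what the user said it should have been

    @property
    def kind(self) -> str:
        """Classify the correction relative to the prediction."""
        diff = _SEVERITY[self.corrected_action] - _SEVERITY[self.predicted_action]
        if diff < 0:
            return "false_positive"  # pipeline was too strict
        if diff > 0:
            return "false_negative"  # pipeline was too lenient
        return "confirmed"


def retraining_queue(events):
    """Keep only events where the user disagreed with the prediction."""
    return [e for e in events if e.kind != "confirmed"]
```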

Architecture & Model

Built on real ML, not just regex

Clipboard Hook → Text Extraction → Chunking (512 tokens)

↓

Layer 1: Regex/Pattern (SSN, CC, Phone, Email, IP, API keys)

↓

Layer 2: spaCy NER (PERSON, ORG, GPE, DATE, MONEY)

↓

Layer 3: Fine-tuned TinyBERT classifier (risk scoring)

↓

Risk Aggregator → Decision Engine (silent / warn / block)

↓

Remediation Panel (Redact / Rephrase / Encrypt / Override)

Training Methodology

Paper-style model development process

Core Sentinel training follows a reproducible ML protocol: synthetic corpus design, adversarial augmentation, controlled optimization, and strict holdout evaluation for deployment confidence.

Dataset construction

Balanced training split: 10,455 samples (3,485 low / 3,485 med / 3,485 high), plus 485 validation and 2,289 held-out test samples (13,229 total).
Template generation for healthcare, finance, legal, HR, and engineering contexts.
Adversarial perturbations: obfuscation, spacing/noise, casing shifts, mixed-language spans.
Zero real user clipboard data used in base training.
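Template generation and adversarial perturbation, as listed above, could look roughly like this. The templates, filler values, and perturbation rules are all made up for illustration and are not the project's actual corpus design.

```python
import random

# Hypothetical templates and fake values; none come from real user data.
TEMPLATES = {
    "high": "Patient {name}, SSN {ssn}, diagnosed with {condition}.",
    "low": "Meeting moved to {day}; please update the agenda.",
}
FILLERS = {
    "name": ["Jane Roe", "Sam Poe"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "condition": ["asthma", "diabetes"],
    "day": ["Monday", "Friday"],
}


def generate_sample(label: str, rng: random.Random) -> str:
    """Fill the template for `label` with randomly chosen synthetic values."""
    values = {key: rng.choice(options) for key, options in FILLERS.items()}
    return TEMPLATES[label].format(**values)


def perturb(text: str, rng: random.Random) -> str:
    """Apply random casing shifts plus spacing noise around hyphens."""
    cased = "".join(c.upper() if rng.random() < 0.3 else c for c in text)
    return cased.replace("-", " - ")
```

Passing a seeded `random.Random` instance to both functions keeps the generated corpus reproducible from run to run.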

Optimization configuration

Backbone: TinyBERT_General_4L_312D fine-tuned for risk classification.
Optimizer: AdamW with decoupled weight decay and linear warmup schedule.
Batching: dynamic sequence packing for 512-token windows.
Early stopping using validation AUC + precision/recall stability criteria.
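The linear warmup mentioned above can be written as a plain function of the training step. The peak learning rate and warmup length below are placeholder values, and holding the rate constant after warmup is an assumption, since the post-warmup behavior is not stated.

```python
def warmup_lr(step: int, peak_lr: float = 5e-5, warmup_steps: int = 500) -> float:
    """Learning rate at `step`: linear ramp from 0 to peak_lr, then constant.

    Assumption: peak_lr and warmup_steps are placeholders, not the
    project's actual hyperparameters.
    """
    if warmup_steps <= 0:
        return peak_lr
    return peak_lr * min(1.0, step / warmup_steps)
```

Ramping up from zero keeps early AdamW updates small while its moment estimates are still noisy, which is the usual motivation for warmup when fine-tuning a pretrained backbone.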

Evaluation Protocol

Measured for production reliability

Metrics and thresholds

Primary metrics: Precision, Recall, F1, ROC-AUC, and false positive rate.
High-risk guardrail tuned to minimize under-blocking on secret-like tokens.
Window-level + document-level calibration to reduce single-window false alarms.
Policy boundary validation at 40/70 thresholds before release.
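For reference, the count-based metrics listed above follow directly from a binary confusion matrix. The helper below is a generic sketch; the counts in the usage note are invented for illustration and are not the project's results.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, F1, and false positive rate from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of negatives wrongly flagged
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
```

For example, `binary_metrics(tp=90, fp=10, tn=80, fn=20)` gives a precision of 0.9 and a false positive rate of 10/90. Precision and false positive rate use different denominators, so they can diverge sharply when the classes are imbalanced.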

Operational validation

Latency budget tested under continuous clipboard monitoring workload.
Regression suite for remediation consistency (redact/rephrase/encrypt paths).
Feedback ingestion checks ensure correction events map to retraining datasets.
Release gates require no high-severity regressions on benchmark prompts.

Detection Coverage

What It Detects

SSN / National ID 🔴 Critical

Credit/Debit Card Numbers 🔴 Critical

API Keys & Secrets 🔴 Critical

Passwords & Tokens 🔴 Critical

Full Names + Context 🟡 Medium

Email Addresses 🟡 Medium

Phone Numbers 🟡 Medium

Physical Addresses 🟡 Medium

Dates of Birth 🟡 Medium

Medical Records / Diagnoses 🟡 Medium

IP Addresses 🔴 Critical

Bank Account / Routing Numbers 🔴 Critical

Passport / Driver's License 🔴 Critical

Biometric Identifiers 🔴 Critical

Vehicle Registration 🟢 Low

Employment / Salary Data 🟡 Medium

Enterprise Features

Security controls your team can operationalize

Admin dashboard
Manage all employees from one panel.

Supabase-powered telemetry
Real-time risk events across the org.

Custom sensitivity thresholds
Set department-specific controls.

Override audit trail
Track every override and remediation action.

Active learning loop
Model improves from team corrections.

On-premise deployment
No data leaves your network.

Open Source & Transparency

Core Sentinel is open-source.

Every detection rule, every model weight, every line of code — auditable. No telemetry collected without consent. Your clipboard data never leaves your machine unless YOU choose Supabase sync.


License: MIT