Core Sentinel
AI Clipboard Guardian

Your AI conversations deserve privacy.

Core Sentinel monitors your clipboard in real-time, detects PII before it reaches any LLM, and gives you one-click remediation — redact, rephrase, or encrypt.

Problem: Teams are pasting secrets, customer data, and regulated PII into AI chats without realizing the exposure risk until it is too late.
Works with ChatGPT · Claude · Gemini · Copilot · Any LLM
Current production support: Windows 10/11. Linux and iOS are planned next.

Clipboard Input

Patient: John Doe
SSN: 123-45-6789
API_KEY: sk_live_xxxxxx

Sentinel Scan Engine

Regex + NER + Classifier running...
Risk score: 92 · Action: BLOCK

Safe Output

Patient: J*** D**
SSN: ***-**-6789
API_KEY: ENC::9f8a...
Who Is This For

Use cases across high-trust industries

👨‍💻

Developers

Stop pasting stack traces with API keys, database URIs, and internal IPs into ChatGPT. Sentinel catches credentials even when they're embedded in code.

🏥

Healthcare / HIPAA

Patient names, DOBs, diagnoses, and medication lists are flagged before they reach any AI tool. Keeping protected health information out of prompts supports your HIPAA compliance program.

💼

Finance / SOX

Account numbers, transaction details, SSNs in financial documents — Sentinel blocks them from reaching uncontrolled AI endpoints.

⚖️

Enterprise / Legal

Contracts with party names, case numbers, privileged communications — keep attorney-client privilege intact when using AI assistants.

Try It Live

Interactive widget demo in your browser

Explore warn/block behavior, remediation actions, drag behavior, file-scan simulation, and a guided overlay tour. No installation required.

New here? Start in 3 steps
  1. Run the live widget walkthrough in this section.
  2. Open the Dashboard Hub to compare Privacy, ML, and Admin views.
  3. Review the architecture section below to understand detection + remediation flow.
The Problem

Every paste is a potential data leak

Whether you're a developer sharing code snippets, an HR professional discussing candidates, or a doctor describing symptoms — your clipboard carries secrets. Core Sentinel catches them before they leave your machine.

🧠
67% of employees paste sensitive data into AI tools
Source: Cyberhaven 2024
💸
$4.45M average cost of a data breach
Source: IBM 2023
🛡
92% of organizations lack AI data loss controls
Source: Gartner 2024
How It Works

8-step technical pipeline (full walkthrough)

This is the exact runtime path from keyboard paste event to safety decision and model learning. It is intentionally deep, reproducible, and auditable.

Clipboard hooking + LLM context guard

Sentinel runs as a PyQt6 tray process and intercepts paste events only when the active window matches a supported LLM target. This avoids unnecessary scanning and limits latency overhead.

if is_llm_window(active_title):
    text = clipboard.text()
    on_paste_detected(text)
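As a rough illustration of the context guard above, the sketch below matches the foreground window title against a keyword list. The keyword tuple and the `handle_paste` helper are assumptions made for this example; the product's actual matching rules are not described on this page.

```python
# Hypothetical keyword list; the real target-matching rules are an assumption.
LLM_TITLE_KEYWORDS = ("chatgpt", "claude", "gemini", "copilot")


def is_llm_window(active_title: str, keywords=LLM_TITLE_KEYWORDS) -> bool:
    """Return True when the window title mentions a supported LLM."""
    title = active_title.casefold()
    return any(keyword in title for keyword in keywords)


def handle_paste(active_title: str, clipboard_text: str, on_paste_detected) -> bool:
    """Run the scan callback only when the paste targets an LLM window.

    Returns True if the callback was invoked, False if the paste was ignored.
    """
    if not is_llm_window(active_title):
        return False
    on_paste_detected(clipboard_text)
    return True
```

Keeping the guard as a pure function of the window title makes it easy to test without a running tray process.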

Text normalization + chunking (512 token windows)

Large clipboard content is normalized, then chunked into overlapping windows for robust inference on long inputs while preserving semantic context around entities.

windows = make_windows(text, size=512, overlap=128)
for w in windows:
    score_window(w)
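One possible shape for `make_windows` is sketched below. It splits on whitespace as a stand-in for the model tokenizer, which is an assumption: a real pipeline would count subword tokens instead.

```python
def make_windows(text: str, size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into overlapping windows of at most `size` tokens.

    Tokens here are whitespace-separated words (a simplifying assumption).
    Consecutive windows share `overlap` tokens so that an entity cut at a
    window boundary still appears whole in the neighbouring window.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    tokens = text.split()
    if not tokens:
        return []
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return windows
```

With `size=512` and `overlap=128`, each window starts 384 tokens after the previous one.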

Layer 1: Regex & deterministic patterns

Critical token families are matched first with deterministic regex. This catches exact leak signatures with near-perfect recall on known formats.

Pattern family | Examples | Runtime action
SSN / National ID | 123-45-6789 | High-risk candidate
Credit card / PAN | 4111 1111 1111 1111 | High-risk candidate
Email / Phone | john@corp.com, +1-202-555-0110 | Medium-risk candidate
Secrets / tokens | sk_live_..., JWT, API key | Force block candidate
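A minimal sketch of a deterministic layer covering some of the families above. The exact patterns, the Luhn checksum filter for card numbers, and the risk labels are illustrative assumptions, not the shipped rule set.

```python
import re

# Illustrative patterns only; the production rules are not shown on this page.
PATTERNS = {
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "high"),
    "card": (re.compile(r"\b\d(?:[ -]?\d){12,18}\b"), "high"),
    "email": (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "medium"),
    "secret": (re.compile(r"\bsk_live_[A-Za-z0-9]+\b"), "block"),
}


def luhn_ok(number: str) -> bool:
    """Validate a card number with the Luhn checksum, ignoring separators."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text: str) -> list[tuple[str, str, str]]:
    """Return (family, matched_text, risk) for every pattern hit."""
    hits = []
    for family, (pattern, risk) in PATTERNS.items():
        for match in pattern.finditer(text):
            if family == "card" and not luhn_ok(match.group()):
                continue  # digit run that fails the checksum is not a card
            hits.append((family, match.group(), risk))
    return hits
```

The checksum step is one way to cut false positives from long digit runs such as order numbers.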

Layer 2: spaCy NER semantic entity extraction

NER provides contextual entities beyond strict formatting, including PERSON / ORG / GPE / DATE / MONEY. This catches natural-language leakage that regex misses.

PERSON ORG GPE DATE MONEY
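The sketch below shows one way to post-process NER output. It assumes entities have already been extracted as (text, label) pairs, for example from spaCy via `[(ent.text, ent.label_) for ent in nlp(text).ents]`; the helper names are hypothetical.

```python
from collections import Counter

# Labels listed in this section; treated here as the watched set.
SENSITIVE_LABELS = {"PERSON", "ORG", "GPE", "DATE", "MONEY"}


def select_entities(entities, labels=SENSITIVE_LABELS):
    """Keep only (text, label) pairs whose label is in the watched set."""
    return [(text, label) for text, label in entities if label in labels]


def count_by_label(entities):
    """Count how many watched entities of each label were found."""
    return Counter(label for _, label in select_entities(entities))
```

Per-label counts like these could feed the risk aggregation step as one of its signals.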

Layer 3: Fine-tuned TinyBERT contextual risk model

Windows are passed through a TinyBERT sequence classifier to estimate contextual breach probability (e.g., medical narratives, legal clauses, financial records).

Model card

Base: TinyBERT_General_4L_312D
Params: ~14M
Context: 512
Inference: ~22ms/window

Validation metrics

Precision: 0.945
Recall: 0.959
AUPRC: 0.751
FPR: 52.6%

Risk aggregation + decision thresholds

Signals from regex, NER, and model probabilities are merged into a final 0-100 score and mapped to policy outcomes:

0-40: silent allow
40-70: warn + review
70-100: block + remediation panel
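A minimal sketch of the score-to-policy mapping. How the three signals are merged is not specified here, so taking the maximum is purely an assumption, as is treating the boundary values 40 and 70 as belonging to the stricter tier.

```python
def aggregate(regex_score: float, ner_score: float, model_prob: float) -> int:
    """Merge layer signals into a 0-100 score.

    Assumption: the strongest signal wins. `regex_score` and `ner_score`
    are on a 0-100 scale; `model_prob` is a probability in [0, 1].
    """
    combined = max(regex_score, ner_score, 100.0 * model_prob)
    return int(round(min(100.0, max(0.0, combined))))


def decide(score: int) -> str:
    """Map a 0-100 risk score to a policy outcome.

    Assumption: scores of exactly 40 and 70 fall into the stricter tier.
    """
    if score >= 70:
        return "block"
    if score >= 40:
        return "warn"
    return "allow"
```

For example, `decide(aggregate(90, 20, 0.92))` yields `"block"`, in line with the score of 92 shown in the demo panel.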

Remediation workflow (one click)

Users can safely continue work with guided transformations: Redact (mask tokens), Rephrase (PII-safe rewrite), Encrypt (AES-256 reversible protection), or Override with audit trace.
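To make the Redact action concrete, here is a small sketch that reproduces the masking style of the Safe Output panel. The function names are hypothetical, and the names to mask are passed in explicitly (in practice they would come from the NER layer).

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Mask all but the last four digits of each SSN-shaped token."""
    return SSN_RE.sub(lambda m: "***-**-" + m.group(1), text)


def mask_name(name: str) -> str:
    """Keep the first letter of each name part and star out the rest."""
    return " ".join(part[0] + "*" * (len(part) - 1) for part in name.split())


def redact(text: str, names: list[str]) -> str:
    """Apply SSN masking, then mask each supplied person name."""
    text = mask_ssn(text)
    for name in names:
        text = text.replace(name, mask_name(name))
    return text
```

Keeping the last four SSN digits preserves enough context for a human reader while removing the identifying portion.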

Active learning feedback loop

User corrections (false positives/negatives) are stored as supervised signals, then queued into periodic retraining runs. This keeps policy aligned with real operational usage without requiring raw user data collection.

Architecture & Model

Built on real ML, not just regex

Clipboard Hook → Text Extraction → Chunking (512 tokens)
Layer 1: Regex/Pattern (SSN, CC, Phone, Email, IP, API keys)
Layer 2: spaCy NER (PERSON, ORG, GPE, DATE, MONEY)
Layer 3: Fine-tuned TinyBERT classifier (risk scoring)
Risk Aggregator → Decision Engine (silent / warn / block)
Remediation Panel (Redact / Rephrase / Encrypt / Override)
Training Methodology

Paper-style model development process

Core Sentinel training follows a reproducible ML protocol: synthetic corpus design, adversarial augmentation, controlled optimization, and strict holdout evaluation for deployment confidence.

Dataset construction

  • Balanced training split: 10,455 samples (3,485 low / 3,485 med / 3,485 high), plus 485 validation and 2,289 held-out test samples (13,229 total).
  • Template generation for healthcare, finance, legal, HR, and engineering contexts.
  • Adversarial perturbations: obfuscation, spacing/noise, casing shifts, mixed-language spans.
  • Zero real user clipboard data used in base training.

Optimization configuration

  • Backbone: TinyBERT_General_4L_312D fine-tuned for risk classification.
  • Optimizer: AdamW with decoupled weight decay and linear warmup schedule.
  • Batching: dynamic sequence packing for 512-token windows.
  • Early stopping using validation AUC + precision/recall stability criteria.
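The linear warmup schedule mentioned above can be sketched as a plain function of the training step. The linear decay to zero after warmup and all step counts are assumptions for illustration; the actual hyperparameters are not listed here.

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` under linear warmup, then linear decay.

    Assumption: after warmup the rate decays linearly to zero at
    `total_steps`; the source only states that warmup is linear.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With `warmup_steps=10` and `total_steps=100`, the rate climbs from zero to `base_lr` over the first 10 steps and returns to zero at step 100.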
Evaluation Protocol

Measured for production reliability

Metrics and thresholds

  • Primary metrics: Precision, Recall, F1, ROC-AUC, and false positive rate.
  • High-risk guardrail tuned to minimize under-blocking on secret-like tokens.
  • Window-level + document-level calibration to reduce single-window false alarms.
  • Policy boundary validation at 40/70 thresholds before release.
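For reference, the threshold-based metrics listed above follow directly from confusion-matrix counts. The helper below is a generic sketch with a hypothetical name, not the project's evaluation code.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, F1, and false positive rate from counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
```

Note that the false positive rate depends only on the negative class (`fp` and `tn`), so it can be high even when precision is high if negatives are rare in the evaluation set.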

Operational validation

  • Latency budget tested under continuous clipboard monitoring workload.
  • Regression suite for remediation consistency (redact/rephrase/encrypt paths).
  • Feedback ingestion checks ensure correction events map to retraining datasets.
  • Release gates require no high-severity regressions on benchmark prompts.
Detection Coverage

What It Detects

SSN / National ID 🔴 Critical
Credit/Debit Card Numbers 🔴 Critical
API Keys & Secrets 🔴 Critical
Passwords & Tokens 🔴 Critical
Full Names + Context 🟡 Medium
Email Addresses 🟡 Medium
Phone Numbers 🟡 Medium
Physical Addresses 🟡 Medium
Dates of Birth 🟡 Medium
Medical Records / Diagnoses 🟡 Medium
IP Addresses 🔴 Critical
Bank Account / Routing Numbers 🔴 Critical
Passport / Driver's License 🔴 Critical
Biometric Identifiers 🔴 Critical
Vehicle Registration 🟢 Low
Employment / Salary Data 🟡 Medium
Enterprise Features

Security controls your team can operationalize

Admin dashboard
Manage all employees from one panel.
Supabase-powered telemetry
Real-time risk events across the org.
Custom sensitivity thresholds
Set department-specific controls.
Override audit trail
Track every override and remediation action.
Active learning loop
Model improves from team corrections.
On-premise deployment
No data leaves your network.
Open Source & Transparency

Core Sentinel is open-source.

Every detection rule, every model weight, every line of code — auditable. No telemetry collected without consent. Your clipboard data never leaves your machine unless YOU choose Supabase sync.

License: MIT