AI Curator
The Data OS for LLM Fine-Tuning
Capture Everything. Curate with Precision. Train with Confidence.
AI Curator is the data operating system for fine-tuning. It connects to any data source — your IDE, Slack, logs, OpenWebUI conversations — and transforms raw data into training-ready datasets. Built for professionals who need robust data pipelines, but simple enough for beginners to master in minutes.
Universal Capture
Real-time capture from any tool via HTTP API. Slack, logs, IDEs, custom scripts — if it can make an HTTP request, it can stream to AI Curator.
Quality Control
Review, rate, approve or reject samples. Star ratings, categories, tags, and duplicate detection ensure your dataset is production-ready.
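As an illustration of what duplicate detection can involve, here is a minimal sketch that fingerprints each sample by its normalized text and keeps the first occurrence. The field names `instruction` and `output` and both function names are assumptions made for this example. This is not AI Curator's actual implementation, which may use a different method such as fuzzy matching.

```python
import hashlib


def fingerprint(sample: dict) -> str:
    """Hash the case- and whitespace-normalized text fields of a sample."""
    # Hypothetical field names; adjust to whatever your samples contain.
    parts = []
    for key in ("instruction", "output"):
        value = str(sample.get(key, ""))
        parts.append(" ".join(value.lower().split()))
    # Join with a separator that cannot appear after normalization,
    # so ("a b", "c") and ("a", "b c") do not collide.
    normalized = "\x1f".join(parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(samples: list[dict]) -> list[dict]:
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen = set()
    unique = []
    for sample in samples:
        key = fingerprint(sample)
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique
```

Exact-match hashing like this only catches samples that differ in case or spacing; near-duplicates with reworded text would need a similarity measure instead.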
Export Ready
Export to 7 formats, including Alpaca, MLX, JSONL, CSV, Unsloth, and TRL. Smart splitting, filtering, and stratification included.
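To make the export and stratification ideas concrete, the sketch below writes a sample as one Alpaca-style JSON line (`instruction`, `input`, `output`) and splits a dataset so each category keeps roughly the same share in the validation set. The `category` field, the function names, and the default 20% validation fraction are assumptions for this example, not AI Curator's built-in exporter.

```python
import json
import random
from collections import defaultdict


def to_alpaca_line(sample: dict) -> str:
    """Serialize one sample as an Alpaca-style JSON line."""
    record = {
        "instruction": sample["instruction"],
        "input": sample.get("input", ""),  # Alpaca allows an empty input
        "output": sample["output"],
    }
    return json.dumps(record, ensure_ascii=False)


def stratified_split(samples, valid_fraction=0.2, seed=0):
    """Split samples into (train, valid), stratified by a category field."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_category = defaultdict(list)
    for sample in samples:
        # "category" is a hypothetical field name used for illustration.
        by_category[sample.get("category", "uncategorized")].append(sample)

    train, valid = [], []
    for group in by_category.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_valid = round(len(shuffled) * valid_fraction)
        valid.extend(shuffled[:n_valid])
        train.extend(shuffled[n_valid:])
    return train, valid
```

Writing `to_alpaca_line(s)` for each sample in `train` and `valid`, one per line, would produce two JSONL files suitable for most fine-tuning scripts that accept the Alpaca layout.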
Capture from Any Source
Slack
Capture conversations from Slack channels. Export team discussions, support tickets, or internal knowledge sharing as training data.
curl -X POST http://localhost:3333/api/capture -d '{"source":"slack","records":[...]}'
Application Logs
Process logs in real-time. Extract error-resolution pairs, user interactions, or system events as training samples.
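As a rough idea of what a log processor might do, the sketch below reads log lines and emits one JSON object per error-resolution pair. The `ERROR` and `RESOLVED` markers, the `instruction`/`output` field names, and the function names are all assumptions for this example; real logs will need their own patterns.

```python
import json
import re
import sys

# Hypothetical log markers; adapt these patterns to your own log format.
ERROR_RE = re.compile(r"\bERROR\b[:\s]+(?P<msg>.+)")
RESOLVED_RE = re.compile(r"\bRESOLVED\b[:\s]+(?P<msg>.+)")


def extract_pairs(lines):
    """Yield a sample for each ERROR line followed later by a RESOLVED line."""
    pending_error = None
    for line in lines:
        line = line.strip()
        error = ERROR_RE.search(line)
        if error:
            # Remember the most recent error until a resolution appears.
            pending_error = error.group("msg")
            continue
        resolved = RESOLVED_RE.search(line)
        if resolved and pending_error is not None:
            yield {"instruction": pending_error, "output": resolved.group("msg")}
            pending_error = None


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Stream pairs as JSON lines so a downstream command can forward them."""
    for pair in extract_pairs(stdin):
        stdout.write(json.dumps(pair) + "\n")
        stdout.flush()  # flush per line so streaming input is not buffered
```

A script that ends by calling `main()` could sit between `tail -f` and the command that posts to the capture endpoint.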
tail -f /var/log/app.log | processor | curl -X POST ...
OpenWebUI
Official OpenWebUI plugin available. Automatically capture conversations from your self-hosted AI chat interface.
IDEs & Development
Capture code explanations, refactoring decisions, or debug sessions. Build coding assistants from your actual development workflow.
VS Code Extension → http://localhost:3333/api/capture
Any HTTP Source
The Live Capture API accepts data from any tool that can make an HTTP POST request. Custom scripts, internal tools, web apps — if it can send JSON, it can stream to AI Curator.
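For a custom script, the request can be built with Python's standard library alone. The sketch below reuses the endpoint and the `source`/`records` envelope from the Slack example; the fields inside each record, the `Content-Type` header, and the function names are assumptions, since the record schema is not specified here.

```python
import json
import urllib.request

CAPTURE_URL = "http://localhost:3333/api/capture"


def build_capture_request(source, records, url=CAPTURE_URL):
    """Build (but do not send) a JSON POST request for the capture endpoint."""
    body = json.dumps({"source": source, "records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumed header; a JSON API will typically expect it.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_capture(source, records, url=CAPTURE_URL, timeout=5.0):
    """Send the records and return the HTTP status code."""
    request = build_capture_request(source, records, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With AI Curator running locally, something like `send_capture("custom-script", [{"instruction": "...", "output": "..."}])` would post one record, where the record fields are placeholders for whatever your tool produces.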
Three Ways to Work
Web UI
Point, click, curate. Perfect for manual quality control and visual dataset exploration.
- Drag & drop import
- Card-based sample review
- One-click export
- Visual dashboards
CLI
Automate everything. Built for professionals, CI/CD pipelines, and bulk operations.
- Bulk import/export
- HuggingFace search & download
- Advanced filtering & splitting
- Scriptable workflows
Live Capture API
Our most powerful feature. Stream data in real-time from any source via HTTP.
- Real-time HTTP ingestion
- Webhook integrations
- Log processors
- Custom script support
EdukaAI Starter Pack Included
AI Curator ships with 75 premium samples you can train on immediately. You can also download the full 400-sample EdukaAI Starter Pack to jumpstart your journey.
Get the Starter Pack →
Start Capturing Today
Install AI Curator and start building your data pipeline in minutes. 100% local, privacy-first.