Search Autocomplete Engine

Challenge

Build an intelligent search suggestion system inspired by Google and YouTube

Search Autocomplete Engine Challenge

🎯 Challenge Overview

Build an intelligent search autocomplete system that provides instant, relevant suggestions as users type. This challenge is inspired by the autocomplete systems used by Google Search, YouTube, Amazon, and Spotify, where fast, accurate suggestions significantly improve user experience and engagement.

🏢 Real-World Context

Search autocomplete is crucial for modern applications:

Google: Processes 8.5 billion searches daily with sub-100ms autocomplete responses
YouTube: Suggests videos based on 500+ hours of content uploaded per minute
Amazon: Powers product discovery with 70% of searches using autocomplete
Spotify: Helps users find songs/artists from 100+ million tracks

Your challenge is to build a system that can suggest relevant completions in real-time while learning from user behavior and handling typos intelligently.

📋 Requirements

Core Features

Real-Time Suggestions:
- Sub-100ms response time for autocomplete queries
- Prefix matching with fuzzy search capabilities
- Support for typo tolerance and spell correction
- Contextual suggestions based on user history
Intelligent Ranking:
- Popularity-based scoring (trending searches)
- Personalized suggestions based on user behavior
- Recency weighting for time-sensitive content
- Category-aware suggestions (products, videos, etc.)
Scalable Architecture:
- Handle 10,000+ concurrent search queries
- Support datasets with millions of searchable items
- Real-time index updates for new content
- Efficient memory usage for fast lookups

Technical Specifications

Response time: <100ms for 95% of queries
Support for 1M+ searchable terms
Handle 10k+ queries per second
Typo tolerance up to 2 character differences
Multi-language support (bonus)

🛠 Implementation Guide

Phase 1: Trie Data Structure

// Basic Trie implementation for prefix matching
class TrieNode {
    constructor() {
        this.children = new Map();
        this.isEndOfWord = false;
        this.suggestions = [];
        this.popularity = 0;
    }
}

class AutocompleteTrie {
    constructor() {
        this.root = new TrieNode();
    }
    
    insert(word, popularity = 1) {
        let current = this.root;
        
        for (let char of word.toLowerCase()) {
            if (!current.children.has(char)) {
                current.children.set(char, new TrieNode());
            }
            current = current.children.get(char);
            
            // Store top suggestions at each node
            current.suggestions.push({word, popularity});
            current.suggestions.sort((a, b) => b.popularity - a.popularity);
            current.suggestions = current.suggestions.slice(0, 10);
        }
        
        current.isEndOfWord = true;
        current.popularity = popularity;
    }
    
    search(prefix, limit = 10) {
        let current = this.root;
        
        // Navigate to prefix node
        for (let char of prefix.toLowerCase()) {
            if (!current.children.has(char)) {
                return []; // No suggestions found
            }
            current = current.children.get(char);
        }
        
        return current.suggestions.slice(0, limit);
    }
}

Phase 2: Fuzzy Search with Levenshtein Distance

// Fuzzy search for typo tolerance
class FuzzyAutocomplete {
    constructor(maxDistance = 2) {
        this.maxDistance = maxDistance;
        this.trie = new AutocompleteTrie();
    }
    
    levenshteinDistance(str1, str2) {
        const matrix = Array(str2.length + 1).fill(null)
            .map(() => Array(str1.length + 1).fill(null));
        
        for (let i = 0; i <= str1.length; i++) matrix[0][i] = i;
        for (let j = 0; j <= str2.length; j++) matrix[j][0] = j;
        
        for (let j = 1; j <= str2.length; j++) {
            for (let i = 1; i <= str1.length; i++) {
                const indicator = str1[i - 1] === str2[j - 1] ? 0 : 1;
                matrix[j][i] = Math.min(
                    matrix[j][i - 1] + 1,     // deletion
                    matrix[j - 1][i] + 1,     // insertion
                    matrix[j - 1][i - 1] + indicator // substitution
                );
            }
        }
        
        return matrix[str2.length][str1.length];
    }
    
    fuzzySearch(query, suggestions) {
        return suggestions
            .map(suggestion => ({
                ...suggestion,
                distance: this.levenshteinDistance(query, suggestion.word)
            }))
            .filter(item => item.distance <= this.maxDistance)
            .sort((a, b) => {
                if (a.distance !== b.distance) return a.distance - b.distance;
                return b.popularity - a.popularity;
            });
    }
}

Phase 3: Search Analytics and Learning

// Track search behavior for improving suggestions
class SearchAnalytics {
    constructor() {
        this.searchCounts = new Map();
        this.clickThroughRates = new Map();
        this.userSessions = new Map();
    }
    
    recordSearch(query, userId = null) {
        // Update search frequency
        const count = this.searchCounts.get(query) || 0;
        this.searchCounts.set(query, count + 1);
        
        // Track user session context
        if (userId) {
            if (!this.userSessions.has(userId)) {
                this.userSessions.set(userId, []);
            }
            this.userSessions.get(userId).push({
                query,
                timestamp: Date.now()
            });
        }
    }
    
    recordClick(query, suggestion, userId = null) {
        const key = `${query}:${suggestion}`;
        const stats = this.clickThroughRates.get(key) || {clicks: 0, impressions: 0};
        stats.clicks++;
        this.clickThroughRates.set(key, stats);
    }
    
    getTrendingQueries(timeWindow = 3600000) { // 1 hour
        const now = Date.now();
        const trending = new Map();
        
        for (let [userId, searches] of this.userSessions) {
            const recentSearches = searches.filter(
                search => now - search.timestamp < timeWindow
            );
            
            recentSearches.forEach(search => {
                const count = trending.get(search.query) || 0;
                trending.set(search.query, count + 1);
            });
        }
        
        return Array.from(trending.entries())
            .sort(([,a], [,b]) => b - a)
            .slice(0, 20);
    }
}

📊 Success Metrics

Response Time: <100ms for 95% of autocomplete requests
Suggestion Accuracy: >80% of suggestions result in clicks
Typo Handling: Correctly suggest for 90% of 1-2 character typos
Throughput: Handle 10k+ queries per second
Memory Efficiency: <1GB RAM for 1M searchable terms

🎁 Bonus Challenges

Contextual Suggestions: Different suggestions based on current page/section
Voice Search: Handle speech-to-text query suggestions
Image-Based Search: Autocomplete for visual search queries
Multi-tenancy: Different suggestion sets for different applications
Suggestion Explanations: Show why certain suggestions were recommended
Collaborative Filtering: Suggest based on similar users' searches

📚 Learning Outcomes

After completing this challenge, you'll understand:

Trie data structures and their applications
Fuzzy string matching algorithms
Search ranking and relevance scoring
Real-time system performance optimization
User behavior analytics and personalization
Scaling search systems for high throughput

🔧 Technical Stack Recommendations

Backend: Node.js, Python, or Go
Search Engine: Elasticsearch, Solr, or custom Trie
Caching: Redis for frequently accessed suggestions
Analytics: InfluxDB or Prometheus for metrics
Load Testing: Apache Bench or Artillery.io

🚀 Getting Started

Build basic Trie data structure for prefix matching
Add fuzzy search capabilities for typo tolerance
Implement popularity-based ranking system
Add search analytics and user behavior tracking
Build REST API with caching layer
Add real-time index updates for new content
Implement personalized suggestions

🔗 Helpful Resources

🎯 Sample API Endpoints

// GET /autocomplete?q=java&limit=10
{
  "query": "java",
  "suggestions": [
    {
      "text": "javascript",
      "type": "programming_language",
      "popularity": 95,
      "category": "technology"
    },
    {
      "text": "java programming",
      "type": "search_term", 
      "popularity": 87,
      "category": "education"
    }
  ],
  "responseTime": "23ms"
}

Ready to build the search experience that helps users find exactly what they're looking for? Start coding and discover the art of intelligent autocomplete! 🔍✨

Architect's View

I think this logic should sit in the Domain layer, in a service object called RecommendationService maybe? This would work with entities such as Customer, Review, Product. Of course this service will eventually be orchestrated by use cases in the Application layer, but for now a simple Test Harness will suffice to provide the inputs and display the results. Slug: SearchAutocompleteEngine

Layers

application

domain