Search Autocomplete Engine

Challenge

Build an intelligent search suggestion system inspired by Google and YouTube

Search Autocomplete Engine Challenge

🎯 Challenge Overview

Build an intelligent search autocomplete system that provides instant, relevant suggestions as users type. This challenge is inspired by the autocomplete systems used by Google Search, YouTube, Amazon, and Spotify, where fast, accurate suggestions significantly improve user experience and engagement.

🏢 Real-World Context

Search autocomplete is crucial for modern applications:

  • Google: Processes 8.5 billion searches daily with sub-100ms autocomplete responses
  • YouTube: Suggests videos based on 500+ hours of content uploaded per minute
  • Amazon: Powers product discovery with 70% of searches using autocomplete
  • Spotify: Helps users find songs/artists from 100+ million tracks

Your challenge is to build a system that can suggest relevant completions in real-time while learning from user behavior and handling typos intelligently.

📋 Requirements

Core Features

  1. Real-Time Suggestions:

    • Sub-100ms response time for autocomplete queries
    • Prefix matching with fuzzy search capabilities
    • Support for typo tolerance and spell correction
    • Contextual suggestions based on user history
  2. Intelligent Ranking:

    • Popularity-based scoring (trending searches)
    • Personalized suggestions based on user behavior
    • Recency weighting for time-sensitive content
    • Category-aware suggestions (products, videos, etc.)
  3. Scalable Architecture:

    • Handle 10,000+ concurrent search queries
    • Support datasets with millions of searchable items
    • Real-time index updates for new content
    • Efficient memory usage for fast lookups

Technical Specifications

  • Response time: <100ms for 95% of queries
  • Support for 1M+ searchable terms
  • Handle 10k+ queries per second
  • Typo tolerance up to 2 character differences
  • Multi-language support (bonus)

🛠 Implementation Guide

Phase 1: Trie Data Structure

// Basic Trie implementation for prefix matching
class TrieNode {
    constructor() {
        this.children = new Map();
        this.isEndOfWord = false;
        this.suggestions = [];
        this.popularity = 0;
    }
}

class AutocompleteTrie {
    constructor() {
        this.root = new TrieNode();
    }
    
    insert(word, popularity = 1) {
        let current = this.root;
        
        for (let char of word.toLowerCase()) {
            if (!current.children.has(char)) {
                current.children.set(char, new TrieNode());
            }
            current = current.children.get(char);
            
            // Store top suggestions at each node
            current.suggestions.push({word, popularity});
            current.suggestions.sort((a, b) => b.popularity - a.popularity);
            current.suggestions = current.suggestions.slice(0, 10);
        }
        
        current.isEndOfWord = true;
        current.popularity = popularity;
    }
    
    search(prefix, limit = 10) {
        let current = this.root;
        
        // Navigate to prefix node
        for (let char of prefix.toLowerCase()) {
            if (!current.children.has(char)) {
                return []; // No suggestions found
            }
            current = current.children.get(char);
        }
        
        return current.suggestions.slice(0, limit);
    }
}

Phase 2: Fuzzy Search with Levenshtein Distance

// Fuzzy search for typo tolerance
class FuzzyAutocomplete {
    constructor(maxDistance = 2) {
        this.maxDistance = maxDistance;
        this.trie = new AutocompleteTrie();
    }
    
    levenshteinDistance(str1, str2) {
        const matrix = Array(str2.length + 1).fill(null)
            .map(() => Array(str1.length + 1).fill(null));
        
        for (let i = 0; i <= str1.length; i++) matrix[0][i] = i;
        for (let j = 0; j <= str2.length; j++) matrix[j][0] = j;
        
        for (let j = 1; j <= str2.length; j++) {
            for (let i = 1; i <= str1.length; i++) {
                const indicator = str1[i - 1] === str2[j - 1] ? 0 : 1;
                matrix[j][i] = Math.min(
                    matrix[j][i - 1] + 1,     // deletion
                    matrix[j - 1][i] + 1,     // insertion
                    matrix[j - 1][i - 1] + indicator // substitution
                );
            }
        }
        
        return matrix[str2.length][str1.length];
    }
    
    fuzzySearch(query, suggestions) {
        return suggestions
            .map(suggestion => ({
                ...suggestion,
                distance: this.levenshteinDistance(query, suggestion.word)
            }))
            .filter(item => item.distance <= this.maxDistance)
            .sort((a, b) => {
                if (a.distance !== b.distance) return a.distance - b.distance;
                return b.popularity - a.popularity;
            });
    }
}

Phase 3: Search Analytics and Learning

// Track search behavior for improving suggestions
class SearchAnalytics {
    constructor() {
        this.searchCounts = new Map();
        this.clickThroughRates = new Map();
        this.userSessions = new Map();
    }
    
    recordSearch(query, userId = null) {
        // Update search frequency
        const count = this.searchCounts.get(query) || 0;
        this.searchCounts.set(query, count + 1);
        
        // Track user session context
        if (userId) {
            if (!this.userSessions.has(userId)) {
                this.userSessions.set(userId, []);
            }
            this.userSessions.get(userId).push({
                query,
                timestamp: Date.now()
            });
        }
    }
    
    recordClick(query, suggestion, userId = null) {
        const key = `${query}:${suggestion}`;
        const stats = this.clickThroughRates.get(key) || {clicks: 0, impressions: 0};
        stats.clicks++;
        this.clickThroughRates.set(key, stats);
    }
    
    getTrendingQueries(timeWindow = 3600000) { // 1 hour
        const now = Date.now();
        const trending = new Map();
        
        for (let [userId, searches] of this.userSessions) {
            const recentSearches = searches.filter(
                search => now - search.timestamp < timeWindow
            );
            
            recentSearches.forEach(search => {
                const count = trending.get(search.query) || 0;
                trending.set(search.query, count + 1);
            });
        }
        
        return Array.from(trending.entries())
            .sort(([,a], [,b]) => b - a)
            .slice(0, 20);
    }
}

📊 Success Metrics

  • Response Time: <100ms for 95% of autocomplete requests
  • Suggestion Accuracy: >80% of suggestions result in clicks
  • Typo Handling: Correctly suggest for 90% of 1-2 character typos
  • Throughput: Handle 10k+ queries per second
  • Memory Efficiency: <1GB RAM for 1M searchable terms

🎁 Bonus Challenges

  1. Contextual Suggestions: Different suggestions based on current page/section
  2. Voice Search: Handle speech-to-text query suggestions
  3. Image-Based Search: Autocomplete for visual search queries
  4. Multi-tenancy: Different suggestion sets for different applications
  5. Suggestion Explanations: Show why certain suggestions were recommended
  6. Collaborative Filtering: Suggest based on similar users' searches

📚 Learning Outcomes

After completing this challenge, you'll understand:

  • Trie data structures and their applications
  • Fuzzy string matching algorithms
  • Search ranking and relevance scoring
  • Real-time system performance optimization
  • User behavior analytics and personalization
  • Scaling search systems for high throughput

🔧 Technical Stack Recommendations

  • Backend: Node.js, Python, or Go
  • Search Engine: Elasticsearch, Solr, or custom Trie
  • Caching: Redis for frequently accessed suggestions
  • Analytics: InfluxDB or Prometheus for metrics
  • Load Testing: Apache Bench or Artillery.io

🚀 Getting Started

  1. Build basic Trie data structure for prefix matching
  2. Add fuzzy search capabilities for typo tolerance
  3. Implement popularity-based ranking system
  4. Add search analytics and user behavior tracking
  5. Build REST API with caching layer
  6. Add real-time index updates for new content
  7. Implement personalized suggestions

🔗 Helpful Resources

🎯 Sample API Endpoints

// GET /autocomplete?q=java&limit=10
{
  "query": "java",
  "suggestions": [
    {
      "text": "javascript",
      "type": "programming_language",
      "popularity": 95,
      "category": "technology"
    },
    {
      "text": "java programming",
      "type": "search_term", 
      "popularity": 87,
      "category": "education"
    }
  ],
  "responseTime": "23ms"
}

Ready to build the search experience that helps users find exactly what they're looking for? Start coding and discover the art of intelligent autocomplete! 🔍✨

Architect's View

I think this logic should sit in the Domain layer, in a service object called RecommendationService maybe? This would work with entities such as Customer, Review, Product. Of course this service will eventually be orchestrated by use cases in the Application layer, but for now a simple Test Harness will suffice to provide the inputs and display the results. Slug: SearchAutocompleteEngine

Layers

application
domain

© 2025 Sliced Logic. All rights reserved.

Search Autocomplete Engine