search-engine(7)    project specification    search-engine(7)

./search-engine/

ARCHIVED · Feb 2023 — Mar 2023

Python search engine that indexed 56,000+ web pages across 88 domains.

[python][flask][beautifulsoup][information retrieval]

§01 description

Engineered a Python-based search engine that indexed and processed 56,000+ web pages across 88 different domains, delivering ranked results within 300ms.

Built an inverted index system using BeautifulSoup for HTML parsing and custom tokenization. Optimized memory usage by partitioning the index into separate files for each letter of the alphabet, enabling partial loading at query time. Served results through a Flask web GUI that returns the top-K ranked links for any query.
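The indexing and per-letter partitioning described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes page text has already been extracted from the HTML (for example with BeautifulSoup's `get_text()`), and the names `tokenize`, `build_index`, `partition_key`, `write_partitions`, and `load_postings`, the JSON file layout, and the `other` bucket for non-letter tokens are all hypothetical.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Assumed tokenization rule: lowercase runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(pages):
    """Map each token to {url: term frequency} for a dict of url -> page text."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token][url] = index[token].get(url, 0) + 1
    return index


def partition_key(token):
    """Bucket by first letter; digit-initial tokens share one 'other' bucket."""
    first = token[0]
    return first if "a" <= first <= "z" else "other"


def write_partitions(index, out_dir):
    """Write one JSON file per bucket, e.g. out_dir/s.json (assumed layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    buckets = defaultdict(dict)
    for token, postings in index.items():
        buckets[partition_key(token)][token] = postings
    for key, terms in buckets.items():
        (out_dir / f"{key}.json").write_text(json.dumps(terms))


def load_postings(token, index_dir):
    """Read only the partition that can contain `token`; return its postings."""
    path = Path(index_dir) / f"{partition_key(token)}.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text()).get(token, {})
```

With this layout, a query for "search" only needs to read `s.json`, which is the partial-loading benefit the description refers to: memory use at query time scales with the partitions touched rather than with the whole index.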

Index creation over the full 56K-page corpus takes approximately 2 hours. Custom tokenization and algorithmic optimizations reduced average query response time by 35%.
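The page does not say how results were scored, so the following is only one plausible shape for the top-K ranking step. It assumes a plain TF-IDF sum per page and uses `heapq.nlargest` so that only the best K pages are kept instead of sorting every candidate. The function `top_k` and the `postings_for` callable (anything that returns `{url: term frequency}` for a token, such as a reader for a single index partition) are hypothetical names.

```python
import heapq
import math
from collections import defaultdict


def top_k(query_tokens, postings_for, total_docs, k=10):
    """Return up to k (url, score) pairs, highest score first.

    Scoring is an assumed TF-IDF sum, not the project's confirmed method.
    """
    scores = defaultdict(float)
    for token in set(query_tokens):  # score each distinct query term once
        postings = postings_for(token)
        if not postings:
            continue  # term does not occur in the corpus
        idf = math.log(total_docs / len(postings))
        for url, tf in postings.items():
            scores[url] += (1 + math.log(tf)) * idf
    # nlargest avoids a full sort when k is much smaller than the candidate set
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Tiny in-memory stand-in for the on-disk index (illustrative data only).
toy_index = {"python": {"a": 3, "b": 1}, "search": {"a": 1}}
results = top_k(["python", "search"], lambda t: toy_index.get(t, {}), total_docs=4, k=2)
```

Here page `a` ranks above page `b` because it matches both query terms and has the higher term frequency for "python".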

§G gallery

[img.01] Search Engine — terminal search results over 56,000 indexed pages
