author | jwansek <eddie.atten.ea29@gmail.com> | 2021-12-14 12:07:25 +0000 |
---|---|---|
committer | jwansek <eddie.atten.ea29@gmail.com> | 2021-12-14 12:07:25 +0000 |
commit | 24cd893099c570696a3796642a2a05d17872c39e (patch) | |
tree | f9a954e2bfda87e1bf483a155967d72d187cbd27 | |
parent | df2a6e1a882b9b85f42f391f2bd129c9663c9e42 (diff) | |
-rw-r--r-- | README.md | 32 |
1 files changed, 31 insertions, 1 deletions
@@ -1 +1,31 @@
-# searchEngine
\ No newline at end of file
+# searchEngine
+
+## Setup
+
+- `sudo pip3 install -r requirements.txt`
+
+- Install nltk and spacy `en_core_web_sm`
+
+## Index files
+
+- Unzip Wikibooks.zip to a given directory
+
+- Run `documents.py` with the first argument as the path to the HTML files:
+
+- e.g. `python3 documents.py ../../Wikibooks`
+
+- Run `terms.py`. This took me a few days. If it stops you can just run it again and it'll automatically restart at the correct place
+
+## Setting up TF-IDF weighting
+
+- `python3 tf_idf.py`
+
+## Searching!
+
+- You can use `search.py` to conduct searches. Make search terms command line arguments.
+
+- e.g. `python3 search.py AQA GCSE Computer Science`
+
+- Results are printed to stdout, to `searches/` as a markdown file
+
+- It will be rendered as HTML and shown in a web browser automatically
\ No newline at end of file
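
The README added in this commit names TF-IDF weighting and term-based searching but does not show how `tf_idf.py` or `search.py` work internally. As a rough illustration of the general technique only, here is a minimal in-memory sketch. The function names `tf_idf_index` and `search`, the toy documents, and the plain `tf * log(N / df)` weighting are all assumptions made for this example and are not taken from the repository, which may store its index differently and normalise tokens (for instance with nltk or spacy) before weighting.

```python
import math
from collections import Counter


def tf_idf_index(docs):
    """Build {doc_id: {term: weight}} from {doc_id: [tokens]} (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    index = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        # Weight = relative term frequency * log(inverse document frequency).
        index[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return index


def search(index, query_terms):
    """Rank documents by the sum of their TF-IDF weights for the query terms."""
    scores = {
        doc_id: sum(weights.get(term, 0.0) for term in query_terms)
        for doc_id, weights in index.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Drop documents that match none of the query terms.
    return [(doc_id, score) for doc_id, score in ranked if score > 0]


# Toy corpus, already tokenised and lowercased (an assumption of this sketch).
docs = {
    "a": ["gcse", "computer", "science", "revision"],
    "b": ["computer", "architecture", "basics"],
    "c": ["cooking", "with", "herbs"],
}
index = tf_idf_index(docs)
results = search(index, ["gcse", "computer"])
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this toy corpus, document `a` ranks above `b` because it contains the rarer term `gcse` as well as `computer`, while `c` matches neither term and is dropped.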