author: jwansek <eddie.atten.ea29@gmail.com> 2021-12-14 12:07:25 +0000
committer: jwansek <eddie.atten.ea29@gmail.com> 2021-12-14 12:07:25 +0000
commit: 24cd893099c570696a3796642a2a05d17872c39e
tree: f9a954e2bfda87e1bf483a155967d72d187cbd27
parent: df2a6e1a882b9b85f42f391f2bd129c9663c9e42
added to the README (HEAD, master)
-rw-r--r-- README.md 32
1 files changed, 31 insertions, 1 deletions
diff --git a/README.md b/README.md
index e5cb074..98c90d3 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,31 @@
-# searchEngine
\ No newline at end of file
+# searchEngine
+
+## Setup
+
+- `sudo pip3 install -r requirements.txt`
+
+- Install nltk and spacy `en_core_web_sm`
+
+## Index files
+
+- Unzip Wikibooks.zip to a given directory
+
+- Run `documents.py` with the first argument as the path to the HTML files:
+
+- e.g. `python3 documents.py ../../Wikibooks`
+
+- Run `terms.py`. This took me a few days. If it stops you can just run it again and it'll automatically restart at the correct place
+
+## Setting up TF-IDF weighting
+
+- `python3 tf_idf.py`
+
+## Searching!
+
+- You can use `search.py` to conduct searches. Make search terms command line arguments.
+
+- e.g. `python3 search.py AQA GCSE Computer Science`
+
+- Results are printed to stdout, to `searches/` as a markdown file
+
+- It will be rendered as HTML and shown in a web browser automatically
\ No newline at end of file
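
The README names TF-IDF weighting as a step but does not say how the weights are computed. As a rough illustration only, the sketch below shows one common formulation (term frequency multiplied by the log of inverse document frequency). It is not the contents of `tf_idf.py`: the function name `tf_idf`, the in-memory dictionary layout, and the sample documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights.

    documents: dict mapping a document id to its list of tokens
    (hypothetical layout, chosen for this sketch).
    Returns a dict mapping document id -> {term: weight}.
    """
    n_docs = len(documents)

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in documents.values():
        df.update(set(tokens))

    weights = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        # tf = share of the document taken up by the term,
        # idf = log(N / df); a term present in every document scores 0.
        weights[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


# Made-up sample corpus, already tokenised and lowercased.
docs = {
    "a": ["gcse", "computer", "science"],
    "b": ["computer", "hardware"],
    "c": ["science", "fiction"],
}
weights = tf_idf(docs)
```

In this sample, `gcse` appears in only one document, so it is weighted higher in document `a` than `computer`, which appears in two. A real index would likely store these weights in a database rather than in memory.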