Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 32
1 file changed, 31 insertions, 1 deletion
diff --git a/README.md b/README.md
index e5cb074..98c90d3 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,31 @@
-# searchEngine
\ No newline at end of file
+# searchEngine
+
+## Setup
+
+- `sudo pip3 install -r requirements.txt`
+
+- Install nltk and the spaCy model `en_core_web_sm`
+
+## Index files
+
+- Unzip `Wikibooks.zip` into a directory of your choice
+
+- Run `documents.py`, passing the path to the unzipped HTML files as the first argument:
+
+- e.g. `python3 documents.py ../../Wikibooks`
+
+- Run `terms.py`. This took me a few days. If it stops, just run it again and it will automatically resume where it left off
+
+## Setting up TF-IDF weighting
+
+- `python3 tf_idf.py`
+
+## Searching!
+
+- Use `search.py` to run searches, passing the search terms as command-line arguments
+
+- e.g. `python3 search.py AQA GCSE Computer Science`
+
+- Results are printed to stdout and also saved to `searches/` as a Markdown file
+
+- The Markdown file is then rendered as HTML and opened in a web browser automatically
\ No newline at end of file
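The README explains how to run `tf_idf.py` and `search.py` but not what they compute. For readers new to the technique, here is a minimal sketch of one common TF-IDF variant (raw term count multiplied by log(N / df)) together with a simple additive ranking of documents for a query. It is not taken from this repository: the function names, the `{doc_id: tokens}` input shape and the scoring rule are assumptions, and the tokens are assumed to have already been produced by the same nltk/spaCy preprocessing for both documents and queries.

```python
from __future__ import annotations

import math
from collections import Counter


def tf_idf_weights(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df).

    tf is the raw count of the term in the document, N is the number of
    documents and df is the number of documents containing the term.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term at most once per document
    return {
        doc_id: {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }


def search(query: list[str], weights: dict[str, dict[str, float]], k: int = 10) -> list[tuple[str, float]]:
    """Rank documents by the summed weights of the query terms they contain."""
    scores = Counter()
    for doc_id, doc_weights in weights.items():
        for term in query:
            if term in doc_weights:
                scores[doc_id] += doc_weights[term]
    return scores.most_common(k)  # highest-scoring documents first


if __name__ == "__main__":
    docs = {
        "a.html": ["gcse", "computer", "science"],
        "b.html": ["gcse", "history"],
        "c.html": ["computer", "networks"],
    }
    weights = tf_idf_weights(docs)
    # a.html matches both query terms, c.html only "computer", b.html neither
    print(search(["computer", "science"], weights))
```

The idf factor is what makes rare terms count for more: in this variant a term that appears in every document gets log(1) = 0 and contributes nothing to any score.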
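The README says `terms.py` resumes from the correct place when it is re-run, but not how. One common way to get that behaviour is a checkpoint file listing the documents already finished, rewritten atomically after each one. The sketch below illustrates that pattern only; the checkpoint file name, `process_all`, `load_done` and `save_done` are hypothetical and not taken from the repository.

```python
from __future__ import annotations

import json
import os
from typing import Callable, Iterable

CHECKPOINT = "terms_checkpoint.json"  # hypothetical checkpoint file name


def load_done(path: str = CHECKPOINT) -> set[str]:
    """Return the IDs of documents finished by earlier runs (empty on first run)."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()


def save_done(done: set[str], path: str = CHECKPOINT) -> None:
    """Write the checkpoint to a temp file, then atomically replace the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)  # a crash never leaves a half-written checkpoint


def process_all(doc_ids: Iterable[str], process: Callable[[str], None], path: str = CHECKPOINT) -> set[str]:
    """Run `process` on every document not yet recorded in the checkpoint."""
    done = load_done(path)
    for doc_id in doc_ids:
        if doc_id in done:
            continue  # finished in a previous run, skip it
        process(doc_id)
        done.add(doc_id)
        save_done(done, path)  # record progress only after success
    return done
```

Because a document is recorded only after `process` returns, a crash part-way through one document means it is processed again on the next run, so `process` should be safe to repeat. Rewriting the whole set after every document is simple but gets slower as the corpus grows; saving every few hundred documents is a reasonable trade-off for a job that runs for days.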