From 9c1af7071738298d0885d399dfc3ba06826c314e Mon Sep 17 00:00:00 2001
From: Tikhon Vodyanov
Date: Sat, 2 Aug 2025 13:36:15 +0200
Subject: [PATCH] README

---
 .DS_Store | Bin 6148 -> 6148 bytes
 README.md | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/.DS_Store b/.DS_Store
index 14a15a24afedb9e3f39906eb9ebcaa9ad3a319ba..665975a27634a21fe48dd4dea719bf39e7da15b0 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/README.md b/README.md
index dbc662c..4e835dd 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,122 @@
-# team-6
+# Project Genesis Backend
+This repository contains the backend service for Project Genesis, an application that combines a Retrieval-Augmented Generation (RAG) system with image generation capabilities.
+
+## Description
+
+Project Genesis Backend is a Node.js service built with Express.js. It leverages a Qdrant vector database and LlamaIndex to provide advanced question-answering capabilities based on a custom knowledge base. The system can process and understand a variety of documents, allowing users to ask complex questions and receive accurate, context-aware answers.
+
+Additionally, the backend integrates with Google's Vertex AI to offer image generation features, allowing visuals to be created from textual descriptions.
+
+## Core Technologies
+
+- **Backend Framework:** Node.js, Express.js
+- **Vector Database:** Qdrant
+- **LLM Orchestration:** LlamaIndex
+- **Image Generation:** Google Vertex AI
+
+## Configuration (.env)
+
+Before running the application, you need to set up your environment variables. Create a `.env` file in the root of the project by copying the `.env.example` file (if one is provided) or by creating a new one.
+
+```bash
+cp .env.example .env
+```
+
+### Key Environment Variables
+
+- **LLM Configuration:** While the demo was built using OpenAI keys (`OPENAI_API_KEY`, `OPENAI_MODEL`), LlamaIndex is highly flexible: you can configure it to use any open-source or self-hosted Large Language Model (LLM) of your choice.
+
+- **Image Generation (Google Vertex AI):** To enable image generation, you need to:
+  1. Set up a Google Cloud project with the Vertex AI API enabled.
+  2. Create a service account with the necessary permissions for Vertex AI.
+  3. Download the JSON key file for the service account.
+  4. Provide the path to this JSON key file in your `.env` file.
+
+## Getting Started
+
+Follow these steps to set up and run the backend service on your local machine.
+
+### Prerequisites
+
+- [Node.js and npm](https://nodejs.org/en/)
+- [Docker](https://www.docker.com/get-started)
+
+### 1. Clone the Repository
+
+```bash
+git clone https://github.com/GVodyanov/plant-desc-parser.git
+cd plant-desc-parser
+```
+
+### 2. Install Dependencies
+
+Install the required Node.js packages using npm:
+
+```bash
+npm install
+```
+
+### 3. Set Up Qdrant Vector Database
+
+Qdrant is used to store the document embeddings for the RAG system. The easiest way to get it running is with Docker.
+
+- **Download the Qdrant image:**
+  ```bash
+  docker pull qdrant/qdrant
+  ```
+
+- **Run the Qdrant container:**
+  This command starts a Qdrant container and maps port `6333` to your local machine.
+  It also mounts a local directory (`./storage/qdrant`) to persist the vector data, ensuring your data is not lost when the container is stopped or removed.
+
+  ```bash
+  docker run -p 6333:6333 -v $(pwd)/storage/qdrant:/qdrant/storage qdrant/qdrant
+  ```
+
+### 4. Create Embeddings
+
+The knowledge base, consisting of markdown files located in the `/storage` directory, needs to be processed and stored in the Qdrant vector database.
+
+Run the following script to create the embeddings:
+
+```bash
+node createEmbeddings.js
+```
+
+### 5. Run the Server
+
+Once the setup is complete, you can start the Express server:
+
+```bash
+npm start
+```
+
+or
+
+```bash
+node index.js
+```
+
+The server will run on the port specified in your `.env` file (defaults to 3000).
+
+## Customizing the RAG Data
+
+You can easily customize the knowledge base of the RAG system by adding your own data.
+
+### Adding New Documents
+
+Place your own sliced markdown files in the `/storage` directory. The `createEmbeddings.js` script automatically processes all `.md` files in this folder and its subdirectories.
+
+### Converting Scientific PDFs to Markdown
+
+For converting complex documents such as scientific PDFs into clean markdown, we recommend [Marker](https://github.com/datalab-to/marker), a powerful tool that accurately extracts text, tables, and other elements from PDFs.
+
+### Slicing Markdown Files
+
+After converting your documents to markdown, you need to slice them into smaller, more manageable chunks for the RAG system. This helps improve the accuracy of the retrieval process.
+
+We recommend using the `UnstructuredMarkdownLoader` with the `mode="elements"` option for the best results. This splits the markdown file by its headers, titles, and other structural elements.
+
+For a detailed guide on how to implement this, refer to the following example Colab notebook:
+
+[LangChain Unstructured Markdown Loader Example](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/document_loaders/unstructured_markdown.ipynb)
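If you would rather not pull in the LangChain stack for this step, a much cruder stdlib-only approximation is to split on second-level headings and write each chunk into `/storage` yourself. This is only a sketch: the `storage/sliced` output directory, the `##` heading convention, and the sample text below are all assumptions, and it recovers far less structure than element-based splitting:

```python
# Rough stdlib-only sketch: split markdown at "## " headings into chunk files.
import re
from pathlib import Path

def slice_markdown(text: str) -> list[str]:
    """Split markdown into chunks, keeping each heading with its section."""
    # Lookahead split: break immediately before each line starting with "## ".
    parts = re.split(r"(?m)^(?=## )", text)
    return [p.strip() for p in parts if p.strip()]

# Hypothetical sample document; in practice, read your converted .md files.
source = "# Ficus\n\n## Watering\n\nWeekly in summer.\n\n## Light\n\nBright, indirect.\n"

out_dir = Path("storage/sliced")
out_dir.mkdir(parents=True, exist_ok=True)
for i, chunk in enumerate(slice_markdown(source)):
    # Writes chunk_000.md, chunk_001.md, ... for createEmbeddings.js to pick up.
    (out_dir / f"chunk_{i:03d}.md").write_text(chunk + "\n")
```

For the sample above this produces three files: the document title, the Watering section, and the Light section.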