Commit 9c1af70717 (parent 231b83afb5)
Tikhon Vodyanov, 2025-08-02 13:36:15 +02:00
2 changed files, 121 additions and 1 deletion

.DS_Store (vendored) — binary file not shown

README.md (+121, −1)
# Project Genesis Backend
This repository contains the backend service for Project Genesis, a powerful application that combines a Retrieval-Augmented Generation (RAG) system with image generation capabilities.
## Description
Project Genesis Backend is a Node.js service built with Express.js. It leverages a Qdrant vector database and LlamaIndex to provide advanced question-answering capabilities based on a custom knowledge base. The system can process and understand a variety of documents, allowing users to ask complex questions and receive accurate, context-aware answers.
Additionally, the backend integrates with Google's Vertex AI to generate images from textual descriptions.
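At a high level, answering a question follows the usual retrieve-then-generate loop. The sketch below shows only that control flow; `embed`, `searchQdrant`, and `callLLM` are toy stand-ins for the real LlamaIndex, Qdrant, and LLM calls, not this codebase's actual functions:

```javascript
// Stubbed sketch of the retrieve-then-generate flow behind a RAG answer.
// embed(), searchQdrant(), and callLLM() are placeholders; only the
// control flow is meant to be accurate.
function answerQuestion(question, embed, searchQdrant, callLLM) {
  const queryVector = embed(question);            // 1. embed the question
  const topChunks = searchQdrant(queryVector, 3); // 2. retrieve nearest chunks
  const context = topChunks.join("\n---\n");      // 3. assemble the context
  return callLLM(`Context:\n${context}\n\nQuestion: ${question}`); // 4. generate
}

// Toy stand-ins to show the flow end to end.
const answer = answerQuestion(
  "How often should I water a ficus?",
  (q) => [q.length],                // fake embedding
  () => ["Ficus: water weekly."],   // fake retrieval
  (prompt) =>
    `Based on the context: ${prompt.includes("water weekly") ? "water weekly." : "unknown."}`
);
console.log(answer); // "Based on the context: water weekly."
```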
## Core Technologies
- **Backend Framework:** Node.js, Express.js
- **Vector Database:** Qdrant
- **LLM Orchestration:** LlamaIndex
- **Image Generation:** Google Vertex AI
## Configuration (.env)
Before running the application, you need to set up your environment variables. Create a `.env` file in the root of the project by copying the `.env.example` file (if one is provided) or by creating a new one.
```bash
cp .env.example .env
```
### Key Environment Variables:
- **LLM Configuration:** While the demo was built using OpenAI keys (`OPENAI_API_KEY`, `OPENAI_MODEL`), LlamaIndex is highly flexible. You can easily configure it to use any open-source or self-hosted Large Language Model (LLM) of your choice.
- **Image Generation (Google Vertex AI):** To enable image generation, you need to:
1. Set up a Google Cloud project with the Vertex AI API enabled.
2. Create a service account with the necessary permissions for Vertex AI.
3. Download the JSON key file for the service account.
4. Provide the path to this JSON key file in your `.env` file.
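Putting these together, a filled-in `.env` might look roughly like the following. `OPENAI_API_KEY` and `OPENAI_MODEL` are named above; `PORT`, `QDRANT_URL`, and `GOOGLE_APPLICATION_CREDENTIALS` (the conventional Google Cloud variable) are illustrative guesses, so match the names to whatever the code actually reads:

```bash
# LLM configuration (swap in your own provider/model if you like)
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-4o-mini            # model name is illustrative

# Server port (the README notes a default of 3000)
PORT=3000

# Qdrant endpoint (variable name is a guess; port 6333 per the Docker setup)
QDRANT_URL=http://localhost:6333

# Path to the Vertex AI service-account JSON key (hypothetical path)
GOOGLE_APPLICATION_CREDENTIALS=./keys/vertex-service-account.json
```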
## Getting Started
Follow these steps to set up and run the backend service on your local machine.
### Prerequisites
- [Node.js and npm](https://nodejs.org/en/)
- [Docker](https://www.docker.com/get-started)
### 1. Clone the Repository
```bash
git clone https://github.com/GVodyanov/plant-desc-parser.git
cd plant-desc-parser
```
### 2. Install Dependencies
Install the required Node.js packages using npm:
```bash
npm install
```
### 3. Set Up Qdrant Vector Database
Qdrant is used to store the document embeddings for the RAG system. The easiest way to get it running is with Docker.
- **Download the Qdrant image:**
```bash
docker pull qdrant/qdrant
```
- **Run the Qdrant container:**
This command starts a Qdrant container and maps container port `6333` to the same port on your local machine. It also mounts a local directory (`./storage/qdrant`) into the container to persist the vector data, so your data survives when the container is stopped or removed.
```bash
docker run -p 6333:6333 -v $(pwd)/storage/qdrant:/qdrant/storage qdrant/qdrant
```
### 4. Create Embeddings
The knowledge base, consisting of markdown files located in the `/storage` directory, needs to be processed and stored in the Qdrant vector database.
Run the following script to create the embeddings:
```bash
node createEmbeddings.js
```
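Conceptually, each markdown chunk ends up in Qdrant as a *point*: an id, the chunk's embedding vector, and a JSON payload carried alongside it. The payload fields below are illustrative — LlamaIndex writes its own schema — but the id/vector/payload shape is how Qdrant stores data:

```javascript
// Sketch of the shape of a Qdrant point built from one markdown chunk.
// The payload fields are illustrative; LlamaIndex defines its own schema.
function chunkToPoint(id, embedding, chunkText, sourceFile) {
  return {
    id,                 // unique point id
    vector: embedding,  // the chunk's embedding vector
    payload: {          // arbitrary JSON stored alongside the vector
      text: chunkText,
      source: sourceFile,
    },
  };
}

const point = chunkToPoint(
  1,
  [0.12, -0.03, 0.88], // a real embedding has hundreds of dimensions
  "# Ficus care\nWater weekly.",
  "storage/ficus.md"
);
console.log(point.payload.source); // "storage/ficus.md"
```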
### 5. Run the Server
Once the setup is complete, you can start the Express server:
```bash
npm start
```
or
```bash
node index.js
```
The server will be running on the port specified in your `.env` file (defaults to 3000).
## Customizing the RAG Data
You can easily customize the knowledge base of the RAG system by adding your own data.
### Adding New Documents
Place your own sliced markdown files in the `/storage` directory. The `createEmbeddings.js` script will automatically process all `.md` files in this folder and its subdirectories.
### Converting Scientific PDFs to Markdown
For converting complex documents like scientific PDFs into clean markdown, we recommend using [Marker](https://github.com/datalab-to/marker). It is a powerful tool that can accurately extract text, tables, and other elements from PDFs.
### Slicing Markdown Files
After converting your documents to markdown, you need to slice them into smaller, more manageable chunks for the RAG system. This helps improve the accuracy of the retrieval process.
We recommend using the `UnstructuredMarkdownLoader` with the `mode="elements"` option for the best results. This will split the markdown file by its headers, titles, and other structural elements.
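For intuition, header-based splitting can be sketched in a few lines of JavaScript. This is a deliberate simplification of what element-mode loaders do — real ones also handle tables, lists, and nested structure:

```javascript
// Split a markdown string into chunks at every heading line (#, ##, ...).
// A simplified sketch of header-based chunking, not a full element parser.
function splitByHeaders(markdown) {
  const chunks = [];
  let current = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join("\n").trim()); // close the previous chunk
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n").trim());
  return chunks.filter((c) => c.length > 0);
}

const doc =
  "# Ficus\nWater weekly.\n\n## Light\nBright, indirect.\n\n## Soil\nWell-draining mix.";
const chunks = splitByHeaders(doc);
console.log(chunks.length); // 3
console.log(chunks[1]);     // "## Light\nBright, indirect."
```

Each resulting chunk keeps its heading, which gives the retriever useful context about what the chunk is about.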
For a detailed guide on how to implement this, you can refer to the following example Colab notebook:
[LangChain Unstructured Markdown Loader Example](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/document_loaders/unstructured_markdown.ipynb)