
How does one use vllm with pytorch 2.2.2 and python 3.11?

I'm trying to use the vllm library with pytorch 2.2.2 and python 3.11. Based on the GitHub issues, it seems vllm 0.4.1 supports python 3.11.

However, I'm running into incompatible pytorch versions when installing vllm. The GitHub issue mentions needing to build from source to use pytorch 2.2, but the pip-installed version still pulls in an older pytorch.

I tried creating a fresh conda environment with python 3.11 and installing vllm:

$ conda create -n vllm_test python=3.11
$ conda activate vllm_test
(vllm_test) $ pip install vllm
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
vllm 0.4.1 requires torch==2.1.2, but you have torch 2.2.2 which is incompatible.

I also tried installing pytorch 2.2.2 first and then vllm:

(vllm_test) $ pip install torch==2.2.2
(vllm_test) $ pip install vllm
...
Building wheels for collected packages: vllm
  Building wheel for vllm (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for vllm (pyproject.toml) did not run successfully.
  │ exit code: 1

Can someone clarify which versions of vllm, pytorch, and python currently work together? Is there a recommended clean setup for using vllm with the latest pytorch 2.2.2 and python 3.11?

I've tried creating fresh conda environments, but still run into version conflicts. Any guidance on the right installation steps would be much appreciated. Thanks!
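
For completeness, this is the kind of sanity check I run after a fresh pip install vllm (letting pip resolve torch itself rather than pre-installing it), just to see which versions actually got installed; the expected values below simply mirror the resolver message above:

import sys
import torch
import vllm

print(sys.version)          # expecting 3.11.x
print(torch.__version__)    # expecting 2.1.2, the version vllm 0.4.1 pins
print(vllm.__version__)     # expecting 0.4.1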

ref: https://github.com/vllm-project/vllm/issues/2747

How to chat with multiple pdfs (that have different information) using langchain?

Currently I have managed to make a web interface to chat with a single PDF document, using langchain as the framework, OpenAI as the LLM, and Pinecone as the vector store. However, when I wanted to add new PDF documents (5 new ones) to the vector store, I realized that their information is different from the first document's.

I have thought about pushing the embeddings of all the PDF documents into Pinecone. But I'm unsure whether the information might get mixed across documents when a query is meant to target only one specific PDF.

Another way I'm considering is to add selectors in the web interface so the user can choose which PDF they want answers from, and the query would then be directed to that specific PDF. But then the user's interaction with the interface would not be as seamless.

That's why I want to find a way to send all the PDF documents to Pinecone and, perhaps within the vector store itself, keep an index (or a separate namespace/collection) for each document. I'd appreciate advice from anyone who has worked on something similar.
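
Below is a rough sketch of the metadata-filter idea I'm weighing (the file names and index name are illustrative, it assumes the Pinecone client is already initialized the same way as in my single-PDF setup, and I still need to verify the exact filter syntax against my Pinecone/langchain versions):

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

docs = []
for path in ["report_a.pdf", "report_b.pdf"]:   # illustrative file names
    # PyPDFLoader records the file path in each Document's "source" metadata
    docs.extend(PyPDFLoader(path).load())

vectorstore = Pinecone.from_documents(docs, OpenAIEmbeddings(), index_name="my-index")

# Restrict retrieval to a single PDF by filtering on the "source" metadata field
results = vectorstore.similarity_search(
    "What does report A say about revenue?",
    k=4,
    filter={"source": "report_a.pdf"},
)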

Loading different document types in langchain for an all data source qa bot

I am trying to build an application that can be used to chat with multiple types of data using the different langchain document loaders, with streamlit to build the interface.

I am unable to load the files properly with the langchain document loaders.

Here is the loader mapping dict:

FILE_LOADER_MAPPING = {
    ".csv": (CSVLoader, {"encoding": "utf-8"}),
    ".doc": (UnstructuredWordDocumentLoader, {}),
    ".docx": (UnstructuredWordDocumentLoader, {}),
    ".epub": (UnstructuredEPubLoader, {}),
    ".html": (UnstructuredHTMLLoader, {}),
    ".md": (UnstructuredMarkdownLoader, {}),
    ".odt": (UnstructuredODTLoader, {}),
    ".pdf": (PyPDFLoader, {}),
    ".ppt": (UnstructuredPowerPointLoader, {}),
    ".pptx": (UnstructuredPowerPointLoader, {}),
    ".txt": (TextLoader, {"encoding": "utf8"}),
    ".ipynb": (NotebookLoader, {}),
    ".py": (PythonLoader, {}),
 
}

Here is the main function:

def main():
    st.title("Docuverse")

    # Upload files
    uploaded_files = st.file_uploader("Upload your documents", type=["pdf", "md", "txt", "csv", "py", "epub", "html", "ppt", "pptx", "doc", "docx", "odt", "ipynb"], accept_multiple_files=True)
    loaded_documents = []
    if uploaded_files:
        # Process uploaded files
        for uploaded_file in uploaded_files:
            st.write(f"Uploaded: {uploaded_file.name}")
            st.write(f"Uploaded: {type(uploaded_file)}")
            ext = os.path.splitext(uploaded_file.name)[-1][1:].lower()
            if ext in FILE_LOADER_MAPPING:
                loader_class, loader_args = FILE_LOADER_MAPPING[ext]
                loader = loader_class(uploaded_file, **loader_args)
            else:
                loader = UnstructuredFileLoader(uploaded_file)
            loaded_documents.extend(loader.load())

        st.write("Chat with the Document:")
        query = st.text_input("Ask a question:")

        if st.button("Get Answer"):
            if query:
                # Load model, set prompts, create vector database, and retrieve answer
                try:
                    llm = load_model()
                    prompt = set_custom_prompt()
                    CONDENSE_QUESTION_PROMPT = set_custom_prompt_condense()
                    db = create_vector_database(loaded_documents)
                    response = retrieve_bot_answer(query)

                    # Display bot response
                    st.write("Bot Response:")
                    st.write(response)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
            else:
                st.warning("Please enter a question.")

if __name__ == "__main__":
    main()

I am uploading a PDF named protector.pdf and the error I get is:

TypeError: expected str, bytes or os.PathLike object, not UploadedFile


File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
File "/home/user/app/app.py", line 395, in <module>
    main()
File "/home/user/app/app.py", line 371, in main
    loaded_documents.extend(loader.load())
File "/home/user/.local/lib/python3.10/site-packages/langchain/document_loaders/unstructured.py", line 86, in load
    elements = self._get_elements()
File "/home/user/.local/lib/python3.10/site-packages/langchain/document_loaders/unstructured.py", line 172, in _get_elements
    return partition(filename=self.file_path, **self.unstructured_kwargs)
File "/home/user/.local/lib/python3.10/site-packages/unstructured/partition/auto.py", line 212, in partition
    filetype = detect_filetype(
File "/home/user/.local/lib/python3.10/site-packages/unstructured/file_utils/filetype.py", line 244, in detect_filetype
    _, extension = os.path.splitext(_filename)
File "/usr/local/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)

Here is the full code - link

I am not sure if I am correctly handling the uploaded files.

How can I resolve this?
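
One direction I'm considering is writing the Streamlit upload to a temporary file so the loader gets a real path, and keeping the leading dot on the extension so the lookup actually matches the keys in FILE_LOADER_MAPPING. This is only an untested sketch of that idea:

import os
import tempfile

def load_uploaded_file(uploaded_file):
    # Keep the leading dot so the key matches FILE_LOADER_MAPPING (".pdf", not "pdf")
    ext = os.path.splitext(uploaded_file.name)[-1].lower()
    loader_class, loader_args = FILE_LOADER_MAPPING.get(ext, (UnstructuredFileLoader, {}))

    # langchain loaders expect a filesystem path, not Streamlit's UploadedFile object,
    # so write the uploaded bytes to a temporary file first
    with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp:
        tmp.write(uploaded_file.getvalue())
        tmp_path = tmp.name
    try:
        return loader_class(tmp_path, **loader_args).load()
    finally:
        os.remove(tmp_path)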

I don't understand how the prompts work in llama_index

I have been trying to query a PDF file in my local directory using an LLM. I have downloaded the model I'm using to my local system (GPT4All-13B-snoozy.ggmlv3.q4_0.bin) and I'm using langchain together with Hugging Face's instructor-large model for embeddings. I was able to set the service_context and build the index, but I'm not able to query; I keep getting this error regarding the prompt.

I'm just starting to learn how to use LLMs; I hope the community can help me.

[error message screenshot: part 1]

[error message screenshot: part 2]

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from InstructorEmbedding import INSTRUCTOR
from llama_index import PromptHelper, ServiceContext
from llama_index import LangchainEmbedding
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import OpenLLM
# from langchain.chat_models.human import HumanInputChatModel
from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

documents = SimpleDirectoryReader(r'C:\Users\avish.wagde\Documents\work_avish\LLM_trials\instructor_large').load_data()

model_id = 'hkunlp/instructor-large'

model_path = r"..\models\GPT4All-13B-snoozy.ggmlv3.q4_0.bin"

callbacks = [StreamingStdOutCallbackHandler()]

# Verbose is required to pass to the callback manager
llm = GPT4All(model = model_path, callbacks=callbacks, verbose=True)

embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name = model_id))

# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 0.2

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

service_context = ServiceContext.from_defaults(chunk_size= 1024, llm_predictor=llm, prompt_helper=prompt_helper, embed_model=embed_model)

index = VectorStoreIndex.from_documents(documents, service_context= service_context)

query_engine = index.as_query_engine()

response = query_engine.query("What is apple's financial situation?")
print(response)

I have been going through the source code of the library, as the error message suggests, but I couldn't find the problem. 😓
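
One thing I'm planning to try next, based on other examples I've seen (this assumes ServiceContext.from_defaults can take the LLM directly via the llm= keyword instead of llm_predictor=, which I haven't confirmed for my llama_index version):

from llama_index import ServiceContext, VectorStoreIndex

service_context = ServiceContext.from_defaults(
    llm=llm,                      # pass the GPT4All LLM itself, not as llm_predictor
    embed_model=embed_model,
    prompt_helper=prompt_helper,
    chunk_size=1024,
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What is apple's financial situation?"))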

How do GPT-style LLMs produce embeddings?

I'm aware of how image models produce embeddings: you feed an image to the model and look at the activations in one of the last layers. I don't think this approach generalizes to LLMs, though.

For example, let's say you embed a document using a GPT model. I could envision the embedding being calculated in a variety of ways (all using one of the last layers of the LLM):

  • The LLM produces an embedding for each output token in the document and averages all of those token embeddings to arrive at a document-level embedding (a rough sketch of this is shown after this question)
  • The LLM returns only the last token's embedding
  • The LLM concatenates all output token embeddings where the embedding size is equal to the context length. Shorter documents with leftover context window get padded (perhaps with 0s) so that embedding dimensions are constant across all documents

Is there a broad pattern that most models (other than those of the GPT variety) use to produce embeddings, or are the methods wildly different from, say, GPT-3 to Llama-2?
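
To make the first option above concrete, here is a rough mean-pooling sketch using Hugging Face transformers; the model name and the pooling choice are purely illustrative, not how any particular embedding API is necessarily implemented:

import torch
from transformers import AutoModel, AutoTokenizer

# "gpt2" stands in for any GPT-style (decoder-only) checkpoint
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # GPT-2 has no pad token by default

texts = ["first document", "a second, longer document"]
batch = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # (batch, seq_len, hidden_dim)

# Mean-pool the token embeddings, ignoring padding positions
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                            # (2, 768) for gpt2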
