# Private Chatbot with Local LLM (Falcon 7B) and LangChain
Can you build a private chatbot with ChatGPT-like performance using a local LLM on a single GPU?

Mostly, yes! In this tutorial, we'll use Falcon 7B[1] with LangChain to build a chatbot that retains conversation memory. We can achieve decent performance by utilizing a single T4 GPU and loading the model in 8-bit (~6 tokens/second). We'll also explore techniques to improve the output quality and speed, such as:

* Stopping criteria: detect the start of LLM "rambling" and stop the generation
* Cleaning output: sometimes LLMs output strange/additional tokens; I'll show you how to clear those from the output
* Store chat history: we'll use memory to make sure your LLM remembers the conversation history

In this part, we will be using a Jupyter Notebook to run the code. If you prefer to follow along, you can find the notebook in the GitHub Repository.

## Setup

Let's start by installing the required dependencies:

```bash
!pip install -Uqqq pip --progress-bar off
!pip install -qqq bitsandbytes==0.40.0 --progress-bar off
!pip install -qqq torch==2.0.1 --progress-bar off
!pip install -qqq transformers==4.30.0 --progress-bar off
!pip install -qqq accelerate==0.21.0 --progress-bar off
!pip install -qqq xformers==0.0.20 --progress-bar off
!pip install -qqq einops==0.6.1 --progress-bar off
!pip install -qqq langchain==0.0.233 --progress-bar off
```

Here's the list of required imports:

```python
import re
import warnings
from typing import List

import torch
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.llms import HuggingFacePipeline
from langchain.schema import BaseOutputParser
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    pipeline,
)

warnings.filterwarnings("ignore", category=UserWarning)
```

## Load Model

We can load the model directly from the Hugging Face model hub:

```python
MODEL_NAME = "tiiuae/falcon-7b-instruct"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, trust_remote_code=True, load_in_8bit=True, device_map="auto"
)
model = model.eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
```

Note that we're loading the model in 8-bit mode. This reduces the memory footprint and speeds up inference. We're also using the `device_map` parameter to load the model on the GPU.
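This isn't part of the original notebook, but if you want to confirm what 8-bit loading buys you, `transformers` models expose `get_memory_footprint()`, and `accelerate` records the layer placement in `hf_device_map`:

```python
# Sanity check (optional): report the quantized model's memory footprint
# and where accelerate placed each module.
print(f"Memory footprint: {model.get_memory_footprint() / 1024 ** 3:.2f} GB")
print(model.hf_device_map)
```

On a single T4, the 8-bit footprint should land well under the card's 16 GB of VRAM, with the layers mapped to GPU 0.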
## Config

We'll use a custom configuration for the text generation:

```python
generation_config = model.generation_config
generation_config.temperature = 0
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.repetition_penalty = 1.7
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
generation_config
```

```
GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 11,
  "max_new_tokens": 256,
  "pad_token_id": 11,
  "repetition_penalty": 1.7,
  "temperature": 0,
  "transformers_version": "4.30.0",
  "use_cache": false
}
```

I like to set the temperature to 0 to get deterministic results. We'll also set the `repetition_penalty` to 1.7 to reduce the chance (but not completely remove the occurrences) of the model repeating itself.

## Try the Model

We're ready to try the model. We'll use the tokenizer to encode the prompt and then pass the `input_ids` to the model:

```python
prompt = """
The following is a friendly conversation between a human and an AI. The AI is
talkative and provides lots of specific details from its context.

Current conversation:

Human: Who is Dwight K Schrute?
AI:
""".strip()

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
    )
```

Note that we're moving the encoded `input_ids` to the CUDA device before running the inference.
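If you want to verify the ~6 tokens/second figure on your own hardware, here's a small timing sketch (my addition, not from the original tutorial) that reruns the same `generate` call and divides the number of newly generated tokens by the elapsed wall-clock time:

```python
import time

# Time a single generation pass and estimate the throughput.
start = time.perf_counter()
with torch.inference_mode():
    outputs = model.generate(input_ids=input_ids, generation_config=generation_config)
elapsed = time.perf_counter() - start

# Count only the tokens produced beyond the prompt.
new_tokens = outputs.shape[1] - input_ids.shape[1]
print(f"{new_tokens} new tokens in {elapsed:.1f}s ({new_tokens / elapsed:.1f} tokens/s)")
```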
We can use the tokenizer to decode the output into a human-readable format:

```python
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

LLM output:

```
The following is a friendly conversation between a human and an AI. The AI is
talkative and provides lots of specific details from its context.

Current conversation:

Human: Who is Dwight K Schrute?
AI: Dwight K Schrute is a fictional character in the American television series
"The Office". He is portrayed by actor Rainn Wilson and appears to be highly
intelligent, but socially awkward and often misinterprets social cues.
User
```

The output contains the full prompt and the generated response. Not bad, right? Let's see how we can improve it.

## Stop the LLM From Rambling

LLMs often have a tendency to go off-topic and generate irrelevant or nonsensical responses. While this is an ongoing research challenge, as a user of LLMs in real-world applications, there are ways to work around this behavior. We'll address this issue using a technique called `StoppingCriteria`[2] to help control the output and prevent the model from rambling or hallucinating questions and conversations:

```python
class StopGenerationCriteria(StoppingCriteria):
    def __init__(
        self, tokens: List[List[str]], tokenizer: AutoTokenizer, device: torch.device
    ):
        stop_token_ids = [tokenizer.convert_tokens_to_ids(t) for t in tokens]
        self.stop_token_ids = [
            torch.tensor(x, dtype=torch.long, device=device) for x in stop_token_ids
        ]

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        for stop_ids in self.stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False
```

The `__init__` method converts the tokens to their corresponding token IDs using the tokenizer and stores them as tensors on the model's device.

The `__call__` method is called during the generation process and takes the input IDs as input. It checks whether the last few tokens in the input IDs match any of the `stop_token_ids`, indicating that the model is starting to generate an undesired response. If a match is found, it returns `True`, indicating that the generation should be stopped. Otherwise, it returns `False` to continue the generation.

We'll implement a stopping criteria that detects when the LLM generates new tokens starting with `Human:` or `AI:`. When such tokens are detected, the generation process will be stopped to prevent undesired outputs:

```python
stop_tokens = [["Human", ":"], ["AI", ":"]]
stopping_criteria = StoppingCriteriaList(
    [StopGenerationCriteria(stop_tokens, tokenizer, model.device)]
)
```
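One caveat worth knowing (my note, not from the original post): `convert_tokens_to_ids` matches entries in the tokenizer's vocabulary, not raw strings, so each stop word has to exist as a single vocabulary token for the criteria to ever fire. A quick inspection of the resolved IDs:

```python
# Inspect the token IDs each stop sequence resolves to. If an entry comes
# back as the unknown-token ID (or None), that stop sequence can never
# match and should be rewritten.
for tokens in stop_tokens:
    print(tokens, "->", tokenizer.convert_tokens_to_ids(tokens))
```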
We'll create a pipeline that incorporates the stopping criteria and our generation configuration. This pipeline will handle the generation process and ensure that the stopping criteria are applied to control the output:

```python
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    stopping_criteria=stopping_criteria,
    generation_config=generation_config,
)

llm = HuggingFacePipeline(pipeline=generation_pipeline)
```

Using our pipeline is as simple as passing the prompt to it:

```python
res = llm(prompt)
print(res)
```

Pipeline output:

```
Dwight K Schrute is a fictional character in the American television series
"The Office". He is portrayed by actor Rainn Wilson and appears to be highly
intelligent, but socially awkward and often misinterprets social cues.
User
```

Notice that `User` at the end of the generated text? We'll take care of that in a bit.

## Conversation Chain

To engage in a conversation with the LLM, we'll utilize a `ConversationChain` from LangChain:

```python
chain = ConversationChain(llm=llm)
print(chain.prompt.template)
```

```
The following is a friendly conversation between a human and an AI. The AI is
talkative and provides lots of specific details from its context. If the AI
does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:
```

This chain already provides a default prompt, which is suitable for general purposes. However, it might not be ideal for our specific use case. Let's customize the prompt to better align with our needs.

## Custom Prompt

Imagine you're launching a new company and you need the AI to come up with a name, slogan, and marketing materials. As a huge fan of Dwight K. Schrute, you want to infuse his unique style into the AI-generated content. Let's combine these two elements and leverage the power of AI to fulfill your creative urges with this prompt:

```python
template = """
The following is a conversation between a human and an AI. The AI acts exactly
like Dwight K Schrute from the TV show The Office. Dwight is an experienced and
very successful online marketer and salesperson. He is witty, persuasive,
uncensored, direct and practical. Dwight helps with every marketing task that
is given to him. If Dwight does not know the answer to a question, he
truthfully says he does not know.

Current conversation:
{history}
Human: {input}
AI:
""".strip()

prompt = PromptTemplate(input_variables=["history", "input"], template=template)
```

Although the provided prompt is a good starting point, feel free to make any adjustments you like. Now, let's make sure that our chatbot remembers the previous conversation context while addressing the current question:

```python
memory = ConversationBufferWindowMemory(
    memory_key="history", k=6, return_only_outputs=True
)

chain = ConversationChain(llm=llm, memory=memory, prompt=prompt, verbose=True)
```

With the addition of the `ConversationBufferWindowMemory`, we can now store a limited number (`k`) of the most recent messages as conversation history. This memory is injected into the prompt whenever we pose a new question. Let's test our updated chain with the inclusion of this memory feature:

```python
text = "Think of a name for automaker that builds family cars with big V8 engines."
res = chain.predict(input=text)
print(res)
```

Chain output:

```
Schruteauto
User
```

Looks good, except for the addition of `User` at the end of the generated text. Let's fix that in the next section.

## Cleaning Output

To ensure clean output from our chatbot, we will customize the behavior by extending the `BaseOutputParser` class from LangChain. While output parsers[3] are typically used to extract structured responses from LLMs, in this case, we will create one specifically to remove the trailing `User` string from the generated output:

```python
class CleanupOutputParser(BaseOutputParser):
    def parse(self, text: str) -> str:
        user_pattern = r"\nUser"
        text = re.sub(user_pattern, "", text)
        human_pattern = r"\nHuman:"
        text = re.sub(human_pattern, "", text)
        ai_pattern = r"\nAI:"
        return re.sub(ai_pattern, "", text).strip()

    @property
    def _type(self) -> str:
        return "output_parser"
```

We need to pass this output parser to our chain to ensure that it is applied to the generated output:

```python
memory = ConversationBufferWindowMemory(
    memory_key="history", k=6, return_only_outputs=True
)

chain = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=prompt,
    output_parser=CleanupOutputParser(),
    verbose=True,
)
```
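Because the parser is plain Python, you can unit-test it in isolation before wiring it into the chain; this quick check (my addition) mirrors the trailing `User` we saw earlier:

```python
# The trailing "User" from the earlier output should be stripped away.
parser = CleanupOutputParser()
print(parser.parse("Schruteauto\nUser"))  # -> Schruteauto
```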
## Chat with the AI

To utilize the output parser, we can invoke the chain as if it were a function, enabling us to apply the parsing logic to the generated output:

```python
text = """
Think of a name for automaker that builds family cars with big V8 engines.
The name must be a single word and easy to pronounce.
""".strip()

res = chain(text)
```

The result is a dictionary containing the input, history, and response:

```python
res.keys()
```

```
dict_keys(['input', 'history', 'response'])
```

This is the new response:

```python
print(res["response"])
```

Chain output:

```
Schruteauto
```

Great! Looks clean and ready to use. Let's try another prompt:

```python
text = "Think of a slogan for the company"
res = chain(text)
print(res["response"])
```

Chain output:

```
Drive Big With SchruteAuto
```

Alright, how about a domain name?

```python
text = "Choose a domain name for the company"
res = chain(text)
print(res["response"])
```

Chain output:

```
schruteauto.com
```

The memory functionality of the chain is performing well: it retains the conversation context and remembers the specific details of the new automaker company. Let's try a more complex prompt:

```python
text = """
Write a tweet that introduces the company and introduces the first car built by the company
""".strip()

res = chain(text)
print(res["response"])
```

Chain output:

```
Introducing SchruteAuto! We build powerful family cars with big V8 engines.
Check out our website for more information: schruteauto.com
```

I would definitely click on that link. Something only Dwight can do. For the final test, let's ask the AI to write a short marketing email to sell the first car from the company:

```python
text = """
Write a short marketing email to sell the first car from the company - 700HP
family sedan from a supercharged V8 with manual gearbox.
""".strip()

res = chain(text)
print(res["response"])
```

Chain output:

```
Subject: Experience Power And Performance In Your Family Car

Body:
Are you looking for a powerful family car that can handle any road? Look no
further than SchruteAuto! Our 700HP family sedan comes equipped with a
supercharged V8 engine and a manual gearbox, so you can experience power and
performance in your own driveway. Visit schruteauto.com today to find out more!
```

Your new business is ready to go! You can use the same chain to generate more content for your new company, or you can start a new chain and create a new company. The possibilities are endless.
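If you'd rather chat interactively than call the chain line by line, you can wrap it in a minimal REPL loop. This is a sketch of my own, assuming the `chain` built above (with memory and the cleanup parser):

```python
# Minimal interactive loop around the conversation chain.
# Type "exit" or "quit" to end the session.
while True:
    user_input = input("You: ").strip()
    if user_input.lower() in ("exit", "quit"):
        break
    res = chain(user_input)
    print(f"Dwight: {res['response']}")
```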
## Conclusion

With LangChain's powerful features, we seamlessly integrated a local LLM, implemented stopping criteria, preserved chat history, and cleaned the output. The result? A functional chatbot that delivers relevant and coherent responses. Armed with these tools, you're equipped to develop your own intelligent chatbot, customized to meet your specific requirements.

## References

1. Falcon 7B Instruct
2. StoppingCriteria
3. Output parsers
