MLWizards

MLWizards is a Python library designed to simplify the process of creating chatbots and fine-tuning models using OpenAI. It also includes features for extracting and processing text from PDFs.

Features

ChatBot: Easily create and interact with chatbots using OpenAI's GPT models.
Model Fine-Tuning: Fine-tune OpenAI models with your own data.
PDF Text Extraction: Extract and preprocess text from PDF files using OCR.

Installation

1. Install the required Python packages:

pip install pytesseract pdf2image pillow requests-html openai beautifulsoup4

2. Install Poppler:

Windows

Download Poppler from here.
Extract the contents to a directory, e.g., C:\poppler.
Add C:\poppler\bin to your system's PATH.

macOS

Use Homebrew to install Poppler:
```
brew install poppler
```

Linux

Use the package manager to install Poppler:
```
sudo apt-get install poppler-utils
```

Install Tesseract OCR:
- Follow the instructions here to install Tesseract OCR on your operating system.

Usage

ChatBot

Create and interact with a chatbot using OpenAI's GPT models.

from mlwizards import ChatBot

api_key = "your_openai_api_key"
bot = ChatBot(api_key=api_key)

while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        break
    response = bot.generate(user_input)
    bot.add_to_history('user', user_input)
    bot.add_to_history('assistant', response)
    print("Bot:", response)

Model Fine-Tuning

Fine-tune OpenAI models with your own data.

from mlwizards import fine_tune_model, upload_file

api_key = "your_openai_api_key"
training_file_path = "path/to/your/training_data.jsonl"

# Upload the training file
upload_response = upload_file(api_key, training_file_path)
training_file_id = upload_response['id']

# Fine-tune the model
fine_tune_response = fine_tune_model(api_key, training_file_id)
print(f"Fine-tuning job started. Response: {fine_tune_response}")

PDF Text Extraction

Extract and preprocess text from PDF files using OCR.

from mlwizards import extract_text_from_pdf

pdf_path = "path/to/your/file.pdf"
tesseract_cmd = "/usr/local/bin/tesseract"  # Update this path to your tesseract executable if needed
extracted_text = extract_text_from_pdf(pdf_path, tesseract_cmd)
print(extracted_text)

Functions

ChatBot Class

__init__(self, api_key, model='gpt-3.5-turbo', organization=None, project=None): Initialize the ChatBot with API key and model.
generate(self, prompt): Generate a response from the chatbot.
add_to_history(self, role, content): Add a message to the conversation history.
reset_conversation(self): Reset the conversation history.
search_internet(self, query, num_results=5): Search the internet and generate a response based on search results.

Fine-Tuning Functions

fine_tune_model(api_key, training_file_id, model='gpt-3.5-turbo', organization=None, project=None, suffix=None, hyperparameters=None): Fine-tune a model with the given training file.
upload_file(api_key, file_path, purpose='fine-tune', organization=None, project=None): Upload a file to OpenAI.

PDF Text Extraction Functions

extract_text_from_pdf(pdf_path, tesseract_cmd=None): Extract text from a PDF file using OCR.
preprocess_text(text): Preprocess extracted text by removing unnecessary whitespace and newlines.

Contributing

We welcome contributions to MLWizards! Please submit a pull request or open an issue to discuss any changes you would like to make.

License

This project is licensed under the MIT License.


This README provides a comprehensive guide to installing, using, and contributing to the `MLWizards` library. It includes instructions for installing dependencies, examples of how to use the main features, and descriptions of the key functions.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
build/lib		build/lib
dist		dist
mlwizards.egg-info		mlwizards.egg-info
mlwizards		mlwizards
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MLWizards

Features

Installation

Windows

macOS

Linux

Usage

ChatBot

Model Fine-Tuning

PDF Text Extraction

Functions

ChatBot Class

Fine-Tuning Functions

PDF Text Extraction Functions

Contributing

License

About

Releases

Packages

Languages

License

ykim336/MLWizard

Folders and files

Latest commit

History

Repository files navigation

MLWizards

Features

Installation

Windows

macOS

Linux

Usage

ChatBot

Model Fine-Tuning

PDF Text Extraction

Functions

ChatBot Class

Fine-Tuning Functions

PDF Text Extraction Functions

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages