SmartScan

SmartScan is a document scanning and AI-powered search application that allows users to upload documents, extract text using OCR, and perform semantic searches using vector embeddings.

Features

Document Upload: Upload PDFs, images, and text files from your device, or capture documents with your camera
OCR Processing: Extract text from documents using Eden AI's Document Parser and Image OCR services
Vector Embeddings: Generate vector embeddings of the extracted text with an OpenAI embedding model
Semantic Search: Search documents by meaning, not just keywords
Multi-platform: Works on iOS, Android, and Web with a single codebase

Setup

Prerequisites

Node.js 16+ and npm
Expo CLI (installed with the project's expo dependency and run with npx expo; the global expo-cli package is deprecated)
Supabase account
Eden AI API key
OpenAI API key

Installation

Clone the repository:

git clone https://github.com/yourusername/smartscan.git
cd smartscan

Install dependencies:

npm install

Create a .env file in the project root (you can copy .env.example as a starting point) and set the following variables:

EXPO_PUBLIC_SUPABASE_URL=your_supabase_url
EXPO_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
EXPO_PUBLIC_EDEN_AI_API_KEY=your_eden_ai_api_key
EXPO_PUBLIC_OPENAI_API_KEY=your_openai_api_key
EXPO_PUBLIC_APP_URL=your_app_url
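As a sketch of how the app might validate these values at startup, the TypeScript below collects the five variables into a typed object and fails fast when one is missing or still holds a placeholder beginning with your_. The SmartScanConfig type, ENV_KEYS map, and loadConfig function are hypothetical, not part of the repository. Keep in mind that Expo inlines every EXPO_PUBLIC_ variable into the client bundle, so anything set here, including the Eden AI and OpenAI keys, can be read by anyone who inspects the app.

```typescript
// Hypothetical typed view of the variables listed in .env above.
interface SmartScanConfig {
  supabaseUrl: string;
  supabaseAnonKey: string;
  edenAiApiKey: string;
  openAiApiKey: string;
  appUrl: string;
}

// Hypothetical map from each config field to the environment variable that supplies it.
const ENV_KEYS: Record<keyof SmartScanConfig, string> = {
  supabaseUrl: "EXPO_PUBLIC_SUPABASE_URL",
  supabaseAnonKey: "EXPO_PUBLIC_SUPABASE_ANON_KEY",
  edenAiApiKey: "EXPO_PUBLIC_EDEN_AI_API_KEY",
  openAiApiKey: "EXPO_PUBLIC_OPENAI_API_KEY",
  appUrl: "EXPO_PUBLIC_APP_URL",
};

// Builds the config, or throws a single error naming every missing variable.
function loadConfig(env: Record<string, string | undefined>): SmartScanConfig {
  const config: Partial<SmartScanConfig> = {};
  const missing: string[] = [];
  for (const field of Object.keys(ENV_KEYS) as Array<keyof SmartScanConfig>) {
    const name = ENV_KEYS[field];
    const value = env[name]?.trim();
    // Empty values and untouched "your_..." placeholders count as missing.
    if (!value || value.startsWith("your_")) {
      missing.push(name);
    } else {
      config[field] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return config as SmartScanConfig;
}
```

In the app, pass an object literal whose values are written out as process.env.EXPO_PUBLIC_SUPABASE_URL, process.env.EXPO_PUBLIC_SUPABASE_ANON_KEY, and so on. Expo only substitutes variables that are referenced with dot notation, so a dynamic lookup such as process.env[name] would not be replaced at build time.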

Initialize the database:

npm run db:init

Set up authentication:

npm run auth:setup

Start the development server:

npm start

Database Setup

The application requires a Supabase project with the following:

Database Tables:
- users: For user profiles
- documents: For document metadata
- document_embeddings: For vector embeddings
Vector Extension:
- Enable the vector (pgvector) extension in your Supabase project
Storage Bucket:
- Create a storage bucket named documents for uploaded files
Authentication:
- Enable the Google OAuth provider

The npm run db:init script sets up most of this automatically; configure anything it does not cover by hand in your Supabase project.
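pgvector represents a vector in text form as a bracketed, comma-separated list such as [0.1,0.2,0.3]. The helpers below are a sketch, assuming vectors are sometimes passed to or read back from Supabase in that text form (for example, when a vector column is selected through the Supabase client). toVectorLiteral and parseVector are hypothetical names, not functions from this project.

```typescript
// pgvector's text form for a vector is a bracketed, comma-separated list, e.g. [0.1,0.2,0.3].
// Hypothetical helper: serializes a number array into that form, rejecting empty vectors
// and NaN/Infinity components, which pgvector does not store.
function toVectorLiteral(values: readonly number[]): string {
  if (values.length === 0) {
    throw new Error("A vector needs at least one dimension");
  }
  for (const v of values) {
    if (!Number.isFinite(v)) throw new Error(`Vector component is not finite: ${v}`);
  }
  return `[${values.join(",")}]`;
}

// Hypothetical helper: accepts either a number array or the "[...]" text form and returns numbers.
function parseVector(value: string | readonly number[]): number[] {
  if (typeof value !== "string") return value.slice();
  const trimmed = value.trim();
  if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
    throw new Error(`Not a vector literal: ${value}`);
  }
  const body = trimmed.slice(1, -1).trim();
  if (body === "") throw new Error("Empty vector literal");
  return body.split(",").map((part) => {
    const text = part.trim();
    const n = Number(text);
    if (text === "" || !Number.isFinite(n)) throw new Error(`Bad vector component: "${part}"`);
    return n;
  });
}
```

Accepting both shapes in parseVector lets calling code stay the same whichever form a particular query returns.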

Document Processing Flow

Upload: User uploads a document via the DocumentUploader component
Storage: File is stored in Supabase Storage
OCR: Eden AI extracts text from the document using Document Parser (for PDFs) or Image OCR (for images)
Embedding: OpenAI creates vector embeddings for the extracted text
Embedding storage: Embeddings are stored in the document_embeddings table using PostgreSQL's vector extension (pgvector)
Search: Users search with natural-language queries, which are matched against the stored embeddings
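The OCR step in the flow above picks an Eden AI service based on the file type. A minimal sketch of that routing decision, assuming the choice is made from the uploaded file's MIME type, could look like this. The ExtractionRoute labels and the routeForMimeType function are hypothetical and are not Eden AI endpoint names; sending plain-text uploads straight to embedding is likewise an assumption the flow does not state.

```typescript
// Hypothetical labels: the two Eden AI services named in the flow, plus plain text that needs no OCR.
type ExtractionRoute = "document_parser" | "image_ocr" | "plain_text";

// Chooses an extraction route from a MIME type such as "application/pdf" or "image/jpeg".
function routeForMimeType(mimeType: string): ExtractionRoute {
  // Drop parameters like "; charset=utf-8" and normalize case.
  const type = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  if (type === "application/pdf") return "document_parser";
  if (type.startsWith("image/")) return "image_ocr";
  if (type.startsWith("text/")) return "plain_text";
  throw new Error(`Unsupported file type: ${mimeType}`);
}
```

Routing on the MIME type rather than the file extension avoids trusting user-supplied file names, although a picker may not always report a type, in which case an extension-based fallback would be needed.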

Component Structure

DocumentUploader.tsx: UI for uploading documents
DocumentSearch.tsx: UI for searching documents
DocumentList.tsx: UI for viewing and managing documents

Server Functions

documentQueries.uploadAndProcessDocument: Handles document upload and processing
processDocumentWithOCR: Extracts text from documents
createDocumentEmbeddings: Creates vector embeddings for extracted text
searchDocumentsByEmbedding: Performs semantic search
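Embedding models limit how much text they accept per request, so long OCR output is commonly split into overlapping chunks before it is embedded. This README does not say whether createDocumentEmbeddings chunks its input; the chunkText function below is a hypothetical, character-based sketch of that technique, with illustrative defaults of 2,000 characters per chunk and 200 characters of overlap.

```typescript
// Hypothetical chunk record: position in the sequence, window start offset, and trimmed text.
interface TextChunk {
  index: number;
  start: number;
  text: string;
}

// Splits text into windows of at most maxChars characters. Each window starts `overlap`
// characters before the previous one ended, and windows prefer to end at a space or newline.
function chunkText(text: string, maxChars = 2000, overlap = 200): TextChunk[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  if (overlap < 0 || overlap >= maxChars) throw new Error("overlap must be in [0, maxChars)");

  const chunks: TextChunk[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      // Back off to the last whitespace, but only if the window still extends past the overlap,
      // which guarantees the next window starts further along than this one.
      const lastBreak = Math.max(text.lastIndexOf(" ", end), text.lastIndexOf("\n", end));
      if (lastBreak > start + overlap) end = lastBreak;
    }
    const piece = text.slice(start, end).trim();
    if (piece.length > 0) chunks.push({ index: chunks.length, start, text: piece });
    if (end >= text.length) break;
    start = end - overlap;
  }
  return chunks;
}
```

Splitting on characters rather than tokens keeps the sketch dependency-free; a token-based splitter would track the model's actual input limit more precisely. The overlap keeps a sentence that straddles a chunk boundary searchable from either side.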

Troubleshooting

PDF Processing

If you encounter issues with PDF processing:

Make sure your Eden AI API key has access to both the Document Parser and Image OCR services
Verify that the PDF is not password-protected
If text extraction is still poor, try converting the PDF pages to images before uploading so they go through Image OCR instead
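To catch password-protected files before they reach Eden AI, a client could run a quick byte-level check. The sketch below is a heuristic, not a PDF parser: it looks for the %PDF- header near the start of the file and for an /Encrypt entry, which encrypted PDFs declare in their trailer or cross-reference stream dictionary. A PDF encrypted only to restrict permissions (no password needed to open it) is flagged as well. PdfCheck, containsAscii, and inspectPdf are hypothetical names.

```typescript
// Hypothetical result of a quick pre-upload check.
interface PdfCheck {
  isPdf: boolean;
  looksEncrypted: boolean;
}

// Naive byte search for an ASCII marker; adequate for short markers in files of a few MB.
function containsAscii(bytes: Uint8Array, marker: string): boolean {
  for (let i = 0; i + marker.length <= bytes.length; i++) {
    let match = true;
    for (let j = 0; j < marker.length; j++) {
      if (bytes[i + j] !== marker.charCodeAt(j)) {
        match = false;
        break;
      }
    }
    if (match) return true;
  }
  return false;
}

// Heuristic only: does not parse the PDF structure.
function inspectPdf(bytes: Uint8Array): PdfCheck {
  // Readers generally accept the "%PDF-" header anywhere in the first 1024 bytes.
  const isPdf = containsAscii(bytes.subarray(0, 1024), "%PDF-");
  // Encrypted files reference an /Encrypt dictionary from their trailer (or cross-reference
  // stream dictionary); those dictionaries are never compressed, so the key appears as plain bytes.
  const looksEncrypted = isPdf && containsAscii(bytes, "/Encrypt");
  return { isPdf, looksEncrypted };
}
```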

Vector Search

If vector search is not working:

Confirm the vector extension is enabled in Supabase
Check that the search_document_embeddings database function exists and is defined correctly
Verify that embeddings are actually being written to the document_embeddings table
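For the last two checks, a small script can validate rows pulled from document_embeddings and reproduce the ranking locally. This sketch assumes search_document_embeddings ranks by cosine similarity, which in pgvector corresponds to the <=> cosine-distance operator (distance = 1 - similarity); this README does not confirm which operator it uses. checkEmbedding, cosineSimilarity, and rankBySimilarity are hypothetical helpers, and expectedDim must match the output size of whichever OpenAI embedding model the project uses.

```typescript
// Hypothetical sanity checks for one stored embedding; returns a list of problems (empty if fine).
function checkEmbedding(vec: readonly number[], expectedDim: number): string[] {
  const problems: string[] = [];
  if (vec.length !== expectedDim) {
    problems.push(`expected ${expectedDim} dimensions, got ${vec.length}`);
  }
  if (vec.some((v) => !Number.isFinite(v))) problems.push("contains NaN or Infinity");
  if (vec.length > 0 && vec.every((v) => v === 0)) problems.push("all components are zero");
  return problems;
}

// Cosine similarity in [-1, 1]; pgvector's <=> operator returns 1 minus this value.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report no similarity instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks rows by similarity to the query embedding, most similar first, keeping the top `limit`.
function rankBySimilarity<T extends { embedding: readonly number[] }>(
  query: readonly number[],
  rows: readonly T[],
  limit = 5,
): Array<T & { similarity: number }> {
  return rows
    .map((row) => ({ ...row, similarity: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

If the local ranking looks right but the app's search results do not, the problem more likely lies in the SQL function than in the stored embeddings.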

License

MIT

Repository: chirag405/SmartScan
