With the advent of LLMs, it is increasingly important to understand common patterns for integrating them into applications. This article explores three integration patterns using OmniAI, a Ruby gem that supports OpenAI, Anthropic, Google, Mistral, and other providers:
Building data scrapers is a common task for engineers. Often, the source is plain text (e.g. HTML), but sometimes it’s a semi-structured format like a PDF or a Word document. Fortunately, this is an area where LLMs are especially helpful. This example demonstrates a script that loops through a directory of PDF receipts and generates a CSV with the following structure:
| PATH | MERCHANT | CATEGORY | DATE | DESCRIPTION | TAX | SUBTOTAL | TOTAL |
| ---------- | -------- | -------- | ---------- | ----------- | --- | -------- | ----- |
| ./acme.pdf | ACME Inc | supplies | 2025-12-31 | Stationery | 2.0 | 7.0 | 9.0 |
The example uses the vision capabilities built into most LLMs, pairing them with structured output to parse each receipt in a directory. The code uses Google (since it natively supports PDFs and is relatively inexpensive). Not every LLM supports PDFs, so as a fallback a PDF may be converted to images using a tool like MuPDF prior to processing (see Using OmniAI to Convert PDFs to Markdown with LLMs). A sketch of that fallback follows.
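The snippet below is a minimal sketch of the image fallback, assuming MuPDF’s `mutool` binary is installed and on the PATH; the `pdf_to_images` helper is hypothetical, not part of OmniAI:

```ruby
require "tmpdir"

# Hypothetical helper: render each page of a PDF to a PNG using MuPDF's
# `mutool draw`, returning the binary contents of each page image. The images
# can then be attached with an "image/png" MIME type instead of "application/pdf".
def pdf_to_images(path)
  Dir.mktmpdir do |dir|
    # "%d" in the output pattern expands to the page number; "-r 150" sets the DPI.
    system("mutool", "draw", "-o", File.join(dir, "page-%d.png"), "-r", "150", path) or
      raise "mutool failed for #{path.inspect}"

    Dir.glob(File.join(dir, "page-*.png"))
      .sort_by { |page| page[/(\d+)\.png\z/, 1].to_i }
      .map { |page| File.binread(page) }
  end
end
```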
require 'csv'
require 'omniai/google'

client = OmniAI::Google::Client.new

format = OmniAI::Schema.format(name: "Receipt", schema: OmniAI::Schema.object(
  description: "A receipt for a purchase.",
  properties: {
    merchant: OmniAI::Schema.string(description: "The merchant (e.g. 'ACME Inc')"),
    category: OmniAI::Schema.string(enum: %w[advertising rent utilities supplies travel]),
    date: OmniAI::Schema.string(description: "The date of the receipt as 'YYYY-MM-DD'"),
    description: OmniAI::Schema.string(description: "A description of the receipt."),
    tax: OmniAI::Schema.number(description: "The sum of all taxes (PST, GST, etc)."),
    subtotal: OmniAI::Schema.number(description: "The total without taxes for the receipt."),
    total: OmniAI::Schema.number(description: "The total with taxes for the receipt.")
  },
  required: %i[tax subtotal total]
))

result = CSV.generate do |csv|
  # Header row so the generated CSV matches the structure above.
  csv << %w[PATH MERCHANT CATEGORY DATE DESCRIPTION TAX SUBTOTAL TOTAL]

  Dir.glob("./**/*.pdf") do |path|
    File.open(path, "rb") do |file|
      response = client.chat(format:) do |prompt|
        prompt.system("You are an expert at processing PDF receipts.")
        prompt.user do |message|
          message.text("Process the attached PDF receipt for the requested data.")
          message.file(file, "application/pdf")
        end
      end

      data = format.parse(response.text)

      csv << [
        path,
        data[:merchant],
        data[:category],
        data[:date],
        data[:description],
        data[:tax],
        data[:subtotal],
        data[:total],
      ]
    end
  end
end

puts result
| PATH | MERCHANT | CATEGORY | DATE | DESCRIPTION | TAX | SUBTOTAL | TOTAL |
| ---------- | -------- | -------- | ---------- | ----------- | --- | -------- | ----- |
| ./acme.pdf | ACME Inc | supplies | 2025-12-31 | Stationery | 2.0 | 7.0 | 9.0 |
The above example demonstrates a basic AI integration pattern: sending input and parsing output. The input is composed of a `system` message and a `user` message with multiple parts (some text and a file). The output is structured and matches a specific schema.
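In practice, a response occasionally fails to parse against the schema. One possible hardening, not part of the original example, is to retry the call; the `chat_with_retries` helper below is a hypothetical sketch:

```ruby
# Hypothetical helper: retry the chat call when the response cannot be parsed
# into the expected schema. Rewinding the file lets the attachment be re-read
# on each attempt.
def chat_with_retries(client, format, file, attempts: 3)
  attempts.times do |attempt|
    file.rewind

    response = client.chat(format:) do |prompt|
      prompt.system("You are an expert at processing PDF receipts.")
      prompt.user do |message|
        message.text("Process the attached PDF receipt for the requested data.")
        message.file(file, "application/pdf")
      end
    end

    return format.parse(response.text)
  rescue StandardError => error
    raise error if attempt == attempts - 1
  end
end
```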
Retrieval-Augmented Generation (RAG) is a method for narrowing down large datasets so a language model can respond more effectively to a prompt. A common example is the “AI overview” sections now appearing on many websites. This example defines a function `ai_overview` that takes text and returns a domain-specific summary. Here, the domain is PDF manuals for various products (e.g., toasters, blenders, etc.). The goal is to implement a function like the following:
ai_overview("How often do I need to clean my Bambino Plus?")
To get started, the manuals (in this case PDFs) must be converted into a machine-readable text format that can be passed to an LLM. This can be accomplished using chat and vision capabilities with a specially crafted prompt asking for a conversion to Markdown. This example takes a shortcut via the dedicated OCR API offered by Mistral:
require 'fileutils'
require 'omniai/mistral'

client = OmniAI::Mistral::Client.new

DOCUMENTS = [
  {
    name: "the-bambino-plus-instruction-book",
    url: "https://assets.breville.com/Instruction-Booklets/ANZ/BES500BSS_ANZ_IB_I21_FA_WEB.pdf",
  },
  {
    name: "the-smart-toast-instruction-book",
    url: "https://assets.breville.com/Instruction-Booklets/ANZ/BTA825_IB_D18_WEB.pdf",
  },
  {
    name: "the-fresh-and-furious-instruction-book",
    url: "https://assets.breville.com/BBL620/BBL620W_ANZ_IB_F22_FA_LR.pdf",
  },
]

DOCUMENTS.each do |document|
  FileUtils.mkdir_p("./manuals/#{document[:name]}")

  response = client.ocr(document[:url])

  response.pages.each do |page|
    number = page.index.next # pages are zero-indexed

    File.write("./manuals/#{document[:name]}/#{number}.md", <<~TEXT)
      ---
      name: "#{document[:name]}"
      page: "#{number}"
      ---

      #{page.markdown}
    TEXT
  end
end
This script generates a folder for each manual, splitting each page into a separate Markdown file. Each file includes front matter with metadata (e.g., document name and page number). Next, these pages need to be converted to embeddings: vector representations of text that make it possible to measure how similar two pieces of text are. The `OmniAI#embed` method with the OpenAI provider is used to generate an embedding for each page and save it to a file:
require 'omniai/openai'

client = OmniAI::OpenAI::Client.new

Dir.glob("./manuals/**/*.md") do |path|
  next if File.exist?("#{path}.embedding")

  File.open(path, "rb") do |file|
    response = client.embed(file.read)
    File.write("#{path}.embedding", response.embedding.join("\n"))
  end
end
Inspecting the generated embeddings confirms that each Markdown file has its own vector. Since any text can be converted into an embedding, this leads to the final step: the user prompt is also turned into an embedding. This embedding is compared against the precomputed embeddings generated earlier. The closest matching manual pages are selected and sent to the LLM to generate a summary in response to the original prompt:
require 'omniai/openai'

ENTRIES = []

Dir.glob("./manuals/**/*.md") do |path|
  ENTRIES << {
    path: path,
    embedding: File.read("#{path}.embedding").split("\n").map { |entry| Float(entry) },
  }
end

# @param src [Array<Float>]
# @param dst [Array<Float>]
#
# @return [Float]
def euclidean_distance(src, dst)
  Math.sqrt(src.zip(dst).map { |a, b| (a - b)**2 }.reduce(:+))
end

# Embed the query text, then return the contents of the closest matching pages.
#
# @param text [String]
# @param limit [Integer]
#
# @return [Array<String>]
def search(text, limit: 5)
  client = OmniAI::OpenAI::Client.new
  response = client.embed(text)
  embedding = response.embedding

  ENTRIES
    .sort_by { |entry| euclidean_distance(entry[:embedding], embedding) }
    .first(limit)
    .map { |entry| File.read(entry[:path]) }
end

# @param text [String]
def ai_overview(text)
  client = OmniAI::OpenAI::Client.new

  client.chat(stream: $stdout) do |prompt|
    prompt.system <<~TEXT
      You are an expert at formatting information found in product manuals:

      1. Use the provided <pages>...</pages> to answer the <question>...</question>.
      2. Do not use any other information in answering the question.
      3. Be as concise and accurate as possible when answering the question.
    TEXT

    prompt.user <<~TEXT
      <question>
      #{text}
      </question>

      <pages>
      #{search(text).map { |page| "<page>#{page}</page>" }.join("\n")}
      </pages>
    TEXT
  end
end
ai_overview("How often do I need to clean my Bambino Plus?")
You need to perform a cleaning cycle on your Bambino Plus every 200 extractions (uses), as indicated by the 1 CUP and 2 CUP buttons alternately flashing. Additionally, you should clean certain parts after each use:
- The steam wand should always be cleaned after each milk texturing.
- The filter baskets and portafilter should be rinsed under hot water directly after use.
- The drip tray should be emptied and cleaned after each use or when the drip tray indicator rises.
- The group head interior and shower screen should be wiped with a damp cloth and periodically rinsed with hot water.
Descaling is required when the machine indicates it, which will be when the 1 CUP and STEAM button and the 2 CUP button flash alternately for 15 seconds.
You can also manually enter the cleaning cycle before the alert is triggered if desired.
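A note on the ranking: the example uses Euclidean distance, but cosine similarity is an equally common choice, and because OpenAI embeddings are normalized to unit length the two produce the same ordering. A minimal sketch of the alternative (sort descending rather than ascending):

```ruby
# @param src [Array<Float>]
# @param dst [Array<Float>]
#
# @return [Float]
def cosine_similarity(src, dst)
  dot = src.zip(dst).sum { |a, b| a * b }
  dot / (Math.sqrt(src.sum { |value| value**2 }) * Math.sqrt(dst.sum { |value| value**2 }))
end
```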
The above example offers a more complex integration in which OCR tools convert PDFs to Markdown and embeddings are generated for each page. A realistic deployment might save the embeddings to a vector database such as pgvector and associate them with the Markdown.
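A rough sketch of what that lookup might resemble using the `pg` gem; the database, table, columns, and `query_embedding` variable are illustrative assumptions:

```ruby
require "pg"

db = PG.connect(dbname: "manuals")

# pgvector accepts vectors as a bracketed, comma-separated string.
embedding = "[#{query_embedding.join(',')}]"

# "<->" is pgvector's Euclidean distance operator, mirroring the
# euclidean_distance ranking used above.
pages = db.exec_params(<<~SQL, [embedding]).map { |row| row["content"] }
  SELECT content
  FROM pages
  ORDER BY embedding <-> $1
  LIMIT 5
SQL
```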
The final example gives an LLM a tool and asks it to perform a set of complex tasks. In this case, the tool is a browser, and the tasks are provided by the user through a simple chat interface on the CLI. To begin, a browser needs to be configured. Watir (a Selenium wrapper) handles this:
require 'watir'
browser = ::Watir::Browser.new
browser.goto('https://news.ycombinator.com')
browser.element(css: '.submission .title a').click
puts browser.html
Running the above code opens Chrome, visits Hacker News, clicks a link, and prints the HTML.
This browser snippet is easily wrapped in a tool. A tool provides a structured schema for interacting with Watir. In this case, it offers three parameters that can be passed in with any invocation:

- `action`: an enum of either `goto`, `click`, or `html`.
- `url`: a URL to visit in the case of a `goto` action.
- `selector`: a CSS selector to find in the case of a `click` action.

Tools may also return text back to an LLM. For example:

- an `html` action is expected to return the HTML on the page.
- `click` and `goto` actions might return a status indicating whether or not they worked.

require 'logger'
require 'watir'
class BrowserTool < OmniAI::Tool
  module Action
    HTML = "html"
    GOTO = "goto"
    CLICK = "click"
  end

  ACTIONS = [
    Action::HTML,
    Action::GOTO,
    Action::CLICK,
  ].freeze

  description <<~TEXT
    A chrome browser that can be used to goto sites, click elements, and capture HTML.
  TEXT

  parameter :action, :string, enum: ACTIONS, description: <<~TEXT
    An action to be performed:
    * `#{Action::GOTO}`: manually navigate to a specific URL
    * `#{Action::HTML}`: retrieve the full HTML of the page
    * `#{Action::CLICK}`: click an element using a selector (e.g. '.btn', '#submit', etc)
  TEXT

  parameter :url, :string, description: <<~TEXT
    e.g. 'https://example.com/some/page'

    Required for the following actions:
    * `#{Action::GOTO}`
  TEXT

  parameter :selector, :string, description: <<~TEXT
    e.g. 'button#submit', '.link', '#main > a', etc.

    Required for the following actions:
    * `#{Action::CLICK}`
  TEXT

  required %i[action]

  # @param logger [Logger]
  def initialize(logger: Logger.new($stdout))
    super()
    @browser = ::Watir::Browser.new
    @logger = logger
  end

  # @param action [String]
  # @param url [String] optional
  # @param selector [String] optional
  def execute(action:, url: nil, selector: nil)
    case action
    when Action::GOTO then goto(url:)
    when Action::HTML then html
    when Action::CLICK then click(selector:)
    end
  rescue StandardError => error
    { status: :error, message: error.message }
  end

  private

  # @param url [String]
  def goto(url:)
    @logger.info("goto url=#{url.inspect}")
    raise ArgumentError, "goto requires url" unless url

    @browser.goto(url)

    { status: :ok }
  end

  # @param selector [String]
  def click(selector:)
    @logger.info("click selector=#{selector.inspect}")
    raise ArgumentError, "click requires selector" unless selector

    @browser.element(css: selector).click

    { status: :ok }
  end

  # @return [String]
  def html
    @logger.info("html")
    @browser.html
  end
end
Using the browser tool, a basic CLI may be configured that loops, asking the user for requests and passing them to the LLM. A history of prior user and assistant messages is maintained as a thread, providing the LLM extra context:
require "omniai/anthropic"
client = OmniAI::Anthropic::Client.new
logger = Logger.new($stdout)
logger.formatter = proc { |_, _, _, message| "[browser] #{message}\n" }
browser = BrowserTool.new
puts "Type 'exit' or 'quit' to leave."
prompt = OmniAI::Chat::Prompt.build do |builder|
builder.system <<~TEXT
You are tasked with assisting a user in browsing the web.
TEXT
end
loop do
print "> "
text = gets.strip
break if %w[exit quit].include?(text)
prompt.user(text)
response = client.chat(prompt, stream: $stdout, tools: [browser])
prompt.assistant(response.text)
end
Testing the CLI tool produces the following result:
Type 'exit' or 'quit' to leave.
> Visit hacker news and tell me the top 3 articles today.
I'll help you visit Hacker News and find the top 3 articles. Let me navigate to the site and retrieve the content.
[browser] goto url="https://news.ycombinator.com"
Now let me get the HTML content to see the top articles:
[browser] html
## Top 3 Articles on Hacker News Today:
### 1. **Meta: Shut Down Your Invasive AI Discover Feed. Now**
- **Source:** Mozilla Foundation (mozillafoundation.org)
- **Points:** 90 points
- **Comments:** 50 comments
### 2. **Decreasing Gitlab repo backup times from 48 hours to 41 minutes**
- **Source:** about.gitlab.com
- **Points:** 53 points
- **Comments:** 10 comments
### 3. **Why Bell Labs Worked**
- **Source:** fabiomanganiello.com
- **Points:** 32 points
- **Comments:** 12 comments
> Click on the 2nd article and give me a 1-2 paragraph summary.
[browser] click selector=".title a"
[browser] html
...
The above example introduces tools. They are a very useful option for providing user-specific data to an LLM and allow an LLM to accomplish more complex workflows. It also demonstrates tracking a thread of user and assistant messages. An actual deployment might need to handle permissions and eventually truncate the message history.
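As one example, the history could be truncated before each request. The sketch below operates on a plain array of role/content hashes, since applying it to `OmniAI::Chat::Prompt` would depend on the gem’s public API:

```ruby
MAX_TURNS = 10

# Hypothetical helper: keep any system messages plus the most recent N
# user/assistant exchanges (each exchange is a user message and a reply).
def truncate(messages, max_turns: MAX_TURNS)
  system_messages, turns = messages.partition { |message| message[:role] == "system" }
  system_messages + turns.last(max_turns * 2)
end
```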