Kevin Sylvestre

Using OmniAI to Convert PDFs to Markdown with LLMs

LLMs offer a great way to convert PDFs into Markdown, which is especially useful for messy PDFs (scans, charts, diagrams, etc.). Combining MuPDF and OmniAI makes it simple.

Step 1. Using MuPDF to Convert a PDF into PNGs

This guide converts a PDF one page at a time, which requires splitting the PDF into one PNG per page. MuPDF is a great tool for this. To use it, the MuPDF CLI must be installed. On macOS it can be installed with:

brew install mupdf

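Once the MuPDF CLI is installed, a PDF can be converted into one PNG per page (the %d in the output name is replaced with the page number, and -r 300 renders at 300 DPI) using the following: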

mutool draw -o "./demo-%d.png" -r 300 -F png ./demo.pdf

A similar command (without the -r flag, so MuPDF's default resolution applies) can be executed within a Ruby program using:

require 'tmpdir'

# @param filename [String]
# @yield page
# @yieldparam file [File]
def pdf_to_pngs(filename:)
  Dir.mktmpdir do |dir|
    system("mutool", "draw", "-o", "#{dir}/%d.png", "-F", "png", filename)

    # Keep only the page images and sort numerically ("10.png" after "9.png").
    Dir.children(dir).grep(/\A\d+\.png\z/).sort_by(&:to_i).each do |path|
      File.open("#{dir}/#{path}") do |file|
        yield(file)
      end
    end
  end
end
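
Two details are worth isolating: the resolution flag from the CLI example, and the ordering of the page files, since names like 10.png sort before 2.png when compared as strings. The following is a minimal sketch rather than part of the original guide. The helper names mutool_draw_command and page_files are hypothetical, and the default of 300 is simply the value used in the CLI example.

```ruby
# Hypothetical helper (not part of MuPDF or OmniAI): builds the argument list
# for `mutool draw`, including the -r resolution flag from the CLI example.
def mutool_draw_command(filename:, dir:, resolution: 300)
  ["mutool", "draw", "-o", File.join(dir, "%d.png"), "-r", resolution.to_s, "-F", "png", filename]
end

# Hypothetical helper: keeps only the rendered page images and orders them by
# page number, so "10.png" comes after "2.png" (a plain string sort would not).
def page_files(names)
  names.grep(/\A\d+\.png\z/).sort_by(&:to_i)
end

command = mutool_draw_command(filename: "demo.pdf", dir: "/tmp/pages")
pages = page_files(%w[10.png 2.png notes.txt 1.png])
```

The argument list can be passed to system with a splat, for example system(*command, exception: true), which (on Ruby 2.6 or later) raises if mutool is missing or exits with an error instead of silently producing no pages.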

Step 2. Using OmniAI to Convert a PNG into Markdown

With the PDF split into pages, the next step is to use OmniAI to submit a prompt converting each PNG into Markdown. This guide uses OpenAI, but OmniAI also has provider gems for Anthropic, Google, and Mistral.

To install OmniAI and the provider gems, use the following:

gem install omniai # required
gem install omniai-anthropic # optional
gem install omniai-google # optional
gem install omniai-mistral # optional
gem install omniai-openai # optional

The conversion of a PNG to Markdown can then be done using:

require 'omniai/openai'

# @param file [File]
# @param stream [IO]
def png_to_markdown(file:, stream: $stdout)
  client = OmniAI::OpenAI::Client.new
  client.chat(stream:) do |prompt|
    prompt.system('You are an expert at converting files to Markdown.')
    prompt.user do |message|
      message.text 'Convert the attached files to Markdown.'
      message.file(file, "image/png")
    end
  end
end

Summary

That’s it! The combined example is as follows:

require 'tmpdir'
require 'omniai/openai'

# @param filename [String]
# @yield page
# @yieldparam file [File]
def pdf_to_pngs(filename:)
  Dir.mktmpdir do |dir|
    system("mutool", "draw", "-o", "#{dir}/%d.png", "-F", "png", filename)

    # Keep only the page images and sort numerically ("10.png" after "9.png").
    Dir.children(dir).grep(/\A\d+\.png\z/).sort_by(&:to_i).each do |path|
      File.open("#{dir}/#{path}") do |file|
        yield(file)
      end
    end
  end
end

# @param file [File]
# @param stream [IO]
def png_to_markdown(file:, stream: $stdout)
  client = OmniAI::OpenAI::Client.new
  client.chat(stream:) do |prompt|
    prompt.system('You are an expert at converting files to Markdown.')
    prompt.user do |message|
      message.text 'Convert the attached files to Markdown.'
      message.file(file, "image/png")
    end
  end
end

pdf_to_pngs(filename: "demo.pdf") { |file| png_to_markdown(file:) }

If everything worked, a stream of Markdown appears for each page of the PDF. To test, try using the "City of Vancouver Tourism Factsheet".
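
To keep the Markdown rather than print it, the stream can point at a file instead of $stdout, since the stream parameter is documented above as an IO. The following is a minimal sketch, not part of the original example. The helper pdf_to_markdown_file is hypothetical, and it takes the page iterator and the converter as callables so that it can be tried without MuPDF or an API key.

```ruby
# Hypothetical helper (not part of OmniAI): streams every page into one file.
# `each_page` is a callable that yields each page file to the block it is given.
# `convert` is a callable that writes one page's Markdown to `stream`.
def pdf_to_markdown_file(output:, each_page:, convert:)
  File.open(output, "w") do |out|
    each_page.call do |file|
      convert.call(file: file, stream: out)
      out.puts # blank line between pages
    end
  end
end

# With the functions from this guide, the call might look like:
#
#   pdf_to_markdown_file(
#     output: "demo.md",
#     each_page: ->(&block) { pdf_to_pngs(filename: "demo.pdf", &block) },
#     convert: method(:png_to_markdown)
#   )
```

Passing the two steps in as callables keeps the file handling separate from the conversion, so either step can be swapped for a stand-in while testing.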

This article originally appeared on https://workflow.ing/blog/articles/using-omniai-to-convert-pdfs-to-markdown-with-llms.