Dredging Manuals with iRAG

Introduction

JCMI produces machines that come with user manuals dense in jargon. I remember reading through a few hundred pages and needing to consult a glossary frequently to contextualize the concepts.

Some JCMI buyers may have engineers on their teams who are comfortable in such environments; however, it is not reasonable to generalize this assumption to every client.

I found that the technical debt among certain clients was substantial and, at times, alarming. I recall a few instances where my supervisor was called and asked to clarify things I did not see the need to ask about; for example, connecting Ethernet switches and keeping light-sensitive components in the dark are covered in great detail within the manuals. If our clients drew on the contents of the manuals, my supervisor would save a great deal of his time at work.

After analyzing our manuals through the lens of undoing the curse of knowledge I seemed to have fallen prey to, I reached out to some clients who had built up a rich history of call logs with my supervisor. A simple semantic bridge between these individuals and the manuals seemed like a straightforward solution with lasting impact; this was my journey with iRAG at JCMI.

Challenges with Traditional RAG

A weighty component of RAG is the Embedding Function (EF). The EF's purpose is to encode text into numbers and store them in a data structure we call a vector. We can then perform Euclidean mathematics, such as distance calculations, on these vectors. The number of components in each vector defines the dimensionality of those operations.
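To make this concrete, here is a minimal sketch with toy numbers; in practice an embedding model would produce much higher-dimensional vectors, but the distance calculation is the same:

```python
import math

# Toy "embeddings": a real embedding model would produce these vectors,
# typically with hundreds or thousands of components, not three.
manual_chunk = [0.12, 0.85, 0.33]  # hypothetical 3-dimensional vector
user_query   = [0.10, 0.80, 0.40]

def euclidean_distance(a, b):
    # Dimensionality = number of components; both vectors must match.
    assert len(a) == len(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Smaller distance = the chunk is semantically closer to the query.
print(euclidean_distance(manual_chunk, user_query))
```

With real models, this comparison runs against every stored chunk, which is where the compute cost comes from.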

This becomes very complicated very fast. I prefer simplicity, but I wanted to emphasize dimensionality to clarify why discussions of AI so often point to heavy matrix multiplication and applied math.

The only problem is, we don't want to compute this much; it's expensive and doesn't yield a substantial return on investment (ROI).

Iterative RAG

In iterative RAG (iRAG), we remove the dependence on the EF. We use the LLM itself to iteratively sift through the data, saving EF-based compute. If we scale this idea, it results in tremendous savings.
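The idea can be sketched as a two-pass loop; note that the `irag_answer` function, the two-pass structure, and the section layout here are my own illustration under assumptions, not JCMI's actual implementation:

```python
# A hypothetical iRAG loop: the LLM, not an embedding function,
# narrows a manual down to relevant sections before answering.

def irag_answer(question, manual_sections, llm):
    """manual_sections: dict mapping section title -> section text.
    llm: callable that takes a prompt string and returns the model's reply."""
    # Pass 1: show the LLM only the section titles and ask which matter.
    titles = "\n".join(manual_sections)
    relevant = llm(
        f"Question: {question}\n"
        f"Which of these section titles are relevant?\n{titles}"
    )
    # Pass 2: send only the chosen sections' text, not the whole manual,
    # so no embeddings are computed and the context stays small.
    context = "\n\n".join(
        body for title, body in manual_sections.items() if title in relevant
    )
    return llm(f"Context:\n{context}\n\nAnswer the question: {question}")
```

Each pass is one LLM call; for deeper manuals, the same narrowing step can be repeated per chapter before answering.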

Clean, specific data is a core aspect of the iRAG approach. I explain this part in my video below:

  • link here

Processing Speeds

The efficiency of an EF depends on the processing unit it's executed on. iRAG, by contrast, depends on the LLM's response time. I explain this in the video below:

  • link here