How to Build Miva | Part 2: Balancing Multi-Agent System Collaboration with Business Realities

  • Writer: 謝昆霖 (Keanu)
  • 4 days ago
  • 9 min read

In our previous discussion, we explored the foundational architecture of Miva Lite and the philosophy behind the BookAI System Prompt. This installment dives deeper into more complex challenges: How do we handle reader inquiries that go beyond the specific book content? What does effective collaboration look like across multiple AI Agents? And, critically, how do we strike a balance between our technological aspirations and the practical demands of business?


▎When the Book Has No Answer (and the Reader Doesn't Know It): The "Boundary Problem" Solution


In real-world scenarios, readers frequently pose questions that naturally extend "beyond the boundaries of a book." A classic example is a reader misinterpreting a title, believing they're engaging with Carnegie's original work when it's actually a reinterpretation. Since Miva offers summaries rather than direct access to original texts, we've developed two primary solutions: proactively understanding and refining the reader's intent, and managing user expectations through carefully designed experience flows.


Our technical solution for proactively understanding and refining questions involves a specialized Agent: the Intent Clarification Agent. Its mission is clear: to truly grasp what readers want to know. It processes the reader's input, identifies their core intent, then enhances and reformulates the original question, adding detail, categorizing it, and preparing it for subsequent Agents.


<Intent_Clarification_Agent_Core>
- Primary Mission: Understand what readers REALLY want to know
- Prevent cognitive overload in answer-generation agents
- Handle ambiguous, brief, or misdirected queries
- Bridge the gap between user intent and book content boundaries
</Intent_Clarification_Agent_Core>

This Agent acts as a pre-filter, analyzing brief keywords or concepts before the main answering Agent takes over. This "pre-processing" mechanism allows the answering Agent to more precisely understand the question's scope, significantly reducing the likelihood of irrelevant or incorrect responses.
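To make the pre-filter contract concrete, here is a minimal sketch of the hand-off an Intent Clarification Agent might produce. This is an illustration only: a production agent would call an LLM for the analysis, and the `ClarifiedIntent` structure and `clarify_intent` helper are hypothetical names, not the team's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class ClarifiedIntent:
    """Enriched query handed to downstream answering agents."""
    original_query: str
    refined_query: str
    category: str          # e.g. "in_book", "boundary"
    keywords: list = field(default_factory=list)

def clarify_intent(query: str, book_topics: set) -> ClarifiedIntent:
    """Toy pre-filter: expand a terse query and flag boundary cases.

    Uses simple keyword overlap to stand in for the LLM-based analysis;
    the point is the shape of the output, not the matching logic.
    """
    tokens = [t.strip("?.,!").lower() for t in query.split()]
    hits = [t for t in tokens if t in book_topics]
    category = "in_book" if hits else "boundary"
    refined = (f"Reader asks about: {query!r}. "
               f"Matched book topics: {hits if hits else 'none'}.")
    return ClarifiedIntent(query, refined, category, hits)

intent = clarify_intent("habits?", {"habits", "identity", "systems"})
# intent.category == "in_book"; intent.keywords == ["habits"]
```

The downstream answering Agent then receives a categorized, reformulated question instead of a bare keyword, which is what reduces irrelevant responses.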


LLM as UX: Leveraging LLMs for Expectation Management and Guiding Readers to "Perceive Book Information Boundaries"


"LLM as UX" is a guiding principle within the BookAI team: utilizing LLMs to enhance the user experience, whether through dynamically generated UI or direct textual guidance. This led us to design a strategy for helping readers "perceive the book's boundaries." Here are the instructions employed in Miva Pro:


Miva Pro's Handling Method:


<Handling_Insufficient_Excerpts_By_Tier>
- When book_excerpts are insufficient, provide what you can and state the limitation.
- CRITICAL: You MUST clearly transition with phrases like:
  "Based on excerpt [1], I can explain [aspect A]. However, this doesn't cover all of [topic B]. 
  To provide a complete picture, I will supplement with my own knowledge..."
- CRITICAL: You MUST clearly mark with a disclaimer:
  [!! NOTE: The following details extend beyond the book excerpts I found !!]
</Handling_Insufficient_Excerpts_By_Tier>

These instructions prioritize faithful representation of the book's content. At the same time, Miva is empowered to clearly indicate the limits of the book's information and offer valuable supplementary knowledge when appropriate.
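One way to picture how these instructions shape the final reply is a small assembly function that keeps excerpt-grounded content and supplementary knowledge visibly separate. This is a sketch of the output contract implied by the prompt above, not the team's implementation; `compose_answer` is a hypothetical helper.

```python
from typing import Optional

DISCLAIMER = ("[!! NOTE: The following details extend beyond "
              "the book excerpts I found !!]")

def compose_answer(excerpt_answer: str, supplement: Optional[str],
                   covered_fully: bool) -> str:
    """Assemble a reply that keeps book content and model knowledge separate.

    Mirrors the Miva Pro rules: lead with what the excerpts support, and
    mark anything beyond them with an explicit disclaimer line.
    """
    if covered_fully or not supplement:
        return excerpt_answer
    return "\n".join([
        excerpt_answer,
        "To provide a complete picture, I will supplement with my own knowledge...",
        DISCLAIMER,
        supplement,
    ])
```

When the excerpts cover the question, the supplement and disclaimer never appear; the reader only sees the boundary marker when the model actually crosses it.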


▎Paywall: A Tiered Approach to Unleashing Creativity


Monetizing AI systems is consistently a challenging endeavor.

When considering a seemingly straightforward service like "book summarization," how do we differentiate value and provide compelling reasons for users to pay? Our approach was to view this from the service industry's perspective.


Service industries sell the "service" itself, generating value from the very first interaction, though not necessarily charging immediately. Similar to how coffee shops offer samples or gyms provide trial classes, many services begin with a lighter experience. Once a customer's need reaches a certain level, the willingness to pay naturally follows. Applying this logic, we conceived a tiered "paid book summarization service" with varying levels of intensity.


This process involved many adjustable parameters: the depth of answers, breadth of analysis, extent of creative extensions, and ability to compare across multiple books. However, on the technical side of controlling the LLM, we discovered that constraining the LLM's inherent creativity offered an excellent starting point.


In our service design, the question of "whether to allow the LLM to provide information beyond the book's content" has always been a significant philosophical and technical dilemma. On one hand, we desired the LLM to be rigorously factual—"what's there, is there; what's not, is not"—remaining entirely faithful to the original text, though this could lead to a rather unengaging service. On the other hand, we wanted the LLM to offer relevant supplementary information and extensions, acknowledging that all books have knowledge boundaries, and readers' curiosity frequently transcends these limits.


Therefore, defining "creative tasks" became paramount. We needed to establish a clear consensus with both "readers" and "book providers": under what specific circumstances would we unleash the LLM's creative capabilities? This was the opportune moment to implement a subscription tiering mechanism.


Implementing Tiered "Book Summarization Service Strength"


We designed the following tiered logic: The free service focuses on introducing known content, akin to a basic recommendation from a bookstore assistant. The paid service, when faced with creative tasks, can appropriately expand its generative capabilities, much like the in-depth analysis a private reading consultant might offer. Miva Lite (Free) remains strictly focused on the book content and performs no creative work:


<Creative_Task_Handling_By_Tier>
- Creative tasks are out of scope for Miva Lite.
- Example: "Miva Lite introduces books briefly. 
  For creative explorations, Miva Pro offers more capabilities."
</Creative_Task_Handling_By_Tier>

Miva Pro (Paid) is permitted to expand its imagination and generate output grounded in the book content it finds:


<Creative_Task_Handling_By_Tier>
- You are good at handling creative tasks.
- "Creative Task" definition: Any request to generate new, structured artifact outline
- Using the book_excerpts and your knowledge to create detailed and insightful pieces.
</Creative_Task_Handling_By_Tier>


The underlying philosophy of this differentiated service design is straightforward: only paying users benefit from the LLM's full creative potential.


Free users receive a concise, conservative, yet reliable introduction to the book, primarily for discovery. Paying users enjoy richer creative extensions and deeper analytical insights. By strategically imposing technical limitations, we crafted a deliberate business model.
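Mechanically, this tier gating can happen at prompt-build time: the same base prompt gets a different creative-task module depending on the subscription tier. In this sketch the module texts are abridged from the Lite and Pro instructions quoted above, and `build_system_prompt` is a hypothetical helper, not the production code.

```python
# Tier -> instruction-module mapping; module bodies abridged from the
# Creative_Task_Handling_By_Tier blocks shown earlier.
CREATIVE_TASK_MODULES = {
    "lite": (
        "<Creative_Task_Handling_By_Tier>\n"
        "- Creative tasks are out of scope for Miva Lite.\n"
        "</Creative_Task_Handling_By_Tier>"
    ),
    "pro": (
        "<Creative_Task_Handling_By_Tier>\n"
        "- You are good at handling creative tasks.\n"
        "</Creative_Task_Handling_By_Tier>"
    ),
}

def build_system_prompt(base_prompt: str, tier: str) -> str:
    """Append the tier-specific creative-task module to the base prompt."""
    if tier not in CREATIVE_TASK_MODULES:
        raise ValueError(f"unknown tier: {tier}")
    return base_prompt + "\n\n" + CREATIVE_TASK_MODULES[tier]
```

Keeping the tier difference in a swappable module means the paywall boundary lives in one place rather than being scattered through every agent's prompt.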


We invested significant time in creating a clear "Creative Task Reference Table," meticulously defining the extent of creative freedom permissible under various circumstances. For instance, conceptual extensions based on book content are permitted, and cross-book comparative analysis is a paid feature. However, creating viewpoints not found in the book or making inferences beyond the author's arguments is strictly forbidden.
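A reference table like this naturally becomes a default-deny policy lookup. The sketch below shows only the shape of such a check; the task categories are the examples named in the text, not the full internal table, and `is_allowed` is a hypothetical function.

```python
# Illustrative slice of a "Creative Task Reference Table". Conceptual
# extensions and cross-book comparison are paid features; inventing
# viewpoints or inferring beyond the author is forbidden at every tier.
CREATIVE_TASK_POLICY = {
    "conceptual_extension":    {"lite": False, "pro": True},
    "cross_book_comparison":   {"lite": False, "pro": True},
    "invented_viewpoints":     {"lite": False, "pro": False},
    "beyond_author_inference": {"lite": False, "pro": False},
}

def is_allowed(task: str, tier: str) -> bool:
    """Default-deny lookup: unknown task types are treated as forbidden."""
    return CREATIVE_TASK_POLICY.get(task, {}).get(tier, False)
```

The default-deny choice matters: any creative request not explicitly classified in the table is refused, which is the safe failure mode for a copyright-sensitive service.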


▎Multi-Agent System: Collaboration Challenges and Optimization


Our Multi-Agent System (MAS) architecture has progressively evolved, from 2 Agents in Q2 2024 to 4 Agents by Q4. Each expansion introduced new complexities. The most challenging aspect of MAS design isn't individual Agent development, but rather the definition of global variables across Agents. Scaling such an architecture primarily involves managing how multiple independent Agents communicate effectively and access a shared database, thereby reducing future maintenance complexity. Our solution is an API-oriented composite architecture design:


[Figure: Miva MAS Architecture Design Philosophy]

Here, each Agent operates as an "independent workstation," linked through a shared database that acts as their central resource hub. This design philosophy directly draws inspiration from the Toyota Production System's concept of workstation specialization.
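The "independent workstations around a shared resource hub" idea can be sketched as a pipeline in which every agent implements the same small interface and exchanges state only through the shared store. The agent bodies below are toy stand-ins, and `SharedStore`/`run_pipeline` are illustrative names, not the actual system.

```python
from typing import Protocol

class SharedStore(dict):
    """Stand-in for the shared database every agent reads and writes."""

class Agent(Protocol):
    name: str
    def process(self, store: "SharedStore") -> None: ...

class IntentClarificationAgent:
    name = "intent_clarification"
    def process(self, store: SharedStore) -> None:
        # Toy refinement: a real agent would call an LLM here.
        store["refined_query"] = store["raw_query"].strip().lower()

class AnswerGenerationAgent:
    name = "answer_generation"
    def process(self, store: SharedStore) -> None:
        store["answer"] = f"Answering: {store['refined_query']}"

def run_pipeline(agents, store: SharedStore) -> SharedStore:
    """Each 'workstation' runs in turn, exchanging state via the store."""
    for agent in agents:
        agent.process(store)
    return store
```

Because each agent only touches named keys in the store, a workstation can be rewritten, retuned, or swapped for a different model without the others noticing, which is exactly the maintenance property the architecture is after.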


From Toyota Production Line Management to Agentic Workflow Management: Modern Operations Management in Practice


When tackling complex Multi-Agent Systems, we unexpectedly found that manufacturing operations management provided an excellent framework for solutions. Each Agent mirrors a specialized workstation on a production line, complete with defined inputs, processing steps, and output standards.

The core principle of Workstation Specialization dictates that instead of one versatile Agent handling all tasks, each Agent should specialize in a specific function. The Intent Clarification Agent focuses on understanding user intent, the Answer Generation Agent on crafting responses, and the Memory Management Agent on overseeing conversational memory. This division of labor allows each workstation to achieve optimal performance and facilitates independent improvements.


Modular Design Theory breaks down complex systems into standardized modules that can be developed and replaced independently. We apply this to manage our System Prompts and Instructions. Like standardized parts in manufacturing, each instruction module has a clear definition, allowing for independent development, testing, and replacement. If an Agent requires adjustment, we only need to modify its corresponding module, without compromising the stability of the entire system.
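Treating instruction modules as standardized parts suggests a registry from which each agent's prompt is assembled. The sketch below illustrates that idea only: the module names echo blocks quoted elsewhere in this article, but their contents are abbreviated placeholders and `assemble_prompt` is a hypothetical helper.

```python
# Hypothetical registry of instruction modules, managed like standardized
# parts: each defined once, then developed, tested, and replaced
# independently. Bodies are placeholders, not the real prompt files.
INSTRUCTION_MODULES = {
    "excerpt_handling": "<Handling_Insufficient_Excerpts_By_Tier>...</Handling_Insufficient_Excerpts_By_Tier>",
    "creative_tasks": "<Creative_Task_Handling_By_Tier>...</Creative_Task_Handling_By_Tier>",
    "language_rules": "<taiwanese_mandarin_usage_rules>...</taiwanese_mandarin_usage_rules>",
}

def assemble_prompt(module_names) -> str:
    """Build one agent's system prompt from its declared module list."""
    missing = [m for m in module_names if m not in INSTRUCTION_MODULES]
    if missing:
        raise KeyError(f"undefined modules: {missing}")
    return "\n\n".join(INSTRUCTION_MODULES[m] for m in module_names)
```

Adjusting one agent then means editing one registry entry; every other agent's prompt is rebuilt from unchanged parts.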

A system's overall performance is often limited by its weakest link. The Theory of Constraints (TOC) emphasizes focusing on the MAS bottleneck to achieve effective improvements. Monitoring each Agent's latency is analogous to tracking the cycle time and quality of each workstation on a production line. This enables targeted enhancements at every stage: adjusting prompts, switching LLMs, or altering workflows, all while optimizing equipment configurations for diverse task requirements. Just as workstations can be equipped with varying machine specifications, Agents requiring precise understanding might use more powerful models, while those handling simpler tasks could utilize more cost-effective ones. Through this concept of dedicated equipment, we balance performance, cost, efficiency, and customer satisfaction.
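The TOC-style monitoring described above amounts to two small pieces: per-agent latency recording (the analogue of workstation cycle time) and a per-agent model assignment table (the analogue of machine specifications). The sketch below is illustrative; the model names and `timed`/`bottleneck` helpers are assumptions, not the production tooling.

```python
import time
from contextlib import contextmanager

# Hypothetical per-agent model assignments: a heavier model where precise
# understanding matters, a cheaper one for simpler tasks.
AGENT_MODELS = {
    "intent_clarification": "large-model",
    "answer_generation": "large-model",
    "memory_management": "small-model",
}

latencies = {}  # agent name -> list of observed latencies in seconds

@contextmanager
def timed(agent_name: str):
    """Record one latency sample for an agent run."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies.setdefault(agent_name, []).append(time.perf_counter() - start)

def bottleneck() -> str:
    """The agent with the highest average latency is the constraint to fix first."""
    return max(latencies, key=lambda a: sum(latencies[a]) / len(latencies[a]))
```

Once the bottleneck agent is identified, the targeted fixes named above apply in order of cost: adjust its prompt, switch its entry in `AGENT_MODELS`, or restructure the workflow around it.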


The most crucial aspect of operations management is establishing robust evaluation metrics, exemplified by the Six Sigma DMAIC process improvement method. This data-driven, five-step cycle (Define, Measure, Analyze, Improve, Control) is used to enhance, optimize, and stabilize business processes and designs. By integrating front-end GTM data with back-end semantic analysis, we've built a continuous improvement mechanism. This involves in-depth analysis of de-identified data to construct a quality evaluation matrix, primarily assessing situational understanding, the quality and stability of each process link, and critically, listening to the Voice of the Customer (VOC).


The greatest value of adopting an operations management mindset is our systematic approach to managing and evaluating complex Agentic Flows. This isn't driven by intuition or trial-and-error, but by theory and analytical support for every improvement, ensuring that each adjustment's impact can be quantitatively assessed.


▎AI Product Philosophy Respecting Originality: Validating the Business Model Promptly


While "accurately citing original works" sounds paramount, most users aren't actively concerned with whether we've quoted a specific paragraph from a particular book. What truly matters to readers is whether their problem can be effectively solved. Our core philosophy dictates that empowering readers to easily purchase books is the most authentic expression of respect for authors and publishers.


Our case studies drew inspiration from the Google Books experience. Initially, Google Book Search, a free service offering full-text search, was perceived as indirectly infringing copyrights and publishing rights, leading to lawsuits from the Association of American Publishers and the Authors Guild. The pivotal moment for settlement came when Google adjusted its strategy, displaying only partial search snippets and including purchase links. This taught us that "consumption is the most direct support for originality." In AI book summarization services, merely providing technology doesn't guarantee a viable business model; the core lies in offering services while respecting copyright and naturally guiding readers towards consumption.


Since then, UX design and lead generation have been central to our commitment to authors: they ensure users naturally express respect for original content while aligning with our business position as a provider of knowledge assets.


Since the invention of the Gutenberg printing press, the publishing industry has matured into a traditional, copyright-centric sector. Introducing advanced AI technology and building entirely new business models presents a significant challenge. For an industry deeply concerned with copyright, earning the trust of authors and publishers proves a higher hurdle than technical prowess alone. Our strategy: first, construct a clear, concrete Business Model Proof of Concept (POC) to ascertain the true value perceived by the market, authors, and publishers.


Here, lean validation of the business model is our guiding operational principle. After assessing technical feasibility, we adopted a communication strategy centered on "proposal ideation," proactively engaging in dialogue with various stakeholders. This interactive approach allowed us to deeply understand their genuine needs, potential concerns, and unarticulated expectations. Often, concrete proposals effectively spark deeper discussions, helping both parties jointly define the boundaries of collaboration and the direction of potential development.


With the business model validated upfront, technical implementation and architectural design can be achieved and balanced more pragmatically:

  • Feature Prioritization: Identifying the truly critical features for the business model that must be implemented first.

  • Iterative User Experience Improvement: Gradually enhancing the user experience based on real feedback, rather than striving for a perfect solution from the outset.


▎Local Alignment with Books and Language: The Localization Challenge for Silicon Valley LLMs


During development, our greatest technical hurdle was the generally poor performance of LLMs trained by Silicon Valley companies when handling "local languages." Adhering to our commitment "not to use book content to train AI," we developed the BookAI Operator Legend system, leveraging In-Context Learning. This system uses the LLM's pattern-recognition capabilities to enforce a strict language control mechanism:


<taiwanese_mandarin_usage_rules>
§§§ !!!:"Data Quality"~:="資料品質";
"Project"~:="專案";"Item"~:="項目";"Row"~:="列";
"Column"~:="OPTIONS=行|欄";"Code"~:="程式碼" §§§
</taiwanese_mandarin_usage_rules>

The BookAI Operator Legend system offers several benefits. Technically, it requires no additional model training or fine-tuning. By employing specially designed symbol combinations, the system cleverly utilizes the Transformer model's internal attention mechanism, enabling the LLM to learn highly controlled language output rules.


Furthermore, considering the increasing size of modern LLMs' context windows and continuously improving sampling capabilities, the BookAI Operator Legend system demonstrates remarkable flexibility in practical applications. This means we can readily expand and update the necessary localized language mapping tables to meet evolving application scenarios.
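That expandability can be pictured as a plain term table serialized into the legend's rule syntax at prompt-build time. This is a reading of the mechanism, not the team's tooling: the term map below reuses entries from the block quoted above, and `legend_rules`/`add_terms` are hypothetical helpers.

```python
# Subset of the English -> Taiwanese Mandarin term table from the
# legend block above (context-dependent entries like "Column" omitted).
TERM_MAP = {
    "Data Quality": "資料品質",
    "Project": "專案",
    "Item": "項目",
    "Row": "列",
    "Code": "程式碼",
}

def legend_rules(term_map: dict) -> str:
    """Serialize the table into the legend's rule syntax for the prompt."""
    pairs = ";".join(f'"{en}"~:="{zh}"' for en, zh in term_map.items())
    return f"§§§ {pairs} §§§"

def add_terms(term_map: dict, new_terms: dict) -> dict:
    """Extending the mapping is a data update, not a model change."""
    merged = dict(term_map)
    merged.update(new_terms)
    return merged
```

Because the rules are regenerated from data, growing the mapping table for a new application scenario never touches the model: no training, no fine-tuning, just a larger in-context legend.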


In our design philosophy, aligning AI services with local book content and local language conventions, and striving to remain as true as possible to the author's original perspective, is an immensely challenging yet critically important task. This few-shot in-context learning method, specifically designed for dynamic local language alignment, has been fully implemented in our Miva app. We hope it will prompt the publishing industry and other enterprises to deeply consider the potential of integrating our API products: books are protected knowledge assets, and in the future, any "protected high-value knowledge assets" will fall within our service scope.


Miva is just one complete application-type product from the BookAI team. We also offer enterprise-facing API products like Coeus, Heka, and Osmi, all built upon the same core technical architecture. Through rigorous system architecture, precise Prompt engineering, and profound humanistic consideration, we aspire to be a long-term partner in the knowledge market, adept at transforming static documents and books into dynamic, marketable knowledge services.


▎Staying True to Our Original Intent in the Rapidly Evolving AI Landscape


From starting with two Agents to now a full Multi-Agent System architecture, and evolving from simple book Q&A to complex knowledge services, Miva's development journey highlights our central question: How can AI technology genuinely serve the dissemination of knowledge? There is no single answer; it demands finding a dynamic, win-win balance among human-centric care, business success, technological innovation, and social responsibility.


For us, establishing a clear framework as our starting point has proven invaluable. By centering on humanistic service, continuously attending to the interests of every stakeholder, and leveraging AI technology to achieve equilibrium, we maintain our guiding principles.


What do you believe is the most crucial ability for an AI book summarizer? Is it faithfully conveying the spirit of the original work, or providing personalized insights and extensions?


The next installment will delve further into practical case studies and "behind-the-scenes challenges" from the development process, explore how user behavior analysis guides product optimization, and share our ultimate vision for the future of AI reading services.


(End of Part 2)



 
 
 

© 2024 by BookAI.
