As businesses shift from experimenting with generative AI prototypes to deploying them in production, cost efficiency is becoming a priority. Running large language models (LLMs) can be expensive, prompting companies to explore strategies for reducing costs. Two such approaches are prompt caching and routing simpler queries to smaller, more affordable models. AWS introduced both features for its Bedrock LLM hosting service at its re:Invent conference in Las Vegas.
Caching to Reduce Costs and Latency
Caching avoids repeatedly processing the same or similar prompt content, significantly cutting costs. For instance, when multiple users ask questions about the same document, caching lets the model reuse the already-processed document rather than running it through the model again for every query, reducing expenses by up to 90%, according to AWS. It also speeds up response times, with AWS claiming latency reductions of up to 85%. Adobe, which tested caching with its generative AI applications on Bedrock, reported a 72% improvement in response time.
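For developers, using the feature could look something like the minimal Python sketch below, which asks two questions about the same document through Bedrock's Converse API. It assumes the cachePoint content block from AWS's prompt-caching documentation; the model ID, region, and document file are illustrative placeholders, not recommendations.

```python
import boto3

# Bedrock runtime client; region and model ID are illustrative assumptions
client = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

# Hypothetical document that many users will ask questions about
with open("annual_report.txt") as f:
    document_text = f.read()

def ask(question: str) -> str:
    """Ask a question about the shared document. The cache checkpoint marks
    the document prefix, so repeated calls can reuse its processed state
    instead of paying to reprocess it."""
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{
            "role": "user",
            "content": [
                {"text": document_text},
                {"cachePoint": {"type": "default"}},  # cache everything above
                {"text": question},
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]

# The first call pays full price to process the document; later questions
# about the same document hit the cached prefix.
print(ask("Summarize the key findings."))
print(ask("What risks does the report highlight?"))
```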
Intelligent Prompt Routing
The second feature, intelligent prompt routing, optimizes the balance between cost and performance by automatically directing prompts to the most suitable models within the same family. A smaller model may handle simpler queries, while more complex requests go to larger, more powerful models. This system uses a smaller language model to predict which model will best handle a given query, minimizing unnecessary expenses.
Atul Deo, AWS's director of product for Bedrock, explained, "For simple queries, there’s no need to use the most expensive, slowest model. Instead, the system identifies the appropriate model at runtime based on the incoming prompt." While similar technologies exist, AWS emphasizes its solution’s ability to route intelligently with minimal human intervention, though it currently only works within a single model family. The company plans to expand this capability in the future.
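From the caller's side, routing is meant to be transparent. The sketch below assumes a default prompt-router ARN can be passed in place of a model ID in the Converse API; the account ID, region, and trace field names are illustrative assumptions rather than confirmed details.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# A prompt router is invoked like a model: its ARN goes in the modelId field.
# This ARN is illustrative; real ones come from the Bedrock console or API.
ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"
)

response = client.converse(
    modelId=ROUTER_ARN,
    messages=[{"role": "user", "content": [{"text": "What is 2 + 2?"}]}],
)
print(response["output"]["message"]["content"][0]["text"])

# The routing trace reports which model in the family actually served the
# request (field names assumed); a simple prompt like this should land on
# the smaller, cheaper model.
trace = response.get("trace", {}).get("promptRouter", {})
print(trace.get("invokedModelId"))
```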
Bedrock Model Marketplace
AWS is also launching a marketplace for Bedrock, catering to the growing number of specialized models with smaller user bases. Unlike models hosted on the standard Bedrock service, these require customers to provision and manage their own infrastructure capacity, but the marketplace aims to meet demand for niche solutions. Initially, about 100 specialized models will be available, with more expected to follow.
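Based on AWS's description, a deployed marketplace model would then be addressed much like any other Bedrock model. The sketch below assumes the endpoint ARN backing a deployed marketplace model can stand in for a model ID in the Converse API; the ARN itself is a placeholder.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN for the endpoint behind a deployed marketplace model;
# unlike standard Bedrock models, the capacity behind it is provisioned
# and paid for by the customer.
ENDPOINT_ARN = (
    "arn:aws:sagemaker:us-east-1:123456789012:"
    "endpoint/my-specialized-model-endpoint"
)

response = client.converse(
    modelId=ENDPOINT_ARN,  # assumption: endpoint ARN accepted as modelId
    messages=[{
        "role": "user",
        "content": [{"text": "Rewrite this clause in plain English."}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```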
These advancements underline AWS's commitment to making generative AI more cost-effective and accessible while meeting diverse business needs.