utils.llms¶

The utils.llms module is designed to facilitate communication with Large Language Models (LLMs) such as OpenAI’s GPT models. It provides utility functions that abstract away the complexities of sending prompts to the model, receiving responses, and handling various edge cases like rate limits or excessive context lengths. This module is particularly useful for applications that require robust and efficient interaction with LLMs for generating text, parsing information, or conducting analysis based on model responses.

Features and Functionalities:

ask_split and quick_ask: Core functions for sending prompts to OpenAI’s API, with built-in
token counting, retries on failure, and verbose logging. ask_split is designed for more complex interactions involving system messages and user prompts, while quick_ask offers a streamlined approach for simple prompt-response cycles.
Token Counter Decorators: Decorators that wrap around the core functions to enforce token limits,
ensuring that each query adheres to predefined constraints to manage costs and API usage efficiently.
Custom Error Handling: Implements error handling mechanisms to gracefully recover from common
issues such as network timeouts, API rate limits, and context length exceedance. The ContextLengthExceededException specifically addresses scenarios where the prompt exceeds the maximum allowed context length, enabling the application to respond appropriately.
Memory Tracking: Utilizes the tracemalloc library to monitor memory allocation and identify
potential inefficiencies, which is critical for applications processing large volumes of text or managing numerous concurrent interactions with the LLM.

class celi_framework.utils.llms.ContextLengthExceededException(message="Context length exceeded the model's maximum limit.")

Bases: Exception

Exception raised when the input exceeds the model’s maximum context length.

class celi_framework.utils.llms.ToolDescription(**data)

Bases: BaseModel

description: str

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'description': FieldInfo(annotation=str, required=True), 'name': FieldInfo(annotation=str, required=True), 'parameters': FieldInfo(annotation=Dict[str, Any], required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

name: str

parameters: Dict[str, Any]

async celi_framework.utils.llms.anthropic_bedrock_chat_completion(**kwargs)¶

async celi_framework.utils.llms.anthropic_chat_completion(**kwargs)¶

async celi_framework.utils.llms.ask_split(user_prompt, system_message, model_name, max_tokens=0, seed=777, verbose=False, max_retries=7, wait_between_retries=2, temperature=0.0, timeout=120, tool_descriptions=None, model_url=None, json_mode=False, response_format=None, token_counter=None, force_tool_use=False)¶

Sends a prompt to the OpenAI API and returns the response, with retries on error.

user_prompt can be either a string or a list of messages.

celi_framework.utils.llms.assemble_chat_messages(prompt)¶: Takes a prompt and formats it as chat messages. Prompt can be in the form: * “” - str - A single prompt string * A list of (“role”,”content”) tuples * A list of dictionaries of chat messages {“role”:”assistant”,”content”:”…”}

async celi_framework.utils.llms.cached_chat_completion(token_counter=None, base_url=None, **kwargs)¶

async celi_framework.utils.llms.call_client(base_url, **kwargs)¶

async celi_framework.utils.llms.converse_bedrock_chat_completion(**kwargs)¶

async celi_framework.utils.llms.create_chat_completion_with_retry(base_url, **kwargs)¶

celi_framework.utils.llms.get_celi_llm_cache()¶

celi_framework.utils.llms.llm_response_from_chat_completion(resp)¶

async celi_framework.utils.llms.openai_chat_completion(base_url, **kwargs)¶

celi_framework.utils.llms.quick_ask(prompt, model_name, max_tokens=None, temperature=None, seed=777, verbose=False, json_output=False, response_format=None, max_retries=3, wait_between_retries=10, timeout=90, time_increase=30, model_url=None, token_counter=None)¶

async celi_framework.utils.llms.quick_ask_async(prompt, model_name, max_tokens=None, temperature=None, seed=777, verbose=False, json_output=False, response_format=None, max_retries=3, wait_between_retries=10, timeout=90, time_increase=30, model_url=None, token_counter=None)¶

Simplified version of ask_split for quickly sending prompts to the OpenAI API, with error handling and retries.

Parameters:

prompt (str) – The user’s prompt to send to the model.
model_name (str) – Name of the GPT model to use.
max_tokens (int, optional) – Maximum number of tokens to generate.
seed (int, optional) – Seed for random number generation in the model.
verbose (bool, optional) – If True, prints additional information.
json_output (bool, optional) – If True, response will be in JSON format.
max_retries (int, optional) – Maximum number of retries on error.
wait_between_retries (int, optional) – Time in seconds to wait between retries.

Returns:

The content of the response message or error message after all retries.

Return type:

str