utils.utils
The utils.utils module collects utility functions and classes for Python applications. It covers colorized logging, JSON and text file manipulation, dynamic content generation, and file system helpers.
- Features:
- Colorized Logging: Utilizes ANSI escape codes to colorize log messages for enhanced readability. Includes a custom
logging handler ColorizingStreamHandler that integrates with MongoDB for advanced logging purposes.
- JSON File Manipulation: Provides functions load_json, save_json for loading and saving JSON data, streamlining
data handling and storage.
- Text Processing: Offers utilities for reading text files (read_txt), manipulating JSON objects (remove_newlines_from_json,
is_json_cleaned_of_newline), and more, facilitating the processing and analysis of textual data.
- Dynamic Content Generation: Contains methods for generating hash IDs (generate_hash_id), creating and manipulating
data structures (shuffle_json_ordering, transform_dict_to_flat_schema), and comparing JSON files (compare_json_files, detailed_compare_json_files), aiding in the creation and management of dynamic content.
- File and Directory Operations: Includes functions for working with files and directories (find_latest_file,
make_list_of_dirs), enabling efficient file system navigation and organization.
- Miscellaneous Utilities: Offers a variety of additional tools, such as a decorator for measuring function execution
time (time_it), and methods for modifying text and data structures to meet specific criteria (remove_text_chunk, dequeue_all_matching).
Usage: Import this module in Python applications that need colorized logging, JSON and text file handling, or dynamic content generation. The functions are independent of one another, so they can be adopted individually without refactoring existing code.
- class celi_framework.utils.utils.UnrecoverableException
Bases:
BaseException
- celi_framework.utils.utils.add_parser_model(model_name, dataclass)¶
Decorator that adds a ‘parser_model’ attribute to the result of a function.
- Parameters:
model_name (str) – The name of the parser model to add.
dataclass – The dataclass to which the ‘parser_model’ attribute is added.
- Returns:
The decorated function.
- Return type:
callable
- celi_framework.utils.utils.are_jsons_identical(json1, json2)¶
Compares two JSON objects to check if they are identical.
- Parameters:
json1 (dict) – First JSON object.
json2 (dict) – Second JSON object.
- Returns:
True if the JSON objects are identical, False otherwise.
- Return type:
bool
- celi_framework.utils.utils.change_filename_in_path(original_path, new_filename)¶
Changes the filename in a file path to a new filename.
- Parameters:
original_path (str) – The original file path.
new_filename (str) – The new filename to replace the original filename in the path.
- Returns:
The new file path with the original directory path and the new filename.
- Return type:
str
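The path handling described above can be approximated with the standard library. This is a minimal sketch, not the module's own implementation; the name change_filename_in_path_sketch and the sample path are made up for illustration.

```python
import os


def change_filename_in_path_sketch(original_path: str, new_filename: str) -> str:
    """Hypothetical stand-in: keep the directory part, swap the last component."""
    directory = os.path.dirname(original_path)
    return os.path.join(directory, new_filename)


# Build the sample path with os.path.join so the demo works on any platform.
sample_path = os.path.join("reports", "2024", "draft.json")
new_path = change_filename_in_path_sketch(sample_path, "final.json")
print(new_path)
```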
- celi_framework.utils.utils.check_last_line(input_string, string_to_check='[END]')¶
Checks whether the last line of the input string matches string_to_check.
- Parameters:
input_string (str) – The string to be checked.
string_to_check (str) – The text expected on the last line. Defaults to ‘[END]’.
- Returns:
True if the last line matches string_to_check, False otherwise.
- Return type:
bool
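A last-line check of this kind might look like the sketch below. It assumes an exact match against string_to_check after trimming surrounding whitespace; whether the real function compares exactly or by substring is not stated here, and check_last_line_sketch is a hypothetical name.

```python
def check_last_line_sketch(input_string: str, string_to_check: str = "[END]") -> bool:
    """Hypothetical stand-in: True when the final non-blank line equals the marker."""
    # Trim trailing blank lines first so a final newline does not hide the marker.
    lines = input_string.strip().splitlines()
    return bool(lines) and lines[-1].strip() == string_to_check


ends_with_marker = check_last_line_sketch("Section drafted.\n[END]\n")
no_marker = check_last_line_sketch("Section drafted.\nStill working")
empty_input = check_last_line_sketch("")
```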
- celi_framework.utils.utils.compare_json_files(file_path1, file_path2)¶
Compares two JSON files for equality, considering both structure and values.
- Parameters:
file_path1 (str) – Path to the first JSON file.
file_path2 (str) – Path to the second JSON file.
- Returns:
True if the files are equal, False otherwise.
- Return type:
bool
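One plausible way to compare two JSON files for structure and values is to parse both and compare the resulting Python objects. The sketch below assumes that approach; compare_json_files_sketch and the sample files are hypothetical.

```python
import json
import os
import tempfile


def compare_json_files_sketch(file_path1: str, file_path2: str) -> bool:
    """Hypothetical stand-in: parse both files and compare the parsed objects."""
    with open(file_path1, encoding="utf-8") as f1, open(file_path2, encoding="utf-8") as f2:
        return json.load(f1) == json.load(f2)


with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "a.json": '{"id": 1, "tags": ["x", "y"]}',
        "b.json": '{"tags": ["x", "y"], "id": 1}',  # same data, different key order
        "c.json": '{"id": 2, "tags": ["x", "y"]}',  # different value
    }
    paths = {}
    for name, text in samples.items():
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "w", encoding="utf-8") as handle:
            handle.write(text)
    same = compare_json_files_sketch(paths["a.json"], paths["b.json"])
    different = compare_json_files_sketch(paths["a.json"], paths["c.json"])
```

With this approach, key order inside objects does not affect the result, while the order of list elements does.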
- celi_framework.utils.utils.create_new_timestamp(ms=False)¶
Generates a new timestamp string.
- Parameters:
ms (bool) – Whether to include milliseconds in the timestamp. Defaults to False.
- Returns:
The generated timestamp string.
- Return type:
str
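The exact timestamp format is not documented here. The sketch below assumes a sortable year-to-second layout with an optional millisecond suffix; both the format and the name create_new_timestamp_sketch are assumptions.

```python
from datetime import datetime


def create_new_timestamp_sketch(ms: bool = False) -> str:
    """Hypothetical stand-in: sortable timestamp, optionally with milliseconds."""
    if ms:
        # %f yields six microsecond digits; dropping the last three leaves milliseconds.
        return datetime.now().strftime("%Y-%m-%d_%H-%M-%S-%f")[:-3]
    return datetime.now().strftime("%Y-%m-%d_%H-%M-%S")


plain_stamp = create_new_timestamp_sketch()
ms_stamp = create_new_timestamp_sketch(ms=True)
print(plain_stamp, ms_stamp)
```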
- celi_framework.utils.utils.dequeue_all_matching(update_queue, match_type, match_value)¶
Dequeues all items from the queue that match a given value.
- Parameters:
update_queue – The queue to process, expected to be an instance of queue.Queue.
match_type (str) – The type of message to match.
match_value (bool) – The value to match.
- Returns:
The count of dequeued items matching the criteria. A new queue object with non-matching items preserved.
- Return type:
int
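Draining a queue.Queue while keeping the non-matching items can be sketched as follows. Several details are assumptions rather than documented behavior: queued items are taken to be dicts, an item matches when item[match_type] equals match_value, and the result is returned as a (count, new_queue) tuple. The name dequeue_all_matching_sketch is hypothetical.

```python
import queue


def dequeue_all_matching_sketch(update_queue, match_type, match_value):
    """Hypothetical stand-in: drop matching items, keep the rest in a new queue."""
    kept = queue.Queue()
    removed = 0
    while True:
        try:
            item = update_queue.get_nowait()
        except queue.Empty:
            break  # The source queue is fully drained.
        if isinstance(item, dict) and item.get(match_type) == match_value:
            removed += 1
        else:
            kept.put(item)
    return removed, kept


updates = queue.Queue()
for message in ({"refresh": True}, {"refresh": False}, {"status": "ok"}, {"refresh": True}):
    updates.put(message)

removed_count, remaining = dequeue_all_matching_sketch(updates, "refresh", True)
```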
- celi_framework.utils.utils.detailed_compare_json_files(file_path1, file_path2, path)¶
Performs a detailed comparison of two JSON files, reporting differences.
- Parameters:
file_path1 (str) – Path to the first JSON file.
file_path2 (str) – Path to the second JSON file.
path (str) – The base path for reporting differences.
- Returns:
None
- celi_framework.utils.utils.encode_class_type(cls)¶
- celi_framework.utils.utils.filter_empty_sections(toc_dict, content_dict)¶
Filters out sections from the table of contents dictionary (toc_dict) when the corresponding content in content_dict is empty or only contains newline characters.
- Parameters:
toc_dict (dict) – Table of contents dictionary.
content_dict (dict) – Content dictionary.
- Returns:
Filtered table of contents dictionary.
- Return type:
dict
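The filtering rule can be sketched with a dict comprehension. The sketch assumes both dictionaries are keyed by section number, that content values are strings, and that a section missing from content_dict counts as empty; filter_empty_sections_sketch is a hypothetical name.

```python
def filter_empty_sections_sketch(toc_dict: dict, content_dict: dict) -> dict:
    """Hypothetical stand-in: keep TOC entries whose content is more than newlines."""
    return {
        section: heading
        for section, heading in toc_dict.items()
        # Stripping newlines leaves an empty (falsy) string for blank sections.
        if content_dict.get(section, "").strip("\n")
    }


toc = {"1": "Introduction", "2": "Methods", "3": "Results"}
content = {"1": "Some text.", "2": "\n\n", "3": ""}
filtered = filter_empty_sections_sketch(toc, content)
```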
- celi_framework.utils.utils.find_latest_file(directory, pattern)¶
Find the latest file in a directory matching a given pattern.
- Parameters:
directory (str) – The path to the directory to search in.
pattern (str) – The pattern to match the filenames against.
- Returns:
Filepath of the latest file matching the pattern, or None if no file is found.
- Return type:
Optional[str]
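A common way to find the latest matching file is to glob the directory and pick the entry with the newest modification time. The sketch below assumes a glob-style pattern and "latest" meaning most recently modified; neither is confirmed by the description above, and find_latest_file_sketch is a hypothetical name.

```python
import glob
import os
import tempfile


def find_latest_file_sketch(directory: str, pattern: str):
    """Hypothetical stand-in: newest file (by mtime) matching a glob pattern."""
    candidates = glob.glob(os.path.join(directory, pattern))
    return max(candidates, key=os.path.getmtime) if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    # Create two files and set their modification times explicitly.
    for name, mtime in (("run_1.json", 1_000), ("run_2.json", 2_000)):
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("{}")
        os.utime(path, (mtime, mtime))
    latest_name = os.path.basename(find_latest_file_sketch(tmp, "run_*.json"))
    missing = find_latest_file_sketch(tmp, "*.csv")
```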
- celi_framework.utils.utils.format_toc(toc_dict)¶
Formats a table of contents dictionary into a string representation.
- Parameters:
toc_dict (dict) – The table of contents dictionary.
- Returns:
A formatted string representation of the table of contents.
- Return type:
str
- celi_framework.utils.utils.generate_hash_id(obj)¶
Generates a unique template ID based on the hash of a stringable object.
- Parameters:
obj (stringable type) – An object that can be turned into a string to hash.
- Returns:
A unique template ID.
- Return type:
str
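Hashing the string form of an object can be sketched with hashlib. The choice of SHA-256 and the hex digest output are assumptions, since the hash algorithm is not named here; generate_hash_id_sketch is a hypothetical name.

```python
import hashlib


def generate_hash_id_sketch(obj) -> str:
    """Hypothetical stand-in: SHA-256 hex digest of the object's string form."""
    return hashlib.sha256(str(obj).encode("utf-8")).hexdigest()


id_a = generate_hash_id_sketch({"template": "report", "version": 1})
id_b = generate_hash_id_sketch({"template": "report", "version": 1})
id_c = generate_hash_id_sketch({"template": "report", "version": 2})
```

Because the ID depends on str(obj), two dictionaries with the same items in a different insertion order would produce different IDs under this approach.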
- celi_framework.utils.utils.generate_prompt_and_completion_id(system_message, ongoing_chat, prompt_completion=None, timestamp=None)¶
Generates a unique hash ID for the document based on the system message, ongoing chat, prompt completion, and timestamp.
- Parameters:
system_message (str) – The system message content.
ongoing_chat – The user message content of the ongoing chat.
prompt_completion (str) – The prompt completion content.
timestamp (str) – The exact timestamp up to milliseconds.
- Returns:
A unique hash ID for the document.
- Return type:
str
- celi_framework.utils.utils.generate_task_specific_id(document_type, section_number, task)¶
Generates a unique hash ID based on the document type, section number, and task.
- Parameters:
document_type (str) – The document type, which identifies the master template.
section_number (str) – The current section number being processed.
task (str) – The current task being processed.
- Returns:
A unique hash ID for identifying a specific task within a document.
- Return type:
str
- celi_framework.utils.utils.get_cache_dir()¶
Returns the cache directory. Uses the standard XDG conventions of ~/.cache/celi unless XDG_CACHE_HOME is set.
Ensures that the cache directory is created.
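The lookup described above can be sketched as follows. The sketch assumes that when XDG_CACHE_HOME is set, the cache lives in a celi subdirectory of it; get_cache_dir_sketch is a hypothetical name, and the demo points XDG_CACHE_HOME at a temporary directory so nothing is created under the real home directory.

```python
import os
import tempfile
from pathlib import Path


def get_cache_dir_sketch() -> Path:
    """Hypothetical stand-in: $XDG_CACHE_HOME/celi, else ~/.cache/celi, created on demand."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    cache_dir = Path(base) / "celi"
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


previous = os.environ.get("XDG_CACHE_HOME")
with tempfile.TemporaryDirectory() as tmp:
    os.environ["XDG_CACHE_HOME"] = tmp
    try:
        cache_dir = get_cache_dir_sketch()
        created = cache_dir.is_dir()
        under_override = cache_dir == Path(tmp) / "celi"
    finally:
        # Restore the environment exactly as it was before the demo.
        if previous is None:
            del os.environ["XDG_CACHE_HOME"]
        else:
            os.environ["XDG_CACHE_HOME"] = previous
```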
- celi_framework.utils.utils.get_most_recent_file(directory)¶
- celi_framework.utils.utils.get_obj_by_name(name)¶
- celi_framework.utils.utils.get_parent_section(section_number)¶
Returns the parent section number of a given section. For example, the parent of ‘11.5.1.1’ would be ‘11.5.1’.
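The example in the description can be reproduced by splitting on the last dot. What the real function returns for a top-level section such as ‘11’ is not stated; the sketch assumes an empty string, and get_parent_section_sketch is a hypothetical name.

```python
def get_parent_section_sketch(section_number: str) -> str:
    """Hypothetical stand-in: drop the last dotted component of a section number."""
    if "." not in section_number:
        return ""  # Assumption: a top-level section has no parent.
    return section_number.rsplit(".", 1)[0]


parent = get_parent_section_sketch("11.5.1.1")
top_level_parent = get_parent_section_sketch("11")
```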
- celi_framework.utils.utils.get_section_context_as_text(section_number, toc)¶
Retrieves the contextual hierarchy for a given section number from the table of contents and formats it as a text block.
- Parameters:
section_number (str) – The section number to retrieve context for.
toc (dict) – The table of contents mapping section numbers to headings.
- Returns:
A formatted text block containing the section number and its contextual headings.
- Return type:
str
- celi_framework.utils.utils.is_json_cleaned_of_newline(json_obj)¶
Checks if the JSON object has been cleaned of newline characters.
- Parameters:
json_obj (dict) – The JSON object to be checked.
- Returns:
True if the JSON object is free of newline characters, False otherwise.
- Return type:
bool
- celi_framework.utils.utils.isolate_last_dict(output)¶
Isolates and returns the last dictionary found in a string of concatenated JSON dictionaries.
- Parameters:
output (str) – The string containing one or more JSON dictionaries.
- Returns:
The last dictionary found in the string, or None if no dictionary is found.
- Return type:
dict
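Extracting the last object from a string of concatenated JSON can be sketched with json.JSONDecoder.raw_decode, which parses one value starting at a given index and reports where it ended. This is one possible approach, not necessarily the one the module uses; isolate_last_dict_sketch is a hypothetical name.

```python
import json


def isolate_last_dict_sketch(output: str):
    """Hypothetical stand-in: return the last top-level JSON object in a string."""
    decoder = json.JSONDecoder()
    last = None
    idx = 0
    while idx < len(output):
        start = output.find("{", idx)
        if start == -1:
            break  # No more candidate objects.
        try:
            obj, end = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            idx = start + 1  # Not valid JSON here; try the next brace.
            continue
        last = obj
        idx = end  # Skip past the whole object, including nested ones.
    return last


last_dict = isolate_last_dict_sketch('noise {"a": 1} more {"b": {"c": 2}} tail')
nothing = isolate_last_dict_sketch("no json here")
```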
- celi_framework.utils.utils.load_json(file_path)¶
Loads a JSON file from a specified file path and returns its content as a dictionary.
- Parameters:
file_path (str) – The path to the JSON file.
- Returns:
The content of the JSON file.
- Return type:
dict
- Raises:
FileNotFoundError – If the specified file does not exist.
- celi_framework.utils.utils.load_text_file(file_path)¶
Reads the contents of a text file and returns it as a string.
- Parameters:
file_path (str) – The path to the text file to be read.
- Returns:
The contents of the file.
- Return type:
str
- celi_framework.utils.utils.make_list_of_dirs(list_of_dirs)¶
Creates a list of directories if they do not already exist.
- Parameters:
list_of_dirs (list) – A list of directory paths to create.
- Returns:
None
- celi_framework.utils.utils.read_file_content(filename, directory_path)¶
Reads the content of a file given its name and directory path.
- Parameters:
filename (str) – The name of the file.
directory_path (str) – The path to the directory containing the file.
- Returns:
The content of the file, or an error message if the file cannot be read.
- Return type:
str
- celi_framework.utils.utils.read_json_from_file(file_path)¶
Reads a JSON object from a file.
- Parameters:
file_path – Path to the JSON file.
- Returns:
The JSON object.
- celi_framework.utils.utils.read_latest_file_with_pattern(dir_path, pattern_str, extension='.txt')¶
Reads the content of the latest text file in a given directory that matches a specified regex pattern.
- Parameters:
dir_path (str) – The path to the directory containing the files.
pattern_str (str) – The regex pattern to match the filenames.
extension (str) – The file extension to look for. Defaults to ‘.txt’.
- Returns:
The content of the latest file matching the pattern, or None if no such file is found.
- Return type:
Optional[str]
- celi_framework.utils.utils.read_txt(file_path)¶
Reads the entire contents of a text file into a single string.
- Parameters:
file_path (str) – The path to the file to be read.
- Returns:
A string containing the contents of the file.
- Return type:
str
- celi_framework.utils.utils.remove_file_extension(filename)¶
Removes the file extension from a filename.
- Parameters:
filename (str) – The filename from which to remove the extension.
- Returns:
The filename without its extension.
- Return type:
str
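Stripping an extension is typically done with os.path.splitext, which removes only the final suffix. Whether the real function treats multi-part extensions such as .tar.gz differently is not stated; remove_file_extension_sketch is a hypothetical name.

```python
import os


def remove_file_extension_sketch(filename: str) -> str:
    """Hypothetical stand-in: drop the final extension, if any."""
    return os.path.splitext(filename)[0]


stem = remove_file_extension_sketch("summary.json")
multi = remove_file_extension_sketch("archive.tar.gz")
no_ext = remove_file_extension_sketch("README")
```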
- celi_framework.utils.utils.remove_newlines_from_json(json_obj)¶
Removes newline characters from the values in a JSON object.
- Parameters:
json_obj (dict) – The JSON object from which newline characters will be removed.
- Returns:
A new JSON object with newline characters removed from values.
- Return type:
dict
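Cleaning newlines from nested JSON values, and checking the result, can be sketched with two small recursive helpers. The sketch assumes newlines are deleted rather than replaced with spaces and that nested dicts and lists are traversed; both helper names are hypothetical.

```python
def remove_newlines_sketch(value):
    """Hypothetical stand-in: return a copy with newlines removed from all strings."""
    if isinstance(value, str):
        return value.replace("\n", "")
    if isinstance(value, dict):
        return {key: remove_newlines_sketch(item) for key, item in value.items()}
    if isinstance(value, list):
        return [remove_newlines_sketch(item) for item in value]
    return value  # Numbers, booleans, and None pass through unchanged.


def is_cleaned_of_newline_sketch(value) -> bool:
    """Hypothetical stand-in: True when no string value contains a newline."""
    if isinstance(value, str):
        return "\n" not in value
    if isinstance(value, dict):
        return all(is_cleaned_of_newline_sketch(item) for item in value.values())
    if isinstance(value, list):
        return all(is_cleaned_of_newline_sketch(item) for item in value)
    return True


raw = {"title": "Intro\n", "body": {"text": "line one\nline two"}, "tags": ["a\n", "b"], "n": 3}
cleaned = remove_newlines_sketch(raw)
```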
- celi_framework.utils.utils.remove_text_chunk(original_text, chunk_to_remove)¶
Removes a chunk of text from the end of another chunk of text if it exists.
- Parameters:
original_text (str) – The original chunk of text.
chunk_to_remove (str) – The chunk of text to remove from the end of the original text.
- Returns:
The original text with the specified chunk removed if it was found at the end.
- Return type:
str
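Removing a trailing chunk only when it is actually present can be sketched with str.endswith and slicing. This is an illustrative stand-in; remove_text_chunk_sketch is a hypothetical name.

```python
def remove_text_chunk_sketch(original_text: str, chunk_to_remove: str) -> str:
    """Hypothetical stand-in: strip the chunk from the end of the text if present."""
    # Guard against an empty chunk, since text[:-0] would return an empty string.
    if chunk_to_remove and original_text.endswith(chunk_to_remove):
        return original_text[: -len(chunk_to_remove)]
    return original_text


trimmed = remove_text_chunk_sketch("Draft complete. [END]", " [END]")
untouched = remove_text_chunk_sketch("[END] appears first", "[END]")
```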
- celi_framework.utils.utils.save_json(file, file_path)¶
Saves a dictionary to a JSON file at the specified file path.
- Parameters:
file (dict) – The dictionary to save as JSON.
file_path (str) – Path where the JSON file will be saved.
- celi_framework.utils.utils.shuffle_json_ordering(data)¶
Randomly shuffles the ordering of keys in a JSON object.
- Parameters:
data (dict) – The JSON object to shuffle.
- Returns:
A new JSON object with keys shuffled.
- Return type:
dict
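Since Python dictionaries preserve insertion order, shuffling key order amounts to rebuilding the dictionary from a shuffled key list. The sketch below assumes only top-level keys are shuffled; shuffle_json_ordering_sketch is a hypothetical name.

```python
import random


def shuffle_json_ordering_sketch(data: dict) -> dict:
    """Hypothetical stand-in: new dict with the same items in random key order."""
    keys = list(data.keys())
    random.shuffle(keys)
    return {key: data[key] for key in keys}


original = {"1": "Introduction", "2": "Methods", "3": "Results", "4": "Discussion"}
shuffled = shuffle_json_ordering_sketch(original)
print(list(shuffled))
```

Dictionary equality ignores order, so shuffled == original still holds; only iteration order changes.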
- celi_framework.utils.utils.time_it(func)¶
Decorator that measures the execution time of a function.
- Parameters:
func (callable) – The function to measure.
- Returns:
The wrapped function with execution time measurement.
- Return type:
callable
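A timing decorator of this kind is commonly written with time.perf_counter and functools.wraps. The sketch below prints the elapsed time; how the real time_it reports it (print or logger) is not stated, and time_it_sketch and slow_square are hypothetical names.

```python
import functools
import time


def time_it_sketch(func):
    """Hypothetical stand-in: report how long each call to func takes."""

    @functools.wraps(func)  # Preserve the wrapped function's name and docstring.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f}s")

    return wrapper


@time_it_sketch
def slow_square(x):
    time.sleep(0.01)
    return x * x


result = slow_square(7)
```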
- celi_framework.utils.utils.transform_dict_to_flat_filled(original_dict)¶
Transforms the given dictionary by setting the value to an empty string if ‘section body’ is ‘Body not present’. If ‘section body’ has other content, it uses the ‘content’ field. The transformed dictionary will have the same keys, but their values will be either the extracted ‘content’ or an empty string.
- Parameters:
original_dict – Dictionary with nested structure containing ‘content’ and ‘section body’
- Returns:
A new dictionary with keys mapping to ‘content’ or an empty string
- celi_framework.utils.utils.transform_dict_to_flat_schema(to_fill_dict)¶
Transform the “to be filled” dictionary to a flat schema dictionary.
- Parameters:
to_fill_dict (dict) – The “to be filled” dictionary with nested structure.
- Returns:
Transformed dictionary with keys as section numbers and values as section headings.
- Return type:
dict
- celi_framework.utils.utils.write_string_to_file(input_string, file_name, encoding='utf-8')¶
Writes a given string to a text file.
- Parameters:
input_string (str) – The string to be written to the file.
file_name (str) – The name of the file where the string will be written.
encoding (str) – The text encoding used when writing the file. Defaults to ‘utf-8’.
- Returns:
None