Introduction¶
The notebook can also be found in the examples folder.
%load_ext autoreload
%autoreload 2
from functools import partial
from pprint import pp as pp_original
pp = partial(pp_original,width=180, indent=2)
In the following, the most important concepts are explained. GPT-4 is used as the backend model, but it can be exchanged for any sufficiently capable model.
The model¶
All language models derive from ALM (Abstract Language Model). It provides a common interface to whatever service or model is being used. All ALM methods are available on each backend via a common input/output scheme.
Most backends, however, possess unique abilities, properties, or peculiarities.
As an alternative to GPT-4, you can use Luminous Extended from Aleph Alpha:
from pyalm import AlephAlpha
llm = AlephAlpha("luminous-extended-control", aleph_alpha_key=KEY)
Or a local LLaMA model:
from pyalm import LLaMa
llm = LLaMa(PATH, n_threads=8,n_gpu_layers=70, n_ctx=4096, verbose=1)
A quick detail here: should you use the autoreload extension in combination with a local LLaMA model, call llm.setup_backend() before each generation.
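A minimal sketch of that pattern (using the create_completion call introduced below):
llm.setup_backend()  # re-initialize the local backend after a module reload
completion = llm.create_completion(max_tokens=100)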
The rest of this notebook uses the OpenAI backend:
from pyalm import OpenAI
llm = OpenAI("gpt4")
# alternatively, provide the API key explicitly
#llm = OpenAI("gpt4", openai_key="sk-....")
Chatting¶
ALM relies on a conversation tracker and various integration methods. The tracker can contain much more than just messages, e.g. function calls, used sources, etc. But let's take a look at a simple example:
from pyalm import ConversationRoles as cr
def build_example_convo():
    llm.reset_tracker()  # clears everything from the tracker. Needed later as every completion call adds an assistant entry to the tracker.
    llm.set_system_message("You are a helpful chit-chat bot. Your favourite thing in the world is finally having a library that simplifies and unifies "
                           "access to large language models.")
    llm.add_tracker_entry(cr.USER, "Have you heard of PyALM? It provides a unified access for all sorts of libraries and API endpoints for LLM inference.")
Inference can be done in real time (streaming) or by returning the entire completion at once. Real-time streaming may not be available for all backends.
build_example_convo()
completion = llm.create_completion(max_tokens = 200, temperature=0)
print(completion)
Yes, I have heard of PyALM! It's a fantastic tool that simplifies and unifies access to large language models (LLMs). It provides a single, unified interface for different LLMs, regardless of their underlying libraries or API endpoints. This makes it much easier to work with these models, as you don't have to worry about the specifics of each individual library or API. It's a great tool for anyone working with LLMs!
build_example_convo()
generator = llm.create_generator(max_tokens = 200, temperature=0)
for i in generator:
    # note that only i[0] is printed
    # i[1] contains the yield_type. Only relevant if sequence preservation is enabled (see docs)
    # i[2] can contain a list of top alternative tokens and respective logits if enabled
    print(i[0], end="")
Yes, I have heard of PyALM! It's a fantastic tool that simplifies and unifies access to large language models (LLMs). It provides a single, unified interface for different LLMs, regardless of their underlying libraries or API endpoints. This makes it much easier to work with these models, as you don't have to worry about the specifics of each individual model's API or library. It's a great tool for anyone working with L
In both cases the library collects meta info that can be accessed. Some information, like tokens per second, is consistently available. Other information, e.g. pricing, is not.
from pprint import pp
pp(llm.finish_meta)
{'function_call': {'found': False},
 'finish_reason': None,
 'tokens': {'prompt_tokens': 204, 'generated_tokens': 93, 'total_tokens': 297},
 'timings': {'total_time': 5.878},
 't_per_s': {'token_total_per_s': 50.52937887238518},
 'cost': {'input': 0.006,
          'output': 0.00558,
          'total': 0.0117,
          'total_cent': 1.17,
          'unit': '$'},
 'total_finish_time': 5.310944505999942}
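As a small usage sketch (only keys visible in the dump above are used), the meta info can be logged after each call:
meta = llm.finish_meta
print(f"{meta['tokens']['generated_tokens']} tokens at "
      f"{meta['t_per_s']['token_total_per_s']:.1f} tokens/s")
if "cost" in meta:  # pricing info is not available for every backend
    print(f"approx. cost: {meta['cost']['total']} {meta['cost']['unit']}")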
Sequence preservation¶
There are deployment scenarios where plain streaming can lead to issues, e.g. when rendering an incomplete LaTeX sequence. For this you can define sequences that will only be streamed as a whole.
This is a per-model setting, not a per-call setting.
llm.preserved_sequences["Latex"] = {"start": "$", "end": "$", "name": "latex1"}
llm.reset_tracker()
# It is possible to add a new user message by just passing a string as first argument
generator = llm.create_generator("Write down 2 or 3 latex formulas enclosed in $", max_tokens = 200, temperature=0)
for i in generator:
    print(i[0], end="")
# Unfinished sequences are yielded anyway
Sure, here are a few LaTeX formulas: 1. The quadratic formula: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ 2. The Pythagorean theorem: $a^2 + b^2 = c^2$ 3. Euler's formula: $e^{i\pi} + 1 =
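The same dict format works for any other sequence you want to keep intact, e.g. fenced code blocks (a sketch; the key and name are arbitrary):
llm.preserved_sequences["codeblock"] = {"start": "```", "end": "```", "name": "code1"}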
Function calling¶
The most powerful sequence preservation feature is the integrated function call.
import random

def get_weather(location, days_from_now=1):
    """
    Retrieve weather data from the worlds best weather service
    :param location: City, region or country for which to pull weather data
    :param days_from_now: For which day (compared to today) to get the weather. Must be <8.
    :return: Weather data as string
    """
    return f"DEG CEL: {round(random.uniform(10,35),1)}, HUM %: {round(random.uniform(20,95),1)}"
# a list of functions is also possible
llm.register_functions(get_weather)
print(llm.available_functions[0]["pydoc"])
def get_weather(location, days_from_now:int=1)
"""
Retrieve weather data from the worlds best weather service
:param location: City, region or country for which to pull weather data
:param days_from_now: For which day (compared to today) to get the weather. Must be <8.
"""
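As the comment above mentions, a list of callables can be registered in one go. A sketch (not executed in this notebook; get_time is a made-up second tool):
def get_time(location):
    """
    Return the current local time for a location
    :param location: City, region or country
    """
    return "12:00"

llm.register_functions([get_weather, get_time])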
llm.reset_tracker()
llm.set_system_message("You are a helpful bot that can help with weather predictions")
llm.add_tracker_entry(cr.USER, "Yoooo can you tell me what the weather is like in sydney in 10 weeks?")
llm.add_tracker_entry(cr.ASSISTANT, "Sorry but I can only predict the weather for up to 8 days.")
llm.add_tracker_entry(cr.USER, "Ok what about the weather in sydney tomorrow?")
generator = llm.create_generator(max_tokens = 200, temperature=0)
for i in generator:
    print(i[0], end="")
Sure, let me fetch the weather data for Sydney for tomorrow. This might take a moment.The weather in Sydney tomorrow is expected to be 13.3 degrees Celsius with a humidity of
It worked!
But you may wonder how exactly it did that and why it told you to wait a moment. The answer lies in how the ALM builds prompts. While e.g. LLaMA takes in a string and ChatGPT a JSON object, the process is almost identical. Backend-specific details are handled in overrides.
Prompt objects are built according to rules laid out in the LLM's settings.
Let's take a closer look at the most important parts and what they lead to.
Model settings¶
Here you could e.g. disable functions completely or change how a function's return value is integrated.
All (finished) chat history feature integrations can either be specified or overridden here. You can always return to the defaults by looking at llm.base_settings
pp(llm.settings)
{'GENERATION_PREFIX': '[[ASSISTANT]]: ',
 'FUNCTIONS_ENABLED': True,
 'FUNCTION_AUTOINTEGRATION': True,
 'function_integration_template': '\n'
                                  '[[FUNC_DELIMITER_START]][[FUNCTION_SEQUENCE]][[FUNC_DELIMITER_END]]\n'
                                  '[[FUNC_DELIMITER_END]][[FUNCTION_RETURN_VALUE]][[FUNC_DELIMITER_START]]'}
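For illustration, assuming llm.settings can be mutated like the dict shown above, you could toggle individual values and restore them from llm.base_settings:
llm.settings["FUNCTIONS_ENABLED"] = False  # assumption: dict-style access; turns function calling off entirely
# restore the shipped default for that key
llm.settings["FUNCTIONS_ENABLED"] = llm.base_settings["FUNCTIONS_ENABLED"]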
Symbol table¶
Everything you see in [[]] is a placeholder. Before the model gets the prompt, each is evaluated via the symbol table. Symbols can point to strings or functions. In the latter case, the function is passed the regex match, the entire text, and an additional table of symbols that was passed during the initial call for replacement.
Note that e.g. LIST_OF_FUNCTIONS comes from our initial llm.register_functions call.
pp(llm.symbols)
{'FUNC_DELIMITER_START': '+++',
 'FUNC_DELIMITER_END': '---',
 'ASSISTANT': 'assistant',
 'USER': 'user',
 'SYSTEM': 'system',
 'FUNC_INCLUSION_MESSAGE': '[[LIST_OF_FUNCTIONS]]\n'
                           'Above you is a list of functions you can call. To '
                           'call them enclose them with '
                           '[[FUNC_DELIMITER_START]] and end the call with '
                           '[[FUNC_DELIMITER_END]].\n'
                           'The entire sequence must be correct! Do not e.g. '
                           'leave out the [[FUNC_DELIMITER_END]].\n'
                           'This\n'
                           '[[FUNC_DELIMITER_START]]foo(bar=3)[[FUNC_DELIMITER_END]]\n'
                           'would call the function foo with bar=3. The '
                           'function(s) will return immediately. The values '
                           'will be in the inverse sequence of the function '
                           'enclosement. \n'
                           'You can only call the functions listed.\n'
                           'You can and HAVE TO call functions during the text '
                           'response not in a a separate response!\n'
                           'Before you call a function please inform the user '
                           'so he is aware of possible waiting times.\n',
 'LIST_OF_FUNCTIONS': 'def get_weather(location, days_from_now:int=1)\n'
                      '"""\n'
                      'Retrieve weather data from the worlds best weather '
                      'service\n'
                      ':param location: City, region or country for which to '
                      'pull weather data\n'
                      ':param days_from_now: For which day (compared to today) '
                      'to get the weather. Must be <8.\n'
                      '"""\n'}
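As a sketch of a function-valued symbol (the name CURRENT_DATE and its use are made up for illustration; the signature follows the description above):
from datetime import date

def current_date_symbol(match, full_text, extra_symbols):
    # whatever is returned replaces the [[CURRENT_DATE]] placeholder during prompt building
    return date.today().isoformat()

llm.symbols["CURRENT_DATE"] = current_date_symbol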
System message¶
LLMs usually receive a system message that tells them how to behave. Notice that when we called llm.set_system_message, the function integration message was not part of it. You can change this part either by changing the FUNC_INCLUSION_MESSAGE symbol or by passing prepend_function_support=False.
print(llm.system_msg["content"])
[[LIST_OF_FUNCTIONS]]
Above you is a list of functions you can call. To call them enclose them with [[FUNC_DELIMITER_START]] and end the call with [[FUNC_DELIMITER_END]].
The entire sequence must be correct! Do not e.g. leave out the [[FUNC_DELIMITER_END]].
This
[[FUNC_DELIMITER_START]]foo(bar=3)[[FUNC_DELIMITER_END]]
would call the function foo with bar=3. The function(s) will return immediately. The values will be in the inverse sequence of the function enclosement. 
You can only call the functions listed.
You can and HAVE TO call functions during the text response not in a a separate response!
Before you call a function please inform the user so he is aware of possible waiting times.
You are a helpful bot that can help with weather predictions
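Both options as a sketch (the shortened message text is illustrative, and prepend_function_support is assumed to be a keyword of set_system_message as described above):
# Option 1: replace the template, keeping the [[LIST_OF_FUNCTIONS]] placeholder
llm.symbols["FUNC_INCLUSION_MESSAGE"] = "[[LIST_OF_FUNCTIONS]]\nWrap calls in [[FUNC_DELIMITER_START]] and [[FUNC_DELIMITER_END]].\n"

# Option 2: leave the function preamble out of this system message entirely
llm.set_system_message("You are a helpful bot that can help with weather predictions",
                       prepend_function_support=False)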
Chat history¶
All messages, function calls, citations, etc. are collected in the chat history. The model already called a function; we can see that in the next-to-last entry, which contains a [[FUNCTION_CALL]] placeholder. The entry also features a function_calls field with the original call and its return value.
pp(llm.conv_history)
[{'role': <ConversationRoles.USER: 'USER'>,
  'content': 'Yoooo can you tell me what the weather is like in sydney in 10 '
             'weeks?'},
 {'role': <ConversationRoles.ASSISTANT: 'ASSISTANT'>,
  'content': 'Sorry but I can only predict the weather for up to 8 days.'},
 {'role': <ConversationRoles.USER: 'USER'>,
  'content': 'Ok what about the weather in sydney tomorrow?'},
 {'role': <ConversationRoles.ASSISTANT: 'ASSISTANT'>,
  'content': 'Sure, let me fetch the weather data for Sydney for tomorrow. '
             'This might take a moment.[[FUNCTION_CALL]]',
  'function_calls': {'original_call': 'get_weather(location="sydney", '
                                      'days_from_now=1)',
                     'return': 'DEG CEL: 13.3, HUM %: 72.0'}},
 {'role': <ConversationRoles.ASSISTANT: 'ASSISTANT'>,
  'content': 'The weather in Sydney tomorrow is expected to be 13.3 degrees '
             'Celsius with a humidity of 72%.'}]
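The recorded call can be read back directly from the tracker:
call_entry = llm.conv_history[-2]  # the next-to-last entry shown above
print(call_entry["function_calls"]["original_call"])  # get_weather(location="sydney", days_from_now=1)
print(call_entry["function_calls"]["return"])         # DEG CEL: 13.3, HUM %: 72.0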
Final result¶
This is what the model ultimately sees, although the exact format may change depending on the backend.
print(llm.build_prompt_as_str(block_gen_prefix=True))
system: def get_weather(location, days_from_now:int=1)
"""
Retrieve weather data from the worlds best weather service
:param location: City, region or country for which to pull weather data
:param days_from_now: For which day (compared to today) to get the weather. Must be <8.
"""

Above you is a list of functions you can call. To call them enclose them with +++ and end the call with ---.
The entire sequence must be correct! Do not e.g. leave out the ---.
This
+++foo(bar=3)---
would call the function foo with bar=3. The function(s) will return immediately. The values will be in the inverse sequence of the function enclosement. 
You can only call the functions listed.
You can and HAVE TO call functions during the text response not in a a separate response!
Before you call a function please inform the user so he is aware of possible waiting times.
You are a helpful bot that can help with weather predictions
user: Yoooo can you tell me what the weather is like in sydney in 10 weeks?
assistant: Sorry but I can only predict the weather for up to 8 days.
user: Ok what about the weather in sydney tomorrow?
assistant: Sure, let me fetch the weather data for Sydney for tomorrow. This might take a moment.
+++get_weather(location="sydney", days_from_now=1)---
---DEG CEL: 13.3, HUM %: 72.0+++
assistant: The weather in Sydney tomorrow is expected to be 13.3 degrees Celsius with a humidity of 72%.
But the calls themselves?¶
Calls are a special preserved sequence. If one is encountered, yielding is halted. The generated text is then given to the Pylot library, which extracts the relevant sequences and tries to parse them. If all goes well, a dict with instructions is produced.
Pylot also supports multiple function calls per sequence and variable assignments, although the current function inclusion message does not tell the model about this.
As a final note, it is possible to specify handle_functions=False,
in which case generation stops and a dict with all parsed instructions is returned. Variable assignments are not included there.
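A sketch of that mode, assuming handle_functions is passed alongside the usual completion arguments:
# generation halts at the first call; the parsed instructions come back instead of being executed
parsed = llm.create_completion(max_tokens=200, temperature=0, handle_functions=False)
pp(parsed)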
It is also possible to provide the LLM with a list of dicts instead of functions. Look at the output of
from pylot import python_parsing
python_parsing.function_signature_to_dict(func)
for the correct format.
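For instance (spec below is just the dict produced for the get_weather example above):
from pylot import python_parsing

spec = python_parsing.function_signature_to_dict(get_weather)
pp(spec)  # this is the dict format expected when registering dicts instead of callables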