This web interface and API allow you to experiment with multiple large language models through a unified interface. To log in, you need an API key.


  • In the web interface, you can enter a query, which consists of the following components:
    • prompt, which is what text we want to feed into the language model. The prompt can have variables (e.g., ${name}) which are filled in later.
    • settings, which configures how we're going to call the backend API (HOCON format):
      • model: which model to query
      • temperature: a non-negative number controlling the amount of stochasticity (e.g., 1 samples from the model's distribution, while 0 returns the maximum-probability output)
      • num_completions: number of completions (independently sampled sequences) to return
      • top_k_per_token: number of candidates per token position in each completion
      • max_tokens: maximum number of tokens before generation stops
      • stop_sequences: list of strings that will stop generation (e.g., '.' or '\n')
      • echo_prompt: whether to include the prompt as a prefix of the completion
      • top_p: an alternative to sampling with temperature, called nucleus sampling, in which the model considers only the tokens comprising the top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered.
      • presence_penalty: number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the text so far, increasing the model's likelihood to talk about new topics. (OpenAI only)
      • frequency_penalty: number between -2.0 and 2.0. Positive values penalize tokens based on their frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. (OpenAI only)
      Settings can also have variables in them (e.g., ${temperature}).
    • environments, which specifies, for each variable, a list of values to substitute (HOCON format).
  • When the query is submitted, we consider all possible assignments of values to variables. For example:
    • environments has name: [Boston, New York] and temperature: [0, 1]
    • prompt is ${name} is a
    • settings is temperature: ${temperature}
    This gives rise to 4 requests:
    • prompt: Boston is a, temperature: 0
    • prompt: Boston is a, temperature: 1
    • prompt: New York is a, temperature: 0
    • prompt: New York is a, temperature: 1
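    The expansion above is a Cartesian product over the variable values. A minimal sketch of that logic (the function name expand_query and the use of string.Template for ${...} substitution are illustrative assumptions, not the interface's actual implementation):

    ```python
    from itertools import product
    from string import Template

    def expand_query(prompt, settings, environments):
        """Hypothetical sketch: substitute every combination of
        environment values into the prompt and settings templates."""
        names = list(environments)
        requests = []
        # Cartesian product over the value lists of all variables.
        for values in product(*(environments[n] for n in names)):
            env = dict(zip(names, values))
            requests.append({
                "prompt": Template(prompt).safe_substitute(env),
                "settings": {k: Template(str(v)).safe_substitute(env)
                             for k, v in settings.items()},
            })
        return requests

    requests = expand_query(
        "${name} is a",
        {"temperature": "${temperature}"},
        {"name": ["Boston", "New York"], "temperature": [0, 1]},
    )
    # 2 names x 2 temperatures -> 4 requests
    ```

    Running this yields the same four requests listed above, one per assignment of name and temperature.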


For each model group (e.g., gpt3) and time granularity (e.g., daily, monthly, total), you are given a quota of a certain number of tokens. Once you exceed that number, you won't be able to use the API until the period resets. However, requests that have already been made (by you or anyone else) are served from the cache and do not count toward your quota. For example, if your daily quota for gpt3 is 10000, you can consume up to 10000 uncached tokens each day.
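The quota-plus-cache behavior can be sketched as follows. This is a toy model under stated assumptions: the Quota class, the request-key format, and the charge method are all illustrative inventions, not the service's real accounting code.

```python
class Quota:
    """Hypothetical per-period token quota where cached requests are free."""

    def __init__(self, limit):
        self.limit = limit   # e.g., daily token budget for a model group
        self.used = 0        # tokens counted so far this period
        self.cache = set()   # keys of requests already served (by anyone)

    def charge(self, request_key, num_tokens):
        if request_key in self.cache:
            return True      # cache hit: served for free, quota untouched
        if self.used + num_tokens > self.limit:
            return False     # would exceed the quota: reject the request
        self.used += num_tokens
        self.cache.add(request_key)
        return True

daily = Quota(limit=10000)
ok_first = daily.charge("gpt3|Boston is a|t=0", 5)   # counted against quota
ok_cached = daily.charge("gpt3|Boston is a|t=0", 5)  # repeat: cache hit, free
```

After both calls, only 5 tokens have been charged, illustrating why repeating an identical request costs nothing.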