- [Features](https://twitter.com/mckaywrigley/status/1721638116010983545)
- [Summary](https://twitter.com/0xGeegZ/status/1721610906356912149)
- ~2.5x cost reduction
- ~16x longer context
- Multimodal via API
- Assistants API
- Retrieval system
- TTS (and 10x cheaper than market!)
- JSON mode
- Seeds
- Copyright Shield (underrated)
- [OpenAI statement](https://openai.com/blog/new-models-and-developer-products-announced-at-devday)
- [Function calling](https://platform.openai.com/docs/guides/function-calling) lets you describe functions of your app or external APIs to models, and have the model intelligently choose to output a JSON object containing arguments to call those functions. We’re releasing several improvements today, including the ability to call multiple functions in a single message: users can send one message requesting multiple actions, such as “open the car window and turn off the A/C”, which would previously require multiple roundtrips with the model ([learn more](https://platform.openai.com/docs/guides/function-calling/parallel-function-calling))
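A minimal sketch of parallel function calling with the openai Python SDK (v1 style); the two tools (`set_window`, `set_ac`) are hypothetical examples, not part of the API:

```python
# Sketch: parallel function calling with the openai Python SDK (v1).
# The tools `set_window` and `set_ac` are hypothetical examples.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "set_window",
            "description": "Open or close a car window",
            "parameters": {
                "type": "object",
                "properties": {"state": {"type": "string", "enum": ["open", "closed"]}},
                "required": ["state"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_ac",
            "description": "Turn the A/C on or off",
            "parameters": {
                "type": "object",
                "properties": {"on": {"type": "boolean"}},
                "required": ["on"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Open the car window and turn off the A/C"}],
    tools=tools,
)

# One message can now yield multiple tool calls instead of multiple roundtrips.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```

Each tool call carries its arguments as a JSON string, so both actions can be dispatched from a single model response.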
- GPT-4 Turbo
- JSON mode: the model is constrained to emit valid JSON (see the sketch after this list)
- seed parameter
- reproducible outputs
- log probabilities
- writing better unit tests
- https://platform.openai.com/docs/guides/text-generation/reproducible-outputs
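A minimal sketch of JSON mode plus the `seed` parameter, assuming the openai Python SDK v1 and the DevDay `gpt-4-1106-preview` model:

```python
# Sketch: JSON mode + seed for (mostly) reproducible outputs, openai SDK v1.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    response_format={"type": "json_object"},  # model is constrained to emit valid JSON
    seed=42,        # same seed + same parameters -> best-effort deterministic sampling
    temperature=0,
    messages=[
        # JSON mode requires the word "JSON" to appear somewhere in the messages
        {"role": "system", "content": "Reply in JSON with keys 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
)

print(response.choices[0].message.content)  # a valid JSON string
print(response.system_fingerprint)          # changes when the backend config changes
```

Comparing `system_fingerprint` across calls tells you whether the backend configuration changed, which is why identical seeds can still occasionally diverge.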
## Zapier AI Actions
- https://actions.zapier.com/
- https://bensbites.beehiiv.com/p/ai-actions-zapier-bridge-ai-platform-5000-apps
## Multimodal API
- DALL·E 3: generate images programmatically.
- GPT-4 Turbo with Vision: image input for GPT-4 Turbo (see the sketch after this list).
- TTS and TTS HD: text to speech in 6 preset voices; TTS for speed, TTS HD for quality.
- Whisper V3: open-sourced; announced, coming to the API this month.
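A minimal sketch of an image-input request to GPT-4 Turbo with Vision (openai SDK v1); the image URL is a placeholder:

```python
# Sketch: image input with GPT-4 Turbo with Vision, openai SDK v1.
# The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=300,
    messages=[
        {
            "role": "user",
            # Content is a list mixing text parts and image parts.
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```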
## Assistants API
- example use cases: a natural language-based data analysis app, a coding assistant, an AI-powered vacation planner, a voice-controlled DJ, a smart visual canvas
- persistent threads, so your app doesn't have to manage conversation history
- retrieval: give the assistant more information via uploaded files
- threads can be effectively infinitely long
- code interpreter
- function calling
- accessible via the OpenAI Playground (see the sketch after this list)
- new thread per user
- add messages to the thread
- run the assistant anytime
- the stateful API removes that complexity from your app
- much less need to roll your own embeddings, etc.
- you can inspect the threads
- it shows its steps
- Code Interpreter
- lets the assistant write and execute code
- any task you would tackle with code, Code Interpreter can take on
- ask it an arbitrary quantitative question and it knows how to calculate the answer
- voice input via the Whisper API (speech to text)
- output via the TTS API to make it speak
- with function calling, you can connect it to the internet
- **Code Interpreter**: writes and runs Python code in a sandboxed execution environment, and can generate graphs and charts, and process files with diverse data and formatting. It allows your assistants to run code iteratively to solve challenging code and math problems, and more.
- **Retrieval**: augments the assistant with knowledge from outside our models, such as proprietary domain data, product information or documents provided by your users. This means you don’t need to compute and store embeddings for your documents, or implement chunking and search algorithms. The Assistants API optimizes what retrieval technique to use based on our experience building knowledge retrieval in ChatGPT.
- **Function calling**: enables assistants to invoke functions you define and incorporate the function response in their messages.
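A minimal sketch of the flow described above (create an assistant, one thread per user, add a message, run, poll), assuming the openai SDK v1 beta namespace; the math-tutor setup is an illustrative example:

```python
# Sketch: the Assistants API flow, assistant -> thread per user -> message -> run.
# Uses the beta namespace of the openai Python SDK (v1).
import time

from openai import OpenAI

client = OpenAI()

# An assistant with the Code Interpreter tool enabled.
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor. Write and run code to answer questions.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview",
)

# One persistent thread per user; the API stores the conversation state.
thread = client.beta.threads.create()

client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is 123456 * 789?",
)

# Kick off a run, then poll until it completes.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Messages are returned newest-first.
for message in client.beta.threads.messages.list(thread_id=thread.id).data:
    print(message.role, message.content[0].text.value)
```

Because the thread is stored server-side, each turn only needs a new message plus a new run; no resending of history, and no embedding pipeline for the files you attach.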
## New API
- GPT-4 Turbo with Vision
- TTS
- Human-quality speech
- $0.015 / 1,000 characters (see the sketch after this list)
- [TTS guide](https://platform.openai.com/docs/guides/text-to-speech)
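A minimal sketch of the TTS endpoint, assuming the openai SDK v1:

```python
# Sketch: the text-to-speech endpoint, openai SDK v1.
from openai import OpenAI

client = OpenAI()

response = client.audio.speech.create(
    model="tts-1",   # "tts-1" optimizes for speed, "tts-1-hd" for quality
    voice="alloy",   # one of the six preset voices
    input="Hello from the new text-to-speech API!",
)

response.stream_to_file("speech.mp3")  # write the MP3 audio to disk
```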
## Potential Uses
- Obsidian AI add-on
- text to speech
[[My Projects/Obsidian Intelligence]]