Faster OpenAI webhooks

gpt-4 is slow enough to be nearly unusable for anything other than background processing, and it typically leads to process timeouts (plus it's expensive). gpt-3.5-turbo (which you're using, according to your screenshot) is about as fast as you're going to get while still producing decent results (and it's way cheaper), but it's still pretty slow compared to things like databases and typical APIs.

But there are some things you can do to make gpt-3.5-turbo faster. The fewer tokens in your prompt (both the number of messages and the length of each), the faster the response, so keep it as brief as you can. Also keep the returned result as constrained as you can. And if you're okay with sometimes getting a truncated result back because you really need it to return ASAP, set your max tokens value as low as possible while still giving you enough returned data: if the model hits that limit, it will simply stop early, mid-thought if necessary.
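For example, here's a minimal sketch of those two levers in a direct API call (assuming the official `openai` Python SDK, 1.x style; the prompt text and the token cap are illustrative placeholders, not values from your scenario):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# A deliberately short prompt plus a low max_tokens cap: both reduce
# total response time, at the risk of a truncated answer.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "In one sentence, what is a webhook?"}
    ],
    max_tokens=60,  # low cap: the model stops early if it hits this limit
    temperature=0,
)
print(response.choices[0].message.content)
```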

Basically, give it less work to do. But I expect it'll still be slower than you'd like.

What you may not know is that the response actually starts coming back almost immediately, but it slowly streams from OpenAI to Make in the background, and your scenario doesn't see anything until the whole response has been received. If you were using the API in a different system, such as the OpenAI Playground or ChatGPT, you'd see the results right away. That slow typing effect isn't just an effect; it's literally the pace at which the server sends tokens to you. So a low max tokens limit caps the time between the start and the end of the stream, because the stream is forced to end earlier once it hits your specified limit. But if you do that, also keep the prompt you send small (message count and the token length of each message), because there's additional time spent before the response even starts streaming back to Make.
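Make doesn't expose the stream, but to illustrate what's happening under the hood, here's a sketch of how a client that does support streaming consumes the same response as it arrives (again assuming the `openai` Python SDK; prompt and limit are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# stream=True makes tokens arrive incrementally as they're generated,
# instead of waiting for the full completion like Make does.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Briefly explain webhooks."}],
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (role marker, final chunk) carry no text
        print(delta, end="", flush=True)
```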

Hopefully that helps.

BTW, Google's PaLM API doesn't stream the response back the way OpenAI's GPT system does; it sends the whole result back at once, and it returns much faster. In my testing so far, though, the two systems differ quite a bit in where I'd use each. I highly recommend applying to Google for access to their PaLM API (to everyone interested) and then checking out their MakerSuite as soon as they grant you access (unknown how long that takes). Personality-wise, GPT-3.5 and GPT-4 are hands down far better for anything human-interactive, but for no-personality processing of data, PaLM does a great job and is super fast. If that suits your use case, I recommend seeking it out. I use them both and love them both. A rough sketch of the difference is below.
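For comparison, a PaLM text call looks roughly like this (assuming the `google-generativeai` Python package from the PaLM era; the model name, parameters, and key are from Google's published examples at the time and may differ once you have access):

```python
import google.generativeai as palm

palm.configure(api_key="YOUR_PALM_API_KEY")  # placeholder key

# Unlike OpenAI's streamed responses, PaLM returns the completed
# text in a single response.
completion = palm.generate_text(
    model="models/text-bison-001",
    prompt="Extract the city names from: 'Flights from Boston to Denver.'",
    max_output_tokens=64,
)
print(completion.result)
```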

Good luck! :smile:
