How to Process Audio & Images Sent By The User In ManyChat

Hi everyone, happy new year!

I have a question about ManyChat: it seems I’m not able to process the audio or images sent by the user, meaning that I can’t determine whether the user’s input is text, audio, or an image.

I’d like to apply the following behaviour based on the input format:

  1. If it’s audio, transcribe it and generate a response through AI
  2. If it’s an image, analyze it with AI to generate a response
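For reference, the "determine the input format" step can be sketched in a few lines. This is a hypothetical helper, not anything ManyChat provides: it assumes you already have the user's last input as a string (plain text, or a media URL) and guesses the format from the URL's file extension.

```python
# Hypothetical sketch: classify a user's input as text, audio, or image.
# Assumes media inputs arrive as URLs ending in a recognizable extension;
# plain text (or an unrecognized URL) falls through to "text".
from urllib.parse import urlparse

AUDIO_EXTS = {".mp3", ".m4a", ".ogg", ".wav", ".aac"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def classify_input(last_input: str) -> str:
    """Return 'audio', 'image', or 'text' for a raw input string."""
    if last_input.startswith(("http://", "https://")):
        path = urlparse(last_input).path.lower()
        ext = path[path.rfind("."):] if "." in path else ""
        if ext in AUDIO_EXTS:
            return "audio"
        if ext in IMAGE_EXTS:
            return "image"
    return "text"
```

Extension sniffing is a best-effort guess; checking the HTTP `Content-Type` header of the file would be more robust.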

Do you know a solution to this?

Thank you for your time and support!

Yes, this is possible with ManyChat and Make. Use filters to check if an input (audio or image) exists:

  • For audio, send it to Whisper for transcription, then use ChatGPT to generate a response.
  • For images, analyze them with a vision-capable model (e.g. GPT-4o — note that DALL·E generates images rather than analyzing them), then generate a response with ChatGPT.
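The two branches above can be sketched as a small dispatcher. This is a hand-rolled illustration, not Make's or ManyChat's API: the `transcribe`, `analyze`, and `chat` callables are placeholders you would wire to Whisper, a vision model, and ChatGPT respectively.

```python
# Minimal sketch of the two AI branches, assuming you already have the media
# URL and the input's kind. The three callables are injected placeholders:
#   transcribe: audio URL -> transcript (e.g. Whisper)
#   analyze:    image URL -> description (e.g. a vision-capable model)
#   chat:       prompt -> reply (e.g. ChatGPT)
from typing import Callable

def respond_to_media(
    kind: str,
    media_or_text: str,
    transcribe: Callable[[str], str],
    analyze: Callable[[str], str],
    chat: Callable[[str], str],
) -> str:
    """Route audio through transcription and images through analysis,
    then hand the result to the chat model for the final reply."""
    if kind == "audio":
        return chat(f"The user said: {transcribe(media_or_text)}")
    if kind == "image":
        return chat(f"The user sent an image showing: {analyze(media_or_text)}")
    return chat(media_or_text)  # plain text passes through unchanged
```

Keeping the AI calls injected like this makes the routing logic easy to test with stubs before paying for real API calls.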

Thank you for such a fast answer. I have a question about the integration you mention: do you check the format of the input via the “Watch Incoming Data” trigger from the ManyChat module?

Can you show what sort of output the ManyChat module produces when an audio file or image is sent?


For anyone interested: while ManyChat doesn’t give you a link to the media file directly, you can always save last_input into a custom field. This gives you the URL of the file (only on Facebook and Instagram), so you can process these files. It’s not perfect, but at least there is an option.
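Once the last_input value is saved into a custom field, what you receive downstream is a plain URL string. As a rough sketch (the function name is my own, and this only inspects the filename, not the actual file), Python's standard `mimetypes` module can guess what kind of media the URL points at:

```python
# Hedged sketch: guess the media kind from the URL stored in the custom field.
# mimetypes.guess_type only looks at the filename extension, so URLs without
# an extension come back as "unknown".
import mimetypes
from urllib.parse import urlparse

def media_kind_from_url(file_url: str) -> str:
    """Return 'audio', 'image', or 'unknown' for a saved last_input URL."""
    guessed, _ = mimetypes.guess_type(urlparse(file_url).path)
    if guessed is None:
        return "unknown"
    if guessed.startswith("audio/"):
        return "audio"
    if guessed.startswith("image/"):
        return "image"
    return "unknown"
```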
