How to set up a workflow that uses the Gemini multimodal API

How can I set up a workflow that uses the Gemini multimodal API to check whether a user-uploaded image or PDF is a legal certificate, by comparing it against two legal example certificates stored in Google Drive?
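
One possible shape for such a workflow, as a minimal sketch rather than a definitive implementation: send the two reference certificates and the uploaded file together in a single `generateContent` request and ask the model for a one-word verdict. The sketch assumes the two example files have already been downloaded from Google Drive as raw bytes (the Drive step is not shown), that all files are small enough to send inline as base64, and that the model name is only a placeholder. The function names (`build_request`, `parse_verdict`, `check_certificate`) and the prompt wording are hypothetical, introduced here for illustration.

```python
import base64
import json
import urllib.request

# Gemini REST endpoint; the model name is filled in by the caller.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "{model}:generateContent"
)

# Hypothetical prompt; the wording is an assumption and will need tuning.
PROMPT = (
    "The first two files are known legal certificates. "
    "The last file was uploaded by a user. "
    "Answer with exactly one word, VALID or INVALID, depending on whether "
    "the last file is the same kind of legal certificate."
)


def inline_part(data: bytes, mime_type: str) -> dict:
    """Wrap raw file bytes as a base64-encoded inline_data part."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


def build_request(examples, upload) -> dict:
    """Build the request body.

    examples: list of (bytes, mime_type) reference files (assumed to be
              already fetched from Google Drive).
    upload:   (bytes, mime_type) for the user's file.
    """
    parts = [{"text": PROMPT}]
    parts += [inline_part(data, mime) for data, mime in examples]
    parts.append(inline_part(*upload))
    return {"contents": [{"parts": parts}]}


def parse_verdict(response: dict) -> bool:
    """Return True if the first candidate's text starts with VALID."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return text.strip().upper().startswith("VALID")


def check_certificate(api_key, examples, upload, model="gemini-2.5-flash"):
    """Send the request and return the verdict (model name is a placeholder)."""
    request = urllib.request.Request(
        GEMINI_URL.format(model=model),
        data=json.dumps(build_request(examples, upload)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_verdict(json.load(resp))
```

A call might look like `check_certificate(API_KEY, [(ex1, "image/png"), (ex2, "application/pdf")], (uploaded, "application/pdf"))`, where `ex1`, `ex2`, and `uploaded` are byte strings. Inline base64 only suits small files; larger PDFs would probably need a separate file-upload step, and a one-word model answer should be treated as a screening signal rather than proof of authenticity.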