Abstract: Visual Question Answering (VQA) is an important task that combines Computer Vision and Natural Language Processing to enable computers to comprehend and answer questions based on image ...