Visual Question Answering

1. Project Introduction

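Visual Question Answering (VQA) is a learning task that combines computer vision and natural language processing. This project uses BLIP, a pretrained vision-language model, fine-tuned on the VQA task. Upload an image and enter a question, and the model quickly generates an answer. Give it a try!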

2. Project Structure

In [3]:
# Print the directory tree of the project folder
import os


def dfs_showdir(path, depth):
    """Recursively print the entries under `path`, indented by `depth`."""
    if depth == 0:
        print("root:[" + path + "]")

    for item in os.listdir(path):
        # Skip hidden entries (names that start with a dot)
        if not item.startswith('.'):
            print("|      " * depth + "+--" + item)
            newitem = os.path.join(path, item)
            if os.path.isdir(newitem):
                dfs_showdir(newitem, depth + 1)


if __name__ == '__main__':
    path = os.getcwd()    # folder to display
    dfs_showdir(path, 0)  # print its tree structure
root:[/home/jovyan/work]
+--handler.py
+--app_spec.yml
+--img
|      +--demo.jpg
+--models
|      +--__init__.py
|      +--blip.py
|      +--blip_itm.py
|      +--blip_nlvr.py
|      +--blip_pretrain.py
|      +--blip_retrieval.py
|      +--blip_vqa.py
|      +--med.py
|      +--nlvr_encoder.py
|      +--vit.py
|      +--__pycache__
|      |      +--__init__.cpython-37.pyc
|      |      +--vit.cpython-37.pyc
|      |      +--blip_vqa.cpython-37.pyc
|      |      +--blip.cpython-37.pyc
|      |      +--med.cpython-37.pyc
+--configs
|      +--bert_config.json
|      +--caption_coco.yaml
|      +--med_config.json
|      +--nlvr.yaml
|      +--nocaps.yaml
|      +--pretrain.yaml
|      +--retrieval_coco.yaml
|      +--retrieval_flickr.yaml
|      +--retrieval_msrvtt.yaml
|      +--vqa.yaml
+--ckpt
|      +--model_base_vqa_capfilt_large.pth
+--demo.ipynb
+--requirement.txt
+--__pycache__
|      +--handler.cpython-37.pyc
|      +--app.cpython-37.pyc
+--env.ipynb.invalid
+--env.ipynb
+--project_requirements.txt
+--Introduce.ipynb
+--app.py
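
The listing above also shows `__pycache__` directories, which hold compiled bytecode rather than project files. As a minimal sketch (the helper name `tree_lines` and the `skip_names` parameter are hypothetical, not part of the project), the same traversal could collect lines instead of printing them and skip such entries by exact name, which leaves `__init__.py` visible:

```python
import os


def tree_lines(path, depth=0, skip_names=("__pycache__",)):
    """Return the tree under `path` as a list of text lines.

    Hypothetical helper: skips dot-prefixed entries and any entry whose
    name is listed in `skip_names`.
    """
    lines = []
    if depth == 0:
        lines.append("root:[" + path + "]")
    # sorted() makes the output order deterministic
    for item in sorted(os.listdir(path)):
        if item.startswith(".") or item in skip_names:
            continue
        lines.append("|      " * depth + "+--" + item)
        child = os.path.join(path, item)
        if os.path.isdir(child):
            lines.extend(tree_lines(child, depth + 1, skip_names))
    return lines
```

Calling `print("\n".join(tree_lines(os.getcwd())))` would print the filtered tree for the current folder.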

3. Project Demo

In [ ]:
# Install the dependencies
!/home/jovyan/.virtualenvs/basenv/bin/pip install -r requirement.txt -i https://pypi.doubanio.com/simple/
In [1]:
# Import the required modules
from app import *
2022-08-08 17:13:03.608206: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-08-08 17:13:03.608246: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
load checkpoint from ./ckpt/model_base_vqa_capfilt_large.pth
In [2]:
print("Input img:")
Image.open('./img/demo.jpg').resize((256, 256))
Input img:
Out[2]:
In [4]:
print("Output Answer:")
handle({'Photo': './img/demo.jpg', 'Question': 'What is in this image?'})
Output Answer:
Runned time: 2.17 s
Out[4]:
'woman and dog'
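
The call above passes a dict with `Photo` and `Question` keys and gets back an answer string, with the run time printed alongside. The source of `handle` is not shown in this notebook, so the following is only a sketch of how such a wrapper might validate the request and time the call. The names `make_handler`, `answer_fn`, `fake_model`, and `handle_stub` are hypothetical, and the stub stands in for the BLIP model:

```python
import time


def make_handler(answer_fn):
    """Build a request handler around `answer_fn(photo_path, question)`.

    Hypothetical sketch: the project's real `handle` may work differently.
    """
    def handle_request(request):
        # Fail early with a clear message if a required key is missing
        for key in ("Photo", "Question"):
            if key not in request:
                raise KeyError("missing required field: " + key)
        start = time.perf_counter()
        answer = answer_fn(request["Photo"], request["Question"])
        elapsed = time.perf_counter() - start
        print("Run time: %.2f s" % elapsed)
        return answer
    return handle_request


def fake_model(photo_path, question):
    """Stub standing in for the BLIP VQA model; returns a fixed placeholder."""
    return "stub answer"


handle_stub = make_handler(fake_model)
```

Keeping the model call behind a plain function argument lets the validation and timing logic be exercised without loading the checkpoint.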