
Running Llama 3.2 Vision Locally

Package installation (CUDA version)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
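The command above installs only PyTorch; the code below also imports transformers and python-dotenv, and device_map requires accelerate, so assuming a typical setup you would additionally run:

pip install transformers accelerate python-dotenv

Before downloading roughly 20 GB of model weights, it is also worth confirming that the CUDA build of PyTorch actually installed. A minimal sanity check, using nothing beyond the packages above:

import torch

# Should print True and the GPU name if the cu118 wheel installed correctly
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))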

 

Code

I asked the model to describe an image; the sample prompt has it write a haiku for the picture.

The code below is adapted from the following link:

https://huggingface.co/meta-llama/Llama-3.2-11B-Vision

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

# Model ID for the Llama 3.2 11B Vision Instruct model
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

## Hugging Face login
# .env : HF_TOKEN=token  (huggingface_hub reads the HF_TOKEN environment variable)
from dotenv import load_dotenv
load_dotenv()

# Alternative: log in explicitly in code
# from huggingface_hub import login
# login(token="token")

# Load the model in bfloat16 precision on GPU 0
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map={"":0}
)

# Load the processor that goes with the model
processor = AutoProcessor.from_pretrained(model_id)

# Check whether CUDA (GPU) is available and set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# URL of the image to process
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

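# Chat messages pairing the image placeholder with the text prompt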
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]

# Apply the chat template to format the inputs the way the model expects
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Preprocess the image and text into model inputs
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(device)

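# Generate a response, up to 500 new tokens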
output = model.generate(**inputs, max_new_tokens=500)

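# Decode the full sequence, special tokens included, and print it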
print(processor.decode(output[0]))
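In bfloat16, the 11B model's weights alone take about 22 GB of VRAM. If your GPU has less, a 4-bit quantized load is one option. The sketch below is not from the original post; it assumes the bitsandbytes package is installed (pip install bitsandbytes) and uses the standard transformers quantization config:

from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization: roughly quarters the VRAM needed, at some quality cost
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)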

Image

(The rabbit photo fetched from the URL in the code above.)

Result

PS D:\python-vm\test-langchain> python .\multimodel.py

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token can be pasted using 'Right-Click'.
Enter your token (input will not be visible):
Add token as git credential? (Y/n) y
Token is valid (permission: fineGrained).
Your token has been saved in your configured git credential helpers (manager).
Your token has been saved to C:\Users\junij\.cache\huggingface\token
Login successful
config.json: 100%|█████| 5.07k/5.07k [00:00<?, ?B/s]
model.safetensors.index.json: 100%|██████| 89.4k/89.4k [00:00<00:00, 11.2MB/s]
model-00001-of-00005.safetensors: 100%|█████| 4.99G/4.99G [00:50<00:00, 98.1MB/s]
model-00002-of-00005.safetensors: 100%|█████| 4.97G/4.97G [00:51<00:00, 97.0MB/s] 
model-00003-of-00005.safetensors: 100%|█████| 4.92G/4.92G [00:49<00:00, 98.7MB/s] 
model-00004-of-00005.safetensors: 100%|█████| 5.00G/5.00G [00:50<00:00, 98.1MB/s] 
model-00005-of-00005.safetensors: 100%|█████| 1.47G/1.47G [00:13<00:00, 111MB/s] 
Downloading shards: 100%|█████| 5/5 [03:38<00:00, 43.66s/it] 
Loading checkpoint shards: 100%|█████| 5/5 [00:07<00:00,  1.57s/it]
generation_config.json: 100%|█████| 215/215 [00:00<00:00, 215kB/s]
preprocessor_config.json: 100%|█████| 437/437 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████| 55.8k/55.8k [00:00<?, ?B/s]
tokenizer.json: 100%|███| 9.09M/9.09M [00:00<00:00, 12.8MB/s]
special_tokens_map.json: 100%|████| 454/454 [00:00<00:00, 455kB/s]
chat_template.json: 100%|█████| 5.15k/5.15k [00:00<?, ?B/s]
Using device: cuda
C:\Users\junij\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\mllama\modeling_mllama.py:316: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|><|start_header_id|>assistant<|end_header_id|>

Here is a haiku for the image:

Rabbit in a coat
Stands on a dirt path, smiling
Spring in the air<|eot_id|>
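The "Torch was not compiled with flash attention" UserWarning is informational; as the log shows, generation still completes on the standard scaled-dot-product attention kernels. Also, the printed result includes the prompt and special tokens (<|begin_of_text|>, <|eot_id|>, and so on) because the code decodes the full sequence. A small tweak, not in the original post, that prints only the model's reply:

# Slice off the prompt tokens and drop special tokens when decoding
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))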
