โŒ

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

individual gradients with torch.autograd.grad without sum over second variable

I have a sampled path of a stochastic process starting from an initial point:


import torch
import torch.nn as nn
import torchsde


class SDE_ou_1d(nn.Module):
    def __init__(self):
        super().__init__()
        self.sde_type = "ito"  # torchsde expects "ito" or "stratonovich"
        self.noise_type = "diagonal"

    def f(self, t, y): #drift
        return -y

    def g(self, t, y): #vol
        return torch.ones_like(y)

t_vec = torch.linspace(0, 1, 100)  #time array
mySDE = SDE_ou_1d()

x0 = torch.zeros(1, 1, requires_grad=True).to(t_vec)
X_t = torchsde.sdeint(mySDE, x0, t_vec, method = 'euler')

and I would like to measure the gradient with respect to the initial condition using torch.autograd.grad(), and get an output with the same shape as X_t, i.e. 100x1. This gives the change in the path at every time point.


X_grad = torch.autograd.grad(outputs=X_t, inputs=x0,
                           grad_outputs=torch.ones_like(X_t),
                           create_graph=False, retain_graph=True, only_inputs=True, allow_unused=True)[0]

The issue is that the gradient is summed over all values of t.

I can do this with a for loop, but it is very slow and not practical:

X_grad_loop = torch.zeros_like(X_t)

for i in range(X_t.shape[0]):  # Loop over the first dimension of X_t which is time
    grad_i = torch.autograd.grad(outputs=X_t[i,...], inputs=x0,
                                    grad_outputs=torch.ones_like(X_t[i,...]),
                                    create_graph=False, retain_graph=True, only_inputs=True, allow_unused=True)[0]
    X_grad_loop[i,...] = grad_i

Is there a way to compute this gradient with torch.autograd.grad() and no loop? Thanks.
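A minimal loop-free sketch, assuming PyTorch >= 1.11 and the shapes above (X_t has shape (100, 1, 1)): pass a whole batch of one-hot cotangents and let is_grads_batched vmap the backward pass over them. Whether vmap supports every op in torchsde's backward graph is not guaranteed; torch.autograd.functional.jacobian is an alternative if it does not.

T = X_t.shape[0]                              # number of time points
# one one-hot cotangent per time step: shape (T, *X_t.shape) = (T, T, 1, 1)
basis = torch.eye(T).view(T, T, 1, 1)

X_grad_all = torch.autograd.grad(
    outputs=X_t, inputs=x0,
    grad_outputs=basis,
    retain_graph=True,
    is_grads_batched=True,                    # vmaps the backward over the leading dim of basis
)[0]                                          # shape (T, 1, 1): dX_t[i]/dx0 for every time step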

AttributeError: module 'collections' has no attribute 'Sized' when trying to load a pickled model

I am trying to load a pretrained model, but I'm hitting an error when I do: AttributeError: module 'collections' has no attribute 'Sized'

from fastai import *
from fastai.vision import *
from matplotlib.pyplot import imshow
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import resize
from PIL import Image
learn = load_learner("", "model.pkl")

These are the versions I'm using.

torch                     1.11.0                 
torchvision               0.12.0                  
python                    3.10.14   
fastai                    1.0.60  

Can someone help me fix this problem?

File c:\Users\lib\site-packages\fastai\basic_train.py:620, in load_learner(path, file, test, tfm_y, **db_kwargs)
    618 state = torch.load(source, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(source)
    619 model = state.pop('model')
--> 620 src = LabelLists.load_state(path, state.pop('data'))
    621 if test is not None: src.add_test(test, tfm_y=tfm_y)
    622 data = src.databunch(**db_kwargs)

File c:\Users\lib\site-packages\fastai\data_block.py:578, in LabelLists.load_state(cls, path, state)
    576 "Create a `LabelLists` with empty sets from the serialized `state`."
    577 path = Path(path)
--> 578 train_ds = LabelList.load_state(path, state)
    579 valid_ds = LabelList.load_state(path, state)
    580 return LabelLists(path, train=train_ds, valid=valid_ds)

File c:\Users\lib\site-packages\fastai\data_block.py:690, in LabelList.load_state(cls, path, state)
    687 @classmethod
    688 def load_state(cls, path:PathOrStr, state:dict) -> 'LabelList':
    689     "Create a `LabelList` from `state`."
--> 690     x = state['x_cls']([], path=path, processor=state['x_proc'], ignore_empty=True)
    691     y = state['y_cls']([], path=path, processor=state['y_proc'], ignore_empty=True)
...
--> 298     if not isinstance(a, collections.Sized) and not getattr(a,'__array_interface__',False):
    299         a = list(a)
    300     if np.int_==np.int32 and dtype is None and is_listy(a) and len(a) and isinstance(a[0],int):

AttributeError: module 'collections' has no attribute 'Sized'
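A hedged workaround sketch: Python 3.10 removed the old collections.Sized alias (it lives in collections.abc), which fastai 1.0.x still references. Restoring the aliases before importing fastai may let the pickle load; running Python 3.9 or a fastai version that supports your interpreter is the cleaner fix.

import collections
import collections.abc

# re-create the aliases that were removed in Python 3.10
for _name in ("Sized", "Iterable", "Container", "Callable", "Hashable",
              "Mapping", "MutableMapping", "Sequence"):
    if not hasattr(collections, _name):
        setattr(collections, _name, getattr(collections.abc, _name))

from fastai.vision import *
learn = load_learner("", "model.pkl")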

How does one use vllm with pytorch 2.2.2 and python 3.11?

I'm trying to use the vllm library with pytorch 2.2.2 and python 3.11. Based on the GitHub issues, it seems vllm 0.4.1 supports python 3.11.

However, I'm running into issues with incompatible PyTorch versions when installing vllm. The GitHub issue mentions needing to build from source to use PyTorch 2.2, but the pip-installed version still pins an older PyTorch.

I tried creating a fresh conda environment with python 3.11 and installing vllm:

$ conda create -n vllm_test python=3.11
$ conda activate vllm_test
(vllm_test) $ pip install vllm
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
vllm 0.4.1 requires torch==2.1.2, but you have torch 2.2.2 which is incompatible.

I also tried installing pytorch 2.2.2 first and then vllm:

(vllm_test) $ pip install torch==2.2.2
(vllm_test) $ pip install vllm
...
Building wheels for collected packages: vllm
  Building wheel for vllm (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for vllm (pyproject.toml) did not run successfully.
  │ exit code: 1

Can someone clarify what versions of vllm, pytorch and python work together currently? Is there a recommended clean setup to use vllm with the latest pytorch 2.2.2 and python 3.11?

I've tried creating fresh conda environments, but still run into version conflicts. Any guidance on the right installation steps would be much appreciated. Thanks!

ref: https://github.com/vllm-project/vllm/issues/2747

Pytorch and Matplotlib interfering

I'm facing a weird bug with Matplotlib and torch. If I run the script with the torch.hub.load line below, plt.imshow simply does not display anything (even though frame is a valid image). If I comment out this line, plt.imshow works.

Whether the torch.hub.load line is commented out or not, cv2.imshow works.

import time

import cv2
import matplotlib.pyplot as plt
import torch

onnx_path = "my_weights.onnx"
yolo_path = "lib/yolov5/"

model = torch.hub.load(yolo_path, 'custom', path=onnx_path, source='local')

video_reader = VideoReader(str(src_file))  # custom threaded reader (definition not shown)

# wait for the reader thread to be ready
waiting = 0
while not video_reader.is_ready():
    waiting += 1
    time.sleep(1)

while video_reader.is_ready():
    frame = video_reader.frame

    # cv2.imshow('image', frame)
    # cv2.waitKey(0)

    plt.imshow(frame)
    plt.axis('off')
    plt.show()

It seems I'm missing something, but I don't see it. Any help is appreciated :)
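A hedged sketch, assuming the cause is that the YOLOv5 code pulled in by torch.hub.load switches Matplotlib to the non-interactive "Agg" backend as a side effect: remember the backend before the load and restore it afterwards.

import matplotlib
import matplotlib.pyplot as plt
import torch

backend_before = matplotlib.get_backend()      # e.g. "TkAgg" or "QtAgg"
model = torch.hub.load(yolo_path, 'custom', path=onnx_path, source='local')
matplotlib.use(backend_before, force=True)     # undo a possible switch to "Agg" during the load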

How do I add reversible noise to the MNIST dataset using PyTorch?

I would like to add reversible noise to the MNIST dataset for some experimentation.

Here's what I am trying at the moment:

import matplotlib.pyplot as plt
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from PIL import Image

def display_img(pixels, label = None):
    plt.imshow(pixels, cmap="gray")
    if label:    
        plt.title("Label: %d" % label)
    plt.axis("off")
    plt.show()

class NoisyMNIST(torchvision.datasets.MNIST):
    def __init__(self, root, train=True, transform=None, target_transform=None, download=False):
        super(NoisyMNIST, self).__init__(root, train=train, transform=transform, target_transform=target_transform, download=download)

    def __getitem__(self, index):
        img, target = self.data[index], self.targets[index]
        img = Image.fromarray(img.numpy(), mode="L")

        if self.transform is not None:
            img = self.transform(img)
        
        # add the noise
        noise_level = 0.3
        noise = self.generate_safe_random_tensor(img) * noise_level
        noisy_img = img + noise
        
        return noisy_img, noise, img, target

    def generate_safe_random_tensor(self, img):
        """generates random noise for an image but limits the pixel values between -1 and 1""" 
       
        min_values = torch.clamp(-1 - img, max=0)
        max_values = torch.clamp(1 - img, min=0)
       
        return torch.rand(img.shape) * (max_values - min_values) + min_values



# Define transformations to apply to the data
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to tensors
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_dataset = NoisyMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = NoisyMNIST(root='./data', train=False, download=True, transform=transform)

img_id = 0  # any sample index
np_noise = train_dataset[img_id][1]  # the noise that was added
np_data = train_dataset[img_id][0]   # the noisy image

np_data_sub_noise = np_data - np_noise  # subtract the noise again

display_img(np_data_sub_noise.squeeze(), 4)

Ideally, this would give me the regular MNIST images along with noisy MNIST images and a record of the noise that was added. Given this, I had assumed I could subtract the noise from the noisy image and get back to the original image, but my image operations are not reversible.
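A minimal check, assuming the NoisyMNIST dataset above and sample index 0: as long as the noisy image stays a float tensor and is never clamped or cast to uint8, subtracting the stored noise recovers the original exactly (up to floating-point rounding), so reversibility is usually lost at a clamp/cast or display-conversion step rather than at the addition itself.

import torch

noisy_img, noise, img, target = train_dataset[0]

print(torch.allclose(noisy_img - noise, img))   # True: the addition itself is reversible

clamped = noisy_img.clamp(0, 1)                 # e.g. clamping for display or saving
print(torch.allclose(clamped - noise, img))     # usually False: this is where it breaks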

Any pointers or code snippets would be greatly appreciated. Below are the images I currently get with my code:

Original image: (image)

With added noise: (image)

And with the noise subtracted from the noisy image: (image)

RuntimeError: 0D or 1D target tensor expected, multi-target not supported

This is the updated code in which I have changed the dimensions. I also got some suggestions about the error posted above, but I didn't understand them properly. Can you please explain where I have to update the code?

From the PyTorch community:

This error points to the target used in nn.CrossEntropyLoss or nn.NLLLoss having an invalid shape. Your code is unfortunately not executable, so itโ€™s not trivial to copy/paste it to fix other issues.

From Stack Overflow:

You need to focus, I previously told you to update the shape of fin_inten which you did. Now: you are throwing z_new into fin_inten which is a nn.Linear(2048,6) and right after into fin_e1 which is a nn.Linear(64,2). Your code is not consistent, you can't just throw random tensors into functions and hope for the best!

  class Classifier(pl.LightningModule):

    def __init__(self):
      super().__init__()
      self.MFB = MFB(512,768,True,256,64,0.1)
      self.fin_y_shape = torch.nn.Linear(768,512)
      self.fin_old = torch.nn.Linear(2048,2)
      self.fin = torch.nn.Linear(16 * 768, 64)
      self.fin_inten = torch.nn.Linear(2048,6)
      self.fin_e1 = torch.nn.Linear(2048,2)
      self.fin_e2 = torch.nn.Linear(2048,2)
      self.fin_e3 = torch.nn.Linear(2048,2)
      self.fin_e4 = torch.nn.Linear(2048,2)
      self.fin_e5 = torch.nn.Linear(2048,2)
      self.fin_e6 = torch.nn.Linear(2048,2)
      self.fin_e7 = torch.nn.Linear(2048,2)
      self.fin_e8 = torch.nn.Linear(2048,2)
      self.fin_e9 = torch.nn.Linear(2048,2)
      # self.reduce_x = torch.nn.Linear(768, 512)
      # self.reduce_rag = torch.nn.Linear(768, 512)



      self.validation_step_outputs = []
      self.test_step_outputs = []


    def forward(self, x,y,rag):
        x_,y_,rag_ = x,y,rag
        print("x.shape", x.shape)
        print("y.shape",y.shape)
        # print("rag.shape",rag.shape)

        # x = self.reduce_x(x)
        # rag = self.reduce_rag(rag)

        # print("x.shape", x.shape)
        # print("y.shape",y.shape)
        # print("rag.shape",rag.shape)
        # z = self.MFB(torch.unsqueeze(y, axis=1), torch.unsqueeze(rag, axis=1))
        # z_rag = self.MFB(torch.unsqueeze(y, axis=1),torch.unsqueeze(rag, axis=1))
        # z_con = torch.cat((z, z_rag), dim=1)


        # Concatenate x with y and then with rag


        # z= torch.cat((torch.cat((x, y), dim=1), rag), dim=1)


        # # Pass concatenated x with y and x with rag through your network
        # z_new = torch.squeeze(z,dim=1)
        # print("z_new shape",z_new)

        z = torch.cat((x, y, rag), dim=1)
        z_new = torch.squeeze(z, dim=1)



        c_inten = self.fin_inten(z_new)
        c_e1 = self.fin_e1(z_new)
        c_e2 = self.fin_e2(z_new)
        c_e3 = self.fin_e3(z_new)
        c_e4 = self.fin_e4(z_new)
        c_e5 = self.fin_e5(z_new)
        c_e6 = self.fin_e6(z_new)
        c_e7 = self.fin_e7(z_new)
        c_e8 = self.fin_e8(z_new)
        c_e9 = self.fin_e9(z_new)
        c = self.fin_old(z_new)

        # print("z.shape",z.shape)
        # print("z_new shape",z_new.shape)
        # print("intensity error:", c_inten.shape)
        # print("output:", c.shape)
        # print("c_e1:", c_e1.shape)
        # print("c_e2:", c_e2.shape)
        # print("c_e3:", c_e3.shape)
        # print("c_e4:", c_e4.shape)
        # print("c_e5:", c_e5.shape)
        # print("c_e6:", c_e6.shape)
        # print("c_e7:", c_e7.shape)
        # print("c_e8:", c_e8.shape)
        # print("c_e9:", c_e9.shape)
        # print("logits.shape",logits.shape)


        output = torch.log_softmax(c, dim=1)
        c_inten = torch.log_softmax(c_inten, dim=1)
        c_e1 = torch.log_softmax(c_e1, dim=1)
        c_e2 = torch.log_softmax(c_e2, dim=1)
        c_e3 = torch.log_softmax(c_e3, dim=1)
        c_e4 = torch.log_softmax(c_e4, dim=1)
        c_e5 = torch.log_softmax(c_e5, dim=1)
        c_e6 = torch.log_softmax(c_e6, dim=1)
        c_e7 = torch.log_softmax(c_e7, dim=1)
        c_e8 = torch.log_softmax(c_e8, dim=1)
        c_e9 = torch.log_softmax(c_e9, dim=1)

        return output,c_inten,c_e1,c_e2,c_e3,c_e4,c_e5,c_e6,c_e7,c_e8,c_e9


    def cross_entropy_loss(self, logits, labels):

        return F.nll_loss(logits, labels)

    def training_step(self, train_batch, batch_idx):
        #lab,txt,rag,img,name,per,iro,alli,ana,inv,meta,puns,sat,hyp= train_batch
        lab,txt,rag,img,name,intensity,e1,e2,e3,e4,e5,e6,e7,e8,e9= train_batch
        #logit_offen,a,b,c,d,e,f,g,h,i,logit_inten_target= self.forward(txt,img,rag)

        lab = train_batch[lab].unsqueeze(1)
        #print(lab)
        txt = train_batch[txt]
        rag = train_batch[rag]
        img = train_batch[img]
        name= train_batch[name]
        intensity = train_batch[intensity].unsqueeze(1)
        e1 = train_batch[e1].unsqueeze(1)
        e2 = train_batch[e2].unsqueeze(1)
        e3 = train_batch[e3].unsqueeze(1)
        e4 = train_batch[e4].unsqueeze(1)
        e5 = train_batch[e5].unsqueeze(1)
        e6 = train_batch[e6].unsqueeze(1)
        e7 = train_batch[e7].unsqueeze(1)
        e8 = train_batch[e8].unsqueeze(1)
        e9 = train_batch[e9].unsqueeze(1)

        lab = F.one_hot(lab, num_classes=2)
        intensity = torch.abs(intensity)
        intensity = F.one_hot(intensity, num_classes=6)  # Assuming you have 6 classes
        e1 = F.one_hot(e1,num_classes = 2)
        e2 = F.one_hot(e2,num_classes = 2)
        e3 = F.one_hot(e3,num_classes = 2)
        e4 = F.one_hot(e4,num_classes = 2)
        e5 = F.one_hot(e5,num_classes = 2)
        e6 = F.one_hot(e6,num_classes = 2)
        e7 = F.one_hot(e7,num_classes = 2)
        e8 = F.one_hot(e8,num_classes = 2)
        e9 = F.one_hot(e9,num_classes = 2)

        lab = lab.squeeze(dim=1)
        intensity = intensity.squeeze(dim=1)
        e1 = e1.squeeze(dim=1)
        e2 = e2.squeeze(dim=1)
        e3 = e3.squeeze(dim=1)
        e4 = e4.squeeze(dim=1)
        e5 = e5.squeeze(dim=1)
        e6 = e6.squeeze(dim=1)
        e7 = e7.squeeze(dim=1)
        e8 = e8.squeeze(dim=1)
        e9 = e9.squeeze(dim=1)



        logit_offen,logit_inten_target,a,b,c,d,e,f,g,h,i= self.forward(txt,img,rag)

        loss1 = self.cross_entropy_loss(logit_offen, lab)
        loss17 = self.cross_entropy_loss(logit_inten_target, intensity)
        loss4 = self.cross_entropy_loss(a, e1)
        loss5 = self.cross_entropy_loss(b, e2)
        loss6 = self.cross_entropy_loss(c, e3)
        loss7 = self.cross_entropy_loss(d, e4)
        loss8 = self.cross_entropy_loss(e, e5)
        loss9 = self.cross_entropy_loss(f, e6)
        loss10 = self.cross_entropy_loss(g, e7)
        loss11 = self.cross_entropy_loss(h, e8)
        loss12 = self.cross_entropy_loss(i, e9)

        loss = loss1 + loss4 + loss5 + loss6 + loss7 + loss8 +loss9 + loss10 +loss11 +loss12 + loss17

        self.log('train_loss', loss)
        return loss


    def validation_step(self, val_batch, batch_idx):
        #lab,txt,rag,img,name,per,iro,alli,ana,inv,meta,puns,sat,hyp = val_batch
        lab,txt,rag,img,name,intensity,e1,e2,e3,e4,e5,e6,e7,e8,e9= val_batch
        lab = val_batch[lab].unsqueeze(1)
        #print(lab)
        txt = val_batch[txt]
        rag = val_batch[rag]
        img = val_batch[img]
        name = val_batch[name]
        intensity = val_batch[intensity].unsqueeze(1)
        e1 = val_batch[e1].unsqueeze(1)
        e2 = val_batch[e2].unsqueeze(1)
        e3 = val_batch[e3].unsqueeze(1)
        e4 = val_batch[e4].unsqueeze(1)
        e5 = val_batch[e5].unsqueeze(1)
        e6 = val_batch[e6].unsqueeze(1)
        e7 = val_batch[e7].unsqueeze(1)
        e8 = val_batch[e8].unsqueeze(1)
        e9 = val_batch[e9].unsqueeze(1)

        lab = F.one_hot(lab, num_classes=2)

        intensity = torch.abs(intensity)
        intensity = F.one_hot(intensity, num_classes=6)
        e1 = F.one_hot(e1,num_classes = 2)
        e2 = F.one_hot(e2,num_classes = 2)
        e3 = F.one_hot(e3,num_classes = 2)
        e4 = F.one_hot(e4,num_classes = 2)
        e5 = F.one_hot(e5,num_classes = 2)
        e6 = F.one_hot(e6,num_classes = 2)
        e7 = F.one_hot(e7,num_classes = 2)
        e8 = F.one_hot(e8,num_classes = 2)
        e9 = F.one_hot(e9,num_classes = 2)
        lab = lab.squeeze(dim=1)


        intensity = intensity.squeeze(dim = 1)
        e1 = e1.squeeze(dim=1)
        e2 = e2.squeeze(dim=1)
        e3 = e3.squeeze(dim=1)
        e4 = e4.squeeze(dim=1)
        e5 = e5.squeeze(dim=1)
        e6 = e6.squeeze(dim=1)
        e7 = e7.squeeze(dim=1)
        e8 = e8.squeeze(dim=1)
        e9 = e9.squeeze(dim=1)

        logits,inten,a,b,c,d,e,f,g,h,i = self.forward(txt,img,rag)

        logits=logits.float()

        tmp = np.argmax(logits.detach().cpu().numpy(),axis=1)
        loss = self.cross_entropy_loss(logits, lab)
        lab = lab.detach().cpu().numpy()
        self.log('val_acc', accuracy_score(lab,tmp))
        self.log('val_roc_auc',roc_auc_score(lab,tmp))
        self.log('val_loss', loss)
        tqdm_dict = {'val_acc': accuracy_score(lab,tmp)}
        self.validation_step_outputs.append({'progress_bar': tqdm_dict,'val_f1 offensive': f1_score(lab,tmp,average='macro')})

        return {
                  'progress_bar': tqdm_dict,
        'val_f1 offensive': f1_score(lab,tmp,average='macro')
        }

    def on_validation_epoch_end(self):
      outs = []
      outs14=[]
      for out in self.validation_step_outputs:
        outs.append(out['progress_bar']['val_acc'])
        outs14.append(out['val_f1 offensive'])
      self.log('val_acc_all_offn', sum(outs)/len(outs))
      self.log('val_f1 offensive', sum(outs14)/len(outs14))
      print(f'***val_acc_all_offn at epoch end {sum(outs)/len(outs)}****')
      print(f'***val_f1 offensive at epoch end {sum(outs14)/len(outs14)}****')
      self.validation_step_outputs.clear()

    def test_step(self, batch, batch_idx):
        lab,txt,rag,img,name,intensity,e1,e2,e3,e4,e5,e6,e7,e8,e9= batch
        lab = batch[lab].unsqueeze(1)
        #print(lab)
        txt = batch[txt]
        rag = batch[rag]
        img = batch[img]
        name = batch[name]
        intensity = batch[intensity].unsqueeze(1)
        e1 = batch[e1].unsqueeze(1)
        e2 = batch[e2].unsqueeze(1)
        e3 = batch[e3].unsqueeze(1)
        e4 = batch[e4].unsqueeze(1)
        e5 = batch[e5].unsqueeze(1)
        e6 = batch[e6].unsqueeze(1)
        e7 = batch[e7].unsqueeze(1)
        e8 = batch[e8].unsqueeze(1)
        e9 = batch[e9].unsqueeze(1)
        lab = F.one_hot(lab, num_classes=2)
        intensity = F.one_hot(intensity, num_classes=6)
        e1 = F.one_hot(e1,num_classes = 2)
        e2 = F.one_hot(e2,num_classes = 2)
        e3 = F.one_hot(e3,num_classes = 2)
        e4 = F.one_hot(e4,num_classes = 2)
        e5 = F.one_hot(e5,num_classes = 2)
        e6 = F.one_hot(e6,num_classes = 2)
        e7 = F.one_hot(e7,num_classes = 2)
        e8 = F.one_hot(e8,num_classes = 2)
        e9 = F.one_hot(e9,num_classes = 2)
        lab = lab.squeeze(dim=1)
        intensity = intensity.squeeze(dim=1)
        e1 = e1.squeeze(dim=1)
        e2 = e2.squeeze(dim=1)
        e3 = e3.squeeze(dim=1)
        e4 = e4.squeeze(dim=1)
        e5 = e5.squeeze(dim=1)
        e6 = e6.squeeze(dim=1)
        e7 = e7.squeeze(dim=1)
        e8 = e8.squeeze(dim=1)
        e9 = e9.squeeze(dim=1)

        logits,inten,a,b,c,d,e,f,g,h,i= self.forward(txt,img,rag)

        logits = logits.float()
        tmp = np.argmax(logits.detach().cpu().numpy(force=True),axis=-1)
        loss = self.cross_entropy_loss(logits, lab)
        lab = lab.detach().cpu().numpy()
        self.log('test_acc', accuracy_score(lab,tmp))
        self.log('test_roc_auc',roc_auc_score(lab,tmp))
        self.log('test_loss', loss)
        tqdm_dict = {'test_acc': accuracy_score(lab,tmp)}
        self.test_step_outputs.append({'progress_bar': tqdm_dict,'test_acc': accuracy_score(lab,tmp), 'test_f1_score': f1_score(lab,tmp,average='macro')})
        return {
                  'progress_bar': tqdm_dict,
                  'test_acc': accuracy_score(lab,tmp),
                  'test_f1_score': f1_score(lab,tmp,average='macro')
        }
    def on_test_epoch_end(self):
        # OPTIONAL
        outs = []
        outs1,outs2,outs3,outs4,outs5,outs6,outs7,outs8,outs9,outs10,outs11,outs12,outs13,outs14 = \
        [],[],[],[],[],[],[],[],[],[],[],[],[],[]
        for out in self.test_step_outputs:
          outs.append(out['test_acc'])
          outs2.append(out['test_f1_score'])
        self.log('test_acc', sum(outs)/len(outs))
        self.log('test_f1_score', sum(outs2)/len(outs2))
        self.test_step_outputs.clear()

    def configure_optimizers(self):
      # optimizer = torch.optim.Adam(self.parameters(), lr=3e-2)
      optimizer = torch.optim.Adam(self.parameters(), lr=1e-5)

      return optimizer


  """
  Main Model:
  Initialize
  Forward Pass
  Training Step
  Validation Step
  Testing Step

  Pp
  """

  class HmDataModule(pl.LightningDataModule):

    def setup(self, stage):
      self.hm_train = t_p
      self.hm_val = v_p
      # self.hm_test = test
      self.hm_test = te_p

    def train_dataloader(self):
      return DataLoader(self.hm_train, batch_size=10, drop_last=True)

    def val_dataloader(self):
      return DataLoader(self.hm_val, batch_size=10, drop_last=True)

    def test_dataloader(self):
      return DataLoader(self.hm_test, batch_size=10, drop_last=True)

  data_module = HmDataModule()
  checkpoint_callback = ModelCheckpoint(
      monitor='val_acc_all_offn',
      dirpath='mrinal/',
      filename='epoch{epoch:02d}-val_f1_all_offn{val_acc_all_offn:.2f}',
      auto_insert_metric_name=False,
      save_top_k=1,
      mode="max",
  )
  all_callbacks = []
  all_callbacks.append(checkpoint_callback)
  # train
  from pytorch_lightning import seed_everything
  seed_everything(42, workers=True)
  hm_model = Classifier()
  gpus=1
  #if torch.cuda.is_available():gpus=0
  trainer = pl.Trainer(deterministic=True,max_epochs=20,precision=16,callbacks=all_callbacks)
  trainer.fit(hm_model, data_module)

Here is the Full Traceback Error

INFO:lightning_fabric.utilities.seed:Seed set to 42
INFO:pytorch_lightning.utilities.rank_zero:Using bfloat16 Automatic Mixed Precision (AMP)
INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
   | Name        | Type   | Params
----------------------------------------
0  | MFB         | MFB    | 21.0 M
1  | fin_y_shape | Linear | 393 K 
2  | fin_old     | Linear | 4.1 K 
3  | fin         | Linear | 786 K 
4  | fin_inten   | Linear | 12.3 K
5  | fin_e1      | Linear | 4.1 K 
6  | fin_e2      | Linear | 4.1 K 
7  | fin_e3      | Linear | 4.1 K 
8  | fin_e4      | Linear | 4.1 K 
9  | fin_e5      | Linear | 4.1 K 
10 | fin_e6      | Linear | 4.1 K 
11 | fin_e7      | Linear | 4.1 K 
12 | fin_e8      | Linear | 4.1 K 
13 | fin_e9      | Linear | 4.1 K 
----------------------------------------
22.2 M    Trainable params
0         Non-trainable params
22.2 M    Total params
88.951    Total estimated model params size (MB)
Sanity Checking DataLoader 0:   0%
 0/2 [00:00<?, ?it/s]
x.shape torch.Size([10, 768])
y.shape torch.Size([10, 512])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-34-4b5757f5c04a>](https://localhost:8080/#) in <cell line: 375>()
    373 #if torch.cuda.is_available():gpus=0
    374 trainer = pl.Trainer(deterministic=True,max_epochs=20,precision=16,callbacks=all_callbacks)
--> 375 trainer.fit(hm_model, data_module)

13 frames
[/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py](https://localhost:8080/#) in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2702     if size_average is not None or reduce is not None:
   2703         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2704     return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2705 
   2706 

RuntimeError: 0D or 1D target tensor expected, multi-target not supported

I am getting the error RuntimeError: 0D or 1D target tensor expected, multi-target not supported. I need to solve this; can anyone help me figure out how I can solve this issue?
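A hedged sketch of the usual fix: F.nll_loss (and nn.CrossEntropyLoss) expect the target to be a 1D tensor of class indices with shape (batch,), not a one-hot matrix of shape (batch, num_classes). Dropping the F.one_hot calls (or undoing them with argmax) makes the target shape match the log-probabilities returned by forward().

import torch
import torch.nn.functional as F

logits = torch.log_softmax(torch.randn(10, 2), dim=1)  # stand-in for logit_offen, shape (10, 2)
lab = torch.randint(0, 2, (10,))                       # class indices, shape (10,)

loss = F.nll_loss(logits, lab)                         # works: scalar loss

# if the labels are already one-hot encoded (shape (10, 2)), convert them back:
lab_onehot = F.one_hot(lab, num_classes=2)
loss = F.nll_loss(logits, lab_onehot.argmax(dim=1))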

Triton.Lang How to handle Block sizes

I am trying to use triton-lang to perform a simple element-wise dot product between a column vector and a matrix that both have complex values. I can make the code work if I don't specify block sizes, but I can't figure out how to cut my grid and how to handle my pointers. I somewhat understand the theory of how it should work, but I can't make it work.

import torch
import triton
import triton.language as tl


def cdot(x: torch.Tensor, y: torch.Tensor):
    return x * y

def cdot_triton(x: torch.Tensor, y: torch.Tensor, BLOCK_SIZE):
    # preallocate the output
    z = torch.empty_like(y)

    # check arguments
    assert x.is_cuda and y.is_cuda and z.is_cuda

    # get vector size
    N = z.numel()

    # 1D launch kernel where each block gets its own program
    grid = lambda meta: (N // BLOCK_SIZE, N // BLOCK_SIZE)

    # launch the kernel
    cdot_kernel[grid](x.real, x.imag, y.real, y.imag, z.real, z.imag, N, BLOCK_SIZE)

    return z

@triton.jit
def cdot_kernel(
    x_real_ptr,
    x_imag_ptr,
    y_real_ptr,
    y_imag_ptr,
    z_real_ptr,
    z_imag_ptr,
    N: tl.constexpr,  # Size of the vector
    BLOCK_SIZE: tl.constexpr,  # Number of elements each program should process
):
    row = tl.program_id(0)
    col = tl.arange(0, 2*BLOCK_SIZE)


    if row < BLOCK_SIZE:
        idx = row * BLOCK_SIZE + col
        x_real = tl.load(x_real_ptr + 2*row)
        x_imag = tl.load(x_imag_ptr + 2*row)
        y_real = tl.load(y_real_ptr + 2*idx, mask=col<BLOCK_SIZE, other=0)
        y_imag = tl.load(y_imag_ptr + 2*idx, mask=col<BLOCK_SIZE, other=0)

        z_real = x_real * y_real - x_imag * y_imag
        z_imag = x_real * y_imag + x_imag * y_real

        tl.store(z_real_ptr + 2*idx, z_real, mask=col<BLOCK_SIZE)
        tl.store(z_imag_ptr + 2*idx, z_imag, mask=col<BLOCK_SIZE)
        
# ===========================================
# Test kernel
# ===========================================

size = 4
dtype = torch.complex64
x = torch.rand((size, 1), device='cuda', dtype=dtype)
y = torch.rand((size, size), device='cuda', dtype=dtype)


out_dot = cdot(x,y)
out_kernel = cdot_triton(x,y, BLOCK_SIZE=2)

This is the output:

tensor([[-0.1322+1.1461j, -0.1098+0.8015j,  0.2948+1.2155j, -0.1326+0.6076j],
        [-0.3687+0.4646j,  0.2349+0.5802j,  0.0568+0.9461j, -0.0457+0.3213j],
        [ 0.0523+0.9351j,  0.4409+0.5076j,  0.3956+0.4018j,  0.6230+0.9270j],
        [-0.3503+0.7194j, -0.3742+0.2311j, -0.3353+0.3884j, -0.3478+0.6724j]],
       device='cuda:0')
tensor([[-0.1322+1.1461j, -0.1098+0.8015j,  0.0617+1.0408j, -0.1988+0.4788j],
        [ 0.1147+0.2296j,  0.0686+0.1161j,  0.0647+0.4044j,  0.0795+0.6407j],
        [-0.2396+0.6326j, -0.3587+0.5878j, -0.1563+0.4028j, -0.2933+0.3294j],
        [-0.1214+0.3678j,  0.0440+0.9951j,  0.3342+1.1360j,  0.6796+0.6590j]],
       device='cuda:0')

As you can see, only the first two values of the top row are correct.

Any ideas on how I can make this element-wise dot product work?

Many thanks!
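A hedged sketch with my own layout assumptions (not the original kernel): treat the (N, N) matrix as a flat array of N*N elements, give each program BLOCK_SIZE consecutive elements, and recover the row index with an integer division so each element picks up the matching entry of the column vector. Real and imaginary parts are passed as separate contiguous float tensors.

import torch
import triton
import triton.language as tl


@triton.jit
def cdot_kernel(x_real_ptr, x_imag_ptr, y_real_ptr, y_imag_ptr,
                z_real_ptr, z_imag_ptr, N, n_elements,
                BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)   # flat indices into y and z
    mask = offs < n_elements
    rows = offs // N                                     # row index -> index into x

    xr = tl.load(x_real_ptr + rows, mask=mask)
    xi = tl.load(x_imag_ptr + rows, mask=mask)
    yr = tl.load(y_real_ptr + offs, mask=mask)
    yi = tl.load(y_imag_ptr + offs, mask=mask)

    tl.store(z_real_ptr + offs, xr * yr - xi * yi, mask=mask)
    tl.store(z_imag_ptr + offs, xr * yi + xi * yr, mask=mask)


def cdot_triton(x: torch.Tensor, y: torch.Tensor, BLOCK_SIZE: int = 128):
    xr, xi = x.real.contiguous().view(-1), x.imag.contiguous().view(-1)
    yr, yi = y.real.contiguous(), y.imag.contiguous()
    zr, zi = torch.empty_like(yr), torch.empty_like(yi)
    n_elements = y.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)
    cdot_kernel[grid](xr, xi, yr, yi, zr, zi, y.shape[1], n_elements,
                      BLOCK_SIZE=BLOCK_SIZE)
    return torch.complex(zr, zi)


x = torch.rand((4, 1), device='cuda', dtype=torch.complex64)
y = torch.rand((4, 4), device='cuda', dtype=torch.complex64)
print(torch.allclose(cdot_triton(x, y), x * y))          # should print True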

How to realize a polynomial regression in Pytorch / Python

I want my neural network to solve a polynomial regression problem like y = (x*x) + 2x - 3.

So right now I have created a network with 1 input node, 100 hidden nodes and 1 output node and gave it a lot of epochs to train with a large data set. The problem is that the prediction after roughly 20000 epochs is okay-ish, but much worse than the linear regression predictions after training.

import torch
from torch import Tensor
from torch.nn import Linear, MSELoss, functional as F
from torch.optim import SGD, Adam, RMSprop
from torch.autograd import Variable
import numpy as np


# define our data generation function
def data_generator(data_size=1000):
    # f(x) = y = x^2 + 4x - 3
    inputs = []
    labels = []

    # loop data_size times to generate the data
    for ix in range(data_size):
        # generate a random number between 0 and 1000
        x = np.random.randint(1000) / 1000

        # calculate the y value using the function x^2 + 4x - 3
        y = (x * x) + (4 * x) - 3

        # append the values to our input and labels lists
        inputs.append([x])
        labels.append([y])

    return inputs, labels


# define the model
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = Linear(1, 100)
        self.fc2 = Linear(100, 1)


    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


model = Net()
# define the loss function
critereon = MSELoss()
# define the optimizer
optimizer = SGD(model.parameters(), lr=0.01)

# define the number of epochs and the data set size
nb_epochs = 20000
data_size = 1000

# create our training loop
for epoch in range(nb_epochs):
    X, y = data_generator(data_size)
    X = Variable(Tensor(X))
    y = Variable(Tensor(y))


    epoch_loss = 0;


    y_pred = model(X)

    loss = critereon(y_pred, y)

    epoch_loss = loss.data
    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

    print("Epoch: {} Loss: {}".format(epoch, epoch_loss))

# test the model
model.eval()
test_data = data_generator(1)
prediction = model(Variable(Tensor(test_data[0][0])))
print("Prediction: {}".format(prediction.data[0]))
print("Expected: {}".format(test_data[1][0]))

Is there a way to get much better results? I wondered if I should try to get 3 outputs, call them a, b and c, such that y = a(x*x) + b(x) + c. But I have no idea how to implement that and train my neural network.
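A hedged sketch of the "learn a, b, c directly" idea mentioned above: feed the network the features [x^2, x] and let a single Linear layer learn a, b and the bias c, so the model is exactly y = a*x^2 + b*x + c.

import torch
from torch import nn

model = nn.Linear(2, 1)                      # weights -> a, b; bias -> c
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

for epoch in range(2000):
    x = torch.rand(256, 1)                   # same [0, 1) input range as above
    features = torch.cat([x * x, x], dim=1)  # [x^2, x]
    target = x * x + 4 * x - 3               # f(x) = x^2 + 4x - 3, as in the data generator

    loss = criterion(model(features), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(model.weight.data, model.bias.data)    # should approach [1, 4] and -3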

Runtime Error: The size of tensor a (524288) must match the size of tensor b (131072) at non-singleton dimension 0

I'm trying to perform leaf disease image segmentation using code from GitHub. Here is the code; I am facing the issue "The size of tensor a (524288) must match the size of tensor b (131072) at non-singleton dimension 0". Help me fix this error. I am unable to understand where the error actually lies.

import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from tqdm import tqdm
import cv2
import os

# Dice coefficient and IoU calculation
def dice_coeff(pred, target):
    smooth = 1.
    pred_flat = pred.view(-1)
    target_flat = target.view(-1)
    intersection = (pred_flat * target_flat).sum()
    return (2. * intersection + smooth) / (pred_flat.sum() + target_flat.sum() + smooth)

def iu_acc(y_pred, y_true):
    smooth = 1e-12
    y_pred_pos = torch.round(torch.clamp(y_pred, 0, 1))
    intersection = torch.sum(y_true * y_pred_pos)
    sum_ = torch.sum(y_true) + torch.sum(y_pred_pos)
    jac = (intersection + smooth) / (sum_ - intersection + smooth)
    return torch.mean(jac)

# Loss function
def dice_coef_loss(pred, target):
    return 1 - dice_coeff(pred, target)

# Define your U-Net model
class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(ConvBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        return self.relu(self.bn2(self.conv2(x)))

class UpConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(UpConvBlock, self).__init__()
        self.up = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)
        self.conv_block = ConvBlock(out_channels * 2, out_channels)

    def forward(self, x1, x2):
        up = self.up(x1)
        return self.conv_block(torch.cat([up, x2], dim=1))

class Unet(nn.Module):
    def __init__(self, in_channels=3, num_classes=1):
        super(Unet, self).__init__()
        self.enc1 = ConvBlock(in_channels, 16)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.enc2 = ConvBlock(16, 32)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.enc3 = ConvBlock(32, 64)
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.enc4 = ConvBlock(64, 128)
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.enc5 = ConvBlock(128, 256)

        self.dec6 = UpConvBlock(256, 128)
        self.dec7 = UpConvBlock(128, 64)
        self.dec8 = UpConvBlock(64, 32)
        self.dec9 = UpConvBlock(32, 16)
        self.out = nn.Conv2d(16, num_classes, kernel_size=1)

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.pool1(x1)
        x3 = self.enc2(x2)
        x4 = self.pool2(x3)
        x5 = self.enc3(x4)
        x6 = self.pool3(x5)
        x7 = self.enc4(x6)
        x8 = self.pool4(x7)
        x9 = self.enc5(x8)
        y1 = self.dec6(x9, x7)
        y2 = self.dec7(y1, x5)
        y3 = self.dec8(y2, x3)
        y4 = self.dec9(y3, x1)
        out = self.out(y4)
        return out

# Define your dataset class
class CustomDataset(Dataset):
    def __init__(self, image_paths, mask_paths, transform=None):
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        image_path = self.image_paths[index]
        mask_path = self.mask_paths[index]

        image = cv2.imread(image_path)
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)

        if self.transform:
            image = self.transform(image)
            mask = self.transform(mask)

        image = image.float() / 255.0

        return image, mask

# Define your training and evaluation functions
def train(model, train_loader, optimizer, loss_fn, device):
    model.train()
    epoch_loss = 0.0
    for images, targets in tqdm(train_loader, desc='Training', leave=False):
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, targets)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(train_loader)

def evaluate(model, val_loader, loss_fn, device):
    model.eval()
    epoch_loss = 0.0
    with torch.no_grad():
        for images, targets in tqdm(val_loader, desc='Validation', leave=False):
            images, targets = images.to(device), targets.to(device)
            outputs = model(images)
            loss = loss_fn(outputs, targets)
            epoch_loss += loss.item()
    return epoch_loss / len(val_loader)

if __name__ == '__main__':
    # Define dataset paths
    train_image_dir = '/content/drive/My Drive/brown/train/images'
    train_mask_dir = '/content/drive/My Drive/brown/train/labels'
    val_image_dir = '/content/drive/My Drive/brown/val/images'
    val_mask_dir = '/content/drive/My Drive/brown/val/labels'

    # Get image and mask paths
    train_image_paths = sorted([os.path.join(train_image_dir, f) for f in os.listdir(train_image_dir) if f.endswith('.png') or f.endswith('.jpg')])
    train_mask_paths = sorted([os.path.join(train_mask_dir, f) for f in os.listdir(train_mask_dir) if f.endswith('.png') or f.endswith('.jpg')])
    val_image_paths = sorted([os.path.join(val_image_dir, f) for f in os.listdir(val_image_dir) if f.endswith('.png') or f.endswith('.jpg')])
    val_mask_paths = sorted([os.path.join(val_mask_dir, f) for f in os.listdir(val_mask_dir) if f.endswith('.png') or f.endswith('.jpg')])

    # Create datasets
    train_dataset = CustomDataset(train_image_paths, train_mask_paths, transform=transforms.ToTensor())
    val_dataset = CustomDataset(val_image_paths, val_mask_paths, transform=transforms.ToTensor())

    # Define data loaders
    batch_size = 2
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

# Define device, model, optimizer, loss function...
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Unet(in_channels=3, num_classes=1)  # Specify the number of input channels and output classes
model = model.to(device)
optimizer = Adam(model.parameters(), lr=1e-4)

loss_fn = dice_coef_loss  # or your custom loss function

num_epochs = 10
for epoch in range(num_epochs):
    train_loss = train(model, train_loader, optimizer, loss_fn, device)
    val_loss = evaluate(model, val_loader, loss_fn, device)
    print(f'Epoch {epoch + 1}/{num_epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')

I tried another code from GitHub and got the same error. From what I understand it's an issue with my input dimensions, but I am unable to solve it. Image files: 256 by 256, bit depth 24, extension .jpg. Mask files: 256 by 256, bit depth 8, extension .png.

Update: Here is my complete error

    --> 178 train_loss = train(model, train_loader, optimizer, loss_fn, device)
    179     val_loss = evaluate(model, val_loader, loss_fn, device)
    180     print(f'Epoch {epoch + 1}/{num_epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')

train(model, train_loader, optimizer, loss_fn, device)
    127         optimizer.zero_grad()
    128         outputs = model(images)
--> 129         loss = loss_fn(outputs, targets)
    130         loss.backward()
    131         optimizer.step()

dice_coef_loss(pred, target)
     32 # Loss function
     33 def dice_coef_loss(pred, target):
---> 34     return 1 - dice_coeff(pred, target)
# Define your U-Net model
dice_coeff(pred, target)
     16     pred_flat = pred.view(-1)
     17     target_flat = target.view(-1)
---> 18     intersection = (pred_flat * target_flat).sum()
     19     return (2. * intersection + smooth) / (pred_flat.sum() + target_flat.sum() + smooth)

RuntimeError: The size of tensor a (524288) must match the size of tensor b (131072) at non-singleton dimension 0
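A hedged debugging sketch: the error means pred.view(-1) and target.view(-1) contain different numbers of elements (524288 vs 131072, a factor of 4), i.e. the network output and the mask batch do not share the same (batch, channels, H, W) shape. Printing both shapes inside the loss usually pinpoints the culprit; if it is the spatial size, resampling the mask to the output size makes the counts match. This variant replaces dice_coef_loss and reuses dice_coeff defined above.

import torch
import torch.nn.functional as Fn

def dice_coef_loss(pred, target):
    # print the shapes once to see where the factor-of-4 mismatch comes from
    print("pred:", tuple(pred.shape), "target:", tuple(target.shape))
    if pred.shape[-2:] != target.shape[-2:]:
        # align the mask to the prediction; 'nearest' keeps mask values discrete
        target = Fn.interpolate(target.float(), size=pred.shape[-2:], mode="nearest")
    pred = torch.sigmoid(pred)               # the U-Net above outputs raw logits
    return 1 - dice_coeff(pred, target)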

DeepSpeed install error, help! with torch error

I'm a beginner coder studying AI. I'm studying KoreanLM on GitHub.

Actually, it's my first time using GitHub. This is how I installed the repo:

git clone https://github.com/quantumaikr/KoreanLM.git
cd KoreanLM
pip install -r requirements.txt

Those are the instructions I was given, but there is a problem at the pip step.

I ran those commands in the Git program.

The required modules are listed in requirements.txt. Among them is DeepSpeed.

When I run the pip install line, the error below occurs:

Collecting git+https://github.com/huggingface/peft.git (from -r requirements.txt (line 15))
Cloning https://github.com/huggingface/peft.git to c:\...
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git 'C:\...'
Resolved https://github.com/huggingface/peft.git to commit 02b5aeddf9c1ea11451f10a8a26da7e5df8cca4a
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting numpy (from -r requirements.txt (line 1))
Using cached numpy-1.26.4-cp312-cp312-win_amd64.whl.metadata (61 kB)
Collecting rouge_score (from -r requirements.txt (line 2))
Using cached rouge_score-0.1.2.tar.gz (17 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting fire (from -r requirements.txt (line 3))
Using cached fire-0.6.0-py2.py3-none-any.whl
Collecting transformers>=4.28.1 (from -r requirements.txt (line 4))
Using cached transformers-4.39.2-py3-none-any.whl.metadata (134 kB)
Collecting torch (from -r requirements.txt (line 5))
Using cached torch-2.2.2-cp312-cp312-win_amd64.whl.metadata (26 kB)
Collecting sentencepiece (from -r requirements.txt (line 6))
Using cached sentencepiece-0.2.0-cp312-cp312-win_amd64.whl.metadata (8.3 kB)
Collecting tokenizers>=0.13.3 (from -r requirements.txt (line 7))
Using cached tokenizers-0.15.2-cp312-none-win_amd64.whl.metadata (6.8 kB)
Collecting deepspeed (from -r requirements.txt (line 8))
Using cached deepspeed-0.14.0.tar.gz (1.3 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [23 lines of output]

[WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
[WARNING] unable to import torch, please install it if you want to pre-compile any deepspeed ops.

I'm really discouraged to get stuck at the module installation stage. Is there anyone who can help me, please?

Rearrange 2D tensors in a batch Torch

Let us have an initial_tensor of size (batch_size, N, N) and a tensor of indexes of size (batch_size, N), specifying the new order of elements in each 2D tensor in the batch. The goal is to rearrange the elements of the tensors in the batch according to the index tensor to obtain a target tensor.

Currently I am able to do it on CPU using the following loop:

    for batch in range(batch_size):
        old_ids = indexes[batch]

        for i in range(N):
            for j in range(N):
                target[batch][i][j] = initial_tensor[batch][old_ids[i]][old_ids[j]]

I am looking for an equivalent vectorised solution to get rid of the CPU loop.

I tried various combinations of scattering and slicing, but could not figure out an equivalent of the loop.
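A hedged sketch of a vectorised equivalent using advanced indexing, so the permutation runs entirely on the tensor's device (names follow the loop above; the batch index is broadcast against the row/column index pairs).

import torch

batch_size, N = 4, 5
initial_tensor = torch.randn(batch_size, N, N)
indexes = torch.stack([torch.randperm(N) for _ in range(batch_size)])   # (batch_size, N)

b = torch.arange(batch_size).view(-1, 1, 1)     # (batch_size, 1, 1)
rows = indexes.view(batch_size, N, 1)           # old_ids[i] for every output row
cols = indexes.view(batch_size, 1, N)           # old_ids[j] for every output column
target = initial_tensor[b, rows, cols]          # (batch_size, N, N), matches the loop result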

How could I solve this Deepcopy Problem of this code?

I'm trying to use this model after fetching it from GitHub and making some modifications. However, I encounter an error like:

File "/usr/local/envs/el_dorado/lib/python3.8/site-packages/torch/_tensor.py", line 55, in __deepcopy__
    raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

and deepcopy doesn't work. Could you please help? Below is the original code (see GitHub: https://github.com/drv-agwl/ViViT-pytorch/blob/master/models.py).

from torch import nn, einsum
import torch
from einops.layers.torch import Rearrange
from einops import rearrange, repeat

class PreNorm(nn.Module):
    def __init__(self, dim, fn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fn = fn

    def forward(self, x, **kwargs):
        return self.fn(self.norm(x), **kwargs)


class FSAttention(nn.Module):
    """Factorized Self-Attention"""

    def __init__(self, dim, heads=8, dim_head=64, dropout=0.):
        super().__init__()
        inner_dim = dim_head * heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head ** -0.5

        self.attend = nn.Softmax(dim=-1)
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)

        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        b, n, _, h = *x.shape, self.heads
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), qkv)

        dots = einsum('b h i d, b h j d -> b h i j', q, k) * self.scale

        attn = self.attend(dots)

        out = einsum('b h i j, b h j d -> b h i d', attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)


class FDAttention(nn.Module):
    """Factorized Dot-product Attention"""

    def __init__(self, dim, nt, nh, nw, heads=8, dim_head=64, dropout=0.):
        super().__init__()
        inner_dim = dim_head * heads
        project_out = not (heads == 1 and dim_head == dim)

        self.nt = nt
        self.nh = nh
        self.nw = nw

        self.heads = heads
        self.scale = dim_head ** -0.5

        self.attend = nn.Softmax(dim=-1)
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)

        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        b, n, d, h = *x.shape, self.heads

        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), qkv)
        qs, qt = q.chunk(2, dim=1)
        ks, kt = k.chunk(2, dim=1)
        vs, vt = v.chunk(2, dim=1)

        # Attention over spatial dimension
        qs = qs.view(b, h // 2, self.nt, self.nh * self.nw, -1)
        ks, vs = ks.view(b, h // 2, self.nt, self.nh * self.nw, -1), vs.view(b, h // 2, self.nt, self.nh * self.nw, -1)
        spatial_dots = einsum('b h t i d, b h t j d -> b h t i j', qs, ks) * self.scale
        sp_attn = self.attend(spatial_dots)
        spatial_out = einsum('b h t i j, b h t j d -> b h t i d', sp_attn, vs)

        # Attention over temporal dimension
        qt = qt.view(b, h // 2, self.nh * self.nw, self.nt, -1)
        kt, vt = kt.view(b, h // 2, self.nh * self.nw, self.nt, -1), vt.view(b, h // 2, self.nh * self.nw, self.nt, -1)
        temporal_dots = einsum('b h s i d, b h s j d -> b h s i j', qt, kt) * self.scale
        temporal_attn = self.attend(temporal_dots)
        temporal_out = einsum('b h s i j, b h s j d -> b h s i d', temporal_attn, vt)

        # return self.to_out(out)


class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, dropout=0.):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout)
        )

    def forward(self, x):
        return self.net(x)


class FSATransformerEncoder(nn.Module):
    """Factorized Self-Attention Transformer Encoder"""

    def __init__(self, dim, depth, heads, dim_head, mlp_dim, nt, nh, nw, dropout=0.):
        super().__init__()
        self.layers = nn.ModuleList([])
        self.nt = nt
        self.nh = nh
        self.nw = nw

        for _ in range(depth):
            self.layers.append(nn.ModuleList(
                [PreNorm(dim, FSAttention(dim, heads=heads, dim_head=dim_head, dropout=dropout)),
                 PreNorm(dim, FSAttention(dim, heads=heads, dim_head=dim_head, dropout=dropout)),
                 PreNorm(dim, FeedForward(dim, mlp_dim, dropout=dropout))
                 ]))

    def forward(self, x):

        b = x.shape[0]
        x = torch.flatten(x, start_dim=0, end_dim=1)  # extract spatial tokens from x

        for sp_attn, temp_attn, ff in self.layers:
            sp_attn_x = sp_attn(x) + x  # Spatial attention

            # Reshape tensors for temporal attention
            sp_attn_x = sp_attn_x.chunk(b, dim=0)
            sp_attn_x = [temp[None] for temp in sp_attn_x]
            sp_attn_x = torch.cat(sp_attn_x, dim=0).transpose(1, 2)
            sp_attn_x = torch.flatten(sp_attn_x, start_dim=0, end_dim=1)

            temp_attn_x = temp_attn(sp_attn_x) + sp_attn_x  # Temporal attention

            x = ff(temp_attn_x) + temp_attn_x  # MLP

            # Again reshape tensor for spatial attention
            x = x.chunk(b, dim=0)
            x = [temp[None] for temp in x]
            x = torch.cat(x, dim=0).transpose(1, 2)
            x = torch.flatten(x, start_dim=0, end_dim=1)

        # Reshape vector to [b, nt*nh*nw, dim]
        x = x.chunk(b, dim=0)
        x = [temp[None] for temp in x]
        x = torch.cat(x, dim=0)
        x = torch.flatten(x, start_dim=1, end_dim=2)
        return x


class FDATransformerEncoder(nn.Module):
    """Factorized Dot-product Attention Transformer Encoder"""

    def __init__(self, dim, depth, heads, dim_head, mlp_dim, nt, nh, nw, dropout=0.):
        super().__init__()
        self.layers = nn.ModuleList([])
        self.nt = nt
        self.nh = nh
        self.nw = nw

        for _ in range(depth):
            self.layers.append(
                PreNorm(dim, FDAttention(dim, nt, nh, nw, heads=heads, dim_head=dim_head, dropout=dropout)))

    def forward(self, x):
        for attn in self.layers:
            x = attn(x) + x

        return x


class ViViTBackbone(nn.Module):
    """ Model-3 backbone of ViViT """

    def __init__(self, t, h, w, patch_t, patch_h, patch_w, num_classes, dim, depth, heads, mlp_dim, dim_head=3,
                 channels=3, mode='tubelet', device='cuda', emb_dropout=0., dropout=0., model=3):
        super().__init__()

        assert t % patch_t == 0 and h % patch_h == 0 and w % patch_w == 0, "Video dimensions should be divisible by " \
                                                                           "tubelet size "

        self.T = t
        self.H = h
        self.W = w
        self.channels = channels
        self.t = patch_t
        self.h = patch_h
        self.w = patch_w
        self.mode = mode
        self.device = device

        self.nt = self.T // self.t
        self.nh = self.H // self.h
        self.nw = self.W // self.w

        tubelet_dim = self.t * self.h * self.w * channels

        # x.shape: torch.Size([64, 3, 32, 64, 64]) -> torch.Size([64, 32, 64, 64, 3])
        self.to_tubelet_embedding = nn.Sequential(
            Rearrange('b c (t pt) (h ph) (w pw) -> b t (h w) (pt ph pw c)', pt=self.t, ph=self.h, pw=self.w),
            nn.Linear(tubelet_dim, dim)
        )

        # repeat same spatial position encoding temporally
        self.pos_embedding = nn.Parameter(torch.randn(1, 1, self.nh * self.nw, dim)).repeat(1, self.nt, 1, 1)

        self.dropout = nn.Dropout(emb_dropout)

        if model == 3:
            self.transformer = FSATransformerEncoder(dim, depth, heads, dim_head, mlp_dim,
                                                     self.nt, self.nh, self.nw, dropout)
        elif model == 4:
            assert heads % 2 == 0, "Number of heads should be even"
            self.transformer = FDATransformerEncoder(dim, depth, heads, dim_head, mlp_dim,
                                                     self.nt, self.nh, self.nw, dropout)

        self.to_latent = nn.Identity()

        self.mlp_head = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, num_classes)
        )

    def forward(self, x):
        """ x is a video: (b, C, T, H, W) """
        tokens = self.to_tubelet_embedding(x)

        self.pos_embedding = self.pos_embedding.to(tokens.device)
        
        tokens += self.pos_embedding
        tokens = self.dropout(tokens)

        x = self.transformer(tokens) # output dimension: [b, nt*nh*nw, dim]
        
        b, _, _ = x.shape
        x = x.view(b, self.nt, self.nh*self.nw, -1).mean(dim=2)  # mean over the spatial dimension

        x = self.to_latent(x)
        
        return x

I've made some modifications to the output, and I attempted to use clone(), but it seems difficult because I'm using something called Hydra. Nonetheless, deepcopy should work.
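A hedged sketch of the usual cause: pos_embedding is created as nn.Parameter(...).repeat(...), so the stored attribute is the output of an op (a non-leaf tensor with a grad_fn), and forward() later overwrites it again with another non-leaf via .to(...). deepcopy only supports leaf tensors, so keeping only the leaf Parameter as the attribute and expanding it on the fly inside forward avoids the error. A minimal self-contained demo of that pattern:

import copy
import torch
from torch import nn


class PosEmbedDemo(nn.Module):
    def __init__(self, nt, nh, nw, dim):
        super().__init__()
        # store only the leaf Parameter; no .repeat() at construction time
        self.pos_embedding = nn.Parameter(torch.randn(1, 1, nh * nw, dim))
        self.nt = nt

    def forward(self, tokens):
        # expand on the fly and do NOT assign the result back to self
        pos = self.pos_embedding.to(tokens.device).repeat(1, self.nt, 1, 1)
        return tokens + pos


m = PosEmbedDemo(nt=2, nh=4, nw=4, dim=8)
m(torch.randn(3, 2, 16, 8))        # run a forward pass first
m_copy = copy.deepcopy(m)          # deepcopy now succeeds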

after loading a pretrained pytorch .pt model file: ModuleNotFoundError: No module named 'models'

I downloaded one of the pretrained yolo models from the link: https://github.com/WongKinYiu/yolov7/releases

In this case, yolov7-tiny.pt is downloaded. Then I tried to run the following code to load the model and convert it to an ONNX file:

import torch
import onnx

model = torch.load('./yolo_custom/yolov7-tiny.pt')
input_shape = (1, 3, 640, 640)
torch.onnx.export(model, torch.randn(input_shape), 'yolov7-tiny.onnx', opset_version=11)

An error occurs on

model = torch.load('./yolo_custom/yolov7-tiny.pt')

and the error message is:

ModuleNotFoundError: No module named 'models'

The issue is reproducible even on Colab. Is there anything wrong with the steps?
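A hedged sketch: the yolov7 .pt file is a pickled checkpoint that references the repository's own models package, so torch.load only succeeds when the yolov7 repo itself is importable. Adding a local clone of the repo to sys.path (the path below is an assumption) before loading usually resolves the error; the checkpoint is typically a dict whose 'model' entry holds the network.

import sys
import torch

sys.path.insert(0, './yolov7')   # path to a local clone of github.com/WongKinYiu/yolov7

ckpt = torch.load('./yolo_custom/yolov7-tiny.pt', map_location='cpu')
model = ckpt['model'].float().eval()   # then pass this module to torch.onnx.export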

NameError: name 'pil_mask' is not defined

This is my code. I have defined various operations like this:

def identity(pil_img, pil_mask, _):
    return pil_img, pil_mask

def autocontrast(pil_img, pil_mask, _):
    return ImageOps.autocontrast(pil_img), pil_mask


def equalize(pil_img, pil_mask, _):
    return ImageOps.equalize(pil_img), pil_mask



def rotate(pil_img, pil_mask, level):
    degrees = int_parameter(level, min_max_vals.rotate.max)
    if np.random.uniform() > 0.5:
        degrees = -degrees
    return pil_img.rotate(degrees, resample=Image.BILINEAR), pil_mask.rotate(degrees, resample=Image.BILINEAR)

like the above.

Now I want to use the PRIME augmentation (PRImitives of Maximum Entropy):

but I am getting an error:

    aug_x += fn(x_tensor, pil_mask, _) * mask_t[:, i] * weight
NameError: name 'pil_mask' is not defined
and this is the PRIME code:

augmentations = [
    (identity, 1.0)
    ]
class PRIMEAugModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.augmentations = augmentations
        self.num_transforms = len(augmentations)


    def forward(self, x, mask_t):
        x_tensor = torch.from_numpy(x)
        aug_x = torch.zeros_like(x_tensor)
        for i in range(self.num_transforms):
            fn, weight = self.augmentations[i]
            if fn.__name__ == 'identity':
                aug_x += fn(x_tensor, pil_mask, _) * mask_t[:, i] * weight
            else:
                aug_x += fn(x_tensor, pil_mask) * mask_t[:, i] * weight
        return aug_x

I am confused about where and how I should define pil_mask.
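A hedged sketch of one way to fix it, assuming the mask is available wherever the module is called: pil_mask is never passed into PRIMEAugModule.forward, so the name simply does not exist inside the method. Accepting it as an argument and handing it to each augmentation (which already expects an (img, mask, level) signature and returns an (img, mask) pair) removes the NameError.

import torch

class PRIMEAugModule(torch.nn.Module):
    def __init__(self, augmentations):
        super().__init__()
        self.augmentations = augmentations
        self.num_transforms = len(augmentations)

    def forward(self, x, mask_t, pil_mask):
        x_tensor = torch.from_numpy(x)
        aug_x = torch.zeros_like(x_tensor)
        for i in range(self.num_transforms):
            fn, weight = self.augmentations[i]
            aug_img, _ = fn(x_tensor, pil_mask, None)   # every op returns (img, mask)
            aug_x += aug_img * mask_t[:, i] * weight
        return aug_x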

Is it efficient to pass model into a custom dataset to run model inference during training for sampling strategy [Pytorch]

I'm trying to design a training flow for sampling samples during training.

My data look like this:

defaultdict(list,
        {'C1629836-28004480': [0, 5, 6, 12, 17, 19, 28],
         'C0021846-28004480': [1, 7, 15],
         'C0162832-28004480': [2, 9],
         'C0025929-28004480': [3, 10, 30],
         'C1515655-28004480': [4],
         ...
        }

where key is label and value is list of data index

I have a custom dataset class in which my __getitem__(self, idx) function needs to calculate the distance between an anchor (chosen randomly) and other data points. It looks like this:

def __getitem__(self, idx):
    item_label = self.labels[idx] # C1629836-28004480
    item_data = self.data[item_label] # [0, 5, 6, 12, 17, 19, 28]

    anchor_index = random.choice(item_data)
    mention_indices = [i for i in item_data if i != anchor_index]
    
    with torch.no_grad():
        self.model.eval()
        anchor_input = ...
        anchor_embedding = self.model.mention_encoder(anchor_input)

        for idx in mention_indices: 
        ...

Another way, which avoids passing the model into the custom dataset, is to run the inference inside the training_step function during training.

But I have read that using a Dataset and DataLoader to prepare the data fed into the model can save training time, because they can load data in parallel with multiple workers.

However, I need to compute these distances based on the latest weights of my model during training. Does that parallel mechanism guarantee this? (In Python, variables are references rather than values.)

So which way is more correct and idiomatic?

When using AutoModelForCausalLM, CogVLM and load_in_8bit I get this error : self and mat2 must have the same dtype, but got Half and Char

Loading in 4-bit works great, but when I load in 8-bit there is a dtype mismatch that I can't make sense of.

There are no errors in the console, but the error message comes back as the response: self and mat2 must have the same dtype, but got Half and Char.

import gradio as gr
import os, sys
from transformers import AutoModelForCausalLM, LlamaTokenizer
from PIL import Image, UnidentifiedImageError
import torch
import argparse
import time
from pathlib import Path
from itertools import chain

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
MODEL_PATH = "THUDM/cogagent-vqa-hf"
tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
torch_type = torch.float16

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    # load_in_4bit=True,  # when this is True and load_in_8bit is False, it works
    load_in_8bit=True,
    trust_remote_code=True
).eval()

I believe the cause is one of the calls below: unsqueeze, .to(DEVICE), or .to(DEVICE).to(torch_type), but even though I tried different combinations I couldn't fix it.

def post(input_text, temperature, top_p, top_k, image_prompt, do_sample):
    try:
        with torch.no_grad():
            image = Image.open(image_prompt).convert('RGB') if image_prompt is not None else None

            input_by_model = model.build_conversation_input_ids(tokenizer, query=input_text, history=[], images=([image] if image else None), template_version='base')
            inputs = {
                'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
                'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(DEVICE),
                'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(DEVICE),
                'images': [[input_by_model['images'][0].to(DEVICE).to(torch_type)]],
            }
            if 'cross_images' in input_by_model and input_by_model['cross_images']:
                inputs['cross_images'] = [[input_by_model['cross_images'][0].to(DEVICE).to(torch_type)]]

            gen_kwargs = {
                "max_length": 2048,
                "temperature": temperature,
                "do_sample": do_sample,
                "top_p": top_p,
                "top_k": top_k
            }
            outputs = model.generate(**inputs, **gen_kwargs)
            outputs = outputs[:, inputs['input_ids'].shape[1]:]
            response = tokenizer.decode(outputs[0])
            response = response.split("</s>")[0]
            return response
    except Exception as e:
        return str(e)
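
For reference, this is the same 8-bit request expressed through the newer BitsAndBytesConfig / quantization_config path (just the loading call; I don't know whether it changes the Half/Char behaviour):

from transformers import BitsAndBytesConfig

# Alternative way of requesting 8-bit loading; whether it avoids the mismatch
# is exactly what I am unsure about.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,   # keep the non-quantized parts in fp16
    trust_remote_code=True,
).eval()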

A simple linear model is not converging

As an exercise, I want to write a simple network that approximates the "minimum" function f(x, y) = min(x, y); I restrict the inputs to integers between 0 and 100. Here's my code:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class myMinFun(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(2, 4)
        self.lin2 = nn.Linear(4, 1)

    def forward(self, x):
        x = self.lin1(x)
        x = F.relu(x)
        x = self.lin2(x)
        return x

model = myMinFun()
crit = torch.nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
max_int = 100
min_int = 0

for epoch in range(100):
    x1 = np.random.randint(low=min_int, high=max_int, size=10).astype(np.float32) / max_int
    x2 = np.random.randint(low=min_int, high=max_int, size=10).astype(np.float32) / max_int
    x = torch.tensor(np.column_stack((x1, x2)))
    y = torch.tensor(np.minimum(x2,x1)).unsqueeze(1)
    y_h = model(x)
    loss = crit(y_h, y)
    print(loss.item())
    model.zero_grad()
    loss.backward()
    # with torch.no_grad():
    #     print(model.lin1.weight)
    #     print(model.lin1.bias)
    #     print(model.lin2.weight)
    #     print(model.lin2.bias)
    optimizer.step()

I know that the model should converge to the values below, since setting the weights manually to these values results in zero loss:

# with torch.no_grad():
#     model.lin1.weight = torch.nn.Parameter(torch.tensor([[1,1],[1,-1],[-1,1],[-1,-1]], dtype=torch.float32), requires_grad=True)
#     model.lin1.bias = torch.nn.Parameter(torch.tensor([0,0,0,0], dtype=torch.float32), requires_grad=True)
#     model.lin2.weight = torch.nn.Parameter(torch.tensor([[0.5,-0.5,-0.5,-0.5]], dtype=torch.float32), requires_grad=True)
#     model.lin2.bias = torch.nn.Parameter(torch.tensor([0], dtype=torch.float32), requires_grad=True)
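
(For reference, with these weights the network computes 0.5·ReLU(x+y) − 0.5·ReLU(x−y) − 0.5·ReLU(y−x) − 0.5·ReLU(−x−y); for x, y ≥ 0 this equals 0.5(x+y) − 0.5|x−y| = min(x, y), so the loss is exactly zero.)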

But my model is not converging to these values, and I have tried different learning rates. What could cause the issue?

partial derivative with respect to input in pytorch

import torch
import numpy as np
import torch.nn as nn    
def init_weights(m):
  if isinstance(m, torch.nn.Linear):
    torch.manual_seed(42)
    torch.nn.init.xavier_uniform_(m.weight)
    m.bias.data.fill_(0.1)
class NeuralNetwork(nn.Module):
    def __init__(self, num_features, hidden_size = 3, hidden_size2 = 4):
        super().__init__()
        
        self.layer1= nn.Linear(num_features, hidden_size)
        self.acti1 = nn.Tanh()
        
        self.layer2= nn.Linear(hidden_size, hidden_size2)
        self.acti2 = nn.Tanh()
            
        self.output = nn.Linear(hidden_size2,1)
        self.apply(init_weights)
    def forward(self, x):
        x = self.layer1(x)
        x = self.acti1(x)
        
        x = self.layer2(x)
        x = self.acti2(x)
       
        x= self.output(x)
        return x
model = NeuralNetwork(num_features=2)   # X has two columns, i.e. two input features
print(model)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def train(X, y, model, loss_fn, optimizer):
    # Compute prediction error
    pred = model(X)
    loss = loss_fn(pred, y)
    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

X = np.arange(20).reshape(10, 2).astype(np.float32)
y = np.random.randint(2, size=(10, 1)).astype(np.float32)
X1 = torch.from_numpy(X)
y1 = torch.from_numpy(y)
X2 = np.arange(20).reshape(10, 2).astype(np.float32)
X_test = torch.from_numpy(X2)

y_pred = model(X_test).detach().numpy()

Here is a simple example of my problem. I would like to calculate the mixed partial derivative $\frac{\partial^2 y_{\text{pred}}}{\partial X[:,0]\,\partial X[:,1]}$ for each sample. My plan is to use

[ torch.autograd.functional.hessian(model,X2[i]) for i in range(len(X2))]

But I am not sure if this would give me the result I want. Could anyone help, please? Another question: what if X.shape[1] is bigger than 2? Then I would need to calculate $\frac{\partial^n y_{\text{pred}}}{\partial X_1 \cdots \partial X_n}$, and the Hessian wouldn't be enough. I was thinking that torch.autograd.grad might work, but I am not sure how to use it.
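
Here is a sketch of what I had in mind with torch.autograd.grad for the two-column case (I am not sure this is the right way to get the mixed partial per sample):

X_in = torch.from_numpy(X2).float().requires_grad_(True)
y_out = model(X_in)                                                     # shape (10, 1)

# Summing over samples is fine here because each row's output depends only on its own row.
grad1 = torch.autograd.grad(y_out.sum(), X_in, create_graph=True)[0]    # d y / d X, shape (10, 2)

# Differentiate d y / d X[:, 0] once more and read off the X[:, 1] column.
mixed = torch.autograd.grad(grad1[:, 0].sum(), X_in)[0][:, 1]           # shape (10,)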

Trying to backward through the graph a second time

I am trying to create a custom recurrent network using Linear layers, but I am getting this error:

Exception has occurred: RuntimeError
  File "C:_\DS\RNN\main.py", line 52, in loss.backward()
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Please help; I have no idea how to implement a network similar to the one in the diagram below:

(figure: diagram of the recurrent network architecture I am trying to reproduce)

The code is:

class MyNet(nn.Module):
    def __init__(self, n=N):
        super(MyNet, self).__init__()
        self.lx = nn.Linear(n, n)
        self.l1 = nn.Linear(n, n)
        self.l1_t = nn.Linear(n, n)
        self.ly = nn.Linear(n, 1)

        self.xt = 0
        self.fn = nn.Tanh()

    def forward(self, x):
        lx = self.fn(self.lx(x))
        l1 = self.fn(self.l1(lx + self.xt))

        self.xt = self.fn(self.l1_t(self.xt + l1))

        x = self.fn(self.ly(l1))
        return x
โŒ
โŒ