"AttributeError: ‘Model’ object has no attribute ‘parameters’>"

You have to derive your custom Model from nn.Module, i.e. class Model(nn.Module).
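
For example, a minimal custom model deriving from nn.Module looks like this (the layer and sizes are purely illustrative):

import torch.nn as nn

class Model(nn.Module):                 # derive from nn.Module, not from object
    def __init__(self):
        super().__init__()              # required: initializes the nn.Module machinery
        self.linear = nn.Linear(10, 1)  # layer sizes are illustrative

    def forward(self, x):
        return self.linear(x)

model = Model()
print(list(model.parameters()))         # .parameters() is now available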

"AttributeError: Can't get attribute 'Model Name' on module '__main__' from 'XXX.py'>"

This error might be raised when using torch.load() to load a pre-trained model (e.g. "XXX.pt") that was saved as a whole object. I solved it by importing the model class in the same file that loads the model. When you save the entire model, the class is recorded by reference (its module path is effectively hard-coded in the saved file), so the class definition must be importable when the model is loaded again.
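
As a sketch (MyModel and the file name are placeholders, not names from the original error), you can either import the model class in the loading script or save only the state_dict so that the class is not pickled by reference:

import torch
import torch.nn as nn

class MyModel(nn.Module):                     # placeholder model class
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

# Option 1: when loading a whole pickled model, the class definition must be importable,
# e.g. "from my_models import MyModel" in the script that calls torch.load("XXX.pt").

# Option 2: save/load only the state_dict, which avoids pickling the class reference.
model = MyModel()
torch.save(model.state_dict(), "XXX.pt")
reloaded = MyModel()
reloaded.load_state_dict(torch.load("XXX.pt"))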

"CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)"

This error might be raised if you are running out of memory and cuBLAS fails to create the handle, so try to reduce the memory usage, e.g. via a smaller batch size.
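
For instance (the dataset and the numbers here are made up), the batch size is usually controlled on the DataLoader:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
# halving the batch size roughly halves the activation memory used per step
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # e.g. reduced from 32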

"RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'"

This is an error encountered when using torch.utils.data.random_split. It is solved by passing generator=torch.Generator(device=device), as follows:

train_dataset, validate_dataset = torch.utils.data.random_split(dataset, [train_size, validate_size]) # incorrect
train_dataset, validate_dataset = torch.utils.data.random_split(dataset, [train_size, validate_size], generator=torch.Generator(device=device)) # correct

More information can be found in the following links:

Solution to "Expected a 'cuda' device type for generator but found 'cpu'"

RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

"RuntimeError: Boolean value of Tensor with more than one value is ambiguous"

This means that the tensor contains more than one boolean value, so its truth value is ambiguous and it cannot be evaluated as a single condition. The usual cause is that the loss function was declared without parentheses, i.e. the callable object was never instantiated.

loss_function=nn.MSELoss   # incorrect: this is the class itself, not an instance
loss_function=nn.MSELoss() # correct: instantiate the loss module before calling it

More information can be found in the following link:

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

"ValueError: multilabel-indicator format is not supported"

This is an error encountered when running fpr, tpr, thresholds = metrics.roc_curve(actual, pred, pos_label=1). At first, actual and pred had shape (batch_size, seq), which triggers this error. Reshaping them to one dimension solved the problem.

This function expects 1D arrays: binary labels in actual and prediction scores in pred. Adjusting the shape of your input arrays to match this expectation resolves the issue. By using .ravel() (or another reshaping method) to flatten your actual and pred arrays, you can avoid the shape-related error when using metrics.roc_curve().
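
For instance (shapes and values are illustrative), flattening both arrays before calling roc_curve avoids the error:

import numpy as np
from sklearn import metrics

actual = np.random.randint(0, 2, size=(4, 8))  # (batch_size, seq) binary labels
pred = np.random.rand(4, 8)                    # (batch_size, seq) predicted scores
# .ravel() flattens both arrays to 1D, the format roc_curve expects
fpr, tpr, thresholds = metrics.roc_curve(actual.ravel(), pred.ravel(), pos_label=1)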

In addition, there is a useful function, torch.masked_select(x, mask), which selects the elements of x indicated by the boolean mask (e.g. to filter out noise or padding). Note that the values in mask are booleans, not 0 or 1. More information can be found in the following link:

The masked_select selection function in PyTorch
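
A minimal sketch of masked_select (the values are illustrative):

import torch

x = torch.tensor([0.5, -1.2, 3.0, 0.0])
mask = torch.tensor([True, False, True, False])  # boolean mask, not 0/1 integers
print(torch.masked_select(x, mask))              # tensor([0.5000, 3.0000])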

"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu! (when checking argument for argument index in method wrapper_gather)"

Here are some useful links to convert data and code from the CPU to the GPU:

Switching between CPU and GPU in PyTorch

How to use GPU acceleration in PyTorch (converting data between CPU and GPU)

Checking whether a PyTorch model and its data are on the GPU
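
As a sketch (the model and shapes are illustrative), the usual fix is to move both the model and every input tensor to the same device:

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 1).to(device)   # move the model parameters to the device
x = torch.randn(4, 10).to(device)     # move the input tensor to the same device
y = model(x)
print(next(model.parameters()).device, x.device)  # both should report the same device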

"RuntimeError: CUDA error: device-side assert triggered. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1."

While I tried your code and it did not give me an error, I can say that the usual best practice to debug a CUDA runtime error like this device-side assert is to switch Colab to CPU and recreate the error. That will give you a more useful traceback.

Most of the time, CUDA runtime errors are caused by some index mismatch, e.g. you tried to train a network with 10 output nodes on a dataset with 15 labels. The tricky thing with this CUDA error is that once you get it, you will receive it for every subsequent operation on torch tensors, which forces you to restart your notebook.

I suggest you restart your notebook, get a more accurate traceback by moving to CPU, and check the rest of your code, especially the targets you train the model on.
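
As an illustrative sketch of the index-mismatch case (the numbers are made up, and this snippet intentionally raises an error), running the same mistake on CPU produces a readable IndexError instead of an opaque device-side assert:

import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # model with 10 output classes, on CPU
targets = torch.tensor([3, 7, 14, 2])  # label 14 is out of range for 10 classes
# On CPU this raises a readable "IndexError: Target 14 is out of bounds"
# instead of the asynchronous CUDA device-side assert
loss = nn.CrossEntropyLoss()(logits, targets)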

"RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAType instead (while checking arguments for embedding)"

I would suggest you check the input type. I had the same issue, which was solved by converting the input from int32 to int64 (running on Windows 10), e.g. x = torch.tensor(train).to(torch.int64).
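
A minimal sketch (the sizes are illustrative):

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=100, embedding_dim=16)
idx = torch.tensor([[1, 2, 3]]).to(torch.int64)  # indices must be int64 (Long), not int32
out = embedding(idx)
print(out.shape)  # torch.Size([1, 3, 16])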

"RuntimeError: expected scalar type Double but found Float"

You can use model.double() to convert all the model parameters to double type. This gives a compatible model provided your input data is double. Keep in mind, though, that double is usually slower than single precision because of its higher precision. Alternatively, you can use model.float() to convert all the model parameters to float type and cast the input to float instead.
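
For example (shapes are illustrative), either convert the model to double or convert the input to float:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x = torch.randn(4, 10, dtype=torch.float64)  # double-precision input

y = model.double()(x)           # option 1: convert the model parameters to double
# y = model.float()(x.float())  # option 2: keep float and cast the input instead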

roc_auc_score - Only one class present in y_true

You could use try-except to prevent the error:

import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])
try:
    roc_auc_score(y_true, y_scores)
except ValueError:
    pass

You could also set roc_auc_score to zero when only one class is present; however, I wouldn't do this. I guess your test data is highly imbalanced, so I would suggest using a stratified K-fold split instead, so that both classes are present in every fold.
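
A minimal sketch of a stratified K-fold split (the data is made up) so that each fold keeps both classes:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # each test fold preserves the class ratio, so both classes are present
    print(y[test_idx])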

Import "cv2" could not be resolved. ModuleNotFoundError: No module named 'cv2'

It just happened to me, and I solved it by installing both opencv-python and opencv-python-headless with pip and reloading the Visual Studio Code window right afterwards.

To install the needed packages, just run this command in the terminal:

$ pip install opencv-python opencv-python-headless
"Python json.loads shows ValueError: Extra data"

If you are getting an error like ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399), the file most likely contains one JSON object per line rather than a single JSON document. The error can be solved by iterating over the file and loading each line as JSON in the loop:

import json

tweets = []
with open('tweets.json', 'r') as file:
    for line in file:
        tweets.append(json.loads(line))

This avoids building one huge intermediate Python object. As long as the file contains one complete tweet (one JSON object) per line, this will work.