โŒ


How to prevent the duplicate result from previous frame in python object tracking

I am currently working on object detection, tracking, and counting, and I want to store the results from each stage. Whenever a vehicle crosses the line, the result is reported more than once. How can I prevent these duplicates?

Here is the camera code:

class Camera(BaseCamera):
    """OpenCV video stream"""

    video_source = 0
    start, end = Point(0, 500), Point(1280, 500)
    detector = Detector()
    tracker = ByteTrack()
    line_zone = LineZone(start=start, end=end)
    annotator = LineZoneAnnotator()

def __init__(self, enable_detection: bool = False):
    video_source = os.environ.get("VIDEO_SOURCE")
    try:
        video_source = int(video_source)
    except Exception as exp:    # pylint: disable=broad-except
        if not video_source:
            raise EnvironmentError("Cannot open the video source!") from exp
    finally:
        Camera.set_video_source(video_source)
    super().__init__()
    self.enable_detection = enable_detection

@staticmethod
def set_video_source(source):
    """Set video source"""
    Camera.video_source = source

@classmethod
def frames(cls):
    """
    Get video frame
    """
    camera = cv2.VideoCapture(Camera.video_source)
    if not camera.isOpened():
        raise RuntimeError("Could not start camera.")

    while True:
        # read current frame
        ret, img = camera.read()

        # Loop back
        if not ret:
            camera.set(cv2.CAP_PROP_POS_FRAMES, 0)
            continue

        # Object detection
        results = cls.detector(image=img)
        selected_classes = [2, 3]

        tensorflow_results = results.detections
        cls.annotator.annotate(img, cls.line_zone)
        if not tensorflow_results:
            yield cv2.imencode(".jpg", img)[1].tobytes()
            continue

        detections = Detections.from_tensorflow(tensorflow_results=tensorflow_results)

        detections = cls.tracker.update_with_detections(detections=detections)
        detections = detections[np.isin(detections.class_id, selected_classes)]
        
        result = cls.line_zone.trigger(detections)
        if result is not None and len(result) >= 3:
            print(result[2])
            
        img = visualize(image=img, detections=detections)

        # encode as a jpeg image and return it
        yield cv2.imencode(".jpg", img)[1].tobytes()
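
A common cause of duplicate counts is that the same tracked vehicle satisfies the line check on several consecutive frames. One possible workaround (a minimal sketch, not part of the original code; it assumes detections.tracker_id is populated after update_with_detections) is to remember which tracker IDs have already been counted:

# Hypothetical helper: keep a set of tracker IDs that have already been
# counted so each vehicle is reported only once per crossing.
counted_ids = set()

def filter_new_crossings(tracker_ids):
    """Return only the tracker IDs that have not been counted before."""
    new_ids = [tid for tid in tracker_ids if tid not in counted_ids]
    counted_ids.update(new_ids)
    return new_ids

# Example usage inside the frame loop, after the tracker update:
# new_ids = filter_new_crossings(detections.tracker_id)
# if new_ids:
#     print(f"newly counted vehicles: {new_ids}")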

How do I add reversible noise to the MNIST dataset using PyTorch?

I would like to add reversible noise to the MNIST dataset for some experimentation.

Here's what I am trying atm:

import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from PIL import Image
import matplotlib.pyplot as plt

def display_img(pixels, label = None):
    plt.imshow(pixels, cmap="gray")
    if label:    
        plt.title("Label: %d" % label)
    plt.axis("off")
    plt.show()

class NoisyMNIST(torchvision.datasets.MNIST):
    def __init__(self, root, train=True, transform=None, target_transform=None, download=False):
        super(NoisyMNIST, self).__init__(root, train=train, transform=transform, target_transform=target_transform, download=download)

    def __getitem__(self, index):
        img, target = self.data[index], self.targets[index]
        img = Image.fromarray(img.numpy(), mode="L")

        if self.transform is not None:
            img = self.transform(img)
        
        # add the noise
        noise_level = 0.3
        noise = self.generate_safe_random_tensor(img) * noise_level
        noisy_img = img + noise
        
        return noisy_img, noise, img, target

    def generate_safe_random_tensor(self, img):
        """generates random noise for an image but limits the pixel values between -1 and 1""" 
       
        min_values = torch.clamp(-1 - img, max=0)
        max_values = torch.clamp(1 - img, min=0)
       
        return torch.rand(img.shape) * (max_values - min_values) + min_values



# Define transformations to apply to the data
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to tensors
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_dataset = NoisyMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = NoisyMNIST(root='./data', train=False, download=True, transform=transform)

# pick an arbitrary example index (img_id was undefined in the original snippet)
img_id = 0
np_noise = train_dataset[img_id][1]
np_data = train_dataset[img_id][0]

# subtract the noise again and drop the channel dimension for imshow
np_data_sub_noise = np_data - np_noise
display_img(np_data_sub_noise.squeeze(), 4)

Ideally, this would give me the regular MNIST dataset along with the noisy MNIST images and a collection of the noise that was added. Given this, I had assumed I could subtract the noise from the noisy image and go back to the original image, but my image operations are not reversible.
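
For what it's worth, a quick way to check whether the operation really is reversible in tensor space (a minimal sketch, using the NoisyMNIST dataset defined above) is to subtract the returned noise from the returned noisy image and compare against the clean image with torch.allclose. If this check passes but the displayed images still differ, the loss of information is likely happening during display or quantisation rather than in the addition itself:

noisy_img, noise, img, target = train_dataset[0]

# exact reversal in floating point: noisy_img - noise should equal img
recovered = noisy_img - noise
print(torch.allclose(recovered, img, atol=1e-6))   # expected: True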

Any pointers or code snippets would be greatly appreciated. Below are the images I currently get with my code:

Original image:

[image]

With added noise:

[image]

And with the noise subtracted from the noisy image:

[image]

Linear equation from pixel space to world space

I'm successfully detecting lane lines and determining their linear equations within the image plane (pixel space).

I've been able to convert each pixel's location from pixel coordinates to real-world coordinates (world space) using the camera's intrinsic and extrinsic parameters.

Then, I use linear regression on the world coordinates to generate a single linear equation representing the lane line in world space.

My current approach involves two steps: first converting to world space, then performing linear regression. Is there a way to directly transform the pixel-space linear equation itself into its equivalent world-space equation without the intermediate conversion step?

A C++ code example or a mathematical explanation would both be welcome.
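
If the lane lies on a (locally) planar road surface, the mapping between the image and that plane is a 3x3 homography, and a line can be transferred directly without converting individual points. A small numpy sketch of the idea follows (the same math carries over to C++ with Eigen or OpenCV); the homography H below is assumed to map homogeneous ground-plane points to homogeneous pixel points, e.g. H = K [r1 r2 t] built from the intrinsics and extrinsics:

import numpy as np

def pixel_line_to_world_line(l_pixel, H_world_to_pixel):
    # A 2D line a*u + b*v + c = 0 is represented by l_pixel = [a, b, c].
    # Points transform as x_pixel ~ H x_world, so lines transform
    # contravariantly: l_world = H^T l_pixel.
    l_world = H_world_to_pixel.T @ np.asarray(l_pixel, dtype=float)
    # normalise so that (a, b) has unit length, purely for readability
    return l_world / np.linalg.norm(l_world[:2])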

Issue with Reading Frames from CCTV Camera Using OpenCV via RTSP

import sys
import time
import cv2
import numpy as np


class CameraManager:
    def __init__(self, camera_uris):
        self.camera_uris = camera_uris
        self.cameras = self._initialize_cameras()

    def _initialize_cameras(self):
        cameras = []
        for uri in self.camera_uris:
            cap = cv2.VideoCapture(uri)
            if not cap.isOpened():
                print("Cannot open camera")
                sys.exit()
            cameras.append(cap)
        return cameras

    def read_frames(self,skip_frames=3):
        frames = []
        for index, cam in enumerate(self.cameras):
            for _ in range(skip_frames):
                cam.read()
            ret, frame = cam.read()
            # print(int(cam.get(cv2.CAP_PROP_POS_FRAMES)))
            if not ret:
                print("Can't receive frame (stream end?). Exiting ...")
                self.cameras[index] = cv2.VideoCapture(self.camera_uris[index])
                frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
            frames.append(frame)
        return frames

def resize_images(images, new_width=640, new_height=480):
    """
    Resize all images in the list to the specified dimensions.
    """
    resized_images = []
    for img in images:
        resized = cv2.resize(img, (new_width, new_height))
        resized_images.append(resized)
    return resized_images

def calculate_grid_size(num_images):
    """
    Calculate the grid size based on the number of images.
    """
    if num_images == 0:
        return 0, 0
    rows = ((num_images - 1) // 3) + 1
    cols = 3
    return rows, cols

def merge_images_in_grid(images):
    """
    Merge images into a grid layout, with the grid size dynamically calculated.
    """
    if not images:
        raise ValueError("No images to merge")

    # Resize images
    images = resize_images(images, 640, 480)

    # Calculate grid size
    grid_rows, grid_cols = calculate_grid_size(len(images))

    # Get dimensions of the resized images
    img_height, img_width, _ = images[0].shape

    # Grid dimensions
    grid_width = img_width * grid_cols
    grid_height = img_height * grid_rows

    # Create an empty black image for the grid
    merged_image = np.zeros((grid_height, grid_width, 3), dtype=np.uint8)

    # Place each image in the grid
    for i, img in enumerate(images):
        row = i // grid_cols
        col = i % grid_cols
        merged_image[row * img_height:(row + 1) * img_height, col * img_width:(col + 1) * img_width, :] = img

    return merged_image

class MainApplication:
    def __init__(self):
        self.camera_uris = ["rtsp://admin:[email protected]","rtsp://admin:[email protected]","rtsp://admin:[email protected]","rtsp://admin:[email protected]"]
        self.camera_manager = CameraManager(camera_uris=self.camera_uris)

    def run(self):
        frame_number = 0
        while True:
            frames = self.camera_manager.read_frames()
            time.sleep(0.5)
            merge_image = merge_images_in_grid(images=frames)
            cv2.namedWindow('output', cv2.WINDOW_NORMAL)
            cv2.imshow('output', merge_image)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                exit()


if __name__ == "__main__":
    try:
        app = MainApplication()
        app.run()
    except KeyboardInterrupt:
        print("Exiting ..")

In this code, I manage multiple CCTV cameras connected via RTSP (Real-Time Streaming Protocol). The program continuously captures frames from these cameras, skipping a set number of frames before each read. After a frame is read, it is added to a list, which is then returned. Instead of implementing YOLO object detection, I've introduced a sleep interval of 0.5 seconds. Following this pause, the captured frames are displayed using OpenCV's imshow function.

The program runs without any errors when time.sleep(0.5) is not included, but including this line causes an error.

When I replace the time.sleep(0.5) with YOLO object detection, the program also encounters an error.

Can't receive frame (stream end?). Exiting ...
[h264 @ 0x1c6bbc0] error while decoding MB 114 18, bytestream -35
Can't receive frame (stream end?). Exiting ...
Can't receive frame (stream end?). Exiting ...
[hevc @ 0x13d3840] Could not find ref with POC 6
[h264 @ 0x1c2e9c0] error while decoding MB 102 13, bytestream -5
[h264 @ 0x244bfc0] error while decoding MB 12 31, bytestream -7
Can't receive frame (stream end?). Exiting ...
Can't receive frame (stream end?). Exiting ...
[hevc @ 0x1c42900] Could not find ref with POC 0
[rtsp @ 0x1ade880] RTP: PT=60: bad cseq 1d84 expected=0ba8
[hevc @ 0x1c27880] Could not find ref with POC 36

Why does the program throw these errors when time.sleep(0.5) is included or when YOLO object detection is implemented? Identifying the root cause of these errors would help in finding an appropriate solution.
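
One plausible explanation (an assumption, not a confirmed diagnosis) is that while the program sleeps or runs inference, the RTSP streams keep delivering packets that are not being read, so OpenCV falls behind the live stream and the decoder hits corrupted or missing reference frames. A common mitigation is to read frames continuously in a background thread and only hand the latest frame to the slow processing loop. A minimal sketch of that pattern (class and method names are illustrative):

import threading
import cv2

class LatestFrameReader:
    """Continuously read an RTSP stream in a background thread and keep only
    the most recent frame, so slow processing does not stall the capture."""

    def __init__(self, uri):
        self.cap = cv2.VideoCapture(uri)
        self.lock = threading.Lock()
        self.frame = None
        self.running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while self.running:
            ret, frame = self.cap.read()
            if ret:
                with self.lock:
                    self.frame = frame

    def read(self):
        # return a copy of the latest frame, or None if nothing has arrived yet
        with self.lock:
            return None if self.frame is None else self.frame.copy()

    def stop(self):
        self.running = False
        self.cap.release()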

How to get set of colours in an image using python PIL

After importing an image using python's PIL module I would like to get the set of colours in the image as a list of rgb tuples.

What if I know beforehand that there will only be 2 colours and the image will be very small, maybe 20x20 pixels? However, I will be running this over a lot of images. Would it be more efficient to loop through all pixels until I see 2 unique colours? I understand loops are very slow in Python.
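
For reference, PIL already exposes this through Image.getcolors(), which returns (count, colour) pairs in a single call without a Python-level loop over pixels, so it should comfortably handle many small images. A minimal sketch ("tile.png" is a placeholder filename):

from PIL import Image

img = Image.open("tile.png").convert("RGB")   # placeholder path

# getcolors() returns a list of (count, (r, g, b)) tuples, or None if the
# image contains more than maxcolors distinct colours.
colours = img.getcolors(maxcolors=img.width * img.height)
unique_rgb = [rgb for _, rgb in colours]
print(unique_rgb)   # e.g. [(0, 0, 0), (255, 255, 255)] for a 2-colour image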

Face recognition system with insightface in Python. How do I control data flow across the application?

I am building a surveillance software project: an AI- and web-based face recognition system. For face recognition I am using insightface, for identity matching I am using faiss (Facebook similarity search), and for stable streaming I am using the imutils library, which is quite flexible for frame streams coming from cameras. I have now collected these technologies, but I am stuck and don't have any idea how to start. My plan is illustrated below:

Project plan

Interestingly, the process above executes every time a face is detected by a camera (any one of the cameras), and each camera is bound to its own independent thread. I am trying to build a more efficient program, so another requirement is that the web-based and frame-based functions should run asynchronously. The main problem I have currently is that I don't know how to transport data such as camera connection status, users' face encodings (numpy arrays), and database records from the AI side to the web side (Django REST). I believe there are many senior programmers on Stack Overflow who share their time and effort with new learners, and as a junior developer I would be very glad for help from senior AI developers. Here are my more specific and detailed questions:

  1. How do I build a stable stream, or a stable, regular infinite loop, that takes care of the lifetime of the connected cameras and of the face recognition? Running everything in a simple while or for loop may not be efficient; can you give me some ideas about that?
  2. If I establish stable connections for multiple cameras, how do I share RAM or VRAM between them as lightly as possible? My program is already configured for the GPU. Would async, multiprocessing (tasks on different cores with separate Python interpreters), or threading (multiple threads on a single core) be efficient, or are there more efficient approaches?
  3. I have a DRF app which has two models: Employees and Cameras. The Employees model includes several fields, including an ImageField, and face encodings can be read from the path stored in that ImageField. My face training dataset is laid out like /media//main.jpg, and faces are loaded as follows:
import torch
import insightface
import cv2
import os
import numpy as np
import faiss


class FaceTrainer:
    def __init__(self, root_dir):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        try:
            self.face_model = insightface.app.FaceAnalysis()
            ctx_id = 0
            self.face_model.prepare(ctx_id=ctx_id)
        except Exception as e:
            print(f"Failed to load models: {e}")
            raise

        self.index, self.known_face_names = self.load_face_encodings(root_dir)
        print(self.index)

    def load_face_encodings(self, root_dir):
        known_face_encodings = []
        known_face_names = []
        for dir_name in os.listdir(root_dir):
            dir_path = os.path.join(root_dir, dir_name)
            if os.path.isdir(dir_path):
                for file_name in os.listdir(dir_path):
                    if file_name.endswith((".jpg", ".png")):
                        image_path = os.path.join(dir_path, file_name)
                        image = cv2.imread(image_path)
                        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                        faces = self.face_model.get(image)

                        if faces:
                            face = faces[0]
                            embedding = face.embedding
                            known_face_encodings.append(embedding)
                            known_face_names.append(dir_name)

        known_face_encodings = np.array(known_face_encodings)
        index = faiss.IndexFlatL2(known_face_encodings.shape[1])
        index.add(known_face_encodings)
        return index, known_face_names

    @property
    def get_face_model(self):
        return self.face_model

This approach is not efficient anyway, because images of newly added employees won't be loaded and trained during the AI program's lifecycle. I want things to be dynamic, so that when new camera addresses are added to the database, a Celery background process takes care of connecting those cameras by checking the database at some interval. Like I said, I have the logic in mind, but I don't have an idea where to start, and time is pretty tight right now.
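
Regarding newly added employees: faiss's IndexFlatL2 supports incremental add calls, so one option is to encode new images as they appear and append them to the running index instead of rebuilding it. A minimal sketch of such a method on the FaceTrainer above (the method name and return values are illustrative, not part of the original code):

    def add_employee_image(self, image_path, person_name):
        """Encode one newly added employee image and append it to the
        existing faiss index without rebuilding everything."""
        image = cv2.imread(image_path)
        if image is None:
            return False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        faces = self.face_model.get(image)
        if not faces:
            return False
        embedding = np.array([faces[0].embedding], dtype="float32")
        self.index.add(embedding)              # IndexFlatL2 grows in place
        self.known_face_names.append(person_name)
        return True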

And my last question: how do I make the AI and web sides talk to each other? The AI should pass data to the web side, and the web side should pass data such as camera and employee details to the AI.

For more conversation I am available on GitHub, Telegram, and Gmail. I would be very glad if you could answer my poorly asked questions in as much detail as possible; that would also benefit everyone who is learning AI and web development just like me.

Thank you a lot in advance, Abdusamad

World to pixel transformation in Pyrender

I'm trying to transform a point in a 3D world rendered with pyrender to pixel coordinates, however the pixel coordinates are incorrect and I can't figure out what I'm doing wrong. I appreciate any hints!

The goal is to get the pixel coordinates uvw of the world-point UVW. Currently, I do the following:

Create Camera:

I create a camera from an already existing intrinsic matrix (= K). I do this mainly for debugging purposes, so I can be sure that K is correct:

import numpy as np
import pyrender

K = np.array([[415.69219382,   0.        , 320.        ],
              [  0.        , 415.69219382, 240.        ],
              [  0.        ,   0.        ,   1.        ]])
K = np.ascontiguousarray(K, dtype=np.float32)
p_cam = pyrender.camera.IntrinsicsCamera(fx=K[0][0], fy=K[1][1], cx=K[0][2], cy=K[1][2])

scene.add(p_cam, pose=cam_pose.get_transformation_matrix(x=6170., y=4210., z=60., yaw=20, pitch=0, roll=40)) # cam_pose is my own class

Create transformation matrix

I'm creating a transformation matrix with an extrinsic rotation.

def get_transformation_matrix(self, x, y, z, yaw, pitch, roll):
    from scipy.spatial.transform import Rotation as R
    '''
    yaw = rotate around z axis
    pitch = rotate around y axis
    roll = rotate around x axis
    '''
    xyz = np.array([
        [x],
        [y],
        [z]
    ])
    rot = R.from_euler('zyx', [yaw, pitch, roll], degrees=True).as_matrix()
    last_row = np.array([[0,0,0,1]])
    tf_m = np.concatenate((np.concatenate((rot,xyz), axis = 1), last_row), axis = 0)
    return np.ascontiguousarray(tf_m, dtype=np.float32)

Render image

Using the created camera, I render the following image. The point I'm trying to transform is the tip of the roof, which approximately has the pixel coordinates (500,160). I marked it in the 3D scene with the pink cylinder.

Rendered image

Transform world to pixel frame

from icecream import ic
UVW1 = [[6184],[4245],[38],[1]] #the homogeneous coordinates of the pink cylinder in the world frame
world_to_camera = np.linalg.inv(cam_pose.transformation_matrix).astype('float32') @ UVW1
ic(world_to_camera)
camera_to_pixel = K @ world_to_camera
ic(camera_to_pixel/camera_to_pixel[2]) #Transforming the homogeneous coordinates back

Output:

ic| world_to_camera: array([[ 17.48892188],
                            [  7.11796755],
                            [-39.35071968],
                            [  1.        ]])

ic| camera_to_pixel/camera_to_pixel[2]: array([[135.25094424],
                                               [164.80738424],
                                               [  1.        ]])

Results

To me, the world_to_camera point seems like it might be correct. However, when transforming from the camera frame to the pixel frame, the x-coordinate (135) is wrong (the y-coordinate (164) might still make sense).

Attached is a screenshot of the 3D scene. The yellow cylinder and axes represent the camera, while the blue point represents the point I'm trying to transform (magenta in the rendered image above). [Screenshot of the 3D scene with camera and target point]

So to me, the only source of error could be the intrinsic matrix, however I'm defining this matrix myself, so I don't see how it could be incorrect. Is there something I'm blind to?
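
One thing worth double-checking (a guess, not a confirmed diagnosis): pyrender cameras follow the OpenGL convention, looking down the negative z-axis with y pointing up, whereas the pinhole projection with K assumes z pointing forward and y pointing down. Also, K is 3x3, so only the first three components of the camera-frame point should be multiplied by it. A sketch of the projection with both adjustments, reusing K, cam_pose, and UVW1 from above:

import numpy as np

# world point -> camera frame (drop the homogeneous coordinate, since K is 3x3)
point_cam = (np.linalg.inv(cam_pose.transformation_matrix) @ UVW1)[:3]

# OpenGL camera frame (looks along -z, y up) -> computer-vision frame
# (z forward, y down) before applying the intrinsics
flip = np.diag([1.0, -1.0, -1.0])
point_cv = flip @ point_cam

uvw = K @ point_cv
uv = uvw / uvw[2]
print(uv[:2])   # candidate pixel coordinates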

Why is my kNN Search not returning matches when the submitted image is slightly different?

I have an Elasticsearch index which is set up to store dense_vectors.

The dense vector field is configured like so:

image_dense_vector: {
  type: 'dense_vector',
  dims: 512,
  index: true,
  similarity: 'dot_product'
}

I have a process set up which uses the Python img2vec library along with the ResNet18 model to generate a vector with 512 dimensions, like so:

img2vec = Img2Vec(cuda=False, model="resnet18", layer_output_size=512)

resized_image = img.resize((224, 224))
vector = img2vec.get_vec(resized_image, tensor=False)
vector = helpers.normalize_vector(vector)

es.index(index=request.index, id=id,
   body={"image_dense_vector": vector},
)

To search for results, I pass in an image, get the vector for that image the same way as above, and then pass it into an ES kNN query like so:

{
  "field": "image_dense_vector",
  "query_vector": vector,
  "k": limit,
  "num_candidates": 50,
}

The problem I'm running into is that if my query image has a smudge or isn't perfectly cropped, the result images are off. The total number of images indexed is about 500,000.

I would like to figure out a way to make it so that the images don't need to be a perfect match to get an accurate response.
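
One simple thing to try (an idea, not a verified fix): average the query embedding over the full image plus a few slightly shifted crops and L2-normalise the result, so that a smudge or an imperfect crop moves the query vector less. A minimal sketch reusing the img2vec instance from above (the helper name and crop fraction are arbitrary):

import numpy as np

def robust_query_vector(img, img2vec, crop_frac=0.9):
    """Average the embedding of the full image and four shifted crops,
    then L2-normalise, to make the query less sensitive to cropping."""
    w, h = img.size
    cw, ch = int(w * crop_frac), int(h * crop_frac)
    candidates = [img]
    for dx, dy in [(0, 0), (w - cw, 0), (0, h - ch), (w - cw, h - ch)]:
        candidates.append(img.crop((dx, dy, dx + cw, dy + ch)))
    vecs = [img2vec.get_vec(c.resize((224, 224)), tensor=False) for c in candidates]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)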

Here are some example images and the original image that was indexed.

Example images: [image] [image]

Original Image: https://dev-cdn.keycollectorcomics.com/media/3110ff7a-a053-4180-b079-8a81b946aa35.jpg

โŒ
โŒ