โŒ


How to prevent the duplicate result from previous frame in python object tracking

I am currently working on object detection, tracking, and counting, and I want to store the results from each stage. Whenever a vehicle crosses the line, the result is reported more than once. How can I prevent these duplicates?

Here is the camera code:

class Camera(BaseCamera):
    """OpenCV video stream"""

    video_source = 0
    start, end = Point(0, 500), Point(1280, 500)
    detector = Detector()
    tracker = ByteTrack()
    line_zone = LineZone(start=start, end=end)
    annotator = LineZoneAnnotator()

def __init__(self, enable_detection: bool = False):
    video_source = os.environ.get("VIDEO_SOURCE")
    try:
        video_source = int(video_source)
    except Exception as exp:    # pylint: disable=broad-except
        if not video_source:
            raise EnvironmentError("Cannot open the video source!") from exp
    finally:
        Camera.set_video_source(video_source)
    super().__init__()
    self.enable_detection = enable_detection

@staticmethod
def set_video_source(source):
    """Set video source"""
    Camera.video_source = source

@classmethod
def frames(cls):
    """
    Get video frame
    """
    camera = cv2.VideoCapture(Camera.video_source)
    if not camera.isOpened():
        raise RuntimeError("Could not start camera.")

    while True:
        # read current frame
        ret, img = camera.read()

        # Loop back
        if not ret:
            camera.set(cv2.CAP_PROP_POS_FRAMES, 0)
            continue

        # Object detection
        results = cls.detector(image=img)
        selected_classes = [2, 3]

        tensorflow_results = results.detections
        cls.annotator.annotate(img, cls.line_zone)
        if not tensorflow_results:
            yield cv2.imencode(".jpg", img)[1].tobytes()
            continue

        detections = Detections.from_tensorflow(tensorflow_results=tensorflow_results)

        detections = cls.tracker.update_with_detections(detections=detections)
        detections = detections[np.isin(detections.class_id, selected_classes)]
        
        result = cls.line_zone.trigger(detections)
        if result is not None and len(result) >= 3:
            print(result[2])
            
        img = visualize(image=img, detections=detections)

        # encode as a jpeg image and return it
        yield cv2.imencode(".jpg", img)[1].tobytes()
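
A common cause of duplicate counts is that the same tracked vehicle satisfies the line check on several consecutive frames. One possible workaround (a minimal sketch, not part of the original code; it assumes detections.tracker_id is populated after update_with_detections) is to remember which tracker IDs have already been counted:

# Hypothetical helper: keep a set of tracker IDs that have already been
# counted so each vehicle is reported only once per crossing.
counted_ids = set()

def filter_new_crossings(tracker_ids):
    """Return only the tracker IDs that have not been counted before."""
    new_ids = [tid for tid in tracker_ids if tid not in counted_ids]
    counted_ids.update(new_ids)
    return new_ids

# Example usage inside the frame loop, after the tracker update:
# new_ids = filter_new_crossings(detections.tracker_id)
# if new_ids:
#     print(f"newly counted vehicles: {new_ids}")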

How do I add reversible noise to the MNIST dataset using PyTorch?

I would like to add reversible noise to the MNIST dataset for some experimentation.

Here's what I am trying atm:

import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from PIL import Image
import matplotlib.pyplot as plt

def display_img(pixels, label = None):
    plt.imshow(pixels, cmap="gray")
    if label:    
        plt.title("Label: %d" % label)
    plt.axis("off")
    plt.show()

class NoisyMNIST(torchvision.datasets.MNIST):
    def __init__(self, root, train=True, transform=None, target_transform=None, download=False):
        super(NoisyMNIST, self).__init__(root, train=train, transform=transform, target_transform=target_transform, download=download)

    def __getitem__(self, index):
        img, target = self.data[index], self.targets[index]
        img = Image.fromarray(img.numpy(), mode="L")

        if self.transform is not None:
            img = self.transform(img)
        
        # add the noise
        noise_level = 0.3
        noise = self.generate_safe_random_tensor(img) * noise_level
        noisy_img = img + noise
        
        return noisy_img, noise, img, target

    def generate_safe_random_tensor(self, img):
        """generates random noise for an image but limits the pixel values between -1 and 1""" 
       
        min_values = torch.clamp(-1 - img, max=0)
        max_values = torch.clamp(1 - img, min=0)
       
        return torch.rand(img.shape) * (max_values - min_values) + min_values



# Define transformations to apply to the data
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to tensors
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_dataset = NoisyMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = NoisyMNIST(root='./data', train=False, download=True, transform=transform)

# pick an arbitrary example index (img_id was undefined in the original snippet)
img_id = 0
np_noise = train_dataset[img_id][1]
np_data = train_dataset[img_id][0]

# subtract the noise again and drop the channel dimension for imshow
np_data_sub_noise = np_data - np_noise
display_img(np_data_sub_noise.squeeze(), 4)

Ideally, this would give me the regular MNIST dataset along with the noisy MNIST images and a collection of the noise that was added. Given this, I had assumed I could subtract the noise from the noisy image and go back to the original image, but my image operations are not reversible.
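
For what it's worth, a quick way to check whether the operation really is reversible in tensor space (a minimal sketch, using the NoisyMNIST dataset defined above) is to subtract the returned noise from the returned noisy image and compare against the clean image with torch.allclose. If this check passes but the displayed images still differ, the loss of information is likely happening during display or quantisation rather than in the addition itself:

noisy_img, noise, img, target = train_dataset[0]

# exact reversal in floating point: noisy_img - noise should equal img
recovered = noisy_img - noise
print(torch.allclose(recovered, img, atol=1e-6))   # expected: True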

Any pointers or code snippets would be greatly appreciated. Below are the images I currently get with my code:

Original image:

[image]

With added noise:

[image]

And with the noise subtracted from the noisy image:

[image]

Linear equation from pixel space to world space

I'm successfully detecting lane lines and determining their linear equations within the image plane (pixel space).

I've been able to convert each pixel's location from pixel coordinates to real-world coordinates (world space) using the camera's intrinsic and extrinsic parameters.

Then, I use linear regression on the world coordinates to generate a single linear equation representing the lane line in world space.

My current approach involves two steps: first converting to world space, then performing linear regression. Is there a way to directly transform the pixel-space linear equation itself into its equivalent world-space equation without the intermediate conversion step?

A C++ code example or a mathematical explanation would both be welcome.
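
If the lane lies on a (locally) planar road surface, the mapping between the image and that plane is a 3x3 homography, and a line can be transferred directly without converting individual points. A small numpy sketch of the idea follows (the same math carries over to C++ with Eigen or OpenCV); the homography H below is assumed to map homogeneous ground-plane points to homogeneous pixel points, e.g. H = K [r1 r2 t] built from the intrinsics and extrinsics:

import numpy as np

def pixel_line_to_world_line(l_pixel, H_world_to_pixel):
    # A 2D line a*u + b*v + c = 0 is represented by l_pixel = [a, b, c].
    # Points transform as x_pixel ~ H x_world, so lines transform
    # contravariantly: l_world = H^T l_pixel.
    l_world = H_world_to_pixel.T @ np.asarray(l_pixel, dtype=float)
    # normalise so that (a, b) has unit length, purely for readability
    return l_world / np.linalg.norm(l_world[:2])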

Issue with Reading Frames from CCTV Camera Using OpenCV via RTSP

import sys
import time
import cv2
import numpy as np


class CameraManager:
    def __init__(self, camera_uris):
        self.camera_uris = camera_uris
        self.cameras = self._initialize_cameras()

    def _initialize_cameras(self):
        cameras = []
        for uri in self.camera_uris:
            cap = cv2.VideoCapture(uri)
            if not cap.isOpened():
                print("Cannot open camera")
                sys.exit()
            cameras.append(cap)
        return cameras

    def read_frames(self,skip_frames=3):
        frames = []
        for index, cam in enumerate(self.cameras):
            for _ in range(skip_frames):
                cam.read()
            ret, frame = cam.read()
            # print(int(cam.get(cv2.CAP_PROP_POS_FRAMES)))
            if not ret:
                print("Can't receive frame (stream end?). Exiting ...")
                self.cameras[index] = cv2.VideoCapture(self.camera_uris[index])
                frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
            frames.append(frame)
        return frames

def resize_images(images, new_width=640, new_height=480):
    """
    Resize all images in the list to the specified dimensions.
    """
    resized_images = []
    for img in images:
        resized = cv2.resize(img, (new_width, new_height))
        resized_images.append(resized)
    return resized_images

def calculate_grid_size(num_images):
    """
    Calculate the grid size based on the number of images.
    """
    if num_images == 0:
        return 0, 0
    rows = ((num_images - 1) // 3) + 1
    cols = 3
    return rows, cols

def merge_images_in_grid(images):
    """
    Merge images into a grid layout, with the grid size dynamically calculated.
    """
    if not images:
        raise ValueError("No images to merge")

    # Resize images
    images = resize_images(images, 640, 480)

    # Calculate grid size
    grid_rows, grid_cols = calculate_grid_size(len(images))

    # Get dimensions of the resized images
    img_height, img_width, _ = images[0].shape

    # Grid dimensions
    grid_width = img_width * grid_cols
    grid_height = img_height * grid_rows

    # Create an empty black image for the grid
    merged_image = np.zeros((grid_height, grid_width, 3), dtype=np.uint8)

    # Place each image in the grid
    for i, img in enumerate(images):
        row = i // grid_cols
        col = i % grid_cols
        merged_image[row * img_height:(row + 1) * img_height, col * img_width:(col + 1) * img_width, :] = img

    return merged_image

class MainApplication:
    def __init__(self):
        self.camera_uris = ["rtsp://admin:[email protected]","rtsp://admin:[email protected]","rtsp://admin:[email protected]","rtsp://admin:[email protected]"]
        self.camera_manager = CameraManager(camera_uris=self.camera_uris)

    def run(self):
        frame_number = 0
        while True:
            frames = self.camera_manager.read_frames()
            time.sleep(0.5)
            merge_image = merge_images_in_grid(images=frames)
            cv2.namedWindow('output', cv2.WINDOW_NORMAL)
            cv2.imshow('output', merge_image)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                exit()


if __name__ == "__main__":
    try:
        app = MainApplication()
        app.run()
    except KeyboardInterrupt:
        print("Exiting ..")

In this code, I manage multiple CCTV cameras connected via RTSP (Real-Time Streaming Protocol). The program continuously captures frames from these cameras, skipping a set number of frames before each read. After a frame is read, it is added to a list, which is then returned. Instead of implementing YOLO object detection, I've introduced a sleep interval of 0.5 seconds. Following this pause, the captured frames are displayed using OpenCV's imshow function.

The program runs without any errors when time.sleep(0.5) is not included, but including this line causes an error.

When I replace the time.sleep(0.5) with YOLO object detection, the program also encounters an error.

Can't receive frame (stream end?). Exiting ...
[h264 @ 0x1c6bbc0] error while decoding MB 114 18, bytestream -35
Can't receive frame (stream end?). Exiting ...
Can't receive frame (stream end?). Exiting ...
[hevc @ 0x13d3840] Could not find ref with POC 6
[h264 @ 0x1c2e9c0] error while decoding MB 102 13, bytestream -5
[h264 @ 0x244bfc0] error while decoding MB 12 31, bytestream -7
Can't receive frame (stream end?). Exiting ...
Can't receive frame (stream end?). Exiting ...
[hevc @ 0x1c42900] Could not find ref with POC 0
[rtsp @ 0x1ade880] RTP: PT=60: bad cseq 1d84 expected=0ba8
[hevc @ 0x1c27880] Could not find ref with POC 36

Why does the program throw these errors when time.sleep(0.5) is included or when YOLO object detection is implemented? Identifying the root cause of these errors would help in finding an appropriate solution.
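
One plausible explanation (an assumption, not a confirmed diagnosis) is that while the program sleeps or runs inference, the RTSP streams keep delivering packets that are not being read, so OpenCV falls behind the live stream and the decoder hits corrupted or missing reference frames. A common mitigation is to read frames continuously in a background thread and only hand the latest frame to the slow processing loop. A minimal sketch of that pattern (class and method names are illustrative):

import threading
import cv2

class LatestFrameReader:
    """Continuously read an RTSP stream in a background thread and keep only
    the most recent frame, so slow processing does not stall the capture."""

    def __init__(self, uri):
        self.cap = cv2.VideoCapture(uri)
        self.lock = threading.Lock()
        self.frame = None
        self.running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while self.running:
            ret, frame = self.cap.read()
            if ret:
                with self.lock:
                    self.frame = frame

    def read(self):
        # return a copy of the latest frame, or None if nothing has arrived yet
        with self.lock:
            return None if self.frame is None else self.frame.copy()

    def stop(self):
        self.running = False
        self.cap.release()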

How to get set of colours in an image using python PIL

After importing an image using python's PIL module I would like to get the set of colours in the image as a list of rgb tuples.

What if I know beforehand that there will only be 2 colours and the image will be very small, maybe 20x20 pixels? However, I will be running this over a lot of images. Would it be more efficient to loop through all pixels until I see 2 unique colours? I understand loops are very slow in Python.
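
For reference, PIL already exposes this through Image.getcolors(), which returns (count, colour) pairs in a single call without a Python-level loop over pixels, so it should comfortably handle many small images. A minimal sketch ("tile.png" is a placeholder filename):

from PIL import Image

img = Image.open("tile.png").convert("RGB")   # placeholder path

# getcolors() returns a list of (count, (r, g, b)) tuples, or None if the
# image contains more than maxcolors distinct colours.
colours = img.getcolors(maxcolors=img.width * img.height)
unique_rgb = [rgb for _, rgb in colours]
print(unique_rgb)   # e.g. [(0, 0, 0), (255, 255, 255)] for a 2-colour image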

Face recognition system with insightface in Python. How do I control data flow across the application?

I am building a surveillance software project: an AI- and web-based face recognition system. For face recognition I am using insightface, for identity matching I am using faiss (Facebook similarity search), and for stable streaming I am using the imutils library, which is quite flexible for frame streams coming from cameras. I have now collected these technologies, but I am stuck and don't have any idea how to start. My plan is illustrated below:

Project plan

Interestingly, the process above executes every time a face is detected by a camera (any one of the cameras), and each camera is bound to its own independent thread. I am trying to build a more efficient program, so another requirement is that the web-based and frame-based functions should run asynchronously. The main problem I have currently is that I don't know how to transport data such as camera connection status, users' face encodings (numpy arrays), and database records from the AI side to the web side (Django REST). I believe there are many senior programmers on Stack Overflow who share their time and effort with new learners, and as a junior developer I would be very glad for help from senior AI developers. Here are my more specific and detailed questions:

  1. How do I build a stable stream, or a stable, regular infinite loop, that takes care of the lifetime of the connected cameras and of the face recognition? Running everything in a simple while or for loop may not be efficient; can you give me some ideas about that?
  2. If I establish stable connections for multiple cameras, how do I share RAM or VRAM between them as lightly as possible? My program is already configured for the GPU. Would async, multiprocessing (tasks on different cores with separate Python interpreters), or threading (multiple threads on a single core) be efficient, or are there more efficient approaches?
  3. I have a DRF app which has two models: Employees and Cameras. The Employees model includes several fields, including an ImageField, and face encodings can be read from the path stored in that ImageField. My face training dataset is laid out like /media//main.jpg, and faces are loaded as follows:
import torch
import insightface
import cv2
import os
import numpy as np
import faiss


class FaceTrainer:
    def __init__(self, root_dir):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        try:
            self.face_model = insightface.app.FaceAnalysis()
            ctx_id = 0
            self.face_model.prepare(ctx_id=ctx_id)
        except Exception as e:
            print(f"Failed to load models: {e}")
            raise

        self.index, self.known_face_names = self.load_face_encodings(root_dir)
        print(self.index)

    def load_face_encodings(self, root_dir):
        known_face_encodings = []
        known_face_names = []
        for dir_name in os.listdir(root_dir):
            dir_path = os.path.join(root_dir, dir_name)
            if os.path.isdir(dir_path):
                for file_name in os.listdir(dir_path):
                    if file_name.endswith((".jpg", ".png")):
                        image_path = os.path.join(dir_path, file_name)
                        image = cv2.imread(image_path)
                        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                        faces = self.face_model.get(image)

                        if faces:
                            face = faces[0]
                            embedding = face.embedding
                            known_face_encodings.append(embedding)
                            known_face_names.append(dir_name)

        known_face_encodings = np.array(known_face_encodings)
        index = faiss.IndexFlatL2(known_face_encodings.shape[1])
        index.add(known_face_encodings)
        return index, known_face_names

    @property
    def get_face_model(self):
        return self.face_model

This approach is not efficient anyway, because images of newly added employees won't be loaded and trained during the AI program's lifecycle. I want things to be dynamic, so that when new camera addresses are added to the database, a Celery background process takes care of connecting those cameras by checking the database at some interval. Like I said, I have the logic in mind, but I don't have an idea where to start, and time is pretty tight right now.
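
Regarding newly added employees: faiss's IndexFlatL2 supports incremental add calls, so one option is to encode new images as they appear and append them to the running index instead of rebuilding it. A minimal sketch of such a method on the FaceTrainer above (the method name and return values are illustrative, not part of the original code):

    def add_employee_image(self, image_path, person_name):
        """Encode one newly added employee image and append it to the
        existing faiss index without rebuilding everything."""
        image = cv2.imread(image_path)
        if image is None:
            return False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        faces = self.face_model.get(image)
        if not faces:
            return False
        embedding = np.array([faces[0].embedding], dtype="float32")
        self.index.add(embedding)              # IndexFlatL2 grows in place
        self.known_face_names.append(person_name)
        return True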

And my last question: how do I make the AI and web sides talk to each other? The AI should pass data to the web side, and the web side should pass data such as camera and employee details to the AI.

For more conversation I am available on GitHub, Telegram, and Gmail. I would be very glad if you could answer my poorly asked questions in as much detail as possible; that would also benefit everyone who is learning AI and web development just like me.

Thank you a lot in advance, Abdusamad

World to pixel transformation in Pyrender

I'm trying to transform a point in a 3D world rendered with pyrender to pixel coordinates, however the pixel coordinates are incorrect and I can't figure out what I'm doing wrong. I appreciate any hints!

The goal is to get the pixel coordinates uvw of the world-point UVW. Currently, I do the following:

Create Camera:

I create a camera from an already existing intrinsic matrix (= K). I do this mainly for debugging purposes, so I can be sure that K is correct:

import numpy as np
import pyrender

K = np.array([[415.69219382,   0.        , 320.        ],
              [  0.        , 415.69219382, 240.        ],
              [  0.        ,   0.        ,   1.        ]])
K = np.ascontiguousarray(K, dtype=np.float32)
p_cam = pyrender.camera.IntrinsicsCamera(fx=K[0][0], fy=K[1][1], cx=K[0][2], cy=K[1][2])

scene.add(p_cam, pose=cam_pose.get_transformation_matrix(x=6170., y=4210., z=60., yaw=20, pitch=0, roll=40)) # cam_pose is my own class

Create transformation matrix

I'm creating a transformation matrix with an extrinsic rotation.

def get_transformation_matrix(self, x, y, z, yaw, pitch, roll):
    from scipy.spatial.transform import Rotation as R
    '''
    yaw = rotate around z axis
    pitch = rotate around y axis
    roll = rotate around x axis
    '''
    xyz = np.array([
        [x],
        [y],
        [z]
    ])
    rot = R.from_euler('zyx', [yaw, pitch, roll], degrees=True).as_matrix()
    last_row = np.array([[0,0,0,1]])
    tf_m = np.concatenate((np.concatenate((rot,xyz), axis = 1), last_row), axis = 0)
    return np.ascontiguousarray(tf_m, dtype=np.float32)

Render image

Using the created camera, I render the following image. The point I'm trying to transform is the tip of the roof, which approximately has the pixel coordinates (500,160). I marked it in the 3D scene with the pink cylinder.

Rendered image

Transform world to pixel frame

from icecream import ic
UVW1 = [[6184],[4245],[38],[1]] #the homogeneous coordinates of the pink cylinder in the world frame
world_to_camera = np.linalg.inv(cam_pose.transformation_matrix).astype('float32') @ UVW1
ic(world_to_camera)
camera_to_pixel = K @ world_to_camera
ic(camera_to_pixel/camera_to_pixel[2]) #Transforming the homogeneous coordinates back

Output:

ic| world_to_camera: array([[ 17.48892188],
                            [  7.11796755],
                            [-39.35071968],
                            [  1.        ]])

ic| camera_to_pixel/camera_to_pixel[2]: array([[135.25094424],
                                               [164.80738424],
                                               [  1.        ]])

Results

To me, the world_to_camera point seems like it might be correct. However, when transforming from the camera frame to the pixel frame, the x-coordinate (135) is wrong (the y-coordinate (164) might still make sense).

Attached is a screenshot of the 3D scene. The yellow cylinder and axes represent the camera, while the blue point represents the point I'm trying to transform (magenta in the rendered image above). [Screenshot of the 3D scene with camera and target point]

So to me, the only source of error could be the intrinsic matrix, however I'm defining this matrix myself, so I don't see how it could be incorrect. Is there something I'm blind to?
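
One thing worth double-checking (a guess, not a confirmed diagnosis): pyrender cameras follow the OpenGL convention, looking down the negative z-axis with y pointing up, whereas the pinhole projection with K assumes z pointing forward and y pointing down. Also, K is 3x3, so only the first three components of the camera-frame point should be multiplied by it. A sketch of the projection with both adjustments, reusing K, cam_pose, and UVW1 from above:

import numpy as np

# world point -> camera frame (drop the homogeneous coordinate, since K is 3x3)
point_cam = (np.linalg.inv(cam_pose.transformation_matrix) @ UVW1)[:3]

# OpenGL camera frame (looks along -z, y up) -> computer-vision frame
# (z forward, y down) before applying the intrinsics
flip = np.diag([1.0, -1.0, -1.0])
point_cv = flip @ point_cam

uvw = K @ point_cv
uv = uvw / uvw[2]
print(uv[:2])   # candidate pixel coordinates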

Why is my kNN Search not returning matches when the submitted image is slightly different?

I have an Elasticsearch index which is set up to store dense_vectors.

The dense vector field is configured like so:

image_dense_vector: {
  type: 'dense_vector',
  dims: 512,
  index: true,
  similarity: 'dot_product'
}

I have a process set up which uses the Python img2vec library along with the ResNet18 model to generate a vector with 512 dimensions, like so:

img2vec = Img2Vec(cuda=False, model="resnet18", layer_output_size=512)

resized_image = img.resize((224, 224))
vector = img2vec.get_vec(resized_image, tensor=False)
vector = helpers.normalize_vector(vector)

es.index(index=request.index, id=id,
   body={"image_dense_vector": vector},
)

To search for results, I pass in an image, get the vector for that image the same way as above, and then pass it into an ES kNN query like so:

{
  "field": "image_dense_vector",
  "query_vector": vector,
  "k": limit,
  "num_candidates": 50,
}

The problem I'm running into is that if my query image has a smudge or isn't perfectly cropped, the result images are off. The total number of images indexed is about 500,000.

I would like to figure out a way to make it so that the images don't need to be a perfect match to get an accurate response.
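
One simple thing to try (an idea, not a verified fix): average the query embedding over the full image plus a few slightly shifted crops and L2-normalise the result, so that a smudge or an imperfect crop moves the query vector less. A minimal sketch reusing the img2vec instance from above (the helper name and crop fraction are arbitrary):

import numpy as np

def robust_query_vector(img, img2vec, crop_frac=0.9):
    """Average the embedding of the full image and four shifted crops,
    then L2-normalise, to make the query less sensitive to cropping."""
    w, h = img.size
    cw, ch = int(w * crop_frac), int(h * crop_frac)
    candidates = [img]
    for dx, dy in [(0, 0), (w - cw, 0), (0, h - ch), (w - cw, h - ch)]:
        candidates.append(img.crop((dx, dy, dx + cw, dy + ch)))
    vecs = [img2vec.get_vec(c.resize((224, 224)), tensor=False) for c in candidates]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)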

Here are some example images and the original image that was indexed.

Example images: [image] [image]

Original Image: https://dev-cdn.keycollectorcomics.com/media/3110ff7a-a053-4180-b079-8a81b946aa35.jpg

โŒ
โŒ