โŒ

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Why does filtering based on a condition result in an empty DataFrame in pandas?

I'm working with a DataFrame in Python using pandas, and I'm trying to apply multiple conditions to filter rows based on temperature values from multiple columns. However, after applying my conditions and using dropna(), I end up with zero rows, even though I expect some data to meet these conditions.

The goal is to compare each temperature with AmbientTemp + 40 °C: if the value is more than this, replace it with NaN; otherwise, keep the original value.

Here's a sample of my DataFrame and the conditions I'm applying:

import pandas as pd

data = {
    'Datetime': ['2022-08-04 15:06:00', '2022-08-04 15:07:00', '2022-08-04 15:08:00', 
                 '2022-08-04 15:09:00', '2022-08-04 15:10:00'],
    'Temp1': [53.4, 54.3, 53.7, 54.3, 55.4],
    'Temp2': [57.8, 57.0, 87.0, 57.2, 57.5],
    'Temp3': [59.0, 58.8, 58.7, 59.1, 59.7],
    'Temp4': [46.7, 47.1, 80, 46.9, 47.3],
    'Temp5': [52.8, 53.1, 53.0, 53.1, 53.4],
    'Temp6': [50.1, 69, 50.3, 50.3, 50.6],
    'AmbientTemp': [29.0, 28.8, 28.6, 28.7, 28.9]
}
df1 = pd.DataFrame(data)
df1['Datetime'] = pd.to_datetime(df1['Datetime'])
df1.set_index('Datetime', inplace=True)

Code:

temp_cols = ['Temp1', 'Temp2', 'Temp3', 'Temp4', 'Temp5', 'Temp6']
ambient_col = 'AmbientTemp'

condition = (df1[temp_cols].lt(df1[ambient_col] + 40, axis=0))

filtered_df = df1[condition].dropna()
print(filtered_df.shape)

Output:

(0, 99)

Problem:

Despite expecting valid data that meets the conditions, the resulting DataFrame is empty after applying the filter and dropping NaN values. What could be causing this issue, and how can I correct it?
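
A likely explanation, sketched here rather than taken from the original post: indexing df1 with a boolean DataFrame that covers only the temperature columns aligns the mask against the full frame, so every column not in the mask (such as AmbientTemp) becomes all-NaN, and dropna() then discards every row. To replace only the out-of-range temperatures with NaN, as the stated goal requires, mask just those columns:

condition = df1[temp_cols].lt(df1[ambient_col] + 40, axis=0)

# Keep values where the condition holds; set the rest to NaN.
# Columns outside temp_cols are left untouched.
df1[temp_cols] = df1[temp_cols].where(condition)
print(df1)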

How to perform matthews_corrcoef in sklearn simultaneously between every column of a matrix X and an output y?

I want to calculate the Matthews correlation coefficient (MCC) in sklearn between every column of a matrix X and an output y. Here is my code:

from sklearn.metrics import matthews_corrcoef
import numpy as np

X = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [1, 1, 0, 0, 0],
              [1, 1, 0, 1, 0],
              [1, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 0, 1, 0, 1],
              [1, 0, 0, 0, 0]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
n_sample, n_feature = X.shape
rcf_all = []
for i in range(n_feature):
    coeff_c_f = abs(matthews_corrcoef(X[:, i], y))
    rcf_all.append(coeff_c_f)
rcf = np.mean(rcf_all)

It worked pretty well here, but since I have a very big matrix with many features, calculating them by looping through one feature at a time is quite slow. What is the most effective way to perform this simultaneously, without the loop, to speed up the calculation process?

Edit:

I tried the idea proposed by Andrej Kesely, with the following extra code from my previous question (see here).

@numba.njit
def get_all_mcc_numba_with_y(X, y):
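    # _fill_cm and mcc are assumed to be the numba-compiled helpers
    # defined in the linked previous question; they are not shown here.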
    rows, columns = X.shape

    confusion_matrix = np.zeros((2, 2), dtype="float32")

    out = []

    for i in range(columns):
        _fill_cm(confusion_matrix, X[:, i], y)
        out.append(abs(mcc(confusion_matrix)))

    return sum(out) / len(out)


@numba.njit
def numba_calculation_using_matthews_coef(X, y):
    rcf_all = get_all_mcc_numba_with_y(X, y)
    rcf = np.mean(rcf_all)

    
    all_rff = get_all_mcc_numba(X)
    rff = np.mean(all_rff)

    return rcf, rff

However, I get these errors when I try to call numba_calculation_using_matthews_coef:

def get_all_mcc_numba_with_y(X, y):
    <source elided>
    for i in range(columns):
        _fill_cm(confusion_matrix, X[:, i], y)
        ^

During: resolving callee type: type(CPUDispatcher(<function get_all_mcc_numba_with_y at 0x1508dcd60>))
During: typing of call at 

During: resolving callee type: type(CPUDispatcher(<function get_all_mcc_numba_with_y at 0x1508dcd60>))
During: typing of call at 


def numba_calculation_using_matthews_coef(X, y):
    <source elided>
    rcf_all = get_all_mcc_numba_with_y(X, y)

Reading Data in Python using pandas

import pandas as pd
import sklearn
from sklearn.datasets import load_iris

# Loading data from a CSV file
data = pd.read_csv('D:/Projects/FLGRU_Model/FLDataset/01-12/DrDoS_LDAP.csv')
df = pd.read_csv(data)

# Performing data analysis
df.head()  # Display the first few rows
df.describe()  # Statistical summary of the data

What could be the problem with the following code? It's not reading the data.

I was reading the data and the code is giving the following errors:

 DtypeWarning: Columns (85) have mixed types. Specify dtype option on import or set low_memory=False.
  data = pd.read_csv('D:/Projects/FLGRU_Model/FLDataset/01-12/DrDoS_LDAP.csv')
Traceback (most recent call last):
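
A plausible cause, sketched here rather than taken from the post: pd.read_csv already returns a DataFrame, so the second line passes a DataFrame back into pd.read_csv as if it were a file path, which raises the traceback. Reading the file once is enough, and low_memory=False addresses the mixed-type warning on column 85:

import pandas as pd

# read_csv returns a DataFrame directly; read the file only once.
df = pd.read_csv('D:/Projects/FLGRU_Model/FLDataset/01-12/DrDoS_LDAP.csv',
                 low_memory=False)
print(df.head())      # Display the first few rows
print(df.describe())  # Statistical summary of the data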

Bokeh server only shows white page

I am trying to run a Bokeh server that shows an interface in the browser with two candlestick charts comparing the data of two stocks. The browser opens with the command bokeh serve --show main.py, but all that appears is a blank page.

I was expecting to get a page in my browser that shows four inputs, the first two for stock tickers, the second two for start date and end date. After input, there should be two charts next to each other to visualize the data. I am getting this error in the terminal:

UserWarning: 
It looks like you might be running the main.py of a directory app directly.
If this is the case, to enable the features of directory style apps, you must
call "bokeh serve" on the directory instead. For example:

    bokeh serve my_app_dir/

If this is not the case, renaming main.py will suppress this warning.

  sys.exit(main())
2024-04-27 13:12:56,435 Starting Bokeh server version 3.4.1 (running on Tornado 6.4)
2024-04-27 13:12:56,436 User authentication hooks NOT provided (default user enabled)
2024-04-27 13:12:56,439 Bokeh app running at: http://localhost:5006/main
2024-04-27 13:12:56,439 Starting Bokeh server with process id: 8136
2024-04-27 13:12:57,450 Error running application handler <bokeh.application.handlers.script.ScriptHandler object at 0x7ded45d1e150>: cannot import name 'tow' from 'bokeh.layouts' (/home/cameron/Documents/Code/.venv/lib/python3.12/site-packages/bokeh/layouts.py)
File 'main.py', line 9, in <module>:
from bokeh.layouts import tow, column Traceback (most recent call last):
  File "/home/cameron/Documents/Code/.venv/lib/python3.12/site-packages/bokeh/application/handlers/code_runner.py", line 229, in run
    exec(self._code, module.__dict__)
  File "/home/cameron/Documents/Code/main.py", line 9, in <module>
    from bokeh.layouts import tow, column
ImportError: cannot import name 'tow' from 'bokeh.layouts' (/home/cameron/Documents/Code/.venv/lib/python3.12/site-packages/bokeh/layouts.py)
 
2024-04-27 13:12:57,792 WebSocket connection opened
2024-04-27 13:12:57,793 ServerConnection created
2024-04-27 13:13:19,808 WebSocket connection closed: code=1001, reason=None

Here is the code:

import math
import datetime as dt

import numpy as np
import yfinance as yf

from bokeh.io import curdoc
from bokeh.plotting import figure
from bokeh.layouts import row, column
from bokeh.models import TextInput, Button, DatePicker, MultiChoice

def load_data(ticker1, ticker2, start, end):
    df1 = yf.download(ticker1, start, end)
    df2 = yf.download(ticker2, start, end)
    return df1, df2

def plot_data(data, indicators, sync_axis=None):
    df = data
    gain = df.Close > df.Open
    loss = df.Open > df.Close
    width = 12 * 60 * 60 * 1000

    if sync_axis is not None:
        p = figure(x_axis_type="datetime", tools="pan,wheel_zoom,box_zoom,reset,save", width=1000,
                   x_range=sync_axis)
    else:
        p = figure(x_axis_type="datetime", tools="pan,wheel_zoom,box_zoom,reset,save", width=1000)

    p.xaxis.major_label_orientation = math.pi / 4
    p.grid.grid_line_alpha = 0.25

    p.segment(df.index, df.High, df.index, df.Low, color="black")
    p.vbar(df.index[gain], width, df.Open[gain], df.Close[gain], fill_color="#00ff00", line_color="#00ff00")
    p.vbar(df.index[loss], width, df.Open[loss], df.Close[loss], fill_color="#ff0000", line_color="#ff0000")

    return p  # without this, on_button_click receives None and p1.x_range fails


def on_button_click(ticker1, ticker2, start, end, indicators):
    df1, df2 = load_data(ticker1, ticker2, start, end)
    p1 = plot_data(df1, indicators)
    p2 = plot_data(df2, indicators, sync_axis=p1.x_range)
    curdoc().clear()
    curdoc().add_root(layout)
    curdoc().add_root(row(p1, p2))



stock1_text = TextInput(title='Stock 1')
stock2_text = TextInput(title='Stock 2')
date_picker_from = DatePicker(title="Start date", value="2020-01-01", min_date="2000-01-01", 
                              max_date=dt.datetime.now().strftime("%Y-%m-%d"))
date_picker_to = DatePicker(title="End date", value="2020-02-01", min_date="2000-01-01", 
                              max_date=dt.datetime.now().strftime("%Y-%m-%d"))
indicator_choice = MultiChoice(options=["100 Day SMA", "30 Day SMA", "Linear Regression line"])

load_button = Button(label="Load Data", button_type="success")
load_button.on_click(lambda: on_button_click(stock1_text.value, stock2_text.value, 
                                             date_picker_from.value, date_picker_to.value,
                                             indicator_choice.value))

layout = column(stock1_text, stock2_text, date_picker_from, date_picker_to, indicator_choice, load_button)

curdoc().clear()
curdoc().add_root(layout)


I am very new to this and don't understand why the terminal says I wrote 'tow' on line 9. I originally made that typo, but I did fix it to say 'row'. I am unable to find the solution anywhere.
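
A hedged note (mine, not part of the original post): the traceback shows the server executing a main.py that still contains tow, which usually means the running server was started before the edit was saved, or that bokeh serve is picking up a different main.py than the one being edited; restarting the server after saving the file should clear it. Separately, the UserWarning at the top of the log suggests that if main.py belongs to a directory-style app, the directory rather than the file should be served:

bokeh serve --show my_app_dir/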

Numpy Broadcasting - Need complete understanding

I am trying to understand Numpy Broadcasting.

So I want to understand why the below code works.

import numpy as np

a = np.arange(4).reshape(2,2)
b = np.arange(6).reshape(3,2)

a = a[:, np.newaxis]

a + b

I mean, if we try to add a and b without adding another dimension, it throws a ValueError: "ValueError: operands could not be broadcast together with shapes (2,2) (3,2)".

But when we use newaxis, which really does add a dimension to a, the shapes are still different, so why does it work? Why does NumPy not throw an error for performing an arithmetic operation on arrays of different dimensions and shapes?

Also, is there a definitive resource that gives an in-depth explanation of NumPy broadcasting with an exhaustive list of examples?

After I added a dimension based on the explanation given in the link below (How do I use np.newaxis? - kmario23's explanation, Scenario 2), it seems to work. But my understanding is that if the shapes are different, then it should not work.
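
For what it's worth, a short worked sketch (mine, not from the post) of how the broadcasting rules resolve these shapes: NumPy aligns shapes from the trailing end, and any size-1 axis is stretched to match the other operand.

import numpy as np

a = np.arange(4).reshape(2, 2)   # shape (2, 2)
b = np.arange(6).reshape(3, 2)   # shape (3, 2)

a3 = a[:, np.newaxis]            # shape (2, 1, 2)

# Aligned from the right:
#   a3: (2, 1, 2)
#   b :    (3, 2)
# The size-1 middle axis of a3 stretches to 3, and b gains a leading
# axis of size 2, so the sum has shape (2, 3, 2). With the original
# (2, 2) and (3, 2), the leading axes 2 and 3 cannot be reconciled.
print((a3 + b).shape)            # (2, 3, 2)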

How to match pairs of values contained in two numpy arrays

I have two sets of coordinates and want to find out which coordinates of the coo set are identical to any coordinate in the targets set. I want to know the indices in the coo set, which means I'd like to get a list of indices or of bools.

import numpy as np

coo = np.array([[1,2],[1,6],[5,3],[3,6]]) # coordinates
targets = np.array([[5,3],[1,6]]) # coordinates of targets

print(np.isin(coo,targets))

[[ True False]
 [ True  True]
 [ True  True]
 [ True  True]]

The desired result would be one of the following two:

[False True True False] # bool list
[1,2] # list of concerning indices

My problem is that:

  • np.isin has no axis argument, so I cannot use axis=1.
  • even applying logical and to each row of the output would return True for the last element, which is wrong.

I am aware of loops and conditions but I am sure Python is equipped with ways for a more elegant solution.
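
One elegant option (a sketch of mine, not from the post): compare every row of coo against every row of targets with broadcasting, and require all coordinates of a pair to match:

import numpy as np

coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

# (4, 1, 2) == (2, 2) broadcasts to (4, 2, 2); all(axis=2) demands a
# full row match, any(axis=1) accepts a match with any target.
mask = (coo[:, None] == targets).all(axis=2).any(axis=1)
print(mask)               # [False  True  True False]
print(np.where(mask)[0])  # [1 2]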

Normal Equation for linear regression

I have the following X and y matrices:

[image: the X and y matrices]

for which I want to calculate the best value for theta for a linear regression equation using the normal equation approach with:

theta = inv(X^T * X) * X^T * y

the results for theta should be : [188.400,0.3866,-56.128,-92.967,-3.737]

I implement the steps with:

X=np.matrix([[1,1,1,1],[2104,1416,1534,852],[5,3,3,2],[1,2,2,1],[45,41,30,36]])
y=np.matrix([460,232,315,178])

XT=np.transpose(X)

XTX=XT.dot(X)

inv=np.linalg.inv(XTX)

inv_XT=inv.dot(XT)

theta=inv_XT.dot(y)

print(theta)

But I don't get the desired results. Instead it throws an error:

Traceback (most recent call last):
  File "C:/", line 19, in
    theta=inv_XT.dot(y)
ValueError: shapes (4,5) and (1,4) not aligned: 5 (dim 1) != 1 (dim 0)

What am I doing wrong?
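
A sketch of what is likely going wrong (mine, not from the post): as defined above, X holds the training examples as columns (shape 5x4) and y is a row vector (shape 1x4), so the final product misaligns. Transposing X so that rows are examples and making y a column vector fixes the shapes. Note also that with 4 samples and 5 parameters, X^T * X is singular, so the pseudo-inverse is the safer choice:

import numpy as np

# Rows are training examples; y is a column vector.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 41],
              [1, 1534, 3, 2, 30],
              [1, 852, 2, 1, 36]])          # shape (4, 5)
y = np.array([[460], [232], [315], [178]])  # shape (4, 1)

# pinv instead of inv: X^T X is singular here (4 equations, 5 unknowns).
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta.ravel())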

How can I speed up the processing of my nested for loops for a giant 3D numpy array?

I created a very large 3D numpy array called tr_mat. The shape of tr_mat is:

tr_mat.shape
(1024, 536, 21073)

Info on the 3D numpy array: First and before going into the actual code, I would like to clarify what I am attempting to do. As can be seen from tr_mat.shape, the 3D numpy array contains numeric values in 1024 rows and 536 columns. That is, we have 536 * 1024 = 548864 values in each of the 21073 matrices.

Conceptual background about my task: Each of the 21073 2D numpy arrays within the 3D numpy array contains grayscale pixel values from an image. The 3D numpy array tr_mat is already transposed, because I would like to construct a time-series based on identical pixel positions across all 21073 matrices. Finally, I would like to save each of the resulting 548864 time-series individually in a .1D text file. (Hence, I would end up saving 548864 .1D text files.)

The relevant part of the code:

tr_mat = frame_mat.transpose() # the transposed 3D numpy array
# Save
rangeh = range(0, 536)
for row, row_n_l in zip(tr_mat, rangeh): # row = pixel row of the 2D image
    for ts_pixel, row_n in zip(row, rangeh): # ts_pixel = the pixel time-series across the 3D array (across the single 2D arrays)
        # Save
        with open(f"/volumes/.../TS_Row{row_n_l}_Pixel{row_n}.1D", "w") as file:
            for i in ts_pixel:
                file.write(f"{i}\n") # Save each time-series value per row

Question: Could you give me some tips on how to modify my code to speed it up? I wrapped tqdm around the first for loop to check how fast the nested loop is processed, and it took around 20 minutes to reach ~120 of 536 rows. Also, it seems to me that the loop gets slower and slower as the iterations go up.
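
A few hedged suggestions (mine, not from the post): with 548864 files, the run time is dominated by filesystem overhead rather than NumPy, so the biggest wins come from writing each file in one call and avoiding per-element Python conversions. A sketch, keeping the question's path pattern:

import numpy as np

for row_n_l in range(tr_mat.shape[0]):
    block = tr_mat[row_n_l]                  # shape (536, 21073)
    for row_n in range(block.shape[0]):
        ts_pixel = block[row_n]
        # Build the whole file's text in memory, then write it once.
        text = "\n".join(map(str, ts_pixel.tolist()))
        with open(f"/volumes/.../TS_Row{row_n_l}_Pixel{row_n}.1D", "w") as f:
            f.write(text + "\n")

np.savetxt would also work per time-series, and if the /volumes/ path is a network mount, writing to a local disk first and copying afterwards may help considerably.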

Faster alternative for numpy einsum in Python

I am trying to perform the following operations on some tensors. Currently I am using einsum, and I wonder if there is a way (maybe using dot or tensordot) to make things faster, since I feel like what I am doing is more or less some outer and inner products.

res1 = numpy.einsum('ij, kjh->ikjh', A, B)
res2 = np.einsum('ijk, jk->ij', C, D)

I have tried using tensordot and dot, but for some reason I cannot figure out the right way to set the axes...
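
Two equivalences that may help (my sketch, not from the post): the first einsum has no repeated-and-summed index, so it is a pure broadcasted product; the second sums over k only, so it is an elementwise multiply followed by a sum:

import numpy as np

A = np.random.rand(3, 4)        # 'ij'
B = np.random.rand(5, 4, 6)     # 'kjh'
C = np.random.rand(3, 4, 6)     # 'ijk'
D = np.random.rand(4, 6)        # 'jk'

# 'ij,kjh->ikjh': outer/broadcast product, nothing is summed.
res1 = A[:, None, :, None] * B[None, :, :, :]
assert np.allclose(res1, np.einsum('ij,kjh->ikjh', A, B))

# 'ijk,jk->ij': multiply elementwise, then sum over the last axis.
res2 = (C * D).sum(axis=2)
assert np.allclose(res2, np.einsum('ijk,jk->ij', C, D))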

Applying a specified rule to build an array

Consider array a whose columns hold random values taken from 1, 2, 3:

a = np.array([[2, 3, 1, 3],
              [3, 2, 1, 3],
              [1, 1, 1, 2],
              [1, 3, 2, 3],
              [3, 3, 1, 3],
              [2, 1, 3, 2]])

Now, consider array b whose first 2 columns hold the 9 possible pairs of values taken from 1, 2, 3 (the order of the pair elements is important). The 3rd column of b associates a non-negative integer with each pairing.

b = np.array([[1, 1, 6],
              [1, 2, 0],
              [1, 3, 9],
              [2, 1, 6],
              [2, 2, 0],
              [2, 3, 4],
              [3, 1, 1],
              [3, 2, 0],
              [3, 3, 8]])

I need help with code that produces array c where vertically adjacent elements in a are replaced with matching values from the 3rd column of b. For example, the first column of a moves down from 2 to 3 to 1 to 1 to 3 to 2, so the first column of c would hold the values 4, 1, 6, 9, 0. The same idea applies to every column of a. We see that pair order is important (moving from 3 to 1 produces value 1, while moving from 1 to 3 produces value 9).

The output of this small example would be:

c = np.array([[4, 0, 6, 8],
              [1, 6, 6, 0],
              [6, 9, 0, 4],
              [9, 8, 6, 8],
              [0, 1, 9, 0]])

Because this code will be executed a vast number of times, I'm hoping there is a speedy vectorized solution. Thanks.
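
A vectorized sketch (mine, not from the post), using a and b as defined above and exploiting that values are limited to 1, 2, 3: build a 3x3 lookup table from b, then index it with the vertically adjacent pairs of a in one shot:

import numpy as np

# lookup[i-1, j-1] holds the value associated with the pair (i, j).
lookup = np.empty((3, 3), dtype=b.dtype)
lookup[b[:, 0] - 1, b[:, 1] - 1] = b[:, 2]

# a[:-1] and a[1:] are the upper and lower elements of each pair.
c = lookup[a[:-1] - 1, a[1:] - 1]
print(c)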

Confusion regarding the inner workings of NumPy's SeedSequence

In case it matters at all, I'm using Python 3.11.5 64-bit on a Windows 11 Pro desktop computer with NumPy 1.26.4.

In order to try to better understand what NumPy is doing behind the scenes when I ask for a np.random.Generator object from some given SeedSequence, I decided to try to reconstruct in pure Python what happens when I initialize a SeedSequence from a given entropy value.

Based on the source code for SeedSequence found here, my understanding of how uint32 overflow works, and the fact that (on my machine at least) np.dtype(np.uint32).itemsize is 4 (i.e. XSHIFT, defined as np.dtype(np.uint32).itemsize * 8 // 2, is 16), I wrote the following code:

seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]

hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        Assembled_entropy[i] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i] *= hash_const
        Assembled_entropy[i] &= 0xffffffff
        Assembled_entropy[i] ^= Assembled_entropy[i] >> 16
        Pool[i] = Assembled_entropy[i]
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            Pool[i_src] ^= hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            Pool[i_src] *= hash_const
            Pool[i_src] &= 0xffffffff
            Pool[i_src] ^= Pool[i_src] >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * Pool[i_src]) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)

I have copied the shell outputs of some test runs below.

Please enter a seed: 0
[595626433, 3558985979, 200295889, 3864401631, 3155212474, 198111058, 4047350828, 373757291]
Please enter a seed: 1
[2396653877, 491222160, 2441066534, 3196981647, 1764919720, 3210735412, 1132315803, 1197535761]
Please enter a seed: 123456789
[2161290507, 266876805, 2694113549, 3306969538, 3218948428, 3543586554, 886289367, 3129292100]
Please enter a seed: 123456789123456789
[2628723507, 610487362, 209721652, 1960674985, 3519121735, 1259052354, 2097159984, 3934338599]
Please enter a seed: 123456789123456789123456789123456789
[2988668238, 798946769, 2484899198, 1005350017, 2633831484, 343737596, 1402961265, 3184558744]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[431881030, 3789410928, 218849910, 879851040, 1423068736, 85390627, 3721593143, 198649564]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[702225118, 2293461530, 514808704, 2115883586, 3179647446, 3197133803, 3807436730, 1822195906]

For comparison, I also ran the following program:

from numpy.random import SeedSequence
seed = int(input('Please enter a seed: '))
seedseq = SeedSequence(entropy=seed, spawn_key=[], pool_size=8, n_children_spawned=0)
print([int(value) for value in seedseq.pool])

However, providing those same values to the above version of the program, which calls NumPy's SeedSequence directly, gives very different results:

Please enter a seed: 0
[2043904064, 467759482, 3940449851, 2747621207, 4006820188, 4161973813, 800317807, 2622167125]
Please enter a seed: 1
[476219752, 3923368624, 2653737542, 2876255837, 1861759290, 3300511046, 3253139541, 2224879358]
Please enter a seed: 123456789
[480462800, 1421661229, 2686834002, 3365909768, 3295673516, 1830753151, 1249963727, 3680881655]
Please enter a seed: 123456789123456789
[3112345096, 1618497203, 2864025213, 3262672577, 379697145, 163816190, 1265228116, 2568065655]
Please enter a seed: 123456789123456789123456789123456789
[2197723902, 2868273012, 1547285866, 2772382071, 2016971656, 1130152919, 897020445, 135618137]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[3230290517, 251217303, 1180998335, 454107561, 4150025399, 1840013050, 1216833737, 89665521]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[902839167, 3446715647, 2106916613, 1578536987, 595141342, 3126308643, 400300642, 3659109886]

What is going on here?
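
One hedged reading of the discrepancy (mine, assuming my reading of the Cython source is right): in NumPy's mix_entropy, hashmix(mixer[i_src], hash_const) hashes a copy of the source word and leaves mixer[i_src] unchanged, whereas the pure-Python loop above updates Pool[i_src] in place before mixing it into Pool[i_dst]. A sketch of the inner loop with the source left untouched:

for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            # Hash a copy of the source word; only hash_const advances.
            value = Pool[i_src] ^ hash_const
            hash_const = (hash_const * 0x931e8875) & 0xffffffff
            value = (value * hash_const) & 0xffffffff
            value ^= value >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * value) & 0xffffffff
            Pool[i_dst] = (x - y) & 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16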

Drawing the outermost contour of a set of data points without losing resolution

I have a set of data points (as scattered data points in black) and I want to draw their outermost contour. I have tried to calculate the convex hull of my points (code below) but I lose too much resolution and the shape loses its nuances.

# load usual stuff
from __future__ import print_function
import sys, os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import colors
from scipy.spatial import ConvexHull

# read input file
cbm_contour = sys.argv[1]


def parse_pdb_coords(file):
    f = open(file, "r")

    coords = np.empty(shape=[0, 3])

    while True:
        line = f.readline()
        if not line: break

        if line.split()[0] == "ATOM" or line.split()[0] == "HETATM":
            Xcoord = float(line[30:38])
            Ycoord = float(line[38:46])
            Zcoord = float(line[46:54])

            coords = np.append(coords, [[Xcoord, Ycoord, Zcoord]], axis=0)

    return coords

##########################################################################

plt.figure(figsize=(11, 10))

# parse input file
cbm_coords = parse_pdb_coords(cbm_contour)

# consider only x- and y-axis coordinates
flattened_points = cbm_coords[:, :2]
x = cbm_coords[:,0]
y = cbm_coords[:,1]

# Find the convex hull of the flattened points
hull = ConvexHull(flattened_points)

for simplex in hull.simplices:
    plt.plot(flattened_points[simplex, 0], flattened_points[simplex, 1], color='red', lw=2)

plt.scatter(cbm_coords[:,0], cbm_coords[:,1], s=1, c='black')

plt.xlabel('X-axis coordinate ($\mathrm{\AA} $)', size=16)
plt.ylabel('Y-axis distance ($\mathrm{\AA} $)', size=16)

plt.yticks(np.arange(-20, 24, 4),size=16)
plt.xticks(np.arange(-20, 24, 4),size=16)

plt.savefig("example.png", dpi=300, transparent=False)
plt.show()

Note that this can't be transformed to a 'minimal working example' due to the complexity of the data points, but my dataset can be downloaded here. The idea is to have a generalized solution for other datasets too.

Does anyone have a suggestion?

[figure: the scattered data points with the convex hull drawn in red]
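
A hedged alternative (mine, not from the post): an alpha shape (concave hull) follows the outline more closely than a convex hull. The sketch below keeps only Delaunay triangles whose circumradius is below 1/alpha and draws the edges that belong to exactly one kept triangle; it reuses the question's flattened_points and plt, and alpha is a tuning knob to adjust per dataset:

import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_edges(points, alpha):
    # Keep triangles with circumradius < 1/alpha; boundary edges are
    # those appearing in exactly one kept triangle.
    tri = Delaunay(points)
    edge_count = {}
    for ia, ib, ic in tri.simplices:
        pa, pb, pc = points[ia], points[ib], points[ic]
        a = np.linalg.norm(pb - pa)
        b = np.linalg.norm(pc - pb)
        c = np.linalg.norm(pa - pc)
        s = (a + b + c) / 2.0
        area = max(s * (s - a) * (s - b) * (s - c), 1e-12) ** 0.5
        if a * b * c / (4.0 * area) < 1.0 / alpha:
            for e in [(ia, ib), (ib, ic), (ic, ia)]:
                key = tuple(sorted(e))
                edge_count[key] = edge_count.get(key, 0) + 1
    return [e for e, n in edge_count.items() if n == 1]

for i, j in alpha_shape_edges(flattened_points, alpha=0.5):
    plt.plot(flattened_points[[i, j], 0], flattened_points[[i, j], 1],
             color='red', lw=2)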

Mapping 2d point (x,y) into grid index (row,col)

I have a particular Grid for robot navigation, shown below:

[figure: the navigation grid]

I have scaled this by a factor of 10 to create my Environment, which is given in Cartesian coordinates:

[figure: the scaled Environment in Cartesian coordinates]

I would like to map a point (x, y) in the Environment into a Grid index (row, col).

The obstacle format is given in the form:

[[(60.0, 0.0), (70.0, 0.0), (70.0, 10.0), (60.0, 10.0)],
 [(70.0, 0.0), (80.0, 0.0), (80.0, 10.0), (70.0, 10.0)],
 [(60.0, 10.0), (70.0, 10.0), (70.0, 20.0), (60.0, 20.0)],
 [(70.0, 10.0), (80.0, 10.0), (80.0, 20.0), (70.0, 20.0)],
 ...]

However, when I try to plot these using the following code:

fig, ax = plt.subplots()
for obs_corner in self.obstacle_vertices:
    ox = obs_corner[0][0]
    oy = obs_corner[0][1]
    w = self.planner.scaling_factor
    h = self.planner.scaling_factor
    ax.add_patch(patches.Rectangle((ox, oy), w, h, edgecolor='black', facecolor='black', fill=True))

The graph generated is:

[figure: the plotted obstacles]

I think I am making a basic mistake in mapping the matrix into a 2D environment.
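
A sketch of the mapping (mine, not from the post), assuming row 0 of the Grid is the top row while the Cartesian y-axis of the Environment points up, and each cell is scaling_factor units on a side:

def point_to_index(x, y, scaling_factor, n_rows):
    # Cartesian (x, y) -> grid (row, col); y is flipped because grid
    # rows count downward from the top.
    col = int(x // scaling_factor)
    row = n_rows - 1 - int(y // scaling_factor)
    return row, col

def index_to_point(row, col, scaling_factor, n_rows):
    # Inverse mapping: returns the lower-left corner of the cell.
    x = col * scaling_factor
    y = (n_rows - 1 - row) * scaling_factor
    return x, y

A likely source of the plotting confusion is exactly this flipped y-axis: matplotlib draws rectangles in Cartesian coordinates, while matrix indexing puts row 0 at the top.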

How to rearrange numbers in a 4x4 grid to all sum up to 0

I need to code a solution that checks whether the numbers of a 4x4 grid can be rearranged so that the columns, rows and diagonals all sum to 0. I've been trying to solve this question for a bit, but it's not working.

I looked online and the only answer I could see was the brute-force method, which would generate every possible arrangement. The only problem is that there are 16! permutations, i.e. about 21 trillion combinations, so I tried to flatten the grid, and that still didn't work.

Have a look at the code please:

import pandas as pd
import numpy as np
from itertools import permutations, islice

# Initial grid
grid = pd.DataFrame({
    'A': [5,-5,3,-2],
    'B': [1,-1,9,-10],
    'C': [-7,-4,7,4],
    'D': [-3,-8,2,8]
})

# there are like 16! possible combinations (20 trillion)
def is_valid_solution(df):
    # Check rows and columns sum to 0
    if not (df.sum(axis=0) == 0).all() or not (df.sum(axis=1) == 0).all():
        return False
    # Check diagonals sum to 0
    if df.values.trace() != 0 or np.fliplr(df.values).trace() != 0:
        return False
    return True

def rearrange_grid_bruteforce(df):
    values = df.values.flatten()
    # Use islice to limit the number of permutations evaluated
    for perm in islice(permutations(values), 100000):  # Just as an example, not feasible for the entire set
        new_grid = np.array(perm).reshape(df.shape)
        new_df = pd.DataFrame(new_grid, columns=df.columns)
        if is_valid_solution(new_df):
            return new_df
    return None

# Attempt to find a solution
# Note: This is for demonstration and will not compute all permutations due to computational constraints
solution = rearrange_grid_bruteforce(grid)

print('Working on it....')

if solution is not None:
    print("A solution was found:")
    print(solution)
else:
    print("No solution found within the limited permutation set.")

How do I parallelize a set of matrix multiplications

Consider the following operation, where I take 20 x 20 slices of a larger matrix and compute the dot product of a 10 x 20 matrix with each of them:

import numpy as np

a = np.random.rand(10, 20)
b = np.random.rand(20, 1000)

ans_list = []

for i in range(980):
    ans_list.append(
        np.dot(a, b[:, i:i+20])
    )

I know that NumPy parallelizes the actual matrix multiplication, but how do I parallelize the outer for loop so that the individual multiplications are run at the same time instead of sequentially?

Additionally, how would I go about it if I wanted to do the same using a GPU? Obviously, I'll use CuPy instead of NumPy, but how do I submit the multiple matrix multiplications to the GPU either simultaneously or asynchronously?


PS: Please note that the sliding windows above are an example to generate multiple matmuls. I know one solution (shown below) in this particular case is to use NumPy built-in sliding windows functionality, but I'm interested in knowing the optimal way to run an arbitrary set of matmuls in parallel (optionally on a GPU), and not just a faster solution for this particular example.

windows = np.lib.stride_tricks.sliding_window_view(b, (20, 20)).squeeze()
ans_list = np.dot(a, windows)
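
For what it's worth, a sketch (mine): once the operands are stacked into one batch, the @ operator broadcasts over the leading batch axis, so a single batched matmul runs all the products without a Python loop; on the GPU, the identical expression with CuPy arrays launches asynchronously on the default stream:

import numpy as np

a = np.random.rand(10, 20)
b = np.random.rand(20, 1000)

# (981, 20, 20) batch of windows; a (10, 20) broadcasts over the batch.
windows = np.lib.stride_tricks.sliding_window_view(b, (20, 20)).squeeze()
ans = a @ windows    # shape (981, 10, 20), one matmul per window

# GPU version (assumes CuPy is installed): kernel launches are
# asynchronous, so the host is not blocked per multiplication.
# import cupy as cp
# ans_gpu = cp.asarray(a) @ cp.asarray(windows)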

python printing each element twice

The code gives the first output correctly, but it does not work when the same element appears more than once. I am expecting this as output:

Group 1 : 60, 120
Group 2 : 30, 150
Ungrouped : 30 60

But I keep getting:

Group 1: 60, 120
Group 2: 150, 30
Group 3: 150, 30
Group 4: 60, 120
Ungrouped: [ 30 120  30  30  30   0] 

So far I have done this:

import numpy as np

def findGroups(money, fare):
    count = 0
    ungroup = np.zeros(len(money), dtype=int)
    for i in range(0, len(money)):
        for j in range(i + 1, len(money)):
            if money[i] + money[j] == fare:
                count += 1
                print(f'Group {count}: {money[i]}, {money[j]}')
            else:
                ungroup[i] = money[j]
    print(f"Ungrouped: {ungroup} ")

money = np.array([60, 150, 60, 30, 120, 30])
fare = 180
print(f'Task 3:')
findGroups(money, fare)  # This should print

# Group 1 : 60, 120
# Group 2 : 30, 150
# Ungrouped : 30 60
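
A sketch of one way to fix the double counting (mine, not from the post): mark elements as used once they join a group, so each value pairs at most once and the leftovers form the ungrouped set:

import numpy as np

def find_groups(money, fare):
    used = [False] * len(money)
    count = 0
    for i in range(len(money)):
        if used[i]:
            continue
        for j in range(i + 1, len(money)):
            if not used[j] and money[i] + money[j] == fare:
                used[i] = used[j] = True
                count += 1
                print(f'Group {count}: {money[i]}, {money[j]}')
                break
    ungrouped = [m for m, u in zip(money, used) if not u]
    print('Ungrouped:', *ungrouped)

find_groups(np.array([60, 150, 60, 30, 120, 30]), 180)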

Solve matrix and vector multiplication with parameters instead of values (preferably in python)

I want to look at some vector operations and see which matrix elements go into which vector entry, e.g. if I define a matrix with elements

mat = [["a11", "a12"], ["a21", "a22"]]

and a vector

vec = ["v1", "v2"]

then I'm looking for some module / library that gives me the result when I calculate the product:

res = mat*vec = ["a11"*"v1" + "a12"*"v2", "a21"*"v1" + "a22"*"v2"]

I know this is easy to do with numpy if all the parameters are actual numbers, and of course I could work this out by hand, but if the operation becomes more complex, it would be nice to have a way to automatically generate the resulting vector as a parametric equation.

Bonus points if the equation gets simplified, if e.g. the result has +"a11" - "a11" somewhere and reduces this to 0.

Is this at all possible to do in Python? Wolfram Alpha gets me what I'm looking for, but I also need some operations on input data, so a way to do this with a script would be great.
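
This is exactly what a computer algebra system does; here is a sketch with SymPy (my suggestion, not from the post), which also performs the requested simplification such as a11 - a11 -> 0:

import sympy as sp

# Symbolic matrix and vector; * performs the symbolic product.
a11, a12, a21, a22, v1, v2 = sp.symbols('a11 a12 a21 a22 v1 v2')
mat = sp.Matrix([[a11, a12], [a21, a22]])
vec = sp.Matrix([v1, v2])

res = mat * vec
print(res)  # Matrix([[a11*v1 + a12*v2], [a21*v1 + a22*v2]])

# Cancellation happens automatically in symbolic arithmetic:
print(sp.simplify(res[0] + a11 - a11))  # a11*v1 + a12*v2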

โŒ
โŒ