โŒ

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Have a trajectory optimization problem posed as NLP. How to solve it with Gekko? Currently facing "Exception: @error: Max Equation Length" error

I have a trajectory optimization problem posed as a nonlinear program (NLP). The states x and controls u for each time step are concatenated into a single vector. By using indexing to extract the appropriate x's and u's, I want to add constraints for the dynamics and also create an objective function that is the sum of x'Qx + u'Ru over the time steps. I also have an initial condition and a goal condition to be set as constraints.

At the moment, the following is my code. I have created an array Z, but to enforce the initial and goal conditions I had to go element by element. Even though it works, is there a better way to do this? (A vectorized alternative is sketched after the code.)

import numpy as np
from gekko import GEKKO

nx = 13
nu = 4
N = int(self.tf/self.h) + 1

Q = np.diag(np.ones(nx))
R = 0.1*np.diag(np.ones(nu))
Qf = 10*np.diag(np.ones(nx))

# Initialize model
m = GEKKO(remote=False)
# Set global options
m.options.IMODE = 3   # steady-state optimization
m.options.SOLVER = 3  # IPOPT
Z = m.Array(m.Var, self.nz, value=0, lb=-1e5, ub=1e5)

# Set lower bounds on the controls: U_lb = 0
for i in np.arange(0, N-1):
    for j in self.u_index[:, i]:
        Z[j].lower = 0

# Initial condition, element by element
for count, ele in enumerate(self.x_index[:, 0]):
    print(Z[ele], xic[count])
    m.Equation(Z[ele] == xic[count])

# Goal condition, element by element
for count, ele in enumerate(self.x_index[:, N-1]):
    print(Z[ele], xgoal[count])
    m.Equation(Z[ele] == xgoal[count])

# Objective: sum of stage costs
J = []
for i in np.arange(0, N-1):
    x = Z[self.x_index[:, i]]
    u = Z[self.u_index[:, i]]
    J.append(0.5*(x - xgoal).T @ Q @ (x - xgoal) + 0.5*u.T @ R @ u)

# Solve
m.Minimize(J)
m.solve()
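
For reference, a vectorized form of the two boundary-condition loops would look roughly like this (an untested sketch; m.Equations accepts a list of equation objects, and the indexing assumptions are the same as above):

# Sketch: build the boundary-condition equations with list comprehensions
# instead of explicit loops (same x_index / xic / xgoal layout as above).
m.Equations([Z[ele] == xic[count]
             for count, ele in enumerate(self.x_index[:, 0])])
m.Equations([Z[ele] == xgoal[count]
             for count, ele in enumerate(self.x_index[:, N-1])])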

My cost function is currently the main issue, as it causes the error Exception: @error: Max Equation Length. Any suggestions on how to tackle this? I am aware there is another method for solving optimal control problems with Gekko, but I would like to try to solve it as an NLP.
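
One possible workaround would be to keep each time step's cost as a separate objective term instead of building one long expression: Gekko sums everything passed to m.Minimize(), and m.sum() builds a summation object rather than a single very long equation. Below is a rough sketch using the same names as above, assuming the diagonal Q and R defined earlier; I have not yet verified that it stays under the equation-length limit.

# Sketch: add the stage cost term by term. Gekko adds together all terms
# passed to m.Minimize(), and m.sum() avoids one huge symbolic expression.
# Assumes Q and R are diagonal, as defined above.
for i in np.arange(0, N-1):
    x = Z[self.x_index[:, i]]
    u = Z[self.u_index[:, i]]
    m.Minimize(0.5*m.sum([(x[j] - xgoal[j])*Q[j, j]*(x[j] - xgoal[j])
                          for j in range(nx)]))
    m.Minimize(0.5*m.sum([u[j]*R[j, j]*u[j] for j in range(nu)]))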

MySQL Database Performance Decimated After Server Migration

I have a CRM that I've built over the past couple of years that has been working fine. The database is MySQL, and the front end is a web interface. This week, we 'upgraded' our hosting plan from a shared server to a cloud server, and immediately we saw a complete collapse of performance on some queries. Routine searches that used to take under a second started taking several minutes and causing the site to crash. I feel like the company is gaslighting me, telling me that it's an issue with needing to optimize my queries, and not addressing the fact that this drop in performance happened right after they moved our website and database to a different server.

Here is an example query that runs from one of our most frequently accessed pages.

SELECT A.ID, H.Member_Type, H.Member_Subtype, L.Current_Status AS Job_Status
FROM Member_ID A 
LEFT JOIN Member_ID_Status A2 ON A.First_Status = A2.ID 
LEFT JOIN Member_Name B ON A.MemberID = B.MemberID AND B.Current_Status = 'Active' AND B.Preferred = 'Yes' 
LEFT JOIN Member_Election_Districts ET ON A.MemberID = ET.MemberID 
LEFT JOIN Member_Addresses C ON A.MemberID = C.MemberID AND C.Current_Status = 'Active' AND C.Preferred = 'Yes' 
LEFT JOIN Member_Phones D ON A.MemberID = D.MemberID AND D.Current_Status = 'Active' AND D.Preferred = 'Yes' 
LEFT JOIN Member_Emails F ON A.MemberID = F.MemberID AND F.Current_Status = 'Active' AND F.Preferred = 'Yes' 
LEFT JOIN Member_Current_Parameters H ON A.MemberID = H.MemberID 
LEFT JOIN Member_SQ HH ON A.MemberID = HH.MemberID AND HH.SQID='0' AND HH.Current='Yes' 
LEFT JOIN Member_Language I ON A.MemberID = I.MemberID AND I.Current_Status = 'Active' AND I.Preferred = 'Yes' 
LEFT JOIN Member_Leaders J ON A.MemberID = J.MemberID AND J.Current_Status = 'Active' 
LEFT JOIN COPE_ID K ON A.MemberID = K.MemberID AND K.Current_Status = 'Active' 
LEFT JOIN Job_ID L ON A.MemberID = L.MemberID 
LEFT JOIN Job_Location MM ON L.JobID = MM.JobID AND MM.Current_Status = 'Active' 
LEFT JOIN WS_Work_Area MMM ON MM.Worksite_Area = MMM.ID 
LEFT JOIN Job_Start_Date N ON L.JobID = N.JobID AND N.Current_Status = 'Active' 
LEFT JOIN Job_Termination_Date NN ON L.JobID = NN.JobID AND NN.Current_Status = 'Active' 
LEFT JOIN Job_Type O ON L.JobID = O.JobID AND O.Current_Status = 'Active' 
LEFT JOIN Job_Classification P ON L.JobID = P.JobID AND P.Current_Status = 'Active' 
LEFT JOIN Job_EmployeeID PP ON L.JobID = PP.JobID AND PP.Current_Status = 'Active' 
LEFT JOIN Job_Wage PPP ON L.JobID = PPP.JobID AND PPP.Current_Status = 'Active' 
LEFT JOIN Job_Employer_Division Q ON L.JobID = Q.JobID AND Q.Current_Status = 'Active' 
LEFT JOIN Employer_List R on Q.EmpID = R.ID 
LEFT JOIN WS_Name S ON MM.WS_ID = S.WS_ID AND S.Current_Status = 'Active' AND S.Preferred = 'Yes' 
LEFT JOIN WS_Address T ON MM.WS_ID = T.WS_ID AND T.Current_Status = 'Active' AND T.Preferred = 'Yes' 
LEFT JOIN Job_Shift U ON L.JobID = U.JobID AND U.Current_Status = 'Active' 
LEFT JOIN Job_Weekly_Hours WH ON L.JobID = WH.JobID AND WH.Current_Status = 'Active' 
LEFT JOIN Dues_Billing DB ON L.JobID = DB.JobID 
LEFT JOIN COPE_Billing CB ON L.JobID = CB.JobID 
LEFT JOIN Member_Early_Vote MEV ON A.MemberID = MEV.MemberID AND MEV.Election = '' 
LEFT JOIN Contracts_2_Emp_WS X ON Q.EmpID = X.EmpID AND Q.Division = X.Division AND MM.WS_ID = X.WS_ID AND X.Current_Status = 'Active' 
WHERE 1=1 
AND A.Current_Status = 'Active' 
AND Q.EmpID = '2' 
GROUP BY A.MemberID, L.JobID 
HAVING 1=1 AND Job_Status = 'Active';

And here is the EXPLAIN for this query; it executes in 0.575 sec, in spite of joining about 30 tables together.

[Screenshot: EXPLAIN output for Query 1]

One of the returned columns in this query is H.Member_Type. I already have a couple of WHERE conditions on the above query, but if I also add a WHERE condition on H.Member_Type (adding AND H.Member_Type = 'Member' right before the GROUP BY clause), the query suddenly takes between 5 and 20 minutes to complete. I should note that the column I'm adding the WHERE condition to in this second query is indexed. Here is the EXPLAIN for this query.

[Screenshot: EXPLAIN output for Query 2]

Adding to my confusion, if I instead add a WHERE condition on another column, something like First_Name, which is not indexed, the query is only a little bit slower (in that case, it runs in about 2 seconds).

My database does have a lot of tables, and most of my queries pull from several tables at once, but all join columns are indexed, the joins are mostly 1:1, and we've never had performance issues before. I also display query times on screen for my own CRM login, so every day I see the timings on these kinds of searches; the worst I'll see on a complex query with lots of conditions is maybe 3 or 4 seconds, and simple queries like these never take more than a second.

I've tried rebuilding indexes on some of the tables that seemed to be causing problems. That's had no noticeable impact so far.

I've been testing peeling away some of the joins from my query, just to see whether I can tell if any other specific tables are causing a problem. If I remove enough of the joins I can eventually get the query to run quickly with the WHERE condition on Member_Type, but it's inconsistent.

I've mostly been fighting with our hosting company, as I'm convinced that they did something wrong in the migration, or that the server they moved us onto is the problem. But even if the server were under-resourced, it doesn't make sense to me that a query with identical joins would go from 0.5 seconds to over 17 minutes (as in the screenshot above) simply by adding a basic WHERE condition. My problem is that I don't know what the potential causes could be, so I don't know what to ask them to check, and they're acting like this is all just a matter of overly complex queries on my part. I feel like I'm going insane.

ADDITIONAL INFO BASED ON COMMENTS/QUESTIONS

Sorry for burying some of these in the comments below; here is some of the additional info requested:

Both the old shared server and the new cloud server run MySQL 8.0, and they informed me that the MySQL configurations are unified and centrally managed, so there should have been no changes in MySQL configuration. The shared server, they are now telling me, ran on 40 CPU cores with 72GB of RAM, while our cloud server has 4 cores and 8GB of RAM with 40GB of SSD space. I asked before migrating to make sure that I wouldn't see a slowdown, and they told me that it would, in fact, be faster (apparently I shouldn't have taken their word for that!). The web server and the DB server are on the same VM.

The 'GROUP BY' does not reduce the resulting rows by much. 81 rows without the GROUP BY statement, 79 rows with.

The dataset should fit in the buffer pool without any problem - to highlight the issue here, I deliberately chose a query that was already narrowed down to a fairly small number of returned rows. Many other queries on our site are still working fine - ones returning many thousands of rows and including a lot of nested queries, wildcard search criteria, etc. We don't have tables with millions of records; we have one or two tables with a few hundred thousand records, but none of the tables involved in this query has more than 50,000 rows.

UPDATE

I made some changes to my database structure and got this query working, but my underlying concerns still remain.

In my previous structure, each piece of job data lives in its own table (wage, start date, etc.). This makes it less costly to maintain a full history of updates and changes: adding a new value for a field only requires a few columns of data in one table rather than duplicating all of the data for an entire job, and it also lets us store status-update data for every change of information. The main table that tracks jobs (Job_ID) just holds the JobID number, which is used to pull all of the relevant data for a job when we search for it. These queries have always been very quick.

What I did, now that I'm faced with the limitations of the new server, is add another dozen columns to the Job_ID table to hold the current active values for all data related to the job, and I updated the queries that are called for job updates so they not only change the values in the individual tables but also update each piece of information on the relevant row of Job_ID. I kind of hate it, because it means I'm storing data in that table that also exists elsewhere, which feels like an unnecessary duplicate use of space, but it did get the problematic query down from 5+ minutes to 0.2 seconds, which is a win, I guess.

What still concerns me is whether the reduced cores and RAM on the cloud server can really explain the difference between the original query running in under a second and taking more than 5 minutes once another specific WHERE condition is added. It seems like, if the new setup were going to be slower, it would be more uniformly slow, as opposed to running some queries fine and then collapsing on specific ones. So my concern is that maybe I've sidestepped an issue with my 'fix' here, but that issue is still there and could rear its head again on another query/table/process.

Similar to a coin change problem, but with "coin" repetitions and another optimization goal

The goal is to get all possible variants of sequences of positive integers from a list L, where each number can be used unlimited often (so repetitions are allowed), which add up to a sum targetSum, with the constraint that the number of items used in the generated sequence lies in the range from n to m (including n and m).

The code below is what I have come up with so far, but it runs too slowly for the target purpose of being part of an optimization problem:

def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum(L, targetSum, n=None, m=None):
    if n is None:
        n = 1
    if m is None:
        m = targetSum
    lenL = len(L)
    # print(f"{targetSum=}, {L=}, {n=}, {m=}")
    Stack = []
    # Initialize the stack with the starting point for each element in L
    for i in range(lenL):
        currentSum = L[i]
        path = [L[i]]
        start = 0  # starting from 0 allows revisiting all items
        Stack.append((currentSum, path, start))

    while Stack:
        currentSum, path, start = Stack.pop()
        # Check if the current path meets the criteria
        if currentSum == targetSum and n <= len(path) <= m:
            yield path
        if currentSum > targetSum or len(path) > m:
            continue  # NEXT please: stop exploring this path as it's not valid or complete

        # Continue to build the path by adding elements from L, starting from index 0
        for i in range(lenL):  # change start to 0 if you want to include all permutations
            newSum = currentSum + L[i]
            newPath = path + [L[i]]
            Stack.append((newSum, newPath, 0))  # starting from 0 allows every possibility
# def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum
splitsGenerator = allArrangementsOfIntegerItemsInLsummingUpTOtargetSum

Any idea how to write code that can arrive at the result an order of magnitude faster?

I have already searched the Internet for weeks and found that the known knapsack, coin change and dynamic programming based optimization approaches do not cover such a basic task. A special case of it is dividing a list into partitions with sizes ranging from a to b, for the purpose of optimizing an overall weight function that combines values obtained from a local weight function calculating single weights from the items in each of the partitions.

To give an example, let's consider the case of the list L = [ 13, 17, 23 ] and targetSum = 30, with n=1 and m=30. There are two possible sequences that arrive at a sum of 30:

  • [ 13, 17 ]
  • [ 17, 13 ]

Let's also consider a list L = [ 125, 126, 127, ... , 130 ] and a targetSum = 6000. One possible sequence here is:

  • [ 125, 125, ... , 125 ] with a length of 48 elements

What I am interested in is another algorithm that arrives at the results much faster, so the programming language is secondary, but I prefer Python for describing the algorithm and for testing it against the provided code.
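
Below is a rough, untuned sketch of the direction I have in mind: a depth-first search that, before extending a partial sequence, uses min(L), max(L) and the number of items still allowed to check whether targetSum can still be reached, so dead branches are cut early. The name fasterArrangements is just a placeholder, and I have only checked it against the small examples above.

def fasterArrangements(L, targetSum, n=None, m=None):
    """Sketch: DFS with bound-based pruning (placeholder name).

    Assumes all values in L are positive integers. Yields every ordered
    sequence of elements of L (repetitions allowed) that sums to targetSum
    and whose length lies between n and m.
    """
    if n is None:
        n = 1
    if m is None:
        m = targetSum
    loVal, hiVal = min(L), max(L)

    def recurse(remaining, path):
        if remaining == 0:
            if n <= len(path):
                yield list(path)
            return
        slotsLeft = m - len(path)
        # Prune: no slots left, the largest value in every remaining slot
        # still cannot reach the target, or the smallest value overshoots.
        if slotsLeft <= 0 or remaining > slotsLeft*hiVal or remaining < loVal:
            return
        for v in L:
            if v <= remaining:
                path.append(v)
                yield from recurse(remaining - v, path)
                path.pop()

    yield from recurse(targetSum, [])

For cases like L = [ 125, ... , 130 ] the number of distinct orderings can still blow up, so if only counts or the underlying multisets are needed, a memoized DP over (remaining, itemsUsed) would be the next step; that part is not written yet.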

Optimize PostgreSQL query to insert users into the database

The use case goes like this:

  • There is a signup endpoint on which the user submits their email and password.
  • 4 rows have to be created at once, with a couple of ids generated.

The tables look like this:

authentication_types

| id | name |
| --- | --- |
| uuid | varchar |
| primary key | |
| 0aa4d9a9-e024-4792-bc41-36a4f3528d36 | password |

accounts

| id | email | password | ...few other columns | authentication_type_id |
| --- | --- | --- | --- | --- |
| uuid | varchar | varchar | | uuid |
| primary key | unique | | | foreign key to authentication_types |
| 7a9d912a-69ab-4615-9058-e1bb1c4e36c5 | password | ... | ... | 0aa4d9a9-e024-4792-bc41-36a4f3528d36 |

users

| id | enabled |
| --- | --- |
| uuid | boolean |
| primary key | |
| fc9ca826-63dc-43b8-97b6-2e949ffd8a30 | true |

user_accounts

| id | account_id | user_id |
| --- | --- | --- |
| uuid | uuid | uuid |
| primary key | foreign key to accounts | foreign key to users |
| bd4b338f-1b5a-4b24-9908-e5cfb4080dd4 | 7a9d912a-69ab-4615-9058-e1bb1c4e36c5 | fc9ca826-63dc-43b8-97b6-2e949ffd8a30 |

verification_tokens

| id | expires | token | account_id |
| --- | --- | --- | --- |
| uuid | timestamptz | varchar | uuid |
| primary key | | unique | foreign key to accounts |
| 865a6389-67ea-4e38-a48f-9f4b60ffe816 | ... | ... | 7a9d912a-69ab-4615-9058-e1bb1c4e36c5 |

When a new signup happens, I want to do the following things:

  • Generate a uuid and insert a row into accounts
  • Generate a uuid and insert a row into users
  • Generate a uuid + Take the id of the inserted row from accounts and users and insert into user_accounts
  • Generate a uuid + token + account id generated above and insert into verification_tokens

This is the query I came up with after a lot of thought: get the authentication type id for, say, password, and run a chain of CTEs with some uuid generation at every step. Is there a way to optimize this any further?

WITH account_data(id, email, password, authentication_type_id) AS (
      VALUES( gen_random_uuid()
             ,:email
             ,:password
             ,(SELECT id 
               FROM authentication_types 
               WHERE name = :authenticationTypeName) ) )
,ins1(user_id) AS (
      INSERT INTO users(id, enabled) 
      VALUES( gen_random_uuid()
             ,true)
      RETURNING id AS user_id )
,ins2(account_id) AS (
      INSERT INTO accounts (id, email, password, authentication_type_id) 
      SELECT id
            ,email
            ,password
            ,authentication_type_id 
      FROM account_data 
      RETURNING id AS account_id )
,ins3 AS (
      INSERT INTO user_accounts (id, account_id, user_id) 
      VALUES( gen_random_uuid()
             ,(SELECT account_id 
               FROM ins2)
             ,(SELECT user_id 
               FROM ins1)       ) )
INSERT INTO verification_tokens (id, token, account_id) 
VALUES( gen_random_uuid()
       ,:token
       ,(SELECT account_id 
         FROM   ins2)     ) 
RETURNING (SELECT account_id FROM ins2) AS id

If there is a faster way to execute the above query, I would be glad to hear it. Thank you in advance for your help.
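
For reference, here is a minimal sketch of calling this as a single round trip from Python, assuming psycopg2 (the actual application may use a different driver; signup.sql is a hypothetical file holding the query above with the :name placeholders rewritten to %(name)s style, and the connection string is made up):

import psycopg2  # assumed driver

# Hypothetical file containing the CTE statement above, with :email etc.
# rewritten as %(email)s for psycopg2's named-parameter style.
SIGNUP_SQL = open("signup.sql").read()

def signup(conn, email, password_hash, token):
    # One execute call: the whole CTE chain runs as a single statement,
    # i.e. one round trip and one implicit transaction.
    with conn, conn.cursor() as cur:
        cur.execute(SIGNUP_SQL, {
            "email": email,
            "password": password_hash,
            "authenticationTypeName": "password",
            "token": token,
        })
        return cur.fetchone()[0]  # value of RETURNING ... AS id

# Usage (hypothetical DSN):
# conn = psycopg2.connect("dbname=app user=app")
# account_id = signup(conn, "someone@example.com", hashed_password, token)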

Is it possible for page faults to happen during hardware speculation, and if not, do I need a compiler barrier here?

Suppose we have the following code sequence in C:

if (s->a != 1)
   return;
// do we need a compiler barrier here
// to make sure the compiler does not
// reorder the access of s->b across s->a?
if (s->b != 2)
   return;
do_sth();

In this case, one thing that is for sure is that s->b is only safe to access when the condition on s->a holds (which is possible with -fno-strict-aliasing, as I said in the comments). So my questions are:

  1. Could a modern CPU speculatively load s->b first, before the check on s->a?
  2. If it does, how does it typically handle a page fault on that speculative access? By rolling back the speculation?

Furthermore, from the compiler's POV,

  1. Might it reorder the two if statements here?
  2. If so, is it necessary to add a barrier() between these two if statements, or do some compilers already have guidelines for adding it implicitly in some cases?
โŒ
โŒ