โŒ

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

MySQL Database Performance Decimated After Server Migration

I have a CRM that I've built over the past couple of years that has been working fine. The database is MySQL, and the front end is a web interface. This week, we 'upgraded' our hosting plan from a shared server to a cloud server, and immediately we saw a complete collapse of performance on some queries. Routine searches that used to take under a second started taking several minutes and causing the site to crash. I feel like the company is gaslighting me, telling me that it's an issue with needing to optimize my queries, and not addressing the fact that this drop in performance happened right after they moved our website and database to a different server.

Here is an example query that runs from one of our most frequently accessed pages.

SELECT A.ID, H.Member_Type, H.Member_Subtype, L.Current_Status AS Job_Status
FROM Member_ID A 
LEFT JOIN Member_ID_Status A2 ON A.First_Status = A2.ID 
LEFT JOIN Member_Name B ON A.MemberID = B.MemberID AND B.Current_Status = 'Active' AND B.Preferred = 'Yes' 
LEFT JOIN Member_Election_Districts ET ON A.MemberID = ET.MemberID 
LEFT JOIN Member_Addresses C ON A.MemberID = C.MemberID AND C.Current_Status = 'Active' AND C.Preferred = 'Yes' 
LEFT JOIN Member_Phones D ON A.MemberID = D.MemberID AND D.Current_Status = 'Active' AND D.Preferred = 'Yes' 
LEFT JOIN Member_Emails F ON A.MemberID = F.MemberID AND F.Current_Status = 'Active' AND F.Preferred = 'Yes' 
LEFT JOIN Member_Current_Parameters H ON A.MemberID = H.MemberID 
LEFT JOIN Member_SQ HH ON A.MemberID = HH.MemberID AND HH.SQID='0' AND HH.Current='Yes' 
LEFT JOIN Member_Language I ON A.MemberID = I.MemberID AND I.Current_Status = 'Active' AND I.Preferred = 'Yes' 
LEFT JOIN Member_Leaders J ON A.MemberID = J.MemberID AND J.Current_Status = 'Active' 
LEFT JOIN COPE_ID K ON A.MemberID = K.MemberID AND K.Current_Status = 'Active' 
LEFT JOIN Job_ID L ON A.MemberID = L.MemberID 
LEFT JOIN Job_Location MM ON L.JobID = MM.JobID AND MM.Current_Status = 'Active' 
LEFT JOIN WS_Work_Area MMM ON MM.Worksite_Area = MMM.ID 
LEFT JOIN Job_Start_Date N ON L.JobID = N.JobID AND N.Current_Status = 'Active' 
LEFT JOIN Job_Termination_Date NN ON L.JobID = NN.JobID AND NN.Current_Status = 'Active' 
LEFT JOIN Job_Type O ON L.JobID = O.JobID AND O.Current_Status = 'Active' 
LEFT JOIN Job_Classification P ON L.JobID = P.JobID AND P.Current_Status = 'Active' 
LEFT JOIN Job_EmployeeID PP ON L.JobID = PP.JobID AND PP.Current_Status = 'Active' 
LEFT JOIN Job_Wage PPP ON L.JobID = PPP.JobID AND PPP.Current_Status = 'Active' 
LEFT JOIN Job_Employer_Division Q ON L.JobID = Q.JobID AND Q.Current_Status = 'Active' 
LEFT JOIN Employer_List R on Q.EmpID = R.ID 
LEFT JOIN WS_Name S ON MM.WS_ID = S.WS_ID AND S.Current_Status = 'Active' AND S.Preferred = 'Yes' 
LEFT JOIN WS_Address T ON MM.WS_ID = T.WS_ID AND T.Current_Status = 'Active' AND T.Preferred = 'Yes' 
LEFT JOIN Job_Shift U ON L.JobID = U.JobID AND U.Current_Status = 'Active' 
LEFT JOIN Job_Weekly_Hours WH ON L.JobID = WH.JobID AND WH.Current_Status = 'Active' 
LEFT JOIN Dues_Billing DB ON L.JobID = DB.JobID 
LEFT JOIN COPE_Billing CB ON L.JobID = CB.JobID 
LEFT JOIN Member_Early_Vote MEV ON A.MemberID = MEV.MemberID AND MEV.Election = '' 
LEFT JOIN Contracts_2_Emp_WS X ON Q.EmpID = X.EmpID AND Q.Division = X.Division AND MM.WS_ID = X.WS_ID AND X.Current_Status = 'Active' 
WHERE 1=1 
AND A.Current_Status = 'Active' 
AND Q.EmpID = '2' 
GROUP BY A.MemberID, L.JobID 
HAVING 1=1 AND Job_Status = 'Active';

Here is the EXPLAIN for this query. It executes in 0.575 sec, in spite of joining about 30 tables together.

EXPLAIN Query 1

One of the returned columns in this query is H.Member_Type. I already have a couple of WHERE conditions on the query above, but if I also add a WHERE condition on H.Member_Type (adding "AND H.Member_Type = 'Member'" right before the GROUP BY clause), the query suddenly takes between 5 and 20 minutes to complete. I should note that the column this second query filters on is indexed.
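For clarity, the tail of the slow variant looks like this; only the added condition differs from the query above:

-- (all joins as in the query above)
WHERE 1=1 
AND A.Current_Status = 'Active' 
AND Q.EmpID = '2' 
AND H.Member_Type = 'Member' -- the added condition
GROUP BY A.MemberID, L.JobID 
HAVING 1=1 AND Job_Status = 'Active';

Here is the EXPLAIN for this second query.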

EXPLAIN Query 2

Adding to my confusion, if I instead add a WHERE condition on another column, something like 'First_Name', which is not indexed, the query is only a little bit slower (in that case, it runs in about 2 seconds).

My database does have a lot of tables, and most of my queries do pull from several tables at once, but all join columns are indexed, the joins are mostly 1:1, and we've never had performance issues before. For my own login on our CRM, I display query times on the screen, so every day I'm seeing query times on these types of searches; the worst I'll see on a complex query with lots of conditions is maybe 3 or 4 seconds, but simple queries like these never take more than a second.

I've tried rebuilding indexes on some of the tables that seemed to be causing problems. That's had no noticeable impact so far.

I've been peeling away some of the joins from my query to see whether any other specific tables are causing a problem. If I pull enough of the joins away I can eventually get a query to run quickly with the WHERE condition on Member_Type, but it's inconsistent.

I've mostly been fighting with our hosting company, as I'm convinced that they did something wrong in the migration, or that the server they moved us onto is the problem. But even if the server were under-resourced, it doesn't make sense to me that a query with identical joins would go from 0.5 seconds to over 17 minutes in the screenshot above simply by adding a basic WHERE condition. My problem is I don't know what the potential causes could be, so I don't know what to ask them to check, and they're acting like this is all just a matter of overly complex queries on my part. I feel like I'm going insane.

ADDITIONAL INFO BASED ON COMMENTS/QUESTIONS

Sorry for burying some of these in the comments below; here is the additional info that was requested:

Both the old shared server and the new cloud server run MySQL 8.0, and they informed me that the MySQL configurations are unified and centrally managed, so there should have been no changes in MySQL configuration. The shared server, they are now telling me, ran on 40 CPU cores with 72GB of RAM; our cloud server has 4 cores and 8GB of RAM with 40GB of SSD space. I asked before migrating to make sure that I wouldn't see a slowdown, and they told me it would, in fact, be faster (apparently I shouldn't have taken their word for that!). The web server and DB server are on the same VM.

The 'GROUP BY' does not reduce the resulting rows by much. 81 rows without the GROUP BY statement, 79 rows with.

The dataset should fit in the buffer pool without any problem - to highlight the issue here, I deliberately chose a query that was already narrowed down to a fairly small number of returned rows. Many other queries on our site are still working fine - ones returning many thousands of rows and including a lot of nested queries, wildcard search criteria, etc. We don't have tables with millions of records; we have one or two tables with a few hundred thousand records, but none of the tables involved in this query has over 50,000.
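One straightforward check on the new server (standard MySQL commands) is to compare the configured buffer pool size against those table sizes and watch whether reads are actually going to disk:

-- How large is the InnoDB buffer pool on the new server?
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- How often are pages being read from disk instead of from the pool?
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';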

UPDATE

I made some changes to my database structure and got this query working, but my underlying concerns still remain.

In my previous structure, each piece of job data lives in its own table (wage, start date, etc.). This makes it less costly to maintain a full history of updates and changes: adding a new value for a field only requires a few columns of data in one table rather than duplicating all of the data for an entire job, and it also lets us store status update data for every change in information. The main table that tracks jobs (Job_ID) just has the JobID #, which is used to pull all of the relevant data for that job when we search for it. The queries have always been very quick.

What I did, now that I'm faced with the limitations of the new server, is add another dozen columns to the Job_ID table to hold the current active values for all data relative to the job, and update the related queries that are called for job updates to not only change the values in the individual tables but also update each piece of information on the relevant job in the Job_ID table. I kinda hate it because it means I'm storing data in that table that also exists elsewhere, which feels like an unnecessary duplicate use of space, but it did get the problematic query down from 5+ minutes to 0.2 seconds, which is a win I guess.
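As a rough sketch of what that change looks like (the column names here are made up for illustration, not my real schema):

-- Hypothetical illustration of the denormalization described above
ALTER TABLE Job_ID
    ADD COLUMN Current_Wage DECIMAL(10,2) NULL,
    ADD COLUMN Current_Start_Date DATE NULL,
    ADD COLUMN Current_Job_Type VARCHAR(50) NULL;

-- Whenever a detail table gets a new 'Active' row, the matching column
-- on Job_ID is refreshed in the same code path, e.g.:
UPDATE Job_ID SET Current_Wage = 25.00 WHERE JobID = 12345;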

What still concerns me is whether the reduced cores and RAM on the cloud server can really explain the difference between running the original query I highlighted in under a second and taking >5 min when adding one more specific WHERE condition. It seems like, if the new setup were going to be slower, it would be more universally slow, as opposed to running some queries fine and then collapsing on other specific ones. So my concern is that maybe I've sidestepped an issue with my 'fix' here, but that issue is still there and could rear its head again on another query/table/process.

Optimize PostgreSQL query to insert users into the database

  • The use case goes like this.
  • There is a signup endpoint on which the user will submit their email and password
  • 4 rows have to be created at once with a couple of ids generated
  • The tables look like this

authentication_types

  • id: uuid, primary key
  • name: varchar

  Example row: 0aa4d9a9-e024-4792-bc41-36a4f3528d36 | password

accounts

  • id: uuid, primary key
  • email: varchar, unique
  • password: varchar
  • ...few other columns
  • authentication_type_id: uuid, foreign key to authentication_types

  Example row: 7a9d912a-69ab-4615-9058-e1bb1c4e36c5 | password | ... | ... | 0aa4d9a9-e024-4792-bc41-36a4f3528d36

users

  • id: uuid, primary key
  • enabled: boolean

  Example row: fc9ca826-63dc-43b8-97b6-2e949ffd8a30 | true

user_accounts

  • id: uuid, primary key
  • account_id: uuid, foreign key to accounts
  • user_id: uuid, foreign key to users

  Example row: bd4b338f-1b5a-4b24-9908-e5cfb4080dd4 | 7a9d912a-69ab-4615-9058-e1bb1c4e36c5 | fc9ca826-63dc-43b8-97b6-2e949ffd8a30

verification_tokens

  • id: uuid, primary key
  • expires: timestamptz
  • token: varchar, unique
  • account_id: uuid, foreign key to accounts

  Example row: 865a6389-67ea-4e38-a48f-9f4b60ffe816 | ... | ... | 7a9d912a-69ab-4615-9058-e1bb1c4e36c5

When a new signup happens I want to do the following things

  • Generate a uuid and insert a row into accounts
  • Generate a uuid and insert a row into users
  • Generate a uuid + Take the id of the inserted row from accounts and users and insert into user_accounts
  • Generate a uuid + token + account id generated above and insert into verification_tokens

This is the query I came up with after a lot of thought: get the authentication type id for, say, password, and run a CTE with some uuid generation at every step. Is there a way to optimize this any further?

WITH account_data(id, email, password, authentication_type_id) AS (
      VALUES( gen_random_uuid()
             ,:email
             ,:password
             ,(SELECT id 
               FROM authentication_types 
               WHERE name = :authenticationTypeName) ) )
,ins1(user_id) AS (
      INSERT INTO users(id, enabled) 
      VALUES( gen_random_uuid()
             ,true)
      RETURNING id AS user_id )
,ins2(account_id) AS (
      INSERT INTO accounts (id, email, password, authentication_type_id) 
      SELECT id
            ,email
            ,password
            ,authentication_type_id 
      FROM account_data 
      RETURNING id AS account_id )
,ins3 AS (
      INSERT INTO user_accounts (id, account_id, user_id) 
      VALUES( gen_random_uuid()
             ,(SELECT account_id 
               FROM ins2)
             ,(SELECT user_id 
               FROM ins1)       ) )
INSERT INTO verification_tokens (id, token, account_id) 
VALUES( gen_random_uuid()
       ,:token
       ,(SELECT account_id 
         FROM   ins2)     ) 
RETURNING (SELECT account_id FROM ins2) AS id

If there is a faster way to execute the above query, I would be glad to hear it. Thanks in advance for your help.

How to lower LCP loading time

I hope you're well.

I am trying to reduce my LCP time on mobile devices. Google PageSpeed Insights reports an LCP of 3.7 seconds on mobile, and WebPageTest says around 2.2 s. On desktop the website fully loads in around 1 s.

The website is hosted on Hostinger's "HOSTING PRO" plan. I have tried everything I can think of with no luck. I have WP Rocket installed along with Cloudflare.

This is the first website I have created, and I have no previous coding experience. The website was built purely from YouTube tutorials and a lot of trial and error.

We're currently running Facebook ads, and the long loading time has started to affect our landing page views because the website doesn't load fast enough on mobile.

I have attached a waterfall to give you a better idea of what's going on.

I can see that the fonts are loaded quite early and take around 500 ms to load; is there a way to reduce this?

Any feedback on how I can reduce the LCP would be very much appreciated!

Link to full report: https://www.webpagetest.org/result/240422_BiDc4F_CXD/

I've tried deferring and preloading plugins and JavaScript files.

I've also tried following YouTube tutorials and installing WP Rocket and the Cloudflare CDN.

How can I speed up the processing of my nested for loops for a giant 3D numpy array?

I created a very large 3D numpy array called tr_mat. The shape of tr_mat is:

tr_mat.shape
(1024, 536, 21073)

Info on the 3D numpy array: Before going into the actual code, I would like to clarify what I am attempting to do. As can be seen from tr_mat.shape, the 3D numpy array contains numeric values in 1024 rows and 536 columns. That is, we have 536 * 1024 = 548864 values in each of the 21073 matrices.

Conceptual background about my task: Each of the 21073 2D numpy arrays within the 3D numpy array contains grayscale pixel values from an image. The 3D numpy array tr_mat is already transposed, because I would like to construct a time-series based on identical pixel positions across all 21073 matrices. Finally, I would like to save each of the resulting 548864 time-series individually in a .1D text file. (Hence, I would end up saving 548864 .1D text files.)

The relevant part of the code:

tr_mat = frame_mat.transpose()  # the transposed 3D numpy array
# Save
rangeh = range(0, 536)
for row, row_n_l in zip(tr_mat, rangeh):  # row = pixel row of the 2D image
    for ts_pixel, row_n in zip(row, rangeh):  # ts_pixel = the pixel time-series across the 3D array (across the single 2D arrays)
        # Save
        with open(f"/volumes/.../TS_Row{row_n_l}_Pixel{row_n}.1D", "w") as file:
            for i in ts_pixel:
                file.write(f"{i}\n")  # Save each time-series value per row

Question: Could you give me some tips on how to modify my code in order to speed it up? I wrapped tqdm around the first for loop to check how fast the nested loop is processed, and it took around 20 minutes to reach ~120 of 536 rows. Also, it seems to me that the loop gets slower and slower as the iterations go up.
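For reference, here is a minimal sketch of the same save step with the per-value Python write replaced by one numpy call per file (same array and output paths as above; with this layout the remaining cost is mostly just creating ~549k small files):

import numpy as np

tr_mat = frame_mat.transpose()   # same transposed array as above, shape (1024, 536, 21073)
n_rows, n_cols = tr_mat.shape[0], tr_mat.shape[1]

for row_n_l in range(n_rows):
    for row_n in range(n_cols):
        ts_pixel = tr_mat[row_n_l, row_n]   # one pixel's time-series, shape (21073,)
        # single formatted write per file instead of 21073 tiny write() calls
        np.savetxt(f"/volumes/.../TS_Row{row_n_l}_Pixel{row_n}.1D", ts_pixel, fmt="%g")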

Comparing efficiency of two queries

I am learning SQL, and while working through guided examples, I like to attempt writing my own query to answer a question before seeing how the lesson suggests doing it. In this case, I used a somewhat different approach, and I'm interested in comparing my method to the given example.

The guided example uses a public dataset available on BigQuery called bigquery-public-data.new_york_citibike. The problem posed is to find Citibike rides with trip durations that most exceeded the average trip duration for its starting bike station.

Here is the code suggested in the guided example:

SELECT
  starttime,
  start_station_id,
  tripduration,
  (
    SELECT ROUND(AVG(tripduration),2)
    FROM bigquery-public-data.new_york_citibike.citibike_trips
    WHERE start_station_id = outer_trips.start_station_id
  ) AS avg_duration_for_station,
  ROUND(tripduration - (
    SELECT AVG(tripduration)
    FROM bigquery-public-data.new_york_citibike.citibike_trips
    WHERE start_station_id = outer_trips.start_station_id), 2) AS difference_from_avg
FROM bigquery-public-data.new_york_citibike.citibike_trips AS outer_trips
ORDER BY difference_from_avg DESC
LIMIT 25

And here is the code I came up with before looking at their solution:

SELECT
  starttime,
  start_station_id,
  tripduration,
  station_averages.station_average AS station_avg,
  ROUND (tripduration - station_averages.station_average,2) AS diff_from_station_avg
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
  JOIN (
      SELECT
        start_station_id AS station_id,
        ROUND(AVG(tripduration),2) as station_average
      FROM `bigquery-public-data.new_york_citibike.citibike_trips`
      GROUP BY station_id
    ) AS station_averages
  ON start_station_id = station_averages.station_id
ORDER BY 5 DESC
LIMIT 25

Looking at my code versus theirs, I thought mine would run more quickly, because I'm only calculating the average trip duration for each station once, when making the JOIN table. Their code, on the other hand, calculates each station average as many times as that station appears in the table.
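As an aside, a window-function formulation would be a natural third point of comparison; this is just a sketch noted for completeness, not something from the course:

SELECT
  starttime,
  start_station_id,
  tripduration,
  ROUND(AVG(tripduration) OVER (PARTITION BY start_station_id), 2) AS avg_duration_for_station,
  ROUND(tripduration - AVG(tripduration) OVER (PARTITION BY start_station_id), 2) AS difference_from_avg
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
ORDER BY difference_from_avg DESC
LIMIT 25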

In order to compare the two approaches, I put them each in a BigQuery tab, and after running them and verifying that they do the same thing, I looked at the "Job Information" for each query. The first one, supplied by the course I'm taking, took "0 seconds", and mine took "1 second".

In order to see the comparison better, I tried editing the final line of each query to

LIMIT 250

...and then 2500, and 25000, etc. Their query kept running just about 1 second faster than mine, until I increased the limit all the way to 2.5 million. Then, mine ran in 17 seconds, and theirs ran in 18 seconds.

Can anyone help me understand what's going on here? Was my understanding of the code correct when I guessed that my query was more efficient? Why did the processing time scale differently for theirs and mine? Why does changing the limit even affect the time, when it has to calculate all results for every row in the table before it can ORDER BY column 5?

I hope these questions make sense. Thanks in advance for any insights.

SQL Server query executes very slow [closed]

I have the following query, which takes minutes to execute. When I was writing it, everything was fine up to TVVA5.VVA_VAL, but when I introduced TVVA6 it became slow, when I introduced TVVA7 it became slower, and it keeps getting slower as I add TVVA columns. I found that with any 5 TVVA joins it works well. Any ideas on how to optimize this query?

SELECT 
                [TCRD].[CRD_REQ_ID] AS [requestId],
                [TCTP].[CTP_CDE] AS cardType,
                [TCHD].[CHD_COD_EXT] AS codeCardHolder,
                [TCHD].[CHD_FRST_NAMES] AS firstNames,
                [TCHD].[CHD_INI] AS initials,
                [TCHD].[CHD_PFX_LST_NAME] AS prefixLastName,
                [TCHD].[CHD_LST_NAME] AS lastName,
                [TCHD].[CHD_TTL_PFX] AS titlePrefix,
                [TCHD].[CHD_TTL_SFX] AS titleSuffix,
                [TCHD].[CHD_DOB] dateOfBirth,
                [TCRD].[CRD_VAL_DTE] AS cardExpiryDate,
                [TCRD].[CRD_ISS_DTE] AS cardIssueDate,
                [TCHD].[CHD_NAT_CODE] AS natCode,
                [TCHD].[CGD_GDR_CDE] AS genderCode,

                [TPIC].[PIC_VAL] AS picture,
                [TSIG].[SIG_VAL] AS [signature],

                [TCRD].[CRD_NAME_ON_CARD] AS nameOnCard,
                [TORG].[ORG_CDE] AS organizationCode,
                [TNAT].[NAT_DESC_AR] AS nationalityArabic,
                TORG.ORG_FULL_NAME issuingAuthority,

                TVVA1.VVA_VAL nameArabic,
                TVVA2.VVA_VAL docmentType,
                TVVA3.VVA_VAL docmentNumber,
                TVVA4.VVA_VAL passportNumber,
                TVVA5.VVA_VAL phoneNumber,
                TVVA6.VVA_VAL professionEnglish,
                TVBV1.VBV_VAL passportImage,
                TVVA7.VVA_VAL cardPersonalizationDate,
                TVVA8.VVA_VAL printerSerialNumber,
                TVVA9.VVA_VAL placeOfBirthArabic,
                TVVA10.VVA_VAL addressInQatarArabic,
                TVVA11.VVA_VAL sponsorNameEnglish,
                TVVA12.VVA_VAL sponsorNameArabic,
                TVVA13.VVA_VAL residencyType
            FROM        
                TCHD                
            INNER JOIN
                TCRD
            ON
                [TCHD].[CHD_ID] = [TCRD].[CRD_CHD_ID]
            INNER JOIN
                TCTP
            ON
                [TCRD].[CRD_CTP_ID] = [TCTP].[CTP_ID]
            INNER JOIN
                TNAT
            ON
                [TCHD].[CHD_NAT_CODE] = [TNAT].[NAT_CODE]
            INNER JOIN
                TORG
            ON
                [TCRD].[CRD_ORG_ID] = [TORG].[ORG_ID]
            INNER JOIN
                TPIC                        
                cross apply (select [TPIC].[PIC_VAL] AS '*' for xml path('')) P ([picture])
            ON
                [TCRD].[CRD_ID] = [TPIC].[PIC_CRD_ID]
            INNER JOIN              
                TSIG
                cross apply (select [TSIG].[SIG_VAL] AS '*' for xml path('')) S ([signature])
            ON
                [TCRD].[CRD_ID] = [TSIG].[SIG_CRD_ID]
            INNER JOIN
                TVVA TVVA1
            ON
                TVVA1.VVA_PK_VAL = TCHD.CHD_ID 
            AND
                TVVA1.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'NAME_ARABIC' )
            INNER JOIN
                TVVA TVVA2
            ON
                TVVA2.VVA_PK_VAL = TCHD.CHD_ID 
            AND 
                TVVA2.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'DOCUMENT_TYPE' )
            INNER JOIN
                TVVA TVVA3
            ON
                TVVA3.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA3.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'DOCUMENT_NUMBER' )
            INNER JOIN
                TVVA TVVA4
            ON
                TVVA4.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA4.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'PASSPORT_NUMBER' )
            INNER JOIN
                TVVA TVVA5
            ON
                TVVA5.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA5.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'PHONE_NUMBER' )
            INNER JOIN
                TVVA TVVA6
            ON
                TVVA6.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA6.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'PROFESSION_ENGLISH' )
            INNER JOIN
                TVVA TVVA7
            ON
                TVVA7.VVA_PK_VAL = TCRD.CRD_ID
            AND 
                TVVA7.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'CARD_PERSONALIZATION_DATE' )
            INNER JOIN
                TVBV TVBV1
                cross apply (select TVBV1.VBV_VAL AS '*' for xml path('')) PP ([passportImage])
            ON
                TVBV1.VBV_PK_VAL = TCHD.CHD_ID      
            AND 
                TVBV1.VBV_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'PASSPORT_IMAGE' )
            INNER JOIN
                TVVA TVVA8
            ON
                TVVA8.VVA_PK_VAL = TCRD.CRD_ID
            AND 
                TVVA8.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'PRINTER_SERIAL_NUMBER' )
            INNER JOIN
                TVVA TVVA9
            ON
                TVVA9.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA9.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'PLACE_OF_BIRTH_ARABIC' )
            INNER JOIN
                TVVA TVVA10
            ON
                TVVA10.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA10.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'ADDRESS_IN_QATAR_ARABIC' )
            INNER JOIN
                TVVA TVVA11
            ON
                TVVA11.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA11.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'SPONSOR_NAME_ENGLISH' )
            INNER JOIN
                TVVA TVVA12
            ON
                TVVA12.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA12.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'SPONSOR_NAME_ARABIC' )
            INNER JOIN
                TVVA TVVA13
            ON
                TVVA13.VVA_PK_VAL = TCHD.CHD_ID
            AND 
                TVVA13.VVA_DVR_ID = (SELECT TDVR.DVR_ID FROM TDVR WHERE TDVR.DVR_NAME = 'RESIDENCY_TYPE' )
            WHERE                                       
                [TCRD].[CRD_REQ_ID] = 10720
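For context, the repeated TVVA joins above all follow the same attribute-table pattern. A commonly suggested alternative for that pattern is to join TVVA and TDVR once and pivot with conditional aggregation; the following is only a simplified sketch against the column names visible above (it covers the CHD_ID-keyed attributes and is not a drop-in replacement for the full query):

-- Sketch: one pass over TVVA, pivoted per card holder
SELECT
    TVVA.VVA_PK_VAL AS CHD_ID,
    MAX(CASE WHEN TDVR.DVR_NAME = 'NAME_ARABIC'     THEN TVVA.VVA_VAL END) AS nameArabic,
    MAX(CASE WHEN TDVR.DVR_NAME = 'DOCUMENT_TYPE'   THEN TVVA.VVA_VAL END) AS docmentType,
    MAX(CASE WHEN TDVR.DVR_NAME = 'DOCUMENT_NUMBER' THEN TVVA.VVA_VAL END) AS docmentNumber,
    MAX(CASE WHEN TDVR.DVR_NAME = 'PASSPORT_NUMBER' THEN TVVA.VVA_VAL END) AS passportNumber
FROM TVVA
INNER JOIN TDVR ON TDVR.DVR_ID = TVVA.VVA_DVR_ID
GROUP BY TVVA.VVA_PK_VAL;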

Is there a significant performance difference between importing an object that maps icons and importing icons individually?

My team is using @mui/icons-material in our React project and we're currently importing the same icons to represent the same things in many different components.

Component A

import { Group as IconGroup, Person as IconPerson } from '@mui/icons-material';

return <div>...<IconPerson />...<IconGroup /></div>;

Component B

import { Group as IconGroup, Person as IconPerson } from '@mui/icons-material';

return <div>...<IconPerson />...<IconGroup /></div>;

I'm thinking of creating an object that maps all of our icons and referencing that instead.

icons.ts

import { Group as IconGroup } from '@mui/icons-material';
import { Person as IconPerson } from '@mui/icons-material';

export const icons = {
  collection: <IconGroup />,
  user: <IconPerson />,
}

Component A

import { icons } from 'icons';

return <div>...{icons.user}... {icons.collection}</div>;

Component B

import { icons } from 'icons';

return <div>...{icons.user}... {icons.collection}</div>;

The advantage is that if we ever want to move away from @mui/icons-material or just change an icon, we only need to change it in one place and it would be reflected across all components.

But I am concerned that importing a rather large object into so many components may degrade performance. I'm not sure how ESM modules handle that, or if that's even a consideration.

In terms of performance, is it better to import the icons individually? Is there a significant difference when importing a shared object that maps them to keys?
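For what it's worth, a small variant of the map stores the component references instead of pre-rendered elements (a sketch using the same icon names as above); it keeps the single-source-of-truth benefit while letting call sites pass props such as fontSize:

// icons.ts - variant storing component types rather than JSX elements
import { Group as IconGroup, Person as IconPerson } from '@mui/icons-material';

export const icons = {
  collection: IconGroup,   // usage: const Icon = icons.collection; <Icon fontSize="small" />
  user: IconPerson,
} as const;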

Efficient way to use apply a function on Pandas rows [duplicate]

I am looking for an efficient way to apply a function to each row of a dataframe that performs some operation and repeats the row by a number defined in another column. Currently, I am doing it by iterating over each row, but that takes too long on a large dataframe.

Sample code is as below:

import pandas as pd

def my_func(row):
    row = row.to_frame().T
    repeated_row = row.loc[row.index.repeat(row['col2'])]
    return repeated_row

df = pd.DataFrame(data={'col1': list('abc'),
                        'col2': [2, 2, 3]})
df_comb = pd.DataFrame()
for i, row in df.iterrows():
    df_rep = my_func(row)
    df_comb = pd.concat([df_comb, df_rep], axis=0)

However, I want a solution that does not use a for loop as above, and I couldn't find an existing answer for this. I imagine there is an equivalent way to use the "apply" function on this df, such as:

df_comp = pd.concat([df.apply(lambda row: my_func(row)), axis=1], axis=0)

But at the moment this syntax does not work properly.

I'd much appreciate it if you could point out the correct solution.
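For reference, the repeat idiom already used inside my_func can also be applied to the whole frame at once; this is only a sketch on the same sample df, and whether it fits depends on what the real per-row operation does beyond repeating:

import pandas as pd

df = pd.DataFrame(data={'col1': list('abc'),
                        'col2': [2, 2, 3]})

# repeat each row by the count in col2, without a Python-level loop
df_comb = df.loc[df.index.repeat(df['col2'])].reset_index(drop=True)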

Fetch SQL table data once & keep running query on that table data locally instead of connecting to db every time in PHP

I'm trying to reduce the number of queries made to the database via PHP for a page that repeatedly runs queries inside a loop to fetch data.

Is there any way I can fetch the SQL table data once and keep running queries (if any) against that locally available pool of table data instead of having to connect to the DB every time? This is causing a lot of delay in page loading, and I'm looking to optimize it by reducing the number of DB connections.

I couldn't find any solution in my research. I thought of using SQL views, but they too depend on frequent connections to the DB. I'm looking to fetch the data once on page load and keep running my other queries on that same selected data over and over instead of connecting to the DB every time. The nested loops are taking too long and hurting performance.

Put query hint (OPTION) into view in SQL Server

I have an SQL query on a view using several joins that is occasionally running really slow - a lot slower than normal, making the query nearly unusable.

I copied the query out of the view and experimented and found a solution at https://dba.stackexchange.com/a/60180/52607 - if I add

OPTION (MERGE JOIN, HASH JOIN)

to the end of the query, it is running ~6x faster.

I now tried to adapt the OPTION to the original view, but SQL Server/SSMS tells me

Incorrect syntax near the keyword 'OPTION'.

How can I add this option to the view so that the resulting query of the view is just as fast?

(Adding the option to the query on the view did not result in any speedup. That attempt looked like this:

select * from vMyView
where SomeDate >= CONVERT(Datetime, '2017.09.20')
OPTION (MERGE JOIN, HASH JOIN)

I think I would have to use this option directly for the vMyView - if possible.)

Optimizing simple but slow query with OR condition

Some background: I'm running the following simple SELECT statement on a Postgres 15.3 server with 128GB of memory, and contrary to my expectations it takes ~6 minutes. The statement involves two relations: big_table_70m with approximately 70M rows and other_table_50m with a bit under 50M rows. On both tables I have B-tree indexes on esat_last_modified and last_modified.

This takes 6 minutes to return:

select
    count(*)
FROM
    big_table_70m
where
    big_table_70m.esat_last_modified > (
        select
            max(esat_last_modified)
        from
            other_table_50m
    )
    OR big_table_70m.task_last_modified > (
        select
            max(last_modified)
        from
            other_table_50m
    );

However, when I drop the OR condition and run the query with either side on its own, it returns very fast (around 70 ms):

select
    count(*)
FROM
    big_table_70m
where
    big_table_70m.esat_last_modified > (
        select
            max(esat_last_modified)
        from
            other_table_50m
    );

I've tried adding a multi-column index on (esat_last_modified, last_modified) to make it more efficient, without success.

Can you please help me improve this query? Eventually the returned count should be a few thousand rows.

explain analyze:

                                                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1765894.63..1765894.64 rows=1 width=8) (actual time=403477.117..403477.314 rows=1 loops=1)
   Buffers: shared hit=77762962 read=848381 dirtied=27535
   I/O Timings: shared/local read=380517.629
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.70..0.71 rows=1 width=8) (actual time=0.028..0.029 rows=1 loops=1)
           Buffers: shared hit=6
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.56..0.70 rows=1 width=8) (actual time=0.026..0.027 rows=1 loops=1)
                   Buffers: shared hit=6
                   ->  Index Only Scan Backward using f9b1917d68a5656ca14b2f5e12447c6a on other_table_50m  (cost=0.56..4671638.13 rows=33810205 width=8) (actual time=0.025..0.026 rows=1 loops=1)
                         Index Cond: (esat_last_modified IS NOT NULL)
                         Heap Fetches: 1
                         Buffers: shared hit=6
   InitPlan 4 (returns $3)
     ->  Result  (cost=0.67..0.68 rows=1 width=8) (actual time=0.013..0.014 rows=1 loops=1)
           Buffers: shared hit=5
           InitPlan 3 (returns $2)
             ->  Limit  (cost=0.56..0.67 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=1)
                   Buffers: shared hit=5
                   ->  Index Only Scan Backward using "429b05d0a626f21b4708244ab67da408" on other_table_50m other_table_50m_1  (cost=0.56..5092889.03 rows=47176591 width=8) (actual time=0.013..0.013 rows=1 loops=1)
                         Index Cond: (last_modified IS NOT NULL)
                         Heap Fetches: 1
                         Buffers: shared hit=5
   ->  Gather  (cost=1765893.02..1765893.23 rows=2 width=8) (actual time=403477.112..403477.306 rows=1 loops=1)
         Workers Planned: 2
         Params Evaluated: $1, $3
         Workers Launched: 0
         Buffers: shared hit=77762962 read=848381 dirtied=27535
         I/O Timings: shared/local read=380517.629
         ->  Partial Aggregate  (cost=1764893.02..1764893.03 rows=1 width=8) (actual time=403476.429..403476.429 rows=1 loops=1)
               Buffers: shared hit=77762951 read=848381 dirtied=27535
               I/O Timings: shared/local read=380517.629
               ->  Parallel Index Only Scan using multi_column_index on big_table_70m  (cost=0.57..1724191.80 rows=16280489 width=0) (actual time=403476.425..403476.425 rows=0 loops=1)
                     Filter: ((esat_last_modified > $1) OR (task_last_modified > $3))
                     Rows Removed by Filter: 70394276
                     Heap Fetches: 6894859
                     Buffers: shared hit=77762951 read=848381 dirtied=27535
                     I/O Timings: shared/local read=380517.629
 Planning:
   Buffers: shared hit=352
 Planning Time: 0.949 ms
 Execution Time: 403477.593 ms
(42 rows)

Here is a link to depesz. It shows that a significant amount of time is spent on I/O (reading from disk).

I added a multi-column index on big_table_70m without success. I've also rewritten the SELECT to use a CTE, but obviously it didn't make any change to the plan.

I expect the statement to return at roughly the same speed as it does without the OR condition.
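For comparison, the usual workaround sketch for an OR like this is to split the count so that each side can use its own index, with a guard so rows matching both conditions aren't counted twice (table and column names are taken from the query above; the IS NOT TRUE test also keeps NULL esat_last_modified rows from being dropped):

SELECT
    (SELECT count(*)
     FROM big_table_70m
     WHERE esat_last_modified > (SELECT max(esat_last_modified) FROM other_table_50m))
  + (SELECT count(*)
     FROM big_table_70m
     WHERE task_last_modified > (SELECT max(last_modified) FROM other_table_50m)
       -- avoid double-counting rows already matched above
       AND (esat_last_modified > (SELECT max(esat_last_modified) FROM other_table_50m)) IS NOT TRUE
    ) AS total;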

Efficiency of while(true) ServerSocket Listen

I am wondering whether a typical while(true) ServerSocket listen loop takes an entire core just to wait for and accept a client connection (even when implementing Runnable and using Thread.start()).

I am implementing a type of distributed computing cluster and each computer needs every core it has for computation. A Master node needs to communicate with these computers (invoking static methods that modify the algorithm's functioning).

The reason I need to use sockets is the cross-platform / cross-language capabilities. In some cases, PHP will be invoking these Java static methods.

I used a Java profiler (YourKit), and I can see my ServerSocket listen thread; it never sleeps and is always shown as running. Is there a better approach to do what I want, or will the performance hit be negligible?

Please feel free to offer any suggestions if you can think of a better way (I've tried RMI, but it isn't supported cross-language).

โŒ
โŒ