最新消息:Welcome to the puzzle paradise for programmers! Here, a well-designed puzzle awaits you. From code logic puzzles to algorithmic challenges, each level is closely centered on the programmer's expertise and skills. Whether you're a novice programmer or an experienced tech guru, you'll find your own challenges on this site. In the process of solving puzzles, you can not only exercise your thinking skills, but also deepen your understanding and application of programming knowledge. Come to start this puzzle journey full of wisdom and challenges, with many programmers to compete with each other and show your programming wisdom! Translated with DeepL.com (free version)

matlab - Optimizing two-dimensional interval partitioning for multi-dimensional matrix data - Stack Overflow

matteradmin6PV0评论

There is a four-dimensional matrix data, along with corresponding X coordinates data_X and Y coordinates data_Y for each element. I want to perform a two-dimensional interval segmentation, or grid partitioning, based on edge_X and edge_Y, resulting in a new array data_class. The current loop implementation is very slow. How can I improve the computational speed?

clc;clear
rng("default")
data = randi([0, 100], 40,30,20, 10);
data_X =  randi([0, 100], 40,30,20, 10);
data_Y =  randi([0, 100], 40,30,20, 10);

% Define the edges for histograms in X and Y directions
edge_X = [0:10:100];
edge_Y = [0:20:100];

sz = size(data);
for ix = 1:sz(1)
    for iy = 1:sz(2)
        for ii = 1:sz(3)
            for ij = 1:sz(4)
               % Extract the current data, X, and Y values
                g_data = data(ix, iy, ii, ij);
                g_X = data_X(ix, iy, ii, ij);
                g_Y = data_Y(ix, iy, ii, ij);
    
                % Compute the 2D histogram count  
                g_id = histcounts2(g_X, g_Y, edge_X, edge_Y);
                g_new = g_id * g_data;
                data_class  (ix,iy,ii,ij,:,:) = g_new; 
            end
        end
    end
end

Since each element of data is processed individually, g_data will only be noted as 1 in a bin, and then

g_new = g_id * g_data; 

g_new is the position of g_data after reclassification

The current loop implementation is very slow. How can I improve the computational speed?

There is a four-dimensional matrix data, along with corresponding X coordinates data_X and Y coordinates data_Y for each element. I want to perform a two-dimensional interval segmentation, or grid partitioning, based on edge_X and edge_Y, resulting in a new array data_class. The current loop implementation is very slow. How can I improve the computational speed?

clc;clear
rng("default")
data = randi([0, 100], 40,30,20, 10);
data_X =  randi([0, 100], 40,30,20, 10);
data_Y =  randi([0, 100], 40,30,20, 10);

% Define the edges for histograms in X and Y directions
edge_X = [0:10:100];
edge_Y = [0:20:100];

sz = size(data);
for ix = 1:sz(1)
    for iy = 1:sz(2)
        for ii = 1:sz(3)
            for ij = 1:sz(4)
               % Extract the current data, X, and Y values
                g_data = data(ix, iy, ii, ij);
                g_X = data_X(ix, iy, ii, ij);
                g_Y = data_Y(ix, iy, ii, ij);
    
                % Compute the 2D histogram count  
                g_id = histcounts2(g_X, g_Y, edge_X, edge_Y);
                g_new = g_id * g_data;
                data_class  (ix,iy,ii,ij,:,:) = g_new; 
            end
        end
    end
end

Since each element of data is processed individually, g_data will only be noted as 1 in a bin, and then

g_new = g_id * g_data; 

g_new is the position of g_data after reclassification

The current loop implementation is very slow. How can I improve the computational speed?

Share Improve this question edited Nov 19, 2024 at 16:47 Mark Rotteveel 110k230 gold badges156 silver badges225 bronze badges asked Nov 18, 2024 at 13:04 greenasdsgreenasds 433 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

The first thing you should be doing is looking at the orange underlined warning in your editor about data_class growing with each loop iteration. Initialising it before all of your loops with this saves a large portion of the runtime immediately

Pre-allocate data_class before the loops:

szout = [sz, [numel(edge_X),numel(edge_Y)]-1];
data_class = NaN(szout);

But the main bottleneck is calling histcounts2 on every value individually, you can make large time savings by bringing it outside of the ii/ij loops, and instead of using the first input as a count of 1 for the bin it's in, use the 4th and 5th outputs which store the bin each value should go into. In your case this is the row and column indices.

Here are some timings:

  • Baseline code: 20.53sec
  • Pre-allocating data_class: 7.75sec (-62% total runtime)
  • De-nest histcounts2: 0.53sec (-97% total runtime)
  • Only assign non-zero values: 0.10sec (-99.5% total runtime)

De-nest histcounts2

sz = size(data);
szout = [sz, [numel(edge_X),numel(edge_Y)]-1];
data_class = NaN(szout);
for ix = 1:sz(1)
    for iy = 1:sz(2)

        g_X = data_X(ix, iy, :, :);
        g_Y = data_Y(ix, iy, :, :);
        [~,~,~,bx,by] = histcounts2(g_X, g_Y, edge_X, edge_Y);
        bx = squeeze(bx);
        by = squeeze(by);
        g_data = squeeze( data(ix, iy, :, :) );
            
        for ii = 1:sz(3)
            for ij = 1:sz(4)
                g_new = zeros(szout(end-1:end));
                g_new(bx(ii,ij),by(ii,ij)) = g_data(ii,ij);
                data_class(ix,iy,ii,ij,:,:) = g_new; 
            end
        end
    end
end

Further, you don't need to initialise the array of zeros g_new for every slice just to set one value. Instead, initialise data_class as a matrix of zeros and just override the specific values you're interested in:

Only assign non-zero values

sz = size(data);
szout = [sz, [numel(edge_X),numel(edge_Y)]-1];
data_class = zeros(szout);
for ix = 1:sz(1)
    for iy = 1:sz(2)

        g_X = data_X(ix, iy, :, :);
        g_Y = data_Y(ix, iy, :, :);
        [~,~,~,bx,by] = histcounts2(g_X, g_Y, edge_X, edge_Y);
        bx = squeeze(bx);
        by = squeeze(by);
        g_data = squeeze( data(ix, iy, :, :) );
            
        for ii = 1:sz(3)
            for ij = 1:sz(4)
                data_class(ix,iy,ii,ij,bx(ii,ij),by(ii,ij)) = g_data(ii,ij); 
            end
        end
    end
end

You could do something with sub2ind to get rid of the ii/ij loops entirely, but in a quick test that didn't show any performance over just using the loops.

Post a comment

comment list (0)

  1. No comments so far