最新消息:Welcome to the puzzle paradise for programmers! Here, a well-designed puzzle awaits you. From code logic puzzles to algorithmic challenges, each level is closely centered on the programmer's expertise and skills. Whether you're a novice programmer or an experienced tech guru, you'll find your own challenges on this site. In the process of solving puzzles, you can not only exercise your thinking skills, but also deepen your understanding and application of programming knowledge. Come to start this puzzle journey full of wisdom and challenges, with many programmers to compete with each other and show your programming wisdom! Translated with DeepL.com (free version)

(qkdb+) How to group data with slight date discrepancies (+-1 day) for unique identifier creation - Stack Overflow

matteradmin5PV0评论

I am trying to create uniqueid column for each child (as they appear more than once in the table) so that I can identify between children with same name (no child has the same dadname, mumname and childdob)

q)group select dadname, mumname, childdob from peopletb
dadname         mumname            childdob   |                  
----------------------------------------------| -----------------
simon           jessica             2007.03.27| 0 1658 2292 18295
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..
james           lucy                2005.04.09| 2 12756 21354
francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..

and so on

The issue is, is that there are some slight discrepancies in the childdob where it has been incremented by one day or the opposite it has gone back one day.

when grouping the data i want it to group childdob +1 or -1 from each other so that the discrepancy is eliminated.

e.g.

mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..
&
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..

would have the same uniqueid

as would

francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
&
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..

I am trying to create uniqueid column for each child (as they appear more than once in the table) so that I can identify between children with same name (no child has the same dadname, mumname and childdob)

q)group select dadname, mumname, childdob from peopletb
dadname         mumname            childdob   |                  
----------------------------------------------| -----------------
simon           jessica             2007.03.27| 0 1658 2292 18295
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..
james           lucy                2005.04.09| 2 12756 21354
francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..

and so on

The issue is, is that there are some slight discrepancies in the childdob where it has been incremented by one day or the opposite it has gone back one day.

when grouping the data i want it to group childdob +1 or -1 from each other so that the discrepancy is eliminated.

e.g.

mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..
&
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..

would have the same uniqueid

as would

francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
&
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
Share Improve this question asked Nov 18, 2024 at 18:28 bashprofile84bashprofile84 992 silver badges11 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 2

There's a few tricky edge cases when you don't know the true dob, and if you have datapoints both above and below, but something like this might be a starting point:

peopletb:([]dadname:100?`simon`mike`james;mumname:100?`jessica`rachel`lucy;childdob:100?2007.01.03 2007.01.02 2007.01.04 2007.06.01 2007.08.10 2007.08.09 2007.10.23 2007.10.24)

/default grouping
q)select idx:i by dadname,mumname,childdob from `childdob xasc peopletb
dadname mumname childdob  | idx
--------------------------| --------------
james   jessica 2007.01.02| ,9
james   jessica 2007.01.04| ,26
james   jessica 2007.08.09| 56 58 59
james   jessica 2007.08.10| 75 77
james   jessica 2007.10.24| ,96
james   lucy    2007.01.02| ,3
james   lucy    2007.01.03| 13 21 24
...

/step 1 - fudge the date
q)fudge:update childdob:{(y;x)2>y-x}\[childdob]by dadname,mumname from `childdob xasc peopletb;
/step 2 - group the fudged dates
q)select idx:i by dadname,mumname,childdob from fudge
dadname mumname childdob  | idx
--------------------------| --------------------------
james   jessica 2007.01.02| ,9
james   jessica 2007.01.04| ,26
james   jessica 2007.08.09| 56 58 59 75 77
james   jessica 2007.10.24| ,96
james   lucy    2007.01.02| 3 13 21 24
james   lucy    2007.01.04| ,34
james   lucy    2007.08.09| 50 51 55 60 65 66 70 76 78
...

Notice how james/jessica combined 2007.08.09/10 into 09 but didn't combine 2007.01.02/04 (ambiguous, doesn't satisfy 2>y-x). However you could increase the fudge to 3>y-x for more leniency

Articles related to this article

Post a comment

comment list (0)

  1. No comments so far