dplyr - grouped recoding with lookup table in R - Stack Overflow

I have a data table where one variable is messy and can contain different variants of the same value (e.g., team name Newcastle United or Newcastle). These variants occur alongside another grouping-like variable (e.g., both the Premier League and A-League have Newcastle clubs with different team name variants):

library(dplyr)  # re-exports tibble() from the tibble package

team = c('Newcastle United', 'Newcastle', 'Newcastle Utd', 'Newcastle United Jets', 'Newcastle', 'Newcastle Jets')
competition = c('Premier League', 'Premier League', 'Premier League', 'A-League', 'A-League', 'A-League')
df = tibble(team, competition)
# A tibble: 6 × 2
  team                  competition   
  <chr>                 <chr>         
1 Newcastle United      Premier League
2 Newcastle             Premier League
3 Newcastle Utd         Premier League
4 Newcastle United Jets A-League      
5 Newcastle             A-League      
6 Newcastle Jets        A-League     

I also have a lookup table that specifies the desired team name per competition as follows:

old_name = c('Newcastle', 'Newcastle Utd', 'Newcastle', 'Newcastle United Jets')
new_name = c('Newcastle United', 'Newcastle United', 'Newcastle Jets', 'Newcastle Jets')
competition = c('Premier League', 'Premier League', 'A-League', 'A-League')
lookup = tibble(old_name, new_name, competition)
# A tibble: 4 × 3
  old_name              new_name         competition   
  <chr>                 <chr>            <chr>         
1 Newcastle             Newcastle United Premier League
2 Newcastle Utd         Newcastle United Premier League
3 Newcastle             Newcastle Jets   A-League      
4 Newcastle United Jets Newcastle Jets   A-League    

How can I recode/relabel team so that each row uses only the lookup entries for its own competition? I tried combining dplyr's group_by and recode in different ways, but have had no luck so far.

(My real data and lookup tables are much bigger and the data table includes cases that don't have a match in the lookup table.)

Desired output:

# A tibble: 6 × 2
  team             competition   
  <chr>            <chr>         
1 Newcastle United Premier League
2 Newcastle United Premier League
3 Newcastle United Premier League
4 Newcastle Jets   A-League      
5 Newcastle Jets   A-League      
6 Newcastle Jets   A-League    

asked Nov 16, 2024 at 21:53 by mrroy
  • Perhaps try one of the methods from stackoverflow/questions/67081496/… ? – jared_mamrot, Nov 16, 2024 at 22:37

1 Answer

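An approach using full_join: join on both the team name and the competition, then use coalesce so that rows without a lookup match keep their original team name.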

library(dplyr)

full_join(df, lookup, by = join_by(team == old_name, competition)) %>% 
  mutate(team = coalesce(new_name, team), new_name = NULL)
# A tibble: 6 × 2
  team             competition
  <chr>            <chr>
1 Newcastle United Premier League
2 Newcastle United Premier League
3 Newcastle United Premier League
4 Newcastle Jets   A-League
5 Newcastle Jets   A-League
6 Newcastle Jets   A-League
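
The question mentions that the real tables are larger and that some rows have no match in the lookup. A full join also returns lookup rows that never occur in the data, so a left join may be a safer variant of the same idea. The following is only a sketch under assumptions: the "Arsenal" row and the "Sydney" lookup entry are hypothetical additions, not part of the original data, and join_by requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Data from the question, plus one hypothetical row without a lookup match ("Arsenal")
df <- tibble(
  team = c("Newcastle United", "Newcastle", "Newcastle Utd",
           "Newcastle United Jets", "Newcastle", "Newcastle Jets", "Arsenal"),
  competition = c(rep("Premier League", 3), rep("A-League", 3), "Premier League")
)

# Lookup from the question, plus one hypothetical entry that never occurs in df ("Sydney")
lookup <- tibble(
  old_name = c("Newcastle", "Newcastle Utd", "Newcastle",
               "Newcastle United Jets", "Sydney"),
  new_name = c("Newcastle United", "Newcastle United", "Newcastle Jets",
               "Newcastle Jets", "Sydney FC"),
  competition = c("Premier League", "Premier League", "A-League",
                  "A-League", "A-League")
)

# left_join keeps exactly the rows of df; coalesce falls back to the
# original name wherever the lookup has no entry for that team/competition pair
recoded <- df %>%
  left_join(lookup, by = join_by(team == old_name, competition)) %>%
  mutate(team = coalesce(new_name, team), new_name = NULL)

print(recoded)
```

Here the unmatched "Arsenal" row keeps its name and the unused "Sydney" entry does not add a row. This assumes each old_name/competition pair appears at most once in the lookup; duplicated pairs would duplicate the matching rows of df.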