How to create a generic string of letters and numbers for "n" clusters in R to add in a dataframe?

Issue

I have this:

df<-structure(list(x = c(-0.803739264931451, 0.852850728148773, 0.927179506105653, -0.752626056626365, 0.706846224294882, 1.0346985222527, -0.475845197699957, -0.460301566967151, -0.680301544955355, -1.03196929988978), y = c(-0.853052609097935, 0.367618436999606, -0.274902437566225, -0.511565170496435, 0.81067919693492, 0.394655023166806, 0.989760805249143, -0.858997792847955, -0.66149481321353, -0.0219935446644728), shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4)), row.names = c(NA, 10L), class = "data.frame")

Output:

x y shape
-0.8037393 -0.85305261 1
0.8528507 0.36761844 1
0.9271795 -0.27490244 2
-0.7526261 -0.51156517 2
0.7068462 0.81067920 2
1.0346985 0.39465502 2
-0.4758452 0.98976081 3
-0.4603016 -0.85899779 3
-0.6803015 -0.66149481 4
-1.0319693 -0.02199354 4

Expected output:
How can I create a generic label of letters and numbers for "n" clusters in R and add it to the data frame, as shown below?

Note: for example, if there were 100 clusters, the label for cluster 100 could be AA1, and so on.

df$label <- # What is the correct code for this problem?
x y shape label
-0.8037393 -0.85305261 1 A1
0.8528507 0.36761844 1 A2
0.9271795 -0.27490244 2 B1
-0.7526261 -0.51156517 2 B2
0.7068462 0.81067920 2 B3
1.0346985 0.39465502 2 B4
-0.4758452 0.98976081 3 C1
-0.4603016 -0.85899779 3 C2
-0.6803015 -0.66149481 4 D1
-1.0319693 -0.02199354 4 D2

Solution

Here is a small function that should do it for you:

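library(dplyr)  # provides if_else(), group_by(), mutate(), cur_group_id(), n()

# g: group number (1, 2, ...); n: number of rows in that group
f <- function(g, n) {
  # Pick the letter; a multiple of 26 maps to "Z" instead of index 0
  letter_index <- if_else(g %% 26 == 0, 26, g %% 26)
  paste0(
    # Groups 1-26 get one copy of the letter, 27-52 two copies, and so on
    paste0(rep(LETTERS[letter_index], times = ceiling(g / 26)), collapse = ""),
    seq_len(n)
  )
}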

Now apply that function to each shape value, using group_by() and mutate(). The new column is called code here; rename it to label if you prefer:

df %>% 
  group_by(shape) %>% 
  mutate(code = f(cur_group_id(), n()))

Output:

        x       y shape code 
    <dbl>   <dbl> <dbl> <chr>
 1 -0.804 -0.853      1 A1   
 2  0.853  0.368      1 A2   
 3  0.927 -0.275      2 B1   
 4 -0.753 -0.512      2 B2   
 5  0.707  0.811      2 B3   
 6  1.03   0.395      2 B4   
 7 -0.476  0.990      3 C1   
 8 -0.460 -0.859      3 C2   
 9 -0.680 -0.661      4 D1   
10 -1.03  -0.0220     4 D2
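
If you would rather not load dplyr, the same labels can probably be built in base R. The sketch below is an alternative to the answer's approach, not part of it: it uses match() for the group number, ave() for the running counter within each group, and strrep() for the repeated letter. The data frame is a cut-down stand-in that keeps only the shape column from the question.

```r
# Stand-in for the question's data: only the grouping column matters here.
df <- data.frame(shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4))

# Group number in sorted order of shape (1 for the smallest value, and so on).
group_id <- match(df$shape, sort(unique(df$shape)))

# Running counter within each group: 1, 2, ... per shape value.
within_id <- ave(seq_along(df$shape), df$shape, FUN = seq_along)

# Letter prefix, following the same rule as f() but vectorised over rows.
letter_index <- ifelse(group_id %% 26 == 0, 26, group_id %% 26)
prefix <- strrep(LETTERS[letter_index], ceiling(group_id / 26))

df$label <- paste0(prefix, within_id)
print(df)
```

Because the group number comes from the sorted unique values, the rows do not need to be ordered by shape for this to work.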

Explanation:

  • The function f() takes two values: an integer giving the group number (passed by cur_group_id()) and the number of rows in that group (passed by n()). Inside the function, the modulo picks which letter of LETTERS to use, ceiling(g/26) gives the number of times to repeat that letter, and the result is pasted onto the sequence from 1 to n.
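
To see how the prefix behaves once there are more than 26 groups, here is a minimal base R sketch of just the letter part of f(). The helper name group_prefix is hypothetical and not part of the original answer; it only mirrors the modulo and ceiling() logic described above.

```r
# Hypothetical helper: returns only the letter prefix f() would build for group g.
group_prefix <- function(g) {
  letter_index <- if (g %% 26 == 0) 26 else g %% 26  # which letter, 1..26
  repeats <- ceiling(g / 26)                          # how many copies of it
  paste0(rep(LETTERS[letter_index], times = repeats), collapse = "")
}

prefixes <- sapply(c(1, 26, 27, 52, 53, 100), group_prefix)
print(prefixes)
# "A" "Z" "AA" "ZZ" "AAA" "VVVV"
```

Note that under this scheme group 27 is the one labelled AA, while group 100 comes out as VVVV, so the labels stay unique but grow by one letter every 26 groups.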

Answered By – langtang

Answer Checked By – Dawn Plyler (AngularFixing Volunteer)
