堆

堆块（Chunk）

堆块（Chunk）的结构：

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk, if unallocated (P clear)  |    //prev_size
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of chunk, in bytes                     |A|M|P|    //|A|M|P| -> |NON_MAIN_ARENA|IS_MAPPED|PREV_INUSE|
  mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             User data starts here...                          .
        .                                                               .
        .             (malloc_usable_size() bytes)                      .
next    .                                                               |
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             (size of chunk, but used for application data)    |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of next chunk, in bytes                |A|0|1|
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

每个字段的具体的解释如下：

prev_size, 如果该 chunk 的物理相邻的前一地址 chunk（两个指针的地址差值为前一 chunk 大小）是空闲的话，那该字段记录的是前一个 chunk 的大小 (包括 chunk 头)。否则，该字段可以用来存储物理相邻的前一个 chunk 的数据。这里的前一 chunk 指的是较低地址的 chunk 。
size，该 chunk 的大小，大小必须是 2 * SIZE_SZ 的整数倍。如果申请的内存大小不是 2 * SIZE_SZ 的整数倍，会被转换满足大小的最小的 2 * SIZE_SZ 的倍数。32 位系统中，SIZE_SZ 是 4；64 位系统中，SIZE_SZ 是 8。该字段的低三个比特位对 chunk 的大小没有影响，它们从高到低分别表示
- NON_MAIN_ARENA，记录当前 chunk 是否不属于主线程，1 表示不属于，0 表示属于。
- IS_MAPPED，记录当前 chunk 是否是由 mmap 分配的。
- PREV_INUSE，记录前一个 chunk 块是否被分配。一般来说，堆中第一个被分配的内存块的 size 字段的 P 位都会被设置为 1，以便于防止访问前面的非法内存。当一个 chunk 的 size 的 P 位为 0 时，我们能通过 prev_size 字段来获取上一个 chunk 的大小以及地址。这也方便进行空闲 chunk 之间的合并。
fd，bk， chunk 处于分配状态时，从 fd 字段开始是用户的数据。chunk 空闲时，会被添加到对应的空闲管理链表中，其字段的含义如下
- fd 指向下一个（非物理相邻）空闲的 chunk
- bk 指向上一个（非物理相邻）空闲的 chunk
- 通过 fd 和 bk 可以将空闲的 chunk 块加入到空闲的 chunk 块链表进行统一管理
fd_nextsize， bk_nextsize，也是只有 chunk 空闲的时候才使用，不过其用于较大的 chunk（large chunk）。
- fd_nextsize 指向前一个与当前 chunk 大小不同的第一个空闲块，不包含 bin 的头指针。
- bk_nextsize 指向后一个与当前 chunk 大小不同的第一个空闲块，不包含 bin 的头指针。
- 一般空闲的 large chunk 在 fd 的遍历顺序中，按照由大到小的顺序排列。这样做可以避免在寻找合适 chunk 时挨个遍历。

Bin

概述

数组中的 bin 依次如下：

第一个为 unsorted bin，字如其面，这里面的 chunk 没有进行排序，存储的 chunk 比较杂。
索引从 2 到 63 的 bin 称为 small bin，同一个 small bin 链表中的 chunk 的大小相同。两个相邻索引的 small bin 链表中的 chunk 大小相差的字节数为 2 个机器字长，即 32 位相差 8 字节，64 位相差 16 字节。
small bins 后面的 bin 被称作 large bins。large bins 中的每一个 bin 都包含一定范围内的 chunk，其中的 chunk 按 fd 指针的顺序从大到小排列。相同大小的 chunk 同样按照最近使用顺序排列。
此外，上述这些 bin 的排布都会遵循一个原则：任意两个物理相邻的空闲 chunk 不能在一起。
需要注意的是，并不是所有的 chunk 被释放后就立即被放到 bin 中。ptmalloc 为了提高分配的速度，会把一些小的 chunk 先放到 fast bins 的容器内。而且，fastbin 容器中的 chunk 的使用标记总是被置位的，所以不满足上面的原则。

Tcache

头插头取，FILO
tcache_perthread_struct

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

# define TCACHE_MAX_BINS                64

static __thread tcache_perthread_struct *tcache = NULL;

每个 thread 都会维护一个 tcache_perthread_struct，它是整个 tcache 的管理结构，一共有 TCACHE_MAX_BINS 个计数器和 TCACHE_MAX_BINS 项 tcache_entry，其中

tcache_entry 用单向链表的方式链接了相同大小的处于空闲状态（free 后）的 chunk，这一点上和 fastbin 很像。
counts 记录了 tcache_entry 链上空闲 chunk 的数目，每条链上最多可以有 7 个 chunk。
用图表示大概是：

基本工作方式

第一次 malloc 时，会先 malloc 一块内存用来存放 tcache_perthread_struct 。
free 内存，且 size 小于 small bin size 时
tcache 之前会放到 fastbin 或者 unsorted bin 中
tcache 后：
- 先放到对应的 tcache 中，直到 tcache 被填满（默认是 7 个）
- tcache 被填满之后，再次 free 的内存和之前一样被放到 fastbin 或者 unsorted bin 中
- tcache 中的 chunk 不会合并（不取消 inuse bit）
malloc 内存，且 size 在 tcache 范围内
先从 tcache 取 chunk，直到 tcache 为空
tcache 为空后，从 bin 中找
tcache 为空时，如果 fastbin/smallbin/unsorted bin 中有 size 符合的 chunk，会先把 fastbin/smallbin/unsorted bin 中的 chunk 放到 tcache 中，直到填满。之后再从 tcache 中取；因此 chunk 在 bin 中和 tcache 中的顺序会反过来

Fast Bin

glibc 采用单向链表对其中的每个 bin 进行组织，并且每个 bin 采取 LIFO 策略，最近释放的 chunk 会更早地被分配。
需要特别注意的是，fastbin 范围的 chunk 的 inuse 始终被置为 1。因此它们不会和其它被释放的 chunk 合并。
但是当释放的 chunk 与该 chunk 相邻的空闲 chunk 合并后的大小大于 FASTBIN_CONSOLIDATION_THRESHOLD 时，内存碎片可能比较多了，我们就需要把 fast bins 中的 chunk 都进行合并，以减少内存碎片对系统的影响。

Small Bin

small bins 中每个 chunk 的大小与其所在的 bin 的 index 的关系为：chunk_size = 2 * SIZE_SZ *index，具体如下

下标	SIZE_SZ=4（32 位）	SIZE_SZ=8（64 位）
2	16	32
3	24	48
4	32	64
5	40	80
x	24x	28x
63	504	1008

small bins 中一共有 62 个循环双向链表，每个链表中存储的 chunk 大小都一致。比如对于 32 位系统来说，下标 2 对应的双向链表中存储的 chunk 大小为均为 16 字节。每个链表都有链表头结点，这样可以方便对于链表内部结点的管理。此外，small bins 中每个 bin 对应的链表采用 FIFO 的规则，所以同一个链表中先被释放的 chunk 会先被分配出去。

Large Bin

large bins 中一共包括 63 个 bin，每个 bin 中的 chunk 的大小不一致，而是处于一定区间范围内。此外，这 63 个 bin 被分成了 6 组，每组 bin 中的 chunk 大小之间的公差一致，具体如下：

组	数量	公差
1	32	64B
2	16	512B
3	8	4096B
4	4	32768B
5	2	262144B
6	1	不限制

这里我们以 32 位平台的 large bin 为例，第一个 large bin 的起始 chunk 大小为 512 字节，位于第一组，所以该 bin 可以存储的 chunk 的大小范围为 [512,512+64)。

fd_nextsize，bk_nextsize：只有chunk可先的时候才使用，不过用于较大的chunk（large chunk）
fd_nextsize指向前一个与当前chunk大小不同的第一个空闲块，不包含bin的头指针
bk_nextsize指向后一个与当前chunk大小不同的第一个空闲块，不包含bin的头指针
一般空闲的large chunk在fd的遍历顺序中，按照由大到小的顺序排列。这样可以避免在寻找合适chunk时挨个遍历

Large Bin的插入顺序

在index相同的情况下：
1、按照大小，从大到小排序（小的链接large bin块）
2、如果大小相同，按照free的时间排序
3、多个大小相同的堆块，只有首堆块的fd_nextsize和bk_nextsize会指向其他堆块，后面的堆块的fd_nextsize和bk_nextsize均为0
4、size最大的chunk的bk_nextsize指向最小的chunk，size最小的chunk的fd_nextsize指向最大的chunk

Unsorted Bin

unsorted bin 可以视为空闲 chunk 回归其所属 bin 之前的缓冲区。
unsorted bin 处于我们之前所说的 bin 数组下标 1 处。故而 unsorted bin 只有一个链表。unsorted bin 中的空闲 chunk 处于乱序状态，主要有两个来源

当一个较大的 chunk 被分割成两半后，如果剩下的部分大于 MINSIZE，就会被放到 unsorted bin 中。
释放一个不属于 fast bin 的 chunk，并且该 chunk 不和 top chunk 紧邻时，该 chunk 会被首先放到 unsorted bin 中。

此外，Unsorted Bin 在使用的过程中，采用的遍历顺序是头插尾取 FIFO 。

Top Chunk

程序第一次进行 malloc 的时候，heap 会被分为两块，一块给用户，剩下的那块就是 top chunk。
其实，所谓的 top chunk 就是处于当前堆的物理地址最高的 chunk。
这个 chunk 不属于任何一个 bin，它的作用在于当所有的 bin 都无法满足用户请求的大小时，如果其大小不小于指定的大小，就进行分配，并将剩下的部分作为新的 top chunk。
否则，就对 heap 进行扩展后再进行分配。在 main arena 中通过 sbrk 扩展 heap，而在 thread arena 中通过 mmap 分配新的 heap。
需要注意的是，top chunk 的 prev_inuse 比特位始终为 1，否则其前面的 chunk 就会被合并到 top chunk 中。

malloc/free流程

重要的数据结构

1.malloc_state

首先便是我们经常接触的arena

struct malloc_state
{
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

  /* fastbin链条数组 */
  mfastbinptr fastbinsY[NFASTBINS];

  /*top chunk 指针 */
  mchunkptr top;

  /* The remainder from the most recent split of a small request 最近分割的一个小请求的剩余部分 */
  mchunkptr last_remainder;

  /* NBINS为宏，代指128，这里包含了除fastbin的所有bin指针 */
  mchunkptr bins[NBINS * 2 - 2];

  /* bins数组的位图，用于判断bin是否为空 */
  unsigned int binmap[BINMAPSIZE];

  /* 链接下一个malloc_state的指针 */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* 在本arena当中从系统处获取到的内存大小  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

其结构在源码当中表现为mstate

2.malloc_chunk

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /*  Size of previous chunk (if free).  */
  //物理地址上上一个相邻chunk的大小（前提是其被free）否则会作为上一个chunk的数据段使用
  INTERNAL_SIZE_T      size;       /* 这一个chunk的大小，包含头  Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;         //双链表，仅在被free后使用，否则为分配给用户使用的数据段

  /* Only used for large blocks: pointer to next larger size.  */
  //仅用于largebin，fd指向下一个大小不同的chunk；当有多个chunk大小相同时，指向大小相同chunk组成的子链的第一个chunk
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

源码自带的malloc_chunk细节

/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk, if allocated            | |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             User data starts here...                          .
        .                                                               .
        .             (malloc_usable_size() bytes)                      .
        .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of chunk                                     |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk                            |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Forward pointer to next chunk in list             |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Back pointer to previous chunk in list            |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Unused space (may be 0 bytes long)                .
        .                                                               .
        .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
    trailing size field since there is no next contiguous chunk
    that would have to index off it. After initialization, `top'
    is forced to always exist.  If it would become less than
    MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
    bit M (IS_MMAPPED) set in their size fields.  Because they are
    allocated one-by-one, each must contain its own trailing size field.

*/

0x01 malloc步骤

step1 malloc环境准备

首先我们调用malloc(size)的时候，调用的库函数实际上为_libc_malloc，如下：

void * __libc_malloc (size_t bytes)
{
  mstate ar_ptr; //mstate:arena结构体
  void *victim;  //victim:要获取的chunk指针

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);    
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));   //调用malloc_hook

  arena_get (ar_ptr, bytes);                  //表现为宏，获取arena指针

  victim = _int_malloc (ar_ptr, bytes);  
  //调用_int_malloc，返回分配chunk指针，参数一为arena指针，参数二为我们的需要分配的字节
  /* Retry with another arena only if we were able to find a usable arena
     before.  */
  if (!victim && ar_ptr != NULL) //如果分配失败&&arena指针不为空
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);//尝试重新分配
    }

  if (ar_ptr != NULL)
    (void) mutex_unlock (&ar_ptr->mutex);

  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;//返回chunk指针
}

可以发现我们的_libc_malloc函数主要功能是获取合适的arena，然后传递给_int_malloc分配真正的堆块，然后来观察_int_malloc，而该函数十分长，因此逐步来看,

首先看到定义的一些字段，如下：

static void * _int_malloc (mstate av, size_t bytes)//传入两个参数,第一个是arena指针av,第二个是大小bytes
{
  INTERNAL_SIZE_T nb;               /* normalized request size标准化请求大小 */
  unsigned int idx;                 /* 字节所关联的bin数组下标 */
  mbinptr bin;                      /* bin数组下标所对应的bin指针 */

  mchunkptr victim;                 /* 检查/选择得到的chunk指针 */
  INTERNAL_SIZE_T size;             /* 得到的chunk大小 */
  int victim_index;                 /* 所得chunk对应bin的index */

  mchunkptr remainder;              /* 分块后的剩余部分 */
  unsigned long remainder_size;     /* 剩余部分大小 */

  unsigned int block;               /* bit map traverser */
  unsigned int bit;                 /* bit map traverser */
  unsigned int map;                 /* current word of binmap */

  mchunkptr fwd;                    /* 链表操作时用于暂存chunk  misc temp for linking */
  mchunkptr bck;                    /* misc temp for linking */

  checked_request2size(bytes, nb);

继续int_malloc

/* 若传入的av为空，那么转回调用sysmalloc，通过mmap来分配出一个chunk */
if (__glibc_unlikely (av == NULL))
  {
    void *p = sysmalloc (nb, av);
    if (p != NULL)
  alloc_perturb (p, bytes);
    return p;
  }

题外话: checked_request2size (bytes, nb)

注意到这里的INTERNAL_SIZE_T nb并没有初始化，但是后面的比较都是通过nb来判断的，正是因为后面执行了一个checked_request2size(bytes, nb)函数，将用户请求的bytes转化成了8字节对齐的nb，其宏定义如下：

//通过需求的字节大小来转变至实际需要的堆块大小
checked_request2size (bytes, nb);
//这里通过宏定义实现
#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/*  Same, except also perform argument check */

#define checked_request2size(req, sz)                             \
  if (REQUEST_OUT_OF_RANGE (req)) {                                              \
      __set_errno (ENOMEM);                                                      \
      return 0;                                                                      \
    }                                                                              \
  (sz) = request2size (req);

step2 若在fastbin范围

然后接下来判断其是否位于fastbin范围

fastbinsY指的是用于存储fastbin的数组,其每一个元素都指向一个固定大小的fastbin单链表

/*
   如果该size位于fastbins范围， 首先检查合并堆块.
   即使av未初始化，该代码也是可正常安全执行的, 因此我们可以在不检查的情况下执行
 */

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))//如果请求大小<=fastbin最大值
  {
    idx = fastbin_index (nb);                     //根据chunk的大小获取fastbin数组的下标
    mfastbinptr *fb = &fastbin (av, idx);         //为一个宏，通过下标idx得到fastbinsY元素指针
    mchunkptr pp = *fb;
    do
      {
        victim = pp;
          /* 若未找到合适的victim，跳出循环 */
        if (victim == NULL)
          break;
      }
    while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
           != victim);        //这里该函数为一个原子操作，目的是交换fb与victim->fd的值，也就是从链头开始取
           
  /* 下面判断取出的victim是否通过检查 */
    if (victim != 0)
      {
        if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0)) //检查堆块的size位
          {
            errstr = "malloc(): memory corruption (fast)";
          errout:
            malloc_printerr (check_action, errstr, chunk2mem (victim), av);
            return NULL;//如果size位不符合fastbin的idx,则报错
          }
        check_remalloced_chunk (av, victim, nb);  //检查重分配堆块
        void *p = chunk2mem (victim);             //堆块指针转为指向返回内存的宏
        alloc_perturb (p, bytes);                 //堆块清0
        return p;
      }
  }

大致含义即为从fastbinsY链表数组找到对应的下标，然后从头取出fastbin，清零后返回给用户。

从fastbin中取出的链表会有如下操作/检查

检查size和fastbin的idx是否符合

step3 若在small范围当中

下面继续_int_malloc，执行到这里表示超出fastbin大小范围，或者说使用fastbin分配失败，然后就会判断其进入smallbin的判断当中

/*
    如果请求的size大小属于smallbin范围，我们则检查通用的bins数组.  Since these "smallbins"
     hold one size each, no searching within bins is necessary.
     (如果是largebin范围, 我们必须等到 unsorted chunks 被处理为寻找最佳适配块. But for small ones, fits are exact
     anyway, so we can check now, which is faster.)
   */

  if (in_smallbin_range (nb))
    {
      idx = smallbin_index (nb);
      bin = bin_at (av, idx);                         //获取对应smallbin链表数组下标

      if ((victim = last (bin)) != bin)
      //如果非空bin->bk会指向单链表的第一个chunk
      //last(bin)为宏bin->bk,这里是判断他是否为空，若不为空则说明smallbin里面有堆块，进入下一步
        {
          if (victim == 0) /* 初始化检查，若为0则说明smallbin并未初始化，我们的small bin的各项还是0 */
            malloc_consolidate (av); //进行av初始化，也就是取出fast chunk各项堆块合并一下
          else                                 //这里说明我们已经经过了初始化，所以接下来就是普通的判断过程
            {
              bck = victim->bk;
              //此时victim为smallbin的尾块,此语句的作用是使smallbin->bk指向victim->bk,即倒数第二块,即为取出尾块的操作
                if (__glibc_unlikely (bck->fd != victim))         //检查
                {
                  errstr = "malloc(): smallbin double linked list corrupted";
                  goto errout;
                }
              set_inuse_bit_at_offset (victim, nb);         //设置相邻下一个堆块的inuse位
              bin->bk = bck;
              bck->fd = bin;  //从尾部脱链,将最后一块取出链表

              if (av != &main_arena)
                victim->size |= NON_MAIN_ARENA;
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);                  //清空bytes字节大小的值
              return p;
            }
        }
    }

我们对于smallbin的分配也十分简单，那就是直接从尾部开始取，但是在取之前我们会判断arena是否进行过初始化，若没有进行初始化，则调用malloc_consolidate进行初始化

题外话：malloc_consolidate

这是单独的malloc_consolidate函数讲解，与上面_int_malloc单独分开，整体代码如下：

/*
  ------------------------- malloc_consolidate -------------------------
  malloc_consolidate是一个用来拆除fastbins中chunk的特殊free()函数.
  free()本身不能用于此目的，因为在其他情况下他可能将堆块放入fastbins当中  因此我们需要使用一个相似的操作来代替free()代码。
  当然，因为该代码在malloc的工程中是第一次被调用 ，他是一个极佳的位置来触发我们的初始化代码
*/
static void malloc_consolidate(mstate av)
{
  mfastbinptr*    fb;                 /* 目前正在被合并的fastbin chunk指针
/
*  mfastbinptr*
    maxfb;              /* last fastbin (for loop control) 
/
*  mchunkptr       p;                  /*
 current chunk being consolidated 
/
*  mchunkptr       nextp;              /*
 next chunk to consolidate 
/
*  mchunkptr       unsorted_bin;       /*
 bin header 
/
*  mchunkptr       first_unsorted;     /*
 chunk to link to */
  /* These have same use as in free() */
  mchunkptr       nextchunk;
  INTERNAL_SIZE_T size;
  INTERNAL_SIZE_T nextsize;
  INTERNAL_SIZE_T prevsize;
  int             nextinuse;
  mchunkptr       bck;
  mchunkptr       fwd;
  /*
    如果max_fast 为0，则说明arena并未初始化，因此执行下面的步骤
  */
  if (get_max_fast () != 0) {
    clear_fastchunks(av);//将fastbin的inuse位都置0
    unsorted_bin = unsorted_chunks(av);                         //获取unsorted bin的bins数组下标
    /*
      移除fastbins当中所有的chunk，然后进行合并，再紧接着放入我们的unsorted bin链条当中.
      Among other reasons for doing this,placing in unsorted bin avoids needing to calculate actual bins
      until malloc is sure that chunks aren't immediately going to be reused anyway.
    */
    maxfb = &fastbin (av, NFASTBINS - 1);                 //获取最大块的fastbin
    fb = &fastbin (av, 0);                                //获取最小块的fastbin
    do {
      p = atomic_exchange_acq (fb, 0);         //纳米交换！小子
      if (p != 0) {                            //如果说换出来的堆块指针非0，那么就说明该链条上面存在fastbin堆块
    do {
      check_inuse_chunk(av, p);         
      //查询下一个堆块的inuse来表示该堆块是否被使用，以及是否是mmap分配，但这里没怎么理解，
      //因为fastbin的inuse不应该都是始终为1的么，这里可能是检查紧邻的下一个堆块可能不属于fastbin范围时的检测
      nextp = p->fd;
      /* Slightly streamlined version of consolidation code in free() */
      size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);   //获取堆块size
      nextchunk = chunk_at_offset(p, size); //为一个宏，这里即为p+size
      nextsize = chunksize(nextchunk);                 //获取next堆块的size
      if (!prev_inuse(p)) {                         //查看p堆块前面的堆块是否分配，若未分配则进行下面的步骤
        prevsize = p->prev_size;

        size += prevsize;
        p = chunk_at_offset(p, -((long) prevsize));         //即为上面p + prevsize前一个块地址
        unlink(av, p, bck, fwd);                 //大名鼎鼎的unlink，在2.23版本表现为一个宏
      }
      if (nextchunk != av->top) {                          //如果nextchunk不是topchunk，则往下走
        nextinuse = inuse_bit_at_offset(nextchunk, nextsize); //判断nextchunk的下一块inuse位
        if (!nextinuse) {         //nextinuse为0说明该块是空的
          size += nextsize;
          unlink(av, nextchunk, bck, fwd);                 //unlink该nextchunk
        } else         //nextinuse 为1则说明该块正在被使用，因此清该堆块的inuse位为0
          clear_inuse_bit_at_offset(nextchunk, 0);
        first_unsorted = unsorted_bin->fd;
        unsorted_bin->fd = p;
        first_unsorted->bk = p;         //将fastbin取出/合并后的堆块存入unsortedbin，从链头放入
        if (!in_smallbin_range (size)) { //如果为largebin范围，则置空fd/bk_nextsize
          p->fd_nextsize = NULL;
          p->bk_nextsize = NULL;
        }
        set_head(p, size | PREV_INUSE);  //设置堆块细节，也就是头部分和脚部分
        p->bk = unsorted_bin;
        p->fd = first_unsorted;
        set_foot(p, size);
      }
      else {                 //如果nextchunk是topchunk
        size += nextsize;  //合并为top_chunk
        set_head(p, size | PREV_INUSE);
        av->top = p;
      }
    } while ( (p = nextp) != 0); //二层则遍历 fastbin 单链表得到每一个相同尺寸的空闲 chunk。
      }
    } while (fb++ != maxfb); //第一层遍历 fastbinY 数组，得到每一个固定尺寸的 fastbin 单链表
  }
  else {                 //若未初始化，则使用下面代码进行一个简单的初始化
    malloc_init_state(av);
    check_malloc_state(av);
  }
}

对每一个 chunk，首先尝试向后合并。合并操作即更新 p 的 size 以及指向，然后调用 unlink()宏将后方 chunk 从其链接的 bin 中脱链:

if (!prev_inuse(p)) {
  prevsize = p->prev_size;
  size += prevsize;
  p = chunk_at_offset(p, -((long) prevsize));
  unlink(av, p, bck, fwd);
}

然后尝试向前合并。向前合并情况复杂点，它的处理是这样的：

如果向前相邻 top_chunk，则直接合并到 top_chunk 后完事，不再理会 unsorted_bin
如果向前不相邻 top_chunk，则尝试向前合并后插入到 unsorted_bin

代码如下:

if (nextchunk != av->top) {
        nextinuse = inuse_bit_at_offset(nextchunk, nextsize);

        if (!nextinuse) {
          size += nextsize;
          unlink(av, nextchunk, bck, fwd);
        } else
          clear_inuse_bit_at_offset(nextchunk, 0);

        first_unsorted = unsorted_bin->fd;
        unsorted_bin->fd = p;
        first_unsorted->bk = p;

        if (!in_smallbin_range (size)) {
          p->fd_nextsize = NULL;
          p->bk_nextsize = NULL;
        }

        set_head(p, size | PREV_INUSE);
        p->bk = unsorted_bin;
        p->fd = first_unsorted;
        set_foot(p, size);
      }

      else {
        size += nextsize;
        set_head(p, size | PREV_INUSE);
        av->top = p;
      }

总结:

若 get_max_fast() 返回 0，则进行堆的初始化工作，然后进入第 7 步
从 fastbin 中获取一个空闲 chunk
尝试向后合并
若向前相邻 top_chunk，则直接合并到 top_chunk，然后进入第 6 步
否则尝试向前合并后，插入到 unsorted_bin 中
获取下一个空闲 chunk，回到第 2 步，直到所有 fastbin 清空后进入第 7 步
退出函数

题外话： unlink宏

其中unlink宏如下：

/* 从bin链表数组当中取出一个chunk */
#define unlink(AV, P, BK, FD) {                                            \
    FD = P->fd;                                                                      \
    BK = P->bk;                                                                      \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                      \  //检查一：p->fd->bk == p && p->bk->fd == p
      malloc_printerr (check_action, "corrupted double-linked list", P, AV);  \
    else {                                                                      \
        FD->bk = BK;                                                              \

        BK->fd = FD;                                                              \         //脱链操作
        if (!in_smallbin_range (P->size)                                      \          //不在smallbin范围，那么就是在largebin范围
            && __builtin_expect (P->fd_nextsize != NULL, 0)) {                      \
        if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0)              \

        || __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0))    \         //检查二：p->fd_nextsize->bk_nextsize == p && p->bk_nextsize->fd_nextsize == p
          malloc_printerr (check_action,                                      \
                   "corrupted double-linked list (not small)",    \
                   P, AV);                                              \
            if (FD->fd_nextsize == NULL) {                                      \ //如果在小链条中
                if (P->fd_nextsize == P)                                      \         //如果largebin链条就一个，则往下走
                  FD->fd_nextsize = FD->bk_nextsize = FD;                      \
                else {                                                              \
                    FD->fd_nextsize = P->fd_nextsize;                              \
                    FD->bk_nextsize = P->bk_nextsize;                              \
                    P->fd_nextsize->bk_nextsize = FD;                              \
                    P->bk_nextsize->fd_nextsize = FD;                              \         //largebin脱链
                  }                                                              \
              } else {                                                              \                 //若刚好p为小链条最后一个
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;                      \
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;                      \
              }                                                                      \
          }                                                                      \
      }                                                                              \
}

step4 清理堆块碎片

上面我们讲到了smallbin判断，但如果说我们请求的bytes既不在fastbin范围，也不在smallbin范围当中呢

/*
     如果我们是largebinsize的请求, 在我们继续之前，我们先合并一下咱们的fastbin，也就是调用mallock_consolidate函数
     虽然这个举动看起来是在还仍保留有大量空闲区块的同时来消除所有的fastbin堆块, 但是他避免了因fastbin堆块导致的碎片问题。
   */
  else
    {
      idx = largebin_index (nb);         //获取largebin 的下标
      if (have_fastchunks (av))

        malloc_consolidate (av);         //合并fastbinchunk，然后摘除他们
    }

在我们调用完malloc_consolidate之后，我们进入接下来的判断，也就是进入我们的外部大循环：

step5 大循环-unsortedbin清理

    /*
    处理最近释放或剩余的块，仅在完全匹配的情况下获取一个，
    或者，如果这是一个小请求，则chunk是最近非完全匹配的剩余块。将其他已遍历的块放在bin中。
    请注意，在任何例程中，这一步骤都是将块放置在bin中的唯一位置。
    这里需要外循环，因为我们可能直到malloc接近尾声时才意识到我们应该合并，
    所以必须这样做并重试。这种情况最多发生一次，而且只有当我们需要扩展内存来为“small”请求提供服务时才会发生
   */

  for (;; )
    {
      int iters = 0;
      while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))         
      //判断条件是unsorted bin不为空，并且此时使得victim指向unsorted 链表尾部
        {
          bck = victim->bk;//bck为倒数第二个块
          if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0) //检测一：size大小最小最大检查
              || __builtin_expect (victim->size > av->system_mem, 0))
            malloc_printerr (check_action, "malloc(): memory corruption",
                             chunk2mem (victim), av);
          size = chunksize (victim);

          /*
             如果是个小请求，如果最近的remainder剩余块是unsortedbin当中唯一的块的话，
            尝试使用他来分配  This helps promote locality for
             runs of consecutive small requests. 这是最佳适配的唯一例外
            并且适用于当这里没有最佳适配的小堆块
           */

          if (in_smallbin_range (nb) &&
              bck == unsorted_chunks (av) &&
              victim == av->last_remainder &&
              (unsigned long) (size) > (unsigned long) (nb + MINSIZE))         
              //请求字节smallbin范围+unsortedbin只有一个堆块，同样也是last_remainder+这个剩余块size大于请求size
            {
              /* split and reattach remainder 直接切割unsortedbin返回,并将剩余部分留在unsortedbin*/
              remainder_size = size - nb;
              remainder = chunk_at_offset (victim, nb);
              unsorted_chunks (av)->bk = unsorted_chunks (av)->fd = remainder;
              av->last_remainder = remainder;
              remainder->bk = remainder->fd = unsorted_chunks (av);
              if (!in_smallbin_range (remainder_size))         //如果剩余块为largebin范围
                {
                  remainder->fd_nextsize = NULL;
                  remainder->bk_nextsize = NULL;
                }

              set_head (victim, nb | PREV_INUSE |
                        (av != &main_arena ? NON_MAIN_ARENA : 0));
              set_head (remainder, remainder_size | PREV_INUSE);
              set_foot (remainder, remainder_size);

              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;                                  //这里是划分出了相应堆块，直接返回
            }

          /* 从unsorted 链表移除我们的victim */
          unsorted_chunks (av)->bk = bck;    //unsortedbin->bk指向倒数第二个chunk
          bck->fd = unsorted_chunks (av);    //倒数第二个chunk->fd指向unsortedbin

          /* Take now instead of binning if exact fit */

          if (size == nb)         //最佳适配
            {
              set_inuse_bit_at_offset (victim, size);
              if (av != &main_arena)
                victim->size |= NON_MAIN_ARENA;
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
            }

          /* 走到这儿说明并没有最佳匹配，因此在这里就开始归还堆块给相应的bin */

          if (in_smallbin_range (size))         //为smallbin范围
            {
              victim_index = smallbin_index (size);
              bck = bin_at (av, victim_index);
              fwd = bck->fd;
            }
          else                         //为largebin范围
            {
              victim_index = largebin_index (size);
              bck = bin_at (av, victim_index); //bck指向对应的bin
              fwd = bck->fd;                   //fwd为对应的bin的头块

              /* maintain large bins in sorted order */
              if (fwd != bck)                 //说明largebin链条不为空
                {
                  /* Or with inuse bit to speed comparisons */
                  size |= PREV_INUSE;
                  /* if smaller than smallest, bypass loop below */
                  assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
                  if ((unsigned long) (size) < (unsigned long) (bck->bk->size))  //bck->bk是尾块
                  //这里我们知道largebin的链条尾部是最小堆块，所以这里如果小于最小堆块的话那么直接放入链表尾部
                    {
                      fwd = bck;     //使fwd指向bin
                      bck = bck->bk; //使bck指向尾块
                      //bin->头->...->尾块->(在这里插入)->bin
                      //               ↑                 ↑
                      //              bck               fwd
                      //如此一来最后在bck与fwd插入的效果就等价于在尾部插入了

                      victim->fd_nextsize = fwd->fd;
                      //因为fd_nextsize忽略bin的头指针,所以设置victim->fd_nextsize为头块
                      //fd_nextsize指向下一个第一个大小不同的块
                      victim->bk_nextsize = fwd->fd->bk_nextsize;
                      //此处不写victim->bk_nextsize = bck;
                      //推测原因为当有多个和bck等大的chunk挂在尾部时,victim->bk_nextsize应该指向这些chunk的第一个块
                      fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
                      //头的bk = victim   ;   victim的bk的fd = victim
                    }
                  else                 
                  //若该堆块大于最小堆块，那么我们就从链头开始寻找，直至找到一个比他小的堆块，然后放到他前面
                    {
                      assert ((fwd->size & NON_MAIN_ARENA) == 0);
                      while ((unsigned long) size < fwd->size)
                        {
                          fwd = fwd->fd_nextsize;
                          assert ((fwd->size & NON_MAIN_ARENA) == 0);
                        }

                      if ((unsigned long) size == (unsigned long) fwd->size)
                        /* 如果是等于，总是插入小链条的第二个位置  */
                        fwd = fwd->fd;
                        //same1->(插入位置)->same2->same3
                        //  ↑                 ↑
                        // bck       ->      fwd
                      else //如果size>fwd->size
                        {
                                                                  //此时chunk的大小关系: fwd->bk > victim > fwd
                          victim->fd_nextsize = fwd;              //所以victim下一块大小不同的块就是fwd
                          victim->bk_nextsize = fwd->bk_nextsize; //victim上一块大小不同的块就是fwd->bk_nextsize
                          fwd->bk_nextsize = victim;              //更新fwd上一块大小不同的块为victim
                          victim->bk_nextsize->fd_nextsize = victim;  
                          //更新victim的上一个大小不同的块的(下一个大小不同的块)为victim本身
                        }
                      bck = fwd->bk;//无论何种情况将bck更新为fwd的上一块,为了在bck和fwd中间插入块准备
                    }
                }
              else         //如果largebin链条为空
                victim->fd_nextsize = victim->bk_nextsize = victim;
            }

          mark_bin (av, victim_index);
          //初始情况:(下划线表示成员变量,箭头表示链表指向)
          //bck_fd -> fwd
          //bck    <- fwd_bk
          victim->bk = bck;//这行执行完:
          //bck_fd -> fwd
          //   bck <- fwd_bk
          //   bck <- victim_bk
          victim->fd = fwd;//这行执行完:
          //victim_fd -> fwd
          //   bck_fd -> fwd
          //      bck <- fwd_bk
          //      bck <- victim_bk
          fwd->bk = victim;//这行执行完:
          //victim_fd -> fwd
          //   bck_fd -> fwd
          //      bck <- victim_bk <- fwd_bk
          bck->fd = victim;//这行执行完:
          //bck_fd -> victim_fd -> fwd
          //   bck <- victim_bk <- fwd_bk
          //最后的效果就是在bck与fwd之间插入victim
          //对于smallbin而言是在头部插入victim
          //对于largebin而言是在bck与fwd之间插入victim,且当大小相同时,插入到大小相同这一串子链表的第二个位置

#define MAX_ITERS       10000
          if (++iters >= MAX_ITERS)
            break;
        }

step6 大循环-largebin堆块寻找

经过unsortedbin循环之后，我们再来判断

/*
        如果是large请求, 寻找最小适配块.  Use the skip list for this.
      */

     if (!in_smallbin_range (nb))
       {
         bin = bin_at (av, idx);//获取bin指针

         /* skip scan if empty or largest chunk is too small */
         if ((victim = first (bin)) != bin &&
             (unsigned long) (victim->size) >= (unsigned long) (nb)) //判断非空且最大块大于请求nb
           {
             victim = victim->bk_nextsize;
             while (((unsigned long) (size = chunksize (victim)) <
                     (unsigned long) (nb)))         //从链表尾部开始遍历，寻找到大于或等于他的块
               victim = victim->bk_nextsize;

             /* Avoid removing the first entry for a size so that the skip
                list does not have to be rerouted.  */
             //如果victim不为最后一个块，且相同大小堆块构成的子链表当中有两个或以上的相同大小的堆块
             if (victim != last (bin) && victim->size == victim->fd->size)  
               victim = victim->fd;         //始终取第二个

             remainder_size = size - nb;         //判断是否是非最佳适配，可能会多出部分
             unlink (av, victim, bck, fwd);         //脱链

             /* Exhaust */
             if (remainder_size < MINSIZE)
               {
                 set_inuse_bit_at_offset (victim, size);
                 if (av != &main_arena)
                   victim->size |= NON_MAIN_ARENA;
               }
             /* Split */
             else         //这里是存在剩余部分，然后我们存放在unsiorted bin 当中
               {
                 remainder = chunk_at_offset (victim, nb);
                 /* We cannot assume the unsorted list is empty and therefore
                    have to perform a complete insert here.  */
                 bck = unsorted_chunks (av);
                 fwd = bck->fd;
     if (__glibc_unlikely (fwd->bk != bck))
                   {
                     errstr = "malloc(): corrupted unsorted chunks";
                     goto errout;
                   }
                 remainder->bk = bck;
                 remainder->fd = fwd;
                 bck->fd = remainder;
                 fwd->bk = remainder;//在unsortedbin头部插入remainder
                 if (!in_smallbin_range (remainder_size))
                   {
                     remainder->fd_nextsize = NULL;
                     remainder->bk_nextsize = NULL;
                   }
                 set_head (victim, nb | PREV_INUSE |
                           (av != &main_arena ? NON_MAIN_ARENA : 0));
                 set_head (remainder, remainder_size | PREV_INUSE);
                 set_foot (remainder, remainder_size);
               }
             check_malloced_chunk (av, victim, nb);
             void *p = chunk2mem (victim);
             alloc_perturb (p, bytes);
             return p;
           }
       }

*step7 大循环-位图分配*

然后接着往下走：

/*
       从下一个最大的bin开始，通过扫描bin来搜索chunk。
      此搜索严格按照最佳匹配进行；即选择适合的最小的（具有接近最近最少使用的关系）块。
       位图避免了检查大多数块是否为非空。
      在预热阶段跳过所有存储箱的特殊情况下，还没有返回块，这比看起来更快。
     */

    ++idx;
    bin = bin_at (av, idx);
    block = idx2block (idx);                 //宏，右移5个bit位
    map = av->binmap[block];
    bit = idx2bit (idx);

    for (;; )
      {
        /* Skip rest of block if there are no more set bits in this block.  */
        if (bit > map || bit == 0)
          {
            do
              {
                if (++block >= BINMAPSIZE) /* out of bins */
                  goto use_top;
              }
            while ((map = av->binmap[block]) == 0);

            bin = bin_at (av, (block << BINMAPSHIFT));
            bit = 1;
          }

        /* Advance to bin with set bit. There must be one. */
        while ((bit & map) == 0)
          {
            bin = next_bin (bin);
            bit <<= 1;
            assert (bit != 0);
          }

        /* Inspect the bin. It is likely to be non-empty */
        victim = last (bin);

        /*  If a false alarm (empty bin), clear the bit. */
        if (victim == bin)
          {
            av->binmap[block] = map &= ~bit; /* Write through */
            bin = next_bin (bin);
            bit <<= 1;
          }

        else
          {
            size = chunksize (victim);

            /*  We know the first chunk in this bin is big enough to use. */
            assert ((unsigned long) (size) >= (unsigned long) (nb));

            remainder_size = size - nb;

            /* unlink */
            unlink (av, victim, bck, fwd);

            /* Exhaust */
            if (remainder_size < MINSIZE)
              {
                set_inuse_bit_at_offset (victim, size);
                if (av != &main_arena)
                  victim->size |= NON_MAIN_ARENA;
              }

            /* 切块，然后剩余的给unsorted bin */
            else
              {
                remainder = chunk_at_offset (victim, nb);

                /* We cannot assume the unsorted list is empty and therefore
                   have to perform a complete insert here.  */
                bck = unsorted_chunks (av);
                fwd = bck->fd;
    if (__glibc_unlikely (fwd->bk != bck))
                  {
                    errstr = "malloc(): corrupted unsorted chunks 2";
                    goto errout;
                  }
                remainder->bk = bck;
                remainder->fd = fwd;
                bck->fd = remainder;
                fwd->bk = remainder;

                /* advertise as last remainder */
                if (in_smallbin_range (nb))
                  av->last_remainder = remainder;
                if (!in_smallbin_range (remainder_size))
                  {
                    remainder->fd_nextsize = NULL;
                    remainder->bk_nextsize = NULL;
                  }
                set_head (victim, nb | PREV_INUSE |
                          (av != &main_arena ? NON_MAIN_ARENA : 0));
                set_head (remainder, remainder_size | PREV_INUSE);
                set_foot (remainder, remainder_size);
              }
            check_malloced_chunk (av, victim, nb);
            void *p = chunk2mem (victim);
            alloc_perturb (p, bytes);
            return p;
          }
      }

step7 使用top chunk

use_top:
    /*
     If large enough, split off the chunk bordering the end of memory
     (held in av->top). Note that this is in accord with the best-fit
     search rule. In effect, av->top is treated as larger (and thus
     less well fitting) than any other available chunk since it can
     be extended to be as large as necessary (up to system
     limitations).
     We require that av->top always exists (i.e., has size >=
     MINSIZE) after initialization, so if it would otherwise be
     exhausted by current request, it is replenished. (The main
     reason for ensuring it exists is that we may need MINSIZE space
     to put in fenceposts in sysmalloc.)
    */

  victim = av->top;
  size = chunksize (victim);

  if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))         //如果topchunk够分配，直接切割
    {
      remainder_size = size - nb;
      remainder = chunk_at_offset (victim, nb);
      av->top = remainder;
      set_head (victim, nb | PREV_INUSE |
                (av != &main_arena ? NON_MAIN_ARENA : 0));
      set_head (remainder, remainder_size | PREV_INUSE);

      check_malloced_chunk (av, victim, nb);
      void *p = chunk2mem (victim);
      alloc_perturb (p, bytes);
      return p;
    }

  /* When we are using atomic ops to free fast chunks we can get
     here for all block sizes.  */
  else if (have_fastchunks (av))         
  //如果topchunk不够，且有fastchunk，那么进行malloc_consolidate进行合并fast，然后在接着分配
    {
      malloc_consolidate (av);
      /* restore original bin index */
      if (in_smallbin_range (nb))
        idx = smallbin_index (nb);
      else
        idx = largebin_index (nb);
    }

  /*
     Otherwise, relay to handle system-dependent cases
   */
  else         //如果上述都不满足，则调用系统分配
    {
      void *p = sysmalloc (nb, av);
      if (p != NULL)
        alloc_perturb (p, bytes);
      return p;
    }
}

总结流程图

参考链接:

https://blog.csdn.net/Plus_RE/article/details/79265805

https://peiwithhao.github.io/2023/06/17/Malloc-Free/

0x02 free 步骤

step1 free判断

首先就是我们的__libc_free

void __libc_free (void *mem)
{
  mstate ar_ptr;
  mchunkptr p;                          /* chunk corresponding to mem */

  void (*hook) (void *, const void *)
    = atomic_forced_read (__free_hook);         //__free_hook
  if (__builtin_expect (hook != NULL, 0))
    {
      (*hook)(mem, RETURN_ADDRESS (0));
      return;
    }

  if (mem == 0)                              /* free(0) has no effect */
    return;

  p = mem2chunk (mem);

  if (chunk_is_mmapped (p))                       /* release mmapped memory. */
    {
      /* see if the dynamic brk/mmap threshold needs adjusting */
      if (!mp_.no_dyn_threshold
          && p->size > mp_.mmap_threshold
          && p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
        {
          mp_.mmap_threshold = chunksize (p);
          mp_.trim_threshold = 2 * mp_.mmap_threshold;
          LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
                      mp_.mmap_threshold, mp_.trim_threshold);
        }
      munmap_chunk (p);
      return;
    }

  ar_ptr = arena_for_chunk (p);
  _int_free (ar_ptr, p, 0);
}

总结

进入 __libc_free()，如果释放空指针，则不做任何操作，进入第 6 步
如果释放的是 mmap() 分配的堆空间，则进入第 3 步，否则进入第 5 步
根据具体情况修改 mmap 的分配阈值以及系统的收缩阈值
进入 munmap_chunk() 完成剩余的释放工作，返回后进入第 6 步
进入 _int_free()
1. 如果和 top_chunk 不相邻并且尺寸在 fastbin 范围内，则插入到 fastbin 中，然后进入第 h 步
2. 否则判断是否通过 mmap() 分配，如果是则进入第 c 步，否则进入第 d 步
3. 调用 munmap_chunk() 函数负责后继释放工作，返回后进入第 h 步
4. 尝试向后合并
5. 如果向前相邻 top_chunk，则合并到 top_chunk，然后进入第 g 步
6. 否则尝试向前合并后，插入到 unsorted_bin
7. 如果待释放空间的总尺寸（包含合并后）大于或等于 FASTBIN_CONSOLIDATION_THRESHOLD，则调用 malloc_consolidate() 整理 fastbin 到 unsorted_bin 中
8. 返回到 __libc_free()
完成释放

Unlink宏

只要被释放的chunk不属于fastbin,glibc就会尝试将其与物理地址相邻的chunk进行合并,先尝试向后(物理低地址),再尝试向前(物理高地址)合并
unlink为宏, 其定义如下:

/* 从bin链表数组当中取出一个chunk */
#define unlink(AV, P, BK, FD) {                                            \
    FD = P->fd;                                                                      \
    BK = P->bk;                                                                      \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                      \  //检查一：p->fd->bk == p && p->bk->fd == p
      malloc_printerr (check_action, "corrupted double-linked list", P, AV);  \
    else {                                                                      \
        FD->bk = BK;                                                              \         
        BK->fd = FD;                                                              \         //脱链操作
        if (!in_smallbin_range (P->size)                                      \          //不在smallbin范围，那么就是在largebin范围
            && __builtin_expect (P->fd_nextsize != NULL, 0)) {                      \
        if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0)              \  
        || __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0))    \         //检查二：p->fd_nextsize->bk_nextsize == p && p->bk_nextsize->fd_nextsize == p
          malloc_printerr (check_action,                                      \
                   "corrupted double-linked list (not small)",    \
                   P, AV);                                              \
            if (FD->fd_nextsize == NULL) {                                      \ //如果在小链条中
                if (P->fd_nextsize == P)                                      \         //如果largebin链条就一个，则往下走
                  FD->fd_nextsize = FD->bk_nextsize = FD;                      \ 
                else {                                                              \
                    FD->fd_nextsize = P->fd_nextsize;                              \ 
                    FD->bk_nextsize = P->bk_nextsize;                              \
                    P->fd_nextsize->bk_nextsize = FD;                              \
                    P->bk_nextsize->fd_nextsize = FD;                              \         //largebin脱链
                  }                                                              \
              } else {                                                              \                 //若刚好p为小链条最后一个
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;                      \
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;                      \
              }                                                                      \
          }                                                                      \
      }                                                                              \
}

//主要步骤
  /*  
    FD = P->fd;                                                 
    BK = P->bk;                                              
    FD->bk = BK;                                            
    BK->fd = FD; 
  */

P为需要脱链的chunk,正常情况unlink过程如下图:

以 32 位为例，假设堆内存最初的布局是下面的样子

现在有物理空间连续的两个 chunk（Q，Nextchunk），其中 Q 处于使用状态、Nextchunk 处于释放状态。那么如果我们通过某种方式（比如溢出）将 Nextchunk 的 fd 和 bk 指针修改为指定的值。则当我们 free(Q) 时

glibc 判断这个块是 small chunk
判断后向合并，发现后一个 chunk 处于使用状态，不需要后向合并
判断前向合并，发现前一个 chunk 处于空闲状态，需要合并
继而对 Nextchunk 采取 unlink 操作

那么 unlink 具体执行的效果是什么样子呢？我们可以来分析一下

//主要步骤
  /*  
    FD = P->fd;                1                                 
    BK = P->bk;                2                              
    FD->bk = BK;               3                             
    BK->fd = FD;               4
  */

观察1&3(32位):

1	FD = P->fd ; *(FD + 12) = BK = P->bk

将P->fd改为target addr -12 , P->bk改为expect value后效果如下:

FD=P->fd = target addr -12
BK=P->bk = expect value
FD->bk = BK，即 *(target addr-12+12) = BK = expect value
BK->fd = FD，即 *(expect value +8) = FD = target addr-12

看起来我们似乎可以通过 unlink 直接实现任意地址读写的目的，但是我们还是需要确保 expect value +8 地址具有可写的权限。

比如说我们将 target addr 设置为某个 got 表项，那么当程序调用对应的 libc 函数时，就会直接执行我们设置的值（expect value）处的代码。需要注意的是，expect value+8 处的值被破坏了，需要想办法绕过。

当前的 unlink

但是，现实是残酷的。。我们刚才考虑的是没有检查的情况，但是一旦加上检查，就没有这么简单了。我们看一下对 fd 和 bk 的检查

1 2	// fd bkif (__builtin_expect (FD->bk != P \|\| BK->fd != P, 0)) \ malloc_printerr (check_action, "corrupted double-linked list", P, AV); \

其要求:

FD->bk == P 即 P->fd->bk = P 即 *(*(P + 8) + 12) = P
BK->fd == P 即 P->bk->fd = P 即 *(*(P + 12) + 8) = P

那么我们上面所利用的修改 GOT 表项的方法就可能不可用了。但是我们可以通过伪造的方式绕过这个机制。

首先我们通过覆盖，将 nextchunk 的 FD 指针指向了 fakeFD，将 nextchunk 的 BK 指针指向了 fakeBK 。那么为了通过验证，我们需要

fakeFD -> bk == P 即 *(fakeFD + 12) == P
fakeBK -> fd == P 即 *(fakeBK + 8) == P

当满足上述两式时，可以进入 Unlink 的环节，进行如下操作：

fakeFD -> bk = fakeBK 即 *(fakeFD + 12) = fakeBK
fakeBK -> fd = fakeFD 即 *(fakeBK + 8) = fakeFD

如果让 fakeFD + 12 和 fakeBK + 8 指向同一个指向 P 的指针，那么unlink的操作变为：

chunk* ptr = P     //假设指向P的指针名为ptr
*(P+ 8) = fakeFD   //P偏移 8的位置为P->fd
*(P+12) = fakeBK   //P偏移12的位置为P->bk
/*
为了满足以下条件:
FD->bk == P 即 P->fd->bk = P 即 *(*(P +  8) + 12) = P
BK->fd == P 即 P->bk->fd = P 即 *(*(P + 12) +  8) = P
故将fakeFD与fakeBK赋值如下
*/
*(P+ 8) = fakeFD = ptr-12
*(P+12) = fakeBK = ptr- 8
//最终效果:
FD = P->fd; //即 FD = ptr-12
BK = P->bk; //即 BK = ptr- 8
FD->bk = BK //即 将 FD->bk == *(FD+12) == *(ptr-12+12) == P 赋值为 BK == ptr- 8 = &P- 8
//即P = &P - 8
BK->fd = FD;//即 将 BK->fd == *(BK+ 8) == *(ptr- 8+ 8) == P 赋值为 FD == ptr-12 = &P-12
//即P = &P - 12

总结一下,我们需要进行的操作:

把要unlink的chunk P 的 fd 赋值为 &P - 12
把要unlink的chunk P 的 bk 赋值为 &P - 8

所取得的效果:

使得 P=&P-12 即使得 P 指向他自己所在地址偏移-12的地方

GLIBC堆管理_前置知识

堆