Slab allocation

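Although I have published quite a few Linux-related posts, and some of them even look deep (they only look that way), I honestly don't have that much passion for understanding how the Linux kernel works. I usually only sit down and read kernel code when I run into a strange phenomenon or want to understand one specific problem. Today's topic is no exception, so let me first describe the situation (as usual, no real names):

Colleague A: I called OSW_MEMALLOC (the equivalent of malloc; our team wrote a set of wrapper interfaces for cross-platform reasons), and it told me that the memory it had just allocated was already in use ...

I found this description fascinating (I have no desire to go into the bug that caused it; somebody simply messed up a pointer). What makes it interesting is this: malloc asks the system for a chunk of memory. If the system has run out of memory and cannot hand any out, I can understand that (somebody wrote a program with a memory leak). But how can the system hand you a chunk of memory and then tell you that the chunk is already in use??

With the help of another colleague (Parrot), I started reading how the Linux kernel handles memory, and found the answer to this question.

Before we start reading, let me introduce the slab allocator. The slab allocator is a memory allocator proposed by an engineer at Sun. The basic idea is described in the following passage (taken from Wikipedia):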

The fundamental idea behind slab allocation technique is based on the observation that some kernel data objects are frequently created and destroyed after they are not needed anymore. This implies that for each allocation of memory for these data objects, some time is spent to find the best fit for that data object. Moreover, deallocation of the memory after destruction of the object contributes to fragmentation of the memory, which burdens the kernel some more to rearrange the memory.

With slab allocation, using certain system calls by the programmer, memory chunks suitable to fit data objects of certain type or size are preallocated. The slab allocator keeps track of these chunks, known as caches, so that when a request to allocate memory for a data object of certain size is received it can instantly satisfy the request with an already allocated slot. Destruction of the object, however, does not free up the memory, but only opens a slot which is put in the list of free slots by the slab allocator. The next call to allocate memory of the same size will return the now unused memory slot. This process eliminates the need to search for suitable memory space and greatly alleviates memory fragmentation. In this context a slab is one or more contiguous pages in the memory containing pre-allocated memory chunks.

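Put simply, it is a memory pool. The slab allocator asks the operating system for a chunk of memory up front. When a caller later asks for memory with a malloc-style call (kmalloc inside the kernel), the slab allocator returns an unused part of the memory it already holds, without having to ask the kernel for pages again. A free simply puts the unused memory back into the pool. So why is going through the slab allocator more efficient than going to the kernel's page allocator every time? The quoted passage gives the reason: there is no search for a suitable block and far less fragmentation. As for the structure, the top level of SLAB is a cache chain made up of multiple kmem_cache structures. Each kmem_cache keeps three lists of slabs: slabs_full, slabs_partial, and slabs_free. When a new allocation request arrives, slabs on slabs_partial are considered first; when memory has to be released back to the system, slabs on slabs_free are the first candidates.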
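The pool idea can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: toy_slab, toy_slab_alloc, toy_slab_free, toy_slab_state and the sizes are hypothetical names and values made up for this post, not the kernel's data structures. The free list is threaded through the free slots themselves, and the three states correspond to the three lists a slab would sit on.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_SIZE  64  /* size of one object slot (illustrative) */
#define OBJ_COUNT 8   /* number of slots in the slab (illustrative) */

/* A free slot stores the link to the next free slot inside itself. */
union slot {
    union slot *next_free;
    unsigned char payload[OBJ_SIZE];
};

/* Hypothetical single-slab pool; NOT the kernel's struct kmem_cache. */
struct toy_slab {
    union slot slots[OBJ_COUNT]; /* memory obtained once, up front */
    union slot *freelist;        /* head of the list of free slots */
    unsigned int inuse;          /* number of slots handed out */
};

/* Which of the three lists this slab would belong to. */
enum slab_state { SLAB_IS_FREE, SLAB_IS_PARTIAL, SLAB_IS_FULL };

static void toy_slab_init(struct toy_slab *s)
{
    s->freelist = NULL;
    s->inuse = 0;
    /* Push slots in reverse so that slot 0 ends up at the head. */
    for (size_t i = OBJ_COUNT; i-- > 0; ) {
        s->slots[i].next_free = s->freelist;
        s->freelist = &s->slots[i];
    }
}

/* Allocation pops the head of the free list: no search, no page request. */
static void *toy_slab_alloc(struct toy_slab *s)
{
    union slot *obj = s->freelist;

    if (obj == NULL)
        return NULL; /* slab is full */
    s->freelist = obj->next_free;
    s->inuse++;
    return obj;
}

/* Freeing pushes the slot back; the memory stays inside the pool. */
static void toy_slab_free(struct toy_slab *s, void *p)
{
    union slot *obj = p;

    assert(s->inuse > 0);
    obj->next_free = s->freelist;
    s->freelist = obj;
    s->inuse--;
}

static enum slab_state toy_slab_state(const struct toy_slab *s)
{
    if (s->inuse == 0)
        return SLAB_IS_FREE;
    if (s->inuse == OBJ_COUNT)
        return SLAB_IS_FULL;
    return SLAB_IS_PARTIAL;
}
```

In this sketch, freeing a slot and allocating again hands back the same slot, which is the "next call returns the now unused memory slot" behaviour from the quoted passage.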

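OK, but at this point I still had no plan to dig into the code, for a simple reason: slab's successor has arrived, SLUB. Compared with SLAB, the biggest difference in SLUB is that it simplifies the data structures (so I read online, and I don't feel like reading any further :p ).

Back to the original question: how does the system check whether a piece of memory has already been used?

In linux-2.6.28.7/mm/slub.c, I found the following function: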

static int check_object(struct kmem_cache *s, struct page *page,
                        void *object, int active)
{
    u8 *p = object;
    u8 *endobject = object + s->objsize;

    if (s->flags & SLAB_RED_ZONE) {
        unsigned int red = active ? SLUB_RED_ACTIVE : SLUB_RED_INACTIVE;

        if (!check_bytes_and_report(s, page, object, "Redzone",
                endobject, red, s->inuse - s->objsize))
            return 0;
    } else {
        if ((s->flags & SLAB_POISON) && s->objsize < s->inuse) {
            check_bytes_and_report(s, page, p, "Alignment padding",
                endobject, POISON_INUSE, s->inuse - s->objsize);
        }
    }

    if (s->flags & SLAB_POISON) {
        if (!active && (s->flags & __OBJECT_POISON) &&
            (!check_bytes_and_report(s, page, p, "Poison", p,
                    POISON_FREE, s->objsize - 1) ||
             !check_bytes_and_report(s, page, p, "Poison",
                    p + s->objsize - 1, POISON_END, 1)))
            return 0;
        /*
         * check_pad_bytes cleans up on its own.
         */
        check_pad_bytes(s, page, p);
    }

    if (!s->offset && active)
        /*
         * Object and freepointer overlap. Cannot check
         * freepointer while object is allocated.
         */
        return 1;

    /* Check free pointer validity */
    if (!check_valid_pointer(s, page, get_freepointer(s, p))) {
        object_err(s, page, p, "Freepointer corrupt");
        /*
         * No choice but to zap it and thus loose the remainder
         * of the free objects in this slab. May cause
         * another error because the object count is now wrong.
         */
        set_freepointer(s, p, NULL);
        return 0;
    }
    return 1;
}
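The "Redzone" branch of check_object verifies that the bytes right after the object still hold a known pattern. A rough user-space sketch of that idea follows. Everything in it is an assumption for illustration: find_bad_byte, toy_object, toy_redzone_ok and the TOY_* values are hypothetical and do not come from slub.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_RED_ACTIVE 0xcc /* illustrative red-zone pattern (assumption) */
#define TOY_OBJ_SIZE   16   /* illustrative payload size */
#define TOY_RED_SIZE   8    /* illustrative red-zone size */

/*
 * Return a pointer to the first byte in [start, start + len) that
 * differs from value, or NULL when the whole range matches.
 */
static const unsigned char *find_bad_byte(const unsigned char *start,
                                          unsigned char value, size_t len)
{
    assert(start != NULL);
    for (size_t i = 0; i < len; i++)
        if (start[i] != value)
            return &start[i];
    return NULL;
}

/* Payload followed directly by a guard area, as a toy layout. */
struct toy_object {
    unsigned char payload[TOY_OBJ_SIZE];
    unsigned char redzone[TOY_RED_SIZE];
};

static void toy_object_init(struct toy_object *o)
{
    memset(o->payload, 0, sizeof(o->payload));
    memset(o->redzone, TOY_RED_ACTIVE, sizeof(o->redzone));
}

/* Returns 1 when the red zone is intact and 0 when it was overwritten. */
static int toy_redzone_ok(const struct toy_object *o)
{
    return find_bad_byte(o->redzone, TOY_RED_ACTIVE,
                         sizeof(o->redzone)) == NULL;
}
```

Writing anywhere inside payload leaves toy_redzone_ok at 1, while a single stray byte in redzone makes it return 0, and find_bad_byte points at the damaged byte.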

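Looking at the POISON_* keywords in this function, I can roughly guess the logic: when the user allocates, SLUB fills the memory it hands out with POISON_INUSE, and when the memory is freed it fills it with POISON_FREE.

The code seems to bear this out fairly well: new_slab sets the memory to POISON_INUSE, while init_object sets it to POISON_FREE (with the last byte set to POISON_END).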

BTW,

#define POISON_INUSE 0x5a
#define POISON_FREE 0x6b
#define POISON_END 0xa5
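Putting check_object and these constants together suggests how "freshly allocated memory is already in use" can be detected: a freed object is filled with a known pattern, and if the pattern is no longer intact the next time the object is examined, somebody wrote through a stale pointer. Below is a minimal user-space sketch of that reading, assuming a single static slot. toy_alloc, toy_free, toy_poison and toy_poison_intact are hypothetical names, not kernel functions, and returning NULL on corruption is a simplification of this sketch (the kernel code above reports through check_bytes_and_report instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE  0x6b /* values as listed in this post */
#define POISON_END   0xa5
#define TOY_OBJ_SIZE 32   /* illustrative object size (assumption) */

static unsigned char toy_object[TOY_OBJ_SIZE]; /* the pool: one slot */

/* Fill a free object: POISON_FREE everywhere, POISON_END in the last byte. */
static void toy_poison(unsigned char *p, size_t size)
{
    assert(size >= 1);
    memset(p, POISON_FREE, size - 1);
    p[size - 1] = POISON_END;
}

/* Returns 1 if the poison pattern is untouched, 0 otherwise. */
static int toy_poison_intact(const unsigned char *p, size_t size)
{
    assert(size >= 1);
    for (size_t i = 0; i + 1 < size; i++)
        if (p[i] != POISON_FREE)
            return 0;
    return p[size - 1] == POISON_END;
}

/* Freeing poisons the object so that later writes become visible. */
static void toy_free(unsigned char *p)
{
    toy_poison(p, TOY_OBJ_SIZE);
}

/*
 * Allocation first checks that nobody touched the object while it was
 * free. NULL here stands for "this memory is already being used".
 */
static unsigned char *toy_alloc(void)
{
    if (!toy_poison_intact(toy_object, TOY_OBJ_SIZE))
        return NULL;
    return toy_object;
}
```

With this sketch, a write through a pointer that was already passed to toy_free makes the next toy_alloc fail, which matches the kind of pointer mistake behind Colleague A's report.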
