.NET Memory Management 1: Stack, Heap and Strings
Hey there. This week I decided to focus on memory management after a recent
interview. I had only surface-level knowledge of the topic and wanted to go deeper.
What is the stack, what is the heap, why does string live on the heap, what
happens behind the scenes when CLR creates an object, how does Span<T> work
without allocating memory, when does GC kick in, who benefits from
IDisposable - I'll cover all of it in order.
But first, let's start from the basics. What exactly are "stack and heap"?
Finding Space for Data
When you define a variable:
int x = 5;
string name = "Erdinc";
User user = new User { Name = "Erdinc", Age = 30 };
Where do x, name, and user live? In your computer's RAM - okay.
But where in RAM? In some corner, the middle? Answer: it depends on the type.
The .NET runtime uses two main regions for memory management:
| Region | Description |
|---|---|
| Stack | A LIFO (Last-In-First-Out) structure where method calls, local variables, and parameters live. Small, fast, automatically cleaned. |
| Heap | The region where dynamically allocated, long-lived objects live. Large, managed by the GC. |
Understanding the difference between the two is the first step in grasping the performance of the code you write.
Stack: LIFO Structure
The best analogy for understanding how the stack works is a stack of plates. When you want to add a plate, you put it on top. When you want to take one, you take the top one. You can't pull one from the middle. The last plate you put on is the first one taken. That's LIFO.
[Plate 3] ← last placed, first taken
[Plate 2]
[Plate 1] ← first placed, last taken
This is exactly what happens on the stack. Methods run stacked on top of each other, and when a method finishes, its variables are cleaned up immediately:
/*
* When Calculate finishes, all its variables (a, b, result) are
* completely removed from memory. But what exactly is this "removal"?
*
* At the CPU level, the stack is managed by the RSP (Stack Pointer) register.
* When entering a method, RSP is pulled down (sub rsp, N), allocating
* N bytes of space. When the method returns, RSP is pushed up
* (add rsp, N), "freeing" that space.
*
* This is just a "stack pointer adjustment." (The term "stack unwinding"
* usually refers to popping frames while an exception propagates.)
*
* Key point: the data in the freed region is NOT physically DELETED.
* It simply stops being "in use," and the next method call is free to
* overwrite the same addresses with its own values. C# never lets you
* observe such leftovers: the compiler rejects any read of an unassigned
* local variable, and by default locals are zero-initialized anyway.
*/
void Calculate()
{
int a = 10; // a = 10 written to stack
int b = 20; // b = 20 written to stack
int result = Add(a, b); // Add method called
} // When Calculate returns, RSP is pushed up, frame is freed
int Add(int x, int y)
{
int sum = x + y; // New frame for x, y, sum opened on stack
return sum; // When Add returns, its own frame is freed
}
The stack grows as methods are called and shrinks as methods return. Each method creates its own stack frame. The frame contains the method's parameters, local variables, and return address. Visually:
flowchart TB
subgraph T1["1. Calculate called"]
direction LR
A1["[Empty]"]
end
subgraph T2["2. int a = 10"]
direction LR
A2["a = 10"]
end
subgraph T3["3. int b = 20"]
direction LR
A3["b = 20"]
A3b["a = 10"]
end
subgraph T4["4. Add(10, 20) called"]
direction LR
A4["sum = 30"]
A4b["y = 20"]
A4c["x = 10"]
A4d["--- frame boundary ---"]
A4e["b = 20"]
A4f["a = 10"]
end
subgraph T5["5. Add returned (frame removed)"]
direction LR
A5["result = 30"]
A5b["b = 20"]
A5c["a = 10"]
end
subgraph T6["6. Calculate returned (all removed)"]
direction LR
A6["[Empty]"]
end
T1 --> T2 --> T3 --> T4 --> T5 --> T6
Notice: when Add returns, x, y, and sum are removed immediately, along
with the frame. We don't wait for the GC to come. This is called
deterministic cleanup.
Garbage Values and Memory Safety
We said earlier: the data in the freed stack region is not physically deleted. So what happens if we try to access this "unswept" data?
The C++ Side: Here's an Address, Go Wherever You Want
C++ puts up no barriers. If you declare a variable on the stack, don't initialize it, and try to read it:
#include <iostream>

void Foo()
{
int previousData = 42; // left a value on the stack
}
void Bar()
{
int x; // not initialized!
std::cout << x; // prints 42 or some other random number
}
When Bar is called right after Foo, its stack frame might land exactly on
the region Foo left behind, so x may read as 42 - but this is pure coincidence.
In another run, or with another compiler or optimization level, you may get a
completely different number. This is called undefined behavior. The compiler
is not required to reject it (at best you get a warning), and the program
usually doesn't crash; you just get wrong results without ever knowing it.
Even worse, C++ pointer arithmetic lets you step outside the stack entirely and access unrelated data within the same process:
int arr[3] = {1, 2, 3};
int* p = arr;
p += 100; // 100 elements past the array - undefined region
std::cout << *p; // reads whatever is there
On modern operating systems, virtual memory keeps you out of other programs' memory: each process has its own address space, and touching an unmapped address ends in a segfault. But within your own process, any mapped address can be read, whether or not it belongs to the data you meant.
The C# Side: The Compiler Blocks You
C# is the polar opposite of C++ in this regard. In ordinary code (outside unsafe blocks), C# memory operations are safe. Let's try the same scenario in C#:
void Bar()
{
int x;
Console.WriteLine(x); // COMPILE ERROR: Use of unassigned local variable 'x'
}
The C# compiler performs definite assignment analysis. It requires that a local variable has a value assigned on every code path before it is read. If it isn't assigned, you get an error at compile time - it never even reaches runtime.
The reason is the one described above: reading an uninitialized variable would mean getting whatever happens to be in memory at that moment, and such a random value can break your program's logic.
Consider a hospital software system:
// This code won't compile in C# - but let's assume it did
int insulinDose; // not assigned!
AdministerInsulin(patient, insulinDose); // applies a random dose
If insulinDose picked up a leftover value from memory (say, 999), the
patient could die. C# enforces the "assign before use" rule precisely to
prevent this scenario. C++ does not enforce it: a compiler may warn, but the
code still builds and the error silently keeps running.
You can use pointers in C#'s unsafe context, but that's a deliberate choice
and requires compiling with unsafe code enabled (the AllowUnsafeBlocks project
setting or the -unsafe compiler option). In normal C# code, you cannot
accidentally read an uninitialized value.
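To make the definite assignment rule concrete, here is a minimal sketch of code that the compiler accepts because every path assigns the local before it is read. The class and method names (DefiniteAssignmentDemo, Describe, ParseOrFallback) are made up for illustration.

```csharp
using System;

static class DefiniteAssignmentDemo
{
    // Every branch assigns 'label' before the read, so the compiler accepts it.
    public static string Describe(int n)
    {
        string label;
        if (n < 0) label = "negative";
        else if (n == 0) label = "zero";
        else label = "positive";
        return label;
    }

    // An 'out' argument counts as assigned once the call returns.
    public static int ParseOrFallback(string text, int fallback)
    {
        int value;
        bool ok = int.TryParse(text, out value);
        return ok ? value : fallback;
    }
}
```

Removing any one of the three branch assignments in Describe turns the return statement into error CS0165.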
Summary: Why Does C# Block It?
| Situation | C++ | C# |
|---|---|---|
| Reading uninitialized local variable | Compiles, returns random value (undefined behavior) | Compile error (CS0165) |
| Going outside the stack with a pointer | Compiles, segfault or random data | Can't do it without unsafe |
| Array bounds overflow | Compiles, undefined behavior | Throws IndexOutOfRangeException |
| Using a null reference | Compiles, undefined behavior (usually a segfault) | Throws NullReferenceException |
The C# runtime performs extra checks to protect you. These checks have a performance cost (like array bounds checking), but it's worth it for safety and debugging convenience.
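The last two rows of the table can be observed directly. The sketch below is only an illustration (RuntimeChecksDemo and its methods are hypothetical names): both mistakes surface as ordinary exceptions that can be caught, instead of silently reading unrelated memory.

```csharp
using System;

static class RuntimeChecksDemo
{
    public static string ReadPastEnd()
    {
        int[] arr = { 1, 2, 3 };
        int index = 100; // far outside the 3-element array
        try { return arr[index].ToString(); }
        catch (IndexOutOfRangeException) { return "IndexOutOfRangeException"; }
    }

    public static string UseNull()
    {
        string text = null!; // deliberately null for the demonstration
        try { return text.Length.ToString(); }
        catch (NullReferenceException) { return "NullReferenceException"; }
    }
}
```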
Stack properties:
- Size: Typically between 1 and 8 MB per thread, depending on the platform. In practice it only fills up with very deep recursion or oversized stackalloc buffers.
- Speed: Allocation and deallocation are just a stack pointer move, a single CPU instruction. Cache-friendly.
- Lifetime: Limited to method scope. When the method ends, the variable dies.
- What lives here: Local variables and parameters of value types such as int, bool, double, structs, and enums. Also the references (addresses) of reference types.
The most critical rule of the stack: the size of a stack frame must be known up front, so data whose size is only known at runtime normally cannot live there (stackalloc is the explicit, deliberate exception).
That's why:
byte[] buffer = new byte[4096]; // buffer reference (8 bytes) on stack, 4096 bytes on heap
The buffer variable (an 8-byte address) is on the stack. But the 4096 bytes
it points to are on the heap. Because the lifetime of those 4096 bytes might
not end when the method finishes - someone else might still be using them.
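A small sketch of that point, using hypothetical names (BufferLifetimeDemo, CreateBuffer): the local variable disappears when the method returns, but the 4096 bytes survive because the caller still holds a reference to them.

```csharp
static class BufferLifetimeDemo
{
    public static byte[] CreateBuffer()
    {
        byte[] buffer = new byte[4096]; // local reference; the 4096 bytes are on the heap
        buffer[0] = 42;
        return buffer;                  // only the reference is copied out
    }

    public static bool SharesStorage()
    {
        byte[] first = CreateBuffer();  // CreateBuffer's frame is gone, the array is not
        byte[] second = first;          // copies the reference, not the 4096 bytes
        second[1] = 7;
        return first[0] == 42 && first[1] == 7 && object.ReferenceEquals(first, second);
    }
}
```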
The Real Structure of an Array on the Heap
So how exactly does byte[4096] sit on the heap? And when you write
buffer[150], how does CLR find the byte at index 150? How does it know the
boundary? Let's break it down.
The structure of an array object on the heap:
flowchart LR
subgraph Stack["Stack"]
U["buffer = 0x00E100 (starting address of byte[] object on heap)"]
end
subgraph Heap["Heap"]
direction TB
subgraph Arr["byte[] object (address: 0x00E100)"]
direction LR
A1["MethodTable* (0x00A000) (8 bytes, address of Byte[] MethodTable)"]
A2["Length = 4096 (4 bytes)"]
A3["padding (4 bytes)"]
A4["[0] = 0 (1 byte)"]
A5["[1] = 0 (1 byte)"]
A6["... (4094 bytes)"]
A7["[4095] = 0 (1 byte)"]
end
subgraph MT["Byte[] MethodTable (0x00A000)"]
direction LR
M1["BaseSize: 16"]
M2["ComponentSize: 1"]
M3["ElementType: System.Byte"]
M4["Rank: 1"]
M5["IsArray: true"]
end
end
U -.->|"holds the heap address"| Arr
A1 -.->|"holds the MethodTable address"| MT
MethodTable* (8 bytes): This is the address of the MethodTable
belonging to the Byte[] type. It's a memory address like 0x00A000.
The MethodTable itself lives in the runtime's own type-system memory rather
than among your objects on the GC heap; the diagram draws it next to the array
only for convenience. CLR reads this address to learn what type the object is,
the size of its elements, and its behavior (methods). Each type has one
MethodTable, and all objects of that type share the same MethodTable.
Notice: the buffer variable on the stack holds the value 0x00E100.
This value is the starting address of the byte[] object on the heap.
In other words, buffer is the "address card" for that 4096-byte region on
the heap. When you hold this reference, CLR can do the following:
1. Learn the type: buffer.GetType() → first goes to the address
0x00E100 that buffer holds, reads the 8-byte MethodTable* value (0x00A000)
there. 0x00A000 is the address of the Byte[] MethodTable. From that
MethodTable, it retrieves IsArray = true, ElementType = System.Byte,
Rank = 1.
2. Learn the length: buffer.Length → reads the 4-byte Length value
at offset 8 from 0x00E100. It's 4096.
3. Access an element: When you write buffer[150], CLR takes three
controlled steps:
1. Bounds check: if (150 < 0 || 150 >= 4096) → throw IndexOutOfRangeException
2. Address calc: target = 0x00E100 + 16 (header) + (150 × 1) (componentSize)
= 0x00E100 + 0xA6
= 0x00E1A6
3. Read/Write: *(byte*)0x00E1A6
The formula breakdown:
| Part | Value | Source |
|---|---|---|
| Base address | 0x00E100 | buffer reference on the stack |
| Header size | 16 bytes (8 MT + 4 Length + 4 padding) | BaseSize in Array MethodTable |
| ComponentSize | 1 (for byte) | ComponentSize field in Array MethodTable |
| Length | 4096 | At offset 8 in the object |
| index | 150 | Your code |
MethodTable has two more special fields for arrays:
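- ComponentSize: The size of each element in bytes. 1 for byte[], 4 for int[], the struct size for a struct array.
- BaseSize: The size of the fixed part of the array (MT + Length + padding, 16 bytes in this simplified picture). The real 64-bit runtime reports 24, because it also counts an 8-byte object header stored just before the MethodTable pointer. Actual object size = BaseSize + (Length × ComponentSize).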
CLR extracts everything it needs for buffer[150] from the MethodTable +
Length pair. It doesn't need a separate "start/length" struct because arrays
always start at index 0. start is always 0.
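The bounds check and address calculation above can be imitated in safe-looking C#. This is a rough sketch, not what the JIT literally emits: ManualIndexerDemo, ElementAddress, and ReadAt are hypothetical names, and the code assumes .NET 5 or later for MemoryMarshal.GetArrayDataReference.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class ManualIndexerDemo
{
    // Worked arithmetic from the text: base + header + index * componentSize.
    public static long ElementAddress(long baseAddress, int headerSize, int componentSize, int index)
        => baseAddress + headerSize + (long)index * componentSize;

    // Hand-written equivalent of array[index] for byte[].
    public static byte ReadAt(byte[] array, int index)
    {
        // One unsigned comparison covers both index < 0 and index >= Length.
        if ((uint)index >= (uint)array.Length) throw new IndexOutOfRangeException();
        ref byte first = ref MemoryMarshal.GetArrayDataReference(array); // element 0
        return Unsafe.Add(ref first, index); // element 0 + index * sizeof(byte)
    }
}
```

With the example numbers, ElementAddress(0x00E100, 16, 1, 150) gives 0x00E1A6, the same address as the manual calculation.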
Start/Length in Span<T>
The idea of "a struct with start and length values" does exist in .NET: it is
Span<T>. Span is a view struct that works on top of an array (or any other contiguous memory):
byte[] fullBuffer = new byte[4096];
Span<byte> slice = fullBuffer.AsSpan(100, 50); // start at byte 100, take 50 bytes
Span itself is a struct (value type) living on the stack and contains:
flowchart LR
subgraph Stack["Stack"]
direction TB
subgraph Span["Span<byte> slice (stack)"]
S1["_reference (8 bytes, holds heap address of element 100)"]
S2["_length = 50 (4 bytes)"]
end
end
subgraph Heap["Heap"]
direction TB
subgraph Arr["byte[] fullBuffer (0x00E100)"]
direction LR
MT["MethodTable* (8)"]
LEN["Length = 4096 (4)"]
PAD["padding (4)"]
EL0["[0]...[99] (100 bytes, header + 0..99)"]
EL100["[100] = slice start"]
EL149["[149] = slice end"]
EL150["[150]...[4095]"]
end
end
S1 -.->|"skips the header, points directly to element 100"| EL100
Span's _reference field does not point to the start of the array, but to the
start of the slice (with the offset already added). When indexing on a
Span, CLR does:
1. Slice bounds: if (index < 0 || index >= 50) → error (Span's _length)
2. Actual address: target = _reference + (index × 1)
As you can see, Span carries the "start and length" information. But
byte[] itself does not - an array always starts at 0, and its length is
embedded in the object. The start/length pair comes into play when a view is
needed.
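A short sketch of the view behavior described above, with hypothetical names (SpanSliceDemo and its methods): writes through the slice land in the original array, and the slice enforces its own length of 50 rather than the array's 4096.

```csharp
using System;

static class SpanSliceDemo
{
    public static bool WritesThrough()
    {
        byte[] fullBuffer = new byte[4096];
        Span<byte> slice = fullBuffer.AsSpan(100, 50);
        slice[0] = 7;   // index 0 of the slice is index 100 of the array
        slice[49] = 9;  // last valid slice index is array index 149
        return fullBuffer[100] == 7 && fullBuffer[149] == 9 && slice.Length == 50;
    }

    public static bool SliceBoundsAreEnforced()
    {
        byte[] fullBuffer = new byte[4096];
        Span<byte> slice = fullBuffer.AsSpan(100, 50);
        try { slice[50] = 1; return false; } // valid for the array, not for the slice
        catch (IndexOutOfRangeException) { return true; }
    }
}
```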
Where Does the Element Type Come From?
When reading buffer[150], CLR knows how wide each element is from the
ComponentSize value. The Byte[] MethodTable has ComponentSize = 1.
For Int32[] it's 4, for Double[] it's 8.
What if the element is a reference type? Consider string[]:
string[] names = new string[3];
This array's structure on the heap:
[MethodTable* → String[]] (8 bytes)
[Length = 3] (4 bytes)
[padding] (4 bytes)
[names[0] = null] (8 bytes - string reference)
[names[1] = null] (8 bytes - string reference)
[names[2] = null] (8 bytes - string reference)
Total: 16 + (3 × 8) = 40 bytes
Here, ComponentSize = 8 (reference size on 64-bit). Each element is a
reference; the string itself is elsewhere on the heap. The array only holds
addresses. When reading names[1]:
1. Address: 0x00E100 + 16 + (1 × 8) = 0x00E118
2. Read: *(string*)0x00E118 → 0x00F200 (address of the string)
3. Go to string: read the string object at 0x00F200
GC also tracks these references. As long as the string[] array itself is
alive, the strings pointed to by the array's references also stay alive (even
if nothing else references them).
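As a quick illustration that a string[] stores addresses rather than copies, consider this sketch (ReferenceArrayDemo is a hypothetical name):

```csharp
static class ReferenceArrayDemo
{
    public static bool HoldsReferences()
    {
        string[] names = new string[3]; // three null references
        bool startsNull = names[0] is null && names[1] is null && names[2] is null;

        string erdinc = "Erdinc";
        names[1] = erdinc;              // stores the address, not a copy of the text

        // The slot and the local variable now point at the same string object.
        return startsNull && object.ReferenceEquals(names[1], erdinc);
    }
}
```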
Heap and MethodTable
Think of the heap as a large plot of land. When you need space, you allocate a parcel (allocation); when you're done, you clear it (GC collects). Unlike the stack, there's no order here - each parcel is independent.
But the real question is what CLR does behind the scenes when an object is
created on the heap. Let's step through what happens the moment you write
new User().
What Happens When CLR Creates an Object?
var user = new User { Name = "Erdinc", Age = 30 };
When this line executes, CLR follows these steps:
1. Look up type info. Does CLR know the User type? Where is the MethodTable for the User class? (If it's the first use, the type is loaded - type load.)
2. Allocate memory. It looks at the BaseSize field in the MethodTable. This field tells how many bytes an object of type User will occupy on the heap. That many bytes are allocated from the GC heap.
3. Write the MethodTable pointer. The address of the MethodTable is written to the first 8 bytes (64-bit) or 4 bytes (32-bit) of the allocated memory. This pointer (TypeHandle) is a critical reference that tells the runtime which type the object belongs to.
4. Zero out fields. The remaining space is zeroed. int → 0, string → null, bool → false.
5. Call constructor. The User constructor runs. With an object initializer like the one above, the property assignments (Name = "Erdinc", Age = 30) run right after it and write the actual values to the fields.
6. Return the reference. The starting address of this heap-allocated object is assigned to the user variable. This address is stored on the stack.
Visually:
flowchart LR
subgraph Stack["Stack"]
direction TB
U["user = 0x001A3F (8 bytes)"]
end
subgraph Heap["Heap"]
direction TB
subgraph Obj["User object (address: 0x001A3F)"]
direction LR
MT["MethodTable* (8 bytes)"]
F1["Age = 30 (4 bytes)"]
PAD["padding (4 bytes)"]
F2["Name = 0x00B210 (8 bytes)"]
end
subgraph MTBlock["User MethodTable"]
direction LR
MTInfo["BaseSize: 24
EEClass*: 0xF0A100
Parent MethodTable*: System.Object
Interface count: 0
Method slots: ..."]
end
subgraph Str["String 'Erdinc' (0x00B210)"]
direction LR
SM["MethodTable* (System.String)"]
SL["Length = 6"]
SC["chars: E r d i n c"]
end
end
U -.->|"points to"| Obj
MT -.->|"type info"| MTBlock
F2 -.->|"Name points to"| Str
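The zeroing and constructor steps described above can be observed from inside the object. The sketch below uses a hypothetical TracedUser class that records what the constructor sees and when the object initializer runs.

```csharp
using System.Collections.Generic;

class TracedUser
{
    public static readonly List<string> Log = new List<string>();
    private string _name;
    public int Age;

    public TracedUser()
    {
        // Fields are already zeroed when the constructor starts.
        Log.Add($"ctor sees Name={(_name is null ? "null" : _name)}, Age={Age}");
    }

    public string Name
    {
        get => _name;
        set { Log.Add("Name setter"); _name = value; }
    }
}
```

Creating `new TracedUser { Name = "Erdinc", Age = 30 }` logs the constructor entry first (with null and 0) and the Name setter second, which matches the order zeroing, constructor, initializer.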
What is MethodTable and What Does It Do?
Every type has a MethodTable. CLR creates this table when the type is
first loaded, and all objects of that type share the same MethodTable. Even
if you create a million User objects, there is only one User
MethodTable.
MethodTable contents (simplified):
| Field | Description |
|---|---|
| BaseSize | How many bytes an object of this type occupies on the heap. The allocator reads it to know how much memory to reserve. |
| EEClass pointer | Points to the EEClass structure. Field offsets, interface list, property metadata live here. |
| Parent MethodTable | The inheritance chain. For User, the parent is System.Object's MethodTable. |
| Interface count and Interface map | Which interfaces it implements. Used for casts and is checks. |
| Method slots | Addresses of virtual methods. Like ToString(), GetHashCode(). |
So when you call user.GetType(), CLR's only job is to read the MethodTable
pointer from the first 8 bytes of the object and return the corresponding
Type object to you.
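A minimal sketch of "one MethodTable per type", under the assumption that RuntimeTypeHandle.Value exposes the runtime's type handle (on CoreCLR this is generally the MethodTable address). SampleUser and TypeIdentityDemo are hypothetical names.

```csharp
using System;

class SampleUser { public string Name = ""; }

static class TypeIdentityDemo
{
    public static bool SameTypeForAllInstances()
    {
        var a = new SampleUser();
        var b = new SampleUser();
        // Two different objects, but both lookups end at the same runtime type.
        return !object.ReferenceEquals(a, b)
            && a.GetType() == typeof(SampleUser)
            && object.ReferenceEquals(a.GetType(), b.GetType())
            && a.GetType().TypeHandle.Value == b.GetType().TypeHandle.Value;
    }
}
```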
Namespace and MethodTable Relationship
MethodTables are created per type, not per namespace. A namespace is a logical grouping used at compile time. At runtime, there is no such thing as a "namespace MethodTable."
A type's full runtime identity consists of three parts:
Assembly + Namespace + TypeName
For example:
namespace MyApp.Models
{
public class User { ... } // Identity: MyApp.dll + MyApp.Models + User
}
namespace MyApp.DTOs
{
public class User { ... } // Identity: MyApp.dll + MyApp.DTOs + User
}
These two User classes have the same name but different namespaces.
Therefore, CLR creates two separate MethodTables for them. Each has its
own BaseSize, field offsets, and method slots. Models.User and
DTOs.User are completely different types that just happen to share a name.
Namespace information is not stored as a separate field inside the
MethodTable; it's held as part of the type's name. When you call
Type.FullName, you get "MyApp.Models.User"; this string comes from the
type's metadata. CLR uses this full name when loading a type:
1. Does the assembly contain a type called "MyApp.Models.User"?
2. If yes, load its MethodTable (or use it if already loaded)
3. If not, throw TypeLoadException
So, the namespace is necessary for type resolution, but the MethodTable itself is type-based, not namespace-based. 50 classes in the same namespace = 50 separate MethodTables.
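The two-User situation can be checked with typeof. This sketch reuses the namespaces from the example; NamespaceIdentityDemo and the fields inside the classes are added only for illustration.

```csharp
namespace MyApp.Models { public class User { public string Name = ""; } }
namespace MyApp.DTOs { public class User { public int Id; } }

namespace MyApp
{
    public static class NamespaceIdentityDemo
    {
        // Same simple name, different full names, therefore different types.
        public static bool AreDistinctTypes()
            => typeof(Models.User) != typeof(DTOs.User)
            && typeof(Models.User).Name == typeof(DTOs.User).Name
            && typeof(Models.User).FullName == "MyApp.Models.User"
            && typeof(DTOs.User).FullName == "MyApp.DTOs.User";
    }
}
```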
Accessing Properties: The Offset Mechanism
When you write user.Name, how does CLR know which bytes in the heap object
belong to the Name property? Answer: field offset.
The EEClass holds an offset value for each field:
User Class EEClass (simplified):
Field: Age → offset: 8 (right after the MethodTable pointer)
Field: Name → offset: 16 (Age 4 bytes + padding 4 bytes = 8 bytes later)
Total size → 24 bytes (8 MT + 4 Age + 4 pad + 8 Name)
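Why is there padding? CPUs access memory most efficiently when values are
aligned. On a 64-bit system, 8-byte references are placed at addresses that
are multiples of 8. That's why 4 bytes of padding are inserted between Age
(4 bytes) and Name (8 bytes). Treat the exact numbers as illustrative: by
default the runtime is free to reorder a class's fields, so the real offsets
may differ from declaration order.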
CLR uses these offset values to translate the user.Name call into:
// user.Name is actually this:
string name = *(string*)((byte*)user + 16); // read the reference at byte 16 of the object
The advantage of offset access: There's no need to maintain a separate
property address map for each object. All User objects use the same offsets.
Only the base address changes: object_address + Age_offset = Age's address.
Note: Since Age is a value type, its value is inside the heap object.
But since Name is a reference type, only its address is inside the heap
object. The actual string data is at a different heap address.
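Class field offsets are not directly observable from safe code, but the padding effect can be shown with a struct that stands in for the same shape. This is only a sketch under assumptions: AgeThenPointer and FieldOffsetDemo are hypothetical, a long plays the role of the 8-byte reference, and the values 8 and 16 are what a 64-bit runtime is expected to report for this sequential layout.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct AgeThenPointer
{
    public int Age;      // 4 bytes
    public long Pointer; // 8 bytes, stands in for a 64-bit reference
}

static class FieldOffsetDemo
{
    // Distance in bytes from the start of the struct to the Pointer field.
    public static long OffsetOfPointer()
    {
        var value = new AgeThenPointer();
        ref byte start = ref Unsafe.As<AgeThenPointer, byte>(ref value);
        ref byte field = ref Unsafe.As<long, byte>(ref value.Pointer);
        return (long)Unsafe.ByteOffset(ref start, ref field);
    }

    // Total size including the padding after Age.
    public static int TotalSize() => Unsafe.SizeOf<AgeThenPointer>();
}
```

The 4 bytes between Age and Pointer are the same kind of alignment padding the User layout shows between Age and Name.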
Offset in Inheritance
class Person
{
public int Id { get; set; }
}
class User : Person
{
public string Name { get; set; }
public int Age { get; set; }
}
The User object on the heap:
[MethodTable* → User] (8 bytes, at the very start)
[Id: int] (offset 8, field inherited from Person)
[Name: string ref] (offset 16, 4 bytes Id + 4 padding = 8)
[Age: int] (offset 24)
Total: 32 bytes
The critical point here: parent class fields always come before child
class fields. This way, even when you do Person person = user;, person.Id
is read from the same offset. Because the Person MethodTable also knows
Id is at byte 8 - and at byte 8 of the user object, Id really is there.
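A brief sketch of the consequence: viewing a User through a Person reference does not copy or convert anything, it is the same heap object. InheritanceViewDemo is a hypothetical name; Person and User mirror the classes above.

```csharp
class Person
{
    public int Id { get; set; }
}

class User : Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

static class InheritanceViewDemo
{
    public static bool SameObjectThroughBaseReference()
    {
        var user = new User { Id = 7, Name = "Erdinc", Age = 30 };
        Person person = user; // no copy: same heap object, narrower view
        person.Id = 8;        // written through the base-typed reference
        return object.ReferenceEquals(person, user) && user.Id == 8;
    }
}
```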
Value Type vs Reference Type: Summary Table
| Property | Value Type (struct, int, bool) | Reference Type (class, string) |
|---|---|---|
| Where it lives | Where its owner lives (stack or inside heap) | Always on the heap |
| Assignment behavior | Value is copied | Reference (address) is copied |
| Can be null? | No (yes with int? nullable wrapper) | Yes |
| Default value | Zeroed state (0, false, default) | null |
| Inheritance | From System.ValueType, sealed | From Object |
| GC impact | Goes away automatically when owner is collected | Requires GC collection |
| How it sits in an object | Value is inside the object | Address is inside the object |
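The "Assignment behavior" row is the one that most often surprises people, so here is a minimal sketch of it. PointValue, PointObject, and CopySemanticsDemo are hypothetical names.

```csharp
struct PointValue { public int X; }
class PointObject { public int X; }

static class CopySemanticsDemo
{
    public static (int original, int copy) CopyStruct()
    {
        var a = new PointValue { X = 1 };
        var b = a;  // the value itself is copied
        b.X = 2;    // changes only the copy
        return (a.X, b.X);
    }

    public static (int original, int alias) CopyReference()
    {
        var a = new PointObject { X = 1 };
        var b = a;  // only the address is copied
        b.X = 2;    // changes the single shared object
        return (a.X, b.X);
    }
}
```

CopyStruct returns (1, 2), while CopyReference returns (2, 2).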
The String Matter
String has a special place in this story. It's the type that confuses people the most.
string a = "hello";
string b = a;
b = "world";
// a is still "hello", right? Yes.
String is a reference type - it lives on the heap. But it's immutable.
That's why the b = "world" assignment above doesn't affect a - it doesn't
change the content of the string b points to; it assigns the address of a
new string to b.
In fact, the heap structure of a string also starts with a MethodTable:
String object (heap):
[MethodTable* → System.String] (8 bytes)
[Length: int = 5] (4 bytes)
[First Char: 'h'] (2 bytes, right after Length)
[Second Char: 'e'] (2 bytes)
...
Length and characters are inside the object, accessible at fixed offsets.
But you cannot change the content, because the String class exposes no public
member that modifies it: every "modifying" method, such as Replace or ToUpper,
returns a new string.
Why Immutable?
- Thread safety. Multiple threads can read the same string without fear.
- Interning. Identical strings can safely be shared, so one copy in memory can serve every user of that content.
- Security. Just because you passed a string as a parameter doesn't mean the method can modify it.
But this immutability comes at a cost:
string result = "";
for (int i = 0; i < 10000; i++)
{
result += "x"; // New string object every loop. 10,000 allocations.
}
This code creates 10,000 new string objects, each one a copy of the previous
string plus one character, polluting the heap. The right way:
var sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
{
sb.Append("x");
}
string result = sb.ToString(); // One final string; the builder only grows its internal buffer occasionally.
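To get a feel for the difference, the two loops can be compared by the number of bytes they allocate. This is a rough measurement sketch, not a benchmark; ConcatCostDemo and its methods are hypothetical names, and the exact byte counts depend on the runtime.

```csharp
using System;
using System.Text;

static class ConcatCostDemo
{
    // Bytes allocated on the current thread while 'work' runs.
    public static long BytesAllocatedBy(Action work)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        work();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void WithConcat()
    {
        string result = "";
        for (int i = 0; i < 10000; i++) result += "x"; // new string each time
        GC.KeepAlive(result);
    }

    public static void WithBuilder()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) sb.Append("x"); // appends into a buffer
        GC.KeepAlive(sb.ToString());
    }
}
```

Comparing BytesAllocatedBy(ConcatCostDemo.WithConcat) with BytesAllocatedBy(ConcatCostDemo.WithBuilder) should show the concatenation loop allocating far more.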
Even though string is a reference type, it behaves like a value type. The ==
operator compares content (not address). Copying is safe because it's
immutable. But in memory, it's always on the heap - it can't live on the
stack because its size is unknown at compile time and its lifetime can exceed
method scope.
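The difference between content equality and object identity is easy to check. In this sketch (StringEqualityDemo is a hypothetical name), the second string is built at runtime so that it is guaranteed to be a separate heap object.

```csharp
static class StringEqualityDemo
{
    public static (bool sameContent, bool sameObject) Compare()
    {
        string a = "hello";
        string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' }); // separate heap object
        return (a == b, object.ReferenceEquals(a, b));
    }

    public static (string original, string upper) Transform()
    {
        string original = "hello";
        string upper = original.ToUpperInvariant(); // returns a new string
        return (original, upper);                   // original is untouched
    }
}
```

Compare returns (true, false): equal content, two different objects.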
String Interning
CLR keeps string literals, plus any strings you intern explicitly, in a pool called the intern pool:
string a = "dotnet";
string b = "dotnet";
Console.WriteLine(Object.ReferenceEquals(a, b)); // True - same object.
Strings that are constant at compile time are automatically interned. You can also do it manually at runtime:
string c = new string(new char[] { 'd', 'o', 't', 'n', 'e', 't' });
string d = string.Intern(c);
So when a method returns a string at runtime, does interning happen automatically? Two scenarios:
Scenario A: Method returns a compile-time literal - interned.
string GetHelloWorld()
{
return "Merhaba dunya"; // compile-time literal
}
string a = "Merhaba dunya";
string b = GetHelloWorld();
Object.ReferenceEquals(a, b); // True - both are the same intern object
The "Merhaba dunya" literal in GetHelloWorld()'s body is embedded in the
assembly metadata. When the runtime first needs the literal, it resolves it
through the intern pool, so in practice every use of the same literal text
gets the same object. When the method is called, it returns that pooled
object. If two literals are concatenated with + and the compiler can fold them
into a single literal at build time, the result is still interned.
Scenario B: Method creates the string at runtime - not interned.
string GetHelloWorld()
{
var merhaba = "Merhaba ";
var dunya = "dunya";
return merhaba + dunya; // runtime concat, new object
}
string a = "Merhaba dunya";
string b = GetHelloWorld();
Object.ReferenceEquals(a, b); // False - b was created at runtime, not in the intern pool
string c = string.Intern(b); // Manual intern
Object.ReferenceEquals(a, c); // True - now it's the same pool object
Concatenation with + using variables, StringBuilder.ToString(),
Substring(), ToUpper(), reading from a file, JSON deserialization - all
of these create new heap objects at runtime. Even if the content is the same,
it doesn't automatically enter the intern pool. You need to call
string.Intern().
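One way to check whether a given string's content is in the pool is string.IsInterned, which returns null when it is not. The sketch below uses a fresh GUID so the content cannot already exist as a literal anywhere; InternPoolDemo is a hypothetical name.

```csharp
using System;

static class InternPoolDemo
{
    public static bool RuntimeStringsStartOutsideThePool()
    {
        string runtime = Guid.NewGuid().ToString();       // unique, never a literal
        bool absentBefore = string.IsInterned(runtime) is null;

        string pooled = string.Intern(runtime);           // adds the content to the pool

        // After interning, lookups by content return the pooled object.
        bool presentAfter = object.ReferenceEquals(string.IsInterned(runtime), pooled);
        return absentBefore && presentAfter && pooled == runtime;
    }
}
```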
In short, the issue is not the string's content, but how it was created. We'll look into this more later.
Interning prevents the same string from occupying heap space multiple times. But excessive use can bloat memory, because interned strings are kept alive by the pool and are never collected by the GC (they live until the AppDomain shuts down, which on modern .NET effectively means until the process exits).
Summary: Stack, Heap, MethodTable Relationship
The complete journey of a new User() call from start to finish:
1. CLR looks at the User MethodTable → BaseSize: 24 bytes
2. Allocates 24 bytes from the GC heap
3. Writes the User MethodTable address to the first 8 bytes of allocated space
4. Zeroes the remaining space, calls the constructor
5. Assigns the heap address to the user variable on the stack
6. user.Name call: reads the string address at offset 16 from the user address
When you understand this mechanism, you start understanding most performance
problems too. Every new is an allocation. GC has an allocation budget
in its Gen 0 region; creating a new object consumes this budget. When the
budget hits zero, GC is triggered and collects unused objects. So every
allocation brings you one step closer to the next GC cycle. Every unnecessary
allocation is a preventable cost.
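The link between allocations and GC cycles can be watched with GC.CollectionCount. The following is a rough sketch with hypothetical names (AllocationPressureDemo, Churn); how many Gen 0 collections actually occur depends on the machine and GC settings, so no particular count is promised.

```csharp
using System;

static class AllocationPressureDemo
{
    private static byte[] _sink; // keeps each array reachable until the next one replaces it

    public static (int gen0Collections, long bytesAllocated) Churn(int objectCount)
    {
        int gen0Before = GC.CollectionCount(0);
        long bytesBefore = GC.GetAllocatedBytesForCurrentThread();

        for (int i = 0; i < objectCount; i++)
        {
            _sink = new byte[1024]; // the previous array becomes garbage here
        }

        return (GC.CollectionCount(0) - gen0Before,
                GC.GetAllocatedBytesForCurrentThread() - bytesBefore);
    }
}
```

Calling Churn with a large count, for example 1,000,000, typically reports a number of Gen 0 collections even though the program never holds more than one small array at a time.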
What's Next
We've covered what stack and heap are, what CLR does when creating objects, the role of MethodTable, and why string is special. But the real question is:
"How do we avoid heap allocations when working with strings?"
The answer: Span<T>, Memory<T>, stackalloc, and the IDisposable
pattern. In the next part of this series, I'll cover these with code examples.
We'll also look at GC's generation mechanism, why finalizers are dangerous,
and how to catch memory leaks.