Python 内存管理深度解读：让内存泄漏远离你

大家好，我是程序员晚枫。

Python 程序跑着跑着，内存爆了？

99% 是你没搞懂 Python 内存管理。

今天彻底讲透。

一、Python 内存管理 3 大机制

机制 1：引用计数

每个对象都有一个"引用计数器"：

import sys

a = []  # 引用计数 1
b = a   # 引用计数 2
print(sys.getrefcount(a))  # 3（getrefcount 也算一次）

del b   # 引用计数 2
print(sys.getrefcount(a))  # 2

del a   # 引用计数 0
# 对象被立即回收

引用计数 = 0 时，对象立即被回收。

机制 2：垃圾回收（GC）

引用计数无法解决循环引用：

a = []
b = []
a.append(b)  # b 的引用计数 2
b.append(a)  # a 的引用计数 2
del a
del b
# 引用计数不为 0，但对象已经不可达

Python 用 GC 检测循环引用，回收内存。

import gc

# 手动触发 GC
gc.collect()

# 自动 GC
gc.enable()  # 默认开启
gc.disable()  # 关闭

机制 3：内存池

Python 用内存池管理小块内存：

减少 malloc/free 调用
提高分配效率
内存池分大小（small object、large object）

二、5 大内存查看方法

方法 1：sys.getrefcount

查看对象引用计数：

import sys

x = [1, 2, 3]
print(sys.getrefcount(x))  # 2

方法 2：sys.getsizeof

查看对象大小：

import sys

print(sys.getsizeof([1, 2, 3]))  # 88 字节
print(sys.getsizeof('hello'))    # 54 字节
print(sys.getsizeof(123))        # 28 字节

方法 3：id()

查看对象内存地址：

1 2	x = [1, 2, 3] print(id(x)) # 0x7f8b8c0b1d80

方法 4：gc 模块

查看垃圾回收状态：

import gc

# 获取 GC 统计
print(gc.get_stats())

# 调试 GC
gc.set_debug(gc.DEBUG_LEAK)

方法 5：tracemalloc

追踪内存分配：

import tracemalloc

tracemalloc.start()

# 你的代码
x = [1] * 1000000

# 查看内存
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()

三、5 大内存优化技巧

技巧 1：用生成器代替列表

列表占内存，生成器不占：

# 慢：占用 8MB
squares = [i ** 2 for i in range(1_000_000)]

# 快：占用 200B
squares = (i ** 2 for i in range(1_000_000))

技巧 2：用 `slots` 减少内存

默认类的 __dict__ 占内存：

# 普通类
class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

# 优化：__slots__
class User:
    __slots__ = ['name', 'age']
    def __init__(self, name, age):
        self.name = name
        self.age = age

__slots__ 内存减少 40-50%。

技巧 3：用 array 代替 list

list 存的是 PyObject 指针：

# list：每个元素是 PyObject 指针
import sys
lst = [1, 2, 3, 4, 5]
print(sys.getsizeof(lst))  # 104 字节（5 个指针）

# array：紧凑存储
import array
arr = array.array('i', [1, 2, 3, 4, 5])
print(sys.getsizeof(arr))  # 44 字节

技巧 4：用 bytes 代替 str

str 是 Unicode，bytes 是字节：

1 2	s = 'hello' # Unicode b = b'hello' # bytes

bytes 内存更小。

技巧 5：及时释放大对象

# 大对象
big_data = load_huge_file()  # 100MB
process(big_data)
del big_data  # 立即释放
gc.collect()  # 强制 GC

四、5 大内存泄漏场景

场景 1：循环引用

# 泄漏
class Node:
    def __init__(self):
        self.children = []

parent = Node()
child = Node()
parent.children.append(child)
child.parent = parent  # 循环引用

解决：用 weakref 弱引用：

import weakref

class Node:
    def __init__(self):
        self.children = []
        self._parent = None
    
    @property
    def parent(self):
        return self._parent() if self._parent else None
    
    @parent.setter
    def parent(self, value):
        self._parent = weakref.ref(value) if value else None

场景 2：全局变量

# 泄漏
cache = {}  # 全局变量，永远不释放

def add_to_cache(key, value):
    cache[key] = value

解决：用 weakref 或 LRU cache：

from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_func(n):
    return n ** 2

场景 3：未关闭的文件/连接

# 泄漏
f = open('big.txt')
data = f.read()
# 没关闭！

解决：用 with：

1 2	with open('big.txt') as f: data = f.read()

场景 4：未释放的线程

# 泄漏
def worker():
    while True:
        # 永久运行
        pass

t = threading.Thread(target=worker)
t.start()
# 线程永远不结束

解决：用 daemon 线程或加退出条件。

场景 5：循环导入

# a.py
from b import func_b
def func_a(): func_b()

# b.py
from a import func_a
def func_b(): func_a()

解决：重构代码，避免循环导入。

五、5 大内存调试工具

工具 1：tracemalloc（内置）

追踪内存分配：

import tracemalloc
tracemalloc.start()
# 你的代码
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)

工具 2：memory_profiler

函数级内存分析：

1	pip install memory_profiler

from memory_profiler import profile

@profile
def my_func():
    a = [1] * 1000000
    return a

1	python -m memory_profiler script.py

工具 3：objgraph

对象引用关系：

1	pip install objgraph

1
2
3

import objgraph
objgraph.show_most_common_types()  # 显示最多对象类型
objgraph.show_backrefs(leak_object)  # 显示引用关系

工具 4：guppy3

堆分析：

1	pip install guppy3

1
2
3

from guppy import hpy
h = hpy()
print(h.heap())

工具 5：Py-Spy

生产环境 profiling：

1	pip install py-spy

1 2	py-spy dump --pid 12345 py-spy top --pid 12345

六、5 大常见误区

误区 1：Python 自动管理内存，我不用管

❌ 错
✅ 引用计数有局限
必须主动管理

误区 2：del 就释放内存

⚠️ 部分对
✅ 引用计数 -1
不是立即释放

误区 3：gc.collect() 总是好

⚠️ 部分对
✅ 主动 GC 有开销
看场景

误区 4：内存泄漏只发生在 C

❌ 错
✅ Python 也会泄漏
循环引用、缓存等

误区 5：用 PyPy 解决内存

⚠️ 部分对
✅ PyPy 内存不一样
不一定好

七、5 大内存管理最佳实践

实践 1：用 with

1
2
3

with open('file.txt') as f:
    data = f.read()
# 自动关闭

实践 2：用 `slots`

1 2	class User: __slots__ = ['name', 'age']

实践 3：用生成器

1
2
3

def process_big_data():
    for line in big_file:
        yield process(line)

实践 4：用 weakref

1 2	import weakref cache = weakref.WeakValueDictionary()

实践 5：监控内存

1
2
3

import psutil
process = psutil.Process()
print(process.memory_info().rss)  # 内存使用

八、5 大真实内存优化案例

案例 1：Instagram

优化前：内存泄漏
优化：用 weakref、生成器
结果：稳定运行

案例 2：Dropbox

优化前：每个文件缓存占内存
优化：LRU cache + 磁盘缓存
结果：内存降 80%

案例 3：OpenAI

优化前：模型加载占内存
优化：延迟加载、共享权重
结果：内存降 60%

案例 4：YouTube

优化前：视频转码占内存
优化：流式处理
结果：内存降 90%

案例 5：Uber

优化前：地图数据占内存
优化：分片加载
结果：内存降 70%

九、给 Python 内存管理学习者的 4 个建议

建议 1：先学引用计数

1 周理解
基础

建议 2：再学 GC

1 周上手
核心

建议 3：实战排查

用 tracemalloc
找泄漏

建议 4：养成好习惯

with、__slots__、生成器
预防为主

十、最后的最后

Python 内存管理，3 句话总结：

引用计数 + GC + 内存池：3 大机制
循环引用是最大隐患：用 weakref
tracemalloc 是必备工具：定位泄漏

学 Python 6 年，我学到的最重要的事：

"内存管理是 Python 工程师的'内功'。"

会内存管理，你的 Python 应用不会爆。

tracemalloc + __slots__ + 生成器 = 内存优化的"金三角"。

学内存管理，5 年后你就是 Python 专家。

🎓 AI 编程实战课程

程序员晚枫专注AI编程培训，通过 《50讲 · AI编程训练营》，让小白也能用AI做出实际项目。帮你从零上手！

👉 免费试看：网盘链接，免费试看前3讲，先看看适不适合自己
👉 - 👉 课程报名：点击这里报名，现在报名还送书📖