V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Summary

Core idea: Pretrain a 1B-parameter video encoder with the self-supervised JEPA objective on 1M+ hours of internet video, then train an action-conditioned predictor on 62 hours of unlabeled robot video. The result is a single world model that handles video understanding, action anticipation, and zero-shot robot planning.

Method: Mask denoising in representation space (V-JEPA 2 pretraining) + a frozen encoder + an autoregressive action-conditioned predictor learned in representation space (V-JEPA 2-AC) + the cross-entropy method for receding-horizon planning

Results: SSv2 77.3 top-1 / EK100 39.7 action R@5 (+44% relative to PlausiVL) / zero-shot pick-and-place on Franka averaging 65–80% success / latent planning at 16 s per action vs. 4 min per action for Cosmos

Sources: paper | website | github

Rating: 3 - Foundation (a single encoder connects the understanding / prediction / planning pipelines end to end, gives the JEPA line its first end-to-end evidence in video + robotics, and is a reference that later video world model / latent planning work cannot avoid)

Key Takeaways:

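Self-supervised video pretraining can serve understanding, prediction, and planning at once: a single V-JEPA 2 encoder posts competitive or SOTA results on four task families (six classification benchmarks, action anticipation, VidQA, robot planning), and the encoder is pretrained without any language supervision. This is the first complete end-to-end validation in the video domain of the JEPA approach LeCun has long advocated
Latent-space planning is ~15× cheaper than pixel-space video generation: V-JEPA 2-AC takes 16 s per action vs. 4 min per action for Cosmos, with success rates at least as high on every task. As an engineering result, this counts against the implicit assumption that robot planning must first generate pixel-level video
62 hours of unlabeled robot video + a frozen video encoder = zero-shot deployment across labs: the post-training data is 1–2 orders of magnitude smaller than in typical imitation learning and needs no rewards, task labels, or success/failure labels, which sets a new low-end reference point for the data requirements of a cross-embodiment world model
The scaling recipe is clear but limited: gains show early signs of plateauing at 1B parameters, and the marginal return from scaling data to 22M videos is also diminishing. The authors themselves flag long-horizon planning, language goals, and model scale (>1B) as open problems

Teaser. The overall V-JEPA 2 framework: mask-denoising pretraining on 1M hours of video yields a video encoder. Downstream, it can feed an attentive probe for classification and anticipation, or be aligned with an LLM for VidQA. With the encoder frozen, robot interaction data trains an action-conditioned predictor (V-JEPA 2-AC), which MPC then uses for zero-shot manipulation planning.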

Problem & Motivation

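Humans (and other animals) acquire a great deal of physical common sense by observing the world, and use this internal world model for perception and planning. LeCun has long argued for AI systems that can (1) understand the world, (2) predict the future, and (3) plan in novel situations, with JEPA (Joint Embedding Predictive Architecture) as his recommended technical path.

Prior work sits in several disconnected silos:

Video understanding models (CLIP / SigLIP / Perception Encoder) depend on language supervision, which limits how much they learn about purely physical dynamics
Pixel-space world models (Cosmos and other video generation models) can generate plausible-looking future frames, but planning requires rolling out in pixel space, so the compute cost is very high and demos of truly closed-loop robot control are rare
Imitation learning / VLA (e.g., Octo, π0) needs many successful demonstrations, depends heavily on rewards or labels, and cannot naturally exploit failure trajectories

V-JEPA 2 sets out to show that self-supervised video pretraining alone, combined with a very small amount of unlabeled robot video, can connect the whole understanding → prediction → planning pipeline.

❓ An implicit debate: JEPA proponents have long had a methodological disagreement with the generative world model camp. This paper uses the 16 s vs. 4 min speed comparison and the success-rate advantage on manipulation to give the JEPA line its first strong empirical argument. Whether the comparison is entirely fair deserves some skepticism, though: Cosmos is a latent diffusion model and was not designed specifically for control.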

Method

V-JEPA 2 Pretraining

Mask-Denoising in Representation Space. The core objective:

$$\min_{\theta, \phi, \Delta_{y}} \left\| P_{\phi}\left(\Delta_{y}, E_{\theta}(x)\right) - \mathrm{sg}\left(E_{\overline{\theta}}(y)\right) \right\|_{1}$$

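Notation: $E_{\theta}$ is the encoder, $P_{\phi}$ is the predictor, $\Delta_{y}$ is the mask token that carries the positions of the masked patches, $\overline{\theta}$ is an EMA of the encoder weights, and $\mathrm{sg}$ is stop-gradient. The loss is computed only on masked patches; EMA plus stop-gradient prevents representation collapse.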
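
The objective above can be sketched in a few lines of NumPy. This is only a toy illustration under stated assumptions: a single linear map stands in for both the ViT encoder and the predictor, a zero vector stands in for the learnable mask token, and the names `encode`, `masked_l1_loss`, and `ema_update` as well as the momentum value are hypothetical, not taken from the paper or its code.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(tokens, weight):
    """Toy stand-in for a ViT: one linear map applied to every token."""
    return tokens @ weight

def masked_l1_loss(pred, target, mask):
    """Mean absolute error over masked tokens only (mask is True where a token was hidden)."""
    return np.abs(pred - target)[mask].mean()

def ema_update(target_w, online_w, momentum=0.999):
    """Move the EMA (target) weights a small step toward the online weights."""
    return momentum * target_w + (1.0 - momentum) * online_w

num_tokens, dim = 8, 4
video_tokens = rng.normal(size=(num_tokens, dim))
mask = np.zeros(num_tokens, dtype=bool)
mask[[2, 3, 6]] = True                      # tokens hidden from the online branch

online_w = rng.normal(size=(dim, dim))
target_w = online_w.copy()                  # EMA copy; never updated by gradients

# Target branch sees the full clip. In an autodiff framework this output would be
# detached, which is the stop-gradient sg(.) in the objective.
target = encode(video_tokens, target_w)

# Online branch: masked positions are replaced by a mask token (assumed to be a
# zero vector here, playing the role of Delta_y).
context = np.where(mask[:, None], 0.0, video_tokens)
pred = encode(context, online_w)

loss = masked_l1_loss(pred, target, mask)   # L1 distance on masked tokens only
target_w = ema_update(target_w, online_w)   # EMA step after each optimizer update
```

The point of the sketch is the data flow: the loss never touches pixels, only the two sets of representations, and only at masked positions.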

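Architecture: both the encoder and the predictor are ViTs, with up to 1B parameters (ViT-g). Three engineering changes:

3D-RoPE (replacing the absolute sincos embeddings of the original V-JEPA), which stabilizes training of large models
Tubelet patchify: $2 \times 16 \times 16$ ($T \times H \times W$)
Multiblock masking (carried over from V-JEPA)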
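
To make the tubelet size concrete, the sketch below counts how many tokens a clip produces under $2 \times 16 \times 16$ patchification. The helper `num_tubelet_tokens` is a hypothetical name; the two clip sizes are the 16-frame 256² and 64-frame 384² settings mentioned in this note's progressive-resolution recipe.

```python
def num_tubelet_tokens(frames, height, width, tubelet=(2, 16, 16)):
    """Number of non-overlapping tubelets (tokens) in a clip of the given size."""
    t, h, w = tubelet
    if frames % t or height % h or width % w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    return (frames // t) * (height // h) * (width // w)

low_res_tokens = num_tubelet_tokens(16, 256, 256)    # 8 * 16 * 16 tokens
high_res_tokens = num_tubelet_tokens(64, 384, 384)   # 32 * 24 * 24 tokens
ratio = high_res_tokens / low_res_tokens
```

The larger clip carries 9× as many tokens, which helps explain why high-resolution training is confined to a short final phase.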

Key Scaling Ingredients (the core increment over the original V-JEPA):

Figure 3. Cumulative effect of the four scaling ingredients (average accuracy across SSv2/Diving-48/Jester/K400/COIN/IN1K).

Data scaling: 2M → 22M videos (VideoMix22M, +1 pt average)
Model scaling: ViT-L 300M → ViT-g 1B (+1.5 pt)
Longer training: warmup-constant-decay schedule, 90K → 252K iterations (+0.8 pt)
Progressive resolution: the warmup and main phases use 16 frames @ 256², and the cooldown phase moves up to 64 frames @ 384² (+0.7 pt). The key benefit is an 8.4× saving in GPU time: training ViT-g at full resolution throughout would take ~60 GPU-years
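
A warmup-constant-decay schedule of the kind named above might look like the following sketch. Only the 252K total iteration count comes from this note; the split into phases, the peak learning rate, and the linear shapes of the ramps are assumptions for illustration, and `wsd_lr` is a hypothetical helper.

```python
def wsd_lr(step, peak_lr, warmup_steps, constant_steps, decay_steps, final_lr=0.0):
    """Linear warmup, flat plateau, then linear decay to final_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + constant_steps:
        return peak_lr
    progress = min(1.0, (step - warmup_steps - constant_steps) / decay_steps)
    return peak_lr + (final_lr - peak_lr) * progress

# Hypothetical phase split that adds up to the 252K iterations quoted above.
WARMUP, CONSTANT, DECAY = 12_000, 228_000, 12_000
PEAK_LR = 5e-4   # assumed value, not from the paper

lr_mid_training = wsd_lr(100_000, PEAK_LR, WARMUP, CONSTANT, DECAY)
lr_end = wsd_lr(WARMUP + CONSTANT + DECAY, PEAK_LR, WARMUP, CONSTANT, DECAY)
```

One practical appeal of this shape is that the constant phase can be extended without re-deriving the schedule, and the short decay phase is a natural place to switch to the higher-resolution clips.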

Pretraining Dataset: VideoMix22M

Table 1. Composition of VideoMix22M. Source-specific sampling weights were set by manual tuning.

Source	Samples	Type	Total Hours	Curation	Weight
SSv2	168K	EgoVideo	168	No	0.056
Kinetics	733K	ExoVideo	614	No	0.188
HowTo100M	1.1M	ExoVideo	134K	No	0.318
YT-Temporal-1B	19M	ExoVideo	1.6M	Yes	0.188
ImageNet	1M	Images	n/a	No	0.250
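
As a rough illustration of how the Weight column could drive a data loader, the sketch below draws the source of each training sample in proportion to the weights in Table 1. The weights are copied from the table; the sampling routine itself (`sample_sources`) is an assumption about how such weights are typically used, not the authors' loader.

```python
import numpy as np

# Sampling weights copied from Table 1.
SOURCE_WEIGHTS = {
    "SSv2": 0.056,
    "Kinetics": 0.188,
    "HowTo100M": 0.318,
    "YT-Temporal-1B": 0.188,
    "ImageNet": 0.250,
}

def sample_sources(batch_size, weights, rng):
    """Pick a data source for each sample in a batch according to the weights."""
    names = list(weights)
    probs = np.array([weights[name] for name in names], dtype=float)
    probs = probs / probs.sum()          # guard against rounding in the table
    return rng.choice(names, size=batch_size, p=probs)

rng = np.random.default_rng(0)
batch_sources = sample_sources(10_000, SOURCE_WEIGHTS, rng)
fractions = {name: float(np.mean(batch_sources == name)) for name in SOURCE_WEIGHTS}
```

Note how far the weights sit from the raw sample counts: YT-Temporal-1B supplies 19M of the roughly 22M samples but receives a weight of only 0.188, while SSv2 and Kinetics are heavily upsampled.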

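The 1.6M hours of raw YT1B data are uncurated. The authors filter them into Curated-YT1B with cluster-based retrieval, using the Kinetics + SSv2 + COIN + EpicKitchen training sets as the target distribution. Training ViT-L on this dataset alone comes close to the VM22M result, but the larger model (ViT-g) still benefits from the full mix. This suggests that curation is enough for small models, while large models still need broader visual coverage.

❓ The source weights for the 22M videos are “manually tuned”, and the paper does not disclose the sweep range or sensitivity. This is a reproducibility weakness.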

V-JEPA 2-AC: Action-Conditioned Post-training

With the V-JEPA 2 video encoder frozen, a ~300M-parameter action-conditioned predictor is trained in its representation space.

Figure 6. V-JEPA 2-AC training procedure. Teacher forcing feeds the current frame's representation to the predictor to predict the next frame; the rollout loss feeds the predictor's output back in as input for multi-step prediction, which reduces autoregressive error accumulation.

Setup:

Data: Droid video segments of ≥4 s, ~62 hours in total, sampled at 4 fps as 16-frame clips at 256²
End-effector state $s_{k} \in \mathbb{R}^{7}$ (3 position + 3 orientation as Euler angles + 1 gripper), action $a_{k} = s_{k + 1} - s_{k}$
The encoder is the frozen ViT-g, which produces a $16 \times 16 \times 1408$ feature map per frame
The predictor is a 24-layer / 16-head / 1024-dim transformer with block-causal attention (each patch can attend to all patch / action / state tokens at the same and earlier time steps)
3D-RoPE is applied to video patches; action / state tokens get temporal RoPE only
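
The block-causal attention pattern described in the setup can be written as a boolean mask. The sketch below assumes every time step contributes the same number of tokens (patches plus action and state tokens); `block_causal_mask` and the tiny sizes are hypothetical and chosen only so the result is easy to inspect.

```python
import numpy as np

def block_causal_mask(num_steps, tokens_per_step):
    """mask[i, j] is True when token i may attend to token j.

    Tokens are grouped by time step. A token sees every token in its own
    step and in all earlier steps, but nothing from later steps.
    """
    step_of_token = np.repeat(np.arange(num_steps), tokens_per_step)
    return step_of_token[:, None] >= step_of_token[None, :]

# Toy sizes: 3 time steps with 4 tokens each. In the setting described above, a
# step would hold 16 x 16 = 256 patch tokens plus the action and state tokens.
mask = block_causal_mask(num_steps=3, tokens_per_step=4)
```

Unlike a token-level causal mask, attention inside a time step is fully bidirectional, so all patches of one frame can exchange information freely.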

Loss: teacher forcing + 2-step rollout

$$L(\phi) = L_{\text{teacher-forcing}}(\phi) + L_{\text{rollout}}(\phi)$$

The rollout loss backpropagates through only one recurrent step, which avoids the instability of full backpropagation through time (BPTT).
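
The two loss terms can be illustrated with a toy predictor. Everything below is a sketch under stated assumptions: a linear map `predict_next` stands in for the transformer predictor, the latents are synthetic rather than encoder outputs, and the rollout term is assumed to compare only the final rolled-out latent with its target. Actions are formed as state differences, matching $a_{k} = s_{k+1} - s_{k}$.

```python
import numpy as np

def predict_next(z, action, A, B):
    """Toy stand-in for the action-conditioned predictor (linear in latent and action)."""
    return z @ A + action @ B

def teacher_forcing_loss(latents, actions, A, B):
    """One-step predictions from ground-truth latents, mean L1 error."""
    preds = predict_next(latents[:-1], actions, A, B)
    return np.abs(preds - latents[1:]).mean()

def rollout_loss(latents, actions, A, B, steps=2):
    """Feed predictions back in for `steps` steps, then compare with the true latent."""
    z = latents[0]
    for k in range(steps):
        z = predict_next(z, actions[k], A, B)
    return np.abs(z - latents[steps]).mean()

rng = np.random.default_rng(0)
latent_dim, state_dim, horizon = 6, 7, 4
A_true = 0.9 * np.eye(latent_dim)
B_true = 0.1 * rng.normal(size=(state_dim, latent_dim))

states = rng.normal(size=(horizon + 1, state_dim))   # end-effector states s_k
actions = np.diff(states, axis=0)                    # a_k = s_{k+1} - s_k

# Synthetic "ground-truth" latents generated by the true dynamics.
latents = [rng.normal(size=latent_dim)]
for a in actions:
    latents.append(predict_next(latents[-1], a, A_true, B_true))
latents = np.stack(latents)

total = (teacher_forcing_loss(latents, actions, A_true, B_true)
         + rollout_loss(latents, actions, A_true, B_true))
```

With the true parameters both terms vanish; with a mismatched predictor the rollout term grows faster than the teacher-forcing term because errors compound over the fed-back steps.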

Planning by Energy Minimization

Given a goal image $x_{g}$ encoded as $z_{g}$, the planning objective is to minimize the L1 distance in latent space:

$$\mathcal{E}(\hat{a}_{1:T}; z_{k}, s_{k}, z_{g}) := \left\| P(\hat{a}_{1:T}; s_{k}, z_{k}) - z_{g} \right\|_{1}$$

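Optimization uses the Cross-Entropy Method (CEM), a sampling-based optimizer: at each planning step, sample action sequences from a set of Gaussian distributions, use the top-k samples to update the means and variances, and iterate for several rounds. Then execute the first action and re-plan (receding horizon control).

Figure 7. Planning schematic: CEM optimizes the action sequence in latent space so that the imagined future state approaches the goal representation; only the first action is executed before re-planning.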
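
A minimal CEM planner in the spirit of the description above is sketched here. It is not the authors' implementation: the function name `cem_plan`, the sample and elite counts, and the toy energy (a latent that simply moves by the summed actions) are all assumptions. In the real system the energy would come from rolling out the learned predictor and comparing against the encoded goal image.

```python
import numpy as np

def cem_plan(energy_fn, horizon, action_dim, rng,
             num_samples=200, num_elites=20, num_iters=10, init_std=1.0):
    """Cross-entropy method over action sequences; returns the final mean sequence."""
    mean = np.zeros((horizon, action_dim))
    std = np.full((horizon, action_dim), init_std)
    for _ in range(num_iters):
        noise = rng.normal(size=(num_samples, horizon, action_dim))
        candidates = mean + std * noise                       # sample action sequences
        energies = np.array([energy_fn(c) for c in candidates])
        elites = candidates[np.argsort(energies)[:num_elites]]  # keep the lowest-energy samples
        mean = elites.mean(axis=0)                            # refit the Gaussian to the elites
        std = elites.std(axis=0) + 1e-6
    return mean

# Toy world model: the latent state moves by the sum of the planned actions.
z_k = np.zeros(3)
z_goal = np.array([0.5, -0.2, 0.3])

def energy(action_sequence):
    """L1 distance between the imagined final latent and the goal latent."""
    z_imagined = z_k + action_sequence.sum(axis=0)
    return np.abs(z_imagined - z_goal).sum()

rng = np.random.default_rng(0)
plan = cem_plan(energy, horizon=1, action_dim=3, rng=rng)
first_action = plan[0]   # receding horizon: execute only this action, then re-plan
```

In a closed loop, `cem_plan` would be called again after every executed action with the newly observed state, which is what makes the controller receding-horizon.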

Key Results

Robot Planning (the strongest selling point)

Table 2. Zero-shot Robot Manipulation. Franka arms with RobotiQ grippers in two labs, 10 trials per task, with randomized object positions and initial poses.

Method	Reach	Grasp Cup	Grasp Box	Reach w/ Obj. Cup	Reach w/ Obj. Box	Pick-&-Place Cup	Pick-&-Place Box
Octo (BC fine-tune) Avg	100%	15%	0%	15%	70%	15%	10%
V-JEPA 2-AC Avg	100%	65%	25%	75%	75%	80%	65%

Table 3. Planning Performance vs Cosmos (latent diffusion, 7B). Single RTX 4090 GPU.

Method	Samples	Iter	Horizon	Time/action	Reach	Grasp Cup	Grasp Box	P&P Cup	P&P Box
Cosmos	80	10	1	4 min	80%	0%	20%	0%	0%
V-JEPA 2-AC	800	10	1	16 sec	100%	60%	20%	80%	50%

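V-JEPA 2-AC uses a 10× larger sample budget yet is 15× faster per action, and its success rate is higher on every task except Grasp Box, where both reach 20%. Cosmos scores 0% on Grasp Cup and on both pick-and-place tasks, which the authors attribute to pixel-space prediction lacking the fidelity needed for precise control.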
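
The speed and budget ratios follow directly from Table 3. As a back-of-the-envelope extension, and assuming planning time scales linearly with the number of candidate rollouts (samples × iterations), the per-rollout throughput gap is even larger than the per-action gap:

$$\frac{4\ \text{min}}{16\ \text{s}} = \frac{240}{16} = 15, \qquad \frac{800}{80} = 10, \qquad \frac{(800 \times 10) / 16\ \text{s}}{(80 \times 10) / 240\ \text{s}} = \frac{500\ \text{rollouts/s}}{3.33\ \text{rollouts/s}} = 150$$

Under that linearity assumption, the latent predictor evaluates candidate action sequences roughly 150× faster than the pixel-space model on the same GPU.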

Figure 10. Closed-loop rollout of pick-and-place. Highlighted frames mark the sub-goal switches (grasp → vicinity → final placement).

❓ The Octo baseline looks very weak (0% on grasp box). The authors fine-tune it on all of Droid with hindsight relabeling. Was Octo really tuned to its best? The community often reports Octo well above this number on in-distribution tasks, so the baseline choice and fine-tuning configuration may favor V-JEPA 2-AC.

Understanding (Probe-based Classification)

Table 4. Attentive-probe results on 6 classification tasks (V-JEPA 2 ViT-g₃₈₄ is raised to 384² during cooldown; the others use 256²).

Method	Param.	Avg.	SSv2	Diving-48	Jester	K400	COIN	IN1K
DINOv2	1.1B	81.1	50.7	82.5	93.4	83.6	90.7	86.1
PE core G	1.9B	82.3	55.4	76.9	90.0	88.5	95.3	87.6*
SigLIP2	1.2B	81.1	49.9	75.3	91.0	87.3	95.1	88.0
V-JEPA ViT-H	600M	85.2	74.3	87.9	97.7	84.5	87.1	80.0
InternVideo2 s2-1B	1B	87.0	69.7	86.4	97.0	89.4	93.8	85.8
V-JEPA 2 ViT-g	1B	87.5	75.3	90.1	97.7	86.6	90.7	84.6
V-JEPA 2 ViT-g₃₈₄	1B	88.2	77.3	90.2	97.8	87.3	91.1	85.1

V-JEPA 2 clearly outperforms image-based encoders on the motion tasks (SSv2, Diving-48, Jester) and is slightly behind PE / SigLIP on the appearance tasks (K400, COIN, IN1K), which matches the intuitive motion-vs-appearance division of labor.

Prediction (Action Anticipation on EK100)

Table 5. Epic-Kitchens-100 1-second action anticipation, mean-class recall@5 (validation set).

Method	Param.	Verb	Noun	Action
InAViT	160M	51.9	52.0	25.8
Video-LLaMA	7B	52.9	52.0	26.0
PlausiVL	8B	55.6	54.2	27.6
V-JEPA 2 ViT-L	300M	57.8	53.8	32.7
V-JEPA 2 ViT-g	1B	61.2	55.7	38.0
V-JEPA 2 ViT-g₃₈₄	1B	63.6	57.1	39.7

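The 300M V-JEPA 2 ViT-L already clearly beats the 8B PlausiVL (32.7 vs. 27.6 action recall@5), and the 1B model adds another +5.3. Scaling with model size looks linear, with no plateau visible on this task. The probe concatenates V-JEPA 2's predictor output with its encoder output, and three query tokens predict action / verb / noun respectively.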
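
The probe design can be pictured as cross-attention pooling with one query token per output. The sketch below is a single-head toy version under several assumptions: encoder and predictor outputs are concatenated along the token axis (the note does not say which axis), the class counts are toy values rather than the EK100 label spaces, and `attentive_pool` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(queries, tokens):
    """Single-head cross-attention: each query token pools over all feature tokens."""
    scores = queries @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores, axis=-1) @ tokens

rng = np.random.default_rng(0)
dim = 16
encoder_tokens = rng.normal(size=(32, dim))     # frozen encoder output (toy size)
predictor_tokens = rng.normal(size=(32, dim))   # predictor output (toy size)
tokens = np.concatenate([encoder_tokens, predictor_tokens], axis=0)

queries = rng.normal(size=(3, dim))             # one learnable query per head
pooled = attentive_pool(queries, tokens)        # shape (3, dim)

head_sizes = {"action": 11, "verb": 5, "noun": 7}   # toy class counts
heads = {name: rng.normal(size=(dim, n)) for name, n in head_sizes.items()}
logits = {name: pooled[i] @ heads[name] for i, name in enumerate(head_sizes)}
```

Only the queries and the linear heads would be trained in such a probe; the encoder and predictor features stay frozen.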

Video QA (Aligned with LLM)

Table 8. Full-scale alignment (88.5M samples, Llama 3.1 8B backbone). SOTA in the ≤8B class.

Method	Params	Avg.	PerceptionTest	MVP	TempCompass	TemporalBench	TOMATO	TVBench	MVBench
InternVL-2.5	300M/7B	52.1	68.9	39.9	68.3	24.3	29.4	61.6	72.6
Qwen2.5VL	1B/7B	49.7	70.5	36.7	71.7	24.5	24.6	50.5	69.6
PLM 8B	1B/8B	56.7	82.7	39.7	72.7	28.3	33.2	63.5	77.1
V-JEPA 2 + Llama 3.1 8B	1B/8B	59.5	84.0	44.5	76.9	36.7	40.3	60.6	73.5

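Note: V-JEPA 2 takes SOTA on 5/7 benchmarks but trails PLM 8B slightly on TVBench and MVBench. These two are general, more appearance-oriented benchmarks, consistent with V-JEPA 2's strong-motion / weak-appearance profile. Worth stressing: this is the first time a video encoder trained without language supervision reaches SOTA VidQA after alignment with an LLM, which challenges the conventional wisdom that “VidQA must use a CLIP-style language-supervised encoder”.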

Related Work

Builds On

V-JEPA (Bardes et al., 2024): the direct predecessor. V-JEPA 2's main increments are data (2M→22M), model size (600M→1B), progressive resolution, 3D-RoPE, and the warmup-constant-decay schedule
JEPA (LeCun 2022 position paper): the source of the whole methodology, “learn predictive models in representation space, not pixel space”
DINOv2 / SigLIP 2 / Perception Encoder: image foundation model competitors of the same generation, used as frozen-encoder VidQA baselines
Droid Dataset (Khazatsky et al. 2024): provides the robot post-training data; the ≥4 s subset comes to ~62 hours

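Compared Against

Octo: vision-language-action behavior cloning baseline (pretrained on 1M+ OXE trajectories, fine-tuned on Droid with hindsight relabeling). It loses badly to V-JEPA 2-AC in the manipulation table, but how hard the baseline was tuned is in doubt
Cosmos (Nvidia 2025): the representative pixel-space latent diffusion world model. It is used to argue that “pixel-space planning is computationally impractical and has a low success rate”, and is the most important comparison in the paper
PlausiVL / Video-LLaMA / InAViT: action anticipation baselines, all beaten on action recall@5 by V-JEPA 2 ViT-L (300M), which challenges the premise that “anticipation needs a large LLM”
PerceptionLM 8B: the VidQA SOTA baseline; with the same recipe and a different encoder, V-JEPA 2 overtakes it on 5/7 benchmarks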

Related Methods

Masked Image/Video Modeling: VideoMAE / VideoMAEv2, I-JEPA, MAE-ST. The key difference of the V-JEPA family is that mask denoising happens in representation space rather than pixel space, which avoids predicting unpredictable details
Cross-Entropy Method (CEM): a classic sampling-based optimizer, commonly used in MPC (Williams & Bagnell etc.), here moved to latent-space planning
Receding-horizon / Model-Predictive Control: a classic control paradigm. V-JEPA 2-AC is deployed as latent-space MPC, in the same lineage as Dreamer / TD-MPC, but fully zero-shot
Visual Servoing: the authors explicitly liken single-goal reaching to “learned visual servoing”: it likewise uses visual feedback to control motion, but without explicit camera calibration

Paper Review

Strengths

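Value as “unified evidence” at the methodological level: one encoder supports understanding (SOTA classification + VidQA), prediction (SOTA anticipation), and planning (zero-shot Franka), with no language supervision in encoder pretraining, giving the JEPA line its first complete end-to-end case study. This is genuinely “important” work, not merely “publishable”
The latent vs. pixel planning comparison is the experiment with the most actionable insight: 16 s vs. 4 min per action, with a higher success rate as well. It points the robot world model community in a clear engineering direction: not all world models need to be generative
Direct evidence of data efficiency: 62 hours of unlabeled robot video plus a frozen encoder is enough for zero-shot deployment across labs, compared with Octo (pretrained on 1M+ OXE trajectories, then fine-tuned on Droid). This sets a low-end data reference point for later cross-embodiment world model work
Solid scaling ablation design: each of the four ingredients gets its own number (+1 / +1.5 / +0.8 / +0.7 pt), and progressive resolution contributes an 8.4× GPU-time saving as an engineering result; the reproduction requirements are fully described
Honest failure analysis: Section 4.3 directly flags three limitations (camera positioning sensitivity, the long-horizon limitation, image-only goals), and the appendix adds a quantitative sensitivity analysis of camera position. This kind of self-disclosure is uncommon in papers from large industry labs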

Weaknesses

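Cosmos is not quite a fair pixel-space baseline: Cosmos is a latent diffusion model designed for video generation, not optimized for robot control. Its 0% results on Grasp Cup and pick-and-place say less about “the pixel-space paradigm losing” than about “a general video generation model not working when plugged zero-shot into control”. A fairer comparison would use a generative world model designed for control (e.g., GAIA / Genie / DreamerV3-style), but none appears
Narrow range of manipulation tasks: only grasp + reach + pick-and-place on 3 objects (cup / box / bottle), all of them single-step prehensile manipulation. Tasks requiring in-hand reorientation, deformable objects, bimanual coordination, or tool use are untouched, and the authors acknowledge that long-horizon planning is unsolved
Cross-embodiment wording is too strong: the work is tagged as cross-embodiment, but it is validated on a single embodiment, the Franka, so it is only “cross-lab” (two different Franka installations). Compared with genuinely cross-embodiment work such as Octo / OpenVLA (covering 7+ arm types), V-JEPA 2-AC's transfer claims should be discounted
Camera positioning is a serious caveat: the authors admit they “manually tried different camera positions before settling on one that worked”. The price of zero-shot is therefore that an expert must first tune camera placement, which is still some way from truly autonomous deployment
The Octo baseline may be under-tuned: Octo's 0% on grasp box is abnormally low; community reproductions commonly report 30-50% for Octo on simple in-distribution tasks. Independent community reproduction is needed to confirm the true margin of V-JEPA 2-AC's advantage
The “no language supervision” framing is slightly misleading: V-JEPA 2 pretraining uses no language, but the SOTA VidQA result in Section 7.4 uses 88.5M image/video-text alignment samples. A more accurate statement is “the encoder has no language supervision; downstream alignment still needs paired data”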

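Credibility Assessment

Artifact Availability

Code: both inference and training code are open source (see the GitHub README, which covers four kinds of training scripts: pretraining, AC post-training, attentive probe, and VQA alignment)
Model weights: a full checkpoint series has been released: vjepa2-vit-{l,h,g} (256² and 384²), vjepa2-vit-g-{384,512} for VidQA, and vjepa2-ac-vit-g, hosted together in an HF Collection
Training details: fully disclosed. Appendix §10–14 gives itemized hyperparameter tables for the 4 stages (learning rate / batch size / EMA schedule / mask ratio / RoPE configuration, etc.), including data source weights and iteration counts
Datasets: all VideoMix22M sources are public datasets (SSv2 / Kinetics / HowTo100M / YT-Temporal-1B / ImageNet), but the retrieval index and final sample list for Curated-YT1B have not been released; Droid is public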

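Claim Verifiability

✅ SOTA average over 6 classification tasks: standard frozen-encoder + attentive probe protocol; results can be reproduced independently, and the absolute numbers agree with those reported elsewhere (the DINOv2 / SigLIP2 / PE numbers match the original papers)
✅ EK100 anticipation +44% over PlausiVL: standard benchmark and standard recall@5 metric, with the linear model-size scaling trend as supporting evidence
✅ VidQA SOTA in the 8B class on 5/7 benchmarks: a fair comparison using the same recipe as PerceptionLM 8B
✅ Latent planning at 16 s/action vs. 4 min/action for Cosmos: the hardware (RTX 4090) and the sample / iteration counts are disclosed
⚠️ Zero-shot Franka manipulation at 65–80% success: the sample size is small (10 trials/task × 2 labs × 7 tasks = 140 trials), and grasp box averages only 25%; the exact randomization protocol behind “various permutations” is not adequately described, so the cherry-picking risk is moderate
⚠️ “Cross-lab generalization”: both labs use Franka + RobotiQ with similar operational space controllers, so the hardware gap is far smaller than in a cross-embodiment setting. The claim should read “cross-environment within the same robot platform”
⚠️ Octo baseline performance: Octo's 0% on grasp box is at odds with other community reproductions, possibly because of implementation details in the Droid hindsight-relabeling fine-tune; the paper gives no independent sanity check
⚠️ “SOTA VidQA from LLM alignment without language supervision”: the encoder itself indeed has no language supervision, but Section 7.4 uses 88.5M image/video-text alignment samples, so the alignment phase still has a significant requirement for language-paired data. Any suggestion of standalone “unsupervised” learning should be discounted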
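
To put the small-sample concern in numbers, the sketch below computes a 95% Wilson score interval for one task. It assumes the 65% pick-and-place (box) average corresponds to 13 successes out of 20 pooled trials (2 labs × 10 trials), which is an inference from the protocol described in this note rather than a figure reported by the paper; `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Assumed pooling: 13 successes in 20 trials for the 65% average.
low, high = wilson_interval(13, 20)
```

Under that assumption the interval runs from roughly 43% to 82%, so per-task differences of 10 to 20 points between methods or objects are within noise at this sample size.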

Notes

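This paper is a complete end-to-end validation, in video + robotics, of the JEPA approach LeCun has long advocated. Its importance lies mainly in unifying “video understanding pretraining” and “robot world model”, two research directions that used to belong to different communities, under one methodology. That unification is worth more than any single-point SOTA
The latent vs. pixel planning speed comparison (16 s vs. 4 min) is a highly actionable engineering insight. If you work on robot world models, the signal is clear: do not put the whole budget into pixel-space generation quality
The 62-hours-of-unlabeled-robot-video setting can be viewed as a lower bound for cross-embodiment world models: future work on genuine multi-robot transfer can use it as a reference order of magnitude
The caveat that camera positioning must be tuned manually suggests that V-JEPA 2-AC has not learned true metric 3D physical dynamics, but rather in-distribution visual-action associations. This is consistent with its success rate varying considerably across objects with different grasping affordances, such as box and cup
An open question worth following up: can the V-JEPA 2-AC representation feed back to improve the V-JEPA 2 encoder? The setup is currently one-directional (the encoder is frozen). If the encoder were allowed to fine-tune on action data, would it yield a representation better suited to control? The future work section does not mention this ablation, but it is worth running
Takeaway for our own world model work: don't rush to pixel-space generation. First confirm that prediction + planning in representation space can close the loop, then consider whether a generative head is needed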

Rating

Metrics (as of 2026-04-24): citation=299, influential=38 (12.7%), velocity=28.75/mo; HF upvotes=31; github 3710⭐ / forks=446 / 90d commits=4 / pushed 31d ago

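Score: 3 - Foundation. Rationale: as Strengths #1 states, V-JEPA 2 connects three pipelines on a single encoder: understanding (SOTA classification + VidQA), prediction (SOTA anticipation), and planning (zero-shot Franka). It is the first end-to-end evidence for the JEPA line in video + robotics, and it combines methodological unification with an actionable engineering comparison (16 s vs. 4 min, latent vs. pixel planning). Compared with the adjacent 2 - Frontier rating, this paper is not a single-point SOTA or a representative of one paradigm; it clearly changed how the video world model community frames the discussion (“not all world models need to be generative”). FAIR also open-sourced the full checkpoint series and training recipe at the same time, and the paper has become a must-cite reference for video world model / latent planning work. The issues listed under Weaknesses, such as the unfair Cosmos baseline and the overstated cross-embodiment wording, are scope limitations and do not undermine the foundational value.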
