Content Analysis (WIP)

Aims

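Chip IP vendors cooperate to some extent with application developers, engine developers, and platform providers. Many problems are simplest and most effective to fix at the source (the application). These problems mainly include: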

bad API usage
- glLinkProgram/vkCreateGraphicsPipelines called at runtime
- bad sync points, such as pipeline barriers
- not enough queues to submit commands
- clearing an unnecessary framebuffer before rendering begins
CPU bound
- CPU workload too heavy, such as physics-based animation
- too many draw calls
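As a rough illustration of how two of the items above could be spotted in a captured call stream, the sketch below counts shader link / pipeline creation calls issued after a warm-up period and counts draw calls per frame. The trace representation (a list of `(frame_index, call_name)` tuples), the function names, and the `warmup_frames` parameter are assumptions made for this example; they are not the output format of any particular tracing tool.

```python
from collections import Counter

# Calls that compile or link shaders; costly when issued in the middle of gameplay.
EXPENSIVE_CALLS = frozenset({"glLinkProgram", "vkCreateGraphicsPipelines"})

# Name prefixes treated as draw calls (assumed naming, covers GL and Vulkan).
DRAW_PREFIXES = ("glDraw", "vkCmdDraw")


def runtime_pipeline_builds(calls, warmup_frames=1):
    """Return {frame: count} of expensive calls seen at or after warmup_frames."""
    hits = Counter()
    for frame, name in calls:
        if frame >= warmup_frames and name in EXPENSIVE_CALLS:
            hits[frame] += 1
    return dict(hits)


def draw_calls_per_frame(calls):
    """Return {frame: count} of draw calls."""
    draws = Counter()
    for frame, name in calls:
        if name.startswith(DRAW_PREFIXES):
            draws[frame] += 1
    return dict(draws)


# Hypothetical three-frame capture: frame 0 is loading, frame 2 links a program late.
example_calls = [
    (0, "glLinkProgram"),
    (0, "glDrawArrays"),
    (1, "glDrawElements"),
    (2, "glLinkProgram"),
    (2, "glDrawArrays"),
    (2, "glDrawArrays"),
]
```

Frames reported by `runtime_pipeline_builds` are candidates for a stutter caused by shader compilation; what counts as "too many" draw calls depends on the platform and is left to the reader.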

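Once an application (such as a game or a benchmark) has been released, it is often very hard to get the app developer to accept our feedback and modify the app. In that case we need to find ways to optimize the app's performance from the driver side (including the compiler).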

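- Remove redundant work in the frame (commands, primitives, ...) that has no effect on the final rendered result; for example, check the conditions for early-ZS, HSR, and forward pixel kill so that they are enabled as often as possible.
- Improve frame rendering efficiency: check whether incremental rendering is justified, whether alpha blending is necessary, and whether the driver's behavior lets the hardware reach its full performance.
- Check whether the shader instructions are reasonable; the compiler sometimes generates instructions that are not well optimized.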
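One simple kind of redundancy is a state-setting command that writes the value the state already holds. The sketch below filters such commands out of a command list. The command encoding (`("state", key, value)` and `("draw", name)` tuples) and the function name are hypothetical, chosen only to illustrate the idea; a real driver tracks far more state and must also respect commands with side effects.

```python
def drop_redundant_state(commands):
    """Return the commands with no-op state changes removed.

    A "state" command is dropped when it sets a key to the value that key
    already has. All other commands are kept in their original order.
    """
    current = {}  # last value written for each state key
    kept = []
    for cmd in commands:
        if cmd[0] == "state":
            _, key, value = cmd
            if key in current and current[key] == value:
                continue  # same value again: no effect on rendering
            current[key] = value
        kept.append(cmd)
    return kept


# Hypothetical frame: blending is disabled twice in a row.
frame = [
    ("state", "blend", False),
    ("draw", "opaque_a"),
    ("state", "blend", False),
    ("draw", "opaque_b"),
    ("state", "blend", True),
    ("draw", "glass"),
]
filtered = drop_redundant_state(frame)
```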

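Once it is established that the application and the driver are already optimal or cannot be changed, consider potential hardware problems, with the goal of improving the hardware architecture.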

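Run the app or a trace and look at the performance in its current state. Ideally, capture a trace for a specific scene with specific settings, which makes the performance study easier. Note that an API trace hides some CPU problems, such as the app's physics simulation on the host side, so rule out this class of problem first before continuing the investigation with the trace.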

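- Use a performance profiling tool to read the CPU/GPU performance counters, and lock the frequency before reading GPU utilization, to make sure the GPU workload is full.
- Check whether workloads overlap; this has to be examined together with the app's render behavior.
- Rank the hardware units by utilization.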
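The ranking step can be sketched as below, assuming each hardware unit exposes a busy-cycle counter sampled over the same window as a total GPU cycle count. The counter values and the `rank_units` helper are made up for illustration; real counter names and semantics depend on the GPU and the profiling tool.

```python
def rank_units(busy_cycles, total_cycles):
    """Return [(unit, utilization)] sorted from most to least utilized.

    busy_cycles: mapping of unit name to busy cycles in the sample window.
    total_cycles: GPU cycles elapsed in the same window (must be positive).
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    utilization = {unit: busy / total_cycles for unit, busy in busy_cycles.items()}
    return sorted(utilization.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample taken with the GPU frequency locked.
sample = {"tiler": 200, "texture": 900, "varying": 450}
ranking = rank_units(sample, total_cycles=1000)
```

The unit at the top of the ranking is the first candidate for the bottleneck, although a high busy count alone does not prove the unit is the limiter.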

geometry bound
1. tiler bound
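  1. There are many small triangles, or many triangles are culled, so fragment processing is fast and geometry processing is slow. Consider geometry LOD: use a detailed mesh for nearby objects and a coarse mesh for distant ones.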
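A minimal sketch of distance-based LOD selection follows. The threshold values and the `select_lod` name are assumptions for the example; engines commonly use screen-space size or hysteresis rather than raw distance.

```python
import bisect


def select_lod(distance, thresholds):
    """Pick a LOD index from the camera distance.

    thresholds must be sorted in ascending order. Index 0 is the most
    detailed mesh; index len(thresholds) is the coarsest one.
    """
    return bisect.bisect_right(thresholds, distance)


# Hypothetical switch distances in world units: four meshes, three boundaries.
LOD_THRESHOLDS = [10.0, 50.0, 200.0]
```

With these thresholds, an object 5 units away uses mesh 0 and an object 75 units away uses mesh 2.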
fragment bound
1. FFE
2. EE
3. Varying
4. Blend
5. Texture
compute bound
1. Memory bound: check whether the memory addresses cause bank conflicts
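The bank-conflict check can be approximated offline from the addresses that one group of threads accesses at the same time. The sketch below assumes banks are interleaved at word granularity, with 32 banks of 4-byte words, and that accesses to the same word are merged rather than serialized. These parameters are assumptions for illustration; the actual bank count, width, and merge rules depend on the hardware.

```python
from collections import defaultdict


def bank_conflict_degree(addresses, num_banks=32, bank_width=4):
    """Return the largest number of distinct words that map to a single bank.

    A result of 1 means no conflict under the assumed model; a result of N
    means the worst bank needs N serialized accesses.
    """
    words_per_bank = defaultdict(set)
    for addr in addresses:
        word = addr // bank_width          # word index of this byte address
        words_per_bank[word % num_banks].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


# Byte addresses for 32 threads with different strides (4-byte elements).
unit_stride = [4 * i for i in range(32)]       # consecutive words
stride_two = [8 * i for i in range(32)]        # every second word
stride_32 = [4 * 32 * i for i in range(32)]    # every access lands in bank 0
```

Under this model a power-of-two stride equal to the bank count is the worst case; padding the row pitch by one word is a common way to break such a pattern.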

The GPU is a multi-core processor; try to keep the load balanced across its cores.
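One way to put a number on load balance is the ratio of the busiest core's work to the average across cores. The sketch below assumes a per-core busy-cycle counter is available; the `load_imbalance` name and the sample values are hypothetical.

```python
def load_imbalance(per_core_busy):
    """Return max/mean of per-core busy cycles.

    1.0 means perfectly balanced; a value equal to the core count means
    a single core did all the work.
    """
    if not per_core_busy:
        raise ValueError("need at least one core")
    mean = sum(per_core_busy) / len(per_core_busy)
    if mean == 0:
        return 1.0  # fully idle GPU: treat as balanced
    return max(per_core_busy) / mean


balanced = load_imbalance([100, 100, 100, 100])
one_hot = load_imbalance([400, 0, 0, 0])
```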