Animation Tech Intro Part 1: Skinning

When I first entered the world of animation programming, I took some things for granted. There’s skinning to animate the mesh. Animation blending to mix different animations. Physics, like pendulum equations, to simulate clothing details and hair strands. But I never put much more thought into these. They’re things I use day-to-day to build visuals, but never really had to implement from scratch. In most game engines they’re simply already there when you arrive. I think every animation programmer can benefit from implementing these fundamentals on their own. It puts everything into a different perspective and teaches you to think in terms of first principles.

In the next few blog posts, I’ll show the basics of animation programming. So how can we turn a static mesh into an animated one? Let’s start at the beginning – skinning.

Mesh (left), animated mesh (right)

For this and future posts I will use the awesome Zelda fan-art by Christoph Schoch: the Zelda rig and model by Christoph Schoch for Maya, with a Blender version by Daitomodachi.

Skinning

What needs to happen under the hood of a game engine in order to animate the character on the screen? To answer that, we have to talk a bit about rendering first.

The character’s mesh is made of vertex and index buffers. The simplest vertex structure consists of:

  • position,
  • normal,
  • texture coordinate (UV).

To move a mesh, we can use a matrix. The classical setup is three matrices:

  • model – moves and rotates the mesh in the scene,
  • view – transforms geometry into camera space so it’s visible to the camera; in effect, it moves the camera around,
  • projection – projects our 3D world onto NDC (Normalized Device Coordinates); it gives us perspective.

The idea of skinning is that each vertex refers to one or more matrices that move it independently of other vertices. This means we have to add some data per vertex. We can use an ivec4 for indices, which gives us 4 indices per vertex. We also want to define the percentage by which the vertex is affected by each matrix. For that, we can use a vec4. Note that the weights need to add up to 1.0 (100%).

  • Skin indices – up to 4 indices that are referring to skinning matrices array. -1 is a special index that refers to no matrix (like a NULL).
  • Skin weights – weight for each of these 4 indices.

Vertex structure and matrices
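To make the per-vertex data concrete, here is a small sketch in C of the index/weight pair, along with a helper that renormalizes the weights so they sum to 1.0 (the struct and function names are mine, not from any engine):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

// Hypothetical per-vertex skinning data, mirroring the ivec4/vec4 pair
// described above.
typedef struct skin_vertex_t
{
    int32_t indices[4]; // indices into the skinning matrix array, -1 = no matrix
    float   weights[4]; // influence of each matrix, should sum to 1.0
} skin_vertex_t;

// Renormalize the weights of the used influences so they add up to 1.0.
void skin_vertex_normalize_weights(skin_vertex_t* v)
{
    float sum = 0.0f;
    for(int i = 0; i < 4; ++i)
    {
        if(v->indices[i] >= 0)
            sum += v->weights[i];
    }

    if(sum > 0.0f)
    {
        for(int i = 0; i < 4; ++i)
        {
            if(v->indices[i] >= 0)
                v->weights[i] /= sum;
        }
    }
}
```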

If you need more than 4 indices, then simply add two more vectors for the skinning data. Note that you don’t need to create a single vertex structure covering all variants of skinning. The way I implemented it is a single memory buffer split into several spans: first an array of indices, then an array of vertices, and then an array of skinning data. Thanks to that, the memory stays in one place (a single allocation), but it’s not bound to an uber vertex structure. When passing the buffers to the vertex shader, I pass them as separate views with offsets into the memory. I’m using the Vulkan graphics API, so for that I use ‘pOffsets’ in the vkCmdBindVertexBuffers call. Below you can see vertex shader code with inputs for a skinned mesh:

layout(binding = 0) uniform UniformBufferObject
{
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

layout(binding = 1) uniform SkinningBuffer
{
    mat4 bones[512];
} skin;

layout(location = 0) in vec3 inPosition;
layout(location = 1) in vec3 inNormal;
layout(location = 2) in vec2 inTexCoord;
layout(location = 3) in ivec4 skinIndices;
layout(location = 4) in vec4 skinWeights;
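As for the single-allocation buffer mentioned above, the offset math can be sketched like this (struct and function names are mine; the 32-bit index type and the sizes in the example are assumptions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// A minimal sketch of the single-allocation layout:
// [index data][vertex data][skinning data], each sub-range addressed by an offset.
typedef struct mesh_buffer_layout_t
{
    size_t indicesOffset;
    size_t verticesOffset;
    size_t skinningOffset;
    size_t totalSize;
} mesh_buffer_layout_t;

mesh_buffer_layout_t mesh_buffer_calc_layout(size_t numIndices, size_t numVertices,
                                             size_t vertexSize, size_t skinVertexSize)
{
    mesh_buffer_layout_t l;
    l.indicesOffset  = 0;
    l.verticesOffset = l.indicesOffset + numIndices * sizeof(uint32_t);
    l.skinningOffset = l.verticesOffset + numVertices * vertexSize;
    l.totalSize      = l.skinningOffset + numVertices * skinVertexSize;
    return l;
}
```

The resulting vertex and skinning offsets are the kind of values you would then pass via ‘pOffsets’ to vkCmdBindVertexBuffers.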

The data for skinning is prepared in digital content creation apps like Maya or Blender. The weight is painted on the mesh by an artist. Usually the character is standing in A-Pose.

Skinning weights in Blender

Alright, but how is it actually calculated in the vertex shader code? It’s simply a weighted sum of the position transformed by each bone. Remember that for points the ‘w’ component should equal 1.0, so the translation part of the matrix is applied to the position. The normal is also a weighted sum; the only difference is that a normal is a direction, so its ‘w’ component should equal 0.0. This ensures that only rotation is applied to the normal. (Strictly speaking, normals should be transformed by the inverse transpose of the matrix, but for rigid transforms, i.e. rotation and translation only, the matrix itself works fine.)

void main()
{
    const vec4 pos = vec4(inPosition, 1.0f);
    const vec4 norm = vec4(inNormal, 0.0f);

    vec4 posSkinned = {0.0f, 0.0f, 0.0f, 0.0f};
    vec4 normSkinned = {0.0f, 0.0f, 0.0f, 0.0f};

    for(int i = 0; i < 4; ++i)
    {
        if(skinIndices[i] >= 0)
        {
            const mat4 bone = skin.bones[skinIndices[i]];
            const float weight = skinWeights[i];

            posSkinned += (bone * pos) * weight;
            normSkinned += (bone * norm) * weight;
        }
    }

    posSkinned.w = 1.0f;

    // ...
}

With all this in place, you can now write matrices into the skinning matrix buffer to move individual parts of the mesh.

Skinned character

Alright, that’s cool, but it’s not very useful yet, is it? What do we actually put inside the skinning matrix buffer? We need a skeleton (or rig) definition.

Rig

A skeleton and a rig are not exactly the same thing. A skeleton is just a collection of bones arranged in a hierarchy, while a rig additionally includes various setups that make animating the skeleton easier, like inverse kinematics or helpers for manipulating the skeleton. In various engines’ code you can see either name used. In my home project I use the name rig for two reasons: 1) it’s short, and 2) I do include inverse kinematics and other setups in it.

Example skeleton structure

Our rig structure will require the names and hierarchy of the bones, as well as a reference pose (the default pose of the skeleton). Names are usually kept as string hashes, for fast comparison and a small memory footprint. The hierarchy is a flat array of parent indices. Each bone has either a single parent bone or no parent at all (-1). The reference pose is an array of transforms. A transform usually consists of a translation stored as a vector and a rotation stored as a quaternion. Why transforms and not matrices? It’s easier and more accurate to interpolate a vector and a quaternion than it is to interpolate a matrix. Also, quaternions do not suffer from gimbal lock issues.

typedef struct fa_rig_t
{
    fc_string_hash_t* boneNameHashes;
    int16_t* parents;
    fm_xform* refPose;
    uint32_t numBones;
} fa_rig_t;
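As a concrete illustration of the flat parent-index hierarchy, a hypothetical three-bone chain (pelvis -> spine -> head) would be stored like this (the bone names are my example, not from the rig above):

```c
#include <assert.h>
#include <stdint.h>

// Flat hierarchy: each entry is the index of the bone's parent, -1 = root.
enum { BONE_PELVIS = 0, BONE_SPINE = 1, BONE_HEAD = 2, NUM_BONES = 3 };

static const int16_t parents[NUM_BONES] = { -1, BONE_PELVIS, BONE_SPINE };
```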

The reference pose is usually a T-Pose. Why T? It’s a ‘zero’ for animators. You can put 60 degrees rotation on hands and it will be 60 degrees relative to the body. This can be also used when debugging animation issues in games: A-Pose is the default mesh pose, so you know the skinning is not applied when you see A-Pose. When seeing T-Pose, the skinning must have been applied, but there’s no animation. This little trick helps in identifying where to look for the issue (either skinning budget is exceeded or there’s some problem in the animation system itself).

T-Pose (left) and A-Pose (right)

Next, we need to distinguish between local and model space. Local space means bones’ transforms are relative to their parents. Model space, on the other hand, means bones’ transforms are relative to the model’s position (usually where the root bone is).

Local vs model space transforms

Why use both? Local space is perfect for blending different poses and animations, while model space is what the shader requires when applying a pose to the mesh. Conversion between these spaces is easy if you sort the bones by parent indices: when calculating bone ‘i’, its parent is already in model space, so there’s no need for recursion. Note that such sorted bones might require mapping to the final skinning matrix array if the order differs from the original data. The mapping is also useful if you only need a few bones for a given mesh part, like a hand. It will also help you keep the mesh and rig bone information separate; the only thing they share is the bone name hashes.

void fa_pose_local_to_model(fa_pose_t* modelPose, const fa_pose_t* localPose, const int16_t* parentIndices)
{
    const fm_xform* localXforms = localPose->xforms;
    fm_xform* modelXforms = modelPose->xforms;

    uint32_t numBones = MIN(modelPose->numXforms, localPose->numXforms);

    for(uint16_t i = 0; i < numBones; ++i)
    {
        const int16_t idxParent = parentIndices[i];
        if(idxParent >= 0)
        {
            fm_xform_mul(&modelXforms[idxParent], &localXforms[i], &modelXforms[i]);
        }
        else
        {
            modelXforms[i] = localXforms[i];
        }
    }
}
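The loop above works in a single pass only because the bones are sorted so that every parent appears before its children. A quick sanity check for that invariant might look like this (a sketch; the function name is mine):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Returns true if every bone's parent appears earlier in the array (or is -1).
// Only then can local-to-model conversion run as a single forward loop.
bool rig_is_sorted_by_hierarchy(const int16_t* parentIndices, uint32_t numBones)
{
    for(uint32_t i = 0; i < numBones; ++i)
    {
        if(parentIndices[i] >= (int16_t)i)
            return false;
    }
    return true;
}
```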

After converting to model space, we’re ready to turn the transforms into matrices and prepare them for sending to the shader. However, that’s not the end! If we were to apply the pure reference pose matrices in the shader, the character would explode, as the mesh is already in the pose that was used to bind the bones to the character. So we need a way to undo that and apply the new pose. How? We need the inverse bind pose. It collapses the whole mesh into a single position, ready to be redistributed again using the model-space pose coming from the animation system. The inverse bind pose is kept in the mesh data. Before writing the matrices into the shader buffer, you have to multiply them by the inverse bind pose matrices.

A-Pose to inv-bind-pose to final pose
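Putting it together, each final skinning matrix is the model-space pose matrix multiplied by the inverse bind pose matrix. A minimal sketch in C, assuming column-major 4x4 matrices (the types and function names are mine, not from any engine):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

// Column-major 4x4 matrix: element (row, col) lives at m[col * 4 + row].
typedef struct mat4_t { float m[16]; } mat4_t;

// out = a * b
static mat4_t mat4_mul(const mat4_t* a, const mat4_t* b)
{
    mat4_t out;
    for(int col = 0; col < 4; ++col)
    {
        for(int row = 0; row < 4; ++row)
        {
            float sum = 0.0f;
            for(int k = 0; k < 4; ++k)
                sum += a->m[k * 4 + row] * b->m[col * 4 + k];
            out.m[col * 4 + row] = sum;
        }
    }
    return out;
}

// Final skinning matrices: model-space pose combined with the inverse
// bind pose stored in the mesh data.
void build_skinning_matrices(mat4_t* skinning, const mat4_t* modelPose,
                             const mat4_t* inverseBindPose, uint32_t numBones)
{
    for(uint32_t i = 0; i < numBones; ++i)
        skinning[i] = mat4_mul(&modelPose[i], &inverseBindPose[i]);
}
```

If the animated pose equals the bind pose, each product collapses to identity, which is exactly why an untouched character renders in its bind pose rather than exploding.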

Conclusion

What I’ve just described in this post is linear blend skinning, which is probably the most widely used method in game engines. With it, you can set a custom pose for the character.

Poses applied on character

Is it the only method? No! There are various other interesting methods, each with its own pros and cons.

You can also do some simulation of jiggle through skinning.

You can also manipulate vertices directly or animate the mesh’s texture. No one said you have to be limited to skinning matrices. The best examples are facial animation through blendshapes, or wrinkle maps for cloth and facial skin.

That’s it. Next, I will write about animation clip sampling and compression.
