Pose estimation is a computer vision task for detecting the pose (i.e. orientation and position) of objects. It works by detecting a number of keypoints so that we can understand the main parts of the object and estimate its current orientation. Based on such keypoints, we will be able to form the shape of the object in either 2D or 3D.
This tutorial covers how to build an Android app that estimates the human pose in standalone RGB images using the pretrained TFLite PoseNet model. The model predicts the locations of 17 keypoints of the human body, including the eyes, nose, and shoulders. In a second tutorial, we'll see how these keypoint locations can be used to apply special effects and filters, like the ones you see on Snapchat.
The outline of this tutorial is as follows:
- Base Android Studio Project
- Loading a Gallery Image
- Cropping and Scaling the Image
- Estimating the Human Pose Using PoseNet
- Getting Information About the Keypoints
- Complete Code of the PosenetActivity.kt
Let's get started.
Base Android Studio Project
We'll use the TensorFlow Lite PoseNet Android Demo as a starting point to save time. Let's first discuss how this project works. Then we'll edit it for our own needs.
The project uses the pretrained PoseNet model, which is built on top of MobileNet. The PoseNet model is available for download at this link. The model accepts an image of size (257, 257) and returns the locations of the following 17 keypoints:
- nose
- leftEye
- rightEye
- leftEar
- rightEar
- leftShoulder
- rightShoulder
- leftElbow
- rightElbow
- leftWrist
- rightWrist
- leftHip
- rightHip
- leftKnee
- rightKnee
- leftAnkle
- rightAnkle
For each keypoint there is an associated value representing the confidence, ranging from 0.0 to 1.0. So, the model returns two lists: one with the keypoint locations, and another with the confidence of each keypoint. It's up to you to set a confidence threshold for classifying a candidate keypoint as either accepted or rejected. A threshold of 0.5 or higher typically gives good results.
The project is implemented in the Kotlin programming language and accesses the Android camera for capturing images. For each captured image, the model predicts the positions of the keypoints and displays the image with these keypoints overlain.
In this tutorial we're going to simplify this project as much as possible. First of all, the project will be edited to work with single images selected from the gallery, not those taken with the camera. Once we have the results for a single image we'll add a mask over the eyes, which is a known effect in image-editing apps like Snapchat.
The next section discusses removing the unnecessary code from the project.
Removing the Unnecessary Code
The project is configured to work with images captured from the camera, which is not our current target. So, anything related to accessing or capturing images should be removed. There are three files to be edited:
- PosenetActivity.kt: the activity file.
- activity_posenet.xml: the layout resource file of the PosenetActivity.kt activity.
- AndroidManifest.xml
Starting with the PosenetActivity.kt file, here is a list of the code blocks to be removed, along with their line numbers in the original file:
- CameraDevice.StateCallback(): lines 161:179
- CameraCaptureSession.CaptureCallback(): lines 184:198
- onViewCreated(): lines 216:219
- onResume(): lines 221:224
- onPause(): lines 232:236
- requestCameraPermission(): lines 243:249
- openCamera(): lines 327:345
- closeCamera(): lines 350:368
- startBackgroundThread(): lines 373:376
- stopBackgroundThread(): lines 381:390
- fillBytes(): lines 393:404
- OnImageAvailableListener: lines 407:451
- createCameraPreviewSession(): lines 598:655
- setAutoFlash(): lines 657:664
As a result of removing the previous code, the following variables are no longer needed:
- PREVIEW_WIDTH: line 97
- PREVIEW_HEIGHT: line 98
- All variables defined from line 104 to 158: cameraId, surfaceView, captureSession, cameraDevice, previewSize, previewWidth, and more.
After making the previous changes, these two lines are also no longer needed:
- Line 228 inside the onStart() method: openCamera()
- Line 578 at the end of the draw() method: surfaceHolder!!.unlockCanvasAndPost(canvas)
Below is the current form of the PosenetActivity.kt file after making these changes.
All of the elements in the activity_posenet.xml file should also be removed, as they are no longer needed. So, the file should look like this:
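After the cleanup, the layout is essentially an empty container, something like this (the root element in the demo may differ):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- activity_posenet.xml with all of the camera-preview elements removed. -->
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent" />
```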
For the AndroidManifest.xml file, because we are no longer accessing the camera, the following 3 lines should be removed:
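These are the camera permission and camera feature declarations; in the demo's manifest they look roughly like this:

```xml
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />
```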
After removing all of the unnecessary code from the three files PosenetActivity.kt, activity_posenet.xml, and AndroidManifest.xml, we still need to make some changes in order to work with single images. The next section discusses editing the activity_posenet.xml file so it can load and show an image.
Editing the Activity Layout
The content of the activity layout file is listed below. It has just two elements: a Button and an ImageView. The button will be used to load an image once clicked. It is given the ID selectImage so it can be accessed inside the activity.
The ImageView will have two uses. The first is to show the selected image. The second is to display the result after applying the eye filter. The ImageView is given the ID imageView so it can be accessed from within the activity.
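A minimal sketch of such a layout (the attribute values are just one reasonable choice):

```xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <!-- Opens the gallery when clicked. -->
    <Button
        android:id="@+id/selectImage"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Select Image" />

    <!-- Shows the selected image and, later, the result with the applied effects. -->
    <ImageView
        android:id="@+id/imageView"
        android:layout_width="match_parent"
        android:layout_height="match_parent" />

</LinearLayout>
```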
The next figure shows what the activity layout looks like.

Before implementing the button click listener, it is essential to add the next line inside the AndroidManifest.xml file to ask for permission to access the external storage.
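The line is the standard read permission for external storage:

```xml
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
```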
The next section discusses implementing the button click listener for loading an image from the gallery.
Loading a Gallery Image
The current implementation of the onStart() callback method is given below. If you did not already do so, please remove the call to the openCamera() method, since it is no longer needed. The onStart() method just creates an instance of the PoseNet class so that it can be used later for predicting the locations of the keypoints. The variable posenet holds the created instance, which will be used later inside the processImage() method.
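A minimal sketch of that method, assuming the demo's Posenet class and a posenet field declared in the activity:

```kotlin
override fun onStart() {
    super.onStart()
    // Create the PoseNet interpreter once; it is reused for every processed image.
    // PosenetActivity is a Fragment in the demo; if yours is an Activity, pass `this` instead.
    posenet = Posenet(this.context!!)
}
```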
Inside the onStart() method, we can bind a click listener to the selectImage button. Here is the new implementation of the method. Using an Intent, the gallery will be opened, asking the user to select an image. The startActivityForResult() method is called with the request code stored in the REQUEST_CODE variable, which is set to 100.
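A sketch of the updated onStart() (selectImage refers to the button from the layout, obtained via view binding or findViewById, REQUEST_CODE is a constant set to 100, and the Intent shown is one common way to open the gallery):

```kotlin
override fun onStart() {
    super.onStart()
    posenet = Posenet(this.context!!)

    // Open the gallery when the button is clicked; the chosen image is delivered
    // to onActivityResult() together with REQUEST_CODE.
    selectImage.setOnClickListener {
        val intent = Intent(Intent.ACTION_PICK, MediaStore.Images.Media.EXTERNAL_CONTENT_URI)
        startActivityForResult(intent, REQUEST_CODE)
    }
}
```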
Once the user returns to the application, the onActivityResult() callback method will be called. Here is its implementation. Using an if statement, the result is checked to make sure the image was selected successfully. If the result is not successful (e.g. the user did not select an image), then a toast message is displayed.
If the result is successful, then the selected image is displayed on the ImageView using the setImageURI() method.
To be able to process the selected image, it needs to be available as a Bitmap. For this reason, the image is read as a Bitmap based on its URI. First the URI is obtained using the getData() method, then the Bitmap is returned by the getBitmap() method. Once the bitmap is available, the processImage() method is called to prepare the image and estimate the human pose. This is discussed in the next two sections.
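A sketch of onActivityResult() along those lines (it assumes PosenetActivity is a Fragment, as in the demo, and that imageView comes from the layout):

```kotlin
override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    super.onActivityResult(requestCode, resultCode, data)
    if (requestCode == REQUEST_CODE && resultCode == Activity.RESULT_OK && data != null) {
        val imageUri = data.data        // URI of the selected image
        imageView.setImageURI(imageUri) // show the selected image
        // Read the image as a Bitmap so it can be processed.
        val bitmap = MediaStore.Images.Media.getBitmap(requireContext().contentResolver, imageUri)
        processImage(bitmap)            // crop, scale, and estimate the pose
    } else {
        Toast.makeText(requireContext(), "No image selected.", Toast.LENGTH_LONG).show()
    }
}
```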
The image that will be used throughout this tutorial is shown below.

After such an image is selected from the gallery, it will be displayed on the ImageView as shown below.

After the image is loaded as a Bitmap, and before estimating the human pose, there are two extra steps needed: cropping and scaling the image. We'll discuss those now.
Cropping and Scaling the Image
This section covers preparing the image before applying the PoseNet model.
Inside the project there's a method called processImage() which calls the methods needed for four important tasks:
- Cropping the image by calling the cropBitmap() method.
- Scaling the image by calling the Bitmap.createScaledBitmap() method.
- Estimating the pose by calling the estimateSinglePose() method.
- Drawing the keypoints over the image by calling the draw() method.
The implementation of the processImage() method is listed below. In this section we'll focus on the first two tasks, cropping and scaling the image.
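Here is a sketch of the method at this stage, assuming the demo's cropBitmap() helper, the MODEL_WIDTH and MODEL_HEIGHT constants from Constants.kt, and the demo's draw() signature:

```kotlin
private fun processImage(bitmap: Bitmap) {
    // 1. Crop the bitmap to keep the aspect ratio the model expects.
    val croppedBitmap = cropBitmap(bitmap)

    // 2. Scale the cropped bitmap to the model input size of (257, 257).
    val scaledBitmap = Bitmap.createScaledBitmap(croppedBitmap, MODEL_WIDTH, MODEL_HEIGHT, true)

    // 3. Estimate the pose of the person in the scaled image.
    val person = posenet.estimateSinglePose(scaledBitmap)

    // 4. Draw the keypoints (the canvas line below is replaced in a later section).
    val canvas: Canvas = surfaceHolder!!.lockCanvas()
    draw(canvas, person, scaledBitmap)
}
```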
Why do we need to crop or scale the image? Is it required to apply both the crop and scale operations, or is just one enough? Let's discuss.
The PoseNet model accepts an image of size (257, 257). Inside the Constants.kt file there are two variables, MODEL_WIDTH and MODEL_HEIGHT, representing the model input width and height respectively. Both are set to 257.
If an image is to be passed to the PoseNet model, then its size must be (257, 257); otherwise, an exception will be thrown. If the image read from the gallery is, for example, of size (547, 783), then it must be resized to the model input size of (257, 257).
Based on this, it seems that only the scale (i.e. resize) operation is necessary to convert the image to the desired size. The Bitmap.createScaledBitmap() method accepts the input bitmap, the desired width and height, and a filtering flag, and returns a new bitmap of the requested size. Why, then, is the cropping operation also applied? The answer is to preserve the aspect ratio: the model input is square, so a non-square image is first cropped to a square region; scaling it directly would distort the image and hurt its quality. The next figure shows the result after both the cropping and resizing operations are applied.

Due to cropping the image, some of its rows at the top are lost. This will not be an issue as long as the human body appears at the center of the image. You can check the implementation of the cropBitmap() method to see how it works.
After discussing the purpose of the first two tasks of the processImage() method, let's now discuss the remaining two: pose estimation and keypoint drawing.
Estimating the Human Pose Using PoseNet
To estimate the human pose in the selected image, all you need to do is call the estimateSinglePose() method, shown below. This method accepts the scaled image as input and returns an object, stored in the person variable, that holds the model predictions.
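For reference, the call inside processImage() looks like this:

```kotlin
// Run PoseNet on the scaled (257, 257) bitmap; `person` holds the predicted keypoints.
val person = posenet.estimateSinglePose(scaledBitmap)
```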
Based on the model predictions, the keypoints will be drawn over the image. To be able to draw over the image, a canvas must first be created. The line below (inside the processImage() method) uses the surfaceHolder to create the canvas, but we'll remove this:
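(A sketch of the line in question; in the demo the canvas comes from the SurfaceView's holder.)

```kotlin
val canvas: Canvas = surfaceHolder!!.lockCanvas()
```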
And replace it with this:
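(The replacement creates a canvas that draws directly onto the scaled bitmap.)

```kotlin
val canvas = Canvas(scaledBitmap)
```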
Now we're ready to call the draw() method to draw the keypoints over the image. Don't forget to remove this line from the end of the draw() method: surfaceHolder!!.unlockCanvasAndPost(canvas).
Now that we've discussed all of the method calls inside the processImage() method, here is its implementation.
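A sketch of the final processImage() (the last line, which shows the annotated bitmap on the ImageView, is an assumption about how you display the result):

```kotlin
private fun processImage(bitmap: Bitmap) {
    // Crop, then scale, so the (257, 257) input is produced without distorting the image.
    val croppedBitmap = cropBitmap(bitmap)
    val scaledBitmap = Bitmap.createScaledBitmap(croppedBitmap, MODEL_WIDTH, MODEL_HEIGHT, true)

    // Estimate the pose of the person in the image.
    val person = posenet.estimateSinglePose(scaledBitmap)

    // Draw the keypoints directly onto the scaled bitmap.
    val canvas = Canvas(scaledBitmap)
    draw(canvas, person, scaledBitmap)

    // Show the annotated image (assumption: display it on the ImageView).
    imageView.setImageBitmap(scaledBitmap)
}
```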
The next figure shows the result after drawing the keypoints that the model is confident about. The points are drawn as circles. Here is the part of the code from the draw() method that is responsible for drawing circles over the image. You can edit the value of the circleRadius variable to increase or decrease the circle size.
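A simplified sketch of that part of draw(), assuming the demo's KeyPoint fields (position, score), its paint object, and the minConfidence and circleRadius variables:

```kotlin
// Draw a circle at every keypoint whose confidence passes the threshold.
for (keyPoint in person.keyPoints) {
    if (keyPoint.score > minConfidence) {
        val position = keyPoint.position
        canvas.drawCircle(
            position.x.toFloat(),
            position.y.toFloat(),
            circleRadius, // increase or decrease this to change the circle size
            paint
        )
    }
}
```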

Note that the drawn keypoints are the ones with a confidence greater than the value specified in the minConfidence variable, which is set to 0.5. You can change this to whatever works best for you.
The next section shows how to print some information about the keypoints.
Getting Information About Keypoints
The person object returned from the estimateSinglePose() method holds some information about the detected keypoints. This information includes:
- Location
- Confidence
- Body part that the keypoint represents
The next code creates a for loop that iterates over all the keypoints and prints the previous three properties for each one in a log message.
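A sketch of such a loop, using android.util.Log and the person object returned by estimateSinglePose():

```kotlin
for (keyPoint in person.keyPoints) {
    // Print the body part, location, and confidence of each keypoint.
    Log.d(
        "KEYPOINT",
        "Body Part : ${keyPoint.bodyPart}, " +
            "Keypoint Location : (${keyPoint.position.x.toFloat()}, ${keyPoint.position.y.toFloat()}), " +
            "Confidence : ${keyPoint.score}"
    )
}
```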
Here is the result of running the loop. Note that body parts such as LEFT_EYE, RIGHT_EYE, and RIGHT_SHOULDER have a confidence greater than 0.5, and this is why they are drawn on the image.
D/KEYPOINT: Body Part : NOSE, Keypoint Location : (121.0, 97.0), Confidence : 0.999602
D/KEYPOINT: Body Part : LEFT_EYE, Keypoint Location : (155.0, 79.0), Confidence : 0.996097
D/KEYPOINT: Body Part : RIGHT_EYE, Keypoint Location : (99.0, 78.0), Confidence : 0.9952989
D/KEYPOINT: Body Part : LEFT_EAR, Keypoint Location : (202.0, 96.0), Confidence : 0.9312741
D/KEYPOINT: Body Part : RIGHT_EAR, Keypoint Location : (65.0, 105.0), Confidence : 0.3558412
D/KEYPOINT: Body Part : LEFT_SHOULDER, Keypoint Location : (240.0, 208.0), Confidence : 0.18282844
D/KEYPOINT: Body Part : RIGHT_SHOULDER, Keypoint Location : (28.0, 226.0), Confidence : 0.8710659
D/KEYPOINT: Body Part : LEFT_ELBOW, Keypoint Location : (155.0, 160.0), Confidence : 0.008276528
D/KEYPOINT: Body Part : RIGHT_ELBOW, Keypoint Location : (-22.0, 266.0), Confidence : 0.009810507
D/KEYPOINT: Body Part : LEFT_WRIST, Keypoint Location : (196.0, 161.0), Confidence : 0.012271293
D/KEYPOINT: Body Part : RIGHT_WRIST, Keypoint Location : (-7.0, 228.0), Confidence : 0.0037742765
D/KEYPOINT: Body Part : LEFT_HIP, Keypoint Location : (154.0, 101.0), Confidence : 0.0043469984
D/KEYPOINT: Body Part : RIGHT_HIP, Keypoint Location : (255.0, 259.0), Confidence : 0.0035778792
D/KEYPOINT: Body Part : LEFT_KNEE, Keypoint Location : (157.0, 97.0), Confidence : 0.0024392735
D/KEYPOINT: Body Part : RIGHT_KNEE, Keypoint Location : (127.0, 94.0), Confidence : 0.003601794
D/KEYPOINT: Body Part : LEFT_ANKLE, Keypoint Location : (161.0, 194.0), Confidence : 0.0022431263
D/KEYPOINT: Body Part : RIGHT_ANKLE, Keypoint Location : (92.0, 198.0), Confidence : 0.0021493114
Complete Code of the PosenetActivity.kt
Here is the complete code of the PosenetActivity.kt file.
Conclusion
This tutorial discussed using the pretrained PoseNet for building an Android app that estimates the human pose. The model is able to predict the location of 17 keypoints in the human body such as the eyes, nose, and ears.
The next figure summarizes the steps applied in this project. We started by loading an image from the gallery, then cropping and resizing it to (257, 257). After the PoseNet model is loaded, the image is fed to it to predict the keypoint locations. Finally, the detected keypoints with a confidence above 0.5 are drawn on the image.

The next tutorial continues this project to put filters over the locations of the keypoints, applying effects like those seen in Snapchat.