Pose estimation is a computer vision task for detecting the pose (i.e. the position and orientation) of objects. It works by detecting a number of keypoints that represent the main parts of the object, from which its current orientation can be estimated. Based on these keypoints, the shape of the object can be formed in either 2D or 3D.

This tutorial covers how to build an Android app that estimates the human pose in single RGB images using the pretrained TFLite PoseNet model. The model predicts the locations of 17 keypoints of the human body, including the eyes, nose, shoulders, etc. In a follow-up tutorial, we'll see how to use these estimated keypoint locations to apply special effects and filters, like the ones you see on Snapchat.

The outline of this tutorial is as follows:

  • Base Android Studio Project
  • Loading a Gallery Image
  • Cropping and Scaling the Image
  • Estimating the Human Pose Using PoseNet
  • Getting Information About the Keypoints
  • Complete Code of PosenetActivity.kt

Let's get started.


Base Android Studio Project

We'll use the TensorFlow Lite PoseNet Android Demo as a starting point to save time. Let's first discuss how this project works. Then we'll edit it for our own needs.

The project uses the pretrained PoseNet model, which is based on MobileNet. The PoseNet model is available for download at this link. The model accepts an image of size (257, 257) and returns the locations of the following 17 keypoints:

  1. nose
  2. leftEye
  3. rightEye
  4. leftEar
  5. rightEar
  6. leftShoulder
  7. rightShoulder
  8. leftElbow
  9. rightElbow
  10. leftWrist
  11. rightWrist
  12. leftHip
  13. rightHip
  14. leftKnee
  15. rightKnee
  16. leftAnkle
  17. rightAnkle

For each keypoint there is an associated confidence value, ranging from 0.0 to 1.0. So, the model returns two lists: one with the keypoint locations, and another with the confidence for each keypoint. It's up to you to set a confidence threshold for classifying a candidate keypoint as either accepted or rejected. A threshold of 0.5 or higher is typically a good choice.
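As a quick illustration of this thresholding step, here is a minimal plain-Kotlin sketch (the score values are made up for the example; the real ones come from the model):

```kotlin
fun main() {
    // Hypothetical confidence scores for a few keypoints (illustrative values only).
    val scores = mapOf(
        "nose" to 0.99f,
        "leftEye" to 0.95f,
        "leftElbow" to 0.21f
    )

    // Keep only keypoints whose confidence meets the threshold.
    val threshold = 0.5f
    val accepted = scores.filterValues { it >= threshold }.keys
    val rejected = scores.keys - accepted

    println(accepted) // [nose, leftEye]
    println(rejected) // [leftElbow]
}
```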

The project is implemented in the Kotlin programming language and accesses the Android camera to capture images. For each captured image, the model predicts the positions of the keypoints and displays the image with these keypoints overlaid.

In this tutorial we're going to simplify this project as much as possible. First, the project will be edited to work with single images selected from the gallery rather than images taken with the camera. Once we have the results for a single image, we'll add a mask over the eyes, a well-known effect in image-editing apps like Snapchat.

The next section discusses removing the unnecessary code from the project.

Removing the Unnecessary Code

The project is configured to work with images captured from the camera, which is not our current target. So, anything related to accessing or capturing images should be removed. There are three files to be edited:

  • The PosenetActivity.kt activity file.
  • The activity_posenet.xml layout resource file of the PosenetActivity.kt activity.
  • AndroidManifest.xml

Starting with the PosenetActivity.kt file, here is a list of the lines of code to be removed:

  • CameraDevice.StateCallback() 161:179
  • CameraCaptureSession.CaptureCallback() 184:198
  • onViewCreated() 216:219
  • onResume() 221:224
  • onPause() 232:236
  • requestCameraPermission() 243:249
  • openCamera() 327:345
  • closeCamera() 350:368
  • startBackgroundThread() 373:376
  • stopBackgroundThread() 381:390
  • fillBytes() 393:404
  • OnImageAvailableListener 407:451
  • createCameraPreviewSession() 598:655
  • setAutoFlash() 657:664

As a result of removing the previous code, the following variables are no longer needed:

  • PREVIEW_WIDTH: 97
  • PREVIEW_HEIGHT: 98
  • All variables defined from line 104 to 158: cameraId, surfaceView, captureSession, cameraDevice, previewSize, previewWidth, and more.

After making the previous changes, these three lines are also no longer needed:

  • Line 228 inside the onStart() method: openCamera()
  • Line 578 at the end of the draw() method: surfaceHolder!!.unlockCanvasAndPost(canvas)

Below is the current form of the PosenetActivity.kt after making these changes.

package org.tensorflow.lite.examples.posenet

import android.Manifest
import android.app.AlertDialog
import android.app.Dialog
import android.content.pm.PackageManager
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.PorterDuff
import android.graphics.Rect
import android.os.Bundle
import android.support.v4.app.ActivityCompat
import android.support.v4.app.DialogFragment
import android.support.v4.app.Fragment
import android.util.Log
import android.util.SparseIntArray
import android.view.LayoutInflater
import android.view.Surface
import android.view.View
import android.view.ViewGroup
import android.widget.Toast
import kotlin.math.abs
import org.tensorflow.lite.examples.posenet.lib.BodyPart
import org.tensorflow.lite.examples.posenet.lib.Person
import org.tensorflow.lite.examples.posenet.lib.Posenet

class PosenetActivity :
  Fragment(),
  ActivityCompat.OnRequestPermissionsResultCallback {

  /** List of body joints that should be connected.    */
  private val bodyJoints = listOf(
    Pair(BodyPart.LEFT_WRIST, BodyPart.LEFT_ELBOW),
    Pair(BodyPart.LEFT_ELBOW, BodyPart.LEFT_SHOULDER),
    Pair(BodyPart.LEFT_SHOULDER, BodyPart.RIGHT_SHOULDER),
    Pair(BodyPart.RIGHT_SHOULDER, BodyPart.RIGHT_ELBOW),
    Pair(BodyPart.RIGHT_ELBOW, BodyPart.RIGHT_WRIST),
    Pair(BodyPart.LEFT_SHOULDER, BodyPart.LEFT_HIP),
    Pair(BodyPart.LEFT_HIP, BodyPart.RIGHT_HIP),
    Pair(BodyPart.RIGHT_HIP, BodyPart.RIGHT_SHOULDER),
    Pair(BodyPart.LEFT_HIP, BodyPart.LEFT_KNEE),
    Pair(BodyPart.LEFT_KNEE, BodyPart.LEFT_ANKLE),
    Pair(BodyPart.RIGHT_HIP, BodyPart.RIGHT_KNEE),
    Pair(BodyPart.RIGHT_KNEE, BodyPart.RIGHT_ANKLE)
  )

  /** Threshold for confidence score. */
  private val minConfidence = 0.5

  /** Radius of circle used to draw keypoints.  */
  private val circleRadius = 8.0f

  /** Paint class holds the style and color information to draw geometries, text, and bitmaps. */
  private var paint = Paint()

  /** An object for the Posenet library.    */
  private lateinit var posenet: Posenet

  /**
   * Shows a [Toast] on the UI thread.
   *
   * @param text The message to show
   */
  private fun showToast(text: String) {
    val activity = activity
    activity?.runOnUiThread { Toast.makeText(activity, text, Toast.LENGTH_SHORT).show() }
  }

  override fun onCreateView(
    inflater: LayoutInflater,
    container: ViewGroup?,
    savedInstanceState: Bundle?
  ): View? = inflater.inflate(R.layout.activity_posenet, container, false)

  override fun onStart() {
    super.onStart()
    posenet = Posenet(this.context!!)
  }

  override fun onDestroy() {
    super.onDestroy()
    posenet.close()
  }

  override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<String>,
    grantResults: IntArray
  ) {
    if (requestCode == REQUEST_CAMERA_PERMISSION) {
      if (!allPermissionsGranted(grantResults)) {
        ErrorDialog.newInstance(getString(R.string.request_permission))
          .show(childFragmentManager, FRAGMENT_DIALOG)
      }
    } else {
      super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    }
  }

  private fun allPermissionsGranted(grantResults: IntArray) = grantResults.all {
    it == PackageManager.PERMISSION_GRANTED
  }

  /** Crop Bitmap to maintain aspect ratio of model input.   */
  private fun cropBitmap(bitmap: Bitmap): Bitmap {
    val bitmapRatio = bitmap.height.toFloat() / bitmap.width
    val modelInputRatio = MODEL_HEIGHT.toFloat() / MODEL_WIDTH
    var croppedBitmap = bitmap

    // Acceptable difference between the modelInputRatio and bitmapRatio to skip cropping.
    val maxDifference = 1e-5

    // Checks if the bitmap has similar aspect ratio as the required model input.
    when {
      abs(modelInputRatio - bitmapRatio) < maxDifference -> return croppedBitmap
      modelInputRatio < bitmapRatio -> {
        // New image is taller so we are height constrained.
        val cropHeight = bitmap.height - (bitmap.width.toFloat() / modelInputRatio)
        croppedBitmap = Bitmap.createBitmap(
          bitmap,
          0,
          (cropHeight / 2).toInt(),
          bitmap.width,
          (bitmap.height - cropHeight).toInt()
        )
      }
      else -> {
        val cropWidth = bitmap.width - (bitmap.height.toFloat() * modelInputRatio)
        croppedBitmap = Bitmap.createBitmap(
          bitmap,
          (cropWidth / 2).toInt(),
          0,
          (bitmap.width - cropWidth).toInt(),
          bitmap.height
        )
      }
    }
    return croppedBitmap
  }

  /** Set the paint color and size.    */
  private fun setPaint() {
    paint.color = Color.RED
    paint.textSize = 80.0f
    paint.strokeWidth = 8.0f
  }

  /** Draw bitmap on Canvas.   */
  private fun draw(canvas: Canvas, person: Person, bitmap: Bitmap) {
    canvas.drawColor(Color.TRANSPARENT, PorterDuff.Mode.CLEAR)
    // Draw `bitmap` and `person` in square canvas.
    val screenWidth: Int
    val screenHeight: Int
    val left: Int
    val right: Int
    val top: Int
    val bottom: Int
    if (canvas.height > canvas.width) {
      screenWidth = canvas.width
      screenHeight = canvas.width
      left = 0
      top = (canvas.height - canvas.width) / 2
    } else {
      screenWidth = canvas.height
      screenHeight = canvas.height
      left = (canvas.width - canvas.height) / 2
      top = 0
    }
    right = left + screenWidth
    bottom = top + screenHeight

    setPaint()
    canvas.drawBitmap(
      bitmap,
      Rect(0, 0, bitmap.width, bitmap.height),
      Rect(left, top, right, bottom),
      paint
    )

    val widthRatio = screenWidth.toFloat() / MODEL_WIDTH
    val heightRatio = screenHeight.toFloat() / MODEL_HEIGHT

    // Draw key points over the image.
    for (keyPoint in person.keyPoints) {
      Log.d("KEYPOINT", "" + keyPoint.bodyPart + " : (" + keyPoint.position.x.toFloat().toString() + ", " + keyPoint.position.y.toFloat().toString() + ")");
      if (keyPoint.score > minConfidence) {
        val position = keyPoint.position
        val adjustedX: Float = position.x.toFloat() * widthRatio + left
        val adjustedY: Float = position.y.toFloat() * heightRatio + top
        canvas.drawCircle(adjustedX, adjustedY, circleRadius, paint)
      }
    }

    for (line in bodyJoints) {
      if (
        (person.keyPoints[line.first.ordinal].score > minConfidence) and
        (person.keyPoints[line.second.ordinal].score > minConfidence)
      ) {
        canvas.drawLine(
          person.keyPoints[line.first.ordinal].position.x.toFloat() * widthRatio + left,
          person.keyPoints[line.first.ordinal].position.y.toFloat() * heightRatio + top,
          person.keyPoints[line.second.ordinal].position.x.toFloat() * widthRatio + left,
          person.keyPoints[line.second.ordinal].position.y.toFloat() * heightRatio + top,
          paint
        )
      }
    }

    canvas.drawText(
      "Score: %.2f".format(person.score),
      (15.0f * widthRatio),
      (30.0f * heightRatio + bottom),
      paint
    )
    canvas.drawText(
      "Device: %s".format(posenet.device),
      (15.0f * widthRatio),
      (50.0f * heightRatio + bottom),
      paint
    )
    canvas.drawText(
      "Time: %.2f ms".format(posenet.lastInferenceTimeNanos * 1.0f / 1_000_000),
      (15.0f * widthRatio),
      (70.0f * heightRatio + bottom),
      paint
    )
  }

  /** Process image using Posenet library.   */
  private fun processImage(bitmap: Bitmap) {
    // Crop bitmap.
    val croppedBitmap = cropBitmap(bitmap)

    // Create scaled version of bitmap for model input.
    val scaledBitmap = Bitmap.createScaledBitmap(croppedBitmap, MODEL_WIDTH, MODEL_HEIGHT, true)

    // Perform inference.
    val person = posenet.estimateSinglePose(scaledBitmap)
    val canvas: Canvas = surfaceHolder!!.lockCanvas()
    draw(canvas, person, scaledBitmap)
  }

  /**
   * Shows an error message dialog.
   */
  class ErrorDialog : DialogFragment() {

    override fun onCreateDialog(savedInstanceState: Bundle?): Dialog =
      AlertDialog.Builder(activity)
        .setMessage(arguments!!.getString(ARG_MESSAGE))
        .setPositiveButton(android.R.string.ok) { _, _ -> activity!!.finish() }
        .create()

    companion object {

      @JvmStatic
      private val ARG_MESSAGE = "message"

      @JvmStatic
      fun newInstance(message: String): ErrorDialog = ErrorDialog().apply {
        arguments = Bundle().apply { putString(ARG_MESSAGE, message) }
      }
    }
  }

  companion object {
    /**
     * Conversion from screen rotation to JPEG orientation.
     */
    private val ORIENTATIONS = SparseIntArray()
    private val FRAGMENT_DIALOG = "dialog"

    init {
      ORIENTATIONS.append(Surface.ROTATION_0, 90)
      ORIENTATIONS.append(Surface.ROTATION_90, 0)
      ORIENTATIONS.append(Surface.ROTATION_180, 270)
      ORIENTATIONS.append(Surface.ROTATION_270, 180)
    }

    /**
     * Tag for the [Log].
     */
    private const val TAG = "PosenetActivity"
  }
}

All of the elements in the activity_posenet.xml file should also be removed, as they are no longer needed. So, the file should look like this:

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">


</RelativeLayout>

For the AndroidManifest.xml file, because we are no longer accessing the camera, the following three lines should be removed:

<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />

After removing all of the unnecessary code from the three files PosenetActivity.kt, activity_posenet.xml, and AndroidManifest.xml, we'll still need to make some changes in order to work with single images. The next section discusses editing the activity_posenet.xml file to be able to load and show an image.

Editing the Activity Layout

The content of the activity layout file is listed below. It has just two elements: a Button and an ImageView. The button will be used to load an image once clicked. It is given the ID selectImage to be accessed inside the activity.

The ImageView will have two uses. The first is to show the selected image. The second is to display the result after applying the eye filter. The ImageView is given the ID imageView to be accessed from within the activity.

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <Button
        android:id="@+id/selectImage"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Read Image" />

    <ImageView
        android:id="@+id/imageView"
        android:layout_width="match_parent"
        android:layout_height="match_parent" />

</RelativeLayout>

The next figure shows what the activity layout looks like.

Before implementing the button click listener, it is essential to add the following line to the AndroidManifest.xml file to request permission to read external storage.

<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
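Note that on Android 6.0 (API 23) and above, storage permission must also be granted at runtime, not just declared in the manifest. The demo project already handles the camera permission this way; a minimal sketch of the equivalent check for storage inside the fragment (the request code 101 is an arbitrary choice of ours) might look like this:

```kotlin
// Sketch only: check the storage permission at runtime and request it if missing.
private fun checkStoragePermission() {
    if (ActivityCompat.checkSelfPermission(
            context!!, Manifest.permission.READ_EXTERNAL_STORAGE
        ) != PackageManager.PERMISSION_GRANTED
    ) {
        // Ask the user to grant the permission; the result arrives in
        // onRequestPermissionsResult() with the same request code.
        requestPermissions(arrayOf(Manifest.permission.READ_EXTERNAL_STORAGE), 101)
    }
}
```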

The next section discusses implementing the button click listener for loading an image from the gallery.

The current implementation of the onStart() callback method is given below. If you did not already do so, please remove the call to the openCamera() method since it is no longer needed. The onStart() method just creates an instance of the PoseNet class so that it can be used later for predicting the locations of the keypoints. The variable posenet holds the created instance, which will be used later inside the processImage() method.

override fun onStart() {
  super.onStart()
  posenet = Posenet(this.context!!)
}

Inside the onStart() method, we can bind a click listener to the selectImage button. Here is its new implementation. Using an Intent, the gallery is opened, asking the user to select an image. The startActivityForResult() method is called with the request code stored in the REQUEST_CODE variable, which is set to 100.

override fun onStart() {
    super.onStart()

    posenet = Posenet(this.context!!)

    selectImage.setOnClickListener(View.OnClickListener {
        val intent = Intent(Intent.ACTION_PICK)
        intent.type = "image/*"
        startActivityForResult(intent, REQUEST_CODE)
    })
}

Once the user returns to the application, the onActivityResult() callback method is called. Here is its implementation. Using an if statement, the result is checked to make sure an image was selected successfully. If the result is not successful (e.g. the user did not select an image), a toast message is displayed.

override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    if (resultCode == Activity.RESULT_OK && requestCode == REQUEST_CODE) {
        imageView.setImageURI(data?.data)

        val imageUri = data?.getData()
        val bitmap = MediaStore.Images.Media.getBitmap(context?.contentResolver, imageUri)
        
        processImage(bitmap)

    } else {
        Toast.makeText(context, "No image is selected.", Toast.LENGTH_LONG).show()
    }
}

If the result is successful, then the selected image is displayed on the ImageView using the setImageURI() method.

To process the selected image, it needs to be available as a Bitmap, so the image is read as a Bitmap based on its URI. First the URI is retrieved using the getData() method, then the Bitmap is created using the getBitmap() method. Once the bitmap is available, the processImage() method is called to prepare the image and estimate the human pose. This will be discussed in the next two sections.

The image that will be used throughout this tutorial is shown below.

Image source: Fashion Kids. Here is the direct image URL.

After such an image is selected from the gallery, it will be displayed on the ImageView as shown below.

After the image is loaded as a Bitmap, and before estimating the human pose, there are two extra steps needed: cropping and scaling the image. We'll discuss those now.

Cropping and Scaling the Image

This section covers preparing the image before applying the PoseNet model.

Inside the project there's a method called processImage() which calls the necessary methods for doing four important tasks:

  1. Cropping the image by calling the cropBitmap() method.
  2. Scaling the image by calling the Bitmap.createScaledBitmap() method.
  3. Estimating the pose by calling the estimateSinglePose() method.
  4. Drawing the keypoints over the image by calling the draw() method.

The implementation of the processImage() method is listed below. In this section we'll focus on the first two tasks, cropping and scaling the image.

private fun processImage(bitmap: Bitmap) {
  // Crop bitmap.
  val croppedBitmap = cropBitmap(bitmap)

  // Create scaled version of bitmap for model input.
  val scaledBitmap = Bitmap.createScaledBitmap(croppedBitmap, MODEL_WIDTH, MODEL_HEIGHT, true)

  // Perform inference.
  val person = posenet.estimateSinglePose(scaledBitmap)
    
  // Draw keypoints over the image.
  val canvas: Canvas = surfaceHolder!!.lockCanvas()
  draw(canvas, person, scaledBitmap)
}

Why do we need to crop or scale the image? Is it required to apply both the crop and scale operations, or is just one enough? Let's discuss.

The PoseNet model accepts an image of size (257, 257). Inside the Constants.kt file there are two variables defined, MODEL_WIDTH and MODEL_HEIGHT, to represent the model input width and height respectively. Both are set to 257.

If an image is to be passed to the PoseNet model, then its size must be (257, 257). Otherwise, an exception will be thrown. If the image read from the gallery is, for example, of size (547, 783), then it must be resized to the model input's size of (257, 257).

Based on this, it seems that only the scale (i.e. resize) operation is necessary to convert the image to the desired size. The Bitmap.createScaledBitmap() method accepts the input bitmap, the desired width, and the desired height, and returns a new bitmap of the desired size. Why, then, is the cropping operation also applied? The answer is to preserve the aspect ratio of the model input; simply resizing an image with a different aspect ratio would distort it. The next figure shows the result after both the cropping and resizing operations are applied.

Due to cropping, some rows at the top of the image are lost. This is not an issue as long as the human body appears near the center of the image. You can check the implementation of the cropBitmap() method to see how it works.
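To make the arithmetic concrete, here is a plain-Kotlin sketch that mirrors the cropBitmap() logic on bare dimensions instead of a Bitmap (the function name croppedSize and the (547, 783) input are just our illustration, reusing the example size mentioned earlier):

```kotlin
import kotlin.math.abs

// Returns the (width, height) of the cropped region before scaling,
// following the same branching as cropBitmap().
fun croppedSize(width: Int, height: Int, modelWidth: Int = 257, modelHeight: Int = 257): Pair<Int, Int> {
    val bitmapRatio = height.toFloat() / width
    val modelInputRatio = modelHeight.toFloat() / modelWidth
    return when {
        // Aspect ratios already match: no cropping needed.
        abs(modelInputRatio - bitmapRatio) < 1e-5 -> Pair(width, height)
        modelInputRatio < bitmapRatio -> {
            // Image is taller than the model input ratio: crop rows.
            val cropHeight = height - (width.toFloat() / modelInputRatio)
            Pair(width, (height - cropHeight).toInt())
        }
        else -> {
            // Image is wider than the model input ratio: crop columns.
            val cropWidth = width - (height.toFloat() * modelInputRatio)
            Pair((width - cropWidth).toInt(), height)
        }
    }
}

fun main() {
    // A 547x783 gallery image is cropped to a 547x547 square,
    // which is then scaled down to 257x257 without distortion.
    println(croppedSize(547, 783)) // (547, 547)
}
```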

After discussing the purpose of the first two tasks of the processImage() method, let's now discuss the remaining two: pose estimation and keypoint drawing.

Estimating the Human Pose Using PoseNet

To estimate the human pose of the selected image, all you need to do is to call the estimateSinglePose() method, shown below. This method accepts the scaled image as an input, and returns an object in the person variable holding the model predictions.

val person = posenet.estimateSinglePose(scaledBitmap)

Based on the model predictions, the keypoints will be drawn over the image. To draw over the image, a canvas must first be created. The line below (inside the processImage() method) obtains the canvas from the surfaceHolder, but we'll remove this:

val canvas: Canvas = surfaceHolder!!.lockCanvas()

And replace it with this:

val canvas = Canvas(scaledBitmap)

Now we're ready to call the draw() method to draw the keypoints over the image. Don't forget to remove this line from the end of the draw() method: surfaceHolder!!.unlockCanvasAndPost(canvas).

draw(canvas, person, scaledBitmap)

Now that we've discussed all of the method calls inside the processImage() method, here is its implementation.

private fun processImage(bitmap: Bitmap) {
  // Crop bitmap.
  val croppedBitmap = cropBitmap(bitmap)

  // Create scaled version of bitmap for model input.
  val scaledBitmap = Bitmap.createScaledBitmap(croppedBitmap, MODEL_WIDTH, MODEL_HEIGHT, true)

  // Perform inference.
  val person = posenet.estimateSinglePose(scaledBitmap)

  // Draw keypoints over the image.
  val canvas = Canvas(scaledBitmap)
  draw(canvas, person, scaledBitmap)
}
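One detail worth noting: after draw() runs, the keypoints are baked into scaledBitmap (the canvas was created on top of it), but nothing in the code above puts the annotated bitmap back on screen. A minimal way to show the result, assuming the imageView from our layout, is a single line at the end of processImage():

```kotlin
// Sketch: display the annotated bitmap on the ImageView defined in
// activity_posenet.xml (accessed here via Kotlin synthetic properties).
imageView.setImageBitmap(scaledBitmap)
```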

The next figure shows the result after drawing the keypoints that the model is confident about. The points are drawn as circles. Here is the part of the draw() method responsible for drawing the circles over the image. You can edit the circleRadius variable to increase or decrease the circle size.

if (keyPoint.score > minConfidence) {
    val position = keyPoint.position
    val adjustedX: Float = position.x.toFloat() * widthRatio
    val adjustedY: Float = position.y.toFloat() * heightRatio
    canvas.drawCircle(adjustedX, adjustedY, circleRadius, paint)
}

Note that the drawn keypoints are the ones with a confidence greater than the value specified in the minConfidence variable, which was set to 0.5. You can change this to whatever works best for you.

The next section shows how to print some information about the keypoints.

Getting Information About Keypoints

The returned object person from the estimateSinglePose() method holds some information about the detected keypoints. This information includes:

  1. Location
  2. Confidence
  3. Body part that the keypoint represents

The next code creates a for loop for looping through all the keypoints and printing the previous three properties about each in a log message.

for (keyPoint in person.keyPoints) {
    Log.d("KEYPOINT", "Body Part : " + keyPoint.bodyPart + ", Keypoint Location : (" + keyPoint.position.x.toFloat().toString() + ", " + keyPoint.position.y.toFloat().toString() + "), Confidence : " + keyPoint.score);
}

Here is the result of running the loop. Note that body parts such as LEFT_EYE, RIGHT_EYE, and RIGHT_SHOULDER have a confidence greater than 0.5, which is why they are drawn on the image.

D/KEYPOINT: Body Part : NOSE, Keypoint Location : (121.0, 97.0), Confidence : 0.999602
D/KEYPOINT: Body Part : LEFT_EYE, Keypoint Location : (155.0, 79.0), Confidence : 0.996097
D/KEYPOINT: Body Part : RIGHT_EYE, Keypoint Location : (99.0, 78.0), Confidence : 0.9952989
D/KEYPOINT: Body Part : LEFT_EAR, Keypoint Location : (202.0, 96.0), Confidence : 0.9312741
D/KEYPOINT: Body Part : RIGHT_EAR, Keypoint Location : (65.0, 105.0), Confidence : 0.3558412
D/KEYPOINT: Body Part : LEFT_SHOULDER, Keypoint Location : (240.0, 208.0), Confidence : 0.18282844
D/KEYPOINT: Body Part : RIGHT_SHOULDER, Keypoint Location : (28.0, 226.0), Confidence : 0.8710659
D/KEYPOINT: Body Part : LEFT_ELBOW, Keypoint Location : (155.0, 160.0), Confidence : 0.008276528
D/KEYPOINT: Body Part : RIGHT_ELBOW, Keypoint Location : (-22.0, 266.0), Confidence : 0.009810507
D/KEYPOINT: Body Part : LEFT_WRIST, Keypoint Location : (196.0, 161.0), Confidence : 0.012271293
D/KEYPOINT: Body Part : RIGHT_WRIST, Keypoint Location : (-7.0, 228.0), Confidence : 0.0037742765
D/KEYPOINT: Body Part : LEFT_HIP, Keypoint Location : (154.0, 101.0), Confidence : 0.0043469984
D/KEYPOINT: Body Part : RIGHT_HIP, Keypoint Location : (255.0, 259.0), Confidence : 0.0035778792
D/KEYPOINT: Body Part : LEFT_KNEE, Keypoint Location : (157.0, 97.0), Confidence : 0.0024392735
D/KEYPOINT: Body Part : RIGHT_KNEE, Keypoint Location : (127.0, 94.0), Confidence : 0.003601794
D/KEYPOINT: Body Part : LEFT_ANKLE, Keypoint Location : (161.0, 194.0), Confidence : 0.0022431263
D/KEYPOINT: Body Part : RIGHT_ANKLE, Keypoint Location : (92.0, 198.0), Confidence : 0.0021493114

Complete Code of PosenetActivity.kt

Here is the complete code of PosenetActivity.kt.

package org.tensorflow.lite.examples.posenet

import android.app.Activity
import android.app.AlertDialog
import android.app.Dialog
import android.content.Intent
import android.content.pm.PackageManager
import android.graphics.*
import android.os.Bundle
import android.support.v4.app.ActivityCompat
import android.support.v4.app.DialogFragment
import android.support.v4.app.Fragment
import android.util.Log
import android.util.SparseIntArray
import android.view.LayoutInflater
import android.view.Surface
import android.view.View
import android.view.ViewGroup
import android.widget.Toast
import kotlinx.android.synthetic.main.activity_posenet.*
import kotlin.math.abs
import org.tensorflow.lite.examples.posenet.lib.BodyPart
import org.tensorflow.lite.examples.posenet.lib.Person
import org.tensorflow.lite.examples.posenet.lib.Posenet
import android.provider.MediaStore
import android.graphics.Bitmap

class PosenetActivity :
    Fragment(),
    ActivityCompat.OnRequestPermissionsResultCallback {

    /** List of body joints that should be connected.    */
    private val bodyJoints = listOf(
        Pair(BodyPart.LEFT_WRIST, BodyPart.LEFT_ELBOW),
        Pair(BodyPart.LEFT_ELBOW, BodyPart.LEFT_SHOULDER),
        Pair(BodyPart.LEFT_SHOULDER, BodyPart.RIGHT_SHOULDER),
        Pair(BodyPart.RIGHT_SHOULDER, BodyPart.RIGHT_ELBOW),
        Pair(BodyPart.RIGHT_ELBOW, BodyPart.RIGHT_WRIST),
        Pair(BodyPart.LEFT_SHOULDER, BodyPart.LEFT_HIP),
        Pair(BodyPart.LEFT_HIP, BodyPart.RIGHT_HIP),
        Pair(BodyPart.RIGHT_HIP, BodyPart.RIGHT_SHOULDER),
        Pair(BodyPart.LEFT_HIP, BodyPart.LEFT_KNEE),
        Pair(BodyPart.LEFT_KNEE, BodyPart.LEFT_ANKLE),
        Pair(BodyPart.RIGHT_HIP, BodyPart.RIGHT_KNEE),
        Pair(BodyPart.RIGHT_KNEE, BodyPart.RIGHT_ANKLE)
    )

    val REQUEST_CODE = 100

    /** Threshold for confidence score. */
    private val minConfidence = 0.5

    /** Radius of circle used to draw keypoints.  */
    private val circleRadius = 8.0f

    /** Paint class holds the style and color information to draw geometries, text, and bitmaps. */
    private var paint = Paint()

    /** An object for the Posenet library.    */
    private lateinit var posenet: Posenet

    override fun onCreateView(
        inflater: LayoutInflater,
        container: ViewGroup?,
        savedInstanceState: Bundle?
    ): View? = inflater.inflate(R.layout.activity_posenet, container, false)

    override fun onStart() {
        super.onStart()

        posenet = Posenet(this.context!!)

        selectImage.setOnClickListener(View.OnClickListener {
            val intent = Intent(Intent.ACTION_PICK)
            intent.type = "image/*"
            startActivityForResult(intent, REQUEST_CODE)
        })
    }

    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        if (resultCode == Activity.RESULT_OK && requestCode == REQUEST_CODE) {
            imageView.setImageURI(data?.data) // handle chosen image

            val imageUri = data?.getData()
            val bitmap = MediaStore.Images.Media.getBitmap(context?.contentResolver, imageUri)

            processImage(bitmap)
        } else {
            Toast.makeText(context, "No image is selected.", Toast.LENGTH_LONG).show()
        }
    }

    override fun onDestroy() {
        super.onDestroy()
        posenet.close()
    }

    override fun onRequestPermissionsResult(
        requestCode: Int,
        permissions: Array<String>,
        grantResults: IntArray
    ) {
        if (requestCode == REQUEST_CAMERA_PERMISSION) {
        if (!allPermissionsGranted(grantResults)) {
                ErrorDialog.newInstance(getString(R.string.request_permission))
                    .show(childFragmentManager, FRAGMENT_DIALOG)
            }
        } else {
            super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        }
    }

    private fun allPermissionsGranted(grantResults: IntArray) = grantResults.all {
        it == PackageManager.PERMISSION_GRANTED
    }

    /** Crop Bitmap to maintain aspect ratio of model input.   */
    private fun cropBitmap(bitmap: Bitmap): Bitmap {
        val bitmapRatio = bitmap.height.toFloat() / bitmap.width
        val modelInputRatio = MODEL_HEIGHT.toFloat() / MODEL_WIDTH
        var croppedBitmap = bitmap

        // Acceptable difference between the modelInputRatio and bitmapRatio to skip cropping.
        val maxDifference = 1e-5

        // Checks if the bitmap has similar aspect ratio as the required model input.
        when {
            abs(modelInputRatio - bitmapRatio) < maxDifference -> return croppedBitmap
            modelInputRatio < bitmapRatio -> {
                // New image is taller so we are height constrained.
                val cropHeight = bitmap.height - (bitmap.width.toFloat() / modelInputRatio)
                croppedBitmap = Bitmap.createBitmap(
                    bitmap,
                    0,
                    (cropHeight / 5).toInt(),
                    bitmap.width,
                    (bitmap.height - cropHeight / 5).toInt()
                )
            }
            else -> {
                val cropWidth = bitmap.width - (bitmap.height.toFloat() * modelInputRatio)
                croppedBitmap = Bitmap.createBitmap(
                    bitmap,
                    (cropWidth / 5).toInt(),
                    0,
                    (bitmap.width - cropWidth / 5).toInt(),
                    bitmap.height
                )
            }
        }
        Log.d(
            "IMGSIZE",
            "Cropped Image Size (" + croppedBitmap.width.toString() + ", " + croppedBitmap.height.toString() + ")"
        )
        return croppedBitmap
    }

    /** Set the paint color and size.    */
    private fun setPaint() {
        paint.color = Color.RED
        paint.textSize = 80.0f
        paint.strokeWidth = 5.0f
    }

    /** Draw bitmap on Canvas.   */
    private fun draw(canvas: Canvas, person: Person, bitmap: Bitmap) {
        setPaint()

        val widthRatio = canvas.width.toFloat() / MODEL_WIDTH
        val heightRatio = canvas.height.toFloat() / MODEL_HEIGHT

        // Draw key points over the image.
        for (keyPoint in person.keyPoints) {
            Log.d(
                "KEYPOINT",
                "Body Part : " + keyPoint.bodyPart + ", Keypoint Location : (" + keyPoint.position.x.toFloat().toString() + ", " + keyPoint.position.y.toFloat().toString() + "), Confidence : " + keyPoint.score
            )

            if (keyPoint.score > minConfidence) {
                val position = keyPoint.position
                val adjustedX: Float = position.x.toFloat() * widthRatio
                val adjustedY: Float = position.y.toFloat() * heightRatio
                canvas.drawCircle(adjustedX, adjustedY, circleRadius, paint)
            }
        }

        for (line in bodyJoints) {
            if (
                (person.keyPoints[line.first.ordinal].score > minConfidence) &&
                (person.keyPoints[line.second.ordinal].score > minConfidence)
            ) {
                canvas.drawLine(
                    person.keyPoints[line.first.ordinal].position.x.toFloat() * widthRatio,
                    person.keyPoints[line.first.ordinal].position.y.toFloat() * heightRatio,
                    person.keyPoints[line.second.ordinal].position.x.toFloat() * widthRatio,
                    person.keyPoints[line.second.ordinal].position.y.toFloat() * heightRatio,
                    paint
                )
            }
        }
    }

    /** Process image using Posenet library.   */
    private fun processImage(bitmap: Bitmap) {
        // Crop bitmap.
        val croppedBitmap = cropBitmap(bitmap)

        // Create a scaled version of the bitmap as input for the model.
        val scaledBitmap = Bitmap.createScaledBitmap(croppedBitmap, MODEL_WIDTH, MODEL_HEIGHT, true)
        Log.d(
            "IMGSIZE",
            "Scaled Image Size (" + scaledBitmap.width.toString() + ", " + scaledBitmap.height.toString() + ")"
        )

        // Perform inference.
        val person = posenet.estimateSinglePose(scaledBitmap)

        // Make the bitmap mutable so the keypoints can be drawn over it on a canvas.
        val workingBitmap = Bitmap.createBitmap(croppedBitmap)
        val mutableBitmap = workingBitmap.copy(Bitmap.Config.ARGB_8888, true)

        // There is an ImageView. Over it, a bitmap image is drawn. There is a canvas associated with the bitmap image to draw the keypoints.
        // ImageView ==> Bitmap Image ==> Canvas

        val canvas = Canvas(mutableBitmap)

        draw(canvas, person, mutableBitmap)
    }

    /**
     * Shows an error message dialog.
     */
    class ErrorDialog : DialogFragment() {

        override fun onCreateDialog(savedInstanceState: Bundle?): Dialog =
            AlertDialog.Builder(activity)
                .setMessage(arguments!!.getString(ARG_MESSAGE))
                .setPositiveButton(android.R.string.ok) { _, _ -> activity!!.finish() }
                .create()

        companion object {

            @JvmStatic
            private val ARG_MESSAGE = "message"

            @JvmStatic
            fun newInstance(message: String): ErrorDialog = ErrorDialog().apply {
                arguments = Bundle().apply { putString(ARG_MESSAGE, message) }
            }
        }
    }

    companion object {
        /**
         * Conversion from screen rotation to JPEG orientation.
         */
        private val ORIENTATIONS = SparseIntArray()
        private val FRAGMENT_DIALOG = "dialog"

        init {
            ORIENTATIONS.append(Surface.ROTATION_0, 90)
            ORIENTATIONS.append(Surface.ROTATION_90, 0)
            ORIENTATIONS.append(Surface.ROTATION_180, 270)
            ORIENTATIONS.append(Surface.ROTATION_270, 180)
        }

        /**
         * Tag for the [Log].
         */
        private const val TAG = "PosenetActivity"
    }
}
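The crop logic in `cropBitmap()` boils down to simple aspect-ratio arithmetic. Below is a minimal, Android-free sketch of a centered crop; the function name `centeredCropRect` and its plain `IntArray` return value of `(x, y, width, height)` are hypothetical, used here only for illustration:

```kotlin
import kotlin.math.abs

/**
 * Hypothetical helper: computes a centered crop rectangle (x, y, width, height)
 * matching the model's aspect ratio, mirroring the arithmetic in cropBitmap().
 */
fun centeredCropRect(width: Int, height: Int, modelRatio: Float): IntArray {
    val bitmapRatio = height.toFloat() / width
    return when {
        // Ratios already match closely enough: no cropping needed.
        abs(modelRatio - bitmapRatio) < 1e-5 -> intArrayOf(0, 0, width, height)
        modelRatio < bitmapRatio -> {
            // Image is taller than the model ratio: trim rows from top and bottom.
            val cropHeight = (height - width / modelRatio).toInt()
            intArrayOf(0, cropHeight / 2, width, height - cropHeight)
        }
        else -> {
            // Image is wider than the model ratio: trim columns from left and right.
            val cropWidth = (width - height * modelRatio).toInt()
            intArrayOf(cropWidth / 2, 0, width - cropWidth, height)
        }
    }
}
```

For the square (257, 257) input (ratio 1.0), a 1080x1920 portrait photo yields the rectangle (0, 420, 1080, 1080): 420 rows trimmed from the top and bottom, leaving a centered square.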

Conclusion

This tutorial discussed how to use the pretrained PoseNet model to build an Android app that estimates the human pose in still RGB images. The model predicts the locations of 17 keypoints on the human body, such as the eyes, nose, and ears.
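The confidence filtering applied throughout the app can be sketched in a few lines. The `KeyPoint` data class below is a simplified, hypothetical stand-in for the type returned by the Posenet library, not the library's actual class:

```kotlin
// Simplified stand-in for the Posenet library's keypoint type (hypothetical).
data class KeyPoint(val bodyPart: String, val x: Int, val y: Int, val score: Float)

// Keep only the keypoints whose confidence clears the threshold (0.5 in this tutorial).
fun acceptedKeyPoints(keyPoints: List<KeyPoint>, minConfidence: Float = 0.5f): List<KeyPoint> =
    keyPoints.filter { it.score >= minConfidence }
```

A keypoint scored at 0.9 would be drawn, while one at 0.3 would be silently dropped, which is exactly what the `minConfidence` check inside `draw()` does.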

The next figure summarizes the steps applied in this project. We started by loading an image from the gallery, then cropped and scaled it to the model input size of (257, 257). After the PoseNet model is loaded, the image is fed to it to predict the keypoint locations. Finally, the detected keypoints with a confidence above 0.5 are drawn over the image.
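Because inference runs on the (257, 257) scaled image, each predicted keypoint must be mapped back to the displayed bitmap before drawing, which is what `draw()` does with its width and height ratios. A minimal sketch of that mapping (the function name `toCanvasCoords` is hypothetical):

```kotlin
/**
 * Hypothetical helper mirroring the coordinate scaling in draw():
 * maps a keypoint from model coordinates (257 x 257 by default)
 * to the coordinates of the canvas it is drawn on.
 */
fun toCanvasCoords(
    x: Float, y: Float,
    canvasWidth: Int, canvasHeight: Int,
    modelWidth: Int = 257, modelHeight: Int = 257
): Pair<Float, Float> {
    val widthRatio = canvasWidth.toFloat() / modelWidth
    val heightRatio = canvasHeight.toFloat() / modelHeight
    return Pair(x * widthRatio, y * heightRatio)
}
```

On a canvas twice the model size (514 x 514), a keypoint at model coordinates (128.5, 128.5) lands at (257.0, 257.0).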

The next tutorial continues this project to put filters over the locations of the keypoints, applying effects like those seen in Snapchat.