Presence is a kinetic scultpure that use pose estimation powered by PoseNet to perceive and mimic the form of the viewer. It was shown at the 2018 Makerfaire.

Background: for my final project for Intro to Physical Computing and Design for Digital Fabrication at ITP, I created Presence, a kinetic sculpture that moved in the direction of a viewers gaze. It used a gaze detection convolutional neaural network to estimate gaze position for a single user. I showed this installation at the 2017 ITP Winter Show.

View this post on Instagram

Controlling Presence just with her eyes. #computervision #gaze #kineticsculpture #itp #itpwintershow

A post shared by Dan (@dan_oved) on

I observed users interacting with the project and noticed they had difficulty understanding how to control the structure with their gaze. They needed specific instructions and often coaching. Without any instruction, the most natural instinct for people was to go up and wave their arms or move in front of it. Children were the most telling - they would move rotate their faces and heads left and right instead of their eyes, which would prevent it from working at all because for the gaze detection to work the front of the face must be visible to the camera.

Fast-forward to fall of 2018, I submitted and was accepted to show this installation at the 2018 MakerFaire. I then realized it would be great to try out using gestures from human poses as control, as from my observations this fit in more naturally to how users expect to interact with something like this.

Having worked on the PoseNet machine learning model in the spring, and having learned TouchDesigner over the summer, I switched the application to control the installation from Processing to TouchDesigner, and changed the form of control from following a gaze position to mimicking the structure of the arms and the shoulders of the person in front of the sculpture.

I also built a tool PoseNet for Installations which would download the pre-trained models so that they can be loaded offline, and would perform pose estimation and broadcast the detected keypoints over websocket. It also would bundle the application into an executable so that it could be opened on a windows machine. This allowed me to use a high powered MSI laptop from ITP, use PoseNet work without an internet connection, and receive estimated pose keypoints in TouchDesigner.

I also integrated a MiDi controller to easily make the program change from a non-interactive mode with animations, to an interactive mode controlled by a single person.

I initially wanted to hide the screen showing the pose estimation results but when setting it up people walking by saw it and were amazed at how well it worked - I decided to leave this screen visible during the show and it was a great choice because people loved to see it and it helped them understand how the installation worked.

Showing the results of PoseNet on a screen proved to greatly help explain how the installation worked.

One of the main challenges I had was that while PoseNet can work with multiple people, the installation was designed to work with only one person. Additionally, PoseNet can not associate the same person to the same pose between frames - the returned poses are ordered by confidence score. So often the pose would jump between the keypoints of different people, causing a jittery effect.

I tried a few things to get it to work with the main person interacting with it. One of the things I tried was selecting the person with the bounding box closest to the center. This didn’t work well because if someone was standing behind and was closer to the center, they were the one selected. The solution I settled on was selecting the person with the lowest bottom of the bounding box, which is essentially the person closest to the installation. This worked pretty well unless the person was too close, or other people stood next to the person but slightly in front of them.

The installation would get glitchy when multiple people stood next to each other in front of it, as it would jump between having each of them control it. It proved challenging to have the same person control it between each frame.

This glitchiness in selecting the right person, and the fact that the installation itself didnt work perfectly ended up being a blessing in disguise because when people tried for a while to figure out how it worked, it resulted in insteresting behavior. If it had just worked well I theorize people would have gone up to it, been satisfied, and walked away. This sort of delayed gratification that people got when it finally worked I believe increased the engagement time and resulting in more interesting behavior. Additionally, observing how kids wanted to interact with it when they had the freedom to behave how they wanted in front of the camera brought out some really funny behavior.

It was inspiring to see the many different ways people interacted with it and I look forward to continuing this work further.