How Google Built the Pixel 2 Camera

The range of cameras that have been created over the years is vast.
There have been cameras made for space, for inside the human body, for
helping cars see and, of course, for capturing our everyday lives.
Today, most of us take most of our photos and videos on our phone
cameras, and they can do some pretty incredible things, like take
360-degree photos, record video in 4K and help us meet new friends.
Ever since last year, when I got to test out the camera on the Google
Pixel, I’ve been curious to know: how can photos like these come out of
a camera that’s smaller than a fingernail? What’s going on inside that
I can’t see? Since the Pixel 2 just came out, I thought it would be the
perfect opportunity to run around Google, meet with the people who
worked on its camera and find out as much as I could.

You ready for this, Noodles? There won’t be any turtle photos, but
there’ll be a lot of other cool photos.
OK, let’s go.

The main challenge in building a phone camera is size. Since we want
our phones to be light and thin, you essentially have the space of a
blueberry for the camera to squeeze into.

From the outside of the phone, you see the lens, which on the Pixel 2
is actually a stack of 6 lenses. And they have very strange shapes,
they’ve got weird Ws in them and so on, because you’re trying to
correct, in a very small amount of space, for what are called
aberrations, which distort the image.

This year, there’s also optical image stabilisation. There is an actual
physical piece wrapped around the lenses which has motors on it, and it
can align the lens in a number of dimensions. Focusing moves the lenses
in and out, while optical image stabilisation moves them up, down, left
and right. You can see it moving around as it compensates for what your
hand is doing.

Then, just a millimetre behind the lenses, is the sensor, the digital
camera’s equivalent of film. It’s covered with light-sensitive
photosites, AKA pixels, which capture the light and convert it into an
electric signal. This year the image sensor has 12 megapixels, but each
of the pixels has a left and right split, so it actually has 24 mega
sub-pixels, if you will. We’ll talk more about this later, but what’s
interesting to note now is that it gives the sensor new capabilities
related to depth of field and autofocus.

Can you give a brief overview of what’s happening when you take a
photo?

It’s amazing that we can use a piece of silicon to take a picture at
all. You wouldn’t wanna look at the picture direct from the sensor.
It’s dark, it’s green, it’s got bad pixels that are stuck on. Even
without computational photography, there’s lots of processing that
goes on to make that image into a good final photograph.

And this sort of processing happens on all digital cameras. Each camera
does things a bit differently, but on the Pixel 2 there are roughly 30
to 40 steps, all in all. And for me the first step was
the most interesting to learn about. The sensor has a physical colour
filter laid out in a checkerboard pattern of red, green and blue
pixels, so instead of a pixel sensing all light colours, it senses just
red or just green or just blue. And there are twice as many green
pixels as red or blue ones, because our eyes are more sensitive to
green. So you have to combine the red that was seen here with the green
that was seen here and the blue that was seen here to make a colour
image. That’s a process called demosaicing.

After this, the image gets gamma corrected, white balanced, denoised,
sharpened and much more. Traditionally, those steps have been done by
hardware, meaning circuitry that is specialised to do that, but as
cameras begin to move toward computational photography, they’re being
done more and more in software.

Computational photography can mean a lot of different things, but it’s
essentially advanced algorithms that superpower image processing. On
the Pixel 2 there are two big features it enables: HDR+ and portrait
mode.

When we set out to build HDR+, we wanted an algorithm that could take a
small sensor and make it act like a big sensor. That means you get
great low-light performance and high dynamic range: you can catch
really dark and bright things in the same picture. To achieve this on a
phone camera,
every image you capture isn’t one image but a combination of up to 10
images, all of which have been underexposed to save both the dark parts
and the bright parts of the scene. But HDR+ isn’t just averaging all
these photos together, since hands can move or things in the scene can
change. So we go through each tile of the image and we say: did that
move from the other one? Can we move it a bit and match it up? We don’t
know where that one went, so let’s discard that one tile of that one
frame. We’re very, very careful about avoiding ghosts.

I like how “ghosts” is a technical term.

It is. It means “double image”.

After scaring the ghosts away, there’s the aesthetic decision of how
much to combine the dark and the light parts of the photo. If you take
a picture in very dark light, by capturing a burst of pictures and
averaging them together we can make that shot look pretty good. But
should we make it as bright as if you were in daylight? If we bring up
all the dark shadows and we save all the highlights, then you’ll end up
with a very cartoony-looking image. So we have to decide what to throw
away.

Alright, pause. You see how Mark is in focus here and the background is
blurred? That’s called shallow depth of field, and it was achieved by
filming with a fast lens and a wide aperture setting. Portrait mode is
a new feature for Pixel 2 that recreates this look. But, of course, on
a phone things are a little bit trickier.

The lens is so tiny and the aperture is so small. When you just take a
normal picture with a mobile phone, everything is pretty sharp.

To overcome this, portrait mode
uses a combination of machine learning and depth mapping.

Instead of just treating each pixel as a pixel, we try to understand
“What is it?” Is it a person, is it background, what’s the meaning of
this pixel?

The team trained a neural network with almost a million examples of
people, and people wearing hats, and holding ice cream cones, and
posing with their friends and their dogs, to recognise which pixels are
human foreground things and which are background things. This allows
the algorithm to create a mask.

And that mask says everything inside it should be left sharp. But then
the question is: how much do you blur out the stuff outside of the
mask? When we picked the hardware, we knew that we were getting this
dual-pixel sensor, where every pixel is actually split into two
sub-pixels. So it’s like your two eyes: it’s getting two different
views of the world, from the left and right sides of a very, very small
camera. And this tiny difference in perspective, smaller than the tip
of a pencil, is enough to generate a rough depth map. And we roughly
size the amount of blur to apply depending on how far away we think
each thing is.

So, even if you take a photo of something that isn’t a person, by using
depth maps portrait mode can create a nice macro-looking shot. And if
you like selfies, portrait mode works on the front-facing camera as
well.

Before making this episode
I didn’t realise the extent to which the cameras in our phones have been
tested and tuned.

There’s a saying in engineering: “If you haven’t really tested it, it’s
broken.” And a lot of the camera quality really depends on how you
create a set of tests that lets you know how well you’re doing.

Tuning a camera is a mix of art and physics, with thousands of
parameters to adjust. The problem is they all interfere with each
other. You make one change and you have to work out the 10 things that
are also affected and change those as well. For this reason, there are
labs that run the camera through a gauntlet of automated tests
measuring autofocus, white balance, overall colour and tone, resolution
and more.

If this sort of testing wasn’t able to happen, what do you think the
consequences would be?

Without it, it would take weeks and weeks just to get one dataset, and
you couldn’t iterate the way that we do with engineering.

One of my favourite setups was a robotic stage called a hexapod that
tests video stabilisation.

We can give the stage different coordinates to go to, so we can give it
a slow, gentle wave or we can tell it to go rocking crazy.

This year, both optical and electronic image stabilisation are being
used for video. The first corrects for small motions, like a little bit
of handshake, while the latter corrects for bigger motions.

How it works is it looks at a video frame and then compares it to a few
frames ahead, using measurements from the gyroscope. So the gyro will
tell you if you’ve moved this way or that, and we use that to determine
if this was random motion. Then we take that motion and cancel it out.

There is so much going on
inside a phone camera like the one on the Pixel 2 that I could’ve easily
made a video about just stabilisation, or autofocus, or any other
feature. In the course of making this video I learned so many different
things, like the fact that to autofocus in the dark, the Pixel 2 has a
tiny infrared laser beam. And the back camera weighs 0.003 lbs (roughly
1.4 grams), nearly as light as a paperclip. I know I’ve only just
started to scratch the surface of how a phone camera works, which is an
amazingly complex process.

You may have noticed, but sprinkled throughout this episode were some
photos taken on the Pixel 2, which was rated the best smartphone camera
ever tested by DxO. And if you’re interested in seeing more, you should
check out this video, filmed on the Pixel 2 by myself and my friend Lo.

OK, that’s all from me, bye for now. Noodles is gonna take a nap, and
you should go watch another video. See ya.
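The demosaicing step described in the video, where each photosite records only red, green or blue, can be illustrated with a minimal bilinear sketch. It assumes an RGGB Bayer layout (the Pixel 2's actual layout and its demosaicing algorithm are not described here), and the names `bayer_channel` and `demosaic_bilinear` are hypothetical. Each missing colour is estimated by averaging the nearby photosites that did sense that colour; real camera pipelines use far more careful, edge-aware methods.

```python
"""Bilinear demosaicing sketch for an RGGB Bayer mosaic (illustrative only)."""
from typing import List, Tuple

Mosaic = List[List[float]]
RGB = Tuple[float, float, float]


def bayer_channel(row: int, col: int) -> int:
    """Colour sensed at (row, col) under an assumed RGGB layout: 0=R, 1=G, 2=B."""
    if row % 2 == 0 and col % 2 == 0:
        return 0  # red on even rows, even columns
    if row % 2 == 1 and col % 2 == 1:
        return 2  # blue on odd rows, odd columns
    return 1  # green everywhere else: half of all photosites


def demosaic_bilinear(mosaic: Mosaic) -> List[List[RGB]]:
    """Rebuild a full-colour image from single-colour photosite readings."""
    height, width = len(mosaic), len(mosaic[0])
    if height < 2 or width < 2:
        raise ValueError("need at least one full 2x2 Bayer cell")
    image: List[List[RGB]] = []
    for r in range(height):
        out_row: List[RGB] = []
        for c in range(width):
            own = bayer_channel(r, c)
            pixel = [0.0, 0.0, 0.0]
            for ch in range(3):
                if ch == own:
                    pixel[ch] = mosaic[r][c]  # this colour was measured directly
                    continue
                # Average the neighbours in the 3x3 window that sensed this colour.
                total, count = 0.0, 0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < height and 0 <= cc < width and bayer_channel(rr, cc) == ch:
                            total += mosaic[rr][cc]
                            count += 1
                pixel[ch] = total / count
            out_row.append((pixel[0], pixel[1], pixel[2]))
        image.append(out_row)
    return image


if __name__ == "__main__":
    # A uniformly coloured scene sampled through the colour filter...
    scene = (0.75, 0.5, 0.25)
    mosaic = [[scene[bayer_channel(r, c)] for c in range(4)] for r in range(4)]
    # ...comes back as that same colour at every pixel.
    print(demosaic_bilinear(mosaic)[1][2])  # prints (0.75, 0.5, 0.25)
```

Bilinear averaging is chosen here only because it is the simplest correct instance of the idea; it blurs fine detail and produces colour fringes at sharp edges, which is why production demosaicing is more elaborate.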
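Two of the later processing steps mentioned, white balancing and gamma correction, can be sketched in a few lines. This assumes linear RGB values in [0, 1], uses the textbook "gray-world" heuristic (assume the scene averages to neutral grey) for white balance and a plain power-law gamma of 2.2; both are generic stand-ins rather than what the Pixel 2 does, and the function names are hypothetical. The sketch applies white balance to linear data and gamma last, which is the usual order.

```python
"""White balance and gamma correction sketch on linear RGB in [0, 1]."""
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def gray_world_white_balance(pixels: List[Pixel]) -> List[Pixel]:
    """Scale each channel so its mean matches the green mean (gray-world)."""
    n = len(pixels)
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    # A channel that is zero everywhere is left alone rather than divided by zero.
    gains = [means[1] / m if m > 0 else 1.0 for m in means]
    return [
        (min(1.0, p[0] * gains[0]), min(1.0, p[1] * gains[1]), min(1.0, p[2] * gains[2]))
        for p in pixels
    ]


def gamma_encode(pixels: List[Pixel], gamma: float = 2.2) -> List[Pixel]:
    """Raise linear values to 1/gamma, brightening mid-tones for display."""
    inv = 1.0 / gamma
    return [(p[0] ** inv, p[1] ** inv, p[2] ** inv) for p in pixels]


if __name__ == "__main__":
    bluish = [(0.2, 0.4, 0.8), (0.1, 0.2, 0.4)]  # a scene with a blue colour cast
    balanced = gray_world_white_balance(bluish)  # roughly neutral greys
    print(gamma_encode(balanced))
```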
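Averaging a burst of frames in HDR+ helps in low light because random noise partly cancels out. Under the simplifying assumption that each of the N frames is perfectly aligned and carries independent noise with the same standard deviation σ (real sensor noise also varies with brightness), the standard result is:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\operatorname{Var}(\bar{x}) = \frac{1}{N^{2}}\sum_{i=1}^{N}\sigma^{2} = \frac{\sigma^{2}}{N},
\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}
```

With N = 10, the upper end mentioned for HDR+, that is a noise reduction of √10 ≈ 3.16 in the ideal case; wherever a tile is discarded to avoid ghosts, fewer frames contribute and the reduction there is smaller.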
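The tile-by-tile HDR+ merge described in the video can be illustrated with a deliberately simplified sketch; the production algorithm is considerably more sophisticated. This version works on small greyscale frames stored as nested lists, aligns each tile of every extra frame to the first (reference) frame by trying small integer shifts, and discards any tile whose best match is still too different, which is the "ghost" case. The names `tile_error` and `merge_burst` and the `tile`, `search` and `reject_above` parameters are hypothetical.

```python
"""Tile-based burst merge with ghost rejection (simplified illustration)."""
from typing import List, Optional

Frame = List[List[float]]


def tile_error(ref: Frame, alt: Frame, top: int, left: int, size: int,
               dy: int, dx: int) -> Optional[float]:
    """Mean absolute difference between a reference tile and the same-sized
    tile of `alt` shifted by (dy, dx); None if the shift leaves the frame."""
    h, w = len(alt), len(alt[0])
    if top + dy < 0 or left + dx < 0 or top + dy + size > h or left + dx + size > w:
        return None
    total = 0.0
    for r in range(size):
        for c in range(size):
            total += abs(ref[top + r][left + c] - alt[top + dy + r][left + dx + c])
    return total / (size * size)


def merge_burst(frames: List[Frame], tile: int = 4, search: int = 1,
                reject_above: float = 0.1) -> Frame:
    """Average each reference tile with the best-aligned matching tile from
    every other frame, skipping tiles that cannot be matched."""
    ref = frames[0]
    h, w = len(ref), len(ref[0])
    if h % tile or w % tile:
        raise ValueError("frame size must be a multiple of the tile size")
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            acc = [[ref[top + r][left + c] for c in range(tile)] for r in range(tile)]
            used = 1
            for alt in frames[1:]:
                # Try every small shift and keep the one that matches best.
                best = None  # (error, dy, dx)
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        err = tile_error(ref, alt, top, left, tile, dy, dx)
                        if err is not None and (best is None or err < best[0]):
                            best = (err, dy, dx)
                if best is None or best[0] > reject_above:
                    continue  # no good match: drop this tile of this frame
                _, dy, dx = best
                for r in range(tile):
                    for c in range(tile):
                        acc[r][c] += alt[top + dy + r][left + dx + c]
                used += 1
            for r in range(tile):
                for c in range(tile):
                    out[top + r][left + c] = acc[r][c] / used
    return out
```

For example, if an object appears in one tile of the second frame but not the first, that tile fails the match and the merged result keeps only the reference pixels there, instead of the half-transparent double image a plain average would produce.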
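The dual-pixel idea, two slightly different views from the left and right halves of each pixel, can be illustrated in one dimension. The sketch below uses simple block matching (minimising the sum of absolute differences over a small window), a generic stereo technique rather than the Pixel 2's method, to estimate for each position how far the right-half view is shifted relative to the left-half view. With dual pixels, points at the focus distance line up (shift near zero) and the shift grows the further a point is from the plane of focus, which is what makes a rough depth map possible. The function `row_disparity` and its parameters are hypothetical.

```python
"""1-D block-matching disparity between dual-pixel half views (illustrative)."""
from typing import List


def row_disparity(left: List[float], right: List[float], window: int = 2,
                  max_shift: int = 2) -> List[int]:
    """For each position x, return the shift d in [-max_shift, max_shift]
    for which right[x + k + d] best matches left[x + k] over the window."""
    n = len(left)
    disparities = []
    for x in range(n):
        best_shift, best_cost = 0, float("inf")
        for d in range(-max_shift, max_shift + 1):
            cost, count = 0.0, 0
            for k in range(-window, window + 1):
                i, j = x + k, x + k + d
                if 0 <= i < n and 0 <= j < n:
                    cost += abs(left[i] - right[j])
                    count += 1
            # Compare mean cost so windows clipped at the edges stay comparable.
            if count and cost / count < best_cost:
                best_cost, best_shift = cost / count, d
        disparities.append(best_shift)
    return disparities


if __name__ == "__main__":
    left = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    right = [0.0] + left[:-1]  # the same edge, seen one sample to the right
    print(row_disparity(left, right)[3:8])
```

Flat, textureless regions give unreliable shifts in a scheme like this, since every candidate matches equally well; only the textured middle of the example is meaningful.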
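Once there is a person mask and a rough depth map, the portrait-mode blur step can be sketched as follows: keep pixels inside the mask sharp, and blur everything else by an amount that grows with its distance from the plane of focus. This assumes the depth map is expressed as dual-pixel disparity (0 at the focus distance) and uses a plain box blur with a per-pixel radius instead of a realistic lens-shaped blur. The names `blur_radii`, `variable_box_blur`, `focus_disparity`, `gain` and `max_radius` are hypothetical.

```python
"""Mask- and depth-driven synthetic blur sketch (illustrative only)."""
from typing import List

Grid = List[List[float]]


def blur_radii(disparity: Grid, person_mask: List[List[bool]],
               focus_disparity: float = 0.0, gain: float = 1.0,
               max_radius: int = 3) -> List[List[int]]:
    """Per-pixel blur radius: 0 inside the mask, otherwise proportional to
    |disparity - focus_disparity|, capped at max_radius."""
    radii = []
    for d_row, m_row in zip(disparity, person_mask):
        row = []
        for d, is_person in zip(d_row, m_row):
            if is_person:
                row.append(0)  # the mask says: leave this pixel sharp
            else:
                row.append(min(max_radius, round(gain * abs(d - focus_disparity))))
        radii.append(row)
    return radii


def variable_box_blur(image: Grid, radii: List[List[int]]) -> Grid:
    """Replace each pixel by the mean of its (2r+1)x(2r+1) window, clipped at edges."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r = radii[y][x]
            total, count = 0.0, 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```

A box blur keeps the example short; a lens-like disk blur, and care not to smear background colour across the mask edge, would look closer to a real shallow depth of field.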
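The gyro-driven electronic stabilisation described in the video can be sketched for a single rotation axis. This assumes one gyroscope reading per frame, given as angular velocity in radians per second; it integrates the readings into a camera path and treats whatever differs from a moving average over neighbouring frames (including a few frames ahead) as shake to cancel. Slow, deliberate pans survive the averaging while frame-to-frame jitter is removed. It is an illustration under those assumptions, not the Pixel 2's implementation, and `camera_path`, `stabilising_corrections` and `lookahead` are hypothetical names.

```python
"""Single-axis gyro stabilisation sketch (illustrative only)."""
from typing import List


def camera_path(gyro_rates: List[float], dt: float) -> List[float]:
    """Integrate angular velocity (rad/s), one sample per frame, into angles."""
    angles, angle = [], 0.0
    for rate in gyro_rates:
        angle += rate * dt
        angles.append(angle)
    return angles


def stabilising_corrections(angles: List[float], lookahead: int = 2) -> List[float]:
    """Average the path over `lookahead` frames before and after each frame;
    the gap between that smooth path and the measured angle is treated as
    unwanted shake and returned as the rotation to apply to the frame."""
    n = len(angles)
    corrections = []
    for i in range(n):
        lo, hi = max(0, i - lookahead), min(n, i + lookahead + 1)
        smooth = sum(angles[lo:hi]) / (hi - lo)
        corrections.append(smooth - angles[i])
    return corrections


if __name__ == "__main__":
    shaky = [0.01 * (i % 2) for i in range(9)]  # a camera jittering back and forth
    fixed = [a + c for a, c in zip(shaky, stabilising_corrections(shaky))]
    print([round(v, 4) for v in fixed])
```

Looking ahead means the output is delayed by a few frames, which is acceptable for recorded video; near the start and end of a clip the window is one-sided, so the corrections there are less reliable.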
