Nobody sees the thief
looking for a car to break into, or the woman steeling herself to
jump in front of a train, but somehow the alarm is sounded.
Duncan Graham-Rowe enters a world where machines predict our
every move
GEORGE IS BLISSFULLY
UNAWARE that a crime is about to be
committed right under his nose. Partially obscured by a bag of doughnuts
and a half-read newspaper is one of the dozens of security monitors he is
employed to watch constantly for thieves and vandals.
On the
screen in question, a solitary figure furtively makes his way through a
car park towards his target. The miscreant knows that if the coast is
clear it will take him maybe 10 seconds to get into the car, 15 to bypass
the engine immobiliser and 10 to start the engine. Easy.
But
before he has even chosen which car to steal, an alarm sounds in the
control room, waking George from his daydream. A light blinking above the
screen alerts him to the figure circling in the car park and he picks up
his radio. If his colleagues get there quickly enough, they will not only
catch a villain but also prevent a crime.
The unnatural prophetic
powers of the security team would not exist but for some smart technology.
The alarm that so rudely disturbed George is part of a sophisticated
visual security system that predicts when a crime is about to be
committed. The remarkable research prototype was developed by Steve
Maybank at the University of Reading and David Hogg at the University of
Leeds. Although in its infancy, this technology could one day be used to
spot shoplifters, predict that a mugging is about to take place on a
subway or that a terrorist is active at an airport.
Once connected
to such intelligent systems, closed-circuit television (CCTV) will shift
from being a mainly passive device for gathering evidence after a crime,
to a tool for crime prevention. But not everyone welcomes the prospect.
The technology would ensure that every security screen is closely watched,
though not by human eyes. It would bring with it a host of sinister
possibilities and fuel people's fears over privacy.
Criminals
certainly have reason to be worried. With the car park system, for
example, the more thieves try to hide from a camera--by lurking in shadow,
perhaps--the easier it is to spot them. Underlying the system is the fact
that people behave in much the same way in car parks. Surprisingly, the
pathways they follow to and from their cars are so similar as to be
mathematically predictable--the computer recognises them as patterns. If
anyone deviates from these patterns, the system sounds the alarm. "It's
unusual for someone to hang around cars," says Maybank. "There are
exceptions, but it's rare."
To fool the system, a thief would have
to behave as though they owned the car, confidently walking up to it
without casing it first or pausing to see if the real owner is nearby. In
short, they have to stop behaving like a thief. It sounds easy, but
apparently it isn't.
Another surprising thing about the system is
that it employs relatively unsophisticated technology. For decades,
researchers have been devising clever ways for a computer presented with a
small section of a face, arm or leg to deduce that it is looking at a
person. Maybank and Hogg have rejected all this work, giving their
prototype only the simplest of rules for recognising things. "If it's tall
and thin it's a person," says Maybank. "If it's long and low it's a car."
It's the trajectory of these "objects" that the system follows. An
operator can constantly update the computer's notion of "normal behaviour"
by changing a series of threshold values for such things as the width of
pathways and walking speed. In this way it can be made more reliable over
time. If trained on enough suitable footage, the system should be able to
view children running in the car park or somebody tinkering with their
engine without raising the alarm. Its ability to calculate where people
are likely to go even allows the system to predict which car a thief is
aiming for, though Maybank concedes that the crook's target cannot be
guaranteed.
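
As a rough illustration of how little machinery such rules need, here is a minimal sketch, not Maybank and Hogg's actual code. It assumes each tracked object arrives as a bounding box plus a list of ground positions sampled at a fixed interval, and that the usual pathway can be modelled as a straight line. The names `Thresholds`, `classify_shape` and `is_unusual`, and every numeric value, are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Thresholds:
    """Operator-adjustable notion of 'normal behaviour' (all values hypothetical)."""
    max_path_offset_m: float = 2.0  # half-width of the usual pathway
    min_speed_mps: float = 0.5      # slower than this looks like loitering
    max_speed_mps: float = 2.5      # faster than this looks like running


def classify_shape(width: float, height: float) -> str:
    """Tall and thin is a person; long and low is a car; anything else is unknown."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if height / width >= 1.5:
        return "person"
    if width / height >= 1.5:
        return "car"
    return "unknown"


def is_unusual(track, path_y: float, limits: Thresholds, dt: float = 1.0) -> bool:
    """Flag a trajectory that strays from the pathway or moves at an odd speed.

    track is a list of (x, y) positions in metres, sampled every dt seconds.
    The usual pathway is modelled, as a simplifying assumption, by the
    horizontal line y = path_y.
    """
    if len(track) < 2:
        return False  # not enough data to judge
    # Any point too far from the pathway counts as a deviation.
    if any(abs(y - path_y) > limits.max_path_offset_m for _, y in track):
        return True
    # Mean speed over the whole track, from consecutive positions.
    distance = sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(track, track[1:])
    )
    mean_speed = distance / (dt * (len(track) - 1))
    return not (limits.min_speed_mps <= mean_speed <= limits.max_speed_mps)
```

An operator "retraining" the system would, in this sketch, simply be editing the values in `Thresholds` until children running or a driver tinkering under the bonnet no longer trip the alarm.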
The system should identify more than just potential
car thieves. Because it spots any abnormal behaviour, the computer should
sound the alarm if a fight breaks out--though this hasn't been tested yet.
Of course, not all unusual activity is criminal. But if the system flags
up an innocuous event, says Maybank, it doesn't really matter. The idea is
to simply notify the Georges of this world when something out of the
ordinary happens. It's up to them to decide whether or not they need to
act on what they see.
Maybank plans now to join forces with Sergio
Velastin of King's College London and others in a project funded by the
European Commission to develop a full range of security features for
subways. Velastin has already broken new ground in this area. In a
recently completed project, called Cromatica, he developed a prototype
that has been tested on the London Underground for monitoring crowd flows
and warning of dangerous levels of congestion. It will also spot people
behaving badly, such as those going where they shouldn't.
Most
impressive of all, Cromatica can identify people who are about to throw
themselves in front of a train. Frank Norris, the coroner's liaison
officer for London Underground, says there is an average of one suicide
attempt on the network every week. These incidents are not only personal
tragedies but also cause chaos for millions of commuters and great
distress for the hapless train drivers.
Keeping track of thousands
of people in a tube station is impossible for a human or a computer.
Following individuals is tough enough: as people move, different parts of
their bodies appear and disappear, and sometimes they are completely
obscured. To get round this problem, Velastin rejected completely the idea
of identifying objects--people, that is.
Instead, Cromatica
identifies movement by monitoring the changing colours and intensities of
the pixels that make up a camera's view of a platform. If the pixels are
changing, the reasoning goes, the chances are that something is moving and
that it's human. The system compares its view second by second with what
it sees when the platform is empty. The more its view changes from this
baseline, the more people are passing, and the speed of change gives a
measure of how quickly those people are moving. If things stay constant
for too long, it's likely that the crowd has stopped and there may be
dangerous congestion--so an alarm would sound.
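
The reasoning above can be sketched in a few lines. This is an illustrative approximation rather than Cromatica's real algorithm: it assumes each frame is a grid of grey-level intensities (the colour information the article mentions is ignored), and the function names, tolerance and thresholds are all hypothetical.

```python
def changed_fraction(frame_a, frame_b, tol=10):
    """Fraction of pixels whose intensity differs by more than tol."""
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > tol:
                changed += 1
    return changed / total if total else 0.0


def congestion_alarm(frames, empty, occupied_min=0.5, motion_max=0.05,
                     still_frames=3):
    """Return True if the platform is busy yet almost static for too long.

    frames is a time-ordered list of frames; empty is the view of the
    empty platform. Occupancy is the change from the empty baseline;
    motion is the change from the previous frame. All thresholds are
    assumed values, not figures from the project.
    """
    run = 0  # consecutive frame pairs that look crowded and stationary
    for prev, cur in zip(frames, frames[1:]):
        occupancy = changed_fraction(cur, empty)
        motion = changed_fraction(cur, prev)
        if occupancy >= occupied_min and motion <= motion_max:
            run += 1
            if run >= still_frames:
                return True
        else:
            run = 0
    return False
```

Nothing here tries to recognise a person. A large departure from the empty view stands in for "many people", and a small frame-to-frame change stands in for "not moving".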
Averting a tragedy
Cromatica's ability to spot people contemplating suicide stems
from the finding, made by analysing previous cases, that these individuals
behave in a characteristic way. They tend to wait for at least ten minutes
on the platform, missing trains, before taking their last few tragic
steps. Velastin's deceptively simple solution is to identify patches of
pixels that are not present on the empty platform and which stay unchanged
between trains, once travellers alighting at the station have left.
"If we know there is a blob on the screen and it remains
stationary for more than a few minutes then we raise the alarm," says
Velastin. Security guards can then decide whether or not they need to
intervene. So far, Cromatica has not seen video footage of real suicide
cases--it has only identified people who have simulated the behaviour.
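
Velastin's "blob" rule lends itself to a similar sketch, again only an approximation under stated assumptions. Frames are grids of grey-level intensities, the train timetable is ignored, and the three-minute limit, the minimum blob size and the names `stationary_mask` and `loiter_alarm` are hypothetical choices, not details of Cromatica.

```python
def stationary_mask(frame, previous, empty, tol=10):
    """Mark pixels that differ from the empty platform but match the last frame."""
    return [
        [abs(p - e) > tol and abs(p - q) <= tol
         for p, q, e in zip(row, row_prev, row_empty)]
        for row, row_prev, row_empty in zip(frame, previous, empty)
    ]


def loiter_alarm(frames, empty, seconds_per_frame=1.0, limit_seconds=180.0,
                 min_pixels=4, tol=10):
    """Return True once enough pixels have stayed 'present and unchanged'.

    A per-pixel age counter grows while a pixel is both foreground and
    static, and resets to zero the moment it changes or clears.
    """
    rows, cols = len(empty), len(empty[0])
    age = [[0] * cols for _ in range(rows)]
    needed = limit_seconds / seconds_per_frame  # frames a blob must persist
    for prev, cur in zip(frames, frames[1:]):
        mask = stationary_mask(cur, prev, empty, tol)
        for y in range(rows):
            for x in range(cols):
                age[y][x] = age[y][x] + 1 if mask[y][x] else 0
        old_pixels = sum(1 for row in age for a in row if a >= needed)
        if old_pixels >= min_pixels:
            return True
    return False
```

As in the article, the alarm only draws a guard's attention to the screen. Whether anyone needs to intervene remains a human decision.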
In trials where Cromatica was pitted against humans it proved
itself dramatically, detecting 98 per cent of the events--such as
congestion--spotted by humans. In fact, the humans performed
unrealistically well in the tests because they had to watch just one
screen, whereas they would normally check several screens at once.
Cromatica also scored well on false alarms: only 1 per cent of the
incidents it flagged up turned out to be non-events. This low rate is
vital, says Velastin, if operators are to trust the system.
Velastin and Maybank's present project, which includes partners
such as the defence and communications company Racal, aims to detect other
forms of criminal activity, "anything for which eventually you would want
to call the police", says Velastin. This will include people selling
tickets illegally and any violent behaviour.
But detecting violent
crime is not as straightforward as it might appear. Certainly if a fight
breaks out the characteristic fast, jerky movements of fists flying and
bodies grappling would show up as unusual activity. But what of a mugging?
Often a mugging is a verbal confrontation with no physical contact. To a
vision system, someone threatening a person with a knife looks much the
same as someone offering a cigarette to a friend. Indeed, recognising that
there is any interaction at all between people is still a monster
challenge for a machine. No one yet has the answer.
Nevertheless,
Maybank is taking the first tentative steps into this field, incorporating
into his car park system a method for identifying what people are doing
and then annotating the videotape with the details. The technique works by
attaching virtual labels to objects, such as cars and people, and then
analysing the way they move and interact. So far the system can
distinguish between basic activities such as walking, driving and meeting
(or mugging).
It is here, provided the system can be perfected,
that Maybank sees the potential for sinister uses of the technology. In
places such as the City of London--the capital's main business area--CCTV
cameras are so widespread that it's difficult to avoid them. With such
blanket coverage, and as it becomes possible to track a person from one
camera to the next, it would be relatively easy to "tail" people remotely,
logging automatically their meetings and other activities. Maybank and his
colleagues worry about this type of use. "This is something that will have
to be considered by society as a whole," he says.
Simon Davies,
director of the human rights group Privacy International, is scathing
about the technology. "This is a very dangerous step towards a total
control society," he says. For one thing, somebody has to decide what
"normal behaviour" is, and that somebody is likely to represent a narrow,
authoritarian viewpoint. "The system reflects the views of those using
it," he argues. Anyone who does act out of the ordinary will be more
likely than now to be approached by security guards, which will put
pressure on them to avoid standing out. "The push to conformity will be
extraordinary," Davies says. "Young people will feel more and more
uncomfortable if that sort of technology becomes ubiquitous."
On
the other hand, to fully grasp the benefits of a system that can recognise
and record details of different activities, consider the following
scenario: a future, technology-savvy George keeps watch as thousands of
people flow through an airport. The security team has been tipped off
about a terrorist threat. But where to begin?
One starting point
is to watch for unattended baggage. Most airports do this continuously,
with the majority of cases turning out to be lost luggage. So how do you
distinguish between a lost item and one deliberately abandoned? The best
way would be if George could rewind to the precise moment when a bag was
left by its owner.
George takes a bite of doughnut and washes it
down with some tepid coffee when suddenly an alarm sounds:
"Suspect package alert. Suspect pack..." He flicks a switch. The
system has zoomed in on a small bag on the ground next to a bench.
"Where is it?" he demands.
"Terminal three, departure gate
32," squawks the computer.
"How long?"
"Four minutes."
"Show event," orders George.
The system searches back
until it finds the electronic annotation that marks where the bag and its
carrier parted company. The screen changes to show a man sitting on the
bench with the bag at his feet. He reaches into it briefly, looks around,
then stands and walks away.
"Where is he now?" asks George.
"Terminal three, level 2, departure lounge."
"Show me."
The screen changes again, this time showing the man walking
quickly towards the exit. George picks up his radio: "Jim. We've got a
two-zero-three coming your way. Red shirt, black denim jacket. Pick him
up." After alerting the bomb squad and clearing the departure gate, he
pops the remainder of the doughnut into his mouth and turns back to that
pesky crossword . . .
Seamless tracking
There are plenty of instances where it
would be helpful to refer back to specific events. And though this
scenario may sound far-fetched, it isn't. The Forest of Sensors (FoS),
developed by Eric Grimson at the Massachusetts Institute of Technology,
near Boston, already has all the foundations of such a system--apart from
speech recognition. "We just haven't put it all together yet, so I don't
want to say we can definitely do it now," he says.
Grimson's
system, which is partly funded by the US government's Defense Advanced
Research Projects Agency, sets itself up from scratch with no human
intervention. The idea behind it was that dozens of miniature cameras
could be dropped at random into a military zone and FoS would work out the
position of every camera and build up a three-dimensional representation
of the entire area. The result is a network of cameras that requires no
calibration whatsoever. You simply plug and play, says Grimson.
Quick and dirty
In order to build up a three-dimensional
image, most 3D surveillance systems, such as those used in the car park
and subway, need every camera to be "shown" where the floor and walls are.
Grimson's system does this automatically. And provided there is a little
bit of overlap between the cameras' images, FoS will figure out where in
the big scheme of things every image belongs.
"We do it purely on
the basis of moving objects," he says. "As long as we can track anything
in sight, we can use that information to help the system figure out where
all the cameras are." Having decided what is background movement, such as
clouds passing or trees blowing in the wind, FoS then assumes that other
objects are moving on the ground. From these movements, it calculates the
ground plane and reconstructs the 3D space it's looking at. The system
then allows seamless tracking from one camera to the next.
FoS is
smart in other ways too. The system can learn from what it sees and build
up a profile of what is and what is not normal behaviour. It
differentiates between objects by sensing their shapes, using
quick-and-dirty methods to detect their edges and measure their aspect
ratios. It then classifies them as, for example, individuals, groups of
people, cars, vans, trucks, cyclists and so on.
Moreover, the
system can employ its inbuilt analytical powers to decide for itself what
activities the camera is seeing, such as a person getting into a car or
loading a truck. Of course, the system doesn't understand what these
activities are, says Grimson; it merely categorises activities by learning
from vast numbers of examples. It's up to a human to give each activity a
name.
Like Maybank and Hogg, Grimson is still struggling to
distinguish a meeting from a mugging. He hopes that higher resolution
cameras that can spot small details and movements will help to crack the
problem, and that's what he's working on now. Higher resolution should
also allow him to exploit progress made in recent years in gesture
recognition. In particular, he thinks that "gait recognition" will make
its mark as a way to identify people. It needs lower resolution than face
recognition and its reliability is growing fast (New Scientist, 4 December, p 18).
FoS can
already perform many of the tasks that give Maybank the jitters. Grimson,
too, has reservations about what his research might be used for. His
system could conceivably be used by intelligence agencies to monitor the
behaviour of individuals. But he would be unhappy if his research were
used in this way. "You have to rely on the legal system to strike a
balance," he says. "It is a real worry." Fortunately, both these tasks are
probably impractical at present. "The volume of data is so huge it's
incredibly unlikely," he says.
One place where Grimson is keen to
deploy FoS is in the homes of elderly people. Many old folk are unhappy
about being monitored in their homes by CCTV because of the lack of
privacy, he says. But with FoS, there would be no need for a human to
watch at all. The system would train itself on a person's patterns of
behaviour and ask them if they were all right if they failed to get up one
morning or fell over. If the person didn't respond, the system would issue
a distress call to a help centre. Another George would send someone round
to help, without even once seeing inside the person's home.
Is
this, then, an unequivocally good use for a smart surveillance system?
Davies reckons not. "This is like justifying road accidents because they
provide hospital beds," he says. Elderly people will end up trying to
conform to the system so as not to trigger the alarm.
But, whether
for good or bad, surveillance machines are going to get smarter.
They're already starting to recognise people's faces in the street (New Scientist, 25 September, p 40), and systems
that spot abnormal behaviour will not be far behind. So, if you have a
hankering to cartwheel down main street you'd better do it now. Wait a
few years and it will be recorded, annotated and stored--just waiting to
come back and haunt you.
From New Scientist, 11 December 1999