Learn the fundamentals and applications of embedded vision systems, covering various industries and practical examples.
The ideal audience for this book includes engineers, developers, and researchers working in the field of embedded vision systems. A basic understanding of computer vision and digital image processing is recommended.
Page count: 887
Publication year: 2024
An Introduction
S. R. Vijayalakshmi, PhD, and S. Muruganand, PhD
Copyright ©2020 by MERCURY LEARNING AND INFORMATION LLC. All rights reserved.
Original title and copyright: Embedded Vision. Copyright ©2019 by Overseas Press India Private Limited. All rights reserved.
This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.
Publisher: David Pallai. MERCURY LEARNING AND INFORMATION, 22841 Quicksilver Drive, Dulles, VA 20166. [email protected] (800) 232-0223
S. R. Vijayalakshmi and S. Muruganand. Embedded Vision: An Introduction. ISBN: 978-1-68392-457-9
The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.
Library of Congress Control Number: 2019937247. 19 20 21 3 2 1. This book is printed on acid-free paper in the United States of America.
Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc.
For additional information, please contact the Customer Service Dept. at (800) 232-0223 (toll free).
All of our titles are available in digital format at academiccourseware.com and other digital vendors. Companion disc files for this title are available by contacting [email protected]. The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the disc, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.
Preface
Chapter 1 Embedded Vision
1.1 Introduction to Embedded Vision
1.2 Design of an Embedded Vision System
Characteristics of Embedded Vision System Boards Versus Standard Vision System Boards
Benefits of Embedded Vision System Boards
Processors for Embedded Vision
High Performance Embedded CPU
Application Specific Standard Product (ASSP) in Combination with a CPU
General Purpose CPUs
Graphics Processing Units with CPU
Digital Signal Processors with Accelerator(s) and a CPU
Field Programmable Gate Arrays (FPGAs) with a CPU
Mobile “Application Processor”
Cameras/Image Sensors for Embedded Vision
Other Semiconductor Devices for Embedded Vision
Memory
Networking and Bus Interfaces
1.3 Components in a Typical Vision System
Vision Processing Algorithms
Embedded Vision Challenges
1.4 Applications for Embedded Vision
Swimming Pool Safety System
Object Detection
Video Surveillance
Gesture Recognition
Simultaneous Localization and Mapping (SLAM)
Advanced Driver Assistance System (ADAS)
Game Controller
Face Recognition for Advertising Research
Mobile Phone Skin Cancer Detection
Gesture Recognition for Car Safety
Industrial Applications for Embedded Vision
Medical Applications for Embedded Vision
Automotive Applications for Embedded Vision
Security Applications for Embedded Vision
Consumer Applications for Embedded Vision
Machine Learning in Embedded Vision Applications
1.5 Industrial Automation and Embedded Vision: A Powerful Combination
Inventory Tracking
Automated Assembly
Automated Inspection
Workplace Safety
Depth Sensing
1.6 Development Tools for Embedded Vision
Both General Purpose and Vendor Specific Tools
Personal Computers
OpenCV
Heterogeneous Software Development in an Integrated Development Environment
Summary
Reference
Learning Outcomes
Further Reading
Chapter 2 Industrial Vision
2.1 Introduction to Industrial Vision Systems
PC-Based Vision Systems
Industrial Cameras
High-Speed Industrial Cameras
Smart Cameras
2.2 Classification of Industrial Vision Applications
Dimensional Quality
Surface Quality
Structural Quality
Operational Quality
2.3 3D Industrial Vision
Automated Inspection
Robotic Guidance
3D Imaging
3D Imaging Methods
3D Inspection
3D Processing
3D Robot Vision
High-Speed Imaging
High-Speed Cameras
Line Scan Imaging
Capture and Storage
High-Speed Inspection for Product Defects
Labels and Marking
Web Inspection
High-Speed Troubleshooting
Line Scan Technology
Contact Image Sensors
Lenses
Image Processing
Line Scan Inspection
Tracking and Traceability
Serialization
Direct Part Marking
Product Conformity
Systems Integration Challenges
2.4 Industrial Vision Measurement
Character Recognition, Code Reading, and Verification
Making Measurements
Pattern Matching
3D Pattern Matching
Preparing for Measurement
Industrial Control
Development Approaches and Environments
Development Software Tools for Industrial Vision Systems
Image Processing and Analysis Tools
Summary
References
Learning Outcomes
Further Reading
Chapter 3 Medical Vision
3.1 Introduction to Medical Vision
Advantages of Digital Processing for Medical Applications
Digital Image Processing Requirements for Medical Applications
Advanced Digital Image Processing Techniques in Medical Vision
Image Processing Systems for Medical Applications
Stereoscopic Endoscope
3.2 From Images to Information in Medical Vision
Magnifying Minute Variations
Gesture and Security Enhancements
3.3 Mathematics, Algorithms in Medical Imaging
Artificial Intelligence (AI)
Computer-Aided Diagnostic Processing
Vision Algorithms for Biomedical
Real-Time Radiography
Image Compression Technique for Telemedicine
Region of Interest
Structure Sensitive Adaptive Contrast Enhancement Methods
LSPIHT Algorithm for ECG Data Compression and Transmission
Retrieval of Medical Images in a PACS
Digital Signature Realization Process of DICOM Medical Images
Convolutional Neural Networks (CNNs) in Medical Image Analysis
Deep Learning and Big Data
3.4 Machine Learning in Medical Image Analysis
Convolutional Neural Networks
Convolution Layer
Rectified Linear Unit (ReLU) Layer
Pooling Layer
Fully Connected Layer
Feature Computation
Feature Selection
Training and Testing: The Learning Process
Example of Machine Learning with Use of Cross Validation
Summary
References
Learning Outcomes
Further Reading
Chapter 4 Video Analytics
4.1 Definition of Video Analytics
Applications of Video Analytics
Image Analysis Software
Security Center Integration
Video Analytics for Perimeter Detection
Video Analytics for People Counting
Traffic Monitoring
Auto Tracking Cameras for Facial Recognition
Left Object Detection
4.2 Video Analytics Algorithms
Algorithm Example: Lens Distortion Correction
Dense Optical Flow Algorithm
Camera Performance Affecting Video Analytics
Video Imaging Techniques
4.3 Machine Learning in Embedded Vision Applications
Types of Machine-Learning Algorithms
Implementing Embedded Vision and Machine Learning
Embedded Computers Make Inroads to Vision Applications
4.4 Examples for Machine Learning
1. Convolutional Neural Networks for Autonomous Cars
2. CNN Technology Enablers
3. Smart Fashion AI Architecture
4. Teaching Computers to Recognize Cats
Summary
References
Learning Outcomes
Further Reading
Chapter 5 Digital Image Processing
5.1 Image Processing Concepts for Vision Systems
Image
Signal
Systems
5.2 Image Manipulations
Image Sharpening and Restoration
Histograms
Transformation
Edge Detection
Vertical Direction
Horizontal Direction
Sobel Operator
Robinson Compass Mask
Kirsch Compass Mask
Laplacian Operator
Positive Laplacian Operator
Negative Laplacian Operator
5.3 Analyzing an Image
Color Spaces
JPEG Compression
Pattern Matching
Template Matching
Template Matching Approaches
Motion Tracking and Occlusion Handling
Template-Matching Techniques
Advanced Methods
Advantage
Enhancing the Accuracy of Template Matching
5.4 Image-Processing Steps for Vision System
Scanning and Image Digitalization
Image Preprocessing
Image Segmentation on Object
Description of Objects
Classification of Objects
Summary
References
Learning Outcomes
Further Reading
Chapter 6 Camera—Image Sensor
6.1 History of Photography
Image Formation on Cameras
Image Formation on Analog Cameras
Image Formation on Digital Cameras
Camera Types and Their Advantages: Analog Versus Digital Cameras
Interlaced Versus Progressive Scan Cameras
Area Scan Versus Line Scan Cameras
Time Delay and Integration (TDI) Versus Traditional Line Scan Cameras
Camera Mechanism
Perspective Transformation
Pixel
6.2 Camera Sensor for Embedded Vision Applications
Charge Coupled Device (CCD) Sensor Construction
Complementary Metal Oxide Semiconductor (CMOS) Sensor Construction
Sensor Features
Electronic Shutter
Sensor Taps
Spectral Properties of Monochrome and Color Cameras
Camera Resolution for Improved Imaging System Performance
6.3 Zooming, Camera Interface, and Selection
Optical Zoom
Digital Zoom
Spatial Resolution
Gray-Level Resolution
Capture Boards
FireWire IEEE 1394/IIDC DCAM Standard
Camera Link
GigE Vision Standard
USB—Universal Serial Bus
CoaXPress
Camera Software
Camera and Lens Selection for a Vision Project
6.4 Thermal-Imaging Camera
Summary
References
Learning Outcomes
Further Reading
Chapter 7 Embedded Vision Processors and Sensors
7.1 Vision Processors
Hardware Platforms for Embedded Vision, Image Processing, and Deep Learning
Requirements of Computers for Embedded Vision Application
Processor Configuration Selection
7.2 Embedded Vision Processors
Convolutional Neural Network (CNN) Engine
Cluster Shared Memory
Streaming Transfer Unit
Bus Interface
Complete Suite of Development Tools
Intel Movidius Myriad X Vision Processors
Matrox RadientPro CL
Single-Board Computer Raspberry Pi
NVIDIA Jetson TX1
NVIDIA Jetson TK1
BeagleBoard: BeagleBone Black
Orange Pi
ODROID-C2
Banana Pi
CEVA–XM4 Imaging and Vision Processor
MAX10 FPGA
Vision DSPs for Imaging and Vision
Vision Q6 DSP Features and Benefits
Vision P6 DSP Features and Benefits
Vision P5 DSP Features and Benefits
VFPU
7.3 Sensors for Applications
Sensors for Industrial Applications
Sensors for Aviation and Aerospace
Sensors for the Automobile Industry
Agricultural Sensors
Smart Sensors
7.4 MEMS
NEMS
Biosensors
Medical Sensors
Nuclear Sensors
Sensors for Deep-Sea Applications
Sensors for Security Applications
Selection Criteria for Sensor
Summary
References
Learning Outcomes
Further Reading
Chapter 8 Computer Vision
8.1 Embedded Vision and Other Technologies
Robot Vision
Signal Processing
Image Processing
Pattern Recognition and Machine Learning
Machine Vision
Computer Graphics
Artificial Intelligence
Color Processing
Video Processing
Computer Vision Versus Machine Vision
Computer Vision Versus Image Processing
The Difference Between Computer Vision, Image Processing, and Machine Learning
8.2 Tasks and Algorithms in Computer Vision
Image Acquisition
Image Processing
Image Analysis and Understanding
Algorithms
Feature Extraction
Feature Extraction Algorithms
Image Classification
Object Detection
Object Tracking
Semantic Segmentation
Instance Segmentation
Object Recognition Algorithms
SIFT: Scale-Invariant Feature Transform Algorithm
SURF: Speeded-Up Robust Features Algorithm
ORB: Oriented FAST and Rotated BRIEF Algorithm
Optical Flow and Point Tracking
Commercial Computer Vision Software Providers
8.3 Applications of Computer Vision
Packages and Frameworks for Computer Vision
8.4 Robotic Vision
Mars Pathfinder
Cobots Versus Industrial Robots
Machine Learning in Robots
Sensors in Robotic Vision
Artificial Intelligence Robots
Robotic Vision Testing in the Automotive Industry
8.5 Robotic Testing in the Aviation Industry
Robotic Testing in the Electronics Industry
The Use of Drones and Robots in Agriculture
Underwater Robots
Autonomous Security Robots
Summary
References
Learning Outcomes
Further Reading
Chapter 9 Artificial Intelligence for Embedded Vision
9.1 Embedded Vision-based Artificial Intelligence
AI-Based Solution for Personalized Styling and Shopping
AI Learning Algorithms
Algorithm Implementation Options
AI Embedded in Cameras
9.2 Artificial Vision
AI for Industries
9.3 3D-Imaging Technologies: Stereo Vision, Structured Light, Laser Triangulation, and ToF
1. Stereo Vision
2. Structured Light
3. Laser Triangulation
4. Time-of-Flight Camera for 3D Imaging
Theory of Operation
Working of ToF
Comparison of 3D-Imaging Technologies
Structured-Light Versus ToF
Applications of ToF 3D-Imaging Technology
Gesture Applications
Non-Gesture Applications
Time of Flight Sensor Advantages
9.4 Safety and Security Considerations in Embedded Vision Applications
Architecture Case Study
Choosing Embedded Vision Software
Summary
References
Learning Outcomes
Further Readings
Chapter 10 Vision-Based Real-Time Examples
10.1 Algorithms for Embedded Vision
Three Classes
Local Operators
Global Transformations
10.2 Methods and Models in Vision Systems
1. Shapes and Shape Models
2. Active Shape Model (ASM)
3. Clustering Algorithms
4. Thinning Morphological Operation
5. Hough Transform (HT)
10.3 Real-Time Examples
1. Embedded-Vision-Based Measurement
2. Defect Detection on Hardwood Logs Using Laser Scanning
3. Reconstruction of Monocular Fiberscopic Images
4. Vision Technologies for Empty Bottle Inspection Systems
5. Unmanned Rotorcraft for Ground Target Following Using Embedded Vision
6. Automatic Axle-Lifting System Design
7. Object Tracking Using an Address Event Vision Sensor
8. Using FPGA as an SoC Processor in ADAS Design
9. Diagnostic Imaging
10. Electronic Pill
10.4 Research and Development in Vision Systems
Robotic Vision
Stereo Vision
Vision Measurement
Industrial Vision
Automobile Industry
Medical Vision
Embedded Vision System
Summary
References
Learning Outcomes
Further Readings
Appendix
Embedded Vision Glossary
Index
Embedded vision (EV) is an emerging electronics industry technology that provides visual intelligence to automated embedded products. It combines embedded systems and computer vision, integrating a camera with a processing board. Embedded vision integrates computer vision into machines that use algorithms to decode meaning from observed images or video. It has a wide range of potential applications in industry, medicine, automotive (including driverless cars), drones, smart phones, aerospace, defense, agriculture, consumer electronics, surveillance, robotics, and security. It draws on the algorithms of the computer vision field and the hardware and software disciplines of the embedded systems field to give end products visual capabilities.
This book is an essential guide for anyone interested in designing machines that can see and sense, and in building vision-enabled embedded products. It covers a large number of topics encountered in hardware architecture, software algorithms, applications, and advancements in cameras, processors, and sensors in the field of embedded vision. Embedded vision systems are built for special applications, whereas PC-based systems are usually intended for general image processing.
Chapter 1 discusses introductory points, the design of an embedded vision system, characteristics of an embedded vision system board, processors and cameras for embedded vision, components in a typical vision system and embedded vision challenges. Application areas of embedded vision are analyzed. Development tools for embedded vision are also introduced in this chapter.
Chapter 2 discusses industrial vision. PC-based vision systems, industrial cameras, high-speed industrial cameras, and smart cameras are discussed in this chapter. Industrial vision applications are classified by dimensional quality, surface quality, structural quality, and operational quality. 3D imaging methods, 3D inspection, 3D processing, 3D robot vision, and capture and storage are analyzed under the heading of 3D industrial vision. 3D pattern matching, development approaches, development software tools, and image processing and analysis tools are also discussed.
Chapter 3 covers medical vision techniques. Image processing systems for medical applications, from images to information in medical vision, mathematics and algorithms in medical imaging, and machine learning in medical image analysis are discussed. The stereoscopic endoscope, CT, ultrasonic imaging systems, MRI, X-ray, PACS, CIA, FIA, ophthalmology, indocyanine green, automatic classification of cancerous cells, facial recognition to determine pain level, automatic detection of patient activity, peripheral vein imaging, the stereoscopic microscope, CAD processing, radiography, and telemedicine are covered. CNNs, machine learning, deep learning, and big data are also discussed.
Chapter 4 discusses video analytics. Definitions, applications, and algorithms of video analytics and video imaging are covered. Different types of machine learning algorithms and examples of ML, such as CNNs for autonomous cars, smart fashion AI architecture, and teaching computers to recognize animals, are discussed.
Chapter 5 discusses digital image processing. The image processing concept, image manipulations, image analyzing and image processing steps for an embedded vision system are covered. Image sharpening, histograms, image transformation, image enhancement, convolution, blurring, edge detection are a few image manipulations discussed in this chapter. Frequency domain, transformation, filters, color spaces, jpeg compression, pattern matching, and template matching used to analyze images are discussed.
Chapter 6 covers the history of photography, camera sensors for embedded vision applications, zooming, camera interfaces, and camera selection for vision projects.
Chapter 7 covers embedded vision processors and sensors. This chapter deals with the vision processor selection, embedded vision processor boards, different sensors based on the applications, and MEMS. The options for processor configuration, embedded vision processor boards and sensors suited for different applications are thoroughly discussed.
Chapter 8 discusses computer vision. This chapter compares various existing technologies with embedded vision. Tasks and algorithms in computer vision such as feature extraction, image classification, object detection, object tracking, semantic segmentation, instance segmentation, object recognition algorithms, optical flow, and point tracking are discussed. Commercial computer vision software providers are listed and the applications of computer vision and robotic vision are discussed.
Chapter 9 discusses the use of artificial intelligence in embedded vision. Embedded-vision-based artificial intelligence, artificial vision, 3D imaging technologies, and safety and security considerations in EV applications are covered. AI-based solutions for personalized styling and shopping, AI embedded in cameras, and algorithm implementation options are discussed. Stereo vision, structured light, laser triangulation, and time-of-flight techniques for 3D imaging are compared.
Chapter 10 discusses vision-based, real-time examples. Algorithms for embedded vision, and methods and models in vision systems, are covered. Recent research, applications, and developments in the field of embedded vision systems are analyzed.
Overview
Embedded vision is the integration of vision in machines that use algorithms to decode meaning from observing images or videos. Embedded vision systems use embedded boards, sensors, cameras, and algorithms to extract information. Application areas are many, and include automobiles, medical, industry, domestic, and security systems.
Learning Objectives
After reading this chapter, the reader will be able to
■ differentiate between embedded vision and computer vision,
■ define embedded vision system,
■ understand embedded vision system design requirements,
■ understand application areas of embedded vision, and
■ identify development tools for embedded vision.
Embedded vision refers to the practical use of computer vision in machines that understand their environment through visual means. Computer vision is the use of digital processing and intelligent algorithms to interpret meaning from images or video. Due to the emergence of very powerful, low-cost, and energy-efficient processors, it has become possible to incorporate practical computer vision capabilities into embedded systems, mobile devices, PCs, and the cloud. Embedded vision is the integration of computer vision in machines that use algorithms to decode meaning from observing pixel patterns in images or video. The computer vision field is developing rapidly, along with advances in silicon and, more recently, purpose-designed embedded vision processors.
Embedded vision is the extraction of meaning from visual inputs, creating “machines that see and understand.” Embedded vision is now spreading into a very wide range of applications, including automotive driver assistance, digital signage, entertainment, healthcare, and education. Embedded vision is developing across numerous fields including autonomous medical care, agriculture technology, search and rescue, and repair in conditions dangerous to humans. Applications include autonomous machines of many types such as embedded systems, driverless cars, drones, smart phones, and rescue and bomb disarming robots. The term embedded vision refers to the use of computer vision technology in embedded systems. Stated another way, embedded vision refers to embedded systems that extract meaning from visual inputs. Similar to the way that wireless communication has become pervasive over the past 10 years, embedded vision technology will be very widely deployed in the next 10 years.
Computer (or machine) vision is the field of research that studies the acquisition, processing, analysis, and understanding of real-world visual information. It is a discipline that was established in the 1960s, but has made recent rapid advances due to improvements both in algorithms and in available computing technology. Embedded systems are computer systems with dedicated functions that are embedded within other devices, and are typically constrained by cost and power consumption. Some examples of devices using embedded systems include mobile phones, set top boxes, automobiles, and home appliances. Embedded vision is an innovative technology in which computer vision algorithms are incorporated into embedded devices to create practical and widely deployable applications using visual data. This field is rapidly expanding into emerging high-volume consumer applications such as home surveillance, games, automotive safety, smart glasses, and augmented reality.
With the emergence of increasingly capable processors, it's becoming practical to incorporate computer vision capabilities into a wide range of embedded systems, enabling them to analyze their environments via video inputs. Products like game controllers and driver assistance systems are raising awareness of the incredible potential of embedded vision technology. As a result, many embedded system designers are beginning to think about implementing embedded vision capabilities. It's clear that embedded vision technology can bring huge value to a vast range of applications. Two examples are vision-based driver assistance systems, intended to help prevent motor vehicle accidents, and swimming pool safety systems, which help prevent swimmers from drowning.
The term embedded vision implies a hybrid of two technologies, embedded systems and computer vision. An embedded system is a microprocessor-based system that isn't a general-purpose computer, whereas computer vision refers to the use of digital processing and intelligent algorithms to interpret meaning from images or video. Most commonly defined, an embedded vision system is any microprocessor-based system with image sensor functionality that isn't a standard personal computer. Tablets and smart phones fall into this category, as well as more unusual devices such as advanced medical diagnosis instruments and robots with object recognition capabilities. So, to put it simply, embedded vision refers to machines that understand their environment through visual means.
Embedded vision processors are now developed by electronics companies to make computer vision lower in cost, lower in power, and ready for smaller, more mobile devices. Embedded devices and coprocessing chips can be connected to neural networks or neural net processors to add efficient computer vision to machine learning. Two main trends drive embedded vision: the miniaturization of PCs and cameras, and the ability to produce vision systems affordably for highly specific applications. Systems of this kind are referred to as embedded vision systems.
Visual inputs are the richest source of sensor information. For more than 50 years, scientists have studied imaging and developed algorithms that allow computers to see in computer vision applications. The first real commercial applications, referred to as machine vision, analyzed fast-moving objects to inspect products and detect defects. Due to improving processing power, lower power consumption, better image sensors, and better algorithms, vision now operates at a much higher level. Combining embedded systems with computer vision results in embedded vision systems. Embedded vision blocks are shown in Figures 1.1a and 1.1b.
Initially, embedded vision technology was found in complex, expensive systems, for example, a surgical robot for hair transplantation or quality-control inspection systems for manufacturing. Like wireless communication, embedded vision requires lots of processing power, particularly as applications increasingly adopt high-resolution cameras and make use of multiple cameras. Providing that processing power at a cost low enough to enable mass adoption is a big challenge. This challenge is multiplied by the fact that embedded vision applications require a high degree of programmability. In wireless applications algorithms don't vary dramatically from one cell phone handset to another, but in embedded vision applications there are great opportunities to get better results and enable valuable features through unique algorithms.
FIGURE 1.1A. Embedded vision system blocks.
FIGURE 1.1B. Embedded vision block diagram.
With embedded vision, the industry is entering a “virtuous circle” of the sort that has characterized many other digital signal processing application domains. Although there are few chips dedicated to embedded vision applications today, these applications are increasingly adopting high performance, cost effective processing chips developed for other applications, including DSPs, CPUs, FPGAs, and GPUs. As these chips continue to deliver more programmable performance per watt, they will enable the creation of more high volume embedded vision products. Those high-volume applications, in turn, will attract more attention from silicon providers, who will deliver even better performance, efficiency, and programmability.
An embedded vision system consists, for example, of a so-called board-level camera connected to a processing board, as shown in Figure 1.2. Processing boards take over the tasks of the PC in the classic machine vision setup. As processing boards are much cheaper than classic industrial PCs, vision systems can become smaller and more cost effective. The interfaces for embedded vision systems are primarily USB or LVDS (low-voltage differential signaling).
As with embedded systems, popular single-board computers (SBCs), such as the Raspberry Pi, are available on the market for embedded vision product development. Figure 1.3 shows the Raspberry Pi, a mini computer with established interfaces that offers a similar range of features as a classic PC or laptop. Embedded vision solutions can also be implemented with so-called system on modules (SoMs) or computer on modules (CoMs). These modules represent a computing unit. To adapt the desired interfaces to the respective application, an individual carrier board is needed. This is connected to the SoM via specific connectors and can be designed and manufactured relatively simply. The SoMs or CoMs (or the entire system) are cost effective on the one hand, since they are available off the shelf, while on the other hand they can be individually customized through the carrier board. For large manufactured quantities, individual processing boards are a good choice.
FIGURE 1.2. Design of embedded vision system.
FIGURE 1.3. Embedded system boards.
All modules, single-board computers, and SoMs are based on a system on chip (SoC). This is a component that integrates the processor(s), controllers, memory modules, power management, and other components on a single chip. Thanks to these efficient SoCs, embedded vision systems have only recently become available in such a small size and at such low cost.
Embedded vision is the technology of choice for many applications. Accordingly, the design requirements are widely diversified. The two interface technologies offered for embedded vision systems in this portfolio are USB3 Vision, for easy integration, and LVDS, for a lean system design. USB 3.0 is the right interface for a simple plug-and-play camera connection and is ideal for camera connections to single-board computers. It allows stable data transfer with a bandwidth of up to 350 MB/s. An LVDS-based interface allows a direct camera connection to processing boards, and thus also to on-board logic modules such as FPGAs (field programmable gate arrays) or comparable components. This allows a lean system design and benefits from a direct board-to-board connection and data transfer. The interface is therefore ideal for connecting to a SoM on a carrier/adapter board or to an individually developed processor unit. It allows stable, reliable data transfer with a bandwidth of up to 252 MB/s.
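To put these interface bandwidths in perspective, the short sketch below estimates the maximum raw frame rate each interface could sustain; the 1080p 8-bit monochrome stream is an illustrative assumption, not a figure from the text.

```python
# Rough frame-rate ceiling for a camera interface, given its usable
# bandwidth. The 1080p 8-bit monochrome stream is an illustrative
# assumption, not a figure from the text.

def max_fps(width: int, height: int, bytes_per_pixel: int,
            bandwidth_mb_s: float) -> float:
    """Upper bound on frames per second for a raw, uncompressed stream."""
    frame_bytes = width * height * bytes_per_pixel
    return (bandwidth_mb_s * 1_000_000) / frame_bytes

for name, bw in [("USB 3.0", 350), ("LVDS", 252)]:
    fps = max_fps(1920, 1080, 1, bw)
    print(f"{name}: up to {fps:.0f} fps at 1080p monochrome")
```

In practice, protocol overhead and compression change these numbers, but the calculation shows why interface bandwidth is an early design constraint.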
Most of the previously mentioned single board computers and SoMs do not include the x86 family processors common in standard PCs. Rather, the CPUs are often based on the ARM architecture. The open source Linux operating system is widely used as an operating system in the world of ARM processors. For Linux, there are a large number of open source application programs, as well as numerous freely available program libraries. Increasingly, however, x86-based single board computers are also spreading. A consistently important criterion for the computer is the space available for the embedded system.
For the software developer, the program development for an embedded system is different than for a standard PC. As a rule, the target system does not provide a suitable user interface which can also be used for programming. The software developer must connect to the embedded system via an appropriate interface if available (e.g., network interface) or develop the software on the standard PC and then transfer it to the target system. When developing the software, it should be noted that the hardware concept of the embedded system is oriented to a specific application and thus differs significantly from the universally usable PC. However, the boundary between embedded and desktop computer systems is sometimes difficult to define. Just think of the mobile phone, which on the one hand has many features of an embedded system (ARM-based, single-board construction), but on the other hand can cope with very different tasks and is therefore a universal computer.
A single-board computer is often a good choice. It is a standard product: a small, compact computer that is easy to use. This is also useful for developers who have had little to do with embedded vision. However, a single-board computer contains unused components and thus generally does not allow the leanest system configuration; hence, it is suitable for small to medium quantities. The leanest setup is obtained through a customized system, though here higher integration effort is a factor. A customized system is therefore suitable for large unit numbers. The benefits of embedded vision system boards at a glance are:
■ Lean system design
■ Light weight
■ Cost-effective, because there is no unnecessary hardware
■ Lower manufacturing costs
■ Lower energy consumption
■ Small footprint
This technology category includes any device that executes vision algorithms or vision system control software. The applications represent distinctly different types of processor architectures for embedded vision, and each has advantages and trade-offs that depend on the workload. For this reason, many devices combine multiple processor types into a heterogeneous computing environment, often integrated into a single semiconductor component. In addition, a processor can be accelerated by dedicated hardware that improves performance on computer vision algorithms.
Vision algorithms typically require high compute performance. And, of course, embedded systems of all kinds are usually required to fit into tight cost and power consumption envelopes. In other digital signal processing application domains, such as digital wireless communications, chip designers achieve this challenging combination of high performance, low cost, and low power by using specialized coprocessors and accelerators to implement the most demanding processing tasks in the application. These coprocessors and accelerators are typically not programmable by the chip user, however. This trade-off is often acceptable in wireless applications, where standards mean that there is strong commonality among algorithms used by different equipment designers.
In vision applications, however, there are no standards constraining the choice of algorithms. On the contrary, there are often many approaches to choose from to solve a particular vision problem. Therefore, vision algorithms are very diverse, and tend to change fairly rapidly over time. As a result, the use of nonprogrammable accelerators and coprocessors is less attractive for vision applications compared to applications like digital wireless and compression-centric consumer video equipment. Achieving the combination of high performance, low cost, low power, and programmability is challenging. Special-purpose hardware typically achieves high performance at low cost, but with little programmability. General-purpose CPUs provide programmability, but with weak performance and poor cost or energy efficiency.
Demanding embedded vision applications most often use a combination of processing elements, which might include, for example:
■ A general purpose CPU for heuristics, complex decision making, network access, user interface, storage management, and overall control
■ A high-performance DSP-oriented processor for real time, moderate rate processing with moderately complex algorithms
■ One or more highly parallel engines for pixel rate processing with simple algorithms
While any processor can in theory be used for embedded vision, the most promising types today are:
■ High-performance embedded CPU
■ Application specific standard product (ASSP) in combination with a CPU
■ Graphics processing unit (GPU) with a CPU
■ DSP processor with accelerator(s) and a CPU
■ Field programmable gate array (FPGA) with a CPU
■ Mobile “application processor”
In many cases, embedded CPUs cannot provide enough performance or cannot do so at an acceptable price or power consumption levels to implement demanding vision algorithms. Often, memory bandwidth is a key performance bottleneck, since vision algorithms typically use large amounts of memory bandwidth, and don’t tend to repeatedly access the same data. The memory systems of embedded CPUs are not designed for these kinds of data flows. However, like most types of processors, embedded CPUs become more powerful over time, and in some cases can provide adequate performance. There are some compelling reasons to run vision algorithms on a CPU when possible. First, most embedded systems need a CPU for a variety of functions. If the required vision functionality can be implemented using that CPU, then the complexity of the system is reduced relative to a multiprocessor solution.
In addition, most vision algorithms are initially developed on PCs using general purpose CPUs and their associated software development tools. Similarities between PC CPUs and embedded CPUs (and their associated tools) mean that it is typically easier to create embedded implementations of vision algorithms on embedded CPUs compared to other kinds of embedded vision processors. In addition, embedded CPUs typically are the easiest to use compared to other kinds of embedded vision processors, due to their relatively straightforward architectures, sophisticated tools, and other application development infrastructure, such as operating systems. An example of an embedded CPU is the Intel Atom E660T.
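As a concrete illustration of this PC-first workflow, here is a minimal sketch of a vision algorithm prototyped on a general-purpose CPU with the open-source OpenCV library (introduced in Section 1.6); the input file name is a placeholder.

```python
# Minimal CPU-only vision prototype with OpenCV: grayscale conversion,
# noise suppression, and edge detection. "input.jpg" is a placeholder.
import cv2

image = cv2.imread("input.jpg")                 # load a test frame
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # drop color information
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)   # suppress sensor noise
edges = cv2.Canny(blurred, 50, 150)             # detect edges
cv2.imwrite("edges.png", edges)                 # inspect the result offline
```

Because the same OpenCV calls are available on ARM Linux targets, a prototype like this can often be moved to an embedded CPU with little change, which is exactly the porting advantage described above.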
Application specific standard products (ASSPs) are specialized, highly integrated chips tailored for specific applications or application sets. ASSPs may incorporate a CPU, or use a separate CPU chip. By virtue of specialization, ASSPs typically deliver superior cost and energy efficiency compared with other types of processing solutions. Among other techniques, ASSPs deliver this efficiency through the use of specialized coprocessors and accelerators. Because ASSPs are by definition focused on a specific application, they are usually provided with extensive application software.
The specialization that enables ASSPs to achieve strong efficiency, however, also leads to their key limitation: lack of flexibility. An ASSP designed for one application is typically not suitable for another application, even one related to the target application. ASSPs use unique architectures, which can make programming them more difficult than other kinds of processors. Indeed, some ASSPs are not user programmable.
Another consideration is risk. ASSPs often are delivered by small suppliers, and this may increase the risk that there will be difficulty in supplying the chip, or in delivering successor products that enable system designers to upgrade their designs without having to start from scratch. An example of a vision-oriented ASSP is the PrimeSense PS1080-A2, used in the Microsoft Kinect.
While computer vision algorithms can run on most general purpose CPUs, desktop processors may not meet the design constraints of some systems. However, x86 processors and system boards can leverage the PC infrastructure for low-cost hardware and broadly supported software development tools. Several Alliance Member companies also offer devices that integrate a RISC CPU core. A general purpose CPU is best suited for heuristics, complex decision making, network access, user interface, storage management, and overall control. A general purpose CPU may be paired with a vision specialized device for better performance on pixel level processing.
High-performance GPUs deliver massive amounts of parallel computing potential, and graphics processors can be used to accelerate the portions of the computer vision pipeline that perform parallel processing on pixel data. While general-purpose GPUs (GPGPUs) have primarily been used for high-performance computing (HPC), even mobile graphics processors and integrated graphics cores are gaining GPGPU capability while meeting the power constraints of a wider range of vision applications. In designs that require 3D processing in addition to embedded vision, a GPU will already be part of the system and can be used to assist a general-purpose CPU with many computer vision algorithms. Many examples exist of x86-based embedded systems with discrete GPGPUs.
Graphics processing units (GPUs), intended mainly for 3D graphics, are increasingly capable of being used for other functions, including vision applications. The GPUs used in personal computers today are explicitly intended to be programmable to perform functions other than 3D graphics. Such GPUs are termed “general purpose GPUs” or “GPGPUs.” GPUs have massive parallel processing horsepower. They are ubiquitous in personal computers. GPU software development tools are readily and freely available, and getting started with GPGPU programming is not terribly complex. For these reasons, GPUs are often the parallel processing engines of first resort of computer vision algorithm developers who develop their algorithms on PCs, and then may need to accelerate execution of their algorithms for simulation or prototyping purposes.
GPUs are tightly integrated with general-purpose CPUs, sometimes on the same chip. However, one limitation of GPU chips is the limited variety of CPUs with which they are currently integrated, and only a limited number of operating systems support such integration. Today there are low-cost, low-power GPUs designed for products like smart phones and tablets. However, these GPUs are generally not GPGPUs, and therefore using them for applications other than 3D graphics is very challenging. An example of a GPGPU used in personal computers is the NVIDIA GT240.
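For illustration, the sketch below offloads one pixel-parallel stage to a GPGPU using OpenCV's CUDA module; it assumes an OpenCV build with CUDA support (the cv2.cuda namespace), which many prepackaged builds omit, so treat it as a sketch rather than a portable recipe.

```python
# Sketch of offloading a pixel-parallel stage to a GPGPU with OpenCV's
# CUDA module. Assumes an OpenCV build with CUDA support (cv2.cuda);
# many prepackaged builds omit it, so this is illustrative only.
import cv2

frame = cv2.imread("input.jpg")                  # placeholder input
gpu_frame = cv2.cuda_GpuMat()
gpu_frame.upload(frame)                          # host -> device copy

gpu_gray = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)
gauss = cv2.cuda.createGaussianFilter(cv2.CV_8UC1, cv2.CV_8UC1, (5, 5), 1.4)
gpu_blurred = gauss.apply(gpu_gray)              # filter runs on the GPU

result = gpu_blurred.download()                  # device -> host copy
```

Note the explicit upload/download steps: the host-device data transfers they represent are often the dominant cost, which is one reason tight CPU-GPU integration matters for vision workloads.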
DSPs are very efficient for processing streaming data, since the bus and memory architecture are optimized to process high-speed data as it traverses the system. This architecture makes DSPs an excellent solution for processing image pixel data as it streams from a sensor source. Many DSPs for vision have been enhanced with coprocessors that are optimized for processing video inputs and accelerating computer vision algorithms. The specialized nature of DSPs makes these devices inefficient for processing general purpose software workloads, so DSPs are usually paired with a RISC processor to create a heterogeneous computing environment that offers the best of both worlds.
Digital signal processors (“DSP processors” or “DSPs”) are microprocessors specialized for signal processing algorithms and applications. This specialization typically makes DSPs more efficient than general purpose CPUs for the kinds of signal processing tasks that are at the heart of vision applications. In addition, DSPs are relatively mature and easy to use compared to other kinds of parallel processors. Unfortunately, while DSPs do deliver higher performance and efficiency than general purpose CPUs on vision algorithms, they often fail to deliver sufficient performance for demanding algorithms. For this reason, DSPs are often supplemented with one or more coprocessors. A typical DSP chip for vision applications therefore comprises a CPU, a DSP, and multiple coprocessors. This heterogeneous combination can yield excellent performance and efficiency, but can also be difficult to program. Indeed, DSP vendors typically do not enable users to program the coprocessors; rather, the coprocessors run software function libraries developed by the chip supplier. An example of a DSP targeting video applications is the Texas Instruments DM8168.
Instead of incurring the high cost and long lead times for a custom ASIC to accelerate computer vision systems, designers can implement an FPGA to offer a reprogrammable solution for hardware acceleration. With millions of programmable gates, hundreds of I/O pins, and compute performance in the trillions of multiply accumulates/sec (tera-MACs), high-end FPGAs offer the potential for highest performance in a vision system. Unlike a CPU, which has to use time slice or multi-thread tasks as they compete for compute resources, an FPGA has the advantage of being able to simultaneously accelerate multiple portions of a computer vision pipeline. Since the parallel nature of FPGAs offers so much advantage for accelerating computer vision, many of the algorithms are available as optimized libraries from semiconductor vendors. These computer vision libraries also include preconfigured interface blocks for connecting to other vision devices, such as IP cameras.
Field programmable gate arrays (FPGAs) are flexible logic chips that can be reconfigured at the gate and block levels. This flexibility enables the user to craft computation structures that are tailored to the application at hand. It also allows selection of I/O interfaces and on-chip peripherals matched to the application requirements. The ability to customize compute structures, coupled with the massive amount of resources available in modern FPGAs, yields high performance coupled with good cost and energy efficiency. However, using FPGAs is essentially a hardware design function, rather than a software development activity. FPGA design is typically performed using hardware description languages (Verilog or VHDL) at the register transfer level (RTL), a very low level of abstraction. This makes FPGA design time consuming and expensive, compared to using the other types of processors discussed here.
However, using FPGAs is getting easier, due to several factors. First, so-called "IP block" libraries (libraries of reusable FPGA design components) are becoming increasingly capable. In some cases, these libraries directly address vision algorithms; in other cases, they enable supporting functionality, such as video I/O ports or line buffers. Second, FPGA suppliers and their partners increasingly offer reference designs: reusable system designs incorporating FPGAs and targeting specific applications. Third, high-level synthesis tools, which enable designers to implement vision and other algorithms in FPGAs using high-level languages, are increasingly effective. Relatively low-performance CPUs can be implemented by users in the FPGA fabric, and in a few cases high-performance CPUs are integrated into FPGAs by the manufacturer. An example FPGA that can be used for vision applications is the Xilinx Spartan-6 LX150T.
A mobile “application processor” is a highly integrated system-on-chip, typically designed primarily for smart phones but used for other applications. Application processors typically comprise a high-performance CPU core and a constellation of specialized coprocessors, which may include a DSP, a GPU, a video processing unit (VPU), a 2D graphics processor, an image acquisition processor, and so on. These chips are specifically designed for battery-powered applications, and therefore place a premium on energy efficiency. In addition, because of the growing importance of and activity surrounding smart phone and tablet applications, mobile application processors often have strong software development infrastructure, including low-cost development boards, Linux and Android ports, and so on. However, as with the DSP processors discussed in the previous section, the specialized coprocessors found in application processors are usually not user programmable, which limits their utility for vision applications. An example of a mobile application processor is the Freescale i.MX53.
While analog cameras are still used in many vision systems, this section focuses on digital image sensors, usually either a CCD or a CMOS sensor array that operates with visible light. However, this definition shouldn't constrain the technology analysis, since many vision systems can also sense other types of energy (IR, sonar, etc.).
The camera housing has become the entire chassis for a vision system, leading to the emergence of “smart cameras” with all of the electronics integrated. By most definitions, a smart camera supports computer vision, since the camera is capable of extracting application specific information. However, as both wired and wireless networks get faster and cheaper, there still may be reasons to transmit pixel data to a central location for storage or extra processing.
A classic example is cloud computing using the camera on a smart phone. The smart phone could be considered a “smart camera” as well, but sending data to a cloud-based computer may reduce the processing performance required on the mobile device, lowering cost, power, weight, and so on. For a dedicated smart camera, some vendors have created chips that integrate all of the required features.
Until recently, many people would imagine a camera for computer vision as the outdoor security camera shown in Figure 1.4. There are countless vendors supplying these products, and many more supplying indoor cameras for industrial applications. There are simple USB cameras for PCs, and billions of cameras embedded in the mobile phones of the world. The speed and quality of these cameras have risen dramatically, supporting 10+ megapixel sensors with sophisticated image-processing hardware.
FIGURE 1.4. Outdoor fixed security camera.
Another important factor for cameras is the rapid adoption of 3D imaging using stereo optics, time-of-flight, and structured light technologies. Trendsetting cell phones now offer this technology, as do the most recent generation of game consoles. Look again at the picture of the outdoor camera and consider how much change is about to happen to computer vision markets as new camera technologies become pervasive.
Charge coupled device (CCD) image sensors have some advantages over CMOS image sensors, mainly because the electronic shutter of CCDs traditionally offers better image quality with higher dynamic range and resolution. However, CMOS sensors now account for more than 90% of the market, heavily influenced by camera phones and driven by the technology's lower cost, better integration, and speed.
Embedded vision applications involve more than just programmable devices and image sensors; they also require other components for creating a complete system. Most applications require data communications of pixels and/or metadata, and many designs interface directly to the user. Some computer vision systems also connect to mechanical devices, such as robots or industrial control systems.
The list of devices in this "other" category includes a wide range of standard products. In addition, some system designers may incorporate programmable logic devices or ASICs. In many vision systems, power, space, and cost constraints require high levels of integration with the programmable device, often into a system-on-a-chip (SoC) device. Sensors that measure external parameters or the environment are discussed under separate chapter headings.
Processors can integrate megabytes' worth of SRAM and DRAM, so many designs will not require off-chip memory. However, computer vision algorithms for embedded vision often require multiple frames of sensor data to track objects. Off-chip memory devices can store gigabytes of data, although accessing external memory can add hundreds of cycles of latency. Systems with a 3D-graphics subsystem will usually already include substantial amounts of external memory to store the frame buffer, textures, Z-buffer, and so on. Sometimes this graphics memory is placed in a dedicated, fast memory bank that uses specialized DRAMs.
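A quick back-of-the-envelope calculation shows why multi-frame algorithms push designs toward off-chip memory; the resolution, color depth, and buffer depth below are illustrative assumptions.

```python
# Back-of-the-envelope working-set size for a ring buffer of frames.
# Resolution, color depth, and buffer depth are illustrative assumptions.
width, height = 1920, 1080     # 1080p frame
bytes_per_pixel = 3            # 24-bit color
frames_buffered = 8            # e.g., for tracking objects across frames

working_set = width * height * bytes_per_pixel * frames_buffered
print(f"{working_set / 2**20:.1f} MiB")  # ~47.5 MiB: beyond on-chip SRAM
```

A working set of tens of megabytes comfortably exceeds typical on-chip SRAM budgets, so the frame history ends up in external DRAM, with the latency penalty noted above.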
Some vision implementations store video data locally, in order to reduce the amount of information that needs to be sent to a centralized system. For a solid state, nonvolatile memory storage system, the storage density is driven by the size of flash memory chips. Latest generation NAND chip fabrication technologies allow extremely large, fast and low-power storage in a vision system.
Mainstream computer networking and bus technology has finally started to catch up to the needs of computer vision to support simultaneous digital video streams. With economies of scale, more vision systems will use standard buses like PCI and PCI Express. For networking, Gigabit Ethernet (GbE) and 10 GbE interfaces offer sufficient bandwidth even for multiple high-definition video streams. However, the machine vision trade association (AIA) continues to promote Camera Link, and many camera and frame grabber manufacturers use this interface.
Although applications of embedded vision technologies vary, a typical computer vision system uses more or less the same sequence of distinct steps to process and analyze the image data. These are referred to as a vision pipeline, which typically contains the steps shown in Figure 1.5.
FIGURE 1.5. Vision pipeline.
At the start of the pipeline, it is common to see algorithms with simple data-level parallelism and regular computations. However, in the middle region, the data-level parallelism and the data structures themselves are both more complex, and the computation is less regular and more control-oriented. At the end of the pipeline, the algorithms are more general purpose in nature. Here are the pipelines for two specific application examples: Figure 1.6 shows a vision pipeline for a video surveillance application.
Figure 1.7 shows a vision pipeline for a pedestrian detection application. Note that both pipelines construct an image pyramid and have an object detection function in the center.
FIGURE 1.6. Vision pipeline for video surveillance application.
FIGURE 1.7. Vision pipeline for pedestrian detection application.
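As a minimal, concrete sketch of a pedestrian-detection pipeline like the one in Figure 1.7, OpenCV's stock HOG-based people detector bundles the image-pyramid and object-detection stages noted above; this is one possible implementation, not the design shown in the figure, and the input file name is a placeholder.

```python
# Minimal pedestrian-detection pipeline using OpenCV's stock HOG-based
# people detector. detectMultiScale internally builds an image pyramid
# and slides a detection window over each level, mirroring the pyramid
# and object-detection stages of Figure 1.7. "street.jpg" is a placeholder.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("street.jpg")
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)

for (x, y, w, h) in boxes:          # draw one box per detected pedestrian
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", frame)
```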
Vision algorithms typically require high computing performance. And unlike many other applications, where standards mean that there is strong commonality among algorithms used by different equipment designers, no such standards that constrain algorithm choice exist in vision applications. On the contrary, there are often many approaches to choose from to solve a particular vision problem. Therefore, vision algorithms are very diverse, and tend to change fairly rapidly over time. And, of course, industrial automation systems are usually required to fit into tight cost and power consumption envelopes.
The rapidly expanding use of vision technology in industrial automation is part of a much larger trend. From consumer electronics to automotive safety systems, today we see vision technology (Figure 1.8) enabling a wide range of products that are more intelligent and responsive than before, and thus more valuable to users. We use the term “embedded vision” to refer to this growing practical use of vision technology in embedded systems, mobile devices, special purpose PCs, and the cloud, with industrial automation being one showcase application.
FIGURE 1.8. Vision technology.
Although the rapid progress of technology has made available very powerful microprocessor architectures, implementing a computer vision algorithm on embedded hardware/software platforms remains a very challenging task. Some specific challenges encountered by embedded vision systems include:
i. Power consumption: Vision applications for mobile platforms are constrained by battery capacity, leading to power requirements of less than one watt. Using more power means more battery weight, a problem for mobile and airborne systems (e.g., drones). More power also means higher heat dissipation, leading to more expensive packaging, complex cooling systems, and faster aging of components.
ii. Computational requirements: Computer vision applications have extremely high computational requirements. Constructing a typical image pyramid for a VGA frame (640×480) requires 10–15 million instructions per frame; at 30 frames per second, that demands a processor capable of 300–450 MIPS just to handle this preliminary processing step, let alone the more advanced recognition tasks later in the pipeline (see the pyramid sketch after this list). State-of-the-art, low-cost camera technology today can provide 1080p or 4K video at up to 120 frames per second. A vision system using such a camera requires compute power ranging from a few giga operations per second (GOPS) to several hundred GOPS.
iii. Memory usage: The various vision processing tasks require large buffers to store processed image data in various formats, and high bandwidth to move this data from memory and between computational units. The on-chip memory size and interconnect have a significant impact on the cost and performance of a vision application on an embedded platform.
iv. Fixed-point algorithm development: Most published computer vision algorithms were developed for the computational model of Intel-based workstations where, since the advent of the Intel Pentium in 1993, the cost of double-precision operations is roughly identical to that of integer or single-precision operations. However, 64- to 80-bit hardware floating-point units massively increase silicon area and power consumption, and software emulation libraries for floating point run slowly. For this reason, algorithms typically need to be refined to use more efficient fixed-point arithmetic, based on integer types and operands combined with data shifts to align the radix point.
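To make the pyramid-construction cost in item (ii) concrete, the sketch below builds a Gaussian pyramid for a VGA frame with OpenCV and tallies the pixels processed per frame; the instruction counts quoted in the text remain estimates, not measurements.

```python
# Build a Gaussian image pyramid for a VGA frame and tally the pixels
# processed, making the cost estimate in item (ii) concrete. The
# instruction counts quoted in the text are estimates, not measured here.
import cv2
import numpy as np

frame = np.zeros((480, 640), dtype=np.uint8)   # stand-in VGA frame
pyramid = [frame]
while pyramid[-1].shape[0] >= 32:              # stop at a small top level
    pyramid.append(cv2.pyrDown(pyramid[-1]))   # blur + 2x downsample

total_pixels = sum(level.size for level in pyramid)
print(len(pyramid), "levels,", total_pixels, "pixels per frame")
print("At 30 fps:", total_pixels * 30 / 1e6, "Mpixels/s to process")
```

Even this simple loop touches roughly 400,000 pixels per frame, and every pixel costs multiple filter operations, which is how the per-frame instruction counts in item (ii) arise.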
Besides the challenges discussed above, an embedded vision developer should keep in mind the dynamic nature of the market. Applications and use cases, the underlying vision algorithms, the programming models, and the supporting hardware architectures are all changing on an ongoing basis.
Currently, there is a need for standardized vision kernels, algorithm libraries, and programming models. At this time, there are no fully established standards for vision software with efficient hardware implementations, though there are a number of likely candidates: OpenCV is a good starting point for reference algorithms and their test benches; the Khronos Group is developing emerging standards focused on embedded systems; and OpenCL is a software framework for tying together massively parallel, heterogeneous computation units.
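As a small illustration of OpenCV's role as a source of reference algorithms, the sketch below runs Canny edge detection on a still image. The filename and hysteresis thresholds are placeholder choices, not values prescribed by the library:

    import cv2

    # Load a test image in grayscale; "input.png" is a placeholder name.
    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

    # Canny edge detection with illustrative hysteresis thresholds.
    edges = cv2.Canny(img, threshold1=100, threshold2=200)
    cv2.imwrite("edges.png", edges)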
The emergence of practical embedded vision technology creates vast opportunities for innovation in electronic systems and associated software. In many cases, existing products can be transformed by adding vision capabilities. One example is the surveillance camera: with vision capabilities, the camera can monitor a scene for certain kinds of events and alert an operator when one occurs. In other cases, practical embedded vision enables entirely new types of products, such as surgical robots and swimming pool safety systems that monitor swimmers in the water. Specific applications that use embedded vision include object detection, video surveillance, gesture recognition, Simultaneous Localization and Mapping (SLAM), and Advanced Driver Assistance Systems (ADAS). Let's take a closer look at each one.
While there are bigger markets for vision products, swimming pool safety (shown in Figure 1.9) is one of those applications that truly shows the positive impact technological progress can have on society. Every parent will instantly appreciate the extra layer of safety provided by machines that can see and understand whether a swimmer is in distress. When tragedies can unfold in minutes, a vision system that never becomes distracted or complacent in performing the duties of a digital lifeguard shows the true potential of this technology.
FIGURE 1.9. Pool graphic system.
Object detection is at the heart of virtually all computer vision systems. Of all the visual tasks we might ask a computer to perform, the task of analyzing a scene and recognizing all of the constituent objects remains the most challenging. Furthermore, detected objects can be used as inputs for object recognition tasks, such as instance or class recognition, which can find a specific face, a car model, a unique pedestrian, and so on. Applications include face detection and recognition, pedestrian detection, and factory automation. Even vision applications that are not specifically performing object detection often have some sort of detection step in the processing flow. For example, a movement tracking algorithm uses “corner detection” to identify easily recognizable points in an image and then looks for them in subsequent frames.
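A minimal sketch of that detect-then-track flow, assuming two consecutive grayscale frames on disk (the filenames and parameter values are illustrative), pairs Shi-Tomasi corner detection with pyramidal Lucas-Kanade optical flow:

    import cv2

    prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # placeholder files
    curr = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

    # Detect up to 200 easily recognizable corner points in the first frame.
    corners = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                                      qualityLevel=0.01, minDistance=10)

    # Look for the same points in the subsequent frame.
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, corners, None)
    tracked = next_pts[status.flatten() == 1]
    print(f"tracked {len(tracked)} of {len(corners)} corners")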
Growing numbers of IP cameras and the demand for surveillance cameras with better video quality are driving the global market for video surveillance systems. Sending HD-resolution images from millions of IP cameras to the cloud would require prohibitive network bandwidth, so intelligence is needed in the camera itself to filter the video and transmit only the relevant cases (e.g., only frames in which a pedestrian is detected, with the background subtracted) for further analysis.
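One plausible shape for that in-camera filtering is sketched below using OpenCV's stock MOG2 background subtractor. The camera index and the 1% motion threshold are assumptions, and a real system would run a pedestrian detector on the flagged frames rather than simply printing:

    import cv2

    cap = cv2.VideoCapture(0)                # camera index 0 is an assumption
    subtractor = cv2.createBackgroundSubtractorMOG2()

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)       # nonzero pixels = moving foreground
        if (mask > 0).mean() > 0.01:         # >1% of pixels in motion
            print("motion detected: this frame would be transmitted")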
Candidates for control by vision-based gesture recognition include automotive infotainment and industrial applications, where touch screens are either dangerous or impractical. For consumer electronics, gaming, and virtual reality, gesture recognition can provide a more direct interface for machine interaction.
SLAM is the capability of mobile systems to create a map of an area they are navigating. This has applications in areas like self-driving cars, robot vacuum cleaners, augmented reality games, virtual reality applications, and planetary rovers.
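A full SLAM system is well beyond a short listing, but its visual front end can be sketched: ORB features matched across two views yield the raw correspondences from which camera motion and map points are later estimated. The filenames and feature count here are illustrative:

    import cv2

    img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)   # placeholder files
    img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

    # Detect and describe ORB features in each view.
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Brute-force Hamming matching with cross-checking for robustness.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print(f"{len(matches)} putative correspondences between the two views")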
ADAS systems enhance and automate vehicle functions to improve safety and driving, for example by detecting lanes, other cars, road signs, pedestrians, cyclists, or animals in the path of a car. Vision-based automotive safety systems, shown in Figure 1.10, are an emerging high-volume embedded vision application. A few automakers, such as Volvo, have begun to install vision-based safety systems in certain models. These systems perform a variety of functions, including warning the driver, and in some cases applying the brakes, when a forward collision is imminent or a pedestrian is in danger of being struck.
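As a hedged sketch of one such building block, OpenCV ships a stock HOG-plus-linear-SVM pedestrian detector that can be applied to a road-scene image. The filename and window stride are illustrative, and production ADAS detectors are considerably more sophisticated:

    import cv2

    # OpenCV's built-in HOG descriptor with its default people detector.
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    img = cv2.imread("road.png")             # placeholder filename
    boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))
    for (x, y, w, h) in boxes:
        cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)),
                      (0, 255, 0), 2)
    cv2.imwrite("road_detections.png", img)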
Another example of an emerging high-volume embedded vision application is the "smart" surveillance camera, a camera with the ability to detect certain kinds of activity. For example, the Archerfish Solo, a consumer-oriented smart surveillance camera, can be programmed to detect people, vehicles, or other motion in user-selected regions of the camera's field of view.
FIGURE 1.10. Mobileye driver assistance system.
In recent decades, embedded computer vision systems have been deployed in applications such as target tracking for missiles and automated inspection in manufacturing plants. Now, as lower-cost, lower-power, higher-performance processors emerge, embedded vision is beginning to appear in high-volume applications. Perhaps the most visible of these is the Microsoft Kinect, a peripheral for the Xbox 360 game console that uses embedded vision to let users control video games simply by gesturing and moving their bodies. The success of the Kinect for the Xbox 360 (Figure 1.11), which was subsequently expanded to support PCs, together with the vision support in successor-generation consoles from both Microsoft and Sony, demonstrates that people want to control their machines using natural language and gestures. Practical computer vision technology has finally evolved to make this possible in a range of products that extend well beyond gaming. The Kinect, with its 3D motion capture, facial recognition, and voice recognition, became one of the fastest-selling consumer electronics devices. Such highly visible applications are creating consumer expectations for systems with visual intelligence, while increasingly powerful, low-cost, energy-efficient processors and sensors are making the widespread use of embedded vision practical.
FIGURE 1.11. Microsoft Kinect for Xbox 360, a gesture-based game controller.
An innovative technology tracks the facial responses of Internet users while they view content online, allowing companies to monitor users' real-time reactions to their advertisements.
A smartphone application detects signs of skin cancer in moles on the human body. The application allows a person to photograph a mole with the smartphone and receive an instant analysis of its status: using a complex algorithm, it reports whether the mole is suspicious and advises whether the person should seek treatment. The application also helps the person find an appropriate dermatologist in the immediate vicinity. Other revolutionary medical applications utilizing embedded vision include an iPhone app that reads heart rate and a device that assists the blind by using a camera to interpret real objects and communicate them to the user as auditory indications.
In the automotive industry, a new system incorporates gesture and face recognition to reduce distractions while driving. The use of face recognition for security purposes has been well documented; this system goes further, interpreting nods, winks, and hand movements to execute specific functions within the car. For example, a wink of the eye turns the car radio on or off, and tilting the head left or right moves the volume up or down. Since many road accidents are the result of drivers trying to multitask, this application could potentially save many lives.
Vision-processing-based products have established themselves in a number of industrial applications, the most prominent being factory automation, where the application is commonly referred to as machine vision. The primary factory automation sectors are:
■ Automotive—motor vehicle and related component manufacturing
■ Chemical and Pharmaceutical—chemical and pharmaceutical manufacturing plants and related industries
■ Packaging—packaging machinery, packaging manufacturers, and dedicated packaging companies not aligned to any one industry
■ Robotics—guidance of robots and robotic machines
■ Semiconductors and Electronics—semiconductor machinery makers, semiconductor device manufacturers, electronic equipment manufacturing, and assembly facilities
The primary embedded vision products used in factory automation applications are:
■ Smart Sensors—A single unit designed to perform a single machine vision task. Smart sensors require little or no configuration and have limited on-board processing. Frequently a lens and lighting are also incorporated into the unit.
■ Smart Cameras—A single unit that incorporates a machine vision camera, a processor, and I/O in a compact enclosure. Smart cameras are configurable and so can be used for a number of different applications. Most have the facility to change lenses and are also available with built-in LED lighting.
■ Compact Vision Systems—A complete machine vision system, not based on a PC, consisting of one or more cameras and a processor module. Some products incorporate an LCD screen as part of the processor module, which obviates the need to connect the device to a monitor for setup. The principal feature that distinguishes compact vision systems (CVS) from smart cameras is their ability to take information from a number of cameras, which can be more cost-effective where an application requires multiple images.
■ Machine Vision Cameras (MV Cameras)—Devices that convert an optical image into an analog or digital signal. The signal may be stored in random access memory within the device, but not processed there.
■ Frame Grabbers—A device (usually a PCB card) for interfacing the video output from a camera with a PC or other control device; frame grabbers are sometimes called video capture boards or cards. They vary from simple interfaces to more complex devices that can handle many functions, including triggering, exposure rates, shutter speeds, and complex signal processing.
■ Machine Vision Lenses—This category includes all lenses used in a machine vision application, whether sold with a camera or as a spare or additional part.
■ Machine Vision Software—This category includes all software that is sold as a product in its own right and is designed specifically for machine vision applications. It is split into:
● Library Software—allows users to develop their own MV system architecture. There are many different types, some offering great flexibility. They are often called SDKs (Software Development Kits).
● System Software—designed for a particular application. Some packages are very comprehensive and require little or no setup.
Embedded vision and video analysis have the potential to become a primary diagnostic tool in hospitals and clinics, increasing the efficiency and accuracy of radiologists and clinicians. The high quality and definition of the output from scanners and X-ray machines make it ideal for automatic analysis, be it for tumor and anomaly detection or for monitoring changes over a period of time, as in dental offices or cancer screening. Other applications include motion analysis systems, which are used for gait analysis in injury rehabilitation and physical therapy. Video analytics can also be used in hospitals to monitor the medical staff, ensuring that all rules and procedures are properly followed.
For example, video analytics can ensure that doctors "scrub in" properly before surgery and that patients are visited at the proper intervals. Medical imaging devices, including CT, MRI, mammography, and X-ray machines, embedded with computer vision technology and connected to medical images taken earlier in a patient's life, will give doctors very powerful tools for detecting rapidly advancing diseases in a fraction of the time currently required. Computer-aided detection or computer-aided diagnosis (CAD) software is also being used in early-stage deployments to assist doctors in analyzing medical images by helping to highlight potential problem areas.