
Tuesday, June 4, 2013

Image Processing Techniques for Video Content Extraction

ABSTRACT

"A picture is worth a thousand words", the message we are getting from an image. Visual information has been playing an important role in our everyday life.
The significant challenge in large multimedia databases is the provision of efficient means for semantics indexing and retrieval of visual information. The video has low resolution and the often has poor contrast with a changing background. Problems in segmenting text from video are similar to those faced detection and localization phases. The main motivation for extracting the content of information is the accessibility problem. A problem that is even more relevant for dynamic multimedia data, which also have to be searched and retrieved. While content extraction techniques are reasonably developed for text, video data still is essentially opaque. Its richness and complexity suggests that there is a long way to go in extracting video features, and the implementation of more suitable and effective processing procedures is an important goal to be achieved.
This report describes brief introduction of Video and Image processing, common Image Processing Techniques, Basics of Video Processing, current researches about Video Indexing and Retrieval, Basic Requirements, Image processing techniques for Video content extraction and some applications like Videocel, COBRA Model.
.The video text extraction problem is divided into three main tasks- 1. Detection, 2. Localization, 3. Segmentation. The present development of multimedia technology and information highways has put content processing of visual media at the core of key application domains: digital and interactive  video, large distributed digital libraries, multimedia publishing.

1. Introduction

1.1 Basis of Video and Image Processing:

This chapter introduces the basics of video and image processing. An image or video is stored in the computer only as a set of pixels with RGB values; the computer knows nothing about the meaning of these pixel values. The content of an image is quite clear to a person, but it is not so easy for a computer. For example, it is a piece of cake for you to recognize yourself in an image or video, even in a crowd, but this is extremely difficult for a computer. Preprocessing helps the computer understand the content of an image or video. What is this so-called content? Here, content means features of the image or video or of their objects, such as color, texture, resolution, and motion. An object can be viewed as a meaningful component in an image or video picture: a moving car, a flying bird, and a person are all objects. There are many techniques for image and video processing. This chapter starts with an introduction to general image processing techniques and then discusses video processing techniques. We introduce image processing first because image processing techniques can also be applied to video if we treat each picture of a video as a still image.

2. Background

A few years ago, the problems of representation and retrieval of visual media were confined to specialized image databases (geographical, medical, pilot experiments in computerized slide libraries), to the professional applications of the audiovisual industries (production, broadcasting and archives), and to computerized training or education. The present development of multimedia technology and information highways has put content processing of visual media at the core of key application domains: digital and interactive video, large distributed digital libraries, multimedia publishing. Though the most important investments have been targeted at the information infrastructure (networks, servers, coding and compression, delivery models, multimedia systems architecture), a growing number of researchers have realized that content processing will be a key asset in putting together successful applications. The need for content processing techniques has been made evident from a variety of angles, ranging from achieving better quality in compression, allowing user choice of programs in video-on-demand, achieving better productivity in video production, providing access to large still image databases or integrating still images and video in multimedia publishing and cooperative work.

3. Common Image Processing techniques
3.1 Dithering:
Dithering is a process of using a pattern of solid dots to simulate shades of gray. Different shapes and patterns of dots have been employed in this process, but the effect is the same. When viewed from a great enough distance that the dots are not discernible, the pattern appears as a solid shade of gray.
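As an illustration, ordered dithering can be done in a few lines. The sketch below is a minimal NumPy version assuming a grayscale image normalized to [0, 1]; the 4x4 Bayer matrix is a standard choice of dot pattern, not anything prescribed by the text.

    import numpy as np

    # 4x4 Bayer matrix, scaled to thresholds in [0, 1).
    BAYER4 = np.array([[ 0,  8,  2, 10],
                       [12,  4, 14,  6],
                       [ 3, 11,  1,  9],
                       [15,  7, 13,  5]]) / 16.0

    def ordered_dither(gray):
        # Tile the threshold pattern over the image and binarize against it.
        h, w = gray.shape
        thresh = np.tile(BAYER4, (h // 4 + 1, w // 4 + 1))[:h, :w]
        return (gray > thresh).astype(np.uint8)  # 1 = dot, 0 = background

Viewed from far enough away, the varying density of dots produced this way reads as intermediate shades of gray.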

3.2 Erosion
Erosion is the process of eliminating all the boundary points from an object, leaving the object smaller in area by one pixel all around its perimeter. If the object narrows to less than three pixels thick at any point, it will become disconnected (into two objects) at that point. Erosion is useful for removing objects that are too small to be of interest from a segmented image.
Shrinking is a special kind of erosion in which single-pixel objects are left intact. This is useful when the total object count must be preserved.
Thinning is another special kind of erosion, implemented as a two-step process. The first step marks all candidate pixels for removal; the second step actually removes those candidates that can be removed without destroying object connectivity.

 3.3 Dilation:

Dilation is the process of incorporating into the object all the background pixels that touch it, leaving it larger in area by that amount. If two objects are separated by less than three pixels at any point, they will become connected (merged into one object) at that point. It is useful for filling small holes in segmented objects.
Thickening is a special kind of dilation, implemented as a two-step process. The first step marks all the candidate pixels for addition; the second step adds those candidates that can be added without merging objects.

3.4 Opening:

The process of erosion followed by dilation is called opening. It has the effect of eliminating small and thin objects, breaking objects at thin points, and generally smoothing the boundaries of larger objects without significantly changing their area.

3.5 Closing:

The process of dilation followed by erosion is called closing. It has the effect of filling small and thin holes in objects, connecting nearby objects, and generally smoothing the boundaries of objects without significantly changing their area.
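For a concrete view of these four operations, here is a minimal OpenCV sketch; the filename segmented.png and the 3x3 structuring element are illustrative assumptions.

    import cv2
    import numpy as np

    # Load a segmented image and force it to a clean binary mask.
    mask = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

    kernel = np.ones((3, 3), np.uint8)  # 3x3 structuring element

    eroded  = cv2.erode(mask, kernel)   # peel one pixel off every boundary
    dilated = cv2.dilate(mask, kernel)  # grow objects by one pixel all around
    opened  = cv2.morphologyEx(mask, cv2.MORPH_OPEN,  kernel)  # erosion, then dilation
    closed  = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # dilation, then erosion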

3.6 Filtering:

Image filtering can be used for noise reduction, image sharpening, and image smoothing. By applying a low-pass or high-pass filter to an image, the image can be smoothed or sharpened respectively. A lowpass filter reduces the amplitude of high-frequency components; simple lowpass filters apply local averaging, where the gray level at each pixel is replaced with the average of the gray levels in a square or rectangular neighborhood. The Gaussian lowpass filter applies a Fourier transform to the image. A highpass filter increases the amplitude of high-frequency components and is useful for detecting edges and fine detail.
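To make the low-pass/high-pass distinction concrete, the following sketch smooths, extracts detail, and sharpens with OpenCV; the filename frame.png, the 5x5 window, and the unsharp-masking weights are assumptions for illustration.

    import cv2

    img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # Lowpass: local averaging over a 5x5 neighborhood smooths the image.
    smoothed = cv2.blur(img, (5, 5))

    # Highpass: subtracting the smoothed image leaves the high-frequency detail.
    detail = cv2.subtract(img, smoothed)

    # Sharpening (unsharp masking): add the detail back to emphasize edges.
    sharpened = cv2.addWeighted(img, 1.5, smoothed, -0.5, 0)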
3.7 Segmentation:

Image segmentation is the process of dividing an image into regions, where a region is a set in which all the pixels are adjacent or touching. Within each region, there are some common features among the pixels, such as color, intensity, or texture. When a human observer views a scene, his or her visual system automatically segments the scene; the process is so fast and efficient that one sees not a complex scene, but rather a collection of objects. A computer, however, must laboriously isolate the objects in an image by breaking the image into sets of pixels, each of which is the image of one object.
Image segmentation can be approached in three ways. The first is the region approach, in which each pixel is assigned to a particular object or region. In the boundary approach, only the boundaries that exist between the regions are located. The third is the edge approach, where one tries to identify edge pixels and then link them together to form the required boundaries.
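As a small sketch of the region approach, the code below thresholds an image with Otsu's method and then groups adjacent foreground pixels into regions with connected-component labeling; the filename and the use of Otsu's method are illustrative choices, not taken from the text.

    import cv2

    img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # Otsu's method picks a global threshold automatically; connected-component
    # labeling then assigns adjacent foreground pixels to the same region.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    num_labels, labels = cv2.connectedComponents(binary)
    print("found", num_labels - 1, "regions")  # label 0 is the background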


3.8 Object Recognition:
The most difficult part of image processing is object recognition. Although there are many image segmentation algorithms that can segment an image into regions with some continuous feature, it is still very difficult to recognize objects from these regions. There are several reasons for this. First, image segmentation is an ill-posed task, and there is always some degree of uncertainty in the segmentation result. Second, an object may contain several regions, and how to connect different regions is another problem. At present, no algorithm can segment general images into objects automatically with high accuracy. When there is some a priori knowledge about the foreground objects or background scene, the accuracy of object recognition can be quite good. Usually the image is first segmented into regions according to patterns of color or texture; separate regions are then grouped to form objects. The grouping process is important for the success of object recognition. Fully automatic grouping is possible only when a priori knowledge about the foreground objects or background scene exists; in other cases, human interaction may be required to achieve good accuracy of object recognition.

4. Basis of Video Processing
4.1 Content of Digital Video

Generally speaking, there is much similarity between digital video and images. Each picture of a video can be treated as a still image, and all the techniques applicable to images can also be applied to video pictures. However, there are still differences. The most significant difference is that video has temporal information and uses motion estimation for compression. Video is a meaningful group of pictures that tells a story or something else. Video pictures can be grouped into shots: a video shot is a set of pictures taken in one camera break. Within each shot, there can be one or more key pictures. A key picture is a representative of the content of a video shot; for a long video shot, there may be multiple key pictures. Usually, video processing segments video into separate shots, selects key pictures from these shots, and then generates features of these key pictures. The features (color, texture, objects) of key pictures are what is searched in a video query.
Video processing includes shot detection, key picture selection, feature generation, and object extraction.

4.1.1 Shot Detection:
Shot detection is the process of detecting camera shots. A camera shot consists of one or more pictures taken in one camera break. The general approach to shot detection is to define a difference metric: if the difference between two pictures is above a threshold, there is a shot boundary between them. One proposed algorithm uses binary search to detect shots, which makes it very fast while achieving good performance as well. Recently, some algorithms detect shots directly on MPEG compressed data.
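The difference-metric idea can be sketched with a histogram comparison between consecutive frames; the 64-bin histogram and the 0.5 threshold below are illustrative values, not parameters from the text.

    import cv2
    import numpy as np

    def detect_shots(path, threshold=0.5):
        # Flag a shot boundary wherever the histogram difference between
        # consecutive frames exceeds the threshold.
        cap = cv2.VideoCapture(path)
        boundaries, prev_hist, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
            hist = cv2.normalize(hist, None).flatten()
            if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
                boundaries.append(idx)
            prev_hist, idx = hist, idx + 1
        cap.release()
        return boundaries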

4.1.2 Key Picture Selection:
After shot detection, each shot is represented by at least one key picture. The choice of key picture can be as simple as a particular picture in the shot: the first, the last, or the middle. However, in situations such as a long shot, no single picture can represent the content of the entire shot. QBIC uses a synthesized key picture created by seamlessly mosaicking all the pictures in a given shot using the computed motion transformation of the dominant background; this picture is an authentic depiction of all the background captured in the whole shot. In the CBIRD system, key picture selection is a simple process that usually chooses the first and last pictures of a shot as key pictures.

4.1.3 Feature Generation:
After key picture selection, features of the key pictures such as color, texture, and intensity are stored as indexes of the video shot. Users can perform traditional search using keyword queries as well as content-based queries by specifying a color, intensity, or texture pattern. Only the generated features are searched, so retrieval can run in real time.

4.1.4 Object Extraction:
During shot detection and key picture selection, the objects in the video are also extracted, using image segmentation techniques or motion information. Segmentation-based techniques are mainly based on image segmentation, with objects recognized and tracked by segmentation projection. Motion-based techniques make use of motion vectors to distinguish objects from the background and keep track of their motion. Object extraction is a very difficult problem. The new MPEG-4 standard addresses how to obtain objects in the video and encode them separately into different layers. This process should not be entirely manual, but it is also unrealistic to expect it to be fully automatic.

5. Current Research about Video Indexing and Retrieval

Video indexing and retrieval is a very active research area. In the field of digital video, computer-assisted content-based indexing is a critical technology and currently a bottleneck in the productive use of video resources. Only an indexed video can effectively support retrieval and distribution in video editing, production, video-on-demand, and multimedia information systems. To achieve this, we need algorithms and systems that provide the ability to store and retrieve video in a way that allows flexible and efficient search based on content. In this chapter, we discuss some important aspects of the state of the art in video indexing and retrieval. It is organized as follows:
·         Video Parsing
·         Video Indexing and Retrieval
·         Object Recognition and Motion Tracking

5.1 Video Parsing:
The first step of video processing is video parsing. Video parsing is a process to segment video stream into generic shots. These shots are the elementary index unit in a video database, just like a word in a text database. Then each of these shots will be represented by one or more key pictures. Only these key pictures are stored into the video database. There are several tasks in video parsing, including shot detection and key picture selection.

5.1.1 Shot Detection in video parsing:
The first step of video parsing is shot detection. Shot detection algorithms usually belong to two classes: (1) those based on global representations like color/intensity histograms, without any local information, and (2) those based on measuring local differences like intensity change. The former are relatively insensitive to motion but can miss shots when scenes look quite different yet have similar distributions. The latter are sensitive to moving objects and camera motion. Some systems combine the advantages of the two classes by using a mixed method; QBIC is one of these systems.

5.1.2 Key Picture Selection in video parsing
The next step after shot detection is key picture selection. Each shot has at least one key picture, chosen to best represent the visual content of the shot. The number of key pictures per shot can be constant or can adapt to the shot content. The first picture is selected as a key picture, and subsequent pictures are compared against this candidate. A two-threshold technique, similar to the one used for shot detection, is applied to identify a picture significantly different from the candidate; this new picture is considered another key picture, and the subsequent pictures are compared against this new candidate. Users can control the density of key pictures by adjusting the two threshold values.
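A minimal sketch of this style of selection, assuming one feature vector per frame (e.g., a normalized color histogram); the difference measure and the t_similar value are assumptions.

    import numpy as np

    def select_key_frames(hists, t_similar=0.3):
        # The first frame is a key frame; a new key frame starts whenever a
        # frame differs enough from the current candidate.
        keys, candidate = [0], hists[0]
        for i, h in enumerate(hists[1:], start=1):
            if np.abs(h - candidate).sum() > t_similar:
                keys.append(i)
                candidate = h
        return keys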

5.1.3 Feature Generation in video parsing
After key picture selection, features of the key pictures such as color, texture, and intensity are stored as indexes of the video shot, as described in Section 4.1.3. Users can perform traditional keyword queries and content-based queries by specifying a color, intensity, or texture pattern; only the generated features are searched, so retrieval can run in real time.

5.2 Video Indexing and Retrieval
After each object in a video shot has been segmented and tracked, its features such as color, texture, and motion can be obtained and stored in a feature database. The resulting database is a simple set of (feature, value) pairs, and the actual query is performed on this feature database. For each feature, there is a function to calculate the distance between the query object and the tracked objects in the video database. The total distance is a weighted sum of these per-feature distances; if the total distance is below a certain threshold, the object is returned as a possible match.
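The weighted-sum matching described above might look like the following sketch, where the feature names, weights, and threshold are hypothetical:

    import numpy as np

    WEIGHTS = {"color": 0.5, "texture": 0.3, "motion": 0.2}  # assumed weights

    def total_distance(query, candidate):
        # Weighted sum of per-feature distances between two objects.
        return sum(w * np.abs(query[f] - candidate[f]).sum()
                   for f, w in WEIGHTS.items())

    def search(query, database, threshold=1.0):
        # Return every object whose weighted distance falls below the threshold.
        return [obj for obj in database if total_distance(query, obj) < threshold]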
There are also image retrieval systems such as the Yahoo Image Surfer Category List, WebSeer, WebSeek, VisualSeek, UCB's query-all-images, Lycos, and MIT's Photobook. Some of them are mainly based on keyword searching: the images are first assigned one or more keywords manually and categorized into groups such as photos, arts, people, animals, and plants. Users can then browse through whichever category interests them.

5.2.1 Examples of some image processing systems:
Examples include the Yahoo Image Surfer Category List (YISCL) and Lycos. The YISCL system also provides a visual search function based on color distribution matching. UCB's query-all-images work presents several interesting ideas such as "blobworld" and "body plans". A blobworld is a region-based representation: while it does not exist completely in the "thing" domain, it recognizes the nature of images as combinations of objects, and querying and learning in blobworld are more meaningful than they are with simple "stuff" representations. The Expectation-Maximization (EM) algorithm is used to perform automatic segmentation based on image features; after segmentation, each region is shown as an elliptic blob. A body plan is an algorithm for image segmentation: a body plan is a sophisticated model of the way, say, a horse is put together, and as a result the program is capable of recognising horses in different aspects. MIT's Photobook allows users to perform texture modeling, face recognition, shape matching, brain matching, and interactive segmentation and annotation. WebSeek allows users to draw a query that depicts the spatial relations between objects.

5.3 Object Recognition and Motion Tracking
Object recognition and motion tracking are important topics. In video, the reliability of object recognition is higher than in still images because more information is available; the most valuable information is the motion vectors. The motion vectors of a moving object have intrinsic patterns that conform to a motion model, and several papers discuss object recognition using affine motion models.

6. Basic Requirements:
 

6.1 Video Data Modeling
          In a conventional database management system (DBMS), access to data is based on distinct attributes of well-defined data developed for a specific application. For unstructured data such as audio, video, or graphics, similar attributes can be defined. A means for extracting information contained in the unstructured data is required. Next, this information must be appropriately modeled in order to support both user queries for content and data models for storage.
 

Fig 1:  First Stage in Video Data Adaptation: Data Modeling

From a structural perspective, a motion picture can be modeled as data consisting of a finite length of synchronized audio and still images. This model is a simple instance of the more general models for heterogeneous multimedia data objects. Davenport et al. describe the fundamental film component as the shot: a contiguously recorded audio/image sequence. To this basic component, attributes such as content, perspective, and context can be assigned, and later used to formulate specific queries on a collection of shots. Such a model is appropriate for providing multiple views on the final data schema and has been suggested by Lippman and Bender.

Smith and Davenport use a technique called stratification for aggregating collections of shots by contextual descriptions called strata. These strata provide access to frames over a temporal span rather than to individual frames or shot endpoints. The technique is used primarily for editing and creating movies from source shots, and it also provides quick query access and a view of desired blocks of video. Because of the linearity of the medium, we cannot normally get a coherent description of an item; as a result of the stratification method, the related information is lumped together. The linear integrity of the raw footage is erased, resulting in contextual information that relates the shot to its environment.

Rowe et al. have developed a video-on-demand system for video data browsing. In this system the data are modeled based on a survey of what users would query for. Three types of indices were identified to satisfy the user queries. The first is a textual bibliographic index, which includes information about the video and the individuals involved in making it. The second is a textual structural index of the hierarchy of the movie, i.e., segments, scenes, and shots. The third is a content index, which includes keyword indices for the audio track, object indices for significant objects, and key images in the video representing important events.
The above model does not utilize the semantics associated with video data. Different video data types have different semantics associated with them. We must take advantage of this fact and model video data based on the semantics associated with each data type.

6.2 Video indexing
Video annotation or indexing is the process of attaching content-based labels to video; more precisely, video indexing extracts from the video data the temporal location of a feature and its value.

6.2.1 Need of video indexing
Indexing video data is essential for providing content-based access. Indexing has typically been viewed either from a manual annotation perspective or from an image sequence processing perspective. The indexing effort is directly proportional to the granularity of video access; as applications demand finer-grained access to video, automation of the indexing process becomes essential. Given the current state of the art in computer vision, pattern recognition, and image processing, reliable and efficient automation is possible for low-level video indices, like cuts and image motion properties.
Existing work on content-based video access and video indexing can be grouped into three main categories.

6.2.1.1 High level indexing
The work by Davis is an excellent instance of high-level indexing. This approach uses a set of predefined index terms for annotating video. The index terms are organized based on high-level ontological categories like action, time, and space.
The high level indexing techniques are primarily designed from the perspective of manual indexing or annotation. This approach is suitable for dealing with small quantities of new video and for accessing previously annotated databases. 

6.2.1.2 Low level indexing
These techniques provide access to video based on properties like color and texture, and can be classified under the label of low-level indexing.
The driving force behind this group of techniques is to extract data features from the video, organize the features based on some distance metric, and use similarity-based matching to retrieve the video. Their primary limitation is the lack of semantics attached to the features.

6.2.1.3 Domain specific indexing
These techniques use the high level structure of video to constrain the low level video feature extraction and processing. These techniques are effective in their intended domain of application. The primary limitation of these techniques is their narrow range of applicability. 

6.3 Video Data Management
We want to know how to extract content from segmented video shots and then index it effectively so that users can retrieve and browse large video collections. Management of sequential video streams includes three steps: parsing, content extraction and indexing, and retrieval and browsing.
Video parsing is the process of detecting scene changes, i.e., the boundaries between camera shots in a video stream. The video stream is segmented into generic clips; these clips are the elemental index units in a video database, just like words in a text database. Each of these clips is then represented visually by its key frames. To reduce the mass storage requirements, only these key frames are stored in the database, along with indices for their location. There are two types of transitions: abrupt transitions (camera breaks) and gradual transitions, e.g., fade-in, fade-out, dissolve, and wipe.
Indexing tags video clips as the system inserts them into the database. The tag includes information based on a knowledge model that guides the classification according to the semantic primitives of the images; indexing is thus driven by the image itself and by any semantic descriptors provided by the model. Two types of indices, text-based and image-based, are needed. The text-based index is typed in by a human operator based on the key frames, using a content logger. The image-based index is constructed automatically from the image features extracted from the key frames.
Retrieval and browsing allow users to access the database through queries based on text and/or visual examples, or to browse it through interaction with displays of meaningful icons; users can also browse the results of a retrieval query. It is important that both retrieval and browsing appeal to the user's visual intuition. With a visual query, users want to find video shots that look similar to a given example; with a concept query, users want to find video shots by the presence of specific objects or events. Visual queries can be realized by directly comparing low-level visual features like color, texture, shape, and temporal variance of video shots or their representative (key) frames. Concept queries, on the other hand, depend on object detection, tracking, and recognition. Since fully automatic object extraction is still impossible, some degree of user interaction is necessary in this process, but manual indexing labor can be greatly reduced with the help of video analysis techniques.

7. Image Processing Techniques For Video Content Extraction
 


The increase in the diversity and availability of electronic information has led to additional processing requirements in order to retrieve relevant and useful data: the accessibility problem. This problem is even more relevant for audiovisual information, where huge amounts of data have to be searched, indexed, and processed. Most solutions to this type of problem point towards a common need: to extract relevant information features for a given content domain, a process that involves two difficult tasks: deciding what is relevant and extracting it. In fact, while content extraction techniques are reasonably developed for text, video data is still essentially opaque. Despite its obvious advantages as a communication medium, the lack of suitable processing and communication platforms has delayed its adoption in a generalized way. This situation is changing, and new video-based applications are being developed.

7.1 Toolkit overview
videoCEL is basically a library for video content extraction. Its components extract relevant features of video data and can be reused by different applications. The object model includes components for video data modelling and tools for processing and extracting video content, though currently the processing is restricted to images.
At the data modelling level, the most significant concepts are the following:
· Images, representing the frame data: a numerical matrix whose values can be colors, color map entries, etc.;
· ColorMaps, which map entries into a color space, allowing an additional indexation level;
· ImageDisplayConverters and ImageIOHandlers, which convert images to and from the specific formats of the platforms.
The object model of videoCEL is a subset of a more complete model, which also includes concepts such as shots, shot sequences, and views; these concepts are modelled in a distinct toolkit that provides functionality for indexing, browsing, and playing annotated video segments.
A shot object is a discrete sequence of images with a set of temporal attributes, such as frame rate and duration, and represents a video segment. A shot sequence object groups several shots using some semantic criterion. Views are used to visualize and browse shots and shot sequences.
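The shot and shot-sequence concepts could be modelled along the following lines; this is a rough sketch of the described data model, not the actual videoCEL API, and all names are illustrative.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Shot:
        # A discrete sequence of images with temporal attributes.
        frames: list
        frame_rate: float = 25.0

        @property
        def duration(self) -> float:
            return len(self.frames) / self.frame_rate

    @dataclass
    class ShotSequence:
        # Groups several shots according to some semantic criterion.
        criterion: str
        shots: List[Shot] = field(default_factory=list)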

7.2 Temporal segmentation tools
One of the most important tasks in video analysis is to specify a set of units in which the video temporal sequence may be organized. The different video transitions are important for video content identification and for defining the semantics of the video language, making their detection one of the primary goals to be achieved. The basic assumption of transition detection procedures is that video segments are spatially and temporally continuous, so the boundary images must suffer significant content changes, which depend on the transition type and can be measured. The original problem is thus reduced to the search for suitable difference quantification metrics, whose maxima identify, with high probability, the temporal locations of transitions.

7.3 Cut detection
The process of detecting cuts is quite simple, mainly because the changes in content are very visible and always occur instantaneously between consecutive frames. The implemented algorithm simply uses one of the quantification metrics, and a cut is declared when the differences are above a certain threshold; its success is therefore greatly dependent on the suitability of the metric. The results obtained by applying this procedure to some of our metrics are presented next. The thresholds were selected empirically, trying to maximize the success of the detection (minimizing both false and missed detections). The captured video segment belongs to an outdoor news report, so its transitions are not very "artistic" (mainly cuts). There are several well-known strategies that usually improve this detection. For instance, adaptive thresholds increase the flexibility of the thresholding, allowing the algorithm to adapt to diverse video content. An approach used with some success in previous work, to compensate for the specific behavior of individual metrics, was simply to produce a weighted average of the differences obtained with two or more metrics. Pre-processing the images with noise filters or lower-resolution operators is also quite usual, reducing both image noise and processing complexity. Treating image regions separately, in order to eliminate some of the more extreme values, remarkably increases the detection accuracy, especially when only a few objects are moving in the captured scene.

7.4 Gradual transition detection
Gradual transitions, such as fades, dissolves, and wipes, cause more gradual changes which evolve over several images. Although the resulting differences are less distinct from the average values, and can resemble those caused by camera operations, there are several successful procedures, which were adapted and are currently supported by the toolkit.

7.4.1 Twin-Comparison algorithm
This algorithm was developed after verifying that, although the first and last frames of a transition are quite different, consecutive images remain very similar. Thus, as in cut detection, the procedure uses one of the difference metrics but with two thresholds instead of one: a higher threshold for cuts, and a lower one for gradual transitions. While this algorithm only detects gradual transitions and distinguishes them from cuts, there are other approaches which also classify fades, dissolves, and wipes.
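A rough sketch of the two-threshold logic: a single large difference signals a cut, while a run of moderate differences whose accumulated change grows large signals a gradual transition. The threshold values are placeholders.

    def twin_comparison(diffs, t_cut=0.5, t_grad=0.15):
        # diffs[i] is the difference metric between frames i and i+1.
        events, acc, start = [], 0.0, None
        for i, d in enumerate(diffs):
            if d > t_cut:                       # one large jump: a cut
                events.append(("cut", i))
                acc, start = 0.0, None
            elif d > t_grad:                    # moderate change: keep accumulating
                if start is None:
                    start = i
                acc += d
            else:                               # change died down: close the run
                if start is not None and acc > t_cut:
                    events.append(("gradual", start, i))
                acc, start = 0.0, None
        if start is not None and acc > t_cut:   # flush a run ending at the last frame
            events.append(("gradual", start, len(diffs)))
        return events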

7.4.2 Edge-Comparison algorithm
This algorithm analyses both edge change fractions, exiting and entering. Distinct gradual transitions generate characteristic variations of these values. For instance, a fade-in always generates an increase in the entering edge fraction; conversely, a fade-out causes an increase in the exiting edge fraction; a dissolve has the same effect as a fade-out followed by a fade-in.

7.5 Camera operation detection
As distinct transitions give different meanings to adjacent video segments, the possible camera operations are also relevant for content identification. For example, that information can be used to build salient stills and to select key frames or segments for video representation. All the methods which detect and classify camera operations start from the following observation: each operation generates characteristic global changes in the captured objects and background. For example, when a pan happens they move horizontally in the direction opposite the camera motion; the behavior of tilts is similar but on the vertical axis; zooms generate convergent or divergent motion.

7.5.1 X-ray based method
This approach basically produces fingerprints of the global motion flow. After extracting the edges, each image is reduced to its horizontal and vertical projections, a column and a row, that roughly represent the horizontal and vertical global motions; these are usually referred to as the x-ray images.
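Computing such projections is straightforward; in the sketch below, the Canny edge detector and its thresholds are assumptions, since the text does not specify the edge extractor.

    import cv2

    def xray_projections(frame):
        # Reduce an edge image to its column and row sums: coarse signatures
        # of horizontal and vertical global motion ("x-ray" images).
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 100, 200)
        return edges.sum(axis=0), edges.sum(axis=1)  # per-column, per-row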

7.6 Lighting conditions characterization
Light effects are usually mentioned in the grammar of cinema language, as their contribution is essential to the overall meaning of video content. The lighting conditions can be easily extracted by observing the distribution of the light intensity histogram: its mode, mean, and spread are valuable in characterising its distribution type. These features also allow the lighting variations to be quantified, once the similarity of the images is determined.
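A minimal sketch of such histogram statistics for a grayscale frame; treating the standard deviation as the "spread" is an assumption on my part.

    import cv2
    import numpy as np

    def lighting_stats(gray):
        # Summarize the light-intensity histogram by mode, mean, and spread.
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).flatten()
        levels = np.arange(256)
        mean = float(np.average(levels, weights=hist))
        spread = float(np.sqrt(np.average((levels - mean) ** 2, weights=hist)))
        return int(hist.argmax()), mean, spread  # mode, mean, spread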

7.7 Scene segmentation
Scene segmentation refers to the decomposition of the image into its main components: objects, background, captions, etc. It is a first step towards identifying and classifying the scene's main features and tracking them throughout the sequence. The simplest implemented segmentation method is amplitude thresholding, which is quite successful when the different regions have distinct amplitudes; it is a particularly useful procedure for binarizing captions. Other methods are described below.

7.7.1 Region-based segmentation
Region-based segmentation procedures identify regions in an image which have similar features. One such algorithm is the split-and-merge algorithm, which first divides the image into atomic homogeneous regions and then merges similar adjacent regions until they are sufficiently different. Two distinct metrics are needed: one for measuring the homogeneity of the initial regions (the variance, or any other difference measure), and another for quantifying the similarity of adjacent regions (the average, median, mode, etc.).
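The "split" half of the algorithm can be sketched as a recursive quadtree subdivision on variance (the merge phase, which would re-join similar neighbors, is omitted); the variance threshold and minimum block size are placeholders.

    import numpy as np

    def split_regions(img, y, x, h, w, var_thresh, out):
        # Recursively split a block into quadrants until it is homogeneous
        # (low variance) or too small, collecting the leaf regions in `out`.
        block = img[y:y + h, x:x + w]
        if block.var() <= var_thresh or min(h, w) <= 2:
            out.append((y, x, h, w))
            return
        h2, w2 = h // 2, w // 2
        for dy, dx, hh, ww in ((0, 0, h2, w2), (0, w2, h2, w - w2),
                               (h2, 0, h - h2, w2), (h2, w2, h - h2, w - w2)):
            split_regions(img, y + dy, x + dx, hh, ww, var_thresh, out)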

7.7.2 Motion-based segmentation
The main idea of motion-based segmentation techniques is to identify image regions with similar motion behaviors. These properties are determined by analysing the temporal evolution of the pixels, carried out on a frequency image produced over the whole image sequence. When the more constant pixels are selected, for example, the resulting image is the background, effectively removing the motion. Once the background is extracted, the same principle can be used to extract and track motion or objects.

7.7.3 Scene and object detection
The process of detecting scenes or scene regions (objects) is, in a certain way, the opposite of transition detection: we want to find image regions whose differences are below a certain threshold. Consequently, this procedure also uses difference quantification metrics. These functions can be computed for the whole image, or a hierarchical, growing-resolution calculation can be performed to accelerate the process. Another tested algorithm, also hierarchical, is based on the Hausdorff distance; it retrieves all the possible transformations (translation, rotation, etc.) between the edges of two images. Another way of extracting objects is by representing their contours. The toolkit uses a polygonal line approach to represent contours as a set of connected segments; the end of a segment is detected when the ratio between the current segment's polygonal area and its length exceeds a certain threshold.

7.7.4 Caption extraction
Based on an existing caption extraction method, a new and more effective procedure was implemented. As captions are usually added to images artificially, the first step of this procedure is to extract high-contrast regions. This task is performed by segmenting the edge image, whose contours have previously been dilated by a certain radius. These regions are then subjected to caption-characteristic size constraints, based on the x-ray (edge projection) properties; only the horizontal clusters remain. The resulting image is segmented and two different images are produced: one with a black background for lighter text, and another with a white background for darker text. The process is completed by binarizing both images and applying further dimensional region constraints.

7.8 Edge detection
Two distinct procedures for edge detection were implemented: (1) gradient-module thresholding, where the image gradient vectors are obtained using the Sobel operator; and (2) the Canny filter, considered the optimal detector, which analyses the representativity of gradient-module maxima and thus produces thinner contours. As differential operators amplify high-frequency zones, it is common practice to pre-process the images using noise filters, a functionality also supported by the toolkit in the form of several smoothing operators: the median filter, the average filter, and a Gaussian filter.
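Both procedures are available in OpenCV; in this sketch the smoothing kernel, the gradient threshold of 100, and the Canny thresholds are ad hoc values for illustration.

    import cv2

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # differential operators amplify noise

    # (1) Gradient-module thresholding with the Sobel operator.
    gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1)
    sobel_edges = (cv2.magnitude(gx, gy) > 100).astype("uint8") * 255

    # (2) The Canny detector keeps only local gradient maxima: thinner contours.
    canny_edges = cv2.Canny(blurred, 50, 150)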

8. Applications
 
8.1 videoCEL applications:
Video browser:
This application is used to visualize video streams. The browser can load a stream and split it into its shot segments using the cut detection algorithms. Each shot is then represented in the browser's main window by an icon, a reduced form of its first frame, and the shots can be played using several view objects.

WeatherDigest:
The WeatherDigest application generates HTML documents from TV weather forecasts. The temporal sequence of maps, presented on the TV, is mapped to a sequence of images in the HTML page. This application illustrates the importance of information models.

News analysis:
A set of applications was developed for social scientists doing content analysis of TV news. The analysis consisted of filling in forms with news item durations, subjects, etc., which these applications attempt to automate. The system generates HTML pages with the images, and CSV (Comma Separated Values) tables suitable for use in spreadsheets such as Excel. These HTML pages can also be used for news browsing, and there is also a Java-based tool for accessing this information.

8.2 COBRA Model
In order to explore video content and provide a framework for automatic extraction of semantic content from raw video data, we propose the COntent-Based RetrievAl (COBRA) video data model. The model is independent of feature/semantic extractors, providing flexibility by allowing different video processing and pattern recognition techniques to be used for that purpose. A feature grammar is exploited to describe the low-level persistent metadata; the grammar also describes the dependencies between the extractors.
At the same time it is in line with the latest development in MPEG-7, distinguishing four distinct layers within video content: the raw data, the feature, the object, and the event layer. The object and event layers are concept layers consisting of entities characterized by prominent spatial and temporal dimensions respectively.
To provide automatic extraction of concepts (objects and events) from visual features (which are extracted using existing video/image processing techniques), the COBRA video model is extended with object and event grammars. These grammars are aimed at formalizing the descriptions of these high-level concepts, as well as facilitating their extraction based on features and spatio-temporal reasoning.
This rule-based approach results in the automatic mapping from features to high-level concepts. However, we still have the problem of creating object and event rules manually. This might be very difficult, especially in the case of certain object rules which require extensive user familiarity with features and object extraction techniques.
As the model also provides a framework for stochastic modeling of events, we have chosen to exploit the learning capability of Hidden Markov Models (HMMs) to recognize events in video data automatically.

                        

Figure 2 - Video sequences

Figure 3 - Video shots

Figure 4 - Principal color detection

Figure 5 - Detected player
SELECT
vi.frame_seq
FROM player pl, video vi
WHERE
Event: vi.frame.e = ((Appr_net_BSL:
((e1:Player_near_the_net, e2:Backhand_slice), (), (), (),
(before(e2,e1,n), n<75, e1.o1.name='Sampras', e1.o1.name=e2.o1.name))))
Fig 6 - Video query

The WHERE clause of the query shown above constrains player profiles to only those documents that contain videos with the event 'Approaching the net with backhand slice stroke'. This new event description, defined inside the query, demonstrates how complex events can be defined dynamically based on previously extracted events and spatio-temporal relations. The first of the two predefined events, Player_near_the_net, is defined using spatio-temporal rules, whereas the second, Backhand_slice, is defined using the HMM approach. The temporal relation requires e1 to start at least 75 frames before event e2. The event descriptions are evaluated by the query processor, which rewrites the event from its conceptual definition into a standard object algebra extended by the COBRA video model and spatio-temporal operations. A user is therefore able to explore video content by specifying very detailed complex queries that include a combination of features, objects, and events, as well as spatio-temporal relations among them.

Conclusion


Visual information has always been an important source of knowledge. With the advances in information, computing, and communication technology, this information, in the form of digital images and digital video, is now widely available through the computer. To cope with this explosion of visual information, an organization of the material which allows for fast search and retrieval is required. This calls for systems which can in some way provide content-based handling of visual information. In this seminar I have tried to present the basic image processing techniques, the status of content-based access to image and video databases, and some applications of video content extraction.
An image retrieval system is necessary for users that have large collections of images, such as a digital library. During the last few years, some content-based techniques for image retrieval have become commercially available. These systems offer retrieval by color, texture, or shape, and smart combinations of these help users find the image they are looking for. A video retrieval system is useful for video archiving, video editing, production, etc.

BIBLIOGRAPHY

Jian Wang, "Basis of Video and Image Processing".
Kjersti Aas and Line Eikvil, "A Survey on Content-Based Access to Image and Video Databases".
Philippe Aigrain, HongJiang Zhang, and Dragutin Petkovic, "Content-Based Representation and Retrieval of Visual Media: A State-of-the-Art Review".
E. Izquierdo, J. R. Casas, R. Leonardi, P. Migliorati, Noel E. O'Connor, I. Kompatsiaris, and M. G. Strintzis, "Advanced Content-Based Semantic Scene Analysis and Information Retrieval: The SCHEMA Project".

Sunday, June 2, 2013

Home Automation - A Comprehensive Insight


ABSTRACT

Automation, in the simplest of terms, means 'to auto': avoiding manual interface with the machine, or in simpler words, 'control without direct interference'. The second important part of this paper is the home, the place where human beings dwell. With advancements in time and technology, automation plays an important role in the home environment. Homes have now (potentially) become places where progressive interaction, from libraries to museums, hospitals to shops, takes place and people receive a number of services. Modern automation thus provides new architectures and components whose implementation in the home offers highly personalized services at an affordable rate. The automation market also has the potential to run parallel to the professional market, provided the necessary technological advances and infrastructure investments take place. The broad scope of home automation includes man-machine interface issues, energy usage, task planning and management, non-intrusive automation systems, privacy issues, and safety protocols, coupled with telecommuting and the paradigms of a changing environment.
Taking into account the widespread domains of home automation, the paper aims to cover the following points:
•    Energy usage and task management for proper coordination of daily chores.
•    Friendly multimodal home interfaces which are 'intelligent'.
•    Non-intrusive systems which go for context awareness and are prepared for 'surprises'.
•    Proper sensor recognition, especially using wireless protocols.
•    Active furniture.
•    Privacy, security and safety.
•    Production paradigms and ethical issues.
So how about an automated home with intelligent beds, autonomous coffee tables, and an 'arm-equipped kitchen', with protection against break-ins and intrusions and care for the elderly?
Thus the adage 'home sweet home' will now change to 'home sweet automated home'.

HOME AUTOMATION - AN INTRODUCTION
The sole aim of this paper is to reflect upon the issues and questions that home automation raises for researchers of information and communication technologies, and finally to attempt a tangible solution. Intelligent homes are a vision of the 'home of the future' as well as a related, though not identical, line of home control products, so it becomes essentially important to trace this technological interaction and the images it projects. The paper also deals with people's reactions to both the vision and the product formulations, and with the everyday implications of these technologies.
The earliest developments of these technological prototypes date back to the 1960s and 70s, but home automation has not been a runaway box-office hit: it has been a nascent industry, with few companies willing to spend technological know-how on the field.
Some of the very specific applications include complex management of electricity loads to benefit from multiple tariffs, generally unfamiliar to the public. The 'automatic home' is something seen in sci-fi movies, resembling the starship Enterprise. 'Butler in a box' devices such as voice recognition systems provide a further impetus to home automation.
Home automation can also be thought of as a logical consequence of today's 'just-in-time' mentality, with convenience, cost saving, and security as the other prime drivers.
Home automation thus signifies a 'technology for living in' which provokes speculation about home and lifestyles.
The later pages of this paper continue in the vein set forth by this introduction, exploring various aspects of home automation such as technological aspects, security, lighting, electrical appliances, multi-utilities, future trends, and applications, before finally presenting a simple case study concluded by the epilogue.




                        TECHNICAL ASPECTS OF HOME AUTOMATION

It is generally assumed that home automation extols the wonders of home control: that it will be a panacea for stoves left on, groping in the dark for light switches, and high heating bills, without going skin-deep into the various technicalities involved. The laggardly approach towards home automation resulted in the construction of numerous stand-alone devices without providing anything compatible and affordable to the typical homeowner. A particular standard was needed to unify all appliances so that 'consumer economics' becomes easy to deal with.
The EIA developed the Consumer Electronic Bus (CEBus), along with major players like Sony, Philips, AT&T, Panasonic, Texas, Mitsubishi, and RCA, for a proper standardization of home appliances.
The CEBus facilitates communication between home automation devices and appliances, with a view to reducing the jungle of hand-held infrared remote controllers.
CEBus has five primary goals:
·         It would be retrofittable.
·         It would use distributed intelligence.
·         It would be non-product specific.
·         It would have an open architecture.
·         Finally, it would be expandable.
CEBus isn’t actually a bus but a network specification which follows the seven layer network model lay down by the ISO and comprises of physical, network, transport, session, datalink and application layers. There exists a well defined network through which the layers communicate with each other.

§  THE PHYSICAL LAYER
This is the lowest layer, and it is where the strengths of CEBus lie. All the different media are present in the physical layer specification, and all the layers above the physical layer are media independent.
The CEBus specifies six media to carry the generated signal information as part of its specification programme. They are:
·         PLBus (power line bus)
·         SRBus (single room bus)
·         RFBus (radio frequency bus)
·         TPBus (twisted pair bus)
·         FOBus (fibre optic bus)
The last three are together referred to as the WIRED BUS.

§  DATA LINK LAYER
After the physical layer comes the data link layer, responsible for providing a clean channel of communication for the higher levels. The prerequisites of the data link layer are:
·         Collision handling
·         Detection and resolution
·         Packet acknowledgement
·         Packet reconstruction
The above four aspects are handled using CSMA (carrier sense, multiple access) together with CDCR (collision detection and collision resolution).
Packets are units of information transmitted between interconnected nodes. Since there is always a possibility that two nodes will transmit information at the same time, collisions can occur. As it is best to avoid such a situation, collision prevention is tried first.
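As a toy illustration of "sense first, back off when busy" (not the actual CEBus CDCR mechanism, whose details the text does not give), consider the following sketch:

    import random
    import time

    SLOT = 0.001  # one contention slot, in seconds (illustrative value)

    def csma_send(channel_busy, max_attempts=8):
        # Sense the channel before transmitting; when it is busy, back off a
        # random number of slots and try again (collision prevention first).
        for attempt in range(max_attempts):
            if not channel_busy():
                return True  # channel idle: transmit now
            time.sleep(random.randint(1, 2 ** min(attempt, 4)) * SLOT)
        return False  # report failure to the higher layer

    # Example: a channel that happens to be busy 30% of the time.
    print(csma_send(lambda: random.random() < 0.3))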

§  APPLICATION LAYER
The final layer is responsible for end-user visibility. Although it is the highest level defined in CEBus, it is not directly visible to the user, because its operation is more or less embedded within the device functionality; the end user can be said to see the programmer's view. To avoid discrepancies, the EIA provides CAL (Common Application Language) for intelligent device communication.
A prelude to CAL is the Application Protocol Data Unit (APDU), made up of 3810 bytes, of which only the first two carry the mode information and the type identifier. The mode information specifies the service class, header type, and data type length for the command which follows. The type identifier determines whether a command is implicit or explicit, and also determines the response codes for an explicit command.
Finally, the Common Application Service Elements (CASE) decide what the final outcome should be; CASE enables the creation of arbitrary commands to perform whatever function is desired.

§  NETWORK LAYER
The network layer is primarily responsible for determining which media are to receive the packets of information, and also deals with breaking apart packets which exceed the normal 32-byte limit.

§  TRANSPORT, SESSION AND PRESENTATION LAYERS
The OSI seven-layer network model was designed to be useful in just about any application, so some extra 'fat' needs to be trimmed when implementing applications which do not require all the facilities and segmentation defined in the model. In CEBus, the functions of the transport, session, and presentation layers are handled by the application, network, and data link layers.

Summarizing the whole exercise, this chapter has put forward the various functional layers of CEBus, which is by far the standard most widely used by home automation buffs today.

                            SECURITY AT THE AUTOMATED HOME
The forthcoming pages of this paper concern themselves with the effectiveness of electronic security systems in our homes, businesses, and law enforcement.
With regard to smart security, it is widely acknowledged that "professionally installed and monitored alarm systems are useful instruments to deter crime and provide peace of mind for residential and business owners". A home security system is acknowledged by insurance companies as effective, necessary, and beneficial.
There are many instances where home security systems are a boon for residents and businessmen. These sophisticated electronic security systems pay for themselves through a drastic reduction of vandalism and crime. Yet another aspect which comes to light is that they have reduced lawsuits even more than property losses.

                      MODES OF SECURITY SYSTEMS
·         Alarms as security systems
·         CCTV surveillance systems

1. ALARMS AS SECURITY SYSTEMS:
The mantra is "show me the yard sign". Alarm systems are the single most effective measure to reduce the probability of burglary.
It is a generally observed trend that:
·         Expensive homes are more likely to be burgled than non-detached homes.
·         Close proximity to a highway entrance increases a home's vulnerability to burglary.


2. CCTV CAMERAS FOR THE 'SMART SHOP':
CCTV surveillance at shops has by far been the most effective way of protecting against theft and burglary. Its only weakness is the invasion of privacy at home.
Security at home has been a concern grave enough ever since the dual-income trend that started after World War II.
After this insight into the need for security, let us study the components required for a smart home.
A security system which protects homes against intruders, and also operates lights and electrical appliances through embedded modules, can be programmed to be a part of alarm signaling.
For example, consider the security console system introduced by the Marmitek Corporation, known as the Security Console SC2200; base modules can be operated by this console.
The working of the device can be illustrated as follows: in an alarm situation, the built-in telephone voice dialer of the SC2200 dials up to 4 preprogrammed numbers and plays the prerecorded alarm message. Anyone picking up the phone at the other end responds to the alarm message by pressing a single digit on that telephone. This stops further dialing and allows the person who picked up the phone to listen in on the protected premises by means of the microphone of the SC2200. Tripped sensors are reported in a similar way, by playing a fixed service message.
                                   

Additional features of the Security Console 2200:

·         Anti-jamming circuit detecting radio signals.
·         Alarm message recording of up to 12 sec.
·         Silent alarm function.
·         Wired sensor input.
·         Additional sirens and light activation.
·         Lifestyle function providing simulation of residential occupancy.
The parts of the smart system are:
·         Base station
·         Door/window station
·         Motion sensors
·         Key chain remote controls
·         System remote controls
·         Lamp module
·         Glass break sensor
·         Programming telephone numbers
·         Jamming detector and panic alarm
From the study of the Security Console 2200 example, it becomes evident that a security system is a must for a smart home and forms an integral part of it.

[Figure: Diagram of the SC2200 by the Marmitek Corporation]
Thus this phase of the paper stresses the importance of a security system and also provides instructions for putting one in place.

                              HEALTH AT A SMART HOME
                                     
One aspect of home automation turns out to be home networking, which has home technology as its nearest relative. A home network allows the residence to be connected to the outside world through a residential gateway that passes information over an ISDN or DSL link. Home networking allows the home to be fully connected and controlled externally as well as internally. Thus home networking makes possible the luxury of telemedicine and telecare.
A home network allows the devices of a smart home to be connected and monitored via external sources. For a disabled person, home networking provides safety and the reassurance that any fault that develops will be reported to the correct people through the network.
It gives the ill the much-needed ability to retain hospital-level care within their own home. Thus a person at home can be remotely assessed by medical staff, with telemedicine providing a virtual medical service to the home.

 Telecare and Telehealth
In their infancy, these systems were designed around one-off monitoring devices, such as blood pressure monitors, configured into a standard system such as a smart house or a call system, and relying on the internet or telephone to transfer information from the source to the doctor providing relief.
          
         


BASIC CRITERIA FOR A HEALTH TECHNOLOGY SOLUTION
·         Affordability
·         Ease of use
·         Flexibility
·         Functionality and interactivity
·         Reliability and maintainability
·         Replicability and ease of installation
·         Upgradability
Thus a health solution needs to be flexible enough to suit the needs of the occupant, and it must be dependable and reliable against this basic set of criteria. Health technology in the smart home is evolving as people's relationship with technology at home changes.

Utilities for a Smart Home
Any smart home has various user-friendly utilities which are integrated into the smart home circuitry and thereby provide for a wide range of applications.
                      Some of these utilities include
·         Message controller
·         Telephone security intercom
·         3-V temperature sensor
Message controller: A message controller provides voice annunciation for the home automation system. Owners customize their own announcements to accompany key events. Voice-to-EPROM storage retains the messages without power, and the built-in circuitry produces good voice reproduction. Up to eight messages of nearly 10 seconds each can be stored.
    
                                 
[Figure: Intella voice recorder]
Announcements can easily be recorded. The onboard processor monitors the power line (using the X-YW523) for key events.
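As a rough illustration of how such a controller might tie monitored power-line events to stored announcements, consider the Python sketch below. The event names, file names and the play_announcement stub are all assumptions made for illustration, not the product's actual interface.

# Hypothetical mapping from power-line events to stored announcements.
# Per the text above, up to eight messages of roughly 10 seconds each.
ANNOUNCEMENTS = {
    "FRONT_DOOR_OPEN": "announcement_1.wav",
    "GARAGE_DOOR_OPEN": "announcement_2.wav",
    "ALARM_ARMED": "announcement_3.wav",
}

def play_announcement(wav_file: str) -> None:
    """Stub: play a voice message held in non-volatile storage."""
    print(f"Playing {wav_file}")

def on_powerline_event(event: str) -> None:
    """Dispatch a monitored power-line event to its announcement, if any."""
    wav = ANNOUNCEMENTS.get(event)
    if wav is not None:
        play_announcement(wav)

# Example: the onboard processor sees the front-door sensor trip.
on_powerline_event("FRONT_DOOR_OPEN")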
                            

Telephone security intercom: enables a doorbell to be answered from any telephone. These units are compatible with commercial or single-line phone systems and do not require a dedicated trunk port to operate.
When the bell button is pressed, the occupant picks up the phone and is connected to the door station. Compatibility with cordless phones is particularly useful when employees and security people are working alone and need the flexibility to respond and open doors.

3-V temperature sensor: Analog Devices offers a 3-V temperature sensor IC with a voltage output, guaranteed accuracy better than 2 °C, and non-linearity better than 0.5% over 0-100 °C.
A temperature sensor such as the AD22103 is ratiometric, providing sustained precision as supply voltage levels decrease. The sensor is equipped with on-chip linearization and signal conditioning, thereby eliminating external circuitry and cutting development cost.
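A short Python sketch shows how a reading from the sensor might be converted back into a temperature. The transfer function Vout = (Vs/3.3 V) × (0.25 V + 0.028 V/°C × T) is quoted here from the AD22103 datasheet as best recalled; verify the constants against the current datasheet before relying on them.

def ad22103_temperature_c(v_out: float, v_supply: float = 3.3) -> float:
    """Convert an AD22103 output voltage (volts) to degrees Celsius."""
    v_norm = v_out * 3.3 / v_supply   # normalize to a nominal 3.3 V supply
    return (v_norm - 0.25) / 0.028    # invert the assumed transfer function

# Example: 0.95 V at a 3.3 V supply gives (0.95 - 0.25) / 0.028 = 25.0 °C.
print(ad22103_temperature_c(0.95))

Because the part is ratiometric, sampling it with an ADC referenced to the same supply cancels the supply term, which is exactly the sustained precision at falling voltage levels claimed above.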



[Figure: Temperature sensor]

   APPLICATIONS
       
Applications of home automation:
1. GENIO: an ambient intelligence application for the home automation and entertainment environment.
The GENIO avatar acts as a home assistant and also combines effectively with home appliances, thereby extending its sphere of influence to a wide variety of applications.
Applications of the home assistant:
·         Speech recognition software
·         RFID
·         Entertainment services
·         Home assistant application



HOME APPLIANCES:
·         Oven
·         Refrigerator
·         Washing machine
·         Dishwasher
Now that we have discussed the avatar, let us make clear what an avatar is: an icon or representation of a user in a shared virtual reality.

Features:
·         Reading e-mails.
·         Checking goods in the refrigerator.
·         Downloading the shopping list to a personal digital assistant.
·         Effective management of the washing machine, dishwasher, oven and boiler.
·         Preparing a recipe.
·         Intelligent plugs.

                        FUTURE ASPECTS OF HOME AUTOMATION

The concept of ambient intelligence arises from the need to provide a vision of future environments in which people are assisted by information technology in all walks of life, environments very different from the computing we know today. The envisioned technology “will weave into the fabric of everyday life until it is indistinguishable from it”.
In short, we envision the creation of smart environments that integrate information, communication and sensing technologies into everyday objects. A distinction must be drawn between “system-oriented, importunate smartness” and “people-oriented, empowering systems”; to achieve the latter, where “smart spaces smarten people”, the requirements of the potential users who are going to live in the intelligent homes of the future must be taken into account. This can especially be said of service-providing systems, as their benefits are a by-product of sufficient information from the user.

PREREQUISITES FOR FUTURE ENVIRONMENTS
·         Desire to maintain control over their environment and to properly define responsibilities.
·         Responsibility of parents for their children, including control over and protection of the information gathered about them.
·         Sensitive issues such as interpersonal contact are to be dealt with.
·         Reduction of information overload and of the burden of searching for information items.
·         Prevention of annoying accidents; adjustment of lights to desired ambiences is a must.
·         Conclusion of spontaneous conversations.

Advantages of home automation
·         User control.
·         Value addition.
·         Maintenance of home comfort with no subversion.
·         Secure, safe and private.

          Disadvantages of home automation

·         It encourages laziness, as people may become incompetent when they no longer have to do anything themselves.
·         Low cost-effectiveness.
Thus this chapter highlights the future of home automation and lists its advantages and disadvantages.

   CONCLUSION

As we come to the end of this insight into home automation, we find that we have gone through all possible nuances of the subject. We have seen its possibilities, implications, aspects, technicalities, modernization and security, as well as the various vendors providing these services in compliance with the standards set by CEBus and X-10.
We have also seen the advantages, and a few disadvantages, of home automation, and finally conjured up the future trends in the field. Thus it can definitely be said that home automation is born of the present and is the thing of the future.

Bibliography

1)      www.smarthome.com - X10 and home automation parts
2)      www.smarthomeusa.com - X10 and home automation parts
3)      www.x10pro.com - X10 parts
4)      www.appdig.com - home of the Ocelot controller (also a touch-screen version); user forums
5)      www.worthdist.com - distributor for home automation parts and accessories
6)      www.home-electro.com - IR interface units, serial and USB versions
7)      www.girder.nl - PC control software; user forums
8)      www.evation.com - IR interface