HK1175049B - Camera-based scanning - Google Patents
- Publication number: HK1175049B
- Application number: HK13101983.2A
- Authority: HK (Hong Kong)
- Prior art keywords
- quadrilateral
- captured image
- image
- computing device
- user
Description
Background
Computing devices that include cameras are becoming more common and mobile, such as laptop computers, tablet PCs, digital camera devices, mobile phones, ultra-mobile PCs, and other mobile data, messaging, and/or communication devices, and the like. A user may take various photographs with a camera associated with a computing device, including capturing images of a presentation, whiteboard, business card, document, sketch, drawing, and the like. The user can then reference the captured image to recall information contained therein, such as charts, photographs, lists, and other text. Often, users want to be able to incorporate the information in the captured images into their own documents, notes, and/or presentations. However, the images captured by the camera are traditionally static, and it may not be straightforward to extract electronically useful and/or editable information from the static images.
Conventional techniques for generating a scanned version of a captured image include printing the image and then manually operating a scanner to create the scanned version. Another conventional technique for working with captured images requires transferring the image from the capture device to a desktop computer and then using the desktop computer's image editing application to further process the image. These manually intensive techniques for acquiring information contained in captured images can be inconvenient and time consuming for a user.
SUMMARY
This summary is provided to introduce simplified concepts of camera-based scanning. These simplified concepts are further described in the detailed description that follows. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Embodiments of camera-based scanning are described. In embodiments, a scanned document may be created using an image captured by a camera associated with a computing device. An image captured by a camera is processed to identify quadrilateral portions within the image that correspond to rectangular objects such as paper, business cards, whiteboards, screens, and the like. One or more of these quadrilateral portions may be selected for scanning automatically based on a scoring scheme and/or semi-automatically with the aid of input from a user. One or more scanned documents are created from the selected quadrilateral portions by unwarping the selected portions to eliminate perspective effects (e.g., adjusting the portions to rectangles) and applying various image enhancements to improve appearance.
Brief Description of Drawings
Embodiments of camera-based scanning are described with reference to the following figures. The same reference numbers are used throughout the drawings to reference like features and components:
FIG. 1 illustrates an example of a device that can implement embodiments of camera-based scanning.
FIG. 2 illustrates an example system in which embodiments of camera-based scanning can be implemented.
FIG. 3 illustrates an example method of camera-based scanning in accordance with one or more embodiments.
FIG. 4 illustrates other example methods of camera-based scanning in accordance with one or more embodiments.
FIG. 5 illustrates components of an example device that can implement embodiments of camera-based scanning.
Detailed Description
Embodiments of camera-based scanning provide a user of a suitably configured computing device with a technique to scan documents, presentations, and other objects using images taken by a camera associated with the device. Camera-based scanning can correct for perspective effects on rectangular objects such as paper, business cards, whiteboards, screens, and so forth.
For example, a user may aim a camera of a portable device at a target and initiate capture of an image of the target using a button, touch, or other suitable input. When the user initiates capture, a capture operation is performed to capture an image of the target. Image capture may initiate various processing of the captured image to create one or more scanned documents from the captured image. In an embodiment, a touch input or touch event on the touch screen may be initiated to indicate both the area of interest and that image capture should occur. The location of the touch input may be used in subsequent processing steps to guide the scoring function. This processing may include identifying potential quadrilateral portions within the captured image that are to be considered for scanning. The device may be configured to select one or more of the identified quadrilaterals automatically based on a scoring scheme and/or semi-automatically with the aid of input from a user. One or more scanned documents may then be created by unwarping the selected quadrilaterals to eliminate perspective effects (e.g., adjusting each quadrilateral to a rectangle) and applying various image enhancements to improve appearance.
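By way of illustration only, the general flow just described can be sketched in a few lines of Python with OpenCV. The sketch below uses simple contour approximation as a stand-in for the edge, line, and quadrilateral detection and scoring detailed later in this description; the threshold values, output size, and helper logic are illustrative assumptions rather than part of the described embodiments.

```python
import cv2
import numpy as np

def simple_camera_scan(image_bgr):
    """Simplified end-to-end sketch: find the largest quadrilateral and unwarp it."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                       # edge-preserving noise filtering
    edges = cv2.Canny(gray, 50, 150)                     # illustrative thresholds

    # OpenCV 4.x returns (contours, hierarchy).
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    best, best_area = None, 0.0
    for contour in contours:
        approx = cv2.approxPolyDP(contour, 0.02 * cv2.arcLength(contour, True), True)
        if len(approx) == 4 and cv2.contourArea(approx) > best_area:
            best, best_area = approx.reshape(4, 2).astype("float32"), cv2.contourArea(approx)
    if best is None:
        return None                                      # fall back to user-guided selection

    # Order corners top-left, top-right, bottom-right, bottom-left, then unwarp.
    s, d = best.sum(axis=1), np.diff(best, axis=1).ravel()
    src = np.float32([best[np.argmin(s)], best[np.argmin(d)],
                      best[np.argmax(s)], best[np.argmax(d)]])
    w, h = 800, 600                                      # illustrative output size
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    return cv2.warpPerspective(image_bgr, cv2.getPerspectiveTransform(src, dst), (w, h))
```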
While features and concepts of the described camera-based scanning systems and methods can be implemented in any number of different environments, systems, and/or various configurations, embodiments of camera-based scanning are described in the context of the following example systems and environments.
FIG. 1 illustrates an example 100 of a computing device 102 that can implement embodiments of camera-based scanning. Computing device 102 is an example of various types of devices, including the example portable device described with reference to FIG. 2, and may also be implemented with any number and combination of the different components described with reference to the example device shown in FIG. 5. The computing device 102 includes an integrated display screen 104 to display user interfaces, user interface elements and features, user-selectable controls, various displayable objects, and so forth. The computing device 102 also includes a camera 106 to capture digital images. In the example, the camera 106 is shown on the side of the computing device 102 opposite the display screen 104.
The computing device 102 also includes at least one input driver 108 to process various inputs from a user to operate the computing device 102. In at least some embodiments, the display screen 104 is a touch screen, and the input driver 108 is operable to detect and process various touch inputs and/or touch events. In an embodiment, a touch input or touch event on the touch screen 104 may be initiated to simultaneously indicate the region of interest and initiate image capture. The image may be displayed on the display screen as a preview of the image to be captured, and a touch event at a particular location on the screen indicates that the image should be captured. Further, the particular location is identified as being of interest to a scoring function in a subsequent processing algorithm. Thus, a touch input may be utilized both to select a portion of an image and to cause the camera to capture the image.
Computing device 102 also includes a capture application 110 to initiate display of a user interface 112 and various user interface elements, features, and controls to facilitate capturing and processing images by camera 106. Further, capture application 110 represents functionality of computing device 102 for implementing camera-based scanning techniques described herein. The example user interface 112 is shown as a split screen interface with a viewfinder 114 and a scanned image display 116. The viewfinder 114 can present the current image from the camera 106 and switch to present the captured image when the picture is taken. In addition, the user is also able to modify and select portions of the captured image through interaction with the viewfinder 114.
Scanned image display 116 may present one or more portions of the captured image processed by capture application 110 to produce a scanned document. The split screen enables simultaneous display of the captured image in the viewfinder 114 and the scanned document generated from the captured image in the scanned image display 116. In this manner, a user may simultaneously view a captured image and a scanned portion of the image, and may use the user interface 112 to intuitively make adjustments, such as modifying the boundaries of a selected portion or selecting a different portion, etc.
In an embodiment of camera-based scanning, a user of the computing device 102 may initiate camera-based scanning by taking a picture of the target 118. The target 118 of the photograph may include one or more rectangular objects such as documents, paper, business cards, photographs, whiteboards, and the like. In the example of FIG. 1, the target 118 is shown as a display screen being used for a business presentation. Upon taking a picture to initiate a camera-based scan, the capture application 110 captures an image of the target 118 and may output the image in the viewfinder 114 of the user interface 112.
The capture application 110 is implemented to detect portions of the captured image that correspond to rectangular objects. In particular, the capture application 110 may be configured to identify potential quadrilateral regions within the captured image that will be considered for scanning. Various feature extraction techniques suitable for finding arbitrary shapes within images and other documents may be used to identify quadrilaterals within images.
In at least some embodiments, the capture application 110 includes or otherwise utilizes an edge detector operable to detect edges based on visual differences, such as sharp changes in brightness. An example algorithm suitable for edge detection is the Canny algorithm. Once edges have been identified, the edges may be combined into connected lines to form quadrilaterals. For example, vertices (corners) may be identified by edge detection and then connected to form a quadrilateral. This may involve applying a linear Hough transform to correct defects in the detected edges and to derive lines corresponding to the edges. Thus, a set of potential quadrilaterals may be derived using the detected edges and lines, where lines are detected from similarly oriented edges along a particular direction and then combined to form quadrilaterals.
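As an illustrative sketch only, an edge-and-line detection step of this kind might be written with OpenCV as follows. The Canny thresholds and Hough parameters are assumptions for illustration, and the probabilistic Hough variant (HoughLinesP) is used here for convenience in place of the linear Hough transform named above.

```python
import cv2
import numpy as np

def detect_edges_and_lines(gray):
    """Detect edges with the Canny algorithm, then derive line segments
    that can later be combined into candidate quadrilaterals."""
    edges = cv2.Canny(gray, threshold1=50, threshold2=150)
    lines = cv2.HoughLinesP(
        edges,
        rho=1,                  # distance resolution of the accumulator, in pixels
        theta=np.pi / 180,      # angular resolution of the accumulator, in radians
        threshold=80,           # minimum number of accumulator votes
        minLineLength=60,       # ignore short, isolated segments
        maxLineGap=10)          # bridge small defects in a detected edge
    segments = [] if lines is None else [tuple(l[0]) for l in lines]
    return edges, segments      # segments are (x1, y1, x2, y2) tuples
```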
The capture application 110 may cause an indicator 120 to be displayed in the user interface 112 to represent a quadrilateral detected and/or selected within the captured image. For example, the captured image in the viewfinder 114 of FIG. 1 includes an indicator 120 configured as a dot at a vertex and a dashed line displayed along an edge. Other indicators 120 are also contemplated, such as lines of action, color changes, flags, and the like. The potential quadrilaterals may be presented through a viewfinder to enable a user to select one or more quadrilaterals for scanning.
In an implementation, the capture application 110 may be configured to automatically select one or more potential quadrilaterals to scan based on a scoring scheme. The scoring mechanism may score the potential quadrilateral based on various criteria including, for example, size, location, recognition of content such as text and faces, and the like. The highest scoring quadrilateral may be automatically selected. In another example, any quadrilateral that exceeds a threshold score may be selected.
The capture application 110 may also be configured to enable semi-automatic selection of a quadrilateral when the automatic selection fails to find a suitable quadrilateral and/or when the user initiates the semi-automatic selection. To do so, the indicators 120 described above may be used to present automatically selected quadrilaterals, corners, and/or lines in the user interface 112. The user then provides input to modify the automatically selected quadrilateral, eliminate the quadrilateral, define a custom quadrilateral, and so on.
In an implementation, the indicator 120 may be selected by a user to modify the quadrilateral, such as by dragging a corner to change the position of that corner. In another example, a user may define a custom quadrilateral by selecting one corner through interaction with the user interface 112. The capture application 110 may be configured to automatically derive a corresponding quadrilateral based on the user's selection of a corner. The user may also operate a selection and drag tool of the user interface 112 to identify the area of a custom quadrilateral. When the computing device 102 is touch-enabled, the user may touch and drag directly on the display screen 104 to modify and define the quadrilateral. Other input devices may also be used for semi-automatic selection of the quadrilateral, including, for example, a stylus, mouse, directional keys, and/or other suitable input devices.
One or more scanned documents may then be created by unwarping the selected quadrilateral to eliminate perspective effects (e.g., adjusting the quadrilateral to a rectangle) and applying various image enhancements to improve appearance. In particular, to perform the unwarping, the capture application 110 may be implemented to detect and correct for distortion due to the perspective of the captured image. For example, the capture application 110 may determine the perspective based on the angles and ratios of the selected quadrilateral. The capture application 110 can crop the captured image to correspond to the selected quadrilateral. The capture application 110 may then rotate the cropped image, resize at least some portions of it, and otherwise correct it to account for perspective distortion and produce an unwarped image that is adjusted to be rectangular.
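A four-point perspective transform is one way to realize this unwarping. The sketch below (Python/OpenCV) assumes the quadrilateral's corners are supplied in top-left, top-right, bottom-right, bottom-left order and sizes the output from the quadrilateral's side lengths, both of which are illustrative choices.

```python
import cv2
import numpy as np

def unwarp_quadrilateral(image, quad):
    """Crop the selected quadrilateral and adjust it to a rectangle,
    removing perspective distortion."""
    tl, tr, br, bl = [np.asarray(p, dtype="float32") for p in quad]

    # Estimate the output rectangle from the quadrilateral's side lengths.
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))

    src = np.array([tl, tr, br, bl], dtype="float32")
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype="float32")
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (width, height))
```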
The capture application 110 may also use various image enhancements to improve the appearance of the unwarped image. Examples of such image enhancements include color enhancement, correction for brightness and shading, and background removal. Image enhancement may also include applying Optical Character Recognition (OCR) to the unwarped image to identify text and produce a scanned document having an editable text portion.
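A few such enhancements might be sketched as follows. The shading-correction trick (dividing by a blurred background estimate) and the use of the pytesseract wrapper for OCR are illustrative assumptions, not the enhancements prescribed by the embodiments.

```python
import cv2

def enhance_scan(unwarped_bgr, run_ocr=False):
    """Apply simple appearance enhancements to the unwarped image."""
    gray = cv2.cvtColor(unwarped_bgr, cv2.COLOR_BGR2GRAY)

    # Contrast enhancement: stretch the darkest tone to black, brightest to white.
    stretched = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)

    # Rough shading/background correction: divide by a blurred background estimate.
    background = cv2.medianBlur(stretched, 51)
    flattened = cv2.divide(stretched, background, scale=255)

    text = None
    if run_ocr:
        import pytesseract                   # optional dependency for OCR
        text = pytesseract.image_to_string(flattened)
    return flattened, text
```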
Consider the particular example of a business presentation shown in FIG. 1. The business presentation is displayed on a display screen and includes text and a diagram of a person. When the user takes a picture of the business presentation, an image is captured and may be displayed in the viewfinder 114. Note that the image appears oblique due to the angle at which the photograph was taken. Initially, the capture application 110 can identify the screen boundaries using the camera-based scanning techniques described herein. Accordingly, the indicators 120 are used to identify a quadrilateral corresponding to the screen in the viewfinder 114. Further, the capture application 110 may initially present a scanned version of the screen, including the text and the diagram, in the scanned image display 116.
In the example, however, the user has defined a custom quadrilateral to select the diagram without the text. For example, the user may touch the display screen 104 of the computing device 102 to select a corner and/or drag a selection box around the diagram. In response to this selection, the selected portion (e.g., the diagram) is scanned by unwarping it to eliminate the perspective effect (e.g., the tilt) and applying enhancements. The resulting scanned diagram is displayed in the scanned image display 116 and can be presented simultaneously with the captured business presentation in the viewfinder 114 using the split screen of the user interface 112. The user may utilize the scanned diagram in various ways, such as by adding annotations, sharing it with colleagues, posting it on a website or blog, and so forth.
FIG. 2 illustrates an example system 200 in which embodiments of camera-based scanning can be implemented. The example system 200 includes a portable device 202 (e.g., a wired and/or wireless device) that may be any one or combination of the following: a mobile personal computer 204, a Personal Digital Assistant (PDA), a mobile phone 206 (e.g., cellular, VoIP, WiFi, etc.) that may be implemented for data, messaging, and/or voice communication, a portable computer device 208 (e.g., a laptop with a touch screen, etc.), a media device 210 (e.g., a personal media player, a portable media player, etc.), a gaming device, an application device, an electronic device, and/or any other type of portable device that is capable of receiving, displaying, and/or communicating data in any form of audio, video, and/or images.
Each of the various portable devices may include an integrated display and/or an integrated touch screen or other display, as well as selectable input controls via which a user may enter data and/or make selections. For example, mobile personal computer 204 includes an integrated touch screen 212 on which a user interface 214 may be displayed, including displayable objects and/or user interface elements 216, such as any type of image, graphic, text, selectable button, user-selectable control, menu selection, map element, and/or any other type of user-interface displayable feature or item. In accordance with one or more embodiments of camera-based scanning described herein, the user interface 214 may also display captured and scanned images via a split screen.
Any of the various portable devices described herein may be implemented with one or more sensors, processors, communication components, data inputs, memory components, storage media, processing and control circuits, and/or a content presentation system. Any of the portable devices may also be implemented to communicate via a communication network, which may include any type of data network, voice network, broadcast network, IP-based network, and/or wireless network that facilitates data, messaging, and/or voice communication. A portable device may also be implemented with any number and combination of the different components described with reference to the example device shown in FIG. 5. A portable device may also be associated with a user (i.e., a person) and/or an entity that operates the device, such that a portable device describes a logical device that includes a combination of users, software, and/or devices.
In this example, the portable device 202 includes one or more processors 218 (e.g., any of microprocessors, controllers, and the like), a communication interface 220 for data, messaging, and/or voice communication, and a data input 222 that receives media content 224. Media content (e.g., including recorded media content) may include any type of audio, video, and/or image data received from any media content or data source, such as messages, television media content, music, video clips, data feeds, interactive games, network-based applications, and any other content. The portable device 202 is implemented with a device manager 226, which device manager 226 includes any one or combination of a control application, software application, signal processing and control module, code that is native to a particular device, and/or a hardware abstraction layer for a particular device.
The portable device 202 includes various software and/or media applications 228 that may incorporate components such as a capture application 230 capable of being processed or otherwise executed by the processor 218. The media applications 228 may include music and/or video players, image applications, Web browsers, email applications, messaging applications, digital photograph applications, and so forth. The portable device 202 includes a presentation system 232 to present a user interface from the capture application 230 to generate a display on any portable device. The presentation system 232 is also implemented to receive and present any form of audio, video, and/or image data received from any media content and/or data source. The portable device 202 also includes a camera 234 and an input driver 236 that can incorporate or otherwise utilize a touch screen driver of the touch screen 212. The input driver 236 may be configured to detect and process various inputs and/or determinable representations of gestures, inputs, and/or actions to operate the functionality of the portable device 202 including the operation of the capture application 230 to implement camera-based scanning. Implementations of the capture application 230 and the input driver 236 are described with reference to the capture application 110 and the input driver 108 shown in FIG. 1, and with reference to embodiments of camera-based scanning described herein.
In accordance with one or more embodiments of camera-based scanning, example methods 300 and 400 are described with reference to FIGS. 3 and 4, respectively. Generally, any of the functions, methods, procedures, components, and modules described herein can be implemented using hardware, software, firmware, fixed logic circuitry, manual processing, or any combination thereof. A software implementation represents program code that performs specified tasks when executed by a computer processor. The example methods may be described in the general context of computer-executable instructions, which may include software, applications, routines, programs, objects, components, data structures, procedures, modules, functions, and the like. The methods may also be practiced in a distributed computing environment where the methods are performed by processing devices that are linked through a communications network. In a distributed computing environment, computer-executable instructions may be located in both local and remote computer storage media and/or devices. Further, the features described herein are platform-independent and may be implemented on a variety of computing platforms having a variety of processors.
Fig. 3 illustrates an example method 300 of camera-based scanning. The order in which the methods are described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the methods, or an alternate method.
At block 302, an input is detected that initiates capture of an image. For example, the input driver 108 at the computing device 102 detects a user selection to take a picture of the target 118. The target 118 may include one or more rectangular objects that are detectable for camera-based scanning. The capture may be initiated by the user manipulating a touch screen button, a key stroke, a dedicated shutter button operation of the computing device 102, or another suitable input.
At block 304, an image is captured in response to the input. For example, in response to the input at block 302, the capture application 110 at the computing device 102 may capture an image of the target 118 using the camera 106. The captured image may be presented through the user interface 112 of the computing device 102.
At block 306, one or more portions of the captured image are scanned based on detection of quadrilaterals in the captured image. At block 308, enhancements are applied to the one or more scanned portions. Various techniques may be used to detect quadrilaterals in the captured image. For example, the capture application 110 at the computing device 102 may identify a quadrilateral using either or both of the automatic and semi-automatic techniques described with respect to FIG. 1. In another example, a manual technique may be used in which the captured image is presented to the user through the computing device 102 for manual selection of a quadrilateral. In this example, the functionality that enables automatic detection of quadrilaterals through the capture application 110 may be disabled, may not be included, or may otherwise be unavailable. Various enhancements for improving the appearance of the one or more scanned portions are contemplated. More details regarding techniques for detecting quadrilaterals and enhancing scanned images are provided below with reference to the example method 400 illustrated in FIG. 4.
At block 310, a scanned document corresponding to one or more portions is output. For example, capture application 110 at computing device 102 may cause the scanned document to be presented in a scanned document display of user interface 112. The user can then process the scanned document, such as by saving the document, adding annotations, sending the document to one or more recipients, and so forth.
Fig. 4 illustrates an example method 400 of camera-based scanning. In particular, FIG. 4 represents an example algorithm suitable for scanning images captured by the camera 106 of the computing device 102. The order in which the methods are described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the methods, or an alternate method.
At block 402, the captured image is preprocessed. For example, the capture application 110 of the computing device 102 may pre-process the captured image in various ways to prepare the image for camera-based scanning. By way of example, the pre-processing may include applying image filters, enhancing contrast, adjusting brightness, downscaling, grayscale conversion, median filtering, and so forth. In an embodiment, the pre-processing includes one or more of downscaling, contrast enhancement, and noise filtering of the image. Downscaling may be performed to reduce the resolution of the image and the number of pixels that need to be processed. As image resolution increases, more computing resources are consumed to process the image, and edge detection produces more false (e.g., unwanted) edges. Thus, downscaling may speed up processing and enable improved edge detection.
Contrast enhancement may be used to set the brightest hue in an image to white and the darkest hue to black. This may also improve the detection of edges and lines by detection algorithms that find sharp differences in contrast and/or brightness. Noise filtering includes applying one or more filters to remove image noise. Certain noise filters, such as gaussian blur, can degrade (e.g., soften) the edges of the image and make it difficult to detect the edges. Thus, edge preserving noise filtering techniques such as bilateral and/or median filtering may be used in conjunction with camera-based scanning techniques to prevent edge degradation.
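Taken together, the pre-processing described at block 402 might be sketched as follows; the target resolution and filter parameters are illustrative assumptions.

```python
import cv2

def preprocess(image_bgr, max_dim=1024):
    """Downscale, convert to grayscale, stretch contrast, and apply
    edge-preserving noise filtering to a captured image."""
    # Downscaling reduces the pixel count, which speeds up later steps and
    # suppresses many false edges arising from fine texture.
    h, w = image_bgr.shape[:2]
    scale = min(1.0, max_dim / float(max(h, w)))
    small = cv2.resize(image_bgr, (int(w * scale), int(h * scale)),
                       interpolation=cv2.INTER_AREA)

    # Grayscale conversion and contrast stretching (darkest tone to black,
    # brightest tone to white).
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    gray = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)

    # Edge-preserving noise filtering; Gaussian blur would soften edges,
    # whereas bilateral (or median) filtering keeps them sharp.
    return cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)
```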
After preprocessing, the method 400 may proceed to perform detection of a quadrilateral in the captured image. At block 404, an edge in the captured image is detected. The detected edges may correspond to the borders of rectangular objects within the image, such as the edges of business cards, canvas frames, edges of display screens, and so forth. Edges may be detected in any suitable manner. For example, the capture application 110 of the computing device 102 can be implemented to employ the Canny algorithm to detect edges based on sharp changes in contrast.
The Canny algorithm may utilize a configurable threshold that defines an amount of contrast difference sufficient to detect edges. In one embodiment, the threshold used by the Canny algorithm may be adaptive. For example, the threshold for image areas with a high incidence of edges (e.g., carpet, a table, or another textured surface) may be increased. This may reduce the number of false edges detected in these regions. Similarly, the threshold for image areas with a relatively low incidence of edges may be lowered to increase the chance of detecting edges with relatively small contrast differences.
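One plausible way to approximate such adaptive thresholding is to tile the image and scale the Canny thresholds by the local edge density, as sketched below; the tile size, density cut-offs, and scale factors are assumptions for illustration.

```python
import cv2
import numpy as np

def adaptive_canny(gray, base_lo=50, base_hi=150, tile=64):
    """Raise the Canny thresholds in edge-dense tiles (e.g., textured
    surfaces) and lower them in nearly edge-free tiles."""
    first_pass = cv2.Canny(gray, base_lo, base_hi)
    out = np.zeros_like(first_pass)
    h, w = gray.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = np.ascontiguousarray(gray[y:y + tile, x:x + tile])
            density = np.count_nonzero(first_pass[y:y + tile, x:x + tile]) / patch.size
            if density > 0.15:        # texture-heavy region: demand stronger edges
                lo, hi = base_lo * 2, base_hi * 2
            elif density < 0.01:      # nearly flat region: accept weaker edges
                lo, hi = base_lo // 2, base_hi // 2
            else:
                lo, hi = base_lo, base_hi
            out[y:y + tile, x:x + tile] = cv2.Canny(patch, lo, hi)
    return out
```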
At block 406, lines corresponding to the detected edges are determined. The capture application 110 can utilize the detected edges to construct corresponding lines. In one implementation, the lines are determined by applying a linear Hough transform. Lines may be identified to correct for defects that occur in edge detection, such as incomplete edges, wavy edges, and the like. This step may also include scoring the lines according to a scoring scheme and selectively discarding lines based on scoring criteria. For example, isolated lines, indeterminate lines, and lines determined not to form a suitable quadrilateral may be discarded or ignored.
Various techniques for scoring and selecting lines are contemplated. In an embodiment, a scoring refinement may be applied to a configurable number of top-scoring lines determined by applying the Hough transform. In particular, each of the top-scoring lines is re-scored and re-classified by scanning the area around the line to find edges with similar orientations. The new score of a particular line is proportional to the number of edges with similar orientation found in the scanned area.
In an embodiment, the area scanned for a particular line may be adjusted based on the orientation of the line. For example, for a nearly horizontal line, the scan area may include a configurable number of pixels above/below the pixels of the line. Similarly, for a nearly vertical line, the scan area may include a configurable number of pixels to the left/right of the pixels of the line.
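A re-scoring pass of this kind might be sketched as follows; the band width, angular tolerance, and the particular way orientations are compared are illustrative assumptions (edge_map is a binary edge image and edge_angles holds per-pixel edge orientations).

```python
import numpy as np

def rescore_line(segment, edge_map, edge_angles, band=5, angle_tol=np.deg2rad(10)):
    """Re-score a line by counting nearby edge pixels with a similar orientation.

    `segment` is an (x1, y1, x2, y2) tuple."""
    x1, y1, x2, y2 = segment
    theta = np.arctan2(y2 - y1, x2 - x1)
    nearly_horizontal = abs(np.sin(theta)) < abs(np.cos(theta))

    score = 0
    n = int(max(abs(x2 - x1), abs(y2 - y1))) + 1
    for x, y in zip(np.linspace(x1, x2, n).astype(int),
                    np.linspace(y1, y2, n).astype(int)):
        for d in range(-band, band + 1):
            # Scan above/below for near-horizontal lines, left/right otherwise.
            px, py = (x, y + d) if nearly_horizontal else (x + d, y)
            if 0 <= py < edge_map.shape[0] and 0 <= px < edge_map.shape[1] and edge_map[py, px]:
                # Compare undirected orientations modulo pi.
                diff = abs((edge_angles[py, px] - theta + np.pi / 2) % np.pi - np.pi / 2)
                if diff < angle_tol:
                    score += 1
    return score
```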
At block 408, the detected edges and lines are used to derive a possible quadrilateral. In particular, the capture application 110 of the computing device 102 may construct a possible quadrilateral using the edges detected at block 404 and the lines determined at block 406. In other words, the lines may be combined to form a quadrilateral in which the lines are detected from similarly oriented edges along a particular direction. The lines may be combined in various ways to form a quadrilateral. In an example, the detected lines may be first processed to find possible quadrilaterals based on finding lines that form opposite sides of the quadrilaterals. The detected lines may be processed again to find possible quadrilaterals based on the lines forming the corners of the quadrilateral. Unlike some previous techniques, the corners may correspond to irregular or inclined quadrilaterals, as well as to corners forming almost right angles.
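Combining lines into candidate quadrilaterals can be illustrated with the brute-force sketch below, which treats two lines as one pair of opposite sides and two more as the other pair and takes their pairwise intersections as vertices; a practical implementation would first prune line combinations by orientation and proximity as described above.

```python
import itertools

def line_intersection(l1, l2):
    """Intersection of the infinite lines through two segments, each given
    as ((x1, y1), (x2, y2)); returns None for (nearly) parallel lines."""
    (x1, y1), (x2, y2) = l1
    (x3, y3), (x4, y4) = l2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        return None
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / denom
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / denom
    return (px, py)

def quadrilaterals_from_lines(lines):
    """Enumerate candidate quadrilaterals from combinations of four lines."""
    quads = []
    for a, b, c, d in itertools.combinations(lines, 4):
        # a and b are taken as opposite sides, c and d as the other pair.
        corners = [line_intersection(a, c), line_intersection(c, b),
                   line_intersection(b, d), line_intersection(d, a)]
        if all(p is not None for p in corners):
            quads.append(corners)
    return quads
```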
At block 410, a quadrilateral is selected for scanning. For example, using the set of possible quadrilaterals constructed at block 408, the capture application 110 may be configured to automatically select a quadrilateral according to scoring criteria. This step may involve identifying meaningful line combinations based on a scoring scheme and discarding combinations that are deemed to be meaningless. The scoring scheme may account for various criteria including the location of the quadrilateral within the image, its relative size, content contained within and outside of the quadrilateral, and so forth.
The capture application 110 may use scoring criteria to select a possible quadrilateral from the possible quadrilaterals. In other words, the capture application 110 may use the criteria to score possible quadrilaterals to make a possible or approximate best guess as to the intended target of the captured image. For example, a large quadrilateral near the center of the captured image may be the intended target and may be selected by the capture application 110 based on the score. In contrast, a small quadrilateral located away from the center and having little or no color change may not be a meaningful quadrilateral and may be discarded.
Various heuristics may be used in order to find the optimal quadrilateral. In an embodiment, the scoring scheme may calculate an initial score for a particular quadrilateral, and then optionally modify the initial score to account for quadrilateral features that may increase or decrease the initial score. For example, the initial score may be calculated based on the relative size of the quadrilateral. One method for calculating the initial score based on relative size is to divide the quadrilateral's area by the image area and take the square root of the result. Optionally, various quadrilateral features may be considered to modify the initial score. For example, the initial score may be modified with enhancements that increase the score and/or penalties that decrease the score.
In one particular example, the initial score may be multiplied by various penalty factors or otherwise adjusted to account for "unwanted" quadrilateral properties. For example, a penalty factor may be configured as a multiplier in the range of 0 to 1. Various penalties and corresponding penalty factors are contemplated. For example, a penalty may be applied when the two lines forming a quadrilateral corner extend beyond that corner. The penalty factor may be proportional to how far the lines extend beyond the corner.
Another penalty may be based on the angle formed between the two lines of a quadrilateral corner. The penalty factor in this example may be proportional to the difference between that angle and a right angle. Other example penalties may be assessed for quadrilaterals that extend beyond the image boundaries, quadrilaterals that have significant tilt relative to the image boundaries, and/or quadrilaterals that are located away from the image center or otherwise not aligned.
Various enhancement factors may also be multiplied into, or otherwise used to adjust, the initial score of the quadrilateral to account for "desired" quadrilateral characteristics. For example, enhancements may be applied to well-formed quadrilaterals that are located near the center of the image, substantially aligned with the image, and the like. It is noted that the enhancements and penalties described herein may be used individually and/or in combination to implement a scoring scheme for selecting a quadrilateral.
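A scoring function along the lines of the preceding paragraphs might be sketched as follows; the specific penalty and enhancement factors are illustrative assumptions rather than the values of any described embodiment.

```python
import numpy as np

def score_quadrilateral(quad, image_shape):
    """Initial size-based score, multiplied by penalties and enhancements."""
    h, w = image_shape[:2]
    pts = np.asarray(quad, dtype=float)
    x, y = pts[:, 0], pts[:, 1]

    # Initial score: square root of the quadrilateral-to-image area ratio
    # (quadrilateral area via the shoelace formula).
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    score = np.sqrt(area / (w * h))

    # Penalty: corner angles that deviate from a right angle.
    for i in range(4):
        v1, v2 = pts[i - 1] - pts[i], pts[(i + 1) % 4] - pts[i]
        cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        deviation = abs(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))) - 90.0)
        score *= max(0.0, 1.0 - deviation / 90.0)

    # Penalty: quadrilateral extends beyond the image boundaries.
    if (pts < 0).any() or (x > w - 1).any() or (y > h - 1).any():
        score *= 0.5

    # Enhancement: small boost for quadrilaterals centered in the image.
    cx, cy = pts.mean(axis=0)
    if abs(cx - w / 2) < 0.1 * w and abs(cy - h / 2) < 0.1 * h:
        score *= 1.2
    return score
```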
At block 412, a determination is made as to whether the selection of the quadrilateral was successful. For example, the capture application 110 may determine when an appropriate quadrilateral has been selected. In some cases, the automatic selection of a quadrilateral at blocks 404-410 fails to detect a suitable quadrilateral. For example, the capture application 110 may determine that no possible quadrilateral satisfies a defined scoring threshold. In this case, the selection is determined to be unsuccessful and semi-automatic correction may be initiated.
In another example, the one or more quadrilaterals automatically selected at block 410 may be presented through the user interface 112 of the computing device 102 for approval by the user. The user may then provide input that approves or disapproves of the rendered quadrilateral. In this example, the determination at block 412 may be made based on input provided by the user. If the user approves, the selection is considered successful. If the user does not approve, the selection is determined to be unsuccessful and semi-automatic correction can be initiated.
If the selection is unsuccessful in the above scenario, then at block 414, semi-automatic correction is used to select the quadrilateral based on user input. Semi-automatic correction enables a user to provide input to modify automatic selections that may be made by the capture application 110. For example, the one or more quadrilaterals automatically selected at block 410 may be presented via the user interface 112 of the computing device 102. The presentation may utilize an indicator 120 to show the quadrilateral boundary. In an implementation, at least some of the indicators 120 are selectable by touch or other suitable input to modify the corresponding quadrilateral. The capture application 110 can detect the interaction with the indicator 120 and cause a corresponding modification to the quadrilateral. For example, a user may interact with the indicator 120 to make modifications, such as modifying a quadrilateral size by selecting and dragging a corner point (e.g., a vertex), dragging to move the quadrilateral to a different location, rotating the quadrilateral, and so forth.
Additionally or alternatively, the user may define a custom quadrilateral by selecting one or more corners through interaction with the user interface 112. Again, this interaction may again be through touch or other suitable input. The capture application 110 may be configured to automatically derive a corresponding quadrilateral in response to a user interaction selecting a corner using the techniques described herein. If the user is still dissatisfied with a quadrilateral, the user may select another corner and the capture application 110 may use the two selected corners to derive the corresponding quadrilateral. The process may be repeated a third time by selection of a third corner. If the user is still unsatisfied and the fourth corner is selected, the capture application 110 may output a quadrilateral whose vertices correspond to the four selected corners. In this manner, the user can provide continuous prompts to adjust the quadrilateral automatically selected by the capture application 110.
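One plausible way to honor such successive corner selections, offered purely as an assumption about how the selection might be wired up, is to prefer the candidate quadrilateral whose vertices best match the corners tapped so far, and to fall back to the user's own four corners once all of them have been given:

```python
import numpy as np

def refine_with_user_corners(candidates, user_corners):
    """Pick the candidate quadrilateral that best matches the user's corners."""
    if len(user_corners) >= 4:
        # With four corners selected, the user's points define the quadrilateral.
        return [tuple(c) for c in user_corners[:4]]

    def mismatch(quad):
        verts = np.asarray(quad, dtype=float)
        # Sum, over the selected corners, of the distance to the nearest vertex.
        return sum(np.linalg.norm(verts - np.asarray(c, dtype=float), axis=1).min()
                   for c in user_corners)

    return min(candidates, key=mismatch)
```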
The user may also operate a selection and drag tool of the user interface 112 to identify the area of a custom quadrilateral. For example, FIG. 1 depicts a user's interaction to select the diagram of a person in the viewfinder 114 of the user interface 112. After semi-automatic correction, the method 400 proceeds to block 416.
At block 416, processing is performed to unwarp the perspective of the selected quadrilateral. This step may be performed upon a successful quadrilateral selection as determined at block 412, or after semi-automatic correction at block 414. In general, the unwarping is performed to produce an unwarped image, corresponding to the selected quadrilateral, that is adjusted to a rectangle. For example, the capture application 110 may determine the perspective based on the angles and ratios of the selected quadrilateral. The capture application 110 can also crop the captured image to correspond to the selected quadrilateral. Further, the capture application 110 may correct the perspective by rotating, resizing portions, and otherwise correcting the image to account for perspective distortion.
At block 418, visual enhancements are applied to the unwarped image. As described above in connection with the previous figures, the capture application 110 may apply various enhancements to the unwarped image.
Fig. 5 illustrates various components of an example device 500 that can be implemented as any type of portable and/or computer device described with reference to fig. 1 and 2 to implement embodiments of camera-based scanning. Device 500 includes communication devices 502 that enable wired and/or wireless communication of device data 504 (e.g., received data, data that is being received, data scheduled for broadcast, data packets of the data, etc.). The device data 504 or other device content can include configuration settings of the device, media content stored on the device, and/or information associated with a user of the device. Media content stored on device 500 can include any type of audio, video, and/or image data. Device 500 includes one or more data inputs 506 via which any type of data, media content, and/or inputs can be received, such as user-selectable inputs, messages, music, television media content, recorded video content, and any other type of audio, video, and/or image data received from any content and/or data source.
Device 500 also includes communication interfaces 508 that can be implemented as any one or more of a serial and/or parallel interface, a wireless interface, any type of network interface, a modem, and as any other type of communication interface. Communication interfaces 508 provide a connection and/or communication links between device 500 and a communication network by which other electronic, computing, and communication devices communicate data with device 500.
Device 500 includes one or more processors 510 (e.g., any of microprocessors, controllers, and the like) which process various computer-executable instructions to control the operation of device 500 and to implement embodiments of camera-based scanning. Additionally or alternatively, device 500 can be implemented with any one or combination of hardware, firmware, or fixed logic circuitry that is implemented in connection with processing and control circuits which are generally identified at 512. Although not shown, device 500 can include a system bus or data transfer system that couples the various components within the device. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures.
Device 500 also includes computer-readable media 514, such as one or more memory components, examples of which include Random Access Memory (RAM), non-volatile memory (e.g., any one or more of a read-only memory (ROM), flash memory, EPROM, EEPROM, etc.), and a disk storage device. A disk storage device may be implemented as any type of magnetic or optical storage device, such as a hard disk drive, a recordable and/or rewriteable Compact Disc (CD), any type of a Digital Versatile Disc (DVD), and the like. The device 500 may also include a mass storage media device 516.
Computer-readable media 514 provides data storage mechanisms to store the device data 504, as well as various device applications 518 and any other types of information and/or data related to operational aspects of device 500. For example, an operating system 520 can be maintained as a computer application with the computer-readable media 514 and executed on processors 510. The device applications 518 can include a device manager (e.g., a control application, software application, signal processing and control module, code that is native to a particular device, a hardware abstraction layer for a particular device, etc.). The device applications 518 also include any system components or modules to implement embodiments of camera-based scanning. In this example, the device applications 518 include a capture application 522 and an input driver 524 that are shown as software modules and/or computer applications. Alternatively or in addition, the capture application 522 and the input driver 524 may be implemented as hardware, software, firmware, or any combination thereof.
The device 500 also includes an audio and/or video input-output system 526 that provides audio data to an audio system 528 and/or provides video data to a display system 530. The audio system 528 and/or the display system 530 can include any devices that process, display, and/or otherwise render audio, video, and image data. These devices may include at least a camera 532 for enabling the capture of video and images. Video signals and audio signals may be communicated from device 500 to an audio device and/or a display device via an RF (radio frequency) link, S-video link, composite video link, component video link, DVI (digital video interface), analog audio connection, or other similar communication link. In an embodiment, the audio system 528 and/or the display system 530 are implemented as external components to device 500. Alternatively, the audio system 528 and/or the display system 530 are implemented as integrated components of the example device 500. Similarly, camera 532 may be implemented as an external or internal component of device 500.
Although embodiments of camera-based scanning have been described in language specific to structural features and/or methods, it is to be understood that the subject of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as example implementations of camera-based scanning.
Claims (20)
1. A method implemented by a computing device, the method comprising:
capturing an image in response to initiating a camera-based scan of the image;
automatically selecting one or more quadrilateral objects in the captured image for scanning;
determining whether the automatic selection of the one or more quadrilateral objects is successful based on whether the one or more quadrilateral objects have associated scores that exceed a predetermined threshold;
in an instance in which the automatic selection is determined to be unsuccessful based on the one or more quadrilateral objects having associated scores that do not exceed the predetermined threshold, making a user modification to the automatic selection using semi-automatic correction based on user input; and
creating one or more scanned documents from the portion of the image corresponding to the selected one or more quadrilateral objects, including correcting the portion for perspective distortion of the one or more quadrilateral objects in the captured image.
2. The method of claim 1, wherein automatically selecting one or more quadrilateral objects further comprises:
detecting an edge in the captured image based on differences in contrast; and
determining the one or more quadrilateral objects as a combination of the detected edges.
3. The method of claim 2, further comprising:
detecting the edges using a Canny algorithm and forming potential quadrilaterals from the identified edges using a Hough transform; and
applying a scoring scheme to the potential quadrilateral to determine one or more quadrilateral objects.
4. The method of claim 1, further comprising presenting a user interface having a portion for displaying the captured image and another portion for concurrently displaying at least one scanned document created from the captured image.
5. The method of claim 4, wherein the user interface is configured to:
presenting an indicator within the captured image to identify the selected one or more quadrilateral objects;
enabling a user to interact with the indicator to make a user modification of the automatic selection of one or more quadrilateral objects; and
updating and displaying at least one scanned document created according to the user modification in response to user interaction with the indicator.
6. The method of claim 1, further comprising:
performing the creating one or more scanned documents using automatic selection if the automatic selection is successful; and
if the automatic selection is not successful,
performing the creating one or more scanned documents using quadrilateral objects in the captured image selected by the semi-automatic correction.
7. The method of claim 6, wherein using semi-automatic correction comprises:
receiving a user input selecting a location in the captured image; and
automatically generating a corresponding quadrilateral based on the user input, the selected location being a corner of the corresponding quadrilateral.
8. The method of claim 1, further comprising applying one or more visual enhancements to improve the appearance of one or more scanned documents.
9. The method of claim 1, wherein correcting the portion for perspective distortion comprises adjusting each of the portions of the captured image to correspond to a rectangle.
10. A portable computing device, comprising:
a camera;
one or more processors coupled to the memory; and
a capture application stored in the memory and executable via the one or more processors to cause the portable computing device to perform camera-based scanning of images captured via the camera by at least:
selecting at least one quadrilateral from the captured image for scanning, the at least one quadrilateral corresponding to a rectangular object in the captured image, the selecting comprising:
performing an automatic selection to identify the at least one quadrilateral based on detection of one or more potential quadrilaterals in the captured image;
determining whether the automatic selection is successful by at least determining whether the at least one quadrilateral has an associated score that exceeds a predetermined threshold;
selecting the at least one quadrilateral identified by the automatic selection if the automatic selection is determined to be successful based on the associated score exceeding a predetermined threshold;
initiating semi-automatic correction to obtain a user selection to identify at least one quadrilateral and selecting at least one quadrilateral identified by semi-automatic selection if the automatic selection is determined to be unsuccessful based on the associated score not exceeding the predetermined threshold; and
processing the captured image to produce a scanned document corresponding to the selected at least one quadrilateral.
11. The portable computing device of claim 10, wherein the capture application is further configured to cause the portable computing device to perform the detection of the one or more potential quadrilaterals by at least:
detecting an edge in the captured image;
determining a line corresponding to the detected edge; and
generating a potential quadrilateral by combining the detected edges and lines.
12. The portable computing device of claim 11, wherein detecting an edge comprises applying an algorithm to find a sharp change in brightness corresponding to the edge.
13. The portable computing device of claim 11, wherein determining the line corresponding to the detected edge comprises applying a linear transformation to correct a defect of the detected edge.
14. The portable computing device of claim 10, wherein the performing automatic selection comprises applying a scoring scheme to one or more potential quadrilaterals detected in the captured image, the scoring scheme accounting for at least a location of the quadrilaterals in the image and a relative size of the quadrilaterals in the captured image.
15. The portable computing device of claim 10, wherein processing the captured image to produce a scanned document comprises:
cropping the captured image to correspond to at least one quadrilateral;
unwarping the cropped image to correct for perspective distortion; and
applying one or more visual enhancements to the cropped image.
16. The portable computing device of claim 10, wherein the capture application is further configured to cause the portable computing device to output a user interface having one portion for displaying the captured image and another portion for simultaneously displaying the scanned document resulting from processing of the captured image.
17. A method implemented by a computing device, the method comprising:
processing the captured image to create a scanned document from a selected portion of the captured image, the selected portion corresponding to a rectangular object in the captured image, the selected portion being automatically selected by the computing device based on the selected portion having an associated score that exceeds a predetermined score threshold;
initiating display of a user interface configured to present a split screen having a captured image on one side of the split screen and a scanned document created from the captured image on the other side of the split screen;
allowing user interaction with the captured image through the one side of the split screen to modify the selected portion of the captured image;
updating the scanned document displayed in the other side of the split screen to reflect modifications made through the user interaction;
wherein the method further comprises:
if the automatic selection is determined to be unsuccessful based on the associated score not exceeding the predetermined score threshold, making a user modification to the automatic selection using semi-automatic correction based on user input.
18. The method of claim 17, wherein the image is captured by a camera of the computing device.
19. The method of claim 18, wherein the computing device is a portable computing device.
20. The method of claim 17, wherein the image is captured by a camera separate from the computing device and transmitted to the computing device configured to process the captured image to create the scanned document.
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US24527809P | 2009-09-23 | 2009-09-23 | |
| US61/245,278 | 2009-09-23 | ||
| US12/578,445 | 2009-10-13 | ||
| US12/578,445 US8345106B2 (en) | 2009-09-23 | 2009-10-13 | Camera-based scanning |
| PCT/US2010/047061 WO2011037724A2 (en) | 2009-09-23 | 2010-08-28 | Camera-based scanning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1175049A1 (en) | 2013-06-21 |
| HK1175049B (en) | 2016-07-29 |