AzureSearch_JFK_Files | repo contains the sample code | Azure library
kandi X-RAY | AzureSearch_JFK_Files Summary
This repo contains the sample code of the Azure Search and Cognitive Services used to provide insights and analysis around the JFK Files.
Community Discussions
Trending Discussions on AzureSearch_JFK_Files
QUESTION
Just looking for guidance or even a general outline on approach here.
I am using Azure Search to OCR a batch of PDFs. I have turned on hit highlighting and I am successfully getting results back, which I loop through and display in my view for the end user. I was looking to expand that functionality to show the PDF page images with the highlighting drawn on the images themselves, as in the JFK Azure example. I am not proficient in React and seem to be getting lost there.
I am assuming I need to save the OCR images to a data store for reference, using the normalized_images that are created? I do have the PDFs locally and can load them, but I assume the OCR images may be different. I have turned on GeneratedNormalizedImagesPerPage and turned on the cache, which creates files in my storage account.
Then I assume I need to pull the associated image, display it, and use the highlight results to pull the corresponding bounding box where the phrase was detected? The problem with that approach is that I do not see any association between the highlight hit and the location (bounding box) of the hit, nor the image file the hit was on.
Probably way off on approach here but any guidance is appreciated.
Edit 1: I did notice the items on this page in the JFK example: https://github.com/microsoft/AzureSearch_JFK_Files/tree/master/JfkWebApiSkills/JfkWebApiSkills Would replicating the ImageStore (so the images are stored in my storage account) and then the HocrGenerator (which appears to handle points in a doc) in the skillset for my index be the right approach?
ANSWER
Answered 2021-Feb-08 at 17:56
There are a few steps here:
1. Save the layoutText from the OCR skill somewhere the UI can access it. The JFK Files demo converts it to HOCR (to display in the UI) and saves it as a field in the index so that it is retrieved with the search results. HOCR isn't necessary, and you may find it more efficient to store the layout in blobs using a knowledge store object projection.
2. Save the extracted images into blob storage using a file projection into the knowledge store. Keep in mind that the images may be resized in the process, and the coordinates will match the resized image saved to the store. If you want to map the coordinates to the original image see this.
3. At search time, map the highlight to the metadata. You will find this code in the nodejs frontend, however it may be simpler to follow in the original demo by following the code here. Essentially you just find the first occurrence of the highlighted word in the metadata, display the associated image, and calculate the bounding region of the word.
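The search-time mapping in the last step can be sketched roughly as follows. This is a minimal illustration, not the code from the JFK Files demo. It assumes each page's layoutText was stored as a JSON object with a "words" array, where every word has a "text" string and a "boundingBox" list of four {x, y} points, and that highlights come back wrapped in the default <em> tags. The helper names (highlighted_terms, find_first_hit, scale_box) are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumption: hit highlighting uses the default <em>...</em> tags.
HIGHLIGHT_RE = re.compile(r"<em>(.*?)</em>")

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def highlighted_terms(highlights: List[str]) -> List[str]:
    """Return distinct highlighted terms, lowercased, in order of appearance."""
    seen: Dict[str, None] = {}
    for fragment in highlights:
        for term in HIGHLIGHT_RE.findall(fragment):
            seen.setdefault(term.lower(), None)
    return list(seen)


def find_first_hit(pages: List[dict], terms: List[str]) -> Optional[Tuple[int, Box]]:
    """Find the first OCR word that matches any highlighted term.

    pages: one layoutText-style dict per stored page image, in page order
    (assumed shape, see the lead-in). Returns (page_index, box) or None.
    """
    wanted = set(terms)
    for page_index, layout in enumerate(pages):
        for word in layout.get("words", []):
            # OCR words often carry trailing punctuation; strip it before comparing.
            token = word["text"].strip(".,;:!?\"'()").lower()
            if token in wanted:
                xs = [p["x"] for p in word["boundingBox"]]
                ys = [p["y"] for p in word["boundingBox"]]
                # Axis-aligned rectangle enclosing the four corner points.
                return page_index, (min(xs), min(ys), max(xs), max(ys))
    return None


def scale_box(box: Box, stored_size: Tuple[int, int], shown_size: Tuple[int, int]) -> Box:
    """Rescale a box from the stored image's pixel size to the size shown in the UI."""
    sx = shown_size[0] / stored_size[0]
    sy = shown_size[1] / stored_size[1]
    left, top, right, bottom = box
    return (left * sx, top * sy, right * sx, bottom * sy)
```

The returned page index identifies which stored image to display, and the box is expressed in that stored image's pixel coordinates, so it needs rescaling (as in scale_box) if the UI renders the image at a different size. Multi-word phrases would need matching across consecutive words, which this sketch does not attempt.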
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported