note-parser | Parse notes with javascript | Parser library
kandi X-RAY | note-parser Summary
Parse notes with javascript
Community Discussions
Trending Discussions on note-parser
QUESTION
I am creating a parser for the .one file extension, which I will add to the Apache Tika project when it is finished.
Here is the open source project I'm creating, licensed under the Apache License 2.0: https://github.com/nddipiazza/onenote-parser-java
I used the specification document here: https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-one/73d22548-a613-4350-8c23-07d15576be50
As a starting point, I ported over the code from this open source C++ project: https://github.com/dropbox/onenote-parser
I have gotten a long way in parsing the documents, but I've hit a roadblock.
Here is the OneNote file I'm using to parse: https://drive.google.com/file/d/1uROTEnKeBKU08CG_K5zdDTGHa178LgBK/view?usp=sharing
I am unable to see Section1TextArea1 and Section1TextArea2 in my parsed results, so I must be missing some key parsing step.
The text is definitely in the OneNote file itself; I can see it in a hex viewer.
Here is the JSON parse output: https://gist.github.com/nddipiazza/02d2252d357b3b02a6b9ab1050474267
I feel like the spec document is missing some very important information needed in order to parse this proprietary format.
What major element(s) am I missing that keeps me from getting the actual text content?
ANSWER
Answered 2019-Dec-07 at 17:26
I figured it out. It was a matter of understanding that property values in OneNote can have any of the following:
- Binary contents
- ASCII text contents
- UTF-16LE contents
All three kinds are sprinkled throughout the file.
I also just went ahead and parsed the entire root file tree. This results in lots of duplicate text, but I don't really care.
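As a rough illustration of handling the three kinds of property value listed above, here is a minimal Java sketch. It is not the code from onenote-parser-java or Tika: the class name `PropertyValueSketch`, the `Kind` enum, and the detection heuristic are all assumptions made for this example. In particular, the UTF-16LE check only recognizes Basic Latin text (printable low byte, zero high byte), which a real parser would likely need to widen.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses how a raw property value is encoded and decodes it.
final class PropertyValueSketch {
    enum Kind { BINARY, ASCII, UTF16LE }

    // Printable ASCII plus tab, line feed, and carriage return.
    private static boolean isTextByte(byte b) {
        return (b >= 0x20 && b <= 0x7e) || b == '\t' || b == '\n' || b == '\r';
    }

    // Assumption: treat the value as UTF-16LE only when every code unit
    // is a Basic Latin character (low byte printable, high byte zero).
    private static boolean looksLikeUtf16Le(byte[] data) {
        if (data.length == 0 || data.length % 2 != 0) {
            return false;
        }
        for (int i = 0; i < data.length; i += 2) {
            if (!isTextByte(data[i]) || data[i + 1] != 0) {
                return false;
            }
        }
        return true;
    }

    private static boolean looksLikeAscii(byte[] data) {
        if (data.length == 0) {
            return false;
        }
        for (byte b : data) {
            if (!isTextByte(b)) {
                return false;
            }
        }
        return true;
    }

    static Kind guessKind(byte[] data) {
        if (looksLikeUtf16Le(data)) {
            return Kind.UTF16LE;
        }
        if (looksLikeAscii(data)) {
            return Kind.ASCII;
        }
        return Kind.BINARY;
    }

    // Text values are decoded to a String; binary values are shown as lowercase hex.
    static String decode(byte[] data) {
        switch (guessKind(data)) {
            case UTF16LE:
                return new String(data, StandardCharsets.UTF_16LE);
            case ASCII:
                return new String(data, StandardCharsets.US_ASCII);
            default:
                StringBuilder hex = new StringBuilder();
                for (byte b : data) {
                    hex.append(String.format("%02x", b & 0xff));
                }
                return hex.toString();
        }
    }
}
```

Guessing from the bytes alone is fragile; where the property type is known from the file structure, it would be safer to choose the decoder from that type instead.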
The project is updated with test cases and the fix here: https://github.com/nddipiazza/onenote-parser-java/tree/master/src/main/java/org/apache/tika/onenote
UPDATE:
Just created the apache tika PR: https://github.com/apache/tika/pull/300
QUESTION
I am trying to write a parser for OneNote files.
I would like to get a complete mapping of all Property IDs to Property Names.
Here is what I have so far: https://github.com/nddipiazza/onenote-parser-java/blob/5e291a7e6666b4ee62e0f13d9422ca5b4f223e6f/src/main/java/org/apache/tika/onenote/Properties.java
But I cannot find various other ones that appear in documents, such as 0x348b.
Where can I find a complete, definitive list?
ANSWER
Answered 2019-Nov-23 at 14:14
Ah, I just didn't google hard enough.
Here it is:
My list was complete from the above link.
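For readers building a similar ID-to-name table, here is a minimal Java sketch of a lookup that falls back to the raw hex value when an ID is not in the table, so unknown properties stay visible in the parse output. The class `PropertyNameLookup`, its methods, and the sample entry are hypothetical and are not taken from Properties.java or from the specification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table from property ID to a human-readable name.
final class PropertyNameLookup {
    private final Map<Long, String> names = new HashMap<>();

    // Adds or overwrites one ID-to-name entry.
    void register(long propertyId, String name) {
        names.put(propertyId, name);
    }

    // Returns the registered name, or a hex placeholder for unknown IDs.
    String describe(long propertyId) {
        String name = names.get(propertyId);
        return name != null ? name : String.format("Unknown(0x%x)", propertyId);
    }
}
```

A possible usage, with a made-up entry: after `lookup.register(0x1234L, "HypotheticalProperty")`, calling `lookup.describe(0x1234L)` returns the registered name, while `lookup.describe(0x348bL)` returns a placeholder containing the hex ID, which makes it easy to spot which IDs still need a name.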
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported