Do you want to easily convert HTML to Markdown? Simply by copy & paste?
I have a series of blog posts written in HTML with TinyMCE that I want to convert to Markdown Extra. There are several scripts for this task:
- html2text by Aaron Swartz, a "pioneer of the Internet" who sadly passed away
- Markdownifier
- Markdownify (Text in German)
None of them really satisfied me. Furthermore, I want to convert my blog posts without having to save each of them to a file. So I've written a little JavaScript script.
Because I'm using regular expressions, the conversion is limited, and the HTML must be well formed and free of errors. The tags <p> <br> <strong> <em> <h1> <h2> <h3> <h4> <h5> <a> <li> <table> are simply replaced with their Markdown equivalents. A conversion with this script is therefore not fully automatic, but for my task it is perfect.
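To give an idea of what tag replacement with regular expressions looks like, here is a minimal sketch. It is not the source of the converter itself: the function name `htmlToMarkdown` and the exact patterns are assumptions made for illustration, tables are left out, and it only works on well-formed HTML without nested block elements.

```javascript
// Hypothetical sketch of regex-based tag replacement (not the original script).
// Assumes well-formed HTML; each rule rewrites one kind of tag.
function htmlToMarkdown(html) {
  return html
    // <h1>..<h5> become "#".."#####" headings
    .replace(/<h([1-5])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
      (match, level, text) => "#".repeat(Number(level)) + " " + text.trim() + "\n\n")
    // <strong> and <em> become ** and *
    .replace(/<strong\b[^>]*>([\s\S]*?)<\/strong>/gi, "**$1**")
    .replace(/<em\b[^>]*>([\s\S]*?)<\/em>/gi, "*$1*")
    // <a href="url">text</a> becomes [text](url)
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // <li> becomes a "- " list item; the wrapping <ul> tags are dropped
    .replace(/<li\b[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n")
    .replace(/<\/?ul\b[^>]*>/gi, "\n")
    // <br> becomes a hard line break (two trailing spaces)
    .replace(/<br\s*\/?>/gi, "  \n")
    // <p> becomes a paragraph followed by a blank line
    .replace(/<p\b[^>]*>([\s\S]*?)<\/p>/gi, "$1\n\n")
    // collapse runs of three or more newlines, then trim the ends
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

console.log(htmlToMarkdown('<h2>Foo</h2><p>Some <strong>bold</strong> text.</p>'));
```

Any tag without a rule is left in the output untouched, which is why the result usually needs a manual pass afterwards.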
Converter
Source Code
Source code on GitHub.
The JavaScript source file.
License: GPL 3.
Appendix
Japanese Overview
There is an overview on a Japanese site.
Google Translated Text
11 libraries (+ α) worth knowing when converting HTML to Markdown
We examined a few libraries and tools that convert HTML to Markdown notation.
<h2>Foo</h2> → ## Foo
The use case here is not embedding a converter in an app. It is wanting to re-edit WordPress articles written before the move to Markdown, or wanting to get back to Markdown when a post's text has somehow been saved as converted HTML. Once I looked into it, I found quite a lot of options.
Comparison of 11 libraries, plus tools and editors
Library feature list.
Name | Language | License | Extended notation |
---|---|---|---|
reMarked.js | JavaScript | MIT | ○ |
to-markdown | JavaScript | MIT | × |
HTML2 Markdown | JavaScript | unknown | × |
Simple HTML to Markdown Extra converter with regex | JavaScript | GPLv3 | ○ |
Markdownify | PHP | LGPL | ○ |
HTML To Markdown for PHP | PHP | MIT | × |
html2text | Python | GPLv3 | × |
url2markdown (using html2text) | Python | GPLv3 | × |
reverse_markdown | Ruby | WTFPL | ○ |
html2markdown | Ruby | MIT | × |
Html2Markdown | C# | unknown | × |
Pandoc (conversion tool for various formats) | - | - | ○ |
Markable (Online Editor) | - | - | × |
※ I only tried some of these briefly, so I have not verified that they convert everything without problems. There are various extended notations, so I marked a library "○" if it seemed to support at least some of them.
One thing to watch in the conversion behavior: where Markdown and HTML can be mixed, as in the body of a WordPress post, information is lost unless tags such as script are output as raw HTML after conversion. A proper (?) library includes a step that deletes script tags, since they have no meaning as document content. Conversely, a simple conversion library just replaces the HTML tags it knows and outputs the tags it cannot handle as they are.
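The difference between the two behaviors can be sketched in a few lines of JavaScript. This is an illustration only, not code from any of the libraries below: the function `convertHeadings` and its `dropScripts` option are hypothetical names, and the sketch converts nothing but h2 headings.

```javascript
// Hypothetical sketch: one converter, two ways of treating <script>.
function convertHeadings(html, options = {}) {
  let out = html;
  if (options.dropScripts === true) {
    // "Proper" library behavior: script elements are removed entirely
    out = out.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
  }
  // The only tag this sketch converts; every other tag passes through as is
  out = out.replace(/<h2\b[^>]*>([\s\S]*?)<\/h2>/gi,
    (match, text) => "## " + text.trim() + "\n\n");
  return out.trim();
}

const post = '<h2>Foo</h2><script>track();</script><div class="x">bar</div>';
console.log(convertHeadings(post));                        // script kept as raw HTML
console.log(convertHeadings(post, { dropScripts: true })); // script removed
```

With pass-through, the script survives a round trip through a mixed Markdown and HTML body. With deletion, it is gone for good.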
In the following, "early Markdown" means Markdown 1.0.1, without any extensions. I do not know whether each library is actually 100% compliant.
reMarked.js
- JavaScript, MIT
- demo
reMarked.js is a JavaScript library that supports not only the early Markdown notation but also tables. The demo site converts HTML text that you supply, which is handy for small conversions. It is used by pronama.jp/md.
It also offers a variety of conversion options (not available on the demo site). By default, the contents of script are not output, but options let you specify which tags are output and which are not.
var reMarker = new reMarked({unsup_tags: {ignore: ""}}); // ignore list is empty, so no tags are dropped
var markdown = reMarker.render(document.body);
to-markdown
- JavaScript, MIT
- demo
to-markdown is a simple JavaScript library. The demo site shows a bug related to blockquote, and unfortunately development seems to have stopped.
Simple HTML converts without problems, and you can try it right away on the demo site. Content that cannot be converted is output as is.
HTML2 Markdown
- JavaScript, license unknown
- No demo site
It seems to do the conversion with an HTML parser library. I have not tried it.
HTML to Markdown converter using regex
- JavaScript, GPLv3
- demo
This is simple JavaScript code that converts HTML to Markdown with regular expressions. Its functionality is limited, but it covers some extended notation.
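As an illustration of what extended notation through regular expressions can look like, the sketch below turns a simple HTML table into a pipe table in the Markdown Extra style. It is not taken from the converter: `tableToMarkdown` is a hypothetical name, and the sketch assumes a well-formed table whose first row is the header, with no nested tables and no colspan or rowspan.

```javascript
// Hypothetical sketch: simple HTML table to a Markdown Extra pipe table.
function tableToMarkdown(html) {
  const rows = [];
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  let rowMatch;
  while ((rowMatch = rowPattern.exec(html)) !== null) {
    const cells = [];
    // <th> and <td> are both treated as plain cells
    const cellPattern = /<t[hd]\b[^>]*>([\s\S]*?)<\/t[hd]>/gi;
    let cellMatch;
    while ((cellMatch = cellPattern.exec(rowMatch[1])) !== null) {
      cells.push(cellMatch[1].trim());
    }
    rows.push(cells);
  }
  if (rows.length === 0) return "";
  const lines = rows.map((cells) => "| " + cells.join(" | ") + " |");
  // The first row is taken as the header, so a separator row goes after it
  const separator = "| " + rows[0].map(() => "---").join(" | ") + " |";
  lines.splice(1, 0, separator);
  return lines.join("\n");
}

console.log(tableToMarkdown(
  "<table><tr><th>Name</th><th>License</th></tr>" +
  "<tr><td>reMarked.js</td><td>MIT</td></tr></table>"
));
```

Cell contents are copied verbatim, so a cell that itself contains a pipe character would break the resulting table.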
Markdownify
- PHP, LGPL
- No demo site
- http://markdownrules.com/ seems to be a conversion service using Markdownify
Markdownify is a library that supports Markdown Extra. It can also convert tables and HTML attribute values to Markdown.
<?php
$converter = new Markdownify\ConverterExtra;
$converter->parseString('<h1 id="md">Heading</h1>');
// Returns: # Heading {#md}
?>
It provides one class for converting to early Markdown and one for converting to the extended Markdown. Content that cannot be converted is output as is, but content such as script is not output. Reading the code shows that conversion options exist as well, but to output all unconvertible content as is you need to modify the code.
My impression was that it supports the most extended notation (though I do not know how well it converts). However, development has stopped and the official site is down.
HTML To Markdown for PHP
- PHP, MIT
- No demo site
HTML To Markdown for PHP is a PHP library that supports early Markdown. It has fewer features but does offer conversion options.
html2text
- Python, GPLv3
- demo
html2text is an old Python library. It supports the early Markdown notation. There is a demo that converts web pages from URLs, but it seems to fail most of the time.
Content that cannot be converted is deleted. Here too, reading the code reveals conversion options, but they do not cover every output setting.
import html2text
h = html2text.HTML2Text()
h.ignore_links = True  # keep the link text, drop the link target
print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
My impression is that many tools use html2text, perhaps because it is such an old library. Markdown Service Tools, a set of tools for OS X, used it as well.
url2markdown
- Python, GPLv3
- demo
url2markdown is a library that fetches a web page from a URL and converts it to Markdown. The conversion part is html2text. However, it uses Readability's Parser API to fetch the web page, so the results can be outdated and information is lost.
reverse_markdown
- Ruby, WTFPL
- No demo site
reverse_markdown is a Ruby library distributed as a gem. I have not actually tried it, but according to its description, tables are also supported.
html2markdown
- Ruby, MIT
- No demo site
html2markdown is also a Ruby library distributed as a gem. Only simple conversion is supported, as its description "Simple html to Markdown" says. It was also on Bitbucket.
Html2Markdown
- C#, license unknown
- No demo site
Html2Markdown is a C# library, also published on NuGet. Looking at the code, the conversion is fairly simple.
Pandoc (conversion tool)
Pandoc is a tool that can convert between various document formats, including HTML to Markdown. It supports various Markdown notations. However, the table result was questionable, with missing columns. For details on Pandoc, see the following article.
HTML - Supports a variety of formats! Get to know the document conversion tool Pandoc - Qiita
Markable (Online Editor)
Markable is an online Markdown editor. Here I cover its HTML to Markdown conversion feature. Once you register an account, you can use the HTML import feature. The conversion from HTML seems to support only the early Markdown notation, but the Markdown to HTML conversion also supports extended notation such as tables.
Among the online editors I examined, only Markable supported conversion from HTML.
In conclusion
At first I found only a few libraries and checked how they worked, but the more I searched, the more turned up. Everyone is making one.
If you just want to convert a little, the reMarked.js demo site is handy. reMarked.js itself seems ultimately aimed at integration with existing WYSIWYG HTML editors.
Please let me know if you have any other information or mistakes.