Why is there No Simple Text File of the Standard Works?
Moderator: EmilyHedrick
- marianomarini
- Senior Member
- Posts: 619
- Joined: Sat Jan 19, 2008 3:13 am
- Location: Vicenza. Italy
Re: Why is there No Simple Text File of the Standard Works?
There is also free software for extracting plain text from PDF files.
Have a look on the Internet.
-
- Community Moderators
- Posts: 4038
- Joined: Thu Jan 25, 2007 11:32 am
- Location: Dundee, Oregon, USA
Re: Why is there No Simple Text File of the Standard Works?
Marianomarini, thanks for that mention. Perchance, are you aware of any of those free programs that can unravel the two-column format of the PDF files? Last I checked, the downloadable scripture PDF files are in two-column format, the same as the paper scriptures. The pdftotext tool I tried early this year did extract plain text, but the text was somewhat tangled due to the two-column format. To have usable plain text would have still required manual or automated untangling.
- marianomarini
- Senior Member
- Posts: 619
- Joined: Sat Jan 19, 2008 3:13 am
- Location: Vicenza. Italy
Re: Why is there No Simple Text File of the Standard Works?
You can check VeryPDF; it can preserve even the two-column layout in the text file. I'll have a look.
-
- New Member
- Posts: 14
- Joined: Sun Mar 06, 2011 6:28 am
Re: Why is there No Simple Text File of the Standard Works?
No one answered the question 'why'. Lacking an answer from the church, I'd guess that it has to do with control of the copyrighted text.
There are lots of ways to 'hack' a text copy of the scriptures based on files you can download from lds.org/media-library/ebooks. I will explain one relatively easy method below that doesn't require any programming or scripting knowledge. But I will preface that with my worry that it is hard to guarantee the accuracy of the output. I would hate, for example, to generate an ASCII copy of the text and have there be errors in it, especially if sharing it with someone else. Anyway...
First, grab the epub version of the scriptures from lds.org. It appears that the epub versions don't have footnotes etc. to start with, which is what seems to be desired here.
Second, get a free and open-source program called 'calibre'. It is an e-reader/ebook manager that many people apparently like. I was able to easily install it on my Linux-based computer. It appears they have versions for macOS and Windows as well. I installed it on Linux using my distribution-provided package, something that the folks at calibre apparently recommend against, but it worked fine for me for this purpose.
Use the command line as follows:
ebook-convert input.epub output.txt
where input and output are the base filenames. For example, to convert the Book of Mormon:
ebook-convert book-of-mormon-eng.epub book-of-mormon-eng.txt
That will take the epub file book-of-mormon-eng.epub (downloaded from lds.org) and output a text file of the same name, except with a .txt extension instead of .epub.
You should be able to do that for each of the 4 books of the standard works, resulting in 4 text files. You could merge those 4 resulting text files into a single file using various means.
Note: if you try to use the .pdf file as the source in the above conversion, the results may not be what you want, because it tries to deal with all the footnotes, and it appears there may be other problems as well.
Note: I read about this calibre method here: https://askubuntu.com/questions/102458/ ... ext#102475
I hope this helps someone.
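For anyone who does want to script the convert-and-merge steps, here is a minimal Python sketch. It is not part of the method described above, just one possible way to automate it. It assumes calibre's ebook-convert is on your PATH and that the converted .txt files are UTF-8 encoded. The helper names (convert_epub_to_text, merge_text_files) and the merged file name are made up for illustration, and the only epub file name taken from this thread is book-of-mormon-eng.epub.

```python
import shutil
import subprocess
from pathlib import Path


def convert_epub_to_text(epub_path: Path) -> Path:
    """Run calibre's ebook-convert on one epub and return the .txt path."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    txt_path = epub_path.with_suffix(".txt")
    # Same invocation as in the post: ebook-convert input.epub output.txt
    subprocess.run(["ebook-convert", str(epub_path), str(txt_path)], check=True)
    return txt_path


def merge_text_files(txt_paths, merged_path: Path) -> None:
    """Concatenate text files in the given order, one blank line after each."""
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in txt_paths:
            # Assumption: the input files are UTF-8 text.
            out.write(Path(path).read_text(encoding="utf-8").rstrip("\n"))
            out.write("\n\n")


# Example usage (file names other than book-of-mormon-eng.epub are hypothetical):
# epubs = [Path("book-of-mormon-eng.epub"), Path("another-book.epub")]
# merge_text_files([convert_epub_to_text(p) for p in epubs], Path("merged.txt"))
```

The same accuracy caveat applies: nothing here checks the converted text against the original, so spot-check the output before relying on it.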
-
- New Member
- Posts: 14
- Joined: Sun Mar 06, 2011 6:28 am
Re: Why is there No Simple Text File of the Standard Works?
Update to my previous post:
The Book of Mormon epub file I downloaded from lds.org today (30 Oct 2017) has no footnotes and converted beautifully to plain text as per my post above.
However, the other epub scripture files on lds.org today all do have footnotes (contrary to my poor assumption). And the table entries for Doctrine and Covenants, Pearl of Great Price, and Triple Combination all link to the same Triple Combination epub file.
I'm in the process of trying to convert the Holy Bible epub to text, but it is taking my less-than-powerful PC a very long time. I suspect it will try to handle the footnotes, so it may not produce clean plain text.
I think there may be other non-LDS resources out on the internet for getting plain text versions of the Bible, including the King James Version.
For Doctrine and Covenants and Pearl of Great Price, you may have to hack a little harder to strip out the footnotes etc.
For what it is worth.
-
- Community Moderators
- Posts: 4038
- Joined: Thu Jan 25, 2007 11:32 am
- Location: Dundee, Oregon, USA
Re: Why is there No Simple Text File of the Standard Works?
Excellent! Thank you for posting those pointers and instructions.
-
- New Member
- Posts: 14
- Joined: Sun Mar 06, 2011 6:28 am
Re: Why is there No Simple Text File of the Standard Works?
At least with epub it should be easier to strip out the footnotes and superscripts, and you don't have to deal with the two-column format of the PDF files. Epub is basically a zipped archive of XHTML files (HTML written as XML). I think Python (and other languages) should be quite capable of parsing out the text.
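As a rough illustration of that idea, here is a small Python sketch using only the standard library. It is untested against the actual lds.org files and rests on two assumptions: that footnote markers are wrapped in <sup> tags (I have not verified the real markup), and that sorting the file names inside the archive gives the reading order (the true order is listed in the epub's package file, which this sketch ignores). The function names html_to_text and epub_to_text are made up for illustration.

```python
import re
import zipfile
from html.parser import HTMLParser

# Tags whose content is dropped. "sup" is an ASSUMPTION about how the
# footnote markers are marked up; adjust after inspecting the real files.
SKIP_TAGS = {"sup", "script", "style", "head"}
# Tags that start or end a line of output text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse source line wrapping so only block tags break lines.
            self.chunks.append(re.sub(r"\s+", " ", data))


def html_to_text(markup: str) -> str:
    """Return the visible text of one (X)HTML document, one block per line."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).split("\n"))
    return "\n".join(line for line in lines if line)


def epub_to_text(epub_file) -> str:
    """Extract text from every (X)HTML member of an epub (path or file object)."""
    parts = []
    with zipfile.ZipFile(epub_file) as archive:
        # ASSUMPTION: alphabetical member order matches reading order.
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = archive.read(name).decode("utf-8", errors="replace")
                parts.append(html_to_text(markup))
    return "\n\n".join(part for part in parts if part)
```

Calling epub_to_text("book-of-mormon-eng.epub") would return the whole book as one string. If the footnotes turn out to use some other tag or a class attribute, the skip rule would need to change accordingly.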
-
- Community Moderators
- Posts: 4038
- Joined: Thu Jan 25, 2007 11:32 am
- Location: Dundee, Oregon, USA
Re: Why is there No Simple Text File of the Standard Works?
ross.rick wrote: At least with epub it should be easier to strip out the footnotes and superscripts--and you don't have to deal with the two column format of the PDF files. Epub is basically a zipped archive of html files, based on XML. I think python ( and other languages) should be very capable to parse out the text.
Based on the epub files reportedly being HTML on the inside, another possible solution would be the html2text utility available for Linux and likely for other systems.
- aebrown
- Community Administrator
- Posts: 15155
- Joined: Tue Nov 27, 2007 8:48 pm
- Location: Draper, Utah
Re: Why is there No Simple Text File of the Standard Works?
ross.rick wrote: No one answered the question 'why'. Lacking an answer from the church, I'd guess that it has to do with control of the copyrighted text.
There are lots of ways to 'hack' a text copy of the scriptures based on files you can download...
These are helpful techniques, and it's nice that you shared them. But everyone should be aware that content derived from copyrighted content is still under the same copyright restrictions as the original. So text obtained this way cannot be used in apps or published in other ways without specific permission from the Church Intellectual Property Office. But if it's for personal use, then you can use the derived text in the same ways you might use the original text.
-
- New Member
- Posts: 1
- Joined: Sat Jul 27, 2019 5:31 pm
Re: Why is there No Simple Text File of the Standard Works?
brettbkg wrote: I wish I had the skills to write such a script. I work in healthcare and I'm not deeply technical, so I'm afraid I just had to do it the long and hard way. The work has been done (it took a few hours). I'm happy to make it available to others in my situation who can't write scripts to automate it (I could link to it in a public folder in my Dropbox). I just don't know if anyone would frown on that from a legal/copyright standpoint.
I would love a copy of your spreadsheet if you are sharing. : ) My email is nahomie10@gmail.com