Tyrel's Blog

Code, Flying, Tech, Automation

Dec 16, 2022

Advent of Code 2022 + End of Year Updates

Advent of Code is kicking my butt this year, so I haven't really been doing any tech blogging lately. If you want to follow my progress, I think I might be done as of day 15, which seems to be related to a traveling salesman/knapsack problem. Here's my repo: https://gitea.tyrel.dev/tyrel/advent-of-code/src/branch/main/2022/python.

I'm not on the computer that runs it, but I've been spending a lot of time playing with Apple's System 7 in the BasiliskII emulator. Might have some fun projects with that coming up, but I wanted to do some more learning before I start anything. So I have been going through a course on 6502 Assembly programming for the NES, and I'm about 73% done with that. It's really great!

It's by Gustavo Pezzi at Pikuma. If "oldschool" programming floats your boat, then I definitely recommend it. It's all programming through making ROMs with the CC65/CA65 assembler and using FCEUX to see your results. Super neat.

I've been picking up some more Go work at work. My current team is sort of disbanding so I'm going to be moving away from doing just Python. It's been a year since I've done Go stuff, since I left Tidelift, so I'm really rusty.

Speaking of Rust, I was also trying to do Advent of Code in Rust, and made it TWO whole days. It's still on my bucket list of stuff to learn, but my free time seems to be running out lately, and I have a lot of things on my plate to get done.

 · · ·  python  adventofcode  6502  assembly  rust  go

Dec 06, 2022

Notary Public

Short update today.

I kept meaning to put together a way for people to utilize my notary services, so I finally made https://tyrel.bike/notary to redirect here to my Notary Public page.

I am a North Carolina Notary Public, and love helping my neighbors use my services.

If you're near me and need help, check out my Notary Page.

 · · ·  notary

Nov 11, 2022

Coffee Gear

I put this up on my wiki a bit ago when a friend asked for coffee recommendations. Hopefully you can enjoy it and learn about some coffee machines you don't know yet.

What I drink

Woke Living Coffee

My favorite coffee is from Pamela and Marcus at https://wokelivingcoffee.com/. They are a couple of local to me roasters in Wake Forest, NC who have connections to a farm in La Dalia, Nicaragua. Not only do they sell great coffee, they are extremely nice and we visit them any chance we get at our local Black Farmers Market.

Gear

Grinders

I prefer burr grinders. There's documented evidence that they are better, but I won't get into that here.

Baratza

I use a Baratza Virtuoso that I picked up refurbished. It works great! For drip coffee I will grind at step 28, for aeropress I will grind at 20, and for french press I will set to 30.

Hario

For travel I will bring my Hario Skerton, works great, super easy to clean. I usually don't change the grind setting while traveling so I don't complain about the annoying screw post to set it.

Brewing Machines

Technivorm

For my daily coffee, I have a Moccamaster Technivorm. My friend Andrey recommended it. It works extremely well and gives very consistent pours.

Chemex

When I'm feeling fancy - or I'm trying a new coffee - I will break out my Chemex. I do a 1:16 ratio of beans to water. I use the Brown Paper Chemex Filters. I appreciate the bleached papers, but prefer unbleached.

Aeropress

For camping, I will bring my Aeropress. It's plastic, lightweight, and in my experience it is indestructible for travel.

French Press

When I roast my own coffee, I like to experience it in multiple brewing methods. I have a Bodum Bean French Press I got over a decade ago as a gift that has worked great. This one has an o-ring to seal the pouring spout, so the temperature chamber inside doesn't leak - a feature I like.

Roasting Machines & Software

FreshRoast SR700

I have a glorified popcorn maker, the SR700, as a roaster. I'm not the biggest fan of it: the built-in software is a mess, and the manual buttons on it are a nightmare to use. It works, but I can only get consistent coffee out of it if I use OpenRoast. It has a USB port so you can control it with software.

OpenRoast

The OpenRoast software is okay, but I don't have a temperature probe on my SR700, so I can only see "what is set" for temperature, and not get an accurate reading as I would if I were using something like Artisan. I could set up a PID server on an Arduino and plug it into the USB port, but I feel at that rate I'd rather just buy a new roaster that works with better software. I do like that OpenRoast is in Python so I can read and write the code.

 · · ·  coffee gear  coffee beans  coffee

Nov 04, 2022

Neighbor's Water Heater Automation (part 1)

The Setting

My neighbor has a Bosch tankless water heater he put in last year. This water heater has one slight problem: whenever the power blips for even a single second, it gets set back to its lowest temperature of 95°F. My neighbor (we'll call him Frank for this post because Frank Tank is funny) wants to set his heater to 120°F in his house. The problem arises in that his water heater is under the house in his crawl space.

Without an easy way to set his temperature, he needs to crawl under his crawl space and turn a dial EVERY. SINGLE. TIME.

He asked me if I knew of anything off the shelf that would help. I did not. So I said the only logical thing someone like me would have done. "I can totally automate that!"

The Lay Of The Land

He has a Bosch Tronic 6000C, with what appears to be a rotary encoder knob to set the temperature. I only spent a few minutes under his house while planning this and didn't think to do any measuring of how many detents to rotate, or how long the dial took to rotate to 120°F, so my first pass of this project is done with estimations.

bosch heater with a temperature 7 segment LED set to 120F

Project Time - Round 1!

I have a few random servos lying around, and a NodeMCU ESP8266 module. I figured these would be the perfect solution! ... note: was half right...

I found some code online by Kumar Aditya that is for the two items in my current parts list (ESP8266 and SG90)

The original code runs a web server on port 80, and serves a web page with some jQuery (wow it's been a while) to change the angle of the servo. I realized this wasn't what I needed because my servos could only go 180° and I might need to go multiple rotations. I found a YouTube video on how to make an SG90 run infinitely in either direction, so I did those modifications. I then modified the front end code a little bit.

The new code on the back end was actually exactly the same, even though the effect was slightly different. It would run on port 80, listen at / and /angle, but the angle here was more of direction and speed (a vector?). The way the servo was built, 160° was "Stop", higher than that was rotate clockwise, lower was rotate counter clockwise.

I put three buttons on my page that would be "Lower" (150), "STOP" (160), and "Higher" (170). I then did some standard debouncing and disabling of buttons using setTimeout and such.

For a final touch I added in a range slider for "Time". This held how many seconds after pressing Higher or Lower, that I would send the STOP command again.
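The button-and-timer behaviour described above can be sketched roughly like this. It is a Python illustration of the logic only, not the jQuery or ESP8266 code from the project, and the names ANGLES and plan_commands are made up for the example.

```python
# Pseudo-"angles" for the modified continuous-rotation SG90, as described above.
# 160 means stop, lower values rotate one way, higher values the other.
ANGLES = {"lower": 150, "stop": 160, "higher": 170}


def plan_commands(direction: str, seconds: float) -> list[tuple[float, int]]:
    """Return (delay_in_seconds, angle) pairs to send to the /angle endpoint."""
    if direction not in ("lower", "higher"):
        raise ValueError(f"unknown direction: {direction}")
    # Start rotating right away, then send STOP once the slider time has elapsed.
    return [(0.0, ANGLES[direction]), (seconds, ANGLES["stop"])]


print(plan_commands("higher", 2.5))
```

In the real page the second command would be fired from a setTimeout; here it is just returned as data so the sequence is easy to see.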

This seemed to work relatively well, but I figured I should just use a stepper motor if I was attempting to emulate one this way. I dug around in my closet and was able to find some parts.

blue case servo with a white arm, cables running off screen. sitting on a desk.

Project Time - Round 2!

I was able to rummage up a 28BYJ-48 stepper with control board, and a HW-131 power module.

With these I needed a new library, so I stripped the C++ code down to its basics, just getting me a server with the index page for the first pass.

On the JavaScript side of things, I then decided I would add a temperature slider, from 90° to 120° (which, writing this, I realize should be from 95°... git commit...) with a confirmation button, and a small button to initialize down to 95°.

The initialize button would need to trigger an initialization where I rotate counter clockwise an appropriate amount of time (Length TBD) in order to force the rotary encoder dial to always start at a known state of 95. The green submit button sends the new desired temperature as a post.

Server side, I was using a library called AccelStepper. For this I set some made-up max speeds and steps per rotation, actual values TBD.

I added an endpoint called /setTemperature that takes in a temperature and sets a local temperature variable. From there, I calculate the temperature less 95, to find out how many degrees I need to increase by, for now I'm considering this rotations.

I then apply a multiplier (TBD also... there's a lot of these as you can see!) and call stepper.moveTo() and it actually feels like it's pretty accurate.

The endpoint /initialize runs stepper.moveTo with ten rotations CCW, and then resets the "known location" back to zero (this also runs on power on for now).
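The arithmetic behind /setTemperature is small enough to sketch. This is a Python illustration rather than the C++ that runs on the NodeMCU, and STEPS_PER_DEGREE is a placeholder for the multiplier that is still TBD; target_position is a hypothetical name.

```python
BASE_TEMP_F = 95        # the heater's reset temperature, treated as position zero
MAX_TEMP_F = 120        # the highest temperature the page should allow
STEPS_PER_DEGREE = 64   # placeholder multiplier, the real value is still TBD


def target_position(temperature_f: int) -> int:
    """Absolute step position (0 = dial at 95°F) for a requested temperature."""
    if not BASE_TEMP_F <= temperature_f <= MAX_TEMP_F:
        raise ValueError("temperature out of range")
    degrees_to_raise = temperature_f - BASE_TEMP_F
    return degrees_to_raise * STEPS_PER_DEGREE


print(target_position(120))
```

The result is the kind of value that would be handed to stepper.moveTo(); after /initialize the known position is zero again, which is why an absolute target works.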

webpage controls, title "Water Heater Control", a blue slider with a green button saying "Set Temperature: 90", and red "Initialize to 90" button
blue case servo with a white arm, cables running off screen. sitting on a desk.

In Action

The result of this second round of coding is a lot closer to what I expect to happen once I can finally get down beneath his house. Frank will lose power, his water heater will reset to 95°F, and the NodeMCU will reboot and reinitialize itself. Frank will then open his browser to the NodeMCU's server, set the desired temperature, and take warm showers.

Version 2 will come once I actually test EVERYTHING. My first question is if a rubber band on a Lego tire with a servo wheel adaptor (which I 3D modeled and printed...) will work sufficiently. Programming wise, I need to figure out how many steps is one degree. Is the rotary encoder one degree per detent? Is it a constant speed? Is it like an alarm clock where you can sometimes jump by 10?

Stay tuned to find out the exciting conclusion once I can go down below Frank's house.

blue case servo with a white arm, cables running off screen. sitting on a desk.

Code

The code is currently at https://gitea.tyrel.dev/tyrel/frank_tank.git

 · · ·  automation  c++  esp8266  servo  stepper

Nov 04, 2022

Office Meeting Sensor

NOTES

This post is ported over from my wiki, so the format isn't as storytelling as a blog post could be, but I wanted it here.

Home Assistant Parts

Third Party Plugin Requirements

Zoom Plugin

I followed the Read Me from https://github.com/raman325/ha-zoom-automation#installation-single-account-monitoring and set up a Zoom Plugin for my account, that will detect if I am in a meeting or not.

Pi Zero

I have a tiny project enclosure box. I dremeled a hole in the cover for the GPIO pins and sandwiched the Blinkt onto the Pi Zero through it, with another dremeled hole for the micro USB power cable, and that's it for hardware.

For software, I installed the Python packages for Pimoroni and Blinkt, which came with a lovely set of sample projects. I deleted everything except the mqtt.py file, into which I then put my Mosquitto server settings.

I then added a new service in systemd to run the mqtt script:

[Unit]
Description=Meeting Indicator

[Service]
Type=simple
ExecStart=/usr/bin/python2 /home/pi/mqtt.py
WorkingDirectory=/home/pi/Pimoroni/blinkt/examples
Restart=always
RestartSec=2

[Install]
WantedBy=sysinit.target

Pleased with the results, and testing by sending some messages over mqtt that changed the color, I then dove into Node-RED

Node-Red

This is my first project using Node-RED, so I'm sure I could optimize better, but I have two entry points, one is from running HomeAssistant app on my mac, which gets me sensor data for my webcam, and the other is the aforementioned Zoom Presence plugin I created. These are Events:State nodes.

When either of these are True, they call first my ceiling light to turn on, which next will then add a msg.payload of

rgb,0,255,0,0
rgb,1,255,0,0
rgb,2,255,0,0
rgb,3,255,0,0
rgb,4,255,0,0
rgb,5,255,0,0
rgb,6,255,0,0
rgb,7,255,0,0

as one string. This leads to a Split, which in turn emits a new MQTT message for each line (I split on \n) and turns all 8 LEDs red. This is inefficient because I am still using the sample code for the Blinkt, which requires you to address each LED individually. In my next phase I will remove the pin requirement and just have it send one line with a color for all of them at once.
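To make the payload format concrete, here is a small Python sketch that builds the same eight-line string and splits it the way the Split node does. build_payload and NUM_LEDS are names invented for this example; the actual flow does this inside Node-RED.

```python
NUM_LEDS = 8  # a Blinkt has eight individually addressable LEDs


def build_payload(red: int, green: int, blue: int) -> str:
    """Build one 'rgb,<led>,<r>,<g>,<b>' line per LED, joined with newlines."""
    return "\n".join(
        f"rgb,{led},{red},{green},{blue}" for led in range(NUM_LEDS)
    )


payload = build_payload(255, 0, 0)  # all red, as in the meeting state
messages = payload.split("\n")      # one MQTT message per line
print(messages)
```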

When either of the sensors' states is False, I then flow into a Time Range node, in which I check if it's between 9-5 or not. If it is, then I turn all the LEDs Green, and if it's outside 9-5 I just turn the LEDs off. I do not turn OFF the overhead light, in case it was already on. I don't care about the state enough.

I also intentionally trigger at the Office Hours node, which will inherently turn the Green on at 9:01am and off at 5:01pm, as well as turn on Red for any long-standing meeting times I have.
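The decision the flow makes boils down to a few lines. This Python sketch is only a restatement of the logic above; led_state is a hypothetical name, and treating office hours as 9:00 up to (but not including) 17:00 is an assumption on my part.

```python
from datetime import time

OFFICE_START = time(9, 0)   # assumed start of office hours
OFFICE_END = time(17, 0)    # assumed end of office hours


def led_state(in_meeting: bool, now: time) -> str:
    """Pick the LED color: red in a meeting, green in office hours, else off."""
    if in_meeting:
        return "red"
    if OFFICE_START <= now < OFFICE_END:
        return "green"
    return "off"


print(led_state(False, time(10, 30)))
```

Here in_meeting stands for "either the webcam sensor or the Zoom sensor is True".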

Images

Screenshot of Nodered, with the flow of control for turning on the lights.
wall mounted enclosure with a strip of LED lights.

Oct 17, 2022

Comparing Go GORM and SQLX

Django ORM - My History

I'm not the best SQL developer; I know it's one of my weak points. My history is I did PHP/MySQL from the early 2000s until college. In college I didn't really focus on the database courses, as the class selection didn't have many database courses. The one Data Warehousing course I had available, I missed out on because I was in England doing a study abroad program that semester. My first job out of college was at a Python/Django company, and that directed my next eight years of work.

Django, if you are unaware, is an MVC framework that ships with a really great ORM. You can do about 95% of your database queries automatically by using the ORM.

entry, created = Entry.objects.get_or_create(headline="blah blah blah")
q = Entry.objects.filter(headline__startswith="What")
q = q.filter(pub_date__lte=datetime.date.today())
q = q.exclude(body_text__icontains="food")

Above are some samples from the DjangoDocs. But enough about Django.

My Requirements

Recently at my job I was given a little bit of leeway on a project. My team is sort of dissolving and merging in with another team who already does Go. My Go history is building a CLI tool for the two last years of my previous job. I had never directly interacted with a database from Go yet. I wanted to spin up a REST API (I chose Go+Gin for that based on forty five seconds of Googling) and talk to a database.

GORM

Being that I come from Django (and a few years of ActiveRecord) land, I reached immediately for an ORM, and I chose GORM. If you want to skip directly to the source, check out https://gitea.tyrel.dev/tyrel/go-webservice-gin. Full design disclosure: I followed a couple of blog posts in order to develop this, so it is in the form explicitly decided upon by the logrocket blog post and may not be the most efficient way to organize the module.

In order to instantiate a model definition, it's pretty easy. What I did is make a new package called models and inside made a file for my Album.

type Album struct {
      ID     string  `json:"id" gorm:"primary_key"`
      Title  string  `json:"title"`
      Artist string  `json:"artist"`
      Price  float64 `json:"price"`
}

This tracks with how I would do the same for any other kind of struct in Go, so this wasn't too difficult to do. What was kind of annoying was that I had to also make some structs for Creating the album and Updating the Album, this felt like duplicated effort that might have been better served with some composition.

I would have structured the controllers differently, but that may be a Gin thing and how it takes pointers to functions, vs pointers to receivers on a struct. Not specific to GORM. Each of the controller functions took a gin.Context pointer, rather than being receivers on an AlbumController struct.

The FindAlbum controller was simple:

func FindAlbum(c *gin.Context) {
      var album models.Album
      if err := models.DB.Where("id = ?", c.Param("id")).First(&album).Error; err != nil {
              c.JSON(http.StatusBadRequest, gin.H{"error": "Record not found!"})
              return // stop here so a second JSON body isn't written after the error
      }
      c.JSON(http.StatusOK, gin.H{"data": album})
}

Which will take in a /:id path parameter, and the GORM part of this is the third line there.

models.DB.Where("id = ?", c.Param("id")).First(&album).Error

To run a select, you chain a Where on the DB (which is the connection here) and it will build up your query. If you want to do joins, this is where you would chain .Joins etc... You then pass in your album variable to bind the result to the struct, and if there's no errors, you continue on with the bound variable. Error handling is standard Go logic, if err != nil etc and then pass that into your API of choice (Gin here) error handler.

This was really easy to set up, and if you want to get a slice back you just use DB.Find instead, and bind to a slice of those structs.

var albums []models.Album
models.DB.Find(&albums)

SQLX

SQLX is a bit different, as it's not an ORM. It's a set of extensions in Go to query with SQL, but still a good pattern for abstracting away your SQL to some dark corner of the app and not having it inline everywhere. For this I didn't follow someone's blog post. I had a grasp on how to use Gin pretty okay by now and essentially copied someone else's repo with my existing model: gin-sqlx-crud.

This one set up a bit wider of a structure, with deeper nested packages. Inside my internal folder there's controllers, forms, models/sql, and server. I'll only bother describing the models package here, as that's the SQLX part of it.

In the models/album.go file, there's your standard struct, but this time it's bound to db not json. I didn't look too deep yet, but I presume that also forces the columns to set the json name.

type Album struct {
  ID     int64   `db:"id"`
  Title  string  `db:"title"`
  Artist string  `db:"artist"`
  Price  float64 `db:"price"`
}

An interface to make a service, and a receiver are made for applying the CreateAlbum form (in another package) which sets the form name and json name in it.

func (a *Album) ApplyForm(form *forms.CreateAlbum) {
  a.ID = *form.ID
  a.Title = *form.Title
  a.Artist = *form.Artist
  a.Price = *form.Price
}

So there's the receiver action I wanted at least!

Nested inside the models/sql/album.go file and package, is all of the Receiver code for the service. I'll just comment the smallest one, as that gets my point across. Here is where the main part of GORM/SQLX differ - raw SQL shows up.

func (s *AlbumService) GetAll() (*[]models2.Album, error) {
      q := `SELECT * FROM albums;`

      var output []models2.Album
      err := s.conn.Select(&output, q)
      // Replace the SQL error with our own error type.
      if err == sql.ErrNoRows {
              return nil, models2.ErrNotFound
      } else if err != nil {
              return nil, err
      } else {
              return &output, nil
      }
}

This will return a slice of Albums - but if you notice on the second line, you have to write your own queries. A little bit more in control of how things happen, with a SELECT * ... vs the gorm DB.Find style.
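For readers coming from Python like me, the "write the query yourself, then bind rows to a typed record" pattern looks roughly like this. It is only a loose analogue using the standard library's sqlite3 as a stand-in database, not the sqlx API; the table contents and the get_all helper are made up for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Album:
    id: int
    title: str
    artist: str
    price: float


def get_all(conn: sqlite3.Connection) -> list[Album]:
    """Run hand-written SQL and map each row onto an Album record."""
    query = "SELECT id, title, artist, price FROM albums"
    return [Album(*row) for row in conn.execute(query).fetchall()]


# Throwaway in-memory database with one sample row, just for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE albums (id INTEGER PRIMARY KEY, title TEXT, artist TEXT, price REAL)"
)
conn.execute(
    "INSERT INTO albums VALUES (?, ?, ?, ?)", (1, "Blue Train", "John Coltrane", 56.99)
)
albums = get_all(conn)
print(albums)
```

The point of the comparison is only that the SQL string is yours to write; the library's job is limited to running it and filling in the struct.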

To me this feels more like using pymysql; in fact it's a very similar process. (SEE NOTE BELOW) You use the service.connection.Get and pass in what you want the output bound to, the string query, and any parameters. This feels kind of backwards to me. I'd much rather have the order be: query, bound, parameters, but that's what they decided for their order.

Conclusion

Overall, both were pretty easy to set up for one model. Given the choice I would look at who the source code is written for. If you're someone who knows a lot of SQL, then SQLX is fine. If you like abstractions, and more of a "Code as Query" style, then GORM is probably the best of these two options.

I will point out that GORM does more than just "query and insert" there is migration, logging, locking, dry run mode, and more. If you want to have a full fledged system, that might be a little heavy, then GORM is the right choice.

SQLX is great if what you care about is marshalling, and a very quick integration into any existing codebase.

Notes

I sent this blog post to my friend Andrey and he mentioned that I was incorrect with my comparison of sqlx to pymysql. To put it in a Python metaphor, "sqlx is like using urllib3, gorm is like using something that generates a bunch of requests code for you. Using pymysql is like using tcp to do a REST request." Sqlx is more akin to SqlAlchemy core vs using SqlAlchemy orm. Sqlx is just some slight extensions over database/sql. The sort-of equivalent to pymysql in Go is database/sql/driver from the stdlib.

 · · ·  go  sql  python  gorm  sqlx

Oct 16, 2022

New Blog - Pelican!

If you have read the previous post, and then looked at this one, there are a LOT of changes that happened. I was recently exploited and had heysrv.php files everywhere, so I have decided to forego wordpress for now. I am now using Pelican!

It's very sleek, and only took me a few hours to port my Wordpress export to Pelican reStructuredText format.

All I have to do is run invoke publish and it will be on the server. No PHP, no database. All files properly in their right places.

It comes with your standard blogging experience: Categories, Tags, RSS/Atom feeds, etc. You need to set up Disqus — which I probably won't — in order to get comments though.

I'm pleased with it. I have posts go under YYYY/MM/slug.html files, which I like for organization. Posting images is easy, I just toss it under content/images/YYYY/MM/ with date for organization.
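A YYYY/MM/slug.html layout like this is normally configured in pelicanconf.py with the ARTICLE_URL and ARTICLE_SAVE_AS settings. The values below are my guess at what such a configuration looks like, not a copy of the settings this blog actually uses, and example_path only exists to show how the pattern expands.

```python
import datetime

# Assumed pelicanconf.py settings for a YYYY/MM/slug.html layout.
ARTICLE_URL = "{date:%Y}/{date:%m}/{slug}.html"
ARTICLE_SAVE_AS = "{date:%Y}/{date:%m}/{slug}.html"

# The pattern is an ordinary format string, so it can be expanded by hand.
example_path = ARTICLE_SAVE_AS.format(
    date=datetime.date(2022, 10, 16), slug="new-blog-pelican"
)
print(example_path)
```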

 · · ·  python  pelican

Oct 13, 2022

Scrollbar Colors

Was talking to someone about CSS Nostalgia and "back in my day" when scrollbar colors came up.

/* For Chromium based browsers */
::-webkit-scrollbar {
  background: #2A365F;
}
::-webkit-scrollbar-thumb {
  background: #514763;
}

/* For Firefox */
html {
  scrollbar-color: #514763 #2A365F;
}

Firefox and Chrome have different selectors, so in order to support the majority of browsers, you need both.

Chrome with a blue/purple scrollbar

Safari with a blue/purple scrollbar

Firefox with a blue/purple scrollbar

 · · ·  css

Jun 02, 2022

2016 Monitoring a CO2 tank in a Lab with a raspberry pi

This was written in 2017, but I found a copy and wanted to post it again.

The Story

For a few months last year, I lived around the block from work. I would sometimes stop in on the weekends and pick up stuff I forgot at my desk. One day the power went out at my apartment and I figured I would check in at work and see if there were any problems. I messaged our Lab Safety Manager on Slack to say "hey the power went out, and I am at the office. Is there anything you'd like me to check?". He said he hadn't even gotten the alarm email/pages yet, so he asked if I would check on the lab and send him a picture of the CO2 tanks to make sure that the power outage hadn't compromised them. Once I had procured access to the BL2 lab on my building badge, I made my way out back and took a lovely picture of the tanks. Everything was fine.

The following week, in my one on one meeting with my manager, I mentioned what happened and she and I discussed the event. It clearly isn't sustainable sending someone in any time there was a power outage if we didn't need to, but the lab equipment doesn't have any monitoring ports.

Operation Lab Cam was born. I decided to put together a prototype of a Raspberry Pi 3 with a camera module and play around with getting a way to monitor the display on the tanks. After a few months of not touching the project, I dug into it in a downtime day again. The result is now we have an automated camera box that will take a picture once a minute and display it on an auto refreshing web page. There are many professional products out there that do exactly this, but I wanted something that has the ability to be upgraded in the future.

Summary of the Technical Details

Currently the entire process is managed by one bash script, which is a little clunky, but it's livable. The implementation of the script goes a little like:

  1. Take a picture to a temporary location.
  2. Add a graphical time stamp.
  3. Copy that image to both the currently served image, and a timestamped filename backup.

The web page that serves the image is just a simple web page that shows the image, and refreshes once every thirty seconds.

The Gritty Technical Details

The program I'm using to take pictures is the raspistill program. If I had written my script to just call raspistill every time I wanted a picture taken, it would have potentially taken a lot longer to save the images. This happens because it needs to meter the image every time, which adds up. The solution is Signal mode and turning raspistill into a daemon. If you enable signal mode, any time you send a SIGUSR1 to the process, the backgrounded process will then take the image.

Instead of setting up a service with systemd, I have a small bash script. At the beginning, I run a ps aux and check if raspistill is running. If it's not, I start up a new instance of raspistill with the appropriate options and background it. The next time this script runs, it will detect that raspistill is running and be almost a no-op.

After this, I send a SIGUSR1 (kill -10) to take the picture which is then saved, un-timestamped. Next up I call imagemagick's convert on this image, I crop out the center (so I couldn't use raspistill's "-a 12" option) because all I care about is a 500x700 pixel region.

This is then copied to the image that is served by the web page, and also backed up in a directory that nginx will listen to.
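The real thing is a bash script, but the final "publish" step can be sketched in Python to show the idea: one processed image becomes both the currently served file and a timestamped backup. The function name, the file names, and the timestamp format here are all assumptions for illustration.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def publish_image(processed: Path, served: Path, backup_dir: Path, now: datetime) -> Path:
    """Copy the processed image to the served path and to a timestamped backup."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / f"{now:%Y%m%d-%H%M%S}{processed.suffix}"
    shutil.copyfile(processed, served)   # what the web page shows
    shutil.copyfile(processed, backup)   # history that nginx can also serve
    return backup


# Small demo in a temporary directory, using fake image bytes.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    processed = root / "cropped.jpg"
    processed.write_bytes(b"fake jpeg")
    backup = publish_image(
        processed, root / "current.jpg", root / "backups", datetime(2017, 1, 2, 3, 4, 5)
    )
    served_ok = (root / "current.jpg").read_bytes() == b"fake jpeg"
    backup_name = backup.name
print(backup_name)
```

Taking the picture itself would still be a SIGUSR1 sent to the backgrounded raspistill process, followed by the crop with convert, before this step runs.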

Leds on a CO2 tank
 · · ·  Linux  raspberrypi

Jun 01, 2022

Writing an EPUB parser. Part 1

Parsing Epubs

Recently I've become frustrated with the experience of reading books on my Kindle Paperwhite. The swipe features really bother me. I really like MoonReader on Android, but reading on my phone isn't always pleasing. This led me to look into other hardware. I've been eyeing the BOOX company for a while, and I'm definitely considering some of their new offerings at some point. Until I can afford to splurge on a new ebook reader, I've decided to start a new project: making my own ebook reader tools!

I'm starting with EPUBs, as this is one of the easiest formats to work with. At its core, an EPUB is a zip file with the .epub extension instead of .zip, with many individual XHTML chapter files inside it. You can read more of how they're structured yourself over at FILEFORMAT.
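Because an EPUB is just a zip archive, the standard library is enough to peek inside one. This is a minimal sketch: list_chapters is a made-up helper, and the tiny archive built in memory stands in for a real book so the example runs anywhere.

```python
import io
import zipfile


def list_chapters(epub_file) -> list[str]:
    """Return the names of the (X)HTML files inside an EPUB archive."""
    with zipfile.ZipFile(epub_file) as archive:
        return [
            name for name in archive.namelist()
            if name.endswith((".xhtml", ".html"))
        ]


# Build a tiny stand-in EPUB in memory instead of reading a real book.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/chapter1.xhtml", "<html><body><p>Hi</p></body></html>")
buffer.seek(0)

chapters = list_chapters(buffer)
print(chapters)
```

With a real file you would pass the path, for example list_chapters("TOG.epub").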

The tool I've chosen for reading EPUBs is the Python library ebooklib. This seemed to be a nice lightweight library for reading EPUBs. I also used DearPyGUI for showing this to the screen, because I figured why not, I like GUI libraries.

My first task was to find an EPUB file, so I downloaded one from my calibre server. I convert all my ebook files to .epub and .mobi on my calibre server so I can access them anywhere I can read my OPDS feed. I chose Throne of Glass (abbreviating to TOG.epub for the rest of the post). I launched Python and ran

>>> from ebooklib import epub
>>> print(book := epub.read_epub("TOG.epub"))

This returned me a <ebooklib.epub.EpubBook object...> , seeing I had an EpubBook I ran a dir(book) and found the properties available to me

['add_author', 'add_item', 'add_metadata', 'add_prefix',
 'bindings', 'direction', 'get_item_with_href', 'get_item_with_id',
 'get_items', 'get_items_of_media_type', 'get_items_of_type',
 'get_metadata', 'get_template', 'guide',
 'items', 'language', 'metadata', 'namespaces', 'pages', 'prefixes',
 'reset', 'set_cover', 'set_direction', 'set_identifier', 'set_language',
 'set_template', 'set_title', 'set_unique_metadata', 'spine',
 'templates', 'title', 'toc', 'uid', 'version']

Of note, the get_item_with_X entries caught my eye, as well as spine. For my file, book.spine looks like it gave me a bunch of tuples of an ID and a "yes" string, and I had no idea what that was. I then noticed I had a toc property. Assuming that was a Table of Contents, I printed that out and saw a bunch of epub.Link objects. This looks like something I could use.

I will note, at this time I was thinking that this wasn't the direction I wanted to take this project. I really wanted to learn how to parse these things myself, unzip, parse XML, or HTML, etc., but I realized I needed to see someone else's work to even know what is going on. With this "defeat for the evening" admitted, I figured "hey, why not at least make SOMETHING, right?" I decided to carry on.

Seeing I was on at least some track, I opened up PyCharm and made a new Project. First I setup a class called Epub, made a couple of functions for setting things up and ended up with

from typing import List

import ebooklib
from ebooklib import epub


class Epub:
    def __init__(self, book_path: str) -> None:
        self.contents: ebooklib.epub.EpubBook = epub.read_epub(book_path)
        self.title: str = self.contents.title
        self.toc: List[ebooklib.epub.Link] = self.contents.toc

I then set up a parse_chapters function, where I loop through the TOC. Here I went to the definition of Link and saw I was able to get an href and a title, so I decided my object for chapters would be a dictionary (I'll move to a DataClass later) with title and content. I remembered from earlier I had a get_item_with_href, so I stored the text from the TOC's href: self.contents.get_item_with_href(link.href).get_content(). This would later prove to be a bad decision when I opened "The Fold.epub" and realized that a TOC could have a tuple of Section and Link, not just Links. I ended up storing the item itself, and doing a double loop in the parse_chapters function to loop if it's a tuple.

def parse_chapters(self) -> None:
    idx = 0
    for _item in self.toc:
        if isinstance(_item, tuple):  # In case is section tuple(section, [link, ...])
            for link in _item[1]:
                self._parse_link(idx, link)
                idx += 1
        else:
            self._parse_link(idx, _item)
            idx += 1

_parse_link simply makes that dictionary of title and item I mentioned earlier, with a new index as I introduced buttons in the DearPyGUI at this time as well.

def _parse_link(self, idx, link) -> None:
    title = link.title
    self.chapters.append(dict(
        index=idx,
        title=title,
        item=self.contents.get_item_with_href(link.href)
    ))
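The tuple handling can be tried out without an EPUB at hand. In this sketch Link is a hypothetical stand-in for ebooklib.epub.Link with just the two attributes used above, and flatten_toc is a name I made up; like the code above, it only handles one level of section nesting.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical stand-in for ebooklib.epub.Link (href and title only)."""
    href: str
    title: str


def flatten_toc(toc):
    """Yield every Link, whether it is top-level or inside a (section, [links]) tuple."""
    for entry in toc:
        if isinstance(entry, tuple):
            _section, links = entry
            yield from links
        else:
            yield entry


toc = [
    Link("intro.xhtml", "Intro"),
    ("Part One", [Link("c1.xhtml", "One"), Link("c2.xhtml", "Two")]),
]
chapters = [
    {"index": index, "title": link.title, "href": link.href}
    for index, link in enumerate(flatten_toc(toc))
]
print([chapter["title"] for chapter in chapters])
```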

That's really all there is to make an MVP of an EPUB parser. You can use BeautifulSoup to parse the HTML from the get_body_content() calls on items, to make more readable text if you want, but depending on your front end, the HTML may be what you want.

In my implementation my Epub class keeps track of the currently selected chapter, so this loads from all chapters and sets the current_text variable.

def load_view(self) -> None:
    item = self.chapters[self.current_index]['item']
    soup = BeautifulSoup(item.get_body_content(), "html.parser")
    text = [para.get_text() for para in soup.find_all("p")]
    self.current_text = "\n".join(text)

I don't believe any of this code will be useful to anyone outside of my research for now, but it's my first step into writing an EPUB parser myself.

The DearPyGUI steps are out of scope of this blog post, but here is my final ebook Reader which is super inefficient!

final ebook reader, chapters on left, text on right

I figure the Dedication page is not as copyrighted as the rest of the book, so it's fair play showing that much. Sarah J Maas, if you have any issues, I can find another book for my screenshots.

 · · ·  epub  python