Rust Programming Language RustConf 2021, 15 Sep 2021

From YouTube: RustConf 2021 - Identifying Pokémon Cards by Hugo Peixoto

Description

Identifying Pokémon Cards by Hugo Peixoto

I want my Pokémon TCG inventory to be digitized so I can search my cards and know which ones I still need. After building a tool to manually enter the cards, I decided to explore Computer Vision algorithms to automate part of the process. In this talk we'll go over some common algorithms used in this area and the roadblocks that I hit while learning.

A

Hi everyone, my name is Hugo Peixoto, and I'm here to talk to you about my Pokémon collection.

A

I've been collecting these Pokémon cards for a while now, but I never really took the time to keep them organized. Recently I started playing the game again, so it has become a bit of a problem: every time I want to build a new deck, I need to know if I have the necessary cards to do it or if I need to go out and buy something.

A

So I decided to fix this. I started by building a website where I could go and manually enter how many cards I have of each, and that would get saved to a database. Now this worked fine, but I wanted to try something different. I wanted to have a webcam pointed at my desk, put a card in there, and have the software automatically detect which card it was and store that in the database.

A

So this talk is going to be about the algorithms that I used and the problems that I faced.

A

Here's a very high level overview of the whole process. We start by grabbing a frame from the webcam's video stream. Then we take that frame and we need to extract the card image from it.

A

And finally, we go through a data set of known cards and search for the one that's most similar to the card image we just extracted and we add that to the database.

A

Initially, I was working with frames like this one on the left, and you can see that there are some shadows, and the wood pattern of the desk causes some noise as well. These things were making my life a bit harder than I wanted, so I moved to a more controlled environment like the picture on the right. There you have a white background and there are no shadows, and things worked much better under these conditions.

A

Now, I mentioned that I needed a data set. There's this website, pokemon cards.com, and they have pretty much all the cards in there that were ever printed in English. Since they don't have an API, I built a Ruby script that just scraped the whole thing and downloaded every card image to my computer, so I ended up with about 14,000 cards.

A

Now, let's focus on detecting and extracting the card image from the frame. We're working with 1080p color images. You might think of a color image as having three channels: the red, the green and the blue one. Now, what I found is that most computer vision algorithms only really care about the brightness of the pixels, or how light or dark each pixel is. This is equivalent to working with grayscale images, and that's what we're going to do now.
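As a rough illustration of the grayscale step, here is a minimal sketch. The talk does not say which formula was used, so this assumes the common Rec. 601 luma weights and an interleaved 8-bit RGB buffer; the function name `to_grayscale` is hypothetical.

```rust
/// Convert interleaved 8-bit RGB data into one brightness byte per pixel.
/// Assumption: Rec. 601 luma weights, scaled to integers that sum to 1000.
fn to_grayscale(rgb: &[u8]) -> Vec<u8> {
    rgb.chunks_exact(3)
        .map(|p| {
            let (r, g, b) = (p[0] as u32, p[1] as u32, p[2] as u32);
            // Green contributes most because the eye is most sensitive to it.
            ((299 * r + 587 * g + 114 * b) / 1000) as u8
        })
        .collect()
}
```

A 1080p frame goes from about 6 million bytes to about 2 million, and every later step only has to deal with a single channel.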

A

Once we have this grayscale image, we want to crop it to the boundaries of the card while also fixing its perspective, so that there's no rotation or skew. We want it to become as close to a 2D scan of the card as possible, and the first step that I did to do this was to apply a Sobel operator.

A

The Sobel operator highlights the edges of the image, both the outline of the card and the edges in the drawing itself, and it removes any sections of solid color. Since we're looking for the boundaries of the card, this helps highlight the edges, particularly in cases where the card doesn't have a thick black border like this one around it. Some cards have a yellow border, and this helps to normalize the levels a bit.

A

Now, how does the Sobel operator work? It's a kernel-based image filter, and what that means is that it's an image filter that follows a specific structure. You calculate each output pixel independently of the others: you look at the respective pixel on the source image plus a small window around it. In this case I'm showing a three by three window, but it can be a window of any size.

A

Then you take the pixels in that window and multiply them by a matrix of coefficients and that matrix is called the kernel. You sum all those values together and you get a resulting pixel.

A

The kernel is what defines the behavior of the filter. So if you want to apply a Gaussian blur, which is a common operation in many image editing programs, you use one kernel, and if you're applying the Sobel operator, you use a different kernel.
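The window-times-kernel structure described above can be sketched as follows. This is an illustration, not the code from the talk: the function and constant names are hypothetical, the image is assumed to be a row-major `f32` buffer, and pixels outside the image are assumed to repeat the nearest edge pixel.

```rust
/// Apply a 3x3 kernel to every pixel of a row-major grayscale image.
/// Assumption: windows that reach outside the image reuse the nearest edge pixel.
fn apply_kernel(src: &[f32], width: usize, height: usize, kernel: &[[f32; 3]; 3]) -> Vec<f32> {
    let mut out = vec![0.0; src.len()];
    for y in 0..height {
        for x in 0..width {
            let mut sum = 0.0;
            for ky in 0..3 {
                for kx in 0..3 {
                    // Source coordinates (y + ky - 1, x + kx - 1), clamped to the image.
                    let sy = (y + ky).saturating_sub(1).min(height - 1);
                    let sx = (x + kx).saturating_sub(1).min(width - 1);
                    sum += kernel[ky][kx] * src[sy * width + sx];
                }
            }
            out[y * width + x] = sum;
        }
    }
    out
}

/// A small Gaussian blur kernel; the coefficients sum to 1, so brightness is preserved.
const GAUSSIAN_3X3: [[f32; 3]; 3] = [
    [0.0625, 0.125, 0.0625],
    [0.125, 0.25, 0.125],
    [0.0625, 0.125, 0.0625],
];

/// The horizontal Sobel kernel; the coefficients sum to 0, so solid color maps to 0.
const SOBEL_X: [[f32; 3]; 3] = [
    [-1.0, 0.0, 1.0],
    [-2.0, 0.0, 2.0],
    [-1.0, 0.0, 1.0],
];
```

Only the kernel changes between the two filters; the loop over pixels and windows stays the same.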

A

So if we apply the Sobel operator to every pixel on this image, we end up with something like this. The values here range from zero to around a thousand. To better understand what's going on, let's map this to a grayscale image where the zeros become white and the thousands become black. Here you can see that the corners and the center of the image are completely white, and this indicates that there's no edge in those regions, while around the circle a dark ring has formed, and that indicates that these pixels likely contain an edge.
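To make the value range concrete, here is a hedged sketch of a Sobel edge-strength computation. It assumes the usual pair of horizontal and vertical kernels combined into a gradient magnitude, which the talk does not spell out; `sobel_magnitude` is a hypothetical name.

```rust
/// Edge strength per pixel: the magnitude of the horizontal and vertical Sobel responses.
/// Assumption: windows that reach outside the image reuse the nearest edge pixel.
fn sobel_magnitude(gray: &[u8], width: usize, height: usize) -> Vec<f32> {
    const KX: [[i32; 3]; 3] = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]];
    const KY: [[i32; 3]; 3] = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]];
    let mut out = vec![0.0; width * height];
    for y in 0..height {
        for x in 0..width {
            let (mut gx, mut gy) = (0i32, 0i32);
            for ky in 0..3 {
                for kx in 0..3 {
                    let sy = (y + ky).saturating_sub(1).min(height - 1);
                    let sx = (x + kx).saturating_sub(1).min(width - 1);
                    let p = gray[sy * width + sx] as i32;
                    gx += KX[ky][kx] * p;
                    gy += KY[ky][kx] * p;
                }
            }
            out[y * width + x] = ((gx * gx + gy * gy) as f32).sqrt();
        }
    }
    out
}
```

With 8-bit input, a sharp vertical black-to-white edge gives a horizontal response of 4 × 255 = 1020 and a vertical response of 0, which is consistent with values of around a thousand.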

A

So if we apply this to our initial grayscale image, we get this.

A

The next step is to find the contour, or the outline, of the card. I did this using a simple algorithm. We scan each row from left to right, and when we hit a pixel that's above a given threshold, we stop. We mark that pixel and move to the next row.

A

So if we do this for every row, and then we do the same thing from the other side, and from top to bottom and bottom to top, we end up with these marked pixels. This is the contour of the image, and this works because the card doesn't have any holes or any concave structures or anything like that.
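The four-direction scan described above can be sketched like this. The input is assumed to be the edge-strength image from the Sobel step, and the name `trace_contour` and the choice of a sorted set for deduplication are illustrative assumptions.

```rust
use std::collections::BTreeSet;

/// Mark the first pixel above `threshold` seen from each side of every row and column.
fn trace_contour(edges: &[f32], width: usize, height: usize, threshold: f32) -> Vec<(usize, usize)> {
    let strong = |x: usize, y: usize| edges[y * width + x] > threshold;
    // A set, because corner pixels are usually found by more than one scan.
    let mut marked = BTreeSet::new();
    for y in 0..height {
        if let Some(x) = (0..width).find(|&x| strong(x, y)) {
            marked.insert((x, y)); // left to right
        }
        if let Some(x) = (0..width).rev().find(|&x| strong(x, y)) {
            marked.insert((x, y)); // right to left
        }
    }
    for x in 0..width {
        if let Some(y) = (0..height).find(|&y| strong(x, y)) {
            marked.insert((x, y)); // top to bottom
        }
        if let Some(y) = (0..height).rev().find(|&y| strong(x, y)) {
            marked.insert((x, y)); // bottom to top
        }
    }
    marked.into_iter().collect()
}
```

Because each scan stops at the first strong pixel, the edges inside the card's artwork are never reached, which is why the approach depends on the shape being convex and free of holes.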

A

The next step is to turn this contour into four straight lines, and we do this using an algorithm called the Hough transform.

A

Now, let's see how that works. We go through each pixel on the contour and we draw all the lines that go through that pixel. Not literally every line, because that's an infinite number of lines, but we pick a resolution, like every half degree or something like that, and we draw those lines.

A

So in this case I'm drawing eight lines here. We store those lines, and then we move to the next pixel and do the same thing, and you'll see that there's one common line between the two sets.

A

If you take a look, the horizontal line happens twice, and since this line occurs more times than all the other ones, it's more likely that it is a real line.

A

So we're going to do this for every pixel on the contour, and we're going to keep only the lines that occur, let's say, 200 times or so, and discard all the rest.
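The voting scheme described above can be sketched as follows. This assumes the usual normal-form parameterization, where a line is the set of points with x·cos(θ) + y·sin(θ) = ρ, and a one-pixel resolution for ρ; the talk does not specify either, and the function name is hypothetical.

```rust
use std::f32::consts::PI;

/// Vote for every (theta, rho) line through each contour point and keep the popular ones.
/// A line is the set of points where x*cos(theta) + y*sin(theta) = rho.
fn hough_lines(
    points: &[(usize, usize)],
    width: usize,
    height: usize,
    angle_steps: usize,
    min_votes: u32,
) -> Vec<(f32, f32)> {
    // |rho| can never exceed the image diagonal.
    let max_rho = ((width * width + height * height) as f32).sqrt().ceil() as usize;
    let rho_bins = 2 * max_rho + 1; // covers -max_rho..=max_rho in 1 pixel steps
    let mut votes = vec![0u32; angle_steps * rho_bins];
    for &(x, y) in points {
        for step in 0..angle_steps {
            let theta = PI * step as f32 / angle_steps as f32;
            let rho = x as f32 * theta.cos() + y as f32 * theta.sin();
            let bin = (rho.round() as isize + max_rho as isize) as usize;
            votes[step * rho_bins + bin] += 1;
        }
    }
    let mut lines = Vec::new();
    for step in 0..angle_steps {
        for bin in 0..rho_bins {
            if votes[step * rho_bins + bin] >= min_votes {
                let theta = PI * step as f32 / angle_steps as f32;
                lines.push((theta, bin as f32 - max_rho as f32));
            }
        }
    }
    lines
}
```

With `angle_steps` set to 360, the angular resolution is half a degree. Neighboring angles often pass the vote threshold together, so one real edge tends to produce several near-duplicate lines.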

A

So if we take those lines and draw them, we get this, and it's pretty close to what we wanted, but not quite there yet. To get rid of this extra noise, what we can do is cluster these lines together and average them out, so any lines that are similar enough get clustered together.
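One simple way to do this kind of clustering is a greedy pass, sketched below. The talk does not describe the actual clustering method, so this is an assumption: lines are `(theta, rho)` pairs, a line joins the first cluster whose current average is within the given tolerances, and angle wrap-around near 0 and π is ignored for brevity.

```rust
/// Greedily group similar (theta, rho) lines and return the average of each group.
/// Assumption: angle wrap-around between 0 and PI is not handled in this sketch.
fn cluster_lines(lines: &[(f32, f32)], max_dtheta: f32, max_drho: f32) -> Vec<(f32, f32)> {
    // Each cluster stores (sum of theta, sum of rho, member count).
    let mut clusters: Vec<(f32, f32, f32)> = Vec::new();
    for &(theta, rho) in lines {
        let found = clusters.iter().position(|c| {
            (c.0 / c.2 - theta).abs() <= max_dtheta && (c.1 / c.2 - rho).abs() <= max_drho
        });
        match found {
            Some(i) => {
                clusters[i].0 += theta;
                clusters[i].1 += rho;
                clusters[i].2 += 1.0;
            }
            None => clusters.push((theta, rho, 1.0)),
        }
    }
    clusters.iter().map(|c| (c.0 / c.2, c.1 / c.2)).collect()
}
```

For a card, the expected output is four averaged lines, one per side.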

A

If we do that, we get our intended result. With these lines, we can calculate their intersection points. We can do this by going through every pair of lines, calculating their intersection, and discarding any points that fall outside of our image. What these four points represent is the four corners of our card.
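The pairwise intersection step can be sketched as below, assuming the lines are `(theta, rho)` pairs in normal form, x·cos(θ) + y·sin(θ) = ρ. Each pair gives a 2×2 linear system, solved here with Cramer's rule; the function names are hypothetical.

```rust
/// Intersection of two lines given as (theta, rho), or None if they are (nearly) parallel.
fn intersect(a: (f32, f32), b: (f32, f32)) -> Option<(f32, f32)> {
    let (ca, sa) = (a.0.cos(), a.0.sin());
    let (cb, sb) = (b.0.cos(), b.0.sin());
    // Determinant of [[ca, sa], [cb, sb]]; zero means the lines are parallel.
    let det = ca * sb - sa * cb;
    if det.abs() < 1e-6 {
        return None;
    }
    let x = (a.1 * sb - sa * b.1) / det;
    let y = (ca * b.1 - a.1 * cb) / det;
    Some((x, y))
}

/// Intersect every pair of lines and keep only the points inside the image.
fn card_corners(lines: &[(f32, f32)], width: f32, height: f32) -> Vec<(f32, f32)> {
    let mut corners = Vec::new();
    for i in 0..lines.len() {
        for j in i + 1..lines.len() {
            if let Some((x, y)) = intersect(lines[i], lines[j]) {
                if (0.0..=width).contains(&x) && (0.0..=height).contains(&y) {
                    corners.push((x, y));
                }
            }
        }
    }
    corners
}
```

Opposite sides of the card are close to parallel, so their intersection is either rejected outright or lands far outside the frame, leaving the four real corners.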

A

Now that we have the corners of the card, we can work on fixing the perspective to get something like this. This transformation is done, roughly speaking, by taking each pixel and moving it to another coordinate, and this movement is done by multiplying each pixel's coordinate by a matrix obtained by solving a system of equations that's based on those four corners. To do this, I used a crate called nalgebra.
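For readers who want to see the system of equations, here is a dependency-free sketch. The talk used nalgebra for the linear algebra; this example instead uses a hand-written Gaussian elimination so that it runs without external crates, and all names in it are hypothetical. Each corner pair (x, y) → (u, v) contributes two equations in the eight unknown matrix entries, with the ninth entry fixed to 1.

```rust
/// Solve the 8x8 system `a * h = b` by Gaussian elimination with partial pivoting.
fn solve8(mut a: [[f64; 8]; 8], mut b: [f64; 8]) -> Option<[f64; 8]> {
    for col in 0..8 {
        // Use the row with the largest entry in this column as the pivot.
        let pivot = (col..8).max_by(|&i, &j| a[i][col].abs().total_cmp(&a[j][col].abs()))?;
        if a[pivot][col].abs() < 1e-12 {
            return None; // degenerate corners, e.g. three on one line
        }
        a.swap(col, pivot);
        b.swap(col, pivot);
        for row in col + 1..8 {
            let factor = a[row][col] / a[col][col];
            for k in col..8 {
                a[row][k] -= factor * a[col][k];
            }
            b[row] -= factor * b[col];
        }
    }
    // Back substitution.
    let mut h = [0.0; 8];
    for row in (0..8).rev() {
        let tail: f64 = (row + 1..8).map(|k| a[row][k] * h[k]).sum();
        h[row] = (b[row] - tail) / a[row][row];
    }
    Some(h)
}

/// Row-major 3x3 matrix mapping each `src` corner onto the matching `dst` corner.
fn homography(src: [(f64, f64); 4], dst: [(f64, f64); 4]) -> Option<[f64; 9]> {
    let mut a = [[0.0; 8]; 8];
    let mut b = [0.0; 8];
    for i in 0..4 {
        let ((x, y), (u, v)) = (src[i], dst[i]);
        a[2 * i] = [x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y];
        a[2 * i + 1] = [0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y];
        b[2 * i] = u;
        b[2 * i + 1] = v;
    }
    let h = solve8(a, b)?;
    Some([h[0], h[1], h[2], h[3], h[4], h[5], h[6], h[7], 1.0])
}

/// Multiply a point by the matrix and divide by the resulting third coordinate.
fn warp_point(h: &[f64; 9], (x, y): (f64, f64)) -> (f64, f64) {
    let w = h[6] * x + h[7] * y + h[8];
    ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)
}
```

In practice it is common to compute the matrix from the output rectangle to the detected corners and look up a source pixel for every output pixel, which avoids leaving holes in the result.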

A

They implement a bunch of linear algebra algorithms. So doing that, we finally got what we wanted, which is our 2D card image. Now we can take this image and search for matches in our data set. Each image has around 1 million pixels.

A

We can't really compare them directly: since there are 14,000 cards, this would take forever. So we need to somehow reduce the amount of information that we're comparing, and we're going to do this using a perceptual hash. A perceptual hash is a smaller representation of the image that still keeps the essence of the image.

A

The main idea is that similar images will have similar perceptual hashes.

A

So we take each card in the data set and convert it to its perceptual hash, and we do the same thing for the image that we're searching for, and we compare those instead. In this case I picked a 16 by 16 hash; that's 256 bits, or 32 bytes, and comparing that is fast enough.
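Comparing two such hashes usually means counting the bits in which they differ (the Hamming distance). The talk does not name the comparison, so treat this as an assumption; the type alias, the function names and the `(name, hash)` data set layout are hypothetical.

```rust
/// A 16 by 16 perceptual hash: 256 bits packed into 32 bytes.
type Hash = [u8; 32];

/// Number of bits that differ between two hashes.
fn hamming_distance(a: &Hash, b: &Hash) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Name of the data set entry whose hash is closest to `query`, if the data set is non-empty.
fn closest_card<'a>(query: &Hash, dataset: &'a [(String, Hash)]) -> Option<&'a str> {
    dataset
        .iter()
        .min_by_key(|(_, hash)| hamming_distance(query, hash))
        .map(|(name, _)| name.as_str())
}
```

A linear scan over 14,000 entries costs 14,000 × 32 byte comparisons per lookup, which is small enough that no index structure is needed.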

A

So let's see how this hash is calculated.

A

We take our image and resize it down to 16 by 16, and then we need to binarize it, by which I mean convert each pixel into a zero or a one. In this case, what I'm doing for that step is each pixel becomes a one if it is darker than the pixel to its left, and it becomes a zero otherwise. There are different types of perceptual hashes, which differ both in the resizing part and in the binarization step.
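A minimal sketch of this hash follows. Two details are assumptions rather than facts from the talk: the image is shrunk by plain block averaging, and it is shrunk to 17 columns by 16 rows so that each of the 16 × 16 output bits has a left neighbor to compare against. The function names are hypothetical.

```rust
/// Shrink a grayscale image by averaging the block of source pixels behind each output pixel.
fn shrink(gray: &[u8], width: usize, height: usize, out_w: usize, out_h: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(out_w * out_h);
    for oy in 0..out_h {
        let y0 = oy * height / out_h;
        let y1 = ((oy + 1) * height / out_h).max(y0 + 1);
        for ox in 0..out_w {
            let x0 = ox * width / out_w;
            let x1 = ((ox + 1) * width / out_w).max(x0 + 1);
            let mut sum = 0u32;
            for y in y0..y1 {
                for x in x0..x1 {
                    sum += gray[y * width + x] as u32;
                }
            }
            out.push((sum / ((y1 - y0) * (x1 - x0)) as u32) as u8);
        }
    }
    out
}

/// 256-bit hash: a bit is 1 when a pixel is darker than the pixel to its left.
fn difference_hash(gray: &[u8], width: usize, height: usize) -> [u8; 32] {
    // Assumption: 17 columns, so every one of the 16 bits per row has a left neighbor.
    let small = shrink(gray, width, height, 17, 16);
    let mut hash = [0u8; 32];
    for y in 0..16 {
        for x in 0..16 {
            if small[y * 17 + x + 1] < small[y * 17 + x] {
                let bit = y * 16 + x;
                hash[bit / 8] |= 1 << (bit % 8);
            }
        }
    }
    hash
}
```

Because each bit only records whether brightness goes down from one pixel to the next, the hash is unaffected by uniform changes in exposure.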

A

But this is a simple algorithm that gave me good results. So here's an overview of the full process. We take the original image, we grayscale it and apply a Sobel operator, and then we find the contour, extract the four lines it represents, and calculate the four corners of the card.

A

With that, we can fix the perspective of the grayscale image and apply a perceptual hash. Now, this process gave me pretty good results, but there was one case that I needed to deal with: these cards over here. They look the same, they have the same name, and the gameplay effect is the same, but there's one small difference.

A

They were printed in different sets throughout the years, and you can see that the set symbol on the corner there is different. So I need to be able to tell these apart. The set symbols are so small that the perceptual hash doesn't pick up any differences in there, so I needed to find a different way of doing this. I implemented a technique called template matching.

A

I created individual images for each of the set symbols, and we go through each possible position in that marked zone in the card there. For each position, we compare the set symbol with the pixels of the card, and if that error is below a certain threshold, we consider it to be a match.
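The sliding comparison can be sketched as follows. The talk does not say how the error was measured, so this assumes the mean absolute difference between grayscale pixels; `image` stands for the already cropped zone of the card where the symbol is expected, and the function names are hypothetical.

```rust
/// Mean absolute difference between the template and the image window at (ox, oy).
fn window_error(
    image: &[u8], image_w: usize,
    template: &[u8], tw: usize, th: usize,
    ox: usize, oy: usize,
) -> f32 {
    let mut total = 0u32;
    for y in 0..th {
        for x in 0..tw {
            let a = image[(oy + y) * image_w + ox + x];
            let b = template[y * tw + x];
            total += a.abs_diff(b) as u32;
        }
    }
    total as f32 / (tw * th) as f32
}

/// Try every position; return the best one as (x, y, error) if its error is below `threshold`.
fn find_template(
    image: &[u8], image_w: usize, image_h: usize,
    template: &[u8], tw: usize, th: usize,
    threshold: f32,
) -> Option<(usize, usize, f32)> {
    if tw == 0 || th == 0 || tw > image_w || th > image_h {
        return None;
    }
    let mut best: Option<(usize, usize, f32)> = None;
    for oy in 0..=image_h - th {
        for ox in 0..=image_w - tw {
            let err = window_error(image, image_w, template, tw, th, ox, oy);
            if best.map_or(true, |(_, _, e)| err < e) {
                best = Some((ox, oy, err));
            }
        }
    }
    best.filter(|&(_, _, e)| e < threshold)
}
```

Running this once per set symbol, each with its own `threshold`, gives a list of candidate sets for the card.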

A

I had to use a different threshold for each set symbol, because some of them are more complex than the others, and I had to tweak this a bit manually and this solved my problem. So with these two techniques, I was able to get a good detection rate.

A

So let me show you how this works.

A

When I place a card there, you'll see the card on the left; that's the one that's being detected. This works even for noisier cards like this one, and in this case you'll see on the left that the set symbol is also being detected.

A

So let me show you guys where the set symbol makes a difference. This Marnie here belongs to the Champion's Path expansion set, and you'll see that the correct symbol is shown on the left. There's another Marnie, which is printed in a different set. This one was printed in the Sword & Shield base set, and you can see that it also gets detected correctly.

A

Now there are some cards where this doesn't work so well, like this one. It's a foil card, and that means that there's a lot of reflective material in it, so the lights form all these bright patterns and the perceptual hash just isn't able to deal with it. Now, to finalize, here are some of the libraries that I used and some that I think are worth checking out. The last one, Rust CV, is an organization with many computer vision algorithms.

A

So if you're interested in the area, I think it's worth checking out. They also have some basic tutorials going through some of them. The code for this is available on my GitHub account, so if you want to go and check it out, feel free, and feel free to ask me any questions. That's all I have for you today, so thank you for listening.

A