Thursday, December 01, 2016

RTextTools demo error

There is a nifty little "one-stop-shopping" text analytics package in R called "RTextTools" that was created by Timothy P. Jurka from UC-Davis.  The package lets you run a whole variety of text classification algorithms automatically and with ease.

However, it appears (though I cannot say this for sure) that if you use it in R version 3.3.0 or later, there is a bug, at least in the demo.  It apparently does not occur in earlier versions of R.  The demo asks you to run the following:


doc_matrix <- create_matrix(USCongress$text,
                            language = "english",
                            removeNumbers = TRUE,
                            stemWords = TRUE,
                            removeSparseTerms = .998)

The problem is that the following error occurs.

Error in if (any(lens > lim)) stop("There is a limit of ", lim, "characters on the number of characters in a word being stemmed") :  missing value where TRUE/FALSE needed

One astute user (lukeA) on Stack Overflow discovered that the characters " NA " in the strings in your text fields are converted to an actual R object NA.

Therefore, to fix this in the demo, you have to eliminate the two records in USCongress that have an NA lurking in the text.  These are records 3674 and 3675.  Prior to the create_matrix statement, you can remove them with:

USCongress <- USCongress[-c(3674, 3675), ]

Once that line is there, the create_matrix call works, and you can continue with the demo.
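
If you would rather not hard-code the row numbers, a rough sketch of how to locate such records might look like the following. The texts vector is a made-up stand-in for USCongress$text, and the pattern is my assumption about what counts as a stand-alone "NA" token; it also flags an NA at the very start or end of a string, which is broader than the " NA " case described above.

```r
# Hypothetical stand-in for USCongress$text; the real data ships with RTextTools.
texts <- c("A bill to amend the tax code",
           "To provide NA funding for schools",
           "NA",
           "NASA funding bill")

# Match "NA" only as a stand-alone token: at the start of the string or after
# whitespace, and at the end of the string or before whitespace.
bad <- grep("(^|[[:space:]])NA([[:space:]]|$)", texts)

# Guard the empty case: texts[-integer(0)] would drop every element.
cleaned <- if (length(bad) > 0) texts[-bad] else texts

print(bad)
print(cleaned)
```

Running the same grep pattern against USCongress$text should, if the diagnosis above is right, point at the offending records, but I have not verified that it reports exactly those two rows.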

Follow me on Twitter: @bioniclime

Saturday, September 24, 2016

A Parent's Guide to Tournament Chess

I wrote this several years ago for a blog, but it has since been deleted, so I thought I would post it here.

A Parent's Guide to Tournament Chess
by Robert N. Bernard

So your child has been playing chess for a while, and his or her chess teacher has suggested they play in a tournament. Your child begs you to let them play; "They even have trophies - big ones!", the child exclaims. You relent, and the chess teacher provides the time and place. How should you prepare? Once you get there, what should you expect?

Many kids' first tournaments are small, local get-togethers (of perhaps 10-20 participants), but some are noticeably larger. Most of what's written here applies to most of the tournaments in the USA and Canada, but there will always be some variation.

Before the tournament - buy a tournament chess set, board, and scorebook. Plastic sets and vinyl boards work well, and, if your child is in elementary school, the spaces in the scorebook for writing moves should be large enough to accommodate children's larger handwriting. Unless you have an older child who has played with a clock before, you will probably not need a clock at first, but if they play in more and more tournaments they will need one.

On the morning of the tournament, there's Rule Number One - pack food. Chess tournaments on the weekend are held in elementary schools, roadside hotels, libraries, and multi-purpose rooms, and the dining options are usually limited to non-existent. You might think that you will have time between games to go out and get something, but don't count on it - the games are typically played in as short a time as possible.

At the tournament, parents frequently are afraid that their child will lose their first game and then be sent home. In the vast majority of tournaments, this is not the case; each child will play one game in each round of the tournament - there are no eliminations. Also, larger tournaments are broken into sections by "ratings", an estimate of each player's skill level (more on ratings below), but in a child's first tournament, they will not have a rating and will be called an "unrated" player. The tournament director determines the "pairings" (which child plays which other child), usually with the help of a computer program. The pairings are usually posted on a wall or door on what's called the "pairing sheet", one for each section of the tournament. The pairing sheet is easy to find - it will be where the crowd gathers when the tournament director announces, "Pairings are up!". Each row of the pairing sheet lists a "board number" that corresponds to a placard next to a board on a table, then the names of the two players in an order that indicates who plays which color (the player with the white pieces is listed first), and finally two blank columns for the result (1 - 0 White wins; 0 - 1 Black wins; 0.5 - 0.5 draw).

Pairings are determined using a method called the Swiss system. Essentially, the Swiss system (which is explained in more depth at the end of this article) dictates that kids should play other kids with the same score as themselves (where a win is worth 1 point, a draw 0.5 point, and a loss 0 points). For example, after three rounds of play, there may be a few kids who have won all their games and have 3 points - in the fourth round, they would play each other, while those kids with 1 total point (i.e., 1 win and 2 losses, or 2 draws and 1 loss) would play each other. In this way, kids with the same performance (and presumably similar skill levels) are more likely to play each other as the tournament progresses.
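
For the R-inclined, here is a small sketch of how those score groups could be worked out. The four players and their results are invented for illustration; each result is scored 1 for a win, 0.5 for a draw, and 0 for a loss, as described above.

```r
# Hypothetical results after three rounds: 1 = win, 0.5 = draw, 0 = loss.
results <- list(Ann = c(1, 1, 1),
                Ben = c(1, 0, 0),
                Cal = c(0.5, 0.5, 0),
                Dee = c(1, 1, 1))

# Total score for each player.
scores <- sapply(results, sum)

# Players with the same total land in the same score group.
groups <- split(names(scores), scores)
print(groups)
```

Here Ann and Dee (3 points each) form one group and would meet in round four, while Ben and Cal (1 point each) form another.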

After the pairings are posted, the child should go sit at the board they are assigned, and you should ensure they are sitting with the correct color of the pieces in front of them. At that point, you should stay with your child for a while until you are asked to leave the playing area. Unlike almost every other child sporting event (soccer, little league, swim meets), parents are (usually) not allowed to watch their child play. There are two reasons for this. First, it is to eliminate the possibility of having a parent signal a child, intentionally or unintentionally. Intentionally signaling is reprehensible, but unintentionally signaling (e.g., wincing or grimacing when your child is about to let go of a piece that will lose the game for the child, or smirking or chuckling when the child's opponent just made an obvious mistake) is far more common. Second, your child's opponent might very well be intimidated by you standing near the board, scowling.

At the end of each game of kids' tournaments, tournament directors will usually have younger or more inexperienced kids raise their hands to summon a tournament director to verify the game is over, and help them record their results. Older or more experienced kids should simply clean up the pieces, and go to the pairing sheet and write the result. In either case, both players should record the result, and verify that it is correct. When mistakes happen in result recording, it is an ordeal that neither child, nor parent, nor especially tournament director wants to go through.

While the game is going on, each parent has his or her own coping strategy; some read, some eat, some pace, some talk incessantly to anyone and everyone, but what all parents should realize is that - perhaps for the first time - your child is totally on their own, doing something by themselves, which is quite an accomplishment, especially if your child is on the young side.

After each game, the child will come out of the room projecting some emotion between elation and bitter disappointment. Losses are hard on all chess players, children or adults - unlike in most other games or sports, players cannot blame the loss on the luck of the dice or a bad bounce, only on themselves. I do not recommend minimizing the loss, because it will have mattered to the child, but instead try to encourage and congratulate them on playing their first tournament game, and prepare them for the next one.

Inevitably, your child will be asked "What's your rating?" and the child will then ask you about ratings. Ratings are a statistical measure of a player's approximate skill level, and in the USA, ratings range from a low of 100 to 2800 or more. (Note that it is improper to say "chess ranking" when you mean a rating; when you rank something, a low number -- ranked number 1 -- indicates that it is the best, but on a rating scale a higher number is better.) A player receives a rating if they play in a rated tournament. For the first 25 games or so, ratings are determined by averaging each opponent's rating plus 400 if you beat that opponent, each opponent's rating if you drew that opponent, and each opponent's rating minus 400 if you lost to that opponent. For example, suppose that in your first tournament you beat opponents rated 800 and 900, draw an opponent rated 600, and lose to an opponent rated 700; your first rating would be the average of (800+400), (900+400), (600+0), and (700-400), that is, 1200+1300+600+300 = 3400 divided by 4, which is 850. There are some complications if your opponents' ratings are very low, or if you win or lose all your games, but this is a rough estimate. Once your rating becomes established (i.e., you have played more than 25 games), the general guideline is that you will gain many rating points if you beat someone higher rated than you, lose many rating points if you lose to someone lower rated than you, gain only a few rating points if you beat someone lower rated than you, and lose only a few rating points if you lose to someone higher rated than you. If you are curious about the math behind all of this, the United States Chess Federation's rating system is explained here.
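
The first-rating arithmetic above can be sketched in a few lines of R. The function name provisional_rating is my own invention, and this is only the rough estimate described in this article, not the federation's full formula with its special cases.

```r
# Rough first-rating estimate: average of (opponent rating + 400) for a win,
# (opponent rating) for a draw, and (opponent rating - 400) for a loss.
# results: 1 = win, 0.5 = draw, 0 = loss.
provisional_rating <- function(opp_ratings, results) {
  stopifnot(length(opp_ratings) == length(results))
  # 2 * results - 1 maps win/draw/loss to +1 / 0 / -1.
  adjusted <- opp_ratings + 400 * (2 * results - 1)
  mean(adjusted)
}

# The example from the text: wins against 800 and 900, a draw with 600,
# and a loss to 700.
first_rating <- provisional_rating(c(800, 900, 600, 700), c(1, 1, 0.5, 0))
print(first_rating)  # 850
```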

Finally, after the tournament is over, be prepared for some sort of catharsis in the car, no matter the age of the child. Your child may talk excitedly for the next hour and not be able to sleep at night. Your child may weep inconsolably because every game was a loss. Both outpourings of emotion are healthy, and indicate that the tournament really meant something to the child. After the emotions are drained out of the child -- joy or despair -- keep your ears open, as you may very well hear, "When's the next tournament?"

Appendix: The Swiss System 

Here's a short description of the Swiss System of pairing opponents in chess tournaments. Refer also to the diagram below.  Each step is marked with a circle around it.

Step 1: At the beginning of the tournament, rank all the players from highest rating to lowest rating. Unrated players should be ranked lowest. The number given is called the "pairing number". In the example, Helen has pairing number 1 and David has pairing number 4.

Step 2: Split the list of players into two equally sized groups, where the split occurs between the two middle pairing numbers. If there is an odd number of players, one player is given a "bye", which means that they get a full point for the round, but they do not have an opponent.

Step 3: Pair the top half of the group with the bottom half of the group, but maintain the rankings in the top and bottom halves. You might expect that for 8 players, 1 would play 8, 2 would play 7, and so forth, but in a Swiss, 1 plays 5, 2 plays 6, etc. Alternate colors so that if 1 gets white, 2 gets black, and so on. Post these pairings on the pairing sheet.

Step 4: Once the round is complete, record the results. 1-0 means white won, 0-1 means black won, and 0.5-0.5 is a draw. In the example, Helen beat Sadie, Mateo beat Dhiren, Julie beat Ming, and Chris beat David.

Step 5: In the next round (and all subsequent rounds), group players again, but this time by their total score. In the example, there are four players with 1 point, and four players with 0 points. Each of these is called a "score group". Within each score group, order the players by their pairing numbers, and split the group in half again.

Step 6: For each score group, pair the top half of that score group with the bottom half of that score group. In this way, players with the same score will play each other. Try to maintain alternation of colors for each player (so they get white, black, white, black, etc., in each round).

Step 7: Post the pairing sheet with the new pairings. In the example, Chris has white against Helen with black. Notice that both Chris and Helen have the same score (1 point) and that Helen was "due" black this
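
As a rough illustration of Steps 1 through 3, here is a minimal R sketch of first-round pairing. The function name swiss_round1 and the player labels P1 to P8 are hypothetical, the players are assumed to be already sorted by pairing number, and the sketch handles only an even number of players (no bye).

```r
# First-round Swiss pairing: top half plays bottom half in order, and colors
# alternate board by board. `players` must already be sorted by pairing number.
swiss_round1 <- function(players) {
  n <- length(players)
  stopifnot(n %% 2 == 0)              # byes are not handled in this sketch
  half <- n / 2
  top <- players[seq_len(half)]
  bottom <- players[half + seq_len(half)]
  # On odd-numbered boards the top-half player gets white; on even-numbered
  # boards the top-half player gets black.
  top_white <- seq_len(half) %% 2 == 1
  data.frame(board = seq_len(half),
             white = ifelse(top_white, top, bottom),
             black = ifelse(top_white, bottom, top),
             stringsAsFactors = FALSE)
}

pairings <- swiss_round1(paste0("P", 1:8))
print(pairings)
```

With eight players this pairs 1 with 5, 2 with 6, 3 with 7, and 4 with 8, with player 1 on white and player 2 on black, matching Step 3.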

Friday, February 12, 2016

Counting objects in a vector in R

There is an easy way to count the number of objects in a vector in R, also known as getting your Freq on.

Suppose you create a vector called myVec with some elements in it, and you want to know the frequency (i.e., the count) of each object.

Simply convert it into a table, and then cast it to a data frame.  The column names are the original vector name (myVec) and Freq.

> myVec <- c("cat", "dog", "mouse", "cat", "mouse", "cat")
> myDF <- as.data.frame(table(myVec))
> myDF
  myVec Freq
1   cat    3
2   dog    1
3 mouse    2

This also works if your vector is a column of a data frame.
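
Here is a small sketch of the data frame case, using a made-up data frame called pets with a column called animal. One wrinkle: if you pass pets$animal to table() without a name, the first column of the result is called Var1 rather than animal, so the sketch names the argument to keep a readable column name.

```r
# Hypothetical data frame with one column to count.
pets <- data.frame(animal = c("cat", "dog", "mouse", "cat", "mouse", "cat"),
                   stringsAsFactors = FALSE)

# Naming the argument to table() carries the name through to the result;
# as.data.frame(table(pets$animal)) would call the first column Var1.
counts <- as.data.frame(table(animal = pets$animal))
print(counts)
```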

The mysteries of the R programming language revealed.
