Organization(s): Carnegie Mellon University & The Atlanta Hawks Language(s): R
In the summer of 2021, I was a member of a 15-person cohort in an REU at Carnegie Mellon University.
In this program, students work on applied projects in statistics and data science through the lens of sports analytics.
Advised by Max Horowitz at the Atlanta Hawks, my team researched the influence of player
fatigue on game outcomes. We used R to scrape in-game tracking data (credit to the NBAr package)
and aggregated over 300 variables about players at a game-by-game level for a decade's worth of games.
We found that the narrative portrayed by sports media is often inaccurate,
found substantial evidence that players achieve a high degree of recovery between games,
and offered a framework for isolating factors that contribute to game outcomes.
Our arXiv paper covers our findings in detail.
CMSAC Project Showcase
AI Index
Organization(s): Stanford University & University of Wyoming Language(s): Bash, R
Every year, the HAI lab at Stanford and its partners release the AI Index, a report tracking and summarizing progress in AI.
Using the high-performance computing cluster at the University of Wyoming, I ran a suite of computational experiments included in the 2021 version of this index.
As part of the 2021 TAMU datathon, Goldman Sachs put out a challenge to model the relationship between the stock market and the environment.
My team got first place in this 24-hour challenge, and my contribution was building our primary series of models.
After discussing how the markets might behave in conjunction with the environment, I realized there would be an issue decomposing what is responsible for market changes.
My team's exploratory analysis confirmed that many of our environmental metrics worsened over time while the market grew;
however, these trends were not necessarily systemically related.
Our solution focused on exposing the underlying relationship between the environment and market behavior adjusted for the growth of the US economy.
For details on how we solved this problem, you can check out the Devpost.
I've been an avid runner for nearly a decade now, and my brother currently runs for a collegiate cross-country and track program.
While discussing my brother's races, I realized the scoring of cross-country meets leaves room for optimization.
Team scores are computed by summing the places of the top 5 runners on a team.
However, a field of runners is not uniformly distributed, so not all improvements in time influence team scores equally.
It turns out that runners' finishing times in a cross-country race tend to be roughly Gaussian.
Because of this distribution, the closer a runner is to the field median, the more the same percentage improvement in their time improves their team's score.
The graphic below gives a visual sense of why this is the case.
After validating my initial assumptions, I created a script that returns the runners on a team ordered by how sensitive the team's score is to changes in each runner's race.
I wrote a report below to formalize my thoughts on this problem and why it matters.
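As a rough illustration of the idea (a minimal sketch, not the original script), the Python below scores a team by summing its top 5 places and ranks runners by how many places a 1% time improvement would gain. It assumes finishing times in the field are Gaussian; the function names, the field mean and spread, the field size, and the runner times are all hypothetical values chosen for the example.

```python
from statistics import NormalDist


def team_score(places, scorers=5):
    """Sum the places of a team's top `scorers` finishers (lower is better)."""
    return sum(sorted(places)[:scorers])


def places_gained(time_s, field, field_size, improvement=0.01):
    """Approximate places gained if a runner's time drops by `improvement`.

    Assumption: the field's finishing times follow the Gaussian `field`, so
    the share of runners passed is the CDF mass between the two times.
    """
    faster = time_s * (1 - improvement)
    return (field.cdf(time_s) - field.cdf(faster)) * field_size


def rank_by_sensitivity(team_times, field, field_size, improvement=0.01):
    """Order runner names from most to fewest places gained."""
    return sorted(
        team_times,
        key=lambda name: places_gained(
            team_times[name], field, field_size, improvement
        ),
        reverse=True,
    )


# Hypothetical field: mean 25:00 (1500 s), standard deviation 90 s, 200 runners.
field = NormalDist(mu=1500, sigma=90)
team = {"front_runner": 1380, "mid_pack": 1495, "back_of_pack": 1620}

print(team_score([3, 10, 1, 25, 7, 40, 12]))  # 1 + 3 + 7 + 10 + 12 = 33
print(rank_by_sensitivity(team, field, field_size=200))
```

This sketch ignores details a real analysis would need, such as only the top 5 runners counting toward the score and displaced runners from other teams shifting places.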
Organization(s): University of Wyoming Language(s): Java
For COSC 3011, as part of a 4-person team, I wrote a complete-the-maze game.
The game includes a GUI, a timing system, saving and loading of the game state through binary files,
and support for users uploading their own mazes.
The project was an exercise in implementing concepts that increase the robustness of large software systems.
The course changed design requirements throughout the development process,
challenging teams’ use of encapsulation.
In the git repo, there are design docs detailing the design decisions we made as a team,
a user manual, and a discussion of the different issues we encountered throughout the project.
Organization(s): University of Wyoming Language(s): Bash, R, Python
In the MALLET lab, I worked on a project improving automated algorithm selection. Traditionally, automated algorithm selection models are trained on problem instance
feature values and performance data from algorithm runs. The project I worked on showed that training these automated algorithm selection
models on feature values of the algorithms, in addition to problem instance features, improves overall performance.
Choosing the best algorithm more often allows sets of problem instances to be evaluated in less time and with less memory.
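To make the setup concrete, here is a minimal Python sketch of joining instance features with algorithm features into one training row and then picking the algorithm with the lowest predicted runtime. The source does not say which model or features the project used, so the 1-nearest-neighbor predictor, the feature vectors, the runtimes, and every name below are hypothetical stand-ins.

```python
import math


def joint_row(instance_features, algorithm_features):
    """Concatenate instance features and algorithm features into one vector."""
    return list(instance_features) + list(algorithm_features)


def predict_runtime(training_rows, query):
    """Stand-in model: return the runtime of the closest training row (1-NN)."""
    nearest = min(training_rows, key=lambda row: math.dist(row[0], query))
    return nearest[1]


def select_algorithm(training_rows, instance_features, algorithms):
    """Pick the algorithm whose predicted runtime on this instance is lowest."""
    return min(
        algorithms,
        key=lambda name: predict_runtime(
            training_rows, joint_row(instance_features, algorithms[name])
        ),
    )


# Hypothetical algorithm feature vectors and observed runtimes (seconds).
algorithms = {"alg_a": [1.0, 0.0], "alg_b": [0.0, 1.0]}
observed = [
    ([0.1], "alg_a", 1.0),
    ([0.9], "alg_a", 9.0),
    ([0.1], "alg_b", 5.0),
    ([0.9], "alg_b", 2.0),
]
training_rows = [
    (joint_row(inst, algorithms[name]), runtime)
    for inst, name, runtime in observed
]

print(select_algorithm(training_rows, [0.2], algorithms))
print(select_algorithm(training_rows, [0.8], algorithms))
```

Because the algorithm is described by features rather than by an identifier, a single model can in principle be queried for an algorithm it has few runs for, which is one plausible reading of why the extra features help.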
I submitted an extended abstract on this work to the AAAI Undergraduate Consortium. Abstract.pdf
2020 Wyoming Research Scholars Virtual Symposium
C Utilities
Organization(s): University of Wyoming Language(s): C
A collection of C programs written for the course Linux Programming (COSC 3750) at the University of Wyoming.
Parallel matrix multiplier
Takes in two matrices as files, and an integer for the number of threads to distribute the workload over.
Multiplies the two matrices and writes their result to a file.
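The work-splitting idea can be sketched as follows. This is a Python illustration rather than the original C program: it keeps the matrices in memory instead of reading files, and the function names and the round-robin row assignment are assumptions made for the example. In CPython the threads mainly show how rows are divided among workers, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_indices):
    """Compute the listed rows of the product a x b."""
    columns = list(zip(*b))  # columns of b, so each entry is a dot product
    return [
        (i, [sum(x * y for x, y in zip(a[i], col)) for col in columns])
        for i in row_indices
    ]


def parallel_matmul(a, b, num_threads):
    """Multiply a and b, handing each thread every num_threads-th row."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("matrix dimensions do not match")
    # Round-robin split: thread t gets rows t, t + num_threads, ...
    chunks = [range(t, len(a), num_threads) for t in range(num_threads)]
    result = [None] * len(a)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for rows in pool.map(lambda chunk: multiply_rows(a, b, chunk), chunks):
            for i, row in rows:
                result[i] = row
    return result


print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], num_threads=2))
# [[19, 22], [43, 50]]
```

Each thread writes only to its own rows of the result, so no locking is needed.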
Shell
Implemented a shell from scratch. Includes pipes, input/output file redirects, and running multiple commands separated by semicolons.
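The parsing side of these features can be sketched as follows. This is a Python illustration of the general technique, not the original C shell: it splits a line into semicolon-separated jobs, each job into piped commands, and records `<` and `>` redirect targets. The `parse_line` name and the dictionary layout are hypothetical, and the sketch does not execute anything or handle quoted operator characters, `>>`, or `&`.

```python
import shlex


def new_command():
    """Empty command record: arguments plus optional redirect targets."""
    return {"argv": [], "stdin": None, "stdout": None}


def parse_line(line):
    """Parse a line into jobs; each job is a list of piped commands."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    tokens = iter(lexer)
    jobs, pipeline, command = [], [], new_command()
    for token in tokens:
        if token == "<":
            command["stdin"] = next(tokens, None)   # file to read from
        elif token == ">":
            command["stdout"] = next(tokens, None)  # file to write to
        elif token in ("|", ";"):
            if command["argv"]:
                pipeline.append(command)
            command = new_command()
            if token == ";" and pipeline:           # semicolon ends the job
                jobs.append(pipeline)
                pipeline = []
        else:
            command["argv"].append(token)
    if command["argv"]:
        pipeline.append(command)
    if pipeline:
        jobs.append(pipeline)
    return jobs


for job in parse_line("ls -l | grep py > out.txt; echo done"):
    print(job)
```

A real shell would then fork one process per command, connect neighbors in a job with pipes, and open the redirect targets before calling exec.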
Command Line Utilities
Implemented tar, cat and ls from scratch.