I have been a Solutions Architect and Software Engineer at Databricks and an academic researcher, all in various parts of the Machine Learning and advanced analytics space.
Summary of work
I currently work at Databricks, the company founded by the original creators of Apache Spark, Delta Lake, and MLflow. I spent my first 5.5 years at Databricks leading some of our ML efforts on the engineering side, both as an Apache Spark committer and PMC member working on open source and as a tech lead working on the Databricks product. I am now an ML specialist in the Solutions Architect organization, working more directly with customers.
Previously, I spent a year as a postdoc working with Kannan Ramchandran and Martin Wainwright at UC Berkeley. I received my Ph.D. in Machine Learning from Carnegie Mellon University, where I worked with Carlos Guestrin. I received my B.S.E. in Computer Science from Princeton University, where I did research with Robert E. Schapire.
Blog posts, talks, etc. while at Databricks
My public talks and blog posts can mostly be found via:
- Databricks Blog
- Databricks speaker links from past Spark Summits, Spark+AI Summits, and Data+AI Summits
- Miscellaneous meetup, webinar, and conference talks, some of which can be found via web search
Open source work
I did most of my open source work during my earlier years at Databricks. You can find it by looking at:
Research from years past
My research was generally in large-scale machine learning, especially the trade-offs among sample complexity, computational complexity, and potential for parallelization. My approach combined theory and application, focusing on methods that have strong theoretical guarantees and are also competitive in practice.
Selected topics of current and past research:
- Parallel Optimization for Sparse Regression
- Peer Grading in Massive Open Online Courses
- Probabilistic Graphical Models (focus on Conditional Random Fields)
- Boosting-by-Filtering
Academic publications
Year | Title | Authors | Venue | Documents |
---|---|---|---|---|
2016 | Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale | F. Abuzaid, J. Bradley, F. Liang, A. Feng, L. Yang, M. Zaharia, and A. Talwalkar | Conference on Neural Information Processing Systems (NeurIPS) | |
2016 | Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence | N. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchandran and M. Wainwright | Journal of Machine Learning Research (JMLR) 17(58): 1-47 | PDF; earlier version in AISTATS 2015 |
2016 | MLlib: Machine Learning in Apache Spark | X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, DB Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M.J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar | Journal of Machine Learning Research (JMLR) 17(1): 1-7 | arxiv |
2015 | Spark SQL: Relational Data Processing in Spark | M. Armbrust, R. Xin, C. Lian, Y. Huai, D. Liu, J. Bradley, X. Meng, T. Kaftan, M. Franklin, A. Ghodsi and M. Zaharia | ACM SIGMOD International Conference on Management of Data (SIGMOD) | |
2015 | Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence | N. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchandran and M. Wainwright | International Conference on Artificial Intelligence and Statistics (AISTATS) | PDF; supplement |
2014 | Robustifying the Sparse Walsh-Hadamard Transform without Increasing the Sample Complexity of O(K log N) | Xiao Li, Joseph K. Bradley, Sameer Pawar, and Kannan Ramchandran | IEEE International Symposium on Information Theory (ISIT) | |
2013 | A Case for Ordinal Peer-evaluation in MOOCs | Nihar B. Shah, Joseph K. Bradley, Abhay Parekh, Martin Wainwright, and Kannan Ramchandran | NeurIPS Workshop on Data Driven Education | |
2013 | Learning Large-Scale Conditional Random Fields | Joseph K. Bradley | Ph.D. Thesis, Machine Learning Department, Carnegie Mellon University | Thesis PDF; Defense PPT |
2012 | Sample Complexity of Composite Likelihood | Joseph K. Bradley and Carlos Guestrin | International Conference on Artificial Intelligence and Statistics (AISTATS) | PDF; poster PPT |
2011 | Parallel Coordinate Descent for L1-Regularized Loss Minimization | Joseph K. Bradley, Aapo Kyrola, Danny Bickson, and Carlos Guestrin | International Conference on Machine Learning (ICML) | arxiv; Corrected PDF; Theory supplement; Scalability analysis; Lasso benchmark; Logreg benchmark |
2010 | Learning Tree Conditional Random Fields | Joseph K. Bradley and Carlos Guestrin | International Conference on Machine Learning (ICML) | |
2008 | FilterBoost: Regression and Classification on Large Datasets | Joseph K. Bradley and Robert E. Schapire | Conference on Neural Information Processing Systems (NeurIPS) | PDF, with appendix; slides; CMU Data Analysis Project version, with multiclass extensions |