Julia

Lukas Hager

2024-05-20

Learning Objectives

  • Understand the (very) basics of Julia

Background

Julia

  • Third member of Ju-Pyt-R
    • Can run within Jupyter if you install a Julia kernel
  • Dynamically typed, just-in-time compiled
  • Increasingly popular, especially among economists
  • Still a little rough around the edges

Why Should I Use Julia?

  • It’s fast
  • (Probably some other reasons too)

Why You Should Use Julia, From Pros

Have you ever in Python or R:

  • Done something and were unable to achieve the performance that you needed? Well, in Julia, Python or R minutes can be translated to seconds.
  • Tried to do something different from numpy/dplyr conventions and discovered that your code is slow and you’ll probably have to learn dark magic to make it faster? In Julia you can do your custom different stuff without loss of performance.
  • Had to debug code and somehow you see yourself reading Fortran or C/C++ source code and having no idea what you are trying to accomplish? In Julia you only read Julia code, no need to learn another language to make your original language fast.
  • Wanted to use a data structure defined in another package and found that doesn’t work and that you’ll probably need to build an interface? Julia allows users to easily share and reuse code from different packages.
  • Needed to have a better project management, with dependencies and version control tightly controlled, manageable, and replicable? Julia has an amazing project management solution and a great package manager.

Basics

Installation

  • Install Julia
  • If you want to add a Julia kernel to your Jupyter installation:
using Pkg
Pkg.add("IJulia")  # package names are strings, so they need double quotes

Scalars

an_int = 1
a_string = "Julia"
a_string
"Julia"

Julia has conventional types:

  • Integers (Int64)
  • Real Numbers (Float64)
  • Booleans (Bool)
  • Strings (String)
typeof(a_string)
String
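
As a quick check of the list above, typeof can be applied to one literal of each kind. This is a minimal sketch; the Int64 result assumes a 64-bit machine (on 32-bit systems integer literals default to Int32), and the variable names are chosen here for illustration.

```julia
# One literal of each basic type; results assume a 64-bit Julia build.
an_int = 1
a_float = 1.0
a_bool = true
a_string = "Julia"

typeof(an_int)     # Int64 on 64-bit systems
typeof(a_float)    # Float64
typeof(a_bool)     # Bool
typeof(a_string)   # String

# Mixing an integer and a float promotes the result to Float64.
typeof(an_int + a_float)  # Float64
```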

Functions

function add_numbers(x, y)
    return x + y
end

add_numbers(3,2)
5
add_numbers("3", 2)
MethodError: no method matching +(::String, ::Int64)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:87
  +(::SentinelArrays.ChainedVectorIndex, ::Integer) at ~/.julia/packages/SentinelArrays/HOdiP/src/chainedvector.jl:207
  ...

Careful with quotes!

add_numbers('3', 2)
'5': ASCII/Unicode U+0035 (category Nd: Number, decimal digit)
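
The difference comes from the literal syntax: double quotes create a String, while single quotes create a Char, and adding an integer to a Char shifts its code point. A small sketch of the distinction follows; add_numbers_strict is a hypothetical name used only to illustrate restricting the arguments to numbers.

```julia
# Double quotes build a String; single quotes build a Char.
typeof("3")  # String
typeof('3')  # Char

# Char + Int moves along the Unicode code points, so '3' + 2 is the character '5'.
'3' + 2 == '5'  # true

# To add the number stored in a string, parse it first.
parse(Int, "3") + 2  # 5

# Hypothetical stricter variant: only accepts numeric arguments,
# so passing a Char or a String raises a MethodError.
function add_numbers_strict(x::Number, y::Number)
    return x + y
end

add_numbers_strict(3, 2)  # 5
```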

Loops

for i in 1:10
    println(i)
end
1
2
3
4
5
6
7
8
9
10
sum = 0
i = 0
while sum + i + 1 <= 2024
    i += 1
    sum += i
end
sum, i
(2016, 63)
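
The same loop can be wrapped in a function, which keeps the counters local and avoids reusing the name of the built-in sum. This is a sketch; largest_partial_sum and total are names chosen here for illustration.

```julia
# Accumulate 1 + 2 + ... while the running total stays at or below `limit`.
function largest_partial_sum(limit)
    total = 0
    i = 0
    while total + i + 1 <= limit
        i += 1
        total += i
    end
    return total, i
end

largest_partial_sum(2024)  # (2016, 63)

# Cross-check with the built-in sum over a range.
sum(1:63)  # 2016
```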

DataFrames.jl

Reading in CSVs

using CSV
using DataFrames

senate_polls = CSV.read(
    "/Users/hlukas/git/personal_website/static/econ-481/data/senate_polls_historical.csv", 
    DataFrame
)
6658×48 DataFrame
6633 rows omitted
Row poll_id pollster_id pollster sponsor_ids sponsors display_name pollster_rating_id pollster_rating_name numeric_grade pollscore methodology transparency_score state start_date end_date sponsor_candidate_id sponsor_candidate sponsor_candidate_party endorsed_candidate_id endorsed_candidate_name endorsed_candidate_party question_id sample_size population subpopulation population_full tracking created_at notes url source internal partisan race_id cycle office_type seat_number seat_name election_date stage nationwide_batch ranked_choice_reallocated ranked_choice_round party answer candidate_id candidate_name pct
Int64 Int64 String String? String? String Int64 String Float64? Float64? String? Float64? String15 String15 String15 Int64? String31? String3? Missing Missing Missing Int64 Int64? String3? Missing String3? Bool? String15 String? String? String15? Bool? String7? Int64 Int64 String15 Int64 String15 String15? String15 Bool Bool Int64? String3 String31 Int64 String31 Float64
1 81762 1250 Trafalgar Group Trafalgar Group 338 Trafalgar Group 0.7 0.5 IVR/Live Phone/Text/Online Panel/Email 1.0 Georgia 12/3/22 12/5/22 missing missing missing missing missing missing 165972 1099 lv missing lv missing 12/6/22 08:58 https://www.thetrafalgargroup.org/news/ga-sen-ro-1205/ missing REP 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 51.1
2 81762 1250 Trafalgar Group Trafalgar Group 338 Trafalgar Group 0.7 0.5 IVR/Live Phone/Text/Online Panel/Email 1.0 Georgia 12/3/22 12/5/22 missing missing missing missing missing missing 165972 1099 lv missing lv missing 12/6/22 08:58 https://www.thetrafalgargroup.org/news/ga-sen-ro-1205/ missing REP 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing REP Walker 19088 Herschel Junior Walker 47.4
3 81760 1515 Data for Progress Data for Progress 522 Data for Progress 2.7 -1.2 IVR/Live Phone/Online Panel/Text-to-Web 6.0 Georgia 12/1/22 12/5/22 missing missing missing missing missing missing 165968 1229 lv missing lv missing 12/5/22 20:44 https://www.filesforprogress.org/datasets/2022/12/dfp_ga_runoff_tabs.pdf false missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 51.0
4 81760 1515 Data for Progress Data for Progress 522 Data for Progress 2.7 -1.2 IVR/Live Phone/Online Panel/Text-to-Web 6.0 Georgia 12/1/22 12/5/22 missing missing missing missing missing missing 165968 1229 lv missing lv missing 12/5/22 20:44 https://www.filesforprogress.org/datasets/2022/12/dfp_ga_runoff_tabs.pdf false missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing REP Walker 19088 Herschel Junior Walker 49.0
5 81759 235 InsiderAdvantage 195 Fox 5 Atlanta InsiderAdvantage 243 InsiderAdvantage 2.0 -0.4 Text 5.0 Georgia 12/4/22 12/4/22 missing missing missing missing missing missing 165967 750 lv missing lv missing 12/5/22 14:05 https://insideradvantage.com/2022/12/05/breaking-insideradvantagefox-5-survey-warnock-holds-slim-lead-over-walker/ missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 50.5
6 81759 235 InsiderAdvantage 195 Fox 5 Atlanta InsiderAdvantage 243 InsiderAdvantage 2.0 -0.4 Text 5.0 Georgia 12/4/22 12/4/22 missing missing missing missing missing missing 165967 750 lv missing lv missing 12/5/22 14:05 https://insideradvantage.com/2022/12/05/breaking-insideradvantagefox-5-survey-warnock-holds-slim-lead-over-walker/ missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing REP Walker 19088 Herschel Junior Walker 48.4
7 81761 317 Mitchell Mitchell Research & Communications 213 Mitchell Research & Communications 2.0 -0.1 Text-to-Web 5.0 Georgia 12/4/22 12/4/22 missing missing missing missing missing missing 165971 625 lv missing lv missing 12/6/22 08:58 https://www.realclearpolitics.com/docs/2022/GA_US_Senate_Run-off_Press_Release-Field_Copy-Crosstabs_12-4-22_A.pdf missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 50.0
8 81761 317 Mitchell Mitchell Research & Communications 213 Mitchell Research & Communications 2.0 -0.1 Text-to-Web 5.0 Georgia 12/4/22 12/4/22 missing missing missing missing missing missing 165971 625 lv missing lv missing 12/6/22 08:58 https://www.realclearpolitics.com/docs/2022/GA_US_Senate_Run-off_Press_Release-Field_Copy-Crosstabs_12-4-22_A.pdf missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing REP Walker 19088 Herschel Junior Walker 45.0
9 81764 262 Landmark Communications Landmark Communications 166 Landmark Communications 2.1 -0.6 Live Phone/Online Panel 5.0 Georgia 12/4/22 12/4/22 missing missing missing missing missing missing 165974 800 lv missing lv missing 12/6/22 09:46 https://landmarkcommunications.net/landmark-communications-senate-runoff-poll/ missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 52.2
10 81764 262 Landmark Communications Landmark Communications 166 Landmark Communications 2.1 -0.6 Live Phone/Online Panel 5.0 Georgia 12/4/22 12/4/22 missing missing missing missing missing missing 165974 800 lv missing lv missing 12/6/22 09:46 https://landmarkcommunications.net/landmark-communications-senate-runoff-poll/ missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing REP Walker 19088 Herschel Junior Walker 46.7
11 81754 1754 Patriot Polling Patriot Polling 732 Patriot Polling 1.1 0.6 IVR 5.0 Georgia 11/30/22 12/2/22 missing missing missing missing missing missing 165943 818 rv missing rv missing 12/4/22 18:51 https://patriotpolling.com/our-polls/f/warnock-leads-walker-by-17%25-in-georgia-senate-runoff missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 48.7
12 81754 1754 Patriot Polling Patriot Polling 732 Patriot Polling 1.1 0.6 IVR 5.0 Georgia 11/30/22 12/2/22 missing missing missing missing missing missing 165943 818 rv missing rv missing 12/4/22 18:51 https://patriotpolling.com/our-polls/f/warnock-leads-walker-by-17%25-in-georgia-senate-runoff missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing REP Walker 19088 Herschel Junior Walker 47.0
13 81734 1102 Emerson 960,1945 The Hill | WJBF Emerson College 88 Emerson College 2.9 -1.1 IVR/Online Panel/Text-to-Web 6.0 Georgia 11/28/22 11/30/22 missing missing missing missing missing missing 165846 888 lv missing lv missing 12/1/22 09:16 https://emersoncollegepolling.com/georgia-2022-warnock-holds-slight-edge-over-walker/ missing missing 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 50.8
6647 53672 147 Fabrizio 26 Club for Growth Fabrizio, Lee & Associates 90 Fabrizio, Lee & Associates 1.7 -0.1 missing missing Missouri 7/10/17 7/11/17 missing missing missing missing missing 86135 500 lv missing lv missing 8/21/18 00:39 conducted in 2017 https://www.scribd.com/document/353764204/MO-Sen-Fabrizo-Lee-for-the-Club-for-Growth-July-2017 538 missing REP 109 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM McCaskill 11410 Claire McCaskill 42.0
6648 53672 147 Fabrizio 26 Club for Growth Fabrizio, Lee & Associates 90 Fabrizio, Lee & Associates 1.7 -0.1 missing missing Missouri 7/10/17 7/11/17 missing missing missing missing missing 86135 500 lv missing lv missing 8/21/18 00:39 conducted in 2017 https://www.scribd.com/document/353764204/MO-Sen-Fabrizo-Lee-for-the-Club-for-Growth-July-2017 538 missing REP 109 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Hawley 11411 Josh Hawley 46.0
6649 53556 1056 Remington 421 Missouri Scout Remington Research Group 279 Remington Research Group 2.6 -0.7 missing missing Missouri 7/7/17 7/8/17 missing missing missing missing missing 85942 928 lv missing lv missing 8/9/18 03:01 https://www.realclearpolitics.com/docs/Remington_Research_MO_Senate_July_2017.pdf 538 missing missing 109 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM McCaskill 11410 Claire McCaskill 44.0
6650 53556 1056 Remington 421 Missouri Scout Remington Research Group 279 Remington Research Group 2.6 -0.7 missing missing Missouri 7/7/17 7/8/17 missing missing missing missing missing 85942 928 lv missing lv missing 8/9/18 03:01 https://www.realclearpolitics.com/docs/Remington_Research_MO_Senate_July_2017.pdf 538 missing missing 109 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Hawley 11411 Josh Hawley 50.0
6651 52661 383 PPP Public Policy Polling 263 Public Policy Polling 1.4 0.0 missing missing Nevada 6/23/17 6/25/17 missing missing missing missing missing 83203 648 v missing v missing 6/22/18 15:26 https://www.scribd.com/document/352321710/NVToplines-1?irgwc=1&content=27795&campaign=VigLink&ad_group=3073860&keyword=ft500noi&source=impactradius&medium=affiliate 538 missing missing 112 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Rosen 11150 Jacky Rosen 42.0
6652 52661 383 PPP Public Policy Polling 263 Public Policy Polling 1.4 0.0 missing missing Nevada 6/23/17 6/25/17 missing missing missing missing missing 83203 648 v missing v missing 6/22/18 15:26 https://www.scribd.com/document/352321710/NVToplines-1?irgwc=1&content=27795&campaign=VigLink&ad_group=3073860&keyword=ft500noi&source=impactradius&medium=affiliate 538 missing missing 112 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Heller 11151 Dean Heller 41.0
6653 53946 290 MassINC Polling Group 83 WBUR MassINC Polling Group 198 MassINC Polling Group 2.8 -0.8 missing missing Massachusetts 6/19/17 6/22/17 missing missing missing missing missing 86824 504 rv missing rv missing 9/11/18 21:27 http://www.wbur.org/news/2017/06/27/wbur-poll-warren-baker-reelection-tax-proposals 538 false missing 105 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Warren 12480 Elizabeth Warren 60.0
6654 53946 290 MassINC Polling Group 83 WBUR MassINC Polling Group 198 MassINC Polling Group 2.8 -0.8 missing missing Massachusetts 6/19/17 6/22/17 missing missing missing missing missing 86824 504 rv missing rv missing 9/11/18 21:27 http://www.wbur.org/news/2017/06/27/wbur-poll-warren-baker-reelection-tax-proposals 538 false missing 105 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Diehl 12481 Geoff Diehl 29.0
6655 52630 610 Texas Lyceum Texas Lyceum 431 Texas Lyceum missing missing Live Phone missing Texas 4/3/17 4/9/17 missing missing missing missing missing 83159 1000 a missing a missing 6/22/18 15:20 https://www.texaslyceum.org/resources/Pictures/2017%20Topline%20Results.pdf 538 missing missing 121 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM O'Rourke 11125 Beto O'Rourke 30.0
6656 52630 610 Texas Lyceum Texas Lyceum 431 Texas Lyceum missing missing Live Phone missing Texas 4/3/17 4/9/17 missing missing missing missing missing 83159 1000 a missing a missing 6/22/18 15:20 https://www.texaslyceum.org/resources/Pictures/2017%20Topline%20Results.pdf 538 missing missing 121 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Cruz 11126 Ted Cruz 30.0
6657 52643 988 Harper Polling Harper Polling 132 Harper Polling missing -0.3 IVR missing West Virginia 11/16/16 11/17/16 missing missing missing missing missing 83181 500 lv missing lv missing 6/22/18 15:24 http://harperpolling.com/polls/west-virginia-senate-2018-general-election-poll 538 missing missing 126 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Manchin 11132 Joe Manchin, III 57.0
6658 52643 988 Harper Polling Harper Polling 132 Harper Polling missing -0.3 IVR missing West Virginia 11/16/16 11/17/16 missing missing missing missing missing 83181 500 lv missing lv missing 6/22/18 15:24 http://harperpolling.com/polls/west-virginia-senate-2018-general-election-poll 538 missing missing 126 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Morrisey 11133 Patrick Morrisey 35.0
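
To try CSV.read without the polling file, the same call also accepts an in-memory buffer. The sketch below assumes the CSV and DataFrames packages are installed; the three-row table is made up for illustration and is not taken from the polling data.

```julia
using CSV
using DataFrames

# Made-up CSV text; the empty field in the second row is read as `missing`.
csv_text = """
pollster,state,sample_size
Pollster A,Georgia,1099
Pollster B,Arizona,
Pollster C,Arizona,450
"""

# Same call shape as above: a source first, then the sink type.
toy_polls = CSV.read(IOBuffer(csv_text), DataFrame)

size(toy_polls)   # (3, 3)
names(toy_polls)  # ["pollster", "state", "sample_size"]
```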

Access a Column

senate_polls.pollster
6658-element PooledArrays.PooledVector{String, UInt32, Vector{UInt32}}:
 "Trafalgar Group"
 "Trafalgar Group"
 "Data for Progress"
 "Data for Progress"
 "InsiderAdvantage"
 "InsiderAdvantage"
 "Mitchell"
 "Mitchell"
 "Landmark Communications"
 "Landmark Communications"
 "Patriot Polling"
 "Patriot Polling"
 "Emerson"
 ⋮
 "Fabrizio"
 "Fabrizio"
 "Remington"
 "Remington"
 "PPP"
 "PPP"
 "MassINC Polling Group"
 "MassINC Polling Group"
 "Texas Lyceum"
 "Texas Lyceum"
 "Harper Polling"
 "Harper Polling"

Access a Row

senate_polls[1, :]
DataFrameRow (48 columns)
Row poll_id pollster_id pollster sponsor_ids sponsors display_name pollster_rating_id pollster_rating_name numeric_grade pollscore methodology transparency_score state start_date end_date sponsor_candidate_id sponsor_candidate sponsor_candidate_party endorsed_candidate_id endorsed_candidate_name endorsed_candidate_party question_id sample_size population subpopulation population_full tracking created_at notes url source internal partisan race_id cycle office_type seat_number seat_name election_date stage nationwide_batch ranked_choice_reallocated ranked_choice_round party answer candidate_id candidate_name pct
Int64 Int64 String String? String? String Int64 String Float64? Float64? String? Float64? String15 String15 String15 Int64? String31? String3? Missing Missing Missing Int64 Int64? String3? Missing String3? Bool? String15 String? String? String15? Bool? String7? Int64 Int64 String15 Int64 String15 String15? String15 Bool Bool Int64? String3 String31 Int64 String31 Float64
1 81762 1250 Trafalgar Group Trafalgar Group 338 Trafalgar Group 0.7 0.5 IVR/Live Phone/Text/Online Panel/Email 1.0 Georgia 12/3/22 12/5/22 missing missing missing missing missing missing 165972 1099 lv missing lv missing 12/6/22 08:58 https://www.thetrafalgargroup.org/news/ga-sen-ro-1205/ missing REP 9552 2022 U.S. Senate 2 Class III 12/6/22 runoff false false missing DEM Warnock 19086 Raphael Warnock 51.1
senate_polls[1, :pollster]
"Trafalgar Group"

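Rows, columns, and single cells all use the same bracket syntax. The sketch below runs the common forms on a made-up two-column table (the values are illustrative only) and assumes DataFrames is installed.

```julia
using DataFrames

# Made-up table for illustration.
toy = DataFrame(pollster = ["A", "B", "C"], pct = [51.1, 47.4, 50.0])

toy.pct                      # the stored column vector (no copy)
toy[!, :pct]                 # same vector as toy.pct
toy[:, :pct]                 # a copy of the column
toy[2, :]                    # second row as a DataFrameRow
toy[2, :pct]                 # single cell: 47.4
toy[1:2, [:pollster, :pct]]  # first two rows, selected columns
```
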
Filtering Data

Filtering in DataFrames.jl is a little annoying: every condition is passed as a function.

is_az_poll(state) = state == "Arizona"
is_az_poll("Arizona")
true

To use this to filter our data:

filter(:state => is_az_poll, senate_polls)

663×48 DataFrame
638 rows omitted
Row poll_id pollster_id pollster sponsor_ids sponsors display_name pollster_rating_id pollster_rating_name numeric_grade pollscore methodology transparency_score state start_date end_date sponsor_candidate_id sponsor_candidate sponsor_candidate_party endorsed_candidate_id endorsed_candidate_name endorsed_candidate_party question_id sample_size population subpopulation population_full tracking created_at notes url source internal partisan race_id cycle office_type seat_number seat_name election_date stage nationwide_batch ranked_choice_reallocated ranked_choice_round party answer candidate_id candidate_name pct
Int64 Int64 String String? String? String Int64 String Float64? Float64? String? Float64? String15 String15 String15 Int64? String31? String3? Missing Missing Missing Int64 Int64? String3? Missing String3? Bool? String15 String? String? String15? Bool? String7? Int64 Int64 String15 Int64 String15 String15? String15 Bool Bool Int64? String3 String31 Int64 String31 Float64
1 81629 1250 Trafalgar Group Trafalgar Group 338 Trafalgar Group 0.7 0.5 IVR/Live Phone/Text/Online Panel/Email 1.0 Arizona 11/5/22 11/7/22 missing missing missing missing missing missing 165305 1094 lv missing lv missing 11/7/22 22:26 https://www.thetrafalgargroup.org/wp-content/uploads/2022/11/AZ-Gen-Poll-Report-1107.pdf missing REP 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing DEM Kelly 20633 Mark Kelly 46.7
2 81629 1250 Trafalgar Group Trafalgar Group 338 Trafalgar Group 0.7 0.5 IVR/Live Phone/Text/Online Panel/Email 1.0 Arizona 11/5/22 11/7/22 missing missing missing missing missing missing 165305 1094 lv missing lv missing 11/7/22 22:26 https://www.thetrafalgargroup.org/wp-content/uploads/2022/11/AZ-Gen-Poll-Report-1107.pdf missing REP 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing REP Masters 28668 Blake Masters 48.2
3 81629 1250 Trafalgar Group Trafalgar Group 338 Trafalgar Group 0.7 0.5 IVR/Live Phone/Text/Online Panel/Email 1.0 Arizona 11/5/22 11/7/22 missing missing missing missing missing missing 165305 1094 lv missing lv missing 11/7/22 22:26 https://www.thetrafalgargroup.org/wp-content/uploads/2022/11/AZ-Gen-Poll-Report-1107.pdf missing REP 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing LIB Victor 30075 Marc Victor 1.3
4 81581 1478 Research Co. Research Co. 449 Research Co. 2.5 -0.5 Online Panel 6.0 Arizona 11/4/22 11/6/22 missing missing missing missing missing missing 165128 450 lv missing lv missing 11/7/22 11:15 https://researchco.ca/2022/11/07/2022-midterm-uspoli/ missing missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing DEM Kelly 20633 Mark Kelly 51.0
5 81581 1478 Research Co. Research Co. 449 Research Co. 2.5 -0.5 Online Panel 6.0 Arizona 11/4/22 11/6/22 missing missing missing missing missing missing 165128 450 lv missing lv missing 11/7/22 11:15 https://researchco.ca/2022/11/07/2022-midterm-uspoli/ missing missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing REP Masters 28668 Blake Masters 47.0
6 81581 1478 Research Co. Research Co. 449 Research Co. 2.5 -0.5 Online Panel 6.0 Arizona 11/4/22 11/6/22 missing missing missing missing missing missing 165128 450 lv missing lv missing 11/7/22 11:15 https://researchco.ca/2022/11/07/2022-midterm-uspoli/ missing missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing LIB Victor 30075 Marc Victor 2.0
7 81596 1308 Data Orbital Data Orbital 73 Data Orbital 2.9 -0.9 Live Phone/Text-to-Web 10.0 Arizona 11/4/22 11/6/22 missing missing missing missing missing missing 165192 550 lv missing lv missing 11/7/22 12:12 https://dataorbital.com/final-az-general-election-polls-republicans-in-a-strong-position-going-into-election-day-2022/ missing missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing DEM Kelly 20633 Mark Kelly 48.2
8 81596 1308 Data Orbital Data Orbital 73 Data Orbital 2.9 -0.9 Live Phone/Text-to-Web 10.0 Arizona 11/4/22 11/6/22 missing missing missing missing missing missing 165192 550 lv missing lv missing 11/7/22 12:12 https://dataorbital.com/final-az-general-election-polls-republicans-in-a-strong-position-going-into-election-day-2022/ missing missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing REP Masters 28668 Blake Masters 46.5
9 81596 1308 Data Orbital Data Orbital 73 Data Orbital 2.9 -0.9 Live Phone/Text-to-Web 10.0 Arizona 11/4/22 11/6/22 missing missing missing missing missing missing 165192 550 lv missing lv missing 11/7/22 12:12 https://dataorbital.com/final-az-general-election-polls-republicans-in-a-strong-position-going-into-election-day-2022/ missing missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing LIB Victor 30075 Marc Victor 1.8
10 81618 1515 Data for Progress Data for Progress 522 Data for Progress 2.7 -1.2 Live Phone/Online Panel/Text-to-Web/Mail-to-Web 6.0 Arizona 11/2/22 11/6/22 missing missing missing missing missing missing 165271 1359 lv missing lv missing 11/7/22 18:26 https://www.filesforprogress.org/datasets/2022/11/dfp_az_final_midterm_tabs.pdf false missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing DEM Kelly 20633 Mark Kelly 49.0
11 81618 1515 Data for Progress Data for Progress 522 Data for Progress 2.7 -1.2 Live Phone/Online Panel/Text-to-Web/Mail-to-Web 6.0 Arizona 11/2/22 11/6/22 missing missing missing missing missing missing 165271 1359 lv missing lv missing 11/7/22 18:26 https://www.filesforprogress.org/datasets/2022/11/dfp_az_final_midterm_tabs.pdf false missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing REP Masters 28668 Blake Masters 50.0
12 81618 1515 Data for Progress Data for Progress 522 Data for Progress 2.7 -1.2 Live Phone/Online Panel/Text-to-Web/Mail-to-Web 6.0 Arizona 11/2/22 11/6/22 missing missing missing missing missing missing 165271 1359 lv missing lv missing 11/7/22 18:26 https://www.filesforprogress.org/datasets/2022/11/dfp_az_final_midterm_tabs.pdf false missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing LIB Victor 30075 Marc Victor 2.0
13 81626 1477 Targoz Market Research 1477 PollSmart MR Targoz Market Research 454 Targoz Market Research 2.0 0.0 Online Panel 7.0 Arizona 11/2/22 11/6/22 missing missing missing missing missing missing 165299 809 rv missing rv missing 11/7/22 22:25 https://www.pollsmartmr.com/latest-polls-1/arizona-poll-close-races-in-arizona missing missing 8919 2022 U.S. Senate 2 Class III 11/8/22 general false false missing DEM Kelly 20633 Mark Kelly 46.0
652 53708 1293 OH Predictive Insights / MBQF 1081 KNXV-TV OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 missing missing Arizona 4/10/18 4/11/18 missing missing missing missing missing 86190 600 lv missing lv missing 8/22/18 00:25 http://createsend.com/t/i-E1269679105F7FA92540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Sinema 12409 Kyrsten Sinema 50.0
653 53708 1293 OH Predictive Insights / MBQF 1081 KNXV-TV OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 missing missing Arizona 4/10/18 4/11/18 missing missing missing missing missing 86190 600 lv missing lv missing 8/22/18 00:25 http://createsend.com/t/i-E1269679105F7FA92540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Ward 12411 Kelli Ward 40.0
654 53708 1293 OH Predictive Insights / MBQF 1081 KNXV-TV OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 missing missing Arizona 4/10/18 4/11/18 missing missing missing missing missing 86191 600 lv missing lv missing 8/22/18 00:25 http://createsend.com/t/i-E1269679105F7FA92540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Sinema 12409 Kyrsten Sinema 59.0
655 53708 1293 OH Predictive Insights / MBQF 1081 KNXV-TV OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 missing missing Arizona 4/10/18 4/11/18 missing missing missing missing missing 86191 600 lv missing lv missing 8/22/18 00:25 http://createsend.com/t/i-E1269679105F7FA92540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Arpaio 12412 Joe Arpaio 33.0
656 53709 383 PPP 946 Protect Our Care Public Policy Polling 263 Public Policy Polling 1.4 0.0 missing missing Arizona 3/15/18 3/16/18 missing missing missing missing missing 86192 547 v missing v missing 8/22/18 00:25 http://www.protectourcare.org/wp-content/uploads/2018/03/PPP-Poll-AZ-ACA-Memo-and-Results-March-21.pdf 538 missing DEM 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Sinema 12409 Kyrsten Sinema 46.0
657 53709 383 PPP 946 Protect Our Care Public Policy Polling 263 Public Policy Polling 1.4 0.0 missing missing Arizona 3/15/18 3/16/18 missing missing missing missing missing 86192 547 v missing v missing 8/22/18 00:25 http://www.protectourcare.org/wp-content/uploads/2018/03/PPP-Poll-AZ-ACA-Memo-and-Results-March-21.pdf 538 missing DEM 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP McSally 12410 Martha McSally 41.0
658 53710 1293 OH Predictive Insights / MBQF OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 IVR missing Arizona 11/9/17 11/9/17 missing missing missing missing missing 86193 600 lv missing lv missing 8/22/18 00:26 http://createsend.com/t/i-2DEC10E907A50F722540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Sinema 12409 Kyrsten Sinema 46.0
659 53710 1293 OH Predictive Insights / MBQF OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 IVR missing Arizona 11/9/17 11/9/17 missing missing missing missing missing 86193 600 lv missing lv missing 8/22/18 00:26 http://createsend.com/t/i-2DEC10E907A50F722540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP McSally 12410 Martha McSally 45.0
660 53710 1293 OH Predictive Insights / MBQF OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 IVR missing Arizona 11/9/17 11/9/17 missing missing missing missing missing 86194 600 lv missing lv missing 8/22/18 00:26 http://createsend.com/t/i-2DEC10E907A50F722540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Sinema 12409 Kyrsten Sinema 46.0
661 53710 1293 OH Predictive Insights / MBQF OH Predictive Insights 235 Noble Predictive Insights 2.4 -0.4 IVR missing Arizona 11/9/17 11/9/17 missing missing missing missing missing 86194 600 lv missing lv missing 8/22/18 00:26 http://createsend.com/t/i-2DEC10E907A50F722540EF23F30FEDED 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP Ward 12411 Kelli Ward 43.0
662 73319 1368 Revily Revily 284 Wick 1.4 0.1 IVR/Online Panel missing Arizona 10/28/17 10/31/17 missing missing missing missing missing 137448 850 lv missing lv missing 11/24/20 13:16 https://web.archive.org/web/20171107021906/https://structurecms-staging-psyclone.netdna-ssl.com/client_assets/kelliward/media/attachments/59fb/275a/6970/2d1b/c8cd/0300/59fb275a69702d1bc8cd0300.pdf?1509631834 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing DEM Sinema 12409 Kyrsten Sinema 33.26
663 73319 1368 Revily Revily 284 Wick 1.4 0.1 IVR/Online Panel missing Arizona 10/28/17 10/31/17 missing missing missing missing missing 137448 850 lv missing lv missing 11/24/20 13:16 https://web.archive.org/web/20171107021906/https://structurecms-staging-psyclone.netdna-ssl.com/client_assets/kelliward/media/attachments/59fb/275a/6970/2d1b/c8cd/0300/59fb275a69702d1bc8cd0300.pdf?1509631834 538 missing missing 96 2018 U.S. Senate 0 Class I 11/6/18 general false false missing REP McSally 12410 Martha McSally 29.25
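
The predicate does not need a name: it can be written inline, either for one column or for a whole row. A minimal sketch on a made-up table, assuming DataFrames is installed:

```julia
using DataFrames

# Made-up table for illustration.
toy = DataFrame(state = ["Arizona", "Georgia", "Arizona"], pct = [46.7, 51.1, 48.2])

# Column => predicate form, using a partially applied comparison.
az_1 = filter(:state => ==("Arizona"), toy)

# Row-wise form: the predicate receives each row and can use several columns.
az_2 = filter(row -> row.state == "Arizona" && row.pct > 47, toy)

nrow(az_1)  # 2
nrow(az_2)  # 1
```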

Filtering Missing Data

Let’s pass the predicate to filter inline and keep the rows where endorsed_candidate_party is “DEM”:

filter(:endorsed_candidate_party => ==("DEM"), senate_polls)
LoadError: TypeError: non-boolean (Missing) used in boolean context

Solution: subset

subset(
    senate_polls, 
    :endorsed_candidate_party => ByRow(==("DEM")); 
    skipmissing=true
)
0×48 DataFrame
Row poll_id pollster_id pollster sponsor_ids sponsors display_name pollster_rating_id pollster_rating_name numeric_grade pollscore methodology transparency_score state start_date end_date sponsor_candidate_id sponsor_candidate sponsor_candidate_party endorsed_candidate_id endorsed_candidate_name endorsed_candidate_party question_id sample_size population subpopulation population_full tracking created_at notes url source internal partisan race_id cycle office_type seat_number seat_name election_date stage nationwide_batch ranked_choice_reallocated ranked_choice_round party answer candidate_id candidate_name pct
Int64 Int64 String String? String? String Int64 String Float64? Float64? String? Float64? String15 String15 String15 Int64? String31? String3? Missing Missing Missing Int64 Int64? String3? Missing String3? Bool? String15 String? String? String15? Bool? String7? Int64 Int64 String15 Int64 String15 String15? String15 Bool Bool Int64? String3 String31 Int64 String31 Float64
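
The error arises because a comparison involving missing returns missing rather than true or false. In the polling data this column has type Missing, so no row can match and the result is empty. The sketch below uses a made-up table where some values are present, assuming DataFrames is installed.

```julia
using DataFrames

# Comparisons with `missing` propagate it instead of returning a Bool.
missing == "DEM"                   # missing
isequal(missing, "DEM")            # false
coalesce(missing == "DEM", false)  # false

# Made-up table for illustration.
toy = DataFrame(party = ["DEM", missing, "REP"], pct = [51.0, 2.0, 47.0])

# Option 1: subset drops rows where the condition evaluates to missing.
dem_1 = subset(toy, :party => ByRow(==("DEM")); skipmissing = true)

# Option 2: make the predicate itself return a Bool for missing values.
dem_2 = filter(:party => p -> isequal(p, "DEM"), toy)

nrow(dem_1)  # 1
nrow(dem_2)  # 1
```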

Selecting Columns

Passing column names as strings works even when the names are irregular (for example, when they contain spaces):

select(senate_polls, "pollster", "state", "cycle", "candidate_name", "pct")
6658×5 DataFrame
6633 rows omitted
 Row │ pollster                 state          cycle  candidate_name          pct
     │ String                   String15       Int64  String31                Float64
─────┼───────────────────────────────────────────────────────────────────────────────
   1 │ Trafalgar Group          Georgia         2022  Raphael Warnock            51.1
   2 │ Trafalgar Group          Georgia         2022  Herschel Junior Walker     47.4
   3 │ Data for Progress        Georgia         2022  Raphael Warnock            51.0
   4 │ Data for Progress        Georgia         2022  Herschel Junior Walker     49.0
   5 │ InsiderAdvantage         Georgia         2022  Raphael Warnock            50.5
   6 │ InsiderAdvantage         Georgia         2022  Herschel Junior Walker     48.4
   7 │ Mitchell                 Georgia         2022  Raphael Warnock            50.0
   8 │ Mitchell                 Georgia         2022  Herschel Junior Walker     45.0
   9 │ Landmark Communications  Georgia         2022  Raphael Warnock            52.2
  10 │ Landmark Communications  Georgia         2022  Herschel Junior Walker     46.7
  11 │ Patriot Polling          Georgia         2022  Raphael Warnock            48.7
  12 │ Patriot Polling          Georgia         2022  Herschel Junior Walker     47.0
  13 │ Emerson                  Georgia         2022  Raphael Warnock            50.8
  ⋮  │ ⋮
6647 │ Fabrizio                 Missouri        2018  Claire McCaskill           42.0
6648 │ Fabrizio                 Missouri        2018  Josh Hawley                46.0
6649 │ Remington                Missouri        2018  Claire McCaskill           44.0
6650 │ Remington                Missouri        2018  Josh Hawley                50.0
6651 │ PPP                      Nevada          2018  Jacky Rosen                42.0
6652 │ PPP                      Nevada          2018  Dean Heller                41.0
6653 │ MassINC Polling Group    Massachusetts   2018  Elizabeth Warren           60.0
6654 │ MassINC Polling Group    Massachusetts   2018  Geoff Diehl                29.0
6655 │ Texas Lyceum             Texas           2018  Beto O'Rourke              30.0
6656 │ Texas Lyceum             Texas           2018  Ted Cruz                   30.0
6657 │ Harper Polling           West Virginia   2018  Joe Manchin, III           57.0
6658 │ Harper Polling           West Virginia   2018  Patrick Morrisey           35.0

Selecting Columns with Regex

Easy in Julia:

select(senate_polls, r"^candidate")
6658×2 DataFrame
6633 rows omitted
Row candidate_id candidate_name
Int64 String31
1 19086 Raphael Warnock
2 19088 Herschel Junior Walker
3 19086 Raphael Warnock
4 19088 Herschel Junior Walker
5 19086 Raphael Warnock
6 19088 Herschel Junior Walker
7 19086 Raphael Warnock
8 19088 Herschel Junior Walker
9 19086 Raphael Warnock
10 19088 Herschel Junior Walker
11 19086 Raphael Warnock
12 19088 Herschel Junior Walker
13 19086 Raphael Warnock
6647 11410 Claire McCaskill
6648 11411 Josh Hawley
6649 11410 Claire McCaskill
6650 11411 Josh Hawley
6651 11150 Jacky Rosen
6652 11151 Dean Heller
6653 12480 Elizabeth Warren
6654 12481 Geoff Diehl
6655 11125 Beto O'Rourke
6656 11126 Ted Cruz
6657 11132 Joe Manchin, III
6658 11133 Patrick Morrisey
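
A minimal sketch of how regex column selection behaves, using a small hypothetical `toy` table (not the real senate_polls data). A regex matches anywhere in a column name unless it is anchored with ^, and * in a regex means "zero or more of the preceding character" rather than a glob wildcard, so a plain prefix pattern is enough:

```julia
using DataFrames

# Hypothetical stand-in for senate_polls with similar column names
toy = DataFrame(
    candidate_id = [1, 2],
    candidate_name = ["A", "B"],
    sponsor_candidate = ["x", "y"],
    pct = [51.1, 47.4],
)

# Anchored: only columns whose names START with "candidate"
anchored = names(select(toy, r"^candidate"))

# Unanchored: any column whose name CONTAINS "candidate"
unanchored = names(select(toy, r"candidate"))

# Not(...) inverts the selection
inverted = names(select(toy, Not(r"candidate")))
```

Here `anchored` holds the two candidate_* columns, `unanchored` also picks up sponsor_candidate, and `inverted` keeps only pct.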

Aggregating by Group

using Statistics

grouped_senate_cycle = groupby(senate_polls, :cycle)
combine(grouped_senate_cycle, :sample_size => (x -> mean(x)))
3×2 DataFrame
Row cycle sample_size_function
Int64 Float64?
1 2018 missing
2 2020 missing
3 2022 854.283

Missing Data Basic Fix

grouped_senate_cycle = groupby(senate_polls, :cycle)
combine(grouped_senate_cycle, :sample_size => (x -> mean(skipmissing(x))))
3×2 DataFrame
Row cycle sample_size_function
Int64 Float64
1 2018 849.18
2 2020 830.481
3 2022 854.283
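
Why the first attempt returned missing can be seen on a plain vector. This is a small sketch with made-up numbers (the `sizes` vector is hypothetical), showing that missing propagates through mean unless it is skipped:

```julia
using Statistics

# Hypothetical sample sizes with one missing entry
sizes = [800, missing, 1000]

# Any arithmetic involving missing yields missing
plain_mean = mean(sizes)

# skipmissing returns a lazy iterator over the non-missing values
clean_mean = mean(skipmissing(sizes))

# Counting the missing entries is often a useful first check
n_missing = count(ismissing, sizes)
```

`plain_mean` is missing, while `clean_mean` is the mean of the two observed values.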

Alternative for Missing Data

combine(
    groupby(
        dropmissing(senate_polls, :sample_size), 
        :cycle
    ), 
    :sample_size => mean
)
3×2 DataFrame
Row cycle sample_size_mean
Int64 Float64
1 2018 849.18
2 2020 830.481
3 2022 854.283

Function Composition Operator

  • We can “compose” two functions with the function composition operator ∘, typed as \circ<TAB>.
  • In math, \((f \circ g)(x) = f(g(x))\)
  • So in Julia, (mean ∘ skipmissing)(x) is mean(skipmissing(x))
combine(
    groupby(
        senate_polls, 
        :cycle
    ), 
    :sample_size => mean ∘ skipmissing
)

3×2 DataFrame
Row cycle sample_size_mean_skipmissing
Int64 Float64
1 2018 849.18
2 2020 830.481
3 2022 854.283
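
Outside of DataFrames, composition works on any functions. The sketch below uses hypothetical names (`mean_nonmissing`, `root_of_abs`, `sizes`) to show that the rightmost function is applied first and that the composed function is equivalent to the anonymous-function version:

```julia
using Statistics

# Composed function: skipmissing runs first, then mean
mean_nonmissing = mean ∘ skipmissing

sizes = [800, missing, 1000]
composed = mean_nonmissing(sizes)

# The same thing written as an anonymous function
anonymous = (x -> mean(skipmissing(x)))(sizes)

# Order matters: abs is applied before sqrt
root_of_abs = sqrt ∘ abs
root = root_of_abs(-9.0)
```

Because ∘ binds more tightly than =>, `:sample_size => mean ∘ skipmissing` pairs the column with the composed function without extra parentheses.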

DataFramesMeta.jl

Macros

DataFramesMeta.jl provides “macros”, which offer a more concise syntax for the DataFrames.jl functions. For example, whereas before we used

select(senate_polls, :candidate_name)

We can also use

using DataFramesMeta

@select senate_polls :candidate_name
6658×1 DataFrame
6633 rows omitted
Row candidate_name
String31
1 Raphael Warnock
2 Herschel Junior Walker
3 Raphael Warnock
4 Herschel Junior Walker
5 Raphael Warnock
6 Herschel Junior Walker
7 Raphael Warnock
8 Herschel Junior Walker
9 Raphael Warnock
10 Herschel Junior Walker
11 Raphael Warnock
12 Herschel Junior Walker
13 Raphael Warnock
6647 Claire McCaskill
6648 Josh Hawley
6649 Claire McCaskill
6650 Josh Hawley
6651 Jacky Rosen
6652 Dean Heller
6653 Elizabeth Warren
6654 Geoff Diehl
6655 Beto O'Rourke
6656 Ted Cruz
6657 Joe Manchin, III
6658 Patrick Morrisey

Macro Types

For most of the DataFramesMeta.jl macros we have four variants:

  • @macro: non-vectorized (expressions see whole columns)
  • @rmacro: vectorized (expressions are applied row by row)
  • @macro!: non-vectorized in-place
  • @rmacro!: vectorized in-place

What is Vectorized?

This works

exp(3.)
20.085536923187668

This doesn’t

exp([3.,2.])
MethodError: no method matching exp(::Vector{Float64})
Closest candidates are:
  exp(::Union{Float16, Float32, Float64}) at special/exp.jl:326
  exp(::StridedMatrix{var"#s886"} where var"#s886"<:Union{Float32, Float64, ComplexF32, ComplexF64}) at /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/dense.jl:569
  exp(::StridedMatrix{var"#s886"} where var"#s886"<:Union{Integer, Complex{<:Integer}}) at /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/dense.jl:570
  ...

This is a vectorized operation

exp.([3.,2.])
2-element Vector{Float64}:
 20.085536923187668
  7.38905609893065
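
The dot syntax works for user-defined functions and operators too. A small sketch, where `leads` and `pcts` are hypothetical names made up for illustration:

```julia
# A scalar-only function: calling leads(pcts) on a vector would throw a MethodError
leads(pct) = pct > 50 ? "ahead" : "behind"

pcts = [51.1, 47.4, 50.0]

# A trailing dot on the function name applies it element by element
labels = leads.(pcts)

# Operators take a leading dot
units = pcts ./ 100

# @. adds a dot to every call and operator in the expression
shifted = @. (pcts - 50) / 100
```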

Compute Columns with @select

We can use the select macro (here its vectorized variant, @rselect) to create new columns:

@rselect senate_polls :poll_id :candidate_name :pct :pct_unit = :pct / 100
6658×4 DataFrame
6633 rows omitted
Row poll_id candidate_name pct pct_unit
Int64 String31 Float64 Float64
1 81762 Raphael Warnock 51.1 0.511
2 81762 Herschel Junior Walker 47.4 0.474
3 81760 Raphael Warnock 51.0 0.51
4 81760 Herschel Junior Walker 49.0 0.49
5 81759 Raphael Warnock 50.5 0.505
6 81759 Herschel Junior Walker 48.4 0.484
7 81761 Raphael Warnock 50.0 0.5
8 81761 Herschel Junior Walker 45.0 0.45
9 81764 Raphael Warnock 52.2 0.522
10 81764 Herschel Junior Walker 46.7 0.467
11 81754 Raphael Warnock 48.7 0.487
12 81754 Herschel Junior Walker 47.0 0.47
13 81734 Raphael Warnock 50.8 0.508
6647 53672 Claire McCaskill 42.0 0.42
6648 53672 Josh Hawley 46.0 0.46
6649 53556 Claire McCaskill 44.0 0.44
6650 53556 Josh Hawley 50.0 0.5
6651 52661 Jacky Rosen 42.0 0.42
6652 52661 Dean Heller 41.0 0.41
6653 53946 Elizabeth Warren 60.0 0.6
6654 53946 Geoff Diehl 29.0 0.29
6655 52630 Beto O'Rourke 30.0 0.3
6656 52630 Ted Cruz 30.0 0.3
6657 52643 Joe Manchin, III 57.0 0.57
6658 52643 Patrick Morrisey 35.0 0.35
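
To connect the macro variants to broadcasting, here is a sketch on a hypothetical `polls` table (not the real data). It assumes the parenthesized macro-call form, and shows that the vectorized @rselect gives the same result as the plain @select with an explicit dot:

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data standing in for senate_polls
polls = DataFrame(poll_id = [1, 1, 2], pct = [51.1, 47.4, 50.0])

# Row-wise variant: :pct is a single number inside the expression
rowwise = @rselect(polls, :poll_id, :pct_unit = :pct / 100)

# Column-wise variant: :pct is the whole column, so we broadcast ourselves
colwise = @select(polls, :poll_id, :pct_unit = :pct ./ 100)

same = isequal(rowwise, colwise)
```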

Compute Columns with @transform

@transform does no selection and just adds the column to the data. We can append ! to the macro name to do the computation in place, modifying the original data frame rather than returning a copy.

@rtransform! senate_polls :pct_unit = :pct / 100
@select senate_polls :poll_id :candidate_name :pct :pct_unit
6658×4 DataFrame
6633 rows omitted
Row poll_id candidate_name pct pct_unit
Int64 String31 Float64 Float64
1 81762 Raphael Warnock 51.1 0.511
2 81762 Herschel Junior Walker 47.4 0.474
3 81760 Raphael Warnock 51.0 0.51
4 81760 Herschel Junior Walker 49.0 0.49
5 81759 Raphael Warnock 50.5 0.505
6 81759 Herschel Junior Walker 48.4 0.484
7 81761 Raphael Warnock 50.0 0.5
8 81761 Herschel Junior Walker 45.0 0.45
9 81764 Raphael Warnock 52.2 0.522
10 81764 Herschel Junior Walker 46.7 0.467
11 81754 Raphael Warnock 48.7 0.487
12 81754 Herschel Junior Walker 47.0 0.47
13 81734 Raphael Warnock 50.8 0.508
6647 53672 Claire McCaskill 42.0 0.42
6648 53672 Josh Hawley 46.0 0.46
6649 53556 Claire McCaskill 44.0 0.44
6650 53556 Josh Hawley 50.0 0.5
6651 52661 Jacky Rosen 42.0 0.42
6652 52661 Dean Heller 41.0 0.41
6653 53946 Elizabeth Warren 60.0 0.6
6654 53946 Geoff Diehl 29.0 0.29
6655 52630 Beto O'Rourke 30.0 0.3
6656 52630 Ted Cruz 30.0 0.3
6657 52643 Joe Manchin, III 57.0 0.57
6658 52643 Patrick Morrisey 35.0 0.35
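
The difference between the copying and in-place variants can be checked on a hypothetical `toy` table (made up for illustration), here written with the parenthesized macro-call form:

```julia
using DataFrames, DataFramesMeta

toy = DataFrame(pct = [51.1, 47.4])

# Without !: a new data frame is returned and toy is left untouched
with_unit = @rtransform(toy, :pct_unit = :pct / 100)
names_before = names(toy)

# With !: toy itself gains the new column
@rtransform!(toy, :pct_unit = :pct / 100)
names_after = names(toy)
```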

Multiple Column Creation

@rtransform! senate_polls begin
    :is_dem = :party == "DEM"
    :is_rep = :party == "REP"
end
@select senate_polls :party :is_dem :is_rep
6658×3 DataFrame
6633 rows omitted
Row party is_dem is_rep
String3 Bool Bool
1 DEM true false
2 REP false true
3 DEM true false
4 REP false true
5 DEM true false
6 REP false true
7 DEM true false
8 REP false true
9 DEM true false
10 REP false true
11 DEM true false
12 REP false true
13 DEM true false
6647 DEM true false
6648 REP false true
6649 DEM true false
6650 REP false true
6651 DEM true false
6652 REP false true
6653 DEM true false
6654 REP false true
6655 DEM true false
6656 REP false true
6657 DEM true false
6658 REP false true

Chaining

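Remember that we had %>% in dplyr? DataFramesMeta.jl has a similar macro called @chain (re-exported from Chain.jl), which passes the result of each line as the first argument of the next: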

@chain senate_polls begin
    groupby(:cycle)
    @combine :mean_ss = (mean ∘ skipmissing)(:sample_size)
    @rsubset :mean_ss > 850
    @rtransform :mean_ss_sq = :mean_ss ^ 2
end
1×3 DataFrame
Row cycle mean_ss mean_ss_sq
Int64 Float64 Float64
1 2022 854.283 7.298e5
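
For comparison, a pipeline of this shape can be written step by step with plain DataFrames.jl functions. This is only a sketch on a hypothetical `polls` table with made-up numbers, so the values differ from the real output:

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for senate_polls
polls = DataFrame(
    cycle = [2018, 2018, 2022, 2022],
    sample_size = [800, missing, 900, 1000],
)

# Step 1: group and aggregate, skipping missing values
by_cycle = combine(
    groupby(polls, :cycle),
    :sample_size => (mean ∘ skipmissing) => :mean_ss,
)

# Step 2: keep rows where the mean exceeds 850
large = subset(by_cycle, :mean_ss => ByRow(>(850)))

# Step 3: add the squared mean as a new column
result = transform(large, :mean_ss => ByRow(x -> x^2) => :mean_ss_sq)
```

Each intermediate name here (`by_cycle`, `large`) is exactly what @chain lets us avoid writing.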

Other Macros

  • The other functions we’ve worked with also have macros in DataFramesMeta.jl
    • E.g. @subset for filtering rows
  • Using these macros is a little more intuitive and makes code more readable

Conclusion

  • Julia is powerfully fast
    • We haven’t tested anything for speed here, so you’ll have to trust me
  • It’s also far less developed than its counterparts, so you have to be a little more self-sufficient
  • Don’t focus on it as much as your R and Python, but it’s here and it’s valuable!