The items below are the development work that still needs to be completed.
| Company Info Table | Flask process to store data in Company Info |
| Topics Table | Flask process to store data in Topics |
| Graph Input | Flask to drive Matplotlib graphs |
| Graph Average Scores | Graph to plot average scores for one company over periods selected |
| Graph Weighted Scores | Graph to plot weighted scores for one company over periods selected |
| Graph number of classifications | Graph to plot number of occurrences per classification for one company over periods selected |
| Graph to plot average scores for sector | Graph to plot the average scores for each company in a selected sector for a selected period |
| Graph to plot weighted average scores for sector | Graph to plot the weighted average scores for each company in a selected sector for a selected period |
| Sentiment scoring | Program to take a conference call, identify each paragraph where a keyword appears, store the paragraph keyword and a sentiment score for that paragraph |
| Sentiment Detail | Program to take a scored conference call and summarize each classification by average score, weighted score and number of times keyword is mentioned |
| Load conference calls | Flask to upload CC transcripts and execute scoring module |
| Validate Bank data | Need to review scores and check to make sure they are accurate |
| Load Topics | Load all keywords, classifications, sector and importance into table |
| Load Companies | Batch program to load companies to Company Info table custom key |
| Load existing scores | Need a loader program to load existing scored data into the Sentiment Detail table. The program needs to add Ticker |
| Linux | https://cloud.google.com/python/docs/getting-started/getting-started-on-compute-engine |
| Prepare for Python |
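The sentiment scoring item in the table above (find each paragraph where a keyword appears, then store the paragraph, keyword and score) could be sketched roughly as follows. This is only an outline under assumptions: paragraphs are taken to be separated by blank lines, `find_keyword_paragraphs` is a hypothetical name, and `score_fn` stands in for whatever sentiment model is eventually used.

```python
import re
from typing import Callable, Dict, List


def find_keyword_paragraphs(
    transcript: str,
    keywords: List[str],
    score_fn: Callable[[str], float],
) -> List[Dict[str, object]]:
    """Return one detail record per (paragraph, keyword) match."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", transcript) if p.strip()]
    records: List[Dict[str, object]] = []
    for paragraph in paragraphs:
        for keyword in keywords:
            # Case-insensitive match anchored at the start of a word, so the
            # keyword "Revenue" also matches "revenues" in the text.
            pattern = r"\b" + re.escape(keyword)
            if re.search(pattern, paragraph, flags=re.IGNORECASE):
                records.append(
                    {
                        "keyword": keyword,
                        "paragraph": paragraph,
                        "score": score_fn(paragraph),
                    }
                )
    return records
```

A paragraph that mentions two keywords produces two detail records, which matches the idea of storing one row per paragraph keyword.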
Periodic update of training models
Categories
Presentation
Need to have the 6 banks loaded.
Here are the tickers of large US banks: JPM US, C US, BAC US, GS US, MS US, WFC US
Store them on Google cloud storage bucket
Using the naming convention
TYPE - CC, YAHOO TICKER - JPM, PERIOD - Q123, DATE OF CALL 051023
CC_JPM_Q123_051023
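A small helper can build and check names that follow this convention. The sketch below is an assumption about how strict the format should be (upper-case type and ticker, period written as Q plus quarter plus two-digit year, six-digit call date); `build_name` and `parse_name` are hypothetical names, and the date is kept as a plain string rather than interpreted.

```python
import re
from typing import Dict

# TYPE_TICKER_PERIOD_DATE, e.g. CC_JPM_Q123_051023
NAME_PATTERN = re.compile(
    r"^(?P<doc_type>[A-Z]+)"
    r"_(?P<ticker>[A-Z.\-]+)"
    r"_(?P<period>Q[1-4]\d{2})"
    r"_(?P<call_date>\d{6})$"
)


def parse_name(name: str) -> Dict[str, str]:
    """Split a stored object name into its four parts."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Name does not follow the convention: {name!r}")
    return match.groupdict()


def build_name(doc_type: str, ticker: str, period: str, call_date: str) -> str:
    """Join the four parts and verify the result is well formed."""
    name = f"{doc_type}_{ticker}_{period}_{call_date}"
    parse_name(name)  # raises ValueError if any part is malformed
    return name
```

Names that do not fit the pattern, such as ones containing a space in the ticker, are rejected so they can be fixed before upload.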
NEED 48 CONFERENCE CALLS DONE: 8 PERIODS X 6 BANKS
GRAPHS FOR EACH COMPANY
SCORE FOR 8 PERIODS
WEIGHTED SCORE FOR 8 PERIODS
NUMBER OF TIMES EACH CATEGORY OCCURS IN CC FOR 8 PERIODS
SECTOR COMPARISON
AVERAGE SCORE OF ALL 6 COMPANIES FOR 8 PERIODS
WEIGHTED AVERAGE SCORE FOR EACH COMPANY FOR 8 PERIODS
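As a rough sketch of the per-company score graph, the code below orders Q123-style period labels chronologically and draws a line chart with Matplotlib. The function names, the input shape (a dict of period to score) and the output path are assumptions for illustration; the real data would come from the summary table.

```python
from typing import Dict, List, Tuple


def period_sort_key(period: str) -> Tuple[int, int]:
    """Sort 'Q123'-style labels by (two-digit year, quarter)."""
    return int(period[2:]), int(period[1])


def company_series(scores: Dict[str, float]) -> Tuple[List[str], List[float]]:
    """Return period labels in chronological order with their scores."""
    periods = sorted(scores, key=period_sort_key)
    return periods, [scores[p] for p in periods]


def plot_company_scores(ticker: str, scores: Dict[str, float], out_path: str) -> None:
    """Save a line chart of one company's score per period."""
    # Imported here so the helpers above work even without Matplotlib installed.
    import matplotlib

    matplotlib.use("Agg")  # headless backend, suitable for a VM with no display
    import matplotlib.pyplot as plt

    periods, values = company_series(scores)
    fig, ax = plt.subplots()
    ax.plot(periods, values, marker="o")
    ax.set_title(f"Average sentiment score: {ticker}")
    ax.set_xlabel("Period")
    ax.set_ylabel("Score")
    fig.savefig(out_path)
    plt.close(fig)
```

The weighted score and category count graphs could reuse the same helpers with a different input dict and title.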
FRONT END
NEED MENU FOR 4 SECTIONS OF CODE
COMPANY INFO - CREATE, READ, UPDATE AND DELETE FOR ALL INFORMATION IN THE COMPANY INFO TABLE
TOPIC INFO - CREATE, READ, UPDATE, DELETE FOR ALL INFORMATION IN THE TOPIC INFO TABLE
INPUT SCREEN TO DRIVE GRAPHS - DESIGN HOW THIS SHOULD BE DRIVEN
COMPANY GRAPH OR SECTOR GRAPH
ONCE DECLARED, FOR COMPANY WE HAVE
TICKER
PERIODS
TYPE OF GRAPH, SCORE, WEIGHTED, NUMBER OF TIMES
FOR SECTOR
SECTOR NAME
PERIODS
AVERAGE SCORES
WEIGHTED SCORE
UPLOAD PROCESS ? WAIT ON THIS
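One possible way to drive the graph input screen is to collect the choices above into a single request object and validate it before any plotting happens. This is only a design sketch: `GraphRequest`, its field names and the graph type labels (`score`, `weighted`, `count`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Graph types offered for each scope, following the options listed above.
COMPANY_GRAPH_TYPES = {"score", "weighted", "count"}
SECTOR_GRAPH_TYPES = {"score", "weighted"}


@dataclass
class GraphRequest:
    scope: str          # "company" or "sector"
    target: str         # ticker for a company graph, sector name for a sector graph
    periods: List[str]  # e.g. ["Q422", "Q123"]
    graph_type: str     # one of the graph types allowed for the scope

    def validate(self) -> None:
        """Raise ValueError if the combination of inputs is not supported."""
        if self.scope == "company":
            allowed = COMPANY_GRAPH_TYPES
        elif self.scope == "sector":
            allowed = SECTOR_GRAPH_TYPES
        else:
            raise ValueError(f"Unknown scope: {self.scope!r}")
        if self.graph_type not in allowed:
            raise ValueError(f"{self.graph_type!r} is not available for {self.scope} graphs")
        if not self.target:
            raise ValueError("A ticker or sector name is required")
        if not self.periods:
            raise ValueError("At least one period is required")
```

A Flask form handler could build a `GraphRequest` from the submitted fields and call `validate()` before querying the summary data.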
DATA LOADS.
LOADER PROGRAM TO PUT SCORES THAT ALREADY ARE IN SPREADSHEETS AND LOAD INTO SENTIMENT DETAIL
NEED ALL DATA LOADED FOR COMPANY INFO, EITHER FRONT END OR LOADER
NEED ALL DATA LOADED FOR TOPICS INFO, EITHER FRONT END OR LOADER
SENTIMENT SUMMARY PROGRAM - NEED TO SCAN A CALL AND CALCULATE ALL FIELDS NEEDED TO DRIVE GRAPHS AND STORE IN SENTIMENT SUMMARY TABLE.
AI / ML
NEED TO HAVE PROGRAMS RUN IN PRODUCTION ON VM
MUST BE ABLE TO STORE DATA IN SENTIMENT DETAIL
NEED TO START BUILDING MODEL SO IT CAN LEARN FROM DATA WE SCORE
Let's take one record in the AHT LN file that has been scored.
CC_AHT LN_Q22022_6_14_2022
From the analysts
We have divided all words into three categories, namely ‘Very Important’, ‘Important’ and ‘Less so important’ (please refer to the attachment and the color coding). The category classification allows the student to ascribe a “weight” to each of the key words. For example, a key word that has been assigned as ‘Very Important’ should have more weight/importance in the sentiment score than a key word that is ‘Less so important’.
Here’s an illustration of the weight for each category.
Very Important -> 2x
Important -> 1x
Less so Important -> 0.5x
Using these weights, the sentiment model score should be more “robust”.
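In code, the three weights above can live in a simple lookup. The sketch below assumes the importance labels arrive as text (for example from the Topics table) and may differ in capitalization; `importance_weight` and `weighted_score` are hypothetical names.

```python
# Multipliers taken from the analysts' illustration above.
IMPORTANCE_WEIGHTS = {
    "very important": 2.0,
    "important": 1.0,
    "less so important": 0.5,
}


def importance_weight(label: str) -> float:
    """Return the multiplier for an importance label, ignoring case and extra spaces."""
    key = " ".join(label.lower().split())
    try:
        return IMPORTANCE_WEIGHTS[key]
    except KeyError:
        raise ValueError(f"Unknown importance label: {label!r}") from None


def weighted_score(score: float, label: str) -> float:
    """Scale a paragraph's sentiment score by its keyword's importance."""
    return score * importance_weight(label)
```

Raising an error on an unknown label, rather than defaulting to 1x, makes a typo in the Topics table visible instead of silently changing the scores.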
What we have to figure out programmatically is how we want to handle the summary file. It is hard to specifically spec out without putting restrictions on the developers. Usually I give developers freedom to decide which is the best way to handle.
One method is:
If we have a scored paragraph
Key Words/Topics | Sector | Key Word Category | Importance |
Inflation | All | Macro | Important |
Interest rate | All | Macro | Important |
Raw Materials inflation | Industrials | Macro | Less so important |
Volume | All | Sector trend | Less so important |
Revenue | All | Financial metric - All | Very Important |
Earnings | All | Financial metric - All | Very Important |
Earnings per Share / EPS | All | Financial metric - All | Very Important |
So the paragraph
Key Word Category | Keyword | Paragraph | Sentiment Score |
Financial metric - All | Revenue | So let's begin with highlights on Slide 3. Our business continues to perform very well, and experienced accelerated momentum throughout the year, demonstrating the strong levels of demand so clearly present. The strength delivered a record performance driven principally by a 23% increase in North America revenues which led to PBT of $1.8 billion at a 40% increase in earnings per share. | 0.9398140907 |
has a score of about 0.94. Because Revenue is Very Important, its weighted score is 2x that, or about 1.88.
When we do the summary and calculate all the scores, we would use the ratios above to get the weighted scores.
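Written out, with the unrounded score from the detail record, the arithmetic looks like this. The per-category formulas are one reading of "average weighted scores" (the plain mean of the weighted scores) and are an assumption; dividing by the sum of the weights instead would give a normalized weighted mean that stays on the original score scale.

```latex
% s_i: sentiment score of detail record i; w_i: importance weight of its keyword
\tilde{s}_i = w_i \, s_i, \qquad w_i \in \{2,\ 1,\ 0.5\}

% Worked example: Revenue is Very Important, so w = 2
\tilde{s} = 2 \times 0.9398140907 = 1.8796281814

% Per category c with n_c detail records in one call
\bar{s}_c = \frac{1}{n_c} \sum_{i \in c} s_i, \qquad
\bar{s}^{\,w}_c = \frac{1}{n_c} \sum_{i \in c} w_i \, s_i
```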
Yahoo Ticker | WM |
Document type | CC |
Period | Q123 |
Average score Macro | 0.8 |
Number of times Macro mentioned in document | 10 |
Weighted average score of Macro by importance | Calculated from the weighted value of the keywords found |
Average score Sector trend | 0.7 |
Number of times Sector trend mentioned | 12 |
Weighted average score of Sector trend by importance | Calculated from the weighted value of the keywords found |
Average score Financial metric | 0.5 |
Number of times Financial metric mentioned | 11 |
Weighted average score of Financial metric by importance | Calculated from the weighted value of the keywords found |
Average score Regulation | 0.9 |
Number of times Regulation mentioned | 8 |
Weighted average score of Regulation by importance | Calculated from the weighted value of the keywords found |
Total score average | Average of unweighted scores |
Total score weighted | Average of weighted scores |
Read through all detail records for an individual CC, keep counters for each category, and calculate the average score for each category.
Check each keyword for how it should be weighted so we can calculate the weighted score.
Increment the counters as you scan through all the records in the one call.
When all detail records are counted, write the summary record.
The example below is one detail record.
Key Word Category | Keyword | Paragraph | Sentiment Score |
Financial metric - All | Revenue | So let's begin with highlights on Slide 3. Our business continues to perform very well, and experienced accelerated momentum throughout the year, demonstrating the strong levels of demand so clearly present. The strength delivered a record performance driven principally by a 23% increase in North America revenues which led to PBT of $1.8 billion at a 40% increase in earnings per share. | 0.9398140907 |
Key Words/Topics | Sector | Key Word Category | Importance |
Revenue | All | Financial metric - All | Very Important |
Revenue is Very Important, so we weight its score at 2x.
Very Important -> 2x
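The scan described above (one pass over the detail records of a call, with counters per category, then one summary record) might look like the sketch below. The record layout, the `summarize_call` name and the `TOTAL` key are assumptions; the weighted average is taken as the mean of weight times score, which is one reading of "average weighted scores" and may need adjusting.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

IMPORTANCE_WEIGHTS = {"very important": 2.0, "important": 1.0, "less so important": 0.5}


def summarize_call(
    detail_records: Iterable[Mapping[str, object]],
    keyword_importance: Mapping[str, str],
) -> Dict[str, Dict[str, float]]:
    """Summarize one call's detail records by keyword category.

    Each record needs 'category', 'keyword' and 'score'.
    keyword_importance maps a keyword to its importance label.
    """
    counts: Dict[str, int] = defaultdict(int)
    score_sums: Dict[str, float] = defaultdict(float)
    weighted_sums: Dict[str, float] = defaultdict(float)

    for record in detail_records:
        label = keyword_importance[str(record["keyword"])]
        weight = IMPORTANCE_WEIGHTS[" ".join(label.lower().split())]
        category = str(record["category"])
        score = float(record["score"])
        # Increment the counters for this record's category.
        counts[category] += 1
        score_sums[category] += score
        weighted_sums[category] += weight * score

    summary: Dict[str, Dict[str, float]] = {}
    for category, n in counts.items():
        summary[category] = {
            "count": n,
            "average": score_sums[category] / n,
            "weighted_average": weighted_sums[category] / n,
        }

    total = sum(counts.values())
    if total:
        # Totals across every category in the call.
        summary["TOTAL"] = {
            "count": total,
            "average": sum(score_sums.values()) / total,
            "weighted_average": sum(weighted_sums.values()) / total,
        }
    return summary
```

The returned dict holds the count, average and weighted average for each category plus the call totals, which covers the fields listed in the summary layout.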