Python with pandas and NumPy: general statistics of data

First, we would like you to get us some general statistics from the data. I suggest you create a dictionary of lists, with the keys being the column headers of the CSV file (or the columns of the clean DataFrame) and each list holding all of the values under that header. Duplicates should be included here. This is only a suggestion; if you would rather keep using the DataFrame, you are welcome to.
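To make the dictionary-of-lists suggestion concrete, here is a minimal sketch. The four rows and the country names in them are made up purely for illustration; the real values would come from Marvel_Mart_Sales_clean.csv. It also shows one possible way to count transactions per country from such a dictionary without using sum, here with `collections.Counter`.

```python
from collections import Counter

import pandas as pd

# Illustrative rows only (hypothetical values); the real data would be
# loaded with pd.read_csv('Marvel_Mart_Sales_clean.csv').
df = pd.DataFrame({
    "Country": ["Cuba", "Fiji", "Cuba", "Chad"],
    "Units Sold": [10, 5, 7, 3],
})

# One key per column header, one plain list of values per key.
# Duplicates are kept because every row contributes one entry.
columns = df.to_dict(orient="list")

# Each row is one sale transaction, so counting how often a country
# appears gives the number of transactions (no summing of Units Sold).
transactions = Counter(columns["Country"])
top = transactions.most_common(1)
print(top)
```

`Counter.most_common(10)` would give the ten countries with the most transactions, already ordered from most to least.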

  1. Produce the following and write it to a text file called Marvel_Mart_Rankings.txt. Open the file in append mode so that each result is added to the file rather than written over the previous data, and include a newline between each append. Note that you are counting the number of sale transactions per country, not summing the total sales: do not sum up the Units Sold, and do not use sum at all. When writing to the file, please output in a text form such as:

    Countries Most Sale Transactions:
    Country 1: (number of sales transactions)
    Country 2: (number of sales transactions)

    (Answer question) “The country we should build our shipping center in is ______ because ____…”

    (A) We want to know which countries we sell to the most so we can pick a new location to build a shipping center. Rank the top 10 countries we sell to, from most to least, along with the number of sales transactions we’ve had with each country. We have shipping centers in Trinidad and Tobago, Guinea, and Maldives right now. Which country should we build a shipping center in, based on most sales and lack of an existing shipping center? Please justify your reasoning.

    (B) Rank the top 3 years by sales (most profit brought in), from most to least. Use just the years, not the whole dates, and use the Order Date, not the Ship Date. Please list the years and the amount sold. Answer the question “Which year did we sell the most in?”

    (Summing large floats in pandas often displays the result in scientific notation, but we don’t want that. You can turn that off by putting the following line under the import statements at the start of the script: pd.set_option('display.float_format', lambda x: '%.3f' % x))
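For parts (A) and (B), one possible approach that stays with the DataFrame is sketched below. It is not the only way to do it. The helper names (`rank_countries`, `rank_years`, `first_without_center`, `append_section`) are hypothetical, the sample rows are invented for illustration, and the sketch assumes that "most sales" in (B) means the summed Total Profit per order year and that Order Date is in a form `pd.to_datetime` can parse.

```python
import pandas as pd

# Keep pandas from displaying large floats in scientific notation.
pd.set_option("display.float_format", lambda x: "%.3f" % x)

# Countries that already have a shipping center (from the assignment text).
EXISTING_CENTERS = {"Trinidad and Tobago", "Guinea", "Maldives"}


def rank_countries(df, n=10):
    """Count rows (one row = one transaction) per country, most to least."""
    return df["Country"].value_counts().head(n)


def rank_years(df, n=3):
    """Total profit per Order Date year, most to least (assumed meaning of (B))."""
    years = pd.to_datetime(df["Order Date"]).dt.year
    totals = df.groupby(years)["Total Profit"].sum()
    return totals.sort_values(ascending=False).head(n)


def first_without_center(ranking, centers):
    """Return the highest-ranked country that has no shipping center yet."""
    for country in ranking.index:
        if country not in centers:
            return country
    return None


def append_section(path, title, ranking):
    """Append a titled ranking to the file, then a blank line as separator."""
    with open(path, "a") as f:
        f.write(title + "\n")
        for label, value in ranking.items():
            f.write(f"{label}: {value}\n")
        f.write("\n")


# Invented sample rows; the real script would use
# pd.read_csv('Marvel_Mart_Sales_clean.csv') instead.
sample = pd.DataFrame({
    "Country": ["Cuba", "Fiji", "Cuba", "Guinea", "Guinea", "Guinea"],
    "Order Date": ["2014-10-18", "2011-11-07", "2014-03-02",
                   "2012-05-09", "2012-06-01", "2011-07-04"],
    "Total Profit": [100.0, 250.5, 50.0, 75.0, 25.0, 10.0],
})

top_countries = rank_countries(sample)
top_years = rank_years(sample)
suggestion = first_without_center(top_countries, EXISTING_CENTERS)
print(top_countries)
print(top_years)
print(suggestion)
```

Calling `append_section("Marvel_Mart_Rankings.txt", "Countries Most Sale Transactions:", top_countries)` once per ranking appends each result to the file with a blank line after it, so earlier results are not overwritten. The written answer sentence still needs its own justification; the code only identifies the top-ranked country that lacks a center.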

For some reason, it wouldn’t upload the CSV file, so I’ll just share it using Google Spreadsheets:

https://docs.google.com/spreadsheets/d/1-xH-bodk3t…

_______________________________________________________

Here is how I started off:

Here is my dictionary but I don’t know if this is right:

import pandas as pd

df1 = pd.read_csv('Marvel_Mart_Sales_clean.csv', delimiter=',')

# [df1['Region']] would be a one-element list holding the whole Series;
# .tolist() gives a plain list of that column's values (duplicates kept).
mydic = {'Region': df1['Region'].tolist(), 'Country': df1['Country'].tolist(), 'Item Type': df1['Item Type'].tolist(),
'Order Date': df1['Order Date'].tolist(), 'Order ID': df1['Order ID'].tolist(), 'Order Priority': df1['Order Priority'].tolist(),
'Sales Channel': df1['Sales Channel'].tolist(), 'Ship Date': df1['Ship Date'].tolist(), 'Total Cost': df1['Total Cost'].tolist(),
'Total Profit': df1['Total Profit'].tolist(), 'Total Revenue': df1['Total Revenue'].tolist(),
'Unit Cost': df1['Unit Cost'].tolist(), 'Unit Price': df1['Unit Price'].tolist(), 'Units Sold': df1['Units Sold'].tolist()}
