some statistics
05-11-2016, 09:41 AM, (This post was last modified: 05-12-2016, 08:58 AM by John Brown.)
#1
some statistics
In addition to the 166 words of the poem, there are obviously 142 spaces (each of the poem's 24 lines has one fewer space than it has words, so 166 − 24 = 142).

The punctuation has been enumerated elsewhere. There are obvious questions regarding the punctuation but I'll not ask them.

A few TTOTC statistics, which must be regarded as estimates rather than exact counts:
number of characters: 143,000
number of words: 28,800
number of commas: 911 (this is an underestimate)
number of semicolons: 16
number of colons: ~35 (this includes the colons used in times such as 8:00)
number of periods: 2349
number of apostrophes: 434

The above were counted using "grep", which counts at most one occurrence per line: a line containing three commas is counted once. Commas are highly autocorrelated because they occur in lists, so they are undercounted the most. I imagine the true number of commas is at least 25% higher, perhaps more.
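To see why a per-line count undercounts, compare it with a count of every occurrence. With `grep -c`, grep reports the number of matching lines; GNU grep's `-o` option prints each match on its own line, so something like `grep -o ',' ttotc.txt | wc -l` (the filename is hypothetical) would count every comma. The same contrast as a minimal Python sketch on a made-up sample:

```python
def grep_style_count(text, ch):
    """Mimic `grep -c`: count the lines that contain ch at least once."""
    return sum(1 for line in text.splitlines() if ch in line)

def true_count(text, ch):
    """Count every occurrence of ch in the text."""
    return text.count(ch)

# Made-up sample: the list on line 1 has two commas but is counted once per line.
sample = "gold, silver, and jewels\nno commas here\nrich, and old"
print(grep_style_count(sample, ","))  # 2
print(true_count(sample, ","))        # 3
```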

I wrote a program to count and enumerate syllables, word lengths, etc. I don't feel like posting the output publicly but I'll give it to anyone who wants it.

This is the "global" output. These are basically histograms.

letters: [5, 34, 43, 42, 28, 9, 4, 0, 1]
syllables: [145, 20, 1]

This means there are 5 one-letter words, 34 two-letter words, 43 three-letter words, and so on. Contractions are counted as single words, and apostrophes are not included in the letter counts above; e.g., "I'm" is counted as two characters.
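As a rough illustration of the counting rule just described (contractions stay whole, apostrophes are not letters), here is a minimal Python sketch. It is not the author's program; the regex tokenizer and the `letter_histogram` name are assumptions for illustration. On the poem's first line it reproduces the character histogram the post reports for that line.

```python
import re
from collections import Counter

def letter_histogram(text, max_len=9):
    """Word-length histogram: element 0 counts 1-letter words, element 1 counts
    2-letter words, and so on up to max_len (longer words are not tallied).
    Contractions stay one word and apostrophes are not counted as letters."""
    words = re.findall(r"[A-Za-z']+", text)  # assumed tokenizer: letters plus apostrophes
    lengths = Counter(len(w.replace("'", "")) for w in words)
    return [lengths[n] for n in range(1, max_len + 1)]

print(letter_histogram("As I have gone alone in there"))  # [1, 2, 0, 2, 2, 0, 0, 0, 0]
print(letter_histogram("I'm here"))                       # [0, 1, 0, 1, 0, 0, 0, 0, 0]
```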

There are 145 one-syllable words, 20 two-syllable words, one three-syllable word, and no words of more than three syllables.

From the letters histogram you can immediately infer there are 5x1+34x2+43x3+42x4+28x5+9x6+4x7+0x8+1x9 = 601 letters in the poem.
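The same weighted sum in Python, using the two histograms exactly as posted. Each bin count is multiplied by the word length (or syllable count) it stands for; the syllable total is just the same arithmetic applied to the second histogram.

```python
# Histograms from the post: element 0 is for 1-letter (1-syllable) words, and so on.
letters = [5, 34, 43, 42, 28, 9, 4, 0, 1]
syllables = [145, 20, 1]

total_letters = sum(n * count for n, count in enumerate(letters, start=1))
total_words = sum(letters)
total_syllables = sum(n * count for n, count in enumerate(syllables, start=1))

print(total_letters)    # 601
print(total_words)      # 166
print(sum(syllables))   # 166, the same words counted by syllables
print(total_syllables)  # 188
```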

The program I wrote makes a list like the following for each line of the poem. This is nothing you couldn't do with a pencil, a piece of paper, and some patience, but my handwriting sucks and I wouldn't have the patience to write all this junk down anyway. I rather doubt it will help you solve the poem, but if you want it you can have it. I did it mainly because I didn't know how to write code to count syllables and wanted to figure out how. I had to put my thinking cap on for that. It made two mistakes: it thought that "There'll" and "tired" are both two-syllable words. A lot of people think that too. Que so what so what.
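The post doesn't say how the program counts syllables. One common heuristic, not necessarily the author's, is to count runs of consecutive vowels and then drop a silent final "e". A minimal Python sketch of that heuristic (the `count_syllables` name and the exact rules are assumptions):

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count vowel groups, then drop a silent final 'e'.
    A heuristic only; English spelling guarantees it misjudges some words."""
    w = re.sub(r"[^a-z]", "", word.lower())  # drop apostrophes and punctuation
    if not w:
        return 0
    groups = len(re.findall(r"[aeiouy]+", w))
    # Silent final 'e' ("have", "there"), except consonant + "le" ("little").
    if w.endswith("e") and not re.search(r"[^aeiouy]le$", w) and groups > 1:
        groups -= 1
    return max(groups, 1)

print([count_syllables(w) for w in "As I have gone alone in there".split()])
# [1, 1, 1, 1, 2, 1, 1]
```

On the first line this gives the same counts as the post. Like the author's program, it also rates "There'll" and "tired" as two syllables, since each contains two separate vowel groups.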

['line number:1 words in line: 7']
As I have gone alone in there

[' syllable counts:', 1, 1, 1, 1, 2, 1, 1]
number of 1s: 6 2s: 1 3s: 0
['character counts:', 2, 1, 4, 4, 5, 2, 5]
number of 1s: 1 2s: 2 3s: 0 4s: 2 5s: 2 6s: 0 7s: 0 8s: 0 9s: 0

The 2 in the list of syllable counts occupies the 5th position, which means the 5th word of the first line ("alone") has two syllables and all the others have one. The next line is a histogram of that list: it says there are six one-syllable words and a single two-syllable word in this line. The character counts tell you the first word has two characters, the second one character, and so on. The final line is a histogram of the character counts. If anybody wants the full output, I'll give it to you.
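The histogram lines in that report can be produced from a list of per-word counts. A minimal Python sketch that reproduces the report format shown for line 1; `histogram_line` is a hypothetical name, and the syllable counts are copied from the post rather than computed here:

```python
from collections import Counter

def histogram_line(counts, max_value):
    """Format per-word counts as 'number of 1s: a 2s: b ...' up to max_value."""
    tally = Counter(counts)
    parts = [f"{n}s: {tally[n]}" for n in range(1, max_value + 1)]
    return "number of " + " ".join(parts)

line = "As I have gone alone in there"
char_counts = [len(w.replace("'", "")) for w in line.split()]  # apostrophes excluded
syllable_counts = [1, 1, 1, 1, 2, 1, 1]  # copied from the post, not computed

print(histogram_line(syllable_counts, 3))  # number of 1s: 6 2s: 1 3s: 0
print(["character counts:", *char_counts])
print(histogram_line(char_counts, 9))
```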
05-11-2016, 10:21 AM,
#2
RE: some statistics
(05-11-2016, 09:41 AM)John Brown Wrote: In addition to the 166 words of the poem there are obviously 142 spaces.
That's a lot of work.
05-11-2016, 10:59 AM,
#3
RE: some statistics
(05-11-2016, 10:21 AM)pidmt Wrote: That's a lot of work.

Not really. I think it took 2 hours. Most of that was figuring out how to count syllables.
05-11-2016, 11:04 AM,
#4
RE: some statistics
Great work, John. You may be onto something.

I'm interested in looking at it:
e-mail me here at: americandream1@cox.net
05-11-2016, 11:11 AM,
#5
RE: some statistics
(05-11-2016, 10:59 AM)John Brown Wrote:
(05-11-2016, 10:21 AM)pidmt Wrote: That's a lot of work.

Not really. I think it took 2 hours. Most of that was figuring out how to count syllables.

Only 2 hours? That is impressive.
05-11-2016, 12:07 PM,
#6
RE: some statistics
I am also impressed, but not sure where you are headed with this. No need to send the data to me; I would not know what to do with it. I remember someone once made a word-distribution "chart" that showed the most frequently used words in bigger type.
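The word chart described here sounds like a word cloud, which scales each word's type size by how often the word appears. The counting underneath is a plain word-frequency table. A minimal Python sketch using the standard library's `Counter.most_common`; the `top_words` name and the simple tokenizer are assumptions for illustration:

```python
import re
from collections import Counter

def top_words(text, n=10):
    """Return the n most frequent words with their counts (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

print(top_words("The cat saw the hat and the bat.", 1))  # [('the', 3)]
```

Run over the whole book, the top of this list is dominated by words like "the" and "and", which is why word clouds usually filter out such common words first.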
05-11-2016, 12:08 PM,
#7
RE: some statistics
I have always wanted to know how many nouns there are in TTOTC. I know not every one is a hint, but I wonder which of 25%, 50%, or 75% is the closest approximation to how many of them are hints.
05-11-2016, 12:11 PM, (This post was last modified: 05-11-2016, 01:01 PM by John Brown.)
#8
RE: some statistics
(05-11-2016, 11:11 AM)pidmt Wrote:
(05-11-2016, 10:59 AM)John Brown Wrote:
(05-11-2016, 10:21 AM)pidmt Wrote: That's a lot of work.

Not really. I think it took 2 hours. Most of that was figuring out how to count syllables.

only 2 hours that is impressive,

Might have been three or four. I did it over a few days, here and there. It's about 50 lines of code.

(05-11-2016, 12:07 PM)Beavertooth Wrote: not sure where you are headed with this.

Join the club. It was a programming exercise.
05-11-2016, 01:16 PM,
#9
RE: some statistics
(05-11-2016, 12:08 PM)ThrillChaser Wrote: I have always wanted to know how many nouns there are in TTOTC. I know not every one is a hint, but I wonder which of 25%, 50%, or 75% is the closest approximation to how many of them are hints.

I don't know how to count nouns. It would take more knowledge of English than I possess.

I don't know whether any nouns are hints but if some are I would guess it is much closer to 0% than 25%.
05-11-2016, 01:30 PM,
#10
RE: some statistics
(05-11-2016, 01:16 PM)John Brown Wrote: I don't know how to count nouns. It would take more knowledge of English than I possess.

Well, when I wrote that comment earlier, I was thinking that if you had a file with a list of nouns, it would be pretty easy: load the noun list into a hash map, then parse each word of TTOTC and look it up in the map. I don't know whether such a file of nouns exists.
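The lookup approach described here, as a minimal Python sketch; a Python `set` is a hash table, so each lookup is fast. The noun list and the `count_nouns` name are hypothetical. One caveat: many English words can be either nouns or verbs ("chase", "hint"), so a plain list lookup would overcount, and an accurate count would need a part-of-speech tagger instead.

```python
import re

def count_nouns(text, noun_list):
    """Count the words in text that appear in noun_list (a hash-set lookup)."""
    nouns = {n.strip().lower() for n in noun_list if n.strip()}
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for w in words if w in nouns)

# Hypothetical noun list; a real one might be loaded with
#   with open("nouns.txt") as f: noun_list = f.read().splitlines()
noun_list = ["treasure", "chest", "poem", "canyon"]
print(count_nouns("The treasure chest is hidden; read the poem.", noun_list))  # 3
```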

Quote: I don't know whether any nouns are hints but if some are I would guess it is much closer to 0% than 25%.

I would guess that is what most people would guess.

Some back-of-the-envelope calculations: if there are 28,000 words in TTOTC and 20% (an arbitrary figure) of them are nouns, that would be 5,600 nouns total. 25% of that would be 1,400, which sounds high. But the halfway point between 0% and 25% is 12.5%, or 700 nouns, so with 701 nouns being hints it would already be closer to 25% than to 0%. Sounds possible, but probably only to me and a few others.
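The arithmetic spelled out in Python. The 28,000 words and the 20% noun share are the post's own rough assumptions, not measured values; integer math keeps the results exact.

```python
words = 28_000                          # rough word count of TTOTC used in the post
noun_percent = 20                       # arbitrary assumption, as in the post
nouns = words * noun_percent // 100     # 5600 nouns
quarter = nouns // 4                    # 1400 nouns = 25%
halfway = nouns // 8                    # 700 nouns = 12.5%, midway between 0% and 25%

def closer_to_25_than_0(hint_nouns, total_nouns):
    # The fraction is nearer 25% than 0% exactly when it exceeds 12.5% (= 1/8).
    return hint_nouns * 8 > total_nouns

print(nouns, quarter, halfway)          # 5600 1400 700
print(closer_to_25_than_0(701, nouns))  # True
print(closer_to_25_than_0(700, nouns))  # False (exactly halfway)
```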